Correlational analysis: Single and Multiple Regression
What is it?
Two of the most commonly used tools in inferential statistics are the t-test and regression analysis. Both of these techniques belong to the family of the General Linear Model (GLM). This blog will focus on regression analysis. Essentially, regression analysis helps us to describe, and sometimes predict, how the target, or dependent, variable changes with the independent variable(s). It can point to a potential cause-effect relationship, but on its own it shows association rather than proof of causation. There can be many independent variables, or just one. But there is always only ONE dependent variable!
When should I use it?
You will want to use simple or multiple regression when you believe that there is a linear relationship, possibly one of cause and effect, between two or more variables. An important distinction between the t-test and regression is that with regression we are not comparing the averages of two groups; rather, we want to estimate the relationship between the inputs and the output. Confused yet? Perhaps the example that follows may help!
What is an example of it?
In a simple case of regression, we might want to find out how many liters of water an athlete consumes given the number of miles that he runs in a day. We have to assume that the volume of water consumed increases in a linear way with each mile that is run. This assumption may not work for every scenario, so the model has to be checked against actual data. After we collect the data, we plot the data points on an x-y Cartesian plane. The regression "line of best fit" drawn through the data points is of the form y=mx+b, where b is the y-intercept (the value of y when x is 0), m is the slope, x is the number of miles run, and y is the number of liters consumed. SPSS and other software packages also report a correlation coefficient, r, which ranges from -1 to 1, and an "R-squared value", which ranges from 0 to 1 (in simple regression, R^2 is just r squared). As a common rule of thumb, an r below -.7 or above .7 (an R^2 of about .49 or more) means that there is a high degree of predictability between the number of miles run and the volume of water consumed. For example, in this case, a potential equation might be y=.7x+2. So, if the athlete ran three miles, the equation would be y=.7(3)+2, which would mean that the athlete consumes 4.1 liters of water.
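If you would like to try the water example yourself, here is a minimal Python sketch using NumPy. The miles and liters figures are made up purely for illustration (they are not real measurements), and the variable names are my own. It fits the line of best fit, reports the correlation coefficient r (which lies between -1 and 1) and its square, and predicts the water consumed for a three-mile run.

```python
import numpy as np

# Hypothetical data, invented for illustration only (not real measurements).
miles = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # x: miles run in a day
liters = np.array([2.8, 3.3, 4.2, 4.7, 5.6, 6.1])  # y: liters of water consumed

# Least-squares line of best fit, y = m*x + b.
# np.polyfit returns the coefficients from the highest power down: [m, b].
slope, intercept = np.polyfit(miles, liters, 1)

# Pearson correlation coefficient r, and its square.
# In simple (one-input) regression, R-squared equals r squared.
r = np.corrcoef(miles, liters)[0, 1]
r_squared = r ** 2

# Predicted liters for a three-mile run.
predicted = slope * 3 + intercept

print(f"slope m = {slope:.3f}, intercept b = {intercept:.3f}")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
print(f"predicted liters at 3 miles = {predicted:.2f}")
```

With this made-up data, the fitted slope and intercept come out close to the .7 and 2 used in the example equation, so the three-mile prediction lands near 4.1 liters. With real data, you would also want to look at the scatter plot to check that a straight line is a reasonable fit.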
In education, the number of hours that a learner studies for an exam (the independent variable) might be used to predict the score that the learner receives on a standardized exam (the dependent variable). Of course, most experienced educators would probably agree that study time alone does not necessarily determine the score. However, a regression may provide some insight into just how much study preparation time affects the final score.
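The examples so far use one independent variable. To show how the same idea extends to multiple regression, here is a rough Python sketch with two inputs. Everything in it is an assumption for illustration: the second input (hours of sleep) is my own addition, and the scores were constructed from the equation score = 50 + 4(study hours) + 2(sleep hours) so that you can check that the fit recovers those numbers. Notice that there are two independent variables, but still only one dependent variable.

```python
import numpy as np

# Hypothetical data, constructed for illustration only.
# The scores were built from: score = 50 + 4*study + 2*sleep.
study_hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sleep_hours = np.array([6.0, 8.0, 5.0, 7.0, 6.0, 8.0])
scores = np.array([66.0, 74.0, 72.0, 80.0, 82.0, 90.0])

# Design matrix: a column of ones for the intercept, then one column per input.
X = np.column_stack([np.ones_like(study_hours), study_hours, sleep_hours])

# Ordinary least squares: find the coefficients that minimize squared error.
coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, scores, rcond=None)
intercept, b_study, b_sleep = coefficients

print(f"score = {intercept:.2f} + {b_study:.2f}*study + {b_sleep:.2f}*sleep")
```

Each coefficient estimates how much the score changes for a one-hour change in that input while the other input is held constant. Real data will not sit exactly on the fitted equation the way this constructed data does.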
How can I find out more about regression?
The Social Research Methods website that I included in an earlier blog is one of the best sources for researchers like you. A link to the exact webpage that contains this information is below.
I will also be blogging in the future about this and other related topics. Stay tuned!