Multiple Regression Analysis using SPSS Statistics
Name: Abdullahi Abdikarim
Student ID (NIM): 18510195
Course: Statistics
Lecturer: M. Nanang Choirudidin, SE, MM
Introduction
Multiple regression is an extension of simple linear
regression. It is used when we want to predict the value of a variable based on
the value of two or more other variables. The variable we want to predict is
called the dependent variable (or sometimes, the outcome, target or criterion
variable). The variables we are using to predict the value of the dependent
variable are called the independent variables (or sometimes, the predictor,
explanatory or regressor variables).
For example, you could use multiple regression to understand
whether exam performance can be predicted based on revision time, test anxiety,
lecture attendance and gender. Alternatively, you could use multiple regression
to understand whether daily cigarette consumption can be predicted based on
smoking duration, age when started smoking, smoker type, income and gender.
Multiple regression also allows you to determine the overall
fit (variance explained) of the model and the relative contribution of each of
the predictors to the total variance explained. For example, you might want to
know how much of the variation in exam performance can be explained by revision
time, test anxiety, lecture attendance and gender "as a whole", but
also the "relative contribution" of each independent variable in
explaining the variance.
Example
A health researcher wants to be able to predict "VO2max",
an indicator of fitness and health. Normally, performing this procedure
requires expensive laboratory equipment and necessitates that an individual
exercise to their maximum (i.e., until they can no longer continue exercising due
to physical exhaustion). This can put off those individuals who are not very
active/fit and those individuals who might be at higher risk of ill health
(e.g., older unfit subjects). For these reasons, it has been desirable to find
a way of predicting an individual's VO2max based on attributes that
can be measured more easily and cheaply. To this end, a researcher recruited
100 participants to perform a maximum VO2max test, but also recorded
their "age", "weight", "heart rate" and
"gender". Heart rate is the average of the last 5 minutes of a 20
minute, much easier, lower workload cycling test. The researcher's goal is to
be able to predict VO2max based on these four attributes: age,
weight, heart rate and gender.
Setup in SPSS Statistics
In SPSS Statistics, we created six variables: (1) VO2max,
which is the maximal aerobic capacity; (2) age, which is the participant's age;
(3) weight, which is the participant's weight (technically, it is their
'mass'); (4) heart_rate, which is the participant's heart rate; (5) gender,
which is the participant's gender; and (6) caseno, which is the case number.
The caseno variable is used to make it easy for you to eliminate cases (e.g.,
"significant outliers", "high leverage points" and
"highly influential points") that you have identified when checking
for assumptions. In our enhanced multiple regression guide, we show you how to
correctly enter data in SPSS Statistics to run a multiple regression when you
are also checking for assumptions.
- Click Analyze > Regression > Linear... on the main menu, as shown below:
Published with written permission
from SPSS Statistics, IBM Corporation.
Note:
Don't worry that you're selecting Analyze
> Regression > Linear... on the main menu or that the
dialogue boxes in the steps that follow have the title, Linear Regression.
You have not made a mistake. You are in the correct place to carry out the
multiple regression procedure. This is just the title that SPSS Statistics
gives, even when running a multiple regression procedure.
- You will be presented with the Linear Regression dialogue box below:
- Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart rate and gender into the Independent(s): box, using the ⤹ buttons, as shown below (all other boxes can be ignored):
Note: For a standard multiple regression you should ignore the Previous and Next buttons, as they are for sequential (hierarchical) multiple regression. The Method: option needs to be kept at the default value, which is Enter. If, for whatever reason, Enter is not selected, you need to change Method: back to Enter. The Enter method is the name given by SPSS Statistics to standard regression analysis.
- Click on the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box, as shown below:
- In addition to the options that are selected by default, select Confidence intervals in the Regression Coefficients area, leaving the Level(%): option at "95". You will end up with the following screen:
- Click on the Continue button. You will be returned to the Linear Regression dialogue box.
- Click on the OK button. This will generate the output.
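For readers without SPSS Statistics, the same kind of model can be fitted by ordinary least squares in a few lines of Python. The sketch below is not what SPSS Statistics runs internally; it simply solves the normal equations directly. The study's raw data are not reproduced in this guide, so the example uses synthetic, noise-free data generated from the coefficients reported in the Coefficients table. The function name fit_ols, the value ranges for the predictors, and the 0/1 coding of gender are all assumptions made for illustration.

```python
import random


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimising squared error (b0 = intercept).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic predictors (NOT the study data): age, weight, heart_rate, gender
random.seed(0)
rows = [
    [random.uniform(20, 60), random.uniform(50, 100),
     random.uniform(100, 180), random.choice([0, 1])]
    for _ in range(100)
]
# Noise-free outcome built from the coefficients reported in this guide
true_b = [87.83, -0.165, -0.385, -0.118, 13.208]
vo2max = [true_b[0] + sum(b * v for b, v in zip(true_b[1:], r)) for r in rows]

coefs = fit_ols(rows, vo2max)
print([round(b, 3) for b in coefs])
```

Because the synthetic outcome contains no noise, the fit should recover the generating coefficients up to floating-point error. With real data the estimates would carry sampling error, which is what the standard errors and confidence intervals in the SPSS Statistics output describe.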
Interpreting and Reporting the Output of Multiple Regression Analysis
SPSS Statistics will generate quite a few tables of output
for a multiple regression analysis. In this section, we show you only the three
main tables required to understand your results from the multiple regression
procedure, assuming that no assumptions have been violated. A complete
explanation of the output you have to interpret when checking your data for the
eight assumptions required to carry out multiple regression is provided in our
enhanced guide. This includes relevant scatterplots and partial regression
plots, histogram (with superimposed normal curve), Normal P-P Plot and Normal
Q-Q Plot, correlation coefficients and Tolerance/VIF values, casewise
diagnostics and studentized deleted residuals.
However, in this "quick start" guide, we focus
only on the three main tables you need to understand your multiple regression
results, assuming that your data has already met the eight assumptions required
for multiple regression to give you a valid result:
Determining how well the model fits
The first table of interest is the Model Summary
table. This table provides the R, R², adjusted R²,
and the standard error of the estimate, which can be used to determine how well
a regression model fits the data:
The "R" column represents the value of R,
the multiple correlation coefficient. R can be considered
to be one measure of the quality of the prediction of the dependent variable;
in this case, VO2max. A value of 0.760, in this example, indicates a
good level of prediction.
The "R Square" column represents the
R² value (also called the coefficient of determination),
which is the proportion of variance in the dependent variable that can be
explained by the independent variables (technically, it is the proportion of
variation accounted for by the regression model above and beyond the mean
model). You can see from our value of 0.577 that our independent variables
explain 57.7% of the variability of our dependent variable, VO2max.
However, you also need to be able to interpret "Adjusted R Square"
(adj. R²) to accurately report your data. We explain the
reasons for this, as well as the output, in our enhanced multiple regression
guide.
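Adjusted R² is not worked through in this guide, but the standard textbook formula can be applied to the figures above: 100 participants, 4 independent variables and an R² of 0.577. The sketch below is a hand calculation, not SPSS Statistics output, and the function name is illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared = 1 - (1 - R^2)(n - 1) / (n - k - 1).

    n is the number of cases, k the number of independent variables.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values taken from the text: R^2 = 0.577, 100 participants, 4 predictors
adj_r2 = adjusted_r_squared(0.577, n=100, k=4)
print(round(adj_r2, 3))  # 0.559
```

The adjustment penalises each added predictor, so adjusted R² is always a little lower than R² when the model has at least one independent variable.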
The F-ratio in the ANOVA table (see below)
tests whether the overall regression model is a good fit for the data. The
table shows that the independent variables statistically significantly predict
the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e.,
the regression model is a good fit of the data).
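The F-ratio is tied to R² through a standard identity, F = (R²/k) / ((1 - R²)/(n - k - 1)), so the reported value can be sanity-checked by hand. This is a minimal sketch, assuming n = 100 participants and k = 4 independent variables as in the example; the function name is illustrative.

```python
def f_ratio(r2, n, k):
    """Overall F statistic of a regression computed from R-squared.

    Degrees of freedom are k (model) and n - k - 1 (residual).
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 100, 4
df_model, df_residual = k, n - k - 1
f_value = f_ratio(0.577, n, k)
print(df_model, df_residual, round(f_value, 2))
```

The result is about 32.40 on 4 and 95 degrees of freedom. The small gap from the reported 32.393 comes from using R² rounded to three decimal places.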
Estimated model coefficients
The general form of the equation to predict VO2max
from age, weight, heart rate and gender is:
predicted VO2max = 87.83 - (0.165 x age) - (0.385 x weight) - (0.118 x heart rate) + (13.208 x gender)
This is obtained from the Coefficients table, as
shown below:
Unstandardized coefficients indicate how much the dependent
variable varies with an independent variable when all other independent
variables are held constant. Consider the effect of age in this example. The
unstandardized coefficient, B1, for age is equal to -0.165 (see Coefficients
table). This means that for each one year increase in age, there is a decrease
in VO2max of 0.165 ml/min/kg.
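The prediction equation and the meaning of an unstandardized coefficient can both be checked with a short function. This is a sketch only: the participant values are hypothetical, and gender is assumed to be coded 0 or 1, since the coding is not stated in this guide.

```python
def predict_vo2max(age, weight, heart_rate, gender):
    """Predicted VO2max (ml/min/kg) from the fitted equation in this guide.

    gender is ASSUMED to be coded 0 or 1; the guide does not state the coding.
    """
    return (87.83 - 0.165 * age - 0.385 * weight
            - 0.118 * heart_rate + 13.208 * gender)


# Hypothetical participant: age 30, weight 70, heart rate 130, gender coded 1
base = predict_vo2max(30, 70, 130, 1)
# Same participant one year older, all other variables held constant
older = predict_vo2max(31, 70, 130, 1)

print(round(base, 3))          # 53.798
print(round(older - base, 3))  # -0.165
```

The second line of output is the age coefficient itself: raising age by one year while holding the other variables fixed changes the prediction by exactly B1.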
Statistical significance of the independent variables
You can test for the statistical significance of each of the
independent variables. This tests whether the unstandardized (or standardized)
coefficients are equal to 0 (zero) in the population. If p < .05, you
can conclude that the coefficients are statistically significantly different from
0 (zero). The t-value and corresponding p-value are located in
the "t" and "Sig." columns, respectively, as
highlighted below:
You can see from the "Sig." column that all
independent variable coefficients are statistically significantly different
from 0 (zero). Although the intercept, B0, is tested for statistical
significance, this is rarely an important or interesting finding.
Putting it all together
You could write up the results as follows:
A multiple regression was run to predict VO2max
from gender, age, weight and heart rate. These variables statistically
significantly predicted VO2max, F(4, 95) = 32.393, p <
.0005, R² = .577. All four variables added statistically
significantly to the prediction, p < .05.