PSYM221: Introduction to Statistics

Question 1

A researcher named Sun is examining predictors of subjective health in daily life. 39 participants were tracked over a 2-week period using wearable devices and daily questionnaires. Sun averaged these data to produce a set of variables for each participant:

- subjective health (on a 1-100 scale where higher numbers indicate better subjective health),
- step count (average number of steps per day),
- interactions with strangers (average number of interactions per day).

Ethnicity (East Asian, South Asian, or European) was also recorded.

Sun hypothesizes that participants with greater average step counts would report better average subjective health over the 2-week period.

- (a) (15 marks) Test Sun’s hypothesis using correlation analysis. Report the relevant statistical results in full, including whether a two-tailed or one-tailed t-test is most appropriate.

- (b) (15 marks) Run a multiple regression model predicting subjective health from step count and interactions with strangers. What proportion of the variance in subjective health is explained by variance in step count and interactions with strangers combined? Report an appropriate model fit statistic to answer this question.

- (c) (10 marks) Based on your regression model in Question 1b, which of the two predictors shows the strongest association with the DV? Explain how you obtained this information.

- (d) (10 marks) For the model in Question 1b, Sun then tests the assumption of normality of residuals and concludes that this assumption has been violated for this model. Why might Sun have come to this conclusion? Use an appropriate graph to support your answer.

### Question 1a: Correlation Analysis (15 marks)

Step 1: Test the hypothesis

Sun’s hypothesis is about the relationship between step count and subjective health. This is a perfect case for a correlation analysis since you want to see if there’s a statistical association between these two continuous variables.

- You’ll likely be using Pearson’s correlation coefficient (r) because both variables are continuous. Run the correlation test to see whether the relationship between step count and subjective health is statistically significant.
- Decide whether a one-tailed or two-tailed test is more appropriate. Because Sun predicts a specific direction (that step count is positively correlated with subjective health), you can justify using a one-tailed t-test, provided the direction was chosen before looking at the data. If no direction is predicted (the relationship could go either way), a two-tailed test is safer. Keep in mind that a one-tailed test cannot detect a relationship in the opposite direction.
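The two steps above can be sketched in Python as follows. This is a minimal illustration, not the course's required software: it assumes SciPy 1.9 or later (for the `alternative` argument of `pearsonr`), the variable names `steps` and `health` are placeholders, and the data are simulated rather than Sun's.

```python
import numpy as np
from scipy import stats

# Simulated stand-in data for 39 participants (NOT Sun's real data).
rng = np.random.default_rng(seed=1)
n = 39
steps = rng.normal(loc=8000, scale=2000, size=n)            # average steps per day
health = 40 + 0.003 * steps + rng.normal(0, 10, size=n)     # subjective health score

# Two-tailed test: the alternative hypothesis is "r is not zero".
r, p_two = stats.pearsonr(steps, health)

# One-tailed test: the alternative hypothesis is "r is greater than zero",
# which matches a directional prediction like Sun's.
_, p_one = stats.pearsonr(steps, health, alternative="greater")

df = n - 2  # degrees of freedom for the test of a correlation
print(f"r({df}) = {r:.2f}, two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```

When r is positive, the one-tailed p-value is half the two-tailed one, which is why the choice of tails has to be justified by the hypothesis rather than by the result.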

Step 2: Report the results

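Once you’ve run the test, you need to report the key values:

- r (the correlation coefficient),
- the test statistic, t, with its degrees of freedom (df = N - 2, so 37 for 39 participants),
- and the p-value (to determine significance), stating whether it is one-tailed or two-tailed.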

Be sure to state whether the result supports Sun’s hypothesis. For a directional hypothesis, the correlation needs to be positive as well as significant (e.g., p < 0.05).
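If your software prints r and a p-value but not the test statistic, t can be recovered from r and the sample size. The sketch below shows that conversion; the helper name `t_from_r` and the value r = 0.40 are made up for illustration, while n = 39 comes from the question.

```python
import math

def t_from_r(r: float, n: int) -> tuple[float, int]:
    """Convert a Pearson r into its t statistic and degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# r = 0.40 is a hypothetical value, not a result from Sun's data.
t, df = t_from_r(0.40, 39)
print(f"t({df}) = {t:.2f}")  # prints "t(37) = 2.65"
```

A result would then be written along the lines of "r(37) = .40, t(37) = 2.65", followed by the p-value and the number of tails.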

### Question 1b: Multiple Regression Model (15 marks)

Here, you’re running a multiple regression analysis to predict subjective health using step count and interactions with strangers as predictors. The key here is to understand the proportion of variance explained by these predictors.

Step 1: Run the regression

- Enter both step count and interactions with strangers as predictors.
- Look at the R² value. This tells you the proportion of the variance in subjective health that’s explained by the two predictors combined.
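To make the meaning of R² concrete, here is a rough sketch that fits the two-predictor model by ordinary least squares with NumPy and computes R² from the sums of squares. The data are simulated and the variable names are assumptions; in practice you would read R² from your statistics package's output.

```python
import numpy as np

# Simulated stand-in data (NOT Sun's real data).
rng = np.random.default_rng(seed=2)
n = 39
steps = rng.normal(8000, 2000, size=n)
interactions = rng.normal(5, 2, size=n)
health = 35 + 0.003 * steps + 1.5 * interactions + rng.normal(0, 8, size=n)

# Design matrix: a column of ones for the intercept plus the two predictors.
X = np.column_stack([np.ones(n), steps, interactions])
coefs, *_ = np.linalg.lstsq(X, health, rcond=None)

fitted = X @ coefs
ss_res = np.sum((health - fitted) ** 2)           # variation left unexplained
ss_tot = np.sum((health - health.mean()) ** 2)    # total variation in the outcome
r_squared = 1 - ss_res / ss_tot

print(f"R^2 = {r_squared:.3f}")
```

R² is one minus the share of variation the model leaves unexplained, so it can be read directly as a proportion of variance.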

Step 2: Report the model fit

Make sure you report the R² value in your answer. This is important because it shows how much of the variability in subjective health can be predicted by the combined predictors.

For example: “Step count and interactions with strangers together explained 45% of the variance in subjective health (R² = 0.45).”
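Many write-ups report adjusted R² and the overall F test alongside R². As a hedged sketch, both can be derived from R², the sample size, and the number of predictors. The function names are placeholders, and R² = 0.45 is the made-up example value from above, not Sun's result.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F test of the model, with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# R^2 = 0.45 is hypothetical; n = 39 and k = 2 come from the question.
r2, n, k = 0.45, 39, 2
print(f"adjusted R^2 = {adjusted_r_squared(r2, n, k):.3f}")    # prints 0.419
print(f"F({k}, {n - k - 1}) = {f_statistic(r2, n, k):.2f}")    # prints F(2, 36) = 14.73
```

Adjusted R² is always a little lower than R² because it penalizes the model for each predictor, which matters more in a small sample like this one.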

### Question 1c: Strongest Predictor (10 marks)

Now you want to figure out which of the two predictors (step count or interactions) has the strongest association with subjective health.

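Step 1: Look at the standardized beta coefficients (β)

The standardized beta coefficient for each predictor tells you how strongly it’s associated with the dependent variable (subjective health) while holding the other predictor constant. Because step count and interactions with strangers are measured in different units, compare the standardized coefficients (β) rather than the unstandardized ones (B). The predictor with the larger absolute standardized beta is the stronger predictor.

For example, if the standardized beta for step count is larger in absolute value than the one for interactions, you’d conclude that step count has the stronger association.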
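If your software reports only unstandardized coefficients, one way to obtain standardized betas is to z-score every variable and refit the model. The sketch below does that with NumPy on simulated data; `zscore` and `standardized_betas` are hypothetical helper names, not functions from any package.

```python
import numpy as np

def zscore(values):
    """Rescale a variable to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

def standardized_betas(outcome, *predictors):
    """Fit OLS on z-scored variables; returns one standardized beta per predictor."""
    Z = np.column_stack([zscore(p) for p in predictors])
    # No intercept column is needed because every z-scored variable has mean zero.
    betas, *_ = np.linalg.lstsq(Z, zscore(outcome), rcond=None)
    return betas

# Simulated stand-in data (NOT Sun's real data).
rng = np.random.default_rng(seed=3)
n = 39
steps = rng.normal(8000, 2000, size=n)
interactions = rng.normal(5, 2, size=n)
health = 35 + 0.003 * steps + 1.5 * interactions + rng.normal(0, 8, size=n)

beta_steps, beta_interactions = standardized_betas(health, steps, interactions)
stronger = "step count" if abs(beta_steps) > abs(beta_interactions) else "interactions"
print(f"beta(steps) = {beta_steps:.2f}, beta(interactions) = {beta_interactions:.2f}")
print(f"Stronger predictor in this simulated sample: {stronger}")
```

The comparison uses absolute values, so a strong negative predictor would still count as a strong association.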

Step 2: Explain your answer

In your explanation, briefly mention the standardized beta values and how they relate to the strength of the association. You might say something like: “The standardized beta coefficient for step count (β = 0.6) is larger than that for interactions with strangers (β = 0.3), indicating that step count is the stronger predictor of subjective health.”

### Question 1d: Normality of Residuals (10 marks)

For this part, Sun found that the assumption of normality of residuals was violated. This means that the residuals (the differences between observed and predicted subjective health) do not follow a normal distribution. Normally distributed residuals are a key assumption in linear regression.

Step 1: Why might Sun have concluded this?

The normality of residuals is typically checked using a histogram or a Q-Q plot of the residuals. If Sun saw a skewed distribution or outliers in these plots, it would indicate that the residuals are not normally distributed. Sun could also use a Shapiro-Wilk test for normality, where a significant result (e.g., p < 0.05) would provide statistical evidence of this violation.

Step 2: Use a graph to support your answer

Include a Q-Q plot or a histogram of the residuals to show visually whether they deviate from normality. If the points on the Q-Q plot depart systematically from the straight reference line, especially at the ends, this indicates non-normality.
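For readers who want to see what a Q-Q plot actually compares, here is a minimal sketch that computes the plot's coordinates by hand. It uses deliberately skewed, simulated residuals rather than residuals from Sun's model. The helpers `qq_points` and `draw_qq` are hypothetical names, and `draw_qq` assumes matplotlib is installed.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    resid = np.sort(np.asarray(residuals, dtype=float))
    n = len(resid)
    observed = (resid - resid.mean()) / resid.std(ddof=1)   # standardized residuals
    # Plotting positions (i - 0.5) / n for i = 1..n, mapped through the normal inverse CDF.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, observed

def draw_qq(theoretical, observed, path="qq_plot.png"):
    """Save the Q-Q plot to a file (matplotlib is imported only if this is called)."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(theoretical, observed)
    lims = [theoretical.min(), theoretical.max()]
    ax.plot(lims, lims)  # reference line: where points fall if residuals are normal
    ax.set_xlabel("Theoretical normal quantiles")
    ax.set_ylabel("Standardized residuals")
    fig.savefig(path)

# Simulated, deliberately right-skewed residuals (NOT from Sun's model).
rng = np.random.default_rng(seed=4)
residuals = rng.exponential(scale=10, size=39) - 10
theoretical, observed = qq_points(residuals)
print(f"largest residual: observed z = {observed[-1]:.2f}, "
      f"expected under normality = {theoretical[-1]:.2f}")
```

Calling `draw_qq(theoretical, observed)` saves the figure. With right-skewed residuals like these, the points at the upper end typically sit above the reference line.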

In your explanation, write something like: “The Q-Q plot showed significant deviations from the normal line, suggesting that the residuals were not normally distributed, which violates the assumption of normality in regression.”

### Overall Tips for the Report:

- Be clear and concise: Always explain your steps and results in simple terms. Avoid overly technical language unless it’s required.
- Use evidence: Whether it’s the p-value, R², or beta coefficients, always support your answers with statistical results.
- Visual aids: When required, use graphs or plots to demonstrate points clearly, especially for normality tests.

