HBC203: Identify the most appropriate statistical test to conduct if the company wants to determine whether the proportion of people in each ethnic group in the sample is different from that in the population. Statistics and Data Analysis for the Social and Behavioural Sciences assignment

**Module / Subject / School**:

HBC203 Statistics and Data Analysis for the Social and Behavioural Sciences

Singapore University of Social Sciences

**Requirements:**

Question 1 (18 marks)

A beverage company conducted a study examining whether people preferred the taste of their soft drinks to the taste of their competitors’ soft drinks. In the study, there were 216 Chinese, 171 Malays, 103 Indians, and 90 people from other ethnic groups (Others). The company wants to know whether the proportion of people in each ethnic group in the sample is the same as the corresponding proportion in the population. According to census data, as of 2021, the proportions of the ethnic groups in the population are: Chinese (74.25%), Malays (13.66%), Indians (8.90%) and Others (3.19%).

Identify the most appropriate statistical test to conduct if the company wants to determine whether the proportion of people in each ethnic group in the sample is different from that in the population. Explain why this is the most appropriate statistical test by providing two reasons in the context of this study. Then, using hand calculations, analyse the data by conducting the statistical test you identified. Show all working. Interpret the results of your data analysis. Explain your answer with reference to the p value and the alpha level. Use an alpha level of .05. You do not need to report the results in APA format for this question.

Question 2 (42 marks)

In 2017, a group of behavioural scientists at National Environment Agency (NEA) wanted to find out if giving out free tissue packets (with a reminder to return the tray to the tray area) would improve tray return rates in hawker centres. They randomly selected 10 hawker centres for a 3-week field study. In the first week (pre-intervention phase), they observed the 10 hawker centres’ customers to establish a baseline tray return rate. In the second week (intervention phase), they recruited volunteers to give out the free tissue packets and recorded the tray return rate during the week. In the final week (post-intervention phase), they stopped giving out the free tissue packets and recorded the tray return rate for each hawker centre. The average tray return rates (%) in the 10 hawker centres for each phase are presented below.

Note. This study is modelled after an actual one conducted in 2017. The data are hypothetical but reflect the findings of that study.

Based on the study design, identify the most appropriate statistical test to conduct. Explain why this is the most appropriate statistical test by providing two reasons in the context of this study. Then, assuming all the assumptions of the statistical test you identified are met and there are no concerns about small sample size, analyse the data using jamovi. Show the jamovi spreadsheet by taking a screenshot of the spreadsheet (i.e., what you see when you click the DATA tab) and pasting it in your answer. The jamovi spreadsheet should be correctly formatted. Further, paste all the output necessary for interpretation of the results in your answer. Interpret fully the results of your data analysis. Explain your answer with reference to the p values and the alpha level, using an alpha level of .05. Discuss how NEA can use the findings to inform its policy. You do not need to report the results in APA format for this question.

Question 3 (40 marks)

Paul wants to know whether the obesity of politicians in a country predicts the corruption levels in that country. Using a computer vision algorithm that analysed frontal face images to calculate body-mass index (BMI), he estimated the BMI of 299 cabinet ministers from 15 post-Soviet countries who were in office in 2017. For each country, he then calculated the median estimated BMI for the cabinet ministers. Higher values on the median estimated BMI represent greater obesity. He also recorded the Transparency International Corruption Perceptions Index (CPI) score for each country. Lower scores on the CPI represent higher (perceived) levels of corruption in the country, with scores ranging from 0 (highly corrupt) to 100 (not corrupt). The data are shown below.

- For this question, assume that the scale of measurement for both variables is interval. Based on this information and the study design, identify the most appropriate statistical test to conduct. Explain why this is the most appropriate statistical test by providing two reasons in the context of this study. Then, assuming all the assumptions of the statistical test you identified are met and there are no concerns about small sample size, analyse the data using jamovi. Show the jamovi spreadsheet by taking a screenshot of the spreadsheet (i.e., what you see when you click the DATA tab) and pasting it in your answer. The jamovi spreadsheet should be correctly formatted. Further, paste all the output necessary for interpretation of the results in your answer. Report the results in APA format. Use an alpha level of .05 to determine if the result is statistically significant. (32 marks)
- Discuss the changes you would make to Paul’s study if you wanted to generalise the findings of the study to all the countries in the world and explain why you made those changes. (8 marks)

This assignment tests students on their understanding of the concepts.

Based on the assignment requirements provided, here are some comments to help students score well on this assignment:

Question 1:

- To determine whether the proportion of people in each ethnic group in the sample differs from that in the population, the most appropriate statistical test is the chi-square goodness-of-fit test.
- Two reasons why this test is suitable for this study are: (1) There is a single categorical (nominal) variable, ethnic group, with four categories, and the data are frequency counts of people in each category. The chi-square test for independence does not apply because it requires two categorical variables. (2) The observed frequencies in one sample are compared with expected frequencies derived from known population proportions (the 2021 census figures), which is exactly what a goodness-of-fit test does.
- Perform the test using hand calculations and show all the working. Compute the expected frequency for each group as the total sample size (216 + 171 + 103 + 90 = 580) multiplied by the population proportion, compute (O − E)²/E for each group, and sum these values to obtain the chi-square statistic with df = 4 − 1 = 3. Compare the statistic with the critical value for df = 3 at an alpha level of .05 (7.815) to judge whether the p-value is below .05. If the p-value is less than .05, reject the null hypothesis and conclude that the ethnic composition of the sample differs significantly from that of the population. Note that this test says nothing about taste preferences.
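
The hand calculation described above can be cross-checked with a short script. This is a minimal sketch rather than a model answer: it uses only the counts and census proportions given in Question 1, and the helper `chi2_sf_df3` is an illustrative name for the closed-form upper-tail probability of a chi-square distribution with exactly 3 degrees of freedom. The assignment still requires the working to be shown by hand.

```python
import math

# Observed counts from Question 1.
observed = {"Chinese": 216, "Malays": 171, "Indians": 103, "Others": 90}
# Population proportions from the 2021 census figures given in Question 1.
population = {"Chinese": 0.7425, "Malays": 0.1366, "Indians": 0.0890, "Others": 0.0319}

n = sum(observed.values())  # total sample size

# Expected count for each group = n * population proportion.
expected = {group: n * p for group, p in population.items()}

# Chi-square statistic: sum of (O - E)^2 / E over the four groups.
chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
df = len(observed) - 1  # number of categories minus one


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi_square)
alpha = 0.05

print(f"n = {n}, df = {df}")
for g in observed:
    print(f"{g}: O = {observed[g]}, E = {expected[g]:.2f}")
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these figures the statistic comes to roughly 540.72 on 3 degrees of freedom, far above the critical value of 7.815, so the p-value is well below .05 and the null hypothesis is rejected.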

Question 2:

- The most appropriate statistical test for this study is the one-way Repeated Measures ANOVA. It compares the mean tray return rate of a single group of hawker centers across three time points (pre-intervention, intervention, and post-intervention).
- Two reasons why Repeated Measures ANOVA is appropriate are: (1) The design is within-subjects: the same 10 hawker centers are measured in every phase, so the three sets of scores are related rather than independent. (2) The independent variable (phase) has three levels and the dependent variable (tray return rate in %) is continuous, so the question is whether the mean differs across more than two related conditions, which a paired-samples t test cannot address in a single analysis.
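- Analyze the data using jamovi and present the correctly formatted jamovi spreadsheet (one row per hawker center, one column per phase) along with all the output needed for interpretation. If the p-value for the overall F test is less than 0.05 (alpha level), it indicates that the mean tray return rate differs across the three phases. Because the F test does not show which phases differ, include post hoc pairwise comparisons and interpret each of them against the alpha level, focusing on whether the rate rose from the pre-intervention phase to the intervention phase and whether it stayed elevated in the post-intervention phase.
- Discuss how NEA can use the findings to inform its policy. If tray return rates rose significantly during the intervention phase, NEA could consider distributing free tissue packets with tray return reminders more widely. Check the post-intervention phase as well: if rates fell back toward baseline once distribution stopped, the effect depends on the ongoing giveaway and NEA would need to weigh its recurring cost; if rates stayed elevated, a short campaign may be enough to shift behavior.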
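
To see what jamovi computes for the overall F test, the sums of squares can be sketched in a few lines. The assignment's data table is not reproduced here, so the tray return rates below are placeholder values invented purely for illustration, and `rm_anova` is a hypothetical helper name. The sketch assumes the wide layout described above: one row per hawker center and one column per phase.

```python
# Placeholder tray return rates (%), NOT the assignment data.
# Each row is one hawker center: [pre-intervention, intervention, post-intervention].
rates = [
    [40, 55, 50],
    [35, 52, 44],
    [42, 60, 51],
    [38, 50, 47],
]


def rm_anova(data):
    """One-way repeated measures ANOVA; returns (F, df_conditions, df_error)."""
    n = len(data)      # number of subjects (hawker centers)
    k = len(data[0])   # number of conditions (phases)
    grand_mean = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the total variability into conditions, subjects, and error.
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


f_stat, df_cond, df_error = rm_anova(rates)
print(f"F({df_cond}, {df_error}) = {f_stat:.2f}")
```

With the 10 hawker centers and three phases in the actual study, the degrees of freedom would be 2 and 18. Removing the between-center variability (`ss_subj`) from the error term is what distinguishes this analysis from a between-subjects ANOVA.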

Question 3:

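- The most appropriate statistical test for this study is the Pearson correlation coefficient. This test assesses the relationship between two interval variables measured for each country (the median estimated BMI of cabinet ministers and the CPI score). Because the question is phrased in terms of prediction, a simple linear regression of CPI score on median BMI is a closely related alternative; with a single predictor, its significance test gives the same p-value as the Pearson correlation.
- Two reasons why Pearson correlation is suitable are: (1) Both variables (median estimated BMI and CPI score) are treated as interval-scale measures, as the question instructs. (2) The study aims to examine the strength and direction of the linear relationship between the two variables across the same 15 countries, with no manipulation and no group comparison.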
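- Analyze the data using jamovi and provide a correctly formatted jamovi spreadsheet (one row per country, one column per variable) along with all the output needed for interpretation, then report the result in APA format, giving r with its degrees of freedom (n − 2 = 13) and the p-value. If the p-value is less than 0.05 (alpha level), the correlation is statistically significant. Take care with the direction: lower CPI scores mean higher perceived corruption, so a negative r indicates that countries with higher median BMI tend to have higher perceived corruption.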
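- To generalize the findings to all countries in the world, Paul should change how countries are selected, because the country is the unit of analysis. The current sample covers only 15 post-Soviet countries, which share history and institutions and may not represent other regions. He could randomly sample countries from all countries in the world, or stratify by region so that every region is represented, and increase the number of countries sampled. This would increase the external validity of the study and allow more general conclusions about the relationship between obesity and corruption worldwide.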
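
The Pearson correlation reported by jamovi can also be illustrated with a small script. The country-level data table is not reproduced here, so the values below are placeholders invented for illustration only, and `pearson_r` and `t_from_r` are hypothetical helper names. The sketch computes r from the sums of squares and cross-products, then the t statistic used to test whether r differs from zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Placeholder values, NOT the assignment data: one entry per country.
median_bmi = [27.1, 28.4, 30.2, 31.0, 32.5]
cpi = [58, 45, 40, 33, 30]

r = pearson_r(median_bmi, cpi)
t = t_from_r(r, len(cpi))
print(f"r({len(cpi) - 2}) = {r:.2f}, t = {t:.2f}")
```

With the 15 countries in Paul's study the degrees of freedom would be 13. A negative r on the real data would mean that higher median BMI goes with lower CPI scores, that is, with higher perceived corruption.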

Ensure that your explanations are clear and your analysis is accurate. Justify your choice of statistical test in the context of each study, and report results in APA format where the question asks for it (Question 3).

