Application: Data Analysis Procedures

To illustrate these steps, consider a SoTL study that uses an independent-samples t-test to compare student engagement levels between two teaching methods.

1. Data Preparation:

a. Data Integration: Begin by aggregating all relevant data into a single spreadsheet. For example, compile student engagement scores, demographic information, and teaching method identifiers into one file, matching records on a unique student identifier so that each student's data are aligned correctly.
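When the data start out in separate files, this matching step amounts to a join on a shared identifier. The following is a minimal Python sketch, assuming each source has already been read into a list of records and that a hypothetical student_id field identifies each student; the field names and values are illustrative, not from a real study. A spreadsheet lookup or a pandas merge accomplishes the same thing.

```python
# Hypothetical records from three sources; student_id, engagement, year,
# and method are illustrative field names, not from a real data set.
engagement = [
    {"student_id": "S01", "engagement": 3.8},
    {"student_id": "S02", "engagement": 4.5},
    {"student_id": "S03", "engagement": 2.9},
]
demographics = [
    {"student_id": "S01", "year": "first"},
    {"student_id": "S02", "year": "second"},
    {"student_id": "S03", "year": "first"},
]
methods = [
    {"student_id": "S01", "method": "traditional"},
    {"student_id": "S02", "method": "flipped"},
    {"student_id": "S04", "method": "flipped"},  # no engagement score on file
]


def merge_on_id(*sources, key="student_id"):
    """Combine several lists of records into one row per student.

    Later sources overwrite earlier ones for any shared field. Returns the
    merged rows and a sorted list of IDs whose rows lack a field present in
    some source, so mismatches can be checked rather than silently lost.
    """
    merged = {}
    for source in sources:
        for record in source:
            row = merged.setdefault(record[key], {key: record[key]})
            row.update(record)
    all_fields = {field for source in sources for record in source for field in record}
    incomplete = sorted(sid for sid, row in merged.items() if set(row) != all_fields)
    return merged, incomplete


merged, incomplete = merge_on_id(engagement, demographics, methods)
print(incomplete)  # ['S03', 'S04']
```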

b. Data Cleaning: Identify and handle missing data using mean imputation or listwise deletion. Mean imputation retains records that are missing only a few responses by substituting (imputing) a calculated value, typically the mean of the observed values for that item, for each missing value. If many responses are missing, listwise deletion (deleting the entire record) may be the only alternative. Address outliers by examining their potential impact on the analysis and deciding whether to exclude or transform them. Tabachnick and Fidell (2018) provide an excellent discussion of why outliers may be present:

i. Incorrect data entry. Data entered manually may have been recorded incorrectly. If you are downloading data from an archival data set, this is unlikely unless you have reason to suspect errors in the source data.

ii. Failure to specify missing-value codes in computer data. This can be prevalent in archival data, where a value such as ‘9’ indicates missing data but could be misinterpreted as an outlier.

iii. The outlier is not a member of the intended population. For example, in a study of college athletes, a participant who is also an Olympic athlete may have characteristics that differ from those of the rest of the group.

iv. The variable's distribution in the population has more extreme values than a normal distribution. In other words, not all distributions are normal (Gaussian). Teacher experience is an example: many teachers have few years of experience and few teachers have many years, producing a positively skewed (roughly exponential) distribution. Legitimate values in the long tail can give the illusion of outliers.

Keeping all data, including outliers, is good practice unless there is a justifiable reason for removing a record (e.g., the data are not genuine, or a respondent gave the same response to every item). Removing data simply because they do not fit an expected distribution is not a justifiable reason.
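The two missing-data strategies described under Data Cleaning can be sketched together in Python. This is an illustration under stated assumptions: responses come from a hypothetical four-item engagement survey with None marking a skipped item, the one-missing-item cutoff (max_missing) is an arbitrary choice rather than a published rule, and each gap is filled with that item's mean across the retained records. Mean imputation reduces the variability of the imputed item, so it is best reserved for small amounts of missing data.

```python
from statistics import mean

# Hypothetical 4-item survey responses; None marks a skipped item.
responses = {
    "S01": [4, 5, 4, 3],
    "S02": [3, None, 4, 4],        # one missing item: candidate for imputation
    "S03": [None, None, None, 2],  # mostly missing: candidate for deletion
    "S04": [5, 4, 5, 5],
}


def clean_responses(responses, max_missing=1):
    """Listwise-delete records with more than max_missing gaps, then
    replace each remaining gap with that item's mean over the kept records.

    max_missing is an illustrative cutoff, not a standard threshold.
    """
    kept = {sid: items for sid, items in responses.items()
            if sum(value is None for value in items) <= max_missing}
    n_items = len(next(iter(kept.values())))
    item_means = [
        mean(items[i] for items in kept.values() if items[i] is not None)
        for i in range(n_items)
    ]
    return {sid: [item_means[i] if value is None else value
                  for i, value in enumerate(items)]
            for sid, items in kept.items()}


cleaned = clean_responses(responses)
print(cleaned["S02"])    # [3, 4.5, 4, 4]
print("S03" in cleaned)  # False
```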

c. Data Transformation: Convert raw engagement scores into standardized scores if needed. For instance, if engagement scores were recorded on different scales, convert them to z-scores so that all scores share a common scale. Categorical variables can be prepared for regression analysis using indicator (dummy) coding (Hayes, 2022).
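Both transformations are short computations. In this Python sketch, z_scores standardizes a list of scores using the sample standard deviation, and dummy_code builds one 0/1 indicator column for each non-reference category. The function names, the example scales, the hypothetical third "hybrid" method (included only to show coding with more than two levels), and the choice of "traditional" as the reference group are assumptions for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize scores: z = (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def dummy_code(categories, reference):
    """Indicator (dummy) coding: one 0/1 column per non-reference category.

    The reference category is coded 0 on every column.
    """
    levels = sorted(set(categories) - {reference})
    return {level: [1 if c == level else 0 for c in categories] for level in levels}


# Hypothetical engagement scores recorded on two different scales.
five_point = [2, 3, 4]        # 1-5 rating scale
hundred_point = [40, 60, 80]  # 0-100 scale
print(z_scores(five_point))     # [-1.0, 0.0, 1.0]
print(z_scores(hundred_point))  # [-1.0, 0.0, 1.0]

methods = ["traditional", "flipped", "hybrid", "flipped"]
print(dummy_code(methods, reference="traditional"))
# {'flipped': [0, 1, 0, 1], 'hybrid': [0, 0, 1, 0]}
```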

2. Exploratory Data Analysis:

a. Descriptive Statistics: Calculate means, standard deviations, minimum and maximum values, skewness, and kurtosis for continuous variables, and frequencies for categorical variables. For example, report the mean engagement level for each teaching method along with the standard deviation, minimum and maximum values, skewness, and kurtosis to describe the central tendency, variability, and shape of each distribution.
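As a sketch of what statistical software reports, the Python function below computes these descriptive statistics for each group. It assumes hypothetical engagement scores and uses the bias-adjusted sample skewness and excess kurtosis formulas that Excel's SKEW and KURT functions use; other packages may use slightly different formulas, so small discrepancies between programs are expected. Frequencies for a categorical variable come from collections.Counter.

```python
from collections import Counter
from statistics import mean, stdev


def describe(values):
    """Descriptive statistics for one continuous variable (needs n >= 4).

    Skewness and excess kurtosis use the bias-adjusted sample formulas
    (the same ones as Excel's SKEW and KURT).
    """
    n = len(values)
    m, s = mean(values), stdev(values)
    z = [(x - m) / s for x in values]
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"n": n, "mean": m, "sd": s, "min": min(values), "max": max(values),
            "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical engagement scores (1-5 scale) by teaching method.
scores = {
    "traditional": [3.1, 3.6, 2.9, 4.0, 3.4, 3.8],
    "flipped": [4.1, 4.4, 3.9, 4.6, 4.0, 4.3],
}
for method, values in scores.items():
    stats = describe(values)
    print(method, {key: round(value, 2) for key, value in stats.items()})

# Frequencies for a categorical variable (hypothetical class years).
print(Counter(["first", "second", "first", "third", "first"]))
```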

b. Data Visualization: Create histograms or box plots to visualize the distribution of the variables. Use scatter plots to examine relationships between continuous variables. For example, create histograms of the engagement scores for each teaching method to visualize the distributions. Box plots of engagement scores for each teaching method will assist in identifying univariate outliers.
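Box plots typically draw points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles as individual outliers (Tukey's fences). A minimal Python sketch of that rule follows. The data are hypothetical, and because programs compute quartiles in slightly different ways, the fences (and occasionally which points are flagged) can differ a little from your software's box plot. Flagged values are candidates for the review described under Data Cleaning, not automatic deletions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles use the statistics module's default 'exclusive' method.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical engagement scores for one teaching method; 6.5 lies outside
# a 1-5 scale and could be a data entry error.
flipped = [3.1, 3.4, 3.5, 3.6, 3.8, 4.0, 4.1, 6.5]
print(iqr_outliers(flipped))  # [6.5]
```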

c. Correlation Analysis: Calculate correlation coefficients to assess the relationships between engagement scores and other variables. For example, examine whether there is a significant correlation between engagement scores and class size.
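Pearson's correlation coefficient r, and the statistic used to test whether it differs from zero, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, can be computed directly. The Python sketch below uses hypothetical class-size and engagement values; in Python 3.10 or later, statistics.correlation returns the same r.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: population correlation = 0 (df = n - 2).

    Undefined when |r| = 1 (division by zero).
    """
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data: class size and mean engagement for eight course sections.
class_size = [18, 22, 25, 30, 34, 40, 45, 52]
engagement = [4.3, 4.1, 4.2, 3.8, 3.9, 3.5, 3.6, 3.2]
r = pearson_r(class_size, engagement)
print(f"r = {r:.2f}, t({len(class_size) - 2}) = {r_to_t(r, len(class_size)):.2f}")
```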

3. Tests of Assumptions: Each statistical analysis makes certain assumptions about the data. For example, an independent-samples t-test, which tests for a difference in a continuous dependent variable between the two groups of the independent variable, assumes independent observations, an approximately normal distribution of the dependent variable within each group, and equal variances between groups. If these assumptions are met, you can proceed with the independent-samples t-test; if they are not, select an alternative test suited to the nature of the violation (e.g., Welch's t-test when variances are unequal, or the Mann-Whitney U test when normality is seriously violated). Common checks for parametric analyses include the Shapiro-Wilk test for normality and Levene's test for homogeneity of variances. The website statistics.laerd.com provides excellent guidance on conducting the tests of assumptions and completing the analyses.
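To show what Levene's test checks, here is a Python sketch of the original mean-centered Levene statistic: it runs a one-way ANOVA on each observation's absolute deviation from its group mean, and a large W (compared against an F distribution with k - 1 and N - k degrees of freedom) signals unequal variances. The data are hypothetical and the sketch stops at the statistic; in practice, your software reports the p-value. If you use SciPy, note that scipy.stats.levene centers on the median by default (the Brown-Forsythe variant) unless you pass center='mean'. The Shapiro-Wilk test is omitted because its coefficients are impractical to compute by hand; scipy.stats.shapiro provides it.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W (mean-centered) with its degrees of freedom (k - 1, N - k).

    W is a one-way ANOVA F statistic computed on the absolute deviations
    |x - group mean|; larger values suggest unequal variances.
    """
    deviations = []
    for g in groups:
        m = mean(g)
        deviations.append([abs(x - m) for x in g])
    k = len(groups)
    n_total = sum(len(d) for d in deviations)
    grand = mean([v for d in deviations for v in d])
    dev_means = [mean(d) for d in deviations]
    between = sum(len(d) * (dm - grand) ** 2 for d, dm in zip(deviations, dev_means))
    within = sum((v - dm) ** 2 for d, dm in zip(deviations, dev_means) for v in d)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical engagement scores: modest spread vs. much wider spread.
traditional = [3.1, 3.6, 2.9, 4.0, 3.4, 3.8]
flipped = [2.0, 4.9, 3.0, 5.0, 2.5, 4.8]
print(levene_w(traditional, flipped))
```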

4. Hypothesis Testing: Hypothesis testing is the reason you conduct a quantitative study. Its results tell you whether there is sufficient evidence to reject the null hypothesis.

a. Formulate the Hypotheses: Based on the research question developed earlier (see Developing the Research Question), formulate the null (H₀) and alternative (H₁) hypotheses. In the example we have been using:

H₀: There is no difference in engagement levels between the two teaching methods

H₁: There is a difference in engagement levels between the two teaching methods

Note that the example uses a two-tailed test: it asks only whether the groups differ, not whether one particular group has higher engagement than the other (a directional question that calls for a one-tailed test). A two-tailed test is recommended unless there is strong theoretical support for a one-tailed test.
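In symbols, writing μ₁ and μ₂ for the population mean engagement under the two teaching methods (notation introduced here for illustration), the hypotheses and the two ways of obtaining a p-value from the observed t statistic with df degrees of freedom are:

```latex
H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2

p_{\text{two-tailed}} = 2\,P\left(T_{df} \ge |t|\right)
\qquad
p_{\text{one-tailed}} = P\left(T_{df} \ge t\right) \quad \text{for } H_1: \mu_1 > \mu_2
```

When the observed difference falls in the predicted direction, the one-tailed p-value is half the two-tailed value, which is why a one-tailed test must be justified before the data are examined.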

b. Select the Statistical Test: Select the statistical test based on the hypothesis you are testing. For example, comparisons between two independent groups use an independent-samples t-test, and comparisons among three or more groups use analysis of variance (ANOVA). Studies that examine relationships among variables within a single sample use regression, which allows you to examine the effect of multiple predictors on a single criterion (outcome) variable. To determine whether engagement scores differ between teaching methods, you would choose the independent-samples t-test, because the study compares mean engagement scores between two independent groups.
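The selection logic in this step, together with common alternatives when assumptions are violated, can be summarized as a small lookup. The Python function below is a hypothetical teaching aid, not a complete decision tree: it ignores paired or repeated-measures designs, covariates, and other considerations that can change the appropriate test.

```python
def suggest_test(goal, n_groups=2, normal=True, equal_variances=True):
    """Hypothetical helper: suggest a test for independent groups or relationships.

    goal is "compare_groups" or "relationship". Deliberately simplified.
    """
    if goal == "relationship":
        return "regression"
    if goal != "compare_groups":
        raise ValueError("goal must be 'compare_groups' or 'relationship'")
    if n_groups < 2:
        raise ValueError("a group comparison needs at least two groups")
    if n_groups == 2:
        if not normal:
            return "Mann-Whitney U test"
        return "independent-samples t-test" if equal_variances else "Welch's t-test"
    if not normal:
        return "Kruskal-Wallis test"
    return "one-way ANOVA" if equal_variances else "Welch's ANOVA"


# The running example: two independent groups, assumptions met.
print(suggest_test("compare_groups", n_groups=2))  # independent-samples t-test
```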

c. Conduct the Test: Use software such as Microsoft Excel, SPSS, Python, or R to perform the statistical analysis. All of these tools can run common analyses such as t-tests, ANOVA, and regression, so select the tool that fits your skill set and comfort level. Enter the data, specify the groups, and run the analysis to obtain the results. For the independent-samples t-test, clearly define the independent variable (teaching method) and the dependent variable (engagement scores). The software will then perform the calculations.
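For readers curious about what the software does at this step, the Python sketch below computes Student's independent-samples t-test (pooled variance, so equal variances are assumed), its two-tailed p-value, and Cohen's d using only the standard library. The p-value comes from the regularized incomplete beta function, evaluated with the continued-fraction method described in Numerical Recipes; the function names and engagement scores are hypothetical. In practice, scipy.stats.ttest_ind(a, b, equal_var=True) in Python, t.test(..., var.equal = TRUE) in R, or SPSS will report the same t and p.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, variance

TINY = 1e-300  # guards against division by zero in the continued fraction


def _nonzero(value):
    return TINY if abs(value) < TINY else value


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 / _nonzero(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the CDF of a Beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (lgamma(a + b) - lgamma(a) - lgamma(b)
                 + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return exp(log_front) * _beta_continued_fraction(a, b, x) / a
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) keeps the fraction converging fast.
    return 1.0 - exp(log_front) * _beta_continued_fraction(b, a, 1.0 - x) / b


def independent_t_test(group1, group2):
    """Student's independent-samples t-test assuming equal variances.

    Returns t, degrees of freedom, two-tailed p, and Cohen's d (pooled SD).
    """
    n1, n2 = len(group1), len(group2)
    mean_diff = mean(group1) - mean(group2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Two-tailed p: P(|T| >= |t|) = I_{df/(df + t^2)}(df/2, 1/2).
    p = regularized_incomplete_beta(df / 2, 0.5, df / (df + t * t))
    d = mean_diff / sqrt(pooled_var)
    return {"t": t, "df": df, "p": p, "d": d}


# Hypothetical engagement scores (1-5 scale) for the two teaching methods.
traditional = [3.1, 3.6, 2.9, 4.0, 3.4, 3.8]
flipped = [4.1, 4.4, 3.9, 4.6, 4.0, 4.3]
result = independent_t_test(flipped, traditional)
print({key: round(value, 3) for key, value in result.items()})
```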

5. Interpret the Results: Interpret the results by examining the p-value, confidence interval, and effect size. If the p-value is less than the chosen significance level (e.g., .05), reject the null hypothesis and conclude that there is a statistically significant difference in engagement levels between the two teaching methods. It is equally important to interpret the effect size, which measures the magnitude of the difference between groups, often in standardized units. A large effect size indicates a substantial, readily noticeable difference. By contrast, a small effect size, even if statistically significant, indicates that the difference between groups is minor or trivial: the difference is large enough to be distinguished from zero but too small to matter in practice. This often occurs with large samples, because larger samples make even small differences statistically detectable. See Cohen (1992) for more information about effect size.
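The point about large samples can be made concrete. For two independent groups, t = d * sqrt(n1 * n2 / (n1 + n2)), so the same effect size produces a larger t as the groups grow. The Python sketch below labels d using Cohen's (1992) conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large); calling values below 0.2 "trivial" follows this section's wording and is an assumption, as are the function names and sample sizes.

```python
from math import sqrt


def label_cohens_d(d):
    """Label |d| with Cohen's (1992) benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.2:
        return "trivial"  # below Cohen's "small" benchmark
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def t_from_d(d, n1, n2):
    """t statistic implied by Cohen's d for two independent groups."""
    return d * sqrt(n1 * n2 / (n1 + n2))


# The same trivial effect (d = 0.1) with small and very large groups.
for n in (30, 2000):
    t = t_from_d(0.1, n, n)
    # 1.96 approximates the two-tailed .05 critical value for large df.
    print(n, label_cohens_d(0.1), round(t, 2), "significant" if t > 1.96 else "not significant")
```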

6. Reporting the Results: Present the results in APA format, including a description of the statistical test, the results (e.g., t-value, degrees of freedom, p-value), and a narrative interpretation. For example:

"An independent-samples t-test was conducted to compare student engagement levels between traditional and flipped classroom teaching methods. The data were tested for the assumptions of an independent-samples t-test, and no violations were found. There was a significant difference in engagement levels between the traditional (M = 3.5, SD = 0.8) and flipped (M = 4.2, SD = 0.7) teaching methods, t(58) = 3.25, p = .002, d = 0.52. These results suggest that the flipped classroom method significantly enhances student engagement compared to the traditional method, with a medium effect size."
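Formatting the statistics consistently is easy to automate. The Python sketch below (function names hypothetical) follows two APA conventions: p is reported without a leading zero, and as p < .001 when very small, because it cannot exceed 1; d keeps its leading zero because it can exceed 1. Italics for statistical symbols still have to be applied in your word processor.

```python
def format_p(p):
    """APA-style p value: three decimals, no leading zero, floor of p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")  # "0.002" -> ".002"


def format_t_test(t, df, p, d):
    """Statistics string for an independent-samples t-test, e.g. t(58) = 3.25."""
    return f"t({df}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


print(format_t_test(3.25, 58, 0.002, 0.52))
# t(58) = 3.25, p = .002, d = 0.52
print(format_p(0.0004))  # p < .001
```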

Center for Innovation in Research and Teaching