Data Analysis & Evaluation

Data analysis and evaluation write-ups are essentially a statement of results. Rather than simply presenting a table of numbers, figures, and statistics, the write-up presents the results in a format that the reader can easily understand. Most of the time, the write-up will not include raw data; instead, the raw data and basic statistics are placed in an appendix (if included at all). The analysis rests on explanatory text that guides the reader's attention to significant results.

As explained by Carnegie Mellon University, effective data analysis write-ups depend on the organization of information and the inclusion of sufficient descriptive detail.

Qualities of Most Effective Data Analysis Write-Ups

The most effective data write-ups consider the reader of the document. They are written so that a general reader (that is, someone who is not an expert in statistics but who has a general knowledge of the concepts being described) can read the document and understand the data. The writer does not assume that the reader has already seen the data set, nor that the reader already has a fixed opinion of the "right" answer. In all cases, it is the job of the writer to produce a logical argument, drawing on data as facts and reasoning from the data to support a particular opinion.

  • Includes an introduction and/or contextualizes the data. This introduction will typically provide information about what the data set represents and why the data are being analyzed. Another option for an interesting introduction is to begin with an opinion.

Ex. of contextualizing introduction: "The data set that was analyzed is a record of the launch temperatures of the Challenger space shuttle and the number of O-rings that failed at each launch up until the Challenger disaster on 1/28/86."

Ex. of introduction with strong opinion: "The space shuttle should not have been launched." (The reader is now prepared to read information explaining why the writer believes this.)

Ex. of introduction with strong opinion: "The data of the seven flights with O-ring failures is deceiving."

  • Provides information about why the data was collected or what the data represents.

Ex. of explaining why the data is being analyzed: "We have been asked to analyze the rate of O-ring failure given the ambient temperature and to give a recommendation as to whether or not today's Challenger flight should proceed."

  • Explains the significance of the data that is being presented.

Ex. of explaining significance: "From the presence of this outlier, I have learned that outliers can have a very big effect on normal probability curves." (Notice that the writer is explaining what he or she has learned from this data analysis.)

  • Indicates significant features of the data. These features might include: outliers, skewness, range of data, correlation coefficients, regression lines, etc.
  • Describes data both abstractly and in detail.

Ex. of abstracted data: "The slope of the regression line that does include zero O-ring failures is noticeably steeper, indicating that as temperatures decrease, failures will increase substantially." (Notice here that the author is describing the plots as "steeper" and is drawing a reasonable conclusion.)

Ex. of detailed data: "The -.9951 is very close to -1.0, indicating that the variables are largely negatively correlated, meaning that a decrease in temperature increases the likelihood of failure." (Notice here that the writer states a fact and then draws a conclusion ["meaning that"].)

  • Offers a reasonable conclusion.

Ex. of reasonable conclusion: "All of these factors indicate that the Challenger should not have been launched on January 28, 1986."

Ex. of reasonable conclusion: "Our recommendation is to abort today's flight due to extreme weather conditions and postpone the flight until the temperature is within the range of the distribution of the previous flight days, as shown in our first figure."

  • Considers the needs of the audience in reading the analysis.

Ex. of orienting the reader: "When you look at the data, you see..."

  • Uses appropriate transitional words and phrases.

Ex. of appropriate use of transitions: "However, it is always dangerous to extrapolate outside of the data set being interpreted."
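The pairing of a detailed figure with an abstract statement, as in the correlation and regression examples above, can be sketched in a few lines of Python. This is only an illustration: the data values are made up (they are not the actual shuttle launch record), the function names are hypothetical, and the 0.8 and 0.5 cutoffs for "strongly" and "moderately" are assumed rules of thumb rather than fixed standards.

```python
from math import sqrt

# Hypothetical illustration data, NOT the actual shuttle launch record.
temps_f = [53, 57, 63, 70, 75]   # launch temperature, degrees Fahrenheit
failures = [3, 2, 1, 1, 0]       # O-ring failures observed at that launch


def mean(values):
    return sum(values) / len(values)


def pearson_r_and_slope(x, y):
    """Return (Pearson correlation r, least-squares slope of y on x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), sxy / sxx


def describe(r):
    """Turn the detailed number into the abstract statement a reader needs.

    The 0.8 / 0.5 cutoffs are assumed conventions for this sketch.
    """
    if abs(r) >= 0.8:
        strength = "strongly"
    elif abs(r) >= 0.5:
        strength = "moderately"
    else:
        strength = "weakly"
    direction = "negatively" if r < 0 else "positively"
    return f"r = {r:.2f}: the variables are {strength} {direction} correlated"


r, slope = pearson_r_and_slope(temps_f, failures)
print(describe(r))
print(f"slope = {slope:.3f} failures per degree Fahrenheit")
```

The printed sentence states the fact (the value of r) and then the interpretation, which mirrors the "states a fact and then draws a conclusion" pattern in the detailed-data example.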

Qualities of Moderately Effective Data Analysis Write-Ups

The difference between the most effective data analysis write-ups and those that are moderately effective is a difference in the number of features that exist within the write-up. In general, the moderately effective write-ups will identify significant features of the data (that is, will be able to do the analysis effectively), will include information about the data that is both specific and general (or abstract), and will offer reasonable conclusions. The most effective data analysis write-ups will include more contextualizing information. Types of contextualizing information include: an introduction that explains why the data is interesting or important, a statement of the significance of the data, and the use of appropriate transitional words and phrases. (See above for examples of appropriate kinds of phrases or sentences.)

  • Omits an introduction or other contextualizing information.
  • May fail to indicate the link between the data and the real world situation from which the data has been collected. (May not explain, or may not explain adequately, what the data represents.)
  • May not explain the significance of the data.
  • Indicates significant features of the data. These features might include: outliers, skewness, range of data, correlation coefficients, regression lines, etc.
  • Describes data both abstractly and in detail.
  • Offers a reasonable conclusion.
  • May not consider the needs of the reader in reading the analysis.
  • May or may not use appropriate transitional words or phrases.

Qualities of Least Effective Data Analysis Write-Ups

The least effective data analysis write-ups may actually include some of the features identified as markers of the most effective data analysis write-ups. However, the least effective write-ups will typically omit a major portion of the expected data analysis. An example would be providing lots of detail (e.g., the temperatures of a number of days, including for outliers) without ever explicitly stating that outliers exist or explaining why the outliers are significant.

Another type of problem involves a lack of clarity or precision in language. An example of this can be seen in the following sentence: "As far as the normality of the data, the data appears non-normal, primarily because of the low temperatures." The problem here is with precision: 1) data does not appear "non-normal," although distributions can be non-normal, and 2) low temperatures cannot "cause" data to appear non-normal. (In fact, outliers can cause a distribution to appear non-normal.)
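The point that an outlier, rather than "low temperatures" as such, can make a distribution appear non-normal can be checked directly. The sketch below uses made-up temperatures (not the real launch data), flags outliers with the common 1.5 x IQR rule, and compares sample skewness with and without them. Skewness is only one rough indicator of departure from normality, not a formal test, and the function names are hypothetical.

```python
from statistics import mean, quantiles

# Hypothetical launch temperatures in degrees Fahrenheit (made-up values).
temps_f = [66, 67, 67, 68, 69, 70, 70, 72, 73, 75, 31]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


flagged = iqr_outliers(temps_f)
trimmed = [v for v in temps_f if v not in flagged]

print("outliers:", flagged)
print(f"skewness with outliers:    {skewness(temps_f):.2f}")
print(f"skewness without outliers: {skewness(trimmed):.2f}")
```

A write-up built on output like this would name the outlier explicitly and explain that the distribution looks skewed because of that single value, instead of saying that the data "appears non-normal."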

Other least effective write-ups may contain insufficient analysis, may overlook major features of the data, may fail to generalize from the data, or may not add sufficient detail.

Specific examples of problems are given below:

  • Offers little or no introduction and little or no contextualization of the data. Instead, immediately begins describing data without explaining its significance.

Ex. of poor introduction: "There is only one outlier in the data." (Keep in mind that this is the first sentence that the reader will see.)

  • Assumes that the reader will necessarily look at the boxplots or other graphic representations of the data and will discover the significance of the data there.

Ex. of faulty assumption about reader: "See attached plot. As you can see, the plot shows an outlier." (Note that the writer does not reveal enough information to show that he or she knows what the outlier is. Also, the writer does not explain the significance of the outlier.)
