Laboratory Data Collection
Essential questions
What is the integrity of data collection?
How can the quality of experimental data be ensured?
Data collection is the process of gathering, recording, and systematically analyzing experimental outcomes that can be used to address research questions, test hypotheses, or compare results.
Experimental data collected to test the primary hypothesis of a study are primary data. Secondary data come from other sources and were collected in different experiments, but they can still be useful for a particular experimental study. Example: DNA sequencing data can be obtained from published databases and used as secondary data in genetic experiments.
The integrity of research data depends on the accuracy of data acquisition, recording, and storage, as well as on the proper calibration and use of measuring devices. The main sign of data quality is the reproducibility of the experimental results.
Unless the data are collected properly, the research questions cannot be adequately addressed and the experimental results are not reproducible. This may mislead other researchers and impede the ability to solve the research problem at hand. In extreme cases, it can harm the research subjects or the researchers themselves.
Data integrity can be compromised by systematic or random errors or, in extreme cases, by deliberate falsification. The two main approaches to ensuring data integrity are:
- Quality assurance: all appropriate activities carried out before data collection begins
- Quality control: activities that take place during and after data collection in experimental research
Data collection errors can be avoided by following the proper experimental protocol and checking individual data items for possible errors. Systematic errors can arise from problems with instrument calibration, instrument drift, experimental setup, or study design.
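The two kinds of check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed protocol: the function names, the pH range, and the reference-mass readings are all hypothetical and chosen only for the example. The first function flags individual data items that fall outside a plausible range; the second estimates a systematic offset from repeated readings of a reference standard with a known value, so that a change in the offset between the start and end of a session can hint at instrument drift.

```python
import statistics

def find_out_of_range(readings, low, high):
    """Return (index, value) pairs for readings outside the closed range [low, high]."""
    return [(i, x) for i, x in enumerate(readings) if not (low <= x <= high)]

def reference_bias(reference_readings, known_value):
    """Mean offset of repeated readings of a reference standard from its known value."""
    return statistics.mean(reference_readings) - known_value

# Hypothetical pH readings; 0-14 is assumed here as the plausible range.
ph_readings = [7.1, 7.3, 71.0, 6.9, -1.2]
print(find_out_of_range(ph_readings, 0.0, 14.0))  # [(2, 71.0), (4, -1.2)]

# Hypothetical readings of a 10.00 g reference mass at the start and end of a session.
bias_start = reference_bias([10.01, 9.99, 10.00], 10.00)
bias_end = reference_bias([10.04, 10.05, 10.06], 10.00)
print(round(bias_end - bias_start, 2))  # 0.05
```

A flagged value is only a prompt to go back to the lab notebook or the instrument; whether it is a transcription mistake or a genuine measurement still has to be decided by the researcher.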
Digitization of research data can be a source of errors, e.g., by introducing spurious information into large data sets that are not carefully examined. Digital data can be easily distorted and, in the worst-case scenario, fraudulently manipulated, misrepresented, or even fabricated.
Federal regulations developed by the Office of Science and Technology Policy identify scientific misconduct as fabrication, falsification, or plagiarism of research results [2].
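One common way to notice that a digital record has changed after it was collected is to store a cryptographic checksum alongside it. The sketch below uses SHA-256 from Python's standard library; the sample CSV content and the function name are hypothetical, and a checksum only detects a change to the stored bytes. It cannot show whether the data were recorded correctly in the first place.

```python
import hashlib

def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical raw data file content, captured at collection time.
original = b"sample_id,absorbance\nA1,0.412\nA2,0.398\n"
recorded_digest = sha256_of_bytes(original)

# The same record with one digit changed later.
altered = b"sample_id,absorbance\nA1,0.412\nA2,0.898\n"

print(sha256_of_bytes(original) == recorded_digest)  # True
print(sha256_of_bytes(altered) == recorded_digest)   # False
```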
Once primary data are collected, the researcher commonly cleans the data set. This includes removing data points that are either unrelated to the effect being studied in the experiment or obviously erroneous because of a mistake in data collection, reporting, etc. At this step, there is a risk that valid data points are removed because they result from an effect not sufficiently accounted for by the hypothesis. Nevertheless, cleaning is a necessary step in any experimental research that involves measured values that will be statistically analyzed.
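Because cleaning can discard valid points, one cautious approach is to flag suspicious values for review instead of deleting them outright, so the original data set stays intact. The following sketch flags values using a modified z-score based on the median and the median absolute deviation (MAD). The 3.5 cutoff is a widely used rule of thumb, taken here as an assumption, and the function name and sample values are hypothetical.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return one boolean per value: True if its modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return [False] * len(values)
    return [0.6745 * abs(x - med) / mad > threshold for x in values]

# Hypothetical replicate measurements with one suspicious value.
measurements = [10.1, 9.9, 10.0, 10.2, 9.8, 15.0]
flags = flag_outliers(measurements)
print(flags)  # [False, False, False, False, False, True]

# Keep the raw data; list flagged points separately for manual review.
to_review = [x for x, is_flagged in zip(measurements, flags) if is_flagged]
print(to_review)  # [15.0]
```

Keeping the flags separate from the raw values leaves a record of what was questioned and why, which makes the cleaning step itself reproducible.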
Suggested readings
Whitney C.W., Lind B.K., Wahl P.W. (1998). Quality assurance and quality control in longitudinal studies. Epidemiologic Reviews, 20(1): 71-80.
Office of Science and Technology Policy, Federal Policy on Research Misconduct. Available at http://ori.dhhs.gov/education/products/RCRintro/c02/b1c2.html.
Van den Broeck J., Cunningham S.A., Eeckels R., Herbst K. (2005). Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10): e267. doi:10.1371/journal.pmed.0020267