Testing the reliability of research data

Adi Gaskell, 30 Jul 2018

Originally posted on The Horizons Tracker.

The open science movement has attempted to improve the reliability and reputation of science by ensuring that all data used in scientific research is made publicly available, both for external scrutiny of the data and so that studies can be reproduced.

Whilst the movement is gathering pace, open data is still not the norm across the board, especially in certain disciplines that appear more reluctant to open up. Most of the time this reticence is purely down to cultural or practical reasons, but sadly there are also times when it is designed to cover up the use of dubious data.

A recent paper¹ describes an AI-based system designed to examine how reliable research data is. The system, developed by researchers at the University of Illinois at Urbana-Champaign, can reconstruct all possible data sets that could give rise to the reported results of the research. All it requires is the mean, standard deviation and a few data points.

Data reliability

The system, which they refer to as CORVIDS (Complete Recovery of Values in Diophantine Systems), is designed to test the reliability of the research findings. If it’s able to reconstruct valid data sets, the user can then assess whether this data looks plausible or not.

The system first works out the linear equations that the reported statistics must satisfy. It then searches for every combination of whole numbers that solves these equations. It’s an approach that was first proposed in the third century AD by Diophantus of Alexandria, but suffice to say, he didn’t have the computing power to do the job effectively.
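To make the idea concrete, here is a minimal brute-force sketch in Python. It is not the CORVIDS algorithm from the paper, only an illustration of the underlying problem: for n responses on an integer scale, the reported mean fixes the sum of the values and the reported sample variance fixes the sum of their squares, so any valid data set must satisfy both equations at once. The function name reconstruct and its parameters are invented for this example, and the sketch assumes the statistics are known exactly rather than rounded as they would be in a published paper.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def reconstruct(n, mean, variance, lo, hi):
    """Return every sorted data set of n integers in [lo, hi] that matches
    the given mean and sample variance exactly (brute force, illustrative)."""
    total = n * mean                             # required sum of the values
    sum_sq = (n - 1) * variance + n * mean ** 2  # required sum of squared values
    return [
        candidate
        for candidate in combinations_with_replacement(range(lo, hi + 1), n)
        if sum(candidate) == total
        and sum(x * x for x in candidate) == sum_sq
    ]


# Hypothetical input: 4 responses on a 1-5 scale, mean 3, sample variance 2.
solutions = reconstruct(4, Fraction(3), Fraction(2), 1, 5)
for solution in solutions:
    print(solution)
# prints (1, 3, 4, 4) then (2, 2, 3, 5)
```

Even this tiny example has two data sets that fit the same summary statistics. Trying every combination quickly becomes impractical as the sample size and scale grow, which is presumably why a dedicated solver is needed for real studies.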

Each reconstructed data set is turned into a histogram, and the histograms are then combined into a three-dimensional chart that makes it easier to spot any unusual patterns. Whilst this makes the results relatively easy to interpret, the analysis itself can take a few hours to run.
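As a rough illustration of that step, the Python sketch below turns a handful of candidate data sets into one histogram each and stacks them into a grid of frequencies. The two candidate data sets are made up for the example (both have a mean of 3 and a sample variance of 2 on a 1-5 scale), the function name histogram_matrix is invented, and the layout of the grid is an assumption about how such a chart might be fed, not a description of the tool's actual output.

```python
from collections import Counter


def histogram_matrix(data_sets, lo, hi):
    """Build one row per candidate data set and one column per scale value;
    each cell holds how often that value occurs in that data set."""
    values = range(lo, hi + 1)
    rows = []
    for data_set in data_sets:
        counts = Counter(data_set)               # missing values count as 0
        rows.append([counts[v] for v in values])
    return rows


# Hypothetical candidate data sets on a 1-5 scale.
candidates = [(1, 3, 4, 4), (2, 2, 3, 5)]
matrix = histogram_matrix(candidates, 1, 5)
for row in matrix:
    print(row)
# prints [1, 0, 1, 2, 0] then [0, 2, 1, 0, 1]
```

A grid like this could be drawn as a three-dimensional chart with the scale value on one axis, the candidate data set on another and the frequency as the height.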

The team believe that initially the application will be used by editors and reviewers to identify problems with submitted papers as early as possible, so that they can be discussed with the authors. This is generally a much easier process than requiring authors to submit the full data set alongside their paper.

The system is unlikely to fix the challenges facing the scientific industry on its own, but it is a nice step in the right direction, and will make it harder for poor science to find a hiding place.

Article source: Testing The Reliability Of Research Data.

Header image source: Image 2562325 by StockSnap on Pixabay is in the Public Domain.

Reference:

1. Wilner, S., Wood, K., & Simons, D. J. (2018). Complete recovery of values in Diophantine systems (CORVIDS).
