National Academies Kicks Off Open Science Study
The National Academies Board on Research Data and Information recently formed an ad hoc committee, sponsored by the Laura and John Arnold Foundation, to conduct an 18-month study focusing on how to “move the [scientific] research enterprise toward open science as the default for scientific research results.” The committee is chaired by Alexa McCray, co-director of the Center for Biomedical Informatics at Harvard Medical School, and most of its members are university professors.
At its first meeting on July 20, the committee heard from several invited speakers who described current policies and practices for facilitating open science. The committee’s efforts will culminate in a report with findings and recommendations for the research enterprise.
Report will recommend paths to open science, no “half measures”
McCray began by reading the definition of open science provided in the study’s task statement:
Open science is defined … as public access (i.e., no charge for access beyond the cost of an internet connection) to scholarly articles resulting from research projects, the data that support the results contained in those articles, computer code, algorithms, and other digital products of publicly funded scientific research, so that the products are findable, accessible, interoperable, and reusable (FAIR), with limited exceptions for privacy, proprietary business claims, and national security.
Michael Stebbins of the Arnold Foundation, the study’s sponsor, explained that one of the intended outcomes of the study is to improve access to federally funded research by providing “specific policy and practice options” for federal agencies. The Obama-era White House Office of Science and Technology Policy (OSTP) issued a memorandum in 2013 that directed agencies with R&D expenditures of more than $100 million to develop plans for improving public access to the results of federally funded scientific research. A bipartisan group of congressional representatives this week also reintroduced the “Fair Access to Science and Technology Research Act,” a bill that would enshrine the same requirement into law.
Stebbins, a former assistant director of biotechnology at OSTP, argued that federal agencies are not implementing the memo’s directives “to the letter or the spirit,” and he pointed to competing priorities at the agencies as a major obstacle. However, he added that it is not the committee’s place to weigh any open science recommendations the study might make for agencies against other agency priorities.
Asked to identify the main drivers of open science, Stebbins responded that it is up to the committee to determine these categories, since the foundation “specifically did not set [them].” However, he highlighted data accessibility as “one of the more difficult challenges in the sciences in the next 10 years” and said the committee has the “opportunity to lay a path” to address it.
Speakers highlight coordination and incentives as essential
The speakers at the meeting discussed the challenges of integrating open science into the research enterprise and how to address them.
James Kurose, assistant director of the National Science Foundation’s Computer and Information Science and Engineering (CISE) Directorate, addressed the practical problem of data archiving. The extensive storage requirements necessary to host large amounts of data make it difficult for any one organization to create a single, sustainable database, he said.
National Academy of Sciences President Marcia McNutt commented that the journal Science, for which she served as editor-in-chief, addresses the issue by requiring each published author to archive their data so that it can be shared upon request, rather than by hosting its own repository.
Victoria Stodden, associate professor of information science at the University of Illinois at Urbana-Champaign and a leader in the open science movement, emphasized that coordinating with stakeholders is essential to research reproducibility and transparency. She identified research environments, workflow systems, and dissemination platforms as key infrastructure areas that can be coordinated to promote “good scientific practice downstream.” Stodden added that such coordination aids researchers in managing computational data by “enabling efficiency and productivity, and discovery.”
Both Brian Nosek, executive director of the Center for Open Science, a nonprofit organization supported by the Arnold Foundation, and Heather Joseph, executive director of the Scholarly Publishing and Academic Resources Coalition (SPARC), argued that researchers’ busy schedules must be accounted for in any effort to promote open science. They agreed that respecting researchers’ workflows and integrating an open science framework throughout the entire research enterprise, rather than simply placing more burden on researchers, is essential to achieving results.
Nosek argued that current incentives focus “on getting research published, not getting it right,” and that the latter needs to be better encouraged. One solution he highlighted was moving to a “registered report” process of publication, in which a project undergoes initial peer review after its design phase rather than after results are reported. While a second review occurs at the reporting phase, Nosek explained that it would be “outcome independent,” focusing on how well the researchers followed experimental protocol. Since publication would be guaranteed after the initial peer review, Nosek said this would help limit the practice of publishing only results that support a favored hypothesis.
Nosek also encouraged journals to adopt “badges” that label publications for meeting certain open science standards, such as “Open Data,” “Open Materials,” or “Preregistered.” He noted that after a number of journals began this practice in 2014, the amount of open data available in repositories grew.
McNutt, who is a board member of the Center for Open Science, also identified the widespread use of preprint servers as “low-hanging fruit” for disseminating information. These online servers host draft research articles that have not been peer-reviewed, giving researchers earlier access to data and results.
Disciplinary variations present other obstacles
Several speakers acknowledged that a primary difficulty in implementing these recommendations is the many methodological differences between the various scientific disciplines. Stodden pointed to the differing types of reproducibility—empirical, statistical, and computational—across disciplines as an example, explaining that researchers will often refer to different practices and standards when discussing the successful reproduction of results, which can lead to miscommunication.
Kurose expanded on such methodological variations, pointing out how differences in practices at federal science agencies affect how data is owned and analyzed. “One size does not fit all,” even within a directorate at an agency like NSF, he emphasized.
Kurose noted that the disciplinary differences will require the study committee to make tradeoffs between generality and specificity in its recommendations. He said that regardless of what tradeoffs the committee makes, the best approach is to provide recommendations on both bottom-up implementation and top-down guiding principles.