Hahnel Argues for Making Data as Open as Possible
Speaking virtually from London to more than 120 NIH employees at a recent NIH Data Science Town Hall sponsored by the Office of Data Science Strategy (ODSS), Dr. Mark Hahnel said, “To get the most out of science, research data needs to be as open as possible, as closed as necessary.”
For Hahnel, “open as possible” means data that are published openly and well described. It also means educating researchers on the importance of data-sharing and on the tools available to them.
“Given today’s technology, academia should be moving further, faster,” he said. “To get there, we need open research data.”
Hahnel founded the generalist data repository Figshare in 2011 while finishing his Ph.D. in stem cell biology at Imperial College London. The company grew out of his own need for a place to store his research output. He quickly realized he wasn’t the only researcher who needed a place to publicly share data that had no other designated repository.
“I wanted to allow scientists and researchers like me to get credit and recognition for all their work,” Hahnel explained.
Fast forward to today, and Hahnel is a vocal advocate for open data and open research. He’s also been a partner for the past year on a project with NIH.
The ODSS launched a 1-year pilot project with Figshare in July 2019 to see how NIH-funded researchers would use a generalist repository when they had no other logical place to store their data.
“We all agree that researchers should use subject-specific repositories whenever possible,” Hahnel said. “But there isn’t always a suitable repository available. In those cases, a generalist repository—be it Figshare or another—is an excellent way to share data.”
A goal of the pilot project was to help researchers implement the FAIR principles, which state that data should be findable, accessible, interoperable and reusable—by humans and machines.
“FAIR is a great example of how we can get closer to ‘open as possible,’” Hahnel said. “Publishing datasets in a repository without some level of curation can get you to FAIR for humans, but you’ll rarely get there for machines.
“This pilot gave us an opportunity to test the idea that we need people curating and improving the data when it is added to a repository to make it FAIR for machines.”
The result was more discoverable data thanks to more descriptive titles and metadata.
“Truly FAIR data for humans and machines takes more than just data and technology,” Hahnel said. “You need people in the mix working with researchers and checking files.”
As a result of the pilot, NIH plans to continue finding opportunities to better engage with and educate the biomedical research community on the value of effective data management and FAIR data-sharing.
What will Hahnel and his team take forward from the pilot?
“It changed my mind that we need to be checking metadata for all our clients,” Hahnel said. “I don’t know how it’s going to scale, so that’s an interesting challenge to try to solve.
“We’re also going to keep educating as many people as we can on tools and best practices to improve their data-sharing.”
The pilot project is now archived at https://nih.figshare.com/, with the data still discoverable and reusable.
To learn more about the NIH Figshare project, visit https://datascience.nih.gov/data-ecosystem/exploring-a-generalist-repository-for-nih-funded-data.