Open access to scholarly knowledge in the digital era (chapter 5.3): Reading scholarship digitally
This article is chapter 5.3 in section 5 of a series of articles summarising the book Reassembling Scholarly Communications: Histories, Infrastructures, and Global Politics of Open Access.
In the third chapter of the infrastructures and platforms section, Martin Paul Eve asks what it means to think of scholarship as data.
Scholarship, labor power, and proliferation
The contemporary researcher faces a serious challenge in keeping up with the most recent research and scholarship, amid competing demands for time in the saturated life of an academic.
Academic hiring panels face hundreds of candidates per post, and it becomes near-impossible for panel members to read all of the scholarship before them. It is from this challenge that proxy measures such as the notorious journal impact factor (JIF) have sprung. Eve warns that aggregating to the journal level is deeply flawed in several ways, including restricting academic choice and freedom of publication venue, and creating a market problem for library budgets.
The San Francisco Declaration on Research Assessment (DORA) was created to counter the harms caused by the JIF, but it does not answer the question of how we should spend our reading time. Eve notes that one suggested fix is to move to a mode of assessment in which candidates for hiring present a research narrative outlining the impact, outcomes, and overall arc of their research, referring to a couple of key outputs that a hiring panel might then read in detail.
Eve states that the problem with implementing initiatives such as DORA is that they put the onus on candidates to narrate their work, which is arduous, unpaid work with only a slim chance of a payoff. This approach privileges those who can afford to put the most time into an academic job. A range of computational approaches could also improve the rigor of research and scholarship, which frequently does not, and cannot, cite the secondary literature comprehensively, since discovery has become so hard in an age of open abundance.
Distant reading methodologies
Various digital methods have emerged under the name of "distant reading" in an attempt to address the problem of insufficient reading labor power. The fundamental premise of such methods is to use digital techniques to scan hundreds of thousands of papers, articles, or books, and to bring pertinent works or aspects of them to the attention of the operator.
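The basic move of distant reading, scanning a whole corpus and surfacing the items most pertinent to an operator's interest, can be illustrated with a minimal sketch. It assumes a plain TF-IDF keyword score, which is only one simple relevance technique among many and not a method that Eve or any particular tool prescribes; the function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-letters; a deliberately crude tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def rank_by_tfidf(corpus: dict[str, str], query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Score every document against the query with TF-IDF and return the top_k matches."""
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = []
    for doc_id, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / total
                # +1 keeps terms that appear in every document from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += tf * idf
        if score > 0:  # documents sharing no query terms are never surfaced
            scores.append((doc_id, score))

    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    corpus = {
        "paper-a": "Open access publishing and library budgets",
        "paper-b": "Text mining the chemistry literature for reaction data",
        "paper-c": "Mining open access papers with machine learning",
    }
    # paper-c ranks above paper-a; paper-b is cut off by top_k=2.
    for doc_id, score in rank_by_tfidf(corpus, "mining open access literature", top_k=2):
        print(doc_id, round(score, 3))
```

At real scale the same idea would run over hundreds of thousands of full texts, which is exactly where the access and licensing questions discussed below begin to bite.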
Eve introduces one prominent group of scientists already taking such an approach: the Murray-Rust research group at Cambridge University, which has developed ContentMine, a suite of tools for extracting facts from the scientific literature. ContentMine has the potential to revolutionize how we search academic literature at scale. Its benefits can be summarized as:
- comprehensive coverage of the secondary literature
- comprehensive coverage within a paper
- aggregation and interdomain analytics
- semantically rich entity tags
Eve reports that Murray-Rust believes the group's activities in mining the scholarly literature are covered by the Hargreaves amendments to UK copyright law, but cannot be entirely sure. The situation is further complicated by technical protection measures (TPMs) and digital rights management (DRM) systems, which more publishers are now employing. These make it impossible to use research papers with custom software without breaking the law. While it is technically easy to circumvent some of these systems, there are hefty criminal penalties for doing so. In the EU, these protections are set out in Directive 2001/29/EC, and in the US in the Digital Millennium Copyright Act.
Machine learning and research literature classification
Eve suggests that machine learning approaches could offer a future way to bring relevant research and scholarly literature to the attention of researchers. Machine learning systems are well suited to classification problems. For example, when fed a new paper or book, such a system could appraise it on a researcher's behalf.
If we use machine learning to classify scholarship according to personal reading preferences, we will need to inject the unexpected and fortuitous into such systems, so that chance encounters can still advance thought and research.
Tempered possibilities
Such futurological technologies are not far off in technical terms, but in social and legal terms they remain some way off, and access to research works is the key to making them a reality. Eve warns that this means that, despite the promise of amplifying our labor time by reading scholarship with computers, we still have some way to go before it becomes a workable reality.
Next part (chapter 5.4): Toward linked open data for Latin America.
Article source: This article is an edited summary of Chapter 19 (Eve, 2020) of the book Reassembling scholarly communications: Histories, infrastructures, and global politics of Open Access (Eve & Gray, 2020), which has been published by MIT Press under a CC BY 4.0 Creative Commons license.
Acknowledgements: This summary was drafted by Wordtune Read with further corrections and edits by Bruce Boyes.
Article license: This article is published under a CC BY 4.0 Creative Commons license.
References:
- Eve, M. P. (2020). Reading Scholarship Digitally. In Eve, M. P., & Gray, J. (Eds.) Reassembling scholarly communications: Histories, infrastructures, and global politics of Open Access. MIT Press.
- Eve, M. P., & Gray, J. (Eds.) (2020). Reassembling scholarly communications: Histories, infrastructures, and global politics of Open Access. MIT Press.