The Royal Society of Chemistry has announced that its nearly 180-year-old archive of research insights has been made available for Text and Data Mining projects. But why is this important?

Machine learning and artificial intelligence have gone from being the dreams of science fiction to something we now talk about every day without raising an eyebrow.

In fact, we now encounter these technologies in our normal day-to-day routine. From searching for the nearest store to looking for the perfect Christmas gift, internet search engines are now so capable they can even predict what you’re looking for before you’ve finished typing your search term.

The secret to this success is perhaps no secret at all: data. Over the years, search giants such as Google, Yahoo and Bing have compiled huge databases of user interactions. By finding patterns and correlations between what users type, your own searches and the most frequently selected results, they can appear more intuitive than ever.

Similar techniques are already being used in scientific research, where Text and Data Mining of research journals uses machine learning and AI to find relevant breakthroughs. Harnessing the speed and power of digital technology, teams using this technique can enjoy a crucial head start on their projects by getting computers to identify the key pointers that set them off in the right direction.
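
To make the idea concrete, here is a minimal sketch of one very simple form of text mining: counting which terms of interest appear together in the same abstract. The term list and the sample abstracts are invented for illustration, and production systems use far richer language models than this, so treat it as a toy under those assumptions rather than a description of any particular tool.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical vocabulary of terms we care about (an assumption for this sketch).
TERMS = {"graphene", "catalyst", "battery", "perovskite", "solar"}

def cooccurrence_counts(abstracts):
    """Count how often each pair of TERMS appears in the same abstract."""
    counts = Counter()
    for text in abstracts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        found = sorted(TERMS & words)          # sort so each pair has one canonical order
        counts.update(combinations(found, 2))  # every unordered pair found together
    return counts

# Invented sample abstracts, not taken from any real archive.
sample = [
    "A graphene catalyst improves battery lifetime.",
    "Perovskite solar cells with a graphene electrode.",
    "Battery electrodes using a graphene coating.",
]
pair_counts = cooccurrence_counts(sample)
```

Pairs that turn up together unusually often, such as "battery" and "graphene" in this invented sample, are the kind of pointer a research team might then follow up by hand.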

In fact, smart technology is now so clever it can identify patterns across millions of scientific papers, even across disciplines, in the blink of an eye, unveiling insights that might never have been uncovered using more traditional techniques. It can even differentiate between meanings of a word, which is crucial: knowing what kind of mole you’re looking for could be the difference between accurate data and a wasted resource.
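
As a rough illustration of how a program might tell one kind of mole from another, the sketch below scores each possible sense by how many of its cue words appear in the surrounding sentence. The sense labels and cue words are hypothetical and hard-coded here; real text-mining systems typically learn such context from large annotated collections rather than from a hand-written list.

```python
import re

# Hypothetical cue words for three senses of "mole" (assumptions for this sketch).
SENSE_CUES = {
    "chemistry_unit": {"mol", "molar", "concentration", "solution", "reagent", "avogadro"},
    "animal": {"burrow", "soil", "tunnel", "mammal", "garden", "fur"},
    "skin_lesion": {"skin", "melanoma", "dermatology", "lesion", "biopsy"},
}

def disambiguate_mole(sentence):
    """Return the sense whose cue words overlap most with the sentence, or None."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue words present there is no evidence, so decline to guess.
    return best if scores[best] > 0 else None
```

Given "One mole of the reagent was dissolved in solution", the chemistry sense scores highest; given "The mole dug a tunnel through the garden soil", the animal sense does. A sentence with no cue words at all returns no answer, which is usually safer than a guess when the aim is accurate data.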

However, having the smarts is nothing if you don’t have the data. That’s where scientific publishers such as ourselves have a crucial role to play if the potential of these smart systems is to be realised.

The Royal Society of Chemistry has been publishing high-quality research for nearly 180 years, with hundreds of thousands of findings and studies contained within our archives, spanning several eras of scientific discovery. However, if we are to advance, we have to constantly look forward.

That’s why we have recently completed digitising this extensive archive, making all of those published papers searchable for Text and Data Mining projects.

In effect, this allows our collection of research to ‘plug and play’ with other resources, so that connections can be made across disciplines, supporting faster and more accurate research results.

The future of journal publications will of course evolve towards capturing research results and reports alongside FAIR data, ensuring research is findable, accessible, interoperable and reusable. But the complexity of language and expression will still need text analysis to feed machine learning and AI – drawing insights from a broad spectrum of evidence.

What’s clear is that this technology is a fantastic addition to a research team’s toolkit and can provide a crucial head start by pointing the team in the right direction. As more and more resources such as our own become available, the functionality and impact of these processes can only improve.

Richard Kidd is Head of Chemistry Data at the Royal Society of Chemistry