AI and NLP for Publishers: How Artificial Intelligence & Natural Language Processing Are Transforming Scholarly Communications

A free report from Cenveo Publisher Services

You may have heard how artificial intelligence (AI) is being deployed within the information industry to combat fake news, detect plagiarism, and even recommend content to users. Until now, however, AI has had minimal impact on the content creation and editorial functions of the publishing ecosystem. For scholarly publishers in particular, AI capabilities have now advanced to the point where they can automate significant portions of editorial and production workflows, with major implications for publishers' businesses, their authors, and the research community.

AI is a method by which humans train machines to identify patterns and learn new patterns. It involves developing algorithms that enable machines to quickly process large swaths of data, recognize the patterns within that data, and make decisions or recommendations based on that analysis.

Natural language processing (NLP) incorporates grammar analysis into machine learning. A computer program is trained to recognize the noun, verb, and object in a sentence, and to understand the structure of words in order to discern their meaning.

With NLP technology, publishers can automate simple editing and formatting tasks and focus their energy on adding greater value to the content. They can also manage more journal submissions or speed up tedious peer review without significantly increasing staff or production costs.

Traditionally, all articles submitted to an academic journal undergo a similar process, with multiple rounds of corrections and changes before copyediting, formatting, composition, and proofing. All told, this process can take several weeks before the article is published.

On the other hand, AI and NLP technology can apply pre-set grammar and formatting rules to analyze the content and score articles for quality. The technology automatically corrects minor errors, such as grammar and punctuation mistakes, and flags more complex issues that may need an editor’s attention. High-quality journal submissions can advance straight to the typesetting and composition stage.
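The triage described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two auto-fix rules, the sentence-length limit, the scoring weights, the threshold, and the names `triage`, `AUTO_FIXES`, and `PASS_THRESHOLD` are hypothetical and do not describe how any particular product works.

```python
import re
from dataclasses import dataclass, field

# Hypothetical "minor" rules: (pattern, replacement) pairs that are
# considered safe to apply without asking an editor.
AUTO_FIXES = [
    (re.compile(r" {2,}"), " "),          # collapse runs of spaces
    (re.compile(r"\s+([,.;:])"), r"\1"),  # drop whitespace before punctuation
]

MAX_SENTENCE_WORDS = 40  # assumed limit; longer sentences get flagged
PASS_THRESHOLD = 80      # assumed minimum score for the fast track


@dataclass
class TriageResult:
    text: str                 # manuscript text after automatic fixes
    fixes: int                # number of automatic corrections applied
    flags: list = field(default_factory=list)  # issues left for an editor
    score: int = 100
    route: str = "typesetting"


def triage(text: str) -> TriageResult:
    """Auto-correct minor issues, flag complex ones, then score and route."""
    fixes = 0
    for pattern, replacement in AUTO_FIXES:
        text, count = pattern.subn(replacement, text)
        fixes += count

    # Flag sentences that are too long to fix mechanically.
    flags = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if len(sentence.split()) > MAX_SENTENCE_WORDS:
            flags.append(f"long sentence: {sentence[:30]}...")

    # Illustrative scoring: small penalty per fix, larger penalty per flag.
    score = max(0, 100 - 2 * fixes - 15 * len(flags))
    clean = score >= PASS_THRESHOLD and not flags
    route = "typesetting" if clean else "editor review"
    return TriageResult(text, fixes, flags, score, route)
```

In this sketch, a submission with only mechanical problems (stray spaces, for instance) is corrected and routed to typesetting, while any flagged issue sends it to an editor regardless of score. A production system would rely on trained language models rather than a handful of regular expressions.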

AI & NLP technology can flag content that requires an editor's review

Because editing is often the most time-consuming part of the production process, fast tracking high-quality articles to the composition stage can save a significant amount of time for publishers—while also improving the author experience.

In our latest report, AI and NLP for Publishers, we explore how AI and NLP are being used today in scholarly publishing and how they may affect the evolution of research. We also explore how the technology works and how publishers like Taylor & Francis are, with the help of Cenveo Publisher Services, realizing the benefits of intelligent automation.

Download the free report.

Smart Suite 2.0 is Cenveo’s integrated, cloud-based publishing engine combining AI and system intelligence to achieve accelerated workflows.


Smart Suite 2.0 Released - A New Approach to Pre-editing, Copyediting, Production, and Content Delivery

Smart Suite Version 2.0 is a cloud-based ecosystem of publishing tools that streamlines the production of high-quality content. The system has a completely redesigned user interface (UI) and tighter integration with high-speed production engines to solve the challenges of multi-channel publishing.

Smart Suite 2.0 is the next generation publishing engine that focuses on a combination of artificial intelligence, including NLP, and system intelligence that eliminates human intervention and achieves the goal of high-speed publishing with editorial excellence. Smart Suite auto generates multiple outputs, including PDF, XML, HTML, EPUB, and MOBI from a manuscript in record-setting time.
— Francis Xavier, VP of Operations at Cenveo Publisher Services

Offering a fresh approach to streamline production, the unified toolset comprises four modules that seamlessly advance content through publishing workflows while validating and maintaining mark-up language behind the scenes.

  • Smart Edit is a pre-edit, copyedit, and conversion tool that incorporates natural language processing (NLP) and artificial intelligence (AI) to improve not only editorial quality but also the speed and accuracy of markup and delivery to output channels.
  • Smart Compose is a fully automated production engine that ingests structured output from Smart Edit and generates page proofs. Designed to work with both 3B2 and InDesign, built-in styles based on publisher specifications guarantee consistent, high-quality layouts.
  • Smart Proof provides authors and editors with a browser-based correction tool that captures changes and allows for valid round tripping of XML.
  • Smart Track brings everything together in one easy UI that logs content transactions. The kanban-styled UI presents a familiar workflow overview with drill-down capabilities that track issues and improve both system and individual performance.

Smart Suite is fully configurable for specific publisher requirements and content types. Customized data such as taxonomic dictionaries, and industry integrations such as FundRef, GenBank, and ORCID, enhance the system based on publisher requirements.




Mike Groth

Michael Groth is Director of Marketing at Cenveo Publisher Services, where he oversees all aspects of marketing strategy and implementation across digital, social, conference, advertising, and PR channels. Mike has spent over 20 years in marketing for scholarly publishing, previously at Emerald, Ingenta, Publishers Communication Group, the New England Journal of Medicine, and Wolters Kluwer. He has made the rounds at information industry events, organized conference sessions, presented at SSP, ALA, ER&L, and Charleston, and blogged on topics ranging from market trends, library budgets, and research impact to emerging markets and online communities. Twitter Handle: @mikegroth72

How Open Access is Changing Scholarly Publishing

Guest blog by John Parsons

After almost two decades, the Open Access publishing model is still controversial and misunderstood. Here’s where we stand today.

The beginnings of scholarly publishing correspond roughly to the Enlightenment period of the late 17th and early 18th centuries. The practice of publishing one’s discoveries was driven by a belief, championed by the Royal Society, in the transparent, open exchange of experiment-based ideas. Over the centuries, journals embraced a rigorous peer review process to maintain the integrity (and the subscription value) of their research content.

Transparency, openness, and integrity all come at a cost, however. For many years, that cost was met by charging journal subscription fees—usually borne by institutions who either produced the research, benefited from it, or both. So long as the publishing model was solely print-based, the subscription model worked well, especially for institutions with deep pockets. That all changed with the Internet. Not only did the scope and volume of research increase rapidly, so did the perception that all information should be easily findable via search engines.

The Internet expanded the audience for research outside traditional institutions—to literally anyone with a connected device. With this expansion, the disparity between the well-funded and those less fortunate became acute. As it did with other publishing workflows, this disruption drove a need for new economic models for scholarly publishing.

Open Access Basics

Advocacy for less fettered access to knowledge is nothing new. But the current Open Access (OA) movement began in earnest in the early 2000s, with the “Three Bs” (the Budapest Open Access Initiative, the Bethesda Statement, and the Berlin Declaration by the Max Planck Institute). Much of the impetus came from the Scientific, Technical, and Medical (STM) publishing arena, and from research funding and policy entities like the European Commission and the U.S. National Institutes of Health. The latter’s full-text archive of free biomedical and life sciences articles, PubMed Central (PMC), is a leading example, backed by a mandate that the results of publicly funded research be freely available to the public.

In a nutshell, Open Access consists of two basic types—each with its own variations and exceptions. “Green” OA is the practice of self-archiving scholarly articles in a publicly-accessible data repository, such as PMC or one of many institutional repositories maintained by academic libraries. There is often a time lag between initial publication—especially by a subscription-based journal—and the availability of the archived version.

The alternative is the “Gold” OA model. It includes a growing number of journals, such as the Public Library of Science (PLOS), that do not charge subscription fees. Instead, they fund the cost of publishing through article processing charges (APCs) and other mechanisms. Although APCs are commonly thought of as being paid by the author, the real situation is more complex. Often, in cases where OA is mandated, APCs are built into the funding proposals, or otherwise factored into institutional and research budgets. PLOS and other journals can also waive APCs, or utilize voluntary funding “pools,” for researchers who cannot afford to pay them.

The appeal of Open Access is obvious to researchers and libraries of limited means. It also has the potential to accelerate research—by letting scientists more easily access and build upon others’ work. But for prestigious institutions, publishers, and their partners, the picture is more complicated.

Publishers in particular can be hard pressed to develop and enhance their brand—or offer a multitude of services that scholars may take for granted—when constrained by the APC funding model. (Those challenges will be addressed in a future blog.)

Misconceptions, Problems—and Solutions

Even today, researchers are not always clear about what Open Access means for scholarly publishing. Research librarians have their work cut out for them. They cite the common misconception that OA journals do not have an adequate peer review process, for example. This is caused by disreputable or “predatory” journals that continually spam researchers with publication offers. Librarians counter this with a growing arsenal of blacklist and whitelist sources, such as the Directory of Open Access Journals.

A major contributor to the uncertainty surrounding OA is the practice of openly publishing “preprint” versions of articles prior to, or during the early stages of, the peer review process. Sometimes this is part of the researcher’s strategy to secure further funding, but it can fuel the mistaken notion that peer review is not required in OA publishing workflows. Distinguishing preprints from final OA articles must be a goal for publishers and their partners.

Another problem is scholars’ unfamiliarity with the OA-driven changes in publishing workflows. Gold OA journals—particularly those involved in STM publishing—are usually quite adept at guiding authors through the publication process, just as their subscription-based counterparts and publishing service providers have been. For example, the practice of assigning Digital Object Identifiers (DOIs), ISSNs, and other metadata to scholarly publishing works is becoming increasingly efficient for both Gold OA and subscription journals.
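Part of what makes identifier handling efficient is that some identifiers can be checked automatically. As a minimal sketch, the function below validates the check digit of an ISSN using the standard modulus-11 scheme (weights 8 down to 2 on the first seven digits, with `X` standing for 10). The function name `is_valid_issn` and the strict `NNNN-NNNC` format it accepts are choices made for this example; a passing checksum shows only that the number is well formed, not that it is registered to a journal.

```python
import re

# Accept the conventional printed form: four digits, a hyphen,
# three digits, and a final check character (a digit or X).
ISSN_PATTERN = re.compile(r"^(\d{4})-(\d{3})([\dX])$")


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    match = ISSN_PATTERN.match(issn.strip().upper())
    if not match:
        return False

    digits = match.group(1) + match.group(2)  # the seven payload digits
    # Weight the digits 8, 7, 6, 5, 4, 3, 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3) == expected


print(is_valid_issn("0028-0836"))  # True
print(is_valid_issn("0028-0837"))  # False: wrong check digit
```

A check like this can run at submission time so that mistyped identifiers are caught before they reach downstream metadata systems.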

Green OA is a thornier problem for traditional publishing workflows. Each institutional repository is separate from the others—with its own funding sources, development path, and legacy issues. A common approach to article metadata, for example, has not happened overnight. Fortunately, organizations like Crossref are working with multiple partners and initiatives to make these workflows universal—and transparent to the researcher.

Perhaps the biggest issue posed by OA is the fate of traditional, subscription-based journals. Despite the push to “flip” journals from a subscription model to Open Access, there are cases where this is simply not feasible or even desirable. Many journals have a large subscriber base of professionals who, although they value the research, do not themselves publish peer-reviewed articles. This is especially true for STM publishing. Some of these journals have adopted a “hybrid” approach, charging APCs for some articles (which are available immediately) while maintaining others for subscribers only. These are eventually made Open Access under the Green model, especially when Open Access is a funding requirement.

Scanning the Horizon

As we will discuss in future blogs, publishers and their service providers are exploring better ways to adapt their publishing workflows to the realities of OA and hybrid journals. In some cases, such as metadata tagging, XML generation, and output to print and online versions, these workflows can be highly automated. In others, publishers must find cost-effective ways to add value—while being as transparent as possible to the authors and users of journal content.
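To give a sense of what automated metadata tagging and XML generation can look like, here is a minimal sketch that turns an article record into XML with Python's standard library. The element names (`article`, `title`, `authors`, `author`, `doi`), the `access` attribute, and the sample record are all invented for illustration; real workflows target an established schema, and the DOI shown is a placeholder.

```python
import xml.etree.ElementTree as ET


def article_to_xml(record: dict) -> str:
    """Serialize a simple article record to an XML string.

    The tag and attribute names are hypothetical, not a real schema.
    """
    article = ET.Element("article", {"access": record["access"]})
    ET.SubElement(article, "title").text = record["title"]

    authors = ET.SubElement(article, "authors")
    for name in record["authors"]:
        ET.SubElement(authors, "author").text = name

    ET.SubElement(article, "doi").text = record["doi"]
    # encoding="unicode" returns a str; special characters are escaped.
    return ET.tostring(article, encoding="unicode")


sample = {
    "access": "gold",                  # hypothetical OA flag
    "title": "Needles & Haystacks",
    "authors": ["A. Author", "B. Author"],
    "doi": "10.1234/placeholder.5678",  # placeholder, not a real DOI
}
print(article_to_xml(sample))
```

Because the structure is generated rather than typed by hand, the same record can feed print, web, and repository outputs without re-keying.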

Despite these challenges, Open Access is changing the scholarly publishing landscape forever. There is a compelling need for researchers to find and build upon the research of others—each needle buried in a haystack of immense proportions—to advance the human condition. Publishers and their service partners are well positioned to make that open process accessible and fair to all.

