AI and NLP for Publishers: How Artificial Intelligence & Natural Language Processing Are Transforming Scholarly Communications

A free report from Cenveo Publisher Services

You may have heard how artificial intelligence (AI) is being deployed within the information industry to combat fake news, detect plagiarism, and even recommend content to users. Until now, however, AI has had minimal impact on the content creation and editorial functions of the publishing ecosystem. For scholarly publishers in particular, AI capabilities have now advanced to the point where they can automate significant portions of editorial and production workflows, with major implications for publishers' businesses, their authors, and the research community.

AI is a method by which humans train machines to recognize patterns and learn new ones. It involves developing algorithms that enable machines to quickly process large volumes of data, recognize the patterns within that data, and make decisions or recommendations based on that analysis.

Natural language processing (NLP) brings grammatical analysis into machine learning. A computer program is trained to recognize the subject, verb, and object in a sentence, and to understand the structure of words in order to discern their meaning.

With NLP technology, publishers can automate simple editing and formatting tasks and focus their energy on adding greater value to the content. They can also manage more journal submissions or speed up tedious peer review without significantly increasing staff or production costs.

Traditionally, every article submitted to an academic journal goes through a similar process, with multiple rounds of corrections and changes before copyediting, formatting, composition, and proofing. All told, this process can take several weeks before the article is published.

By contrast, AI and NLP technology can apply preset grammar and formatting rules to analyze the content and score articles for quality. The technology automatically corrects minor errors, such as grammar and punctuation, and flags more complex issues that may need an editor's attention. High-quality journal submissions can then advance straight to the typesetting and composition stage.

AI & NLP technology can flag content that requires an editor's review

Because editing is often the most time-consuming part of the production process, fast-tracking high-quality articles to the composition stage can save publishers a significant amount of time while also improving the author experience.
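As a rough illustration of the triage described above, the following Python sketch applies a few automatic fixes, flags over-long sentences for an editor, computes a simple quality score, and routes the article. The rules, the 40-word and 90-point thresholds, the scoring weights, and the names `triage` and `TriageResult` are all hypothetical; production NLP systems rely on trained models and far richer rule sets than a handful of regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical house rules: (pattern, replacement) pairs applied automatically.
AUTO_FIXES = [
    (re.compile(r" {2,}"), " "),           # collapse repeated spaces
    (re.compile(r" +([,.;:!?])"), r"\1"),  # remove space before punctuation
    (re.compile(r"\b(\w+) \1\b"), r"\1"),  # remove a doubled word ("the the")
]
MAX_SENTENCE_WORDS = 40  # assumed cut-off for sentences an editor should review
FAST_TRACK_SCORE = 90    # assumed minimum score for skipping copyediting


@dataclass
class TriageResult:
    text: str
    fixes: int = 0
    flags: list = field(default_factory=list)

    @property
    def score(self) -> int:
        # Assumed weighting: each automatic fix costs 2 points, each flag 15.
        return max(0, 100 - 2 * self.fixes - 15 * len(self.flags))

    @property
    def route(self) -> str:
        if not self.flags and self.score >= FAST_TRACK_SCORE:
            return "typesetting"
        return "editor review"


def triage(text: str) -> TriageResult:
    result = TriageResult(text=text)
    # Step 1: automatically correct minor, rule-based errors.
    for pattern, replacement in AUTO_FIXES:
        result.text, count = pattern.subn(replacement, result.text)
        result.fixes += count
    # Step 2: flag complex issues (here, over-long sentences) for an editor.
    for sentence in re.split(r"(?<=[.!?])\s+", result.text):
        if len(sentence.split()) > MAX_SENTENCE_WORDS:
            result.flags.append("long sentence: " + sentence[:40] + "...")
    return result


report = triage("The  results were were significant .")
print(report.text, report.fixes, report.score, report.route)
# The results were significant. 3 94 typesetting
```

Keeping automatic fixes and editor flags separate mirrors the split in the text: minor errors are corrected silently, while anything that needs judgment blocks the fast track.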

In our latest report, AI and NLP for Publishers, we explore how AI and NLP are being used in scholarly publishing today and how they may affect the evolution of research. We also explain how the technology works and how publishers like Taylor & Francis are, with the help of Cenveo Publisher Services, realizing the benefits of intelligent automation.

Download the free report.

Smart Suite 2.0 is Cenveo’s integrated, cloud-based publishing engine combining AI and system intelligence to achieve accelerated workflows.

 

Accessibility for Publishers: Practical Tips That Demonstrate it's Well Within Your Reach

A free report from Riverwinds Consulting and Cenveo Publisher Services

Accessibility is an approach to publishing and design that makes content available to everyone, including people with disabilities who use assistive technologies. The aim of accessible publishing is to make reading easier for users with difficulties or disabilities, including people who are blind or partially sighted and people with learning disabilities. Making content accessible lets readers experience content in the format that works best for them and absorb the information more effectively. The term “accessibility” covers issues of content structure, format, and presentation.

The question of “why make the effort to make content accessible to readers with disabilities?” still lingers. Of course, accessibility comes with a cost. However, publishers do benefit from embracing this essential initiative. When accessibility is well executed, it can expand readership and provide a higher-quality user experience for everyone.

Let's look at an example comparing accessible alt text with alt text captured from a figure legend. Visual items, such as images, that are important to the content should include alternative-text descriptions (alt text), which allow users to understand the visual information. Alt text should capture information that is not included in the caption or surrounding text and convey the meaningful information in the visual item. Descriptive alt text is critical for visually impaired readers to understand the full meaning of an image. The following image illustrates accessible alt text that gives a visually impaired reader a more useful description than alt text that simply repeats the figure legend.
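One way a production team might screen for the legend-copying problem described above is to measure how much of the alt text already appears in the caption. The Python sketch below does that with a simple word-overlap ratio; the 0.8 threshold, the review messages, and the function names are hypothetical choices, and the figure text is made up for illustration.

```python
import re


def _words(text: str) -> set:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def review_alt_text(alt: str, caption: str, max_overlap: float = 0.8) -> str:
    """Return a review note for one figure's alt text (hypothetical heuristic)."""
    alt_words = _words(alt)
    if not alt_words:
        return "missing: describe the visual content"
    # Share of alt-text words that already appear in the caption.
    overlap = len(alt_words & _words(caption)) / len(alt_words)
    if overlap >= max_overlap:
        return "repeats caption: describe what the caption does not say"
    return "ok"


caption = "Figure 2. Annual revenue by region, 2010 to 2015."
print(review_alt_text(caption, caption))
print(review_alt_text("Bar chart: revenue in the North region doubles "
                      "between 2010 and 2015 while other regions stay flat.", caption))
```

A word-overlap ratio is only a screening aid: it can catch alt text copied from the legend, but deciding what a good description should say still takes a person who understands the figure.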

In our latest report, "Accessibility for Publishers: Practical Tips That Demonstrate it's Well Within Your Reach," we provide business cases that can be brought to leadership and stakeholders in a publishing organization. Download this free report to understand:

  • how you can build the business case for accessibility in your publishing organization

  • emerging and compelling reasons for making content accessible

  • the key principles of accessibility

 

Rights & Permissions Service for Publishers

Copyright is far more than a necessary evil to protect intellectual property from theft. Copyright furthers all creative interests by making the rich marketplace of ideas available to a wider audience. Resourceful rights and permissions (R&P) management supports authors' content while making the most of the publisher’s budget.

Hiring one person to perform all the rights and permissions functions requires finding a pretty special person: an editorial specialist with enough copyright expertise to be an IP strategist, who is also a skilled, digital-image-savvy photo researcher and database manager. That's why we offer R&P as a service for publishers.

Cenveo Publisher Services manages all aspects of text, image, and rich media content R&P. We assemble a team of project managers, assessment specialists, data entry staff, photo researchers, and permissions experts to support the management of R&P in your organization.

By identifying a rights strategy early, authors can stay on budget. Rights research and permissions clearance run alongside production cycles with clearly defined milestones. Targeted international expertise also enables a range of pricing options. Contact us to learn how we can support R&P for your journals or books program.

 

Download Brochure



Mike Groth

Michael Groth is Director of Marketing at Cenveo Publisher Services, where he oversees all aspects of marketing strategy and implementation across digital, social, conference, advertising and PR channels. Mike has spent over 20 years in marketing for scholarly publishing, previously at Emerald, Ingenta, Publishers Communication Group, the New England Journal of Medicine and Wolters Kluwer. He has made the rounds at information industry events, organized conference sessions, presented at SSP, ALA, ER&L and Charleston, and blogged on topics ranging from market trends, library budgets and research impact, to emerging markets and online communities. Twitter handle: @mikegroth72

Publishers Keep Calm and Carry On

It was another busy year at the London Book Fair last week, with registration numbers reportedly up by a double-digit percentage.

 
 

The following photo captures a brief quiet moment at the Cenveo Publisher Services stand. The global team met with publishers, production managers, archivists, technology executives, and many others to discuss all things related to the creation and management of content.

 
 

Accessibility

The hot topic for LBF17 at the Cenveo stand was content accessibility. Long a champion of digital equality, we're helping publishers create and architect content that is "born accessible." The same technologies and guidelines that improve access to materials for people with visual or hearing impairments, limited mobility, or perceptual and cognitive differences are also tremendously useful for all of a publisher's customers.

Accessibility is no longer a concern only for education publishers; journal publishers and others also have a pressing need to do more with content accessibility.

 

Google Books Decision

In an extremely packed room, Judge Pierre Leval, America’s foremost copyright jurist and a judge on the U.S. Court of Appeals for the Second Circuit, told attendees that Google’s program to scan tens of millions of library books to create an online index “conferred gigantic benefits to authors and the public equally,” and did not “offer a substitute or interfere with authors’ exclusive rights” to control distribution. READ MORE: Judge Pierre Leval Defends Google Books Decision, Fair Use

Scholarly Publishing and Academic Market

The Research and Scholarly Publishing Forum offered academic publishers and service providers a half-day program with lively debates featuring Elsevier, Wiley, and Taylor & Francis. Highlights included:

  • A discussion about the future of Open Access in the UK between Alicia Wise, Elsevier’s Director of Policy and Access; Liam Earney, Jisc Collections’ Head of Library Support Services; and Chris Banks, Assistant Provost (Space) & Director of Library Services, Central Library, Imperial College London
  • A panel presenting global research policy developments chaired by Wiley’s James Perham-Marchant, featuring speakers from Taylor & Francis, Berghahn Books and Research Consulting
  • A panel session on new innovations to watch, chaired by Tracey Armstrong, President and CEO of the Copyright Clearance Center, including speakers from Sparrho, Frontiers and Cold Spring Harbor Laboratory Press

Full Coverage via Publishers Weekly

Publishers Weekly covered a range of topics across the many markets represented at the Fair.

 


How Open Access is Changing Scholarly Publishing

Guest blog by John Parsons

After almost two decades, the Open Access publishing model is still controversial and often misunderstood. Here’s where we stand today.

The beginnings of scholarly publishing correspond roughly to the Enlightenment period of the late 17th and early 18th centuries. The practice of publishing one’s discoveries was driven by a belief, championed by the Royal Society, in the transparent, open exchange of experiment-based ideas. Over the centuries, journals embraced a rigorous peer review process to maintain the integrity (and the subscription value) of their research content.

Transparency, openness, and integrity all come at a cost, however. For many years, that cost was met by charging journal subscription fees, usually borne by the institutions that produced the research, benefited from it, or both. So long as the publishing model was solely print-based, the subscription model worked well, especially for institutions with deep pockets. That all changed with the Internet. Not only did the scope and volume of research increase rapidly, but so did the perception that all information should be easily findable via search engines.

The Internet expanded the audience for research beyond traditional institutions to literally anyone with a connected device. With this expansion, the disparity between the well funded and the less fortunate became acute. As with other publishing workflows, this disruption created a need for new economic models for scholarly publishing.

Open Access Basics

Advocacy for less fettered access to knowledge is nothing new. But the current Open Access (OA) movement began in earnest in the early 2000s with the “Three Bs” (the Budapest Open Access Initiative, the Bethesda Statement, and the Berlin Declaration initiated by the Max Planck Society). Much of the impetus came from the Scientific, Technical, and Medical (STM) publishing arena, and from research funding and policy entities like the European Commission and the U.S. National Institutes of Health. The latter’s full-text archive of free biomedical and life sciences articles, PubMed Central (PMC), is a leading example, backed by a mandate that the results of publicly funded research be freely available to the public.

In a nutshell, Open Access comes in two basic types, each with its own variations and exceptions. “Green” OA is the practice of self-archiving scholarly articles in a publicly accessible repository, such as PMC or one of the many institutional repositories maintained by academic libraries. There is often a time lag between initial publication (especially in a subscription-based journal) and the availability of the archived version.

The alternative is the “Gold” OA model. It includes a growing number of journals, such as those published by the Public Library of Science (PLOS), that do not charge subscription fees. Instead, they fund the cost of publishing through article processing charges (APCs) and other mechanisms. Although APCs are commonly thought of as being paid by the author, the real situation is more complex. Often, in cases where OA is mandated, APCs are built into funding proposals or otherwise factored into institutional and research budgets. PLOS and other publishers can also waive APCs, or draw on voluntary funding “pools,” for researchers who cannot afford to pay them.

The appeal of Open Access is obvious to researchers and libraries of limited means. It also has the potential to accelerate research—by letting scientists more easily access and build upon others’ work. But for prestigious institutions, publishers, and their partners, the picture is more complicated.

Publishers in particular can be hard-pressed to develop and enhance their brands, or to offer the many services that scholars may take for granted, when constrained by the APC funding model. (Those challenges will be addressed in a future blog post.)

Misconceptions, Problems—and Solutions

Even today, researchers are not always clear about what Open Access means for scholarly publishing, and research librarians have their work cut out for them. For example, librarians cite the common misconception that OA journals lack an adequate peer review process. This misconception is fueled by disreputable or “predatory” journals that continually spam researchers with publication offers. Librarians counter it with a growing arsenal of blacklist and whitelist sources, such as the Directory of Open Access Journals.

Perhaps a major contributor to the uncertainty surrounding OA is the practice of openly publishing “preprint” versions of articles before or during the early stages of the peer review process. Sometimes this is part of the researcher’s strategy to secure further funding, but it can fuel the mistaken notion that peer review is not required in OA publishing workflows. Distinguishing preprints from final OA articles must be a goal for publishers and their partners.

Another problem is scholars’ unfamiliarity with OA-driven changes in publishing workflows. Gold OA journals, particularly in STM publishing, are usually quite adept at guiding authors through the publication process, just as their subscription-based counterparts and publishing service providers have been. For example, assigning Digital Object Identifiers (DOIs), ISSNs, and other metadata to scholarly works is becoming increasingly efficient for both Gold OA and subscription journals.
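One small, well-defined piece of that metadata work is checking identifiers before they are deposited. An ISSN's eighth character is a check digit: weight the first seven digits from 8 down to 2, sum them, and the check character is (11 − sum mod 11) mod 11, written as X when it equals 10. The Python sketch below implements that rule; the function names are illustrative, and the check only confirms that an ISSN is well formed, not that it has actually been assigned to a journal.

```python
import re


def issn_check_char(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits."""
    weights = range(8, 1, -1)  # 8, 7, 6, 5, 4, 3, 2
    total = sum(int(digit) * weight for digit, weight in zip(first_seven, weights))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """True if `issn` looks like NNNN-NNNC and its check character is correct."""
    match = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn.strip())
    if match is None:
        return False
    return issn_check_char(match.group(1) + match.group(2)) == match.group(3).upper()


for candidate in ["0317-8471", "1050-124X", "0317-8472"]:
    print(candidate, is_valid_issn(candidate))
# 0317-8471 True
# 1050-124X True
# 0317-8472 False
```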

Green OA is a thornier problem for traditional publishing workflows. Each institutional repository is separate from the others—with its own funding sources, development path, and legacy issues. A common approach to article metadata, for example, has not happened overnight. Fortunately, organizations like Crossref are working with multiple partners and initiatives to make these workflows universal—and transparent to the researcher.
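As an illustration of what a common approach to article metadata can look like in practice, the following Python sketch maps a work record in the JSON shape returned by Crossref's public REST API (the `message` object from a request such as `https://api.crossref.org/works/{DOI}`) into a flat record that a repository might store. The output schema, the function name `normalize_crossref_work`, and the sample record are hypothetical, and the sketch makes no network request.

```python
def normalize_crossref_work(message: dict) -> dict:
    """Flatten a Crossref-style work record into a simple repository record."""

    def first(key: str):
        # Crossref returns titles and container titles as lists of strings.
        values = message.get(key) or []
        return values[0] if values else None

    # "issued" holds {"date-parts": [[year, month, day]]}, possibly partial.
    date_parts = (message.get("issued") or {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    return {
        "doi": message.get("DOI", "").lower() or None,  # DOIs are case-insensitive
        "title": first("title"),
        "journal": first("container-title"),
        "issn": list(message.get("ISSN", [])),
        "year": year,
    }


sample = {  # illustrative record, not fetched from Crossref
    "DOI": "10.5555/Example.123",
    "title": ["An Illustrative Article"],
    "container-title": ["Journal of Examples"],
    "ISSN": ["1050-124X"],
    "issued": {"date-parts": [[2016, 5]]},
}
print(normalize_crossref_work(sample))
```

Missing fields come back as `None` or an empty list rather than raising an error, which matters when records arrive from many repositories with different legacy practices.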

Perhaps the biggest issue posed by OA is the fate of traditional, subscription-based journals. Despite the push to “flip” journals from a subscription model to Open Access, there are cases where this is simply not feasible or even desirable. Many journals have a large subscriber base of professionals who, although they value the research, do not themselves publish peer reviewed articles. This is especially true for STM publishing. Some of these journals have adopted a “hybrid” approach, charging APCs for some articles (which are available immediately) while maintaining others for subscribers only. These are eventually made Open Access under the Green model, especially when Open Access is a funding requirement.

Scanning the Horizon

As we will discuss in future blogs, publishers and their service providers are exploring better ways to adapt their publishing workflows to the realities of OA and hybrid journals. In some cases, such as metadata tagging, XML generation, and output to print and online versions, these workflows can be highly automated. In others, publishers must find cost-effective ways to add value—while being as transparent as possible to the authors and users of journal content.

Despite these challenges, Open Access is changing the scholarly publishing landscape forever. There is a compelling need for researchers to find and build upon the research of others—each needle buried in a haystack of immense proportions—to advance the human condition. Publishers and their service partners are well positioned to make that open process accessible and fair to all.

 



A Simple Lesson From Walt Disney

Everyone Has a Story to Tell

Videos aid learning. Videos and animation are at the top of the elearning food chain. Whether it's within a traditional elearning course or as an independent asset, animated videos help learners visualize and understand complex concepts.

Increasingly, customers across all the markets we serve (journal publishers, K12 educational publishers, higher ed publishers, elearning providers, and magazine publishers) are interested in transforming complex content into short animated videos.

Editorial credit: Alex Millauer / Shutterstock.com

Animation offers a medium of storytelling and visual entertainment, which can bring pleasure and information to people of all ages everywhere in the world.
— Walt Disney

Conceptualization and Production

Cenveo Publisher Services provides a blended team of creatives, editors, and technologists who transform a fuzzy vision into distinct products for use in digital publications, websites, and elearning courses. Our specialists include:

  • instructional designers
  • subject matter experts
  • multimedia specialists
  • graphic visualizers

We work with our customers to provide either the full range of animation services or à la carte options, including:

  1. conceptualization
  2. content creation
  3. visual storyboarding
  4. art creation
  5. photo/video research and procurement
  6. permissions management
  7. audio recording
  8. animation
  9. live action shoots
  10. video editing and packaging
  11. accessibility (WCAG and Section 508 compliance)

Animation Sample: SWOT Analysis

Have a look at an animated short we created to explain what a SWOT analysis is and why it's beneficial.

 
 


New England Publishing Collaboration Awards 2016 - Audience Choice Award

Earlier in November, Bookbuilders of Boston hosted the annual New England Publishing Collaboration (NEPCo) Awards. Cenveo Publisher Services and the Association of American Publishers' Professional and Scholarly Publishing Division received the Audience Choice Award.

 

Background on the Awards

Twenty years ago, job titles and job descriptions across publishers were remarkably similar. There were key skills, and mastery of these led to respect and reward. When the landscape changed, however, many of us adapted through meaningful partnerships. We investigated new business models, expanded our core competencies, and challenged our vendors to provide new services. We learned about the technologies our customers embraced. We found new customers.

The NEPCo Awards celebrate this agile and open-minded approach to unprecedented change. Bookbuilders presents this timely event to educate members who are new to the industry and to reward those who have succeeded through meaningful relationships with partners.

Winners

FIRST PLACE: Globe Pequot and Active Interest Media won first place for their collaboration on Backpacker: The National Parks Coast to Coast.

SECOND PLACE: Harvard Business Publishing and Jazz at Lincoln Center won second place for their project Wynton Marsalis & Jazz at Lincoln Center.

More pictures and videos of the winners are available on the NEPCo website. Each finalist had three minutes to provide background on, and share insights into, its collaborative publishing project. Check them out!

 


iPROSE Selected as a Finalist for the New England Publishing Collaboration Awards (NEPCo)

The New England Publishing Collaboration (NEPCo) Awards recognize collaborative publishing achievements and educate a new generation of publishing leaders. NEPCo recently selected iPROSE: 40 Years of Excellence in Scholarly Publishing as a finalist for its prestigious award.

The Professional and Scholarly Publishing (PSP) Division of the Association of American Publishers (AAP) worked with The Design Studio at Cenveo Publisher Services to create iPROSE: 40 Years of Excellence in Scholarly Publishing. This digital edition celebrates and showcases the best in scholarly publishing during the past 40 years. The product was created and published earlier this year in honor of the PROSE Awards' 40th anniversary. The PROSE Awards, often referred to as the Oscars of scholarly publishing, were created to honor the best scholarly publications across various fields in a variety of formats.

Working collaboratively with Cenveo Publisher Services, PSP went into its vaults to gather and organize 40 years of past winners, highlights, and history. Together, we transformed the content into a digital product that includes video, rich media, photo galleries, and more. Cenveo Publisher Services has supported the PROSE Awards for a number of years, and in 2016 we wanted to create something special for the 40th anniversary.

iPROSE will compete against six other finalists at The Rockwell in Somerville, MA, on November 9. For the competition, finalists will present 3-minute descriptions of their collaborative projects on stage in front of judges and the audience. Tickets are still available via NEPCo's website. Please come and join the celebration!


iPROSE is available as a free download via the Cenveo Mobile Platform app in the iTunes Store.

 
 
The creative team at Cenveo Publisher Services is exceptional. They conceptualized, designed and produced the digital edition. Now we have a digital product showcasing the best in scholarly publishing during the past 40 years.
— Kate Kolendo, project manager at the Association of American Publishers


SSP Fall Seminar Recap - Mentoring, RFPs, Metadata...These Were Just a Few of Our Favorite Things

This past Tuesday and Wednesday (October 4 to 5), SSP hosted its Fall Seminar at the American Geophysical Union office in DC. The event was organized around three themes with presentations from publishers', vendors', and consultants' perspectives:

  1. Develop Somebody---Even Yourself: Mentorship, Career Development, and Networking
  2. A How-To Guide: Successfully Executing an RFP Process
  3. Bagged and Tagged: How the New Scholarly Infrastructure is Connecting People, Places, and Things

Unlike the large SSP Annual Meeting, the Fall Seminar is an intimate gathering of journal managers, publishers, editorial directors, content technology architects, developmental editors, graphic designers, and more. The focus throughout the two days was building networks, both professional and organizational, to strengthen yourself and your company. The message was evidently taken to heart: everyone involved was open to conversation and to making new connections.

The RFP presentation was loaded with tips and best practices, but it also covered what NOT to include in an RFP. Panelists and audience members shared many pet peeves, which translated into a list of great tips on RFP content and process.

Never miss an opportunity to hear Chuck Koscher from CrossRef speak about standards and metadata. He always explains his mission of creating a sustainable infrastructure for scholarly communication in detail and with passion.

Following is a small sample of information from the past two days:

 


The Story Behind the Textbook: The Cost of Creating Course Materials

Behind that expensive textbook your college kids just charged is a team of writers, editors, instructional designers, graphic designers, developers, compositors, marketers, and more.

The cost of creating high-quality learning materials is significant.

The Association of American Publishers collected real metrics based on the production of Pearson's Campbell Biology, 10th Edition, and published the following infographic:

From the Association of American Publishers. Used with permission.


We work with publishers and content providers and understand that managing costs along with content is imperative.

From content creation to XML, we provide full-service editorial and production teams that include instructional designers, subject matter experts, editors, and writers. Whether it’s core textbook work or supplement creation and management, we can help. Cenveo's Higher Education content team has experience managing textbook production and digital product creation, from setting up projects and working with authors to finalizing content. Want to discuss some of the ways we can help your production team?


Higher Education Textbooks: Student Watch Key Findings

As colleges and universities welcome students back to campus, it's a good time to revisit some of the findings from the National Association of College Stores' (NACS) Spring Study. Higher education publishers are deep in the trenches, dealing with disruption from Amazon, the proliferation of digital distribution channels, and pricing transparency issues.

 

From the National Association of College Stores. Used with permission.

 

In the Student Watch 2015 to 2016 Academic Year Report, NACS describes a number of student attitudes and behaviors toward course materials that should interest higher ed publishers:

  • use of digital materials continued its slow and steady growth, with 6 out of 10 students using at least one digital component during the fall 2015 term
  • 17% of students said they had not yet used a digital format during their college career
  • print is still the preferred textbook format
  • 26% prefer a print book with a digital component
  • students spent an average of $602 on course materials during the year, compared with $563 the previous year
  • the campus store remains the top source for course materials purchases
  • the second most popular source for course materials is Amazon
  • the rentals market appears to have plateaued, with about 40% of students renting at least one unit during both the fall 2014 and fall 2015 terms
  • during the spring 2016 term, rentals accounted for 24% of the units purchased and 17% of the dollars spent
  • convenience and lower cost remain the top reasons for acquiring digital materials
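Two quick calculations put some of these figures in perspective, using only the NACS numbers above. The first is the year-over-year change in average spending on course materials. The second compares the average spend per rented unit with the average spend per unit overall in spring 2016, where $D$ and $U$ (symbols introduced here for illustration) stand for total dollars spent and total units acquired that term:

$$\frac{\$602 - \$563}{\$563} = \frac{\$39}{\$563} \approx 6.9\%$$

$$\frac{0.17\,D \,/\, 0.24\,U}{D \,/\, U} = \frac{0.17}{0.24} \approx 0.71$$

In other words, average spending rose by roughly 7%, and a rented unit cost, on average, about 71% as much as the average unit acquired that term.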
 

From the National Association of College Stores. Used with permission.

 

For more information and to grab a copy of the full report, click here.


Cenveo Publisher Services works with all types of educational publishers. From content creation to XML, we provide full-service editorial and production teams that include instructional designers, subject matter experts, editors, and writers. Whether it’s core textbook work or supplement creation and management, we can help.


Related Case Studies

McGraw-Hill Education: Book Management for a Landmark Textbook

National Geographic Learning: High-End ESL Production With Hybrid Workflow

GVE Online Education: Reinventing ESL Instruction With Innovative eLearning Solutions



Mike Groth

Michael Groth is Director of Marketing at Cenveo Publisher Services, where he oversees all aspects of marketing strategy and implementation across digital, social, conference, advertising and PR channels. Mike has spent over 20 years in marketing for scholarly publishing, previously at Emerald, Ingenta, Publishers Communication Group, the New England Journal of Medicine and Wolters Kluwer. He has made the rounds at information industry events, organized conference sessions, presented at SSP, ALA, ER&L and Charleston, and blogged on topics ranging from market trends, library budgets and research impact, to emerging markets and online communities. Twitter handle: @mikegroth72

Good vs Valid XML: Cheap is Dear

For many years I preached the merits of XML-first and XML-early workflows, before they were the norm. Now my platform is "good vs valid XML."

Any service provider can provide XML.

Indeed, automated XML is pretty much a standard output from most systems that have anything to do with publishing. It's been 13 years since Microsoft Office introduced XML formats for Excel and Word files.

Yet when I hit the road and speak with publishers about their challenges, a lot of what I hear goes into a bucket I call "good vs valid XML." There is a distinction between a valid XML file and a good, valid XML file. A valid file conforms to its DTD or schema, yet it can still fail to capture what the content is supposed to be. Too often, budgets demand, or conversion teams choose, whatever is easiest (i.e., cheapest) instead of doing the right thing to create a good XML file.

Let's look at some examples

Glossary Example

Following is the rendered text and image:


Following are examples of what I call "good" XML and "valid" XML. Take note of the tagging structure used. The <dl> (definition list) element pairs each term (<dt>) with its definition (<dd>), so the markup itself defines the content and carries semantic meaning; the valid example merely bolds each term inside a paragraph. The valid XML example is also missing alternative text. Without alternative text, publishers miss an opportunity to improve SEO and, more important, fail at content accessibility.

Good XML

Definition term with class to differentiate languages.

<dl> <dt class="english"> <strong>amphibian</strong> </dt> <dd>(am fib&#x00B4; &#x0113; &#x0259;n) An animal that lives part of its life in water and part of its life on land. My pet frog is an <strong>amphibian.</strong></dd> <dt class="spanish"> <strong>anfibio</strong> </dt> <dd>Animal que pasa parte de su vida en el agua y parte en tierra. Mi ranita es un <strong>anfibio.</strong> <imggroup> <img id="pEM002-001" src="./images/U99C99/pEM002-001.jpg" alt="A red-eyed tree frog wrapped around a green branch."/> </imggroup> </dd> </dl>

Valid XML

Definition term in paragraph element with strong element.

<p><strong>amphibian</strong> (am fib&#x00B4; &#x0113; &#x0259;n) An animal that lives part of its life in water and part of its life on land. My pet frog is an <strong>amphibian.</strong></p> <p><strong>anfibio</strong> Animal que pasa parte de su vida en el agua y parte en tierra. Mi ranita es un <strong>anfibio.</strong> <imggroup> <img id="pEM002-001" src="./images/U99C99/pEM002-001.jpg" alt=""/> </imggroup> </p>
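To make the distinction concrete, here is a minimal Python sketch of the kind of automated check a QA team might run on glossary markup. It is an illustration under stated assumptions, not a Cenveo tool: the function names and the "bold first child of a paragraph" heuristic are hypothetical, each snippet is wrapped in a synthetic <root> element so fragments with several top-level elements will parse, and Python's xml.etree.ElementTree only checks that markup is well-formed; it does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment: str) -> ET.Element:
    # Wrap the snippet in a synthetic <root> so fragments with several
    # top-level elements (e.g. two <p> elements) still parse.
    # This checks well-formedness only, not validity against a DTD.
    return ET.fromstring(f"<root>{fragment}</root>")


def glossary_terms(fragment: str) -> dict:
    """Hypothetical heuristic: count semantically tagged glossary terms
    (<dt>) versus terms that are merely bolded at the start of a <p>."""
    root = parse_fragment(fragment)
    semantic = list(root.iter("dt"))
    # A <strong> that opens a paragraph looks like a term on the page
    # but carries no term/definition semantics.
    bolded = [
        p for p in root.iter("p")
        if len(p) > 0 and p[0].tag == "strong" and not (p.text or "").strip()
    ]
    # The class attribute on <dt> records the language of each term.
    languages = sorted({dt.get("class") for dt in semantic if dt.get("class")})
    return {
        "semantic_terms": len(semantic),
        "bold_paragraph_terms": len(bolded),
        "languages": languages,
    }


good = (
    '<dl><dt class="english"><strong>amphibian</strong></dt>'
    '<dd>(am fib&#x00B4; &#x0113; &#x0259;n) An animal that lives part of its life in water and part of its life on land.</dd>'
    '<dt class="spanish"><strong>anfibio</strong></dt>'
    '<dd>Animal que pasa parte de su vida en el agua y parte en tierra.</dd></dl>'
)
valid = (
    '<p><strong>amphibian</strong> (am fib&#x00B4; &#x0113; &#x0259;n) An animal that lives part of its life in water and part of its life on land.</p>'
    '<p><strong>anfibio</strong> Animal que pasa parte de su vida en el agua y parte en tierra.</p>'
)
print(glossary_terms(good))
# {'semantic_terms': 2, 'bold_paragraph_terms': 0, 'languages': ['english', 'spanish']}
print(glossary_terms(valid))
# {'semantic_terms': 0, 'bold_paragraph_terms': 2, 'languages': []}
```

Both snippets parse without error, so well-formedness alone cannot tell them apart; catching the weaker markup takes a content-aware check plus an editor's judgment.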

Annotated Text Example

Following is the rendered text and image:

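The good and valid XML examples show an image with its annotated text captured (good) and the same image with nothing but an image tag (valid). In the good XML, the text printed on the image (its goals, objectives, and purpose statement) is transcribed into a <prodnote> element; in the valid XML, both <prodnote> elements are empty. Take note of the alternative text in the valid XML example as well: "regifting website" is virtually useless to a visually impaired reader.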

Good XML

<sidebar render="required" id="fig_chap03_004"> <hd><strong>Figure 3-4</strong></hd> <br/>A purpose statement explains a website&#8217;s overall goals and the specific objectives that will be used to achieve those goals. <imggroup> <img id="p075-001" src="./images/U00C03/p075-001.jpg" alt="A page from a book that shows a purpose statement example with goals and objectives."/> <caption imgref="p075-001">&#x00A9; 2015 Publisher Name</caption> <prodnote render="required" imgref="p075-001"> <p>primary goal</p> <p>secondary goals</p> <p>objectives</p> <p><strong>Regifting Website</strong></p> <p><span class="underline">Purpose Statement:</span> </p> <p>The goal of the reusable and …</p> <list type="ul" depth="1"> <li>Promote an online …</li> …… </list> </prodnote> <prodnote render="required"/> </imggroup> </sidebar>
 

Valid XML

<sidebar render="required" id="fig_chap03_004"> <hd><strong>Figure 3-4</strong></hd> <br/>A purpose statement explains a website&#8217;s overall goals and the specific objectives that will be used to achieve those goals. <imggroup> <img id="p075-001" src="./images/U00C03/p075-001.jpg" alt="regifting website"/> <caption imgref="p075-001">&#x00A9; 2015 Publisher Name</caption> <prodnote render="required"/> <prodnote render="required"/> </imggroup> </sidebar>
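A similar sketch can flag figures whose annotations were never transcribed. Again this is hypothetical: the function name untranscribed_images is illustrative, the code assumes the <imggroup>/<prodnote> structure shown in the samples, and it treats a group as untranscribed when none of its <prodnote> elements contain text or child elements. A photograph with no printed labels would also be flagged, so the output is a review list for an editor rather than a verdict.

```python
import xml.etree.ElementTree as ET


def untranscribed_images(fragment: str) -> list[str]:
    """Return the ids of <img> elements whose <imggroup> has no
    <prodnote> with text or child elements (hypothetical heuristic)."""
    # Synthetic <root> wrapper; ElementTree checks well-formedness only.
    root = ET.fromstring(f"<root>{fragment}</root>")
    flagged = []
    for group in root.iter("imggroup"):
        notes = group.findall("prodnote")
        # A group with no <prodnote> at all is flagged too.
        transcribed = any(
            len(note) > 0 or (note.text or "").strip() for note in notes
        )
        if not transcribed:
            flagged.extend(img.get("id", "?") for img in group.findall("img"))
    return flagged


good = (
    '<sidebar render="required" id="fig_chap03_004"><imggroup>'
    '<img id="p075-001" src="./images/U00C03/p075-001.jpg" alt="A page from a book that shows a purpose statement example with goals and objectives."/>'
    '<prodnote render="required" imgref="p075-001"><p>primary goal</p><p>secondary goals</p><p>objectives</p></prodnote>'
    '<prodnote render="required"/></imggroup></sidebar>'
)
valid = (
    '<sidebar render="required" id="fig_chap03_004"><imggroup>'
    '<img id="p075-001" src="./images/U00C03/p075-001.jpg" alt="regifting website"/>'
    '<prodnote render="required"/><prodnote render="required"/></imggroup></sidebar>'
)
print(untranscribed_images(good))   # []
print(untranscribed_images(valid))  # ['p075-001']
```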

Alt Text Example

 
 

This example compares an image with descriptive alt text (good) to the same image tagged with an empty alt attribute (valid). The good XML also captures the accompanying quotation in a <q> element. Alt text improves discoverability and supports accessibility.

Good XML

<sidebar render="required" class="quote"> <q>I bet the folks at home would like to know what we&#8217;re going to do this year!</q> <imggroup> <img id="piii-001" src="./images/U00/piii-001.jpg" alt="A teenage boy in jeans and sneakers smiling with hands folded in front of him."/> <prodnote render="required"/> <prodnote render="required"/> </imggroup> </sidebar>
 

Valid XML

<sidebar render="required"> <imggroup> <img id="piii-001" src="./images/U00/piii-001.jpg" alt=""/> <prodnote render="required"/> <prodnote render="required"/> </imggroup> </sidebar>
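Alt text lends itself to the same treatment. The minimal sketch below, with a hypothetical function name and a hypothetical MIN_ALT_WORDS threshold that a publisher would tune to its own style guide, sorts images into missing, empty, and too-short alt text. An empty alt="" is a deliberate convention for purely decorative images in HTML, so an "empty" flag is a prompt for a person to confirm the image really is decorative, not an automatic error.

```python
import xml.etree.ElementTree as ET

# Hypothetical threshold: alt text with fewer words than this is flagged
# as probably too terse to describe the image.
MIN_ALT_WORDS = 4


def audit_alt_text(fragment: str) -> dict[str, str]:
    """Map each <img> id to a problem label; images that pass are omitted."""
    # Synthetic <root> wrapper; ElementTree checks well-formedness only.
    root = ET.fromstring(f"<root>{fragment}</root>")
    problems = {}
    for img in root.iter("img"):
        img_id = img.get("id", "?")
        alt = img.get("alt")
        if alt is None:
            problems[img_id] = "missing"
        elif not alt.strip():
            problems[img_id] = "empty"
        elif len(alt.split()) < MIN_ALT_WORDS:
            problems[img_id] = "too short"
    return problems


good = (
    '<sidebar render="required" class="quote">'
    '<q>I bet the folks at home would like to know what we&#8217;re going to do this year!</q>'
    '<imggroup><img id="piii-001" src="./images/U00/piii-001.jpg" '
    'alt="A teenage boy in jeans and sneakers smiling with hands folded in front of him."/>'
    '</imggroup></sidebar>'
)
valid = (
    '<sidebar render="required"><imggroup>'
    '<img id="piii-001" src="./images/U00/piii-001.jpg" alt=""/>'
    '</imggroup></sidebar>'
)
print(audit_alt_text(good))   # {}
print(audit_alt_text(valid))  # {'piii-001': 'empty'}
print(audit_alt_text('<img id="p075-001" src="p075-001.jpg" alt="regifting website"/>'))
# {'p075-001': 'too short'}
```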

Takeaways

Talk to your vendor about the quality of the XML they produce. The proliferation of offshore vendors has driven prices down, and this has affected quality. While price is of great importance and low-cost XML is attractive, publishers are finding that thoughtfulness and editorial quality have been slipping away. With so much technology integrated into publishers’ workflows, it is easy to forget that human QA is what ensures premium editorial and production services.

  • Good XML is critical for accessibility
  • Good XML improves downstream discoverability
  • Good XML involves automation plus human intervention and that equals quality

If you would like to learn more about some of the ways we help publishers improve XML file creation and XML publishing workflows, simply click the link below.

 

The XML sample file was excellent. I went through it tag by tag, attribute by attribute, entity by entity, and I was very impressed by the level of attention to detail shown. You and your team deserve credit. Over the last 20 years or so I have seen sample files from both sides of the fence (both supplying them and receiving them), and these were the best I have ever seen!

Learn more about accessibility in our white paper. Click here to download.



Mike Groth

Michael Groth is Director of Marketing at Cenveo Publisher Services, where he oversees all aspects of marketing strategy and implementation across digital, social, conference, advertising and PR channels. Mike has spent over 20 years in marketing for scholarly publishing, previously at Emerald, Ingenta, Publishers Communication Group, the New England Journal of Medicine and Wolters Kluwer. He has made the rounds at information industry events, organized conference sessions, presented at SSP, ALA, ER&L and Charleston, and blogged on topics ranging from market trends, library budgets and research impact, to emerging markets and online communities. Twitter handle: @mikegroth72