The Elusive Promise of Content Reuse
Content reuse is one of the main “selling points” of DITA. Teams looking to make the transition to DITA are excited by the possibility of reducing the amount of new content they have to create and leveraging their existing content in many different ways. Yet when those teams actually try to reuse content, they find it’s not simple. Too often, reuse ends up inconsistent and saves neither time nor money.
Why is effective reuse so darn hard to achieve? Documentation teams face three main challenges when trying to implement content reuse:
- Writing content for reuse
- Finding content to reuse
- Communicating about reused content
Let’s take a closer look at each of these challenges. But first, a disclaimer: reuse generally, but not necessarily, implies structured content, and these days structured content increasingly implies DITA. This article uses some DITA terminology for the sake of convenient examples, but DITA is certainly not the only authoring model in which reuse is possible.
Writing Content for Reuse
Very few teams have the luxury of creating a fresh body of content with reuse in mind from the start. Instead, most teams have a large existing content store that was created monolithically, as huge HTML topics or long documents. The first step in moving to a structured authoring model is often to break these large topics or documents into smaller pieces, and at a high level, most teams understand that the goal is to end up with smaller pieces of content that can be used in multiple deliverables. Beyond that step, however, the direction is not so clear. How small is small enough? How small is too small? Where is the line between flexible and out of control? There is no single, textbook answer to these questions.
Many teams abandon a reuse strategy right here and settle for having moved to a structured model. After all, it’s pretty easy to convert content from HTML to DITA: a <p> is still a <p>, this <li> is clearly still a <li>, but that one is obviously a <step>, and so on. Reuse is another matter. Consider what it takes to consolidate a single note: you spot a note in one of your topics and remember seeing it, or something very similar, in other topics; you find all those instances (which are indeed similar but not quite the same); you settle on one common wording; you create a reusable object; and you replace the note in every topic with that reusable object. None of that is rote, automated, or easy. And the note example is a simple one. Far more common is the need to evaluate an entire content store to make sure it uses the same terminology, style, and voice.
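Spotting the near-duplicates is the one step in that list where a script can help; choosing the common wording still takes a person. The following Python sketch assumes the topics are available as DITA XML strings (the file names and note text are invented for illustration) and uses the standard library’s `difflib.SequenceMatcher` to flag pairs of `<note>` elements that are identical or nearly so. The 0.8 threshold is an arbitrary starting point, not a recommendation.

```python
import itertools
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical topics standing in for files in a content store.
TOPICS = {
    "install.dita": "<task id='install'><title>Install</title><taskbody>"
                    "<note>Back up your data before you begin.</note></taskbody></task>",
    "upgrade.dita": "<task id='upgrade'><title>Upgrade</title><taskbody>"
                    "<note>Back up all of your data before you begin.</note></taskbody></task>",
    "license.dita": "<concept id='license'><title>Licensing</title><conbody>"
                    "<note>Licenses are sold per seat.</note></conbody></concept>",
}


def note_texts(topics):
    """Yield (file name, whitespace-normalized text) for every <note> element."""
    for name, xml in topics.items():
        for note in ET.fromstring(xml).iter("note"):
            yield name, " ".join("".join(note.itertext()).split())


def similar_notes(topics, threshold=0.8):
    """Return (file, file, ratio) for note pairs that are identical or nearly so."""
    pairs = []
    for (f1, t1), (f2, t2) in itertools.combinations(note_texts(topics), 2):
        ratio = SequenceMatcher(None, t1.lower(), t2.lower()).ratio()
        if ratio >= threshold:
            pairs.append((f1, f2, round(ratio, 2)))
    return pairs


for first, second, ratio in similar_notes(TOPICS):
    print(f"{first} <-> {second}: {ratio}")
```

Each flagged pair is only a candidate. Once the team agrees on a single wording, the note would typically live in one shared topic and be pulled into the others with a `conref`.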
You also need to evaluate every specific reference to a product, part, or module (to name a few) to confirm that it is truly necessary and could not be replaced by a more generic reference, which would make the topic more reusable without making it less effective. Where a specific reference is necessary, evaluate whether it should be replaced with a variable or, in DITA, a key reference (keyref).
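To illustrate why a key reference makes a topic more reusable, here is a deliberately simplified Python sketch of key resolution: the topic names the product only through `keyref`, and two hypothetical maps supply different values for the same key. It mimics the DITA rule that the first definition of a key wins, but it ignores key scopes, linking keys, and everything else a real processor such as the DITA Open Toolkit handles.

```python
import xml.etree.ElementTree as ET

# A topic that names the product only through a key reference.
TOPIC = '<p>Restart <ph keyref="product"/> after you change the settings.</p>'

# Two hypothetical maps that define the same key differently.
WIDGET_MAP = ("<map><keydef keys='product'><topicmeta><keywords>"
              "<keyword>Acme Widget</keyword></keywords></topicmeta></keydef></map>")
GADGET_MAP = ("<map><keydef keys='product'><topicmeta><keywords>"
              "<keyword>Acme Gadget</keyword></keywords></topicmeta></keydef></map>")


def load_keys(map_xml):
    """Build {key: text} from <keydef> elements; the first definition wins."""
    keys = {}
    for keydef in ET.fromstring(map_xml).iter("keydef"):
        keyword = keydef.find("topicmeta/keywords/keyword")
        if keyword is None:
            continue
        for name in keydef.get("keys", "").split():
            keys.setdefault(name, keyword.text)
    return keys


def render(topic_xml, keys):
    """Fill empty elements that carry @keyref with the key's text, then flatten."""
    root = ET.fromstring(topic_xml)
    for elem in root.iter():
        name = elem.get("keyref")
        if name in keys and not elem.text:
            elem.text = keys[name]
    return "".join(root.itertext())


print(render(TOPIC, load_keys(WIDGET_MAP)))
print(render(TOPIC, load_keys(GADGET_MAP)))
```

The topic itself never changes; only the map that supplies the key differs between deliverables.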
Contextual phrases such as “in the previous section” need to be located and removed. Perhaps most difficult of all, content needs to be organized consistently. (Consistency is a good idea in any case, but it’s absolutely essential for effective reuse.) Consistency often requires significant rewriting; it’s not enough to simply convert the content as-is from one format to another.
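Locating contextual phrases is largely a search problem, so a simple pattern scan makes a reasonable first pass. This Python sketch uses a hypothetical starter list of phrases; it only catches wording that someone thought to add to the list, and it checks one line at a time, so a phrase broken across lines is missed.

```python
import re

# Hypothetical starter list; extend it with the phrases your own writers tend to use.
CONTEXT_PHRASES = [
    r"in the (?:previous|next|following) (?:section|chapter|topic)",
    r"as (?:described|mentioned|shown) (?:above|below|earlier)",
    r"see (?:above|below)",
]
# Word boundaries keep "see above" from matching inside words such as "oversee".
PATTERN = re.compile(r"\b(?:" + "|".join(CONTEXT_PHRASES) + r")\b", re.IGNORECASE)


def find_context_phrases(text):
    """Return (line number, phrase) for each context-dependent phrase found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


SAMPLE = """Set the port number.
As described above, restart the service.
Details appear in the next section."""

for lineno, phrase in find_context_phrases(SAMPLE):
    print(lineno, phrase)
```

Each hit still needs a writer to decide how to reword the sentence so it stands on its own.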
While conversion can often be largely automated, rewriting cannot. If you cannot commit substantial time to this rewriting and reorganizing effort, you will not be able to reuse your content successfully. Going forward, when creating new content in a reuse world, writers must carefully adhere to the standards the team has developed. Standards adherence means relying on templates and internal guidelines, some of which can be built into the authoring tool and some of which cannot.
Many teams already have such guidelines, but in their haste to complete work and get deliverables out the door, writers let the standards that aren’t built into the tool slide. It’s important to get a commitment from writers up front that, no matter how rushed they are, they will take the time to verify the correct format or structure when they’re not sure.
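Some of the standards that an authoring tool doesn’t enforce can still be checked by a script before content is checked in. In this Python sketch, the two rules (every topic has a `<shortdesc>`, and notes are pulled in by `conref` rather than typed inline) are hypothetical examples of team guidelines, not DITA requirements.

```python
import xml.etree.ElementTree as ET


# Two hypothetical house rules that an authoring tool may not enforce by itself:
# every topic has a <shortdesc>, and notes are reused via conref, not typed inline.
def check_topic(name, xml):
    """Return a list of guideline violations found in one topic."""
    root = ET.fromstring(xml)
    problems = []
    if root.find("shortdesc") is None:  # shortdesc is a direct child of the topic
        problems.append(f"{name}: missing <shortdesc>")
    for note in root.iter("note"):
        if "conref" not in note.attrib:
            problems.append(f"{name}: inline <note> instead of a conref")
    return problems


GOOD = ("<task id='t1'><title>Install</title><shortdesc>Install the server.</shortdesc>"
        "<taskbody><note conref='shared.dita#shared/backup-note'/></taskbody></task>")
BAD = ("<task id='t2'><title>Upgrade</title>"
       "<taskbody><note>Back up first.</note></taskbody></task>")

for name, xml in (("install.dita", GOOD), ("upgrade.dita", BAD)):
    for problem in check_topic(name, xml):
        print(problem)
```

A check like this catches slips; it does not replace the writers’ commitment to the guidelines.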
Finding Content to Reuse
Now you’ve got a very large body of content, thousands of topics, that has been optimized for reuse. You, a writer, are preparing a new deliverable, and you want to find out whether any existing topics are a good fit. Where do you start? There is no magic button for finding potentially reusable content. Even if that content has robust and accurate metadata, you need to perform the right kind of search to find content with exactly the combination of metadata that applies to your current project, or at least a combination that’s pretty close. Even the most accurate and targeted search is likely to turn up far more topics than you want to, or can, use in your new deliverable.
You need to comb through the search results to discover what is and is not suitable. Yes, that’s right: you have to read the content of the search results, and that takes people time, not machine time. Finding content to reuse can take as long as, or longer than, writing new content from scratch, so if you’re looking at reuse as a time-saver, be aware that you might not see this benefit on the authoring side. So where will you see the benefit? On the review and approval side.
Every new topic that you create should be sent for review: technical review, peer review, and so on. Any time you save by writing new content instead of searching for reusable content is likely to be spent reviewing that new content. For the first authoring cycle, it might be an even exchange, but consider the longer view. Over five authoring cycles without reuse, you create content from scratch every time and go through five full cycles of writing and review. With reuse, you go through one full cycle of writing and review, followed by four cycles in which you reuse existing content and do much less writing and review, saving significant time in the long run. As your store of reusable content grows, you’ll see greater potential returns from reuse and less need to create new content. On the other hand, more reusable content brings the added challenge of finding it.
With a good case for taking the time to find reusable content, how can the search be made easier? One word: metadata. Metadata can take different forms; the term doesn’t necessarily refer only to categorization attributes and elements. Keywords and index terms within content are a form of metadata, and so is something as basic as a file structure. Many Help Authoring Tools (HATs) provide ways to categorize topics.
The point is that some method of specifically tagging content as pertaining to certain products, certain audiences, or certain modules (whatever you need) is essential to successful, effective reuse. The more specific your metadata is, the more targeted your search results can be, leaving you with fewer, more likely results to evaluate for suitability in your current context. If the net is cast too wide, writers will be less willing to evaluate the results and more inclined to give up and create content from scratch unnecessarily. Good metadata requires a lot of planning and maintenance.
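The trade-off between a wide net and a targeted search can be sketched in a few lines of Python. The metadata values and file names below are hypothetical; in DITA they might come from attributes such as `@product` and `@audience` or from `<prolog>` metadata. Topics are ranked by how many requested values they match, and the `min_score` parameter controls whether close matches are returned or only the exact combination.

```python
# Hypothetical metadata per topic; in DITA it might come from @product and
# @audience attributes or from <prolog> metadata. faq.dita was never tagged.
TOPICS = {
    "install-server.dita": {"product": "server", "audience": "admin", "module": "setup"},
    "install-client.dita": {"product": "client", "audience": "user", "module": "setup"},
    "backup.dita": {"product": "server", "audience": "admin", "module": "maintenance"},
    "faq.dita": {},
}


def rank_candidates(topics, wanted, min_score=1):
    """Rank topics by how many of the wanted metadata values they match."""
    scored = []
    for name, meta in topics.items():
        score = sum(1 for field, value in wanted.items() if meta.get(field) == value)
        if score >= min_score:
            scored.append((name, score))
    # Best matches first; ties are broken alphabetically by file name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


query = {"product": "server", "audience": "admin", "module": "setup"}
print(rank_candidates(TOPICS, query))               # close matches too
print(rank_candidates(TOPICS, query, min_score=3))  # exact combination only
```

Note that `faq.dita`, which was never tagged, cannot be found by any query: untagged content is effectively invisible to a metadata search.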
To some extent, you can add basic metadata without knowing exactly how your content will be reused. Beyond the simplest cases, however, you have to know what content you plan to reuse and how you plan to reuse it before you can determine how you need to apply metadata to that content. Planning can be a very large and time-consuming task. It’s a bit of a paradox: You don’t have enough time to meet your customers’ content needs. You want to save time with reuse. You don’t have enough time to develop a good reuse strategy, so reuse is spotty and inconsistent. Writers can’t find suitable reusable content, so they don’t reuse. Reuse fails. You’ve wasted the little bit of time you put into reuse because you didn’t put enough time into it.
To successfully implement reuse, you must make a time commitment up front in order to save time downstream.
Communicating about Reused Content
Communication is one area where no tool or technology can substitute for old-fashioned conversation. At the end of the day, no matter what Content Management System (CMS) or documentation methodology you are using, you still have to talk to each other. Reused content implies shared content, and shared content implies that more than one writer has a stake in it. Before you change a piece of content, you need to know all the places where that content is currently used and what effect your change will have across your content store.
Any good CMS (and many HATs) either actively displays a list of dependencies when you attempt to change content or passively lets you view such a list. Pay attention! Look at the list! But don’t stop there. No CMS, or any other tool, can make an intelligent, context-based decision about whether reuse is appropriate in a given situation. That’s where you, the writer or information architect, come in.
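A dependency list is essentially a reverse index of references. As a rough illustration of what a CMS computes for you, this Python sketch scans hypothetical DITA topics for `conref` attributes and records which topics use each target. It handles only `conref`; keys, `conkeyref`, and map references are left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical topics; two of them pull the same shared note in by conref.
TOPICS = {
    "install.dita": "<task id='install'><title>Install</title><taskbody>"
                    "<note conref='shared.dita#shared/backup-note'/></taskbody></task>",
    "upgrade.dita": "<task id='upgrade'><title>Upgrade</title><taskbody>"
                    "<note conref='shared.dita#shared/backup-note'/></taskbody></task>",
    "license.dita": "<concept id='license'><title>Licensing</title>"
                    "<conbody><p>Licenses are sold per seat.</p></conbody></concept>",
}


def where_used(topics):
    """Map each conref target to the list of topics that reference it."""
    usage = defaultdict(list)
    for name, xml in topics.items():
        for elem in ET.fromstring(xml).iter():
            target = elem.get("conref")
            if target:
                usage[target].append(name)
    return dict(usage)


for target, users in where_used(TOPICS).items():
    print(f"{target} is used in: {', '.join(users)}")
```

The result tells you whom to talk to before you edit the shared note; it does not tell you whether the edit is appropriate.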
Find out who last edited the content and give that person a call to ask whether the planned change will adversely affect her deliverable. It’s possible that you and the other writer(s) need to work together to rewrite the topic so that it suits all of your needs. It’s also possible that you can’t make the topic work for all of you and someone needs to create a new topic or topics. You won’t know this unless you talk to each other. Never force reuse for its own sake, and don’t go down the slippery slope of spaghetti conditions just so you can torture a single topic into a dozen variations. Making wise decisions not to reuse is just as important as making wise decisions to reuse, and those decisions must be collaborative to be consistent.
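A topic that has been tortured into many variations usually shows it in the number of condition values it carries. This Python sketch counts the distinct values of the DITA conditional-processing attributes `@product`, `@platform`, `@audience`, and `@otherprops` in each topic and flags topics above a limit. The limit of 4 and the sample topics are arbitrary assumptions, and a flag is a prompt for a conversation, not a verdict.

```python
import xml.etree.ElementTree as ET

# DITA conditional-processing attributes checked here (a subset; @props and
# attributes specialized from it are ignored in this sketch).
CONDITION_ATTRS = ("product", "platform", "audience", "otherprops")


def condition_values(xml):
    """Collect the distinct (attribute, value) pairs used to filter one topic."""
    values = set()
    for elem in ET.fromstring(xml).iter():
        for attr in CONDITION_ATTRS:
            for value in elem.get(attr, "").split():
                values.add((attr, value))
    return values


def flag_for_discussion(topics, limit=4):
    """Return topics using more distinct condition values than the (arbitrary) limit."""
    return sorted(name for name, xml in topics.items()
                  if len(condition_values(xml)) > limit)


TOPICS = {
    "simple.dita": "<topic id='a'><title>A</title><body>"
                   "<p product='server'>Server only.</p></body></topic>",
    "tangled.dita": "<topic id='b'><title>B</title><body>"
                    "<p product='server client' platform='linux windows'>One.</p>"
                    "<p audience='admin user' otherprops='beta'>Two.</p></body></topic>",
}
print(flag_for_discussion(TOPICS))
```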
Unfortunately, reuse has a bad name among some practitioners, and it’s true that poorly planned or forced reuse compromises the quality of your documentation. As a group, a documentation team must communicate more closely than ever before when implementing reuse. Decisions about what to reuse, how to reuse it, and when to reuse versus when to create new content all have to be discussed, decided as a team, and implemented consistently. If your team has not communicated well in the past, or has not needed to communicate very often, make sure that everyone is committed to changing that. On a related note, make sure that everyone on the team understands that the concept of content “ownership” is going to change radically. Certain writers may remain primarily responsible for certain content, but anyone can potentially use and change that content.
Assuring writers that solid lines of communication are in place, so that everyone is aware of changes to “his” or “her” content, is the best way to deal with the misgivings that are likely to arise in a new shared-content environment.
If there is one theme that runs through all three of these challenges, it’s time. It takes time to evaluate and rewrite content for reuse.
It takes time to determine what your reuse cases are and create a metadata scheme that allows you to reuse as you need to. Even with good metadata and a good search facility, it takes time to evaluate “matches” to determine what content is suitable for reuse in your current context.
Finally, it takes time to stop what you’re doing and pick up the phone, or walk over to a colleague’s desk, to talk about a topic that you both plan to use but that might need some work to meet both your needs.
Many teams are discouraged early by the enormity of the planning effort and the ongoing time commitment required for effective reuse. They put reuse on the back burner, something they’ll come back to when their content store is more mature and they have time to plan and implement a reuse strategy. Unfortunately, that time often never comes, because these teams find themselves buried under an avalanche of new documentation requests and scramble just to keep up. Worse, some teams begin writing and reusing without a plan in place, resulting in haphazard and ineffective reuse. It takes time up front to save time downstream.
Content reuse is not for the faint of heart, but with a commitment to putting sufficient time into the planning and maintenance of reused content, documentation teams can see measurable time savings and enormous improvements in the accuracy and consistency of their deliverables.
About the Author
Leigh White
Leigh White is a DITA Specialist who works with product integrations, product design, and marketing communications. Leigh has spoken on DITA, content management systems, content conversion, and DITA-OT plugin development at a number of industry conferences, including DITA North America, DITA Europe, Intelligent Content, LavaCon, WritersUA, DITA Netherlands, and Congility. She is the author of DITA for Print: A DITA Open Toolkit Workbook and a contributor to The Language of Content Strategy and The Language of Technical Communication. Leigh is also a member of the OASIS DITA Technical Committee.