Metadata Management in Microsoft Office
How Firms Can Protect Themselves against Unintentional Disclosure and Misuse of Metadata
The steady growth of electronic document exchange has heightened awareness that Microsoft Office files contain metadata beyond their printable content. Unintentional disclosure of that metadata can be embarrassing or even raise malpractice concerns. Although metadata has been used to identify, classify, and manage documents in the legal environment for many years, firm-wide understanding of metadata management is still lacking. This article is not intended as a comprehensive "how-to" guide on metadata, but you will come away with a better sense of what metadata is, how it can be misused or overlooked, and what your firm can do to control and manage it proactively.

Metadata Defined

As we all know, Microsoft Word includes many automated features that aid document production and collaboration. Unfortunately, these features can embed electronic information that reveals the identity of those who edited the document (revision authors), the time, date, and frequency of edits (track changes and revisions), reviewers' remarks (inserted comments), the document template (which can act as a unique firm identifier), and other data used to control the document's text and format. Word even offers an option called "Fast Save" that, if selected, appends new text to the file instead of rewriting it, so deleted text can remain part of the document's electronic file history. These are just a few of the hidden elements and document information that make up a Microsoft Office document's metadata.

Metadata Scenarios

Many users dupe-and-revise (open an existing document and use Save As) to save time. When they do, the original author information, document properties, document variables, forgotten hidden text, and last print date all stay with the new document. Much of this metadata can be seen by viewing the document properties or by opening the document in a text editor.
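To make the scenario above concrete, the sketch below lists some of the metadata a dupe-and-revise copy can carry. It assumes the Office Open XML (.docx) format, which is a ZIP archive of XML parts; legacy binary .doc files are structured differently and are not handled here. The `inventory` function and its report keys are hypothetical names chosen for illustration, and the check covers only a handful of elements (core properties, the attached template, tracked insertions and deletions, comments, hidden text, and document variables), not everything a commercial scrubbing tool examines.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Office Open XML (.docx) parts inspected below.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}
W = "{%s}" % NS["w"]  # prefix for WordprocessingML attribute and tag names
OFF = ("0", "false", "off")  # values that switch a boolean property off


def _part(zf, name):
    """Parse one XML part of the package, or return None if it is missing."""
    try:
        return ET.fromstring(zf.read(name))
    except KeyError:
        return None


def inventory(path):
    """Hypothetical helper: summarize some common hidden metadata in a .docx."""
    report = {}
    with zipfile.ZipFile(path) as zf:
        # Author, last editor, last print date, and revision count
        # (the values shown in the document's properties).
        core = _part(zf, "docProps/core.xml")
        if core is not None:
            for tag in ("dc:creator", "cp:lastModifiedBy", "cp:lastPrinted", "cp:revision"):
                el = core.find(tag, NS)
                if el is not None and el.text:
                    report[tag] = el.text

        # Name of the attached template, which can identify the firm.
        app = _part(zf, "docProps/app.xml")
        tmpl = None if app is None else app.find("ep:Template", NS)
        if tmpl is not None and tmpl.text:
            report["ep:Template"] = tmpl.text

        body = _part(zf, "word/document.xml")
        if body is not None:
            # Tracked insertions and deletions stay in the file until accepted or rejected.
            revs = body.findall(".//w:ins", NS) + body.findall(".//w:del", NS)
            report["revision_authors"] = sorted({r.get(W + "author", "") for r in revs})
            # Hidden-text formatting that is switched on (a rough count).
            report["hidden_text_marks"] = sum(
                1 for v in body.iter(W + "vanish") if v.get(W + "val", "true") not in OFF
            )

        # Comments live in a separate part whether or not markup is displayed.
        comments = _part(zf, "word/comments.xml")
        report["comments"] = 0 if comments is None else len(comments.findall("w:comment", NS))

        # Document variables never appear in the document window.
        settings = _part(zf, "word/settings.xml")
        report["doc_variables"] = (
            [] if settings is None else [v.get(W + "name") for v in settings.iter(W + "docVar")]
        )
    return report
```

Calling `inventory("draft.docx")` on a hypothetical file returns a dictionary of findings (author and last-editor names, tracked-change authors, comment and hidden-text counts, document variable names), which gives a quick view of what a copied document has inherited before it leaves the firm.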
If a document created this way is being prepared for a client who is paying for its creation, it is even more important that all inherited metadata be removed before the document is shared with that client.

Tracked changes left in a document are a common occurrence, and they are often what first alerts people to the dangers of metadata. Track changes is a powerful collaboration feature in Microsoft Word, but the revisions it records remain in the document, even when they are not visible on screen, until they are accepted or rejected. Turning the track changes feature off does not eliminate revisions that have already been recorded. If the document is sent to another user, whether a cooperator or an adversary, the recipient simply has to switch the view to show markup to see every revision in the document.

Comments, like tracked changes, remain in a document until they are deleted. When the "Reviewing" display is set to "Final" rather than "Final Showing Markup", comments are invisible. If the document is shared outside the firm, however, the recipient can still view the comments, which may contain embarrassing information that was never intended to leave the originating firm.

Metadata referred to here as "identifier metadata" can reveal the originator because it is unique to a particular user and firm. Identifier metadata includes uniquely named styles, bookmarks, hidden document variables, and custom document properties. Although not necessarily considered high risk, identifier metadata should be managed if the originator needs to remain anonymous or if the metadata trail would reveal the firm's document creation strategy.

Metadata "mismanagement" stories abound. Case in point: in 2004, a Microsoft Word document produced as part of SCO's lawsuits against DaimlerChrysler and AutoZone revealed that SCO's attorneys had also prepared a complaint against Bank of America.
The document identified Bank of America as the defendant instead of the automaker, and this revision and others in the document could clearly be seen through tracked changes. In another metadata disclosure blunder, the British government published a dossier on Iraq's security and intelligence services without removing the related metadata. On closer inspection, much of the text turned out to have been copied directly from a U.S. researcher whose work had been published on the internet. To add insult to injury, the file also revealed a list of the dossier's last ten authors and their edits and commentary.

Key Strategies for Metadata Control

As the legal community becomes more aware of metadata and of the damage that unintentional disclosure of document information can cause, the need to establish metadata control strategies and parameters is increasingly evident. Here are three recommended approaches worth considering:

1. Educate your firm about metadata concerns. Understand the features that embed metadata (e.g., track changes), how to control them, and the ramifications of using them. Much of the metadata inherited through the dupe-and-revise practice can be eliminated simply by using firm templates to create new, clean documents that carry minimal metadata. Powerful template and automation packages now on the market are, in many instances, much faster and more efficient than the standard dupe-and-revise. These packages also provide tools that help you copy text from one document into another without carrying hidden metadata along with the copied text.

2. Control and manage metadata with third-party metadata scrubbing and management software. Microsoft provides a metadata removal tool for Microsoft Word, but its approach is rudimentary: it does not catch outgoing email attachments, and it scrubs only a limited number of metadata elements.
A more powerful third-party metadata application not only scrubs metadata but also lets a firm manage it at a very detailed level. For instance, you may want to keep the tracked changes in a document but eliminate the author and editing-time information. Or a firm may want to ensure that a user's name is never left in a document and that the firm's name is used instead. Always use a metadata removal application that publishes a clean copy of the document before it is shared electronically outside the firm.

3. Establish a firm-wide metadata scrubbing and management standard. Establishing metadata policies and procedures removes the need for individual users at your firm to decide what metadata gets scrubbed, and it makes the scrubbing process more efficient. This step is very important and should involve key users, especially attorneys. The firm's metadata standard can be organized into levels that specify which metadata is scrubbed and which is changed (managed). For instance, a cooperator-level scrub might remove most document properties but leave author information for collaboration purposes. An adversary-level scrub might remove all metadata, convert all field codes to text, and then convert the scrubbed document to PDF for added protection.

In conclusion, metadata in Microsoft Office documents is real and can pose serious risks to a firm if it is left unmanaged or ignored. Law firm users should be educated about metadata elements and risks, and firms should articulate a metadata strategy by establishing metadata standards or best practices.

(reprinted by permission from Esquire Innovations)
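The tiered standard described under the third strategy can be expressed as a small policy table that a scrubbing step consults. The Python sketch below is hypothetical and deliberately narrow: it assumes .docx files and Python 3.8 or later, and the level names, the `SCRUB_LEVELS` table, and the `scrub_core` and `scrub_docx` helpers are illustrations rather than any product's API. It touches only the core document properties; tracked changes, comments, custom properties, field codes, and PDF conversion, which an adversary-level scrub would also handle, are out of scope.

```python
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"cp": CP, "dc": DC}

# Keep the usual prefixes when the scrubbed part is written back out.
for prefix, uri in (("cp", CP), ("dc", DC), ("dcterms", "http://purl.org/dc/terms/"),
                    ("xsi", "http://www.w3.org/2001/XMLSchema-instance")):
    ET.register_namespace(prefix, uri)

# Hypothetical firm standard: the core properties each level is allowed to KEEP.
SCRUB_LEVELS = {
    "cooperator": {"dc:creator", "cp:lastModifiedBy"},  # leave author info for collaboration
    "adversary": set(),                                 # remove every core property
}
AUTHOR_TAGS = {"dc:creator", "cp:lastModifiedBy"}


def _clark(prefixed):
    """Turn 'dc:creator' into ElementTree's '{namespace}creator' form."""
    prefix, local = prefixed.split(":")
    return "{%s}%s" % (NS[prefix], local)


def scrub_core(xml_bytes, level, firm_name=None):
    """Drop core properties not kept by `level`; optionally swap author names for the firm's."""
    keep = {_clark(t) for t in SCRUB_LEVELS[level]}
    authors = {_clark(t) for t in AUTHOR_TAGS}
    root = ET.fromstring(xml_bytes)
    for child in list(root):  # copy the child list so removal is safe while iterating
        if child.tag not in keep:
            root.remove(child)
        elif firm_name and child.tag in authors:
            child.text = firm_name  # "managed" rather than scrubbed
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def scrub_docx(src, dst, level, firm_name=None):
    """Copy a .docx package, rewriting only docProps/core.xml according to `level`."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "docProps/core.xml":
                data = scrub_core(data, level, firm_name)
            zout.writestr(item, data)  # reuses the original entry's name and compression
```

For example, `scrub_docx("draft.docx", "draft_clean.docx", "cooperator", firm_name="Example LLP")` would keep the author fields but replace the names in them with the firm's name, matching the case above where the firm's name is used instead of the user's. Keeping the policy in a single table is what lets the firm, rather than each user, decide what gets scrubbed.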