
Microsoft Office Document Inspector


 

With the advent of Microsoft Office 2007 (MSO07) and its addition of the Microsoft Office Document Inspector, our clients (and prospects) are asking if there's a need for iScrub, our enterprise metadata management application. Has Microsoft, in one single blow, nullified many companies' investment in iScrub? If metadata scrubbing is now built-in, why use another application?

 

In this white paper, I compare two approaches to metadata management and show that the Microsoft Office Document Inspector is significantly lacking as an enterprise-metadata management tool. The first approach is the out-of-the-box Microsoft Office Document Inspector (DI) and its object model. The second approach is Esquire Innovations' metadata management application, iScrub.

 

Microsoft Office Document Inspector is Microsoft's response to the market's outcry about the hidden data that can so easily be stored in Microsoft Office documents. When these files are shared outside the company or firm, there is a risk of disclosing discoverable, unintentional, confidential or hidden information that might be adverse to client representation or, at least, extremely embarrassing. Prior to Microsoft Office Document Inspector, Microsoft provided the “Remove Hidden Data Tool,” which was barely usable and kludgy at best, so the Microsoft Office Document Inspector was a needed addition.

 

Microsoft Office Document Inspector

 

Microsoft's idea behind the Microsoft Office Document Inspector is to provide a central location where users can inspect MSO07 documents for personal, hidden, or sensitive information. To find or remove this information, a user can use the built-in DI (see Figure 1). An organization can extend the DI with additional development using the DI Object Model.

 

Figure 1: Document Inspector

 

The Microsoft Office Document Inspector is Composed of Three Modules

 

The Microsoft Office Document Inspector is composed of three modules that users can access to inspect and remove specific metadata from a document: MSO07 Word DI, MSO07 Excel DI, and MSO07 PowerPoint DI.

 

Metadata Elements for MSO07 Word DI

  • Comments
  • Revision marks from tracked changes
  • Document version information
  • Ink annotations
  • Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box
  • E-mail headers
  • Routing slips
  • Send-for-review information
  • Document server properties
  • Document Management Policy information
  • Databinding link information for databound fields (last value will be converted to text). Note: does not handle some linked fields, such as IncludeText
  • User name
  • Template name
  • Text that is formatted as hidden (a font effect that is available in the Font dialog box)

 

Metadata Elements for MSO07 Excel DI

  • Comments
  • Ink annotations
  • Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box
  • E-mail headers
  • Routing slips
  • Send-for-review information
  • Document server properties
  • Document Management Policy information
  • User name
  • Printer path information
  • Scenario comments
  • File path for publishing Web pages
  • Comments for defined names and table names
  • Inactive external data connections
  • Information in worksheet headers
  • Information in worksheet footers
  • Hidden rows
  • Hidden columns that contain data
  • Objects that are not visible because they are formatted as invisible

 Metadata Elements for MSO07 PowerPoint DI

  • Comments
  • Ink annotations
  • Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box
  • E-mail headers
  • Routing slips
  • Send-for-review information
  • Document server properties
  • Document Management Policy information
  • File path for publishing Web pages
  • Objects that are not visible because they are formatted as invisible
  • Text that was added to the Notes section of a presentation
  • Custom XML data that might be stored within a presentation

Removing Metadata from the MSO07 Document using DI

 

Once the user selects Inspect (see Figure 1), the DI dialog box displays the type of metadata found in the document. The Microsoft Office Document Inspector provides buttons to remove specific metadata elements contained in that document (see Figure 2).
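
To make the idea of an inspection pass concrete, here is a minimal sketch in Python. It is not the Document Inspector's implementation. It only assumes that a file saved in the Office 2007 Open XML format is a ZIP package, and it checks whether a few well-known Word package parts are present. The METADATA_PARTS mapping and the inspect_package function are names made up for this illustration.

```python
import io
import zipfile

# Package parts that commonly carry metadata in a Word Open XML file.
# The mapping is illustrative, not the Document Inspector's own list.
METADATA_PARTS = {
    "docProps/core.xml": "Built-in document properties",
    "docProps/app.xml": "Application statistics",
    "docProps/custom.xml": "Custom document properties",
    "word/comments.xml": "Comments",
}


def inspect_package(source):
    """Return the metadata categories whose parts exist in the package."""
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
    found = [label for part, label in METADATA_PARTS.items() if part in names]
    # Custom XML data is stored in parts under the customXml/ folder.
    if any(name.startswith("customXml/") for name in names):
        found.append("Custom XML data")
    return found


if __name__ == "__main__":
    # Build a tiny stand-in package in memory so the sketch runs anywhere.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/comments.xml", "<comments/>")
        package.writestr("docProps/core.xml", "<coreProperties/>")
    for category in inspect_package(buffer):
        print(category)
```

Like the DI dialog box, this sketch reports only which kinds of metadata are present, not what the metadata says or where it sits in the document.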

 

Figure 2: Document Inspector Dialog Box - Inspect

 

Once the user has removed the selected metadata, they can recheck the document for remaining metadata by selecting the Reinspect button (see Figure 2).

 

Extending the Microsoft Office Document Inspector

 

Microsoft Office Document Inspector can be extended using VBA or managed code (such as Visual Basic .NET). Microsoft has added a new DocumentInspectors collection to the object models in MSO07 Word (Document object), MSO07 Excel (Workbook object), and MSO07 PowerPoint (Presentation object). This means that an organization with the programming resources can use either VBA or .NET to develop its own custom DI modules.

 

iScrub

 

iScrub is the premier enterprise solution for metadata removal and metadata management in document-intensive organizations. iScrub uses sophisticated technologies to remove the visible document properties and to scrub the difficult-to-reach file elements, such as the list of past authors (all document authors) and Deleted Text.

 

iScrub provides a centralized administration feature that allows firms to establish and control the metadata removal settings. This is called an enterprise-metadata management approach.

 

iScrub publishes the clean version of a document, separate from the original file, inside or outside of a document management system.
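
The principle of publishing a clean copy while leaving the original untouched can be sketched in a few lines of Python. This is not how iScrub works internally. It is a minimal illustration that assumes an Office 2007 Open XML file, and it blanks only the author names stored in docProps/core.xml. The function names publish_clean_copy and blank_authors are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
ET.register_namespace("dc", DC)
ET.register_namespace("cp", CP)

CORE_PART = "docProps/core.xml"
# Author-identifying elements in the core properties part.
AUTHOR_TAGS = (f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy")


def blank_authors(core_xml):
    """Return the core properties XML with author names emptied."""
    root = ET.fromstring(core_xml)
    for tag in AUTHOR_TAGS:
        for element in root.iter(tag):
            element.text = ""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def publish_clean_copy(original, clean):
    """Write a scrubbed copy to `clean`; `original` is only ever read."""
    with zipfile.ZipFile(original) as src, \
            zipfile.ZipFile(clean, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                data = blank_authors(data)
            dst.writestr(item, data)


if __name__ == "__main__":
    core = (
        f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
        "<dc:creator>J. Doe</dc:creator>"
        "<cp:lastModifiedBy>J. Doe</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    original = io.BytesIO()
    with zipfile.ZipFile(original, "w") as package:
        package.writestr(CORE_PART, core)
    clean = io.BytesIO()
    publish_clean_copy(original, clean)
    with zipfile.ZipFile(clean) as package:
        print(package.read(CORE_PART).decode("utf-8"))
```

Because the scrubbed result is written to a second file, a mistaken removal costs nothing: the original still holds every comment, revision and property.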

 

iScrub works with Outlook, Lotus Notes and GroupWise to prompt users to scrub e-mail attachments before sending them, which helps prevent sensitive metadata from leaving the organization.

 

 Microsoft Office Document Inspector Limitations

 

Because it lacks extensive out-of-the-box metadata management capabilities, Microsoft Office Document Inspector is not suited to an enterprise-metadata management approach. The onus is on individual users to "inspect" their documents and then decide what to remove.

 

Under the Federal Rules of Civil Procedure relating to electronically stored information, relying on Microsoft Office Document Inspector places the company or firm at risk of "...inadvertent production." The firm, not its individual users, should decide how to manage a document's electronic information (metadata), and it should do so enterprise-wide.

 

Microsoft Office Document Inspector Removes Metadata from the Original

Microsoft Office Document Inspector does not publish a separate result document, which makes accidental removal of metadata from the original very easy. If the user unintentionally removes metadata using the Microsoft Office Document Inspector, some metadata items cannot be restored with Undo (see Figure 3).

 

Figure 3: Document Inspector Dialog Box - Undo

 

In firms where document collaboration and client work product are the currency, accidental metadata destruction can be quite costly. For instance, an attorney asks a secretary to send his client an agreement he’s been working on all night. This particular document contains his and a colleague’s comments along with their tracked changes. He tells the secretary to send the client a copy with the metadata scrubbed. The secretary uses Microsoft Office Document Inspector to inspect the document and notices that this document has “Revision marks and Comments” with a red EXCLAMATION POINT! (see Figure 4). This must not be good, so the secretary selects “Remove All” and then realizes in a panic that this was the original. The secretary tries to “undo” the removal and can’t.

 

Microsoft Office Document Inspector does not preserve the original and makes it too easy to lose important metadata from the original. In addition, Microsoft Office Document Inspector uses the term “Revision” for what Word users know as Track Changes, which is confusing.

 

Figure 4: Document Inspector Dialog Box - Comments

 

Below is a list of metadata the inspector removes that cannot be undone for MSO07 Word:

  • Comments
  • Revisions (Track Changes)
  • Versions
  • Annotations
  • Custom Properties
  • Template Name
  • Statistics
  • Data binding link information for data bound fields (last value will be converted to text)

 

CAUTION: Microsoft Office Document Inspector does not always remove personal information in MSO07 Word.  In Office 2003, when personal information was removed the author info was removed from track changes.  In Microsoft Office Document Inspector the author information is NOT removed.

 

Header Footer Removal is Destructive

Here’s a feature of Microsoft Office Document Inspector I just don’t get. When the “Headers, Footers and Watermarks” (see Figure 5) are removed, the Microsoft Office Document Inspector removes everything in the footer, including the page number.

Figure 5: Document Inspector Dialog Box - Headers and Footers

 

For long documents such as agreements, contracts and corporate documents where the footers are complex, this can cause some major problems…and heartache! The removal can be undone, but if you forget and save the document first, the content is gone.

 

Not All Databinding Link Information Is Removed in MSO07 Word

All versions of Microsoft Word support fields that contain linked data in the form of text, pictures and hyperlinks, and these fields can reference files on a server. Microsoft Office Document Inspector neither removes these fields nor unlinks them (turns the field into text).

 

Here are examples of link fields that are not removed from a document using the Microsoft Office Document Inspector (notice the server name and path information):

 

  • { HYPERLINK "\\\\PRODEV\\People\\JDoe\\DOCS-" \l "609447-v18-Bylaws.DOC" }
  • { INCLUDEPICTURE  "\\\\PRODEV\\People\\JDoe\\iRedlineLogo.gif"  \* MERGEFORMAT }
  • { LINK  Equation.3 "\\\\PRODEV\\People\\JDoe\\iRedlineLogo.gif"  \p }
  • { INCLUDETEXT  "\\\\PRODEV\\People\\JDoe\\DOCS-#609447-v18-Bylaws.DOC"  \* PRODEV }
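
A check for this kind of leak is easy to sketch. The Python below is only an illustration, not part of DI or iScrub. It assumes the field code text has already been extracted from the document, and that it is written the way Word displays it, with each backslash of a network path doubled (so four backslashes precede the server name). The function names are hypothetical.

```python
import re

# A UNC path starts with two backslashes; inside a Word field code each
# backslash is doubled, so the server name follows four of them.
# A single backslash (a field switch such as \l or \*) never matches.
UNC_SERVER = re.compile(r'(?:^|[\s"])\\{2,4}([^\\\s"]+)')


def servers_in_field_code(field_code):
    """Return the server names referenced by UNC paths in a field code."""
    return UNC_SERVER.findall(field_code)


def fields_exposing_servers(field_codes):
    """Map each field code that references a server to the servers found."""
    report = {}
    for code in field_codes:
        servers = servers_in_field_code(code)
        if servers:
            report[code] = servers
    return report


if __name__ == "__main__":
    codes = [
        r'HYPERLINK "\\\\PRODEV\\People\\JDoe\\DOCS-" \l "609447-v18-Bylaws.DOC"',
        r"PAGE \* MERGEFORMAT",
    ]
    for code, servers in fields_exposing_servers(codes).items():
        print(servers, "in", code)
```

Run against the first sample, the sketch flags the server name PRODEV; the PAGE field carries no path and is ignored.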

 

MSO07 Excel Formula Errors Will Occur when Hidden Rows, Columns and Worksheets are Deleted

If formulas in a workbook reference values in hidden rows, columns or worksheets, a #REF! error appears in those formulas when Microsoft Office Document Inspector removes the hidden items. iScrub, on the other hand, converts the formulas to values before unhiding or deleting these items.
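
The difference between the two orders of operation can be shown with a toy model. The Python below is not Excel and not iScrub: the worksheet is a plain dictionary, formulas are limited to sums of cell references, and deleting a column does not shift the remaining columns. All names are hypothetical. It only illustrates why freezing formulas into values first avoids the broken reference.

```python
import re

CELL_REF = re.compile(r"[A-Z]+[0-9]+")
REF_ERROR = "#REF!"


def evaluate(sheet, cell):
    """Evaluate a cell holding a value or a formula such as '=A1+B1'."""
    content = sheet[cell]
    if not (isinstance(content, str) and content.startswith("=")):
        return content  # plain value, nothing to compute
    total = 0
    for ref in CELL_REF.findall(content):
        if ref not in sheet:
            return REF_ERROR  # the referenced cell was deleted
        value = evaluate(sheet, ref)
        if value == REF_ERROR:
            return REF_ERROR
        total += value
    return total


def freeze_formulas(sheet):
    """Return a copy of the sheet with every formula replaced by its value."""
    return {cell: evaluate(sheet, cell) for cell in sheet}


def delete_column(sheet, column):
    """Return a copy of the sheet without the cells of one column letter."""
    return {
        cell: content
        for cell, content in sheet.items()
        if re.match(r"[A-Z]+", cell).group() != column
    }


if __name__ == "__main__":
    # Column B stands in for a hidden column that the total depends on.
    sheet = {"A1": 10, "B1": 32, "C1": "=A1+B1"}
    print(evaluate(delete_column(sheet, "B"), "C1"))
    print(evaluate(delete_column(freeze_formulas(sheet), "B"), "C1"))
```

Deleting the hidden column first leaves the total pointing at a cell that no longer exists. Freezing first keeps the number the recipient was meant to see.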

 

Metadata Elements That Can’t Be Managed

Microsoft Office Document Inspector cannot manage many types of metadata. Unless a firm invests in development efforts to extend it, Microsoft Office Document Inspector is not robust enough to implement an enterprise-metadata management policy.

 

Additionally, the Microsoft Office Document Inspector does not show what the metadata is, or where it is. For instance, once the user inspects the document, DI will tell them that there are document properties (built-in and custom), but doesn't show what those document properties are. This metadata may contain case-supporting evidence that should be disclosed or discovered.

 

Because DI cannot display specific metadata, that metadata can be left in a document unnoticed. If the document becomes part of an e-discovery process, this could prove costly and embarrassing, because at that point the metadata could be revealed outside the walls of a firm. The "...producing party must notify the opposing party and court and retrieve that information should privileged information be inadvertently produced."1

 

The table below shows the metadata Microsoft Office Document Inspector removes, compared to the metadata iScrub manages.

 

Metadata Element                                                DI Removes  iScrub2 Manages

Multiple Document Scrub (Batch Scrubbing)                       No          Yes

Word
Comments                                                        Yes         Yes
Change Author Names                                             No          Yes
Track Changes                                                   Yes         Yes
Document server properties                                      Yes         Yes
Document Management Policy information                          Yes         Yes
Keep Track Changes Remove Author                                No          Yes
Revision Number                                                 Yes         Yes
Versions                                                        Yes         Yes
Annotations                                                     Yes         Yes
Built-in Properties                                             Yes         Yes
Custom properties                                               Yes         Yes
Preserve specific Custom properties                             No          Yes
Personal Information                                            Yes         Yes
Custom XML Data                                                 Yes         Yes
E-mail headers                                                  Yes         Yes
Hidden Text                                                     Yes         Yes
Keep Track Changes Remove date and time                         No          Yes
Bookmarks                                                       No          Yes
Unused Styles                                                   No          Yes
Normalize Custom Styles names                                   No          Yes
Set Compatibility                                               No          Yes
Diminutive Fonts                                                No          Yes
Document Variables                                              No          Yes
Embedded True Type Fonts                                        No          Yes
Field Codes                                                     No          Yes
Hyperlinks                                                      No          Yes
Hyperlink history                                               No          Yes
Include Text Fields that contain network paths                  No          Yes
Invisible Ink                                                   No          Yes
Linguistic Data                                                 No          Yes
Linked Objects                                                  No          Yes
Random Number                                                   No          Yes
Routing Slips                                                   Yes         Yes
Smart Tags                                                      No          Yes
Style Sheets                                                    No          Yes
IncludePicture Fields                                           No          Yes
Edit Time                                                       Yes         Yes
Print Date                                                      No          Yes
Creation Date                                                   No          Yes
Modified Date                                                   No          Yes
Convert Legacy document to Docx                                 No          Yes
Send-for-review information                                     Yes         Yes
Template name                                                   Yes         Yes

Excel
Comments                                                        Yes         Yes
All external data connections                                   No          Yes
Keep Comments Remove Author                                     No          Yes
Comments for defined names and table names                      Yes         Yes
Annotations                                                     Yes         Yes
Built-in Properties                                             Yes         Yes
Custom properties                                               Yes         Yes
E-mail headers                                                  Yes         Yes
Personal Information                                            Yes         Yes
Custom XML Data                                                 Yes         Yes
Document server properties                                      Yes         Yes
Document Management Policy information                          Yes         Yes
Headers and Footers                                             Yes         Yes
Headers and Footers Specify Left, Center or Right Footers only  No          Yes
Delete Hidden Rows and Columns                                  Yes         Yes
Unhide Hidden Rows and Columns                                  No          Yes
Delete Hidden Sheets                                            No          Yes
Unhide Hidden Sheets                                            No          Yes
Linked Objects                                                  No          Yes
Invisible Objects                                               Yes         Yes
  Note: DI cannot detect text that was hidden by other methods (for example, white text on a white background).
Printer path information                                        Yes         Yes
Track Changes                                                   No          Yes
Custom Number Formats                                           No          Yes
Custom Style                                                    No          Yes
Custom Views                                                    No          Yes
Diminutive Fonts                                                No          Yes
External Links                                                  No          Yes
Fonts Matching Cell Color                                       No          Yes
Formulas                                                        No          Yes
Hyperlinks                                                      No          Yes
Hyperlink history                                               No          Yes
Normalize Sheet Names                                           No          Yes
Pivot Tables – disable refresh                                  No          Yes
Pivot Tables – remove cache Data                                No          Yes
Pivot Tables – remove Data Connection                           No          Yes
Pivot Tables – remove Refresh Authors                           No          Yes
Range Names                                                     No          Yes
Scenarios                                                       No          Yes
Smart Tags                                                      No          Yes

PowerPoint
Comments                                                        Yes         Yes
Annotations                                                     Yes         Yes
Built-in Properties                                             Yes         Yes
Custom properties                                               Yes         Yes
E-mail headers                                                  Yes         Yes
Personal Information                                            Yes         Yes
Custom XML Data                                                 Yes         Yes
Invisible On-Slide Content                                      Yes         Yes
Document server properties                                      Yes         Yes
Document Management Policy information                          Yes         Yes
Presentation Notes                                              Yes         Yes
Headers Footers                                                 No          Yes
Delete Hidden Slides                                            No          Yes
Unhide Hidden Slides                                            No          Yes
Hyperlinks                                                      No          Yes
Hyperlink history                                               No          Yes
Linked Objects                                                  No          Yes
Notes Master                                                    No          Yes
Slide Master                                                    No          Yes

PDF documents                                                   No          Yes
Document Title                                                  No          Yes
Document Author                                                 No          Yes
Document Subject                                                No          Yes
Keywords                                                        No          Yes
Application Creator                                             No          Yes
Application Producer                                            No          Yes

 

Different Levels of Metadata Management

Metadata should be managed differently depending on who will receive the document and its intended purpose. If a document is going to a client or collaborator, then perhaps only certain metadata elements need to be removed. If the document is going to an adverse party, then most (if not all) of the document's metadata should be removed. A company may wish to provide several standardized levels of metadata management to its users, thus removing the decision-making responsibility from the individual and transforming it into a conscious enterprise approach.

 

Microsoft Office Document Inspector does not provide different levels of inspection and removal. This disadvantage makes Microsoft Office Document Inspector a poor choice for enterprise-metadata management. Microsoft Office Document Inspector relies on each user to understand and remove the metadata components they believe to be potentially damaging. To be effective, it therefore requires extensive user education and training.

 

iScrub enables a company to set up fixed standards for metadata removal and to enforce those standards. There are up to five different levels of scrubbing. Users simply select one of the levels available to them, so there is no guesswork and little training is needed.
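
A level-based policy of this kind can be sketched as a simple lookup. The Python below is a hypothetical illustration only: the level names and the elements assigned to each level are invented for the example and are not iScrub's actual settings.

```python
# Hypothetical scrub levels, defined once by the firm rather than per user.
# Level names and element names are invented for this illustration.
SCRUB_LEVELS = {
    "internal": set(),
    "client": {"comments", "hidden text"},
    "adverse party": {
        "author names",
        "comments",
        "custom properties",
        "hidden text",
        "track changes",
    },
}


def elements_to_remove(level, found):
    """Return the elements found in a document that the level removes."""
    try:
        policy = SCRUB_LEVELS[level]
    except KeyError:
        raise ValueError(f"unknown scrub level: {level!r}") from None
    return sorted(policy & set(found))


if __name__ == "__main__":
    found = ["comments", "track changes", "template name"]
    for level in SCRUB_LEVELS:
        print(level, "->", elements_to_remove(level, found))
```

The user chooses only the level. Which elements that level removes is decided centrally, which is the point of an enterprise approach.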

 

Preventing Metadata Disclosure for Email Attachments

iScrub prompts the user to scrub a document directly from e-mail. When a message has an attachment, iScrub detects it and removes the metadata before the document leaves the company’s electronic walls.
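
The detection step can be illustrated with Python's standard email library. This is a sketch under stated assumptions, not iScrub's integration with Outlook, Lotus Notes or GroupWise: it simply flags attachments by file extension, and the extension list and function name are hypothetical.

```python
from email.message import EmailMessage
from pathlib import PurePath

# Extensions treated as Office documents for this illustration.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def attachments_needing_scrub(message):
    """Return the file names of Office attachments on an outgoing message."""
    names = []
    for part in message.iter_attachments():
        filename = part.get_filename()
        if filename and PurePath(filename).suffix.lower() in OFFICE_EXTENSIONS:
            names.append(filename)
    return names


if __name__ == "__main__":
    message = EmailMessage()
    message["Subject"] = "Draft agreement"
    message.set_content("Please see the attached draft.")
    message.add_attachment(
        b"placeholder bytes",
        maintype="application",
        subtype="octet-stream",
        filename="Bylaws.DOC",
    )
    message.add_attachment(
        b"placeholder bytes", maintype="image", subtype="png", filename="logo.png"
    )
    print(attachments_needing_scrub(message))
```

A mail add-in built on this idea would prompt the sender, or scrub automatically, whenever the returned list is not empty.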

 

Microsoft Office Document Inspector only works within its intrinsic Office Object Model, and will not prompt users to remove the metadata from within Outlook3. Once again, the firm must put its trust in the individual user: trust in his or her memory to actually apply the DI before attaching the document, and trust in his or her judgment or knowledge to remove the proper elements for that specific transaction.

 

Lack of E-Discovery Features

As more and more companies institute e-discovery processes for managing internal electronic information (metadata), the ability to report on what metadata is in a document and what has been removed becomes paramount. Microsoft Office Document Inspector lacks any reporting capability and, in fact, the user has no idea what has been removed or where it was in the document.

 

iScrub for Office 2007, on the other hand, will provide an XML output file that can be utilized in any number of ways. iScrub's report will detail all the metadata in the document and also report on what was removed.
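
To show what such a report might contain, here is a minimal Python sketch. The element and attribute names (ScrubReport, Metadata, name, status) are invented for this illustration and do not describe iScrub's actual output format.

```python
import xml.etree.ElementTree as ET


def build_report(document_name, found, removed):
    """Return an XML string listing each metadata element and its fate."""
    root = ET.Element("ScrubReport", document=document_name)
    for element in found:
        status = "removed" if element in removed else "retained"
        ET.SubElement(root, "Metadata", name=element, status=status)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_report(
        "Bylaws.docx",
        found=["Comments", "Track Changes", "Template name"],
        removed={"Comments", "Track Changes"},
    ))
```

Because the report is structured data, it can be archived with the matter or loaded into other systems as a record of what was in the document and what was taken out.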

 

Summary

Microsoft Office Document Inspector is significantly lacking as an enterprise-metadata management tool. The limited number of metadata elements that can be removed (much less viewed or actually managed) makes it a poor choice for document-intensive organizations that truly need to manage their discoverable metadata. Extending the Microsoft Office Document Inspector into a richer metadata management model, meanwhile, would cost more and be less efficient than investing in a proven metadata product such as iScrub, which does significantly more out of the box at a much lower overall cost.

 

(reprinted by permission from Esquire Innovations)


1 Michele C.S. Lange, Esq., "New FRCP Rules: What Does it Mean for You," MSBA Computer and Technology Law Section, December 1, 2006, http://mntech.typepad.com/msba/2006/12/new_frcp_rules_.html

 

2 iScrub version 5 for Microsoft Office 2007

3 iScrub also works from within Lotus Notes and GroupWise


