Office Open XML - An International Standard

White Paper Available: Office Open XML Overview

Interoperability for Office Open XML Documents

The Office Open XML standard (OpenXML) helps to bring the benefits of multi-vendor interoperability to the pre-existing corpus of word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Corporation. The goal of the standard is to ensure that a document written by a conforming producer of Office Open XML can be read by a conforming consumer of Office Open XML, and that the two agree on the semantics of that document.

At the time of writing, more than 500 million users have generated documents in the binary formats; estimates put the total at more than 40 billion documents, with billions more being created each year. Therefore, a top priority in the design of Office Open XML was to maximize its ability to faithfully represent the information contained in the binary corpus, relegating to transitional status only content that would interfere with the primary goal of interoperability. To achieve this, the standardization process consisted of mirroring in XML the capabilities required to represent the existing corpus, extending them, providing detailed documentation, and enabling interoperability.

Why an Open Standard?

The original binary formats were created in an era when space was precious and parsing time severely impacted user experience. They were based on direct serialization of in-memory data structures used by Microsoft® Office® applications. Modern hardware, network, and standards infrastructure (especially XML) permit a new design that favors implementation by multiple vendors on multiple platforms, and allows for evolution.

Concurrently with those technological advances, markets have diversified to include a new range of applications not originally contemplated in the simple world of document editing programs. These new applications include ones that generate documents automatically from business data; extract business data from documents and feed those data into business applications; perform restricted tasks that operate on a small subset of a document, yet preserve editability; provide accessibility for user populations with specialized needs, such as the blind; or run on a variety of hardware, including mobile devices.

Perhaps the most profound issue is one of long-term preservation. We have learned to create exponentially increasing amounts of information. Yet we have been encoding that information using digital representations that are so deeply coupled with the programs that created them that after a decade or two, they routinely become extremely difficult to read without significant loss. Preserving the financial and intellectual investment in those documents (both existing and new) has become a pressing priority.

The emergence of these four forces – extremely broad adoption of the binary formats, technological advances, market forces that demand diverse applications, and the increasing difficulty of long-term preservation – has created an imperative to define an open XML format and migrate the billions of documents to it with as little loss as possible. Further, standardizing that open XML format and maintaining it over time create an environment in which any organization can safely rely on the ongoing stability of the specification, confident that further evolution will enjoy the checks and balances afforded by a standards process.

Various document standards and specifications exist; these include HTML, XHTML, PDF and its subsets, ODF, DocBook, DITA, and RTF. Like the numerous standards that represent bitmapped images, including TIFF/IT, TIFF/EP, JPEG 2000, and PNG, each was created for a different set of purposes. Office Open XML addresses the need for a standard that covers the features represented in the existing document corpus. To the best of our knowledge, it is the only XML document format that supports every feature in the binary formats.

Creation and Approval of the Standard

The work to standardize Office Open XML started in December 2005 in Ecma International via its Technical Committee 45 (TC45), which included representatives from Apple, Barclays Capital, BP, The British Library, Essilor, Intel, Microsoft, NextPage, Novell, Statoil, Toshiba, and the United States Library of Congress [1]. This effort resulted in the approval of ECMA-376 in December 2006. Ecma then submitted ECMA-376 to ISO/IEC JTC 1 as DIS 29500 in January 2007 for approval via the fast-track process. During the letter-ballot phase, completed in September 2007, National Bodies submitted 3,522 comments. On 14 January 2008, the Editor, supported by TC45, published a Disposition of Comments document addressing all comments. A Ballot Resolution Meeting took place in Geneva, Switzerland, on 25–29 February 2008; for one month after this meeting, the National Bodies could change their votes. The standard met the ISO/IEC DIS approval criteria with 75% of the JTC 1 participating member votes cast positive and 14% of the total of national member body votes cast negative.

Publication as ISO/IEC 29500 is expected to occur in October 2008, or soon thereafter. Publication of ECMA-376 Edition 2, which is fully aligned with ISO/IEC 29500, is expected to occur in December 2008. ISO/IEC JTC 1/SC 34 is the committee in charge of maintenance of ISO/IEC 29500, with active participation from Ecma TC45.