About OpenFormula

Author: The OASIS Office Formula Subcommittee

OpenFormula is the informal name of the OpenDocument formula specification. The OpenFormula draft specification describes how to exchange recalculated formulas between any two applications (primarily spreadsheet programs). Anyone can implement this specification, in proprietary and open source software projects alike, and it is being developed through the cooperation of a large number of competing implementors.

OpenFormula is designed to let you own your own data, so you can choose which spreadsheet application you want to use, and still exchange data with people who made different choices. As a result, countries can remain sovereign from their suppliers, organizations are no longer entrapped by any particular supplier, data will be accessible far into the future, and anyone can innovate and use the innovations of others.

OpenFormula only covers recalculated formulas (such as those for spreadsheets). For the display of arbitrary mathematical expressions, OpenDocument uses MathML, the standard for displayed mathematical expressions in XML-based documents. MathML and OpenFormula are complementary standards; for example, the OpenFormula specification uses MathML to describe functions.

This page counters some common myths, provides the current schedule, and describes some of the key advantages of OpenFormula. It then discusses issues related to supersets and subsets, and the "grouping" system in OpenFormula that we believe meets the needs of both implementors and users. This page then gives a brief history of OpenFormula, and explains how you can be confident that OpenFormula is an open standard. This page concludes by explaining how you can get more information.

Countering Myths

First, let's dispel two myths:

Myth #1: You can't exchange spreadsheet files between applications that use OpenDocument format.

People are already routinely exchanging OpenDocument spreadsheet files, including formulas, between different applications. The OpenDocument specification itself, plus the hard work of many developers, means that people are already exchanging spreadsheets quite successfully. That's in spite of the concerns and carefully-crafted "problems" of years ago. It's also important to note that some of the old tests (such as those of NewsForge) used carefully-crafted formulas to cause problems, or used a very old version of KOffice, which had a poor formula engine. Much has been improved since then; in particular, the KOffice formula engine has been completely replaced with a much better implementation.

And remember that if you "can't" exchange spreadsheet files using OpenDocument, the alternatives are worse. People who say "OpenDocument can't exchange spreadsheet formulas, because the definition isn't published in a final form" often use Excel's ".xls" - yet this is an essentially undocumented format! OpenFormula is already far better defined than ".xls" format, and in any case, this format is being abandoned by all spreadsheet developers. The other likely alternative is Microsoft's XML format, but this is much less mature than ODF's format (see below for more). What's worse, both of those formats are completely controlled by a single vendor who has a financial incentive to inhibit open competition. Not a good idea if you wish to be sovereign from any particular supplier.

So, go ahead and use OpenDocument right now, even for spreadsheets; typical spreadsheet formulas already exchange very well between OpenDocument applications.

Myth #2: It's okay to leave formula formats undocumented.

While files can be exchanged today between applications without a formal specification, in the long run open, public specifications are very important for full interoperability of formulas.

OpenDocument is the first format spec to include a formula specification. Contrary to what this myth might suggest, no format specification before OpenDocument included an open specification for spreadsheet formulas. The lack of a formula specification in OpenDocument 1.0 was not an aberration. Rather, the OpenDocument community was the first in history to recognize the need for a formula specification, and wanted to ensure that it was done well. As such, we are at the forefront of the development of Open Standards. This need was first discussed in the OpenDocument TC in 2004, and it was agreed that it would be valuable (and that it would need to be done separately). The first draft of OpenFormula was released in February 2005, and was informally developed through the interaction of many in the community of OpenDocument users and application developers. OASIS formally established the formula subcommittee in February 2006; the subcommittee uses the OpenFormula project's specification as its base document.

Even more importantly, the OpenFormula specification is being developed through an open process, including competing vendors and several volunteers. It includes input from multiple implementors and multiple users, without domination and control by any.

Contrast this situation with the oft-cited alternatives to OpenDocument: the inadequately-documented .xls format, and the dreadfully incomplete Microsoft XML format. Again, both are completely controlled by a single vendor who has a financial incentive to inhibit open competition. Microsoft has steadfastly refused, and still refuses, to publicly document the ".xls" file format that is widely used to exchange spreadsheets (including their formulas). Microsoft didn't even begin working with a standards body until December 2005 - 11 months after the first public draft of OpenFormula, and 7 months after OASIS had completed its work on the OpenDocument 1.0 format. What's more, the initial December 2005 version had no details on formulas; up through April 2006, Microsoft had refused to publicly document the formula specification in their vendor-specific XML format with any specificity. Only in May 2006, 15 months after the OpenDocument community began defining formulas, did Microsoft release its first public draft specifying the Microsoft XML format for formulas. Yet this version is very incomplete, and worse, it was released using committee rules that openly discriminate against all other suppliers. We are glad that Microsoft has started to follow in the tail-lights of the leaders, and has begun to release information about its native format for formulas. We continue to invite Microsoft to instead join other developers to develop industry-wide standards in a neutral setting.

OASIS, and the wider OpenDocument community, take specifications in general (including specifying formulas) very seriously. We want there to be a public specification which gives much more detail about how to exchange spreadsheet formulas, including rigorous definitions of a large set of standard functions. To be credible, such a specification must be developed by many different implementors, working together. Without many representatives from many different suppliers, any specification would be a sham. The goal of the office formula subcommittee (SC) is to ensure that users can switch from one application to another, and exchange data with users of different applications, all while recalculating completely correctly. That also means that users must not lose access to their existing document collections in Excel, OpenOffice.org, StarOffice, KSpread, Gnumeric, or other applications. We have developed this work based on existing practice, so that the many people who already use or are converting to OpenDocument can continue using what they're using with confidence.

And we're succeeding.

Estimated SC Schedule

Note that by February 2006 we already had a specification defining over 100 functions. For the most part, OpenFormula is a specification documenting what applications already do, and application developers have already been modifying their applications to track the specification where necessary. As a result, when we complete it, we expect that many OpenDocument applications will already comply or mostly comply with it, since typical OpenDocument-handling applications already comply with much of it.

Some old messages reported that we would not complete until October 2007, but that is simply not true.

Advantages of OpenFormula

Here are some key advantages of OpenFormula, most of which are unique to OpenFormula as a formula specification:

Subsets, Supersets, and OpenFormula Solutions (Groups and Supplier-unique names)

Subsets and supersets are necessary for open standards, yet if not carefully handled they can be a problem. Let's see why subsets and supersets are required - yet can be a problem - and how OpenFormula addresses those potential problems.

Need for Subsets and Supersets

Clearly, standards are critical for interoperability. One obvious but hopelessly naive way to get "perfect" interoperability is to strictly forbid the implementation of subsets or supersets. Malevolent organizations can enforce this legally by issuing patent grants that only permit implementations of "this specification, exactly". This "one size fits all" model doesn't work, and many definitions of "open standards" (including the most popular definition by Perens) explicitly require the ability to implement subsets and supersets. Without the ability to implement subsets and supersets, standards cannot respond to changing conditions. For example:

Indeed, malevolent organizations can turn a statement requiring only "exact" implementation into a trap. An organization granting patent rights - but only to "exact" implementation - then becomes the only organization that can make improvements on the standard, and can entrap everyone else who uses the standard. The result would be that you have to use their proprietary version of the standard, or use a completely different incompatible specification that they control. That is not where users want to go.

But if a supplier can implement subsets and supersets, and there is no mechanism for addressing them, there are other risks. There's the risk that the products won't be interoperable, for one thing. Another risk is that users will become unintentionally locked into using a particular supplier, because they inadvertently depend on supplier-unique extensions. This is the basis of the "embrace, enhance, and extinguish" attack on standards - suppliers exploit the need to permit supersets, by making naive users unknowingly depend on their supplier-unique extensions. Users need to be able to find out if the product they're getting only implements a subset (as opposed to the whole standard), and be able to easily avoid non-standard extensions (which means they need to be able to know what is a non-standard extension in the first place).

At first, this appears to be a conflict between implementors' needs (to allow extensions and subsets) and users' needs (who need to know what they're getting). But these are not really in conflict. There are lots of ways to address the needs of both implementors and users. Here are a few:

In practice, useful standards are updated based on experience with real implementations that create subsets and supersets... but standards bodies often discuss how to handle them in a controlled way.

OpenFormula Solutions: Groups (Small, Medium, Large), Test Cases, and Supplier-unique Names

So an open standard must allow for subsets and supersets, so that it can be tailored for any user's needs. Yet if every application implemented its own set of functions, you could end up in a situation where documents could never be exchanged. Formulas don't work like many other office suite capabilities; "graceful degradation" is often fine when you're talking about minor formatting tweaks, but formulas either recompute correctly or they don't. How can we resolve this?

For OpenFormula, the "Group" system, the built-in test cases, and the supplier-unique function naming mechanism are the primary answers to resolving both user and implementor needs.

OpenFormula predefines a number of different "groups" of capabilities. Applications are free to implement subsets and supersets, but in most cases we expect at least some of the groups to be requested by users. That way, applications can quickly express what they support, and users can quickly express what they want. Users could even define their own groups, if they wanted to, but we expect most will be happy to use the predefined sets.

The most important predefined groups are small, medium, and large:

Applications are free to implement one group plus additional functions, or even no groups, depending on what their users want. Implementing a larger group doesn't necessarily make an application "better" for all users; smaller sets are quite sufficient for many users, and they may prize other attributes more (such as small size, low price, having a web interface, and so on). But the grouping mechanism, supported by tools such as the test cases, makes it easy for users to determine if a particular application could meet their needs or not.
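As a rough illustration of how the grouping mechanism lets users and applications compare notes, here is a minimal Python sketch. The group contents, the function names listed in them, and the helper names (GROUPS, supports_group, missing_functions) are all made up for this example; the real group definitions are in the specification, not here.

```python
# Hypothetical group contents, for illustration only.
# The real lists of functions per group are defined by the specification.
GROUPS = {
    "small": {"SUM", "AVERAGE", "MIN", "MAX"},
    "medium": {"SUM", "AVERAGE", "MIN", "MAX", "VLOOKUP", "PMT"},
}


def supports_group(group, supported):
    """Return True if every function in the named group is supported."""
    return GROUPS[group] <= set(supported)


def missing_functions(required, supported):
    """Return the sorted list of required functions the application lacks."""
    return sorted(set(required) - set(supported))


# An imaginary application: one full group plus one supplier-unique extra.
app_functions = {"SUM", "AVERAGE", "MIN", "MAX", "ORG.EXAMPLE.FOO"}

print(supports_group("small", app_functions))
print(supports_group("medium", app_functions))
print(missing_functions(GROUPS["medium"], app_functions))
```

With these made-up sets, the application covers the "small" group but not "medium", and the last call lists exactly which functions a user who needs "medium" would be missing.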

It's worth noting that the built-in test cases help here too. A supplier could claim to implement some function, but the supplier had better have tried, because the specification's built-in test cases provide a first-cut check of such claims. Yes, a supplier could create functions that only give correct answers to the specific test cases provided, but that is not enough to actually meet the specification, and a supplier's reputation will not do well if such easily-detected subterfuge is tried.
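The idea of a first-cut check can be sketched in Python as follows. The test-case layout (function name, arguments, expected result), the sample cases, and the run_test_cases helper are assumptions made for this sketch; they do not reproduce the specification's own test-case format.

```python
import math

# Hypothetical test cases: (function name, arguments, expected result).
# The specification's real test cases use their own format.
TEST_CASES = [
    ("SUM", (1, 2, 3), 6),
    ("AVERAGE", (2, 4), 3),
    ("MAX", (1, 5, 3), 5),
]


def run_test_cases(implementation, cases):
    """Run each case against a dict of callables and collect the failures."""
    failures = []
    for name, args, expected in cases:
        func = implementation.get(name)
        if func is None:
            failures.append((name, args, "not implemented"))
            continue
        actual = func(*args)
        # Compare numerically with a small tolerance for floating point.
        if not math.isclose(actual, expected, rel_tol=1e-9):
            failures.append((name, args, actual))
    return failures


# An imaginary application that claims SUM and AVERAGE but lacks MAX.
candidate = {
    "SUM": lambda *xs: sum(xs),
    "AVERAGE": lambda *xs: sum(xs) / len(xs),
}

print(run_test_cases(candidate, TEST_CASES))
```

A check like this only catches the obvious gaps; as noted above, passing the listed cases is not the same as meeting the specification.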

Finally, the specification lets suppliers experiment by adding supplier-unique functions, but it does so by defining a naming convention so that such functions are clearly different from standard functions and from each other. The convention is disarmingly simple, and similar to Java's - prefix any supplier-unique function with a reversed DNS name. Thus ORG.OPENOFFICE.STYLE and COM.MICROSOFT.CUBEKPIMEMBER are supplier-unique function names. Since the naming convention itself is in the standard, application developers can confidently create new function names without worrying that they will interfere with another application's function names. Without a naming convention like this, suppliers would quickly create incompatible names, or be beholden to a single supplier who would be the only one that could create new functions. In the OpenFormula way, different suppliers can experiment with new functions, and if their experiments turn out to be useful, they can then be added to the standard. Standards should not standardize useless or problematic areas, yet if vendors can't experiment, there's no way to know if new functions will work - this mechanism resolves the issue.
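To make the reversed-DNS convention concrete, here is a small Python sketch that splits a function name into a supplier prefix and a local name. The helper names, the regular expression, and the choice to upper-case names before comparing are assumptions of this sketch, not rules quoted from the specification. Because a standard name could itself contain a dot, the sketch takes an optional set of known standard names rather than treating every dotted name as supplier-unique.

```python
import re

# Assumed shape of one dot-separated label; a simplification for this sketch.
_LABEL = r"[A-Z][A-Z0-9]*"
_QUALIFIED = re.compile(rf"(?:{_LABEL}\.)+{_LABEL}")


def split_function_name(name, standard_names=frozenset()):
    """Return (supplier_prefix, local_name); prefix is None if unqualified."""
    upper = name.upper()
    if upper not in standard_names and _QUALIFIED.fullmatch(upper):
        prefix, _, local = upper.rpartition(".")
        return prefix, local
    return None, upper


def supplier_domain(prefix):
    """Turn a reversed-DNS prefix such as ORG.OPENOFFICE back into a domain."""
    return ".".join(reversed(prefix.lower().split(".")))


for example in ("ORG.OPENOFFICE.STYLE", "COM.MICROSOFT.CUBEKPIMEMBER", "SUM"):
    prefix, local = split_function_name(example)
    owner = supplier_domain(prefix) if prefix else "(standard or unqualified)"
    print(example, "->", local, "from", owner)
```

An application reading a document could use a split like this to warn the user when a formula depends on a function owned by a supplier it does not recognize.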

History

OpenFormula is an open standard

OpenFormula is an open standard. If you're not sure what an open standard is, take a look at "Is OpenDocument an Open Standard? Yes!", which identifies the requirements for an open standard. Then let's go point by point, and you can see that OpenFormula is indeed an open standard:

When people exchange simple text files, or PNG graphics, or JPEG pictures, people don't assume that only one program can use the result. Why? Because these (and many other formats) are open standards. There are many programs that can handle these formats, freeing users to choose the best application for their circumstances. That is what we plan for spreadsheet formulas as well.

For more information

This is a Wiki page, and is constantly updated for those outside of the OpenFormula subcommittee who want a quick, informal summary. It is not formally vetted, but we do actively try to keep it accurate; please let us know if there's some omitted information that needs to be here, or if there's a correction that needs to be made.

See the OpenDocument formula page for the statement of purpose (charter), current documents including the current draft, mailing list archive, and other official information about OpenFormula. If you're looking to understand OpenFormula in more detail, that should be your first stop.

Some additional OpenFormula status information is here:

Some commentary about MathML, the format for non-recalculated mathematical expressions used in OpenDocument and everywhere else:

Other commentary:

The OpenDocument Fellowship has more information about OpenDocument in general. The ODF (OpenDocument) Alliance provides information on the benefits and opportunities of the OpenDocument Format and now has a long list of members. Sam Hiser's "What Is OpenDocument" explains a little about OpenDocument. OpenFormula, like OpenDocument, is a choice that lets you choose.

If you have questions not answered otherwise, or are interested in participating, you may contact the SC chair, David A. Wheeler, at dwheeler (at) dwheeler, dot com. Include "xyzzy" in the subject line, so that he will know you're not sending him spam.

last edited 2006-09-08 20:01:50 by 129