A Note on the Use of Standards: 
Adopted, Proposed and Proprietary

An issue that often causes confusion and can be a significant source of inconvenience to computer users is the choice of appropriate file formats for information that is shared between users (e.g. files sent in email or made available to others on disc or in some other way). This may seem a minor issue, but the choice of formats and interfaces for use between different users and computers is a major concern in the development of computer hardware and software and is of growing importance for everyday users in today's networked environment.

In this note, we describe the issue of standardization as it applies to the choice of file formats to enable file exchange between cooperating users. But the same arguments apply to the choice of standards in other areas of computer system design such as the choice of interfaces for networked services or the choice of programming languages. The choice of standards has an important impact on the usability and maintainability of computer systems. It is also a good example of a professional issue in computer science - an issue in which the correct choices depend not only on technical but also on social, moral or legal considerations.

The problem of file format choice

When we expect others to work cooperatively with us (a social consideration), we should treat their needs as equivalent to ours (a moral consideration). Their choice of computer hardware, operating system and application software is as valid as our own.

Hence when we give them a piece of our work or contribute to a joint effort, we should take care to use data formats that are neutral with respect to the hardware and software used by the members of the cooperating group. In many cases the group is open (i.e. we cannot identify all the present or possible future members of the group, as in Internet users interested in knitting patterns). To fulfill the criterion of equal treatment defined above we must adopt an accepted standard for the format of any data that we wish to give to an open group.

If the group is closed (i.e. we can identify its members, as in students and staff of the QMW CS Department), then it may be admissible to choose a format that is known to be available on all of the members' computers, although it is often easier and less error-prone to maintain the policy of accepted standards. A typical error would be to interpret the statement "all members of the group use Microsoft Word" as equivalent to "all members of the group have a version of Microsoft Word that can handle the very latest MSWord file format". Applications such as MSWord do not aim to maintain an unchanged file format as their development proceeds through different versions, so the use of the latest version may exclude users who have not yet updated their software.

Fit for Purpose?

Standards are horses for courses: you need to define the user group for any set of applications that the standards apply to before making decisions. Common technology, requirements and culture all influence this decision. The choice of standards also reflects a model of democracy or power within the user group. For example, we can make a utilitarian choice (the greatest good for the greatest possible number, after Jeremy Bentham) or a greed-is-good choice, where the power user gets the most say (perhaps the author or originator of a document imposes form as well as content).

Usefulness can be objectively assessed but also subjectively TBA 1

Metcalfe's law states that the "power" of a network is proportional to the square of the number of users. We can imagine a "network" of users of various tools rather than just the Internet (or JANET, or dcs.qmw.ac.uk, or cs.ucl.ac.uk). In crude terms, Word is certainly "more powerful" for the world community of computer users than, say, LaTeX, but if we think of the functions of document interchange as (in roughly increasing order of complexity):

  • a) distribution for printing/reading
  • b) distribution for browsing on screen
  • c) distribution for searching (free text or other)
  • d) distribution for annotation
  • e) possible shared editing/authoring/revision...
Then we can define the power of a tool as a weighted function of the reason the document is distributed and the number of people wishing to apply that function. By ordering the functions' values, we can find an optimal source form (transitive closure?). Of course, all of the above functions are more complex for multimedia documents which include diagrams, and possibly voice or even video: synchronization and overlay techniques mean that there is a far richer set of possible reduced-function forms of a document that could still be useful.
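One way to make this weighting concrete is sketched below. It is only an illustration under stated assumptions: the weights given to functions (a) to (e), the Metcalfe-style squaring of the user count, the user numbers, and the names `FUNCTION_WEIGHTS`, `format_power` and `best_format` are all hypothetical and are not taken from any survey.

```python
# Hypothetical weights for the interchange functions (a)-(e);
# more demanding functions are given larger weights.
FUNCTION_WEIGHTS = {
    "print": 1.0,
    "browse": 2.0,
    "search": 3.0,
    "annotate": 4.0,
    "edit": 5.0,
}


def format_power(users_per_function, weights=FUNCTION_WEIGHTS):
    """Weighted 'power' of a format.

    users_per_function maps a function name to the number of recipients
    who want that function and can perform it with this format.
    Each term is weight * users**2 (an assumed Metcalfe-style square).
    """
    return sum(weights[f] * n ** 2 for f, n in users_per_function.items())


def best_format(formats):
    """Return the name of the format with the largest weighted power."""
    return max(formats, key=lambda name: format_power(formats[name]))


# Hypothetical community: many people can print or browse PostScript,
# while a few can do everything with Word.
formats = {
    "PostScript": {"print": 10, "browse": 8},
    "Word": {"print": 3, "browse": 3, "search": 3, "annotate": 3, "edit": 3},
}
```

With these made-up numbers the many readers outweigh the few editors, so `best_format(formats)` picks PostScript; a different set of weights could easily reverse the result, which is the point of making the weighting explicit.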

    TBA 2

    We can consider evaluating the cost to a user of converting a document format (e.g. FrameMaker to Word is small, RTF to LaTeX is nearly infinite, PostScript to text is possible but lossy, etc.). We discuss this model further later in this note.
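Conversion costs of this kind can be treated as weights on a directed graph of formats, so that the cost of a chain of conversions is the sum of its steps. The following minimal sketch assumes arbitrary units of user effort; the numbers, the extra Word-to-PostScript edge, and the names `CONVERSION_COST` and `cheapest_conversion` are hypothetical choices made for illustration.

```python
import heapq
import math

# Hypothetical one-step conversion costs (arbitrary units of user effort).
CONVERSION_COST = {
    ("FrameMaker", "Word"): 1.0,   # "small"
    ("RTF", "LaTeX"): 1000.0,      # "nearly infinite"
    ("PostScript", "text"): 5.0,   # possible, but lossy
    ("Word", "PostScript"): 1.0,   # assumed: printing to a PostScript file
}


def cheapest_conversion(source, target, costs=CONVERSION_COST):
    """Cheapest chain of conversions from source to target.

    Uses Dijkstra's algorithm over the conversion graph and returns the
    total cost, or math.inf when no chain of conversions exists.
    """
    graph = {}
    for (src, dst), cost in costs.items():
        graph.setdefault(src, []).append((dst, cost))

    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost_so_far, fmt = heapq.heappop(queue)
        if fmt == target:
            return cost_so_far
        if cost_so_far > best.get(fmt, math.inf):
            continue  # stale queue entry
        for nxt, step in graph.get(fmt, []):
            new_cost = cost_so_far + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return math.inf
```

For example, `cheapest_conversion("FrameMaker", "text")` follows the chain FrameMaker, Word, PostScript, text, while a pair with no conversion path at all yields `math.inf`, the "excluded user" case.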

    What is an accepted standard?

    This question has a somewhat fuzzier answer than might be thought desirable. This fuzziness is an inevitable counterpart of technical progress. The rigorous specification of standards is a lengthy process that tends to stand in the way of progress when progress occurs at the rates prevailing in the computer field. But this is not an excuse for failing to choose satisfactory representations; it is simply an indication that the matter requires careful thought.

    Here are some possible interpretations of the word standard in this context:

    Category: 1. Adopted
    Description: A file format that has been studied and adopted by a competent official standards body or professional institute (e.g. ISO, BSI, IEEE).
    Examples: ASCII, HTML, GIF, JPEG, MPEG

    Category: 2. Proposed
    Description: A file format that has been proposed by a company, organization or group for the purpose of data interchange. The specification has been published and is freely available for other companies to build into software.
    Examples: PostScript (.ps) file, QuickTime (multimedia) file

    Category: 3. Proprietary
    Description: A file format that is proposed by a company or organization for the purpose of data interchange, not published, but supported by software that is available for most known computers and operating systems.
    Examples: Adobe Acrobat (.pdf) file, Zip archive file, StuffIt archive file

    Category: 4. Non-standard
    Description: A file format that is proposed by a company for the purpose of data interchange, not published, supported by software on only some computers and operating systems.
    Examples: Microsoft Word (.doc) file, Adobe Photoshop file

    The usefulness of the standards decreases as we proceed down the categories. The choice of a standard for a particular purpose will depend upon the use to which it is to be put and the cost (in additional effort) of using a standard. But the use of a standard in category 1 or 2 can always be considered good professional practice, whereas the use of category 3 is often acceptable and the use of category 4 should be avoided except for very limited purposes.

    There is a further requirement that should be added to all of the categories before they are considered for use: the format should be in widespread use. We should never adopt a standard unless we know that others have adopted it or are likely to do so.
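The guidance in this section can be summarised as a small decision rule. The sketch below is only an illustration: the names `Category` and `advice` and the wording of the returned strings are our own, and the boolean `widespread` stands for the reader's judgement that a format is in widespread use.

```python
from enum import IntEnum


class Category(IntEnum):
    """The four interpretations of 'standard' listed above."""
    ADOPTED = 1
    PROPOSED = 2
    PROPRIETARY = 3
    NON_STANDARD = 4


def advice(category, widespread):
    """Return a recommendation for a format.

    category   -- one of the Category values
    widespread -- True if the format is judged to be in widespread use
    """
    if not widespread:
        # The widespread-use requirement applies to every category.
        return "avoid: not in widespread use"
    if category <= Category.PROPOSED:
        return "good professional practice"
    if category == Category.PROPRIETARY:
        return "often acceptable"
    return "avoid except for very limited purposes"
```

For example, `advice(Category.NON_STANDARD, True)` returns the "avoid" recommendation, and any format that is not in widespread use is rejected whatever its category.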

    Commercial considerations

    Particular formats and interfaces tend to migrate up the sequence; many companies publish their specifications when they have recovered their initial investment in the development of a format. Successful formats are eventually adopted by standards bodies.

    Formats whose specifications have not been freely published are referred to as proprietary. Proprietary formats are controlled by the owners of their specifications. This may be beneficial during the early stages of development of a standard - a single company or organisation is often more responsive to the need for change than a standards committee - but once the technical features of the standard are stabilised, its publication offers the greatest benefit to the wider community of software developers and users. Thus, for example, neither of the formats given as examples for category 4 is available to Unix users, whereas the formats in categories 1 and 2 are available on virtually all platforms because their published specifications have been implemented by a wider range of software developers.

    Perhaps surprisingly, many companies in the software industry have been willing to publish the specifications of their formats and interfaces. This is not always as altruistic as it seems: the publication of a format offers the best chance for its universal or widespread adoption as a standard. Systems whose interfaces have been published in a form that enables developers to extend them are often referred to as open systems. By contrast, closed systems are those whose interfaces are not accessible to third-party developers and hence can only be extended by the company that originally developed them.

    Summary

    The choice of file formats and other standards for the interchange of information is an important aspect of computer system and application design. Professional criteria for the choice have been identified and we have classified the available choices in a hierarchy of types of standard.

    An Economic Model of Virtual Document Interchange Community Networks

    We distinguish:
  • base authoring system
  • interchange service TBA
    Kelly (see "Charging and rate control for elastic traffic", European Transactions on Telecommunications, volume 8 (1997), pages 33-37, and "Charging and accounting for bursty connections", in "Internet Economics" (editors Lee W. McKnight and Joseph P. Bailey), MIT Press, 1997, pages 253-278) considers the joint optimisation problem of maximising the revenue of a communications network while maximising utility for the users. We attempt to apply a similar model to the value graph of a document interchange community.

    We consider a "network provider" to be a provider of interchange utilities - in other words, someone who publishes a specification so that a document can be received by utilities other than the one in which it was authored.

    A user is someone who uses a particular utility. Users try to maximise their utility, while providers try to maximise their revenue. We might consider the cost of excluding a person (e.g. sending a Word document in binary form to a LaTeX user with ASCII-only mail) to be infinite, but this is not a reasonable analysis - a more realistic cost is that the receiver needs to lease or buy a system to carry out the conversion (or, in extremis, retrain).

    The revenue accrued by selling the interchange facility (being a provider) is of course offset in the real world by the revenue from selling the base system - if a provider can cause everyone to use their base document system, then the cost of the interchange utility is zero, since it is a null service.

    But let's assume a non-monopolistic scenario... (or, if there is a monopoly of provision of the base system, then there should be a regulator!)

    TBA...  


    Comments and debate welcome. Mail to:
    George.Coulouris@dcs.qmw.ac.uk 11.3.98

    J.Crowcroft@cs.cs.ac.uk 11.3.98