In this note, we describe the issue of standardization as it applies to the choice of file formats to enable file exchange between cooperating users. The same arguments apply to the choice of standards in other areas of computer system design, such as the choice of interfaces for networked services or the choice of programming languages. The choice of standards has an important impact on the usability and maintainability of computer systems. It is also a good example of a professional issue in computer science - an issue in which the correct choices depend not only on technical but also on social, moral or legal considerations.
Hence when we give other users a piece of our work or contribute to a joint effort, we should take care to use data formats that are neutral with respect to the hardware and software used by the members of the cooperating group. In many cases the group is open (i.e. we cannot identify all the present or possible future members of the group; for example, Internet users interested in knitting patterns). To fulfill the criterion of equal treatment defined above we must adopt an accepted standard for the format of any data that we wish to give to an open group.
If the group is closed (i.e. we can identify its members; for example, students and staff of the QMW CS Department), then it may be admissible to choose a format that is known to be available on all of the members' computers, although it is often easier and less error-prone to maintain the policy of accepted standards. A typical error would be to interpret the statement "all members of the group use Microsoft Word" as equivalent to "all members of the group have a version of Microsoft Word that can handle the very latest MSWord file format". Applications such as MSWord do not aim to maintain an unchanged file format as their development proceeds through different versions, so the use of the latest version may exclude users who have not yet updated their software.
Usefulness can be objectively assessed but also subjectively TBA 1
Metcalfe's law states that the "power of the network is the square of the number of users". We can imagine a "network" of users of various tools rather than just the Internet (or JANET, or dcs.qmw.ac.uk, or cs.ucl.ac.uk) - in crude terms, Word is certainly "more powerful" for the world community of computer users than, say, LaTeX - but if we think of the functions of document interchange as (in sort of increasing complexity)
TBA 2
We can consider evaluating the cost to a user of converting a document from one format to another (e.g. the cost of FrameMaker to Word is small, RTF to LaTeX is nearly infinite, PostScript to text is possible but lossy, etc.). We discuss this model further later in this note.
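As a rough illustration of this cost model, the conversion costs can be held in a table and summed over the members of a group. This is only a sketch: the numeric costs, the format names and the function names (`conversion_cost`, `group_cost`) are assumptions made for the example, not values or definitions taken from this note.

```python
import math

# Hypothetical relative conversion costs, keyed by (source, target).
# The numbers are illustrative assumptions, not measurements.
# math.inf marks a conversion treated as effectively impossible.
CONVERSION_COST = {
    ("framemaker", "word"): 1.0,    # described in the text as small
    ("rtf", "latex"): math.inf,     # described as nearly infinite
    ("postscript", "text"): 5.0,    # possible but lossy, so penalised
}


def conversion_cost(source, target):
    """Cost for one user to convert a document from source to target."""
    if source == target:
        return 0.0
    # Assumption: any pair missing from the table cannot be converted.
    return CONVERSION_COST.get((source, target), math.inf)


def group_cost(sent_format, reader_formats):
    """Total cost to a group when a document is sent in sent_format
    and each member can read only the format listed for them."""
    return sum(conversion_cost(sent_format, f) for f in reader_formats)


if __name__ == "__main__":
    readers = ["word", "word", "framemaker"]
    print(group_cost("framemaker", readers))
```

With a table like this, a sender could compare candidate formats by the total cost each one imposes on the group; a single member with an infinite cost makes the whole choice unacceptable.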
Here are some possible interpretations of the word standard in this context:
Category | Description | Examples |
1. Adopted | A file format that has been studied and adopted by a competent official standards body or professional institute (e.g. ISO, BSI, IEEE). | ASCII, HTML, GIF, JPEG, MPEG |
2. Proposed | A file format that has been proposed by a company, organization or group for the purpose of data interchange. The specification has been published and is freely available for other companies to build into software. | PostScript (.ps) file, QuickTime (multimedia) file |
3. Proprietary | A file format that is proposed by a company or organization for the purpose of data interchange, not published, but supported by software that is available for most known computers and operating systems. | Adobe Acrobat (.pdf) file, Zip archive file, StuffIt archive file |
4. Non-standard | A file format that is proposed by a company for the purpose of data interchange, not published, supported by software on only some computers and operating systems. | Microsoft Word (.doc) file, Adobe Photoshop file |
There is a further requirement that should be added to all of the categories before they are considered for use: the format should be in widespread use. We should never adopt a standard unless we know that others have adopted it, or are likely to do so.
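The four categories in the table, together with the widespread-use requirement, can be read as a simple decision procedure. The sketch below is one possible reading, not a definitive rule: the `FileFormat` record, its field names and the two functions are hypothetical, and the flags given to the sample formats simply follow the table as it stands in this note.

```python
from dataclasses import dataclass


@dataclass
class FileFormat:
    """Hypothetical record of the properties the table distinguishes."""
    name: str
    adopted_by_standards_body: bool
    spec_published: bool
    software_on_most_platforms: bool
    in_widespread_use: bool


def category(fmt):
    """Return the table category (1 to 4) that best matches fmt."""
    if fmt.adopted_by_standards_body:
        return "Adopted"
    if fmt.spec_published:
        return "Proposed"
    if fmt.software_on_most_platforms:
        return "Proprietary"
    return "Non-standard"


def worth_considering(fmt):
    """The further requirement: whatever its category, a format should
    be considered only if it is in widespread use."""
    return fmt.in_widespread_use


if __name__ == "__main__":
    # Flags follow the table in this note; they are not independent facts.
    samples = [
        FileFormat("ASCII", True, True, True, True),
        FileFormat("PostScript", False, True, True, True),
        FileFormat("Microsoft Word (.doc)", False, False, False, True),
    ]
    for fmt in samples:
        print(fmt.name, category(fmt), worth_considering(fmt))
```

The checks are ordered from the strongest property to the weakest, so a format lands in the highest category it qualifies for.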
Formats whose specifications have not been freely published are referred to as proprietary. Proprietary formats are controlled by the owners of their specifications. This may be beneficial during the early stages of development of a standard - a single company or organisation is often more responsive to the need for change than a standards committee - but once the technical features of the standard are stabilised, its publication offers the greatest benefit to the wider community of software developers and users. Thus, for example, neither of the formats given as examples for category 4 is available to Unix users, whereas the formats in categories 1 and 2 are available on virtually all platforms because their published specifications have been implemented by a wider range of software developers.
Perhaps surprisingly, many companies in the software industry have been willing to publish the specifications of their formats and interfaces. This is not always as altruistic as it seems. The publication of a format offers the best chance for its universal or widespread adoption as a standard. Systems whose interfaces have been published in a form that enables developers to extend them are often referred to as open systems. By contrast, closed systems are those whose interfaces are not accessible to third party developers and hence can only be extended by the company that originally developed them.
We consider a "network provider" to be a provider of interchange utilities - in other words, someone who publishes a specification so that a document can be received by utilities other than the one in which it was authored.
A user is someone who uses a particular utility. The set of users try to maximise their utility, while the set of providers try to maximise their revenue. We might consider the cost of excluding a person (e.g. sending a Word document in binary form to a LaTeX user with ASCII-only mail) to be infinite, but this is not a reasonable analysis - a more realistic cost is that the receiver needs to lease or buy a system to carry out the conversion (or, in extremis, retrain).
The revenue accrued by selling the interchange facility (being a provider) is of course offset in the real world by the revenue from selling the base system - if a provider can cause everyone to use their base document system, then the cost of the interchange utility is zero since it is a null service.
But let's assume a non-monopolistic scenario... (or, if there is a monopoly in the provision of the base system, then there should be a regulator!)
TBA...
Comments and debate welcome. Mail to:
George.Coulouris@dcs.qmw.ac.uk
11.3.98
J.Crowcroft@cs.cs.ac.uk 11.3.98