Framework For Reliable Multicast Application Design

Zheng Wang and Jon Crowcroft
Department of Computer Science
University College London
Gower Street, London, UK

Christophe Diot INRIA Sophia Antipolis, 2004 route des Lucioles BP93 06902, France

Atanu Ghosh School of Computing Sciences University of Technology Sydney PO Box 123, Broadway NSW 2007, Australia

1. Introduction

The Internet Multicast Backbone, or MBONE, first established in early 1992, has grown exponentially from 40 sites to around 2000 now. Although many applications, such as video/audio conferencing tools, have been developed for use over the MBONE, few of them require reliable delivery of data.

In recent years, there has been a growing interest in reliable multicast. Some applications, while they can benefit from MBONE's one-to-many or many-to-many communications, also require reliable data delivery. Examples of such applications include software distribution, mailing list distribution, interactive distributed simulation, networked games and collaborative tools. a number of reliable multicast applications and systems have been proposed and built recently, and most of them have been designed for very specific application requirements. An excellent collection of reliable multicast related links can be found at [1]

It is widely recognized that reliable multicast is considerably more complex than reliable unicast. In particular, it is difficult to build a generic reliable transport protocol for muitlcast, much as TCP is a generic transport protocol for unicast. Reliable multicast is a case where "one size fits all" does not work at all. Applications often have very different reliability and latency requirements, state management styles, error recovery and group management mechanisms. A reliable multicast transport protocol that meets the worst-case requirements is unlikely to be efficient and scalable for many application requirements.

Instead of trying to design one generic reliable multicast protocol for all applications, we propose to have a common framework for building a family of different reliable multicast applications. The framework should provide a structure for reliable multicast with which different applications can share common functionality and code yet have the flexibility of using application-specific mechanisms. We envisage a reliable multicast application design kit with various components, libraries and reference implementation as an alternative to a generic transport protocol and API for application design.

2. Issues in Reliable Multicast

In this section we examine a number of important issues in reliable multicast.

Application Classification

To better understand the requirements of reliable multicast applications, we roughly group them into two main classes. One class is multipoint-to-multpoint interactive applications. This class of applications often have direct human involvement and have stringent latency requirements due to their interactive nature. Examples of this class are interactive distributed simulation, networked games, collaborative applications.

The other class is point-to-multipoint data distribution applications. Such applications deliver data from a source to multiple receivers, and usually run without much human intervention. This class includes software distribution, file/data distribution, mailing list delivery and news/TV broadcast.

Note that some applications may appear to multipoint-to-multipoint, but each point-to-multipoint tree can proceed independently. In such cases, we consider the application as point-to-multipoint. An application is regarded as multipoint-to-multipoint only when multiple senders are participating in a single application.

Reliability Model

For unicast, the reliability model is simple. There are only two parties involved, therefore the communication between them fails if either of the two parties fails. For multicast, however, it becomes less straightforward. Take point-to-multiple for example, when one receiver is cut off because of link failure or congestion, other receivers may get all data fine. There are then a few options. The source can proceed ahead with new data or wait until the connection to the troubled receiver is recovered. For a large multicast tree, it is likely that at any time there are some receivers experiencing some problems. Dealing with such situations may be the norm rather than the exception.

Reliability is concerned with the eventual delivery of data. However, many applications also have a latency requirement, i.e. how long it should take to deliver the data. For many applications, the data is only useful when delivered before the deadline. There is clearly a conflict between the reliability and latency requirements, and one issue which an application must deal with is the actions it should take when the data can not delivered reliably within the delay requirement. To reflect this relationship, we introduce a new parameter Reliability Interval (RI) - the time interval within which reliable delivery is useful - as a way to describe the reliability and latency requirements. In essence, RI reflects the connection between reliability and latency, and also defines the policy for dealing with loss recovery. For example, A large RI would indicate that reliability has priority over latency, and a small RI would indicate the reverse. So, a mailing list distribution application may have a RI of 7 days while a multi-user talk tool would have a RI of 15 minutes. The RI concept may also be applied to applications traditionally not regarded as reliable multicast applications such as realtime audio/video tools. Take an audio tool for example. It simply has a very small RI (e.g. 500ms). Over local but lossy connections, such as radio links, even audio tools could do retransmission within some local scope. With RI, application's reliability requirement becomes a spectrum rather than simly reliable or unreliable.

State Control

Reliable multicast applications are often classified as sender-based or receiver-based, depending on which end is responsible for tracking the current state of the connection and controlling retransmission for lost packets.

For unicast both sender-based and receiver-based approaches works well, and the difference between the two is relatively small. For multicast, however, the two approaches have significant impacts on the application design.

Since there are multiple receivers in multicast, a sender-based approach needs to have one control block per receiver. The sender also has to find the proper control block and update the state for each acknowledgement from the receivers. In contrast, in a receiver-based system, the retransmission task is distributed to all receivers, and the sender keeps no state about each receivers but simply replies to requests for data. From this perspective, a receiver-based system scales much better than a sender-based one.

The state management also determines the flow of information between the senders and receivers. In a sender-based system, the senders need positive acks from individual receivers to update the per-receiver state. A receiver-based system may need to poll the receivers when the senders are about to remove the data from their buffer.

There are also other implications on group management, feedback suppresion and loss recovery mechansisms which we discuss in the next three sections.

Group Management

A reliable multicast application may have explicit or implicit group management. Many multicast applications use implicit group management, where the sources simply send to a multicast address, and receivers join the same address to receive data. In this approach, the complete receiver set is unknown to either senders or receivers. This apporach fits well with the way IP multicast works, and receiver-based state management where the the senders do not need to keep state for individual receivers. The implicit group management also removes from the senders the burden of tracking the join-in and sign-off of receivers.

Some applications may require explicit control of the group memebership. In such case, the group mmebership may need to be fxied throughout the session or a group management protocol may be needed to handle reliable join-in and sign-off of memebers.

Feedback Suppression

One of the major problems in multicast applications is feedback implosion. Feedback implosion occurs when multiple receivers respond to a single event such as a packet loss or a poll. A number of feedback supression mechamisms have been proposed to deal with the problem. One approach that has been used in SRM [2] and many other applications uses random delay and mutlicast to suppress feedback implosion. Each receiver delays the transmission of its feedback packets for a random period that is related to the one-way delay from the receiver to the source. When a receiver's delay timer expires, the feedback packet is sent to the group by multicast to suppress feedback from other receivers.

Other mechanisms involve selection of a set of representatives from the receivers. For example, a probabilistic polling mechanism is used in [3] to solicit feedback. In [4], a selection protocol is propsoed to choose a representative set. Another approach is to establish a hierarchical structure so that feedback may be aggregated at each hierarchy [5].

For feedback supression mechanisms to work, the feedback must not be specific to the receiver which sends the feedback. An example of such feedback is a negative acknowledge, or NAK which informs the sender about the lost packets. When a receiver sends a NAK by multicast, all other receivers will see the NAK and do not need send a NAK if they have lost the same packets. Positive acknowledgement, or ACK, usually have receiver-specific information, thus most of the mechanisms above do not work. The hierarchical approach may be able to aggreagte multiple ACKs into one at each level. As sender-based state management requires ACKs from individual receivers, this is another reason why it does not scales well.

Loss Recovery

Loss recoery in reliable multicast may be done in a number of ways. The sender can simply retransmit the lost packets by multicast to the group. The disadvantage is that all receivers, no matter whether they lost the packets or not, get the retransmitted packets again. A possible solution is to set up another group specifically for retransmission, and receivers should join the retransmission group only when they detect packet losses.

Another apporach is local repair, where receivers may also carry out retransmission. It takes advantage of the scoping in IP multicast. When a receiver detects packet losses, it first sends a retransmission request by multicast with a small scope, hoping that other receivers nearby may have the lost packets and will retransmit them. If unsuccessful, the retransmission request may be sent again with a larger scope.

There is also some discussion of subcast where a sender can transmit a packet to some point in the multicast tree and the packet is then sent down the tree by multicast [5,6]. When the losses are caused by congestion by a number of points in the multicast trees, the sender or local nodes may simply subcast lost packets to the congested nodes.

An alternative to the ARQ is to use Forward Error Correction (FEC) [7,8]. FEC is particularly effective for multicast error recovery. The losses at different receivers are often not correlated, thus the sender has to retransmit the superset of all lost packets. With FEC, the same redundant data can be used by different receivers to reconstruct different lost packets. The FEC approach can be combined with traditional ARQ so that redundant data needs to be sent only when there are packet losses.

Congestion Control

Congestion control is one of the most difficult issues in multicast. The point-to-multipoint nature of multicast can potentially cause wide spread congestion with just a few sources. The heterogeneity of multiple tree make the problem extremely difficult to solve.

Ideally reliable multicast applications should perform congestion control similar to TCP so that they can be TCP-friendly. However, if the sender reduces its window every time it detects a packet loss, as TCP does, the transmission rate will be determined by the slowest receiver. Or even worse, since there are always uncorrelated packet losses on different parts of the tree, the transmission rate will eventually be reduced to at the the minumum, i.e., one or two packet per round trip time. Therefore, one possible apporach is to simply transmit at the rate of one or two packets per round trip time, and open the window very slow only when none of the receivers report losses.

Another approach which has been suggested is to have multiple multicast groups and the sender then transmits at different rates to multiple groups. Receivers can choose to sign on the groups based on the congestion they experience [9].

Give-Up Policy

There is an issue which is fairly unique to reliable multicast - how to deal with "bad" receivers. In a large multicast group, it is evitable that some receivers may be cut off as a result of congestion, network failure or end system crush. At some point, the sender must decide whether to abandon them. For example, if a receiver has experience heavy losses (say 80% loss rate) for prolonged periods, the sender may decide not to respond to retransmission requests from that receiver, or even better for that receiver to stop asking for retransmission. The sender may single out those receivers continuously falling behind and treat them separately.

3. Reliable Multicast Framing Protocol

In this section, we describe a framing protocol for reliable multicast applications - Reliable Multicast Framing Protocol (RMFP). The purpose of RMFP is to provides a common framework upon which a set of reliable multicast applications can be built and share common functionalities where exist.

RMFP follows the principles of application level framing and integrated layer processing []. It is designed to be integrated into an application rather than being implemented as a separater layer. RMFP includes extension mechanisms to allow modifications and additions required by individual applications. A complete specification of RMTP for a particular application will require a profile document which specifies extensions and modifications for the application.

The philosophy of RMFP is in many respects similar to the one of RTP. However, we believe that using RTP for reliable multicast is not a right approach and will not lead to a clean application design.

RMFP Data Packet Format

RMFT data packet has the following header format:

 
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P|R|F|S|E|X| PAYLOAD TYPE  |             LENGTH            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SOURCE ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        SEQUENCE NUMBER        |           OBJECT ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|                             OFFSET                            |
|                                                               |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  PROFILE-SPECIFIC EXTENSIONS                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The first 3 bits are used for version number and padding. The next 5 bits indicate the nature of the data being carried in the packet. The R bit indicates that the data is being retransmited. The F bit is set when Forward Error Correction (FEC) is used but the exact format of FEC is determined by the payload type and its profile. The S/E bits indicate that the packet contains the start/end of a data obejct. The Exceptional Handling (X) bit set marks the packet for special handling. The packet may be processed in a unusual way, for example, overriding the reliablie requirements. The use of the bit is determined by the payload type and its profile.

The Payload Type field identifies the format of the payload and determines its intepretation of profile-specific fields. A profile will be defined for each payload type. The Length field identifies the lenght of the packet in 32 bits minus one, including the header and any padding. It can be used to combine several RMFP packets into one UDP packet. When several RMFP packets (data packets + retransmission packets) are concatenated within one UDP packet, the RMFP data packets (R=0) should all be placed at the beginning of the UDP packet so that receivers that do not encounter losses can just drop the tail of the retransmitted packets without processing it.

The Source ID field identifies the source. It is generated randomly similar to the SSRC field in RTP. The Sequence Number is a packet count. It increments it by one for each data packet sent. It can be used to detect packet losses (including both data packet and retransmitted packets), and calculate loss rates.

The Object ID field identifies the object carried in the packet. Object ID provides a handler to the internal structure of the data, and it can be used to multiplex several objects into a single session. For example, when a sender transmits multiple HTML files to a group of receivers, the receivers can use the Object ID to find out which object a packet belongs to and to start processing immediately. Without the Object ID, the receivers may have to wait until all previous packet are received in order. For some applications that do not have multiple objects in one session such as "mcast talk", the Object ID may be used for some logical division. For example, a multicast talk may divide the session into a number of intervals. When a new member joins in, it will only get the historical data in the previous one (or several) intervals. For small objects (e.g., one per packet), one may use the Object ID for retransmission request instead of byte range.

The Offset field identifies the byte position of the data relative to the first byte of the session. A sender may not send data in the order that an application wants to receive. For example, a file transfer program may divide a file into blocks and then send the blocks in a random order. The offset field can be used to assemble the received data into right order.

The fixed data header covers most of the functions in common across all application. The Profile-Specific Extensions is a variable size field. However, profile-specific modications and extensions can be defined in the profile for each payload type.

RMFP Control Packet Format

RMFP control packets include sender's report packets and receiver's report packets.

Sender's Report Packet

Sender's report is sent periodically by the sender about the data transmitted in the session.

 
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P| SR TYPE | PAYLOAD TYPE  |             LENGTH            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SOURCE ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        OBJECT ID              |    HIGHEST SEQUENCE NUMBER    |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  PROFILE-SPECIFIC EXTENSIONS                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The SR Type field determines the intepretation of the profile-specific extensions. The Object ID here is is associated with the sequence number feild. The Highest Sequence Number field indicates the data sent at the time the report is sent.

Receiver's Report Packet

Receiver's report is periodically sent by the receivers to give feedback on congestion and packet losses.

 
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P| RR TYPE | PAYLOAD TYPE  |           LENGTH              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SOURCE ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FRACTION LOST |   RESERVED    |    HIGHEST SEQUENCE NUMBER    |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  PROFILE-SPECIFIC EXTENSIONS                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The RR Type field determines the intepretation of the profile-specific extensions. The Fraction Lost field is the fraction of packets lost since last Sender's report, expressed as a fixed point number with the binary point at the left edge of the field. Fraction lost is the loss rate seen by the receiver. The information may be used for congestion control, error recovery purpose by the sender. The Highest Sequence Number field indicates the highest sequence number received so far.

4. Future Work

We intend to implement the reliable framing protocol and develop a reliable multicast application development kit. The development kit is to facilitate reliable multicast application design with re-usable modules and reference software. An initial kit will include the following modules

RMFP Data module
RMFP Control module
RMFP Packet module
RMFP Profile module
State Control module
Group Management module
Ack/Nak module
Loss module
Congestion Control module
We plan to use the development kit to implement a number of popular applications such as mtalk, multicast slide show, bulk multipoint file transfer, and multicast mail delivery.

5. References

1. T. Montgomery, Reliable Broadcast/Multicast Website, 1997

2. S. Floyd, V. Jacobson, S. McCanne, C. G. Liu, and L.Zhang, "A reliable multicast framework for light-weight sessions and application-level framing", ACM SIGCOMM'95, Oct 1995.

3. T. Turletti, J. C. Bolot, I. Wakeman, "Scalable feedback control for multicast video distribution in the Internet", ACM SIGCOMM'94, Aug 1994.

4. D. Delucia and K. Obraczka, "Multicast Feedback Supression using Representatives", Preprint, 1997

5. J. Lin, and S. Paul, "RMTP: A Reliable Multicast Transport Protocol", IEEE INFOCOM'96, March 1996.

6. C. Papadopoulos, S. McCanne, messages to rm mailing list, 1997.

7. J. Nonnenmacher, E.W. Biersack, "Reliable Multicast: Where to use Forward Error Correction", Proc. 5th. Workshop on Protocols for High Speed Networks, Oct. 1996.

8. L. Rizzo, "Effective erasure codes for reliable computer communications protocols, DEIT Technical" Report LR-970115, Source available.

9. L. Vicisano, "Notes on a cumulative layered organization of data packets across multiple streams with different rates", draft, 1997.