Request for Comments: 713 Jack Haverty (JFH@MIT-DMS) NIC #34739 Apr 1976 I. ABSTRACT A mechanism is defined for use by message servers in transferring data between hosts. The mechanism, called the MSDTP, is defined in terms of a model of the process as a translation between two sets of items, the abstract entities such as 'strings' and 'integers', and the formats used to represent such data as a byte stream. A proposed organization of a general data transfer mechanism is described, and the manner in which the MSDTP would be used in that environment is presented. -1- II. REFERENCES Black, Edward H., "The DMS Message Composer", MIT Project MAC, Programming Technology Division Document SYS.16.02. Burchfiel, Jerry D., Leavitt, Elsie M., Shapiro, Sonya and Strollo, Theodore R., compilers, "Tenex Users' Guide", Bolt Beranek and Newman, Cambridge, Mass., May 1971, revised January 1975, Descriptive sections on the TENEX subsystems: MAlLER, p. 116-11; MAlLSTAT, p. 118-119; READMAIL, p. 137; and SNDMSG, p. 165-170. Haverty, Jack, "Communications System Overview", MIT Project MAC, Programming Technology Division Document SYS.16.00. Haverty, Jack, "Communications System Daemon Manual", MIT Project MAC, Programming Technology Division Document SYS.16.01. ISI Information Automation Project, "Military Message Processing System Design," Internal Project Documentation (Out of Print), Jan. 1975 Message Services Committee, "Interim Report", Jan. 28, 1975 Mooers, Charlotte D., "Mailsys Message System: Manual For Users", Bolt Beranek and Newman, Cambridge, Mass., June 1975 (draft). Myer, Theodore H., "Notes On The BBN Mail System", Bolt Beranek and Newman, November 8, 1974. Myer, Theodore H., and Henderson, D. Austin, "Message Transmission Protocol", Network Working Group RFC 680, NIC 32116, April 30, 1975. Postel, Jon, "The PCPB8 Format", NSW Proposal, June 5, 1975 Tugender, R., and D. R. Oestreicher, "Basic Functional Capabilities for a Military Message Processing Service," ISI?RR-74-23., May 1975 Vezza, Al, "Message Services Committee Minority Report", Jan. 1975 -2- III. OVERVIEW This document describes a mechanism developed for use by message servers communicating over an eight-bit byte-oriented network connection to move data structures and associated data-typing information. It is presented here in the hope that it may be of use to other projects which need to transfer data structures between dissimilar hosts. A set of abstract entities called PRIMITIVE ITEMS is enumerated. These are intended to include traditional data types of general utility, such as integers, strings, and arrays. A mechanism is defined for augmenting the set of abstract data entities handled, to allow the introduction of application-specific data, whose format and semantics are understood by the application programs involved, but which can be transmitted using common coding facilities. An example might be a data structure called a 'file specification', or a 'date'. Abstract data entities defined using this mechanism will be termed SEMANTIC ITEMS, since they are typically used to carry data having semantic content in the application involved. Semantic and primitive items are collectively referred to simply as ITEMS. The protocol next involves the definition of the format of the byte stream used to convey items from machine to machine. These encodings are described in terms of OBJECTS, which are the physical byte streams transmitted. To complete the protocol, the rules for translating between objects and items are presented as each object is defined. An item is transmitted by being translated into an object which is transmitted over the connection as a stream of bytes to the receiver, and reconstructed there as an item. The protocol mechanism may thus be viewed as a simple translator. It enumerates a set of abstract entities, the items, which are known to programmers, a set of entities in byte-stream format, the objects, and the translation rules for conversion between the sets. A site implementing the MSDTP would typically provide a facility to convert between objects and the local representation of the various items handled. Applications using the MSDTP define their interactions using items, without regard to the actual formats in which such items are represented at various machines. This permits programs to handle higher-level concepts such as a character string, without concern for its numerous representational formats. Such detail is handled by the MSDTP. -3- Finally, a discussion of a general data transfer mechanism for communication between programs is presented, and the manner in which the particular byte-oriented protocol defined herein would be used in that environment is discussed. Terminology, as introduced, is defined and highlighted by capitalizing. IV. PRIMITIVE DATA ITEMS The primitive data items include a variety of traditional, well-understood types, such as integers and strings. Primitive data items will be presented using mnemonic names preceded by the character pair "p-", to serve as a reminder that the named object is primitive. These items may be represented in various computer systems in whatever fashion their programmers desire. IV.1 -- Set Of Primitive Items The set of primitive items defined includes p-INT, p-STRING, p-STRUC, p-BITS, p-CHAR, p-BOOL, p-EMPTY, and p-XTRA. Since the protocol was developed primarily for use in message services, items such as p-FLOAT are not included since they were unnecessary. Additional items may be easily added as necessary. A p-INT performs the traditional role of representing integer numbers. A p-BITS (BIT Stream) item represents a bit stream. The two possible p-BOOL (BOOLean) items are used to represent the logical values of *TRUE* and *FALSE*. The single p-EMPTY item is used to, for example, indicate that a given field of a message is empty. It is provided to act as a place-holder, representing 'no data', and appears as *EMPTY*. The p-STRUC (STRUCture) item is used to group together a collection of items as a single value, maintaining the ordering of the elements, such as a p-STRUC of p-INTs. A p-CHAR is a single character. The most common occurrence of character data, however, will be as p-STRINGs. A p-STRING should be considered to be a synonym for a p-STRUC containing only p-CHARs. This concept is important for generality and consistency, especially when considering definitions of permissible operations on structures, such as extracting subsequences of elements, etc. -4- Four p-XTRA items, which can be transmitted in a single byte, are made available for higher level protocols to use when a frequently used datum is handled which can be represented just by its name. An example would be an acknowledgment between two servers. Using p-XTRAs to represent such data permits them to be handled in a single byte. There are four possible p-XTRA items, termed *XTRA0*, *XTRA1*, *XTRA2*, and *XTRA3*. These may be assigned meanings by user protocols as desired. IV.2 -- Printing Conventions The following printing conventions are introduced to facilitate discussion of the primitive items. When a specific instance of a primitive data item is presented, it will be shown in a traditional representation for that kind of data. For example, p-INTs are shown as sequences of digits, e.g. 100, p-STRINGs, as sequences of characters enclosed in double-quote characters, for example "ABCDEF". As shown above, the two possible p-BOOL items are shown as *TRUE* or *FALSE*. The object p-EMPTY appears as *EMPTY*. A bit stream, i.e. p-BITS, appears as a stream of 1s and 0s enclosed in asterisks, for example *100101001*. A p-CHAR will be presented as the character enclosed in single quote characters, e.g., 'A'. P-STRUCs are printed as the representations of their elements, enclosed in parentheses, for example (1 2 3 4) or ("XYZ" "ABC" 1 2) or ((1 2 3) "A" "B"). Note that because p-STRINGs are simply a class of p-STRUCs assigned a special name and printing format for brevity and convenience, the items "ABC" and ('A' 'B' 'C') are identical, and the latter format should not be used. To present a generic p-STRUC, as in specifying formats of the contents of something, the items are presented as a mnemonic name, optionally followed by a colon and the permissible types of values for that datum. When one of several items may appear as the value for some component, the permissible ones appear separated by vertical-bar characters. For example, p-INT|p-STRING represents a single item, which may be either a p-INT or a p-STRING. To represent a succession of items, the Kleene star convention is used. The specification p-INT[*] represents any number of p-INTs. Similarly, p-INT[3,5] represents from 3 to 5 p-INTs, while p-INT[*,5] specifies up to 5 and p-iNT[5,*] specifies at least 5 p-INTs. -5- For example, a p-STRUC which is used to carry names and numbers might be specified as follows. (name:p-STRING number:p-INT) In discussing items in general, when a specific data value is not intended, the name and types representation may be used, e.g., offset:p-INT to discuss an 'offset' which has a numeric value. V. SEMANTIC ITEM MECHANISM The semantic item mechanism provides a means for program designers to use a variety of application-specific data items. This mechanism is implemented using a special tagged structure to carry the data type information as well as the actual components of the particular semantic item. For discussion purposes. Such a special p-STRUC will be termed a p-EDT (Extended Data Type). When p-EDTs are transferred, their identity as a p-EDT is maintained. So that an applications program receives the corresponding semantic item instead of a simple p-STRUC. A p-EDT is identical to a p-STRUC in all other respects. V.1 -- Format of p-EDTs A prototypical p-EDT follows. It is printed as if it were a normal p-STRUC. Since p-EDTs are converted to semantic items for presentation to the user, a p-EDT will never be used except in this protocol definition. (type:p-INT|p-STRING version:p-INT com1:any com2:any ...) The first element, the 'type' is generally a p-INT, and is used to identify the particular type of semantic item. Types are assigned numeric codes in a controlled fashion. The type may alternatively be specified by a p-STRING, to permit development of new data types for possible later assignment of codes. Each type has an equivalent p-STRING name. These may be used interchangeably as 'type' elements, primarily to maintain upward compatibility. The second element of a p-EDT is always an p-INT, the 'version', and specifies the exact format of the particular datum. A semantic item may undergo several revisions of its internal structure. Which would be evident through assigning different versions to each revision. -6- Successive components. The 'com' elements, if any. carry the actual data of the semantic item. As each semantic item is defined, conventions on permissible values and interpretation of these components are presented. Such definitions may use any types of items to specify the format of the semantic item. Use of lower level concepts, such as objects, in these definitions is prohibited. Semantic items will be printed as the mnemonic for the type involved, preceded by the character pair "s-", to signify that the data item is handled by this mechanism. V.2 -- Printing Conventions A semantic item is represented as if it were a p-STRUC containing only the components, if any, but preceded by the semantic type name and a # character. The version number is assumed to be 1 if unspecified. For later versions, the version number is attached to the type name, as in, for example, FILE-2 to represent version 2 of the FILE data type. For example, a semantic item called a 'file specification' might be defined, containing two components, a host number and pathname. A specific instance of such an item might appear as #FILE(69 "DIRECTORY.NAME-OF-FILE"), while a generic s-FILE might be presented as the following. #FILE(host:p-INT|p-STRING pathname:p-STRING) the item, which may be either a p-INT or p-STRING, and 'pathname' is the second component, which must be a p-STRING. The full definition would present interpretation rules for these components. VI. ENCODING OBJECTS This section presents the set of objects which are used to represent items as byte streams for inter-server transmission. Objects will be presented using mnemonic type-names preceded by the character pair "b-", indicating their existence only as byte streams. All servers are required to be capable of decoding the entire set of objects. Servers are not required to transmit certain objects which are available to improve channel efficiency. -7- The encodings are designed to facilitate programming and efficiency of the receiving decoder. In all cases, the type and length in bytes of objects is supplied as the first information sent. This characteristic is important for ease of implementation. The type information permits a decoder to be constructed in a modular fashion. The most important advantage of including size information is that the receiver always knows how many bytes it must read to discover what to do next, and knows when each object terminates. This requirement avoids many potential problems with timing and synchronization of processes. Two varieties of objects are defined. The first will be called ATOMIC, and includes objects used to efficiently encode the most common data. The second variety is termed NON-ATOMIC, and is used to encode larger or less common items. In all cases, a data object begins with a single byte, which will be termed the TYPE-BYTE, a field of which contains the type code of the object. The following bytes, if any, are interpreted according to the type involved. VI.1 -- Presentation Conventations In discussing formats of bytes, the following conventions will be employed. The individual bits of a byte will be referenced by using capital letters from A to H, where A signifies the highest order bit, and H the lowest. The entire eight bit value, for example, could be referred to as ABCDEFGH. Similarly, subfields of the byte will be identified by such sequences. The CDEF field specifies the middle four bits of a byte. In referring to values of fields, binary format will be used, and small letters near the end of the alphabet will be used to identify particular bits for discussion. For example, we might say that the BCD field of a byte contains a specifier for some type, and define its value to be BCD=11z. In discussions of the specifier usage, we could refer to the cases where z=l and where z=0, as shorthand notation to identify BCD=111 and BCD=110, respectively. V1.2 -- Type-Byte Bit Assignment To assist in understanding the assignment of the various type-byte values, the table and graph below are included, showing representations of the eight bits. -8- OXXXXXXX -- CHAR7 (CHARacter, 7 bit) 10XXXXXX -- SINTEGER (Small INTEGER) l10XXXXX -- NON-ATOM (NON-ATOMic objects) 11100XXX -- LINTEGER (Large INTEGER) 11101XXX -- reserved 11110XXX -- SBITSTR (Short BIT STReam) 111110XX -- XTRA (eXTRA single-byte objects) 1111110X -- BOOL (BOOLean) 11111110 -- EMPTY (EMPTY data item) 11111111 -- PADDING (unused byte) In each case, the bits identified by X's are used to contain information specific to the type involved. These are explained when each type is defined. An equivalent tree representation follows, for those who prefer it. start with high order bit | | | 0-----0-----0-----0-----0-----0-----0-----0-----X | | | | | | | | PADDING 0| 0| 0| 0| 0| 0| 0| 0| | | | | | | | | X | X | X | X X CHAR7 | NON-ATOM | BITS | BOOL EMPTY (7) | (5) | (3) | (1) | 0| | | SINTEGER | XTRA (6) | (2) LINTEGER (3) Type-Byte Bit Assignment Scheme This picture is interpreted by entering at the top, and taking the appropriate branch at each node to correspond to the next bit of the type-byte, as it is scanned from left to right. When a type is assigned, the branch terminates with an "X' and the name of the type of the object, with the number of remaining bits in parentheses. The individual object definitions specify how these bits are used for that particular type. V1.3 -- Atomic Objects Atomic objects are identified by specific patterns in a type-byte. Receiving servers must be capable of recognizing -9- and handling all atomic types, since the size of the object is not explicitly present in a uniform fashion. ================================ | Atomic Object: B-CHAR7 | ================================ The b-CHAR7 (CHARacter 7 bit) object is introduced to handle transmission of characters, in 7-bit ASCII format. Since the vast majority of message-related data involves such objects, they are designed to be very efficient in transmission. Other formats, such as eight bit values, can be introduced as non-atomic objects. The format of a b-CHAR7 follows: A=0 identifying the b-CHAR7 data type BCDEFGH=tuvwxyz containing the character code The tuvwxyz objects contain the ASCII code of the character. For example, transmission of a "space' (ASCII code 32, 40 octal) would be accomplished by the following byte. 00100000 ABCDEFGH A=0 to identify this byte as a b-CHAR7. The remaining bits contain the 7 bit code, octal 40, for space. A b-CHAR7 standing alone is presented as a p-CHAR. Such occurrences will probably be rare if they are used at all. The most common use of b-CHAR7's is as elements of b-USTRUCs used to transmit p-STRINGS, as explained later. ============================= | Atomic Object: B-SINTEGER | ============================= The b-SINTEGER (Small INTEGER) object is used to transmit very small positive integers, of values up to 64. It always translates to an p-INT, and any p-INT between 0 and 63 may be encoded as a b-SINTEGER for transmission. The format of an b-SINTEGER follows. AB=10 identifying the object as a b-SINTEGER CDEFGH=uvwxyz containing the actual number For example, to transmit the integer 10 (12 octal), the following byte would be transmitted: 10001010 ABCDEFGH -10- AB=10 to specify a b-SINTEGER. The remaining six bits contain the number 10 expressed in binary. ============================= | Atomic Object: B-SINTEGER | ============================= The b-SINTEGER (Large INTEGER) object is used to transmit p-INTs to any precision up to 64 bits. It is always translated as a p-INT. Sending servers are permitted to choose either b-SINTEGER or b-SINTEGER format for transmission of numbers, as appropriate. When possible, b-SINTEGERs can be used for better channel efficiency. The format of a b-SINTEGER follows: ABCDE=11100 specifying that this is a b-SINTEGER. FGH=xyz containing a count of number of bytes to follow. The xyz bits are interpreted as a number of bytes to follow which contain the actual binary code of the the integer in 2's complement format. Since a zero-byte integer is disallowed, the pattern xyz=000 is interpreted as 1000, specifying that 8 bytes follow. The number is transmitted with high-order bits first. This format permits transmission of integers as large as 64 bits in magnitude. For example, if the number 4096 (10000 octal) is to be transmitted, the following sequence of bytes would be sent: 11100010 00010000 00000000 ABCDEFGH ---actual data--- ABCDE=11100, identifying this as a b-LINTEGER, E=0, specifying a positive number, and FGH=010, specifying that 2 bytes follow, containing the actual binary number. ============================ | Atomic Object: B-SBITSTR | ============================ The b-SBITSTR (Short BIT STReam) object is used to transmit a p-BITS of length 63 or less. For longer bit streams, the non-atomic object b-LBITSTR may be used. The format of a b-SBITSTR follows. ABCDE=11110 specifying the type as b-SBITSTR FGH=xyz specifying the number of bytes following. -11- The xyz value specifies the number of additional bytes to be read to obtain the bit stream values. As in the case of b-SINTEGER, the value xyz=000 is interpreted as 1000, specifying that 8 bytes follow. To avoid requiring specification of exactly the number of bits contained, the following convention is used. The first data byte is scanned from left to right until the first 1 bit is encountered. The bit stream is defined to begin with the immediately following bit, and run through the last bit of the last byte read. In other words, the bit stream is 'right-adjusted' in the collected bytes, with its left end delimited by the first "on' bit. For example, to send the bit stream *001010011* (9 bits), the following bytes are transmitted. 11110010 00000010 01010011 ABCDEhij klmnopqr stuvwxyz The hij=010 value specifies that two bytes follow. The q bit, which is the first 1 bit encountered, identifies the start of the bit stream as being the r bit. The rstuvwxyz bits are the bit stream being handled. ========================= | Atomic Object: b-BOOL | ========================= The b-BOOL (BOOLean) object is used to transmit p-BOOLs. The format of b-BOOL objects follows. ABCDEFG=1111110 specifying the type as b-BOOL H=z specifying the value The two possible translations of a b-BOOL are *FALSE* and *TRUE*. 11111100 represents *FALSE* 11111101 represents *TRUE* ABCDEFGz if z=0, the value is FALSE, otherwise TRUE. ======================================== | Atomic Object: B-EMPTY | ======================================== The b-EMPTY object type is used to transmit a 'null' object, i.e. an *EMPTY*. The format of an b-EMPTY follows. ABCDEFGH=11111110 specifying *EMPTY* -12- ========================= | Atomic Object: B-XTRA | ========================= The b-XTRA objects are used to carry the four possible p-XTRA items, i.e., *XTRA0*, *XTRA1*, *XTRA2*, and *XTRA3*. These four items correspond to the binary coding of the remaining two bits after the b-XTRA type code bits. The format of a b-XTRA follows. ABCDEF=111110 to specify the type b-XTRA GH=yz to identify the particular p-XTRA item carried The GH bits of the byte are decoded to produce a particular p-XTRA item, as follows. GH=00 -- *XTRA0* GH=01 -- *XTRA1* GH=10 -- *XTRA2* GH=11 -- *XTRA3* The b-XTRA object is included to provide the use of several single-byte data items to higher levels. These items may be assigned by individual applications to improve the efficiency of transmission of several very frequent data items. For example, the message services protocols will use these items to convey positive and negative acknowledgments, two very common items in every interaction. ======================================== | Atomic Object: B-PADDING ======================================== This object is anomalous, since it represents really no data at all. Whenever it is encountered in a byte stream in a position where a type-byte is expected, it is completely ignored, and the succeeding byte examined instead. Its purpose is to serve as a filler in byte streams, providing servers with an aid in handling internal problems related to their specific word lengths, etc. The encoders may freely use this object to serve as padding when necessary. All b-PADDING data objects exist only within an encoded byte stream. They never cause any data item whatsoever to be presented externally to the coder module. The format of a b-PADDING follows. ABCDEFGH=11111111 Note that this does not imply that all such 'null' bytes in a stream are to be ignored, since they could be encountered as a byte within some other type, such as b-LINTEGER. Only bytes of this format which, by their position in the stream, appear as a 'type' byte are to be ignored. -13- VI.4 -- Non-Atomic Objects Non-atomic objects are are always transmitted preceded by both a single type byte and some small number of size byte(s). The type byte identifies that the data object concerned is of a non-atomic type, as well as uniquely specifying the particular type involved. All non-atomic objects have type byte values of the following form. ABC=110 specifying that the object is non-atomic DEFGH=vwxyz specifying the particular type of object The vwxyz value is used to specify one of 31 possible non-atomic types. The value vwxyz=00000 is reserved for use in future expansion. In all non-atomic data objects, the byte(s) following the type-byte specify the number of bytes to follow which contain the data object. In all cases, if the number of bytes specified are processed, the next byte to be seen should be another type-byte, the beginning of the next object in the stream. The number of bytes containing the object size information is variable. These bytes will be termed the SIZE-BYTES. The first byte encountered has the following format. A=s specifying the manner in which the size information is encoded BCDEFGH=tuvwxyz specifying the size, or number of bytes containing the size The tuvwxyz values supply a positive binary number. If the s value is a one, the tuvwxyz value specifies the number of bytes to follow which should be read and concatenated as a binary number, which will then specify the size of the object. These bytes will appear with high order bits first. Thus, if s=1, up to 128 bytes may follow, containing the count of the succeeding data bytes, which should certainly be sufficient. Since many non-atomic objects will be fairly short, the s=0 condition is used to indicate that the 7 bits contained in tuvwxyz specify the actual data byte count. This permits objects of sizes up to 128 bytes to be specified using one size-information byte. The case tuvwxyz=0000000 is interpreted as specifying 128 bytes. For example, a data object of some non-atomic type which requires 100 (144 octal) bytes to be transmitted would be sent as follows. -14- 110XXXXX -- identifying a specific non-atomic object 01100100 -- specifying that 100 bytes follow . . data -- the 100 data bytes . . Note that the size count does not include the size-specifier byte(s) themselves, but does include all succeeding bytes in the stream used to encode the object. A data object requiring 20000 (47040 octal) bytes would appear in the stream as follows. 110XXXXX -- identifying a specific non-atomic object 10000010 -- specifying that the next 2 bytes contain the stream length 01001110 -- first byte of number 20000 00100000 -- second byte . . data -- 20,000 bytes . . Interpretation of the contents of the 20000 bytes in the stream can be performed by a module which knows the specific format of the non-atomic type specified by DEFGH in the type-byte. The remainder of this section defines an initial set of non-atomic types, the format of their encoding, and the semantics of their interpretation. ================================ | Non-atomic Object: B-LBITSTR | ================================ The b-LBITSTR (Long BIT Stream) data type is introduced to transmit p-BITS which cannot be handled by a b-SBITSTR. A b-LBITSTR may be used to transmit short p-BITS as well. Its format follows. -15- 11000001 size-bytes data-bytes ABCDEFGH ABC=110 identifies this as a non-atomic object. DEFGH=00001 specifies that it is a b-LBITSTR. The standard sizing information specifies the number of succeeding bytes. Within the data-bytes, the first object encountered must decode to a p-INT. This number conveys the length of the bit stream to follow. The actual bit stream begins with the next byte, and is left-adjusted in the byte stream. For example to encode *101010101010*, the following b-LBITSTR could be used, although a b-SBITSTR would be more compact. 11000001 -- identifies a b-LBITSTR 00000010 -- b-SINTEGER, to specify length 10001100 -- size = 2 10101010 -- first 8 data bits 10100000 -- last 4 data bits ============================== | Non-atomic Object: B-STRUC | ============================== The b-STRUC (STRUCture) data type is used to transmit any p-STRUC. The translation rules for converting a b-STRUC into a primitive item are presented following the discussion of b-REPEATs. The b-STRUC format appears as follows. 11000010 size-bytes data-bytes ABCDEFGH ABC=110 identifies this as a non-atomic type. DEFGH=00010 specifies that the object is a b-STRUC. Within the data-bytes stream, objects simply follow in order. This implies that the b-STRUC encoder and decoder modules can simply make use of recursive calls to a standard encoder/decoder for processing each element of the b-STRUC. Note that any type of object is permitted as an element of a b-STRUC, but the size information of the b-STRUC must include all bytes used to represent the elements. Containment of b-STRUCs within other b-STRUCs is permitted to any reasonable level. That is, a b-STRUC may contain as an element another b-STRUC, which contains another b-STRUC, and so on. All servers are requires to handle such containment to at least a minimum depth of three. Examples of encoded structures appear in a later section. -16- ============================ | Non-atomic Object: B-EDT | ============================ A b-EDT is the object used as the carrier for p-EDTs in transmission of semantic items. It is functionally identical to a b-STRUC, but has a different type code to permit it to be identified and converted to a semantic item instead of a p-STRUC. The format of a b-EDT follows. 11000011 size-bytes data-bytes ABCDEFGH As with all non-atomic types, ABC=110 to identify this as such, and DEFGH=00011 to specify a b-EDT. The objects in the data-bytes are decoded as for b-STRUCs. However, the first object must decode to a p-iNT or p-STRING and the second to a p-INT, to conform to the format of p-EDTs. =============================== | Non-atomic Object: b-REPEAT | =============================== The b-REPEAT object is never translated directly into an item. It is legal only as an component of an enclosing b-STRUC, b-USTRUC, b-EDT, or b-REPEAT. A b-REPEAT is used to concisely specify a set of elements to be treated as if they appeared in the enclosing structure in place of the b-REPEAT. This provides a mechanism for encoding a sequence of identical data items or patterns efficiently for transmission. A common example of this would be in transmission of text, where line images containing long sequences of spaces, or pages containing multiple carriage-return, line-feed pairs, are often encountered. Such sequences could be encoded as an appropriate b-REPEAT to compact the data for transmission. The format of a b-REPEAT is as follows. 11000100 -- identifyIng the object as a b-REPEAT size-bytes -- the standard non-atomic object size information countspec -- an object which translates to a p-INT . . data -- the objects which define the pattern . . The 'countspec' object must translate to an p-INT to specify the number of times that the following data pattern should be repeated in the object enclosing the b-REPEAT. -17- The remaining objects in the b-REPEAT constitute the data pattern which is to be repeated. The decoding of the enclosing structure will be continued as if the data pattern objects appeared 'countspec' times in place of the b-REPEAT. Zero repeat counts are permitted, for generality. They cause no objects to be simulated in the enclosing structure. An encoder does not have to use b-REPEATs at all, if simplicity of coding outweighs the benefits of data compression. In message services, for example, an encoder might limIt itself to only compressing long text strings. It is important for compatibility, however, to have the ability in the decoders to handle b-REPEATs. =============================== | Non-atomic Object: B-USTRUC | =============================== The b-USTRUC (Uniform Structure) object type is provided to enable servers to convey the fact that a p-STRUC being transferred contains items of only a single type. The most common example would involve a b-USTRUC which translates to a p-STRUC of only p-CHARs, and hence may be considered to be a p-STRING. Servers may use this information to assist them in decoding objects efficiently. No server is required to generate b-USTRUCs. The internal construction of a b-USTRUC is identical to that of a b-STRUC, except for the type-byte. The format of a b-USTRUC follows. 11000101 size-bytes data-bytes ABCDEFGH ABC=110 to identify a non-atomic object. DEFGH=00101 specifies the object as a b-USTRUC. =============================== | Non-atomic Object: B-STRING | =============================== The b-STRING object is included to permit explicit specification of a structure as a p-STRING. This information will permit receiving servers to process the incoming structure more efficiently. A b-STRING is formatted similarity to a b-USTRUC, except that its type-byte identifies the object as a b-STRI/NG. The normal sizing information is followed by a stream of bytes which are interpreted as b-CHAR7s, Ignoring the high-order bit. The format of a b-STRING follows. 11000110 size-bytes data-bytes ABCDEFGH ABC=110 to identify a non-atomic object. DEFGH=00110 specifies the object as a b-STRING. -18- VI.5 -- Structure Translation Rules A b-STRUC is translated into a p-STRUC. This is performed by translating each object of the b-STRUC Into its corresponding item, and saving it for inclusion In the p-STRUC being generated. A b-USTRUC is handled similarly, but the coding programs may utilize the information that the resultant p-STRUC will contain items of uniform type. The preferred method of coding p-STRINGS is to use b-USTRUCs. If all of the elements of the resultant p-STRUC are p-CHARs, it is presented to the user of the decoder as a p-STRING. A p-STRING should be considered to be a synonym for a p-STRUC containing only characters. It need not necessarily exist at particular sites which would present p-STRUCs of p-CHARs to their application programs The object b-REPEAT is handled in a special fashion when encountered as an element. When this occurs, the data pattern of the b-REPEAT is translated into a sequence of items, and that sequence is repeated in the next higher level as many times as specified in the b-REPEAT. Therefore, b-REPEATS are legal only as elements of a surrounding b-STRUC, b-USTRUC, b-EDT, or b-REPEAT. In encoding a p-STRUC or p-STRING for transmission, a translator may use b-REPEATs as desired to effect data compression, but their use is not mandatory. Similarly, b-STRINGS may be used, but are not mandatory. A b-EDT is translated into a p-EDT to identify it as a carrier for a semantic item. Otherwise, it is treated identically to a b-STRUC. VI.6 -- Translation Summary The following table summarizes the possible translations between primitive items and objects. p-INT <--> b-LINTEGER, b-SINTEGER p-STRING <--> b-STRING, b-STRUC, b-USTRUC p-STRUC <--> b-STRING, b-STRUC, b-USTRUC p-BITS <--> b=SBITSTR, b-LBITSTR p-CHAR <--> b-CHAR7 p-BOOL <--> b-BOOL p-EMPTY <--> b=EMPTY p-XTRA <--> b-XTRA p-EDT <--> b-EDT (all semantic items) -none- <--> b-PADDING -none- <--> b-REPEAT (only within structure) Note that all semantic items are represented as p-EDTs which always exist as b-EDTs in byte-stream format. -19- V1.7 -- Structure Coding Examples The following stream transmits a b-STRUC containing 3 b-SINTEGERs, with values 1, 2, and 3, representing a p-STRUC containing three p-INTs, i.e. (1 2 3). 11000010 -- b-STRUC 00000011 -- size=3 10000001 -- b-SINTEGER=1 10000010 -- b-SINTEGER=2 10000011 -- b-SINTEGER=3 The next example represents a b-STRUC containing the characters X and Y, followed by the b-LINTEGER 10, representing a p-STRUC of 2 p-CHARs and a p-INT, i.e., ('X' 'Y' 10). Note that the p-INT prevents considering this a p-STRING. 11000010 -- b-STRUC 00000100 -- size=4 01011000 -- b-CHAR7 'X' 01011001 -- b-CHAR7 'Y' 11100001 -- b-LINTEGER 00001010 -- 10 Note that a better way to send this p-STRUC would be to represent the integer as a b-SINTEGER, as shown below. 11000010 -- b-STRUC 00000011 -- size=3 01011000 -- b-CHAR7 'X' 01011001 -- b-CHAR7 'Y' 10001010 -- b-SINTEGER=10 The next example shows a b-STRUC of b-CHAR7s. It is the translation of the b-STRING "HELLO". 11000010 -- b-STRUC 00000101 -- size=5 01001000 -- b-CHAR7 'H' 01000101 -- b-CHAR7 'E' 01001100 -- b-CHAR7 'L' 01001100 -- b-CHAR7 'L' 01001111 -- b-CHAR7 'O' This datum could also be transmitted as a b-STRING. Note that the character bytes are not necessarily b-CHAR7s, since the high-order bit is ignored. 11000110 -- b-STRING 00000101 -- size=5 01001000 -- 'H' 01000101 -- 'E' 01001100 -- 'L' 01001100 -- 'L' 01001111 -- 'O' -20- To encode a p-STRING containing 20 carriage-return line-feed pairs, the following b-STRUC containing a b-REPEAT could be used. 11000010 -- b-STRUC 00000101 -- size=5 11000100 -- b-REPEAT 00000011 -- size=3 10010100 -- count, b-SINTEGER=20 00001101 -- b-CHAR7, "CR' 00001010 -- b-CHAR7, 'IF' To encode a p-STRUC of p-INTs, where the sequence contains a sequence of thirty 0's preceded by a single 1, the following b-STRUC could be used. 11000010 -- b-STRUC 00000110 -- size=6 10000001 -- b-SINTEGER=1 11000100 -- b-REPEAT 00000010 -- size=2 10011110 -- count, b-SINTEGER=30 10000000 -- b-SINTEGER=0 VII. A GENERAL DATA TRANSFER SCHEME This section considers a possible scheme for extending the concept of a data translator into an multi-purpose data transfer mechanism. The proposed environment would provide a set of primitive items, including those enumerated herein but extended as necessary to accommodate a variety of applications. Communication between processes would be defined solely in terms of these items, and would specifically avoid any consideration of the actual formats in which the data is transferred. A repertoire of translators would be provided, one of which is the MSDTP machinery, for use in converting items to any of a number of transmission formats. Borrowing a concept from radio terminology, each translator would be analogous to a different type of modulation scheme, to be used to transfer data through some communications medium. Such media could be an eight-bit byte-oriented connection, 36-bit connection, etc. and conceivably have other distinguishing features, such as bandwidth, cost, and delay. For each media which a site supports, it would provide its programmers with a module for performing the translations required. -21- Certain media or translators might not handle various items. For example, the MSDTP does not handle items which might be termed p-FLOATs, p-COMPLEXs, p-ARRAY, and so on. In addition, the efficiency of various media for transfer of specific items may differ drastically. MSDTP, for example, transfers data frequently used in message handling very efficiently, but is relatively poor at transfer of very large or deep tree structures. Available at each site as a process or subroutine package wouLd be a module responsible for interfacing with its counterpart at the other end of the media. These modules would use a protocol, not yet defined, to match their capabilities, and choose a particular media and translator, when more than one exists, for transfer of data items. Such a facility could totally insulate applications from need to consider encoding formats, machine differences, and so on, as well as eliminate duplication of effort in producing such facilities for every new project which requires them. In addition, as new translators or media are introduced, they would become immediately available to existing users without reprogramming. Implementation of such a protocol should not be very difficult or time-consuming, since it need not be very sophisticated in choosing the most appropriate transfer mechanism in initial implementations. The system is inherently upward-compatible and easily expandable. -22-
RFC, FYI, BCP