Proposed official standard for the format of ARPA Network messages :: RFC0724
RFC # 724
NIC #37435 12 May 1977
Proposed Official Standard for the
Format of ARPA Network Messages
by
Ken Pogran, MIT-LCS/CSR (Pogran at MIT-Multics)
John Vittal, BBN (Vittal at BBN-TENEXA)
Dave Crocker, RAND-ISD (DCrocker at Rand-Unix)
Austin Henderson, BBN (Henderson at BBN-TENEXD)
PREFACE
ARPA's Committee on Computer-Aided Human Communication
(CAHCOM) wishes to promulgate an official standard for the format
of ARPA Network mail headers which will adequately meet the needs
of the various message service subsystems on the Network today.
The authors of this RFC constitute the CAHCOM subcommittee
charged with the task of developing this new standard; this
document presents our current thoughts on the matter and a
specific proposal.
This document is organized as follows: First, we present a
history of the development of what has become known as the ARPA
Network "mail" or "message" service, and the issues which we feel
are most pressing -- problems for which solutions are lacking
today, inhibiting the further development of message subsystems.
We then present the specification for the new ARPA Network
Message Header standard. This is followed by a References
section.
Essentially, we propose a revision to Request for Comments
(RFC) 561, "Standardizing Network Mail Headers", and RFC 680,
"Message Transmission Protocol". This revision removes and
compacts portions of the previous syntax and adds several
features to network address specification. In particular, we
focus on people and not mailboxes as recipients and allow
reference to stored address lists. We expect this syntax to
provide sufficient capabilities to meet most users' immediate
needs and, therefore, give developers enough breathing room to
produce a new mail transmission protocol "properly". We believe
that there is enough of a consensus in the Network community in
favor of such a standard syntax to make possible its adoption at
this time.
We would like to make clear the status of this proposed
standard: The CAHCOM Steering Committee has replaced the Message
Service Committee as the ARPANET standards-setting organization
in the area of message services. It is expected that the
proposal of this CAHCOM subcommittee, when in its final form,
will be adopted as an ARPANET standard by CAHCOM. In the
interests of making this standard the best possible one, we are
distributing this proposal as an RFC.
Please send any comments and criticisms to any of the
authors of this RFC by 15 June 1977. It is planned that the
standard will be officially adopted by 1 September 1977, with
hosts expected to accept its syntax by 1 January 1978.
CONTENTS
I. PROBLEMS WITH ARPANET
MESSAGE STANDARDS
A. Background and History
B. Issues and Conclusions
C. Message Parts
D. Adoption of the Standard
II. STANDARD FOR THE FORMAT
OF ARPA NETWORK MESSAGES
A. Framework
B. Syntax
C. Semantics
D. Examples
III. REFERENCES
APPENDIX
A. Alphabetical Listing of Syntax Rules
I. PROBLEMS WITH ARPANET MESSAGE STANDARDS
A. BACKGROUND AND HISTORY
Today's ARPA Network "mail" or "message" service uses, for
its delivery mechanism, two special commands of the File Transfer
Protocol. Viewed from within the structure of FTP, the entire
message, both header and text, is data for the FTP MAIL and MLFL
commands. This facility was added to the File Transfer Protocol
as an afterthought; it was an interim solution to be used only
until a separate mail transmission protocol was specified.
Several versions of such a protocol have been proposed, but none
has yet received general acceptance. Meanwhile, attempts have
been made to improve upon the original interim facility.
As message service subsystems on various host systems
(especially TENEX) developed to the point where rudimentary
parsing of incoming messages was being done, it became clear that
it would be desirable to standardize the format and content of
the headers of messages transmitted between hosts using these FTP
commands. To this end, an ad hoc committee wrote RFC 561, which
suggested a standard message header format. The committee was
unofficial, so it could not legislate a standard; it could only
recommend. However, the standard it suggested adequately met an
urgent need, and was generally adopted.
Several salient points should be noted:
1. RFC 561 defined the concept of a message header, and
specified the syntax which delimited it from the actual
text of a message;
2. It proposed a standard format for the most obvious and
most urgently-needed header items: "From:", "Date:", and
"Subject:";
3. It proposed that a general standard syntax be used for
all other header items;
4. RFC 561 is still, today, an unofficial standard, adhered
to by most because of its utility;
5. Its syntax was designed to allow humans to read the text
easily, without the aid of special message processing
systems.
As message services grew in sophistication, the need for
specific header items in RFC 561's "miscellaneous" category grew:
"To:" and "cc:", especially, were generated and recognized by
several different message services. However, there was no
specific standard for the syntax of the contents of these items.
The message service subsystems on TENEX developed a particular
format for these items; since more messages originated from the
TENEX hosts on the Network than from any other type of host
system, the TENEX format for these fields soon became a de facto
standard. Message service subsystems on TENEX began to parse
these fields, expecting them to be in the TENEX-generated format.
Message service subsystems on other hosts -- Multics, for example
-- began to dabble with other formats for these fields, since
there was no standard for them, only to receive complaints from
users of TENEX message service subsystems that their "non-
standard" message headers could not be parsed according to the
(de facto) "standard" syntax.
Recognizing that the time had come to make an attempt to
standardize the additional header fields that had come into use
since RFC 561 was published, ARPA's Message Service Committee
chartered a small group in 1975 to develop a revised version of
RFC 561 which would define the syntax of these additional message
header fields. Several things should be noted about this small
group of people: first, they were TENEX-oriented; when the
functionality of the message header items they desired was
matched by the functionality of an already-existing message
header item of the TENEX message subsystems, they adopted the
syntax used by the TENEX message subsystems. Second, they based
additional header items not already found on TENEX message
subsystems on the deliberations of the Message Service Committee.
Third, they were not familiar with the procedure for publication
of a document as a Network RFC.
The document which this group produced, labelled RFC 680,
"Message Transmission Protocol", received only limited
distribution. Matters were further confused because its title
was misleading, since it was not a protocol for the transmission
of messages between ARPA Network hosts, but rather a standard for
the format of messages transmitted via the standard File Transfer
Protocol. Some, including the Message Service Committee,
believed that RFC 680 had become a Network Standard. This was not
strictly true, because it never received proper distribution, and
it had never been "officially blessed" by anyone, to turn it from
a request for comments into an accepted official ARPA Network
standard document. Reflecting this confusion over the status of
the document are the facts that the document DOES currently
reside in the "official" ARPANET Protocol Handbook, and most
users and message system implementors remain unaware that this is
so.
For all its shortcomings, RFC 680 has performed a needed
service, just as did RFC 561 before it. It defined additional
message header items at a time when this needed to be done.
Unfortunately, since the group had not sought ideas and input
from others, the specification did not adequately respond to a
sufficient set of community needs. In addition, the manner in
which the document was promulgated -- or not promulgated -- left
a great deal to be desired. Implementors of message-processing
subsystems who had not received RFC 680 proceeded to go their own
ways, feeling justified in doing so, while those who accepted RFC
680 as a standard felt justified in complaining to -- and about
-- those whom they considered to be maverick implementors of
idiosyncratic message service subsystems.
Perhaps because of the ad-hoc nature of the interim mail
facility, users have not, until recently, attempted to push the
system to the limits of their imagination. Presently, however,
several different sites are using the "interim" mail facility for
more than it was designed to do, and in ways which are incompatible both
with each other and with the original intent of the facility.
Mail subsystem implementors are increasingly being asked to
provide for the handling of mail from idiosyncratic hosts. Also,
it has become clear that there are a few very specific features,
too useful to ignore, which cannot reasonably be specified within
the syntax of RFC 680.
B. ISSUES AND CONCLUSIONS
At first glance, it would seem that a resolution of today's
somewhat chaotic situation could best be obtained by immediately
junking the existing "interim" mail facility, and adopting a true
mail transmission protocol. We strongly believe that this would
be ill-advised at this time, for we feel that there is no general
understanding within the Network community today of how to
specify and implement a full and adequate mail transmission
protocol. However, we are convinced that there is, finally, a
strong commitment within the Network community to attack this
problem (which there was not at the time the "interim" mail
transmission facility was specified and developed).
The frontal attacks on the mail protocol problem have, so
far, resulted in at least two suggestions for a mail transmission
protocol. Why should not one of these protocols be adopted
immediately? We feel that, in general, there has been a tendency
for experimental Network software to be prematurely treated as
though it were adequately designed and fully operational.
Typically, the system or protocol proposed is so much better than
what was previously available that its experimental nature is
disregarded, and it is pressed into service before it has had a
chance to properly develop and mature. We are very concerned
that this phenomenon not afflict the Network mail system any more
than it already has.
While it is true that there are several sites in the ARPA
Community which have mail systems that understand the syntax
specified in RFC's 561 and 680, in addition to some of the "non-
standard" syntax provided by the mail generating programs at
several other sites, most mail systems do not parse much of the
contents of received messages. A consideration of the syntax
specified here is that messages which are sent to people should
be easily read by people. Parsers which can turn an ugly,
syntactically expedient form into something which is easy to read
are the exception, rather than the rule, in today's message
systems. Also, the modifications to the existing "non-standard"
syntax should be kept to a minimum, enhancing the probability
that the requirement of small perturbations to existing software
will be accepted.
With this syntax, we introduce mechanisms so that:
1. Users of mail systems can have multiple mailboxes, either
on one machine or multiple machines, all of which are
treated identically; the default mailbox for a user is
not necessarily associated (directly) with his login
name.
2. Mail for a person can be sent to other than a single,
default mailbox.
3. Named groups may consist of both individuals and
(possibly) other named groups (i.e., nesting within
groups is permitted).
4. Address lists may contain references to other, stored,
lists. The complete path with which one can retrieve the
stored list may be specified in order to allow either
manual or automatic retrieval of the stored list.
5. Address lists may contain references to addresses which
are not accessible through the standard ARPANET message
system. For example, U.S. Postal system addresses can
be specified. Such addresses are, of course, expected to
be ignored by the ARPANET system, although individual
sites may provide services for using the information
(e.g., automatically sending a copy of the message to a
line printer, in preparation for transmission through the
Postal system).
6. Parenthetical remarks, or comments, can be included and
syntactically recognized as such within some header
items.
7. Received messages are capable of being read by humans
without a program having to parse the message (or parts
of it) before presenting the message to the user; however
there is sufficient formal syntax to enable a parsing
program to modify the appearance and content of material
presented to users. Although message-display software
may exercise considerable control over message
appearance, the degree to which a message's actual format
is PLEASANT for humans to read is entirely the
responsibility of the message creation program.
No mechanism for authentication is provided, since the Network
provides no mechanisms for enforcing mail security. The syntax
does provide for one aspect of "correctness": a distinction is
made between an address which is claimed to be a valid network
address and one which is simply free text, included for the
convenience of the human participants.
C. MESSAGE PARTS
Some confusion has existed over the roles played by
different message parts. Einar Stefferud has suggested using the
perspective of envelope, letter head, and letter content. The
presence of structured portions in messages additionally requires
reference to "headers".
In computer-based message systems, human users do not
generally encounter "envelopes", which are often constructed
automatically, to be used by the participating system(s) to
deliver the message. For example on TENEX, the envelope is the
name of the file containing a message awaiting transmission. For
FTP servers, it is the data portion of the MAIL or MLFL command
line. Some systems attach "envelope-like" information to the
message header, such as time-stamp and originating host name.
In paper-based communications, headers occur both before
(e.g., "To:" and "From:") and after (e.g., "cc:" and "enclosure:")
the body of the message. Within this standard, all headers occur
before the body of the message, although local message display
programs may choose to alter that ordering.
Wayne Hathaway has pointed out that ARPANET message format
does not support specification of letterheads, since these are a
type of organizational public relations symbol. Some
idiosyncrasies are supported, however, by way of choosing special
field names.
In general, it is important to realize that the header
portion of a message plays several roles during the life of a
message, variously participating in each of the three functions
suggested by Stefferud.
D. ADOPTION OF THE STANDARD
During the early phases of specifying this standard, a great
deal of concern was expressed over the problems which may be
experienced during the transition from the current standard to
this new one. We feel that the true problem is the lack of
realization that THERE IS NO CURRENT OFFICIAL STANDARD. Enough
systems have enough overlapping behaviors to allow the current
mail environment to function, but this in no way constitutes a
standard.
In fact, we strongly believe that the new requirements
imposed by the proposed standard involve less complexity than the
ambiguities resulting from the current variations in system
behaviors.
II. STANDARD FOR THE FORMAT
OF ARPA NETWORK MESSAGES
This standard supersedes the informal standards specified in
ARPANET Request for Comments numbers 561, "Standardizing Network
Mail Headers", and 680, "Message Transmission Protocol". In this
document, a general framework is described. The formal syntax is
then specified, followed by a discussion of the semantics.
Finally, a number of examples are given.
This specification is intended strictly as a definition of
what is to be passed between hosts on the ARPANET. It is NOT
intended to dictate either features which systems on the Network
are expected to support, or user interfaces to message creating
or reading programs.
A distinction should be made between what the specification
requires and what it allows. Certain equivalences are defined,
such as between a space character and an end-of-line
character <CRLF>, which both facilitate the formal specification
and indicate what the OFFICIAL semantics are for messages.
Particular implementations may wish to preserve further
distinctions which the specification does not require.
A. FRAMEWORK
Since there are many message systems which exist outside the
ARPANET environment, as well as those within it, it may be useful
to consider the general framework, and resulting capabilities and
limitations, of this standard.
Messages are expected to consist of lines of text. No
special provisions are made, at this time, for encoding drawings,
facsimile, speech, or structured text.
No significant consideration has been given to questions of
data compression or transmission/storage efficiency. The
standard, in fact, tends to be very free with the number of bits
consumed. For example, field names are specified as free text,
rather than special terse codes.
A general "memo" framework is used. That is, a message
consists of some information, in a rigid format, followed by the
main part of the message, which is text and whose format is not
specified in this document. The syntax of several fields of the
rigidly-formatted ("header") section is defined in this
specification; some of the header fields must be included in all
messages. In addition to the fields specified in this document,
it is expected that other fields will gain common use. User-
defined header fields allow systems to extend their functionality
while maintaining a uniform framework. Our approach is similar
to that of the TELNET protocol, in that we are defining a basic
standard which includes a mechanism for (optionally) extending
itself. The authors of this document will regulate the
publishing of specifications for these extensions.
Such a framework severely constrains document "tone" and
appearance and is primarily useful for most intra-organization
communications and relatively structured inter-organization
communication. A more robust environment might allow for multi-
font, multi-color, multi-dimension encoding of information. A
less robust environment, as is present in most single-machine
message systems, would more severely constrain the ability to add
fields and the decision to include specific fields. Relative to
paper-based communication, it is interesting to note that the
RECEIVER of a message can exercise an extraordinary amount of
control over the message's appearance. The amount of actual
control available to message receivers is contingent upon the
capabilities of their individual message systems.
B. SYNTAX
This syntax is given in four parts. The first part
describes a base-level lexical analyzer which feeds the higher-
level parser described in the succeeding sections. The second
part gives a general syntax for messages and standard header
fields. The third part specifies the syntax of addresses. A
final section specifies some general syntax which supports the
other sections.
1. LEXICAL ANALYSIS OF MESSAGES
a. General Description
A message consists of headers and, optionally, a body. The body
is just a sequence of ASCII characters; it is separated from the
headers by a null line (i.e., a line with nothing preceding
the <CRLF>).
1) Folding and unfolding of headers
Each header item can be viewed as a single, logical, long
line of ASCII characters. For convenience, this
conceptual entity can be split into a multiple-line
representation (i.e., "folded"). The general rule is that
wherever there can be linear white space characters, you
can instead insert a <CRLF> immediately followed by AT
LEAST one linear white space character. Thus, the
single line
To: "Joe Dokes & J. Harvey" , JJV at BBN
can be represented as
To: "Joe Dokes & J. Harvey" ,
JJV at BBN
and
To: "Joe Dokes & J. Harvey"
,
JJV at BBN
and
To: "Joe Dokes
& J. Harvey" , JJV at BBN
The process of moving from this folded multiple-line
representation of a header field to its single line
representation will be called "unfolding". Unfolding is
accomplished by regarding <CRLF> immediately followed by a
linear white space character as equivalent to that character.
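As an illustration only, the unfolding rule can be sketched in Python. The function name `unfold` is hypothetical, and the sketch assumes that linear white space means the ASCII space and tab characters and that lines end in CR LF.

```python
import re

# Assumption: "linear white space" is taken to mean ASCII space and tab.
# A CR LF that is immediately followed by one of them marks a fold.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_text: str) -> str:
    """Drop each CR LF that is immediately followed by a space or tab.

    The white space that follows the CR LF is kept, so the folded and
    unfolded forms differ only by the removed line breaks.
    """
    return _FOLD.sub("", header_text)


# Hypothetical folded field built from addresses that appear in this document.
folded = "To: JJV at BBN,\r\n    Henderson at BBN-TENEXD\r\n"
single_line = unfold(folded)
```

A CR LF that ends a field, or the null line that separates headers from the body, is not followed by a space or tab and is therefore left alone.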
2) Structure of header fields
Once header fields have been unfolded, they may be viewed
as being composed of a <field-name> followed by a ":"
(colon), followed by a <field-body>. The <field-name>
must be composed of printable ASCII characters (i.e.,
characters which have decimal values between 33 and 126)
and linear white space characters. The <field-body> may be
composed of any ASCII characters (other than <CR> and
<LF>, which have been removed by unfolding).
Certain header fields may be interpreted according to an
internal syntax which some systems may wish to parse.
These fields will be referred to as structured fields.
Examples include fields containing dates and addresses.
Other fields, such as the subject field, are regarded
simply as a single line of text.
3) Field names
To aid in the creation and reading of <field-name>s, the
free insertion of linear white space characters is
allowed in reasonable places. Rather than obscuring the
syntax specification for <field-name> with the explicit
syntax for these linear white space characters, the
existence of a simple "lexical" analyzer is assumed. The
analyzer reinterprets the unfolded text which comprises
the <field-name> as a sequence of atoms separated by
linear white space characters. The field name may be
conveniently represented by the sequence of these atoms,
separated by a single ASCII space character.
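The field-name analysis described here can be sketched as follows. This is a minimal illustration, not part of the specification: `split_field` is a hypothetical name, the input is assumed to be a single unfolded header field, and the field name is assumed to end at the first colon.

```python
from typing import Tuple


def split_field(unfolded: str) -> Tuple[str, str]:
    """Split an unfolded header field into (normalized name, body).

    The name is everything before the first colon. It is broken into
    atoms on runs of white space and rejoined with single spaces.
    The body is returned exactly as it appears after the colon.
    """
    name, colon, body = unfolded.partition(":")
    if not colon:
        raise ValueError("header field has no colon")
    atoms = name.split()  # no argument: split on any run of white space
    if not atoms:
        raise ValueError("header field has an empty name")
    return " ".join(atoms), body


name, body = split_field("In-Reply-To  : some earlier message")
```

Normalizing the name to atoms joined by single spaces gives a convenient key for comparing field names that differ only in spacing.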
4) Field bodies
To aid in the creation and reading of structured fields,
the free insertion of linear white space characters is
allowed in reasonable places. Rather than obscuring the
syntax specifications for these structured fields with
explicit syntax for these linear white space characters,
the existence of another simple "lexical" analyzer is
assumed. It provides an interpretation of the unfolded
text comprising the body of the field as a sequence of
lexical symbols. These include
- individual special characters
- quoted strings
- comments
- atoms
The first three symbols are self-delimiting. Atoms are
not; they therefore are delimited by the self-delimiting
symbols and by linear white space.
So, for example, the folded body of an address field
":sysmail"@ Some-Host,
Muhammed(I am the greatest)Ali at WBA
is analyzed into the following lexical symbols and types:
":sysmail" quoted string
@ special
Some-Host atom
, special
Muhammed atom
(I am the greatest) comment
Ali atom
at atom
WBA atom
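The lexical analysis of a structured field body can be sketched in Python as shown here. This is one possible reading, not a reference implementation: the name `tokenize` is hypothetical, the special characters are those listed in the formal definition, space, tab, CR and LF are all treated as white space (so a folded body can be passed directly), and no escape mechanism inside quoted strings is assumed.

```python
from typing import List, Tuple

SPECIALS = set('()<>@,;:"')   # the special characters of the formal definition
WHITE = " \t\r\n"             # assumption: CR and LF count as white space here


def tokenize(body: str) -> List[Tuple[str, str]]:
    """Return (text, type) pairs for the lexical symbols in a field body."""
    tokens: List[Tuple[str, str]] = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch in WHITE:
            i += 1
        elif ch == '"':
            end = body.find('"', i + 1)
            if end < 0:
                raise ValueError("unterminated quoted string")
            tokens.append((body[i:end + 1], "quoted string"))
            i = end + 1
        elif ch == "(":
            depth, j = 1, i + 1
            while j < n and depth:          # parentheses nest inside comments
                if body[j] == "(":
                    depth += 1
                elif body[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unbalanced parentheses in comment")
            tokens.append((body[i:j], "comment"))
            i = j
        elif ch in SPECIALS:
            tokens.append((ch, "special"))
            i += 1
        else:                               # an atom runs to the next delimiter
            j = i
            while j < n and body[j] not in SPECIALS and body[j] not in WHITE:
                j += 1
            tokens.append((body[i:j], "atom"))
            i = j
    return tokens


symbols = tokenize('":sysmail"@ Some-Host,\r\n  Muhammed(I am the greatest)Ali at WBA')
```

Applied to the folded address field above, the sketch yields the same nine symbols and types as the table.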
b. Formal Definition
::= ":" ::=
| ::=
|
::= , as defined in
the following sections, and
consisting of combinations of
, , ,
and tokens>
::= >
::= <">
::=
and >
::= ::= |
| | ::= "(" | ")" | "<" | ">"
| "@" | "," | ";" | ":" | <">
::= "(" > ")"
::=
| ::= | ::= ::= ::= ::= ::=
c. Clarifications
1) Comments
Comments may appear only within <field-body>s of
structured fields. A comment is any set of TELNET ASCII
characters, which is not within a quoted string, and which
is enclosed in matching parentheses; parentheses nest, so
that if a left paren occurs in a comment string, there
must also be a matching right paren.
Comments are NOT passed to the FTP server, as part of a
MAIL or MLFL command, since comments are not part of the
"formal" address.
2) "White space"
Remember that in structured fields, MULTIPLE LINEAR WHITE
SPACE TELNET ASCII CHARACTERS (namely tabs and spaces)
ARE TREATED AS SINGLE SPACES AND MAY FREELY SURROUND ANY
SYMBOL. In all header fields, at least one linear white space
character is REQUIRED only at the beginning of folded lines.
Writers of mail-sending (i.e. header generating) programs
should realize that there is no Network-wide definition of
the effect of TELNET ASCII tab characters on the
appearance of text at another Network host; therefore, the
use of tabs in message headers, though permitted, is
discouraged.
Note that the contents of messages are required to conform
with TELNET NVT conventions (e.g. <CR> must be followed
by either <LF>, making a <CRLF>, or <null>, if the <CR> is
to stand alone).
3) Quoted strings
Where permitted (i.e., in structured fields) quoted
strings are treated as a single symbol (i.e. equivalent
to an <atom> syntactically). However, if quoted strings
are to be "folded" onto multiple lines, then the syntax
for folding must be adhered to. (See items II.B.1.a.1,
above, and II.B.1.c.6, below.) Note that the official
semantics do not encounter <CRLF>s in quoted strings,
although particular parsing programs may wish to note
their presence.
4) Bracketing characters
There are two types of brackets which must be well nested:
- Parentheses are used to indicate comments.
- Angle brackets ("<" and ">") are used
where there is a question of the presence
of machine-usable code (e.g. delimiting
mailboxes).
5) Case independence of certain specials s
It should be assumed by all mail reading programs that
certain s can be represented in any combination of
upper and lower case. These are:
- s,
- "File", in a ,
- "at", in an ,
- s,
- s,
- s, and
- s
For example, the <field-name>s "From", "FROM", "from", and
even "FroM" should all be treated identically. Note that,
at the level of this specification, case IS relevant to
other s and s. Also see Section
II.C.1.a.4, below.
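A mail reading program might honor this rule for field names along the following lines. The sketch is illustrative only: `same_field_name` and `field_bodies` are hypothetical helpers, and header fields are assumed to be already split into (name, body) pairs.

```python
from typing import Iterable, List, Tuple


def same_field_name(a: str, b: str) -> bool:
    """Compare two field names without regard to upper or lower case."""
    return a.lower() == b.lower()


def field_bodies(fields: Iterable[Tuple[str, str]], wanted: str) -> List[str]:
    """Return the bodies of every field whose name matches `wanted`.

    Only the name comparison ignores case; the bodies are returned
    unchanged, since case remains relevant to other symbols.
    """
    return [body for name, body in fields if same_field_name(name, wanted)]


headers = [("FROM", " Pogran at MIT-Multics"), ("Subject", " Message Format")]
senders = field_bodies(headers, "from")
```

Folding case only at comparison time keeps the original spelling of each header available for display.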
6) Folding long lines
Each header item (field of the message) may be represented
on exactly one line consisting of the name of the field
and its body, and this is what the parser sees. For
readability, it is recommended that the <field-body>
portion of long header items be "folded" onto multiple
lines of the actual header.
7) Backspace characters
Backspace TELNET ASCII characters (ASCII BS, decimal 8)
may be included in text and quoted strings to
effect overstriking; however, any use of backspaces which
effects an overstrike to the left of the beginning of the
text or quoted string is prohibited.
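One way to check the backspace restriction is to track a column position while scanning the string. The sketch below is an assumption-laden illustration: `backspaces_stay_in_bounds` is a hypothetical name, and every character other than backspace is assumed to advance the print position by exactly one column.

```python
def backspaces_stay_in_bounds(text: str) -> bool:
    """Return True if no backspace moves left of the start of `text`.

    Assumption: each non-backspace character advances one column and
    each backspace (ASCII BS, decimal 8) moves back one column.
    """
    column = 0
    for ch in text:
        if ch == "\b":
            column -= 1
            if column < 0:      # overstrike would begin before the string
                return False
        else:
            column += 1
    return True


underlined_ok = backspaces_stay_in_bounds("a\b_")   # overstrikes "a" with "_"
```

The caller is expected to pass only the contents of the string being checked, so that column zero corresponds to its first character.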
2. GENERAL SYNTAX OF MESSAGES:
NOTE: The syntax indicates that items in
must be in a specific order and precede all other header
items. Header fields, in fact, are NOT required to occur in
any particular order. Required header items must be unique
(occur exactly once). This specification permits multiple
occurrences of most optional fields. However, the
interpretation of such multiple occurrences is not specified
here.
::=
| ::=
| ::= ::=
|
|
| ::= "Date" ":" ::= "From" ":" ::= "From" ":" ::= "From" ":" ::= "Sender" ":" ::= "Reply-To" ":" ::=
| ::=
| ::= "To" ":"
| "cc" ":"
| "bcc" ":"
| "Fcc" ":" ::= "In-Reply-To" ":"
| "Keywords" ":"
| "Message-Id" ":"
| "References" ":"
| "Subject" ":"
| "Comments" ":"
|
::= which has a
not defined in this specification>
The following syntax for the bodies of various fields should be
thought of as describing each field body as a single long string
(or line). The section on Lexical Analysis (section II.B.1)
indicated how such long strings can be represented on more than
one line in the actual transmitted message.
3. SYNTAX OF GENERAL ADDRESSEE ITEMS
::=
| "," ::=
|
| "," ::=
| ":" ";"
|
| ::=
| "<" ">"
::= ::= ::=
| "," ::= ::= ":" "File" ":" ::=
| "<" ">"
::=
| "," ::=
4. SUPPORTING SYNTAX
::=
|
| "," ::=
| ::= "<" ">"
::= ::= ::= "at" | "@"
::=
| ::=