Specifications for Datalanguage, Version 0/9 :: RFC0515
Network Working Group R. Winter
Request for Comments: 515 Computer Corporation of America
NIC 16446 6 June 1973
Specifications for Datalanguage, Version 0/9
Preface
Datalanguage is the language processed by the Datacomputer, a data
utility system being developed for the Arpanet. The Datacomputer
performs data storage and data management functions for the benefit
of computers on the network.
Version 0/9 is currently running at CCA. This version is extremely
primitive; however, it does offer an opportunity for experience with
the Datacomputer and with fundamental Datalanguage concepts.
Subsequent versions will provide greater portions of the full
Datalanguage capability, which has been described earlier
(Datalanguage, Working Paper No. 3, Datacomputer Project, October,
1971, NIC 8028). For example, one of the primary restrictions in
0/9--elementary data items must be fixed-length ASCII strings--will
be eliminated in Version 0/10, which is currently being implemented.
Based on the experience gained in the implementation of these early
versions, and based on the feedback from their use, a revised
specification of the full language will be issued.
1. Introduction
This document presents a precise and complete specification of
Datalanguage, Version 0/9. It is organized into 11 sections, of
which this introduction is the first. Section 2 discusses the
capabilities of Version 0/9 in general terms. Sections 3 and 4 are
concerned with data description and the directory. Sections 5
through 8 cover the expression of data management operations.
Section 9 discusses the recognition of names. Section 10 covers
miscellaneous topics and Section 11 specifies the syntax in BNF.
This specification is to be followed with a user manual, which will
present the language in tutorial form and treat components of the
Datacomputer-user interface other than the language.
Winter [Page 1]
RFC 515 Specifications for Datalanguage, Version 0/9 6 June 1973
2. Capabilities of Version 0/9
Version 0/9 of Datalanguage has capabilities for the storage of
files, for the addition of data to existing files, and for the deletion
of files. Retrievals can output whole files as well as subsets of
files. Data can be selected from files by content, using expressions
formed from boolean and inequality operators.
At the option of the file creator, an inversion is constructed and
maintained by the Datacomputer. The inversion increases the
efficiency of selective retrieval, at the cost of storage space and
file maintenance effort. Users other than the file creator need not
be aware of the existence of the inversion, or of which fields are
inverted file keys. The language is designed so that they state the
desired result of a retrieval, and the Datacomputer uses the
inversion as much as the request permits.
Elementary data items are fixed-length ASCII strings. Files are a
restricted class of hierarchical structures.
Many of the restrictions mentioned in this memo will be short-lived.
In particular, those statements followed by 3 asterisks (***) refer
to restrictions that will be considerably weakened or eliminated
entirely in the next version of the software.
3. Data Description
A container is a variable whose value is a data object of general
character and arbitrary size (in Version 0/9, size is restricted;
see section 3.4). Examples of containers which are implemented in
other systems are files, records, fields, groups, and entries.
The container is distinct from the data in the container. For
example, space allocation is an operation on a container, while
changing the unit price field from 25 to 50 is an operation on data
in a container.
A container may enclose other containers. When a container is not
enclosed by another container, it is said to be outermost. If
container A encloses container B, and no other container in A also
encloses B, then A immediately encloses B.
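The enclosure relations above can be sketched in Python. The `Container` class and its method names are illustrative assumptions and are not part of Datalanguage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    """Hypothetical model of a container: an ident plus enclosed containers."""
    ident: str
    children: List["Container"] = field(default_factory=list)
    parent: Optional["Container"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Container") -> "Container":
        child.parent = self
        self.children.append(child)
        return child

    def is_outermost(self) -> bool:
        # Outermost: not enclosed by any other container.
        return self.parent is None

    def encloses(self, other: "Container") -> bool:
        # Walk upward from `other`; A encloses B if A is met on the way.
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False

    def immediately_encloses(self, other: "Container") -> bool:
        # No container inside A lies between A and B.
        return other.parent is self


f = Container("F")
s = f.add(Container("S"))
a = s.add(Container("A"))
```

Here F encloses both S and A, but immediately encloses only S.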
A Datalanguage description is a statement of the properties of a
container.
All containers have the attributes ident and type. Ident is a
character string by which users refer to the container. Type
determines the form of the container's value; the value can be
elementary, or it can consist of other containers. There are 3
types: LIST, STRUCT, and STRING(***). A LIST contains a group of
containers having the same description. A STRUCT contains a group of
containers, each of which has its own description. A STRING is a
sequence of ASCII characters. While a STRING is not really an
elementary item, it is handled as one in Version 0/9.
Certain containers can have other attributes. An outermost container
has a function. The function attribute specifies whether the
container is to be used for storage or for transmission.
Size is some meaningful dimension of the container, which is type-
dependent. It is used for space allocation and data stream parsing.
An aggregate container (i.e., one that contains other containers) has
as an attribute the description or descriptions of its components.
Thus if S is a STRUCT containing A, B, and C, then the descriptions
of A, B, and C are attributes of S.
A STRING defined in certain contexts can have an inversion attribute.
This is an access property that is not really local to the STRING,
but is associated with it for convenience.
3.1 Ident
The ident of a container is composed of alphanumeric characters,
the first of which is alphabetic. It may not consist of more than
100 characters.
The elements of a STRUCT must have idents unique in the STRUCT.
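A minimal sketch of these two rules in Python. It assumes that "alphanumeric" means ASCII letters and digits and that idents are compared exactly as written; this section settles neither point, and the function names are hypothetical.

```python
import re
from typing import List

# First character alphabetic, the rest alphanumeric, at most 100 characters.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,99}")


def is_valid_ident(text: str) -> bool:
    return IDENT_RE.fullmatch(text) is not None


def struct_idents_unique(idents: List[str]) -> bool:
    # The elements of a STRUCT must have idents unique in the STRUCT.
    return len(set(idents)) == len(idents)
```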
3.2 Function
The function of a container is either FILE, PORT, or TEMPORARY
PORT. When the function is FILE, then the container is used for
storage of data at the Datacomputer. When the function is PORT,
then the container is used for transmission of data into or out of
the Datacomputer. When the function is TEMPORARY PORT (which may
be abbreviated TEMP PORT), the container behaves like a PORT;
however, its description is not retained in the Datacomputer
beyond the session in which it is created.
3.3 Type
Type is one of: LIST, STRUCT, or STRING. These are defined in
section 3.
In an occurrence of a STRUCT, the elements appear in the order in
which their descriptions appear in the STRUCT description. All
elements are present in each occurrence of the STRUCT.
An element of a STRUCT or LIST can be a container of any datatype.
However, the outermost container must be a LIST(***).
3.4 Size
The size of a STRING is the number of characters in it. The size
of a STRUCT is not defined (***). The meaning of the size of a
LIST depends upon other properties of the LIST (***).
Ordinarily, the size of a LIST is the number of LIST-members. An
exception is the case of the outermost-LIST. In an outermost-LIST
with a function of FILE, the size is the number of LIST-members
for which space should be allocated. When no size is present in
this case, the system computes a default. In an outermost-LIST
with a function of PORT, the size is ignored (***).
Only outermost containers may be larger than a TENEX page (2560
ASCII characters)(***).
3.5 Inversion
An inversion is an auxiliary data structure used to facilitate
retrieval by content.
Its basic application is the fast retrieval of sets of outermost-
LIST-members (this can be extended to other container sets, and
will be after release 1). Consider a list of weather
observations, stored as a file on the Datacomputer. If quick
retrieval of observations by COUNTRY is desired, then this is
indicated in the description of the COUNTRY container. According
to common usage in information retrieval, this makes COUNTRY a key
in the retrieval of observations.
Note that the inversion option only affects the efficiency of
retrieval by COUNTRY, not the ability to retrieve by COUNTRY.
There are restrictions on use of the inversion option. First, it
can be applied only to STRINGs. Second, a STRING having the
inversion option must occur only once in each
outermost-LIST-member. Third, it is ignored when applied to
STRINGs in PORT descriptions.
Eventually there will be several types of inversion option; in
Version 0/9 there is only the 'D' option (for distinct).
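The point that an inversion changes only the efficiency of retrieval, not its result, can be illustrated with a small Python sketch. The data, the dictionary-based index, and the function names are assumptions made for illustration; the Datacomputer's actual inversion structure is not described in this section.

```python
from collections import defaultdict

# Hypothetical outermost-LIST members, reduced to the fields used here.
observations = [
    {"CITY": "BOSTON", "COUNTRY": "USA"},
    {"CITY": "PARIS", "COUNTRY": "FRANCE"},
    {"CITY": "CHICAGO", "COUNTRY": "USA"},
]


def build_inversion(members, key):
    """Map each distinct key value to the positions of the members holding it."""
    index = defaultdict(list)
    for position, member in enumerate(members):
        index[member[key]].append(position)
    return index


def select_by_scan(members, key, value):
    # Without an inversion: examine every member.
    return [m for m in members if m[key] == value]


def select_by_inversion(members, index, value):
    # With an inversion: go directly to the matching positions.
    return [members[i] for i in index.get(value, [])]


country_index = build_inversion(observations, "COUNTRY")
```

Both selection functions return the same members for any COUNTRY value; the second simply avoids reading members that cannot match.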
3.6 Syntax
The description is simply an enumeration of properties; these
properties are specified in the order:
<ident> <function> <type> <size> <other>
Properties which do not apply are omitted. An example:
F FILE LIST (25) A STR (10)
Here 'F' is the <ident>, 'FILE' is the <function>, 'LIST' is the
<type>, '(25)' is the size, and 'A STR (10)' is the <other> of one
description. Of course, 'A STR (10)' is itself another
description: the description for members of the LIST named F.
An example of a complete description for a file of weather
observations keyed on location:
WEATHER FILE LIST
  OBSERVATION STRUCT
    LOCATION STRUCT
      CITY STR (10), I=D
      COUNTRY STR (10), I=D
      END
    TIME STRUCT
      YEAR STR (2)
      DAY STR (3)
      HOUR STR (2)
      END
    DATA STRUCT
      TEMPERATURE STR (3)
      RAINFALL STR (3)
      HUMIDITY STR (2)
      END
    END
The ENDs are needed to delimit the list of elements of a STRUCT.
', I=D' indicates that the string is to be an inversion key for
the retrieval of outermost-LIST-members.
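Because every elementary item is a fixed-length STRING and STRUCT elements appear in description order, one OBSERVATION in the example above occupies 10+10+2+3+2+3+3+2 = 35 characters. The following Python sketch splits such a record into its fields. It assumes that a member is represented as the plain concatenation of its STRINGs with no delimiters, which this section does not state; the sample values are invented.

```python
# (ident, size) pairs flattened from the WEATHER description, in description order.
FIELDS = [
    ("CITY", 10), ("COUNTRY", 10),
    ("YEAR", 2), ("DAY", 3), ("HOUR", 2),
    ("TEMPERATURE", 3), ("RAINFALL", 3), ("HUMIDITY", 2),
]
RECORD_LENGTH = sum(size for _, size in FIELDS)  # 35 characters


def parse_observation(record: str) -> dict:
    """Split one fixed-length OBSERVATION into its STRING values."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(
            "expected %d characters, got %d" % (RECORD_LENGTH, len(record))
        )
    values, offset = {}, 0
    for ident, size in FIELDS:
        values[ident] = record[offset:offset + size]
        offset += size
    return values


# Invented sample record, built field by field so the widths are explicit.
sample = "BOSTON".ljust(10) + "USA".ljust(10) + "73" + "157" + "12" + "072" + "015" + "65"
```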
4. Directory
The directory is a system file in which the names and descriptions of
all user-defined containers are kept.
The directory is structured as a tree. Each node has an ident, which
need not be unique. There is a single path from the root of the tree
to any node. The idents of the nodes along this path are
concatenated, separated by periods, to form a pathname, which
unambiguously identifies the node (e.g., A.B.C could be a pathname
for a node with an ident of C).
In a later version of the software, the directory will be generalized
to provide for links between nodes, so that it will not properly be a
tree. For now, however, the tree model is convenient and adequate.
A node may represent a container, or it may simply hold a place in
the space of pathnames. When it represents a container, it cannot
(currently) have subordinate nodes.
Eventually, it is planned to model the directory as a structure of
containers, with its description distributed throughout the
structure. Most operations defined on the directory will be defined
on user data, and vice versa. Access privileges and privacy locks
will be part of the data description and will likewise be applicable
both to directory nodes and data structures below the node level.
4.1 CREATE
A CREATE-request either (a) adds a node to the directory,
optionally associating the description of either a PORT or a FILE
with the node, or (b) creates a temporary container which is not
entered in the directory, but has a description and can be
referenced in requests. If the description defines a file, CREATE
causes space to be allocated for the file.
To create a node with a description:
CREATE <pathname> <description> ;
To create a node with no description:
CREATE <pathname> ;
Note that the description determines whether or not the container
is temporary (see section 3.2 for details).
A CREATE-request adds a single node to the directory. Thus to add
CCA.RAW.F to an empty directory, three requests are needed:
CREATE CCA ;
CREATE CCA.RAW ;
CREATE CCA.RAW.F ;
Notice that the last ident of the pathname doubles as the first
ident of the description:
CREATE CCA.RAW.G FILE LIST A STR (5) ;
That is, G is both the ident of a node and the ident of an
outermost container of type LIST.
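The one-node-per-request rule can be sketched in Python as follows. The `Directory` class is a hypothetical in-memory model; rejecting a CREATE for a pathname that already exists is an assumption, since this section does not say how that case is handled.

```python
class Directory:
    """Hypothetical in-memory model of the directory tree."""

    def __init__(self):
        # pathname -> description text, or None for a node with no description
        self.nodes = {}

    def create(self, pathname, description=None):
        if pathname in self.nodes:
            # Assumption: re-creating an existing node is rejected.
            raise ValueError(pathname + " already exists")
        superior = pathname.rpartition(".")[0]
        if superior:
            # A CREATE adds a single node, so the superior node must exist.
            if superior not in self.nodes:
                raise ValueError("node " + superior + " must be created first")
            # A node representing a container cannot have subordinates.
            if self.nodes[superior] is not None:
                raise ValueError(superior + " represents a container")
        self.nodes[pathname] = description


directory = Directory()
for pathname in ("CCA", "CCA.RAW", "CCA.RAW.F"):
    directory.create(pathname)
directory.create("CCA.RAW.G", "G FILE LIST A STR (5)")
```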
4.2 DELETE
A DELETE-request deletes a tree of nodes and any associated
descriptions or data. The syntax is:
DELETE <pathname> ;
The named node and any subordinates are deleted.
Note that to delete data while retaining the directory entry and
description, DELETE should not be used (see section 6.3 for the
proper method).
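A minimal sketch of subtree deletion in Python, modelling the directory as a hypothetical dictionary keyed by pathname. Matching on the pathname followed by a period keeps a sibling such as CCA.RAWX from being removed along with CCA.RAW.

```python
def delete(nodes: dict, pathname: str) -> None:
    """Remove `pathname` and every node subordinate to it from `nodes`."""
    if pathname not in nodes:
        raise KeyError(pathname)
    prefix = pathname + "."
    doomed = [p for p in nodes if p == pathname or p.startswith(prefix)]
    for path in doomed:
        del nodes[path]


nodes = {"CCA": None, "CCA.RAW": None, "CCA.RAW.F": None, "CCA.RAWX": None}
delete(nodes, "CCA.RAW")
```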
4.3 LIST
The LIST-request is used to display system data of interest to a
user. It causes the data specified to be transmitted through the
Datalanguage output port.
Several arguments of LIST apply to the directory. LIST %ALL
transmits all pathnames in the directory. LIST %ALL.%SOURCE
transmits all descriptions in the directory. Instead of %ALL, a
pathname can be used:
LIST <pathname>.%ALL
lists pathnames subordinate to <pathname>.
LIST <pathname>.%SOURCE
lists descriptions subordinate to the node represented by <pathname>.
For details about the LIST-request, see section 10.1.
5. Opening and closing containers
Containers must be open before they can be operated on.
A container is open when it is first created. It remains open until
closed explicitly by a CLOSE-request or implicitly by a DELETE-
request or by session end.
A closed container is opened by an OPEN-request. A temporary
container is always open; a CLOSE-request deletes it.
5.1 Modes
An open container has a mode, which is one of: READ, WRITE, or
APPEND. The mode determines the meaning and/or legitimacy of
certain operations on the container.
The mode is established by the operation which opens the
container. It can be changed at any time by a MODE-request. A
CREATE leaves the container in WRITE mode. An OPEN either
specifies the mode explicitly or implicitly sets the mode to READ.
5.2 Syntax
To open a container:
OPEN <pathname> <mode> ;
or:
OPEN <pathname> ;
where <mode> is defaulted to READ.
To close a container:
CLOSE <ident> ;
where <ident> is the name of an outermost container.
Two containers with the same outermost <ident> cannot be
opened at the same time (***).
To change the mode of an open container:
MODE <ident> <mode> ;
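The opening, closing, and mode rules of this section can be sketched as simple bookkeeping in Python. The `Session` class and its method names are hypothetical; only the rules it enforces (WRITE after CREATE, READ as the OPEN default, one open container per outermost ident) come from the text.

```python
MODES = ("READ", "WRITE", "APPEND")


class Session:
    """Hypothetical bookkeeping of open outermost containers in one session."""

    def __init__(self):
        self.open_modes = {}  # outermost ident -> current mode

    def _open(self, pathname, mode):
        if mode not in MODES:
            raise ValueError("unknown mode " + mode)
        ident = pathname.rpartition(".")[2]  # last ident of the pathname
        if ident in self.open_modes:
            raise ValueError("a container named " + ident + " is already open")
        self.open_modes[ident] = mode

    def create(self, pathname):
        # A container is open when first created, in WRITE mode.
        self._open(pathname, "WRITE")

    def open(self, pathname, mode="READ"):
        # An OPEN without an explicit mode sets the mode to READ.
        self._open(pathname, mode)

    def mode(self, ident, mode):
        if ident not in self.open_modes:
            raise ValueError(ident + " is not open")
        if mode not in MODES:
            raise ValueError("unknown mode " + mode)
        self.open_modes[ident] = mode

    def close(self, ident):
        if ident not in self.open_modes:
            raise ValueError(ident + " is not open")
        del self.open_modes[ident]


session = Session()
session.create("CCA.RAW.G")
session.open("CCA.RAW.F")
session.mode("F", "APPEND")
```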
5.3 LIST
LIST %OPEN transmits name, mode and connection status of each open
outermost container through the Datalanguage output port. (The
Datalanguage output port is the destination to which all
Datacomputer diagnostics and replies are sent. It is established
when the user initially connects to the Datacomputer.) For
details of the LIST-request, see section 10.1.
6. Assignment
Assignment transfers data from one container to another.
The equal sign ('=') is the symbol for assignment. The value of the
operand on the right of the equal sign is transferred to the operand
on the left. (Eventually, both operands will be weakly-restricted
Datalanguage expressions, which may evaluate to sets as well as to
single containers. For now, the left must be a container name; the
right may be a container name or a constant.)
Assignment is defined for all types of containers. When the
containers are aggregates, their elements are paired and data is
transferred between paired elements. Elements of the target
container that do not pair with some source element are handled with
a default operation (currently they are filled with blanks).
The operands of an assignment must have descriptions that match. The
idea of matching is that the descriptions must be similar enough so
that it is obvious how to map one into the other.
6.1 Conditions for legitimate assignment
Assignment must reference objects, not sets. An object is:
(a) an outermost container, or
(b) a constant, or
(c) in the body of a FOR-loop, either
(c1) a member of a set defined by a FOR-OPERAND, or
(c2) a container which occurs once in such a member
In the case of a reference of type (c1), the object referenced is
taken to be the current member. In the case of (c2), the object
referenced is that which occurs in the current member. This is
explained further in section 7.
The left operand of an assignment is subject to further
restriction. If it is an outermost container, it must be open in
either WRITE- or APPEND-mode. If it is not an outermost
container, then the reference is of type (c), which means that
some FOR-operand has established a context in which the assign-
operand is an object. The FOR-operand which establishes this
context must be the output-operand of the FOR.
When the assign-operand is an outermost container, it must be
open. Such an operand must be referenced by its simple container
ident(***), not its directory pathname.
In the body of a loop nested in one or more other loops,
assignments are further restricted, due to a 0/9 implementation
problem. See section 7.2 for details.
Finally, the descriptions of the operands must match. If one is a
constant, then the other must be a STRING(***). If both are
containers, then in the expression:
A = B;
the descriptions of containers A and B match if:
1. A and B have the same type
2. If A and B are LISTs, then they have equal numbers of
LIST-members, or else A is an outermost-LIST.
3. If A and B are aggregates, then at least one container
immediately enclosed in A matches, and has the same ident as, one
container immediately enclosed in B.
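The three matching rules can be written as a recursive predicate. The following Python sketch reads the rules literally; the `Desc` class is a hypothetical stand-in for a description, and treating any two STRINGs as matching regardless of size is an inference from the fact that the rules place no size condition on STRINGs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Desc:
    """Hypothetical stand-in for a Datalanguage description."""
    ident: str
    type: str                      # "LIST", "STRUCT", or "STRING"
    size: Optional[int] = None
    members: List["Desc"] = field(default_factory=list)
    outermost: bool = False


def matches(a: Desc, b: Desc) -> bool:
    # Rule 1: A and B have the same type.
    if a.type != b.type:
        return False
    # Rule 2: LISTs need equal member counts unless A is an outermost-LIST.
    if a.type == "LIST" and not a.outermost and a.size != b.size:
        return False
    # Rule 3: aggregates need at least one pair of immediately enclosed
    # containers that share an ident and themselves match.
    if a.type in ("LIST", "STRUCT"):
        return any(
            ma.ident == mb.ident and matches(ma, mb)
            for ma in a.members
            for mb in b.members
        )
    return True


file_f = Desc("F", "LIST", outermost=True, members=[
    Desc("R", "STRUCT", members=[Desc("X", "STRING", 5), Desc("Y", "STRING", 3)]),
])
port_p = Desc("P", "LIST", size=10, outermost=True, members=[
    Desc("R", "STRUCT", members=[Desc("X", "STRING", 8)]),
])
port_q = Desc("Q", "LIST", outermost=True, members=[
    Desc("R", "STRUCT", members=[Desc("Z", "STRING", 8)]),
])
```

Here F matches P, because their R members share the STRING X. F does not match Q, whose R has no element with an ident found in F's R.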
6.2 Result of assignment
If the operands are STRINGs, then the value of B, left-justified,
replaces the value of A. If B is longer than A, the value is
truncated. If B is shorter than A, then A is filled on the right
with blanks as necessary.
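STRING assignment reduces to truncation followed by blank padding on the right. A short Python sketch, with a hypothetical function name:

```python
def assign_string(target_size: int, source_value: str) -> str:
    """Return the new value of a STRING of size `target_size` after A = B."""
    # Truncate B if it is longer than A, then pad on the right with blanks.
    return source_value[:target_size].ljust(target_size)
```

For example, assigning 'ABCDE' to a STRING of size 3 yields 'ABC', and assigning 'AB' to a STRING of size 5 yields 'AB' followed by three blanks.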
If the operands are STRUCTs, then assignment is defined in terms
of the STRUCT members. If a member of A, mA, matches and has the
same name as a member of B, mB, then mB is assigned to mA. If no
such mB exists, then mA is filled with blanks.
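A sketch of STRUCT assignment in Python, under the simplifying assumption that every member is a STRING, so that "matches and has the same name" reduces to having the same ident. The dictionary representation and the function name are illustrative only.

```python
def assign_struct(target_sizes: dict, source: dict) -> dict:
    """Compute the members of A after A = B.

    target_sizes maps each member ident of A to its STRING size;
    source maps each member ident of B to its current value.
    """
    result = {}
    for ident, size in target_sizes.items():
        # A member of A with no same-named member in B is filled with blanks.
        value = source.get(ident, "")
        result[ident] = value[:size].ljust(size)
    return result
```

Members of B that have no counterpart in A are simply not transferred.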
If the operands are LISTs, the result depends on several factors.
First, notice that the descriptions of the LIST-members must
match; otherwise the assignment would not be legitimate by the
matching rules of 6.1.
If A is an outermost-LIST, then it can be in either of two modes:
WRITE or APPEND. If A is in WRITE-mode, its previous contents are
first discarded; it is then handled as though it were in APPEND-
mode.
If A is not an outermost-LIST, then it is always effectively in
WRITE-mode.
After taking the mode of A into account, as described above, the
procedure is:
for each member of LIST B
(a) add a new member to the end of A
(b) assign the current member of B to the new member of A
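The LIST procedure can be sketched in Python as follows. Python lists stand in for LISTs, and the member-to-member assignment of step (b) is reduced to a plain copy; both are simplifying assumptions.

```python
def assign_list(a_members: list, a_mode: str, b_members: list) -> list:
    """Return the members of LIST A after A = B."""
    if a_mode not in ("WRITE", "APPEND"):
        raise ValueError("target must be in WRITE or APPEND mode")
    # WRITE-mode discards the previous contents, then proceeds as APPEND-mode.
    result = [] if a_mode == "WRITE" else list(a_members)
    for member in b_members:
        # (a) add a new member to the end of A; (b) assign B's member to it.
        result.append(member)
    return result
```

With an empty source, WRITE-mode leaves the target empty while APPEND-mode leaves it unchanged.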
6.3 Deletion of Data Through Assignment
If A is an outermost container in WRITE-mode, and B is a container
with a description that matches A, and if B contains no data, then
A=B has the effect of deleting all data from A. Note that if A is
in APPEND-mode in these circumstances, then A=B is a no-operation
(i.e., has no effect).
7. FOR
FOR