Token Tree library HOWTO
Hans Petter Jansson, hpj@teletopia.no
Release 1, 19981017

This document describes the token tree data manipulation/storage facilities
in libflux. It consists of a general introduction and example programs
putting them to use. This is not intended as a reference, and does not
cover all the details of function calls and return values.


1.0.0  Introduction

1.1.0  The Tree Abstraction

A token tree is a hierarchical data structure. Such a tree has a root
node, and any number of data nodes. Though you're probably already
familiar with such structures, this is what it might look like:

 -- Greater depth -->

          .-. 
          | |
          `-'
         /
.-.   .-.     .-.
|R| - | |     | |
`-'   `-'     `-'
         \   /
          .-.   .-.
          | | - | |
          `-'   `-'
             \
              .-.
              | |
              `-'

Most hierarchical structures look like this: your directory system, the
political organization of a feudal state, etc. In our model, each node
can have any number of children, linked like this:
    _
|  |_|
  
G   |
r   v
e   _
a  |_|
t  
e   |
r   v
    _      _
d  |_| -> |_|
e   
p   |
t   v
h   _      _      _
   |_| -> |_| -> |_|
|
v

This means that a node does not have a pointer to each child; rather, it
stores a pointer to the first node of a linked list of children. Every
child in such a list, however, stores a link to the list's parent.
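
The linking scheme above can be sketched in a few lines of C. This is
only an illustration: struct demo_node, demo_node_new() and
demo_child_count() are hypothetical names made up for this sketch, and
the real TTREE layout in libflux may well differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout, not the real TTREE. */
struct demo_node
{
  struct demo_node *parent;  /* Link back to the list's parent     */
  struct demo_node *child;   /* First node in the list of children */
  struct demo_node *next;    /* Next sibling in the parent's list  */
};

/* Allocates a node and appends it to the parent's list of children.
 * A NULL parent yields a free-standing root node. */
struct demo_node *demo_node_new(struct demo_node *parent)
{
  struct demo_node *n = calloc(1, sizeof(*n));
  struct demo_node **link;

  if (!n) return(NULL);
  n->parent = parent;

  if (parent)
  {
    /* Walk to the end of the sibling list and hook the node in. */
    link = &parent->child;
    while (*link) link = &(*link)->next;
    *link = n;
  }

  return(n);
}

/* Counts direct children by following the sibling links. */
size_t demo_child_count(const struct demo_node *parent)
{
  const struct demo_node *n;
  size_t count = 0;

  assert(parent != NULL);
  for (n = parent->child; n; n = n->next) count++;
  return(count);
}
```

Note that the parent only ever touches its first child directly; all
other children are reached through their siblings.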

1.2.0  Goals and Features

Organization of data in trees is nothing new, but used as a generic data
parsing/exchange philosophy, it can be a powerful tool. Together with
other libflux functions, the structures can serve as human-readable
configuration or specification files, which the library parses and
generates. Trees can be read from files, sent as data streams and
re-generated, with no particular effort on the application programmer's
part.

Also, trees can be sorted, searched and traversed efficiently, using
callback functions.

Stored in files, trees are 7-bit ASCII, easily read by a human. All
non-printable characters are represented by hexadecimal escape codes. In
memory, trees are arranged so as to be as accessible as possible to the
programmer. When sent through a libflux socket, using the comm functions,
they're compact and subject to the socket's encryption, only readable
by libflux comm functions on the other end.

1.3.0  Token Trees on File

The file format used for storing token trees is very simple. The root node
is assumed, and holds no data. Its children are defined on the "root
level". The format is easily described by an example:

token { token token token { token } }
token { token }

Tokens with spaces in them are enclosed by double quotation marks. There
is no need to place spaces between braces and tokens. If a brace or
double quotation mark is part of a token, it must be escaped with
backslash, \.

\"
\{
\}

These are all legal tokens or token parts. Spaces, tabs and end-of-lines
are interchangeable delimiters. To have a token span several lines, you
will have to end the line in a \. Another example:

"This is a \"root-level\" token."{"And this is its child, \
spanning two lines."} "Another root-level token."
Third-root-level-token-with-\{braces\}-in-it.

The hexadecimal escape codes are also prefixed with a backslash: \ff, \00,
etc. The characters [a-f] are case insensitive.
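
To make the escaping rules concrete, here is a rough sketch of how one
token's escapes might be decoded. It is not the libflux parser;
demo_unescape() and demo_hex_value() are hypothetical names. The sketch
assumes that a backslash followed by two hexadecimal digits is a hex
code and that any other escaped character stands for itself. Line
continuation (a backslash at end-of-line) is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Converts one hexadecimal digit to its value. The caller has
 * already checked it with isxdigit(). */
static int demo_hex_value(int c)
{
  if (isdigit(c)) return(c - '0');
  return(tolower(c) - 'a' + 10);
}

/* Decodes the backslash escapes of in, writing raw bytes to out.
 * Returns the number of bytes written, which may include zero bytes.
 * out must have room for at least strlen(in) bytes. */
size_t demo_unescape(const char *in, unsigned char *out)
{
  size_t len = 0;

  assert(in != NULL && out != NULL);

  while (*in)
  {
    if (*in != '\\') { out[len++] = (unsigned char) *in++; continue; }

    in++;  /* Skip the backslash */

    if (isxdigit((unsigned char) in[0]) && isxdigit((unsigned char) in[1]))
    {
      /* Assumed rule: two hex digits, either case, form one byte. */
      out[len++] = (unsigned char) (demo_hex_value((unsigned char) in[0]) * 16
                                    + demo_hex_value((unsigned char) in[1]));
      in += 2;
    }
    else if (*in)
    {
      out[len++] = (unsigned char) *in++;  /* \" \{ \} and similar */
    }
  }

  return(len);
}
```

The length is returned separately because a decoded token may contain
\00, so the result cannot be treated as an ordinary C string.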

Finally, you can have comments in files. Comments start with a hash mark,
#, and last until end-of-line. They're ignored by the parser, so a parsed
and re-written file will lose its comments.

2.0.0  Tree Manipulation and Function Calls

A ttree node has the type TTREE (a typedef). To create a handle for a tree, do:

TTREE *your_ttree;

There is no external distinction between root nodes and others, so any
node can be addressed as a TTREE.

2.1.0  Parsing and Generating Files

File handling is simple. To load a ttree, you need either an open file (a
basic FILE handle), just the file name, or a pointer to a memory-resident
string. These are the functions:

ttree_strload(): Parses an in-memory null-terminated string.
ttree_fload(): Parses an open file.
ttree_load(): Opens and parses a file, given its name.

These return a (TTREE *), or NULL if there was a parsing or I/O error.
Storage functions are mostly symmetrical to the ones above:

ttree_fsave(): Saves to an open file.
ttree_save(): Creates (overwrites) a file by name.

These return the number of nodes saved, or zero upon I/O failure. Note
that there is no ttree_strsave(). The ttree_strload() function is mainly
intended to parse a program's internal defaults, and a generated in-memory
string probably has no great value (this may change).

2.2.0  Creating, Adding and Disposing of Nodes

You can create nodes with ttree_node_add(). It takes a pointer to the
node's data, and its size. If you want to add the node to an existing
tree you can pass a pointer to its parent. If this pointer is NULL, a root
node will be created. Important note: When inserting a string, you don't
have to include its terminating null byte in the count. A terminating zero
is appended to any data, to simplify direct string operations and keep
silly bugs to a minimum.
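
The effect of the appended terminator can be shown with a small
stand-alone sketch. demo_data_copy() is a hypothetical helper written
for this document, not a libflux function; it only mimics the behaviour
described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies size bytes of data and appends one zero byte. Returns a
 * malloc()ed buffer of size + 1 bytes, or NULL on failure. */
void *demo_data_copy(const void *data, size_t size)
{
  unsigned char *copy;

  assert(data != NULL || size == 0);

  copy = malloc(size + 1);
  if (!copy) return(NULL);

  if (size) memcpy(copy, data, size);
  copy[size] = 0;  /* The extra terminator, not counted in size */
  return(copy);
}
```

Passing "stdout" with a size of 6 thus yields a buffer that can be
handed straight to strlen() or printf(), even though the count did not
cover the null byte.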

As you might well imagine, destroying is easier than creating. A call to
ttree_branch_remove() removes the node passed, along with its children,
freeing their memory and deleting any associated files (more on this in
section 2.5.0).

As of version 0.2.0, there is no support for splitting or merging trees.

2.3.0  Collecting Statistics on a Branch

ttree_branch_stats() gives you a node and byte count for the passed top
node and its children, optionally the whole branch (if the TTREE_DESCEND
flag is set in the options field).

The byte count regards data only, not the memory needed to contain the
node structures themselves.
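
As an illustration of what is being counted, the sketch below walks a
simplified node structure. Everything in it is hypothetical
(struct demo_node, struct demo_stats, DEMO_DESCEND and
demo_branch_stats() are invented stand-ins); the real function's
signature is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DESCEND 1  /* Hypothetical stand-in for TTREE_DESCEND */

struct demo_node
{
  struct demo_node *child, *next;
  size_t size;  /* Bytes of data held by this node */
};

struct demo_stats { size_t nodes, bytes; };

/* Adds the children of top to the totals, recursing if asked to. */
static void demo_count_children(const struct demo_node *top, int options,
                                struct demo_stats *st)
{
  const struct demo_node *n;

  for (n = top->child; n; n = n->next)
  {
    st->nodes++;
    st->bytes += n->size;
    if (options & DEMO_DESCEND) demo_count_children(n, options, st);
  }
}

/* Counts the top node and its children; with DEMO_DESCEND set, the
 * whole branch. Only data bytes are summed, not structure overhead. */
struct demo_stats demo_branch_stats(const struct demo_node *top, int options)
{
  struct demo_stats st;

  assert(top != NULL);
  st.nodes = 1;
  st.bytes = top->size;
  demo_count_children(top, options, &st);
  return(st);
}
```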

2.4.0  Traversing and Searching

The generic traversal function is ttree_nodes_do(). This takes the nodes
on the level directly below the passed node (not the node itself), and
performs a specified function on them. The callback function's return
value specifies if the traversal is to stop or go on. The whole branch can
be done recursively by setting the TTREE_DESCEND flag in the options
argument.
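
The traversal pattern might be sketched as follows. This is not the
libflux implementation: the names are hypothetical, and the convention
that a negative return value stops the traversal is an assumption made
for the sketch, since this document does not say which values mean
"stop".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_DESCEND 1  /* Hypothetical stand-in for TTREE_DESCEND */

struct demo_node
{
  struct demo_node *child, *next;
  const char *data;
};

typedef int (*demo_callback)(struct demo_node *n);

/* Calls func on every node directly below top (not on top itself),
 * descending into each visited node's children if DEMO_DESCEND is set.
 * Assumed convention: a negative return from func stops the traversal.
 * Returns 0 when everything was visited, else the stopping value. */
int demo_nodes_do(struct demo_node *top, demo_callback func, int options)
{
  struct demo_node *n;
  int r;

  assert(top != NULL && func != NULL);

  for (n = top->child; n; n = n->next)
  {
    if ((r = func(n)) < 0) return(r);

    if ((options & DEMO_DESCEND) &&
        (r = demo_nodes_do(n, func, options)) < 0)
      return(r);
  }

  return(0);
}

/* Example callback: counts visits, halting at a node named "stop". */
static int demo_visits;

int demo_count_visit(struct demo_node *n)
{
  demo_visits++;
  return(strcmp(n->data, "stop") == 0 ? -1 : 1);
}
```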

ttree_nodes_match() is a discriminating ttree_nodes_do(). In addition to
the latter's arguments, it takes a data pointer and size, comparing it to
the contents of the nodes it processes, invoking the callback only for
nodes matching the data and size. This is faster and more convenient than
using ttree_nodes_do() to invoke the callback for every node and
performing the match there.

ttree_node_find1() is practical when you're looking for a node with data
you know is unique within the search scope, or when it doesn't matter
which of several similar nodes is found. It takes the same arguments as
ttree_nodes_match(), but needs no callback function, directly returning
the first match instead (or NULL if there was none).
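
Matching on data and size together could look roughly like this. Again
a sketch with hypothetical names (demo_find_first() and friends), not
the library's code. It shows why both the size and the bytes must agree
for a node to count as a match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_DESCEND 1  /* Hypothetical stand-in for TTREE_DESCEND */

struct demo_node
{
  struct demo_node *child, *next;
  const void *data;
  size_t size;
};

/* Returns the first node below top whose data equals the size bytes
 * at data, or NULL if there is none. */
struct demo_node *demo_find_first(struct demo_node *top, const void *data,
                                  size_t size, int options)
{
  struct demo_node *n, *hit;

  assert(top != NULL && data != NULL);

  for (n = top->child; n; n = n->next)
  {
    /* Compare sizes first; equal prefixes are not enough. */
    if (n->size == size && memcmp(n->data, data, size) == 0) return(n);

    if ((options & DEMO_DESCEND) &&
        (hit = demo_find_first(n, data, size, options)))
      return(hit);
  }

  return(NULL);
}
```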

The last traversal function discussed here stands a bit apart from the
others. It is called ttree_branch_walk() and picks out a path through the
branching tree, based on a list of node names passed. Its arguments are
a pointer to the parent node of the supposed first element of the
list, and the actual list of strings defining the path. Strings are
separated by spaces, and as of version 0.2.0 there is no way to list nodes
with spaces in their data. The function returns a pointer to the last node
in the path, or NULL if the specified path didn't exist. This is good for
configuration parsing - checking if options are set, for example.
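
A path walk of this kind can be sketched as below. The names are
hypothetical and the details are assumptions: the sketch follows the
first child that matches at each level and does not backtrack, which
may or may not be what libflux does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_node
{
  struct demo_node *child, *next;
  const char *data;  /* Node name, null-terminated */
};

/* Follows a space-separated list of node names downwards from top.
 * Returns the last node in the path, or NULL if the path does not
 * exist. An empty path returns top itself. */
struct demo_node *demo_branch_walk(struct demo_node *top, const char *path)
{
  struct demo_node *n = top;
  size_t len;

  assert(top != NULL && path != NULL);

  while (n)
  {
    path += strspn(path, " ");   /* Skip separating spaces  */
    len = strcspn(path, " ");    /* Length of the next name */
    if (len == 0) return(n);     /* Path exhausted          */

    /* Look for the name among the children of the current node. */
    for (n = n->child; n; n = n->next)
      if (strlen(n->data) == len && memcmp(n->data, path, len) == 0) break;

    path += len;
  }

  return(NULL);
}
```

Since every name in the path must be found directly below the previous
one, a path that skips a level fails: with a "logfile" node nested in
"options" and then "logging", the path "options logfile" gives NULL
while "options logging logfile" succeeds.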

A future version of this function (named differently) will use varargs to
construct a list of nodes, with no constraints on data content.

2.5.0  Technical Notes

Loaded ttrees are stored in main memory, with some exceptions: Really
large nodes are stored on disk, in /tmp, and mmap()ed into memory instead.
Thus, ttrees can still be used to hold big data files transparently. Files
are created and deleted automatically.

2.6.0  Further reading

Ttrees can be exchanged over libflux' stream sockets, using functions
described in the comm-howto. You should read it, along with sock-howto, to
understand sockets and ttree serialization.

3.0.0  Examples

3.1.0  Taking a ttree as stdin, pretty-printing it to stdout

#include <stdio.h>
#include <stdlib.h>
/* Also include the libflux header that declares TTREE. */

int main()
{
  TTREE *t;

  t = ttree_fload(stdin);
  if (!t) { puts("Ttree reading failed."); exit(2); }

  ttree_fsave(t, stdout);
  ttree_branch_remove(t);
  return(0);
}

3.2.0  Reading a config file and acting on options set in it

/*   /etc/test.conf might look like this:
 *
 *   options
 *   {
 *     logging
 *     {
 *       stdout
 *       logfile { /var/log/test.log }
 *     }
 *   }
 */

#include <stdio.h>
#include <stdlib.h>
/* Also include the libflux header that declares TTREE. */

int main()
{
  TTREE *cfg, *n0;

  cfg = ttree_load("/etc/test.conf");
  if (!cfg) { puts("Error loading config file."); exit(2); }

  if (ttree_branch_walk(cfg, "options logging stdout"))
    puts("Stdout logging enabled.");

  /* logfile sits below logging in the config file above */
  if ((n0 = ttree_branch_walk(cfg, "options logging logfile")))
  {
    if (n0->child && n0->child->data)
      printf("Logging to file \"%s\".\n", n0->child->data);
  }

  /* ... */

  ttree_branch_remove(cfg);
  return(0);
}

3.3.0  A more complex config file parser

/*   /etc/test.conf might look like this:
 *
 *   options
 *   {
 *     logging
 *     {
 *       stdout
 *
 *       # Well, this is kind of silly, I admit it:
 *
 *       logfile { /var/log/test.log }
 *       logfile { /root/test.log }
 *       logfile { /var/test/test.log }
 *     }
 *   }
 */

#include <stdio.h>
#include <stdlib.h>
/* Also include the libflux header that declares TTREE. */

int logfile_report(TTREE *n0)
{
  if (n0->child && n0->child->data)
  {
    printf("Logging to file \"%s\".\n", n0->child->data);
    return(1);
  }

  puts("Malformed logfile entry in configuration file. Halting.");
  return(-1);
}

int main()
{
  TTREE *cfg, *n0;

  cfg = ttree_load("/etc/test.conf");
  if (!cfg) { puts("Error loading config file."); exit(2); }

  if ((n0 = ttree_branch_walk(cfg, "options logging")))
  {
    /* Check if there is at least one stdout entry */

    if (ttree_node_find1(n0, "stdout", 6, 0))
      puts("Stdout logging enabled.");

    /* Find all logfile entries */

    ttree_nodes_match(n0, logfile_report, "logfile", 7, 0);
  }

  /* ... */

  ttree_branch_remove(cfg);
  return(0);
}
