Data storage
------------

Blocks are allocated from separate files, one file per block size, much like malloc().

Block sizes are powers of two: 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so on up to 65536.

Index by binary tree:

data_00004 =     4  --- Binary 32-bit integers
data_00008 =     8  --- ASCII decimal integers
data_00010 =    16  --- Most English words

Index by small hash function (strong polynomial):

data_00020 =    32  --- Longest identifiers
data_00040 =    64
data_00080 =   128  --- One line of text (short description, mail header line)
data_00100 =   256
data_00200 =   512
data_00400 =  1024

Index by medium hash function (XOR byte and rotate):

data_00800 =  2048  --- Typical large newsgroup text
data_01000 =  4096  --- Smallest useful picture data
data_02000 =  8192

Index by large hash function (XOR long and rotate):

data_04000 = 16384
data_08000 = 32768  --- Largest portrait pictures (JPEG)
data_10000 = 65536  --- Largest reasonable newsgroup texts (a quarter of a book)
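
The three hash families named above could look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the 32-bit width, the multiplier 31, the 5-bit rotation, and reading "long" as a little-endian 4-byte word. The function names are hypothetical, and a production polynomial hash would want a better-chosen multiplier.

```python
MASK32 = 0xFFFFFFFF  # assumption: hashes are 32 bits wide


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def hash_polynomial(data: bytes, multiplier: int = 31) -> int:
    """Polynomial hash: h = h * multiplier + byte (multiplier is a placeholder)."""
    h = 0
    for b in data:
        h = (h * multiplier + b) & MASK32
    return h


def hash_xor_byte(data: bytes, shift: int = 5) -> int:
    """Rotate the running hash, then XOR in one byte at a time."""
    h = 0
    for b in data:
        h = rotl32(h, shift) ^ b
    return h


def hash_xor_long(data: bytes, shift: int = 5) -> int:
    """Rotate the running hash, then XOR in one 4-byte word at a time."""
    h = 0
    for i in range(0, len(data), 4):
        # A short final chunk is read as a smaller little-endian integer.
        word = int.from_bytes(data[i:i + 4], "little")
        h = rotl32(h, shift) ^ word
    return h
```

The word-at-a-time variant touches each byte once but does a quarter of the loop iterations, which is presumably why it is reserved for the largest blocks.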

Everything else is stored in per-blob files.

This makes 16 categories (15 block sizes plus per-blob files), which fit in the upper 4 bits of the data ID.
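
A minimal sketch of how a data length could be mapped to a category and file name, and how the category could be packed into a data ID. The 32-bit ID width, the 28-bit index in the lower bits, and all function names are assumptions for illustration; the notes only state that the category goes in the upper 4 bits.

```python
# Block sizes 4, 8, ..., 65536 give categories 0..14; 15 means "per-blob file".
SIZES = [4 << i for i in range(15)]
BLOB_CATEGORY = 15
INDEX_BITS = 28  # assumption: 32-bit data ID, lower 28 bits hold the block index


def category_for(length: int) -> int:
    """Return the category of the smallest block size that holds `length` bytes."""
    for cat, size in enumerate(SIZES):
        if length <= size:
            return cat
    return BLOB_CATEGORY


def file_name(cat: int) -> str:
    """Return the block file name, using the block size as 5 hex digits."""
    if not 0 <= cat < BLOB_CATEGORY:
        raise ValueError("category has no shared block file")
    return "data_%05x" % SIZES[cat]


def make_data_id(cat: int, index: int) -> int:
    """Pack the category into the upper 4 bits and the index below it."""
    if not 0 <= cat <= BLOB_CATEGORY:
        raise ValueError("category out of range")
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index out of range")
    return (cat << INDEX_BITS) | index


def split_data_id(data_id: int) -> tuple:
    """Return (category, index) from a packed data ID."""
    return data_id >> INDEX_BITS, data_id & ((1 << INDEX_BITS) - 1)
```

For example, a 13-byte word lands in category 2, whose file is `data_00010`.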


Hash table storage
------------------

hash_data_small   65536 buckets initially
hash_data_medium  32768 buckets initially
hash_data_large   16384 buckets initially

hash_words        65536 buckets

Use linear probing to reduce complexity?
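
To weigh the linear probing question, here is a rough sketch of an open-addressing table with linear probing. It is an illustration only: the class name is hypothetical, Python's built-in `hash()` stands in for the hash functions above, and growth beyond the initial bucket count and deletion are left out.

```python
class LinearProbeTable:
    """Fixed-size hash table using open addressing with linear probing."""

    def __init__(self, buckets: int = 65536):
        self.slots = [None] * buckets  # each slot is None or a (key, value) pair
        self.count = 0

    def _probe(self, key):
        """Return the slot index for `key`, or None if the table is full."""
        n = len(self.slots)
        i = hash(key) % n
        for _ in range(n):
            slot = self.slots[i]
            if slot is None or slot[0] == key:
                return i
            i = (i + 1) % n  # collision: step to the next bucket
        return None

    def put(self, key, value) -> None:
        i = self._probe(key)
        if i is None:
            raise RuntimeError("table full")
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        i = self._probe(key)
        if i is None or self.slots[i] is None:
            return default
        return self.slots[i][1]
```

The appeal is that the whole table is one flat array with no per-bucket chains, which maps directly onto a single file. The cost is clustering once the table fills up, so the bucket counts would need to grow well before the table is full.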


Hints
-----

Hints should be stored as 4 bits.

- General:

Static      Data is set once and won't change afterwards
Dynamic     Data will change in the database - replace, insert, prepend, append

- Specific:

String      Data consists of a short ASCII string
Identifier  Data is an identifier (word) - may occur repeatedly, with children
Filename    Data is or can be used as a Unix filename
Text        Data is ASCII text with multiple words
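
One possible 4-bit encoding of these hints is sketched below. The layout is an assumption, not something the notes specify: bit 3 carries the general hint (set for Dynamic, clear for Static) and the low 3 bits carry one specific hint, treated here as mutually exclusive. The names `Kind`, `pack_hint`, and `unpack_hint` are hypothetical.

```python
from enum import IntEnum


class Kind(IntEnum):
    """Specific hints; the numeric values are arbitrary placeholders."""
    NONE = 0
    STRING = 1
    IDENTIFIER = 2
    FILENAME = 3
    TEXT = 4


DYNAMIC_BIT = 0x8  # assumption: bit 3 set means Dynamic, clear means Static
KIND_MASK = 0x7


def pack_hint(dynamic: bool, kind: Kind) -> int:
    """Combine the general and specific hint into one 4-bit value."""
    return (DYNAMIC_BIT if dynamic else 0) | int(kind)


def unpack_hint(hint: int) -> tuple:
    """Return (dynamic, kind) from a 4-bit hint value."""
    if not 0 <= hint <= 0xF:
        raise ValueError("hint must fit in 4 bits")
    return bool(hint & DYNAMIC_BIT), Kind(hint & KIND_MASK)
```

If a value can be, say, both a String and a Filename, the specific hints would need to be independent flag bits instead, which would not leave room for all four plus the general hint in 4 bits.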
