+---------+
| Catalog |
+----+----+-------------------------------+
|    |                                    |
|    |    +-------------+                 |
|    +--->| SymbolTable |                 |
|    |    +---+---------+---------------+ |
|    |    |   |                         | |
|    |    |   |     +-------------------+ |
|    |    |   |     | Symbol (ID, Text) | |
|    |    |   +---->| Symbol (ID, Text) | |
|    |    |         | ...               | |
|    |    +---------+-------------------+ |
|    |                                    |
|    |    +-------------+                 |
|    +--->| SymbolTable |                 |
|    |    +---+---------+---------------+ |
|    |    |   |                         | |
|    .    |   |     +-------------------+ |
|    .    |   +---->| Symbol (ID, Text) | |
|    .    |         | ...               | |
|    .    +---------+-------------------+ |
|    .                                    |
+-----------------------------------------+
The Catalog holds a collection of ion\Symbol\Table instances, which ion\Reader and ion\Writer instances query to resolve shared symbols.
See also the chapter on catalogs in the Ion specification's symbol guide.
<?php
$catalog = new ion\Catalog;
$symtab = ion\Symbol\PHP::asTable();
$catalog->add($symtab);
?>
There are three types of symbol tables: local, shared, and the system symbol table, which is itself a special shared symbol table.
Local symbol tables do not have names, while shared symbol tables require them; only shared symbol tables may be added to a catalog or to a writer’s list of imports.
Local symbol tables are managed internally by Ion readers and writers. No application configuration is required to tell Ion readers or writers that local symbol tables should be used.
Using local symbol tables requires the local symbol table (including all of its symbols) to be written at the beginning of the value stream. Consider an Ion stream that represents CSV data with many columns. Although local symbol tables will optimize writing and reading each value, including the entire symbol table itself in the value stream adds overhead that increases with the number of columns.
If it is feasible for the writers and readers of the stream to agree on a pre-defined shared symbol table, this overhead can be reduced.
Consider the following CSV in a file called test.csv:
id,type,state
1,foo,false
2,bar,true
3,baz,true
...
An application that wishes to convert this data into the Ion format can generate a symbol table containing the column names. This reduces encoding size and improves read efficiency.
Consider the following shared symbol table that declares the column names of test.csv as symbols. Note that the shared symbol table may have been generated by hand or programmatically.
$ion_shared_symbol_table::{
name: "test.csv.columns",
version: 1,
symbols: ["id", "type", "state"],
}
This shared symbol table can be stored in a file (or in a database, etc.) to be resurrected into a symbol table at runtime.
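Generating such a table programmatically can be as simple as formatting the CSV header as Ion text. The following is a minimal sketch in plain PHP; sharedSymbolTableText() is a hypothetical helper and not part of the ion extension, and it assumes that names and symbols need no escaping beyond double quotes and backslashes.

```php
<?php
/* Hypothetical helper (not part of the ion extension):
 * render a shared symbol table as Ion text */
function sharedSymbolTableText(string $name, int $version, array $symbols) : string {
    /* Quote each symbol text; only '"' and '\' are escaped (assumption) */
    $quoted = array_map(
        fn(string $s) => '"' . addcslashes($s, "\"\\") . '"',
        $symbols
    );
    return sprintf(
        "\$ion_shared_symbol_table::{\n  name: \"%s\",\n  version: %d,\n  symbols: [%s],\n}",
        addcslashes($name, "\"\\"),
        $version,
        implode(", ", $quoted)
    );
}

/* The header line of test.csv; the escape argument is passed explicitly */
$header = str_getcsv("id,type,state", ",", "\"", "");
echo sharedSymbolTableText("test.csv.columns", 1, $header), "\n";

/*
$ion_shared_symbol_table::{
  name: "test.csv.columns",
  version: 1,
  symbols: ["id", "type", "state"],
}
*/
?>
```

The resulting text can be written to a file or database and later passed to ion\unserialize(), as the complete example does.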
Because the value stream written using the shared symbol table does not contain the symbol mappings, a reader of the stream needs to access the shared symbol table using a catalog.
Consider the following complete example:
<?php

/**
 * Representing a CSV row
 */
class Row {
    public function __construct(
        public readonly int $id,
        public readonly string $type,
        public readonly bool $state = true
    ) {}
}

/* Fetch the shared symbol table from file, db, etc. */
$symtab = ion\unserialize(<<<'SymbolTable'
$ion_shared_symbol_table::{
  name: "test.csv.columns",
  version: 1,
  symbols: ["id", "type", "state"],
}
SymbolTable
);

/* Add the shared symbol table to a catalog */
$catalog = new ion\Catalog;
$catalog->add($symtab);

/* Use the catalog when writing the data */
$writer = new class(
    catalog: $catalog,
    outputBinary: true
) extends ion\Writer\Buffer\Writer {
    /* Write one CSV row as an Ion struct keyed by column name */
    public function writeRow(Row $row) : void {
        $this->startContainer(ion\Type::Struct);

        $this->writeFieldName("id");
        $this->writeInt($row->id);

        $this->writeFieldName("type");
        $this->writeString($row->type);

        $this->writeFieldName("state");
        $this->writeBool($row->state);

        $this->finishContainer();
    }
};

$writer->writeRow(new Row(1, "foo", false));
$writer->writeRow(new Row(2, "bar"));
$writer->writeRow(new Row(3, "baz"));
$writer->flush();
?>
Let's inspect the binary Ion stream and verify that the column names have actually been replaced by symbol IDs:
<?php
/* Print a hex dump of the buffer, 8 bytes per line */
foreach (str_split($writer->getBuffer(), 8) as $line) {
    printf("%-26s", chunk_split(bin2hex($line), 2, " "));
    foreach (str_split($line) as $byte) {
        echo $byte >= ' ' && $byte <= '~' ? $byte : ".";
    }
    echo "\n";
}
echo "\n";

/*
e0 01 00 ea ee a2 81 83   ........  \
de 9e 86 be 9b de 99 84   ........  |
8e 90 74 65 73 74 2e 63   ..test.c  > here's Ion symbol table metadata
73 76 2e 63 6f 6c 75 6d   sv.colum  |
6e 73 85 21 01 88 21 03   ns.!..!.  <
da 8a 21 01 8b 83 66 6f   ..!...fo  |
6f 8c 11 da 8a 21 02 8b   o....!..  > here's the actual data
83 62 61 72 8c 11 da 8a   .bar....  |
21 03 8b 83 62 61 7a 8c   !...baz.  /
11                        .
*/
?>
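In the data part of the dump, each field name appears as a single byte (8a, 8b, 8c) rather than as text. In Ion's binary encoding, field names are stored as VarUInt symbol IDs: seven payload bits per byte, with the high bit set on the final byte. The following is a minimal sketch of decoding them in plain PHP; decodeVarUInt() is a hypothetical helper and not part of the ion extension.

```php
<?php
/* Hypothetical helper (not part of the ion extension):
 * decode one VarUInt starting at $offset,
 * returning [value, number of bytes consumed] */
function decodeVarUInt(string $bytes, int $offset = 0) : array {
    $value = 0;
    $length = strlen($bytes);
    for ($i = $offset; $i < $length; ++$i) {
        $byte = ord($bytes[$i]);
        /* the lower 7 bits carry the payload, most significant group first */
        $value = ($value << 7) | ($byte & 0x7f);
        /* the high bit marks the last byte of the VarUInt */
        if ($byte & 0x80) {
            return [$value, $i - $offset + 1];
        }
    }
    throw new UnexpectedValueException("truncated VarUInt");
}

/* The field name bytes taken from the dump above */
foreach (["8a", "8b", "8c"] as $hex) {
    [$sid] = decodeVarUInt(hex2bin($hex));
    printf("%s => \$%d\n", $hex, $sid);
}

/*
8a => $10
8b => $11
8c => $12
*/
?>
```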
When unserializing without knowing the symbols used, the column names will be just symbol IDs of the form $<SID>:
<?php
var_dump(ion\unserialize($writer->getBuffer(), [
    "multiSequence" => true,
]));

/*
array(3) {
  [0]=>
  array(3) {
    ["$10"]=>
    int(1)
    ["$11"]=>
    string(3) "foo"
    ["$12"]=>
    bool(false)
  }
  [1]=>
  array(3) {
    ["$10"]=>
    int(2)
    ["$11"]=>
    string(3) "bar"
    ["$12"]=>
    bool(true)
  }
  [2]=>
  array(3) {
    ["$10"]=>
    int(3)
    ["$11"]=>
    string(3) "baz"
    ["$12"]=>
    bool(true)
  }
}
*/
?>
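The IDs $10 to $12 follow from how Ion assigns symbol IDs: the Ion 1.0 system symbol table occupies IDs 1 to 9, and the symbols of imported shared tables are numbered consecutively after it, in declaration order. The following is a minimal sketch of that numbering in plain PHP; assignSymbolIds() is a hypothetical helper and not part of the ion extension, and it assumes a single imported table.

```php
<?php
/* Hypothetical helper (not part of the ion extension):
 * number the symbols of one imported shared table,
 * continuing after the system symbol table */
function assignSymbolIds(array $importedSymbols, int $systemMaxId = 9) : array {
    $sids = [];
    $next = $systemMaxId + 1;
    foreach ($importedSymbols as $text) {
        $sids['$' . $next++] = $text;
    }
    return $sids;
}

print_r(assignSymbolIds(["id", "type", "state"]));

/*
Array
(
    [$10] => id
    [$11] => type
    [$12] => state
)
*/
?>
```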
When unserializing with a catalog that contains the appropriate symbol tables, the symbol IDs are resolved back to the column names:
<?php
$reader = new \ion\Reader\Buffer\Reader($writer->getBuffer(),
    catalog: $catalog
);
$unser = new ion\Unserializer\Unserializer(multiSequence: true);
var_dump($unser->unserialize($reader));

/*
array(3) {
  [0]=>
  array(3) {
    ["id"]=>
    int(1)
    ["type"]=>
    string(3) "foo"
    ["state"]=>
    bool(false)
  }
  [1]=>
  array(3) {
    ["id"]=>
    int(2)
    ["type"]=>
    string(3) "bar"
    ["state"]=>
    bool(true)
  }
  [2]=>
  array(3) {
    ["id"]=>
    int(3)
    ["type"]=>
    string(3) "baz"
    ["state"]=>
    bool(true)
  }
}
*/
?>