Usage
PlasmaStMan
maps Apache Arrow Tensors and Tables
(i.e., their Object IDs in the Plasma store)
to individual columns within a casacore Table.
Arrow Tensors map directly to casacore Columns one to one.
The mapping then consists of a pair of strings
indicating the Object ID of the Tensor in the Plasma store
and the name of the casacore Table column it provides data to.
Checks are in place to ensure that a Tensor’s shape and type
match those of the corresponding column of the casacore Table.
All casacore data types are supported by this mapping
with the exception of Strings.
Arrow Tables, on the other hand, contain one or more Fields,
which individually map to casacore Columns.
The mapping then consists of a pair of strings
indicating the Object ID of the Table in the Plasma store
and the name of the Field that should be considered,
which should match the name of the casacore Table column
it provides data to.
As in the case of Tensors,
a Field’s shape (length) and type are checked
against those of the corresponding column of the casacore Table.
Columns in an Arrow Table have only a single dimension,
so they are currently only supported as scalar columns.
Additionally, Complex values are not supported natively by Arrow Tables,
and therefore Complex and DComplex values
are supported as Arrow Struct objects with r and i fields.
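The struct layout described above can be pictured with a small, dependency-free sketch. The helper names below are hypothetical and are not part of PlasmaStMan or Arrow; they only show how complex values split into records with r and i fields, and how they are put back together.

```python
# Hypothetical helpers illustrating the r/i struct layout for complex values.
# They are not part of PlasmaStMan or Arrow.

def complex_to_struct_rows(values):
    """Turn complex numbers into records with 'r' (real) and 'i' (imaginary) fields."""
    return [{"r": v.real, "i": v.imag} for v in values]


def struct_rows_to_complex(rows):
    """Rebuild complex numbers from records with 'r' and 'i' fields."""
    return [complex(row["r"], row["i"]) for row in rows]


rows = complex_to_struct_rows([1 + 2j, -0.5j])
print(rows)
```

With pyarrow installed, a list of records like this could be handed to pyarrow.array together with a struct type that declares r and i as floating-point fields. The width of those fields (single precision for Complex, double precision for DComplex) is an assumption here and should be checked against the column type.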
Configuration
PlasmaStMan
always needs to connect to a Plasma store.
This happens through a Unix socket in the filesystem.
The location of this socket defaults to /tmp/plasma,
but its value can be overridden
by setting the PLASMA_SOCKET
environment variable.
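A client script that needs to find the same socket can mirror this lookup. The following is a minimal sketch, assuming only the default path and the environment variable named above; the function name is hypothetical.

```python
import os

# Default socket location, as stated in the text above.
DEFAULT_PLASMA_SOCKET = "/tmp/plasma"


def resolve_plasma_socket(environ=None):
    """Return the Plasma socket path: PLASMA_SOCKET if set, else the default.

    `environ` can be any mapping; it defaults to the process environment.
    This helper is illustrative and not part of PlasmaStMan.
    """
    environ = os.environ if environ is None else environ
    return environ.get("PLASMA_SOCKET", DEFAULT_PLASMA_SOCKET)


print(resolve_plasma_socket())
```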
When either reading or writing,
certain aspects of PlasmaStMan
can be configured at runtime via Storage manager properties
(arbitrary key-value pairs).
PlasmaStMan
supports the following properties:
PLASMACONNECTRETRIES: the number of times the Plasma client should try to connect to the Plasma store before giving up. Defaults to 50.
PLASMAGETTIMEOUT: the timeout in milliseconds to use when getting an object from the Plasma store that is not immediately available. Defaults to 10000.
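The defaults above can be summarised as a small key-value table that user-supplied properties override. The sketch below is only an illustration: the function name is hypothetical, and rejecting unknown keys is an assumption rather than documented PlasmaStMan behaviour.

```python
# Property names and defaults as listed above.
DEFAULT_PROPERTIES = {
    "PLASMACONNECTRETRIES": 50,   # connection attempts before giving up
    "PLASMAGETTIMEOUT": 10000,    # milliseconds to wait for an object
}


def effective_properties(overrides=None):
    """Merge user-supplied properties over the documented defaults.

    Assumption: unknown keys are rejected here to catch typos early;
    the real storage manager may treat them differently.
    """
    props = dict(DEFAULT_PROPERTIES)
    for key, value in (overrides or {}).items():
        if key not in DEFAULT_PROPERTIES:
            raise KeyError(f"unknown PlasmaStMan property: {key}")
        props[key] = int(value)
    return props


print(effective_properties({"PLASMAGETTIMEOUT": 2500}))
```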
Reading
When reading data from a Table
backed by a PlasmaStMan
storage manager,
users need to ensure that the libplasmastman
shared library
is visible in the dynamic linker’s path
(e.g., by adding the directory containing the library
to the LD_LIBRARY_PATH
environment variable on Linux).
Other than this, existing casacore-based applications do not require any modification or recompilation.
Writing
Note
At the moment PlasmaStMan
does not support writing data to Plasma.
Writing is a trickier business.
Even though the data itself cannot be written through PlasmaStMan,
what can currently be done is creating a casacore table
that points to existing data in Plasma.
To achieve this, one must inform the storage manager
about the mapping between Object IDs and columns.
This can be done in two different ways:
If writing a program in C++, one can use the PlasmaStMan class to create the storage manager object and bind it to tables. The main constructor of this class accepts two std::map objects to provide the mapping from Object ID to column name for Tensors and Tables.
Storage managers allow specifications to be given at creation time. This includes the properties specified above, along with the following additional keys:
PLASMASOCKET: the Unix socket used to connect to Plasma, overriding the PLASMA_SOCKET environment variable.
TENSOROBJECTIDS: a casacore Record object (i.e., a mapping) where keys are Tensor Object IDs and values are column names.
TABLEOBJECTIDS: a casacore Record object (i.e., a mapping) where keys are Table Object IDs and values are column names.
Because this is a generic mechanism, these specifications can be given through different interfaces. For example, the TaQL language supports the creation of tables with a given Data Manager specification (see section 8.2, Data manager specification). The python-casacore Python bindings also allow the creation of tables with specific Data Manager information (see the dminfo argument).
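As a rough illustration of the second approach, the specification keys listed above can be assembled into a plain dictionary shaped like the Data Manager information that python-casacore works with. Several details below are assumptions, not confirmed by this document: the helper name, the "PlasmaStMan" value used for TYPE, the placeholder Object ID strings, and the exact nesting of the dictionary.

```python
def build_plasma_dminfo(tensor_ids, table_ids, socket=None, name="PlasmaStMan"):
    """Assemble a Data Manager info dictionary for a PlasmaStMan-backed table.

    tensor_ids / table_ids map Object IDs to column names, following the
    TENSOROBJECTIDS and TABLEOBJECTIDS keys described above. Hypothetical
    helper: the TYPE value and the overall layout are assumptions.
    """
    spec = {
        "TENSOROBJECTIDS": dict(tensor_ids),
        "TABLEOBJECTIDS": dict(table_ids),
    }
    if socket is not None:
        # Takes precedence over the PLASMA_SOCKET environment variable.
        spec["PLASMASOCKET"] = socket
    columns = sorted(set(tensor_ids.values()) | set(table_ids.values()))
    return {"*1": {"TYPE": "PlasmaStMan", "NAME": name, "SPEC": spec, "COLUMNS": columns}}


# Placeholder Object IDs; real ones come from the Plasma store.
dminfo = build_plasma_dminfo(
    tensor_ids={"<tensor-object-id>": "DATA"},
    table_ids={"<table-object-id>": "TIME"},
    socket="/tmp/plasma",
)
print(dminfo)
```

With python-casacore installed, a dictionary like this would be passed as the dminfo argument when creating the table, alongside a table description whose columns match the mapped names.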
Example
Note
This example needs pyarrow installed.
Included in the plasma-storage-manager
repository
is a Python-based script that demonstrates
how to create a casacore Table pointing to Plasma-stored Tensors and Tables.
This can be used to test PlasmaStMan
from external programs:
# Start a plasma store and store tensor and table data with arbitrary values
# and create a table pointing to this new data (using taql).
# Use -h for more information on how to use it
$> python scripts/plasma_writer.py -o <table_name> -t <tensor1> -t <tensor2> -T <table1> ... &
# Make the new storage manager visible to third-party apps
$> export LD_LIBRARY_PATH=your-build-directory/src/ska/plasma
# Read the table metadata with casacore's showtableinfo
$> showtableinfo in=<table_name>
# Read the table data back with casacore's taql
$> taql 'select * FROM <table_name>'