API
Casacore classes
ska::plasma::PlasmaStMan and ska::plasma::PlasmaStManColumn are the two main classes implementing the Storage Manager API as mandated by casacore.
-
class ska::plasma::PlasmaStMan : public DataManager
The Plasma-based storage manager
It is implemented using the pimpl idiom, which hides the particulars of the implementation (and its dependencies) from users.
Public Functions
-
PlasmaStMan(std::string plasma_socket = "", const std::map<std::string, ObjectID> &tensor_object_ids = {}, const std::map<std::string, ObjectID> &table_object_ids = {})
Creates a new instance of the Plasma Storage Manager connected to the given socket, mapping columns to Arrow Tensors and Tables as indicated by the given mappings.
- Parameters
plasma_socket – The UNIX socket where the Plasma store listens for connections. If not given, or empty, it defaults to /tmp/plasma, unless the PLASMA_SOCKET environment variable is set, in which case its value takes precedence.
tensor_object_ids – A mapping from column names to Object IDs in the Plasma store where Arrow Tensors with the data for the respective column can be found.
table_object_ids – A mapping from column names to Object IDs in the Plasma store where Arrow Tables with the data for the respective column can be found (the name of the column being mapped must be the same as the column name in the Arrow Table).
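The socket-selection rule for plasma_socket can be summarised in a few lines. This is a sketch under stated assumptions, not the library's code: resolve_plasma_socket is a hypothetical helper, and treating a set-but-empty PLASMA_SOCKET as unset is our own assumption. The two-argument overload takes the environment value explicitly so the rule can be checked without modifying the process environment.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the documented rules: an explicit, non-empty
// socket wins; otherwise PLASMA_SOCKET (if set) overrides the /tmp/plasma default.
// `env_value` is whatever std::getenv("PLASMA_SOCKET") returned (may be null).
std::string resolve_plasma_socket(const std::string &given, const char *env_value)
{
    if (!given.empty()) {
        return given;
    }
    // Assumption: a set-but-empty PLASMA_SOCKET is treated as if it were unset.
    if (env_value != nullptr && env_value[0] != '\0') {
        return env_value;
    }
    return "/tmp/plasma";
}

// Convenience overload that consults the real process environment.
std::string resolve_plasma_socket(const std::string &given)
{
    return resolve_plasma_socket(given, std::getenv("PLASMA_SOCKET"));
}
```

Note that the environment variable only replaces the default; it never overrides a socket passed explicitly to the constructor.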
-
~PlasmaStMan()
Declared explicitly because of the pimpl idiom; otherwise its implementation is defaulted.
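A minimal, self-contained sketch of why a pimpl class needs an explicitly declared destructor. Facade and its members are hypothetical; the point is that std::unique_ptr<impl> can only destroy impl where impl is a complete type, so the destructor is declared in the class and defaulted only after impl has been defined.

```cpp
#include <memory>
#include <string>
#include <utility>

// In a real project this class definition lives in the public header.
class Facade {
public:
    explicit Facade(std::string socket);
    ~Facade();  // declared here, defined where impl is a complete type
    std::string socket() const;

private:
    struct impl;                   // forward declaration only: users never see it
    std::unique_ptr<impl> pimpl_;  // deleting through this needs a complete impl
};

// --- everything below would live in the implementation (.cpp) file ---
struct Facade::impl {
    std::string socket;  // heavy third-party dependencies would be hidden here
};

Facade::Facade(std::string socket)
    : pimpl_(std::make_unique<impl>(impl{std::move(socket)})) {}

// Defaulted, but only now that impl is complete; an implicitly defined
// destructor in the header would not compile where impl is incomplete.
Facade::~Facade() = default;

std::string Facade::socket() const { return pimpl_->socket; }
```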
-
void ping_plasma()
-
void set_plasma_get_timeout(std::int64_t timeout)
-
void set_plasma_connect_retries(int connect_retries)
Public Static Functions
-
static casacore::DataManager *makeObject(const casacore::String &aDataManType, const casacore::Record &spec)
Factory function invoked by casacore to create an instance of PlasmaStMan from a given DataManager specification.
- Parameters
aDataManType – The name of the data manager.
spec – The specification of the data manager.
- Returns
A new PlasmaStMan object.
-
class impl
The Plasma-based storage manager implementation
This class fully implements the Plasma-based storage manager; PlasmaStMan merely exposes this implementation while hiding its dependencies.
Public Functions
-
impl(std::string plasma_socket = "", std::map<std::string, ObjectID> tensor_object_ids = {}, std::map<std::string, ObjectID> table_object_ids = {})
-
~impl()
Declared explicitly because one of our members uses the incomplete PlasmaStManColumn type; otherwise its implementation is defaulted.
-
void ping_plasma()
-
void set_plasma_get_timeout(std::int64_t timeout)
-
void set_plasma_connect_retries(int connect_retries)
-
DataManager *clone() const
- See
PlasmaStMan::clone
-
String dataManagerType() const
- See
PlasmaStMan::dataManagerType
-
String dataManagerName() const
- See
PlasmaStMan::dataManagerName
-
void create64(rownr_t aNrRows)
- See
PlasmaStMan::create64
-
rownr_t open64(rownr_t aRowNr, AipsIO &ios)
- See
PlasmaStMan::open64
-
rownr_t resync64(rownr_t aRowNr)
- See
PlasmaStMan::resync64
-
Bool flush(AipsIO&, Bool doFsync)
- See
PlasmaStMan::flush
-
DataManagerColumn *makeScalarColumn(const String &aName, int aDataType, const String &aDataTypeID)
- See
PlasmaStMan::makeScalarColumn
-
DataManagerColumn *makeDirArrColumn(const String &aName, int aDataType, const String &aDataTypeID)
- See
PlasmaStMan::makeDirArrColumn
-
DataManagerColumn *makeIndArrColumn(const String &aName, int aDataType, const String &aDataTypeID)
- See
PlasmaStMan::makeIndArrColumn
-
void deleteManager()
- See
PlasmaStMan::deleteManager
-
void addRow64(rownr_t aNrRows)
- See
PlasmaStMan::addRow64
-
Record dataManagerSpec() const
- See
PlasmaStMan::dataManagerSpec
-
Record getProperties() const
- See
PlasmaStMan::getProperties
-
void setProperties(const Record &props)
- See
PlasmaStMan::setProperties
-
inline rownr_t nrows() const
Return the number of rows used by all columns managed by this storage manager
- Returns
The number of rows used by all columns managed by this storage manager
Public Static Functions
-
static DataManager *makeObject(const String &aDataManType, const Record &spec)
-
class ska::plasma::PlasmaStManColumn : public StManColumnBase
A single column of the Plasma Storage Manager
A PlasmaStManColumn manages a single column of a casacore Table, which is backed by an Arrow object stored in Plasma. The underlying Arrow object is handled through an ArrowReader instance, which hides the differences between the types of Arrow objects that can hold data. At the moment the only supported reader is TensorReader (and thus this class still silently assumes that), but more will come. When the Tensor is retrieved from Plasma, this class creates the corresponding TensorReader instance, which ensures the data types are compatible. In addition, upon data access (again, through the reader), the tensor’s shape is compared against the column’s cell shape to ensure the tensor and the column define the same dimensionality.
While casacore is column-major, Arrow is row-major by default. Moreover, the dimensions this column receives via setShapeColumn are those of individual cells, while an Arrow Tensor contains the data of the full column. Thus:
- The first dimension of the Tensor should always be the number of rows of the column.
- The remaining dimensions should match the column cell’s shape in reverse order.
In principle, support for non-row-major Tensors could be added, but that is left as a future improvement.
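The shape rules imply a simple layout relationship: a cell of shape (c0, c1, ..., ck) stored column-major occupies memory exactly like a row-major array of shape (ck, ..., c1, c0), so each row's cell is one contiguous block of c0 * c1 * ... * ck elements in the Tensor. The hypothetical helpers below (not part of the library) compute the expected Tensor shape and the flat offset of a cell element, assuming a row-major Tensor and offsets counted in elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: shape a row-major Tensor must have to back a column with
// `nrows` rows whose cells have shape `cell_shape` (casacore order).
std::vector<std::size_t> expected_tensor_shape(std::size_t nrows,
                                               const std::vector<std::size_t> &cell_shape)
{
    std::vector<std::size_t> shape{nrows};  // first dimension: number of rows
    // Remaining dimensions: the cell's dimensions in reverse order.
    shape.insert(shape.end(), cell_shape.rbegin(), cell_shape.rend());
    return shape;
}

// Hypothetical: flat element offset in the Tensor of the cell element at
// casacore position `pos` (same order as cell_shape) in row `row`.
std::size_t element_offset(std::size_t row,
                           const std::vector<std::size_t> &cell_shape,
                           const std::vector<std::size_t> &pos)
{
    assert(pos.size() == cell_shape.size());
    std::size_t in_cell = 0;
    std::size_t stride = 1;  // column-major: the first index varies fastest
    for (std::size_t i = 0; i < cell_shape.size(); ++i) {
        assert(pos[i] < cell_shape[i]);
        in_cell += pos[i] * stride;
        stride *= cell_shape[i];
    }
    // After the loop, stride equals the number of elements in one cell.
    return row * stride + in_cell;
}
```

For example, a 10-row column with 4x3 cells needs a Tensor of shape (10, 3, 4), and element (1, 2) of row 2 sits at flat offset 2*12 + 1 + 2*4 = 33, which is also its row-major offset at Tensor index (2, 2, 1).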
Public Functions
-
PlasmaStManColumn(const std::string &name, PlasmaClient &client, PlasmaStMan::impl &storage_manager, const ArrowObjectInfo &object_info, int dataType)
Create a new PlasmaStManColumn with the given name and data type. Upon construction it connects to Plasma and retrieves the underlying Arrow object, if it is already known at this stage; otherwise initialize_reader must be called before attempting to read anything.
- Parameters
name – The name of this column.
client – The Plasma client object used to read Arrow objects off Plasma.
storage_manager – A reference to the owning storage manager, used to retrieve the number of rows after table creation.
object_info – Structure containing the Object ID and type of Arrow object to read from Plasma. If the type is ArrowObjectType::UNKNOWN then no reading occurs.
dataType – The data type of this column.
-
void initialize_reader(const ArrowObjectInfo &object_info)
Initializes the underlying reader object with the provided information.
- Parameters
object_info – Structure containing the Object ID and type of Arrow object to read from Plasma. If the type is ArrowObjectType::UNKNOWN then no initialization occurs.
-
bool reader_initialized() const
- Returns
Whether the underlying reader is initialized or not.
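The lazy-initialization contract of the constructor, initialize_reader and reader_initialized can be sketched as follows. Apart from ArrowObjectType::UNKNOWN, every name here (the TENSOR and TABLE enumerators, the stub reader types, ColumnSketch) is hypothetical, the actual retrieval from Plasma is omitted, and throwing when reading before initialization is our assumption.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Only UNKNOWN is documented; TENSOR and TABLE are assumed for illustration.
enum class ArrowObjectType { UNKNOWN, TENSOR, TABLE };

struct ArrowObjectInfo {
    std::string object_id;
    ArrowObjectType type = ArrowObjectType::UNKNOWN;
};

struct ReaderStub { virtual ~ReaderStub() = default; };
struct TensorReaderStub : ReaderStub {};
struct TableReaderStub : ReaderStub {};

class ColumnSketch {
public:
    explicit ColumnSketch(const ArrowObjectInfo &info) { initialize_reader(info); }

    void initialize_reader(const ArrowObjectInfo &info)
    {
        switch (info.type) {
        case ArrowObjectType::UNKNOWN:
            return;  // object not known yet: leave the reader uninitialized
        case ArrowObjectType::TENSOR:
            reader_ = std::make_unique<TensorReaderStub>();
            break;
        case ArrowObjectType::TABLE:
            reader_ = std::make_unique<TableReaderStub>();
            break;
        }
    }

    bool reader_initialized() const { return reader_ != nullptr; }

    void read() const
    {
        // Assumption: reading before initialization is reported as an error.
        if (!reader_initialized()) {
            throw std::logic_error("initialize_reader must be called first");
        }
        // ... delegate to *reader_ ...
    }

private:
    std::unique_ptr<ReaderStub> reader_;
};
```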
Plasma access
-
class ska::plasma::PlasmaClient
A class encapsulating access to a Plasma Store.
This class encapsulates access to a Plasma Store. Although it is a very thin wrapper around ::plasma::PlasmaClient, it makes certain aspects configurable, such as timeouts, the socket to connect to, and connection retries, among others.
Public Functions
-
PlasmaClient(std::string socket)
Create a new PlasmaClient that will connect to the given socket.
- Parameters
socket – The Plasma socket to connect to.
-
void ping()
Ensure communication between the client and the server works.
-
inline void set_get_timeout(std::int64_t timeout)
Set the timeout for the Plasma Get operation, in milliseconds.
- Parameters
timeout – The timeout for the Plasma Get operation, in milliseconds.
-
inline std::int64_t get_timeout() const
- Returns
The timeout for the Plasma Get operation, in milliseconds.
-
inline void set_connect_retries(int connect_retries)
Set the number of attempts to connect to the Plasma socket before failing.
- Parameters
connect_retries – the number of attempts to connect to the Plasma socket before failing.
-
inline int connect_retries() const
- Returns
The number of attempts to connect to the Plasma socket before failing.
-
::plasma::ObjectBuffer get(const ObjectID &object_id)
Read an object from the Plasma store. A plasma_error exception is thrown if no such object is found within the timeout.
- Parameters
object_id – The ID of the object to read.
- Returns
A Plasma Object Buffer pointing to the object in the Plasma Store.
-
inline std::string socket() const
- Returns
The socket this Plasma client connects to.
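A connect-with-retries loop of the kind configured by set_connect_retries might look like the sketch below. It is an illustration only: connect_with_retries is hypothetical, try_connect stands in for the real connection attempt, and the optional delay between attempts is an assumption rather than documented behaviour.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical: attempt to connect up to `connect_retries` times before failing.
// Returns the number of attempts it took; non-positive values mean no attempt.
int connect_with_retries(int connect_retries,
                         const std::function<bool()> &try_connect,
                         std::chrono::milliseconds delay = std::chrono::milliseconds(0))
{
    for (int attempt = 1; attempt <= connect_retries; ++attempt) {
        if (try_connect()) {
            return attempt;
        }
        if (attempt < connect_retries) {
            std::this_thread::sleep_for(delay);  // assumed back-off between attempts
        }
    }
    throw std::runtime_error("could not connect to the Plasma socket");
}
```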
Data reading
Internally, data reading is organised as a hierarchy of reader classes, each responsible for reading a different kind of Arrow object.
-
class ska::plasma::ArrowReader
Base class for Arrow data readers used by the PlasmaStManColumn class.
Arrow offers different storage types, like Tensors and Tables. This base class offers a common interface for accessing data from these different storage types.
Subclassed by ska::plasma::TableReader, ska::plasma::TensorReader
Public Functions
-
inline ArrowReader(const std::string &column_name, casacore::DataType data_type)
Constructs a reader for the given column and data type.
- Parameters
column_name – The casacore column backed by this reader.
data_type – The casacore data type of the column backed by this reader.
-
virtual ~ArrowReader() = default
Virtual destructor, required because this is a polymorphic base class.
-
inline void check_conformance(const Shape &column_shape)
Checks that the data type and the shape of the underlying Arrow object match those of the casacore column this reader backs. The column data type is known at construction time; the column shape is given here.
- Parameters
column_shape – The shape of the casacore column this reader backs.
-
virtual void read_scalar(rownr_t rownr, void *dataPtr) = 0
Read a single scalar value from the underlying Arrow object. The scalar value is that corresponding to the cell in row rownr.
- Parameters
rownr – The (casacore) row number of the cell for which the scalar is being read.
dataPtr – The address where the scalar should be written to.
-
virtual void read_array(ArrayBase &array, std::size_t offset) = 0
Read an array from the underlying Arrow object starting at the given offset. The array’s shape determines how much data is read, and the read may or may not be done with zero-copy.
- Parameters
array – The array where the data should be read into.
offset – The offset in the underlying Arrow object at which reading will start.
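To illustrate the ArrowReader contract, here is a simplified stand-in; it is not the library's class. casacore's ArrayBase is replaced by a destination pointer plus an element count, the "Arrow object" is an in-memory vector of doubles, and offsets are assumed to be counted in elements.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

class ArrowReaderSketch {
public:
    virtual ~ArrowReaderSketch() = default;  // safe deletion through a base pointer
    virtual void read_scalar(std::size_t rownr, void *dataPtr) = 0;
    virtual void read_array(double *dest, std::size_t count, std::size_t offset) = 0;
};

// Hypothetical reader over an in-memory buffer of doubles.
class VectorReader : public ArrowReaderSketch {
public:
    explicit VectorReader(std::vector<double> data) : data_(std::move(data)) {}

    void read_scalar(std::size_t rownr, void *dataPtr) override
    {
        // Scalar column: row n is element n; at() throws on a bad row number.
        std::memcpy(dataPtr, &data_.at(rownr), sizeof(double));
    }

    void read_array(double *dest, std::size_t count, std::size_t offset) override
    {
        // The destination size (the array's shape in casacore) decides how much is read.
        if (offset > data_.size() || count > data_.size() - offset) {
            throw std::out_of_range("read past the end of the data");
        }
        std::memcpy(dest, data_.data() + offset, count * sizeof(double));
    }

private:
    std::vector<double> data_;
};
```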
-
class ska::plasma::TensorReader : public ska::plasma::ArrowReader
An ArrowReader that reads data off an Arrow Tensor.
TODO: The current implementation contains two private templated methods to handle all data types. This means a runtime check on the casacore data type is needed on every access to choose the correct template instance. This could be avoided by offering a TensorReaderBase class that handles all common aspects, a TensorReader class templated on the casacore data type, and a factory function, called once from PlasmaStManColumn, that creates the correct reader for the given casacore data type.
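The design proposed in the TODO could look roughly like this. The base class name follows the TODO; TypedTensorReader and make_tensor_reader are hypothetical, and DataType is a small stand-in for casacore::DataType. The point is that the type switch runs once, in the factory, instead of on every read.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>

// Stand-in for casacore::DataType (only a few members modelled).
enum class DataType { TpInt, TpFloat, TpDouble };

// Type-independent part (shape checks, stream handling, ... would go here).
class TensorReaderBase {
public:
    virtual ~TensorReaderBase() = default;
    virtual std::size_t element_size() const = 0;
    // Copies element `rownr` of a raw tensor buffer into `dataPtr`.
    virtual void read_scalar(const void *buffer, std::size_t rownr, void *dataPtr) const = 0;
};

// One instantiation per element type.
template <typename T>
class TypedTensorReader : public TensorReaderBase {
public:
    std::size_t element_size() const override { return sizeof(T); }

    void read_scalar(const void *buffer, std::size_t rownr, void *dataPtr) const override
    {
        // T is fixed at compile time: no per-call switch on the data type.
        *static_cast<T *>(dataPtr) = static_cast<const T *>(buffer)[rownr];
    }
};

// Factory called once (e.g. by the column) to pick the right instantiation.
std::unique_ptr<TensorReaderBase> make_tensor_reader(DataType type)
{
    switch (type) {
    case DataType::TpInt:
        return std::make_unique<TypedTensorReader<std::int32_t>>();
    case DataType::TpFloat:
        return std::make_unique<TypedTensorReader<float>>();
    case DataType::TpDouble:
        return std::make_unique<TypedTensorReader<double>>();
    }
    throw std::invalid_argument("unsupported data type");
}
```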
Public Functions
-
TensorReader(const std::string &column_name, casacore::DataType data_type, arrow::io::InputStream *input_stream)
Constructs a TensorReader for the given casacore data type and column from an input stream.
- Parameters
column_name – The casacore column backed by this reader.
data_type – The casacore data type of the column backed by this reader.
input_stream – The input stream from where the Tensor will be read. This is possibly created from an object read from Plasma.
-
virtual void read_scalar(rownr_t rownr, void *dataPtr) override
Read a single scalar value from the underlying Arrow object. The scalar value is that corresponding to the cell in row rownr.
- Parameters
rownr – The (casacore) row number of the cell for which the scalar is being read.
dataPtr – The address where the scalar should be written to.
-
virtual void read_array(ArrayBase &array, std::size_t offset) override
Read an array from the underlying Arrow object starting at the given offset. The array’s shape determines how much data is read, and the read may or may not be done with zero-copy.
- Parameters
array – The array where the data should be read into.
offset – The offset in the underlying Arrow object at which reading will start.
-
class ska::plasma::TableReader : public ska::plasma::ArrowReader
An ArrowReader that reads data off an Arrow Table.
Tables can contain multiple “fields” or “columns”. The column read by this reader is the one with the same name as the casacore Table column backed by this reader. If no such field/column is found in the Arrow Table, an error is raised. Only Tables written as a single RecordBatch are currently supported.
Public Functions
-
TableReader(const std::string &column_name, casacore::DataType data_type, arrow::io::InputStream *input_stream)
Constructs a TableReader for the given casacore data type and column from an input stream. The casacore column name must match the name of the column in the Arrow Table that will be read.
- Parameters
column_name – The casacore column backed by this reader. Should be the same as the column in the Arrow Table.
data_type – The casacore data type of the column backed by this reader.
input_stream – The input stream from where the Table will be read. This is possibly created from an object read from Plasma.
-
virtual void read_scalar(rownr_t rownr, void *dataPtr) override
Read a single scalar value from the underlying Arrow object. The scalar value is that corresponding to the cell in row rownr.
- Parameters
rownr – The (casacore) row number of the cell for which the scalar is being read.
dataPtr – The address where the scalar should be written to.
-
virtual void read_array(ArrayBase &array, std::size_t offset) override
Read an array from the underlying Arrow object starting at the given offset. The array’s shape determines how much data is read, and the read may or may not be done with zero-copy.
- Parameters
array – The array where the data should be read into.
offset – The offset in the underlying Arrow object at which reading will start.
Misc
-
class ska::plasma::ObjectID
Simple, immutable class containing an Object ID.
This is a simpler version of plasma’s own Object ID class that does not carry its dependencies. It gives us a dedicated type to represent Object IDs (rather than std::string) without spreading plasma dependencies throughout the codebase.
Public Functions
-
ObjectID(const std::string &object_id)
Constructs an Object ID for the given string, which must be a valid plasma Object ID.
- Parameters
object_id – The contents of the Object ID
-
ObjectID(const char *object_id)
Constructs an Object ID for the given null-terminated C string, which must be a valid plasma Object ID.
- Parameters
object_id – The contents of the Object ID
-
inline const std::string &string() const
Returns the underlying string.
- Returns
The underlying string
-
inline bool valid() const
Returns whether this is a valid Object ID or not.
- Returns
true if this Object ID is valid
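A dependency-free Object ID wrapper of this kind can be very small. The sketch below is hypothetical: in particular, defining valid() as "exactly 20 bytes long" is an assumption based on the length of Plasma's binary Object IDs, not the documented check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, immutable Object ID wrapper with no plasma dependency.
class ObjectIdSketch {
public:
    static constexpr std::size_t kSize = 20;  // assumed binary Plasma ID length

    ObjectIdSketch(const std::string &object_id) : id_(object_id) {}
    ObjectIdSketch(const char *object_id) : id_(object_id) {}  // must not be null

    const std::string &string() const { return id_; }
    bool valid() const { return id_.size() == kSize; }

private:
    std::string id_;  // no mutators: the ID cannot change after construction
};
```

The constructors are left non-explicit in this sketch so that string literals convert implicitly, which is handy when building column-to-ID maps like the ones PlasmaStMan's constructor takes.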