Add `scores_precision` parameter to `FeatureAttributionOutput.save` #202
Labels
enhancement, good first issue, help wanted
Description
This issue addresses the high space requirements of large attribution score tensors by adding a `scores_precision` parameter to the `FeatureAttributionOutput.save` method.

Proposed by: @g8a9
Motivation
Currently, tensors in `FeatureAttributionOutput` objects (attributions and step scores) are serialized in `float32` precision by default when using `out.save()`. While it is possible to compress the representation of these values with `ndarray_compact=True`, the resulting JSON files are usually quite large. Using more parsimonious data types could reduce the size of saved objects and facilitate systematic analyses leveraging large amounts of data.

Proposal
`float32` precision should probably remain the default behavior, as we do not want to cause any information loss by default. `float16` and `float8` should also be considered, both in signed and unsigned variants, since leveraging the strictly positive nature of some score types would allow supporting greater precision while halving space requirements. Unsigned values will be used as defaults if no negative scores are present in a tensor.

`float16` can easily be used by casting tensors to the native `torch.float16` data type, which preserves precision up to 4 decimal places for scores normalized in the [-1, 1] interval (8 for unsigned tensors). This corresponds to 2 or 4 decimal places for `float8`. However, this data type is not supported natively in PyTorch, so tensors should be converted to `torch.int8` and `torch.uint8` instead and transformed back into floats when reloading the object.