Feature request: support for attributes #30

adamryczkowski · 2017-10-26T06:06:59Z

I know it can be tough, since Julia has nothing similar, but many people (including me) store important metadata in R attributes.

I think each object's attributes should be inserted as separate entries in the resulting dictionary.

For example, if I load an object a created with:

```r
a <- "Kuku!"
attr(a, "label") <- "my label"
save("a", file = "obj_a.rda")
```

then loading it should produce a dictionary with two entries:

"a" → "Kuku!"
"a|label" → "my label"

The | separator can be replaced with any other character that cannot occur in a variable name.
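A minimal sketch of this flattening in plain Julia, assuming the loader already has each object's value and its attributes as dictionaries. The flatten_attrs function and its sep keyword are hypothetical, not part of RData.jl; the name check illustrates why the separator must be a character that cannot appear in a name.

```julia
# Hypothetical helper (not part of RData.jl): store each object under its
# own name and each of its attributes under "<name><sep><attribute>".
function flatten_attrs(objects::AbstractDict{String}, attrs::AbstractDict{String};
                       sep::AbstractString = "|")
    flat = Dict{String,Any}()
    for (name, value) in objects
        # A separator inside an object name would make the keys ambiguous.
        occursin(sep, name) &&
            throw(ArgumentError("object name $(repr(name)) contains separator $(repr(sep))"))
        flat[name] = value
        for (attrname, attrvalue) in get(attrs, name, Dict{String,Any}())
            flat[string(name, sep, attrname)] = attrvalue
        end
    end
    return flat
end

flat = flatten_attrs(Dict("a" => "Kuku!"), Dict("a" => Dict("label" => "my label")))
# flat contains "a" => "Kuku!" and "a|label" => "my label"
```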


alyst · 2017-10-26T11:58:48Z

Yes, it would be nice to extract attributes. I'm just not sure how they should be returned to the user.
Maybe two dictionaries: one for the objects and one for their attributes, e.g.:

```julia
Dict("a" => "Kuku!")
Dict("a" => Dict("label" => "my label"))
```

ararslan · 2017-10-26T18:00:28Z

Since metadata can be attached arbitrarily to R objects, it seems like we'd either have to defensively wrap every object read from the RDA file in a Julia type that preserves attributes, or live with very severe type instability (e.g. a scalar versus a Dict, depending on whether someone decided to do something like x <- 1; attr(x, "name") <- "Sally").
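To make the first option concrete, here is a minimal sketch of a defensive wrapper, assuming a hypothetical WithAttrs type (not part of RData.jl). Every loaded value gets the same wrapper whether or not it carries attributes, so the result type depends only on the value's type and stays stable; the price is that users always have to unwrap .value.

```julia
# Hypothetical wrapper (not part of RData.jl): a converted value plus the
# R attributes that were attached to it (possibly none).
struct WithAttrs{T}
    value::T
    attrs::Dict{String,Any}
end

# Objects without attributes get an empty attribute dictionary.
WithAttrs(value) = WithAttrs(value, Dict{String,Any}())

# x <- 1; attr(x, "name") <- "Sally"   versus   y <- 2
x = WithAttrs(1.0, Dict{String,Any}("name" => "Sally"))
y = WithAttrs(2.0)

typeof(x) == typeof(y)   # true: both are WithAttrs{Float64}
```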

adamryczkowski · 2017-10-26T18:43:27Z

Definitely true. Personally, I favor a separate, dedicated function that loads R objects defensively.

I have also started a related discussion on Stack Overflow.

nalimilan · 2017-10-26T20:07:07Z

For top-level objects, creating special entries in the Dict would work fine, but that won't work for objects nested inside other objects. For example, if your data frame columns have attributes, you can't preserve them except by creating a special AbstractVector type that supports attributes.
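A minimal sketch of such a vector type, assuming a hypothetical AttrVector (not part of RData.jl or DataFrames.jl). Defining size, getindex and IndexStyle is enough for it to behave as an AbstractVector, so it should be usable wherever generic code accepts one.

```julia
# Hypothetical attribute-carrying vector (not part of RData.jl): behaves
# like its underlying data, but also keeps the column's R attributes.
struct AttrVector{T} <: AbstractVector{T}
    data::Vector{T}
    attrs::Dict{String,Any}
end

# Minimal AbstractArray interface for a read-only vector.
Base.size(v::AttrVector) = size(v.data)
Base.getindex(v::AttrVector, i::Int) = v.data[i]
Base.IndexStyle(::Type{<:AttrVector}) = IndexLinear()

col = AttrVector([1.5, 2.5, 3.0], Dict{String,Any}("label" => "Height (m)"))

sum(col)             # generic AbstractVector code works: 7.0
col.attrs["label"]   # the R attribute survives: "Height (m)"
```

One caveat of this design: most operations (for example col .+ 1) return a plain Vector, so the attributes are lost again unless more of the array interface is customized.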

alyst · 2017-10-29T11:39:01Z

Data frame columns are a good example showing that it would be very hard, if possible at all, to recover all the attributes when converting R objects into their Julia equivalents.

In fact, RData::load() has a convert keyword argument (true by default) that specifies whether load() tries to convert the objects into their Julia equivalents or just returns the low-level representation of the R data (based on ROBJ subtypes).
It's still possible to convert specific parts of this low-level structure into Julia objects with the sexp2julia(obj::ROBJ) function, while the low-level representation itself preserves all the attributes.

So my vision is that in convert=true mode we can recover the "low-hanging" attributes: those of top-level objects and of objects inside lists/vectors. Lists/vectors are converted into RData::DictoVec objects that allow indexing elements both by string keys and by integer indices (the R way). DictoVec could be extended to also store the attributes of its elements, so that one can access them with e.g. attrs(list, "<obj_name>") (i.e. maintain a second dictionary of attributes).
For more complicated cases, like extracting the attributes of data frame columns, one would have to load the data without conversion and extract these attributes from the low-level representation.
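A minimal sketch of that idea, assuming a hypothetical AttrList type (it is not RData's actual DictoVec, whose internals are not shown here): elements can be retrieved by position or by name, and a second dictionary holds the attributes of each named element, read through attrs(list, name) as suggested above.

```julia
# Hypothetical sketch (not RData.jl's DictoVec): an R-style list indexable
# by position or by name, plus a second dictionary with each element's attributes.
struct AttrList
    values::Vector{Any}
    names::Vector{String}
    elem_attrs::Dict{String,Dict{String,Any}}
end

# Integer indexing, the R way: lst[1]
Base.getindex(l::AttrList, i::Integer) = l.values[i]

# Name-based indexing: lst["a"]
function Base.getindex(l::AttrList, name::AbstractString)
    i = findfirst(==(name), l.names)
    i === nothing && throw(KeyError(name))
    return l.values[i]
end

# Attributes of one element; an empty dictionary if it has none.
attrs(l::AttrList, name::AbstractString) =
    get(l.elem_attrs, name, Dict{String,Any}())

lst = AttrList(Any["Kuku!", 42], ["a", "b"],
               Dict("a" => Dict{String,Any}("label" => "my label")))
```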

There are also some standard R attributes, like the array "dimnames". I guess we could implement an RData::RArray <: AbstractArray type if there's frequent demand to support these attributes.
On the other hand, I don't think RData should be a package that implements Julia types behaving like their R equivalents. If one's workflow relies heavily on dimension labels, they should write a custom converter into e.g. AxisArray rather than require that all the necessary functionality be implemented in RData::RArray.

There could also be e.g. a BioconductoR.jl package that uses RData.jl and provides support for Bioconductor's standard data types and metadata conventions.
So the role of RData would be to provide a convenient API to work with the low-level representation, plus some basic conversion utilities, while support for custom data types is delegated to other packages.
We could also think of mechanisms to register external converters for certain S3/S4 classes, which would be called automatically in convert=true mode.
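A minimal sketch of such a registry, assuming hypothetical names (CONVERTERS, register_converter! and convert_robj do not exist in RData.jl) and modeling an R object simply as its class vector plus a payload. The Date converter relies on R storing Date values as the number of days since 1970-01-01.

```julia
using Dates

# Hypothetical registry (not an RData.jl API): R class name => converter.
const CONVERTERS = Dict{String,Function}()

# External packages would call this to plug in support for their classes.
register_converter!(f::Function, rclass::AbstractString) =
    (CONVERTERS[rclass] = f; nothing)

# Try the object's classes in order, as R's S3 dispatch does, and fall back
# to the unconverted payload when no converter is registered.
function convert_robj(classes::Vector{String}, payload)
    for cls in classes
        f = get(CONVERTERS, cls, nothing)
        f === nothing || return f(payload)
    end
    return payload
end

# R stores a Date as the number of days since 1970-01-01.
register_converter!("Date") do days
    Date(1970, 1, 1) + Day(days)
end

convert_robj(["Date"], 31)      # 1970-02-01
convert_robj(["myclass"], 31)   # no converter registered: 31
```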

lrnv · 2023-12-13T08:30:46Z

The fact that the default is to silently throw away information (the attributes) is not acceptable IMHO. Maybe, as a quick stopgap, there should at least be a prominent warning printed when that happens?

The best possible solution IMHO is simply to keep these attributes in the returned object and, if possible, convert each one of them. The attributes are just a named list of R objects, after all. Is there a way I could achieve this with the current state of the package? Can we somehow split the ROBJ object into a dictionary with one entry per attribute plus the object itself, and then attempt conversion on each piece?

nalimilan · 2023-12-14T17:06:57Z

Since this issue was filed, support for metadata has been added to DataAPI and DataFrames (JuliaData/DataAPI.jl#48). We could add support for this API to ROBJ so that attributes can be retrieved when convert=false is passed. When convert=true, we could preserve data frame attributes (in particular column labels), but that's not possible for other types, since their Julia equivalents don't support metadata.

lrnv · 2023-12-14T17:34:29Z

Basically, I am using the following piece of code for the moment:

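```julia
using RCall

obj = R"myobject"
rez = Dict(
    :obj => rcopy(obj),                             # the object itself
    :class => rcopy(rcall(:class, obj)),            # its R class vector
    :attrs => rcopy(rcall(:attributes, obj)),       # its attributes (a named list, or nothing)
)
```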

Then every attribute should be converted automatically, if it can be. Unfortunately, attributes can be anything, so this might need to be recursive, checking whether the attributes themselves have attributes, and so on.

However, I am not sure this will always be enough to extract everything :( So I would love the final implementation here to warn if anything was discarded and is missing from the output.
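A minimal sketch of the recursive extraction with a warning on discarded parts. It does not use RCall: RNode is a hypothetical stand-in for an R object (a payload plus named attributes, which are R objects themselves), and the conversion function passed in is assumed to return nothing when it cannot handle something.

```julia
# Hypothetical stand-in for an R object (not an RCall or RData.jl type):
# a payload plus named attributes, each of which is an R object itself.
struct RNode
    payload::Any
    attrs::Dict{String,RNode}
end
RNode(payload) = RNode(payload, Dict{String,RNode}())

# Convert a node and, recursively, its attributes, their attributes, and so on.
# `conv` returns `nothing` for payloads it cannot handle; the path of each
# such part is pushed to `dropped` instead of being lost silently.
function extract(node::RNode, conv; path = "obj", dropped = String[])
    value = conv(node.payload)
    value === nothing && push!(dropped, path)
    attrs = Dict{String,Any}(
        name => extract(child, conv; path = string(path, "|", name), dropped = dropped)
        for (name, child) in node.attrs)
    return (value = value, attrs = attrs)
end

# Example converter: handles numbers, strings and vectors only.
conv(p) = p isa Union{Number,AbstractString,AbstractVector} ? p : nothing

obj = RNode([1, 2, 3],
            Dict("label" => RNode("my label", Dict("note" => RNode(:unsupported)))))
dropped = String[]
res = extract(obj, conv; dropped = dropped)
isempty(dropped) || @warn "Some parts could not be converted and were dropped" dropped
```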

alyst added the "enhancement" and "up for grabs" labels on Nov 17, 2018
