
Feature request: support for attributes #30

Open
adamryczkowski opened this issue Oct 26, 2017 · 8 comments

Comments

@adamryczkowski

I know this may be tough, since Julia has nothing similar, but many people (including me) store important metadata in R attributes.

I think each object's metadata could be inserted as separate entries in the resulting dictionary.

For example, if I read an object a created with

a <- "Kuku!"
attr(a, 'label') <- 'my label'
save("a", file="obj_a.rda")

then loading it should produce a dictionary with two entries:

  1. "a" → "Kuku!"
  2. "a|label" → "my label"

The | character could be replaced with any other character that cannot occur in a variable name.
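A minimal Julia sketch of this flattening. It assumes the object values and their attributes have already been extracted into two plain Dicts. The helper name flatten_with_attrs is hypothetical and is not part of RData.jl:

```julia
# Hypothetical helper: merge top-level objects and their attributes into one Dict.
# `objects` maps object names to values; `attributes` maps object names to
# their attribute Dicts. Neither input is an RData.jl structure.
function flatten_with_attrs(objects::AbstractDict{String},
                            attributes::AbstractDict{String};
                            sep::Char = '|')
    out = Dict{String,Any}()
    for (name, value) in objects
        out[name] = value
        # Each attribute becomes its own entry keyed as "<object><sep><attribute>".
        for (attrname, attrvalue) in get(attributes, name, Dict{String,Any}())
            out[string(name, sep, attrname)] = attrvalue
        end
    end
    return out
end

objs  = Dict{String,Any}("a" => "Kuku!")
attrs = Dict{String,Any}("a" => Dict{String,Any}("label" => "my label"))
flat  = flatten_with_attrs(objs, attrs)
# flat["a"] == "Kuku!", flat["a|label"] == "my label"
```

This keeps a single flat Dict, but a key collision remains possible if an R name contains the separator.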

@alyst
Collaborator

alyst commented Oct 26, 2017

Yes, it would be nice to extract attributes. I'm just not sure how they should be returned to the user.
Maybe two dictionaries: one for the objects and one for their attributes, e.g.:

  • Dict("a" => "Kuku!")
  • Dict("a" => Dict("label" => "my label"))
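A small Julia sketch of how a caller might consume the two-dictionary shape. The names objects, attributes and attrs_of are illustrative assumptions, not an existing RData.jl return value:

```julia
# Hypothetical return shape: one Dict for converted objects, one for attributes.
objects    = Dict{String,Any}("a" => "Kuku!", "b" => 1.0)
attributes = Dict{String,Dict{String,Any}}(
    "a" => Dict{String,Any}("label" => "my label"))

# Objects without attributes have no entry, so look them up with an empty default.
attrs_of(name) = get(attributes, name, Dict{String,Any}())

label_a = get(attrs_of("a"), "label", nothing)   # "my label"
label_b = get(attrs_of("b"), "label", nothing)   # nothing
```

With this design, the object values have the same types they would have without attributes. Callers who don't care about metadata can ignore the second Dict.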

@ararslan
Member

ararslan commented Oct 26, 2017

Since metadata can be attached arbitrarily to R objects, it seems we'd either have to defensively wrap every object read from the RDA file in a Julia type that preserves attributes, or live with very severe type instability (e.g. getting a scalar versus a Dict depending on whether someone decided to do something like x <- 1; attr(x, "name") <- "Sally").
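The "defensive wrapper" option could look roughly like the following Julia sketch. WithAttrs is a hypothetical type, not something RData.jl provides. Every object gets the same wrapper, so the result type no longer depends on whether attributes happened to be present:

```julia
# Hypothetical wrapper returned for every object, with or without attributes.
struct WithAttrs{T}
    value::T
    attrs::Dict{String,Any}
end
# Convenience constructor for objects that carry no attributes.
WithAttrs(value) = WithAttrs(value, Dict{String,Any}())

# Mirrors x <- 1; attr(x, "name") <- "Sally"
x = WithAttrs(1.0, Dict{String,Any}("name" => "Sally"))
# Mirrors y <- 2 (no attributes): same wrapper type, empty attribute Dict
y = WithAttrs(2.0)

typeof(x) == typeof(y)   # true: both are WithAttrs{Float64}
```

The cost is that users always have to unwrap .value, which is why a separate, opt-in loading function may be preferable.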

@adamryczkowski
Author

Definitely true. Personally, I would prefer an extra, dedicated function that loads R objects defensively.

I have also started a related discussion on Stack Overflow.

@nalimilan
Member

For top-level objects, creating special entries in the Dict would work fine, but that won't work for objects nested inside other objects. For example, if your data frame columns have attributes, you can't preserve them unless you create a special AbstractVector type that supports attributes.
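A rough Julia sketch of such a column type follows. AttrVector and attrs are hypothetical names. The type implements just enough of the AbstractArray interface for generic vector code to work while carrying the R attributes along:

```julia
# Hypothetical column type: behaves like a normal vector but carries R attributes.
struct AttrVector{T} <: AbstractVector{T}
    data::Vector{T}
    attrs::Dict{String,Any}
end

# Minimal AbstractArray interface: size and scalar getindex (plus setindex! for mutation).
Base.size(v::AttrVector) = size(v.data)
Base.getindex(v::AttrVector, i::Int) = v.data[i]
Base.setindex!(v::AttrVector, x, i::Int) = (v.data[i] = x)
Base.IndexStyle(::Type{<:AttrVector}) = IndexLinear()

# Accessor for the attached attributes.
attrs(v::AttrVector) = v.attrs

col = AttrVector([1.5, 2.5, 3.5], Dict{String,Any}("label" => "Height (m)"))
sum(col)              # 7.5: generic AbstractVector code still works
attrs(col)["label"]   # "Height (m)"
```

Operations that build new arrays (e.g. col .* 2) return a plain Vector here, so the attributes would be lost at that point. That is one reason this is hard to do completely.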

@alyst
Collaborator

alyst commented Oct 29, 2017

Data frame columns are a good example of why it would be very hard, if possible at all, to recover all the attributes when converting R objects into their Julia equivalents.

In fact, RData::load() has a convert keyword argument (true by default) that specifies whether load() tries to convert the objects into Julia equivalents or just returns the low-level representation of the R data (based on ROBJ subtypes).
It's still possible to convert specific parts of this low-level structure into Julia objects with the sexp2julia(obj::ROBJ) function, while the low-level representation itself preserves all the attributes.

So my vision is that in convert=true mode we could recover the "low-hanging" attributes of top-level objects or of objects inside lists/vectors. Lists/vectors are converted into RData::DictoVec objects, which allow indexing elements both by string keys and by integer indices (the R way). DictoVec could be extended to also store the attributes of its elements, so that one could access them with e.g. attrs(list, "<obj_name>") (i.e. maintain a second dictionary of attributes).
For the more complicated cases like extracting attributes of data frame columns, one would have to load the data without conversion and extract these attributes from the low-level representation.
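A standalone Julia sketch of the proposed attrs(list, "<obj_name>") accessor. NamedList is a hypothetical stand-in and does not reflect RData.DictoVec's actual internals. It only shows the idea of name-or-position indexing plus a parallel attribute store:

```julia
# Hypothetical stand-in for an R list: elements addressable by name or position,
# plus a parallel store of per-element attributes.
struct NamedList
    values::Vector{Any}
    names::Vector{String}
    elattrs::Dict{String,Dict{String,Any}}   # element name => its attributes
end

# Positional indexing, like lst[[1]] in R.
Base.getindex(l::NamedList, i::Integer) = l.values[i]

# Name-based indexing, like lst[["a"]] in R.
function Base.getindex(l::NamedList, key::AbstractString)
    i = findfirst(==(key), l.names)
    i === nothing && throw(KeyError(key))
    return l.values[i]
end

# Analogue of the proposed attrs(list, "<obj_name>") accessor.
attrs(l::NamedList, key::AbstractString) = get(l.elattrs, key, Dict{String,Any}())

lst = NamedList(Any["Kuku!", 42], ["a", "b"],
                Dict("a" => Dict{String,Any}("label" => "my label")))
lst["a"] == lst[1]          # true
attrs(lst, "a")["label"]    # "my label"
isempty(attrs(lst, "b"))    # true
```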

There are also some standard R attributes, like the array "dimnames". I guess we could implement an RData::RArray <: AbstractArray type if there is frequent demand to support these attributes.
OTOH, I don't think RData should be a package that implements Julia types behaving like their R equivalents. If one's workflow relies heavily on dimension labels, one should write a custom converter into e.g. AxisArray rather than require that all the necessary functionality be implemented in RData::RArray.
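A minimal user-side converter might look like the following Julia sketch. It deliberately avoids AxisArrays and uses only Base. The names with_dimnames and lookup are hypothetical. The R dimnames are kept next to the data as name-to-index maps:

```julia
# Hypothetical user-side converter: keep R "dimnames" next to the data
# as name => index maps.
function with_dimnames(data::AbstractMatrix, rownames::Vector{String},
                       colnames::Vector{String})
    size(data) == (length(rownames), length(colnames)) ||
        throw(DimensionMismatch("dimnames do not match matrix size"))
    rowidx = Dict(n => i for (i, n) in enumerate(rownames))
    colidx = Dict(n => j for (j, n) in enumerate(colnames))
    return (data = data, rowidx = rowidx, colidx = colidx)
end

# Label-based indexing, analogous to m["r2", "c1"] in R.
lookup(m, r::String, c::String) = m.data[m.rowidx[r], m.colidx[c]]

m = with_dimnames([1 2; 3 4], ["r1", "r2"], ["c1", "c2"])
lookup(m, "r2", "c1")   # 3
```

In practice, the target type (AxisArray or similar) would replace the NamedTuple. The point is that this logic lives in user or third-party code rather than in RData.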

There could also be, e.g., a BioconductoR.jl package that uses RData.jl and supports Bioconductor's standard data types and conventions for storing metadata.
So the role of RData would be to provide a convenient API for working with the low-level representation, plus some basic conversion utilities, while support for custom data types is delegated to other packages.
We can also think of mechanisms to register external converters for certain S3/S4 classes that would be automatically called in convert=true mode.
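One possible shape for such a mechanism is sketched below in Julia. Nothing like this registry exists in RData.jl, and CONVERTERS, register_converter! and convert_by_class are hypothetical names. The example converter relies on the fact that R stores "Date" values as days since 1970-01-01:

```julia
using Dates

# Hypothetical registry of converters keyed by R class name (S3 or S4).
const CONVERTERS = Dict{String,Function}()

register_converter!(f::Function, class::String) = (CONVERTERS[class] = f)

# Try each class in R's order (most specific first); otherwise use the fallback.
function convert_by_class(value, classes::Vector{String}; fallback = identity)
    for cls in classes
        f = get(CONVERTERS, cls, nothing)
        f === nothing || return f(value)
    end
    return fallback(value)
end

# An external package could register, e.g., a converter for R's "Date" class.
register_converter!("Date") do days
    Date(1970, 1, 1) + Day(round(Int, days))
end

convert_by_class(19000.0, ["Date"])   # 2022-01-08
convert_by_class(1.5, ["numeric"])    # 1.5 (no converter registered)
```

In convert=true mode, the loader would pass each object's class attribute to something like convert_by_class. Here fallback stands in for the built-in conversion.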

@lrnv

lrnv commented Dec 13, 2023

The fact that the default is to silently throw away information (the attributes) is not acceptable IMHO. As a quick stopgap, maybe at least a prominent warning should be printed when that happens?

The best possible solution IMHO is simply to keep these attributes in the returned object and, if possible, convert each of them. After all, the attribute list is just a named list of R objects. Is there a way I could achieve this with the current state of the package? Can we somehow split the ROBJ object into a dictionary with one entry per attribute plus the object itself, and then attempt conversion on each piece?
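The "split, then convert each piece" idea can be sketched generically in Julia. split_and_convert is hypothetical. convert_piece is a user-supplied function standing in for a real converter such as RData's sexp2julia, and the toy converter below only exists for illustration. Any piece that fails to convert is kept raw and triggers a warning instead of being dropped:

```julia
# Hypothetical sketch: convert an object and each of its attributes separately.
# A piece that fails to convert keeps its raw value and emits a warning.
function split_and_convert(value, attributes::AbstractDict{String}, convert_piece)
    function try_convert(label, x)
        try
            return convert_piece(x)
        catch err
            @warn "Could not convert $label; keeping the raw value" exception = err
            return x
        end
    end
    converted_attrs = Dict{String,Any}(
        name => try_convert("attribute \"$name\"", a) for (name, a) in attributes)
    return Dict{Symbol,Any}(:obj => try_convert("object", value),
                            :attrs => converted_attrs)
end

# Toy converter that only understands numbers.
toy_convert(x::Number) = float(x)
toy_convert(x) = error("unsupported value $(repr(x))")

res = split_and_convert(1, Dict{String,Any}("name" => "Sally", "scale" => 2), toy_convert)
# res[:obj] == 1.0 and res[:attrs]["scale"] == 2.0;
# res[:attrs]["name"] == "Sally" is kept as-is, with a warning
```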

@nalimilan
Member

Since this issue was filed, support for metadata has been added to DataAPI and DataFrames (JuliaData/DataAPI.jl#48). We could add support for this API to ROBJ so that attributes can be retrieved when convert=false is passed. With convert=true, we could preserve data frame attributes (in particular column labels), but that's not possible for other types, since their Julia equivalents don't support metadata.

@lrnv

lrnv commented Dec 14, 2023

For the moment, I am basically using the following piece of code:

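using RCall

obj = R"myobject"                               # myobject must exist in the embedded R session
rez = Dict(
    :obj   => rcopy(obj),                       # the value converted to Julia
    :class => rcopy(rcall(:class, obj)),        # its R class vector
    :attrs => rcopy(rcall(:attributes, obj)),   # its attributes (named list, or nothing)
)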

Every attribute should then be converted automatically where possible. Unfortunately, attributes can be anything, so this might need to be recursive, checking whether the attributes themselves have attributes, and so on.

However, I am not sure this will always be enough to extract everything :( So I would love the final implementation to warn if anything was discarded and is missing from the output.
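The recursive walk and the "warn if anything was discarded" bookkeeping could look roughly like the following Julia sketch. RNode is a hypothetical in-memory model (not an RData.jl or RCall.jl type) of an R value whose attributes may carry their own attributes. The keep predicate stands in for "can this be converted", and a Symbol is used as a placeholder for something like an R environment:

```julia
# Hypothetical model of an R value whose attributes may themselves have attributes.
struct RNode
    value::Any
    attrs::Dict{String,RNode}
end
RNode(value) = RNode(value, Dict{String,RNode}())

# Walk the object and all nested attributes, collecting "path => value" pairs.
# Values rejected by `keep` are recorded in `dropped` instead of vanishing.
function collect_attrs!(out::Dict{String,Any}, dropped::Vector{String},
                        node::RNode, path::String, keep)
    if keep(node.value)
        out[path] = node.value
    else
        push!(dropped, path)
    end
    for (name, child) in node.attrs
        collect_attrs!(out, dropped, child, string(path, "|", name), keep)
    end
    return out
end

# The value 1.0, whose "label" attribute has its own "lang" attribute, plus an
# "env" attribute (placeholder Symbol) that the keep predicate rejects.
x = RNode(1.0, Dict("label" => RNode("weight", Dict("lang" => RNode("en"))),
                    "env"   => RNode(:environment)))
out, dropped = Dict{String,Any}(), String[]
collect_attrs!(out, dropped, x, "x", v -> !(v isa Symbol))
isempty(dropped) || @warn "Attributes not extracted" dropped
```

Here out ends up with the keys "x", "x|label" and "x|label|lang", and dropped lists "x|env". The warning names exactly what was left out.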


5 participants