vame makes it simpler to define and use metadata pertaining to one or more
variables (e.g. a tabular dataset). It implements the VariableMetadata class,
which contains the metadata. The various metadata are accessed using slot
functions such as vm@var_description_get.
The VariableMetadata class is intended for storing metadata for which there
is "one right way". For instance, a variable has one correct description in
text. This philosophy excludes tasks such as creating a manual for a dataset,
which can take many forms.
See the help page ?vame::VariableMetadata for more information. In
particular see the examples.
# Install from GitHub; the prompt asks for the tag to install.
devtools::install_github(
  "FinnishCancerRegistry/vame",
  ref = readline("enter latest tag on github: ")
)
vm@vame_category_space_dt:
vm@var_set_make can be used in place of vm@var_set_value_space_eval
(or maybe vm@var_set_value_space_eval can call vm@var_set_make)
in the case of categorical variables when
multiple columns are requested AND the necessary data is available from
other value spaces. Currently in v0.3.0.16 it is necessary to have ALL
dependent variables in one value_space to show the dependency in the
output of at least vm@vame_category_space_dt. The downside is the
potential slowdown.
Maybe there needs to be a smart system, or a user-input-based system, for
knowing which variable sets depend on which other variable sets. For instance,
the user could include a column var_set_dt$dep_id_set.
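The user-input-based idea could look roughly like the sketch below. It is only an illustration in base R and calls nothing from vame: the dep_id_set column is the proposal above, not an existing feature, and the function var_set_dep_ids and the ids "a", "b", "c" are hypothetical names made up for this example.

```r
# Hypothetical sketch: `dep_id_set` is a proposed column, not part of vame.
# A plain data.frame stands in for var_set_dt.
var_set_dt <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
var_set_dt$var_nm_set <- list("a", c("b_1", "b_2"), "c")
# "b" depends on "a", and "c" depends on "b".
var_set_dt$dep_id_set <- list(character(0), "a", "b")

# Hypothetical helper: collect the ids of all variable sets that `id`
# depends on, directly or transitively.
var_set_dep_ids <- function(var_set_dt, id) {
  found <- character(0)
  todo <- var_set_dt$dep_id_set[[match(id, var_set_dt$id)]]
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (current %in% found) {
      next # already visited; also guards against cycles
    }
    found <- c(found, current)
    todo <- c(todo, var_set_dt$dep_id_set[[match(current, var_set_dt$id)]])
  }
  found
}

var_set_dep_ids(var_set_dt, "c")
```

With such a lookup, the dependency between variable sets would be declared explicitly instead of being inferred from a shared value_space.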
vm@var_set_value_space_sample:
The issue of dependent variables appears in both sampler and maker objects.
It could be argued that
they should be required to have the same set of dependent variables. This
would mean improving the corresponding assertion functions. Alternatively,
a new column var_set_dt$dep_var_nm_set or even var_set_dt$dep_id_set
could be implemented. However, sampler objects can currently also sample
independently, so implementing e.g. var_set_dt$dep_id_set would require
that independent samplers can still work. Currently in 0.3.0.15 using
dep_var_nm_set in sampler causes data to be asserted to contain
such variables. Maybe an independent sampler would need to be marked
in a special way so that the data assertion is not performed.
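One way to picture the "marked in a special way" idea is the base R sketch below. Everything in it is an assumption made for illustration: the independent flag, the wrapper sampler_call, and the list-based sampler layout are hypothetical and do not reflect how vame stores or calls sampler objects.

```r
# Hypothetical sketch: neither `independent` nor `sampler_call` exist in vame.
# A sampler is modelled as a list holding a function and its dependencies.
sampler_call <- function(sampler, n, data = NULL) {
  if (!isTRUE(sampler$independent)) {
    # Only dependent samplers require the variables to be present in data.
    missing_var_nms <- setdiff(sampler$dep_var_nm_set, names(data))
    if (length(missing_var_nms) > 0) {
      stop(
        "data is missing dependent variable(s): ",
        paste(missing_var_nms, collapse = ", ")
      )
    }
  }
  sampler$fun(n = n, data = data)
}

dependent_sampler <- list(
  dep_var_nm_set = "region",
  independent = FALSE,
  fun = function(n, data) paste0(data$region[seq_len(n)], "_sub")
)
independent_sampler <- list(
  dep_var_nm_set = "region",
  independent = TRUE, # marked: skip the data assertion
  fun = function(n, data) rep("unknown", n)
)

sampler_call(independent_sampler, n = 2)
sampler_call(dependent_sampler, n = 2, data = data.frame(region = c("x", "y")))
```

An explicit flag keeps the existing assertion for dependent samplers while letting independent ones run without data; whether a flag, a separate class, or an empty dep_var_nm_set is the better marker is left open here.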