-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-factor ipc check function #72
Conversation
When you make included dataset/datasets in your package as internal, that dataset becomes available to you for use within a function. So, for example, in But if you want to use an internal dataset in an example in documentation, you cannot just invoke the dataset by its name because they are not lazily loaded during the running of examples. This is why when you removed the exported datasets, the examples that use them are erroring (but the test are running ok). If you want to use an internal dataset for examples in your function documentation, you need to use the triple-colon accessor ( When you make included dataset/datasets in your package as external, on the other hand, that dataset becomes available to you for use in examples and tests just by invoking the name of the dataset. However, when you want to use it within one of your functions, you need to use the double colon accessor ( In your package, you don't use any of the datasets within a function. You use the included dataset for examples (mainly in the outliers utility functions) and mostly for tests. So, unless you have a really good reason to use internal instead (other than for the reasons stated in Hadley's book), for the usage that you want them for, what you really want are external datasets to use for examples and testing. I am very confident with what I am saying above is appropriate and best practice because my very first package that I learned to create (and is my first package on CRAN) is a data package so I learned early on the difference between an internal dataset and an external dataset. Then immediately, my second package is I also tested what I said above on your package. Test 1. I removed the exported datasets that are also internal and then removed documentation and then used the triple colon accesssor ( Tests 2. I removed For your package and how you use the accompanying dataset, it will be simpler for you to make all your datasets external (remove When trying to understand why you want some internal and some external, the only possible logic that I can think of is that you worry that those 3 datasets that you made internal are SMART datasets that you are not allowed to distribute or don't have permission to share widely hence you think by making them internal you are keeping others from getting access to them. I can tell you now that if that is your logic, then making them internal will not make it OK because the data is still accessible by anyone who knows what an internal R dataset is and will be able to access this dataset programmatically once they install the package or from the GitHub repository. Internal dataset doesn't mean private or no access dataset. These datasets can be accessed by other people than you. But even if it were the case that internal datasets were private, just the fact that you still exported them means that the whole logic of you making them internal in the first place is contradicted by you then making them external. My final point here is that even if we suppose that you really wanted those datasets to be both internal and external, it should have been obvious to you that because the datasets had the same name internally and externally, they would have overwritten each other in the common environments that they are loaded into. And because you were not experiencing errors when you both had them internal and external, this tells us that only the external datasets are actually being used in your package (it is overwriting the lazily loaded internal dataset). I'd like to say that this comment is a suggestion but I can't live with myself to allow both internal and external dataset to exist the way you have implemented it in this package. This has to change and you just need to choose which approach to take as described above. In my opinion, keeping all of your datasets external and removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomaszaba see my comment above about your internal and external dataset approach and my explanation of what internal and external datasets are for and my recommendations of what the way forward should be. I am approving this pull request as I know you don't like hearing my feedback, but please listen to what I am saying about the datasets. You are not doing this correctly and I give my reasons behind this above.
No, it is not about that. I just thought I already had enough example datasets to export.
Hi @ernestguevarra , I really appreciate your comprehensive explanation about the difference between external and internal datasets. I really found it very helpful to me and to my future package development. Based on this and on your recommendation, I have deleted the |
Thank you, Ernest. To have read your comment was absolutely useful to clear my understanding about the differences between an internal and external dataset. I appreciate. I have removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomaszaba all checks pass. the main issue I raised with this PR has now been addressed. Please feel free to merge to dev and/or main as you deem necessary.
Great! Thanks. I will merge it to dev. |
Hi @ernestguevarra , please review this PR.
I re-factored the IPC check function.
In this branch I also attempted to address the issue #71 but there was an error when I deleted them from
data
file:devtools::check()
was not passing because it was not finding those datasets. I use those datasets in some examples and tests about prevalence, but I wanted to keep them as internal. I reviewed this chapter: https://r-pkgs.org/data.html#sec-data-sysdata and the approach is used to create thesysdata.rda
is correct. I wonder if the error stem from the fact that I documented these datasets?You will not see this error in the PR that I just raised because I cancelled changes on this.
If OK with the changes on the main objective of this branch/PR, please approve and we will address the data issue in the coming branch. Alternatively, please feel free to fix the issue locally and push your changes.
Thanks.