cran_package_history("-") fails #126

Open
mccarthy-m-g opened this issue Aug 26, 2024 · 7 comments
Labels
feature a feature request or enhancement

@mccarthy-m-g

mccarthy-m-g commented Aug 26, 2024

Problem

https://crandb.r-pkg.org/-/<redacted> is a valid API call that gets history for all packages, but cran_package_history("-") results in an error. I did some debugging and the error happens here:

df_list <- list(rectangle_packages(resp$versions))

The problem is that, for this response, resp$versions selects the entry for the {versions} package instead of the versions element inside each package's record.
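To illustrate the name collision, here is a minimal sketch using mock lists. The shapes of `single_pkg` and `all_pkgs` are assumptions made for illustration (heavily trimmed and hypothetical), not the exact crandb response format.

```r
# Hypothetical, trimmed stand-in for a single-package response:
# the top level has a `versions` element keyed by version string.
single_pkg <- list(
  name = "pkgsearch",
  versions = list(
    "3.1.2" = list(Package = "pkgsearch", Version = "3.1.2"),
    "3.1.3" = list(Package = "pkgsearch", Version = "3.1.3")
  )
)

# Hypothetical stand-in for the all-packages response:
# the top level is keyed by package name, and one package is called "versions".
all_pkgs <- list(
  pkgsearch = single_pkg,
  versions = list(
    name = "versions",
    versions = list("0.3" = list(Package = "versions", Version = "0.3"))
  )
)

# Single package: `$versions` is the version index, as the code expects.
names(single_pkg$versions)

# All packages: `$versions` is now the record of the {versions} package.
names(all_pkgs$versions)

# One way to reach each package's version index instead.
per_pkg <- lapply(all_pkgs, function(p) p$versions)
names(per_pkg$pkgsearch)
```

With these mock inputs, the same `$versions` lookup returns two different kinds of object, which would explain why the rectangling step cannot recycle its inputs to a common size.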

Would you be open to expanding the function to support the cran_package_history("-") call? Happy to start a PR for it!

reprex

library(pkgsearch)

cran_package_history("-")
#> Warning in description_list$releases <- NULL: Coercing LHS to a list

#> Warning in description_list$releases <- NULL: Coercing LHS to a list

#> Warning in description_list$releases <- NULL: Coercing LHS to a list
#> Error: Inputs can't be recycled to a common size.

Created on 2024-08-26 with reprex v2.0.2

@gaborcsardi
Contributor

cran_package_history("-") was never supposed to do anything meaningful.

@mccarthy-m-g
Author

mccarthy-m-g commented Aug 26, 2024

Does that mean you aren't interested in supporting "-" within {pkgsearch}?

I was hoping to get the version history for every CRAN package, and the https://crandb.r-pkg.org/-/<redacted> endpoint does this, but it would be nice to be able to do it from {pkgsearch}. Otherwise I'd have to map over cran_package_history() for each package, which is a lot of API calls.

@gaborcsardi
Contributor

gaborcsardi commented Aug 26, 2024

That endpoint is pretty heavy on the DB, so I definitely don't want to support it in pkgsearch. In fact, I might need to remove it completely, or heavily cache it in Cloudflare.

In fact, I'll remove the link from your comments, because people and/or crawlers clicking on it will kill the server.

@mccarthy-m-g
Author

Ah, fair enough. Is there a responsible way to get the history for every package? I'm assuming calling cran_package_history() for every package is also heavy on the DB?

@gaborcsardi
Contributor

No, that's not heavy at all, but it also takes a very long time to make thousands of HTTP queries. I don't know of any good way currently.
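For reference, the per-package approach discussed above could be sketched roughly as follows. The helper name `collect_histories`, its `fetch` argument, and the `pause` delay are hypothetical and introduced only for illustration. The only real API assumed is `pkgsearch::cran_package_history()` called with a single package name.

```r
# Sketch: collect per-package histories one request at a time.
# `fetch` defaults to pkgsearch::cran_package_history(); `pause` is an
# illustrative courtesy delay (an assumption, not a documented rate limit).
collect_histories <- function(packages,
                              fetch = function(p) pkgsearch::cran_package_history(p),
                              pause = 0.5) {
  out <- vector("list", length(packages))
  names(out) <- packages

  for (i in seq_along(packages)) {
    # Store NULL on failure so one bad package does not abort the whole run.
    # `out[i] <- list(...)` keeps the slot even when the value is NULL.
    out[i] <- list(tryCatch(fetch(packages[[i]]), error = function(e) NULL))
    if (i < length(packages)) Sys.sleep(pause)
  }

  failed <- vapply(out, is.null, logical(1))
  list(histories = out[!failed], failed = packages[failed])
}

# Example call (commented out because it makes real HTTP requests):
# res <- collect_histories(c("pkgsearch", "versions"))
```

Passing `fetch` as an argument keeps the loop testable without network access. As noted above, a full run over every CRAN package would still take a very long time.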

@mccarthy-m-g
Author

Thanks, that's good to know. Maybe a regularly updated duckdb database would be a good way to share the history for every package?

Just for context, the reason I wanted this data was for a {shinylive} dashboard that would provide download analytics for every CRAN package, and my original plan was to make said database with GitHub Actions (so I didn't want something that would take forever to run). I'm probably going to pivot from my original plan now though, so feel free to close this.

@gaborcsardi
Contributor

I like the idea of having a daily Parquet file available with all the data.

@gaborcsardi gaborcsardi added the feature a feature request or enhancement label Aug 28, 2024