Add a sample test repository #595

Open
dgarijo opened this issue Nov 28, 2023 · 14 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Milestone

Comments

@dgarijo
Collaborator

dgarijo commented Nov 28, 2023

This repository should be added to the documentation: https://github.com/tpronk/somef-demo-repo

@tpronk
Contributor

tpronk commented Nov 28, 2023

I have to confess it's not really that good yet. However, my attention during the ELIXIR BioHackathon has already been requested for other tasks... hope to have an update soon!

@dgarijo
Collaborator Author

dgarijo commented Nov 28, 2023

no rush!

@tpronk
Contributor

tpronk commented Nov 29, 2023

Back in business! Below, I'll add things I noticed while creating the demo repo. As I won't be done today, more might follow...

  • The field acknowledgment is missing from the SOMEF README and docs
  • I can find the field image (singular) in the SOMEF output, but the docs say images (plural)

@dgarijo
Collaborator Author

dgarijo commented Nov 29, 2023

Thanks, let me open this in a new issue. Many people have been editing the docs, and keeping everything consistent can be challenging.

@tpronk
Contributor

tpronk commented Nov 30, 2023

Having too many contributors sounds like a lovely problem to have :). Below, I've got two more potential issues... I'll post them in separate comments.

@tpronk
Contributor

tpronk commented Nov 30, 2023

I think there might be an issue with extracting a logo when there is no slash (/) in the path to the logo. For illustration, below is a snippet of the README.md of the somef-demo-repo, followed by a snippet of the JSON output of SOMEF. Note that logo1.png is not recognized as a logo, but logo_directory/logo2.png is. I get the same result if I use logo.png, and also if I leave logo_directory/logo2.png out of the README.md.

README.md

# Image
Images used to illustrate the software component.
![logo1.png](logo1.png)

# Logo
Main logo used to represent the target software component.
![logo2.png](logo_directory/logo2.png)

SOMEF Output

"logo": [
  {
    "result": {
      "type": "Url",
      "value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo_directory/logo2.png"
    },
    "confidence": 1,
    "technique": "regular_expression",
    "source": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"
  }
],
"image": [
  {
    "result": {
      "type": "Url",
      "value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo1.png"
    },
    "confidence": 1,
    "technique": "regular_expression",
    "source": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"
  }
]
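
To check which field a given file ends up in, a minimal Python sketch over the output above may help. It assumes the SOMEF JSON has been loaded into a dict shaped like the snippet; `values_for` and `fields_containing` are hypothetical helper names, not part of SOMEF.

```python
import json

# Trimmed copy of the SOMEF output shown above (only the keys this sketch reads).
SOMEF_OUTPUT = json.loads("""
{
  "logo": [
    {"result": {"type": "Url",
                "value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo_directory/logo2.png"}}
  ],
  "image": [
    {"result": {"type": "Url",
                "value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo1.png"}}
  ]
}
""")


def values_for(output, field):
    """Return every string value recorded under `field` (hypothetical helper)."""
    entries = output.get(field, [])
    if not isinstance(entries, list):
        return []
    values = []
    for entry in entries:
        result = entry.get("result", {}) if isinstance(entry, dict) else {}
        value = result.get("value") if isinstance(result, dict) else None
        if isinstance(value, str):
            values.append(value)
    return values


def fields_containing(output, filename):
    """Return the sorted field names that hold a URL ending in `/filename`."""
    return sorted(
        field
        for field in output
        if any(v.endswith("/" + filename) for v in values_for(output, field))
    )


print(fields_containing(SOMEF_OUTPUT, "logo1.png"))  # ['image']
print(fields_containing(SOMEF_OUTPUT, "logo2.png"))  # ['logo']
```

Run against the full output of a repository, the same two calls show at a glance whether a file was classified as a logo, as an image, or not at all.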

@tpronk
Contributor

tpronk commented Nov 30, 2023

At the Hackathon, we've been extracting metadata from around 65 repos, but in none of the SOMEF output can I find the field has_executable_notebook. Also, in the SOMEF source code, I couldn't easily identify any snippets that extract it. Does this field still work? If so, might you have an example for me of a repo where it can be extracted from?

@tpronk
Contributor

tpronk commented Nov 30, 2023

I found a case where values extracted for the invocation field were attributed to README.md, but on visual inspection, I found them in README.Rmd instead. It concerns the MPDL/unibiAPC repo (see the source URLs in the output). Below is a snippet of the SOMEF output. Credits to Esteban for providing this dataset :)

    "invocation": [
        {
            "result": {
                "type": "Text_excerpt",
                "value": "\n```{r, echo=FALSE, results='asis', message = FALSE}\nmy_apc %>% select(institution, euro) %>% \n  group_by(institution) %>% \n  ezsummary::ezsummary(n = TRUE, digits= 0, median = TRUE,\n                       extra = c(\n                         sum = \"sum(., na.rm = TRUE)\",\n                         min = \"min(., na.rm = TRUE)\",\n                         max = \"max(., na.rm = TRUE)\"\n                         )) %>%\n  mutate_all(format, big.mark=',') %>%\n  ezsummary::ezmarkup('...[. (.)]..[. - .]') %>%\n#> get rid of blanks\n  mutate(`mean (sd)` = gsub(\"\\\\(  \", \"(\", .$`mean (sd)`)) %>% \n  select(institution, n, sum, `mean (sd)`, median, `min - max`) %>%\n  arrange(desc(n)) %>%\n  knitr::kable(col.names = c(\"Institution\", \"Articles\", \"Spending total (in \u20ac)\", \"Mean (SD)\", \"Median\", \"Minimum - Maximum\"), align = c(\"l\",\"r\", \"r\", \"r\", \"r\", \"r\"))\n``` \n",
                "original_header": "Fully Open Access Journals"
            },
            "confidence": 0.906763643352601,
            "technique": "supervised_classification",
            "source": "https://raw.githubusercontent.com/MPDL/unibiAPC/master/README.md"
        },
        {
            "result": {
                "type": "Text_excerpt",
                "value": "```{r, echo = FALSE, warning = TRUE}\n\nknitr::opts_knit$set(base.url = \"/\")\nknitr::opts_chunk$set(\n  comment = \"#>\",\n  collapse = TRUE,\n  warning = FALSE,\n  message = FALSE,\n  echo = FALSE,\n  fig.width = 9,\n  fig.height = 6\n)\noptions(scipen = 999, digits = 0, tibble.width = Inf, tibble.print_max = Inf)\n\nknitr::knit_hooks$set(inline = function(x) {\n  prettyNum(x, big.mark = \",\")\n})\n```\n```{r}\nrequire(dplyr)\nrequire(ggplot2)\nrequire(ezsummary)\nrequire(pander)\n```\n```{r, echo=FALSE, cache = FALSE}\nmy_apc <- readr::read_csv(\"data/apc_de.csv\")\n```\n \n"
            },
            "confidence": 0.9211067534061969,
            "technique": "supervised_classification",
            "source": "https://raw.githubusercontent.com/MPDL/unibiAPC/master/README.md"
        }
    ]
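
A rough way to double-check an attribution like this is to search each candidate file for the excerpt. The Python sketch below is only an illustration: `files_containing` is a hypothetical helper, and the two file bodies are toy stand-ins, not the real contents of README.md or README.Rmd.

```python
def normalize(text):
    """Collapse whitespace runs so line-wrapping differences do not matter."""
    return " ".join(text.split())


def files_containing(excerpt, files):
    """Return the names of the files whose text contains `excerpt`.

    `files` maps a file name to its full text, fetched however you like.
    """
    needle = normalize(excerpt)
    if not needle:
        return []  # an empty excerpt would trivially match every file
    return [name for name, text in files.items() if needle in normalize(text)]


# Toy stand-ins for the two files (assumption: real contents are not reproduced).
files = {
    "README.md": "# unibiAPC\n\nSome rendered text and tables.\n",
    "README.Rmd": "# unibiAPC\n\nrequire(dplyr)\nrequire(ggplot2)\n",
}

excerpt = "require(dplyr)\nrequire(ggplot2)"
print(files_containing(excerpt, files))  # ['README.Rmd']
```

If the excerpt only turns up in a file other than the one named in the `source` key, the attribution is worth reporting.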

@dgarijo
Collaborator Author

dgarijo commented Nov 30, 2023

Thanks for these issues. executable_notebook should return the MyBinder links. I see that these are now added under executable_example. This may need a review (the schema has undergone a few changes).
All the other issues are legit. Thanks a lot! We'll need to address them.

@dgarijo
Collaborator Author

dgarijo commented Nov 30, 2023

If you find any more, please open them! I usually open them as I test on diverse repos, but sometimes it is tricky to hit these edge cases.

@tpronk
Contributor

tpronk commented Nov 30, 2023

Good, and thanks. I'll keep 'em coming then :)

@tpronk
Contributor

tpronk commented Nov 30, 2023

Wrapping things up, I compared fields mentioned in the README.md of SOMEF to the fields in constants.py. These are the discrepancies I found in terms of entries I couldn't find in one or the other, ignoring cases where they probably just have a different name

  • changelog. Yes in README, not in constants
  • code_repository. Not in README, yes in constants
  • contributing_guidelines. Not in README, yes in constants
  • date_created. Not in README, yes in constants
  • date_updated. Not in README, yes in constants
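
This kind of comparison boils down to two set differences. A minimal Python sketch follows; `compare_fields` is a hypothetical helper, and the two lists are small illustrative subsets built from the names above, not the full field lists of the README or constants.py.

```python
def compare_fields(readme_fields, constants_fields):
    """Return the field names found in only one of the two sources."""
    readme, constants = set(readme_fields), set(constants_fields)
    return {
        "only_in_readme": sorted(readme - constants),
        "only_in_constants": sorted(constants - readme),
    }


# Illustrative subsets only (assumption: the real lists are much longer).
readme_fields = ["changelog", "logo", "image"]
constants_fields = [
    "code_repository",
    "contributing_guidelines",
    "date_created",
    "date_updated",
    "logo",
    "image",
]

diff = compare_fields(readme_fields, constants_fields)
print(diff["only_in_readme"])     # ['changelog']
print(diff["only_in_constants"])  # four names, starting with 'code_repository'
```

Fields that merely have a different name in each source would still show up in both lists, so the result needs a manual pass.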

@tpronk
Contributor

tpronk commented Nov 30, 2023

All right then. SOMEF 0.9.4 can extract a total of 48 fields from this version of somef-demo-repo, which can make it a nice integration test I guess
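
A sketch of what such an integration check could look like in Python, assuming the SOMEF output is a JSON object with one top-level key per extracted field. `check_fields` is a hypothetical helper, and the sample output and expected list are toy values, not the real 48 fields.

```python
def check_fields(output, expected_fields):
    """Fail with a readable message if an expected field is absent.

    Returns the number of top-level fields found in `output`.
    """
    missing = sorted(set(expected_fields) - set(output))
    assert not missing, f"fields missing from SOMEF output: {missing}"
    return len(output)


# Toy output (assumption: a real run would load the JSON file SOMEF wrote).
sample_output = {"logo": [], "image": [], "invocation": []}

print(check_fields(sample_output, ["logo", "image"]))  # 3
```

In a real test, the expected list would name the fields the demo repository is meant to exercise, so a regression in any extractor shows up as a named missing field.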

@dgarijo
Collaborator Author

dgarijo commented Nov 30, 2023

Definitely. Thanks!!

@dgarijo dgarijo added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 20, 2023
@dgarijo dgarijo added this to the v0.9.* milestone Jan 1, 2024
2 participants