Add 2024 GSoC report Compute Summary for all detected packages #143

Merged 7 commits on Aug 24, 2024
Changes from 4 commits
10 changes: 9 additions & 1 deletion docs/source/archive/gsoc-toc.rst
@@ -6,7 +6,15 @@ GSoC -- Google Summer of Code
open source software development. GSoC is a completely online program designed to encourage university
student participation in open source software development.
It was started by Google in 2005.
More about GSoc - <https://summerofcode.withgoogle.com/about/>_
More about GSoC - `<https://summerofcode.withgoogle.com/about/>`_

GSoC 2024
---------

.. toctree::
   :maxdepth: 2

   gsoc/reports/2024/scancode_toolkit_swastkk

GSoC 2022
---------
69 changes: 69 additions & 0 deletions docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst
@@ -0,0 +1,69 @@
========================================================================
Compute summary for all detected packages.
========================================================================


| **Organization:** `AboutCode <https://aboutcode.org>`_
| **Project:** `ScanCode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_
| **Mentee:** `Swastik Sharma (swastkk) <https://github.com/swastkk>`_
| **Mentors:** Philippe Ombredanne, AyanSinhaMahapatra, AvishrantSh, Jonathan Yang, Jay Kumar

Overview
--------

Previously, we computed the summary only at the codebase level, which included elements such as
``license_clarity_score``, ``declared_holder``, and ``other_license_expressions``.
This project aims to improve scanning accuracy by computing summaries and license clarity scores for
each package and its files, rather than for the entire scan. This involves enhancing the package models
and ensuring accurate attribute collection across all package ecosystems.
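
As a rough illustration of the difference between a codebase-level and a package-level summary,
the sketch below groups the holders found in files by the package each file belongs to and reports
the most common holder per package. The record layout (``for_packages``, ``holders``) is a
simplified assumption loosely modelled on ScanCode's JSON output, and ``summarize_by_package`` is
a hypothetical helper, not the code from the PR.

```python
from collections import Counter, defaultdict


def summarize_by_package(files):
    """Return {package_uid: most common holder or None} for simplified file records.

    Each record is assumed to be a dict with two illustrative keys:
    "for_packages" (list of package identifiers) and "holders" (list of names).
    """
    holders_by_package = defaultdict(Counter)
    for file_record in files:
        for package_uid in file_record.get("for_packages", []):
            # Accessing the counter first keeps packages without holders in the result.
            counter = holders_by_package[package_uid]
            counter.update(file_record.get("holders", []))

    summary = {}
    for package_uid, counter in holders_by_package.items():
        summary[package_uid] = counter.most_common(1)[0][0] if counter else None
    return summary


# Toy data: two packages plus one file that belongs to no package.
files = [
    {"path": "a/package.json", "for_packages": ["pkg:npm/a"], "holders": ["Alice"]},
    {"path": "a/index.js", "for_packages": ["pkg:npm/a"], "holders": ["Alice", "Bob"]},
    {"path": "b/setup.py", "for_packages": ["pkg:pypi/b"], "holders": []},
    {"path": "README", "for_packages": [], "holders": ["Carol"]},
]

package_summary = summarize_by_package(files)
```

A codebase-level summary would fold ``Carol`` and both packages into one result. Here each package
gets its own entry, and files outside any package are left out.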

Implementation
--------------

All the work I did is contained in `this single PR <https://github.com/aboutcode-org/scancode-toolkit/pull/3792>`_.
I added a new command-line option called ``--package-summary`` that users can employ to obtain
Collaborator

We need to have unordered lists here, with a bit more details, use the points in the timeline from your proposal.

Collaborator

Add issue links for specific items if they exist. Create issues and ask me to populate the project board as necessary.

a package-level summary within a single codebase. The package-level summary involves
calculating the ``license_clarity_score`` and populating package attributes such as ``copyright``,
``holder``, ``other_license_expression``, and ``notice_text``. This option must be used together
with two other options. The ``--classify`` option helps ScanCode further classify scanned files
and directories to determine whether they fall into the categories ``legal``, ``readme``,
``top-level``, or ``manifest``. The ``--package`` (or ``-p``) option detects package manifests,
lockfiles, and package-like data, assembles codebase-level packages and dependencies from
the package data detected in files, and tags files that are part of a package.
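
The option combination described above can be sketched as a small helper that builds the command
line and checks that the companion options are present. The option names come from the text above
(``--package-summary`` exists only in the linked PR). The helper names, the ``--json-pp`` output
choice, and the paths are assumptions made for illustration.

```python
import shlex

# Options that the text says must accompany --package-summary.
REQUIRED_COMPANIONS = ("--classify", "--package")


def build_scan_command(input_path, output_path):
    """Build an argument list for a package-summary scan (hypothetical helper)."""
    return [
        "scancode",
        "--package",
        "--classify",
        "--package-summary",  # option proposed in the PR, not necessarily released
        "--json-pp",
        output_path,
        input_path,
    ]


def missing_companions(args):
    """Return the companion options that --package-summary needs but args lacks."""
    if "--package-summary" not in args:
        return []
    present = set(args)
    if "-p" in present:
        # -p is the short form of --package.
        present.add("--package")
    return [option for option in REQUIRED_COMPANIONS if option not in present]


command = build_scan_command("samples/", "scan.json")
print(shlex.join(command))
```

Passing the list to ``subprocess.run`` would start the scan. ``missing_companions`` only mirrors
the requirement stated in the text and does not reproduce ScanCode's own option validation.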

This change lets users get a more refined summary for each individual package
present in a codebase. It also improves package assembly for several package
ecosystems, such as npm, python-whl, rust, and rubygems.
Collaborator

This needs to be a bit more detailed, and in an unordered list.



Finally, all these changes are tested through multiple unit tests validating both correct
Collaborator

You haven't added any unit tests in aboutcode-org/scancode-toolkit#3792. Unit tests are tests for a single function; you have added only full scan tests, and you have to add unit tests before the PR can be merged. Let's mention only the work that was completed and move things like this to a section called remaining work/post gsoc. We can always update this page after the work is done.

behavior and error handling as needed.

Post GSoC
---------

I would like to get this PR merged into ScanCode Toolkit so that users can use
this feature to expand their package and codebase scanning capabilities.

Links
-----

`Project idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#compute-summary-for-all-detected-packages>`_

`Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/JzMlDtnM>`_

`GSoC Proposal <https://docs.google.com/document/d/1TcGqQVzXhTkz6Pmu9UaXAr4R4q1rlT4tof7H7dsVG0o/edit?usp=sharing>`_

Acknowledgements
----------------

I would like to thank my mentors:

- `@pombredanne <https://github.com/pombredanne>`_
- `@AyanSinhaMahapatra <https://github.com/AyanSinhaMahapatra>`_
- `@AvishrantSh <https://github.com/AvishrantSsh>`_
- `@35C4n0r <https://github.com/35C4n0r>`_
- `@jono-yang <https://github.com/JonoYang>`_

The weekly calls were very helpful, and the special 1:1 calls with
``@AyanSinhaMahapatra`` and ``@pombredanne`` were amazing. Thank you for your time and your patience!