GSOD 2024 Proposal

Ayan Sinha Mahapatra edited this page Apr 8, 2024 · 5 revisions

Proposal Title

Add end-to-end guides and examples of using the AboutCode stack (scancode.io, purldb and vulnerablecode) for scanning, along with PurlDB reference docs and a glossary

About our Organization

Organization Scope

AboutCode.org is a community of developers who focus on Software Composition Analysis (SCA) tools (command line tools, web-based and API servers and applications) and data for identifying and tracking software origin, licensing, security vulnerabilities and maintenance quality. SCA tools and data are essential to enable everyone to safely produce and use free and open source software. With modern software reusing millions of free and open source software components available on the web, we think that having these essential SCA tools themselves be FOSS would have a huge impact on how we reuse open source software freely, safely and responsibly. Our tools are not only open source, they help everyone use more open source software!

Our Projects

The focus for GSoD 2024 is on end-to-end scenarios using scancode.io, PurlDB and vulnerablecode.

scancode.io is a server to script and automate software composition analysis pipelines, combining all other projects in the AboutCode stack, to uncover data about software and FOSS. VulnerableCode and PurlDB are the new and upcoming projects from aboutcode!

VulnerableCode is a free and open vulnerability data aggregation database, with information about vulnerabilities and the packages affected by them. VulnerableCode has data importers for all major vulnerability advisory publishers, tools for version range parsing and resolving, bots for data consistency checks and improvements, and comparison tools for other vulnerability databases (it supports CVEs). We also have an instance (and an open API endpoint) for public use at: public.vulnerablecode.io.

PURL is a leading effort to standardize software package identification. It is widely used by open source foundations and organizations, by SBOMs and other data formats for SCA tools, and by major tech organizations, including in Google's https://github.com/google/osv.dev. PurlDB is a database of packages keyed by PackageURL (PURL), with tools for scanning packages, matching and indexing for comparison, mining (fetching package metadata from package managers) and more.
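To make the idea of a package "keyed by PURL" concrete, here is a minimal sketch of how a Package-URL string of the form pkg:type/namespace/name@version?qualifiers#subpath breaks down into its components. This is a simplified illustration only: `SimplePurl` and `parse_purl` are hypothetical names, the parser skips the type-specific normalization rules of the PURL specification, and it assumes any "@" inside a namespace is percent-encoded. Real workflows would more likely rely on an existing library such as packageurl-python.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class SimplePurl:
    """Hypothetical container for the parts of a Package-URL."""
    type: str
    name: str
    namespace: str = ""
    version: str = ""
    qualifiers: dict = field(default_factory=dict)
    subpath: str = ""


def parse_purl(purl: str) -> SimplePurl:
    """Split pkg:type/namespace/name@version?qualifiers#subpath into parts."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")
    rest = rest.lstrip("/")

    # Peel the optional parts off from the right: subpath, then qualifiers.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)

    # Assumption: the last "@" separates the version from the package path.
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""

    segments = [unquote(s) for s in path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"a PURL needs at least a type and a name: {purl!r}")
    ptype, *namespace, name = segments

    return SimplePurl(
        type=ptype.lower(),
        name=name,
        namespace="/".join(namespace),
        version=unquote(version),
        qualifiers=qualifiers,
        subpath=unquote(subpath).strip("/"),
    )


if __name__ == "__main__":
    print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0?type=jar"))
```

The type, namespace, name and version together are what would let a database look up the same package regardless of which tool reported it.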

We at aboutcode have also created and maintained a lot of other open-source projects like:

Why select us?

Accepting our project for GSoD to improve the documentation for our tools will be extremely useful for attracting more people to use and contribute to our projects, and for making sure open access SCA and vulnerability data becomes the standard.

Scanning source code and FOSS third-party packages used in software is extremely important to uncover license issues, vulnerabilities, and other quality issues. And these tools and the backing data should be open-source to make these capabilities available across entire supply chains, to safely use FOSS packages. Documentation is a crucial part in supporting this adoption and great documentation on using these modular tools together would go a long way in supporting aboutcode and the community usage of aboutcode tools to inspect FOSS packages.

AboutCode.org was started by nexB Inc. in 2013. We have many contributors from a growing FOSS community, including students who have continued contributing to our projects after GSoC and GSoD program participation. As with many open source projects, we only know the identity of a subset of our users, but we know that our AboutCode software is used by (and receives contributions from) several open source Foundations such as Eclipse, OW2, the FSFE and many projects such as ORT, REUSE and Tern. Our projects are also used by major tech companies including Google itself, and at their open source program offices.

About our project

Project problem

PurlDB is a recently released AboutCode project that has only limited documentation. The goals will be to migrate the existing documentation from the code repository to a more friendly ReadTheDocs format, and to add detailed Reference documentation for PurlDB, which is a database of package data keyed by Package-URLs (the leading package identifier specification used by modern SCA tools), and for the tools that you can use to fetch, create, index and compare records in this database. The goal will also include upgrading the Getting-Started documentation for both users and developers. This will give new users the necessary context and directions to start using PURLs and PurlDB in their SCA workflows.

Project scope

Add end-to-end use cases for code scanning using purldb and scancode.io

  • Add end-to-end software composition analysis use cases in the AboutCode stack, using purldb and scancode.io together, with examples, and information about supported ecosystems:
    • Running a devel-to-deploy scan in SCIO with:
      • inspect packages and populate purldb
      • purldb scanning and indexing for these purls
      • scanning and matching to purldb packages
    • Scanning a source/binary package and all its dependencies
      • sending all purls detected in SCIO for deps to purldb
      • get scancode scan data back for all these dependencies
    • getting all packages from a lockfile and their scans
      • parse and get packages for a lockfile/resolve dependency requirements
      • get metadata and scancode scans for all these packages
    • load and enrich SBOMs with package scans
      • load packages from SBOMs/other tool scans
      • get metadata for and scan packages for imported purls
      • enrich these packages with data from metadata and scans
      • export as SBOMs
  • Add backing reference documentation on PurlDB and SCIO:
    • What are the components: packagedb, matchcode, minecode, purlcli
    • How to install SCIO and purldb together with more backing SCIO workers
    • Enhance the docs in https://github.com/nexB/purldb#readme into RTD pages and sections
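As an illustration of the first step in the "load and enrich SBOMs" use case above, the following sketch collects the purls listed in a CycloneDX-style JSON SBOM, where each entry under "components" may carry a "purl" field and nested "components". It does not show how scancode.io or PurlDB actually import SBOMs; `collect_purls` is a hypothetical helper and the sample document is made up for the example.

```python
import json


def collect_purls(sbom: dict) -> list:
    """Return the unique purls found in an SBOM's components, in document order."""
    purls = []

    def walk(components):
        for component in components:
            purl = component.get("purl")
            if purl and purl not in purls:
                purls.append(purl)
            # CycloneDX components may themselves contain nested components.
            walk(component.get("components", []))

    walk(sbom.get("components", []))
    return purls


# Made-up sample document, trimmed to the fields this sketch reads.
SAMPLE_SBOM = json.loads("""
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "django", "version": "4.2", "purl": "pkg:pypi/django@4.2"},
    {"name": "bundle", "components": [
      {"name": "lodash", "version": "4.17.21", "purl": "pkg:npm/lodash@4.17.21"},
      {"name": "django", "version": "4.2", "purl": "pkg:pypi/django@4.2"}
    ]}
  ]
}
""")

if __name__ == "__main__":
    for purl in collect_purls(SAMPLE_SBOM):
        print(purl)
```

The resulting list of purls is the kind of input the later steps (fetching metadata, scanning, enriching and re-exporting) would start from.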

URLs:

Primary Mentors:

Contributors and Mentors

We already have 8 prospective candidates who have shown interest in our organization and shared their background and previous technical writing experience with us in our public chat.

According to our timeline we have started receiving draft statements of interest, but will start the process of tech writer selection only after we are selected as an organization for GSoD 2024.

We have 6 members of the AboutCode community, maintainers of and contributors to different projects, who have committed their time as mentors (and for org-admin responsibilities). They will help and support the tech writer we select with various aspects of their documentation writing, planning, and understanding our projects. They are @pombredanne, @mjherzog, @DennisClark, @JonoYang, @johnmhoran, @AyanSinhaMahapatra.

Measuring our project’s success

Measuring success is tough for a privacy-first org like ours, where we have clear guidelines against tracking users with Google Analytics in our documentation pages. But we have come up with other metrics that can be used to measure the success of the documentation project.

  1. At least 80% of the issues in purldb and scancode.io on usage questions will be closed by these docs.
  2. At least a 20% reduction in the questions about:
    • various use cases of using purldb and scancode.io from users
    • using scancode.io and purldb together for users
    • setting up purldb/scancode.io individually and together
     These would be measured both in our Element chatroom: https://matrix.to/#/#aboutcode-org_discuss:gitter.im and in other project-specific chatrooms.
  3. We will track new contributors (new Issues and Pull Requests) measured monthly for the 5 months in the GSoD project timeline, and compare the number of new contributions/issues gained after the completion of technical writing phase to the number of contributions/issues gained before tech writing started and during the tech writing period. We will consider this project to be a success if we see a 40% or more additional growth in the number of new contributions/issues when compared to the time before the documentation was written.
  4. We will also select 5 first-time contributors from this year to follow the use cases, and we expect that they will be able to follow them and get the same results without asking any additional questions.
  5. We will select an additional 5 first-time users who have asked questions about purldb to rate the available documentation on a scale of 1-10 on a few different metrics like ease of use, beginner friendliness, level of detail etc. before this project and after this project, and we expect to see at least a 50% improvement in the scores.
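The growth and improvement thresholds in metrics 3 and 5 can be checked with simple relative-change arithmetic. The sketch below assumes the comparison is a plain percentage change between a "before" and an "after" figure measured over comparable periods, which is one reasonable reading of the metrics rather than a definition given in this proposal. The function names and the sample numbers are hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("the baseline must be positive to compute a relative change")
    return (after - before) * 100 / before


def meets_target(before: float, after: float, target_percent: float) -> bool:
    """True when the change reaches the target percentage."""
    return percent_change(before, after) >= target_percent


if __name__ == "__main__":
    # Metric 3 (made-up counts): new issues and pull requests, before vs. after.
    print(meets_target(before=20, after=28, target_percent=40))
    # Metric 5 (made-up scores): average documentation rating on a 1-10 scale.
    print(meets_target(before=6.0, after=9.0, target_percent=50))
```

Under this reading, a baseline of 20 new contributions would need to reach at least 28 to count as 40% growth.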

Timeline

The project will take approximately 4 months to complete, including a buffer time of another month reserved for unforeseen circumstances/challenges. This also excludes orientation and planning which will be completed before the tech writing period begins. See our detailed timeline for GSoD here.

Here is our rough project timeline, but note that this is subject to change after discussion and planning starts with mentors and the tech writer.

Technical Writer Hiring

  • April 10 - April 27: More Discussion on AboutCode proposed project
  • April 27: Deadline to submit updated Statement of Interest for the selected AboutCode project
  • April 28 - May 10: Interviews with shortlisted candidates
  • May 13: We announce our technical writer selection
  • May 22: Google Season of Docs Technical Writing Hiring Deadline

Technical Writing Timeline

  • May 13 - May 17: Community bonding period and Project Planning
  • May 17: GSoD project starts
  • June 5 - June 12: First monthly evaluation period
  • July 5 - July 29: Second monthly evaluation period
  • September 5 - September 12: Third monthly evaluation period
  • October 22 - October 29: Fourth monthly evaluation period
  • November 22 - December 10: Orgs submit case study and final evaluation
  • December 13: Results announced by Google

Technical Writing Timeline by goals

| Phases | Dates | Project goals |
| --- | --- | --- |
| 0 | 13 May : 17 May | Setting up environments, getting familiar with ecosystem, raising first PR |
| 1 | 17 May : 12 June | Move README content to RTD, add reference and setup (SCIO and purldb together) docs |
| 2 | 13 June : 29 July | document the deployment analysis use case |
| 3 | 1 August : 12 September | document the package/lockfile scan and scan/fetch package data use case |
| 4 | 13 September : 29 October | document the SBOM use case and other use cases discussed/added during the project |
| 5 | 1 November : 22 November | buffer time, solve other documentation issues, new sections for purldb |
| 6 | 22 November : 10 December | Final evaluation and case study creation |
| 7 | After 13 December | post-gsod follow-up surveys |

Budget

| Budget Item | Amount (US $) | Notes/Justifications |
| --- | --- | --- |
| Technical writer stipend to update, add, test, and publish documentation for AboutCode | 6000.00 | This is a total of 700 hours for 4 months, at 175 hours per month |
| Stipend for AboutCode mentors helping the tech writer with technical expertise and use cases | 1000.00 | This is $500 each for two volunteer mentors |
| Total | 7000.00 | |

Our budget for the project is $7,000 which will be allocated completely to the technical writers and mentors working on the project. We do not see any other expenses because:

  • We will be using open source software for all our documentation efforts so there would not be any licenses/other expenses for commercial software.

We expect that the technical writer will work on our project full time over a period of 4 months (May to October), and the tasks will be divided into the timeline of 4 months with key deliverables set for each month. We would also disburse funds to the technical writer once after hiring the writer and thereafter monthly over the 4 months based on completing agreed upon deliverables (divided from the first org payment), and the rest upon project completion, from the final payment to the organisation.

The amount decided for the project is based on payments and working hours from the GSoC program, for both the tech writer and the mentors, so this is standard and fair for both the tech-writer and us.

We already have an open-collective account for aboutcode used actively to fund open source contributors, get contributions and sponsors from the community, fund community events and for previous years of GSoD/GSoC.

Additional Information

Previous experience with technical writers or documentation:

We believe that documentation should be created, managed and tested like code. With this in mind we expect to include any technical writer directly into the corresponding project development team. This approach worked well for our former GSoD project participation in 2019 and we have adapted it for other contributors to our project documentation.

Our documentation builds are tested in CI/CD, along with linters and link checkers. Most of this documentation infrastructure was implemented based on work from our 2019 GSoD program.

We worked with an experienced technical writer in GSoD 2021 and learnt a lot about managing technical writers from this program, in particular about setting expectations and discussing deliverables. We also learned that it is best to let the core maintainer of the project mentor the technical writer primarily and have everyone else chime in with review and feedback on the documentation written.

Based on our recent experience mentoring both newcomers and experienced technical writers, we believe we have the process and tools in place for quickly on-boarding a new technical writer and letting them focus on new content structure, design and creation.

Our mentors also have significant experience working with technical writers from prior product development work at commercial software companies (with experienced technical writers) and mentoring open source communities where we have lots of contributors (mostly newcomers to tech writing) writing and maintaining our technical documentation.

Previous participation in Google Season of Docs, Google Summer of Code:

GSoC

All of our present mentors have participated in one or more Google Summer of Code programs since 2015, and 3 of our org-admins have been org-admins in GSoC since 2015. Additionally, we have 3 mentors who participated in GSoC as students in 2020, 2021, 2022 and 2023. We are also pleased to have been selected for GSoC 2024 this year!

GSoD

We have been selected in GSoD twice, once in the inaugural year in 2019, and once in 2021 in the current format of GSoD. We have 4 mentors and org-admins this year who were mentors and org-admins in these successful years of GSoD. So we have experience both mentoring beginners to open source and working with experienced technical writers to successfully write great documentation.

One of our mentors and org-admins is our GSoD contributor from 2019, who also participated in GSoC 2020 and has been a mentor and org-admin on both GSoC and GSoD programs since then. This person is also responsible for creating the project ideas and proposal, with input and approval from the community.

See GSoD reports, case studies and documentation written in our previous GSoD years:

For GSoD 2019, our project focused on documentation for our ScanCode-Toolkit project. The first step was to move existing documentation from a GitHub wiki to ReadTheDocs (with Sphinx and other tools) in order to better link documentation with code as part of our overall CI process. The project then focused on adding Tutorials and improving How-to and (command-line) Reference documentation.

Based on experience from our 2019 GSoD project, we were able to confirm that the RTD/Sphinx tools were a good fit for our projects and we have since moved the primary documentation for our other projects to RTD. We also piloted our use of the documentation framework of Tutorials, HowTo Guides, Reference and Discussion (from Daniele Procida of Divio/Django) which we are applying to all of our projects as we improve their documentation.

For GSoD 2021, our project focused on our scancode.io project. The tasks were extending the HowTo Guides to cover Software Composition Analysis workflows, then upgrading the ScanCode.io Web UI documentation and creating an introductory video to show how the web UI is used. We also worked on updating and improving the existing Pipe libraries reference API documentation (which is generated from code documentation “docstrings”). The last task was syncing the new documentation set with the code to support continuous integration.

Since we were working with an experienced tech writer, and the hiring and admin work was entirely on us, we learnt a lot through this process and the feedback is documented here.

Our years of experience mentoring successful GSoC projects, together with 2 years of extensive GSoD experience, put us in a comfortable and confident position to successfully mentor in GSoD 2024.
