
Generate overview page for node and relationship types #21

Open · m-appel opened this issue Jan 19, 2023 · 4 comments
m-appel (Member) commented Jan 19, 2023

The different node and relationship types created by the crawlers are currently only documented in the individual READMEs of the crawlers (e.g., APNIC). To make it easier to get started with the database, we need an overview page that briefly summarizes all node and relationship types. However, to avoid maintaining everything twice, this page should be generated from the individual READMEs.

  1. Figure out what information to include and how to represent it in the overview.
  2. Decide on a template for crawler READMEs.
  3. Create a script (maybe a GitHub Action?) to generate the overview page.
Yh010 (Contributor) commented Mar 6, 2023

In my opinion:

A) For the overview HTML page, we could present the data in a table that lists, for each crawler:
1. the node types it generates
2. the relationship types between the nodes
3. a brief description of the nodes and the relationships
4. a link to that crawler's README

The table will contain the above for each of the crawlers used (see the example below).
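For illustration, such an overview table could look like the following (the crawler name and the node and relationship types are placeholders, not taken from any actual README):

| Crawler | Node types | Relationship types | Description | README |
|---------|------------|--------------------|-------------|--------|
| example_crawler | ExampleNode | EXAMPLE_REL | One-line summary of what this crawler imports | link to README |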

B) For the crawler README template:
Again, we can use a table displaying the node types, the relationship types between the nodes, and a brief description. This table will have to be updated by the crawler's developer.

C) Script (GitHub Action) to generate the overview page:
1. It should parse the crawlers' READMEs and extract the relevant info.
2. Next, it should generate an HTML page.
3. This should be an automation script, run as a GitHub Action, triggered on each update to a crawler.

The nodes could include IP addresses, domain names, URLs, email addresses, usernames, and other types of information that can be identified and extracted from web pages.

Yh010 (Contributor) commented Mar 7, 2023

Should I work on this issue?

romain-fontugne (Member) commented

Yes, sure. But I think the final output should be md, not html.

Yh010 (Contributor) commented Mar 16, 2023

@romain-fontugne, what if we create the crawler README template this way:
https://round-bobcat-0ac.notion.site/GSOC-75bc3ef547614960b154275359d12562
The link above shows how the data would be displayed on the overview page in .md format. If new columns are required, they can be added to the template in the same fashion.

All new crawlers will be added in the above format (see the example template below).
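For illustration, a README template along those lines could look like this (the node and relationship names are placeholders, not taken from the linked document):

| Node type | Relationship type | Description |
|-----------|-------------------|-------------|
| ExampleNode | EXAMPLE_REL | Brief description of the nodes and the relationship |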

The script required to automate the process:

  1. Whenever a new crawler is added, its README info should be parsed and added to the overview page (a rough sketch follows).
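A minimal sketch of what that script could look like, assuming crawler READMEs live under iyp/crawlers/<source>/README.md and each contains a Markdown table whose header row starts with `| Node type`; the paths and the table marker are assumptions for illustration, not the repository's actual layout:

```python
"""Generate a Markdown overview page from the crawlers' READMEs.

Assumptions (for illustration only): READMEs live under
iyp/crawlers/<source>/README.md and each contains a Markdown table
whose header row starts with '| Node type'.
"""
from pathlib import Path

CRAWLERS_DIR = Path("iyp/crawlers")   # assumed location of the crawlers
OVERVIEW_FILE = Path("OVERVIEW.md")   # hypothetical output file
TABLE_HEADER = "| Node type"          # assumed marker for the template table


def extract_table(readme: Path) -> list[str]:
    """Return the contiguous Markdown table rows starting at the header marker."""
    rows = []
    in_table = False
    for line in readme.read_text().splitlines():
        if line.startswith(TABLE_HEADER):
            in_table = True
        if in_table:
            if line.startswith("|"):
                rows.append(line)
            else:
                break  # first non-table line ends the table
    return rows


def main() -> None:
    sections = ["# Node and relationship types\n"]
    for readme in sorted(CRAWLERS_DIR.glob("*/README.md")):
        crawler = readme.parent.name
        table = extract_table(readme)
        if not table:
            continue  # README does not follow the template yet
        sections.append(f"## {crawler}\n")
        sections.extend(table)
        sections.append(f"\n[Full README]({readme.as_posix()})\n")
    OVERVIEW_FILE.write_text("\n".join(sections) + "\n")


if __name__ == "__main__":
    main()
```

A GitHub Actions workflow could then invoke this script on every push that touches a crawler README and commit the regenerated overview page.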
