Crawler transform #797

Open, wants to merge 16 commits into base: dev

Commits on Nov 8, 2024

  1. first implementation of web2parquet for crawling/downloading from seedURLs
    
    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 8, 2024
    41bed68

Commits on Nov 11, 2024

  1. use makefile template

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 11, 2024
    cf516b5

Commits on Nov 13, 2024

  1. complete full implementation and testing with python runtime

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 13, 2024
    acc35cd
  2. identified current requirements for web2parquet module

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 13, 2024
    3e05f30
  3. relaxed dependencies

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 13, 2024
    5710653
  4. added build target

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 13, 2024
    80e4ebe
  5. cf20268

Commits on Nov 14, 2024

  1. added licence block

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 14, 2024
    4dcebb6
  2. 137d92c
  3. fix filename issue

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 14, 2024
    d2404f4
  4. generate cicd workflow for new transform

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 14, 2024
    1e810d0
  5. build image only if a Dockerfile is defined

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 14, 2024
    fcbcc0a
  6. Ignore page content as long as we get the right count

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 14, 2024
    b5031c9

Commits on Nov 15, 2024

  1. rename make.cicd.target

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 15, 2024
    9ad3d18
  2. updated notebook with example

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 15, 2024
    c9c9779
  3. updated notebook with example

    Signed-off-by: Maroun Touma <[email protected]>
    touma-I committed Nov 15, 2024
    b77bbe9