CLI: add command to upload data from a JSON or CSV file #167

Open
1 task done
Mec-iS opened this issue Mar 10, 2018 · 13 comments

Comments

@Mec-iS
Contributor

Mec-iS commented Mar 10, 2018

I'm submitting a

  • feature request.

Current Behaviour:

To bulk-load data into the server, the only option is to send PUT requests to the endpoint.

Expected Behaviour:

Data can be loaded by running a command, pointing to a local text file.

@py-ranoid
Contributor

py-ranoid commented Mar 11, 2018

@Mec-iS Is it okay to add pandas as an additional dependency?
It would help us read from a variety of formats (xlsx, pkl, json, csv, tsv, SQL queries and many more).

However, it also has a number of data-processing modules that we don't need (keeping in mind that hydrus was meant to be "lightweight").

@Mec-iS
Contributor Author

Mec-iS commented Mar 11, 2018

Always use standard library tools. The standard library has csv and json modules. The use of pandas is not justified at the moment.
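
As a rough illustration of the standard-library route, the sketch below reads records from either a JSON or a CSV file using only the csv and json modules. The function name `load_records` and the choice to dispatch on the file extension are assumptions made for this example; they are not part of hydrus.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Return a list of dicts read from a .json or .csv file.

    Hypothetical helper: the name and the extension-based dispatch
    are illustrative assumptions, not hydrus code.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    if suffix == ".csv":
        # newline="" is what the csv module documentation recommends.
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Note that csv.DictReader yields every value as a string, so any type conversion would have to happen in a later step.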

@chrizandr
Member

When you say data, you mean instances of objects that the API serves, right?

I think it could be handy to have a text file with some preloaded data that can load instances right away. But we would need to define the format of such a file.

@Mec-iS
Contributor Author

Mec-iS commented Mar 11, 2018

Yes, by data I mean the instances/objects that are actually served by the interface (the ones we store using the PUT method on the Items endpoint).

> But we would need to define the format of such a file.

Using standard formats is always the way to go. It would probably be better to support, besides JSON, the different serialization formats for triples as well, for backward compatibility with older Knowledge Bases.

Mec-iS added this to the 0.2-alpha milestone Mar 12, 2018

@xadahiya
Member

xadahiya commented Jul 6, 2018

@vaibhavchellani ^^

@sameshl
Member

sameshl commented Apr 7, 2020

@xadahiya @Mec-iS I think this issue is not solved yet?
I can continue the work from #168, which was closed. Could you tell me why PR #168 was closed?

@Mec-iS
Contributor Author

Mec-iS commented Apr 7, 2020

This is needed, @sameshl. Also pinging @vedangj044, as I think he is working on it.

@vedangj044
Contributor

@sameshl #168 looks good, but it only covers one specific case: data for a single resource, and only when that data contains no abstract properties.
I am working on a generic preloading script that maps the column names of a CSV file to the resource names of a Hydra Doc.
Starting with resources that have no abstract properties, we load the data using the crud.insert function.
Then, for resources where we need to link to properties, we first need to get the resource ID using the get function (this is not yet implemented).
In this way, data can be loaded from any CSV file.
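
The two steps described above (renaming CSV columns, then loading resources that link to nothing before those that do) could be sketched roughly as follows. Everything here is hypothetical: the helper names `rows_for_resource` and `load_order`, the shape of `column_map` and `links`, and the example resource names are assumptions for illustration. The actual call to crud.insert is left out because its signature is not shown in this thread.

```python
import csv
import io


def rows_for_resource(csv_text, column_map):
    """Rename CSV columns to property names, dropping unmapped columns.

    `column_map` maps a CSV column name to a property name (assumed shape).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {prop: row[col] for col, prop in column_map.items() if col in row}
        for row in reader
    ]


def load_order(links):
    """Order resources so that those linking to nothing come first.

    `links` maps a resource name to the set of resources it refers to.
    """
    ordered, remaining = [], dict(links)
    while remaining:
        # A resource is ready once everything it links to is already ordered.
        ready = sorted(name for name, deps in remaining.items()
                       if all(dep in ordered for dep in deps))
        if not ready:
            raise ValueError("circular or unknown links between resources")
        ordered.extend(ready)
        for name in ready:
            del remaining[name]
    return ordered


# Hypothetical example: Album rows link to Artist rows.
order = load_order({"Album": {"Artist"}, "Artist": set()})
rows = rows_for_resource("Artist Name,Country\nMiles Davis,US\n",
                         {"Artist Name": "name"})
```

With an ordering like this, the resources at the front of the list can be inserted directly, and the later ones can look up the IDs of the rows they link to once that lookup exists.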

Also, a broader solution should cover all types of data sources, even relational databases.
We can discuss two approaches for RDBs:

  1. A generic preloading script that automatically maps data to the hydrus-generated database.
  2. Introducing a keyword named SQL in the Hydra vocab. This keyword would contain the SQL query needed to get data from the database and populate the hydrus-generated DB.
    Example: SQL: SELECT * FROM ARTIST;
    hydrus would then run this query and populate its database accordingly.
    I think this approach is much safer from a developer's point of view.
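
To make the second approach a bit more concrete, here is a minimal sketch of only the "run the query and collect the rows" part, using the standard-library sqlite3 module. The helper name `fetch_rows`, the in-memory database standing in for the external source, and the ARTIST columns are all assumptions for illustration. How the SQL keyword would be read from the Hydra vocab, and how the rows would be written into the hydrus database, is not covered.

```python
import sqlite3


def fetch_rows(connection, query):
    """Run a query and return each result row as a plain dict."""
    # sqlite3.Row lets us access columns by name and convert rows to dicts.
    connection.row_factory = sqlite3.Row
    cursor = connection.execute(query)
    return [dict(row) for row in cursor]


# An in-memory database stands in for the external source (assumption).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE ARTIST (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO ARTIST (name) VALUES (?)",
                   [("Miles Davis",), ("Nina Simone",)])

# The query from the example above, ordered so the result is deterministic.
rows = fetch_rows(source, "SELECT * FROM ARTIST ORDER BY id")
```

Each dict in `rows` could then be handed to whatever insert step the preloading script ends up using.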

@Asmi8

Asmi8 commented Feb 7, 2021

Why are batch endpoints useful? How can we add them to the existing REST API?

@Purvanshsingh
Member

Purvanshsingh commented May 25, 2021

@Mec-iS Before working on this issue, should we set up a Database Config_file and db_parser.py for hydrus in a separate PR?
The workflow would be easier, and more could be done with the database, since we would have a unified way to connect to the DB.

from db_parser import get_db_url
DB_URL = get_db_url()

@Mec-iS
Contributor Author

Mec-iS commented May 25, 2021

Everything about the code is in the code. For now we only use a file-based database, SQLite. Ask your colleagues in Slack for generic directions.

@Vyvy-vi

Vyvy-vi commented Nov 3, 2021

Is anyone working on this at the moment?

@Akash-Kumar-Sen

I want to work on this issue!
