CLI: add command to upload data from a JSON or CSV file #167

Mec-iS · 2018-03-10T17:16:10Z

I'm submitting a

feature request.

Current Behaviour:

To bulk-load data into the server, the only option is to send PUT requests to the endpoint.

Expected Behaviour:

Data can be loaded by running a command that points to a local text file.
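
For reference, the current workaround (one PUT request per object) could look roughly like the sketch below. It only builds the requests and does not send them. The endpoint URL, the `application/ld+json` content type, and the helper name `build_put_requests` are assumptions made for illustration, not part of hydrus.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List


def build_put_requests(
    endpoint_url: str, records: Iterable[Dict[str, Any]]
) -> List[urllib.request.Request]:
    """Build one PUT request per record (hypothetical helper, nothing is sent)."""
    requests = []
    for record in records:
        body = json.dumps(record).encode("utf-8")
        requests.append(
            urllib.request.Request(
                endpoint_url,
                data=body,
                # Assumed content type; adjust to whatever the server expects.
                headers={"Content-Type": "application/ld+json"},
                method="PUT",
            )
        )
    return requests
```

Each request would then be sent with `urllib.request.urlopen(request)`, which is the per-object round trip that a CLI upload command would avoid.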

py-ranoid · 2018-03-11T17:15:35Z

@Mec-iS Is it okay to add pandas as an additional dependency?
It would help read from a variety of formats (xlsx, pkl, json, csv, tsv, SQL queries, and a lot more).

However, it also has a number of data-processing modules that we don't need (keeping in mind that hydrus was meant to be "lightweight").

Mec-iS · 2018-03-11T17:24:34Z

Always use standard library tools. The standard library has csv and json modules. The use of pandas is not justified at the moment.
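
A minimal sketch of what reading both formats with only the standard library could look like. The function name `read_records` and the choice to dispatch on the file extension are assumptions for illustration; nothing here is existing hydrus code.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def read_records(path: str) -> List[Dict[str, Any]]:
    """Return CSV rows or JSON objects from a local file as a list of dicts."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # newline="" is the setting recommended by the csv module docs.
        with open(path, newline="", encoding="utf-8") as handle:
            return [dict(row) for row in csv.DictReader(handle)]
    if suffix == ".json":
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported file type: {suffix!r}")
```

Note that `csv.DictReader` returns every value as a string, so any type conversion would have to happen in a later step.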

chrizandr · 2018-03-11T19:51:32Z

When you say data, you mean instances of objects that the API serves, right?

I think it could be handy to have a text file with some preloaded data that can load instances right away. But we would need to define the format of such a file.

Mec-iS · 2018-03-11T21:09:33Z

Yeah, by data I mean the instances/objects that are actually served by the interface (the ones we store using the PUT method on the Items endpoint).

> But we would need to define the format of such a file.

Using standard formats is always the way to go. It would probably be better to support, besides JSON, the different serialization formats for triples as well, for backward compatibility with older Knowledge Bases.

xadahiya · 2018-07-06T05:46:47Z

@vaibhavchellani ^^

sameshl · 2020-04-07T13:45:01Z

@xadahiya @Mec-iS I think this issue is not solved yet?
I can continue the work from #168, which was closed. Could you tell me why PR #168 was closed?

Mec-iS · 2020-04-07T13:59:07Z

This is something we need, @sameshl. Also pinging @vedangj044, as I think he is working on it.

vedangj044 · 2020-04-07T16:31:00Z

@sameshl #168 looks good, but it covers only one specific case: the data belongs to a single resource, and only if that data doesn't contain any abstract property.
I am working on a generic preloading script that maps the column names of a CSV file to the resource names of a Hydra Doc.
Starting with resources that have no abstract properties, we load the data using the crud.insert function.
Then, for resources that need to link to other properties, we first need to get the resource ID using the get function (this is not yet implemented).
This way, data can be loaded from any CSV file.

Also, a broader solution should cover all types of sources, even relational databases.
We can discuss two approaches for RDBs:

1. A generic preloading script that automatically maps data to the hydrus-generated database.
2. Introducing a keyword named SQL in the Hydra vocab. This keyword would contain the SQL query needed to get data from the database and populate the hydrus-generated DB.
   Example: SQL: SELECT * FROM ARTIST;
   hydrus would then run this query and populate its database accordingly.

I think this approach is much safer from a developer's point of view.
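
The column-mapping idea for resources without abstract properties could be sketched as below. This is only an illustration under stated assumptions: `class_props` is a hypothetical mapping from class name to property names that would have to be derived from the Hydra Doc, the `@type` key is an assumed object shape, and `insert` is a caller-supplied callback standing in for the real insert call, whose signature is not shown in this thread.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterable, Optional, Set


def match_class(
    columns: Iterable[str], class_props: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the class whose property names equal the CSV column names."""
    wanted = set(columns)
    for name, props in class_props.items():
        if props == wanted:
            return name
    return None


def preload(
    csv_text: str,
    class_props: Dict[str, Set[str]],
    insert: Callable[[Dict[str, Any]], None],
) -> int:
    """Insert every CSV row as an object of the matching class; return the count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    target = match_class(reader.fieldnames or [], class_props)
    if target is None:
        raise ValueError("no class matches the CSV header")
    count = 0
    for row in reader:
        # Assumed object shape: the class name under "@type" plus the row values.
        insert({"@type": target, **row})
        count += 1
    return count
```

Resources that link to other resources are not handled here; as noted above, they would first need the ID of the linked resource.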

Asmi8 · 2021-02-07T09:51:57Z

Why are batch endpoints useful? How can we add them to the existing REST API?

Purvanshsingh · 2021-05-25T08:22:23Z

@Mec-iS Before working on this issue, should we set up a database config file and db_parser.py for hydrus in a separate PR?
The workflow would be easier, and we could do more with the database, since we would have a unified method to connect to the DB.

from db_parser import get_db_url
DB_URL = get_db_url()

Mec-iS · 2021-05-25T08:37:43Z

Everything about the code is in the code. For now we only use a file database, SQLite. Ask your colleagues in Slack for generic directions.

Vyvy-vi · 2021-11-03T07:48:38Z

Is anyone working on this at the moment?

Akash-Kumar-Sen · 2021-11-09T13:48:15Z

I want to work on this issue!

suryatejreddy mentioned this issue Mar 11, 2018

CLI : Add data directly without running server. #168

Closed

2 tasks

Mec-iS added this to the 0.2-alpha milestone Mar 12, 2018

xadahiya added post-poned and removed post-poned labels Jul 6, 2018

xadahiya assigned xadahiya and vaibhavchellani Jul 6, 2018

xadahiya added the GSOC-2018 label Jul 6, 2018

Mec-iS removed the GSOC-2018 label Jul 7, 2018

xadahiya added the enhancement label Nov 14, 2018

Mec-iS added the good-to-start label Mar 25, 2019

Mec-iS assigned Purvanshsingh and unassigned xadahiya and vaibhavchellani May 24, 2021

Mec-iS unassigned Purvanshsingh Jun 15, 2021
