datasets : harmonize Netflix parsers with the rest #26

ocramz · 2018-12-30T10:53:50Z

The Netflix Prize dataset uses a custom parser because its data is not row-based (as CSV data is): a single data example spans several lines in a custom "stanza-based" format. For example, these are two stanzas of the "qualifying.txt" data file:

1:
1046323,2005-12-19
1080030,2005-12-23
2127527,2005-12-04
1944918,2005-10-05
1057066,2005-11-07
954049,2005-12-20
10:
12868,2004-10-19
627923,2005-12-16
690763,2005-12-13

It would be nice to upgrade the library so that it can handle stanza-based formats like this one through the same mechanism as the other datasets.

Solution sketch:

Add a constructor to ReadAs that accepts an attoparsec parser as a parameter.
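
A minimal sketch of what this could look like, assuming the stanza layout shown above. The `ReadAs` type here is a simplified stand-in rather than the library's actual definition, and the names `Parsable`, `Stanza`, `stanza`, `row` and `readDataset` are hypothetical, chosen only for illustration:

```haskell
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS

-- | One stanza: the header id (the number before ':') and its rows.
-- The date is kept as raw bytes to keep the sketch short.
data Stanza = Stanza
  { stanzaId   :: Int
  , stanzaRows :: [(Int, ByteString)]
  } deriving (Eq, Show)

-- | A single "id,date" row, terminated by a newline or end of input.
row :: A.Parser (Int, ByteString)
row = do
  uid  <- A.decimal
  _    <- A.char ','
  date <- A.takeWhile1 (\c -> A.isDigit c || c == '-')
  A.endOfLine <|> A.endOfInput
  pure (uid, date)

-- | A header line such as "10:" followed by zero or more rows.
-- attoparsec backtracks on failure, so 'many'' stops cleanly
-- when it meets the next header instead of a row.
stanza :: A.Parser Stanza
stanza = do
  sid <- A.decimal
  _   <- A.char ':'
  A.endOfLine <|> A.endOfInput
  Stanza sid <$> A.many' row

-- | Stand-in for the library's ReadAs type (assumption): only the
-- proposed constructor is shown, carrying a parser for one example.
data ReadAs a = Parsable (A.Parser a)

-- | Hypothetical driver: apply the parser repeatedly until end of input.
readDataset :: ReadAs a -> ByteString -> Either String [a]
readDataset (Parsable p) = A.parseOnly (A.many' p <* A.endOfInput)
```

With these definitions, `readDataset (Parsable stanza)` applied to the contents of "qualifying.txt" would yield one `Stanza` per header, so the Netflix data could go through the same entry point as the other formats.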

ocramz added the enhancement, help wanted, good first issue, and R&D: library labels on Dec 30, 2018
