You need to request an API key (id and secret) from win (at) westtoer.be to consume this API directly (which is, by the way, not recommended).
The data retrieved by this system is made available (together with data from other sources) through the Westtoer datahub at http://datahub.westtoer.be/ext/feeds/WIN/2.0/
Contact datahub (at) westtoer.be for more details and access to that.
This requires nodejs and npm to be installed.
Please, be a sport and run the typical
npm install
before going any further, to avoid unnecessary frustration and disappointment.
The default (no dump-specifications) dump:
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>>
is equivalent to:
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>> vocs products
meaning that it will combine those two dumps (vocs and products) in one run.
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>> products
This will create the datahub LEVEL0 dump from the WIN 2.0 tdms. The created dump files are named and organized as follows:
[ROOT]/
<<type>>-<<pubstate>>.<<format>>
<<type>>-<<pubstate>>-<<period-from-to>>.<<format>>
bychannel/
<<channel>>/
<<channel>>.<<format>>
<<channel>>-all-<<period-to-from>>.<<format>>
bytourtype/
allchannels-<<tourtype>>.<<format>>
allchannels-<<period-from>>.<<format>>
bycity/
<<municipality>>/
<<municipality>>-<<type>>-all.<<format>>
<<municipality>>-<<type>>-<<pubstate>>.<<format>>
In this structure the following value-replacements can occur:
key | possible values | meaning |
---|---|---|
<<type>> | ... | type of touristic item in the dump |
| | accomodation | places to stay |
| | permanent_offering | POI and attractions |
| | reca | places to eat and drink |
| | temporary_offering | events |
| | mice | MICE info |
<<pubstate>> | all, pub | publication states of the items in the dump |
| | all | all items, ignoring their published state |
| | pub | only items in the published state |
<<period-from>> | week, day + YYYYMMDD startdate | indicates a subset of last-modified items in the indicated (by name and boundaries) period of time; the from-date is inclusive, the end date is open up to the moment of dump generation -- this assumes the dumps are run around, but distinctly before, the end of the day |
| | week-YYYYMMDD-current | 7-day period: items updated during the week leading up to today |
| | day-YYYYMMDD-current | 1-day period: items updated since yesterday up to now |
<<format>> | xml, json | file format of the dump |
| | xml | eXtensible Markup Language |
| | json | JavaScript Object Notation |
<<channel>> | ... | publication channel on which the items contained in the dump should be published |
<<tourtype>> | ... | one of 80+ distinct types from the touristic classification (see the taxonomy produced by the vocs dump) |
<<municipality>> | ... | the municipality on which the dataset is filtered |
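Purely as an illustration of this naming scheme, a dump root could contain paths such as these (the channel and municipality names are made-up placeholders):

```
[ROOT]/accomodation-pub.xml
[ROOT]/bychannel/some-channel/some-channel.json
[ROOT]/bycity/some-municipality/some-municipality-reca-all.json
```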
The content of these files is described here.
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>> vocs
This dump produces:
[ROOT]/
reference/
taxonomies.<<format>>
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>> samples
This dump produces:
[ROOT]/
samples/
sample-<<id>>.<<format>>
These export files hold single-item information, identified by the item's id. The content of these files follows the previously mentioned 'products' layout.
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>> claims
This will produce the files claims.xml and claims.json containing the information about the claims in the WIN.
nodejs dhubdump.js -o ~/feeds/WIN/2.0/ -i <<yourid>> -s <<yoursecret>> stats
This will produce the files stats.xml and stats.json containing the statistical data in the WIN.
Adding the switch
-j <<max-number-records>>
will force all JSON array files to be split into separate files (parts), each containing at most the specified number of elements. If the original JSON already contains fewer elements, no splitting is done.
This feature exists to better support consumers of the dump that don't use a JSON streaming (or pull) parser.
With this switch, the following extra processing happens for every produced JSON file (a small consumer-side sketch follows the listing below).
(Only) if **/path/some-name.json has more than N records, the dump will produce the folder
**/path/some-name/ with the following contents:
**/path/some-name/some-name-index.json and
**/path/some-name/some-name-part-00000.json
…
**/path/some-name/some-name-part-00023.json
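Below is a minimal consumer-side sketch (not part of dhubdump itself) of how such a split dump could be reassembled without a streaming parser; it relies only on the part-file naming shown above and assumes each part file contains a plain JSON array (the example folder is hypothetical):

```js
// Minimal consumer-side sketch: reassemble a split JSON dump by concatenating its part files.
// Assumption: every *-part-NNNNN.json file contains a plain JSON array of records.
var fs = require('fs');
var path = require('path');

function readSplitDump(partsDir, baseName) {
  var partFiles = fs.readdirSync(partsDir)
    .filter(function (f) {
      return f.indexOf(baseName + '-part-') === 0 && /\.json$/.test(f);
    })
    .sort(); // part numbers are zero-padded, so a plain sort keeps them in order

  var records = [];
  partFiles.forEach(function (file) {
    var part = JSON.parse(fs.readFileSync(path.join(partsDir, file), 'utf8'));
    records = records.concat(part);
  });
  return records;
}

// Hypothetical example, following the naming scheme described above.
var all = readSplitDump('feeds/WIN/2.0/accomodation-pub', 'accomodation-pub');
console.log('total records: ' + all.length);
```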
Considering the wisdom of Jon Postel's Robustness Principle, we strongly advise all client services consuming these files to be able to cope with the following known limitations:
The API service occasionally runs into internal performance issues; this results in files not being available in the dump.
In the current state this leads to errors emitted during the run that look like this:
error reading uri [http://api.westtoer.tdms.acc.saga.be/api/v1/temporary_offering/?format=xml&access_token=***&size=9750] to stream - response.status == 500
The status could equally be
- any other 50x, a 502 typically indicating that the problem lies rather at some intermediate proxy,
- or even more bare-bones networking issues (connectivity, DNS resolution issues, ...).
If you didn't run the process yourself but were just given access to the produced files, you should be able to grab the dhubdump-report.csv. That file lists the files the dump tried to produce, the query settings, the URI called, and the status of each attempt.
Since producing this report is the last action of the dump process, its timestamp will tell you
- when the dump process last completed,
- and, on the odd occasion that it is older than that of the produced files, that a dump is currently in progress or was killed prematurely (without being able to write the report, and thus update its lastmod-ts); a small freshness check along these lines is sketched below.
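As a minimal sketch (assuming the report sits in the root of a locally available copy of the dump), such a check could compare modification times like this:

```js
// Sketch: warn when any produced file is newer than dhubdump-report.csv
// (the report is written last, so on a completed run it should be the newest file).
// Assumption: dhubdump-report.csv sits in the root of a local copy of the dump.
var fs = require('fs');
var path = require('path');

function newestMtime(dir) {
  var newest = 0;
  fs.readdirSync(dir).forEach(function (name) {
    var full = path.join(dir, name);
    var stat = fs.statSync(full);
    if (stat.isDirectory()) {
      newest = Math.max(newest, newestMtime(full));
    } else if (name !== 'dhubdump-report.csv') {
      newest = Math.max(newest, stat.mtime.getTime());
    }
  });
  return newest;
}

var root = process.argv[2] || '.'; // path to the dump root, e.g. ~/feeds/WIN/2.0/
var reportTs = fs.statSync(path.join(root, 'dhubdump-report.csv')).mtime.getTime();
if (newestMtime(root) > reportTs) {
  console.warn('dump may still be in progress, or was killed before the report was written');
} else {
  console.log('report is the newest file - the last dump run completed');
}
```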
The dump service currently does not check the "well-formedness" of the produced XML and JSON files.
In other words, it can easily be fooled (by a misbehaving API service in the back) into producing *.xml or *.json files that simply can't be parsed.
We advise using some linting technique or other safety check to verify this. On Linux (or Windows + Cygwin) you can use
for XML
find path/to/root -type f -name \*xml | while read f; do out=$(xmllint --noout "$f" 2>&1); err=$?; oln=$(echo -n "${out}" | wc -c); if [ "${err}${oln}" -ne "00" ]; then echo "ERROR XML FILE @ $f"; fi; done;
for JSON
find path/to/root -type f -name \*json | while read f; do cat "$f" | python -mjson.tool > /dev/null 2>&1; if [ "$?" -ne "0" ]; then >&2 echo "ERROR JSON FILE @ $f"; fi; done;
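If you prefer to stay within Node.js (and avoid the Cygwin dependency on Windows), a minimal sketch covering only the JSON half of that check could look like the following; validating the XML files would still require xmllint or a dedicated XML parser:

```js
// Sketch: recursively try to JSON.parse every *.json file under a dump root
// and report the ones that fail to parse.
var fs = require('fs');
var path = require('path');

function checkJsonFiles(dir) {
  fs.readdirSync(dir).forEach(function (name) {
    var full = path.join(dir, name);
    if (fs.statSync(full).isDirectory()) {
      checkJsonFiles(full);
    } else if (/\.json$/.test(name)) {
      try {
        JSON.parse(fs.readFileSync(full, 'utf8'));
      } catch (e) {
        console.error('ERROR JSON FILE @ ' + full);
      }
    }
  });
}

checkJsonFiles(process.argv[2] || 'path/to/root');
```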
For convenience, a version of the above linting is run centrally on the datahub instance. You should be able to grab the dhubdump-lintreport.txt. That file lists possibly faulty XML or JSON files in the dump.
The content captured in these dump files is ultimately maintained by humans. They, at times, fail too.
To address possible errors in the content, it is best to trace well (and communicate) what information (position by line number or entity id) you read from which file, obtained at what time.
If at all possible, make sure that kind of metadata ripples through your own application, so you can trace things back to the source.
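As a small, purely illustrative sketch of that advice (the wrapper and all field names below are made up, not part of the dump format), such provenance metadata could be as simple as:

```js
// Sketch: wrap each imported record with provenance metadata so content issues
// can later be traced back to the exact dump file and the moment it was obtained.
function withProvenance(record, sourceFile, fetchedAt) {
  return {
    data: record,
    provenance: {
      sourceFile: sourceFile,   // e.g. 'bycity/some-municipality/some-municipality-reca-all.json'
      fetchedAt: fetchedAt,     // ISO timestamp of when the file was downloaded
      entityId: record.id       // or the line number the record was read from
    }
  };
}
```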
# Dev
Ready to help out and delve in? You might want to install mocha, and make sure all your contributions are accompanied by tests covering the addressed issues or new features.
npm install -g mocha
npm test
Individual tests are run with
mocha test/test-specific-name.js
Running the tests requires an id/secret pair (see above on how to obtain one) - set these in test/client-settings.json (a layout example is in test/client-settings.json.example).