Parallel processing #101

Open
mgvarley opened this issue Dec 9, 2019 · 1 comment


mgvarley commented Dec 9, 2019

Thanks for this excellent library. I have successfully used it to transform 40m records from Postgres to DynamoDB (I see there is an open issue for this, so I will work on a PR). I am now working on a new ETL pipeline that takes a CSV, enriches each record with a call to a web service, and then generates a new CSV with a line for each record. The file is huge (22m lines), and I need to make multiple service calls in parallel (~40) for this to be efficient. I think I need to use the cluster/worker model, but I can't work out how to do that with this library. Would it be possible to add an example of how this should work? Many thanks.

mgvarley (Author) commented

I finally got this working without the cluster module. It could be optimised further, but it works well for our needs. Here is the code in case it helps anyone (I am using fast-csv for the CSV formatting):

const etl = require('etl')
const csv = require('fast-csv')
const _ = require('lodash')

// FILE_IN, FILE_OUT, PARALLEL, LOG_EVERY and myapi are configured elsewhere
let counter = 0

etl.file(FILE_IN)
  .pipe(etl.csv())                      // parse the incoming CSV into row objects
  .pipe(etl.collect(PARALLEL))          // group rows into batches of PARALLEL
  .pipe(etl.map(async function(docs) {
    // enrich each batch with concurrent API calls
    await Promise.all(docs.map(async (doc) => {
      const { id, params } = doc
      const res = await myapi.call(params)
      counter++
      if (counter % LOG_EVERY === 0) console.log(`${counter} rows processed`)
      this.push(_.extend({ id }, res[0]))
    }))
  }))
  .pipe(csv.format({ headers: true }))  // format enriched rows back to CSV
  .pipe(etl.toFile(FILE_OUT))
  .promise()
  .then(() => {
    console.log('done')
  })
  .catch(e => {
    console.error(e)
  })
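
A note on how this achieves parallelism: etl.collect(PARALLEL) groups incoming rows into arrays of PARALLEL items, and Promise.all fires one API call per row in the batch concurrently, so roughly PARALLEL requests are in flight per batch. One possible refinement, sketched below as a drop-in replacement for the etl.map stage above (it reuses myapi and the other names from the snippet, which are assumed to be defined elsewhere): wrapping each call in a try/catch so a single failed lookup logs and skips that row instead of rejecting the whole batch and tearing down the stream.

  .pipe(etl.map(async function(docs) {
    await Promise.all(docs.map(async (doc) => {
      const { id, params } = doc
      try {
        const res = await myapi.call(params)
        this.push(_.extend({ id }, res[0]))
      } catch (err) {
        // log and skip this row rather than failing the whole batch
        console.error(`row ${id} failed: ${err.message}`)
      }
    }))
  }))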
