aut 0.50.0

@ruebot released this 06 Feb 01:20
4d1dcc9

Documentation

Release Notes

Full Changelog

Implemented enhancements:

  • Enhance keepValidPages #359
  • Add discardLanguage filter #352
  • Add crawl_date to binary DataFrames and imageLinks #413

Fixed bugs:

  • textFiles does not filter properly #390
  • DataFrame error with text files: java.net.MalformedURLException: unknown protocol: filedesc #362

Closed issues:

  • .webpages() additional tokenized columns? #402
  • Test and documentation inventory #372
  • Missing doc comments #392
  • Bug in ArcTest? Why run RemoveHTML? #369
  • UDF CaMeL cASe consistency issues #368
  • ExtractDomain or ExtractBaseDomain? #367
  • Align DataFrame boilerplate in Python and Scala #366
  • Create a ComputeSHA1 method #363
  • Discussion: Should we align our Named Entity Recognition output with WANE format? #297
  • DataFrame discussion: open thread #190

Merged pull requests: