aut 0.17.0

@ruebot released this 04 Oct 21:42
694382c

Change Log

aut-0.17.0 (2018-10-04)

Full Changelog

Implemented enhancements:

  • Add EscapeHTML Function for ExtractLinks #266
  • PySpark support #12

Fixed bugs:

  • AUT exits/dies on java.util.zip.ZipException: too many length or distance symbols #271
  • AUT exits/dies on java.util.zip.ZipException: invalid distance too far back #246
  • Improve ExtractDomain Normalization #239
  • Twitter analysis is broken; see also: json4s/json4s#496 #197
  • Prevent encoding errors in PySpark #122

Closed issues:

  • Cannot skip bad record while reading warc file #267
  • Why did Scalastyle not reject null values in TweetUtilTest #255
  • Create UDF to combine basic text filtering features #253
  • spark-shell --packages "io.archivesunleashed:aut:0.16.0" fails with not_found dependencies #242
  • CommandLineAppRunner.scala produces output per WARC instead of combined result. #235
  • Extract images out of images DataFrame and store to disk #232
  • Before the next release, make sure docker-aut builds on master... or make sure --packages works #227
  • DataFrames for image analysis #220
  • The attempt to upgrade Spark version to 2.3.0 is not successful #218
  • Convert nulls to Option(T) #212
  • Bringing Scala DataFrames into PySpark #209
  • What is AUT? #208
  • Refactor ExtractGraph and assess value of GraphX for producing network graphs #203
  • Codify creation of standard derivatives into apps #195
  • TweetUtils - support fulltext #192
  • Combine UDFs into appropriate objects #187
  • Register Scala functions for use in Pyspark #148
  • PySpark performance bottlenecks: counting values #130
  • Redesign of PySpark DataFrame interface for filtering #120
  • Improve RecordLoader.scala test coverage #60

Merged pull requests: