Releases: archivesunleashed/aut

BROKEN -- aut 0.13.0

07 Mar 18:42
cfb618e

BROKEN

See: #181

aut-0.13.0 (2018-03-07)

Full Changelog

Merged pull requests:

aut-0.12.2

28 Feb 21:50
89cf84b

Change Log

aut-0.12.2 (2018-02-28)

Full Changelog

Implemented enhancements:

  • ArchiveRecord.warcFile #171
  • Better approach to ids in WriteGraphML & WriteGEXF #168
  • Build pre-filtered networks #109
  • KeepDate UDF should support date range #108
  • Changing keepDate to allow multiple dates, would close #108 #161 (ianmilligan1)

Fixed bugs:

  • Broken GEXF Files Due to < and > characters in node id fields #172
  • There is insufficient memory for the Java Runtime Environment to continue #159
  • AUT Fails on Extracting Text from WARCs #158

Closed issues:

  • RecordLoader.loadArchives fails with nested dirs #169
  • Unparseable date error #163
  • remove angle brackets from ArchiveRecord.getUrl #157
  • Benchmarking Scala vs Python #121
  • Improve WacArcInputFormat.java test coverage #80
  • Improve WacWarcInputFormat.java test coverage #78
  • Improve WarcRecordWritable.java test coverage #77
  • Improve ArcRecordWritable.java test coverage #75
  • Improve ArcRecord.scala test coverage #69
  • Improve RemoveHttpHeader.scala test coverage #57
  • Investigate Jupyter notebooks on Altiscale #37

Merged pull requests:

aut 0.12.1

15 Dec 19:58
ab679a3

aut-0.12.1 (2017-12-15)

Full Changelog

Fixed bugs:

aut 0.12.0

12 Dec 04:46
5b0bccc

aut-0.12.0 (2017-12-11)

Full Changelog

Implemented enhancements:

Fixed bugs:

Closed issues:

  • Create tests for WriteGEXF.scala #138
  • ERROR ArcRecordUtils - Read 1224 bytes but expected 1300 bytes #128
  • WarcRecordUtils.java uses or overrides a deprecated API #127
  • class LanguageIdentifier in package language is deprecated #126
  • multiple versions of scala #125
  • ExtractLinks running slowly #123
  • com.cloudera.cdh:hadoop-ant:pom:0.20.2-cdh3u4 -- errors #118
  • Improve ExtractDate.scala test coverage #64

Merged pull requests:

aut 0.11.0

23 Nov 00:18
f1d8578

Change Log

aut-0.11.0 (2017-11-22)

Full Changelog

Implemented enhancements:

  • GetCrawlYear to accompany GetCrawlMonth #104
  • Refactor RecordLoader classes #102
  • Adding getCrawlYear in ArchiveRecords, resolves #104 #105 (ianmilligan1)

Closed issues:

  • `spark-shell --packages "io.archivesunleashed:aut:0.10.0"` fails with not_found dependencies #113
  • update the version of the dependencies not available on the central maven repository #111
  • Bake keepValidPages() into RecordLoader #101
  • Create tests for JsonUtil.scala #66
  • Improve ExtractDomain.scala test coverage #63
  • Improve ExtractImageLinks.scala test coverage #62
  • Improve ExtractLinks.scala test coverage #61
  • Improve StringUtils.scala test coverage #58
  • Improve RemoveHTML.scala test coverage #56
  • Create tests for TweetUtils.scala #54
  • Create tests for ExtractTextFromPDFs.scala #51
  • Create tests for ExtractPopularImages.scala #50
  • Create tests for ExtractBoilerpipeText.scala #47
  • Create tests for ComputeMD5.scala #46
  • Create tests for ComputeImageSize.scala #45

Merged pull requests:

aut 0.10.0

02 Oct 19:48
e64b489

aut-0.10.0 (2017-10-02)

Full Changelog

Fixed bugs:

  • NER breaks for WARC files? #41

Closed issues:

  • Do we need pythonconverters/ArcRecordConverter.scala? If so, tests. If not, delete it. #65
  • Upgrade to Spark 2 on Altiscale #43
  • Investigate our test coverage according to codecov.io #36
  • Update Scala version #35
  • Update to use Java 8 #32
  • Migrate warcbase-resources to aut-resources #30
  • mvn site-deploy -DskipTests is still failing #27
  • Retarget Hadoop #9

Merged pull requests:

aut 0.9.0

24 Aug 20:20
70d4dd8

aut-0.9.0 (2017-08-24)

Closed issues:

  • More work needs to be done on the pom.xml to get us to a release. #25
  • Is src/main/java/io/archivesunleashed/demo required? #17
  • Visualization Repo (aut-viz) #16
  • Remove src/main/python #10
  • What do we do with all the documentation at docs.warcbase.org? #8
  • Setup to publish javadocs on ghpages #7
  • Get a project setup on sonatype #6
  • Setup license headers and mycila #4
  • Setup checkstyle #3
  • Setup codecov.io #1

Merged pull requests:

* This Change Log was automatically generated by github_changelog_generator