Releases: archivesunleashed/aut
BROKEN -- aut 0.13.0
aut-0.12.2
Change Log
aut-0.12.2 (2018-02-28)
Implemented enhancements:
- ArchiveRecord.warcFile #171
- Better approach to ids in WriteGraphML & WriteGEXF #168
- Build pre-filtered networks #109
- KeepDate UDF should support date range #108
- Changing keepDate to allow multiple dates, would close #108 #161 (ianmilligan1)
Fixed bugs:
- Broken GEXF Files Due to < and > characters in node id fields #172
- There is insufficient memory for the Java Runtime Environment to continue #159
- AUT Fails on Extracting Text from WARCs #158
Closed issues:
- RecordLoader.loadArchives fails with nested dirs #169
- Unparseable date error #163
- remove angle brackets from ArchiveRecord.getUrl #157
- Benchmarking Scala vs Python #121
- Improve WacArcInputFormat.java test coverage #80
- Improve WacWarcInputFormat.java test coverage #78
- Improve WarcRecordWritable.java test coverage #77
- Improve ArcRecordWritable.java test coverage #75
- Improve ArcRecord.scala test coverage #69
- Improve RemoveHttpHeader.scala test coverage #57
- Investigate Jupyter notebooks on Altiscale #37
Merged pull requests:
- Gexf Fixes & StringUtil Functions #172 #173 (greebie)
- Graphml Improvements #170 (greebie)
- Graphml #167 (greebie)
- Fix bug -- label type should be "string" not "label". #166 (greebie)
- Add link to docker-aut. #160 (ruebot)
- Remove references to Arc and WarcRecord libraries (covered by Archive… #146 (greebie)
aut 0.12.1
aut-0.12.1 (2017-12-15)
Fixed bugs:
- ARC Handling Bug in 0.12.0 when Extracting Links #154
- Changes jsoup version in pom.xml (#154) #155 (ianmilligan1)
aut 0.12.0
aut-0.12.0 (2017-12-11)
Implemented enhancements:
Fixed bugs:
- NullPointerException error during build #124
- Resolves Issue #128: Uses new getOrigins method #136 (ianmilligan1)
Closed issues:
- Create tests for WriteGEXF.scala #138
- ERROR ArcRecordUtils - Read 1224 bytes but expected 1300 bytes #128
- WarcRecordUtils.java uses or overrides a deprecated API #127
- class LanguageIdentifier in package language is deprecated #126
- multiple versions of scala #125
- ExtractLinks running slowly #123
- com.cloudera.cdh:hadoop-ant:pom:0.20.2-cdh3u4 -- errors #118
- Improve ExtractDate.scala test coverage #64
Merged pull requests:
- Too many JUNITs #152 (ruebot)
- Add more packages and exclusions for #113 #150 (ruebot)
- Add tests for RecordLoader #149 (greebie)
- Tuple Formatter Test Improvement #145 (greebie)
- Check to replace partial coverage for ExtractDate. #144 (greebie)
- Add GraphML UDF #143 (greebie)
- Remove stackTrace output on caught error. #141 (greebie)
- Add deprecation warnings to outmoded Arc and Warc formats. #140 (greebie)
- Tests for WriteGEXF Issue #138 #139 (greebie)
- Include script to write to GEXF. (#103) #137 (greebie)
- Resolved archivesunleashed/aut issue #128. #135 (jrwiebe)
- Use correct import for WARCConstants; Resolves #127. #133 (ruebot)
- Downgrade Tika to 1.12. Resolves #126. #132 (ruebot)
- Pin everything to Scala 2.11.8; Resolves #125. #129 (ruebot)
- Exclude old version of Hadoop. Resolves #118. #119 (ruebot)
aut 0.11.0
aut-0.11.0 (2017-11-22)
Implemented enhancements:
- GetCrawlYear to accompany GetCrawlMonth #104
- Refactor RecordLoader classes #102
- Adding getCrawlYear in ArchiveRecords, resolves #104 #105 (ianmilligan1)
Closed issues:
- `spark-shell --packages "io.archivesunleashed:aut:0.10.0"` fails with not_found dependencies #113
- update the version of the dependencies not available on the central maven repository #111
- Bake keepValidPages() into RecordLoader #101
- Create tests for JsonUtil.scala #66
- Improve ExtractDomain.scala test coverage #63
- Improve ExtractImageLinks.scala test coverage #62
- Improve ExtractLinks.scala test coverage #61
- Improve StringUtils.scala test coverage #58
- Improve RemoveHTML.scala test coverage #56
- Create tests for TweetUtils.scala #54
- Create tests for ExtractTextFromPDFs.scala #51
- Create tests for ExtractPopularImages.scala #50
- Create tests for ExtractBoilerpipeText.scala #47
- Create tests for ComputeMD5.scala #46
- Create tests for ComputeImageSize.scala #45
Merged pull requests:
- This needs to hold steady. #117 (ruebot)
- Update all dependencies, and add missing dependencies to resolve #113. #116 (ruebot)
- Updated documentation links; link to project page #115 (ianmilligan1)
- Remove pom.xml cruft; Partially resolves #111. #112 (ruebot)
- Created Code of Conduct file #110 (SamFritz)
- Refactor ArchiveRecord classes; addresses #101 and #102 #107 (MapleOx)
- Improve coverage for issue-67 (RecordRDD.scala) #99 (greebie)
- Minor fix to improve coverage. #55 #98 (greebie)
- Test ExtractTextFromPDFs. #51 #97 (greebie)
- Remove example scripts. Resolves #95, #70, #71, #72. #96 (ruebot)
- Setup cobertura better so we have local html reports. #94 (ruebot)
- Create unit tests for Issue #50 (ExtractPopularImages) #93 (greebie)
- Add ExtractGraphTest; lint fixes on RemoveHttpHeaderTest. #92 (greebie)
- Improve coverage for Issue #80 #91 (greebie)
- Improve coverage for TweetUtils #90 (greebie)
- Increase coverage for ComputeImageSize. #45 #89 (greebie)
- Complete coverage for #66 #88 (greebie)
- Improve Test Coverage for #55, #56, #57, #58, #59, #60, #61, #62, #63, #64 & #66 #87 (greebie)
- Add PR template. #85 (ruebot)
- First round of unit tests #84 (greebie)
- Use Scala 2.11.8; Align further with Altiscale. #83 (ruebot)
aut 0.10.0
aut-0.10.0 (2017-10-02)
Fixed bugs:
- NER breaks for WARC files? #41
Closed issues:
- Do we need pythonconverters/ArcRecordConverter.scala? If so, tests. If not, delete it. #65
- Upgrade to Spark 2 on Altiscale #43
- Investigate our test coverage according to codecov.io #36
- Update Scala version #35
- Update to use Java 8 #32
- Migrate warcbase-resources to aut-resources #30
- mvn site-deploy -DskipTests is still failing #27
- Retarget Hadoop #9
Merged pull requests:
- Update to Apache Spark 2.1.1; resolves #43. #82 (ruebot)
- Remove unused file; resolves #65. #81 (ruebot)
- Removed inaccurate information from README.md #44 (lintool)
- Add WARC support for ExtractEntities; Resolve #41. #42 (ruebot)
- Add OpenJDK8 and remove OracleJDK7 so we can use trusty. #39 (ruebot)
- Link to aut-docs in README #38 (ianmilligan1)
- Resolve #32; Update to Java 8 #34 (ruebot)
- Resolve #9; Update Hadoop and Spark versions. #33 (ruebot)
- Added reference to the releases #31 (ianmilligan1)
- Resolve #27 - Deploy javadocs to gh-pages #29 (ruebot)
- Add Maven Central badge. #28 (ruebot)
aut 0.9.0
aut-0.9.0 (2017-08-24)
Closed issues:
- More work needs to be done on the pom.xml to get us to a release. #25
- Is src/main/java/io/archivesunleashed/demo required? #17
- Visualization Repo (aut-viz) #16
- Remove src/main/python #10
- What do we do with all the documentation at docs.warcbase.org? #8
- Setup to publish javadocs on ghpages #7
- Get a project setup on sonatype #6
- Setup license headers and mycila #4
- Setup checkstyle #3
- Setup codecov.io #1
Merged pull requests:
- Resolve #25 update pom.xml to do a release #26 (ruebot)
- Resolve #7 #24 (ruebot)
- Add Slack integration for TravisCI #21 (ruebot)
- Setup mycila plugin, and normalize all license headers; Resolves #4. #20 (ruebot)
- Add checkstyle plugin, and remove demo; resolves #3 #17. #19 (ruebot)
- Updating README #15 (ianmilligan1)
- Remove dir; resolves #10 #11 (ruebot)
- Setup codecov.io integration; resolves #1 #2 (ruebot)
* This Change Log was automatically generated by github_changelog_generator