Merge branch 'master' of github.com:netarchivesuite/solrwayback
jesperlauridsen committed Mar 27, 2024
2 parents 3ae47b6 + 78b3ba2 commit df7382b
Showing 54 changed files with 8,797 additions and 121 deletions.
6 changes: 5 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
# SolrWayback changelog




5.1.0
-----
Substantial speed up when exporting (CSV, WARC etc.) from large multi-sharded collections. See https://github.com/netarchivesuite/solrwayback/issues/329 (Thanks Toke Eskildsen)
Substantial speed up when exporting (CSV, WARC etc.) from large multi-sharded collections. See https://github.com/netarchivesuite/solrwayback/issues/329 (Thanks Toke Eskildsen) This feature still needs a little more testing. Feedback is welcome.

Minor tweaking of log info/debug. Less log lines in default solrwayback.log when running with log level INFO.
Fix regression bug where "page resources" was not showing missing resources for the webpage.

Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.

5.0.0
-----
31 changes: 22 additions & 9 deletions README.md
@@ -1,7 +1,7 @@
# SolrWayback

## SolrWayback 5.0.0 software bundle has been released
SolrWayback bundle release 5.0.0 can be downloaded here: https://github.com/netarchivesuite/solrwayback/releases/tag/5.0.0
## SolrWayback 5.1.0 software bundle has been released
SolrWayback bundle release 5.1.0 can be downloaded here: https://github.com/netarchivesuite/solrwayback/releases/tag/5.1.0

The bundle is the recommended way to get started with SolrWayback. You download the bundle, follow the installation guide and index your own WARC files. Then you are up to speed.

@@ -135,13 +135,10 @@ Documents in SolrWayback are indexed through the [warc-indexer](https://github.c
* A Solr 9+ server with the index built from the ARC/WARC files using the Warc-Indexer version 3.2.0-SNAPSHOT+
* (Optional) chrome/(chromium) installed for page previews to work. (headless chrome)

## Build and usage
## Build and usage for developers.
* Build the application with: `mvn package`
* Deploy the `target/solrwayback-*.war` file in a web-container
* Copy `src/test/resources/properties/solrwayback.properties` and `/src/test/resources/properties/solrwaybackweb.properties`
to either the root of the tomcat folder or the `user/home/` folder for the J2EE server.
Alternatively use the [src/main/webapp/META-INF/context.xml](src/main/webapp/META-INF/context.xml) as template
for a context for the SolrWayback WAR and set the paths for the properties directly.
* Copy `properties/solrwayback.properties` and `properties/solrwaybackweb.properties` to the `user/home/` folder.
* Modify the property files. (By default, all URLs point to http://localhost:8080)
* Open search interface: http://localhost:8080/solrwayback

@@ -171,12 +168,16 @@ Unzip and follow the instructions below.

### 1) INITIAL SETUP

* Copy `properties/solrwayback.properties` and `properties/solrwaybackweb.properties` to the `user/home/` folder.
If you want to use a custom location for the properties you can edit and enable the tomcat context environment variables in `/tomcat-9/conf/Catalina/localhost/solrwayback.xml`

* **Windows only:** Create an environment variable that points to the folder with Java 11 or Java 17: `JAVA_HOME=C:\Program Files\Java\jdk-11`

* **Optional:** For screenshot previews to work you may have to edit the file `properties/solrwayback.properties` and change the value of the last two properties : `chrome.command` and `screenshot.temp.imagedir`.
Chrome(Chromium) must be installed for preview of images to work.

If you encounter any errors when running a script during installation or setup, try changing the permissions for the file (`startup.sh` etc.). On Linux and macOS, this can be done with the following command: `chmod +x filename.sh`

**Note:** Previous versions of the SolrWayback bundle expected the property files to be located at the root of the home folder of the user. If this is preferable, move the two property files `solrwayback.properties` and `solrwaybackweb.properties` from the `properties/` folder in the bundle to the root of the home folder of the user.
If you encounter any errors when running a script during installation or setup, try changing the permissions for the file (`startup.sh` etc.). On Linux and macOS, this can be done with the following command: `chmod +x filename.sh`
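
The copy and permission steps of the initial setup can be sketched as a small shell function. This is a minimal sketch and not part of the bundle: the function name `setup_solrwayback` and its two arguments (bundle folder and target folder) are assumptions made for illustration. It copies the two property files from `properties/` to the target folder (the home folder by default) and marks every `.sh` script found under the bundle folder as executable.

```shell
# Sketch only: setup_solrwayback and its arguments are hypothetical names.
# $1 = unpacked bundle folder, $2 = target folder (defaults to $HOME).
setup_solrwayback() {
  local bundle_dir="$1"
  local target_dir="${2:-$HOME}"
  local f

  # Copy the two property files named in the setup instructions.
  for f in solrwayback.properties solrwaybackweb.properties; do
    if [ ! -f "$bundle_dir/properties/$f" ]; then
      echo "Missing $bundle_dir/properties/$f" >&2
      return 1
    fi
    cp "$bundle_dir/properties/$f" "$target_dir/" || return 1
  done

  # Same effect as running `chmod +x filename.sh` on each script by hand.
  find "$bundle_dir" -type f -name '*.sh' -exec chmod +x {} +
}
```

A call such as `setup_solrwayback /path/to/bundle` would then replace the manual copy and `chmod` steps; the path is a placeholder for wherever the bundle was unzipped.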

### 2) STARTING SOLRWAYBACK
SolrWayback requires both Solr and Tomcat to be running. These processes are started and stopped separately with the following commands:
@@ -317,6 +318,18 @@ If you want to remove an old index and create a new index from scratch, this ca
3. Start solr
4. Start the indexing script

### Update Solr cloud configuration
This section is only for experienced Solr users who want to tweak the Solr configuration.
If you want to make changes to schema.xml or solrconfig.xml, you must use the cloud update script on a running Solr Cloud.
Changes to schema.xml must be made before indexing starts. Changes to solrconfig.xml can be made at runtime.
To update the configuration, use the following two commands (replace the paths with those on your system):

`bin/solr zk upconfig -n netarchivebuilder_conf -d "/home/xxx/solrwayback/solrwayback_package_5.1.0/solr_config/conf" -z localhost:9983`

`curl -X POST "http://localhost:8983/api/collections/netarchivebuilder/" -H 'Content-Type: application/json' -d '{"modify":{"config": "netarchivebuilder_conf" } }'`
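
The two commands above can be wrapped in a small script so that the host names and the configuration path are set in one place. The following is a minimal sketch, not something shipped with the bundle: the function names `run` and `update_solr_config`, the `DRY_RUN` switch and the environment variable names are assumptions, while the default values mirror the example commands above.

```shell
# Sketch only: variable and function names are hypothetical.
# Defaults are taken from the example commands in this section.
SOLR_BIN="${SOLR_BIN:-bin/solr}"
ZK_HOST="${ZK_HOST:-localhost:9983}"
SOLR_URL="${SOLR_URL:-http://localhost:8983}"
CONFIG_NAME="${CONFIG_NAME:-netarchivebuilder_conf}"
COLLECTION="${COLLECTION:-netarchivebuilder}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: upload the configuration folder to ZooKeeper.
# Step 2: tell the collection to use the uploaded configuration.
update_solr_config() {
  local conf_dir="$1"
  run "$SOLR_BIN" zk upconfig -n "$CONFIG_NAME" -d "$conf_dir" -z "$ZK_HOST" || return 1
  run curl -X POST "$SOLR_URL/api/collections/$COLLECTION/" \
    -H 'Content-Type: application/json' \
    -d "{\"modify\":{\"config\":\"$CONFIG_NAME\"}}"
}
```

Setting `DRY_RUN=1` before calling `update_solr_config <path to solr_config/conf>` only prints the two commands, which is a cheap way to check the paths before touching a running Solr Cloud.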



### Faster indexing
A powerful laptop can handle up to 8 simultaneous indexing processes with Solr running on the same laptop.
Using an SSD for the Solr-index will speed up indexing and also improve search/playback performance drastically.
2 changes: 1 addition & 1 deletion pom.xml
@@ -4,7 +4,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>dk.kb.netarchivesuite.solrwayback</groupId>
<artifactId>solrwayback</artifactId>
<version>5.1.0</version>
<version>5.1.1</version>
<packaging>war</packaging>
<name>solrwayback</name>
<url>https://maven.apache.org</url>
22 changes: 22 additions & 0 deletions review.txt
@@ -0,0 +1,22 @@
Non-redirect responses should not set headers?
Redirect responses should add headers, not overwrite them.

Example 1:
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/http://www.twenty-fourflowers.com/
Example 2:
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/http://prak10k.dk/?page_id=13




ref:
http://timetravel.mementoweb.org/api/json/2013/http://cnn.com



Review:
Bug: fixed. A redirect must not have a payload.
Redirect support only; playback cannot also run under the /memento URL. (Complicated explanation.)
Host -> localhost
todo comment in DatetimeNegotiationTest
Good unittests + solr unittest
104 changes: 0 additions & 104 deletions src/bundle/README.md

This file was deleted.

11 changes: 7 additions & 4 deletions src/bundle/properties/solrwayback.properties
@@ -18,10 +18,10 @@ solr.server.caching.max.entries=10000
# will strain the system. In that case it is recommended to disable active checking and use fixed time
# cache clearing with solr.server.caching.age.seconds instead.
#
# Default is 10 minutes
# Default is 60 seconds. For a large multi sharded index, this limit should be increased to 600 seconds or higher.
# Disable by setting to -1
# If the checking is disabled, consider setting solr.server.caching.age.seconds instead
solr.server.check.interval.seconds=600
solr.server.check.interval.seconds=60

## Link to this webapp itself. BaseURL for link rewrites must be full url.
wayback.baseurl=http://localhost:8080/solrwayback/
@@ -90,16 +90,17 @@ url.normaliser=normal

# Optional list of Solr-params. Format is key1=value1;key2=value2,...
#solr.search.params=f.url_norm.qf=url

#------- sharddivide export ------------------
# THIS FEATURE STILL NEEDS MORE TESTING. DO NOT USE IT YET.
# Pre-SolrWayback 5.0, export always used standard Solr cursorMark for export.
# Solr cursorMark issues some redundant requests that scale with the number of shards in a Solr setup.
# sharddivide avoids redundant requests at the cost of SolrWayback memory overhead, speeding up export
# for multi-shard setups.

# Whether or not to use sharddivide. See subsequent properties when using 'auto'
# Possible values: always, never, auto (default)
solr.export.sharddivide.default=auto
solr.export.sharddivide.default=never

# When solr.export.sharddivide.default == auto, the backing Solr must have at least this number of shards
# for sharddivide to be activated.
@@ -115,6 +116,7 @@ solr.export.sharddivide.autolimit.hits.default=5000
# Default: 20
solr.export.sharddivide.concurrent.max=20
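
The `always` / `never` / `auto` switch described above can be illustrated with a short shell function. This is a sketch under stated assumptions and not SolrWayback's implementation: the function name `use_sharddivide` and its parameters are hypothetical, and it assumes that in `auto` mode both the shard limit and the hit limit must be reached before sharddivide is used.

```shell
# Sketch only, not SolrWayback code. Returns 0 when sharddivide would be used.
# $1 = mode (always | never | auto)
# $2 = number of shards in the backing Solr, $3 = number of hits for the export
# $4 = minimum shards limit, $5 = minimum hits limit (both only used for auto)
use_sharddivide() {
  local mode="$1" shards="$2" hits="$3" min_shards="$4" min_hits="$5"
  case "$mode" in
    always) return 0 ;;
    never)  return 1 ;;
    # Assumption: both limits must be met for auto to activate sharddivide.
    auto)   [ "$shards" -ge "$min_shards" ] && [ "$hits" -ge "$min_hits" ] ;;
    *)      echo "Unknown sharddivide mode: $mode" >&2; return 2 ;;
  esac
}
```

With the value `never` set above, the function returns non-zero regardless of shard count or hits, which matches the intent of keeping the feature switched off until it has been tested further.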


#------- Generate preview screenshots ------------------
#Used for preview screenshots shown on the page resources overview. Is not required.
#Chrome must be installed on the OS and headless chrome is used to generate the screenshots.
@@ -135,3 +137,4 @@ screenshot.temp.imagedir=/home/xxx/solrwayback_screenshots/

#Timeout in seconds. Optional, 20 seconds is default.
screenshot.preview.timeout=20
#-------------------------------------------------------
4 changes: 4 additions & 0 deletions src/bundle/solr_config/README.md
@@ -0,0 +1,4 @@
# Solr configuration

This folder contains a copy of the Solr configuration and can be used to upload a new Solr configuration to Solr. Only for experienced Solr users who know what they are doing.
See the 'Update Solr cloud configuration' section in the project README.md
38 changes: 38 additions & 0 deletions src/bundle/solr_config/conf/elevate.xml
@@ -0,0 +1,38 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- If this file is found in the config directory, it will only be
loaded once at startup. If it is found in Solr's data
directory, it will be re-loaded every commit.
See http://wiki.apache.org/solr/QueryElevationComponent for more info
-->
<elevate>
<query text="foo bar">
<doc id="1" />
<doc id="2" />
<doc id="3" />
</query>

<query text="ipod">
<doc id="MA147LL/A" /> <!-- put the actual ipod at the top -->
<doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
</query>

</elevate>
8 changes: 8 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_ca.txt
@@ -0,0 +1,8 @@
# Set of Catalan contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
d
l
m
n
s
t
9 changes: 9 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_fr.txt
@@ -0,0 +1,9 @@
# Set of French contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
l
m
t
qu
n
s
j
5 changes: 5 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_ga.txt
@@ -0,0 +1,5 @@
# Set of Irish contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
d
m
b
23 changes: 23 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_it.txt
@@ -0,0 +1,23 @@
# Set of Italian contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
c
l
all
dall
dell
nell
sull
coll
pell
gl
agl
dagl
degl
negl
sugl
un
m
t
s
v
d
5 changes: 5 additions & 0 deletions src/bundle/solr_config/conf/lang/hyphenations_ga.txt
@@ -0,0 +1,5 @@
# Set of Irish hyphenations for StopFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
h
n
t
6 changes: 6 additions & 0 deletions src/bundle/solr_config/conf/lang/stemdict_nl.txt
@@ -0,0 +1,6 @@
# Set of overrides for the Dutch stemmer
# TODO: load this as a resource from the analyzer and sync it in build.xml
fiets fiets
bromfiets bromfiets
ei eier
kind kinder