Implement memento api #408

Open
wants to merge 78 commits into
base: master
78 commits
cc13602
Add classes and tests for memento framework and first part of datetim…
VictorHarbo Jul 24, 2023
d4b7065
add first part of timemap
VictorHarbo Jul 24, 2023
73bc1ca
add placeholder json method
VictorHarbo Jul 24, 2023
25af886
add timemap test class
VictorHarbo Jul 25, 2023
e53917b
update
VictorHarbo Jul 25, 2023
16c1dfe
memento timemap work
VictorHarbo Jul 25, 2023
26f7656
Merge branch 'master' into implement_memento_api
VictorHarbo Jul 25, 2023
61491a7
Update memento implementation
VictorHarbo Jul 25, 2023
e0bc10b
Create timemap through streaming
VictorHarbo Jul 26, 2023
a7574ee
Add first and last memento relation to timemap
VictorHarbo Jul 27, 2023
c523596
Add method for single url closest date lookup
VictorHarbo Jul 27, 2023
58e7039
add boilerplate for memento pattern 2.2
VictorHarbo Jul 27, 2023
55ddcae
Add mementoDoc dto
VictorHarbo Jul 28, 2023
72cad0e
Create memento header for memento datetime negotiation pattern 2.2
VictorHarbo Jul 28, 2023
1499c9b
Add memento arc to timegate
VictorHarbo Jul 28, 2023
86082c8
make memento timegates work with HEAD and GET requests
VictorHarbo Aug 1, 2023
7115478
Add seperate path resolver class
VictorHarbo Aug 1, 2023
dc871bc
Update path resolving and add check to url normalizer
VictorHarbo Aug 2, 2023
c1fcb65
Implement working timemap endpoint
VictorHarbo Aug 2, 2023
c8ca536
Update javadoc
VictorHarbo Aug 2, 2023
220787e
Add documentation
VictorHarbo Aug 2, 2023
6bbd278
change filename construction
VictorHarbo Aug 3, 2023
8617ac9
Add todos and documentation
VictorHarbo Aug 3, 2023
7407dd9
Add memento2solr and solr2memento date conversion
VictorHarbo Aug 3, 2023
85bd726
Update test for pattern 2.2
VictorHarbo Aug 3, 2023
fa7ecd9
Update test for pattern 2.2
VictorHarbo Aug 3, 2023
4395ee1
Implement memento pattern 2.1 and add header test for pattern
VictorHarbo Aug 3, 2023
c2251e9
Add memento redirect property
VictorHarbo Aug 3, 2023
fd1206e
Update fields for mementoDoc object
VictorHarbo Aug 3, 2023
783a002
Make memento resolving use viewImpl for GET requests
VictorHarbo Aug 3, 2023
d0afc06
update docs
VictorHarbo Aug 3, 2023
2c1d1a3
Add first implementation of JSON timemap support
VictorHarbo Aug 4, 2023
2cac9ff
Update JSON construction
VictorHarbo Aug 4, 2023
d1ad539
refactor class
VictorHarbo Aug 4, 2023
a788550
remove unused object and debug logs
VictorHarbo Aug 7, 2023
b793f15
Convert tests to unit tests with mockito
VictorHarbo Aug 7, 2023
e53e4fd
add check for empty datetime
VictorHarbo Aug 7, 2023
931b6ce
Update mockito version
VictorHarbo Aug 7, 2023
3893104
remove unused imports
VictorHarbo Aug 7, 2023
d7bcfda
Add test for paged timemaps
VictorHarbo Aug 7, 2023
27a67b7
Implement paging for json
VictorHarbo Aug 8, 2023
28ec8d1
Remove old code
VictorHarbo Aug 8, 2023
6faff05
Implement paged json response
VictorHarbo Aug 8, 2023
374bae2
first part of paged link format
VictorHarbo Aug 8, 2023
e226db9
implement paged link-format
VictorHarbo Aug 8, 2023
dc3afc8
Update timemap uri construction
VictorHarbo Aug 8, 2023
b93f117
Update links in json and link-format
VictorHarbo Aug 8, 2023
c17fe4d
update tests to match updated code
VictorHarbo Aug 8, 2023
ba31f49
Update response
VictorHarbo Aug 8, 2023
c17638b
Update test to correct format
VictorHarbo Aug 8, 2023
bd1c20c
refactor timemap construction
VictorHarbo Aug 9, 2023
d9ec497
Add properties for memento pagesize and paginglimit
VictorHarbo Aug 9, 2023
8db930f
remove wrong newlines from link format
VictorHarbo Aug 9, 2023
5f04190
Update link format construction
VictorHarbo Aug 9, 2023
dba1b17
Update link timemap
VictorHarbo Aug 10, 2023
73417dc
Update memento counter
VictorHarbo Aug 10, 2023
ea81402
Make timemap unittests use embedded solr
VictorHarbo Aug 10, 2023
aba5566
Add tests for json format
VictorHarbo Aug 10, 2023
812062b
Add tests for json format
VictorHarbo Aug 10, 2023
d5ba63e
Add tests for json format
VictorHarbo Aug 10, 2023
9c6abfc
Add tests for different page views
VictorHarbo Aug 11, 2023
2575edc
Update paged responses with no specified pages to return first page
VictorHarbo Aug 11, 2023
3f53668
Change TimeMap inheritance
VictorHarbo Aug 11, 2023
d22d2f5
Add check for allowed playback
VictorHarbo Aug 11, 2023
b999bd7
Add notes on unit testing failures.
VictorHarbo Aug 12, 2023
616f705
Update use of memento pagesize property
VictorHarbo Aug 14, 2023
8e4e059
Change DatetimeNegotiation test to use embedded solr and remove old t…
VictorHarbo Aug 14, 2023
a17d1c8
set default properties
VictorHarbo Aug 15, 2023
3ae51f7
Merge branch 'master' into implement_memento_api
Dec 26, 2023
364981b
After merge from master, fix embedded solr test.
Dec 26, 2023
9f76a10
Unitest must also load web-properties. Unittest only works if there
Dec 28, 2023
38089c0
Memento headers must be added to existing headers.
Dec 29, 2023
45e8ac8
Memento redirect must not also have payload. Added unittest to ensure it
Dec 29, 2023
ab8f3a0
Redirect mode can now not be changed. Added Mementi properties to bundle
Dec 30, 2023
13bc3e2
Memento redirect is now default and only option. Fixed unittest
Dec 30, 2023
5a1fb92
cleaniup imports
Dec 30, 2023
76a8403
merge master into branch
VictorHarbo Aug 4, 2024
7cab822
make unittests run after merge from master
VictorHarbo Aug 4, 2024
4 changes: 2 additions & 2 deletions pom.xml
@@ -33,8 +33,8 @@
</dependency>
<dependency>
<groupId>org.mockito</groupId>
-<artifactId>mockito-core</artifactId>
-<version>3.2.4</version>
+<artifactId>mockito-inline</artifactId>
+<version>3.4.0</version>
<scope>test</scope>
</dependency>

10 changes: 10 additions & 0 deletions src/bundle/properties/solrwayback.properties
@@ -88,6 +88,16 @@ pid.collection.name=netarkivet.dk
# Use normal for all warc-indexers version 3.2.0+
url.normaliser=normal

# Memento properties
# Memento datetime negotiation property.
# If set to true, datetime negotiation returns HTTP 302 (redirect) instead of HTTP 200.
# Only redirecting playback is supported for now, so the property below cannot be changed.
#memento.redirect=true
# Defines when to split a memento timemap into paged timemaps
memento.timemap.paginglimit=5
# Defines how many individual mementos are presented in each paged timemap
memento.timemap.pagesize=2

# Optional list of Solr-params. Format is key1=value1;key2=value2,...
#solr.search.params=f.url_norm.qf=url
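
Note on the two paging properties above: the diff does not show how they are consumed, so the following is only a minimal sketch of one plausible reading. It assumes that a timemap is paged once the memento count exceeds `memento.timemap.paginglimit`, and that each page then holds `memento.timemap.pagesize` mementos. The class and method names are hypothetical and do not exist in SolrWayback.

```java
// Hypothetical sketch: these names are illustrations, not part of the SolrWayback code base.
final class TimeMapPagingSketch {

    /**
     * Returns the number of paged timemaps, or 1 when everything fits in a single timemap.
     * Assumption: paging starts once mementoCount exceeds pagingLimit.
     */
    static int pageCount(long mementoCount, int pagingLimit, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (mementoCount <= pagingLimit) {
            return 1;
        }
        // Ceiling division without floating point.
        return (int) ((mementoCount + pageSize - 1) / pageSize);
    }

    public static void main(String[] args) {
        // With the bundled defaults (paginglimit=5, pagesize=2):
        System.out.println(pageCount(4, 5, 2)); // prints 1
        System.out.println(pageCount(7, 5, 2)); // prints 4
    }
}
```

Under this reading the bundled defaults are very small, which suits tests more than production use.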

@@ -0,0 +1,246 @@
package dk.kb.netarchivesuite.solrwayback.memento;

import dk.kb.netarchivesuite.solrwayback.facade.Facade;
import dk.kb.netarchivesuite.solrwayback.properties.PropertiesLoader;
import dk.kb.netarchivesuite.solrwayback.properties.PropertiesLoaderWeb;
import dk.kb.netarchivesuite.solrwayback.service.SolrWaybackResource;
import dk.kb.netarchivesuite.solrwayback.service.dto.ArcEntry;
import dk.kb.netarchivesuite.solrwayback.service.dto.IndexDoc;
import dk.kb.netarchivesuite.solrwayback.service.dto.MementoDoc;
import dk.kb.netarchivesuite.solrwayback.service.exception.InvalidArgumentServiceException;
import dk.kb.netarchivesuite.solrwayback.solr.NetarchiveSolrClient;
import dk.kb.netarchivesuite.solrwayback.util.DateUtils;
import org.apache.solr.common.SolrDocument;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.ws.rs.core.MultivaluedHashMap;
import javax.ws.rs.core.MultivaluedMap;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.ResponseBuilder;

import java.text.ParseException;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

/**
* This class implements the Datetime Negotiation of the Memento Framework
* as specified in <a href="https://datatracker.ietf.org/doc/html/rfc7089#section-4.1">RFC 7089</a>.
*
* All methods for non-redirect (2.2 mode) have been commented out, since pattern 2.2 is not supported.
*/
public class DatetimeNegotiation {
private static final Logger log = LoggerFactory.getLogger(DatetimeNegotiation.class);

public static Response getMemento(String url, String acceptDatetime) throws Exception {
if (PropertiesLoader.MEMENTO_REDIRECT) {
return redirectingTimegate(url, acceptDatetime);
} else {
throw new InvalidArgumentServiceException("Memento playback only supports redirect mode.");
//return nonRedirectingTimeGate(url, acceptDatetime);
}
}

/**
* Non-Redirecting TimeGate (Memento Pattern 2.2)
* This behavior is consistent with Memento Pattern 2.2. It is not supported in SolrWayback, so the method is commented out.
*
* @param url to find timegate for.
* @param acceptDatetime the datetime that the end user wants to obtain a memento for. SolrWayback delivers
* the closest possible memento.
* @return an HTTP 200 response with memento headers and the memento as the entity.
*/

// Playback would be very flawed with this approach, since the URL now contains /memento.
// The only solution is to return a small HTML page with a frame that has the correct playback URL.
// Playback logic is implemented in the root service: HTML URL parser, service worker and live leak referrer fix. We do not want to duplicate the playback logic.
// Users in a browser will not notice a redirect, so this 2.2 option is not required.
// It would also pollute playback with Memento header fields.
/*
public static Response nonRedirectingTimeGate(String url,
String acceptDatetime) throws Exception {

MementoMetadata metadata = new MementoMetadata();
String solrDate = DateUtils.convertMementodate2Solrdate(acceptDatetime);
log.info("Converted RFC1123 date to solrdate: '{}'", solrDate);

// Create response through streaming of a single SolrDocument.
Optional<Response> responseOpt = NetarchiveSolrClient.getInstance()
.findNearestHarvestTimeForSingleUrlFewFields(url, solrDate)
.map(doc -> addHeadersToMetadataObjectNonRedirecting(doc, metadata))
.map(doc -> streamMementoFromNonRedirectingTimeGate(doc, metadata))
.reduce((first, second) -> first);

return responseOpt.orElseGet(() -> Response.status(404).build());
}
*/
/**
* Redirecting TimeGate (Memento Pattern 2.1)
* This behavior is consistent with Memento Pattern 2.1. It is the default and the only supported mode.
*
* @param url to find timegate for.
* @param acceptDatetime the datetime that the end user wants to obtain a memento for. SolrWayback delivers
* the closest possible memento.
* @return an HTTP 302 response with memento headers and no payload, or HTTP 404 if no memento is found.
*/
private static Response redirectingTimegate(String url, String acceptDatetime) throws ParseException {
MementoMetadata metadata = new MementoMetadata();
String solrDate = DateUtils.convertMementodate2Solrdate(acceptDatetime);
log.info("Converted RFC1123 date to solrdate: '{}'", solrDate);

// Create response through streaming of a single SolrDocument.
Optional<Response> responseOpt = NetarchiveSolrClient.getInstance()
.findNearestHarvestTimeForSingleUrlFewFields(url, solrDate)
.map(doc -> saveFirstAndLastDate(doc, metadata))
.map(doc -> addHeadersToMetadataObjectForRedirectingTimegate(doc, metadata))
.map(doc -> streamMementoFromRedirectingTimeGate(doc, metadata))
.reduce((first, second) -> first);

return responseOpt.orElseGet(() -> Response.status(404).build());
}

private static MementoDoc saveFirstAndLastDate(MementoDoc doc, MementoMetadata metadata) {
if (doc.getWayback_date() < metadata.getFirstWaybackDate()){
metadata.setFirstWaybackDate(doc.getWayback_date());
}
if (doc.getWayback_date() > metadata.getLastWaybackDate()){
metadata.setLastWaybackDate(doc.getWayback_date());
}
try{
metadata.setFirstMementoFromFirstWaybackDate();
metadata.setLastMementoFromLastWaybackDate();
} catch (ParseException e) {
throw new RuntimeException(e);
}
return doc;

}

/**
* Create HTTP headers for timegate found from Solr Index
* @param doc contains data used to construct the header for the memento.
* @param metadata object which stores variables, that are used to construct the headers. Headers are also stored
* in this object when constructed.
* @return the input doc for further use in a streaming chain.
*/

/*
private static MementoDoc addHeadersToMetadataObjectNonRedirecting(MementoDoc doc, MementoMetadata metadata) {
MultivaluedMap<String, Object> headers = new MultivaluedHashMap<>();
headers.add("Date", java.time.OffsetDateTime.now().format(DateTimeFormatter.RFC_1123_DATE_TIME));
headers.add("Vary", "accept-datetime");
headers.add("Content-Location", PropertiesLoaderWeb.WAYBACK_SERVER + "services/web/" +
doc.getWayback_date() + "/" + doc.getUrl());
try {
headers.add("Memento-Datetime", DateUtils.convertWaybackdate2Mementodate(doc.getWayback_date()));
} catch (ParseException e) {
throw new RuntimeException(e);
}
String linkString = "<" + doc.getUrl() + ">; rel=\"original\"," +
"<" + PropertiesLoaderWeb.WAYBACK_SERVER + "services/memento/timemap/" + doc.getUrl() + ">" +
"; rel=\"timemap\"; type=\"application/link-format\"," +
"<" + PropertiesLoaderWeb.WAYBACK_SERVER + "services/memento/" + doc.getUrl() + ">" +
"; rel=\"timegate\"";
headers.add("Link", linkString);
headers.add("Content-Length", doc.getContent_length());
headers.add("Content-Type", doc.getContent_type());
headers.add("Connection", "close");

metadata.setHttpHeaders(headers);

return doc;
}
*/
/**
* Create HTTP headers for timegate found from Solr Index
* @param doc contains data used to construct the header for the memento.
* @param metadata object which stores variables, that are used to construct the headers. Headers are also stored
* in this object when constructed.
* @return the input doc for further use in a streaming chain.
*/
private static MementoDoc addHeadersToMetadataObjectForRedirectingTimegate(MementoDoc doc, MementoMetadata metadata) {
MultivaluedMap<String, Object> headers = new MultivaluedHashMap<>();
headers.add("Date", java.time.OffsetDateTime.now().format(DateTimeFormatter.RFC_1123_DATE_TIME));
headers.add("Vary", "accept-datetime");
headers.add("Location", PropertiesLoaderWeb.WAYBACK_SERVER + "services/web/" +
doc.getWayback_date() + "/" + doc.getUrl());
// The Link header value must stay on a single line, so no newlines between link parameters.
String linkString = "<" + doc.getUrl() + ">; rel=\"original\"," +
"<" + PropertiesLoaderWeb.WAYBACK_SERVER + "services/memento/timemap/" + doc.getUrl() + ">" +
"; rel=\"timemap\"; type=\"application/link-format\"" +
"; from=\"" + metadata.getFirstMemento() + "\"" +
"; until=\"" + metadata.getLastMemento() + "\"";
headers.add("Link", linkString);
headers.add("Content-Length", 0);
headers.add("Content-Type", doc.getContent_type());
headers.add("Connection", "close");

metadata.setHttpHeaders(headers);

return doc;
}


/**
* Streams the memento found for the timegate.
* @param doc containing data from solr for accessing the memento in the WARC files.
* @param metadata object which contains metadata on the memento. Including headers.
* @return a response containing correct memento headers and the memento as the response entity.
*/
/*
private static Response streamMementoFromNonRedirectingTimeGate(MementoDoc doc, MementoMetadata metadata) {
if (PropertiesLoader.PLAYBACK_DISABLED){
return Response.noContent().replaceAll(metadata.getHttpHeaders()).build();
} else {
try {
SolrWaybackResource resource = new SolrWaybackResource();
Response resp = resource.viewImpl(doc.getSource_file_path(), doc.getSource_file_offset(), true, true);
ResponseBuilder entity = Response.fromResponse(resp);
addMementoHeadersToReponse(entity, metadata);
return entity.build();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
*/
/**
* Return the 302 redirect response with the memento headers. The response has no payload.
*
* @param doc containing data from solr for accessing the memento in the WARC files.
* @param metadata object which contains metadata on the memento. Including the additional headers.
* @return a 302 response with the memento headers and no entity, or a 204 response with the same headers if playback is disabled.
*/
private static Response streamMementoFromRedirectingTimeGate(MementoDoc doc, MementoMetadata metadata) {
if (PropertiesLoader.PLAYBACK_DISABLED){
return Response.noContent().replaceAll(metadata.getHttpHeaders()).build();
} else {
try {
ResponseBuilder entity = Response.status(302);
addMementoHeadersToReponse(entity,metadata);
return entity.build();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}

/**
* Add the additional headers for the memento response. The headers are different for redirecting and non-redirecting requests.
*
* @param entity The response to enrich with additional headers
* @param metadata Additional headers
*/
private static void addMementoHeadersToReponse(ResponseBuilder entity, MementoMetadata metadata) {
for (String header: metadata.getHttpHeaders().keySet()) {
String value=metadata.getHttpHeaders().get(header).get(0).toString();
log.debug("adding memento header: {} value: {}", header, value);
entity.header(header, value);
}
}

}
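
The class above relies on `DateUtils.convertMementodate2Solrdate` and `DateUtils.convertWaybackdate2Mementodate`, whose implementations are not part of this diff. As a rough illustration of the conversion between a 14-digit wayback date and the RFC 1123 format used for `Accept-Datetime` and `Memento-Datetime`, here is a minimal sketch using only `java.time`. The class and method names are hypothetical, and treating wayback dates as UTC is an assumption.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: not the DateUtils implementation used by SolrWayback.
final class MementoDateSketch {
    // 14-digit wayback date, e.g. 20230724120000. Assumed to be in UTC.
    private static final DateTimeFormatter WAYBACK = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Converts a 14-digit wayback date to an RFC 1123 datetime string. */
    static String waybackToMemento(long waybackDate) {
        LocalDateTime local = LocalDateTime.parse(Long.toString(waybackDate), WAYBACK);
        return local.atZone(ZoneOffset.UTC).format(DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    /** Converts an RFC 1123 datetime string to a 14-digit wayback date in UTC. */
    static long mementoToWayback(String mementoDate) {
        ZonedDateTime parsed = ZonedDateTime.parse(mementoDate, DateTimeFormatter.RFC_1123_DATE_TIME);
        return Long.parseLong(parsed.withZoneSameInstant(ZoneOffset.UTC).format(WAYBACK));
    }

    public static void main(String[] args) {
        String memento = waybackToMemento(20230724120000L);
        System.out.println(memento);
        System.out.println(mementoToWayback(memento));
    }
}
```

Note that `DateTimeFormatter.RFC_1123_DATE_TIME` does not zero-pad the day of month when formatting, so a client that expects a two-digit day may need a custom pattern instead.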
@@ -0,0 +1,98 @@
package dk.kb.netarchivesuite.solrwayback.memento;

import dk.kb.netarchivesuite.solrwayback.properties.PropertiesLoaderWeb;
import dk.kb.netarchivesuite.solrwayback.util.DateUtils;

import javax.ws.rs.core.MultivaluedMap;
import java.text.ParseException;

/**
* Object which contains metadata about a single original resource, used to produce correct mementos.
*/
public class MementoMetadata {

private String timeMapHead;
private String firstMemento;
private String lastMemento;
private long firstWaybackDate = 99999999999999L;
private long lastWaybackDate = 19500101010000L;

private MultivaluedMap<String, Object> httpHeaders;

public MementoMetadata(){}
public String getFirstMemento() {
return firstMemento;
}

public String getLastMemento() {
return lastMemento;
}

public long getFirstWaybackDate(){
return firstWaybackDate;
}

public long getLastWaybackDate(){
return lastWaybackDate;
}

public String getTimeMapHead() {
return timeMapHead;
}

public MultivaluedMap<String, Object> getHttpHeaders() {
return httpHeaders;
}

public void setFirstMemento(String firstMemento) {
this.firstMemento = firstMemento;
}

/**
* Sets the first memento to given wayback date.
* @param waybackDate represented as 14 digits
*/
public void setFirstMemento(Long waybackDate) throws ParseException {
this.firstMemento = DateUtils.convertWaybackdate2Mementodate(waybackDate);
}

public void setLastMemento(String lastMemento) {
this.lastMemento = lastMemento;
}

public void setFirstMementoFromFirstWaybackDate() throws ParseException {
this.firstMemento = DateUtils.convertWaybackdate2Mementodate(this.firstWaybackDate);
}

public void setLastMementoFromLastWaybackDate() throws ParseException {
this.lastMemento = DateUtils.convertWaybackdate2Mementodate(this.lastWaybackDate);
}

public void setFirstWaybackDate(long firstWaybackDate) {
this.firstWaybackDate = firstWaybackDate;
}

public void setLastWaybackDate(long lastWaybackDate) {
this.lastWaybackDate = lastWaybackDate;
}

public void setTimeMapHeadForLinkFormat(String originalResource, Integer pageNumber) {
String timemapLink = "";
if (pageNumber == null || pageNumber.equals(0)){
timemapLink = PropertiesLoaderWeb.WAYBACK_SERVER + "services/memento/timemap/link/" + originalResource;
} else {
timemapLink = PropertiesLoaderWeb.WAYBACK_SERVER + "services/memento/timemap/"+pageNumber+"/link/" + originalResource;
}

this.timeMapHead = "<" + originalResource + ">;rel=\"original\",\n" +
"<"+ timemapLink + ">; rel=\"self\"; type=\"application/link-format\"" +
"; from=\"" + this.getFirstMemento() + "\"" +
"; until=\"" + this.getLastMemento() + "\",\n" +
"<"+ PropertiesLoaderWeb.WAYBACK_SERVER +"services/memento/" + originalResource + ">" +
"; rel=\"timegate\",\n";
}

public void setHttpHeaders(MultivaluedMap<String, Object> httpHeaders) {
this.httpHeaders = httpHeaders;
}
}
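
For reviewers who want to see what `setTimeMapHeadForLinkFormat` produces without a running server, here is a minimal, dependency-free sketch of the same idea. It takes the server base URL as a parameter instead of reading `PropertiesLoaderWeb.WAYBACK_SERVER`, and it joins the entries with `String.join`, so it does not emit the trailing `,\n` that the method above appends. The class name, method name and example URLs are hypothetical.

```java
// Hypothetical sketch: mirrors the URL layout in the diff, but is not the SolrWayback implementation.
final class TimeMapHeadSketch {

    /** Builds the leading entries of a link-format TimeMap; entries are separated by ",\n". */
    static String timeMapHead(String serverBase, String originalUrl, Integer pageNumber,
                              String firstMemento, String lastMemento) {
        // No page number, or page 0, means the unpaged timemap URI.
        String pagePart = (pageNumber == null || pageNumber == 0) ? "" : pageNumber + "/";
        String self = serverBase + "services/memento/timemap/" + pagePart + "link/" + originalUrl;

        return String.join(",\n",
                "<" + originalUrl + ">; rel=\"original\"",
                "<" + self + ">; rel=\"self\"; type=\"application/link-format\""
                        + "; from=\"" + firstMemento + "\"; until=\"" + lastMemento + "\"",
                "<" + serverBase + "services/memento/" + originalUrl + ">; rel=\"timegate\"");
    }

    public static void main(String[] args) {
        System.out.println(timeMapHead("http://localhost:8080/solrwayback/", "http://example.org/", 2,
                "Mon, 24 Jul 2023 12:00:00 GMT", "Tue, 25 Jul 2023 12:00:00 GMT"));
    }
}
```

Passing the base URL in, rather than reading a static property, keeps the string construction testable without loading the web properties.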