The Prospects table provides statistics on the number of subject, predicate, and object values found in the triple store. The statistics are currently produced by a MapReduce job, the Prospector, which runs against the Apache Rya store and saves its results in the prospects table.
Deploy the extras/rya.prospector/target/rya.prospector-<version>-shade.jar file to the Hadoop cluster.
The Prospector also requires a configuration file that defines where Accumulo is, which Rya table to read from (it has to be the SPO table), and which table to write to. Note: make sure the output table follows the same naming scheme as the Rya tables, so the prospects table is named tableprefix_prospects.
A sample configuration file might look like the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>prospector.intable</name>
        <value>triplestore_spo</value>
    </property>
    <property>
        <name>prospector.outtable</name>
        <value>triplestore_prospects</value>
    </property>
    <property>
        <name>prospector.auths</name>
        <value>U,FOUO</value>
    </property>
    <property>
        <name>instance</name>
        <value>accumulo</value>
    </property>
    <property>
        <name>zookeepers</name>
        <value>localhost:2181</value>
    </property>
    <property>
        <name>username</name>
        <value>root</value>
    </property>
    <property>
        <name>password</name>
        <value>secret</value>
    </property>
</configuration>
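A configuration like the sample above can also be generated from a table prefix, which keeps the input and output table names consistent with each other. The following is a minimal Python sketch, not part of Rya: the function name build_prospector_conf and its parameters are hypothetical, and it assumes the prefix is joined to the spo and prospects suffixes with a single underscore, as in the sample.

```python
import xml.etree.ElementTree as ET


def build_prospector_conf(table_prefix, instance, zookeepers,
                          username, password, auths):
    """Return a Prospector configuration file as an XML string."""
    # Assumption: the prefix is joined to the table suffix with one underscore,
    # matching names such as triplestore_spo and triplestore_prospects.
    prefix = table_prefix.rstrip("_")
    properties = {
        "prospector.intable": prefix + "_spo",
        "prospector.outtable": prefix + "_prospects",
        "prospector.auths": auths,
        "instance": instance,
        "zookeepers": zookeepers,
        "username": username,
        "password": password,
    }

    # Build the <configuration><property>...</property></configuration> tree.
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Prints a configuration equivalent to the sample shown above.
    print(build_prospector_conf("triplestore", "accumulo", "localhost:2181",
                                "root", "secret", "U,FOUO"))
```

The returned string can be written to a file such as /tmp/prospectorConf.xml. The sketch omits the xml-stylesheet line from the sample.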
Run the following command, substituting the version of your shaded jar and the path to your configuration file.
hadoop jar rya.prospector-3.0.4-SNAPSHOT-shade.jar org.apache.rya.prospector.mr.Prospector /tmp/prospectorConf.xml
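Before submitting the job, it can help to confirm that the configuration file contains the expected properties. The sketch below is one possible pre-flight check in Python, not something shipped with Rya. The helper names are hypothetical, and treating all seven property names from the sample configuration as required is an assumption.

```python
import xml.etree.ElementTree as ET

# Property names taken from the sample configuration. Treating every one of
# them as required is an assumption of this sketch.
EXPECTED_PROPERTIES = (
    "prospector.intable",
    "prospector.outtable",
    "prospector.auths",
    "instance",
    "zookeepers",
    "username",
    "password",
)

# Main class as given in the hadoop command above.
PROSPECTOR_CLASS = "org.apache.rya.prospector.mr.Prospector"


def missing_properties(conf_xml):
    """Return the expected property names that are absent or have no value."""
    root = ET.fromstring(conf_xml)
    found = {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.findall("property")
    }
    return [name for name in EXPECTED_PROPERTIES if not found.get(name)]


def prospector_command(jar_path, conf_path):
    """Return the hadoop invocation as an argument list for subprocess.run."""
    return ["hadoop", "jar", jar_path, PROSPECTOR_CLASS, conf_path]
```

A caller would read the configuration file, stop if missing_properties returns a non-empty list, and otherwise pass the result of prospector_command to subprocess.run on a machine where the hadoop client is installed.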