Running amidict or ami-dictionary in Windows 10 #62
Comments
Thanks |
Try the following:
…
then run:
…
and you get:
…
This creates the dictionaries (actually we can probably drop the …). The Wikipedia errors mean that the format of the Wikipedia page has changed and we need to change the code. This is tedious and common with remote sites. |
Where should I open that file to put the terms? Should I open it in openVirus/Dictionaries? |
Kareena has already correctly created … I think you will need … We should probably have a 6th directory, where anyone can create and test small dictionaries. |
While running the command as per your suggestion @petermr, the dictionary is created, but there are multiple errors and some values are left out. E.g. in my case the input had 180 countries while the output .xml file had only 117 entries. The errors look something like this:
|
Hi, these sound like simple XML parsing errors, suggesting that your input is not well-formed XML. Use something like https://www.xmlvalidation.com/, or post it here for me to check.
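If you would rather check locally, libxml2's xmllint reports the first well-formedness error it hits; this is a hedged alternative to the website above, and the filename here is just a placeholder for your input file (on Windows 10, xmllint is available via e.g. WSL or Cygwin):

    xmllint --noout country.xml
|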
The input was provided as a .txt file, as was indicated in the discussion within this thread. Do I need to convert it to XML first? |
It will be the Wikimedia pages. I will have to put them through a different cleaner. Most HTML in the wild is awful.
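One hedged illustration of that kind of cleaning at the command line is HTML Tidy, which rewrites tag-soup HTML as well-formed XML; this shows the general approach, not necessarily the cleaner AMI uses internally, and the filenames are placeholders:

    tidy -q -asxml page.html > page.xml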
|
@petermr where are you looking up these terms? Can't we access a data source that is well-formed XML, e.g. by using an Accept header? |
They are Wikipedia pages. There is no alternative. Here's the culprit:

    <form action="/w/index.php" id="searchform">
      <div id="simpleSearch">
        <input type="search" name="search" placeholder="Search Wikipedia" title="Search Wikipedia [f]" accesskey="f" id="searchInput"/>
        <input type="hidden" name="title" value="Special:Search">
        <input type="submit" name="fulltext" value="Search" title="Search Wikipedia for this text" id="mw-searchButton" class="searchButton mw-fallbackSearchButton"/>
        <input type="submit" name="go" value="Go" title="Go to a page with this exact name if it exists" id="searchButton" class="searchButton"/>
      </div>
    </form>

This is HTML5 - it's not well-formed (note the <input type="hidden"> element, which is never closed, while the other inputs are self-closed). It was a pointless exercise. The onus is on me to parse it. Drives me wild.
|
Another thing that screws people is DTDs. Often they are not resolvable and
we crash. I strip all DTDs.
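For anyone reproducing this outside AMI, a hedged equivalent with xmllint: --dropdtd removes the DOCTYPE declaration and --nonet refuses to fetch anything over the network, so an unresolvable DTD cannot crash the parse (filenames are placeholders):

    xmllint --dropdtd --nonet page.xml > page-nodtd.xml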
|
Any scope for using Wikipedia's Linked Data twin dbpedia? (Not sure from the above exactly what you are searching for.) |
I don't think DBPedia gives any advantages. In any case we are tooled up for Wikidata.
|
Then why are you searching in Wikipedia? |
> Then why are you searching in Wikipedia?

We're searching in both:
* named Wikipedia pages
* Wikimedia categories
* pages with lists
and
* Wikidata items
* search lists from Wikidata
|
OK, well the advantage of dbpedia is that you can put SPARQL queries to it, and get back machine-processible responses. Of course, it may not have the content you are searching for: it's just the content of the info boxes. If you give me a specific query requirement, I can look into how well dbpedia might be able to answer it. |
But that's what Wikidata does. It has effectively subsumed DBPedia (it has all the infoboxes and a lot more directly donated).
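As a sketch of what that tooling looks like, here is a query against the Wikidata Query Service, tied to the country dictionary discussed above; it is an illustration, not the project's actual query (Q6256 is Wikidata's item for "country", P31 is "instance of"):

    curl -G 'https://query.wikidata.org/sparql' \
      -H 'Accept: application/sparql-results+json' \
      --data-urlencode query='SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:Q6256 .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
      }'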
|
Wikidata is a separate exercise (with its own problems of data consistency). My point is that dbpedia is in lockstep with (= is automatically extracted from) Wikipedia, so if you're searching for a Wikipedia page with a title which matches your dictionary term, you could just as well be searching for the equivalent 'page' in dbpedia. By doing that, you can establish whether the page exists (and therefore whether the corresponding Wikipedia page exists) and get back a reliable machine-processible response. Which brings me back to my original question: what information is there in the corresponding Wikipedia page which we need, and which we can't get from dbpedia?
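As a concrete sketch of that kind of lookup (the resource name is a placeholder), DBpedia serves each resource in machine-readable form, so an existence check and structured data come back in a single request:

    curl 'https://dbpedia.org/data/Coronavirus.json'
|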
There is a lot of software that needs to be written. If you're volunteering
to do this for DBP and show that it's superior to WD, fine. But it's not
top priority. We work closely with the Wikidata group in Berlin.
|
Peter, please re-read my comments, noting that I am simply trying to address the problems reported above that Wikipedia responses are not processible. I am not suggesting replacing WD by DBP! |
I am trying to create a dictionary using amidict commands in Windows 10. I have installed AMI and checked its installation using ami --help. I have also used getpapers to download the papers and ami -search to arrange the papers with respect to the required dictionaries. Now I am trying to run amidict to create a new dictionary. I was able to give the command amidict --help and it showed the commands (as per the FAQ https://github.com/petermr/openVirus/wiki/FAQ ). But when I gave the command for testing, to create a dictionary from the tigr2ess tutorial https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md , I got the output no directory given. I have also tried creating a new directory and gave the same command, but the output was again no directory given. What shall I change in the syntax to create a new dictionary? Kindly guide me.
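For reference, the dictionary-creation command in the tigr2ess tutorial takes roughly this shape; this is a hedged sketch based on that tutorial, the exact flags may differ between AMI versions, and country.txt / mydictionaries are placeholder names. The "no directory given" message suggests the --directory option was missing:

    amidict -v --dictionary country --directory mydictionaries --input country.txt create --informat list --outformats xml,html --wikilinks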