Pronto fails to load OPMI by inferring wrong encoding #221

Open
ElDeveloper opened this issue Mar 28, 2024 · 0 comments

To reproduce, download the OWL-formatted OPMI ontology from BioPortal.

import pronto
pronto.Ontology('opmi-merged.owl')

Loading fails with the exception below. Note the warning stating that "Windows-1252" was assumed. If I edit io.py and change this line to force "utf-8" as the encoding, the file loads just fine. Is there another way to specify the encoding of the file I'm loading?
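One workaround that avoids patching io.py might be to rewrite the file so it contains only ASCII bytes, since ASCII decodes identically under UTF-8 and Windows-1252. The sketch below is not part of pronto: `to_ascii_xml` is a hypothetical helper, it assumes the source file really is UTF-8, and I have not confirmed that pronto's detection then behaves as hoped.

```python
from pathlib import Path


def to_ascii_xml(src: str, dst: str) -> None:
    """Rewrite a UTF-8 XML file so that it contains only ASCII bytes.

    Hypothetical helper, not a pronto API.
    """
    # Strict decode: raises UnicodeDecodeError if the file is not valid UTF-8.
    # "utf-8-sig" also drops a leading byte order mark, if there is one.
    text = Path(src).read_bytes().decode("utf-8-sig")
    # Every non-ASCII character becomes a numeric character reference
    # (for example &#8221;), which an XML parser resolves back to the
    # original character in text and attribute values.
    Path(dst).write_bytes(text.encode("ascii", errors="xmlcharrefreplace"))


# Intended usage (file names are placeholders):
#   to_ascii_xml("opmi-merged.owl", "opmi-merged.ascii.owl")
#   pronto.Ontology("opmi-merged.ascii.owl")
```

Character references are not resolved inside comments or CDATA sections, so non-ASCII text there would come out changed. For ordinary element text and attribute values the parsed content should be the same.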

/var/folders/4b/gklb4t292nq0vyg08x59gjzc0000gn/T/ipykernel_29628/2923417831.py:1: UnicodeWarning: unsound encoding, assuming Windows-1252 (73% confidence)
  Ontology('/Users/yoshiki/Downloads/opmi-merged.owl')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[188], line 1
----> 1 Ontology('/Users/yoshiki/Downloads/opmi-merged.owl')

File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/ontology.py:283, in Ontology.__init__(self, handle, import_depth, timeout, threads)
    281 for cls in BaseParser.__subclasses__():
    282     if cls.can_parse(typing.cast(str, self.path), buffer):
--> 283         cls(self).parse_from(_handle)  # type: ignore
    284         break
    285 else:

File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/parsers/rdfxml.py:84, in RdfXMLParser.parse_from(self, handle, threads)
     82 def parse_from(self, handle, threads=None):
     83     # Load the XML document into an XML Element tree
---> 84     tree: etree.ElementTree = etree.parse(handle)
     86     # Load metadata from the `owl:Ontology` element
     87     owl_ontology = tree.find(_NS["owl"]["Ontology"])

File ~/miniconda3/envs/db/lib/python3.11/xml/etree/ElementTree.py:1218, in parse(source, parser)
   1209 """Parse XML document into element tree.
   1210 
   1211 *source* is a filename or file object containing XML data,
   (...)
   1215 
   1216 """
   1217 tree = ElementTree()
-> 1218 tree.parse(source, parser)
   1219 return tree

File ~/miniconda3/envs/db/lib/python3.11/xml/etree/ElementTree.py:580, in ElementTree.parse(self, source, parser)
    574     parser = XMLParser()
    575     if hasattr(parser, '_parse_whole'):
    576         # The default XMLParser, when it comes from an accelerator,
    577         # can define an internal _parse_whole API for efficiency.
    578         # It can be used to parse the whole source without feeding
    579         # it with chunks.
--> 580         self._root = parser._parse_whole(source)
    581         return self._root
    582 while True:

File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/utils/io.py:24, in BufferedReader.read(self, size)
     22 def read(self, size: Optional[int] = -1) -> bytes:
     23     try:
---> 24         return super(BufferedReader, self).read(size)
     25     except ValueError:
     26         if typing.cast(io.BufferedReader, self.closed):

File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/utils/io.py:60, in EncodedFile.readinto(self, buffer)
     59 def readinto(self, buffer: ByteString) -> int:
---> 60     chunk = self.read(len(buffer) // 2)
     61     typing.cast(bytearray, buffer)[: len(chunk)] = chunk
     62     return len(chunk)

File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/utils/io.py:56, in EncodedFile.read(self, size)
     55 def read(self, size: Optional[int] = -1) -> bytes:
---> 56     chunk = super().read(-1 if size is None else size)
     57     return chunk.replace(b"\r\n", b"\n")

File <frozen codecs>:814, in read(self, size)

File <frozen codecs>:507, in read(self, size, chars, firstline)

File ~/miniconda3/envs/db/lib/python3.11/encodings/cp1252.py:15, in Codec.decode(self, input, errors)
     14 def decode(self,input,errors='strict'):
---> 15     return codecs.charmap_decode(input,errors,decoding_table)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29335: character maps to <undefined>
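The failing byte fits the file being UTF-8 rather than Windows-1252. Python's cp1252 codec leaves 0x9d unmapped, while 0x9d is a normal continuation byte in UTF-8. A minimal sketch of the mismatch follows. U+201D is only an example of a character whose UTF-8 encoding ends in 0x9d (I have not checked which character sits at position 29335), and `decodes_as` is a hypothetical helper.

```python
# U+201D (right double quotation mark) is encoded as E2 80 9D in UTF-8.
sample = "\u201d".encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


utf8_ok = decodes_as(sample, "utf-8")     # valid UTF-8 sequence
cp1252_ok = decodes_as(sample, "cp1252")  # 0x9d has no mapping in cp1252
```

Here `utf8_ok` is true and `cp1252_ok` is false, the same failure mode as in the traceback above.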