Merge pull request #8 from JacksonBurns/v1.0.2-dev
Addresses a group of issues for the next patch release 1.0.2 of `py2opsin`:

- resolves Packaging fix #7 with a small update to setup.py
- resolves Add a timing comparison to pubchempy #5 with a new performance test and README update
- resolves Add example usage notebook and binder link #6
- resolves Fix GH Pages Docs #4
JacksonBurns authored Mar 21, 2023
2 parents 42d783a + eb4f1aa commit add5964
Showing 10 changed files with 620 additions and 31 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/format_code.yml
@@ -24,4 +24,4 @@ jobs:
       - name: Check Errors
         run: |
-          pycodestyle --statistics --count --max-line-length=150 --show-source .
+          pycodestyle --statistics --count --max-line-length=150 --ignore=E731 --show-source .
2 changes: 1 addition & 1 deletion .github/workflows/run_tests.yml
@@ -30,7 +30,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install Dependencies
         run: |
-          python -m pip install -e .
+          python -m pip install -e .[dev]
           python -m pip install coverage
       - name: Run Tests
         run: |
36 changes: 36 additions & 0 deletions README.md
@@ -16,6 +16,8 @@
## Installation
`py2opsin` can be installed with `pip install py2opsin`. It has _zero_ dependencies (`OPSIN v2.7.0` is included in the PyPI package) and should work inside any environment running modern Python.

Try a demo of `py2opsin` live in your browser (no installation required!): [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/JacksonBurns/py2opsin/HEAD?labpath=examples%2Fpy2opsin_example.ipynb)

## Usage
Command-line arguments available in `OPSIN` can be passed through to `py2opsin`:

@@ -50,12 +52,46 @@ Arguments:
- wildcard_radicals (bool, optional): Output radicals as wildcards. Defaults to False.
- jar_fpath (str, optional): Filepath to OPSIN jar file. Defaults to "opsin-cli.jar" which is distributed with py2opsin.


## Speedup 50x from `pubchempy`
`py2opsin` runs locally and is narrower in scope, which makes it __dramatically__ faster at resolving identifiers. In the code block below, the call to `py2opsin` executes ~58x faster than the equivalent calls to `pubchempy`:
```python
import time

from pubchempy import PubChemHTTPError, get_compounds
from py2opsin import py2opsin

compound_list = [
"dienochlor",
"kepone",
...
"ditechnetium decacarbonyl",
]

for compound in compound_list:
    result = get_compounds(compound, "name")

smiles_strings = py2opsin(compound_list)
```
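
A comparison like the one above could be measured with a small timing helper. The sketch below is only an illustration and is not the repository's performance test: `best_time`, `speedup`, and the two sleep-based stand-in workloads are hypothetical names introduced here so that the snippet runs without Java or network access.

```python
import time
from typing import Callable


def best_time(func: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(slow: Callable[[], object], fast: Callable[[], object], repeats: int = 3) -> float:
    """Return how many times faster `fast` is than `slow` (best of `repeats`)."""
    return best_time(slow, repeats) / best_time(fast, repeats)


# Stand-in workloads (assumptions for illustration only): one short sleep per
# name imitates per-compound requests, a single sleep imitates one batched call.
names = ["ethane", "methane", "propane"]


def one_request_per_name():
    for _ in names:
        time.sleep(0.01)


def single_batched_call():
    time.sleep(0.01)


print(f"speedup: {speedup(one_request_per_name, single_batched_call):.1f}x")
```

To time the real libraries, the stand-ins could be swapped for `lambda: [get_compounds(name, "name") for name in compound_list]` and `lambda: py2opsin(compound_list)`. The measured ratio will depend on network conditions and hardware.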


## Examples
- Jeremy Monat's ([@bertiewooster](https://github.com/bertiewooster)) fantastic [blog post](https://bertiewooster.github.io/2023/03/10/Revisiting-a-Classic-Cheminformatics-Paper-The-Wiener-Index.html) using `py2opsin` to help explore the Wiener Index by enabling translation from IUPAC names into molecules directly from the original paper.

## Online Documentation
[Click here to read the documentation](https://JacksonBurns.github.io/py2opsin/)

## Contributing & Developer Notes
Pull Requests, Bug Reports, and all Contributions are welcome! Please use the appropriate issue or pull request template when making a contribution.

When submitting a PR, please mark your PR with the "PR Ready for Review" label when you are finished making changes so that the GitHub actions bots can work their magic!

### Developer Install

To contribute to the `py2opsin` source code, start by cloning the repository (e.g. `git clone git@github.com:JacksonBurns/py2opsin.git`) and then run `pip install -e .[dev]` inside the repository. This will install all the dependencies required to run `py2opsin` and to conform to our formatting standards (`black` and `isort`), which you can configure to run automatically in VS Code [like this](https://marcobelo.medium.com/setting-up-python-black-on-visual-studio-code-5318eba4cd00#:~:text=Go%20to%20settings%20in%20your,%E2%80%9D%20and%20select%20%E2%80%9Cblack%E2%80%9D.).

__Note for Windows Powershell or MacOS Catalina or newer__: On these systems the command line will complain about square brackets, so you will need to double quote the `molecules` command (i.e. `pip install -e ".

## License
`OPSIN` and `py2opsin` are both distributed under the MIT license.

3 changes: 1 addition & 2 deletions docs/conf.py
@@ -6,15 +6,14 @@

# -- Path setup --------------------------------------------------------------

import codecs
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import codecs
import sys


sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('..'))

125 changes: 121 additions & 4 deletions examples/py2opsin_example.ipynb
@@ -1,12 +1,129 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "65d0bde2",
"metadata": {},
"source": [
"# `py2opsin` to Resolve IUPAC Names to SMILES\n",
"Start by installing `py2opsin` from PyPI with this command:\n",
"`pip install py2opsin`\n",
"\n",
"This install includes a copy of `OPSIN` itself, so there is no additional setup required to make it work!"
]
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 2,
 "id": "c71ab290",
 "metadata": {},
-"outputs": [],
-"source": []
+"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: py2opsin in /home/jackson/py2opsin/py2opsin (1.0.2)\n"
]
}
],
"source": [
"!pip install py2opsin"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "97cebdb6",
"metadata": {},
"source": [
"With `py2opsin` installed, you can now resolve names into SMILES strings, InChI, or any of the supported output formats either one input at a time, or in a list:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "22993d71",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"O\n"
]
}
],
"source": [
"from py2opsin import py2opsin\n",
"water_smiles = py2opsin(\"water\")\n",
"print(water_smiles)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2bef9f0b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['InChI=1/C2H6/c1-2/h1-2H3', 'InChI=1/CH4/h1H4', 'InChI=1/C3H8/c1-3-2/h3H2,1-2H3']\n"
]
}
],
"source": [
"iupac_list = [\n",
" 'ethane',\n",
" 'methane',\n",
" 'propane',\n",
"]\n",
"hydrocarbon_inchis = py2opsin(iupac_list, output_format=\"InChI\")\n",
"print(hydrocarbon_inchis)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9f4c65a4",
"metadata": {},
"source": [
"The following arguments are supported, which can control the behavior of `OPSIN` or optionally specify a different executable path altogether:\n",
" - chemical_name (str): IUPAC name of chemical as a Python string, or a list of strings.\n",
" - output_format (str, optional): One of \"SMILES\", \"CML\", \"InChI\", \"StdInChI\", or \"StdInChIKey\". Defaults to \"SMILES\".\n",
" - allow_acid (bool, optional): Allow interpretation of acids. Defaults to False.\n",
" - allow_radicals (bool, optional): Enable radical interpretation. Defaults to False.\n",
" - allow_bad_stereo (bool, optional): Allow OPSIN to ignore uninterpretable stereochem. Defaults to False.\n",
" - wildcard_radicals (bool, optional): Output radicals as wildcards. Defaults to False.\n",
" - jar_fpath (str, optional): Filepath to OPSIN jar file. Defaults to \"opsin-cli.jar\" which is distributed with py2opsin.\n",
"\n",
"If you make a mistake when asking for the desired output, `py2opsin` will offer a helpful suggestion, too:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "62c68952",
"metadata": {},
"outputs": [
{
"ename": "RuntimeError",
"evalue": "Output format StandInChIKey is invalid. Did you mean 'StdInChIKey'?",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[11], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m py2opsin(\u001b[39m\"\u001b[39;49m\u001b[39mmethanol\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39m\"\u001b[39;49m\u001b[39mStandInChIKey\u001b[39;49m\u001b[39m\"\u001b[39;49m)\n",
"File \u001b[0;32m~/py2opsin/py2opsin/py2opsin/py2opsin.py:74\u001b[0m, in \u001b[0;36mpy2opsin\u001b[0;34m(chemical_name, output_format, allow_acid, allow_radicals, allow_bad_stereo, wildcard_radicals, jar_fpath)\u001b[0m\n\u001b[1;32m 57\u001b[0m possiblity \u001b[39m=\u001b[39m get_close_matches(\n\u001b[1;32m 58\u001b[0m output_format,\n\u001b[1;32m 59\u001b[0m [\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 67\u001b[0m n\u001b[39m=\u001b[39m\u001b[39m1\u001b[39m,\n\u001b[1;32m 68\u001b[0m )\n\u001b[1;32m 69\u001b[0m addendum \u001b[39m=\u001b[39m (\n\u001b[1;32m 70\u001b[0m \u001b[39m\"\u001b[39m\u001b[39m Did you mean \u001b[39m\u001b[39m'\u001b[39m\u001b[39m{:s}\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m?\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39mformat(possiblity[\u001b[39m0\u001b[39m])\n\u001b[1;32m 71\u001b[0m \u001b[39mif\u001b[39;00m possiblity\n\u001b[1;32m 72\u001b[0m \u001b[39melse\u001b[39;00m \u001b[39m\"\u001b[39m\u001b[39m Try help(py2opsin).\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 73\u001b[0m )\n\u001b[0;32m---> 74\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mRuntimeError\u001b[39;00m(\n\u001b[1;32m 75\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mOutput format \u001b[39m\u001b[39m{:s}\u001b[39;00m\u001b[39m is invalid.\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39mformat(output_format) \u001b[39m+\u001b[39m addendum\n\u001b[1;32m 76\u001b[0m )\n\u001b[1;32m 78\u001b[0m \u001b[39m# write the input to a text file\u001b[39;00m\n\u001b[1;32m 79\u001b[0m temp_f \u001b[39m=\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mpy2opsin_temp_input.txt\u001b[39m\u001b[39m\"\u001b[39m\n",
"\u001b[0;31mRuntimeError\u001b[0m: Output format StandInChIKey is invalid. Did you mean 'StdInChIKey'?"
]
}
],
"source": [
"py2opsin(\"methanol\", \"StandInChIKey\")"
]
}
],
"metadata": {
@@ -25,7 +142,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.7.0"
+"version": "3.11.0"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion py2opsin/__init__.py
@@ -1,3 +1,3 @@
from .py2opsin import py2opsin

-__version__ = "1.0.1"
+__version__ = "1.0.2"
17 changes: 11 additions & 6 deletions py2opsin/py2opsin.py
@@ -1,12 +1,19 @@
-from difflib import get_close_matches
 import os
 import subprocess
 import sys
 import warnings
-import pkg_resources
-
+from difflib import get_close_matches
 from typing import Union
 
+try:
+    from importlib.resources import files
+
+    pkg_fopen = lambda fname: files("py2opsin") / fname
+except ImportError:
+    from pkg_resources import resource_filename
+
+    pkg_fopen = lambda fname: resource_filename(__name__, fname)


def py2opsin(
chemical_name: Union[str, list],
@@ -33,9 +40,7 @@ def py2opsin(
str: Species in requested format, or False if not found or an error occoured. List of strings if input is list.
"""
     if jar_fpath == "default":
-        jar_fpath = pkg_resources.resource_filename(
-            __name__, "opsin-cli-2.7.0-jar-with-dependencies.jar"
-        )
+        jar_fpath = pkg_fopen("opsin-cli-2.7.0-jar-with-dependencies.jar")

# default arguments to start
arg_list = ["java", "-jar", jar_fpath]
10 changes: 7 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
-import os.path
 import codecs
+import os.path
 import pathlib
-from setuptools import setup, find_packages
+
+from setuptools import find_packages, setup


def read(rel_path):
@@ -36,6 +37,9 @@ def get_version(rel_path):
     license="MIT",
     classifiers=["Programming Language :: Python :: 3"],
     install_requires=[],
-    packages=find_packages(),
+    extras_require={"dev": ["pubchempy", "black", "pytest", "isort"]},
+    packages=find_packages(
+        exclude=["test*", "docs*", "examples*"], include=["py2opsin*"]
+    ),
     include_package_data=True,
 )