Merge pull request #170 from WMD-group/0.6.1_updates
Fix `parse_species` to handle non-integer oxidation states
AntObi authored Sep 18, 2024
2 parents c6b93a7 + cb93b20 commit 6c8bd34
Showing 10 changed files with 213 additions and 48 deletions.
4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -53,3 +53,7 @@ repos:
args: [--toml, pyproject.toml]
additional_dependencies:
- tomli
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.18.0
hooks:
- id: blacken-docs
52 changes: 35 additions & 17 deletions README.md
Expand Up @@ -71,21 +71,25 @@ With -e pip will create links to the source folder so that changes to the code w

For simple usage, you can instantiate an Embedding object using one of the embeddings in the [data directory](src/elementembeddings/data/element_representations/README.md). For this example, let's use the magpie elemental representation.

```python
```pycon
# Import the class
>>> from elementembeddings.core import Embedding

# Load the magpie data
>>> magpie = Embedding.load_data('magpie')
>>> magpie = Embedding.load_data("magpie")
```

We can access some of the properties of the `Embedding` class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.

```python
```pycon
# Print out some of the properties of the ElementEmbeddings class
>>> print(f'The magpie representation has embeddings of dimension {magpie.dim}')
>>> print(f'The magpie representation contains these elements: \n {magpie.element_list}') # prints out all the elements considered for this representation
>>> print(f'The magpie representation contains these features: \n {magpie.feature_labels}') # Prints out the feature labels of the chosen representation
>>> print(f"The magpie representation has embeddings of dimension {magpie.dim}")
>>> print(
... f"The magpie representation contains these elements: \n {magpie.element_list}"
... ) # prints out all the elements considered for this representation
>>> print(
... f"The magpie representation contains these features: \n {magpie.feature_labels}"
... ) # Prints out the feature labels of the chosen representation

The magpie representation has embeddings of dimension 22
The magpie representation contains these elements:
Expand All @@ -102,26 +106,40 @@ We can quickly generate heatmaps of distance/similarity measures between the ele
from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True) # Standardises the representation
magpie.standardise(inplace=True) # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6,6))
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(embedding=magpie, metric="cosine_similarity",show_axislabels=False,cmap="Blues_r",ax=ax, **heatmap_params)
heatmap_plotter(
embedding=magpie,
metric="cosine_similarity",
show_axislabels=False,
cmap="Blues_r",
ax=ax,
**heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()

```

<img src="resources/magpie_cosine_sim_heatmap.png" alt = "Cosine similarity heatmap of the magpie representation" width="50%"/>

```python
fig, ax = plt.subplots(1, 1, figsize=(6,6))

reducer_params={"n_neighbors": 30, "random_state":42}
scatter_params = {"s":100}

dimension_plotter(embedding=magpie, reducer="umap",n_components=2,ax=ax,adjusttext=True,reducer_params=reducer_params, scatter_params=scatter_params)
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
embedding=magpie,
reducer="umap",
n_components=2,
ax=ax,
adjusttext=True,
reducer_params=reducer_params,
scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
ax.legend().remove()
handles, labels = ax.get_legend_handles_labels()
Expand Down Expand Up @@ -149,7 +167,7 @@ The `composition_featuriser` function can be used to featurise the data. The com
```python
from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])
df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
```
Expand Down
3 changes: 2 additions & 1 deletion contributing.md
@@ -1,4 +1,4 @@
# Contributing
`# Contributing

This is a quick guide on how to follow best practice and contribute smoothly to `ElementEmbeddings`.

Expand Down Expand Up @@ -49,3 +49,4 @@ pre-commit run --all-files # optionally run hooks on all files
```

Pre-commit hooks will check all files when you commit changes, automatically fixing any files which are not formatted correctly. Those files will need to be staged again before re-attempting the commit.
`
4 changes: 2 additions & 2 deletions docs/embeddings/element.md
Expand Up @@ -162,8 +162,8 @@ The 118 200-dimensional vectors in `random_200_new` were generated using the fol
```python
import numpy as np

mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
mu, sigma = 0, 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
```

### skipatom
Expand Down
183 changes: 161 additions & 22 deletions docs/tutorials.md
Expand Up @@ -8,25 +8,150 @@ For simple usage, you can instantiate an Embedding object using one of the embed

```python
# Import the class
>>> from elementembeddings.core import Embedding
from elementembeddings.core import Embedding

# Load the magpie data
>>> magpie = Embedding.load_data('magpie')
magpie = Embedding.load_data("magpie")
```

We can access some of the properties of the `Embedding` class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.

```python
# Print out some of the properties of the ElementEmbeddings class
>>> print(f'The magpie representation has embeddings of dimension {magpie.dim}')
>>> print(f'The magpie representation contains these elements: \n {magpie.element_list}') # prints out all the elements considered for this representation
>>> print(f'The magpie representation contains these features: \n {magpie.feature_labels}') # Prints out the feature labels of the chosen representation

The magpie representation has embeddings of dimension 22
The magpie representation contains these elements:
['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']
The magpie representation contains these features:
['Number', 'MendeleevNumber', 'AtomicWeight', 'MeltingT', 'Column', 'Row', 'CovalentRadius', 'Electronegativity', 'NsValence', 'NpValence', 'NdValence', 'NfValence', 'NValence', 'NsUnfilled', 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled', 'GSvolume_pa', 'GSbandgap', 'GSmagmom', 'SpaceGroupNumber']
print(f"The magpie representation has embeddings of dimension {magpie.dim}")
print(
f"The magpie representation contains these elements: \n {magpie.element_list}"
) # prints out all the elements considered for this representation
print(
f"The magpie representation contains these features: \n {magpie.feature_labels}"
) # Prints out the feature labels of the chosen representation

# The magpie representation has embeddings of dimension 22
# The magpie representation contains these elements:
[
"H",
"He",
"Li",
"Be",
"B",
"C",
"N",
"O",
"F",
"Ne",
"Na",
"Mg",
"Al",
"Si",
"P",
"S",
"Cl",
"Ar",
"K",
"Ca",
"Sc",
"Ti",
"V",
"Cr",
"Mn",
"Fe",
"Co",
"Ni",
"Cu",
"Zn",
"Ga",
"Ge",
"As",
"Se",
"Br",
"Kr",
"Rb",
"Sr",
"Y",
"Zr",
"Nb",
"Mo",
"Tc",
"Ru",
"Rh",
"Pd",
"Ag",
"Cd",
"In",
"Sn",
"Sb",
"Te",
"I",
"Xe",
"Cs",
"Ba",
"La",
"Ce",
"Pr",
"Nd",
"Pm",
"Sm",
"Eu",
"Gd",
"Tb",
"Dy",
"Ho",
"Er",
"Tm",
"Yb",
"Lu",
"Hf",
"Ta",
"W",
"Re",
"Os",
"Ir",
"Pt",
"Au",
"Hg",
"Tl",
"Pb",
"Bi",
"Po",
"At",
"Rn",
"Fr",
"Ra",
"Ac",
"Th",
"Pa",
"U",
"Np",
"Pu",
"Am",
"Cm",
"Bk",
]
# The magpie representation contains these features:
[
"Number",
"MendeleevNumber",
"AtomicWeight",
"MeltingT",
"Column",
"Row",
"CovalentRadius",
"Electronegativity",
"NsValence",
"NpValence",
"NdValence",
"NfValence",
"NValence",
"NsUnfilled",
"NpUnfilled",
"NdUnfilled",
"NfUnfilled",
"NUnfilled",
"GSvolume_pa",
"GSbandgap",
"GSmagmom",
"SpaceGroupNumber",
]
```

### Plotting
Expand All @@ -37,26 +162,40 @@ We can quickly generate heatmaps of distance/similarity measures between the ele
from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True) # Standardises the representation
magpie.standardise(inplace=True) # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6,6))
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(embedding=magpie, metric="cosine_similarity",show_axislabels=False,cmap="Blues_r",ax=ax, **heatmap_params)
heatmap_plotter(
embedding=magpie,
metric="cosine_similarity",
show_axislabels=False,
cmap="Blues_r",
ax=ax,
**heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()

```

![Magpie cosine similarity heatmap](images/magpie_cosine_sim_heatmap.png)

```python
fig, ax = plt.subplots(1, 1, figsize=(6,6))

reducer_params={"n_neighbors": 30, "random_state":42}
scatter_params = {"s":100}

dimension_plotter(embedding=magpie, reducer="umap",n_components=2,ax=ax,adjusttext=True,reducer_params=reducer_params, scatter_params=scatter_params)
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
embedding=magpie,
reducer="umap",
n_components=2,
ax=ax,
adjusttext=True,
reducer_params=reducer_params,
scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
ax.legend().remove()
handles, labels = ax.get_legend_handles_labels()
Expand Down Expand Up @@ -84,7 +223,7 @@ The `composition_featuriser` function can be used to featurise the data. The com
```python
from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])
df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
```
Expand Down
2 changes: 1 addition & 1 deletion setup.py
Expand Up @@ -8,7 +8,7 @@

module_dir = os.path.dirname(os.path.abspath(__file__))

VERSION = "0.6"
VERSION = "0.6.1"
DESCRIPTION = "Element Embeddings"
with open(os.path.join(module_dir, "README.md"), encoding="utf-8") as f:
LONG_DESCRIPTION = f.read()
Expand Down
4 changes: 2 additions & 2 deletions src/elementembeddings/data/element_representations/README.md
Expand Up @@ -162,8 +162,8 @@ The 118 200-dimensional vectors in `random_200_new` were generated using the fol
```python
import numpy as np

mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
mu, sigma = 0, 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
```

### skipatom
Expand Down
2 changes: 1 addition & 1 deletion src/elementembeddings/plotter.py
Expand Up @@ -175,7 +175,7 @@ def dimension_plotter(
signs = [get_sign(charge) for _, charge in parsed_species]

species_labels = [
rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}}}$"
rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$"
for (element, charge), sign in zip(parsed_species, signs)
]

Expand Down
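
Note on the `plotter.py` hunk: in an f-string, `{{` and `}}` emit literal braces, so the old template closed one more literal brace than it opened. A minimal sketch of the difference, assuming illustrative values for `element`, `charge` and `sign` (they are placeholders, not values taken from the repository):

```python
# Illustrative inputs only; in plotter.py these come from the parsed species.
element, charge, sign = "Fe", 2.5, "+"

# Old template: six closing braces after {sign} give three literal "}".
old_label = rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}}}$"

# New template: four closing braces give two literal "}", matching the two "{".
new_label = rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$"

print(old_label)
print(new_label)
```

Counting braces in the two results shows the old label has two `{` and three `}`, while the new label is balanced.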
3 changes: 3 additions & 0 deletions src/elementembeddings/tests/test_utils.py
Expand Up @@ -57,3 +57,6 @@ def test_parse_species(self):
assert species.parse_species("Fe1-") == ("Fe", -1)
assert species.parse_species("Fe+") == ("Fe", 1)
assert species.parse_species("Fe-") == ("Fe", -1)
assert species.parse_species("Fe2.5+") == ("Fe", 2.5)
assert species.parse_species("Fe2.5-") == ("Fe", -2.5)
assert species.parse_species("Fe2.555+") == ("Fe", 2.555)
4 changes: 2 additions & 2 deletions src/elementembeddings/utils/species.py
Expand Up @@ -34,8 +34,8 @@ def _parse_species_old(species: str) -> tuple[str, int]:
"""
ele = re.match(r"[A-Za-z]+", species).group(0)

charge_match = re.search(r"\d+", species)
ox_state = int(charge_match.group(0)) if charge_match else 0
charge_match = re.search(r"(\d+\.\d+|\d+)", species)
ox_state = float(charge_match.group(1)) if charge_match else 0

if "-" in species:
ox_state *= -1
Expand Down
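
Note on the `species.py` hunk: the effect of the new pattern can be checked with a standalone sketch. `parse_species_sketch` and `old_magnitude` are hypothetical names used only here; the function mirrors the lines visible in this hunk and nothing else from `species.py`, so a string without digits (such as `Fe+`) falls back to 0 in this sketch rather than following the library's full handling.

```python
from __future__ import annotations

import re


def parse_species_sketch(species: str) -> tuple[str, float]:
    """Hypothetical stand-in that mirrors only the lines shown in the hunk."""
    element = re.match(r"[A-Za-z]+", species).group(0)

    # The decimal alternative is tried first, so "2.5" is captured whole.
    charge_match = re.search(r"(\d+\.\d+|\d+)", species)
    ox_state = float(charge_match.group(1)) if charge_match else 0.0

    if "-" in species:
        ox_state *= -1
    return element, ox_state


# The old pattern stops at the decimal point and keeps only the leading digits.
old_magnitude = int(re.search(r"\d+", "Fe2.5+").group(0))

print(old_magnitude)
print(parse_species_sketch("Fe2.5+"))
print(parse_species_sketch("Fe2.5-"))
```

Because the magnitude now goes through `float`, integer states such as `Fe3+` come back as `3.0`, which still compares equal to `3` in the existing assertions.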
