pymatgen.util package

The util package implements utilities that are commonly used by other pymatgen packages.

Subpackages

Submodules

pymatgen.util.coord module

Utilities for manipulating coordinates or lists of coordinates, under periodic boundary conditions or otherwise. Many of these are heavily vectorized in numpy for performance.

class Simplex(coords)[source]

Bases: MSONable

A generalized simplex object. See https://wikipedia.org/wiki/Simplex.

space_dim[source]

Dimension of the space. Usually, this is 1 more than the simplex_dim.

Type:

int

simplex_dim[source]

Dimension of the simplex coordinate space.

Type:

int

Initialize a Simplex from vertex coordinates.

Parameters:

coords ([[float]]) – Coords of the vertices of the simplex, e.g. [[1, 2, 3], [2, 4, 5], [6, 7, 8], [8, 9, 10]].

bary_coords(point)[source]
Parameters:

point (ArrayLike) – Point coordinates.

Returns:

Barycentric coordinates.

property coords: ndarray[source]

A copy of the vertex coordinates in the simplex.

in_simplex(point: Sequence[float], tolerance: float = 1e-08) bool[source]

Check if a point is in the simplex using the standard barycentric coordinate system algorithm.

Taking an arbitrary vertex as the origin, we compute a basis for the simplex by subtracting the origin from each of the other vertices. We then project the point into this coordinate system and determine its linear decomposition coefficients. The point lies in the simplex if all coefficients, together with the implied weight of the origin vertex (1 minus their sum), are non-negative up to the given tolerance.

Parameters:
  • point (list[float]) – Point to test

  • tolerance (float) – Tolerance to test if point is in simplex.
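A minimal NumPy sketch of the test described above, assuming a full-dimensional simplex given as a (d+1, d) array of vertices. The helper name in_simplex_sketch is hypothetical; this illustrates the algorithm and is not pymatgen's own code.

```python
import numpy as np


def in_simplex_sketch(vertices, point, tolerance=1e-8):
    """Hypothetical helper: is `point` inside the simplex spanned by `vertices`?

    Assumes `vertices` has shape (d + 1, d), i.e. a full-dimensional simplex.
    """
    vertices = np.asarray(vertices, dtype=float)
    point = np.asarray(point, dtype=float)
    origin = vertices[0]
    # Basis vectors: every other vertex minus the chosen origin vertex.
    basis = vertices[1:] - origin
    # Solve point - origin = sum_i coeffs[i] * basis[i] for the coefficients.
    coeffs = np.linalg.solve(basis.T, point - origin)
    # The origin vertex gets the remaining weight so all weights sum to 1.
    weights = np.append(1.0 - coeffs.sum(), coeffs)
    return bool(np.all(weights >= -tolerance))
```

For the unit triangle [[0, 0], [1, 0], [0, 1]], the point [0.2, 0.2] is inside, while [1, 1] is outside because the origin weight becomes -1.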

line_intersection(point1: Sequence[float], point2: Sequence[float], tolerance: float = 1e-08)[source]

Compute the intersection points of a line with a simplex.

Parameters:
  • point1 (Sequence[float]) – 1st point to determine the line.

  • point2 (Sequence[float]) – 2nd point to determine the line.

  • tolerance (float) – Tolerance for checking if an intersection is in the simplex. Defaults to 1e-8.

Returns:

points where the line intersects the simplex (0, 1, or 2).

point_from_bary_coords(bary_coords: ArrayLike)[source]
Parameters:

bary_coords (ArrayLike) – Barycentric coordinates (d+1, d).

Returns:

Point in the simplex.

Return type:

np.array

property volume: float[source]

Volume of the simplex.

all_distances(coords1: ArrayLike, coords2: ArrayLike) np.ndarray[source]

Get the distances between two lists of coordinates.

Parameters:
  • coords1 – First set of Cartesian coordinates.

  • coords2 – Second set of Cartesian coordinates.

Returns:

2d array of Cartesian distances. E.g., the distance between coords1[i] and coords2[j] is distances[i, j].
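For intuition, the same (n, m) distance matrix can be sketched with NumPy broadcasting. The helper all_distances_sketch is hypothetical; broadcasting builds an (n, m, 3) intermediate array, so a production implementation may prefer a more memory-efficient formulation.

```python
import numpy as np


def all_distances_sketch(coords1, coords2):
    """Hypothetical helper: pairwise Cartesian distances via broadcasting."""
    c1 = np.asarray(coords1, dtype=float)
    c2 = np.asarray(coords2, dtype=float)
    # (n, 1, 3) - (1, m, 3) broadcasts to (n, m, 3) difference vectors.
    diff = c1[:, None, :] - c2[None, :, :]
    # Euclidean norm over the last axis gives distances[i, j].
    return np.sqrt(np.sum(diff**2, axis=-1))
```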

barycentric_coords(coords, simplex)[source]

Convert a list of coordinates to barycentric coordinates, given a simplex with d+1 points. Only works for d >= 2.

Parameters:
  • coords – list of n coords to transform, shape should be (n,d)

  • simplex – list of coordinates that form the simplex, shape should be (d+1, d)

Returns:

a list of barycentric coordinates (even if the original input was 1d)
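A sketch of the conversion, assuming NumPy and a simplex passed as a (d+1, d) array: solve one linear system for the weights of the first d vertices (relative to the last vertex), then give the last vertex the remaining weight so each row sums to 1. barycentric_coords_sketch is a hypothetical name, not pymatgen's API.

```python
import numpy as np


def barycentric_coords_sketch(coords, simplex):
    """Hypothetical helper: barycentric coordinates of n points in a d-simplex.

    coords: shape (n, d) or (d,); simplex: shape (d + 1, d).
    """
    coords = np.atleast_2d(np.asarray(coords, dtype=float))
    simplex = np.asarray(simplex, dtype=float)
    last = simplex[-1]
    # Columns of t are the edge vectors from the last vertex to the others.
    t = (simplex[:-1] - last).T
    # Solve t @ lam = p - last for every point at once (one column per point).
    all_but_one = np.linalg.solve(t, (coords - last).T).T
    # The last vertex takes whatever weight is left over.
    last_coord = 1.0 - all_but_one.sum(axis=1, keepdims=True)
    return np.hstack([all_but_one, last_coord])
```

Multiplying the result by the simplex vertices recovers the original points, which is a convenient self-check.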

coord_list_mapping(subset: ArrayLike, superset: ArrayLike, atol: float = 1e-08)[source]

Get the index mapping from a subset to a superset. Subset and superset cannot contain duplicate rows.

Parameters:
  • subset (ArrayLike) – List of coords

  • superset (ArrayLike) – List of coords

  • atol (float) – Absolute tolerance. Defaults to 1e-8.

Returns:

list of indices such that superset[indices] = subset
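One way to compute such a mapping, sketched with NumPy under the same no-duplicates assumption. coord_list_mapping_sketch is hypothetical; it builds an (n_subset, n_superset) match table and raises if any subset row does not match exactly one superset row.

```python
import numpy as np


def coord_list_mapping_sketch(subset, superset, atol=1e-8):
    """Hypothetical helper: indices such that superset[indices] matches subset."""
    sub = np.asarray(subset, dtype=float)
    sup = np.asarray(superset, dtype=float)
    # matches[i, j] is True when subset[i] equals superset[j] within atol.
    matches = np.all(np.abs(sub[:, None, :] - sup[None, :, :]) < atol, axis=-1)
    if not np.all(matches.sum(axis=1) == 1):
        raise ValueError("Each subset row must match exactly one superset row")
    # argmax returns the column of the single True entry in each row.
    return np.argmax(matches, axis=1)
```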

coord_list_mapping_pbc(subset, superset, atol: float = 1e-08, pbc: PbcLike = (True, True, True))[source]

Get the index mapping from a subset to a superset. Superset cannot contain duplicate matching rows.

Parameters:
  • subset (ArrayLike) – List of frac_coords

  • superset (ArrayLike) – List of frac_coords

  • atol (float) – Absolute tolerance. Defaults to 1e-8.

  • pbc (tuple) – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

list of indices such that superset[indices] = subset

find_in_coord_list(coord_list, coord, atol: float = 1e-08)[source]

Find the indices of matches of a particular coord in a coord_list.

Parameters:
  • coord_list – List of coords to test

  • coord – Specific coordinates

  • atol – Absolute tolerance. Defaults to 1e-8. Accepts both scalar and array.

Returns:

Indices of matches, e.g. [0, 1, 2, 3]. Empty list if not found.
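A hypothetical NumPy sketch of this lookup; find_in_coord_list_sketch is not pymatgen's function. An in_coord_list-style check then reduces to testing whether the returned index array is non-empty.

```python
import numpy as np


def find_in_coord_list_sketch(coord_list, coord, atol=1e-8):
    """Hypothetical helper: indices of rows in coord_list equal to coord within atol."""
    if len(coord_list) == 0:
        return np.array([], dtype=int)
    diff = np.asarray(coord_list, dtype=float) - np.asarray(coord, dtype=float)[None, :]
    # atol may be a scalar or a per-component array; both broadcast here.
    return np.where(np.all(np.abs(diff) < atol, axis=1))[0]
```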

find_in_coord_list_pbc(frac_coord_list, frac_coord, atol: float = 1e-08, pbc: PbcLike = (True, True, True)) np.ndarray[source]

Get the indices of all points in a fractional coord list that are equal to a fractional coord (with a tolerance), taking into account periodic boundary conditions.

Parameters:
  • frac_coord_list – List of fractional coords

  • frac_coord – A specific fractional coord to test.

  • atol – Absolute tolerance. Defaults to 1e-8.

  • pbc – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

Indices of matches, e.g. [0, 1, 2, 3]. Empty list if not found.

get_angle(v1: ArrayLike, v2: ArrayLike, units: Literal['degrees', 'radians'] = 'degrees') float[source]

Calculate the angle between two vectors.

Parameters:
  • v1 – Vector 1

  • v2 – Vector 2

  • units – “degrees” or “radians”. Defaults to “degrees”.

Returns:

Angle between them in the requested units.

get_linear_interpolated_value(x_values: ArrayLike, y_values: ArrayLike, x: float) float[source]

Get an interpolated value by linear interpolation between two values. This method is written to avoid dependency on scipy, which causes issues on threading servers.

Parameters:
  • x_values – Sequence of x values.

  • y_values – Corresponding sequence of y values

  • x – Get value at particular x

Returns:

Value at x.
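A dependency-free sketch of the idea using the standard-library bisect module; linear_interpolate_sketch is hypothetical, and raising ValueError outside the sampled range is an assumed behavior. NumPy's np.interp performs the same interpolation but clamps to the end values instead of raising.

```python
from bisect import bisect_left


def linear_interpolate_sketch(x_values, y_values, x):
    """Hypothetical helper: linear interpolation without scipy."""
    # Sort the (x, y) pairs by x so that bisection works.
    pairs = sorted(zip(x_values, y_values))
    xs = [p[0] for p in pairs]
    i = bisect_left(xs, x)
    if i == len(xs) or (i == 0 and xs[0] != x):
        raise ValueError(f"x={x} is outside the range [{xs[0]}, {xs[-1]}]")
    if xs[i] == x:
        return pairs[i][1]
    # xs[i - 1] < x < xs[i]: interpolate on the bracketing segment.
    (x1, y1), (x2, y2) = pairs[i - 1], pairs[i]
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
```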

in_coord_list(coord_list, coord, atol: float = 1e-08) bool[source]

Test if a particular coord is within a coord_list.

Parameters:
  • coord_list – List of coords to test

  • coord – Specific coordinates

  • atol – Absolute tolerance. Defaults to 1e-8. Accepts both scalar and array.

Returns:

True if coord is in the coord list.

Return type:

bool

in_coord_list_pbc(fcoord_list, fcoord, atol: float = 1e-08, pbc: PbcLike = (True, True, True)) bool[source]

Test if a particular fractional coord is within a fractional coord_list.

Parameters:
  • fcoord_list – List of fractional coords to test

  • fcoord – A specific fractional coord to test.

  • atol – Absolute tolerance. Defaults to 1e-8.

  • pbc – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

True if coord is in the coord list.

Return type:

bool

is_coord_subset(subset: ArrayLike, superset: ArrayLike, atol: float = 1e-08) bool[source]

Test if all coords in subset are contained in superset. Doesn’t use periodic boundary conditions.

Parameters:
  • subset (ArrayLike) – List of coords

  • superset (ArrayLike) – List of coords

  • atol (float) – Absolute tolerance for comparing coordinates. Defaults to 1e-8.

Returns:

True if all of subset is in superset.

Return type:

bool

is_coord_subset_pbc(subset, superset, atol: float = 1e-08, mask=None, pbc: PbcLike = (True, True, True)) bool[source]

Test if all fractional coords in subset are contained in superset.

Parameters:
  • subset (list) – List of fractional coords to test

  • superset (list) – List of fractional coords to test against

  • atol (float or size 3 array) – Tolerance for matching

  • mask (boolean array) – Mask of matches that are not allowed. I.e., if mask[1, 2] is True, then subset[1] cannot be matched to superset[2].

  • pbc (tuple) – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

True if all of subset is in superset.

Return type:

bool

lattice_points_in_supercell(supercell_matrix)[source]

Get the list of points on the original lattice contained in the supercell, in fractional coordinates (with the supercell basis). For example, the supercell matrix [[2, 0, 0], [0, 1, 0], [0, 0, 1]] returns [[0, 0, 0], [0.5, 0, 0]].

Parameters:

supercell_matrix – 3x3 matrix describing the supercell

Returns:

numpy array of the fractional coordinates
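A brute-force sketch, assuming the rows of supercell_matrix are the supercell vectors expressed in the original lattice basis (consistent with the example above): enumerate integer points in a bounding box, convert them to supercell fractional coordinates, and keep those in [0, 1). lattice_points_in_supercell_sketch is hypothetical.

```python
import itertools

import numpy as np


def lattice_points_in_supercell_sketch(supercell_matrix):
    """Hypothetical helper: original-lattice points inside a supercell.

    Rows of supercell_matrix are assumed to be the supercell vectors in the
    original lattice basis. Returns fractional coords in the supercell basis.
    """
    m = np.asarray(supercell_matrix, dtype=float)
    # Map the 8 corners of the unit cube into original-lattice coordinates to
    # get an integer bounding box containing every candidate lattice point.
    corners = np.array(list(itertools.product([0, 1], repeat=3))) @ m
    lo = np.floor(corners.min(axis=0)).astype(int)
    hi = np.ceil(corners.max(axis=0)).astype(int)
    ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
    candidates = np.array(list(itertools.product(*ranges)))
    # Convert to supercell fractional coordinates and keep those in [0, 1).
    frac = candidates @ np.linalg.inv(m)
    eps = 1e-10
    inside = np.all((frac >= -eps) & (frac < 1 - eps), axis=1)
    points = frac[inside]
    # Sanity check: the supercell holds |det(M)| original lattice points.
    assert len(points) == round(abs(np.linalg.det(m)))
    return points
```

The determinant check is a cheap guard against tolerance problems: a supercell with |det M| = N must contain exactly N points of the original lattice.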

pbc_diff(frac_coords1: ArrayLike, frac_coords2: ArrayLike, pbc: PbcLike = (True, True, True))[source]

Get the ‘fractional distance’ between two coordinates taking into account periodic boundary conditions.

Parameters:
  • frac_coords1 – First set of fractional coordinates. e.g. [0.5, 0.6, 0.7] or [[1.1, 1.2, 4.3], [0.5, 0.6, 0.7]]. It can be a single coord or any array of coords.

  • frac_coords2 – Second set of fractional coordinates.

  • pbc – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

Fractional distance. Along periodic axes, each component a of the result satisfies abs(a) <= 0.5. Examples:

  • pbc_diff([0.1, 0.1, 0.1], [0.3, 0.5, 0.9]) = [-0.2, -0.4, 0.2]

  • pbc_diff([0.9, 0.1, 1.01], [0.3, 0.5, 0.9]) = [-0.4, -0.4, 0.11]
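The minimum-image rule behind these examples can be sketched in a few lines of NumPy: subtract the nearest integer from each component along periodic axes. pbc_diff_sketch is a hypothetical name; it reproduces both examples above and broadcasts over arrays of coordinates.

```python
import numpy as np


def pbc_diff_sketch(frac_coords1, frac_coords2, pbc=(True, True, True)):
    """Hypothetical helper: minimum-image fractional difference."""
    diff = np.subtract(frac_coords1, frac_coords2)
    # Along periodic axes, subtract the nearest integer so each component
    # lands in [-0.5, 0.5]; non-periodic axes are left untouched.
    return diff - np.round(diff) * np.asarray(pbc, dtype=float)
```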

pbc_shortest_vectors(lattice, frac_coords1, frac_coords2, mask=None, return_d2: bool = False)[source]

Get the shortest vectors between two lists of coordinates taking into account periodic boundary conditions and the lattice.

Parameters:
  • lattice – lattice to use

  • frac_coords1 – First set of fractional coordinates. e.g. [0.5, 0.6, 0.7] or [[1.1, 1.2, 4.3], [0.5, 0.6, 0.7]]. It can be a single coord or any array of coords.

  • frac_coords2 – Second set of fractional coordinates.

  • mask (boolean array) – Mask of matches that are not allowed. I.e., if mask[1, 2] is True, then frac_coords1[1] cannot be matched to frac_coords2[2].

  • return_d2 (bool) – Whether to also return the squared distances.

Returns:

Array of displacement vectors from frac_coords1 to frac_coords2. The first index is the frac_coords1 index and the second is the frac_coords2 index.

Return type:

np.ndarray
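A brute-force sketch for intuition only: wrap the fractional displacements with the minimum-image rule, then test the 27 neighboring images and keep the shortest Cartesian vector. pbc_shortest_vectors_sketch is hypothetical, takes a plain (3, 3) matrix with lattice vectors as rows instead of a Lattice object, and omits mask and return_d2. Checking only 27 images assumes a reasonably reduced (e.g. LLL-reduced) cell; for strongly skewed cells it can miss the true minimum.

```python
import itertools

import numpy as np


def pbc_shortest_vectors_sketch(lattice_matrix, frac_coords1, frac_coords2):
    """Hypothetical brute-force helper: shortest Cartesian vectors under PBC.

    lattice_matrix: (3, 3) array whose rows are the lattice vectors.
    Returns an array of shape (n1, n2, 3), pointing from coords1 to coords2.
    """
    lat = np.asarray(lattice_matrix, dtype=float)
    f1 = np.atleast_2d(np.asarray(frac_coords1, dtype=float))
    f2 = np.atleast_2d(np.asarray(frac_coords2, dtype=float))
    # Wrap the fractional displacements into [-0.5, 0.5] first ...
    dfrac = f2[None, :, :] - f1[:, None, :]
    dfrac -= np.round(dfrac)
    # ... then try the 27 neighboring images, since for skewed cells the
    # minimum-image fractional vector is not always the shortest Cartesian one.
    images = np.array(list(itertools.product([-1, 0, 1], repeat=3)), dtype=float)
    cart = (dfrac[:, :, None, :] + images[None, None, :, :]) @ lat
    d2 = np.sum(cart**2, axis=-1)
    best = np.argmin(d2, axis=-1)
    n1, n2 = best.shape
    return cart[np.arange(n1)[:, None], np.arange(n2)[None, :], best]
```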

pymatgen.util.coord_cython module

Utilities for manipulating coordinates or list of coordinates, under periodic boundary conditions or otherwise.

coord_list_mapping_pbc(subset, superset, atol=1e-08, pbc=(True, True, True))[source]

Gives the index mapping from a subset to a superset. Superset cannot contain duplicate matching rows.

Parameters:
  • subset – List of frac_coords.

  • superset – List of frac_coords.

  • atol – Absolute tolerance. Defaults to 1e-8.

  • pbc – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

list of indices such that superset[indices] = subset

is_coord_subset_pbc(subset, superset, atol, mask, pbc=(True, True, True))[source]

Tests if all fractional coords in subset are contained in superset. Allows specification of a mask determining pairs that are not allowed to match to each other.

Parameters:
  • subset – List of fractional coords.

  • superset – List of fractional coords.

  • pbc – A tuple defining the periodic boundary conditions along the three axes of the lattice.

Returns:

True if all of subset is in superset.

pbc_shortest_vectors(lattice, fcoords1, fcoords2, mask=None, return_d2=False, lll_frac_tol=None)[source]

Get the shortest vectors between two lists of coordinates taking into account periodic boundary conditions and the lattice.

Parameters:
  • lattice – lattice to use

  • fcoords1 – First set of fractional coordinates. e.g., [0.5, 0.6, 0.7] or [[1.1, 1.2, 4.3], [0.5, 0.6, 0.7]]. Must be np.float64

  • fcoords2 – Second set of fractional coordinates.

  • mask (int_ array) – Mask of matches that are not allowed. I.e., if mask[1, 2] is True, then fcoords1[1] cannot be matched to fcoords2[2].

  • lll_frac_tol (float_ array of length 3) – Fractional tolerance (per LLL lattice vector) over which the calculation of minimum vectors will be skipped. Can speed up calculation considerably for large structures.

Returns:

Array of displacement vectors from fcoords1 to fcoords2. The first index is the fcoords1 index and the second is the fcoords2 index.

Return type:

np.array

pymatgen.util.due module

Stub file for a guaranteed-safe import of duecredit constructs when duecredit is not available.

Then use in your code as

from .due import due, Doi, BibTeX, Text

See https://github.com/duecredit/duecredit/blob/master/README.md for examples.

Origin: Originally a part of duecredit

Copyright: 2015-2021 DueCredit developers

License: BSD-2

BibTeX(*args, **kwargs)[source]

Perform no good and no bad.

Doi(*args, **kwargs)[source]

Perform no good and no bad.

class InactiveDueCreditCollector[source]

Bases: object

A stub for the Collector that does nothing.

activate(*args, **kwargs)[source]

Perform no good and no bad.

active = False[source]

add(*args, **kwargs)[source]

Perform no good and no bad.

cite(*args, **kwargs)[source]

Perform no good and no bad.

dcite(*args, **kwargs)[source]

If I could cite I would.

dump(*args, **kwargs)[source]

Perform no good and no bad.

load(*args, **kwargs)[source]

Perform no good and no bad.

Text(*args, **kwargs)[source]

Perform no good and no bad.

Url(*args, **kwargs)[source]

Perform no good and no bad.

pymatgen.util.graph_hashing module

Copyright (C) 2004-2022, NetworkX Developers Aric Hagberg <hagberg@lanl.gov> Dan Schult <dschult@colgate.edu> Pieter Swart <swart@lanl.gov> All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the NetworkX Developers nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Functions for hashing graphs to strings. Isomorphic graphs should be assigned identical hashes. For now, only Weisfeiler-Lehman hashing is implemented.

weisfeiler_lehman_graph_hash(graph: nx.Graph, edge_attr=None, node_attr=None, iterations=3, digest_size=16)[source]

Return Weisfeiler Lehman (WL) graph hash.

The function iteratively aggregates and hashes neighborhoods of each node. After each node’s neighbors are hashed to obtain updated node labels, a hashed histogram of resulting labels is returned as the final hash.

Hashes are identical for isomorphic graphs, and there are strong guarantees that non-isomorphic graphs will get different hashes. See [1] for details.

If no node or edge attributes are provided, the degree of each node is used as its initial label. Otherwise, node and/or edge labels are used to compute the hash.

Parameters:
  • graph – nx.Graph The graph to be hashed. Can have node and/or edge attributes. Can also have no attributes.

  • edge_attr – string, default=None The key in edge attribute dictionary to be used for hashing. If None, edge labels are ignored.

  • node_attr – string, default=None The key in node attribute dictionary to be used for hashing. If None, and no edge_attr given, use the degrees of the nodes as labels.

  • iterations – int, default=3 Number of neighbor aggregations to perform. Should be larger for larger graphs.

  • digest_size – int, default=16 Size (in bytes) of the blake2b hash digest to use for hashing node labels.

Returns:

h – Hexadecimal string corresponding to the hash of the input graph.

Return type:

str

Notes

To return the WL hashes of each subgraph of a graph, use weisfeiler_lehman_subgraph_hashes

Similarity between hashes does not imply similarity between graphs.

References

[1] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler Lehman Graph Kernels. Journal of Machine Learning Research. 2011. https://www.jmlr.org/papers/volume12/shervashidze11a/shervashidze11a.pdf

See also

weisfeiler_lehman_subgraph_hashes
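To illustrate the iteration described above, here is a simplified, dependency-free sketch for unlabeled graphs given as adjacency dicts: degrees as initial labels, blake2b-hashed neighborhood aggregation, and a final hash of the label histogram. wl_graph_hash_sketch is hypothetical, ignores node_attr and edge_attr, and does not produce the same strings as weisfeiler_lehman_graph_hash.

```python
import hashlib
from collections import Counter


def wl_graph_hash_sketch(adjacency, iterations=3, digest_size=16):
    """Hypothetical Weisfeiler-Lehman hash of an unlabeled, undirected graph.

    adjacency: dict mapping each node to a list (or set) of its neighbors.
    Node degrees serve as the initial labels.
    """

    def _hash(label):
        # digest_size is in bytes, so hexdigest() has 2 * digest_size chars.
        return hashlib.blake2b(label.encode("ascii"), digest_size=digest_size).hexdigest()

    labels = {node: str(len(nbrs)) for node, nbrs in adjacency.items()}
    histogram = Counter()
    for _ in range(iterations):
        # New label = hash of own label + sorted multiset of neighbor labels.
        # The comprehension reads the previous iteration's labels throughout.
        labels = {
            node: _hash(labels[node] + "".join(sorted(labels[nbr] for nbr in nbrs)))
            for node, nbrs in adjacency.items()
        }
        histogram.update(labels.values())
    # Hash the sorted label histogram so node naming and order do not matter.
    return _hash(str(sorted(histogram.items())))
```

Relabeling the nodes of a graph leaves the hash unchanged, which is the isomorphism-invariance property described above; the 32-character output for digest_size=16 also reflects that the digest size counts bytes.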

weisfeiler_lehman_subgraph_hashes(graph, edge_attr=None, node_attr=None, iterations=3, digest_size=16)[source]

Return a dictionary of subgraph hashes by node.

The dictionary is keyed by node to a list of hashes in increasingly sized induced subgraphs containing the nodes within 2*k edges of the key node for increasing integer k until all nodes are included.

The function iteratively aggregates and hashes neighborhoods of each node. This is achieved for each step by replacing for each node its label from the previous iteration with its hashed 1-hop neighborhood aggregate. The new node label is then appended to a list of node labels for each node.

To aggregate neighborhoods at each step for a node $n$, all labels of nodes adjacent to $n$ are concatenated. If the edge_attr parameter is set, labels for each neighboring node are prefixed with the value of this attribute along the connecting edge from this neighbor to node $n$. The resulting string is then hashed to compress this information into a fixed digest size.

Thus, at the $i$th iteration nodes within $2i$ distance influence any given hashed node label. We can therefore say that at depth $i$ for node $n$ we have a hash for a subgraph induced by the $2i$-hop neighborhood of $n$.

Can be used to create general Weisfeiler-Lehman graph kernels, or to generate features for graphs or nodes, for example to generate ‘words’ in a graph as seen in the ‘graph2vec’ algorithm. See [1] and [2], respectively, for details.

Hashes are identical for isomorphic subgraphs, and there are strong guarantees that non-isomorphic graphs will get different hashes. See [1] for details.

If no node or edge attributes are provided, the degree of each node is used as its initial label. Otherwise, node and/or edge labels are used to compute the hash.

Parameters:
  • graph – nx.Graph The graph to be hashed. Can have node and/or edge attributes. Can also have no attributes.

  • edge_attr – string, default=None The key in edge attribute dictionary to be used for hashing. If None, edge labels are ignored.

  • node_attr – string, default=None The key in node attribute dictionary to be used for hashing. If None, and no edge_attr given, use the degrees of the nodes as labels.

  • iterations – int, default=3 Number of neighbor aggregations to perform. Should be larger for larger graphs.

  • digest_size – int, default=16 Size (in bytes) of the blake2b hash digest to use for hashing node labels. The default size is 16 bytes.

Returns:

node_subgraph_hashes – A dictionary with each key given by a node in graph, and each value given by the subgraph hashes in order of depth from the key node.

Return type:

dict

Notes

To hash the full graph when subgraph hashes are not needed, use weisfeiler_lehman_graph_hash for efficiency.

Similarity between hashes does not imply similarity between graphs.

References

[1] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler Lehman Graph Kernels. Journal of Machine Learning Research. 2011. https://www.jmlr.org/papers/volume12/shervashidze11a/shervashidze11a.pdf

[2] Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. graph2vec: Learning Distributed Representations of Graphs. arXiv. 2017. https://arxiv.org/pdf/1707.05005.pdf

See also

weisfeiler_lehman_graph_hash

pymatgen.util.io_utils module

This module provides utility classes for io operations.

clean_lines(string_list: list[str], remove_empty_lines: bool = True, rstrip_only: bool = False) Iterator[str][source]

Strips whitespace, carriage returns and empty lines from a list of strings.

Parameters:
  • string_list (list[str]) – List of strings.

  • remove_empty_lines (bool) – Set to True to skip lines which are empty after stripping.

  • rstrip_only (bool) – Set to True to strip trailing whitespaces only (i.e., to retain leading whitespaces). Defaults to False.

Yields:

str – Cleaned strings.
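A minimal sketch of the documented behavior as a generator; clean_lines_sketch is hypothetical and covers only the three options described above, not any other processing the real function may perform.

```python
from collections.abc import Iterable, Iterator


def clean_lines_sketch(
    string_list: Iterable[str],
    remove_empty_lines: bool = True,
    rstrip_only: bool = False,
) -> Iterator[str]:
    """Hypothetical generator mirroring the documented clean_lines options."""
    for string in string_list:
        # str.strip()/rstrip() with no argument also removes "\r" and "\n".
        clean = string.rstrip() if rstrip_only else string.strip()
        if clean or not remove_empty_lines:
            yield clean
```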

micro_pyawk(filename, search, results=None, debug=None, postdebug=None)[source]

Small awk-mimicking search routine.

‘filename’ is the file to search through. ‘search’ is the “search program”, a list of lists/tuples with 3 elements, i.e. [[regex, test, run], [regex, test, run], …]. ‘results’ is an object that your search program will have access to for storing results.

Here regex is either a compiled regex object or a string that is compiled into one. test and run are callable objects.

This function goes through each line in filename, and if regex matches that line and test(results,line)==True (or test is None) we execute run(results,match), where match is the match object from running Regex.match.

The default results is an empty dictionary. Passing a results object lets you interact with it in run() and test(). Hence, it is often convenient to use results=self.

Author: Rickard Armiento, Ioannis Petousis

Returns:

The results dictionary.

Return type:

dict[str, Any]
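A sketch of the core loop, assuming the lines are already available as an iterable of strings (the real function takes a filename and also supports debug hooks). micro_pyawk_sketch is hypothetical, and it uses re.search, so a pattern may match anywhere in a line.

```python
import re


def micro_pyawk_sketch(lines, search, results=None):
    """Hypothetical awk-like loop over an iterable of text lines.

    search is a list of [regex, test, run] entries; test may be None.
    """
    if results is None:
        results = {}
    # Compile string patterns once; compiled patterns are used as given.
    program = [
        (re.compile(regex) if isinstance(regex, str) else regex, test, run)
        for regex, test, run in search
    ]
    for line in lines:
        for regex, test, run in program:
            match = regex.search(line)
            # run() fires only when the regex matches and test() (if any) agrees.
            if match and (test is None or test(results, line)):
                run(results, match)
    return results
```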

pymatgen.util.joblib module

This module provides utility functions for displaying a progress bar with joblib.

set_python_warnings(warnings)[source]

Context manager to set the PYTHONWARNINGS environment variable to the given value. This is useful for preventing spam when using parallel processing.
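A minimal standard-library sketch of such a context manager; set_python_warnings_sketch is hypothetical. PYTHONWARNINGS is read when an interpreter starts, so a setting like this affects worker processes spawned inside the block (as in parallel processing), not the warning filters of the current process.

```python
import os
from contextlib import contextmanager


@contextmanager
def set_python_warnings_sketch(value):
    """Hypothetical context manager that temporarily sets PYTHONWARNINGS."""
    original = os.environ.get("PYTHONWARNINGS")
    os.environ["PYTHONWARNINGS"] = value
    try:
        yield
    finally:
        # Restore the previous state, including "unset".
        if original is None:
            del os.environ["PYTHONWARNINGS"]
        else:
            os.environ["PYTHONWARNINGS"] = original
```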

tqdm_joblib(tqdm_object: tqdm) Iterator[None][source]

Context manager to patch joblib to report into tqdm progress bar given as argument.

pymatgen.util.misc module

Other util functions.

is_np_dict_equal(dict1, dict2, /) bool[source]

Compare two dicts whose values may be NumPy arrays.

Parameters:
  • dict1 (dict) – The first dict.

  • dict2 (dict) – The second dict.

Returns:

Whether these two dicts are equal.

Return type:

bool
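A minimal sketch, assuming flat dicts. Plain dict equality is unreliable here because == on NumPy arrays is element-wise and the truth value of the resulting array is ambiguous; is_np_dict_equal_sketch (hypothetical) compares each value with np.array_equal instead.

```python
import numpy as np


def is_np_dict_equal_sketch(dict1, dict2, /):
    """Hypothetical helper: compare dicts whose values may be NumPy arrays."""
    if dict1.keys() != dict2.keys():
        return False
    # np.array_equal also handles scalars and lists, returning a single bool.
    return all(np.array_equal(dict1[key], dict2[key]) for key in dict1)
```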

pymatgen.util.num module

This module provides utilities for basic math operations.

make_symmetric_matrix_from_upper_tri(val)[source]

Given a symmetric matrix in upper triangular form as a flat array indexed as [A_xx, A_yy, A_zz, A_xy, A_xz, A_yz], generate the full matrix [[A_xx, A_xy, A_xz], [A_xy, A_yy, A_yz], [A_xz, A_yz, A_zz]].
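The index layout above can be sketched directly with NumPy; symmetric_from_upper_tri_sketch is hypothetical. This ordering (xy, xz, yz for the off-diagonal entries) differs from Voigt notation, which orders them yz, xz, xy, so inputs from other tools may need reordering.

```python
import numpy as np


def symmetric_from_upper_tri_sketch(val):
    """Hypothetical helper: [A_xx, A_yy, A_zz, A_xy, A_xz, A_yz] -> 3x3 matrix."""
    a_xx, a_yy, a_zz, a_xy, a_xz, a_yz = val
    # Mirror each off-diagonal entry across the diagonal.
    return np.array(
        [
            [a_xx, a_xy, a_xz],
            [a_xy, a_yy, a_yz],
            [a_xz, a_yz, a_zz],
        ]
    )
```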

round_to_sigfigs(num, sig_figs)[source]

Round a number to a specific number of significant figures instead of to a specific precision.
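A standard-library sketch of significant-figure rounding: find the decimal exponent of the leading digit with log10, then pass the matching number of decimals to round. round_to_sigfigs_sketch is hypothetical, and rejecting non-positive sig_figs is an assumed behavior.

```python
import math


def round_to_sigfigs_sketch(num, sig_figs):
    """Hypothetical helper: round num to sig_figs significant figures."""
    if not isinstance(sig_figs, int) or sig_figs <= 0:
        raise ValueError("sig_figs must be a positive integer")
    if num == 0:
        return num
    # Exponent of the leading digit, e.g. 3 for 1234.5 and -3 for 0.00123.
    magnitude = math.floor(math.log10(abs(num)))
    # Negative ndigits rounds to tens, hundreds, and so on.
    return round(num, sig_figs - 1 - magnitude)
```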

pymatgen.util.numba module

This module provides a wrapper for numba such that no functionality is lost if numba is not available. Numba is a just-in-time compiler that can significantly accelerate the evaluation of certain functions if installed.

jit(func)[source]

No-op replacement for numba.jit, used when numba is not installed.

njit(func)[source]

No-op replacement for numba.njit, used when numba is not installed.
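The fallback pattern can be sketched as follows (a hypothetical module, not pymatgen's source; sum_of_squares is an illustrative function). The no-op stand-ins only support bare @jit/@njit decoration; a call such as @njit(cache=True) would fail when numba is missing.

```python
try:
    from numba import jit, njit  # real JIT compilers when numba is installed
except ImportError:

    def jit(func):
        """No-op stand-in: return the function unchanged."""
        return func

    def njit(func):
        """No-op stand-in: return the function unchanged."""
        return func


@njit
def sum_of_squares(n):
    # Compiled by numba if available, otherwise runs as plain Python.
    total = 0
    for i in range(n):
        total += i * i
    return total
```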

pymatgen.util.plotting module

Utilities for generating nicer plots.

add_fig_kwargs(func)[source]

Decorator that adds keyword arguments for functions returning matplotlib figures.

The function should return either a matplotlib figure or None to signal some sort of error/unexpected event. See doc string below for the list of supported options.

format_formula(formula: str) str[source]

Convert a chemical formula string into LaTeX format for labeling purposes.

Parameters:

formula (str) – Chemical formula
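A regular-expression sketch of the conversion, assuming that every number in the formula (including decimals) should become a LaTeX subscript and that the result is wrapped in math mode. format_formula_sketch is hypothetical; pymatgen's exact output string may differ.

```python
import re


def format_formula_sketch(formula: str) -> str:
    """Hypothetical helper: 'Li2Fe3(PO4)3' -> '$Li_{2}Fe_{3}(PO_{4})_{3}$'."""
    # Wrap every (possibly decimal) number in a LaTeX subscript.
    return "$" + re.sub(r"(\d+(?:\.\d+)?)", r"_{\1}", formula) + "$"
```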

get_ax3d_fig(ax: Axes = None, **kwargs) tuple[Axes3D, Figure][source]

Helper function used in plot functions supporting an optional Axes3D argument. If ax is None, we build the matplotlib figure and create the Axes3D else we return the current active figure.

Parameters:
  • ax (Axes3D, optional) – Axes3D object. Defaults to None.

  • kwargs – keyword arguments are passed to plt.figure if ax is None.

Returns:

matplotlib Axes3D and corresponding figure objects

Return type:

tuple[Axes3D, Figure]

get_ax_fig(ax: Axes = None, **kwargs) tuple[Axes, Figure][source]

Helper function used in plot functions supporting an optional Axes argument. If ax is None, we build the matplotlib figure and create the Axes else we return the current active figure.

Parameters:
  • ax (Axes, optional) – Axes object. Defaults to None.

  • kwargs – keyword arguments are passed to plt.figure if ax is None.

Returns:

matplotlib Axes object and Figure objects

Return type:

tuple[Axes, Figure]

get_axarray_fig_plt(ax_array, nrows=1, ncols=1, sharex: bool = False, sharey: bool = False, squeeze: bool = True, subplot_kw=None, gridspec_kw=None, **fig_kw)[source]

Helper function used in plot functions that accept an optional array of Axes as argument. If ax_array is None, we build the matplotlib figure and create the array of Axes by calling plt.subplots else we return the current active figure.

Returns:

A tuple (ax, figure, plt), where ax is the array of Axes objects, figure is the matplotlib figure, and plt is the matplotlib pyplot module.

Return type:

tuple

periodic_table_heatmap(elemental_data=None, cbar_label='', cbar_label_size=14, show_plot: bool = False, cmap='YlOrRd', cmap_range=None, blank_color='grey', edge_color='white', value_format=None, value_fontsize=10, symbol_fontsize=14, max_row: int = 9, readable_fontcolor=False, pymatviz: bool = True, **kwargs)[source]

Generate a heat map overlaid on a periodic table.

Parameters:
  • elemental_data (dict) – A dictionary mapping each element to an assigned value, e.g. a surface energy or frequency: elemental_data={“Fe”: 4.2, “O”: 5.0}. Elements missing from elemental_data are shown in grey by default in the final table.

  • cbar_label (str) – Label of the color bar. Default is “”.

  • cbar_label_size (float) – Font size for the color bar label. Default is 14.

  • cmap_range (tuple) – Minimum and maximum value of the color map scale. If None, the color map will automatically scale to the range of the data.

  • show_plot (bool) – Whether to show the heatmap. Default is False.

  • value_format (str) – Formatting string to show values. If None, no value is shown. Example: “%.4f” shows float to four decimals.

  • value_fontsize (float) – Font size for values. Default is 10.

  • symbol_fontsize (float) – Font size for element symbols. Default is 14.

  • cmap (str) – Color scheme of the heatmap. Default is ‘YlOrRd’. Refer to the matplotlib documentation for other options.

  • blank_color (str) – Color assigned for the missing elements in elemental_data. Default is “grey”.

  • edge_color (str) – Color assigned for the edge of elements in the periodic table. Default is “white”.

  • max_row (int) – Maximum number of rows of the periodic table to be shown. Default is 9, which means the periodic table heat map covers the standard 7 rows of the periodic table + 2 rows for the lanthanides and actinides. Use a value of max_row = 7 to exclude the lanthanides and actinides.

  • readable_fontcolor (bool) – Whether to use readable font color depending on background color. Default is False.

  • pymatviz (bool) – Whether to use pymatviz to generate the heatmap. Defaults to True. See https://github.com/janosh/pymatviz.

  • kwargs – Passed to pymatviz.ptable_heatmap_plotly

Returns:

matplotlib Axes object

Return type:

plt.Axes

pretty_plot(width: float = 8, height: float | None = None, ax: Axes = None, dpi: float | None = None, color_cycle: tuple[str, str] = ('qualitative', 'Set1_9')) Axes[source]

Get a publication quality plot, with nice defaults for font sizes etc.

Parameters:
  • width (float) – Width of plot in inches. Defaults to 8in.

  • height (float) – Height of plot in inches. Defaults to width * golden ratio.

  • ax (Axes) – If ax is supplied, changes will be made to an existing plot. Otherwise, a new plot will be created.

  • dpi (float) – Sets dots per inch for the figure. Defaults to 300.

  • color_cycle (tuple) – Set the color cycle for new plots to one of the color sets in palettable. Defaults to a qualitative Set1_9.

Returns:

matplotlib axes object with properly sized fonts.

Return type:

Axes

pretty_plot_two_axis(x, y1, y2, xlabel=None, y1label=None, y2label=None, width: float = 8, height: float | None = None, dpi=300, **plot_kwargs)[source]

Variant of pretty_plot that does a dual axis plot. Adapted from matplotlib examples. Makes it easier to create plots with different axes.

Parameters:
  • x (Sequence[float]) – Data for x-axis.

  • y1 (Sequence[float] | dict[str, Sequence[float]]) – Data for y1 axis (left). If a dict, it will be interpreted as a {label: sequence}.

  • y2 (Sequence[float] | dict[str, Sequence[float]]) – Data for y2 axis (right). If a dict, it will be interpreted as a {label: sequence}.

  • xlabel (str) – If not None, this will be the label for the x-axis.

  • y1label (str) – If not None, this will be the label for the y1-axis.

  • y2label (str) – If not None, this will be the label for the y2-axis.

  • width (float) – Width of plot in inches. Defaults to 8in.

  • height (float) – Height of plot in inches. Defaults to width * golden ratio.

  • dpi (int) – Sets dots per inch for the figure. Defaults to 300.

  • plot_kwargs – Passed through to matplotlib’s plot method, e.g. linewidth.

Returns:

matplotlib axes object with properly sized fonts.

Return type:

plt.Axes

pretty_polyfit_plot(x: ArrayLike, y: ArrayLike, deg: int = 1, xlabel=None, ylabel=None, **kwargs)[source]

Convenience method to plot data with trend lines based on polynomial fit.

Parameters:
  • x – Sequence of x data.

  • y – Sequence of y data.

  • deg (int) – Degree of polynomial. Defaults to 1.

  • xlabel (str) – Label for x-axis.

  • ylabel (str) – Label for y-axis.

  • kwargs – Keyword args passed to pretty_plot.

Returns:

plt.Axes

van_arkel_triangle(list_of_materials: Sequence, annotate: bool = True)[source]

Generate a binary van Arkel-Ketelaar triangle to quantify the ionic, metallic and covalent character of a compound by plotting the electronegativity difference (y) vs. the average electronegativity (x). See:

A.E. van Arkel, Molecules and Crystals in Inorganic Chemistry, Interscience, New York (1956)

and

J.A.A. Ketelaar, Chemical Constitution (2nd edition), An Introduction to the Theory of the Chemical Bond, Elsevier, New York (1958).

Parameters:
  • list_of_materials (list) – A list of computed entries of binary materials or a list of lists containing two elements (str).

  • annotate (bool) – Whether or not to label the points on the triangle with reduced formula (if list of entries) or pair of elements (if list of list of str).

Returns:

matplotlib Axes object

Return type:

plt.Axes

pymatgen.util.provenance module

Classes and methods related to the Structure Notation Language (SNL).

class Author(name: str, email: str)[source]

Bases: NamedTuple

An Author contains two fields: name and email. It is meant to represent the author of a Structure or the author of a code that was applied to a Structure.

Create new instance of Author(name, email)

as_dict()[source]

Get MSONable dict.

email: str[source]

Alias for field number 1

classmethod from_dict(dct: dict) Self[source]
Parameters:

dct (dict) – Dict representation.

Returns:

Author

name: str[source]

Alias for field number 0

classmethod parse_author(author) Self[source]

Parse an Author object from either a String, dict, or tuple.

Parameters:

author – A String formatted as “NAME <email@domain.com>”, (name, email) tuple, or a dict with name and email keys.

Returns:

An Author object.
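A sketch of the parsing rules described above, using a hypothetical AuthorSketch NamedTuple rather than pymatgen's Author class; the regular expression for the “NAME <email@domain.com>” form is an assumption.

```python
import re
from typing import NamedTuple


class AuthorSketch(NamedTuple):
    """Hypothetical stand-in for the Author NamedTuple described above."""

    name: str
    email: str

    @classmethod
    def parse(cls, author):
        if isinstance(author, str):
            # Expect "NAME <email@domain.com>".
            match = re.match(r"\s*(.*?)\s*<(.*?@.*?)>\s*$", author)
            if not match:
                raise ValueError(f"Cannot parse author string {author!r}")
            return cls(match.group(1), match.group(2))
        if isinstance(author, dict):
            return cls(author["name"], author["email"])
        # Any other 2-element sequence is treated as (name, email).
        if len(author) == 2:
            return cls(*author)
        raise ValueError(f"Cannot parse author {author!r}")
```

As with Author, field number 0 is the name and field number 1 is the email, so the parsed object can also be indexed like a tuple.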

class HistoryNode(name: str, url: str, description: str)[source]

Bases: NamedTuple

A HistoryNode represents a step in the chain of events that led to a Structure. HistoryNodes leave ‘breadcrumbs’ so that you can trace back how a Structure was created. For example, a HistoryNode might represent pulling a Structure from an external database such as the ICSD or CSD. Or, it might represent the application of a code (e.g. pymatgen) to the Structure, with a custom description of how that code was applied (e.g. a site removal Transformation was applied).

A HistoryNode contains three fields:

name[source]

The name of a code or resource that this Structure encountered in its history.

Type:

str

url[source]

The URL of that code/resource.

Type:

str

description[source]

A free-form description of how the code/resource is related to the Structure.

Type:

str

Create new instance of HistoryNode(name, url, description)

as_dict() dict[str, str][source]

Get MSONable dict.

description: str[source]

Alias for field number 2

classmethod from_dict(dct: dict[str, str]) Self[source]
Parameters:

dct (dict) – Dict representation.

Returns:

HistoryNode

name: str[source]

Alias for field number 0

classmethod parse_history_node(h_node) Self[source]

Parse a HistoryNode object from either a dict or a tuple.

Parameters:

h_node – A dict with name/url/description fields or a 3-element tuple.

Returns:

HistoryNode
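
A similar standalone sketch for HistoryNode, built on a hypothetical HistoryNodeSketch NamedTuple: it accepts either a dict with name/url/description keys or a 3-element tuple, and round-trips through an as_dict() that returns the three fields. The URL is a placeholder.

```python
from typing import NamedTuple


class HistoryNodeSketch(NamedTuple):
    """Hypothetical stand-in for HistoryNode (fields 0, 1, 2)."""

    name: str
    url: str
    description: str

    def as_dict(self) -> dict[str, str]:
        return {"name": self.name, "url": self.url, "description": self.description}


def parse_history_node_sketch(h_node) -> HistoryNodeSketch:
    """Accept a dict with name/url/description keys or a 3-element sequence."""
    if isinstance(h_node, dict):
        return HistoryNodeSketch(h_node["name"], h_node["url"], h_node["description"])
    if len(h_node) != 3:
        raise ValueError(f"Expected a 3-element tuple, got {h_node!r}")
    return HistoryNodeSketch(*h_node)


node = parse_history_node_sketch(("ICSD", "https://example.org/icsd", "Imported from ICSD"))
# Round trip: dict -> node gives back an equal node.
print(parse_history_node_sketch(node.as_dict()) == node)  # True
```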

url: str[source]

Alias for field number 1

class StructureNL(struct_or_mol, authors, projects=None, references='', remarks=None, data=None, history=None, created_at=None)[source]

Bases: object

The Structure Notation Language (SNL, pronounced ‘snail’) is a container for a pymatgen Structure/Molecule object with some additional fields for enhanced provenance.

It is meant to be imported/exported in a JSON file format with the following structure:
  • sites

  • lattice (optional)

  • about
    • created_at

    • authors

    • projects

    • references

    • remarks

    • data

    • history

Parameters:
  • struct_or_mol – A pymatgen Structure/Molecule object

  • authors – List of {“name”:’’, “email”:’’} dicts, list of Strings as ‘John Doe <johndoe@gmail.com>’, or a single String with commas separating authors

  • projects – List of Strings [‘Project A’, ‘Project B’]

  • references – A String in BibTeX format

  • remarks – List of Strings [‘Remark A’, ‘Remark B’]

  • data – A free form dict. Namespaced at the root level with an underscore, e.g. {“_materialsproject”: <custom data>}

  • history – List of dicts - [{‘name’:’’, ‘url’:’’, ‘description’:{}}]

  • created_at – A datetime object.
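
To show how the parameters above map onto the about block of the JSON layout, here is a hypothetical build_about_sketch helper. It enforces the underscore namespacing for data and splits a comma-separated author string. It is an assumption-laden sketch; pymatgen's own serialization (StructureNL.as_dict()) may arrange these fields differently.

```python
import json
from datetime import datetime, timezone


def build_about_sketch(authors, projects=None, references="", remarks=None,
                       data=None, history=None, created_at=None) -> dict:
    """Hypothetical helper that assembles the 'about' block listed above."""
    data = data or {}
    # Root-level keys of `data` must be namespaced with a leading underscore.
    bad_keys = [key for key in data if not key.startswith("_")]
    if bad_keys:
        raise ValueError(f"data keys must start with an underscore: {bad_keys}")
    # A single string with commas separating authors becomes a list.
    if isinstance(authors, str):
        authors = [name.strip() for name in authors.split(",")]
    created_at = created_at or datetime.now(timezone.utc)
    return {
        "created_at": created_at.isoformat(),
        "authors": authors,
        "projects": projects or [],
        "references": references,
        "remarks": remarks or [],
        "data": data,
        "history": history or [],
    }


about = build_about_sketch(
    "John Doe <johndoe@gmail.com>, Jane Roe <janeroe@example.com>",
    data={"_materialsproject": {"note": "custom data"}},
)
print(json.dumps(about, indent=2))
```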

as_dict()[source]

Get MSONable dict.

classmethod from_dict(dct: dict) Self[source]
Parameters:

dct (dict) – Dict representation.

Returns:

StructureNL

classmethod from_structures(structures: Sequence[Structure], authors: Sequence[dict[str, str]], projects=None, references='', remarks=None, data=None, histories=None, created_at=None) list[Self][source]

A convenience method for getting a list of StructureNL objects by specifying structures and metadata separately. Some of the metadata is applied to all of the structures for ease of use.

Parameters:
  • structures – A list of Structure objects

  • authors – List of {“name”:’’, “email”:’’} dicts, list of Strings as ‘John Doe <johndoe@gmail.com>’, or a single String with commas separating authors

  • projects – List of Strings [‘Project A’, ‘Project B’]. This applies to all structures.

  • references – A String in BibTeX format. Again, this applies to all structures.

  • remarks – List of Strings [‘Remark A’, ‘Remark B’]

  • data – A list of free-form dicts, namespaced at the root level with an underscore, e.g. {“_materialsproject”: <custom data>}. If not None, its length must match the number of structures.

  • histories – List of list of dicts - [[{‘name’:’’, ‘url’:’’, ‘description’:{}}], …] The length of histories should be the same as the list of structures if not None.

  • created_at – A datetime object
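
The way from_structures shares some metadata across all structures while pairing data and histories one-to-one can be sketched as follows. broadcast_metadata_sketch is hypothetical and returns plain dicts instead of StructureNL objects; the structures are placeholder strings.

```python
def broadcast_metadata_sketch(structures, projects=None, references="",
                              data=None, histories=None) -> list[dict]:
    """Hypothetical helper: shared metadata for all, per-item data/histories."""
    n_structs = len(structures)
    # Default to one empty container per structure (not one shared object).
    data = [{} for _ in range(n_structs)] if data is None else data
    histories = [[] for _ in range(n_structs)] if histories is None else histories
    if len(data) != n_structs or len(histories) != n_structs:
        raise ValueError("data and histories must have one entry per structure")
    return [
        {
            "structure": struct,
            "projects": projects or [],  # applied to every structure
            "references": references,  # applied to every structure
            "data": struct_data,
            "history": history,
        }
        for struct, struct_data, history in zip(structures, data, histories)
    ]


records = broadcast_metadata_sketch(
    ["structure-1", "structure-2"],
    projects=["Project A"],
    data=[{"_source": "calc-1"}, {"_source": "calc-2"}],
)
print([r["data"]["_source"] for r in records])  # ['calc-1', 'calc-2']
```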

is_valid_bibtex(reference: str) bool[source]

Use pybtex to validate that a reference is in proper BibTeX format.

Parameters:

reference – A String reference in BibTeX format.

Returns:

True if reference is valid bibtex.

Return type:

bool

pymatgen.util.string module

This module provides utility classes for string operations.

TODO: make standalone functions in this module use the same implementation as Stringify.

Note: previous deprecations of standalone functions in this module were removed due to a community need.

class Stringify[source]

Bases: object

Mix-in class for string formatting, e.g. subscripting or superscripting numbers and symbols.

STRING_MODE = 'SUBSCRIPT'[source]
to_html_string() str[source]

Generate an HTML-formatted string. This uses the output from to_latex_string to generate the HTML output.

Returns:

HTML formatted string.

to_latex_string() str[source]

Generate a LaTeX formatted string. The mode is set by the class variable STRING_MODE, which defaults to “SUBSCRIPT”. e.g. Fe2O3 is transformed to Fe$_{2}$O$_{3}$. Setting STRING_MODE to “SUPERSCRIPT” creates superscript, e.g. Fe2+ becomes Fe^{2+}. The initial string is obtained from the class’s __str__ method.

Returns:

String for LaTeX display with proper sub-/superscripts.

Return type:

str

to_pretty_string() str[source]

A pretty string representation. By default, the __str__ output is used, but this method can be overridden if a different representation from default is desired.

to_unicode_string() str[source]

Unicode string with proper sub- and superscripts. Note that this works only when the sub- and superscripts are pure integers.
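
A minimal sketch of how a Stringify-style mix-in can derive LaTeX and HTML output from __str__ according to STRING_MODE. StringifySketch, its regular expressions, and the Formula/IonLabel subclasses are assumptions for illustration; pymatgen's implementation may handle more cases.

```python
import re


class StringifySketch:
    """Hypothetical mix-in mirroring the STRING_MODE behavior described above."""

    STRING_MODE = "SUBSCRIPT"

    def to_pretty_string(self) -> str:
        return str(self)

    def to_latex_string(self) -> str:
        # "_" for SUBSCRIPT; any other mode is treated as SUPERSCRIPT here.
        wrap = "_" if self.STRING_MODE == "SUBSCRIPT" else "^"
        # Wrap each run of digits/signs that follows a letter or bracket.
        return re.sub(r"([A-Za-z\(\)])([\d\+\-\.]+)", rf"\1${wrap}{{\2}}$", self.to_pretty_string())

    def to_html_string(self) -> str:
        # Derive HTML from the LaTeX output: $_{n}$ -> <sub>n</sub>, $^{n}$ -> <sup>n</sup>.
        tag = "sub" if self.STRING_MODE == "SUBSCRIPT" else "sup"
        return re.sub(r"\$[_^]\{([^}]*)\}\$", rf"<{tag}>\1</{tag}>", self.to_latex_string())


class Formula(StringifySketch):
    def __init__(self, text: str) -> None:
        self.text = text

    def __str__(self) -> str:
        return self.text


class IonLabel(Formula):
    STRING_MODE = "SUPERSCRIPT"


print(Formula("Fe2O3").to_latex_string())  # Fe$_{2}$O$_{3}$
print(IonLabel("Fe2+").to_html_string())  # Fe<sup>2+</sup>
```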

charge_string(charge: float, brackets: bool = True, explicit_one: bool = True) str[source]

Get a string representing the charge of an ion. By default, the charge is placed in brackets with the sign preceding the magnitude, e.g. ‘[+2]’. For uncharged species, the string returned is ‘(aq)’.

Parameters:
  • charge (float) – The charge of the ion.

  • brackets (bool) – Whether to enclose the charge in brackets, e.g. [+2]. Default is True.

  • explicit_one (bool) – whether to include the number one for monovalent ions, e.g. “+1” rather than “+”. Default is True.
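
The formatting rules above can be sketched as a standalone function. charge_string_sketch is hypothetical, and it assumes whole-number charges are printed without a decimal part (e.g. 2.0 as “+2”).

```python
def charge_string_sketch(charge: float, brackets: bool = True, explicit_one: bool = True) -> str:
    """Format an ionic charge as described above (a sketch, not pymatgen's code)."""
    if charge == 0:
        return "(aq)"
    # Assumption: whole-number charges drop their ".0", e.g. 2.0 -> 2.
    value = round(charge) if abs(charge - round(charge)) < 1e-8 else charge
    text = f"{value:+}"  # always show the sign, e.g. "+2" or "-0.5"
    if not explicit_one and text in {"+1", "-1"}:
        text = text[0]  # monovalent ions become just "+" or "-"
    return f"[{text}]" if brackets else text


print(charge_string_sketch(2))  # [+2]
print(charge_string_sketch(-1, brackets=False, explicit_one=False))  # -
```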

disordered_formula(disordered_struct: Structure, symbols: Sequence[str] = ('x', 'y', 'z'), fmt: Literal['plain', 'HTML', 'LaTex'] = 'plain') str[source]

Get a formula of a form like AxB1-x (x=0.5) for disordered structures. At present, this only works for disordered structures with one kind of disordered site.

Parameters:
  • disordered_struct (Structure) – a disordered structure.

  • symbols (Sequence[str]) – Characters to use for subscripts. Defaults to (‘x’, ‘y’, ‘z’); if there are more than three disordered species, more symbols must be supplied.

  • fmt (str) – ‘plain’, ‘HTML’ or ‘LaTeX’.

Returns:

a disordered formula string

Return type:

str

formula_double_format(afloat: float, ignore_ones: bool = True, tol: float = 1e-08) float | Literal[''][source]

Format a float for pretty formulas. E.g. “Li1.0 Fe1.0 P1.0 O4.0” -> “LiFePO4”.

Parameters:
  • afloat (float) – The float to be formatted.

  • ignore_ones (bool) – If True, floats equal to 1.0 are omitted (returned as an empty string).

  • tol (float) – Absolute tolerance for rounding to the nearest int, e.g. (2 + 1E-9) -> 2.

Returns:

Formatted float for formulas.

Return type:

float | “”
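
A sketch of the rounding rules, applied to the “Li1.0 Fe1.0 P1.0 O4.0” example. formula_double_format_sketch is hypothetical; note that near-integers come back as Python ints from round().

```python
def formula_double_format_sketch(afloat: float, ignore_ones: bool = True, tol: float = 1e-8):
    """Sketch of the documented rules; not pymatgen's implementation."""
    if ignore_ones and afloat == 1:
        return ""  # a stoichiometry of 1 is written implicitly
    if abs(afloat - round(afloat)) < tol:
        return round(afloat)  # near-integers become ints, e.g. 2 + 1e-9 -> 2
    return afloat


amounts = {"Li": 1.0, "Fe": 1.0, "P": 1.0, "O": 4.0}
print("".join(f"{el}{formula_double_format_sketch(n)}" for el, n in amounts.items()))
# LiFePO4
```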

htmlify(formula: str) str[source]

Generate an HTML-formatted formula, e.g. Fe2O3 is transformed to Fe<sub>2</sub>O<sub>3</sub>.

Note that Composition now has a to_html_string() method that may be used instead.

Parameters:

formula (str) – The string to format.
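
As a sketch, the HTML conversion amounts to one regular-expression substitution that wraps any number following a letter or bracket in <sub> tags. htmlify_sketch and its pattern are assumptions, not pymatgen's source.

```python
import re


def htmlify_sketch(formula: str) -> str:
    """Wrap each number that follows an element symbol or bracket in <sub> tags."""
    return re.sub(r"([A-Za-z\(\)])([\d\.]+)", r"\1<sub>\2</sub>", formula)


print(htmlify_sketch("Fe2O3"))  # Fe<sub>2</sub>O<sub>3</sub>
print(htmlify_sketch("Ca(OH)2"))  # Ca(OH)<sub>2</sub>
```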

latexify(formula: str, bold: bool = False) str[source]

Generate a LaTeX formatted formula. e.g. Fe2O3 is transformed to Fe$_{2}$O$_{3}$.

Note that Composition now has a to_latex_string() method that may be used instead.

Parameters:
  • formula (str) – Input formula.

  • bold (bool) – Whether to make the subscripts bold. Defaults to False.

Returns:

Formula suitable for display as in LaTeX with proper subscripts.

Return type:

str
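
As a sketch, LaTeX subscripts can be produced with a single regular-expression substitution; with bold=True each subscript is additionally wrapped in \mathbf. latexify_sketch is a hypothetical stand-in, and the use of \mathbf for bold is an assumption.

```python
import re


def latexify_sketch(formula: str, bold: bool = False) -> str:
    """Turn numbers after a letter or bracket into LaTeX subscripts, optionally bold."""
    # "\\" in the replacement emits a single literal backslash.
    repl = r"\1$_{\\mathbf{\2}}$" if bold else r"\1$_{\2}$"
    return re.sub(r"([A-Za-z\(\)])([\d\.]+)", repl, formula)


print(latexify_sketch("Fe2O3"))  # Fe$_{2}$O$_{3}$
print(latexify_sketch("Fe2O3", bold=True))  # Fe$_{\mathbf{2}}$O$_{\mathbf{3}}$
```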

latexify_spacegroup(spacegroup_symbol: str) str[source]

Generate a LaTeX-formatted spacegroup, e.g. P2_1/c is converted to P2$_{1}$/c and P-1 is converted to P$\overline{1}$.

Note that SymmetryGroup now has a to_latex_string() method that may be called instead.

Parameters:

spacegroup_symbol (str) – A spacegroup symbol

Returns:

A latex formatted spacegroup with proper subscripts and overlines.

Return type:

str
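
A sketch of the two rewrites described above: _n becomes a LaTeX subscript and -n becomes an overline. latexify_spacegroup_sketch is hypothetical.

```python
import re


def latexify_spacegroup_sketch(symbol: str) -> str:
    """Convert screw-axis subscripts (_n) and rotoinversion bars (-n) to LaTeX."""
    symbol = re.sub(r"_(\d+)", r"$_{\1}$", symbol)
    # "\\" in the replacement emits a literal backslash for \overline.
    return re.sub(r"-(\d)", r"$\\overline{\1}$", symbol)


print(latexify_spacegroup_sketch("P2_1/c"))  # P2$_{1}$/c
print(latexify_spacegroup_sketch("P-1"))  # P$\overline{1}$
```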

str_delimited(results: Sequence[Sequence[Any]], header: Sequence[str] | None = None, delimiter: str = '\t') str[source]

Given a tuple of tuples, generate a delimited string form.

>>> results = (("a", "b", "c"), ("d", "e", "f"), (1, 2, 3))
>>> print(str_delimited(results, delimiter=","))
a,b,c
d,e,f
1,2,3

Parameters:
  • results (Sequence[Sequence[Any]]) – 2D sequence of arbitrary types.

  • header (Sequence[str]) – optional headers.

  • delimiter (str) – Defaults to “\t” for tab-delimited output.

Returns:

Delimited string output in a table-like format, one row per line.

Return type:

str
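
A standalone sketch of the joining logic, with the optional header row emitted as the first line. str_delimited_sketch is hypothetical.

```python
def str_delimited_sketch(results, header=None, delimiter="\t") -> str:
    """Join items in each row with `delimiter` and rows with newlines."""
    rows = [] if header is None else [delimiter.join(header)]
    rows += [delimiter.join(str(item) for item in row) for row in results]
    return "\n".join(rows)


results = (("a", "b", "c"), ("d", "e", "f"), (1, 2, 3))
print(str_delimited_sketch(results, header=("col1", "col2", "col3"), delimiter=","))
# col1,col2,col3
# a,b,c
# d,e,f
# 1,2,3
```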

stream_has_colors(stream: TextIO) bool[source]

Return True if the stream supports colors (based on Python Cookbook recipe #475186).
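
A common way to implement such a check, sketched here on the assumption that it resembles the cookbook recipe: require a TTY, then ask terminfo (via the standard-library curses module) how many colors the terminal supports. stream_has_colors_sketch is hypothetical.

```python
import io
import sys


def stream_has_colors_sketch(stream) -> bool:
    """Heuristic: the stream must be a TTY whose terminal reports more than 2 colors."""
    if not hasattr(stream, "isatty") or not stream.isatty():
        return False  # pipes, files and in-memory buffers get no color
    try:
        import curses  # not available on every platform

        curses.setupterm()
        return curses.tigetnum("colors") > 2
    except Exception:
        return False  # be conservative if the terminal cannot be queried


print(stream_has_colors_sketch(io.StringIO()))  # False
print(stream_has_colors_sketch(sys.stdout))  # depends on your terminal
```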

transformation_to_string(matrix: ArrayLike, translation_vec: Vector3D = (0, 0, 0), components: tuple[str, str, str] = ('x', 'y', 'z'), c: str = '', delim: str = ',') str[source]

Convenience method. Given a 3x3 matrix (and an optional translation vector), return its string form, e.g. x+2y+1/4.

Parameters:
  • matrix (ArrayLike) – A 3x3 matrix.

  • translation_vec (Vector3D) – The translation vector. Defaults to (0, 0, 0).

  • components (tuple[str, str, str]) – The components. Either (‘x’, ‘y’, ‘z’) or (‘a’, ‘b’, ‘c’). Defaults to (‘x’, ‘y’, ‘z’).

  • c (str) – An optional additional character to print (used for magmoms). Defaults to “”.

  • delim (str) – A delimiter. Defaults to “,”.

Returns:

xyz string.

Return type:

str
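
A sketch of how each matrix row and translation component can be rendered, using fractions.Fraction so that 0.25 prints as 1/4. transformation_to_string_sketch is hypothetical; its sign and coefficient conventions (e.g. “-z” for -1, “2y” for 2) are assumptions chosen to match the x+2y+1/4 example.

```python
from fractions import Fraction


def transformation_to_string_sketch(matrix, translation_vec=(0, 0, 0),
                                    components=("x", "y", "z"), c="", delim=","):
    """Render each row of a 3x3 matrix plus its translation, e.g. 'x+2y+1/4'."""
    parts = []
    for row, offset in zip(matrix, translation_vec):
        term = ""
        for coeff, comp in zip(row, components):
            if coeff == 0:
                continue
            frac = Fraction(coeff).limit_denominator()
            if term and frac > 0:
                term += "+"
            if abs(frac.numerator) != 1:
                term += str(frac.numerator)  # e.g. "2" in "2y" or "-2" in "-2y"
            elif frac < 0:
                term += "-"  # a coefficient of -1 is written as "-x"
            term += c + comp
            if frac.denominator != 1:
                term += f"/{frac.denominator}"
        if offset != 0:
            frac = Fraction(offset).limit_denominator()
            term += ("+" if term and frac > 0 else "") + str(frac)
        parts.append(term or "0")  # an all-zero row prints as "0"
    return delim.join(parts)


print(transformation_to_string_sketch([[1, 2, 0], [0, 1, 0], [0, 0, -1]], (0.25, 0, 0)))
# x+2y+1/4,y,-z
```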

unicodeify(formula: str) str[source]

Generate a formula with unicode subscripts, e.g. Fe2O3 is transformed to Fe₂O₃. Does not support formulae with decimal points.

Note that Composition now has a to_unicode_string() method that may be used instead.

Parameters:

formula (str) – The string to format.
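
Since every digit maps to a Unicode subscript code point (U+2080 to U+2089), the conversion can be sketched with str.translate. unicodeify_sketch is hypothetical; like the documented function, it rejects decimal points.

```python
# Map "0".."9" to the subscript digits U+2080..U+2089.
SUBSCRIPTS = str.maketrans("0123456789", "".join(chr(0x2080 + d) for d in range(10)))


def unicodeify_sketch(formula: str) -> str:
    """Replace every digit with its Unicode subscript; decimals are rejected."""
    if "." in formula:
        raise ValueError("Unicode has no subscript period, so decimal amounts are unsupported.")
    return formula.translate(SUBSCRIPTS)


print(unicodeify_sketch("Fe2O3"))  # Fe₂O₃
```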

unicodeify_spacegroup(spacegroup_symbol: str) str[source]

Generate a unicode-formatted spacegroup, e.g. P2$_{1}$/c is converted to P2₁/c and P$\overline{1}$ is converted to P1̅ (a 1 followed by a combining overline).

Note that SymmetryGroup now has a to_unicode_string() method that may be called instead.

Parameters:

spacegroup_symbol (str) – A spacegroup symbol as LaTeX.

Returns:

A unicode spacegroup with proper subscripts and overlines.

Return type:

str
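
A sketch of the conversion from LaTeX input, assuming the combining overline U+0305, which decorates the character that precedes it (hence the digit first, then the overline). unicodeify_spacegroup_sketch is hypothetical.

```python
import re

OVERLINE = "\u0305"  # combining overline: decorates the character *before* it
SUBSCRIPTS = str.maketrans("0123456789", "".join(chr(0x2080 + d) for d in range(10)))


def unicodeify_spacegroup_sketch(latex_symbol: str) -> str:
    """Convert LaTeX screw-axis subscripts and overlines to Unicode."""
    # $_{1}$ -> subscript digits
    symbol = re.sub(r"\$_\{(\d+)\}\$", lambda m: m.group(1).translate(SUBSCRIPTS), latex_symbol)
    # $\overline{1}$ -> the digit followed by a combining overline
    return re.sub(r"\$\\overline\{(\d)\}\$", lambda m: m.group(1) + OVERLINE, symbol)


print(unicodeify_spacegroup_sketch(r"P2$_{1}$/c"))  # P2₁/c
```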

unicodeify_species(specie_string: str) str[source]

Generate a unicode formatted species string, with appropriate superscripts for oxidation states.

Note that Species now has a to_unicode_string() method that may be used instead.

Parameters:

specie_string (str) – Species string, e.g. “O2-”

Returns:

Species string, e.g. “O²⁻”

Return type:

str
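
A sketch using str.translate with the Unicode superscript digits and signs. unicodeify_species_sketch is hypothetical, and it superscripts every digit, so it assumes the input carries only an oxidation state (as in “O2-”).

```python
# Superscript 0-9, then "+" (U+207A) and "-" (U+207B).
SUPERSCRIPTS = str.maketrans(
    "0123456789+-",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207a\u207b",
)


def unicodeify_species_sketch(specie_string: str) -> str:
    """Map every digit and sign to its superscript form, e.g. "O2-" -> "O²⁻"."""
    return specie_string.translate(SUPERSCRIPTS)


print(unicodeify_species_sketch("O2-"))  # O²⁻
```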

pymatgen.util.typing module

This module defines convenience types for type hinting purposes. Type hinting is new to pymatgen, so this module is subject to change until best practices are established.