Utilities for manipulating coordinates or lists of coordinates, under periodic
boundary conditions or otherwise. Many of these are heavily vectorized in
numpy for performance.
Check if a point is in the simplex using the standard barycentric
coordinate system algorithm.
Taking an arbitrary vertex as an origin, we compute the basis for the
simplex from this origin by subtracting the origin from all other
vertices. We then project the point into this coordinate system and
determine the linear decomposition coefficients in this coordinate
system. If the coeffs satisfy all(coeffs >= 0), the point
is in the simplex.
Parameters:
point (list[float]) – Point to test
tolerance (float) – Tolerance to test if point is in simplex.
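As a rough illustration of the barycentric test described above, here is a minimal sketch restricted to a 2D simplex (a triangle). The function name `in_triangle` and the use of Cramer's rule are assumptions made for this example; it is not the library's implementation, which handles arbitrary dimensions.

```python
def in_triangle(vertices, point, tolerance=1e-8):
    """Barycentric containment test for a triangle (hypothetical helper)."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Basis vectors: each remaining vertex minus the chosen origin vertex.
    b1 = (x1 - x0, y1 - y0)
    b2 = (x2 - x0, y2 - y0)
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if abs(det) < 1e-15:
        raise ValueError("Degenerate triangle: vertices are collinear.")
    # Express the point relative to the origin vertex.
    px, py = point[0] - x0, point[1] - y0
    # Solve p = c1 * b1 + c2 * b2 with Cramer's rule.
    c1 = (px * b2[1] - b2[0] * py) / det
    c2 = (b1[0] * py - px * b1[1]) / det
    # The weight of the origin vertex completes the barycentric coordinates.
    coeffs = (1.0 - c1 - c2, c1, c2)
    return all(c >= -tolerance for c in coeffs)


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inside = in_triangle(triangle, (0.25, 0.25))
outside = in_triangle(triangle, (1.0, 1.0))
```

The tolerance is applied as a lower bound of -tolerance on every coefficient, so points sitting on an edge are counted as inside.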
Get the indices of all points in a fractional coord list that are
equal to a fractional coord (with a tolerance), taking into account
periodic boundary conditions.
Parameters:
frac_coord_list – List of fractional coords
frac_coord – A specific fractional coord to test.
atol – Absolute tolerance. Defaults to 1e-8.
pbc – a tuple defining the periodic boundary conditions along the three
axes of the lattice.
Returns:
Indices of matches, e.g. [0, 1, 2, 3]. Empty list if not found.
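The matching rule can be sketched in plain Python as follows. The name `find_matches_pbc` is hypothetical, and the loop-based approach is an assumption chosen for readability rather than the vectorized numpy code the module uses.

```python
def find_matches_pbc(frac_coord_list, frac_coord, atol=1e-8, pbc=(True, True, True)):
    """Return indices of coords equal to frac_coord up to lattice translations."""
    matches = []
    for index, coord in enumerate(frac_coord_list):
        is_match = True
        for a, b, periodic in zip(coord, frac_coord, pbc):
            diff = a - b
            if periodic:
                # Remove whole lattice translations along periodic axes.
                diff -= round(diff)
            if abs(diff) >= atol:
                is_match = False
                break
        if is_match:
            matches.append(index)
    return matches


coords = [[0.1, 0.2, 0.3], [1.1, 0.2, -0.7], [0.5, 0.5, 0.5]]
periodic_hits = find_matches_pbc(coords, [0.1, 0.2, 0.3])
non_periodic_hits = find_matches_pbc(coords, [0.1, 0.2, 0.3], pbc=(False, False, False))
```

With periodic boundaries on all axes, the second coordinate matches because it differs from the target only by integer translations.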
Get an interpolated value by linear interpolation between two values.
This method is written to avoid dependency on scipy, which causes issues on
threading servers.
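A dependency-free linear interpolation might look like the sketch below. The function name, argument order, and the choice to raise ValueError outside the sampled range are assumptions for illustration only.

```python
def linear_interpolate(x_values, y_values, x):
    """Interpolate y at x; x_values is assumed to be sorted in ascending order."""
    if not x_values[0] <= x <= x_values[-1]:
        raise ValueError("x is outside the range of x_values")
    pairs = list(zip(x_values, y_values))
    # Walk consecutive sample pairs until the bracketing interval is found.
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    # Only reached when a single sample point was supplied.
    return y_values[0]


value = linear_interpolate([0.0, 1.0, 2.0], [0.0, 10.0, 30.0], 1.5)
```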
Get the list of points on the original lattice contained in the
supercell in fractional coordinates (with the supercell basis).
e.g. [[2,0,0],[0,1,0],[0,0,1]] returns [[0,0,0],[0.5,0,0]].
Parameters:
supercell_matrix – 3x3 matrix describing the supercell
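One way to reproduce the example above is to enumerate integer lattice points in the bounding box of the supercell and keep those whose supercell-fractional coordinates fall in [0, 1). The sketch below assumes an integer supercell matrix whose rows are the supercell vectors; the function names are hypothetical and exact rational arithmetic is used instead of numpy.

```python
from fractions import Fraction
from itertools import product


def inverse_3x3(m):
    """Exact inverse of an integer 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("Supercell matrix is singular.")
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[Fraction(x, det) for x in row] for row in adjugate]


def supercell_lattice_points(supercell_matrix):
    """Original lattice points inside the supercell, in supercell fractional coords."""
    inv = inverse_3x3(supercell_matrix)
    # Bounding box of the supercell corners, per original-lattice axis.
    ranges = []
    for j in range(3):
        column = [supercell_matrix[i][j] for i in range(3)]
        low = sum(min(0, v) for v in column)
        high = sum(max(0, v) for v in column)
        ranges.append(range(low, high + 1))
    points = []
    for n in product(*ranges):
        # f = n . S^-1 converts an original lattice point to the supercell basis.
        frac = [sum(n[k] * inv[k][j] for k in range(3)) for j in range(3)]
        if all(0 <= x < 1 for x in frac):
            points.append([float(x) for x in frac])
    return points


doubled = supercell_lattice_points([[2, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The number of points returned equals the absolute determinant of the supercell matrix, which is a useful sanity check.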
Get the ‘fractional distance’ between two coordinates taking into
account periodic boundary conditions.
Parameters:
frac_coords1 – First set of fractional coordinates. e.g. [0.5, 0.6,
0.7] or [[1.1, 1.2, 4.3], [0.5, 0.6, 0.7]]. It can be a single
coord or any array of coords.
frac_coords2 – Second set of fractional coordinates.
pbc – a tuple defining the periodic boundary conditions along the three
axes of the lattice.
Returns:
Fractional distance. Each coordinate must have the property that
abs(a) <= 0.5. Examples:
pbc_diff([0.1, 0.1, 0.1], [0.3, 0.5, 0.9]) = [-0.2, -0.4, 0.2]
pbc_diff([0.9, 0.1, 1.01], [0.3, 0.5, 0.9]) = [-0.4, -0.4, 0.11]
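The two examples above follow from subtracting the coordinates and then removing the nearest whole lattice translation along each periodic axis. A minimal pure-Python sketch for a single pair of coordinates, with the hypothetical name `pbc_diff_sketch`:

```python
def pbc_diff_sketch(frac_coords1, frac_coords2, pbc=(True, True, True)):
    """Component-wise fractional difference wrapped into [-0.5, 0.5]."""
    diff = []
    for a, b, periodic in zip(frac_coords1, frac_coords2, pbc):
        d = a - b
        # Only wrap along axes that are actually periodic.
        diff.append(d - round(d) if periodic else d)
    return diff


first = pbc_diff_sketch([0.1, 0.1, 0.1], [0.3, 0.5, 0.9])
second = pbc_diff_sketch([0.9, 0.1, 1.01], [0.3, 0.5, 0.9])
```

Results agree with the documented values up to floating-point rounding, so comparisons should use a small tolerance rather than exact equality.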
Get the shortest vectors between two lists of coordinates taking into
account periodic boundary conditions and the lattice.
Parameters:
lattice – lattice to use
frac_coords1 – First set of fractional coordinates. e.g. [0.5, 0.6, 0.7]
or [[1.1, 1.2, 4.3], [0.5, 0.6, 0.7]]. It can be a single
coord or any array of coords.
frac_coords2 – Second set of fractional coordinates.
mask (boolean array) – Mask of matches that are not allowed.
i.e. if mask[1,2] is True, then frac_coords1[1] cannot be matched
to frac_coords2[2]
return_d2 (bool) – whether to also return the squared distances
Returns:
Array of displacement vectors from frac_coords1 to frac_coords2. The
first index is the frac_coords1 index, the second is the frac_coords2 index.
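To make the shape of the result concrete, the sketch below computes the displacement vectors by brute force over the 27 neighbouring periodic images. This is an assumption-laden simplification: the function name is hypothetical, the lattice is taken as a plain 3x3 list whose rows are Cartesian lattice vectors, and the 27-image search is only reliable for cells that are not strongly skewed.

```python
from itertools import product


def shortest_vectors_bruteforce(lattice, frac_coords1, frac_coords2):
    """Cartesian vectors from each coord in frac_coords1 to each in frac_coords2."""
    images = list(product((-1, 0, 1), repeat=3))
    vectors = []
    for f1 in frac_coords1:
        row = []
        for f2 in frac_coords2:
            # Wrap the fractional difference into [-0.5, 0.5] first.
            base = [(b - a) - round(b - a) for a, b in zip(f1, f2)]
            best_d2, best_vec = None, None
            for image in images:
                frac = [d + n for d, n in zip(base, image)]
                # Convert fractional to Cartesian: cart = frac . lattice
                cart = [sum(frac[k] * lattice[k][j] for k in range(3)) for j in range(3)]
                d2 = sum(c * c for c in cart)
                if best_d2 is None or d2 < best_d2:
                    best_d2, best_vec = d2, cart
            row.append(best_vec)
        vectors.append(row)
    return vectors


cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
result = shortest_vectors_bruteforce(cubic, [[0.05, 0.0, 0.0]], [[0.95, 0.0, 0.0]])
```

Here `result[i][j]` is the vector from the i-th first coordinate to the j-th second coordinate, matching the index convention described above.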
Tests if all fractional coords in subset are contained in superset.
Allows specification of a mask determining pairs that are not
allowed to match to each other.
Parameters:
subset – List of fractional coords.
superset – List of fractional coords.
pbc – a tuple defining the periodic boundary conditions along the three
axes of the lattice.
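The subset test with a mask can be sketched as below. The function name is hypothetical, and the `atol` and `mask` keyword arguments are assumptions for this illustration since they are not listed in the parameters above.

```python
def is_subset_pbc_sketch(subset, superset, atol=1e-8, mask=None, pbc=(True, True, True)):
    """True if every coord in subset has an allowed periodic match in superset."""
    for i, sub in enumerate(subset):
        found = False
        for j, sup in enumerate(superset):
            # mask[i][j] being True forbids matching subset[i] to superset[j].
            if mask is not None and mask[i][j]:
                continue
            close = True
            for a, b, periodic in zip(sub, sup, pbc):
                d = a - b
                if periodic:
                    d -= round(d)
                if abs(d) > atol:
                    close = False
                    break
            if close:
                found = True
                break
        if not found:
            return False
    return True


superset = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
allowed = is_subset_pbc_sketch([[0.0, 0.0, 0.0]], superset)
masked = is_subset_pbc_sketch([[0.0, 0.0, 0.0]], superset, mask=[[True, False]])
```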
Get the shortest vectors between two lists of coordinates taking into
account periodic boundary conditions and the lattice.
Parameters:
lattice – lattice to use
fcoords1 – First set of fractional coordinates. e.g., [0.5, 0.6, 0.7]
or [[1.1, 1.2, 4.3], [0.5, 0.6, 0.7]]. Must be np.float64
fcoords2 – Second set of fractional coordinates.
mask (int_ array) – Mask of matches that are not allowed.
i.e. if mask[1,2] == True, then fcoords1[1] cannot be matched
to fcoords2[2]
lll_frac_tol (float_ array of length 3) – Fractional tolerance (per LLL lattice vector) over which
the calculation of minimum vectors will be skipped.
Can speed up calculation considerably for large structures.
Returns:
Array of displacement vectors from fcoords1 to fcoords2. The
first index is the fcoords1 index, the second is the fcoords2 index.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.
Neither the name of the NetworkX Developers nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
“AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Functions for hashing graphs to strings.
Isomorphic graphs should be assigned identical hashes.
For now, only Weisfeiler-Lehman hashing is implemented.
The function iteratively aggregates and hashes neighborhoods of each node.
After each node’s neighbors are hashed to obtain updated node labels,
a hashed histogram of resulting labels is returned as the final hash.
Hashes are identical for isomorphic graphs, and there are strong guarantees that
non-isomorphic graphs will get different hashes. See [1]_ for details.
If no node or edge attributes are provided, the degree of each node
is used as its initial label.
Otherwise, node and/or edge labels are used to compute the hash.
Parameters:
graph – nx.Graph
The graph to be hashed.
Can have node and/or edge attributes. Can also have no attributes.
edge_attr – string, default=None
The key in edge attribute dictionary to be used for hashing.
If None, edge labels are ignored.
node_attr – string, default=None
The key in node attribute dictionary to be used for hashing.
If None, and no edge_attr given, use the degrees of the nodes as labels.
iterations – int, default=3
Number of neighbor aggregations to perform.
Should be larger for larger graphs.
digest_size – int, default=16
Size (in bytes) of blake2b hash digest to use for hashing node labels.
Returns:
h (string) – Hexadecimal string corresponding to hash of the input graph.
Notes
To return the WL hashes of each subgraph of a graph, use
weisfeiler_lehman_subgraph_hashes
Similarity between hashes does not imply similarity between graphs.
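The aggregation loop can be illustrated with a small self-contained sketch that works on a plain adjacency dictionary and uses node degrees as initial labels. The function name and the way the final histogram is serialized are assumptions, so the hashes it produces will not match those of the library function.

```python
import hashlib
from collections import Counter


def wl_graph_hash_sketch(adjacency, iterations=3, digest_size=16):
    """Weisfeiler-Lehman style hash of an undirected graph given as {node: [neighbors]}."""

    def _hash(text):
        # hashlib counts digest_size in bytes, so 16 yields 32 hex characters.
        return hashlib.blake2b(text.encode("ascii"), digest_size=digest_size).hexdigest()

    # Without attributes, the degree of each node is the initial label.
    labels = {node: str(len(nbrs)) for node, nbrs in adjacency.items()}
    histogram = Counter()
    for _ in range(iterations):
        new_labels = {}
        for node, nbrs in adjacency.items():
            # Sorting makes the aggregate independent of neighbor order.
            aggregate = labels[node] + "".join(sorted(labels[m] for m in nbrs))
            new_labels[node] = _hash(aggregate)
        labels = new_labels
        histogram.update(labels.values())
    # Hash the sorted histogram of labels to obtain the graph-level hash.
    return _hash(str(sorted(histogram.items())))


path_a = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_b = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
hash_a = wl_graph_hash_sketch(path_a)
hash_b = wl_graph_hash_sketch(path_b)
hash_star = wl_graph_hash_sketch(star)
```

The two path graphs differ only in node names, so they receive the same hash, while the star graph does not.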
The dictionary maps each node to a list of hashes of increasingly large
induced subgraphs around that node. The k-th hash in the list covers the
nodes within k edges of the key node, with one hash per iteration.
The function iteratively aggregates and hashes neighborhoods of each node.
This is achieved for each step by replacing for each node its label from
the previous iteration with its hashed 1-hop neighborhood aggregate.
The new node label is then appended to a list of node labels for each
node.
To aggregate neighborhoods at each step for a node $n$, all labels of
nodes adjacent to $n$ are concatenated. If the edge_attr parameter is set,
labels for each neighboring node are prefixed with the value of this attribute
along the connecting edge from this neighbor to node $n$. The resulting string
is then hashed to compress this information into a fixed digest size.
Thus, at the $i$th iteration, nodes within $i$ hops influence any given
hashed node label. We can therefore say that at depth $i$ for node $n$
we have a hash for a subgraph induced by the $i$-hop neighborhood of $n$.
Can be used to create general Weisfeiler-Lehman graph kernels, or
generate features for graphs or nodes, for example to generate ‘words’ in a
graph as seen in the ‘graph2vec’ algorithm.
See [1]_ & [2]_ respectively for details.
Hashes are identical for isomorphic subgraphs and there exist strong
guarantees that non-isomorphic graphs will get different hashes.
See [1]_ for details.
If no node or edge attributes are provided, the degree of each node
is used as its initial label.
Otherwise, node and/or edge labels are used to compute the hash.
Parameters:
graph – nx.Graph
The graph to be hashed.
Can have node and/or edge attributes. Can also have no attributes.
edge_attr – string, default=None
The key in edge attribute dictionary to be used for hashing.
If None, edge labels are ignored.
node_attr – string, default=None
The key in node attribute dictionary to be used for hashing.
If None, and no edge_attr given, use the degrees of the nodes as labels.
iterations – int, default=3
Number of neighbor aggregations to perform.
Should be larger for larger graphs.
digest_size – int, default=16
Size (in bytes) of blake2b hash digest to use for hashing node labels.
The default size is 16 bytes.
Returns:
node_subgraph_hashes (dict) – A dictionary with each key given by a node in
the graph, and each value given by the subgraph hashes in order of depth
from the key node.
Notes
To hash the full graph when subgraph hashes are not needed, use
weisfeiler_lehman_graph_hash for efficiency.
Similarity between hashes does not imply similarity between graphs.
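The per-node variant, including the edge-label prefix described above, might be sketched like this. The function name and the `edge_labels` mapping keyed by `frozenset((u, v))` are assumptions made for the example, and the hashes will not match those of the library function.

```python
import hashlib


def wl_subgraph_hashes_sketch(adjacency, edge_labels=None, iterations=3, digest_size=16):
    """Map each node to its list of per-iteration neighborhood hashes."""

    def _hash(text):
        return hashlib.blake2b(text.encode("ascii"), digest_size=digest_size).hexdigest()

    labels = {node: str(len(nbrs)) for node, nbrs in adjacency.items()}
    hashes = {node: [] for node in adjacency}
    for _ in range(iterations):
        new_labels = {}
        for node, nbrs in adjacency.items():
            parts = []
            for m in nbrs:
                # Prefix each neighbor label with the label of the connecting edge.
                prefix = "" if edge_labels is None else str(edge_labels[frozenset((node, m))])
                parts.append(prefix + labels[m])
            new_labels[node] = _hash(labels[node] + "".join(sorted(parts)))
        labels = new_labels
        # Append the new label so each node keeps one hash per depth.
        for node in adjacency:
            hashes[node].append(labels[node])
    return hashes


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
subgraph_hashes = wl_subgraph_hashes_sketch(path)
```

In the three-node path, the two end nodes are structurally equivalent and end up with identical hash lists, while the middle node differs.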
‘file’ is file to search through.
‘search’ is the “search program”, a list of lists/tuples with 3 elements;
i.e. [[regex, test, run], [regex, test, run], …]
‘results’ is an object that your search program will have access to for
storing results.
Here regex is either a Regex object or a string that we compile into a
Regex. test and run are callable objects.
This function goes through each line in file, and if regex matches that
line and test(results,line)==True (or test is None) we execute
run(results,match), where match is the match object from running
Regex.match.
The default results is an empty dictionary. Passing a results object lets
you interact with it in run() and test(). Hence, it is often convenient
to use results=self.
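The control flow described above can be sketched as follows. The function name `regrep_sketch` is hypothetical, and it takes any iterable of lines instead of a file path, which is an assumption made to keep the example self-contained.

```python
import re


def regrep_sketch(lines, search, results=None):
    """Run a 'search program' of [regex, test, run] entries over lines."""
    if results is None:
        results = {}
    # Compile string patterns once; leave precompiled patterns untouched.
    program = [
        (re.compile(regex) if isinstance(regex, str) else regex, test, run)
        for regex, test, run in search
    ]
    for line in lines:
        for regex, test, run in program:
            match = regex.match(line)
            if match and (test is None or test(results, line)):
                run(results, match)
    return results


def store_energy(results, match):
    # Collect every matched energy value in the shared results object.
    results.setdefault("energies", []).append(float(match.group(1)))


log_lines = ["energy = -1.5", "unrelated line", "energy = -2.0"]
found = regrep_sketch(log_lines, [[r"energy\s*=\s*(-?\d+\.\d+)", None, store_energy]])
```

Because `Regex.match` anchors at the start of the line, patterns that should match later in a line need a leading `.*`.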
Given a symmetric matrix in upper triangular form, stored as a flat array indexed as:
[A_xx,A_yy,A_zz,A_xy,A_xz,A_yz]
This will generate the full matrix:
[[A_xx,A_xy,A_xz],[A_xy,A_yy,A_yz],[A_xz,A_yz,A_zz]].
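The index mapping can be written out directly. The function name below is hypothetical and the sketch uses nested lists rather than numpy arrays.

```python
def full_symmetric_matrix(flat):
    """Expand [A_xx, A_yy, A_zz, A_xy, A_xz, A_yz] into a full 3x3 matrix."""
    a_xx, a_yy, a_zz, a_xy, a_xz, a_yz = flat
    return [
        [a_xx, a_xy, a_xz],
        [a_xy, a_yy, a_yz],
        [a_xz, a_yz, a_zz],
    ]


matrix = full_symmetric_matrix([1, 2, 3, 4, 5, 6])
```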
This module provides a wrapper for numba such that no functionality
is lost if numba is not available. Numba is a just-in-time compiler
that can significantly accelerate the evaluation of certain functions
if installed.
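A common way to build such a wrapper is an import guard with a no-op fallback decorator. The sketch below is an assumption about the general pattern, not the module's actual code; it supports both the bare `@njit` form and the `@njit(...)` form with keyword options.

```python
try:
    from numba import njit  # real JIT compilation when numba is installed
except ImportError:

    def njit(func=None, **_options):
        """Fallback decorator that returns the function unchanged."""
        if func is None:
            # Called as @njit(...): return a decorator that does nothing.
            return lambda f: f
        # Called as bare @njit.
        return func


@njit
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(4)
```

Code decorated this way runs with or without numba installed; only the speed differs.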
Decorator that adds keyword arguments for functions returning matplotlib
figures.
The function should return either a matplotlib figure or None to signal
some sort of error/unexpected event.
See doc string below for the list of supported options.
Helper function used in plot functions supporting an optional Axes3D
argument. If ax is None, we build the matplotlib figure and create the
Axes3D else we return the current active figure.
Parameters:
ax (Axes3D, optional) – Axes3D object. Defaults to None.
kwargs – keyword arguments are passed to plt.figure if ax is None.
Returns:
matplotlib Axes3D and corresponding figure objects
Helper function used in plot functions supporting an optional Axes argument.
If ax is None, we build the matplotlib figure and create the Axes else
we return the current active figure.
Parameters:
ax (Axes, optional) – Axes object. Defaults to None.
kwargs – keyword arguments are passed to plt.figure if ax is None.
Helper function used in plot functions that accept an optional array of Axes
as argument. If ax_array is None, we build the matplotlib figure and
create the array of Axes by calling plt.subplots else we return the
current active figure.
A static method that generates a heat map overlaid on a periodic table.
Parameters:
elemental_data (dict) – A dictionary with the element as a key and a
value assigned to it, e.g. surface energy or frequency.
Elements missing from elemental_data will be grey by default
in the final table. Example: elemental_data={"Fe": 4.2, "O": 5.0}.
cbar_label (str) – Label of the color bar. Default is “”.
cbar_label_size (float) – Font size for the color bar label. Default is 14.
cmap_range (tuple) – Minimum and maximum value of the color map scale.
If None, the color map will automatically scale to the range of the
data.
show_plot (bool) – Whether to show the heatmap. Default is False.
value_format (str) – Formatting string to show values. If None, no value
is shown. Example: “%.4f” shows float to four decimals.
value_fontsize (float) – Font size for values. Default is 10.
symbol_fontsize (float) – Font size for element symbols. Default is 14.
cmap (str) – Color scheme of the heatmap. Default is ‘YlOrRd’.
Refer to the matplotlib documentation for other options.
blank_color (str) – Color assigned for the missing elements in
elemental_data. Default is “grey”.
edge_color (str) – Color assigned for the edge of elements in the
periodic table. Default is “white”.
max_row (int) – Maximum number of rows of the periodic table to be
shown. Default is 9, which means the periodic table heat map covers
the standard 7 rows of the periodic table + 2 rows for the lanthanides
and actinides. Use a value of max_row = 7 to exclude the lanthanides and
actinides.
readable_fontcolor (bool) – Whether to use readable font color depending
on background color. Default is False.
A static method that generates a binary van Arkel-Ketelaar triangle to
quantify the ionic, metallic and covalent character of a compound
by plotting the electronegativity difference (y) vs average (x).
See:
A.E. van Arkel, Molecules and Crystals in Inorganic Chemistry,
Interscience, New York (1956)
and
J.A.A Ketelaar, Chemical Constitution (2nd edition), An Introduction
to the Theory of the Chemical Bond, Elsevier, New York (1958).
Parameters:
list_of_materials (list) – A list of computed entries of binary
materials or a list of lists containing two elements (str).
annotate (bool) – Whether or not to label the points on the
triangle with reduced formula (if list of entries) or pair
of elements (if list of list of str).
An Author contains two fields: name and email. It is meant to represent
the author of a Structure or the author of a code that was applied to a Structure.
A HistoryNode represents a step in the chain of events that leads to a
Structure. HistoryNodes leave ‘breadcrumbs’ so that you can trace back how
a Structure was created. For example, a HistoryNode might represent pulling
a Structure from an external database such as the ICSD or CSD. Or, it might
represent the application of a code (e.g. pymatgen) to the Structure, with
a custom description of how that code was applied (e.g. a site removal
Transformation was applied).
The Structure Notation Language (SNL, pronounced ‘snail’) is a container for a pymatgen
Structure/Molecule object with some additional fields for enhanced provenance.
It is meant to be imported/exported in a JSON file format with the following structure:
sites
lattice (optional)
about
    created_at
    authors
    projects
    references
    remarks
    data
    history
Parameters:
struct_or_mol – A pymatgen Structure/Molecule object
authors – List of {"name": '', "email": ''} dicts,
list of Strings as 'John Doe <johndoe@gmail.com>',
or a single String with commas separating authors
projects – List of Strings ['Project A', 'Project B']
references – A String in BibTeX format
remarks – List of Strings ['Remark A', 'Remark B']
data – A free form dict. Namespaced at the root level with an
underscore, e.g. {"_materialsproject": <custom data>}
history – List of dicts - [{'name': '', 'url': '', 'description': {}}]
A convenience method for getting a list of StructureNL objects by
specifying structures and metadata separately. Some of the metadata
is applied to all of the structures for ease of use.
Parameters:
structures – A list of Structure objects
authors – List of {"name": '', "email": ''} dicts,
list of Strings as 'John Doe <johndoe@gmail.com>',
or a single String with commas separating authors
projects – List of Strings ['Project A', 'Project B']. This
applies to all structures.
references – A String in BibTeX format. Again, this applies to all
structures.
remarks – List of Strings ['Remark A', 'Remark B']
data – A list of free form dicts, each namespaced at the root level
with an underscore, e.g. {"_materialsproject": <custom data>}.
The length of data should be the same as the list of
structures if not None.
histories – List of lists of dicts - [[{'name': '', 'url': '',
'description': {}}], …]. The length of histories should be the
same as the list of structures if not None.
This module provides utility classes for string operations.
TODO: make standalone functions in this module use the same implementation as Stringify
Note: previous deprecations of standalone functions in this module were removed due to
a community need.
Generate a LaTeX formatted string. The mode is set by the class variable STRING_MODE, which defaults to
“SUBSCRIPT”. e.g. Fe2O3 is transformed to Fe$_{2}$O$_{3}$. Setting STRING_MODE to “SUPERSCRIPT” creates
superscript, e.g. Fe2+ becomes Fe^{2+}. The initial string is obtained from the class’s __str__ method.
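The two transformations mentioned above can be approximated with regular expressions. The function names and patterns below are assumptions for illustration and cover only simple formulas and trailing charges.

```python
import re


def latex_subscripts(formula):
    """Wrap numbers that follow an element symbol or bracket in $_{...}$."""
    return re.sub(r"([A-Za-z\)\]])(\d+(?:\.\d+)?)", r"\1$_{\2}$", formula)


def latex_superscript_charge(ion):
    """Wrap a trailing charge such as '2+' or '-' in ^{...}."""
    return re.sub(r"([A-Za-z])(\d*[+-])$", r"\1^{\2}", ion)


subscripted = latex_subscripts("Fe2O3")
superscripted = latex_superscript_charge("Fe2+")
```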
A pretty string representation. By default, the __str__ output is used, but this method can be
overridden if a different representation from default is desired.
Get a string representing the charge of an Ion. By default, the
charge is placed in brackets with the sign preceding the magnitude, e.g.
‘[+2]’. For uncharged species, the string returned is ‘(aq)’.
Parameters:
charge (float) – The charge of the ion.
brackets (bool) – Whether to enclose the charge in brackets, e.g. [+2]. Default is True.
explicit_one (bool) – whether to include the number one for monovalent ions,
e.g. “+1” rather than “+”. Default is True.
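The formatting rules listed above can be sketched in a few lines. The function name is hypothetical, and rendering the magnitude with the `g` format (so that 2.0 prints as 2) is an assumption about how integral charges should appear.

```python
def charge_string_sketch(charge, brackets=True, explicit_one=True):
    """Format an ionic charge with the sign preceding the magnitude."""
    if charge > 0:
        sign = "+"
    elif charge < 0:
        sign = "-"
    else:
        # Uncharged species are labeled as aqueous.
        return "(aq)"
    magnitude = abs(charge)
    if magnitude == 1 and not explicit_one:
        text = sign
    else:
        text = f"{sign}{magnitude:g}"
    return f"[{text}]" if brackets else text


divalent = charge_string_sketch(2)
neutral = charge_string_sketch(0)
```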
Get a formula of a form like AxB1-x (x=0.5)
for disordered structures. Will only return a
formula for disordered structures with one
kind of disordered site at present.
Parameters:
disordered_struct (Structure) – a disordered structure.
symbols (Sequence[str]) – Characters to use for subscripts,
by default this is (‘x’, ‘y’, ‘z’) but if you have more than three
disordered species more symbols will need to be added.
This module defines convenience types for type hinting purposes.
Type hinting is new to pymatgen, so this module is subject to
change until best practices are established.