Pymatgen (Python Materials Genomics) is a robust, open-source Python library for materials analysis. It currently powers the public Materials Project, an initiative to make calculated properties of all known inorganic materials available to materials researchers. These are some of the main features:

  1. Highly flexible classes for the representation of Element, Site, Molecule, and Structure objects.
  2. Extensive IO capabilities to manipulate many VASP and ABINIT input and output files and the crystallographic information file (CIF) format. This includes generating Structure objects from VASP input and output. There is also support for Gaussian input files and XYZ files for molecules.
  3. Comprehensive tools to generate and view compositional and grand canonical phase diagrams.
  4. Electronic structure analyses (DOS and Bandstructure).
  5. Integration with the Materials Project REST API.

Pymatgen, like all scientific research, will always be a work in progress. While the development team will always strive to avoid backward incompatible changes, they are sometimes unavoidable, and tough decisions have to be made for the long term health of the code.

Pymatgen is free to use. However, we also welcome your help to improve this library by making your own contributions. These contributions can be in the form of additional tools or modules you develop, or even simple things such as bug reports. Please report any bugs and issues at pymatgen’s GitHub page. If you wish to be notified of pymatgen releases, you may become a member of pymatgen’s Google Groups page.

The code is mightier than the pen.

Why use pymatgen?

There are many materials analysis codes out there, both commercial and free. So you might ask: why should I use pymatgen over others? Pymatgen offers several advantages over other codes:

  1. It is (fairly) robust. Pymatgen is used in the Materials Project. As such, the analysis it produces survives rigorous scrutiny every single day. Bugs tend to be found and corrected quickly. Pymatgen also uses CircleCI for continuous integration, which ensures that all unit tests pass with every commit.
  2. It is well documented. Fairly comprehensive documentation has been written to help you get to grips with it quickly.
  3. It is open. You are free to use and contribute to pymatgen. It also means that pymatgen is continuously being improved. We have a policy of attributing any code you contribute to any publication you choose. Contributing to pymatgen means your research becomes more visible, which translates to greater impact.
  4. It is fast. Many of the core numerical methods in pymatgen have been optimized by vectorizing in numpy. This means that coordinate manipulations are extremely fast and are in fact comparable to codes written in other languages. Pymatgen also comes with a complete system for handling periodic boundary conditions.
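To illustrate the kind of vectorized coordinate handling that item 4 refers to, here is a minimal NumPy sketch of distances under periodic boundary conditions. It is not pymatgen's implementation: the function name `pbc_distances` is hypothetical, and the simple wrapping used here is only guaranteed to find the shortest periodic image for orthogonal (e.g., cubic) cells.

```python
import numpy as np


def pbc_distances(lattice, frac_a, frac_b):
    """Distances between paired fractional coordinates under periodic boundaries.

    lattice: (3, 3) array whose rows are the lattice vectors.
    frac_a, frac_b: (n, 3) arrays of fractional coordinates.
    Assumes an orthogonal cell, where wrapping each component is sufficient.
    """
    lattice = np.asarray(lattice, dtype=float)
    delta = np.asarray(frac_b, dtype=float) - np.asarray(frac_a, dtype=float)
    # Wrap every component so it lies within half a cell of zero.
    delta -= np.round(delta)
    # Convert all displacement vectors to Cartesian in one matrix product.
    cart = np.dot(delta, lattice)
    return np.linalg.norm(cart, axis=1)


cubic = 4.2 * np.eye(3)
starts = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]])
ends = np.array([[0.5, 0.5, 0.5], [0.9, 0.9, 0.9]])
print(pbc_distances(cubic, starts, ends))
```

No Python-level loop runs over the sites: the wrap, the conversion to Cartesian coordinates, and the norm are each a single array operation, which is where the speed comes from.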

With effect from version 3.0, pymatgen now supports both Python 2.7 as well as Python 3.x. For developers working to add new features to pymatgen, this also means that all new code going forward has to be Python 2.7+ and 3 compatible. Our approach is to have a single codebase support Python 2.7 and 3.x, as per current best practices. Please review the coding guidelines.
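One common way to keep a single codebase working on both interpreters is to hide version differences behind a small alias. The sketch below is illustrative only and is not taken from pymatgen's coding guidelines; `PY2`, `string_types`, and `is_string` are hypothetical names.

```python
import sys

PY2 = sys.version_info[0] == 2

# On Python 2, text may be str or unicode (both derive from basestring);
# on Python 3, only str exists. The conditional expression evaluates just
# one branch, so basestring is never looked up on Python 3.
string_types = (basestring,) if PY2 else (str,)  # noqa: F821


def is_string(obj):
    """Return True if obj is a text string on either Python 2.7 or 3.x."""
    return isinstance(obj, string_types)


print(is_string("Fe2O3"))
print(is_string(42))
```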

Change log


  • Add warning for limited subgroup testing functionality in Spacegroup.

Older versions

Getting pymatgen

Guided install

For users who intend to use pymatgen purely as an analysis library (without developing on it), a user-friendly script has been written to guide users through the installation process for 64-bit Linux and Mac users. This installation script requires only basic Python 2.7+, setuptools, and a working version of gcc as prerequisites. Click to download the script. Move the script to an empty directory and then run:


Unless you are working in a virtual environment, you will probably need to run the above command with admin privileges (e.g., sudo). This will install pymatgen with all basic dependencies.

To include more optional dependencies, build the enumlib and bader executables, and walk through a step-by-step initial setup for POTCARs and Materials API usage, run:

python -f

The full installation requires a Fortran compiler (ifort or gfortran) to be in the PATH, as well as X11 (XQuartz on Mac) to be installed for matplotlib.

Stable version



Before installing pymatgen, you may need to first install a few critical dependencies manually.

  1. Installation has been tested to be most successful with gcc, and several dependencies have issues with icc. Use gcc where possible and do “export CC=gcc” prior to installation.

  2. Numpy’s distutils is needed to compile the spglib and pyhull dependencies. This should be the first thing you install.

  3. Although PyYAML can be installed directly through pip without additional preparation, it is highly recommended that you install PyYAML with the C bindings for speed. To do so, install LibYAML first, and then install PyYAML with the command below (see the PyYAML docs for more information):

    python setup.py --with-libyaml install
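After installing, one way to check whether the C bindings were actually picked up is to inspect the `__with_libyaml__` flag that PyYAML sets at import time. This is a small sketch; the helper name `yaml_c_bindings_available` is hypothetical.

```python
def yaml_c_bindings_available():
    """Return True or False for PyYAML's LibYAML bindings, or None if PyYAML is missing."""
    try:
        import yaml
    except ImportError:
        return None
    # PyYAML sets this flag depending on whether its C extension could be loaded.
    return bool(getattr(yaml, "__with_libyaml__", False))


print(yaml_c_bindings_available())
```

A result of False means PyYAML is installed but will fall back to its slower pure-Python parser.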

The version at the Python Package Index (PyPI) is always the latest stable release that is relatively bug-free. The easiest way to install pymatgen on any system is to use easy_install or pip, as follows:

easy_install pymatgen

or

pip install pymatgen

Detailed installation instructions for various platforms (Mac and Windows) are given on this page, including how to set up your machine for POTCAR generation, Materials Project REST interface usage, etc.

Developmental version

The bleeding-edge developmental version is at pymatgen’s GitHub repo. The developmental version is likely to be more buggy, but may contain new features. The GitHub version also includes test files for complete unit testing. After cloning the source, you can type:

python setup.py install

or to install the package in developmental mode:

python setup.py develop

To run the very comprehensive suite of unit tests included with the developmental version, make sure you have nose installed and then just type:

nosetests
in the pymatgen root directory.

Note on Shared Compute Cluster Installation

If you are installing pymatgen on shared computing clusters, e.g., the XSEDE or NERSC resources in the US, there are several things you need to take note of:

  1. Some older clusters have Python 2.6 or older by default. Pymatgen requires Python 2.7 or newer. Sometimes, the cluster may have Python 2.7 that you can load, e.g., using “module load python/2.7”. Otherwise, you need to contact the cluster admin to install Python 2.7, or you can try to install it in your home directory.

  2. Unless you are the sys admin, you will not have write access to the default locations where Python installs packages. What you need to do is to install pymatgen (and other dependencies) using the “--user” option:

    pip install pymatgen --user

    or

    python setup.py develop --user

    This will install pymatgen in your $HOME/.local/lib/python2.7/site-packages. You may need to add this to your PYTHONPATH variable, e.g., in your .bash_profile if it is not automatically set.
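Both points above can be checked from Python itself before installing anything. The following is a minimal sketch using only the standard library; `check_user_install` is a hypothetical helper name, and `site.getusersitepackages()` may be unavailable inside some older virtual environments.

```python
import site
import sys


def check_user_install():
    """Return (version_ok, user_site, on_path) for the running interpreter."""
    # Pymatgen requires Python 2.7 or newer.
    version_ok = sys.version_info >= (2, 7)
    # Directory that "--user" installs go to for this interpreter.
    user_site = site.getusersitepackages()
    # If this is False, the directory may need to be added to PYTHONPATH.
    on_path = user_site in sys.path
    return version_ok, user_site, on_path


version_ok, user_site, on_path = check_user_install()
print("Python 2.7 or newer: {}".format(version_ok))
print("User site-packages: {}".format(user_site))
print("Already on sys.path: {}".format(on_path))
```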

“Sample” Docker version

If you would like to try out pymatgen’s capabilities before committing to an install, one way is to use Docker. The Materials Virtual Lab has created a Docker image for the latest version of pymatgen. After installing Docker for your platform, you may pull and run the pymatgen Docker image as follows:

docker pull materialsvirtuallab/pymatgen
docker run -t -i materialsvirtuallab/pymatgen

This will start an IPython shell where you can import pymatgen and run most of the examples. If you want to use your own files to run some examples, you may mount a directory on your host computer containing the files you wish to work with in the Docker container using the -v option. For example, let’s say you have your files in the /Users/myname/research directory. You may then run docker as follows:

docker run -t -i -v /Users/myname/research:/opt/research materialsvirtuallab/pymatgen

Using pymatgen

Figure: Overview of a typical workflow for pymatgen.

The figure above provides an overview of the functionality in pymatgen. A typical workflow would involve a user converting data (structure, calculations, etc.) from various sources (first principles calculations, crystallographic and molecule input files, Materials Project, etc.) into Python objects using pymatgen’s io packages, which are then used to perform further structure manipulation or analyses.

Quick start

Useful aliases for commonly used objects are now provided. Supported objects include Element, Composition, Structure, Molecule, Spin and Orbital. Here are some quick examples of the core capabilities and objects:

>>> import pymatgen as mg
>>> si = mg.Element("Si")
>>> si.atomic_mass
>>> si.melting_point
u'1687 K'
>>> comp = mg.Composition("Fe2O3")
>>> comp.weight
>>> #Note that Composition conveniently allows strings to be treated just
>>> #like an Element object.
>>> comp["Fe"]
>>> comp.get_atomic_fraction("Fe")
>>> lattice = mg.Lattice.cubic(4.2)
>>> structure = mg.Structure(lattice, ["Cs", "Cl"],
...                          [[0, 0, 0], [0.5, 0.5, 0.5]])
>>> structure.volume
>>> structure[0]
PeriodicSite: Cs (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
>>> # You can create a Structure using spacegroup symmetry as well.
>>> li2o = mg.Structure.from_spacegroup("Fm-3m", mg.Lattice.cubic(3),
...                                     ["Li", "O"],
...                                     [[0.25, 0.25, 0.25], [0, 0, 0]])
>>> #Integrated symmetry analysis tools from spglib.
>>> from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
>>> finder = SpacegroupAnalyzer(structure)
>>> finder.get_spacegroup_symbol()
>>> # Convenient IO to various formats. You can specify various formats.
>>> # Without a filename, a string is returned. Otherwise,
>>> # the output is written to the file. If only the filename is provided,
>>> # the format is intelligently determined from the filename.
>>> # Reading a structure is similarly easy.
>>> structure = mg.Structure.from_str(open("CsCl.cif").read(), fmt="cif")
>>> structure = mg.Structure.from_file("CsCl.cif")
>>> # Reading and writing a molecule from a file. Supports XYZ and
>>> # Gaussian input and output by default. Support for many other
>>> # formats via the optional openbabel dependency (if installed).
>>> methane = mg.Molecule.from_file("")
>>> # Pythonic API for editing Structures and Molecules (v2.9.1 onwards)
>>> # Changing the specie of a site.
>>> structure[1] = "F"
>>> print(structure)
Structure Summary (Cs1 F1)
Reduced Formula: CsF
abc   :   4.200000   4.200000   4.200000
angles:  90.000000  90.000000  90.000000
Sites (2)
1 Cs     0.000000     0.000000     0.000000
2 F     0.500000     0.500000     0.500000
>>> #Changes species and coordinates (fractional assumed for structures)
>>> structure[1] = "Cl", [0.51, 0.51, 0.51]
>>> print(structure)
Structure Summary (Cs1 Cl1)
Reduced Formula: CsCl
abc   :   4.200000   4.200000   4.200000
angles:  90.000000  90.000000  90.000000
Sites (2)
1 Cs     0.000000     0.000000     0.000000
2 Cl     0.510000     0.510000     0.510000
>>> # Because structure is like a list, it supports most list-like methods
>>> # such as sort, reverse, etc.
>>> structure.reverse()
>>> print(structure)
Structure Summary (Cs1 Cl1)
Reduced Formula: CsCl
abc   :   4.200000   4.200000   4.200000
angles:  90.000000  90.000000  90.000000
Sites (2)
1 Cl     0.510000     0.510000     0.510000
2 Cs     0.000000     0.000000     0.000000
>>> # Molecules function similarly, but with Site and cartesian coords.
>>> # The following changes the C in CH4 to an N and displaces it by 0.01A
>>> # in the x-direction.
>>> methane[0] = "N", [0.01, 0, 0]
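The quantities behind `comp.weight`, `comp.get_atomic_fraction("Fe")`, and `structure.volume` in the session above can be checked by hand. The plain-Python sketch below does not use pymatgen; the atomic masses are approximate standard values supplied here as an assumption, so the last digits may differ from pymatgen's own data.

```python
# Approximate standard atomic masses in amu (assumed values, not read from pymatgen).
ATOMIC_MASS = {"Fe": 55.845, "O": 15.999}

# Fe2O3 written as element -> number of atoms per formula unit.
fe2o3 = {"Fe": 2, "O": 3}

# Formula weight: sum of (atomic mass x amount) over all elements.
weight = sum(ATOMIC_MASS[el] * amount for el, amount in fe2o3.items())

# Atomic fraction of Fe: Fe atoms divided by the total number of atoms.
atomic_fraction_fe = fe2o3["Fe"] / float(sum(fe2o3.values()))

# A cubic cell with a = 4.2 Angstrom has volume a**3 (in cubic Angstroms).
volume = 4.2 ** 3

print(weight, atomic_fraction_fe, volume)
```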

The above illustrates only the most basic capabilities of pymatgen. Users are strongly encouraged to explore the usage pages (toc given below).


A good way to explore the functionality of pymatgen is to look at examples. Please check out the ipython notebooks at our examples page.

API documentation

For detailed documentation of all modules and classes, please refer to the API docs.

More resources

The founder and maintainer of pymatgen, Shyue Ping Ong, has conducted several workshops (together with Anubhav Jain) on how to effectively use pymatgen (as well as the extremely useful custodian error management and FireWorks workflow software). The slides for these workshops are available on the Materials Virtual Lab.

pmg - Command line tool

To demonstrate the capabilities of pymatgen and to make it easy for users to quickly use the functionality, pymatgen comes with a set of useful scripts that utilize the library to perform all kinds of analyses. These are installed to your path by default when you install pymatgen through the typical installation routes.

Here, we will discuss the most versatile of these scripts, known as pmg. The typical usage of pmg is:

pmg {analyze, plotdos, plotchgint, convert, symm, view, compare} additional_arguments

At any time, you can use "pmg --help" or "pmg subcommand --help" to bring up a useful help message on how to use these subcommands. Here are a few examples of typical usages:

#Parse all VASP runs in a directory and display the basic energy
#information. Saves the data in a file called vasp_data.gz for subsequent
#reuse.

pmg analyze .

#Plot the dos from the vasprun.xml file.

pmg plotdos vasprun.xml

#Convert between file formats. The script attempts to intelligently
#determine the file type. Input file types supported include CIF,
#vasprun.xml, POSCAR, CSSR. You can force the script to assume certain file
#types by specifying additional arguments. See pmg convert -h.

pmg convert input_filename output_filename

#Obtain spacegroup information.

pmg symm -s filename1 filename2

#Visualize a structure. Requires VTK to be installed.

pmg view filename

#Compare two structures for similarity

pmg compare filename1 filename2

#Generate a POTCAR with symbols Li_sv O and the PBE functional

pmg generate --potcar Li_sv O --functional PBE
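The same commands can also be driven from a Python script, for example to loop over many files. The sketch below only assembles and launches the command line shown above. The helper names `pmg_command` and `run_pmg` are hypothetical, and `shutil.which` requires Python 3.3 or newer.

```python
import shutil
import subprocess


def pmg_command(subcommand, *args):
    """Build the argument list for `pmg <subcommand> <args...>`."""
    return ["pmg", subcommand] + [str(arg) for arg in args]


def run_pmg(subcommand, *args):
    """Run pmg and return its exit code, or None if pmg is not on the PATH."""
    if shutil.which("pmg") is None:
        # The pmg script is only available once pymatgen has been installed.
        return None
    return subprocess.call(pmg_command(subcommand, *args))


# Equivalent to typing "pmg symm -s filename1 filename2" in a shell;
# filename1 and filename2 are placeholders, as in the examples above.
print(pmg_command("symm", "-s", "filename1", "filename2"))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when filenames contain spaces.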


Some add-ons are available for pymatgen today:

  1. The pymatgen-db add-on provides tools to create databases of calculated run data using pymatgen.
  2. The custodian package provides JIT job management and error correction for calculations.


Pymatgen is developed by a team of volunteers. It was started by a team comprising MIT and Lawrence Berkeley National Laboratory staff to be a robust toolkit for materials researchers to perform advanced manipulations of structures and analyses.

For pymatgen to continue to grow in functionality and robustness, we rely on other volunteers to develop new analyses and to report and fix bugs. We welcome anyone to use our code as-is, but if you could take a few moments to give back to pymatgen in some small way, it would be greatly appreciated. A benefit of contributing is that your code will be used by other researchers who use pymatgen, and we will include an acknowledgement to you (and any related publications) in pymatgen.

Reporting bugs

A simple way that anyone can contribute is to report bugs and issues to the development team. You can either send an email to pymatgen’s Google Groups page or, even better, submit an Issue on our GitHub page.

Developing new functionality

Another way to contribute is to submit new code/bugfixes to pymatgen. While you can always zip your code and email it to the maintainer of pymatgen, the best way for anyone to develop pymatgen is by adopting the collaborative GitHub workflow (see contributing page).

How to cite pymatgen

If you use pymatgen in your research, please consider citing the following work:

Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent Chevrier, Kristin A. Persson, Gerbrand Ceder. Python Materials Genomics (pymatgen) : A Robust, Open-Source Python Library for Materials Analysis. Computational Materials Science, 2013, 68, 314–319. doi:10.1016/j.commatsci.2012.10.028

In addition, some of pymatgen’s functionality is based on scientific advances / principles developed by various scientists. Please refer to the references page for citation info.


Pymatgen is released under the MIT License. The terms of the license are as follows:

The MIT License (MIT)
Copyright (c) 2011-2012 MIT & LBNL

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.


About the Pymatgen Development Team

Shyue Ping Ong started Pymatgen in 2011, and is still the project lead.

The Pymatgen Development Team is the set of all contributors to the pymatgen project, including all subprojects.

The full list of contributors is given on the team page.
