pymatgen.io.cp2k.sets module
This module defines input sets for CP2K and is a work in progress. The structure/philosophy of this module is based on the VASP input sets in Pymatgen. These sets are meant to contain tested parameters that will result in successful, reproducible, consistent calculations without need for intervention 99% of the time. 99% of the time, you only need to provide a pymatgen structure object and let the defaults take over from there.
The sets are intended to be very general, e.g. a set for geometry relaxation, and so most of the time, if you have specific needs, you can simply specify them via the keyword argument override_default_params (see Section.update() method). If you have the need to create a new input set (say for a standardized high throughput calculation) then you can create a new child of the Cp2kInputSet class.
- In order to implement a new Set within the current code structure, follow this 3 step flow:
Inherit from Cp2kInputSet or one of its children and call the super() constructor
Create the new sections and insert them into self and its subsections as needed
Call self.update(override_default_params) in order to allow user settings.
- class CellOptSet(structure: Structure | Molecule, project_name: str = 'CellOpt', override_default_params: dict = None, **kwargs)[source]
Bases:
DftSet
CP2K input set containing the basic settings for performing geometry optimization. Values are all cp2k defaults, and should be good for most systems of interest.
- Parameters:
structure – Pymatgen structure object
max_drift – Convergence criterion for the maximum geometry change between the current and the last optimizer iteration. This keyword cannot be repeated and it expects precisely one real. Default value: 3.00000000E-003 Default unit: [bohr]
max_force (float) – Convergence criterion for the maximum force component of the current configuration. This keyword cannot be repeated and it expects precisely one real. Default value: 4.50000000E-004 Default unit: [bohr^-1*hartree]
max_iter (int) – Specifies the maximum number of geometry optimization steps. One step might imply several force evaluations for the CG and LBFGS optimizers. This keyword cannot be repeated and it expects precisely one integer. Default value: 200
optimizer (str) – Specify which method to use to perform a geometry optimization. This keyword cannot be repeated and it expects precisely one keyword. BFGS is a quasi-newtonian method, and will best for “small” systems near the minimum. LBFGS is a limited memory version that can be used for “large” (>1000 atom) systems when efficiency outweighs robustness. CG is more robust, especially when you are far from the minimum, but it slower. Default value: BFGS
- class Cp2kInputSet(structure: Structure | Molecule, basis_and_potential: dict | str = 'preferred', multiplicity: int = 0, project_name: str = 'CP2K', override_default_params: dict = None, **kwargs)[source]
Bases:
Cp2kInput
The basic representation of a CP2K input set as a collection of “sections” defining the simulation connected to a structure object. At the most basis level, CP2K requires a &GLOBAL section and &FORCE_EVAL section. Global sets parameters like “RUN_TYPE” or the overall verbosity. FORCE_EVAL is the largest section usually, containing the cell and coordinates of atoms, the DFT settings, and more. This top level input set is meant to initialize GLOBAL and FORCE_EVAL based on a structure object and and sections that the user provides.
Like everything that goes into a cp2k input file, this base input set is essentially a section object. These sets are distinguished by saving default settings for easy implementation of calculations such as relaxation and static calculations. This base set is here to transfer a pymatgen structure object into the input format for cp2k and associate the basis set and pseudopotential to use with each element in the structure.
Generally, this class will not be used directly, and instead one of its child-classes will be used, which contain more predefined initializations of various sections, and, if modifications are required, the user can specify override_default_settings.
- Parameters:
structure – (Structure or Molecule) pymatgen structure or molecule object used to define the lattice, coordinates, and elements. This structure object cannot contain “special” species like the Dummy species, e.g. X, or fractional occupations, e.g. Fe0.2, etc.
potential_and_basis –
(dict) Specifies what basis set and potential to use. Specify these as a dict of the form:
- { element: {‘cardinality’: __, ‘sr’: __, ‘q’: __},
’cardinality’: __, ‘functional’: __}
Where cardinality and functional are overall specifications (for all elements), while <key=’element’> specifies the overrides for a specific element. Currently the following conventions must be followed:
All species of a particular element must have the same potential/basis
multiplicity – (int) Specify the system’s multiplicity if appropriate
project_name – (str) Specify the project name. This will be used to name the output files from a CP2K calculation
override_default_params – (dict) Specifies user-defined settings to override the settings of any input set (See Section.update())
- class DftSet(structure: Structure | Molecule, ot: bool = True, energy_gap: float = -1, eps_default: float = 1e-12, eps_scf: float = 1e-06, max_scf: int | None = None, minimizer: str = 'DIIS', preconditioner: str = 'FULL_SINGLE_INVERSE', algorithm: str = 'IRAC', linesearch: str = '2PNT', rotation: bool = True, occupation_preconditioner: bool = False, cutoff: int = 0, rel_cutoff: int = 50, ngrids: int = 5, progression_factor: int = 3, override_default_params: dict = None, wfn_restart_file_name: str = None, kpoints: Kpoints | None = None, smearing: bool = False, **kwargs)[source]
Bases:
Cp2kInputSet
Base for an input set using the Quickstep module (i.e. a DFT calculation). The DFT section is pretty vast in CP2K, so this set hopes to make the DFT setup fairly simple. The provided parameters are pretty conservative, and so they should not need to be changed very often.
- Parameters:
structure – Pymatgen structure or molecule object
ot (bool) – Whether or not to use orbital transformation method for matrix diagonalization. OT is the flagship scf solver of CP2K, and will provide huge speed-ups for this part of the calculation, but the system must have a band gap for OT to be used (higher band-gap –> faster convergence). Band gap is also used by the preconditioner for OT, and should be set as a value SMALLER than the true band gap to get good efficiency. Generally, this parameter does not need to be changed from default of 0.01
energy_gap (float) – Estimate of energy gap for preconditioner. Default is -1, leaving it up to cp2k.
eps_default (float) – Replaces all EPS_XX Keywords in the DFT section (NOT its subsections!) to have this value, ensuring an overall accuracy of at least this much.
eps_scf (float) –
The convergence criteria for leaving the SCF loop. Default is 1e-6. Should ensure reasonable results, but is not applicable to all situations.
Note: eps_scf is not in units of energy, as in most DFT codes. For OT method, it is the largest gradient of the energy with respect to changing any of the molecular orbital coefficients. For diagonalization, it is the largest change in the density matrix from the last step.
max_scf (int) – The max number of SCF cycles before terminating the solver. NOTE: With the OT solver, this corresponds to the max number of INNER scf loops, and then the outer loops are set with outer_max_scf, while with diagonalization it corresponds to the overall (INNER*OUTER) number of SCF steps, with the inner loop limit set by
minimizer (str) – The minimization scheme. DIIS can be as much as 50% faster than the more robust conjugate gradient method, and so it is chosen as default. Switch to CG if dealing with a difficult system.
preconditioner (str) – Preconditioner for the OT method. FULL_SINGLE_INVERSE is very robust and compatible with non-integer occupations from IRAC+rotation. FULL_ALL is considered “best” but needs algorithm to be set to STRICT. Only change from these two when simulation cell gets to be VERY large, in which case FULL_KINETIC might be preferred.
algorithm (str) – Algorithm for the OT method. STRICT assumes that the orbitals are strictly orthogonal to each other, which works well for wide gap ionic systems, but can diverge for systems with small gaps, fractional occupations, and some other cases. IRAC (iterative refinement of the approximate congruency) transformation is not analytically correct and uses a truncated polynomial expansion, but is robust to the problems with STRICT, and so is the default.
linesearch (str) – Linesearch method for CG. 2PNT is the default, and is the fastest, but is not as robust as 3PNT. 2PNT is required as of cp2k v9.1 for compatibility with irac+rotation. This may be upgraded in the future. 3PNT can be good for wide gapped transition metal systems as an alternative.
rotation (bool) – Whether or not to allow for rotation of the orbitals in the OT method. This equates to allowing for fractional occupations in the calculation.
occupation_preconditioner (bool) – Whether or not to account for fractional occupations in the preconditioner. This method is not fully integrated as of cp2k v9.1 and is set to false by default.
cutoff (int) – Cutoff energy (in Ry) for the finest level of the multigrid. A high cutoff will allow you to have very accurate calculations PROVIDED that REL_CUTOFF is appropriate. By default cutoff is set to 0, which will assign it to be the largest exponent of your basis times the rel_cutoff.
rel_cutoff (int) –
This cutoff decides how the Gaussians are mapped onto the different levels of the multigrid. If REL_CUTOFF is too low, then even if you have a high CUTOFF, all Gaussians will be mapped onto the coarsest level of the multi-grid, and thus the effective integration grid for the calculation may still be too coarse. By default 50Ry is chosen, which should be sufficient given the cutoff is large enough.
From CP2K manual: A Gaussian is mapped onto the coarsest level of the multi-grid, on which the function will cover number of grid points greater than or equal to the number of grid points will cover on a reference grid defined by REL_CUTOFF.
ngrids (int) – number of multi-grids to use. CP2K default is 4, but the molopt basis files recommend 5.
progression_factor (int) – Divisor of CUTOFF to get the cutoff for the next level of the multigrid.
wfn_restart_file_name (str) – RESTART file for the initial wavefunction guess.
kpoints (Kpoints) – kpoints object from pymatgen.io.vasp.inputs.Kpoints. By default, CP2K runs with gamma point only.
smearing (bool) – whether or not to activate smearing (should be done for systems containing no (or a very small) band gap.
- activate_hybrid(hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, max_memory: int = 2000, cutoff_radius: float = 8.0, potential_type: str = None, scale_coulomb: float = 1, scale_gaussian: float = 1, scale_longrange: float = 1, omega: float = 0.11, aux_basis: dict | None = None, admm: bool = True, eps_schwarz: float = 1e-07, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True)[source]
Basic set for activating hybrid DFT calculation using Auxiliary Density Matrix Method.
Note 1: When running ADMM with cp2k, memory is very important. If the memory requirements exceed what is available (see max_memory), then CP2K will have to calculate the 4-electron integrals for HFX during each step of the SCF cycle. ADMM provides a huge speed up by making the memory requirements feasible to fit into RAM, which means you only need to calculate the integrals once each SCF cycle. But, this only works if it fits into memory. When setting up ADMM calculations, we recommend doing whatever is possible to fit all the 4EI into memory.
Note 2: This set is designed for reliable high-throughput calculations, NOT for extreme accuracy. Please review the in-line comments in this method if you want more control.
- Parameters:
hybrid_functional (str) – Type of hybrid functional. This set supports HSE (screened) and PBE0 (truncated). Default is PBE0, which converges easier in the GPW basis used by cp2k.
hf_fraction (float) – fraction of exact HF exchange energy to mix. Default: 0.25
gga_x_fraction (float) – fraction of gga exchange energy to retain. Default: 0.75
gga_c_fraction (float) – fraction of gga correlation energy to retain. Default: 1.0
max_memory (int) – Maximum memory available to each MPI process (in Mb) in the calculation. Most modern computing nodes will have ~2Gb per core, or 2048 Mb, but check for your specific system. This value should be as large as possible while still leaving some memory for the other parts of cp2k. Important: If this value is set larger than the memory limits, CP2K will likely seg-fault. Default: 2000
cutoff_radius (float) – for truncated hybrid functional (i.e. PBE0), this is the cutoff radius. The default is selected as that which generally gives convergence, but maybe too low (if you want very high accuracy) or too high (if you want a quick screening). Default: 8 angstroms
potential_type (str) – what interaction potential to use for HFX. Available in CP2K are COULOMB, GAUSSIAN, IDENTITY, LOGRANGE, MIX_CL, MIX_CL_TRUNC, MIX_LG, SHORTRANGE, and TRUNCATED. Default is None, and it will be set automatically depending on the named hybrid_functional that you use, but setting it to one of the acceptable values will constitute a user-override.
omega (float) – For HSE, this specifies the screening parameter. HSE06 sets this as 0.2, which is the default.
aux_basis (dict) – If you want to specify the aux basis to use, specify it as a dict of the form {‘specie_1’: ‘AUX_BASIS_1’, ‘specie_2’: ‘AUX_BASIS_2’}
admm (bool) – Whether or not to use the auxiliary density matrix method for the exact HF exchange contribution. Highly recommended. Speed ups between 10x and aaa1000x are possible when compared to non ADMM hybrid calculations. Default: True
eps_schwarz (float) – Screening threshold for HFX, in Ha. Contributions smaller than this will be screened. The smaller the value, the more accurate, but also the more costly. Default value is 1e-7. 1e-6 works in a large number of cases, but is quite aggressive, which can lead to convergence issues.
eps_schwarz_forces (float) – Same as for eps_schwarz, but for screening contributions to forces. Convergence is not as sensitive with respect to eps_schwarz forces as compared to eps_schwarz, and so 1e-6 should be good default.
screen_on_initial_p (bool) – If an initial density matrix is provided, in the form of a CP2K wfn restart file, then this initial density will be used for screening. This is generally very computationally efficient, but, as with eps_schwarz, can lead to instabilities if the initial density matrix is poor.
screen_p_forces (bool) – Same as screen_on_initial_p, but for screening of forces.
- activate_motion()[source]
Turns on the motion section for GEO_OPT, CELL_OPT, etc. calculations. Will turn on the printing subsections and also bind any constraints to their respective atoms.
- activate_nonperiodic(solver='MT')[source]
Activates a calculation with non-periodic calculations by turning of PBC and changing the poisson solver. Still requires a CELL to put the atoms
- activate_robust_minimization()[source]
Method to modify the set to use more robust SCF minimization technique
- activate_very_strict_minimization()[source]
Method to modify the set to use very strict SCF minimization scheme :return:
- modify_dft_print_iters(iters, add_last='no')[source]
Modify all DFT print iterations at once. Common use is to set iters to the max number of iterations + 1 and then set add_last to numeric. This would have the effect of printing only the first and last iteration, which might be useful for speeding up/saving space on GEO_OPT or MD runs where you don’t need the intermediate values.
- Args
iters (int): print each “iters” iterations. add_last (str): Whether to explicitly include the last iteration, and how to mark it.
numeric: mark last iteration with the iteration number symbolic: mark last iteration with the letter “l” no: do not explicitly include the last iteration
- print_e_density(stride=(2, 2, 2))[source]
Controls the printing of cube files with the electronic density and, for LSD calculations, the spin density
- print_hartree_potential(stride=(2, 2, 2))[source]
Controls the printing of a cube file with eletrostatic potential generated by the total density (electrons+ions). It is valid only for QS with GPW formalism. Note that by convention the potential has opposite sign than the expected physical one.
- print_ldos(nlumo=-1)[source]
Activate the printing of LDOS files, printing one for each atom kind by default
- Parameters:
nlumo (int) – Number of virtual orbitals to be added to the MO set (-1=all). CAUTION: Setting this value to be higher than the number of states present may cause a Cholesky error.
- print_mo_cubes(write_cube=False, nlumo=-1, nhomo=-1)[source]
Activate printing of molecular orbitals.
- Parameters:
write_cube (bool) – whether to write cube file for the MOs (setting false will just print levels in out file)
nlumo (int) – Controls the number of lumos that are printed and dumped as a cube (-1=all)
nhomo (int) – Controls the number of homos that are printed and dumped as a cube (-1=all)
- class HybridCellOptSet(structure: Structure | Molecule, hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, project_name: str = 'Hybrid-Static', potential_type: str = None, gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, scale_coulomb: float = 1, scale_gaussian: float = 1, scale_longrange: float = 1, override_default_params: dict = None, max_memory: int = 2000, cutoff_radius: float = 8.0, omega: float = 0.2, aux_basis: dict | None = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True, **kwargs)[source]
Bases:
CellOptSet
Static calculation using hybrid DFT with the ADMM formalism in Cp2k.
- Parameters:
structure – pymatgen structure object
method – hybrid dft method to use (currently select between HSE06 and PBE0)
hf_fraction – percentage of exact HF to mix-in
project_name – what to call this project
gga_x_fraction – percentage of gga exchange to use
gga_c_fraction – percentage of gga correlation to use
override_default_params – override settings (see above).
- class HybridRelaxSet(structure: Structure | Molecule, hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, project_name: str = 'Hybrid-Static', potential_type: str = None, gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, scale_coulomb: float = 1, scale_gaussian: float = 1, scale_longrange: float = 1, override_default_params: dict = None, max_memory: int = 2000, cutoff_radius: float = 8.0, omega: float = 0.2, aux_basis: dict | None = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True, **kwargs)[source]
Bases:
RelaxSet
Static calculation using hybrid DFT with the ADMM formalism in Cp2k.
- Parameters:
structure – pymatgen structure object
method – hybrid dft method to use (currently select between HSE06 and PBE0)
hf_fraction – percentage of exact HF to mix-in
project_name – what to call this project
gga_x_fraction – percentage of gga exchange to use
gga_c_fraction – percentage of gga correlation to use
override_default_params – override settings (see above).
- class HybridStaticSet(structure: Structure | Molecule, hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, project_name: str = 'Hybrid-Static', gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, potential_type: str = None, scale_coulomb: float = 1, scale_gaussian: float = 1, scale_longrange: float = 1, override_default_params: dict = None, max_memory: int = 2000, cutoff_radius: float = 8.0, omega: float = 0.2, aux_basis: dict | None = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True, **kwargs)[source]
Bases:
StaticSet
Static calculation using hybrid DFT with the ADMM formalism in Cp2k.
- Parameters:
structure – pymatgen structure object
method – hybrid dft method to use (currently select between HSE06 and PBE0)
hf_fraction – percentage of exact HF to mix-in
project_name – what to call this project
gga_x_fraction – percentage of gga exchange to use
gga_c_fraction – percentage of gga correlation to use
override_default_params – override settings (see above).
- class RelaxSet(structure: Structure | Molecule, max_drift: float = 0.003, rms_drift: float = 0.0015, max_force: float = 0.00045, rms_force: float = 0.0003, max_iter: int = 200, project_name: str = 'Relax', optimizer: str = 'BFGS', override_default_params: dict = None, **kwargs)[source]
Bases:
DftSet
CP2K input set containing the basic settings for performing geometry optimization. Values are all cp2k defaults, and should be good for most systems of interest.
- Parameters:
structure – Pymatgen structure object
max_drift – Convergence criterion for the maximum geometry change between the current and the last optimizer iteration. This keyword cannot be repeated and it expects precisely one real. Default value: 1.5.00000000E-003 Default unit: [bohr]
rms_drift – Convergence criterion for the RMS geometry change between the current and the last optimizer iteration. This keyword cannot be repeated and it expects precisely one real. Default value: 1.00000000E-003 Default unit: [bohr]
max_force (float) – Convergence criterion for the maximum force component of the current configuration. This keyword cannot be repeated and it expects precisely one real. Default value: 1e-3 Default unit: [bohr^-1*hartree]
rms_force (float) – Convergence criterion for the RMS force component of the current configuration. This keyword cannot be repeated and it expects precisely one real. Default value: 1e-3 Default unit: [bohr^-1*hartree]
max_iter (int) – Specifies the maximum number of geometry optimization steps. One step might imply several force evaluations for the CG and LBFGS optimizers. This keyword cannot be repeated and it expects precisely one integer. Default value: 200
optimizer (str) – Specify which method to use to perform a geometry optimization. This keyword cannot be repeated and it expects precisely one keyword. BFGS is a quasi-newtonian method, and will best for “small” systems near the minimum. LBFGS is a limited memory version that can be used for “large” (>1000 atom) systems when efficiency outweighs robustness. CG is more robust, especially when you are far from the minimum, but it slower. Default value: BFGS
- class StaticSet(structure: Structure | Molecule, project_name: str = 'Static', run_type: str = 'ENERGY_FORCE', override_default_params: dict = None, **kwargs)[source]
Bases:
DftSet
Basic static energy calculation. Turns on Quickstep module, sets the run_type in global, and uses structure object to build the subsystem.
- Parameters:
structure – Pymatgen structure object
project_name (str) – What to name this cp2k project (controls naming of files printed out)
run_type (str) – Run type. As a static set it should be one of the static aliases, like ‘ENERGY_FORCE’