# pymatgen.io.cp2k.sets module¶

This module defines input sets for CP2K and is a work in progress. The structure/philosophy of this module is based on the Vasp input sets in Pymatgen. These sets are meant to contain tested parameters that will result in successful, reproducible, consistent calculations without need for intervention 99% of the time. 99% of the time, you only need to provide a pymatgen structure object and let the defaults take over from there.

The sets are intended to be very general, e.g. a set for geometry relaxation, and so most of the time, if you have specific needs, you can simply specify them via the keyword argument override_default_params (see Section.update() method). If you have the need to create a new input set (say for a standardized high throughput calculation) then you can create a new child of the Cp2kInputSet class.

In order to implement a new Set within the current code structure, follow this 3 step flow:
1. Inherit from Cp2kInputSet or one of its children and call the super() constructor

2. Create the new sections and insert them into self and its subsections as needed

3. Call self.update(override_default_params) in order to allow user settings.

class CellOptSet(structure: , project_name: str = 'CellOpt', override_default_params: Dict = {}, **kwargs)[source]

CP2K input set containing the basic settings for performing geometry optimization. Values are all cp2k defaults, and should be good for most systems of interest.

Parameters
• structure – Pymatgen structure object

• max_drift – Convergence criterion for the maximum geometry change between the current and the last optimizer iteration. This keyword cannot be repeated and it expects precisely one real. Default value: 3.00000000E-003 Default unit: [bohr]

• max_force (float) – Convergence criterion for the maximum force component of the current configuration. This keyword cannot be repeated and it expects precisely one real. Default value: 4.50000000E-004 Default unit: [bohr^-1*hartree]

• max_iter (int) – Specifies the maximum number of geometry optimization steps. One step might imply several force evaluations for the CG and LBFGS optimizers. This keyword cannot be repeated and it expects precisely one integer. Default value: 200

• optimizer (str) – Specify which method to use to perform a geometry optimization. This keyword cannot be repeated and it expects precisely one keyword. BFGS is a quasi-newtonian method, and will best for “small” systems near the minimum. LBFGS is a limited memory version that can be used for “large” (>1000 atom) systems when efficiency outweights robustness. CG is more robust, especially when you are far from the minimum, but it slower. Default value: BFGS

class Cp2kInputSet(structure: , potential_and_basis: Dict = {}, multiplicity: int = 0, project_name: str = 'CP2K', override_default_params: Dict = {}, **kwargs)[source]

The basic representation of a CP2K input set as a collection of “sections” defining the simulation connected to a structure object. At the most basis level, CP2K requires a &GLOBAL section and &FORCE_EVAL section. Global sets parameters like “RUN_TYPE” or the overall verbosity. FORCE_EVAL is the largest section usually, containing the cell and coordinates of atoms, the DFT settings, and more. This top level input set is meant to initialize GLOBAL and FORCE_EVAL based on a structure object and and sections that the user provides.

Like everything that goes into a cp2k input file, this base input set is essentially a section object. These sets are distinguished by saving default settings for easy implementation of calculations such as relaxation and static calculations. This base set is here to transfer a pymatgen structure object into the input format for cp2k and associate the basis set and pseudopotential to use with each element in the structure.

Generally, this class will not be used directly, and instead one of its child-classes will be used, which contain more predefined initializations of various sections, and, if modifications are required, the user can specify override_default_settings.

Parameters
• structure – (Structure or Molecule) pymatgen structure or molecule object used to define the lattice, coordinates, and elements. This structure object cannot contain “special” species like the Dummy species, e.g. X, or fractional occupations, e.g. Fe0.2, etc.

• potential_and_basis

(dict) Specifies what basis set and potential to use. Specify these as a dict of the form:

{ element: {‘cardinality’: __, ‘sr’: __, ‘q’: __},

’cardinality’: __, ‘functional’: __}

Where cardinality and functional are overall specifications (for all elements), while <key=’element’> specifies the overrides for a specific element. Currently the following conventions must be followed:

1. All species of a particular element must have the same potential/basis

• multiplicity – (int) Specify the system’s multiplicity if appropriate

• project_name – (str) Specify the project name. This will be used to name the output files from a CP2K calculation

• override_default_params – (dict) Specifies user-defined settings to override the settings of any input set (See Section.update())

create_subsys(structure: )[source]

Create the structure for the input

print_forces()[source]

Print out the forces and stress during calculation

print_motion()[source]

Print the motion info (trajectory, cell, forces, stress

class DftSet(structure: , ot: bool = True, band_gap: float = 0.01, eps_default: float = 1e-12, eps_scf: float = 1e-07, max_scf: Optional[int] = None, minimizer: str = 'DIIS', preconditioner: str = 'FULL_ALL', algorithm: str = 'STRICT', linesearch: str = '2PNT', cutoff: int = 1200, rel_cutoff: int = 80, ngrids: int = 5, progression_factor: int = 3, override_default_params: Dict = {}, wfn_restart_file_name: str = None, kpoints: Optional[pymatgen.io.cp2k.inputs.Kpoints] = None, smearing: bool = False, **kwargs)[source]

Base for an input set using the Quickstep module (i.e. a DFT calculation). The DFT section is pretty vast in CP2K, so this set hopes to make the DFT setup fairly simple. The provided parameters are pretty conservative, and so they should not need to be changed very often.

Parameters
• structure – Pymatgen structure or molecule object

• ot (bool) – Whether or not to use orbital transformation method for matrix diagonalization. OT is the flagship scf solver of CP2K, and will provide huge speed-ups for this part of the calculation, but the system must have a band gap for OT to be used (higher band-gap –> faster convergence). Band gap is also used by the preconditioner for OT, and should be set as a value SMALLER than the true band gap to get good efficiency. Generally, this parameter does not need to be changed from default of 0.01

• band_gap (float) – The band gap can also be specified in order to determine if ot should be turned on.

• eps_default (float) – Replaces all EPS_XX Keywords in the DFT section (NOT its subsections!) to have this value, ensuring an overall accuracy of at least this much.

• eps_scf (float) – The convergence criteria for leaving the SCF loop in Hartrees. Default is 1e-7. Should ensure reasonable results for all properties. Smaller than 1e-7 is generally not needed unless you need very high precision. 1e-6 may be used for difficult systems, and should still give reasonable results for most properties.

• max_scf (int) – The max number of SCF cycles before terminating the solver. NOTE: With the OT solver, this corresponds to the max number of INNER scf loops, and then the outer loops are set with outer_max_scf, while with diagnolization it corresponds to the overall (INNER*OUTER) number of SCF steps, with the inner loop limit set by

• minimizer (str) – The minimization scheme. DIIS can be as much as 50% faster than the more robust conjugate gradient method, and so it is chosen as default. Switch to CG if dealing with a difficult system.

• preconditioner (str) – Preconditioner for the OT method. FULL_ALL is the most reliable, and is the default. Though FULL_SINGLE_INVERSE has faster convergence according to our internal tests. Should only change from theses two when simulation cell gets to be VERY large, in which case FULL_KINETIC might be preferred.

• cutoff (int) – Cutoff energy (in Ry) for the finest level of the multigrid. A high cutoff will allow you to have very accurate calculations PROVIDED that REL_CUTOFF is appropriate.

• rel_cutoff (int) –

This cutoff decides how the Guassians are mapped onto the different levels of the multigrid. From CP2K: A Gaussian is mapped onto the coarsest level of the multi-grid, on which the

function will cover number of grid points greater than or equal to the number of grid points will cover on a reference grid defined by REL_CUTOFF.

• progression_factor (int) – Divisor of CUTOFF to get the cutoff for the next level of the multigrid.

• for the cutoffs (Takeaway) – https://www.cp2k.org/howto:converging_cutoff

• CUTOFF is too low, then all grids will be coarse and the calculation may become inaccurate; and if (If) –

• is too low, then even if you have a high CUTOFF, all Gaussians will be mapped onto the coarsest (REL_CUTOFF) –

• of the multi-grid, and thus the effective integration grid for the calculation may still be too (level) –

• coarse.

activate_fast_minimization(on)[source]

Method to modify the set to use fast SCF minimization.

activate_hybrid(hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, max_memory: int = 2000, cutoff_radius: float = 8.0, potential_type: str = None, omega: float = 0.2, aux_basis: Optional[Dict] = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True)[source]

Basic set for activating hybrid DFT calculation using Auxiliary Density Matrix Method.

Note 1: When running ADMM with cp2k, memory is very important. If the memory requirements exceed what is available (see max_memory), then CP2K will have to calculate the 4-electron integrals for HFX during each step of the SCF cycle. ADMM provides a huge speed up by making the memory requirements feasible to fit into RAM, which means you only need to calculate the integrals once each SCF cycle. But, this only works if it fits into memory. When setting up ADMM calculations, we recommend doing whatever is possible to fit all the 4EI into memory.

Note 2: This set is designed for reliable high-throughput calculations, NOT for extreme accuracy. Please review the in-line comments in this method if you want more control.

Parameters
• hybrid_functional (str) – Type of hybrid functional. This set supports HSE (screened) and PBE0 (truncated). Default is PBE0, which converges easier in the GPW basis used by cp2k.

• hf_fraction (float) – fraction of exact HF exchange energy to mix. Default: 0.25

• gga_x_fraction (float) – fraction of gga exchange energy to retain. Default: 0.75

• gga_c_fraction (float) – fraction of gga correlation energy to retain. Default: 1.0

• max_memory (int) – Maximum memory available to each MPI process (in Mb) in the calculation. Most modern computing nodes will have ~2Gb per core, or 2048 Mb, but check for your specific system. This value should be as large as possible while still leaving some memory for the other parts of cp2k. Important: If this value is set larger than the memory limits, CP2K will likely seg-fault. Default: 2000

• cutoff_radius (float) – for truncated hybrid functional (i.e. PBE0), this is the cutoff radius. The default is selected as that which generally gives convergence, but maybe too low (if you want very high accuracy) or too high (if you want a quick screening). Default: 8 angstroms

• potential_type (str) – what interaction potential to use for HFX. Available in CP2K are COULOMB, GAUSSIAN, IDENTITY, LOGRANGE, MIX_CL, MIX_CL_TRUNC, MIX_LG, SHORTRANGE, and TRUNCATED. Default is None, and it will be set automatically depending on the named hybrid_functional that you use, but setting it to one of the acceptable values will constitute a user-override.

• omega (float) – For HSE, this specifies the screening parameter. HSE06 sets this as 0.2, which is the default.

• aux_basis (dict) – If you want to specify the aux basis to use, specify it as a dict of the form {‘specie_1’: ‘AUX_BASIS_1’, ‘specie_2’: ‘AUX_BASIS_2’}

• admm (bool) – Whether or not to use the auxiliary density matrix method for the exact HF exchange contribution. Highly recommended. Speed ups between 10x and aaa1000x are possible when compared to non ADMM hybrid calculations. Default: True

• eps_schwarz (float) – Screening threshold for HFX, in Ha. Contributions smaller than this will be screened. The smaller the value, the more accurate, but also the more costly. Default value is 1e-6, which is quite aggressive. Aggressive screening can also lead to convergence issues. 1e-7 should be a safe value if 1e-6 is too aggressive.

• eps_schwarz_forces (float) – Same as for eps_schwarz, but for screening contributions to forces. Convergence is not as sensitive with respect to eps_schwarz forces as compared to eps_schwarz, and so 1e-6 should be good default.

• screen_on_initial_p (bool) – If an initial density matrix is provided, in the form of a CP2K wfn restart file, then this initial density will be used for screening. This is generally very computationally efficient, but, as with eps_schwarz, can lead to instabilities if the initial density matrix is poor.

• screen_p_forces (bool) – Same as screen_on_initial_p, but for screening of forces.

activate_nonperiodic()[source]

Activates a calculation with non-periodic calculations by turning of PBC and changing the poisson solver. Still requires a CELL to put the atoms

activate_robust_minimization()[source]

Method to modify the set to use more robust SCF minimization technique

activate_very_strict_minimization()[source]

Method to modify the set to use very strict SCF minimization scheme :return:

modify_dft_print_iters(iters, add_last='no')[source]

Modify all DFT print iterations at once. Common use is to set iters to the max number of iterations + 1 and then set add_last to numeric. This would have the effect of printing only the first and last iteration, which might be useful for speeding up/saving space on GEO_OPT or MD runs where you don’t need the intermediate values.

Args

iters (int): print each “iters” iterations. add_last (str): Whether to explicitly include the last iteration, and how to mark it.

numeric: mark last iteration with the iteration number symbolic: mark last iteration with the letter “l” no: do not explicitly include the last iteration

print_e_density()[source]

Controls the printing of cube files with the electronic density and, for LSD calculations, the spin density

print_hartree_potential(stride=[1, 1, 1])[source]

Controls the printing of a cube file with eletrostatic potential generated by the total density (electrons+ions). It is valid only for QS with GPW formalism. Note that by convention the potential has opposite sign than the expected physical one.

print_ldos(nlumo=- 1)[source]

Activate the printing of LDOS files, printing one for each atom kind by default

Parameters

nlumo (int) – Number of virtual orbitals to be added to the MO set (-1=all). CAUTION: Setting this value to be higher than the number of states present may cause a Cholesky error.

print_mo()[source]

Print molecular orbitals when running non-OT diagonalization

print_mo_cubes(write_cube=False, nlumo=- 1, nhomo=- 1)[source]

Activate printing of molecular orbitals.

Parameters
• write_cube (bool) – whether to write cube file for the MOs (setting false will just print levels in out file)

• nlumo (int) – Controls the number of lumos that are printed and dumped as a cube (-1=all)

• nhomo (int) – Controls the number of homos that are printed and dumped as a cube (-1=all)

print_pdos(nlumo=- 1)[source]

Activate creation of the PDOS file.

Parameters

nlumo (int) – Number of virtual orbitals to be added to the MO set (-1=all). CAUTION: Setting this value to be higher than the number of states present may cause a Cholesky error.

set_charge(charge)[source]

Set the overall charge of the simulation cell

class HybridCellOptSet(structure: , hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, project_name: str = 'Hybrid-CellOpt', gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, override_default_params: Dict = {}, max_memory: int = 2000, cutoff_radius: float = 8.0, omega: float = 0.2, aux_basis: Optional[Dict] = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True, **kwargs)[source]

Static calculation using hybrid DFT with the ADMM formalism in Cp2k.

Parameters
• structure – pymatgen structure object

• method – hybrid dft method to use (currently select between HSE06 and PBE0)

• hf_fraction – percentage of exact HF to mix-in

• project_name – what to call this project

• gga_x_fraction – percentage of gga exchange to use

• gga_c_fraction – percentage of gga correlation to use

• override_default_params – override settings (see above).

class HybridRelaxSet(structure: , hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, project_name: str = 'Hybrid-Relax', gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, override_default_params: Dict = {}, max_memory: int = 2000, cutoff_radius: float = 8.0, omega: float = 0.2, aux_basis: Optional[Dict] = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True, **kwargs)[source]

Static calculation using hybrid DFT with the ADMM formalism in Cp2k.

Parameters
• structure – pymatgen structure object

• method – hybrid dft method to use (currently select between HSE06 and PBE0)

• hf_fraction – percentage of exact HF to mix-in

• project_name – what to call this project

• gga_x_fraction – percentage of gga exchange to use

• gga_c_fraction – percentage of gga correlation to use

• override_default_params – override settings (see above).

class HybridStaticSet(structure: , hybrid_functional: str = 'PBE0', hf_fraction: float = 0.25, project_name: str = 'Hybrid-Static', gga_x_fraction: float = 0.75, gga_c_fraction: float = 1, override_default_params: Dict = {}, max_memory: int = 2000, cutoff_radius: float = 8.0, omega: float = 0.2, aux_basis: Optional[Dict] = None, admm: bool = True, eps_schwarz: float = 1e-06, eps_schwarz_forces: float = 1e-06, screen_on_initial_p: bool = True, screen_p_forces: bool = True, **kwargs)[source]

Static calculation using hybrid DFT with the ADMM formalism in Cp2k.

Parameters
• structure – pymatgen structure object

• method – hybrid dft method to use (currently select between HSE06 and PBE0)

• hf_fraction – percentage of exact HF to mix-in

• project_name – what to call this project

• gga_x_fraction – percentage of gga exchange to use

• gga_c_fraction – percentage of gga correlation to use

• override_default_params – override settings (see above).

class RelaxSet(structure: , max_drift: float = 0.003, max_force: float = 0.0045, max_iter: int = 200, project_name: str = 'Relax', optimizer: str = 'BFGS', override_default_params: Dict = {}, **kwargs)[source]

CP2K input set containing the basic settings for performing geometry optimization. Values are all cp2k defaults, and should be good for most systems of interest.

Parameters
• structure

• max_drift – Convergence criterion for the maximum geometry change between the current and the last optimizer iteration. This keyword cannot be repeated and it expects precisely one real. Default value: 3.00000000E-003 Default unit: [bohr]

• max_force (float) – Convergence criterion for the maximum force component of the current configuration. This keyword cannot be repeated and it expects precisely one real. Default value: 4.50000000E-004 Default unit: [bohr^-1*hartree]

• max_iter (int) – Specifies the maximum number of geometry optimization steps. One step might imply several force evaluations for the CG and LBFGS optimizers. This keyword cannot be repeated and it expects precisely one integer. Default value: 200

• optimizer (str) – Specify which method to use to perform a geometry optimization. This keyword cannot be repeated and it expects precisely one keyword. BFGS is a quasi-newtonian method, and will best for “small” systems near the minimum. LBFGS is a limited memory version that can be used for “large” (>1000 atom) systems when efficiency outweights robustness. CG is more robust, especially when you are far from the minimum, but it slower. Default value: BFGS

class StaticSet(structure: , project_name: str = 'Static', run_type: str = 'ENERGY_FORCE', override_default_params: Dict = {}, **kwargs)[source]

Basic static energy calculation. Turns on Quickstep module, sets the run_type in global, and uses structure object to build the subsystem.

Parameters
• structure – Pymatgen structure object

• project_name (str) – What to name this cp2k project (controls naming of files printed out)

• run_type (str) – Run type. As a static set it should be one of the static aliases, like ‘ENERGY_FORCE’