pymatgen.io.abinit.flows module

A Flow is a container for Works, and a Work consists of Tasks. Flows are the final objects that can be dumped directly to a pickle file on disk. Flows are executed with the abirun script (abipy).
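
Example

A minimal sketch of the typical lifecycle. This is hypothetical usage: scf_input stands for an AbinitInput object built elsewhere.

    from pymatgen.io.abinit.flows import Flow

    # scf_input: an AbinitInput built elsewhere (assumption).
    flow = Flow(workdir="flow_example")   # container for Works
    work = flow.register_task(scf_input)  # Work with a single task (see register_task below)
    flow.build_and_pickle_dump()          # create directories and save the pickle file
    # The flow can now be executed with abirun.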

class Flow(workdir, manager=None, pickle_protocol=-1, remove=False)[source]

Bases: pymatgen.io.abinit.nodes.Node, pymatgen.io.abinit.works.NodeContainer, monty.json.MSONable

This object is a container of works. Its main task is managing the possible inter-dependencies among the works and the creation of dynamic workflows generated by callbacks registered by the user.


Parameters:
  • workdir – String specifying the directory where the works will be produced. If workdir is None, the initialization of the working directory is performed by flow.allocate(workdir).
  • manager – TaskManager object responsible for the submission of the jobs. If manager is None, the object is initialized from the YAML file located either in the working directory or in the user configuration directory.
  • pickle_protocol – Pickle protocol version used for saving the status of the object. -1 denotes the latest version supported by the Python interpreter.
  • remove – Attempt to remove the working directory workdir if it already exists.
Error

alias of FlowError

PICKLE_FNAME = '__AbinitFlow__.pickle'
Results

alias of FlowResults

VERSION = '0.1'
abivalidate_inputs()[source]

Run ABINIT in dry mode to validate all the inputs of the flow.

Returns: (isok, tuples)

isok is True if all inputs are OK. tuples is a list of namedtuple objects, one for each task in the flow. Each namedtuple has the following attributes:

  • retcode: Return code. 0 if OK.
  • log_file: Log file of the Abinit run; use log_file.read() to access its content.
  • stderr_file: Stderr file of the Abinit run; use stderr_file.read() to access its content.

Raises: RuntimeError if the executable is not in $PATH.
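
Example

A hypothetical check before launching the flow, using only the attributes documented above:

    isok, tuples = flow.abivalidate_inputs()
    if not isok:
        for t in tuples:
            if t.retcode != 0:
                print(t.stderr_file.read())  # show the Abinit error message
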
all_ok

True if all the tasks in the works have reached S_OK.

allocate(workdir=None)[source]

Allocate the Flow i.e. assign the workdir and (optionally) the TaskManager to the different tasks in the Flow.

Parameters: workdir – Working directory of the flow. Must be specified here if the workdir was not initialized in __init__.
allocated

Number of allocations. Set by allocate.

as_dict(**kwargs)[source]

JSON serialization. Note that we only need to save a string with the working directory, since the object will be reconstructed from the pickle file located in workdir.

classmethod as_flow(obj)[source]

Convert obj into a Flow. Accepts filepath, dict, or Flow object.

batch(timelimit=None)[source]

Run the flow in batch mode; return the exit status of the job script. Requires a manager.yml file and a batch_adapter adapter.

Parameters:
  • timelimit – Time limit (int with seconds or string with the time given in the Slurm convention: "days-hours:minutes:seconds"). If timelimit is None, the default value specified in the batch_adapter entry of manager.yml is used.
build(*args, **kwargs)[source]

Make directories and files of the Flow.

build_and_pickle_dump(abivalidate=False)[source]

Build the directories and files of the Flow and save the object in pickle format. Returns 0 on success.

Parameters: abivalidate – If True, all the input files are validated by calling the abinit parser. If the validation fails, ValueError is raised.
cancel(nids=None)[source]

Cancel all the tasks that are in the queue. nids is an optional list of node identifiers used to filter the tasks.

Returns: Number of jobs cancelled, negative value on error.
check_dependencies()[source]

Test the dependencies of the nodes for possible deadlocks.

check_pid_file()[source]

This function checks if we are already running the Flow with a PyFlowScheduler. Raises: Flow.Error if the pid file of the scheduler exists.

check_status(**kwargs)[source]

Check the status of the works in self.

Parameters:
  • show – True to show the status of the flow.
  • kwargs – keyword arguments passed to show_status
chroot(new_workdir)[source]

Change the workdir of the Flow. Mainly used to allow the user to open the GUI on the local host and access the flow from a remote machine via sshfs.

Note

Calling this method will make the flow go in read-only mode.

connect_signals()[source]

Connect the signals within the Flow. The Flow is responsible for catching the important signals raised from its works.

db_insert()[source]

Insert results in the MongoDB database.

debug(status=None, nids=None)[source]

This method is usually used when the flow did not complete successfully. It analyzes the files produced by the tasks to facilitate debugging. Info is printed to stdout.

Parameters:
  • status – If not None, only the tasks with this status are selected
  • nids – optional list of node identifiers used to filter the tasks.
disconnect_signals()[source]

Disable the signals within the Flow.

errored_tasks

List of errored tasks.

finalize()[source]

This method is called when the flow is completed. Returns 0 on success.

find_deadlocks()[source]

This function detects deadlocks.

Returns: Named tuple with the tasks grouped in: deadlocks, runnables, running.
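
Example

A sketch; the three groups follow the return description above:

    deadlocks, runnables, running = flow.find_deadlocks()
    if deadlocks and not (runnables or running):
        print("The flow is deadlocked and cannot make further progress.")
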
fix_abicritical()[source]

This function tries to fix critical events originating from ABINIT. Returns the number of tasks that have been fixed.

fix_queue_critical()[source]

This function tries to fix critical events originating from the queue submission system.

Returns the number of tasks that have been fixed.

classmethod from_dict(d, **kwargs)[source]

Reconstruct the flow from the pickle file.

classmethod from_inputs(workdir, inputs, manager=None, pickle_protocol=-1, task_class=<class 'pymatgen.io.abinit.tasks.ScfTask'>, work_class=<class 'pymatgen.io.abinit.works.Work'>)[source]

Construct a simple flow from a list of inputs. The flow contains a single Work with tasks whose class is given by task_class.

Warning

Don’t use this interface if you have dependencies among the tasks.

Parameters:
  • workdir – String specifying the directory where the works will be produced.
  • inputs – List of inputs.
  • manager – TaskManager object responsible for the submission of the jobs. If manager is None, the object is initialized from the YAML file located either in the working directory or in the user configuration directory.
  • pickle_protocol – Pickle protocol version used for saving the status of the object. -1 denotes the latest version supported by the Python interpreter.
  • task_class – The class of the Task.
  • work_class – The class of the Work.
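
Example

A sketch, assuming inputs is a list of AbinitInput objects for independent SCF runs:

    flow = Flow.from_inputs("flow_scf", inputs=inputs)
    flow.build_and_pickle_dump()
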
get_dict_for_mongodb_queries()[source]

This function returns a dictionary with the attributes that will be put in the mongodb document to facilitate the query. Subclasses may want to replace or extend the default behaviour.

get_mongo_info()[source]

Return a JSON dictionary with information on the flow. Mainly used for constructing the info section in FlowEntry. The default implementation is empty. Subclasses must implement it

get_njobs_in_queue(username=None)[source]

Returns the number of jobs in the queue; None when the number of jobs cannot be determined.

Parameters:username – (str) the username of the jobs to count (default is to autodetect)
get_results(**kwargs)[source]
groupby_status()[source]

Returns an ordered dictionary mapping the task status to the list of named tuples (task, work_index, task_index).

groupby_task_class()[source]

Returns a dictionary mapping the task class to the list of tasks in the flow

has_chrooted

Returns a string that evaluates to True if we have changed the workdir for visualization purposes, e.g. we are using sshfs to mount the remote directory where the Flow is located. The string gives the previous workdir of the flow.

has_db

True if flow uses MongoDB to store the results.

iflat_nodes(status=None, op='==', nids=None)[source]

Generator that produces a flat sequence of nodes. If status is not None, only the tasks with the specified status are selected. nids is an optional list of node identifiers used to filter the nodes.

iflat_tasks(status=None, op='==', nids=None)[source]

Generator to iterate over all the tasks of the Flow.

If status is not None, only the tasks whose status satisfies the condition (task.status op status) are selected. status can be either one of the flags defined in the Task class (e.g. Task.S_OK) or a string (e.g. "S_OK"). nids is an optional list of node identifiers used to filter the tasks.
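
Example

A sketch of the two selection modes described above; passing the comparison operator as the string "!=" is an assumption based on the op parameter in the signature:

    # Iterate over the errored tasks only.
    for task in flow.iflat_tasks(status="S_ERROR"):
        print(task)

    # Iterate over the tasks that have not reached S_OK yet.
    for task in flow.iflat_tasks(status="S_OK", op="!="):
        print(task)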

iflat_tasks_wti(status=None, op='==', nids=None)[source]

Generator to iterate over all the tasks of the Flow. Yields: (task, work_index, task_index).

If status is not None, only the tasks whose status satisfies the condition (task.status op status) are selected. status can be either one of the flags defined in the Task class (e.g. Task.S_OK) or a string (e.g. "S_OK"). nids is an optional list of node identifiers used to filter the tasks.

inspect(nids=None, wslice=None, **kwargs)[source]

Inspect the tasks (SCF iterations, structural relaxations, ...) and produce matplotlib plots.

Parameters:
  • nids – List of node identifiers.
  • wslice – Slice object used to select works.
  • kwargs – keyword arguments passed to task.inspect method.

Note

nids and wslice are mutually exclusive. If nids and wslice are both None, all tasks in self are inspected.

Returns:List of matplotlib figures.
listext(ext, stream=sys.stdout)[source]

Print to the given stream a table with the list of the output files with the given ext produced by the flow.

look_before_you_leap()[source]

This method should be called before running the calculation to make sure that the most important requirements are satisfied.

Returns:List of strings with inconsistencies/errors.
make_light_tarfile(name=None)[source]

Lightweight tarball file. Mainly used for debugging. Return the name of the tarball file.

make_scheduler(**kwargs)[source]

Build and return a PyFlowScheduler to run the flow.

Parameters: kwargs – If empty, the user configuration file is used. If filepath is in kwargs, the scheduler is initialized from that file. Otherwise, **kwargs are passed to the PyFlowScheduler __init__ method.
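
Example

A sketch of running the flow with the scheduler. Note that start() is the PyFlowScheduler method that launches the event loop in abipy; it is not documented in this section, so treat it as an assumption:

    sched = flow.make_scheduler()  # initialized from the user configuration file
    sched.start()                  # assumption: runs the flow until completion
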
make_tarfile(name=None, max_filesize=None, exclude_exts=None, exclude_dirs=None, verbose=0, **kwargs)[source]

Create a tarball file.

Parameters:
  • name – Name of the tarball file. Set to os.path.basename(flow.workdir) + ".tar.gz" if name is None.
  • max_filesize (int or string with unit) – A file is included in the tar file if its size <= max_filesize. Can be specified in bytes, e.g. max_filesize=1024, or with a string with unit, e.g. max_filesize="1 Mb". No check is done if max_filesize is None.
  • exclude_exts – List of file extensions to be excluded from the tar file.
  • exclude_dirs – List of directory basenames to be excluded.
  • verbose (int) – Verbosity level.
  • kwargs – keyword arguments passed to the TarFile constructor.
Returns:

The name of the tarfile.

mongo_assimilate()[source]

This function is called by client code when the flow is completed. Return a JSON dictionary with the most important results produced by the flow. The default implementation is empty. Subclasses must implement it.

mongo_id
mongodb_upload(**kwargs)[source]
ncores_allocated

Returns the number of cores allocated at this moment. A core is allocated if it is running a task or if we have submitted a task to the queue manager but the job is still pending.

ncores_reserved

Returns the number of cores reserved at this moment. A core is reserved if the task is not running but we have submitted the task to the queue manager.

ncores_used

Returns the number of cores used at this moment. A core is used if there is a job running on it.

node_from_nid(nid)[source]

Return the node in the Flow with the given nid identifier

num_errored_tasks

The number of tasks whose status is S_ERROR.

num_tasks

Total number of tasks

num_unconverged_tasks

The number of tasks whose status is S_UNCONVERGED.

on_dep_ok(signal, sender)[source]
open_files(what='o', status=None, op='==', nids=None, editor=None)[source]

Open the files of the flow inside an editor (command line interface).

Parameters:
  • what

    string with the list of characters selecting the file type. Possible choices:

    i ==> input_file
    o ==> output_file
    f ==> files_file
    j ==> job_file
    l ==> log_file
    e ==> stderr_file
    q ==> qout_file
    all ==> all files
  • status – if not None, only the tasks with this status are selected.
  • op – status operator. Requires status. A task is selected if task.status op status evaluates to true.
  • nids – optional list of node identifiers used to filter the tasks.
  • editor – Select the editor. None to use the default editor ($EDITOR shell env var)
parse_timing(nids=None)[source]

Parse the timer data in the main output file(s) of Abinit. Requires timopt /= 0 in the input file (usually timopt = -1)

Parameters:nids – optional list of node identifiers used to filter the tasks.

Return: AbinitTimerParser instance, None if error.

pickle_dump()[source]

Save the status of the object in pickle format. Returns 0 on success.

pickle_dumps(protocol=None)[source]

Return a string with the pickle representation. protocol selects the pickle protocol; self.pickle_protocol is used if protocol is None.

pickle_file

The path of the pickle file.

classmethod pickle_load(filepath, spectator_mode=True, remove_lock=False)[source]

Loads the object from a pickle file and performs initial setup.

Parameters:
  • filepath – Filename or directory name. If filepath is a directory, we scan the directory tree starting from filepath and we read the first pickle database. Raise RuntimeError if multiple databases are found.
  • spectator_mode – If True, the nodes of the flow are not connected by signals. This option is usually used when we want to read a flow in read-only mode and we want to avoid callbacks that can change the flow.
  • remove_lock – True to remove the file lock if any (use it carefully).
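
Example

A sketch of reloading a flow for read-only inspection (flow_example is a hypothetical workdir):

    flow = Flow.pickle_load("flow_example", spectator_mode=True)
    flow.show_status()
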
classmethod pickle_loads(s)[source]

Reconstruct the flow from a string.

pid_file

The path of the pid file created by PyFlowScheduler.

plot_networkx(mode='network', with_edge_labels=False, node_size='num_cores', node_label='name_class', layout_type='spring', **kwargs)[source]

Use networkx to draw the flow with the connections among the nodes and the status of the tasks.

Warning

Requires networkx package.

pyfile

Absolute path of the Python script used to generate the flow. Set by set_pyfile.

rapidfire(check_status=True, **kwargs)[source]

Use PyLauncher to submit tasks in rapidfire mode. kwargs contains the options passed to the launcher.

Returns:number of tasks submitted.
register_task(input, deps=None, manager=None, task_class=None)[source]

Utility function that generates a Work made of a single task

Parameters:
  • input – AbinitInput object.
  • deps – List of Dependency objects specifying the dependencies of this node. An empty list of deps implies that this node has no dependencies.
  • manager – The TaskManager responsible for the submission of the task. If manager is None, we use the TaskManager specified during the creation of the work.
  • task_class – Task subclass to instantiate. Default: AbinitTask
Returns:

The generated Work for the task, work[0] is the actual task.
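
Example

A sketch chaining two single-task works; the dict form {node: abinit_extension} for deps is the common abipy idiom and is an assumption here:

    scf_work = flow.register_task(scf_input)
    # The NSCF task depends on the DEN file produced by the SCF task (work[0]).
    nscf_work = flow.register_task(nscf_input, deps={scf_work[0]: "DEN"})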

register_work(work, deps=None, manager=None, workdir=None)[source]

Register a new Work and add it to the internal list, taking into account possible dependencies.

Parameters:
  • work – Work object.
  • deps – List of Dependency objects specifying the dependencies of this node. An empty list of deps implies that this node has no dependencies.
  • manager – The TaskManager responsible for the submission of the task. If manager is None, we use the TaskManager specified during the creation of the work.
  • workdir – The name of the directory used for the Work.
Returns:

The registered Work.

register_work_from_cbk(cbk_name, cbk_data, deps, work_class, manager=None)[source]

Registers a callback function that will generate the Task of the Work.

Parameters:
  • cbk_name – Name of the callback function (must be a bound method of self)
  • cbk_data – Additional data passed to the callback function.
  • deps – List of Dependency objects specifying the dependency of the work.
  • work_class – Work class to instantiate.
  • manager – The TaskManager responsible for the submission of the task. If manager is None, we use the TaskManager specified during the creation of the Flow.
Returns:

The Work that will be finalized by the callback.

reload()[source]

Reload the flow from the pickle file. Used when we are monitoring the flow executed by the scheduler. In this case, indeed, the flow might have been changed by the scheduler and we have to reload the new flow in memory.

rm_and_build()[source]

Remove the workdir and rebuild the flow.

rmtree(ignore_errors=False, onerror=None)[source]

Remove workdir (same API as shutil.rmtree).

select_tasks(nids=None, wslice=None)[source]

Return a list with a subset of tasks.

Parameters:
  • nids – List of node identifiers.
  • wslice – Slice object used to select works.

Note

nids and wslice are mutually exclusive. If no argument is provided, the full list of tasks is returned.

set_garbage_collector(exts=None, policy='task')[source]

Enable the garbage collector that will remove the big output files that are not needed.

Parameters:
  • exts – String or list with the Abinit file extensions to be removed. A default is provided if exts is None.
  • policy – Either 'flow' or 'task'. If policy is set to 'task', we remove the output files as soon as the task reaches S_OK. If 'flow', the files are removed only when the flow is finalized. The 'flow' option should be used when we are dealing with a dynamic flow with callbacks generating other tasks, since a Task might not be aware of its children when it reaches S_OK.
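
Example

A sketch; WFK is the Abinit wavefunction file mentioned under use_smartio below:

    # Remove large wavefunction files as soon as each task reaches S_OK.
    flow.set_garbage_collector(exts=["WFK"], policy="task")
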
set_pyfile(pyfile)[source]

Set the path of the python script used to generate the flow.

set_spectator_mode(mode=True)[source]

When the flow is in spectator mode, signals, pickle dump and possible callbacks are disabled. A spectator can still operate on the flow, but the new status of the flow won't be saved in the pickle file. The flow is usually put in spectator mode when it is already running via the scheduler or other means and we should not interfere with its evolution; this is the reason why signals and callbacks must be disabled. Unfortunately, preventing client code from calling methods with side effects when the flow is in spectator mode is not easy (e.g. flow.cancel will cancel the tasks submitted to the queue, and the flow used by the scheduler won't see this change).

set_workdir(workdir, chroot=False)[source]

Set the working directory. Cannot be set more than once unless chroot is True

show_abierrors(nids=None, stream=sys.stdout)[source]

Write to the given stream the list of ABINIT errors for all tasks whose status is S_ABICRITICAL.

Parameters:
  • nids – optional list of node identifiers used to filter the tasks.
  • stream – File-like object. Default: sys.stdout
show_corrections(status=None, nids=None)[source]

Show the corrections applied to the flow at run-time.

Parameters:
  • status – if not None, only the tasks with this status are selected.
  • nids – optional list of node identifiers used to filter the tasks.

Return: The number of corrections found.

show_dependencies(stream=sys.stdout)[source]

Writes to the given stream the ASCII representation of the dependency tree.

show_events(status=None, nids=None)[source]

Print the Abinit events (ERRORS, WARNINGS, COMMENTS) to stdout.

Parameters:
  • status – if not None, only the tasks with this status are selected.
  • nids – optional list of node identifiers used to filter the tasks.
show_history(status=None, nids=None, full_history=False, metadata=False)[source]

Print the history of the flow to stdout.

Parameters:
  • status – if not None, only the tasks with this status are selected.
  • full_history – Print full info set, including nodes with an empty history.
  • nids – optional list of node identifiers used to filter the tasks.
  • metadata – print history metadata (experimental)
show_info(**kwargs)[source]

Print info on the flow i.e. total number of tasks, works, tasks grouped by class.

Example

Task Class    Number
----------    ------
ScfTask       1
NscfTask      1
ScrTask       2
SigmaTask     6

show_inputs(varnames=None, nids=None, wslice=None, stream=sys.stdout)[source]

Print the input of the tasks to the given stream.

Parameters:
  • varnames – List of Abinit variables. If not None, only the variables in varnames are selected and printed.
  • nids – List of node identifiers. By default all nodes are shown.
  • wslice – Slice object used to select works.
  • stream – File-like object, Default: sys.stdout
show_inpvars(*args, **kwargs)
show_qouts(nids=None, stream=sys.stdout)[source]

Write to the given stream the content of the queue output file for all tasks whose status is S_QCRITICAL.

Parameters:
  • nids – optional list of node identifiers used to filter the tasks.
  • stream – File-like object. Default: sys.stdout
show_receivers(sender=None, signal=None)[source]
show_status(**kwargs)[source]

Report the status of the works and the status of the different tasks on the specified stream.

Parameters:
  • stream – File-like object, Default: sys.stdout
  • nids – List of node identifiers. By default all nodes are shown.
  • wslice – Slice object used to select works.
  • verbose – Verbosity level (default 0). > 0 to show only the works that are not finalized.
show_summary(**kwargs)[source]

Print a short summary with the status of the flow and a counter task_status --> number_of_tasks.

Parameters:stream – File-like object, Default: sys.stdout

Example

Status     Count
---------  -------
Completed  10

<Flow, node_id=27163, workdir=flow_gwconv_ecuteps>, num_tasks=10, all_ok=True

single_shot(check_status=True, **kwargs)[source]

Use PyLauncher to submit one task. kwargs contains the options passed to the launcher.

Returns:number of tasks submitted.
status

The status of the Flow, i.e. the minimum of the status of its tasks and its works.

status_counter

Returns a Counter object that counts the number of tasks with given status (use the string representation of the status as key).

tasks_from_nids(nids)[source]

Return the list of tasks associated to the given list of node identifiers (nids).

Note

Invalid ids are ignored

classmethod temporary_flow(manager=None)[source]

Return a Flow in a temporary directory. Useful for unit tests.

to_dict(**kwargs)

JSON serialization. Note that we only need to save a string with the working directory, since the object will be reconstructed from the pickle file located in workdir.

unconverged_tasks

List of unconverged tasks.

use_smartio()[source]

This function should be called when the entire Flow has been built. It tries to reduce the pressure on the hard disk by using Abinit smart-io capabilities for those files that are not needed by other nodes. Smart-io means that big files (e.g. WFK) are written only if the calculation is unconverged so that we can restart from it. No output is produced if convergence is achieved.

validate_json_schema()[source]

Validate the JSON schema. Return list of errors.

works

List of Work objects contained in self.

wti_from_nids(nids)[source]

Return the list of (w, t) indices from the list of node identifiers nids.

class G0W0WithQptdmFlow(workdir, scf_input, nscf_input, scr_input, sigma_inputs, manager=None)[source]

Bases: pymatgen.io.abinit.flows.Flow

Build a Flow for one-shot G0W0 calculations. The computation of the q-points for the screening is parallelized with qptdm i.e. we run independent calculations for each q-point and then we merge the final results.

Parameters:
  • workdir – Working directory.
  • scf_input – Input for the GS SCF run.
  • nscf_input – Input for the NSCF run (band structure run).
  • scr_input – Input for the SCR run.
  • sigma_inputs – Input(s) for the SIGMA run(s).
  • manager – TaskManager object used to submit the jobs. Initialized from manager.yml if manager is None.
cbk_qptdm_workflow(cbk)[source]

This callback is executed by the flow when bands_work.nscf_task reaches S_OK.

It computes the list of q-points for W(q,G,G'), creates nqpt tasks in the second work (QptdmWork), and connects the signals.

bandstructure_flow(workdir, scf_input, nscf_input, dos_inputs=None, manager=None, flow_class=<class 'pymatgen.io.abinit.flows.Flow'>, allocate=True)[source]

Build a Flow for band structure calculations.

Parameters:
  • workdir – Working directory.
  • scf_input – Input for the GS SCF run.
  • nscf_input – Input for the NSCF run (band structure run).
  • dos_inputs – Input(s) for the NSCF run (DOS run).
  • manager – TaskManager object used to submit the jobs. Initialized from manager.yml if manager is None.
  • flow_class – Flow subclass.
  • allocate – True if the flow should be allocated before returning.
Returns:

Flow object
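
Example

A sketch, assuming scf_input and nscf_input are AbinitInput objects built elsewhere:

    from pymatgen.io.abinit.flows import bandstructure_flow

    flow = bandstructure_flow("flow_bands", scf_input, nscf_input)
    flow.build_and_pickle_dump()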

g0w0_flow(workdir, scf_input, nscf_input, scr_input, sigma_inputs, manager=None, flow_class=<class 'pymatgen.io.abinit.flows.Flow'>, allocate=True)[source]

Build a Flow for one-shot G0W0 calculations.

Parameters:
  • workdir – Working directory.
  • scf_input – Input for the GS SCF run.
  • nscf_input – Input for the NSCF run (band structure run).
  • scr_input – Input for the SCR run.
  • sigma_inputs – List of inputs for the SIGMA run.
  • flow_class – Flow class
  • manager – TaskManager object used to submit the jobs. Initialized from manager.yml if manager is None.
  • allocate – True if the flow should be allocated before returning.
Returns:

Flow object
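
Example

A sketch, assuming the four AbinitInput objects are built elsewhere:

    from pymatgen.io.abinit.flows import g0w0_flow

    flow = g0w0_flow("flow_g0w0", scf_input, nscf_input, scr_input,
                     sigma_inputs=[sigma_input])
    flow.build_and_pickle_dump()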

phonon_flow(workdir, scf_input, ph_inputs, with_nscf=False, with_ddk=False, with_dde=False, manager=None, flow_class=<class 'pymatgen.io.abinit.flows.PhononFlow'>, allocate=True)[source]

Build a PhononFlow for phonon calculations.

Parameters:
  • workdir – Working directory.
  • scf_input – Input for the GS SCF run.
  • ph_inputs – List of Inputs for the phonon runs.
  • with_nscf – Add an NSCF task in front of all phonon tasks to make sure the q-point is covered.
  • with_ddk – Add the DDK step.
  • with_dde – Add the DDE step. If with_dde is set, ddk is switched on automatically.
  • manager – TaskManager used to submit the jobs. Initialized from manager.yml if manager is None.
  • flow_class – Flow class
Returns:

Flow object