Melts and fluids are ubiquitous vectors of mass transport in natural systems. We have developed an open-source package to analyze ab initio molecular-dynamics simulations of such systems. We compute structural (bonding, clusterization, chemical speciation), transport (diffusion, viscosity) and thermodynamic properties (vibrational spectrum).
We have developed a Python-based open-source package to analyze the results stemming from ab initio molecular-dynamics simulations of fluids. The package is best suited for applications on natural systems, like silicate and oxide melts, water-based fluids, and various supercritical fluids. The package is a collection of Python scripts that include two major libraries dealing with file formats and with crystallography. All the scripts are run at the command line. We propose a simplified format to store the atomic trajectories and relevant thermodynamic information of the simulations, which is saved in UMD files, standing for Universal Molecular Dynamics. The UMD package allows the computation of a series of structural, transport and thermodynamic properties. Starting with the pair-distribution function it defines bond lengths, builds an interatomic connectivity matrix, and eventually determines the chemical speciation. Determining the lifetime of the chemical species allows running a full statistical analysis. Then dedicated scripts compute the mean-square displacements for the atoms as well as for the chemical species. The implemented self-correlation analysis of the atomic velocities yields the diffusion coefficients and the vibrational spectrum. The same analysis applied on the stresses yields the viscosity. The package is available via the GitHub website and via its own dedicated page of the ERC IMPACT project as open-access package.
Fluids and melts are active chemical and physical transport vectors in natural environments. The elevated rates of atomic diffusion favor chemical exchanges and reactions, the low viscosity coupled with varying buoyancy favor large mass transfer, and crystal-melt density relations favor layering inside planetary bodies. The absence of a periodic lattice, typical high-temperatures required to reach the molten state, and the difficulty for quenching make the experimental determination of a series of obvious properties, like density, diffusion, and viscosity, extremely challenging. These difficulties make alternative computational methods strong and useful tools for investigating this class of materials.
With the advent of computing power and the availability of supercomputers, two major numerical atomistic simulations techniques are currently employed to study the dynamical state of a non-crystalline atomistic system, Monte Carlo1 and molecular dynamics (MD)1,2. In Monte Carlo simulations the configurational space is randomly sampled; Monte Carlo methods show linear scaling in parallelization if all sampling observations are independent of each other. The quality of the results depends on the quality of the random number generator and the representativeness of the sampling. Monte Carlo methods show linear scaling in parallelization if the sampling is independent of each other. In molecular dynamics (MD) the configurational space is sampled by time-dependent atomic trajectories. Starting from a given configuration, the atomic trajectories are computed by integrating the Newtonian equations of motion. The interatomic forces can be computed using model interatomic potentials (in classical MD) or using first-principles methods (in ab initio, or first-principles, MD). The quality of the results depends on the length of the trajectory and its capability to not be attracted to local minima.
Molecular dynamics simulations contain a plethora of information, all related to the dynamical behavior of the system. Thermodynamic average properties, like internal energy, temperature, and pressure, are rather standard to compute. They can be extracted from the output file(s) of the simulations and averaged, whereas quantities related directly to the movement of the atoms as well as to their mutual relation need to be computed after extraction of the atomic positions and velocities.
Consequently, a lot of effort has been dedicated to visualizing the results, and various packages are available today, on different platforms, open source or not [Ovito3, VMD4, Vesta5, Travis6, etc.]. All these visualization tools deal efficiently with interatomic distances, and as such, they allow the efficient computation of pair distribution functions and diffusion coefficients. Various groups performing large-scale molecular dynamics simulations have proprietary software to analyze various other properties resulting from the simulations, sometimes in shareware or other forms of limited access to the community, and sometimes limited in scope and use to some specific packages. Sophisticated algorithms to extract information about interatomic bonding, geometrical patterns, and thermodynamics are developed and implemented in some of these packages3,4,5,6,7, etc.
Here we propose the UMD package - an open-source package written in Python to analyze the output of molecular dynamics simulations. The UMD package allows for the computation of a wide range of structural, dynamical, and thermodynamical properties (Figure 1). The package is available via the GitHub website (https://github.com/rcaracas/UMD_package) and via a dedicated page (http://moonimpact.eu/umd-package/) of the ERC IMPACT project as an open-access package.
To make it universal and easier to handle, our approach is to first extract all the information related to the thermodynamic state and the atomic trajectories from the output file of the actual molecular-dynamics run. This information is stored in a dedicated file, whose format is independent of the original MD package where the simulation was run. We name these files “umd” files, which stands for Universal Molecular Dynamics. In this way, our UMD package can be easily used by any ab initio group with any software, all with a minimum effort of adaptation. The only requirement to use the present package is to write the appropriate parser from the output of the particular MD software into the umd file format, if this is not yet existent. For the time being, we provide such parsers for the VASP8 and the QBox9 packages.
Figure 1: Flowchart of the UMD library.
Physical properties are in blue, and major Python scripts and their options are in red. Please click here to view a larger version of this figure.
The umd files are ASCII files; typical extension is “umd.dat” but not mandatory. All the analysis components can read ASCII files of the umd format, regardless of the actual name extension. However, some of the automatic scripts designed to perform fast large-scale statistics over several simulations specifically look for files with the umd.dat extension. Each physical property is expressed on one line. Every line starts with a keyword. In this way the format is highly adaptable and allows for new properties to be added to the umd file, all the while preserving its readability throughout versions. The first 30 lines of the umd file of the simulation of pyrolite at 4.6 GPa and 3000 K, used below in the discussion, are shown in Figure 2.
Figure 2: The beginning of the umd file describing the simulation of liquid pyrolite at 4.6 GPa and 3000 K.
The header is followed by the description of each snapshot. Each property is written on one line, containing the name of the physical property, the value(s), and the units, all separated by spaces. Please click here to view a larger version of this figure.
All umd files contain a header describing the content of the simulation cell: the number of atoms, electrons, and atomic types, as well as details for each atom, such as its type, chemical symbol, number of valence electrons, and its mass. An empty line marks the end of the header and separates it from the main part of the umd file.
Then each step of the simulation is detailed. First, the instantaneous thermodynamic parameters are given, each on a different line, specifying (i) the name of the parameter, like energy, stresses, equivalent hydrostatic pressure, density, volume, lattice parameters, etc., (ii) its value(s), and (iii) its units. A table describing the atoms comes next. A header line gives the different measures, like Cartesian positions, velocities, charges, etc., and their units. Then each atom is detailed on one line. By groups of three, corresponding to the three x, y, z axes, the entries are: the reduced positions, the Cartesian positions folded into the simulation cell, the Cartesian positions (that properly take into account the fact that atoms can traverse several unit cells during a simulation), the atomic velocities, and the atomic forces. The last two entries are scalars: charge and magnetic moment.
Two major libraries ensure the proper functioning of the entire package. The umd_process.py library deals with the umd files, like reading and printing. The crystallography.py library deals with all the information related to the actual atomic structure. The underlying philosophy of the crystallography.py library is to treat the lattice as a vectorial space. The unit cell parameters together with their orientation represent the basis vectors. The “Space” has a series of scalar attributes (specific volume, density, temperature, and specific number of atoms), thermodynamic properties (internal energy, pressure, heat capacity, etc.), and a series of tensorial properties (stress and elasticity). Atoms populate this space. The “Lattice” class defines this ensemble, alongside various few short calculations, like specific volume, density, obtaining the reciprocal lattice from the direct one, etc. The “Atoms” class defines the atoms. They are characterized by a series of scalar properties (name, symbol, mass, number of electrons, etc.) and a series of vectorial properties (the position in space, either relative to the vectorial basis described in the Lattice class, or relative to universal Cartesian coordinates, velocities, forces, etc.). Apart from these two classes, the crystallography.py library contains a series of functions to perform a variety of tests and calculations, such as atomic distances, or cell multiplication. The periodic table of the elements is also included as a dictionary.
The various components of the umd package write several output files. As a general rule, they are all ASCII files, all their entries are separated by tabs, and they are made as self-explanatory as possible. For example, they always clearly indicate the physical property and its units. The umd.dat files fully comply with this rule.
1. Analysis of the molecular-dynamics runs
NOTE: The package is available via the GitHub website (https://github.com/rcaracas/UMD_package) and via a dedicated page (http://moonimpact.eu/umd-package/) of the ERC IMPACT project as an open-access package.
Flag | Meaning | Script using it | Default value |
-h | Short help | all | |
-f | UMD file name | all | |
-i | Thermalization steps to be discarded | all | 0 |
-i | Input file containing the interatomic bonds | speciation | bonds.input |
-s | Sampling of the frequency | msd, speciation | 1 (every step is considered) |
-a | List of atoms or anions | speciation | |
-c | List of cations | speciation | |
-l | Bond length | speciation | 2 |
-t | Temperature | vibrations, rheology | |
-v | Discretization of the width of the sampling window of the trajectory for the mean-square displacement analysis | msd | 20 |
-z | Discretization of the start of the sampling window of the trajectory for the mean-square displacement analysis | msd | 20 |
Table 1: Most common flags used in the UMD package and their most common significance.
2. Perform the structural analysis
Figure 3: Determination of pair distribution function.
(a) For each atom of one species (for example red), all the atoms of the coordinating species (for example grey and/or red) are counted as a function of distance. (b) The resulting distance distribution graph for each snapshot, which at this stage is only a collection of delta functions, is then averaged over all the atoms and all the snapshots and weighted by the ideal gas distribution to generate (c) the pair distribution function that is continuous. The first minimum of the g(r) is the radius of the first coordination sphere, used later on in the speciation analysis. Please click here to view a larger version of this figure.
3. Perform the speciation analysis
Figure 4: Identification of the atomic clusters.
The coordination polyhedra are defined using interatomic distances. All atoms at a distance smaller than a specified radius are considered to be bonded. Here the threshold corresponds to the first coordination sphere (the light red circles), defined in Figure 1. Polymerization and thus chemical species are obtained from the networks of the bonded atoms. Note the central Red1Grey2 cluster, which is isolated from the other atoms, which form an infinite polymer. Please click here to view a larger version of this figure.
4. Compute diffusion coefficients
5. Time correlation functions
6. Thermodynamic parameters stemming from the simulations.
Pyrolite is a model multi-component silicate melt (0.5Na2O 2CaO 1.5Al2O3 4FeO 30MgO 24SiO2) that best approximates the composition of the bulk silicate Earth—the geochemical average or our entire planet, except for its iron-based core19. The early Earth was dominated by a series of large-scale melting events20, the last one might have engulfed the entire planet, after its condensation for the protolunar disk21. Pyrolite represents the best approximant to the composition of such planetary-scale magma oceans. Consequently, we extensively studied the physical properties of pyrolite melt in the 3,000‒5,000 K temperature range and 0‒150 GPa pressure range from ab initio molecular-dynamics simulations in the VASP implementation. These thermodynamic conditions entirely characterize the Earth’s most extreme magma ocean conditions. Our study is an excellent example of a successful use of the UMD package for the entire in-depth analysis of the melts22. We computed the distribution and the average bond lengths, we traced the changes in cation-oxygen coordination, and compared our results to previous experimental and computational studies on amorphous silicates of various compositions. Our in-depth analysis helped decompose standard coordination numbers into their basic constituents, outline the presence of exotic coordination polyhedra in the melt, and extract lifetimes for all the coordination polyhedra. It also outlined the importance of sampling in simulations in terms of both length of the trajectory and also number of atoms present in the system that is modeled. As for post-processing, the UMD analysis is independent of these factors, however, they should be taken into account when interpreting the results provided by the UMD package. Here, we show a few examples of how the UMD package can be used to extract several characteristic features of the melts, with an application to molten pyrolite.
The Si-O pair distribution function obtained from the gofrs_umd.py script shows that the radius of the first coordination sphere, which is the first minimum of the g(r) function, lies around 2.5 angstroms at T = 3000 K and P = 4.6 GPa. The maximum of the g(r) lies at 1.635 Å—this is the best approximation to the bend length. The long tail is due to the temperature. Using this limit as the Si-O bond distance, the speciation analysis shows that SiO4 units, which can last for up to a few picoseconds, dominate the melt (Figure 5). There is an important part of the melt that shows partial polymerization, as reflected by the presence of dimers like Si2O7, and trimers like Si3Ox units. Their corresponding lifetime is in the order of the picosecond. Higher-order polymers all have considerably shorter lifetime.
Figure 5: Lifetime of the Si-O chemical species.
The speciation is identified in a multi-component melt at 4.6 GPa and 3000 K. The labels mark the SiO3, SiO4, and SiO5 monomers and the various SixOy polymers. Please click here to view a larger version of this figure.
The different values of the vertical and horizontal steps, defined by the –z and –v flags above, yield various samplings of the MSD (Figure 6). Even large values of z and v are enough to define the slopes and thus the diffusion coefficients of the different atoms. The gain in time for the post-processing is remarkable when going to large values of z and v. The MSD offers a very strong validation criterion for the quality of the simulations. If the diffusion part of the MSD is not sufficiently long, that is a sign that the simulation is too short, and fails to reach the fluid state in a statistical sense. The minimum requirement for the diffusive part of the MSD highly depends on the system. One can require that all the atoms change their site at least once in the structure of the melt in order for it to be considered as a fluid10. An excellent example with applications in planetary sciences is complex silicate melts at high pressures close to or even below their liquidus line11. The Si atoms, the major network-forming cations, switch sites after more than two dozen picoseconds. Simulations shorter than this threshold would be considerably under-sampling the possible configurational space. However, as the coordinating anions, namely the O atoms, move faster than the central Si atoms, they can compensate for part of the slow mobility of Si. As such, the entire system could indeed cover a better sampling of the configurational space than assumed only from the Si displacements.
Figure 6: Mean-square displacements (MSD).
The MSD are illustrated for a few atomic types of a multi-component silicate melt. The sampling with various horizontal and vertical steps, z and v, yields consistent results. Solid circles: -z 50 –v 50. Open circles: -z 250 –v 500. Please click here to view a larger version of this figure.
Finally, the atomic VAC functions yield the vibrational spectrum of the melt. Figure 7 shows the spectrum at the same pressure and temperature conditions as above. We represent the contributions of Mg, Si, and O atoms, as well as the total value. At zero frequency there is a finite value of the spectrum, which corresponds to the diffusion character of the melt. Extraction of the thermodynamic properties from the vibrational spectrum needs to remove this gas-like diffusive character from zero but also to properly take into account its decay at higher frequencies.
Figure 7: The vibrational spectrum of pyrolite melt.
The real part of the Fourier transform of the atomic velocity-velocity self-correlation function yields the vibrational spectrum. Here the spectrum is computed for a multi-component silicate melt. Fluids have a non-zero gas-like diffusive character at zero frequency. Please click here to view a larger version of this figure.
The UMD package has been designed to work better with ab initio simulations, where the number of snapshots is typically limited to tens to hundreds of thousands of snapshots, with a few hundred atoms per unit cell. Larger simulations are also tractable provided the machine on which the post-processing runs has enough active memory resources. The code distinguishes itself by the variety of properties it can compute and by its open-source license.
The umd.dat files are appropriate to the ensembles that preserve the number of particles unchanged throughout the simulation. The UMD package can read files stemming from calculations where the shape and volume of the simulation box varies. These cover the most common calculations, like NVT and NPT, where the number of particles, N, temperature T, volume, V, and/or pressure, P, are kept constant.
For the time begin the pair distribution function as well as all the scripts needing to estimate the interatomic distances, like the speciation scripts, work only for orthogonal unit cells, meaning for cubic, tetragonal, and orthorhombic cells, where the angles between the axes are 90°.
The major lines of development for version 2.0 are removal of the orthogonality restriction for distances and adding more features for the speciation scripts: to analyze individual chemical bonds, to analyze the interatomic angles, and to implement the second coordination sphere. With help from external collaboration, we are working on porting the code onto a GPU for faster analysis in larger systems.
The authors have nothing to disclose.
This work was supported by the European Research Council (ERC) under the European Union Horizon 2020 research and innovation program (grant agreement number 681818 IMPACT to RC), by the Extreme Physics and Chemistry Directorate of the Deep Carbon Observatory, and by the Research Council of Norway through its Centers of Excellence funding scheme, project number 223272. We acknowledge access to the GENCI supercomputers through the stl2816 series of eDARI computing grants, to the Irene AMD supercomputer through the PRACE RA4947 project, and the Fram supercomputer through the UNINETT Sigma2 NN9697K. FS was supported by a Marie Skłodowska-Curie project (grant agreement ABISSE No.750901).
getopt library | open-source | ||
glob library | open-source | ||
matplotlib library | open-source | ||
numpy library | open-source | ||
os library | open-source | ||
Python software | The Python Software Foundation | Version 2 and 3 | open-source |
random library | open-source | ||
re library | open-source | ||
scipy library | open-source | ||
subprocess library | open-source | ||
sys library | open-source |