Dirk Bakowies - Home Page

Dr. Dirk Bakowies
Computational Chemist

Tel: +41 78 856 61 45

Welcome to Dirk's homepage


Research Interests
Efficient computational models in (bio)chemistry
Scientific programming
Biomolecular simulation
Quantum/classical embedding
Semiempirical methods
High-level ab initio thermochemistry
Exploration of chemical space

Experience
Extensive programming and scripting experience
Expert knowledge of FORTRAN
GPU and parallel programming
High-performance computing
Linux system administration (scientific compute environments)
Teaching: Advanced courses, student supervision.

Selected recent and past projects
Biomolecular Simulation: Water cavities, FABP, Carbopeptoids, Enzymes & Free Energies
Software & Algorithms: Cavity analysis, Pair list algorithms
Ab initio Thermochemistry: CBS extrapolation, ATOMIC, Error and uncertainty (1), Error and uncertainty (2), ATOMIC-2
Semiempirical methods: Giant Fullerenes

Ab initio thermochemistry: The ATOMIC-2 protocol with estimates of error and uncertainty

ATOMIC-2 is the latest version of the ATOMIC protocol that implements Pople's concept of bond separation reactions to reduce the error of midlevel ab initio approaches in calculations of atomization energies and enthalpies of formation. The new protocol focuses on computational efficiency and increased accuracy; it retains the overall concept and all previously defined composite models, but improves on ATOMIC-1 in various other ways:

ATOMIC-2 vs ATOMIC-1
1. A computationally more efficient and more accurate level is employed for geometry optimizations and zero-point-energy evaluations.
2. The framework is improved, using more accurate CBS extrapolations to the Full CI limit for auxiliary molecules in bond separation reactions.
3. Bias and confidence interval for each of the contributing components are estimated from readily available proxies that represent type and size of a molecule (based on calibrations as in top figure).

The bias-corrected ATOMIC-2_um protocol is found to be more accurate for enthalpies of formation than the popular G4 theory, yet remarkably efficient computationally (below and right). The uncertainty model predicts confidence intervals consistent with actual deviations from highly accurate reference data (ATcT, version 1.122r).

Estimate of higher-order electron correlation effects (top right)
Higher-order electron correlation corrections for mild to severe multi-reference cases can be estimated based on the so-called excess T₁ diagnostic, which is readily available without further quantum-chemical calculation.

Computational efficiency (bottom right)
1. Calculations for molecules with up to six non-hydrogen atoms nearly always finish in less than one hour on a single core of a fairly standard computer.
2. Most of them finish in just a few minutes. Computational effort is well-balanced between the major steps: (a) Geometry optimization and frequency calculation. (b) MP2 calculations. (c) CCSD(T) calculations.
3. CCSD(T) calculations normally show the least favorable scaling with system size. System size scaling is moderated, however, for molecules with many hydrogen atoms (typical of organic chemistry), since CCSD(T) steps employ very small basis sets for hydrogens. Hence calculations will be very feasible for larger molecules with more than ten non-hydrogen atoms and any number of hydrogen atoms, and they can be made even faster through the use of multi-threaded codes and / or the exploitation of molecular symmetry.

Computer code available (bottom right)
Computer code is available on zenodo.org that performs ATOMIC-2 analysis. It takes a standard MOL / SDF file as input (to define geometry and bond pattern), augmented by lists of single-point energies and harmonic frequencies obtained from standard QM outputs.

Estimate of higher-order electron correlation effects. Higher-order electron correlation effects (beyond CCSD(T)) are notoriously difficult to calculate and cannot be obtained in any thermochemistry protocol that is focused on computational efficiency. These effects can, however, make significant contributions, typically up to about 0.1 kcal/mol per correlated valence electron for moderate to severe multi-reference cases, which amounts to more than 1 kcal/mol for a small to medium-sized molecule. The figure demonstrates that the excess T₁ diagnostic, available without further quantum-chemical calculation, is a valuable estimator of the expected higher-order electron correlation effect. Magenta and violet lines show the estimated correction and uncertainty, respectively, predicted by ATOMIC-2_um for the neglect of higher-order electron correlation contributions.
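The order-of-magnitude argument above can be checked with a few lines of arithmetic. The sketch below only counts valence electrons and multiplies by the quoted "up to about 0.1 kcal/mol" figure; it is not the ATOMIC-2_um estimator, which relies on the excess T₁ diagnostic. The function names and the choice of ozone as an example are illustrative assumptions.

```python
# Valence electrons of the elements covered on this page (H, C, N, O, F).
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6, "F": 7}


def correlated_valence_electrons(composition):
    """Count valence electrons for a composition such as {"O": 3}."""
    return sum(VALENCE_ELECTRONS[element] * count
               for element, count in composition.items())


def higher_order_upper_estimate(composition, per_electron=0.1):
    """Rough upper magnitude in kcal/mol.

    per_electron=0.1 is the 'up to about' figure quoted in the text,
    used here as a crude ceiling, not as a prediction.
    """
    return per_electron * correlated_valence_electrons(composition)


ozone = {"O": 3}  # illustrative multi-reference example
print(correlated_valence_electrons(ozone))           # 18
print(round(higher_order_upper_estimate(ozone), 2))  # 1.8
```

With 18 correlated valence electrons, the ceiling already exceeds 1 kcal/mol, which is the point made in the paragraph above.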

Computational efficiency. ATOMIC-2 / B₅ single-core (Xeon E5-26xx) running times as observed for 248,972 molecules or conformers H_nC_aN_bO_cF_d (2 ≤ a+b+c+d ≤ 6), taken from work to be published. Molecules are grouped by numbers of non-hydrogen (X = C, N, O, F) and hydrogen atoms (n); average running times are shown along with two standard deviations. Colors indicate the number of molecules per data point: up to 10 (red), 100 (orange), 1000 (yellow), 10000 (green), or more (dark green). The bottom panels indicate the average fraction of time spent on geometry optimizations and frequency calculations (violet), MP2 (blue), and coupled cluster calculations (black) needed for a full ATOMIC-2 / B₅ evaluation. See the paper for details.

Bakowies, D., "ATOMIC-2 protocol for thermochemistry" J. Chem. Theory Comput. 2022, 18, 4142-4163.
Bakowies, D., "Ab initio thermochemistry with ATOMIC-2" DOI: 10.5281/zenodo.5780172
Bakowies, D.; von Lilienfeld, O. A., "Density functional geometries and zero-point energies in ab initio thermochemical treatments of compounds with first-row atoms (H, C, N, O, F)" J. Chem. Theory Comput. 2021, 17(8), 4872-4890.

Ab initio thermochemistry: Estimates of error and uncertainty (1)

In experimental thermochemistry it is accepted standard practice to report results together with uncertainties, usually taken to be intervals of 95% confidence. The predictive power of theoretical procedures is routinely assessed by comparison to sets of known experimental results; however, it is still uncommon to augment theoretical predictions with fair estimates of uncertainty. A number of questions arise that may be difficult to answer: Is the benchmark representative enough to allow for meaningful error estimates for molecules outside the benchmark set? How do we account for the expected scaling of error with molecular size?

Focusing on hydrocarbons, ATOMIC(hc) follows an alternative strategy: Each of the components contributing to the bottom-of-the-well atomization energy is scrutinized for possible error by comparison to a large number of very high-level results, including complete-basis-set estimates of CCSDT(Q) bond separation energies for 83 hydrocarbons up to the size of naphthalene.

Bakowies, D., "Estimating systematic error and uncertainty in ab initio thermochemistry. I. Atomization energies of hydrocarbons in the ATOMIC(hc) protocol" J. Chem. Theory Comput. 2019, 15(10), 5230-5251.

The graphic above shows that post-CCSD(T) effects can be quite sizeable, but it also indicates that simple uncertainty estimates (black lines) manage to cover most of the systems. A quality criterion computed from the T1 diagnostic warns of cases for which the error estimate may be unreliable (black circles). Other sources of error are studied as well, including the limited accuracy of complete-basis-set extrapolations of CCSD(T) and of computed relativistic effects and diagonal Born-Oppenheimer corrections.

Ab initio thermochemistry: Estimates of error and uncertainty (2)

In an extension to the ATOMIC(hc) model for bottom-of-the-well atomization energies, we have also studied errors and uncertainties for the remaining components necessary to evaluate enthalpies of formation.

ATOMIC(hc) provides enthalpies of formation complete with uncertainty estimates corresponding to intervals of 95% confidence. It focuses on computational efficiency rather than benchmark-level quality, and so cannot compete with experiment for extremely well-studied molecules (see benzene, (1), above). However, it may help to identify the more accurate among conflicting experimental values (2), to increase the accuracy of poorly known experimental data (3), to identify experimental numbers likely to be in error (4), and to provide experimentally unknown information (5).
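One simple way to use such 95% intervals when comparing a computed value with experimental ones is to test whether the difference lies within the combined uncertainty. The sketch below assumes independent errors and combines the two intervals in quadrature, which is a common convention and not necessarily the criterion used in the papers cited here. All numbers are made-up placeholders, and `consistent` is a hypothetical helper.

```python
import math


def consistent(value_a, u95_a, value_b, u95_b):
    """True if two values agree within their combined 95% uncertainty.

    Assumes independent errors, so the intervals add in quadrature.
    """
    return abs(value_a - value_b) <= math.hypot(u95_a, u95_b)


# Placeholder numbers in kcal/mol, for illustration only.
computed, u_computed = 20.0, 0.5
experiment_a, u_a = 19.8, 0.3
experiment_b, u_b = 22.0, 0.4

print(consistent(computed, u_computed, experiment_a, u_a))  # True
print(consistent(computed, u_computed, experiment_b, u_b))  # False
```

In this toy case the computed value would favor the first of two conflicting experimental numbers.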

Bakowies, D., "Estimating systematic error and uncertainty in ab initio thermochemistry: II. ATOMIC(hc) enthalpies of formation for a large set of hydrocarbons" J. Chem. Theory Comput. 2020, 16(1), 399-426.

As expected, the evaluation of ZPEs from scaled harmonic frequencies (black in top graph) emerges as the leading source of uncertainty if highly accurate composite models are used to treat the electronic problem (such as A*, green, vs ZPE, black, in center graph). With the computationally more attractive B-level models used to estimate the CBS limit of CCSD(T) (such as B5, blue, center graph), however, the two sources of uncertainty are usually balanced.
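For reference, a zero-point energy from scaled harmonic frequencies is half the sum of the scaled wavenumbers, converted to energy units. The sketch below is a minimal illustration; the frequencies are round placeholder numbers and the scale factor of 0.98 is an assumed example value, not the one calibrated for ATOMIC.

```python
# Energy equivalent of 1 cm^-1 in kcal/mol (h * c * N_A).
CM1_TO_KCAL_PER_MOL = 2.859144e-3


def zero_point_energy(frequencies_cm1, scale=1.0):
    """ZPE in kcal/mol from harmonic wavenumbers (cm^-1).

    scale is an empirical frequency scaling factor (assumed parameter).
    """
    if any(f <= 0.0 for f in frequencies_cm1):
        raise ValueError("expected real, positive harmonic frequencies")
    return 0.5 * scale * sum(frequencies_cm1) * CM1_TO_KCAL_PER_MOL


# Placeholder frequencies for a triatomic molecule (not computed data).
frequencies = [1600.0, 3700.0, 3800.0]
print(zero_point_energy(frequencies))              # unscaled
print(zero_point_energy(frequencies, scale=0.98))  # with assumed scaling
```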

Ab initio thermochemistry: ATOMIC protocol

The ATOMIC approach was developed to meet the needs posed by the calibration of modern approximate models of quantum chemistry, such as semiempirical methods. It is a robust and computationally efficient approach to otherwise dauntingly expensive calculations of atomization energies. The graph shows how the use of bond separation reactions (BSRs) helps to reduce errors in each of the components contributing to the CCSD(T)(full) atomization energy at the complete-basis-set limit. Each chart shows RMS errors for a particular component as a function of the basis-set cardinal number, without (top) or with (bottom) extrapolation. In practice, only small-basis-set calculations are feasible for larger systems.
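To make the idea of a bond separation reaction concrete, the following sketch balances one from the bond pattern alone. It assumes the usual Pople-style definition: every bond between two non-hydrogen atoms becomes the simplest hydrogen-saturated molecule containing that bond, and each heavy atom with d heavy-atom bonds requires d - 1 parent hydrides on the reactant side. The function and data layout are illustrative and are not taken from the ATOMIC code.

```python
from collections import Counter

PARENT_HYDRIDE = {"C": "CH4", "N": "NH3", "O": "H2O", "F": "HF"}


def bond_separation_reaction(atoms, bonds):
    """Balance a bond separation reaction.

    atoms: element symbols of the non-hydrogen atoms.
    bonds: (i, j, order) tuples for bonds between non-hydrogen atoms.
    Returns (reactant_hydrides, products). Product keys such as
    ("C", "C", 1) stand for the simplest molecule with that bond (ethane).
    """
    degree = Counter()
    products = Counter()
    for i, j, order in bonds:
        degree[i] += 1
        degree[j] += 1
        first, second = sorted((atoms[i], atoms[j]))
        products[(first, second, order)] += 1
    hydrides = Counter()
    for index, element in enumerate(atoms):
        extra = degree[index] - 1
        if extra > 0:
            hydrides[PARENT_HYDRIDE[element]] += extra
    return hydrides, products


# Propane: C3H8 + CH4 -> 2 C2H6
print(bond_separation_reaction(["C", "C", "C"], [(0, 1, 1), (1, 2, 1)]))

# Benzene (Kekule structure): C6H6 + 6 CH4 -> 3 C2H6 + 3 C2H4
ring = [(k, (k + 1) % 6, 1 + k % 2) for k in range(6)]
print(bond_separation_reaction(["C"] * 6, ring))
```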

Bakowies, D., "Ab initio thermochemistry using optimal-balance models with isodesmic corrections: The ATOMIC protocol" J. Chem. Phys. 2009, 130, 144113/1-21.
Bakowies, D., "Ab initio thermochemistry with high-level isodesmic corrections: Validation of the ATOMIC protocol for a large set of compounds with first-row atoms (H, C, N, O, F)" J. Phys. Chem. A 2009, 113(43), 11517-11534.
Bakowies, D., "Assessment of density functional theory for thermochemical approaches based on bond separation reactions" J. Phys. Chem. A 2013, 117(1), 228-243.
Bakowies, D., "Simplified wave function models in thermochemical protocols based on bond separation reactions" J. Phys. Chem. A 2014, 118(50), 11811-11827.

Corrections to atomization energies beyond the CCSD(T) level of theory are estimated from thermoneutral BSRs. This simplification renders the calculation of these corrections a trivial task of summing up bond increments. Such an approach is astonishingly accurate for scalar relativistic corrections and works reasonably well even for CCSDTQ-CCSD(T) corrections.
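If a bond separation reaction is taken to be thermoneutral for a given correction, the correction for the target molecule follows from the small reference molecules alone: products minus added parent hydrides. The sketch below shows that bookkeeping; regrouping the same terms bond by bond gives the increments mentioned above. The reference values are placeholders chosen for easy arithmetic, not published numbers, and the function name is hypothetical.

```python
def correction_from_thermoneutral_bsr(reactant_hydrides, products, reference):
    """Correction for the target molecule, assuming a thermoneutral BSR.

    reactant_hydrides, products: dicts mapping molecule name to count.
    reference: dict of corrections for the small reference molecules.
    """
    total = sum(count * reference[name] for name, count in products.items())
    total -= sum(count * reference[name]
                 for name, count in reactant_hydrides.items())
    return total


# Placeholder corrections in kcal/mol (illustrative only).
reference = {"CH4": 0.10, "C2H6": 0.25}

# Propane: C3H8 + CH4 -> 2 C2H6
propane = correction_from_thermoneutral_bsr({"CH4": 1}, {"C2H6": 2}, reference)
print(round(propane, 2))  # 0.4
```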

Ab initio thermochemistry: CBS extrapolation

The development of accurate extrapolation formulas for electron correlation energies is an important field in ab initio thermochemistry. Electron correlation energies are known to converge slowly to the complete basis set limit, and finite basis set calculations will thus carry substantial error. On the other hand, computational constraints usually force one to resort to small basis-set calculations. The graph compares residual errors for our newly developed and theoretically well-motivated extrapolation formula (2nd and 4th panels) to those of the best alternative formulations (1st and 3rd panels). The large improvement is expected to have a significant impact on producing reliable ab initio reference data for the calibration of empirical and semiempirical potentials.
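For orientation, the sketch below shows the general shape of a basis-set extrapolation using the widely known two-point inverse-cubic form, E(X) = E_CBS + A·X⁻³, where X is the cardinal number. This is explicitly not the formula developed in the papers cited in this section; it is one of the conventional alternatives, and the input energies are synthetic numbers generated from the model itself.

```python
def cbs_two_point(e_small, x_small, e_large, x_large, power=3.0):
    """Two-point extrapolation assuming E(X) = E_CBS + A * X**(-power)."""
    if x_large <= x_small:
        raise ValueError("x_large must exceed x_small")
    w_small = x_small ** power
    w_large = x_large ** power
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


def model_energy(x, e_cbs=-0.3, a=0.2):
    """Synthetic correlation energy (hartree) following the assumed model."""
    return e_cbs + a / x ** 3


# Triple-zeta (X=3) and quadruple-zeta (X=4) inputs recover the model limit.
estimate = cbs_two_point(model_energy(3), 3, model_energy(4), 4)
print(round(estimate, 6))  # -0.3
```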

Bakowies, D., "Extrapolation of electron correlation energies to finite and complete basis set targets" J. Chem. Phys. 2007, 127, 084105/1-23.
Bakowies, D., "Accurate extrapolation of electron correlation energies from small basis sets" J. Chem. Phys. 2007, 127, 164109/1-12.

Biomolecular simulation: Algorithms for trajectory analysis

A simulation of FABP in water. The protein carries a large water-filled cavity in the interior. How can we extract this internal water?

Bakowies, D.; van Gunsteren, W. F., "Water in protein cavities: A procedure to identify internal water and exchange pathways and application to fatty acid binding protein" Proteins 2002, 47, 534-545.

Essentially, we represent the protein barrel by its C_α skeleton, triangulate it, and determine all water molecules inside the polyhedron.

Some details on the algorithm. The triangulation exploits locality and can thus be performed in linear time.
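Once a closed triangulated surface is available, deciding whether a water oxygen lies inside can be done in several ways. The sketch below uses a generic ray-casting parity test (counting ray-triangle crossings with the Möller-Trumbore intersection test); it is not necessarily the inside test used in the published procedure, and the tetrahedron stands in for a triangulated C_α skeleton. All names are illustrative.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, a, b, c, eps=1e-12):
    """Moller-Trumbore test: does the ray cross triangle (a, b, c)?"""
    e1, e2 = sub(b, a), sub(c, a)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return False
    inv = 1.0 / det
    t_vec = sub(origin, a)
    u = dot(t_vec, p) * inv
    if u < 0.0 or u > 1.0:
        return False
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    return dot(e2, q) * inv > eps   # crossing must lie ahead of the origin


def is_inside(point, vertices, faces, direction=(0.57, 0.31, 0.76)):
    """Odd number of crossings means the point is inside the closed surface.

    The fixed, generic direction is an assumption that keeps the sketch
    short; rays grazing an edge or vertex would need extra care.
    """
    hits = sum(ray_hits_triangle(point, direction,
                                 vertices[i], vertices[j], vertices[k])
               for i, j, k in faces)
    return hits % 2 == 1


def points_inside(points, vertices, faces):
    """Indices of all points (e.g. water oxygens) inside the polyhedron."""
    return [n for n, p in enumerate(points) if is_inside(p, vertices, faces)]


# Toy polyhedron: a tetrahedron.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
waters = [(0.1, 0.1, 0.1), (1.0, 1.0, 1.0), (-0.5, 0.2, 0.2)]
print(points_inside(waters, vertices, faces))  # [0]
```

This brute-force version tests every point against every triangle; it does not attempt the locality-based speed-up mentioned above.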

Biomolecular simulation: FABP

Using the above algorithm, we may analyze the entire trajectory and identify the distribution of the 3 (apo) and 4 (holo) water molecules that have been found in NMR experiments to be particularly immobile, analyze the entire interior water density, or even analyze time-resolved interaction potentials with other water, with protein residues, and with the ligand to improve our understanding of the internal water dynamics.

Bakowies, D.; van Gunsteren, W. F., "Simulations of apo and holo-fatty acid binding protein: Structure and dynamics of protein, ligand and internal water" J. Mol. Biol. 2002, 315, 713-736.

Biomolecular simulation: Carbopeptoids

Carbopeptoids are homooligomers of sugar-containing peptides, and they serve as rigidified peptide models with potential applications as drugs that block protein-protein interactions and inhibit enzyme catalysis.

MD simulations reproduce experimentally (NOE) derived distance constraints. Cluster analysis of MD trajectories demonstrates, however, that the experimentally postulated helical structure is only one of several dominant structural motifs that make up the ensemble, and that the unfolded state is in fact not structureless. Such insight is hard if not impossible to obtain from experiment alone.

Baron, R.; Bakowies, D.; van Gunsteren, W. F., "Principles of carbopeptoid folding: A molecular dynamics simulation study" J. Peptide Sci. 2005, 11, 74-84.
Baron, R.; Bakowies, D.; van Gunsteren, W. F., "Carbopeptoid folding: Effects of stereochemistry, chain length, and solvent" Angew. Chem. Int. Ed. 2004, 43, 4055-4059, Angew. Chem. 2004, 116, 4147-4151.

Cluster analysis combining the ensembles of the tetrapeptide and equally long blocks of the hexapeptide demonstrates the repetition of structural motifs in longer peptide chains, a result that was postulated in experimental studies. Note the "overlapping" (blue/red) clusters in the graph.

MD simulation software: Pair list algorithms

MD simulations often apply a distance cutoff for pair potentials, and the scan of the atom pair matrix is one of the most time-critical parts of such an MD simulation. While linear-scaling grid-cell techniques become efficient for very large system sizes, improved double-loop algorithms are beneficial for the intermediate sizes often considered in present-day simulations.

van Gunsteren, W. F.; Bakowies, D.; Baron, R. et al., "Biomolecular modeling: Goals, problems, perspectives" Angew. Chem. Int. Ed. 2006, 45, 4064-4092, Angew. Chem. 2006, 118, 4168-4198.

Here we take advantage of the fast processor cache found in modern CPUs and replace the row-wise atom pair scan (unshaded) by a window scan (shaded), which can process a number of pairs that scales quadratically, rather than linearly, with the number of atoms loaded into cache memory. The triangular atom-pair matrix may be reordered to become rectangular, in which case all rhombic windows become square.
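The difference between the two traversal orders can be sketched as follows. The windowed version visits the upper triangle of the pair matrix in square tiles, so that each pair of atom blocks is reused for up to window × window distance tests. This is a simplified illustration with square tiles and a triangular diagonal tile, not the rhombic-window scheme or the reordering described above, and plain Python will not show any cache benefit; only the loop structure is the point. The window size and coordinates are arbitrary.

```python
def within_cutoff(a, b, cutoff_sq):
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return dx * dx + dy * dy + dz * dz <= cutoff_sq


def pairs_rowwise(coords, cutoff):
    """Conventional double loop over the upper triangle, row by row."""
    cutoff_sq = cutoff * cutoff
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if within_cutoff(coords[i], coords[j], cutoff_sq)]


def pairs_windowed(coords, cutoff, window=4):
    """Same pairs, visited tile by tile (window x window blocks)."""
    cutoff_sq = cutoff * cutoff
    n = len(coords)
    pairs = []
    for i0 in range(0, n, window):
        i1 = min(i0 + window, n)
        for j0 in range(i0, n, window):      # tiles on or above the diagonal
            j1 = min(j0 + window, n)
            for i in range(i0, i1):
                for j in range(max(j0, i + 1), j1):
                    if within_cutoff(coords[i], coords[j], cutoff_sq):
                        pairs.append((i, j))
    return pairs


# Arbitrary test coordinates along a zigzag line.
coords = [(0.3 * i, 0.1 * (i % 3), 0.0) for i in range(10)]
print(sorted(pairs_windowed(coords, 0.7)) == pairs_rowwise(coords, 0.7))  # True
```

Both functions return the same pair list; only the order in which pairs are visited differs.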

Giant fullerenes

Fullerenes were discovered in the mid-1980s and have attracted a lot of attention as new allotropes of carbon. The prototype buckminsterfullerene, C₆₀, is spherical due to its high symmetry.

Despite earlier claims, however, our semiempirical calculations have shown that larger fullerenes of icosahedral symmetry prefer facetted over spherical shapes. These results were confirmed by more rigorous density functional calculations. The picture above shows the facetted form of C₉₆₀ from two different perspectives and a hypothetical spherical alternative.

Bakowies, D.; Buehl, M.; Thiel, W., "Can large fullerenes be spherical?" J. Am. Chem. Soc. 1995, 117, 10113-10118.
Bakowies, D.; Buehl, M.; Thiel, W., "A density functional study on the shape of C₁₈₀ and C₂₄₀ fullerenes" Chem. Phys. Lett. 1995, 247, 491-493.