# Basis Sets - Defining Vector Spaces

Three questions have to be addressed before tackling an electronic
structure problem:
1. Which computer code is best suited to the problem?
2. Which computational method gives the most accurate results in a reasonable time?
3. Which basis set offers the best compromise between accuracy and efficiency?

Throughout this course, you will always use the same code ([Psi4](https://psicode.org/)), but you will get to try out some of the different approaches discussed in the lecture. Before the first
practical example, in which the Hartree-Fock-Roothaan scheme (treated in
detail in the lectures) is applied to simple systems, one issue remains
to be resolved: in which basis do we expand the wavefunction, which is
formally described by the infinite expansion

$$
 \Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = \sum_j c_j \psi_j(\mathbf{r}_1,\dots,\mathbf{r}_N)
$$ (wf_expansion)

## One-Electron Wavefunctions: Slater-Type Orbitals (STOs)

By defining a basis set, we define a *vector space* in which the
Schrödinger equation is to be solved, and we wish this space to be as
close as possible to the complete space that contains the exact
solution. You have already seen that the Hartree-Fock scheme makes a
convenient (but not always accurate) approximation to $\Psi$: it
assumes that a single Slater determinant is enough to describe the
problem accurately. In Hartree-Fock theory, the expansion {eq}`wf_expansion` therefore reduces to:

$$
 \Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = \psi(\mathbf{r}_1,\dots,\mathbf{r}_N),
$$ (HF_wf)

where

$$ 
 \psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = \det\left|\phi_1(\mathbf{r}_1),\dots,\phi_N(\mathbf{r}_N)\right|
$$ (slater_det)

is a Slater determinant, which accounts for the antisymmetry requirement as
discussed in the preceding chapter, and the $\left\{\phi\right\}$ are
one-electron orbitals. Although an expression for the many-electron
wavefunction in terms of one-particle wavefunctions is now given, the
latter are not yet specified. An intuitive approach to the one-electron
orbitals may be based on the *LCAO* (Linear Combination of Atomic
Orbitals) theory, where one-particle molecular orbitals are formed from
one-particle atomic orbitals. This implies that $\phi_m(\mathbf{r}_m)$
will be expanded in terms of all *atomic one-particle orbitals* of the
system, a set of *atomic basis functions* 

$$
 \phi_m(\mathbf{r}_m) = \sum_n D_{mn} \chi_n(\mathbf{r}_m),
$$ (LCAO)

where the $\left\{\chi\right\}$ are the atomic orbitals and $D_{mn}$ is
the expansion coefficient (the contribution) of the $n$-th atomic
orbital to the single-particle molecular orbital $\phi_m$. As the
Hartree-Fock many-electron wavefunction is expressed as a single Slater
determinant, the sum over the coefficients $c_j$ defined in the
introduction collapses to a single term, and the only coefficients left
in the definition are the $D_{mn}$. These are the expansion coefficients
that are optimised in a Hartree-Fock calculation.

Still, the question of how to define the single-particle atomic orbitals
is not yet resolved. In principle, the conditions that there be a cusp at
the nuclei and that the orbital fall off exponentially at large
distances from the nuclei dictate a certain form. One suitable form was
proposed by Slater in the 1930s:

$$
 \chi_{\zeta,n,l,m}(r,\theta,\phi) = N \cdot Y_{lm}(\theta,\phi)\cdot r^{n-1}\cdot e^{-\zeta r}
$$ (STO)

A Slater-type orbital is composed of an angular part $Y_{lm}$ (the
spherical harmonics, taken from the exact solution of the hydrogen
atom), an exponential part (to ensure the right long-range decay) and a
polynomial in $r$. However, integrals over products of these functions
on different centres need to be evaluated, and these are impractically
expensive to compute. It is therefore more convenient to choose basis
functions that offer computational advantages. *Gaussian functions* are
especially suited, as the product of two Gaussians is simply another
Gaussian, centred at a point between the two original centres. Frank
Boys therefore proposed to approximate Slater-type orbitals with a
linear combination of Gaussian-type functions. These Gaussian-type
functions are referred to as *contraction functions* (elsewhere often
called *primitives*). This implies that the atomic basis function $\chi$
is in turn defined by several basis functions (the term contraction is
chosen to avoid confusion between the atomic basis functions and the
Gaussians they are built from):

$$
 \chi_{\xi,n,l,m}^{STO-3G}(\mathbf{r},\theta,\phi) = \sum_{i=1}^3 d_i \cdot N_i \cdot Y_{lm}(\theta,\phi) \cdot r^{2n-2-l} \cdot e^{-\xi_i r^2},
$$ (STO-3G)

where $d_i$ is the contraction coefficient and $N_i$ the normalisation
constant of the $i$-th Gaussian, and $\xi_i$ is its exponent; the $d_i$
and $\xi_i$ are chosen to give an optimal fit to the Slater-type
orbital. This defines a *minimal Gaussian basis set* known as *STO-3G*
(STO stands for Slater-type orbital and refers to the origin of the
Gaussian expansion; 3G refers to the three Gaussians per contraction).
The term minimal basis does not refer to the number of contractions, but
to the number of basis functions: each atomic orbital is represented by
exactly one basis function. Minimal bases create minimal computational
overhead, but often do not provide sufficient flexibility to describe
the system's wavefunction accurately; there is always a trade-off
between the desired accuracy and the efficiency of a calculation. For
more details, you may refer to the main course script.
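To get a feeling for the quality of such a fit, the following sketch compares a 1s Slater-type orbital with its STO-3G expansion. It is only an illustration under stated assumptions: the Slater exponent is set to $\zeta = 1$, the three exponents and coefficients are the textbook values for that case (check them against a tabulation before reusing them), and the helper names `sto_1s`, `sto3g_1s` and `radial_overlap` are made up for this example.

```python
import math

# STO-3G expansion of a 1s Slater function with zeta = 1.0.
# Textbook values, assumed here for illustration; verify before reuse.
EXPONENTS = (0.109818, 0.405771, 2.22766)   # xi_i
COEFFS = (0.444635, 0.535328, 0.154329)     # d_i
ZETA = 1.0


def sto_1s(r, zeta=ZETA):
    """Normalised 1s Slater-type orbital."""
    return math.sqrt(zeta**3 / math.pi) * math.exp(-zeta * r)


def sto3g_1s(r):
    """Contraction of three normalised s-type Gaussians."""
    return sum(
        d * (2.0 * xi / math.pi) ** 0.75 * math.exp(-xi * r * r)
        for d, xi in zip(COEFFS, EXPONENTS)
    )


def radial_overlap(f, g, r_max=30.0, n=60000):
    """Trapezoidal estimate of the integral of f*g*4*pi*r^2 from 0 to r_max."""
    h = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(r) * g(r) * 4.0 * math.pi * r * r
    return total * h


overlap = radial_overlap(sto_1s, sto3g_1s)
print(f"overlap <STO|STO-3G> = {overlap:.4f}")
print(f"value at the nucleus: STO = {sto_1s(0.0):.4f}, STO-3G = {sto3g_1s(0.0):.4f}")
```

The overlap comes out close to one, while the value at the nucleus is noticeably lower for the Gaussian expansion: Gaussians have zero slope at the origin and cannot reproduce the cusp of the Slater function.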


### Pople-Type Split-Valence Basis Sets

Core and valence orbitals are equally important for the energetics of a
system, but bonding is dictated by the valence electrons. One may
therefore want to improve on the STO-3G basis by allowing additional
flexibility in the description of the valence electrons. In a
*split-valence basis set*, the number of basis functions assigned to
core orbitals differs from the number assigned to valence orbitals.
Usually, each core orbital is described by one function, which is in
turn composed of a certain number of Gaussian functions (*i.e.*
contractions). For the valence orbitals, multiple functions are included
(most often 2 to 6), and each of these functions is in turn expressed by
its own number of Gaussian contractions.

An example of a split-valence basis set is John Pople's 3-21G. The
notation encodes information about the contraction: the number to the
left of the hyphen gives the number of contractions for the core
orbitals, which consist of a single basis function per orbital. The
digits to the right describe the contraction of the valence orbitals:
there are two digits, hence there are two basis functions $\chi$ per
orbital. These basis functions, in turn, are constructed from two and
one Gaussian contraction(s), respectively.


```{figure} ../../images/orbitals.png
---
name: orbitals
---
Explanation of what the numbers in the 3-21G basis set notation mean. 
```
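The naming rule can be written down as a few lines of Python. This is a minimal sketch, not part of Psi4: the function `parse_pople` is hypothetical, and it only handles plain names of the form `k-nlmG`; suffixes for polarisation or diffuse functions are deliberately rejected.

```python
import re


def parse_pople(name):
    """Split a plain Pople-style name such as '3-21G' or '6-311G'.

    Returns (core, valence): the number of Gaussians in the single core
    function, and a tuple with the number of Gaussians in each valence
    function. Names with *, + or (d,p) suffixes are not handled.
    """
    match = re.fullmatch(r"(\d)-(\d+)G", name.strip())
    if match is None:
        raise ValueError(f"not a plain Pople-style name: {name!r}")
    core = int(match.group(1))
    valence = tuple(int(digit) for digit in match.group(2))
    return core, valence


for basis in ("3-21G", "6-31G", "6-311G"):
    core, valence = parse_pople(basis)
    print(f"{basis}: one core function of {core} Gaussians, "
          f"{len(valence)} valence functions with {valence} Gaussians")
```

For 3-21G this yields one core function built from three Gaussians and two valence functions built from two and one Gaussian(s).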

Consider, as a practical example, carbon with the electronic
configuration $1s^22s^22p^2$ in the 3-21G basis. The core orbital (1s)
is given by a contraction over <span style="color:green">three</span> Gaussians. 

$$
 \chi(1s)=\sum_{k=1}^3 \alpha_{1s,k}\mathrm{e}^{-\zeta_{1s,k}\mathbf{r}^2}
$$ (example_C_core)

To every valence orbital (2s and 2p), one function containing <span style="color:blue">two</span>
Gaussians and one function containing
<span style="color:red">one</span> Gaussian are attributed.

$$
\begin{aligned}
\begin{split}
\chi(2s)^{(2)} & = \sum_{k=1}^2 \alpha_{2s,k} \ \mathrm{e}^{-\zeta_{2s,k}\mathbf{r}^2} \\
\chi(2s)^{(1)} & = \alpha'_{2s} \ \mathrm{e}^{-\zeta'_{2s}\mathbf{r}^2}
\end{split}
\end{aligned}
$$ (example_C_valence1)

$$
\begin{aligned}
\begin{split}
\chi(2p)^{(2)}_{\Gamma} & = \sum_{k=1}^2 \alpha_{2p,k} \ \Gamma_p(\mathbf{r}) \ \mathrm{e}^{-\zeta_{2p,k}\mathbf{r}^2} \\
\chi(2p)^{(1)}_{\Gamma}  & =\alpha'_{2p} \ \Gamma_p(\mathbf{r}) \ \mathrm{e}^{-\zeta'_{2p}\mathbf{r}^2}
\end{split}
\end{aligned}
$$ (example_C_valence2)


where $\Gamma_p(\mathbf{r}) \in \{x, y, z\}$ selects the $p_x$, $p_y$ or
$p_z$ orbital, and the $\alpha$ are fixed coefficients multiplying each
Gaussian.

For each atom, there is an individual set of parameters $\alpha$ and
$\zeta$, which was determined when the basis set was designed.
These contraction parameters are *never* changed during an electronic
structure calculation. Recall that the molecular one-electron
wavefunctions are variable linear combinations of *fixed* atomic
orbitals; changing the contraction parameters during the calculation
would alter the atomic basis functions themselves. The
values for standard basis sets are usually distributed with the
electronic structure codes.
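The distinction between fixed contraction parameters and variable molecular-orbital coefficients can be mirrored in code by making the basis function immutable. The sketch below is purely illustrative: the class `ContractedS` is hypothetical, each Gaussian is given its own normalisation factor (which the equations above leave implicit), and the carbon 1s values are transcribed by hand and should be checked against the basis set file of your code.

```python
import math
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ContractedS:
    """s-type atomic basis function with fixed contraction parameters."""
    exponents: tuple      # zeta_k
    coefficients: tuple   # alpha_k

    def __call__(self, r):
        # Assumption of this sketch: every Gaussian is normalised
        # individually with the factor (2*zeta/pi)^(3/4).
        return sum(
            a * (2.0 * z / math.pi) ** 0.75 * math.exp(-z * r * r)
            for a, z in zip(self.coefficients, self.exponents)
        )


# Carbon 1s core function in 3-21G (hand-transcribed values, to be verified).
carbon_1s = ContractedS(
    exponents=(172.256, 25.9109, 5.53335),
    coefficients=(0.0617669, 0.358794, 0.700713),
)

# The contraction is part of the basis set definition and cannot be modified.
try:
    carbon_1s.coefficients = (1.0, 0.0, 0.0)
    rejected = False
except FrozenInstanceError:
    rejected = True

print("chi_1s(0.0) =", carbon_1s(0.0))
print("chi_1s(0.5) =", carbon_1s(0.5))
print("attempt to modify the contraction rejected:", rejected)
```

Only the coefficients $D_{mn}$ that combine such fixed functions into molecular orbitals would be variables in a calculation.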

For instance, *Psi4* represents the basis set
parameters in the following format:


```{figure} ../../images/basis_set_param_noted.png
---
name: basis_set_param
---
Example of a basis set parameter file
```

The figure shows the 3-21G basis set parameters for a carbon atom (from https://github.com/psi4/psi4/blob/master/psi4/share/psi4/basis/3-21g.gbs).
The $S$ entry contains information about the core, the $SP$ entries about the valence orbitals. The first number after $S$ or $SP$ gives the number of Gaussians in the contraction, i.e. the range of the index $k$. In the rows below, the first column gives the exponents
$\zeta_k$, the second column gives the $\alpha_{s,k}$ and the third the
$\alpha_{p,k}$. Note that if there is just one Gaussian in the contraction, then
$\alpha_{l,1} = 1$. Within an $SP$ entry, s and p orbitals share the same
$\zeta_k$ and differ only in $\alpha_{l,k}$.
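Reading such an entry programmatically takes only a few lines. The following is a rough sketch rather than Psi4's own reader: the function `parse_gbs_atom` is hypothetical, it handles a single atom block only (no file header, no multiple elements), and the carbon values in the string are transcribed by hand, so compare them with the linked file before relying on them.

```python
# Carbon entry in the .gbs layout (hand-transcribed, to be verified).
CARBON_3_21G = """\
C     0
S   3   1.00
    172.2560000              0.0617669
     25.9109000              0.3587940
      5.5333500              0.7007130
SP   2   1.00
      3.6649800             -0.3958970              0.2364600
      0.7705450              1.2158400              0.8606190
SP   1   1.00
      0.1958570              1.0000000              1.0000000
****
"""


def parse_gbs_atom(text):
    """Return a list of shells: {'type', 'exponents', 'coefficients'}.

    'coefficients' maps each angular momentum letter ('S', 'P', ...) to
    its list of contraction coefficients.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    shells = []
    i = 1  # skip the header line with the element symbol
    while i < len(lines) and lines[i][0] != "****":
        shell_type, n_gauss = lines[i][0].upper(), int(lines[i][1])
        # Accept Fortran-style exponents such as 1.0D+00 as well.
        rows = [[float(x.replace("D", "E")) for x in row]
                for row in lines[i + 1 : i + 1 + n_gauss]]
        shells.append({
            "type": shell_type,
            "exponents": [row[0] for row in rows],
            "coefficients": {
                letter: [row[k + 1] for row in rows]
                for k, letter in enumerate(shell_type)
            },
        })
        i += 1 + n_gauss
    return shells


for shell in parse_gbs_atom(CARBON_3_21G):
    print(shell["type"], shell["exponents"], shell["coefficients"])
```

Each `SP` entry produces one list of exponents shared by the s and p functions and two separate lists of coefficients.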



```{admonition} Exercise 1
:class: exercise 
A minimal basis set...\
    a) ...always gives the lowest energy.\
    b) ...is optimized for small molecules.\
    c) ...contains one basis function for each atomic orbital only.
```

```{admonition} Exercise 2
:class: exercise 
A split-valence basis set...\
    a) ...contains two basis functions for each valence atomic orbital.\
    b) ...doubles the CPU time of the calculation.\
    c) ...attributes a different number of basis functions to valence and
    core orbitals.
```

```{admonition} Exercise 3
:class: exercise 
Which of the following basis sets does not contain polarisation functions?\
    a) 6-31G$^\ast$\
    b) 6-31G(d,p)\
    c) 3-21+G\
    d) DZP
```

```{admonition} Exercise 4
:class: exercise 
Diffuse functions are added to a basis set to...\
    a) ...save CPU time.\
    b) ...better represent electronic effects at larger distances from the nuclei.\
    c) ...take polarisation into account.\
    d) ...enhance the description of core orbitals.
```

```{admonition} Exercise 5
:class: exercise 
Using the information given about the 3-21G contraction coefficients:\
    a) Give the basis functions corresponding to the 1s, 2s and 2p orbitals of carbon (**Hint**: use information from **Fig. 2.2**).\
    b) If you wish to calculate the Hartree-Fock energy of a carbon atom,
    how many coefficients are *optimised* during the calculation?
```

```{admonition} Exercise 6
:class: exercise 
You wish to calculate the wavefunction of ethylene C$_2$H$_4$ using the 6-31G\* basis. \
   Indicate the number of basis functions and the number of Gaussian primitives that will be used in the calculation.
```