name: rolnicklab-feb24 class: title, middle ## Crystal-GFN: sampling crystals and catalysts with desirable properties and constraints .turquoise[Rolnick Lab, February 9th 2024] .center[
] .smaller[.footer[ Slides: [alexhernandezgarcia.github.io/slides/{{ name }}](https://alexhernandezgarcia.github.io/slides/{{ name }}) ]] --- ## What are crystals? [Wikipedia](https://en.wikipedia.org/wiki/Crystal): A crystal or crystalline solid is a solid material whose constituents (such as atoms, molecules, or ions) are arranged in a .highlight1[highly ordered microscopic structure], forming .highlight1[a crystal lattice that extends in all directions]. .left-column[ .center[![:scale 70%](/assets/images/slides/crystals/crystals_polycrystalline_amorphous.png)] ] .right-column[ .center[![:scale 40%](/assets/images/slides/crystals/Li2O.png)] ] -- Here, we are concerned mainly with _inorganic crystals_, where the constituents are atoms or ions. -- A crystal structure is characterized by its .highlight1[unit cell], a small imaginary box containing atoms in a specific spatial arrangement with certain symmetry. The unit cell repeats itself periodically in all directions. --- ## Why do we care about crystals? Many solid state materials are crystal structures and they are a core component of .highlight1[solar cells, batteries, electrocatalysts], etc. -- Accelerating .highlight1[material discovery is key in the climate crisis]. From the IPCC Sixth Assessment Report, 2022: * Improving material efficiency can reduce 0.93 ($\pm$ 0.23) GtCO₂-eq per year. * Fuel switching can reduce 2.1 ($\pm$ 0.52) GtCO₂-eq per year, only in the industry sector. * Carbon capture and storage can reduce 0.54 ($\pm$ 0.27) GtCO₂-eq per year in the energy sector. .smaller[.footnote[Global anthropogenic emissions in 2019 were estimated in 59 ($\pm$6.6) GtCO₂-eq.]] -- However, .highlight1[material modelling is very challenging]: * Limited data: only about 200k known inorganic materials, but potentially $10^{180}$ possible stable materials (for reference: more than a billion molecules are known) * Sparsity: .highlight2[stable materials] only exist in a low-dimensional subspace of all possible 3D arrangements. --- ## Crystal structure generation in the literature Example: .highlight2[Crystal Diffusion Variational Autoencoder (CDVAE)]: a diffusion process that moves .highlight1[atomic coordinates] towards a lower energy state and updates atom types to satisfy bonding preferences between neighbors. The key idea is to learn the diffusion process from the data distribution of stable materials. .cite[(Xie et al., 2022)] .center[![:scale 100%](/assets/images/slides/crystals/cdvae.png)] .references[Xie et al. [Crystal diffusion variational autoencoder for periodic material generation](https://arxiv.org/abs/2110.06197). ICLR 2022] ??? A: atom types X: atom coordinates L: perdiodic lattice: l1, l2, l3 (3x3) --- count: false ## Crystal structure generation in the literature Example: .highlight2[Physics Guided Crystal Generative Model (PGCGM)]: A GAN with the affine matrices of the symmetry operations and element properties as inputs, augmented with a distance loss between real and fake materials, and data augmentation on the atomic positions. .cite[(Zhao et al., 2023)] .center[![:scale 70%](/assets/images/slides/crystals/pgcgm.png)] .references[Zhao et al. [Physics guided deep learning for generative design of crystal materials with symmetry constraints](https://www.nature.com/articles/s41524-023-00987-9). npj computational materials 2023] --- count: false ## Crystal structure generation in the literature Example: .highlight2[MatterGen]: An evolution of CDVAE that performs diffusion not only on atomic positions but also on the atom types and the lattice. .cite[(Zeni, Pinsler, Zügner, Fowler et al., 2023)] .center[![:scale 90%](/assets/images/slides/crystals/mattergen.png)] .references[Zeni, Pinsler, Zügner, Fowler et al. [MatterGen: a generative model for inorganic materials design](https://arxiv.org/abs/2312.03687). arXiv 2023] --- count: false ## Crystal structure generation in the literature Example: .highlight2[UniMat]: A denoising diffusion model learns to move atoms from random locations back to their original locations. Atoms not present in the crystal are moved to the null location during the denoising process, allowing crystals with an arbitrary number of atoms to be generated. .center[![:scale 90%](/assets/images/slides/crystals/unimat.png)] .references[Yang et al. [Scalable Diffusion for Materials Generation](https://arxiv.org/abs/2311.09235). arXiv 2023] --- ## Our approach ### Crystal structure parameters .context[CDVAE and other works tackle crystal structure generation in the space of atom coordinates.] Instead of optimising the atom positions by learning from a small data set, we draw .highlight1[inspiration from theoretical crystallography to sample crystals in a lower-dimensional space of crystal structure parameters]. .left-column[ .center[![:scale 65%](/assets/images/slides/crystals/crystal_systems_table.png)] ] .right-column[ .center[![:scale 30%](/assets/images/slides/crystals/unit_cell.png)] ] .conclusion[We sample materials in the space of space groups (230 groups), compositions (elements and number of atoms) and lattice parameters (6 parameters: $a, b, c, \alpha, \beta, \gamma$).] --- ## GFlowNet approach ### Advantages .context[We generate materials in the lower-dimensional space of crystal structure parameters.] * Constructing materials by their crystal structure parameters allows us to introduce .highlight1[physicochemical and geometric _hard_ constraints]. -- * .highlight1[Searching in the lower-dimensional space] of crystal structure parameters may be more efficient than in the space of atom coordinates. Particularly suitable in active learning .cite[(Hernandez-Garcia, Saxena et al., 2023)]. .references[ Hernandez-Garcia, Saxena et al. [Multi-fidelity active learning with GFlowNets](https://arxiv.org/abs/2306.11715). arXiv 2306.11715, 2023 (RealML workshop at NeurIPS). ] -- * Provided we have access to a predictive model of a material property, we can .highlight1[flexibly generate materials with desirable properties]. -- * We can .highlight1[flexibly sample materials with specific characteristics, such as composition or space group]. --- ## Crystal-GFlowNet ### Main properties * .highlight1[State space]: 1. **Space group**: 230 groups, subdivided into crystal system, lattice system, point symmetry and space group (discrete) 2. **Composition**: how many atoms of which elements (discrete) 3. **Lattice parameters**: 6 parameters, $a, b, c, \alpha, \beta, \gamma$ (continuous) -- * .highlight1[Action space]: discrete and continuous movements in the subspace of the state space, with constraints. -- * .highlight1[Constraints]: * Charge neutrality of the composition. * Compatibility of composition and space group. * Hierarchical structure of the space group. * Compatibility of lattice parameters and lattice system. --- ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_init.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_sg.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_sg_output.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_comp.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_comp_output.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_lp.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_lp_output.png)] --- count: false ## Crystal-GFlowNet ### Schematic .center[![:scale 100%](/assets/images/slides/crystals/crystalgfn_all.png)] --- ## Crystal-GFlowNet ### Properties We can train a Crystal-GFN with any reward function, provided it is computationally tractable. Therefore, we can use it to .highlight1[generate materials with different properties]. -- We have tested the following properties: - Formation energy per atom [eV/atom], via a pre-trained machine learning model. - Electronic band gap [eV] (squared distance to a target value, 1.34 eV), via a pre-trained machine learning model. - Unit cell density [g/cm
3
], calculated _exactly_ from the GFN outputs. -- .highlight1[Coming soon]: pre-trained machine learning model to predict the ionic conductivity [S/cm]. --- ## Crystal-GFlowNet ### Proxy ML models of the formation energy and the band gap - MLP architecture - Inputs: - Space group embedding - Composition embedding as in PhAST (Duval et al., 2022) - Standardised lattice parameters - Trained on MatBench (Materials Project) - Mean absolute error: - Formation energy: 0.10 eV/atom $\pm$ 0.005 - Band gap: 0.321 eV $\pm$ 0.003 - Reward: $R(x) = \exp(-\frac{MLP(x)}{T})$ --- ## Crystal-GFlowNet ### Experiments - Space groups: all 113 space groups present in the MatBench data set. - Compositions: - Up to 5 unique elements from the these 22 elements: ts: H, Li, B, C, N, O, F, Na, Mg, Al, Si, P, S, Cl, K, V, Mn, Fe, Co, Ni, Cu, Se. Th (22 most common elements in the training set). - Up to 16 atoms per element (80 atoms in total) - Lattice parameters: - Lengths: 0.9-100 angstroms. - Angles: 50-150°. - These ranges contain the bulk of the data set (excluding outliers). --- ## Results ### Density What do we expect Crystal-GFN to sample? - Small length lattice parameters (small volume): -- 4.9 Å compared to 8.3 Å in the validation dataset. ✔ -- - Denser packing space group: -- We see a shift towards spacegroups from the cubic, tetragonal, and hexagonal lattices (81, 99, 115, 195, 200, 207, 215, 221). ✔ -- - Higher number of atoms (larger mass): -- 58.8 atoms on average compared to 48.1 over the validation set. ✔ -- - Heavier elements (larger mass): -- Heavier such as Se, Cu, Ni are sampled the most often. ✔ --- ## Results ### Density .center[![:scale 80%](/assets/images/slides/crystals/density_elements.png)] --- ## Results ### Formation energy .center[![:scale 70%](/assets/images/slides/crystals/distributions_fe_22.png)] --- ## Results ### Band gap .center[![:scale 70%](/assets/images/slides/crystals/distributions_bg_22.png)] --- ## Results ### Restricted sampling We restrict the sampling space at sampling time: - A: The composition is restricted to only elements Fe and O, with a maximum of 10 atoms per element. - B: We sample in the ternary space for Li-Mn-O, keeping the element count to maximum 16 atoms. - C: We restrict the space groups to only cubic lattices. - D: We restrict the range of the lattice parameters to lengths between 10 and 20 angstroms and angles between 75 and 135 degrees. --- ## Results ### Restricted sampling .center[![:scale 70%](/assets/images/slides/crystals/distributions_restricted_sampling.png)] --- ## Results ### Diversity .context[.highlight2[Diversity] is key in materials discovery.] Analysis of 10,000 sampled crystals and the top-100 with lowest formation energy. - All 10,000 samples are unique. - All crystal systems, lattice systems and point symmetries found in the 10,000 samples. - 4 out of 8 crystal-lattice systems in the top-100. - 4 out of the 5 point symmetries in the top-100. - All 22 elements found in the 10,000 samples. - 15 out of 22 elements in the top-100. - 73 out of 113 space groups (65 %) found in the 10,000 samples - 19 out of 113 space groups in the top-100. --- name: title class: title, middle ## Summary and conclusions .center[![:scale 30%](/assets/images/slides/misc/conclusion.png)] --- ## Summary and conclusions * Discovering new crystal structures with desirable properties can help mitigate the climate crisis. * There are infinitely many conceivable crystals. Only a few are stable. Only a few stable crystals have interesting properties. This is a hard problem. * Crystal-GFN introduces .highlight1[physicochemical and structural constraints], reducing the search space. * Crystal-GFN was trained in 12 hours in a CPU-only machine. * Our results show that we can generate .highlight1[diverse, high scoring samples with the desired constraints]. * The .highlight1[framework can be flexibly extended] with more constraints, crystal structure descriptors (atomic positions) and other properties. .references[ * Mila AI4Science et al. [Crystal-GFN: sampling crystals with desirable properties and constraints](https://arxiv.org/abs/2310.04925). AI4Mat, NeurIPS 2023 (spotlight). ] .highlight2[Open source code]: [github.com/alexhernandezgarcia/gflownet](https://github.com/alexhernandezgarcia/gflownet) --- ## Acknowledgements ### Amazing co-authors and friends .left-column[ * Alexandre Duval * Alexandra Volokhova * Pierre Luc Carrier * Divya Sharma * Victor Schmidt * Yasmine Benabed * Michał Koziarski * Yoshua Bengio * Pierre-Paul De Breuck ] --- name: title class: title, middle ![:scale 50%](/assets/images/slides/materials/crystals.png) Alex Hernández-García (he/il/él) .smaller[Divya Sharma, Michał Koziarski, Alexandra Volokhova, Victor Schmidt,]
.smaller[Alexandre Duval, Yasmine Benabed, Pierre Luc Carrier, Yoshua Bengio] [github.com/alexhernandezgarcia/gflownet](https://github.com/alexhernandezgarcia/gflownet) .center[
] .footer[[alexhernandezgarcia.github.io](https://alexhernandezgarcia.github.io/) | [alex.hernandez-garcia@mila.quebec](mailto:alex.hernandez-garcia@mila.quebec)]
.footer[[@alexhg@scholar.social](https://scholar.social/@alexhg) [![:scale 1em](/assets/images/slides/misc/mastodon.png)](https://scholar.social/@alexhg) | [@alexhdezgcia](https://twitter.com/alexhdezgcia) [![:scale 1em](/assets/images/slides/misc/twitter.png)](https://twitter.com/alexhdezgcia)]