name: mila-visit-day-mar25 class: title, middle ## Machine learning for science ### How generative and active learning can facilitate scientific discoveries Alex Hernández-García (he/il/él) .turquoise[Visit Day · Mila · March 14th 2025] .center[
    
] .center[
    
] .smaller[.footer[ Slides: [alexhernandezgarcia.github.io/slides/{{ name }}](https://alexhernandezgarcia.github.io/slides/{{ name }}) ]] .qrcode[] --- count: false name: title class: title, middle ### Why scientific discoveries? .center[] --- ## Why scientific discoveries? .context[Climate change is a major challenge for humanity.]
.center[
.smaller[Observed (1900–2020) and projected (2021–2100) changes in global surface temperature relative to 1850–1900 (adapted from:
IPCC Sixth Assessment Report
)]
] .conclusion["The evidence is clear: the time for action is now." .smaller[IPCC Report, 2022]] --- ## Why scientific discoveries? .context[Climate change is a major challenge for humanity.] .center[
.smaller[Climate-sensitive health risks (adapted from:
World Health Organization
)]
] .smaller[ * Environmental factors take the lives of around 13 million people _per year_. * Climate change affects people’s mental and physical health, access to clean air, safe water, food and health care. ] .full-width[ .conclusion["Climate change is the single biggest health threat facing humanity." .smaller[[WHO and WMO](https://climahealth.info/), 2024]] ] --- ## Machine Learning for Science .center[] .conclusion[Machine learning research has the potential to facilitate scientific discoveries to tackle climate and health challenges.] --- count: false ## Machine Learning for Science and Science for Machine Learning .center[] .conclusion[Machine learning research has the potential to facilitate scientific discoveries to tackle climate and health challenges. Scientific challenges stimulate in turn machine learning research.] --- count: false name: mlforscience class: title, middle ### Machine learning for scientific discoveries .center[] --- ## Traditional discovery cycle .context35[Climate and health challenges demand accelerating scientific discoveries.] -- .right-column-66[
.center[]] .left-column-33[
The .highlight1[traditional cycle] for scientific discovery: * relies on .highlight1[highly specialised human expertise], * it is .highlight1[time-consuming] and * .highlight1[financially and computationally expensive]. ] --- count: false ## _Active_ (machine) learning .context35[The traditional scientific discovery loop is too slow for certain applications.] .right-column-66[
.center[]] .left-column-33[
A .highlight1[machine learning model] can be: * trained with data from _real-world_ experiments and ] --- count: false ## _Active_ (machine) learning .context35[The traditional scientific discovery loop is too slow for certain applications.] .right-column-66[
.center[]] .left-column-33[
A .highlight1[machine learning model] can be: * trained with data from _real-world_ experiments and * used to quickly and cheaply evaluate queries ] --- count: false ## _Active_ (machine) learning .context35[The traditional scientific discovery loop is too slow for certain applications.] .right-column-66[
.center[]] .left-column-33[
A .highlight1[machine learning model] can be: * trained with data from _real-world_ experiments and * used to quickly and cheaply evaluate queries .conclusion[There are infinitely many conceivable materials, $10^{180}$ potentially stable and $10^{60}$ drug molecules. Are predictive models enough?] ] --- count: false ## Active and _generative_ machine learning .right-column-66[
.center[]] .left-column-33[
.highlight1[Generative machine learning] can: * .highlight1[learn structure] from the available data, * .highlight1[generalise] to unexplored regions of the search space and * .highlight1[build better queries] ] --- count: false ## Active and _generative_ machine learning .right-column-66[
.center[]] .left-column-33[
.highlight1[Generative machine learning] can: * .highlight1[learn structure] from the available data, * .highlight1[generalise] to unexplored regions of the search space and * .highlight1[build better queries] .conclusion[Active learning with generative machine learning can in theory more efficiently explore the candidate space.] ] --- count: false name: title class: title, middle ### The challenges of scientific discoveries .center[] .center[] --- ## An intuitive trivial problem .highlight1[Problem]: find one arrangement of Tetris pieces on the board that minimise the empty space. .left-column-33[ .center[] ] .right-column-66[ .center[] ] -- .full-width[.center[
Score: 12
]] --- count: false ## An intuitive ~~trivial~~ easy problem .highlight1[Problem]: find .highlight2[all] the arrangements of Tetris pieces on the board that minimise the empty space. .left-column-33[ .center[] ] .right-column-66[ .center[] ] -- .full-width[.center[
12
12
12
12
12
]] --- count: false ## An intuitive ~~easy~~ hard problem .highlight1[Problem]: find .highlight2[all] the arrangements of Tetris pieces on the board that minimise the empty space. .left-column-33[ .center[] ] .right-column-66[ .center[] ] -- .full-width[.center[
]] --- count: false ## An incredibly ~~intuitive easy~~ hard problem .highlight1[Problem]: find .highlight2[all] the arrangements of Tetris pieces on the board that .highlight2[optimise an unknown function]. .left-column-33[ .center[] ] .right-column-66[ .center[] ] -- .full-width[.center[
]] --- count: false ## An incredibly ~~intuitive easy~~ hard problem .highlight1[Problem]: find .highlight2[all] the arrangements of Tetris pieces on the board that .highlight2[optimise an unknown function]. .left-column-33[ .center[] ] .right-column-66[ .center[] ] .full-width[.conclusion[Materials and drug discovery involve finding **diverse** candidates in **combinatorially or infinitely large search spaces**, with **rare properties** that are **expensive to evaluate**.]] --- ## Why Tetris for scientific discovery? .context35[The "Tetris problem" involves .highlight1[sampling from an unknown distribution] in a .highlight1[discrete, high-dimensional, combinatorially large space].] --- count: false ## Why Tetris for scientific discovery? ### Biological sequence design
Proteins, antimicrobial peptides (AMP) and DNA can be represented as sequences of amino acids or nucleobases. There are $22^{100} \approx 10^{134}$ protein sequences with 100 amino acids. .context35[The "Tetris problem" involves sampling from an unknown distribution in a discrete, high-dimensional, combinatorially large space] .center[] .left-column-66[ .dnag[`G`].dnaa[`A`].dnag[`G`].dnag[`G`].dnag[`G`].dnac[`C`].dnag[`G`].dnaa[`A`].dnac[`C`].dnag[`G`].dnag[`G`].dnat[`T`].dnaa[`A`].dnac[`C`].dnag[`G`].dnag[`G`].dnaa[`A`].dnag[`G`].dnac[`C`].dnat[`T`].dnac[`C`].dnat[`T`].dnag[`G`].dnac[`C`].dnat[`T`].dnac[`C`].dnac[`C`].dnag[`G`].dnat[`T`].dnat[`T`].dnaa[`A`]
.dnat[`T`].dnac[`C`].dnaa[`A`].dnac[`C`].dnac[`C`].dnat[`T`].dnac[`C`].dnac[`C`].dnac[`C`].dnag[`G`].dnaa[`A`].dnag[`G`].dnac[`C`].dnaa[`A`].dnaa[`A`].dnat[`T`].dnaa[`A`].dnag[`G`].dnat[`T`].dnat[`T`].dnag[`G`].dnat[`T`].dnaa[`A`].dnag[`G`].dnag[`G`].dnac[`C`].dnaa[`A`].dnag[`G`].dnac[`C`].dnag[`G`].dnat[`T`].dnac[`C`].dnac[`C`].dnat[`T`].dnaa[`A`].dnac[`C`].dnac[`C`].dnag[`G`].dnat[`T`].dnat[`T`].dnac[`C`].dnag[`G`]
.dnac[`C`].dnat[`T`].dnaa[`A`].dnac[`C`].dnag[`G`].dnac[`C`].dnag[`G`].dnat[`T`].dnac[`C`].dnat[`T`].dnac[`C`].dnat[`T`].dnat[`T`].dnat[`T`].dnac[`C`].dnag[`G`].dnag[`G`].dnag[`G`].dnag[`G`].dnag[`G`].dnat[`T`].dnat[`T`].dnaa[`A`]
.dnat[`T`].dnat[`T`].dnag[`G`].dnac[`C`].dnaa[`A`].dnag[`G`].dnaa[`A`].dnag[`G`].dnag[`G`].dnat[`T`].dnat[`T`].dnaa[`A`].dnaa[`A`].dnac[`C`].dnag[`G`].dnac[`C`].dnag[`G`].dnac[`C`].dnaa[`A`].dnat[`T`].dnag[`G`].dnac[`C`].dnag[`G`].dnaa[`A`].dnac[`C`].dnat[`T`].dnag[`G`].dnag[`G`].dnag[`G`].dnag[`G`].dnat[`T`].dnat[`T`].dnaa[`A`].dnag[`G`].dnat[`T`].dnaa[`A`].dnag[`G`].dnat[`T`].dnac[`C`].dnag[`G`].dnaa[`A`].dnaa[`A`].dnac[`C`].dnaa[`A`].dnat[`T`].dnaa[`A`].dnat[`T`].dnaa[`A`].dnat[`T`].dnat[`T`].dnag[`G`].dnaa[`A`].dnat[`T`].dnaa[`A`].dnaa[`A`].dnaa[`A`].dnac[`C`].dnaa[`A`]
.dnag[`G`].dnac[`C`].dnat[`T`].dnac[`C`].dnag[`G`].dnac[`C`].dnat[`T`].dnat[`T`].dnaa[`A`].dnag[`G`].dnag[`G`].dnag[`G`].dnac[`C`].dnac[`C`].dnat[`T`].dnac[`C`].dnag[`G`].dnaa[`A`].dnac[`C`].dnat[`T`].dnac[`C`].dnac[`C`].dnat[`T`].dnac[`C`].dnat[`T`].dnag[`G`].dnaa[`A`].dnaa[`A`].dnat[`T`].dnag[`G`].dnag[`G`].dnaa[`A`].dnag[`G`].dnat[`T`].dnag[`G`].dnat[`T`].dnat[`T`].dnac[`C`].dnaa[`A`].dnat[`T`].dnac[`C`].dnag[`G`].dnaa[`A`].dnaa[`A`].dnat[`T`].dnag[`G`].dnag[`G`].dnaa[`A`].dnag[`G`].dnat[`T`].dnag[`G`]
] --- ## Why Tetris for scientific discovery? ### Molecular generation .context35[The "Tetris problem" involves sampling from an unknown distribution in a discrete, high-dimensional, combinatorially large space]
Small molecules can also be represented as sequences or by a combination of of higher-level fragments. There may be about $10^{60}$ drug-like molecules. .columns-3-left[ .center[  `CC(=O)NCCC1=CNc2c1cc(OC)cc2 CC(=O)NCCc1c[nH]c2ccc(OC)cc12` ]] .columns-3-center[ .center[  `OCCc1c(C)[n+](cs1)Cc2cnc(C)nc2N` ]] .columns-3-right[ .center[  `CN1CCC[C@H]1c2cccnc2` ]] --- ## Why Tetris for scientific discovery? ### Crystal structure generation .context35[The "Tetris problem" involves sampling from an unknown distribution in a discrete, high-dimensional, combinatorially large space]
Crystal structures can be described by their chemical composition, the symmetry group and the lattice parameters (and more generally by atomic positions). .left-column[ .center[] ] .right-column[ .center[] ] .references[ * Mila AI4Science et al. [Crystal-GFN: sampling crystals with desirable properties and constraints](https://arxiv.org/abs/2310.04925). AI4Mat, NeurIPS 2023 (spotlight). ] --- count: false name: gflownets class: title, middle ### A gentle introduction to Generative Flow Networks (GFlowNets) .center[] --- count: false name: gflownets class: title, middle ### A brief ~~gentle~~ introduction to Generative Flow Networks (GFlowNets) .center[] --- count: false name: gflownets class: title, middle ### A lightning ~~brief gentle~~ introduction to Generative Flow Networks (GFlowNets) .center[] --- ## GFlowNets for science ### 3 key ingredients .context[Materials and drug discovery involve .highlight1[sampling from unknown distributions] in .highlight1[discrete or mixed, high-dimensional, combinatorially large spaces.]] --
1. .highlight1[Diversity] as an objective. -- - Given a score or reward function $R(x)$, learn to _sample proportionally to the reward_. -- 2. .highlight1[Compositionality] in the sample generation. -- - A meaningful decomposition of samples $x$ into multiple sub-states $s_0\rightarrow s_1 \rightarrow \dots \rightarrow x$ can yield generalisable patterns. -- 3. .highlight1[Deep learning] to learn from the generated samples. -- - A machine learning model can learn the transition function $F(s\rightarrow s')$ and generalise the patterns. -- .references[ - [Skip to multi-fidelity active learning](#mfal) ] --- ## 1. Diversity as an objective .context[Many existing approaches treat scientific discovery as an _optimisation_ problem.]
Given a reward or objective function $R(x)$, GFlowNet can be seen a generative model trained to sample objects $x \in \cal X$ according to .highlight1[a sampling policy $\pi(x)$ proportional to the reward $R(x)$]: -- .left-column[ $$\pi(x) = \frac{R(x)}{Z} \propto R(x)$$ ] -- .right-column[ $$Z = \sum_{x' \in \cal X} R(x')$$ ] -- .full-width[ .center[                                     ]] --- count: false ## 1. Diversity as an objective .context[Many existing approaches treat scientific discovery as an _optimisation_ problem.]
Given a reward or objective function $R(x)$, GFlowNet can be seen a generative model trained to sample objects $x \in \cal X$ according to .highlight1[a sampling policy $\pi(x)$ proportional to the reward $R(x)$]: .left-column[ $$\pi(x) = \frac{R(x)}{Z} \propto R(x)$$ ] .right-column[ $$Z = \sum_{x' \in \cal X} R(x')$$ ] .full-width[ → Sampling proportionally to the reward function enables finding .highlight1[multiple modes], hence .highlight1[diversity]. .center[] ] --- ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] The principle of compositionality is fundamental in semantics, linguistics, mathematical logic and is thought to be a cornerstone of human reasoning. --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. -- .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] --- count: false ## 2. Compositionality ### Sample generation process .context35[Sampling _directly_ from a complex, high-dimensional distribution is difficult.] For the Tetris problem, a meaningful decomposition of the samples is .highlight1[adding one piece to the board at a time]. .left-column[.center[]] .right-column[
.conclusion[The decomposition of the sampling process into meaningful steps yields patterns that may be correlated with the reward function and facilitates learning complex distributions.] ] --- ## 3. Deep learning policy .context35[GFlowNets learn a sampling policy $\pi\_{\theta}(x)$ proportional to the reward $R(x)$.] -- .left-column[ .center[] ] --- count: false ## 3. Deep learning policy .context35[GFlowNets learn a sampling policy $\pi\_{\theta}(x)$ proportional to the reward $R(x)$.] .left-column[ .center[] ] .right-column[
Deep neural networks are trained to learn the transitions (flows) policy: $F\_{\theta}(s\_t\rightarrow s\_{t+1})$. ] -- .right-column[ Consistent flow theorem (informal): if the sum of the flows into state $s$ is equal to the sum of the flows out, then $\pi(x) \propto R(x)$. ] .references[ Bengio et al. [Flow network based generative models for non-iterative diverse candidate generation](https://arxiv.org/abs/2106.04399), NeurIPS, 2021. (_not_ co-authored) ] -- .right-column[ .conclusion[GFlowNets can be trained with deep learning methods to learn a sampling policy $\pi\_{\theta}$ proportional to a reward $R(x)$.] ] --- ## GFlowNets review paper A review of the potential of GFlowNets for AI-driven scientific discoveries. .center[] .references[ Jain et al. [GFlowNets for AI-Driven Scientific Discovery](https://pubs.rsc.org/en/content/articlelanding/2023/dd/d3dd00002h). Digital Discovery, Royal Society of Chemistry, 2023. ] --- ## GFlowNet Python package Open sourced GFlowNet package, together with Mila collaborators: Nikita Saxena, Alexandra Volokhova, Michał Koziarski, Divya Sharma, Pierre Luc Carrier, Victor Schmidt, Joseph Viviano. .highlight2[Open source GFlowNet implementation]: [github.com/alexhernandezgarcia/gflownet](https://github.com/alexhernandezgarcia/gflownet) --- count: false name: mfal class: title, middle ## Multi-fidelity active learning Nikita Saxena, Moksh Jain, Cheng-Hao Liu, Yoshua Bengio .smaller[[Multi-fidelity active learning with GFlowNets](https://arxiv.org/abs/2306.11715). Transactions on Machine Learning Research (TMLR). 2024.] .center[] --- ## Why multi-fidelity? .context35[We had described the scientific discovery loop as a cycle with one single oracle.]
.right-column[ .center[] ] -- .left-column[ Example: "incredibly hard" Tetris problem: find arrangements of Tetris pieces that optimise an .highlight2[unknown function $f$]. - $f$: Oracle, cost per evaluation 1000 CAD. .center[
] ] --- count: false ## Why multi-fidelity? .context35[However, in practice, multiple oracles (models) of different fidelity and cost are available in scientific applications.]
.right-column[ .center[] ] .left-column[ Example: "incredibly hard" Tetris problem: find arrangements of Tetris pieces that optimise an .highlight2[unknown function $f$]. - $f$: Oracle, cost per evaluation 1000 CAD. .center[
] ] --- count: false ## Why multi-fidelity? .context35[However, in practice, multiple oracles (models) of different fidelity and cost are available in scientific applications.]
.right-column[ .center[] ] .left-column[ Example: "incredibly hard" Tetris problem: find arrangements of Tetris pieces that optimise an .highlight2[unknown function $f$]. - $f$: Oracle, cost per evaluation 1000 CAD. - $f\_1$: Slightly inaccurate oracle, cost 100 CAD. - $f\_2$: Noisy but informative oracle, cost 1 CAD. .center[
] ] --- count: false ## Why multi-fidelity? .context[In many scientific applications we have access to multiple approximations of the objective function.] .left-column[ For example, in .highlight1[material discovery]: * .highlight1[Synthesis] of a material and characterisation of a property in the lab * Molecular dynamic .highlight1[simulations] to estimate the property * .highlight1[Machine learning] models trained to predict the property ] .right-column[ .center[] ] -- .conclusion[However, current machine learning methods cannot efficiently leverage the availability of multiple oracles and multi-fidelity data. Especially with .highlight1[structured, large, high-dimensional search spaces].] --- ## Contribution - An .highlight1[active learning] algorithm to leverage the availability of .highlight1[multiple oracles at different fidelities and costs]. -- - The goal is two-fold: 1. Find high-scoring candidates 2. Candidates must be diverse -- - Experimental evaluation with .highlight1[biological sequences and molecules]: - DNA - Antimicrobial peptides - Small molecules - Classical multi-fidelity toy functions (Branin and Hartmann) -- .conclusion[Likely the first multi-fidelity active learning method for biological sequences and molecules.] --- ## Our multi-fidelity active learning algorithm .center[] --- ## Applications ### Ongoing, planned and potential * Discovering materials with high ionic conductivity for solid-state electrolyte batteries. * Discovering novel antibiotics through a lab-in-the-loop approach. * Designing electrocatalysts for sustainability purposes. * Designing DNA aptamers and proteins that can bind to specific targets. * `
` --- name: mila-visit-day-mar25 class: title, middle  Alex Hernández-García (he/il/él) .center[
    
    
    
] .highlight2[We are looking for students and collaborators to work on multi-fidelity active learning!] .footer[[alexhernandezgarcia.github.io](https://alexhernandezgarcia.github.io/) | [alex.hernandez-garcia@mila.quebec](mailto:alex.hernandez-garcia@mila.quebec)] | [alexhergar.bsky.social](https://bsky.app/profile/alexhergar.bsky.social) [](https://bsky.app/profile/alexhergar.bsky.social)
.smaller[.footer[ Slides: [alexhernandezgarcia.github.io/slides/{{ name }}](https://alexhernandezgarcia.github.io/slides/{{ name }}) ]]