name: rolnicklab-apr24 class: title, middle ## Active learning for scientific discoveries Presenting: Alex Hernández-García (he/il/él) .smaller[Working with: David Rolnick, Nikita Saxena, Alex Duval, Victor Schmidt, Chritina Humer, Pierre Luc Carrier, Moksh Jain, Yoshua Bengio...] .turquoise[Rolnick's Lab, Mila · April 5th 2024] .center[
    
    
] .smaller[.footer[ Slides: [alexhernandezgarcia.github.io/slides/{{ name }}](https://alexhernandezgarcia.github.io/slides/{{ name }}) ]] --- ## Why machine learning for scientific discoveries? .left-column-66[
- Mitigation of the climate crisis demands innovation in the energy and materials sector: .highlight1[materials discovery]. - The threat of antibiotic resistance and of new global pandemics demands innovation in .highlight1[drugs discovery]. - Other scientific disciplines can potentially benefit from the progress in scientific discovery pipelines. ] .right-column-33[
.center[![:scale 50%](/assets/images/slides/materials/lithium_oxide_crystal.png)] .center[![:scale 70%](/assets/images/slides/drugs/who_amr.png)] ] --- ## Traditional discovery cycle .context35[The climate crisis demands innovation in the scientific discovery model.] -- .right-column-66[
.center[![:scale 80%](/assets/images/slides/scientific-discovery/loop_1.png)]] .left-column-33[
The .highlight1[traditional pipeline] for scientific discovery: * relies on .highlight1[highly specialised human expertise], * it is .highlight1[time-consuming] and * .highlight1[financially and computationally expensive]. ] --- count: false ## Machine learning in the loop .context35[The traditional scientific discovery loop is too slow for certain applications.] .right-column-66[
.center[![:scale 80%](/assets/images/slides/scientific-discovery/loop_2.png)]] .left-column-33[
A .highlight1[machine learning model] can be: * trained with data from _real-world_ experiments and ] --- count: false ## Machine learning in the loop .context35[The traditional scientific discovery loop is too slow for certain applications.] .right-column-66[
.center[![:scale 80%](/assets/images/slides/scientific-discovery/loop_3.png)]] .left-column-33[
A .highlight1[machine learning model] can be: * trained with data from _real-world_ experiments and * used to quickly and cheaply evaluate queries ] --- count: false ## Machine learning in the loop .context35[The traditional scientific discovery loop is too slow for certain applications.] .right-column-66[
.center[![:scale 80%](/assets/images/slides/scientific-discovery/loop_3.png)]] .left-column-33[
A .highlight1[machine learning model] can be: * trained with data from _real-world_ experiments and * used to quickly and cheaply evaluate queries .conclusion[There are infinitely many conceivable materials, $10^{180}$ potentially stable and $10^{60}$ drug molecules. Are predictive models enough?] ] --- count: false ## _Generative_ machine learning in the loop .right-column-66[
.center[![:scale 80%](/assets/images/slides/scientific-discovery/loop_4.png)]] .left-column-33[
.highlight1[Generative machine learning] can: * .highlight1[learn structure] from the available data, * .highlight1[generalise] to unexplored regions of the search space and * .highlight1[build better queries] ] --- count: false ## _Generative_ machine learning in the loop .right-column-66[
.center[![:scale 80%](/assets/images/slides/scientific-discovery/loop_4.png)]] .left-column-33[
.highlight1[Generative machine learning] can: * .highlight1[learn structure] from the available data, * .highlight1[generalise] to unexplored regions of the search space and * .highlight1[build better queries] .conclusion[Active learning with generative machine learning can in theory more efficiently explore the candidate space.] ] --- ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_0.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_1.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_2.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_3.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_4.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_5.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_6.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_7.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_8.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_9.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_10.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_11.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_12.png)] --- count: false ## Our multi-fidelity active learning algorithm .center[![:scale 100%](/assets/images/slides/mfal/mfal_13.png)] --- ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. -- .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_1.png)] --- count: false ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_2.png)] --- count: false ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_3.png)] --- count: false ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_4.png)] --- count: false ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_5.png)] --- count: false ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_6.png)] --- count: false ## Results - Realistic experiments with experimental oracles and costs that reflect the computational demands (1, 3, 7). - GFlowNet adds one SELFIES token (out of 26) at a time with variable length up to 64 ($|\mathcal{X}| > 26^{64}$). - Property: Adiabatic electron affinity (EA). Relevant in organic semiconductors, photoredox catalysis and organometallic synthesis. .center[![:scale 50%](/assets/images/slides/mfal/molecules_ea_7.png)] --- name: rolnicklab-apr24 class: title, middle ![:scale 40%](/assets/images/slides/climatechange/climate_health_ai.png) Alex Hernández-García (he/il/él) .center[
    
] .footer[[alexhernandezgarcia.github.io](https://alexhernandezgarcia.github.io/) | [alex.hernandez-garcia@mila.quebec](mailto:alex.hernandez-garcia@mila.quebec)]
.footer[[@alexhg@scholar.social](https://scholar.social/@alexhg) [![:scale 1em](/assets/images/slides/misc/mastodon.png)](https://scholar.social/@alexhg) | [@alexhdezgcia](https://twitter.com/alexhdezgcia) [![:scale 1em](/assets/images/slides/misc/twitter.png)](https://twitter.com/alexhdezgcia)] .smaller[.footer[ Slides: [alexhernandezgarcia.github.io/slides/{{ name }}](https://alexhernandezgarcia.github.io/slides/{{ name }}) ]]