Andrei Paleyes

I am a chief engineer at Pasteur Labs, where we apply ML to engineering design and simulations. I am also a visiting researcher in the ML@CL group at the University of Cambridge. My interests lie somewhere between machine learning and software systems, leaning towards the latter.

I received my PhD in 2024 as a member of the same research group, supervised by Neil Lawrence. Somehow I could not keep still, and while working on my thesis I also spent time with Secondmind, Seldon, and DELVE.

Before jumping into the world of academia I spent more than a decade as a software engineer, developing everything from small webapps to data center network software.

software

Emukit
Emukit is a Python package for all kinds of sequential decision-making methods under uncertainty: optimization, quadrature, experiment design, sensitivity analysis. It was developed and released by our research group at Amazon Cambridge, but now lives on neutral territory. I am the lead developer and main maintainer of Emukit.
[website] [github]

TTI Explorer
TTI Explorer is the simulation package we developed in DELVE to study the effects of "test-trace-isolate" (TTI) strategies on the spread of COVID-19. Our report on it was released shortly before TTI was deployed in the UK, and received wide press coverage.
[github]

GPyOpt
GPyOpt is one of the first Python packages for Bayesian optimization. GPyOpt was initially developed mostly by Javier González while he was with Neil Lawrence's group at the University of Sheffield. I took over ownership of GPyOpt from Javier, and led the package development for a few years. GPyOpt is now archived.
[website] [github]

Trieste
Trieste is a Bayesian optimization package built on TensorFlow. I became involved in Trieste during my placement at Secondmind, added several major features, and became one of the main contributors to the project.
[website] [github]

Seldon Core
Seldon Core is an ML inference platform. I co-led the design of its second version, focusing on the dataflow architecture and data streaming.
[announcement] [website] [github]

Tesseract Core
Tesseract Core is a software package that allows users to build and distribute autodiff-native containers, thus enabling differentiable programming at the system level.
[github]

Amazon
I was fortunate enough to contribute to many different parts of Amazon. Some highlights:
Data center network (my code likely powers your internet!)
AWS CloudWatch
AWS SageMaker
Supply chain optimization technologies
Prime Air

papers

Selected papers are listed here. For a complete list, please check my Google Scholar profile.

Causal fault localisation in dataflow systems
Andrei Paleyes, Neil D. Lawrence
3rd Workshop on Machine Learning and Systems (EuroMLSys), EuroSys 2023
[video] [paper] [code]

Dataflow graphs as complete causal graphs
Andrei Paleyes, Siyuan Guo, Bernhard Schölkopf, Neil D. Lawrence
2nd International Conference on AI Engineering - Software Engineering for AI (CAIN), ICSE 2023
[paper] [code]

Emulation of physical processes with Emukit
Andrei Paleyes, Mark Pullin, Maren Mahsereci, Cliff McCollum, Neil D. Lawrence, Javier González
Second workshop on machine learning and the physical sciences, NeurIPS, 2019
[paper]

Automatic Discovery of Privacy-Utility Pareto Fronts
Brendan Avent, Javier González, Tom Diethe, Andrei Paleyes, Borja Balle
Proceedings on Privacy Enhancing Technologies, 2020 (Andreas Pfitzmann Best Student Paper Award)
Privacy Preserving Machine Learning Workshop, 2019
[video] [code] [paper]

Challenges in deploying machine learning: a survey of case studies
Andrei Paleyes, Raoul-Gabriel Urma, Neil D. Lawrence
ACM Computing Surveys (CSUR), 2022
The ML-Retrospectives, Surveys & Meta-Analyses Workshop, NeurIPS, 2020
[paper]

Causal Bayesian Optimization
Virginia Aglietti, Xiaoyu Lu, Andrei Paleyes, Javier González
International Conference on Artificial Intelligence and Statistics, 2020
[paper]

Effectiveness and resource requirements of test, trace and isolate strategies for COVID in the UK
Bobby He, Sheheryar Zaidi, Bryn Elesedy, Michael Hutchinson, Andrei Paleyes, Guy Harling, Anne M. Johnson, Yee Whye Teh, Royal Society's DELVE group
Royal Society open science 8 (3), 2021
[paper] [code]

talks

Success, sensitivity and unbelievable quality of LLM code generation
Together with Diana Robinson
AI group seminar, Computer Lab, Cambridge, June 2025
[event] [slides]

Design of ML software systems, or How industry experience can lead someone to do a PhD
AI Cafe, St Edmund's College, Cambridge, November 2024
[event] [slides]

The science of ML deployment and software systems
PyData meetup, Cambridge, January 2024
[slides]

Introduction into MLOps
AI for the study of Environmental Risks (AI4ER), UKRI CDT, Cambridge, November 2023
[slides]

Dataflow architecture and machine learning systems
Pasteur Labs Invited Speaker Series, October 2023
[slides]

Emukit: decision making under uncertainty
SciPy 2023
[slides]

Dataflow Architecture - Revisiting the Classic Software Architecture Paradigm
SAP Inspiration Sessions (online), June 2023
[slides]

Dataflows for machine learning operations
Together with Alex Rakowski
Kafka Summit, London, May 2023
[event] [video] [slides]

Introduction into MLOps
Data Science Africa Summer School, Kigali, Rwanda, May 2023
[event] [slides]

Dataflow software as complete causal graphs
Causal Digital Twins workshop, ELLIS Unconference, January 2023
[event] [slides]

Benefits of dataflow modeling for data management in software systems
The Ocean Cleanup Challenge (online), Kili Technology, December 2022
[event] [slides]

Machine Learning Systems Design
Together with Neil Lawrence
Data Science Africa Summer School, Arusha, Tanzania, July 2022
[event] [slides]

Challenges in Deploying Machine Learning
Industry Expert Insights (online), Cambridge Spark, August 19, 2021
[slides]

Challenges in Deploying Machine Learning, or What is rarely talked about at ML conferences
RSE Lunch Bytes (online), University of Sheffield, July 05, 2021
[video] [event] [slides]

Data Oriented Architectures for Deploying Machine Learning
DSAIL Research Day at DeKUT (online), June 19, 2021
[event] [slides]

Random tips for aspiring scientific open sources
Gaussian Processes meetup (online), January 21, 2021
[event] [slides]

Simulating Contact Tracing in the Pandemic: TTI Explorer
Together with Bryn Elesedy
ML and the Physical World course at the University of Cambridge (online), November 12, 2020
[video] [event] [slides]

Exploring lockdown exit strategies
Data Science Africa COVID-19 Webinar (online), April 8, 2020
[event]

How to use Hadoop on Windows Azure to analyze large volumes of data
ITShare: High load projects on .Net, Minsk, December 8, 2012
[event]

misc

Royal Society DELVE
During the initial phase of the COVID-19 pandemic I became a member of the action team of the DELVE group (thanks to Neil Lawrence for the invite!). DELVE (Data Evaluation and Learning for Viral Epidemics) is a multi-disciplinary group, convened by the Royal Society, to support a data-driven approach to learning from the different approaches countries are taking to managing the pandemic. Over the course of 2020 we produced a number of reports, software and datasets, and provided advice to SAGE and ultimately the UK Government.
[website]

Challenges in Deploying and Monitoring Machine Learning Systems workshop
I am co-organizing a series of workshops (led by Alessandra Tosi) where we discuss all aspects of running ML in production.
[ICML 2021] [NeurIPS 2022]

DALI Meeting
I co-organized DALI 2025 in Sorrento, Italy, together with Neil Lawrence, Christian Cabrera, and Hanni Sondremann.
[website]

Data Science Africa
I often participate in the DSA summer school, where my colleagues and I deliver lectures and practicals on the design of ML systems.
[website 2020] [website 2021] [website 2022] [website 2023]

Reviewing
I have been a member of PCs for the following events: ECML 2021, ECML 2022, ECML 2023, ECML 2024, ECML 2025, SciPy 2023, SciPy 2024, SciPy 2025, EuroMLSys workshop at EuroSys 2024, ML-RSA workshop at NeurIPS 2020, DMML workshop at ICML 2020, PPAI workshop at AAAI 2023. I have reviewed articles for Nature Communications, Journal of Decision Systems, and Natural Language Processing Journal. I semi-regularly review ML and software engineering books for Manning.