I am a PhD candidate at the University of Cambridge, working in the ML@CL group led by Neil Lawrence. My research lies somewhere between machine learning and software systems, leaning towards the latter. I also like to dabble in Bayesian optimization sometimes. Before joining academia, I spent more than a decade as a software engineer, developing everything from small web apps to data center network software.


Emukit
Emukit is a Python package for all kinds of sequential decision-making methods under uncertainty: optimization, quadrature, experimental design, and sensitivity analysis. It was developed and released by our research group at Amazon Cambridge, but now lives in neutral territory. I am the lead developer and main maintainer of Emukit.
[website] [github]

TTI Explorer
TTI Explorer is the simulation package we developed in DELVE to study the effects of "test-trace-isolate" strategies on the spread of COVID-19. Our report on it was released shortly before TTI was deployed in the UK and received wide press coverage.

GPyOpt
GPyOpt is one of the first Python packages for Bayesian optimization. It was initially developed mainly by Javier González while he was with Neil Lawrence's group at the University of Sheffield. I took over ownership of GPyOpt from Javier and led the package development for a few years. GPyOpt is now archived.
[website] [github]

Trieste
Trieste is a Bayesian optimization package built on TensorFlow. I became involved in Trieste during my placement at Secondmind, where I added several major features and became one of the main contributors to the codebase and the discussions around it.
[website] [github]

Seldon Core
Seldon Core is an ML inference platform. I co-led the design of its second version, focusing on the dataflow architecture and data streaming.
[announcement] [website] [github]

I was fortunate enough to contribute to many different parts of Amazon. Some highlights:
Data center network (my code likely powers your internet!)
AWS CloudWatch
AWS SageMaker
Supply chain optimization technologies
Prime Air


Selected papers are listed here. For a complete list, please see my Google Scholar profile.

Causal fault localisation in dataflow systems
Andrei Paleyes, Neil D. Lawrence
3rd Workshop on Machine Learning and Systems (EuroMLSys), EuroSys 2023
[video] [paper] [code]

Dataflow graphs as complete causal graphs
Andrei Paleyes, Siyuan Guo, Bernhard Schölkopf, Neil D. Lawrence
2nd International Conference on AI Engineering - Software Engineering for AI (CAIN), ICSE 2023
[paper] [code]

An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context
Andrei Paleyes, Christian Cabrera-Jojoa, Neil D. Lawrence
1st International Conference on AI Engineering - Software Engineering for AI (CAIN), ICSE 2022
[paper (IEEE library)] [paper (ACM library)] [code]

Emulation of physical processes with Emukit
Andrei Paleyes, Mark Pullin, Maren Mahsereci, Cliff McCollum, Neil D. Lawrence, Javier González
Second workshop on machine learning and the physical sciences, NeurIPS, 2019

Automatic Discovery of Privacy-Utility Pareto Fronts
Brendan Avent, Javier González, Tom Diethe, Andrei Paleyes, Borja Balle
Proceedings on Privacy Enhancing Technologies, 2020 (Andreas Pfitzmann Best Student Paper Award)
Privacy Preserving Machine Learning Workshop, 2019
[video] [code] [paper]

Challenges in deploying machine learning: a survey of case studies
Andrei Paleyes, Raoul-Gabriel Urma, Neil D. Lawrence
The ML-Retrospectives, Surveys & Meta-Analyses Workshop, NeurIPS, 2020
ACM Computing Surveys (CSUR), 2022

Causal Bayesian Optimization
Virginia Aglietti, Xiaoyu Lu, Andrei Paleyes, Javier González
International Conference on Artificial Intelligence and Statistics, 2020

Effectiveness and resource requirements of test, trace and isolate strategies for COVID in the UK
Bobby He, Sheheryar Zaidi, Bryn Elesedy, Michael Hutchinson, Andrei Paleyes, Guy Harling, Anne M. Johnson, Yee Whye Teh, Royal Society's DELVE group
Royal Society Open Science 8(3), 2021
[paper] [code]


The science of ML deployment and software systems
PyData meetup, Cambridge, January 2024

Introduction into MLOps
AI for the study of Environmental Risks (AI4ER), UKRI CDT, Cambridge, November 2023

Dataflow architecture and machine learning systems
Pasteur Labs Invited Speaker Series, October 2023

Emukit: decision making under uncertainty
SciPy 2023

Dataflow Architecture - Revisiting the Classic Software Architecture Paradigm
SAP Inspiration Sessions (online), June 2023

Dataflows for machine learning operations
Together with Alex Rakowski
Kafka Summit, London, May 2023
[event] [video] [slides]

Introduction into MLOps
Data Science Africa Summer School, Kigali, Rwanda, May 2023
[event] [slides]

Dataflow software as complete causal graphs
Causal Digital Twins workshop, ELLIS Unconference, January 2023
[event] [slides]

Benefits of dataflow modeling for data management in software systems
The Ocean Cleanup Challenge (online), Kili Technology, December 2022
[event] [slides]

Machine Learning Systems Design
Together with Neil Lawrence
Data Science Africa Summer School, Arusha, Tanzania, July 2022
[event] [slides]

Challenges in Deploying Machine Learning
Industry Expert Insights (online), Cambridge Spark, August 19, 2021

Challenges in Deploying Machine Learning, or What is rarely talked about at ML conferences
RSE Lunch Bytes (online), University of Sheffield, July 5, 2021
[video] [event] [slides]

Data Oriented Architectures for Deploying Machine Learning
DSAIL Research Day at DeKUT (online), June 19, 2021
[event] [slides]

Random tips for aspiring scientific open sources
Gaussian Processes meetup (online), January 21, 2021
[event] [slides]

Simulating Contact Tracing in the Pandemic: TTI Explorer
Together with Bryn Elesedy
ML and the Physical World course at the University of Cambridge (online), November 12, 2020
[video] [event] [slides]

Exploring lockdown exit strategies
Data Science Africa COVID-19 Webinar (online), April 8, 2020

How to use Hadoop on Windows Azure to analyze large volumes of data
ITShare: High-load projects on .NET, Minsk, December 8, 2012


Royal Society DELVE
During the initial phase of the COVID-19 pandemic, I became a member of the action team of the DELVE group (thanks to Neil Lawrence for the invitation!). DELVE (Data Evaluation and Learning for Viral Epidemics) is a multi-disciplinary group, convened by the Royal Society, to support a data-driven approach to learning from the different strategies countries are taking to manage the pandemic. Over the course of 2020, we produced a number of reports, software tools, and datasets, and provided advice to SAGE and ultimately the UK Government.

Challenges in Deploying and Monitoring Machine Learning Systems workshop
I am co-organizing a series of workshops (led by Alessandra Tosi) where we discuss all aspects of running ML in production.
[ICML 2021] [NeurIPS 2022]

Data Science Africa
I often participate in the DSA summer school, where my colleagues and I deliver lectures and practicals on the design of ML systems.
[website 2020] [website 2021] [website 2022] [website 2023]

I have served on the program committees (PCs) of the following events: ECML 2021, ECML 2022, ECML 2023, SciPy 2023, the ML-RSA workshop at NeurIPS 2020, the DMML workshop at ICML 2020, and the PPAI workshop at AAAI 2023. I have reviewed articles for Nature Communications and the Journal of Decision Systems. I semi-regularly review ML and software engineering books for Manning.