I am a PhD candidate at the University of Cambridge, working in the ML@CL group led by Neil Lawrence. My research lies somewhere between machine learning and software systems, leaning towards the latter. I also like to dabble in Bayesian optimization sometimes. Before jumping into the world of academia, I spent more than a decade as a software engineer, developing everything from small web apps to data center network software.


Emukit is a Python package for all kinds of sequential decision-making methods under uncertainty: optimization, quadrature, experiment design, sensitivity analysis. It was developed and released by our research group at Amazon Cambridge, but now lives on neutral territory. I am the lead developer and main maintainer of Emukit.
[website] [github]

TTI Explorer
TTI Explorer is the simulation package we developed in DELVE to study the effects of "test-trace-isolate" strategies on the spread of COVID-19. Our report on it was released shortly before TTI was deployed in the UK, and received wide press coverage.

GPyOpt is one of the first Python packages for Bayesian optimization. GPyOpt was initially developed mostly by Javier González while he was with Neil Lawrence's group at the University of Sheffield. I took over ownership of GPyOpt from Javier, and led the package development for a few years. GPyOpt is now archived.
[website] [github]

Trieste is a Bayesian optimization package built on TensorFlow. I became involved in Trieste during my placement at Secondmind, where I added several major features and became one of the main contributors to the codebase and the discussions around it.
[website] [github]

Seldon Core
Seldon Core is an ML serving platform. I co-designed its second version, focusing on bringing in dataflow ideas.
[website] [github] [Announcement blog post]

I was fortunate enough to contribute to many different parts of Amazon. Some highlights:
Data center network (my code likely powers your internet!)
AWS CloudWatch
AWS SageMaker
Supply chain optimization technologies
Prime Air


Selected papers are listed here. For a complete list, please see my Google Scholar profile.

An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context
Andrei Paleyes, Christian Cabrera-Jojoa, Neil D. Lawrence
International Conference on AI Engineering - Software Engineering for AI (CAIN), 2022
[Paper on arXiv] [Paper at IEEE] [code]

Emulation of physical processes with Emukit
Andrei Paleyes, Mark Pullin, Maren Mahsereci, Cliff McCollum, Neil D. Lawrence, Javier González
Second workshop on machine learning and the physical sciences, NeurIPS, 2019

Automatic Discovery of Privacy-Utility Pareto Fronts
Brendan Avent, Javier González, Tom Diethe, Andrei Paleyes, Borja Balle
Proceedings on Privacy Enhancing Technologies, 2020 (Andreas Pfitzmann Best Student Paper Award)
Privacy Preserving Machine Learning Workshop, 2019
[video] [code] [paper]

Challenges in deploying machine learning: a survey of case studies
Andrei Paleyes, Raoul-Gabriel Urma, Neil D. Lawrence
The ML-Retrospectives, Surveys & Meta-Analyses Workshop, NeurIPS, 2020
ACM Computing Surveys (CSUR), 2022
[Paper on arXiv] [Paper in journal]

Causal Bayesian Optimization
Virginia Aglietti, Xiaoyu Lu, Andrei Paleyes, Javier González
International Conference on Artificial Intelligence and Statistics, 2020

Effectiveness and resource requirements of test, trace and isolate strategies for COVID in the UK
Bobby He, Sheheryar Zaidi, Bryn Elesedy, Michael Hutchinson, Andrei Paleyes, Guy Harling, Anne M. Johnson, Yee Whye Teh, Royal Society's DELVE group
Royal Society open science 8 (3), 2021
[paper] [code]


Dataflow software as complete causal graphs
Causal Digital Twins workshop, ELLIS Unconference, January 2023

Benefits of dataflow modeling for data management in software systems
The Ocean Cleanup Challenge, Kili Technology, December 2022
[event] [slides]

Machine Learning Systems Design
Together with Neil Lawrence
Data Science Africa Summer School, Arusha, July 2022
[event] [slides]

Challenges in Deploying Machine Learning
Industry Expert Insights, Cambridge Spark, August 19, 2021

Challenges in Deploying Machine Learning, or What is rarely talked about at ML conferences
RSE Lunch Bytes, University of Sheffield, July 05, 2021
[video] [event] [slides]

Data Oriented Architectures for Deploying Machine Learning
DSAIL Research Day, DeKUT, June 19, 2021
[event] [slides]

Random tips for aspiring scientific open sources
Gaussian Processes meetup, January 21, 2021
[event] [slides]

Simulating Contact Tracing in the Pandemic: TTI Explorer
Together with Bryn Elesedy
ML and the Physical World course at the University of Cambridge, November 12, 2020
[video] [event] [slides]

Exploring lockdown exit strategies
Data Science Africa COVID-19 Webinar, April 8, 2020

How to use Hadoop on Windows Azure to analyze large volumes of data
ITShare: High load projects on .Net, December 8, 2012


Royal Society DELVE
During the initial phase of the COVID-19 pandemic I became a member of the action team of the DELVE group (thanks to my supervisor Neil Lawrence for inviting me!). DELVE (Data Evaluation and Learning for Viral Epidemics) is a multi-disciplinary group, convened by the Royal Society, to support a data-driven approach to learning from the different approaches countries are taking to managing the pandemic. Over the course of 2020 we produced a number of reports, software packages and datasets, and provided advice to SAGE and ultimately the UK Government.

Challenges in Deploying and Monitoring Machine Learning Systems, ICML and NeurIPS workshop
I was involved in two installments of this workshop (led by Alessandra Tosi): as a reviewer in 2020 and as an organizer in 2021. We are organizing it again for NeurIPS 2022!
[website 2020] [website 2021] [website 2022]

Data Science Africa
Our group regularly participates in the DSA summer school, where we deliver lectures and practicals on the design of ML systems.
[website 2020] [website 2021] [website 2022]