Andrei Paleyes
PyData meetup
Cambridge, January 2024
“Effective big data mining at scale doesn't begin or end with what academics would consider data mining”
“Data scientists expend a large amount of effort to understand the data available to them, before they even begin any meaningful analysis”
“Exploratory data analysis always reveals data quality issues”
“A dichotomy: Data systems are about exposing data. Services are about hiding it.”
“The underlying issue is that data and services don't sing too sweetly together.”
“We need to consider [data] a first class citizen of the architectures we build.”
"Challenges in Deploying Machine Learning: A Survey of Case Studies", ACM Computing Surveys 2022
"Data-Oriented Architecture: A Loosely-Coupled Real-Time SOA", Joshi, 2007
"Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective", Under review
"DOA-Based Composition", https://github.com/cabrerac/doa-composition
"A preliminary architecture for a basic data-flow processor", Dennis and Misunas, 1974
"Data-flow computer architecture", Dennis, 1987
"Flow-Based Programming: A new approach to application development", Morrison, 2010
"Towards better data discovery and collection with flow-based programming", DCAI @ NeurIPS 2021
"An empirical evaluation of flow based programming in the machine learning deployment context", CAIN 2022
https://xkcd.com/552/
"Dataflow graphs as complete causal graphs", CAIN 2023
"Causal fault localisation in dataflow systems", EuroMLSys @ EuroSys 2023
"Can causality accelerate experimentation in software systems?", CAIN 2024
"Self-sustaining Software Systems (S4): Towards Improved Interpretability and Adaptation", SATrends 2024
"Desiderata for next generation of ML model serving", DMML @ NeurIPS 2022
"Dataflows for Machine Learning Operations", Kafka Summit 2023
"A locally time-invariant metric for climate model ensemble predictions of extreme risk", Environmental Data Science 2023
"Multi-fidelity experimental design for ice-sheet simulation", GP Seminar Series @ NeurIPS 2022
"Calculating exposure to extreme sea level risk will require high resolution ice sheet models", Under review
https://www.datascienceafrica.org/
Neil Lawrence | Christian Cabrera | Jessica Montgomery | Eric Meissner
Pierre Thodoroff | Diana Robinson | Markus Kaiser | Mala Virdee
Reach out!
ap2169@cl.cam.ac.uk | https://paleyes.info
https://mlatcl.github.io