Andrei Paleyes
Accelerate Science AI Cafe
St Edmund's College, November 2024
“Effective big data mining at scale doesn't begin or end with what academics would consider data mining”
“Data scientists expend a large amount of effort to understand the data available to them, before they even begin any meaningful analysis”
“Exploratory data analysis always reveals data quality issues”
“A dichotomy: Data systems are about exposing data. Services are about hiding it.”
“The underlying issue is that data and services don't sing too sweetly together.”
“We need to consider [data] a first class citizen of the architectures we build.”
"Challenges in Deploying Machine Learning: A Survey of Case Studies", ACM Computing Surveys 2022
"Data-Oriented Architecture: A Loosely-Coupled Real-Time SOA", Joshi, 2007
"Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective", Under review
"DOA-Based Composition", https://github.com/cabrerac/doa-composition
"Towards better data discovery and collection with flow-based programming", DCAI @ NeurIPS 2021
"An empirical evaluation of flow based programming in the machine learning deployment context", CAIN 2022
"Dataflow graphs as complete causal graphs", CAIN 2023
"Causal fault localisation in dataflow systems", EuroMLSys @ EuroSys 2023
"Can causality accelerate experimentation in software systems?", CAIN 2024
"Self-sustaining Software Systems (S4): Towards Improved Interpretability and Adaptation", SATrends 2024
Neil Lawrence | Christian Cabrera | Jessica Montgomery | Eric Meissner |
Pierre Thodoroff | Diana Robinson | Markus Kaiser | Siyuan Guo
Reach out!
ap2169@cl.cam.ac.uk | https://paleyes.info
https://mlatcl.github.io