Best Practices for Engineering ML Pipelines - Part 2

Posted on Mon 07 November 2022 in machine-learning-engineering • Tagged with python, machine-learning, mlops, kubernetes, bodywork

This is the second part in a series of articles demonstrating best practices for engineering ML pipelines and deploying them to production. In the first part we focused on project setup - everything from codebase structure to configuring a CI/CD pipeline and making an initial deployment of a skeleton pipeline …


Continue reading

Best Practices for Engineering ML Pipelines - Part 1

Posted on Wed 03 March 2021 in machine-learning-engineering • Tagged with python, machine-learning, mlops, kubernetes, bodywork

This is the first in a series of articles demonstrating how to engineer a machine learning pipeline and deploy it to a production environment. We’re going to assume that a solution to an ML problem already exists within a Jupyter notebook, and that our task is to engineer this …


Continue reading

Best Practices for PySpark ETL Projects

Posted on Sun 28 July 2019 in data-engineering • Tagged with data-engineering, data-processing, apache-spark, python

I have often leant heavily on Apache Spark and the SparkSQL APIs for operationalising any type of batch data-processing ‘job’, within a production environment where handling fluctuating volumes of data reliably and consistently is an on-going business concern. These batch data-processing jobs may involve nothing more than joining data sources and …


Continue reading

Stochastic Process Calibration using Bayesian Inference & Probabilistic Programs

Posted on Fri 18 January 2019 in data-science • Tagged with probabilistic-programming, python, pymc3, quant-finance, stochastic-processes

Stochastic processes are used extensively throughout quantitative finance - for example, to simulate asset prices in risk models that aim to estimate key risk metrics such as Value-at-Risk (VaR), Expected Shortfall (ES) and Potential Future Exposure (PFE). Estimating the parameters of a stochastic process - referred to as ‘calibration’ in the parlance …


Continue reading

Deploying Python ML Models with Flask, Docker and Kubernetes

Posted on Thu 10 January 2019 in machine-learning-engineering • Tagged with python, machine-learning, machine-learning-operations, kubernetes

  • 17th August 2019 - updated to reflect changes in the Kubernetes API and Seldon Core.
  • 14th December 2020 - the work in this post forms the basis of the Bodywork MLOps tool - read about it here.

A common pattern for deploying Machine Learning (ML) models into production environments - e.g. ML models …


Continue reading

Bayesian Regression in PYMC3 using MCMC & Variational Inference

Posted on Wed 07 November 2018 in data-science • Tagged with machine-learning, probabilistic-programming, python, pymc3

Conducting a Bayesian data analysis - e.g. estimating a Bayesian linear regression model - will usually require some form of Probabilistic Programming Language (PPL), unless analytical approaches (e.g. based on conjugate prior models) are appropriate for the task at hand. More often than not, PPLs implement Markov Chain Monte Carlo …


Continue reading