The Seven Principles of Data Reliability Engineering
Based on Google’s Site Reliability Engineering (SRE) principles, Data Reliability Engineering (DRE) is a combined set of tools and processes that modern data teams use to solve data challenges in a scalable way. Monitoring, standard-setting, change management, and incident management are applied to data warehouses and pipelines just as they are used to keep applications and infrastructure running reliably.
DRE encompasses the set of practices that data teams engage in to maintain freshness, quality, and ultimately, the reliability of the data they provide to stakeholders.
Principle No. 1: Embrace risk
The only way to have perfectly reliable data is to not have any data at all. Data pipelines break in unexpected ways—embrace the risk and plan for how to manage it effectively.
Principle No. 2: Set standards
When someone depends on data, it's wise to clarify exactly what they can depend on: explicit definitions, hard numbers, and cross-team agreements.
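One way to make those hard numbers concrete is to encode them as a data SLO that both teams can read and check programmatically. The sketch below is illustrative: the class, table name, and thresholds are hypothetical, not from any particular tool.

```python
from dataclasses import dataclass

# Hypothetical data SLO: names and thresholds are illustrative examples
# of the "hard numbers" a producer and consumer might agree on.
@dataclass(frozen=True)
class DataSLO:
    table: str
    max_staleness_hours: float   # freshness target agreed with stakeholders
    min_row_completeness: float  # fraction of expected rows that must arrive

    def is_met(self, staleness_hours: float, completeness: float) -> bool:
        """Return True if the observed measurements satisfy the agreement."""
        return (staleness_hours <= self.max_staleness_hours
                and completeness >= self.min_row_completeness)

orders_slo = DataSLO(table="analytics.orders",
                     max_staleness_hours=6.0,
                     min_row_completeness=0.99)

print(orders_slo.is_met(staleness_hours=4.5, completeness=0.995))  # True
print(orders_slo.is_met(staleness_hours=8.0, completeness=0.995))  # False
```

Writing the agreement down as code means a breach is a boolean, not a debate.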
Principle No. 3: Reduce toil
Removing repetitive manual tasks needed to operate your data platform repays dividends in reduced overhead and fewer human errors.
Principle No. 4: Monitor everything
It's impossible for a data team to understand how their data and infrastructure are behaving without comprehensive, always-on monitoring.
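A minimal always-on check might compare each table's last successful load against a freshness threshold. This sketch stubs the metadata lookup with a dictionary; in practice that data would come from your warehouse's metadata tables, and all names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Stub for a warehouse metadata query: table -> time of last successful load.
# A real monitor would query this continuously, not hard-code it.
last_load_times = {
    "analytics.orders": datetime.now(timezone.utc) - timedelta(hours=2),
    "analytics.events": datetime.now(timezone.utc) - timedelta(hours=30),
}

def stale_tables(max_age: timedelta) -> list[str]:
    """Return tables whose last successful load is older than max_age."""
    now = datetime.now(timezone.utc)
    return [table for table, loaded in last_load_times.items()
            if now - loaded > max_age]

# The 30-hour-old table breaches a 24-hour freshness threshold.
print(stale_tables(max_age=timedelta(hours=24)))  # ['analytics.events']
```

The same loop extends naturally to volume, schema, and distribution checks; freshness is just the easiest signal to start with.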
Principle No. 5: Use automation
Automating manual processes reduces manual mistakes and frees up brainpower and time for tackling higher-order problems.
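For example, a manual spot-check of incoming rows can become a named set of predicates run on every batch. The rows, check names, and rules below are illustrative assumptions, not a real ruleset.

```python
# Sketch of automating a manual spot-check: each quality rule is a named
# predicate applied to every row in a batch. All data here is made up.
rows = [
    {"order_id": 1, "amount": 25.0, "currency": "USD"},
    {"order_id": 2, "amount": -5.0, "currency": "USD"},   # bad: negative amount
    {"order_id": 3, "amount": 12.5, "currency": None},    # bad: missing currency
]

checks = {
    "amount_non_negative": lambda r: r["amount"] >= 0,
    "currency_present": lambda r: r["currency"] is not None,
}

def run_checks(rows, checks):
    """Return {check_name: [offending order_ids]} for failing rows only."""
    failures = {name: [] for name in checks}
    for row in rows:
        for name, predicate in checks.items():
            if not predicate(row):
                failures[name].append(row["order_id"])
    return {name: ids for name, ids in failures.items() if ids}

print(run_checks(rows, checks))
# {'amount_non_negative': [2], 'currency_present': [3]}
```

Once the checks run automatically on every batch, a human only gets involved when something actually fails.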
Principle No. 6: Control releases
Making changes is ultimately how things improve, and how things break, and having a process for reviewing and releasing data pipeline code helps you ship improvements without causing breakage.
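One concrete release control is a pre-deploy guard that flags breaking schema changes, such as dropped or retyped columns that downstream consumers may rely on. The schemas and function below are a hypothetical sketch of such a check, not any specific CI tool.

```python
# Sketch of a pre-release guard: compare the deployed schema against the
# proposed one and flag changes that could break downstream consumers.
# Column names and types are illustrative.
current_schema = {"order_id": "INT", "amount": "FLOAT", "currency": "STRING"}
proposed_schema = {"order_id": "INT", "amount": "FLOAT"}  # drops 'currency'

def breaking_changes(current: dict, proposed: dict) -> list[str]:
    """Return columns that the proposed schema removes or retypes."""
    removed = [col for col in current if col not in proposed]
    retyped = [col for col in proposed
               if col in current and proposed[col] != current[col]]
    return removed + retyped

issues = breaking_changes(current_schema, proposed_schema)
if issues:
    print(f"Blocking release: breaking changes {issues}")
# Blocking release: breaking changes ['currency']
```

Run in review or CI, a check like this turns "did anyone think about downstream users?" from a hope into a gate.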
Principle No. 7: Maintain simplicity
The enemy of reliability is complexity. Minimizing and isolating the complexity in any one pipeline job goes a long way toward keeping it reliable.