MAY 25-26TH, 2022

VIRTUAL EVENT

Data Reliability Engineering Conference 2022

01 - Overview

Two days of data reliability engineering—standards, tools, and teams

First held in December 2021, DRE Con is the only conference to focus on the practice of data reliability engineering—the tools, processes, and people needed to treat data quality and pipeline reliability like an engineering problem.

Join us virtually on May 25th and 26th for hands-on workshops, live Q&A, and presentations on the role data reliability engineers play, now and in the future. Hear from data teams, modern data stack companies, and other speakers working to solve the most complex data reliability challenges.

02 - Speakers

Learn from the best in data engineering

Miriah
Peterson

DRE and Member of Technical Staff

Tailscale

Egor
Gryaznov

CTO and co-founder

Bigeye

Scott
Shi

Co-Founder and CTO

ZettaBlock

Harish
Srigiriraju

Principal Engineer

Verizon

Linda
Liu

Director, Data Analytics & Data Science

HyreCar

Randy
Pitcher

Senior Solutions Architect

dbt labs

Peter
Fishman

Co-Founder

Mozart Data

Segun
Adelowo

Lead Machine Learning Engineer

Interswitch Group

Shailvi
Wakhlu

Sr. Director of Data

Strava

Pavani
Rangavajhula

Senior Data Engineer

Ecobee

Sudhir
Tonse

Engineering Leadership

DoorDash

Kevin
Kho

Community Engineer

Prefect

Loc
Nguyen

Data Engineer

Mayan

Christianna
Clark

Managing Supervisor/Lead Machine Learning Data Engineer

Methods+Mastery

Jerry
Shen

Head of Data

OpenSea

Glen-Erik
Cortés

ML Engineering Manager

Royal Caribbean Group

Chris
Handy

Senior Product Marketing Manager

Crux

Dan
Lynn

Senior Vice President of Product

Crux

03 - Schedule

Daily program for the DRE conference

5:10 PM UTC

Keynote

Data Fitness for a Healthy Business

Speaker

Shailvi Wakhlu

Sr. Director of Data
Strava
Data insights can only be as good as the quality of the data they're based on. Businesses use data to make good decisions that can help supercharge positive outcomes. If the data they use to make those decisions is of low quality, there is a high chance that the results will be low quality as well. Inaccuracies and biases in your data can result in costly mistakes. In this talk, I highlight the typical lifecycle of data and the phases where bad data sneaks in. Knowing this, and understanding the importance of high data quality, is the best way to prevent the problems that poor data quality causes.

5:40 PM UTC

Embrace risk

Riding Upon the Shoulders of the Imperfect

Speaker

Christianna Clark

Managing Supervisor/Lead Machine Learning Data Engineer
Methods+Mastery
Track 1
In 2020, DICE’s Tech Job Report listed data engineering as the fastest-growing job of 2019, growing 50% YoY. With more and more tools being developed to meet the demand for automated data system solutions, we would expect the time spent maintaining these systems to be falling sharply, but it’s not. It is estimated that data engineers still spend 56% of their time on executing their systems and only 22% on the innovation that delivers value. So why is this? In her talk, "Riding Upon the Shoulders of the Imperfect," Christianna Clark implores us to stop thinking like data engineers and start thinking like agile software developers by defying perfection and embracing the imperfect.

Embrace risk

Where We Are Going We Don't Need Standards

Speaker

Miriah Peterson

DRE and Member of Technical Staff
Tailscale
Track 2
“Move Fast and Break Things” (Mark Zuckerberg, CEO of Facebook) is the mantra of many fast-paced startups. Every startup hits a point where infrastructure and data integrity matter as much as or more than speed. Come explore a practical application of data governance for critical system performance.

6:10 PM UTC

Set standards

Challenges in Defining and Achieving High Data Quality Standards

Speaker

Sudhir Tonse

Engineering Leadership
DoorDash
Track 1
Most modern businesses focus on using the data they have collected to optimize their business and make informed, data-driven decisions. While it's clear that these decisions and optimizations depend on the quality and trustworthiness of the underlying data architecture, what is not clear is how exactly this quality is defined, shared, and measured, and how the trust is gained. What challenges exist in a complex organization with many functions that can benefit from a structured approach to achieving high data quality standards? This talk will focus on these challenges and outline approaches used in large organizations such as DoorDash.

Set standards

From Billions of Tickets to a Few Thousand that Matter: the Data Journey of the Uber Safety Report

Speaker

Scott Shi

Co-Founder and CTO
ZettaBlock
Track 2
In 2019, nearly 4 million Uber trips happened every day in the US, more than 45 rides every second. Only 0.0003% of trips had a report of a critical safety incident. As the engineering lead, Scott initiated, designed, built, and maintained a complex data system to systematically and accurately classify critical safety incidents. In this talk, Scott will share what he learned about data quality over this two-year journey.

6:40 PM UTC

Reduce toil

Reducing Toil with EL, T, & BI

Speaker

Peter Fishman

Co-Founder
Mozart Data
Track 1
Abstract coming soon.

Reduce toil

Pipeline Promotion without the Commotion!

Speaker

Randy Pitcher

Senior Solutions Architect
dbt labs
Track 2
In this session, we will provide a live walkthrough of building out a tested and automated dev -> test -> prod promotion strategy using dbt. You'll learn:
- how to remove the toil and fragility of manual environment promotion
- the principles of introducing deployment environments from scratch
- the basics of automating quality checks
- how to revert changes with much less stress than before

7:30 PM UTC

Hands-on Workshop: Monitor Everything

Workshop: Build Your Own Data Observability Solution on Snowflake in Minutes

Speaker

Egor Gryaznov

CTO and co-founder
Bigeye
Monitoring data pipelines for operational problems like failed loads is a critical skill for any data reliability engineer. Thankfully, Snowflake makes it possible to automatically monitor every dataset in minutes. In this workshop, we’ll guide you through the building process, and in less than two hours, you will be able to spot data problems across your Snowflake environment. Everyone who registers for DRE Con can attend - free!

5:10 PM UTC

Monitor

Managing Data Quality When it's Outside of Your Control

Track 1
When data originates outside of your company, you can’t just get under the hood and fix the source. You have to work a little differently. In this session, we’ll talk through the elements of a great data product, and what you can do when you aren’t handed something great. Join Dan Lynn, SVP of Product at Crux, for a conversation on how Crux monitors and ensures external data quality at the point of delivery.

Monitor

Monitoring Data, Machine Learning Models and Performance from a Personal Cloud Perspective

Speaker

Harish Srigiriraju

Principal Engineer
Verizon
Track 2
As we set up data systems and machine learning models and make improvements in performance, the outcomes are often not measured accurately. A lack of robust monitoring systems affects end users and internal stakeholders. This presentation will explore the challenges of monitoring data systems effectively, along with solutions, and will cover specific examples from a personal cloud application.

5:40 PM UTC

Automation

Introduction to Workflow Orchestration with Prefect

Speaker

Kevin Kho

Community Engineer
Prefect
Track 1
Modern data applications are as complex as ever, and along with this complexity also comes an ever-growing number of ways these applications can fail. Data sources can go down intermittently. Input data may come in malformed. To guard against this, a non-trivial amount of effort is spent creating code paths to handle these failures gracefully. These include task retrying, execution timeouts, and notifications in the event of failure. Collectively, the effort spent guarding against failure is called negative engineering. Workflow orchestration frameworks are specifically designed to reduce the effort spent on negative engineering by enabling the scheduling and monitoring of workflows. Failure code paths can be added so that the application knows how to respond in the event something unexpected happens. This talk will cover basic workflow orchestration functionality and we'll familiarize ourselves with the features that let us handle these problems gracefully.

Automation

Supercharging Big Data Automation Pipeline Engine

Speaker

Linda Liu

Director, Data Analytics & Data Science
HyreCar
Track 2
Regardless of size, industry, or market, big data has become an intrinsic part of a company's road to success. Often, how a company's big data initiatives turn out can be a telling story on its fortunes and outlook. We are all no longer strangers to pipelines bridging the divide between the raw and curated. Needless to say, automation plays a key role in tackling big data initiatives. Automation pipelines are must-haves in the pursuit of big data success. The engine that creates those pipelines deserves special attention. How do we go beyond the traditional automation pipelines to increase "capitalization" in driving business growth and success? What can we do to supercharge that engine to increase data product utilization and value to end users? In this talk, Linda Liu will share her tweaks on tuning and adding enhancements to maximize the big data automation pipeline engine performance.

6:10 PM UTC

Control releases

Controlled Releases - How to Enforce Safe Data Pipeline Releases

Speaker

Pavani Rangavajhula

Senior Data Engineer
Ecobee
Track 1
CI/CD, DevOps, and PRs have been among the best practices in SDE for years. These practices have ensured incremental delivery and the necessary gates for production systems. As data engineering and ML engineering have become more mainstream in recent years, these best practices are being incorporated into the delivery of these systems as well. We will go through some best practices that apply in general and some that apply particularly to ML and data systems. We will cover GHA, PR reviews, and how we incorporate vulnerability scans on libraries.

Control releases

Using Bigeye in data CI Pipeline with Snowflake & dbt

Track 2
Inspired by GitLab's own talk at Data Council (https://www.youtube.com/watch?v=eu623QBwakc), Mayan has added Bigeye to their GitLab CI pipeline to increase confidence in merge requests. This talk will walk through the whys and hows.

6:40 PM UTC

Simplicity

Maintain Simplicity in Keeping Data Fresh, Accurate and Reliable

Speaker

Segun Adelowo

Lead Machine Learning Engineer
Interswitch Group
Track 1
Companies that are growing and using the data they generate for internal processes or value realisation add new features, rules, and business processes to the data flow over time. Eventually the flow becomes so complicated that changing or updating features or lines of code gets pushback from data professionals, who fear breaking pipelines or computation logic embedded in layers of code and conditions. The result is stale, inaccurate, and eventually unreliable data. A proper holistic review of the data systems, with management buy-in, can reduce or remove this complexity by splitting these processes into single-responsibility components, adding governance, collaborating continuously with domain experts, and educating the organisation's data users. This simplification will go a long way to ensure the organisation gets value from the data systems and scales accordingly.

Simplicity

Accelerating Development and Delivery of ML Solutions using Sandboxed Environments, Version Control, and Modular Design

Speaker

Glen-Erik Cortés

ML Engineering Manager
Royal Caribbean Group
Track 2
Deployment of Machine Learning (ML) solutions remains an industry challenge, with nearly 80% of projects reportedly never making it to Production. However, increasing maturity around MLOps is making it easier for ML teams to deliver robust and transparent ML solutions that are easy to iterate on when working in teams. In this talk, we focus on:
• the use of separated Development, QA, and Production environments and the ability to compare outputs between sandboxed environments.
• proper git branching methodology for team collaboration.
• the importance of breaking down the solution into separate steps to facilitate monitoring and debugging and to speed up overall development.
Discussed technologies will include Databricks, Git, MLflow, and DevOps, but the chat will focus on overall strategy and principles, so the ideas can be applied with different tech stacks.

7:30 PM UTC

Q&A Session

Ask Us Anything: Data Reliability from Three Perspectives

Data reliability engineering is an emerging field and set of practices. You probably have a lot of questions, and we have a few answers from our slate of esteemed data leaders. In this open Q&A session, you can ask us anything. We'll do our best to answer and point you to resources to help you on your journey to making data reliability engineering a key role and responsibility at your organization.