Data Engineer Interview Guide: Real-Time & Streaming Questions

Aug 20, 2025

https://medium.com/@goyalarchana17/15-critical-questions-every-data-engineer-should-be-prepared-to-answer-3aa82791653e

Core Concepts

What’s the difference between batch processing and real-time (stream) processing? When would you use one vs the other?
Explain event-time vs processing-time. Why is this distinction critical in real-time pipelines?
What are watermarks and how do they help with late-arriving events?
Describe stateful vs stateless stream processing with an example.
Compare at-most-once, at-least-once, and exactly-once delivery semantics.
What are common bottlenecks in streaming systems and how do you handle them?
In Kafka, what is the role of partitions, offsets, and consumer groups?
What are the differences between Kafka Streams, Flink, and Spark Structured Streaming?
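
As a rough illustration of the event-time and watermark questions above, here is a minimal pure-Python sketch of tumbling event-time windows. It assumes a watermark that trails the largest event time seen so far by a fixed bound; the window size, the bound, and the function names are hypothetical choices for this example, not the API of Flink, Spark, or Kafka Streams.

```python
from collections import defaultdict

WINDOW_SEC = 60            # tumbling window size (assumed for this sketch)
MAX_OUT_OF_ORDER_SEC = 30  # watermark trails the max event time by this bound (assumed)


def window_start(event_time: int) -> int:
    """Start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SEC)


def process(events):
    """Count events per event-time window.

    events: iterable of (event_time, key) in arrival (processing-time) order.
    Returns (emitted, dropped, still_open):
      emitted    - {window_start: count} for windows closed by the watermark
      dropped    - events that arrived after their window was already closed
      still_open - {window_start: count} for windows not yet closed
    """
    open_windows = defaultdict(int)
    emitted, dropped = {}, []
    max_event_time = 0
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - MAX_OUT_OF_ORDER_SEC
        start = window_start(event_time)
        if start + WINDOW_SEC <= watermark:
            # The window ended before the watermark: the event is too late.
            dropped.append((event_time, key))
        else:
            open_windows[start] += 1
        # Emit every window whose end is at or before the watermark.
        for s in sorted(w for w in open_windows if w + WINDOW_SEC <= watermark):
            emitted[s] = open_windows.pop(s)
    return emitted, dropped, dict(open_windows)


if __name__ == "__main__":
    arrivals = [(5, "a"), (50, "b"), (65, "c"), (58, "d"), (95, "e"), (10, "f")]
    print(process(arrivals))
```

In this sketch the event at time 58 arrives out of order but is still counted, because the watermark has not yet passed the end of its window; the event at time 10 arrives after that window has been emitted and is dropped. A real system might route such events to a side output or a correction job instead of discarding them.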

Metrics & Product Scenarios

How would you define and calculate DAU/WAU/MAU in a streaming system?
Suppose you need to track real-time order cancellations at DoorDash. What’s your approach?
Netflix wants “Avg 30-Day Viewing Days” in real-time. How would you model and compute it?
How do you track active drivers in Uber in the last 15 minutes using event streams?
How would you calculate retention cohorts in near real-time?
What metrics would you define to monitor fraud detection pipeline latency?
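
One possible way to reason about the "active drivers in the last 15 minutes" question is to keep only the latest event time per driver and count the drivers whose latest event falls inside the window. The sketch below is a single-process illustration under that assumption; the class and method names are hypothetical, and a production job would hold this state in a keyed state store and expire it with timers.

```python
WINDOW_SEC = 15 * 60  # "active" means at least one event in the last 15 minutes


class ActiveDriverTracker:
    """Tracks the most recent event time per driver (hypothetical helper)."""

    def __init__(self):
        self.last_seen = {}  # driver_id -> latest event time in seconds

    def observe(self, driver_id, event_time):
        # Keep the maximum so out-of-order events never move a driver backwards.
        prev = self.last_seen.get(driver_id)
        if prev is None or event_time > prev:
            self.last_seen[driver_id] = event_time

    def active_count(self, now):
        # A driver is active if the latest event lies in (now - window, now].
        cutoff = now - WINDOW_SEC
        return sum(1 for t in self.last_seen.values() if cutoff < t <= now)

    def evict(self, now):
        # Drop drivers that can no longer be active, to bound state size.
        cutoff = now - WINDOW_SEC
        for driver_id in [d for d, t in self.last_seen.items() if t <= cutoff]:
            del self.last_seen[driver_id]


if __name__ == "__main__":
    tracker = ActiveDriverTracker()
    tracker.observe("d1", 0)
    tracker.observe("d2", 600)
    print(tracker.active_count(now=1000))
```

The state here is one timestamp per driver, so memory grows with the number of distinct drivers rather than with the number of events, which is usually the point interviewers probe.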

Data Modeling

Design fact and dimension tables for a real-time ride-hailing system (Uber/Lyft).
How would you model clickstream events in a dimensional schema?

Explain why we need bridge tables (for example, multi-genre titles in Netflix) in event modeling.
How would you store partial viewing sessions for analytics?
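
To make the bridge-table question concrete, the following sketch uses Python's built-in `sqlite3` module with a made-up schema. The table names, column names, sample rows, and the equal-split `weight` column are all assumptions for illustration. It shows how joining a fact table directly to a many-to-many genre mapping double counts minutes, and how an allocation weight on the bridge table keeps the per-genre totals consistent with the grand total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_title (title_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_genre (genre_id INTEGER PRIMARY KEY, name TEXT);
-- Bridge table: one row per (title, genre); weights sum to 1.0 per title.
CREATE TABLE bridge_title_genre (
    title_id INTEGER, genre_id INTEGER, weight REAL,
    PRIMARY KEY (title_id, genre_id)
);
CREATE TABLE fact_view (
    view_id INTEGER PRIMARY KEY, title_id INTEGER, minutes_watched REAL
);

INSERT INTO dim_title VALUES (1, 'Title A'), (2, 'Title B');
INSERT INTO dim_genre VALUES (1, 'Comedy'), (2, 'Drama');
INSERT INTO bridge_title_genre VALUES (1, 1, 0.5), (1, 2, 0.5), (2, 2, 1.0);
INSERT INTO fact_view VALUES (1, 1, 40), (2, 1, 20), (3, 2, 30);
""")

# Weighted allocation: each view's minutes are split across the title's genres.
weighted = conn.execute("""
    SELECT g.name, SUM(f.minutes_watched * b.weight)
    FROM fact_view f
    JOIN bridge_title_genre b ON b.title_id = f.title_id
    JOIN dim_genre g ON g.genre_id = b.genre_id
    GROUP BY g.name
    ORDER BY g.name
""").fetchall()

# Naive join without weights: multi-genre titles are counted once per genre.
naive_total = conn.execute("""
    SELECT SUM(f.minutes_watched)
    FROM fact_view f
    JOIN bridge_title_genre b ON b.title_id = f.title_id
""").fetchone()[0]

true_total = conn.execute("SELECT SUM(minutes_watched) FROM fact_view").fetchone()[0]

print(weighted, naive_total, true_total)
```

Whether to split minutes evenly, by a primary genre flag, or not at all is a modeling decision; the weight column simply makes that decision explicit in the schema.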

SQL & ETL Scenarios

Write SQL to compute rolling 30-day distinct active users.
You receive duplicate events in Kafka — how do you deduplicate in SQL/Spark?
Write a query to find the top 5 restaurants with the most real-time orders in the last 24 hours (DoorDash).
Suppose you have trip events (start, end). Write SQL to calculate the average trip duration in the last hour.
How would you implement incremental ETL for streaming → warehouse (Snowflake/BigQuery)?
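
For the deduplication and rolling 30-day questions above, here is one hedged sketch that runs the SQL through Python's `sqlite3` module. The `events` table, its columns, and the sample rows are invented for this example; it assumes a SQLite build with window-function support (3.25 or later), and the date arithmetic uses SQLite's `date()` modifier, which would need translating for Snowflake, BigQuery, or Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id TEXT, user_id TEXT, event_date TEXT, ingested_at INTEGER
);
INSERT INTO events VALUES
    ('e1', 'u1', '2025-01-01', 1),
    ('e1', 'u1', '2025-01-01', 2),   -- duplicate delivery of e1
    ('e2', 'u2', '2025-01-15', 3),
    ('e3', 'u1', '2025-01-31', 4),
    ('e4', 'u3', '2025-02-05', 5);
""")

# Keep the first-ingested copy of each event_id.
DEDUPED = """
    SELECT event_id, user_id, event_date
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY event_id ORDER BY ingested_at
        ) AS rn
        FROM events
    )
    WHERE rn = 1
"""

deduped_count = conn.execute(f"SELECT COUNT(*) FROM ({DEDUPED})").fetchone()[0]

# For each day with activity, count distinct users in the 30 days ending that day.
rolling = conn.execute(f"""
    WITH deduped AS ({DEDUPED}),
         days AS (SELECT DISTINCT event_date AS day FROM deduped)
    SELECT d.day, COUNT(DISTINCT e.user_id)
    FROM days d
    JOIN deduped e
      ON e.event_date >  date(d.day, '-30 days')
     AND e.event_date <= d.day
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()

print(deduped_count, rolling)
```

A window function such as `COUNT(DISTINCT ...) OVER (...)` is not available in many engines, which is why this sketch uses a range self-join. On large tables an interviewer may expect a discussion of pre-aggregated daily user sets or approximate sketches such as HyperLogLog instead.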

System Design & Edge Cases

Design a real-time dashboard for Uber surge pricing. What components do you need?
How would you design an alerting system if order delivery exceeds 45 minutes?
What happens if events arrive out of order? How do you correct them?
How do you scale a Flink job when state grows too large?
How do you design for backpressure handling in Spark Structured Streaming?
If your Kafka consumer is lagging heavily, how do you debug and fix it?
Explain how you’d handle schema evolution in streaming data.
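
As one illustration of the 45-minute delivery alert question, the sketch below keeps per-order state and checks it on a timer. It is a single-process stand-in for what a stream processor would do with keyed state and event-time timers; the class name, method names, and event types are hypothetical.

```python
SLA_SEC = 45 * 60  # alert when an order is still undelivered after 45 minutes


class DeliveryAlerter:
    """Raises one alert per order that exceeds the SLA (hypothetical helper)."""

    def __init__(self):
        self.placed_at = {}   # order_id -> placed time, for undelivered orders
        self.alerted = set()  # orders that already triggered an alert

    def on_event(self, order_id, event_type, event_time):
        if event_type == "placed":
            self.placed_at[order_id] = event_time
        elif event_type == "delivered":
            # Delivery clears the state so the order can no longer alert.
            self.placed_at.pop(order_id, None)
            self.alerted.discard(order_id)

    def on_timer(self, now):
        """Return orders that crossed the SLA since the last check."""
        overdue = sorted(
            order_id
            for order_id, placed in self.placed_at.items()
            if now - placed > SLA_SEC and order_id not in self.alerted
        )
        self.alerted.update(overdue)
        return overdue


if __name__ == "__main__":
    alerter = DeliveryAlerter()
    alerter.on_event("o1", "placed", 0)
    print(alerter.on_timer(now=SLA_SEC + 1))
```

The key design point is that the alert must fire in the absence of an event, so a purely event-driven filter is not enough; something has to advance time, whether a timer, a watermark, or a periodic scan.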

Reference Material:

Data Engineering Interview (2025): 10 Data Streaming Pattern You Can’t Ignore

1. Daily Active Users (DAU) with Late Arrivals

Interview-Ready Streaming Scenarios : Questions with Solutions

Imagine you are in the middle of a streaming interview and the interviewer asks:

Cracking Streaming Interviews: How to Tame Time in Data Pipelines -1

If you are preparing for a data engineering or streaming interview, most questions aren’t about fancy ML models or…

Cracking Streaming Interviews: How to Tame Time in Data Pipelines -2

This is the continuation of the blog Cracking Streaming Interviews: How to Tame Time in Data Pipelines -1 .

Cracking Product Sense Interviews: Think Like a PM, Solve Like a Data Engineer

Recently, I had the privilege of mentoring some really talented folks preparing for interviews at major product…

Written by Archana Goyal
