Interview Experience

Data Engineer Loop: The SQL Test That Wasn't Just SQL

A data engineering loop at a Series C analytics company. The SQL round looked like a standard SQL test. It was grading something else entirely.

Data Engineer Loop: The SQL Test That Wasn't Just SQL

A data engineering loop at a Series C analytics company. The SQL round looked like a standard SQL test. It was grading something else entirely.

sql

interview-prep

system-design-interview

behavioral-interview

performance

By @aishataylor

April 29, 2026

Updated May 18, 2026

818 views

4.4 (9)

The interviewer pasted a schema with seven tables into the shared editor, then said: "those are the right kinds of questions; what would you do if I were not here to answer them?" That was minute 18 of a 60-minute SQL round at a Series C B2B analytics company in Austin (about 130 people). I had walked in expecting the standard window-function-and-CTE workout I had done at every other data engineering interview, with three years of healthcare-analytics work behind me. The SQL was easy. The round was grading whether I understood what the data actually was, and that was the part I almost failed.

The five rounds and the one that mattered

Round 1: 30 min recruiter screen
Round 2: 60 min SQL round (a senior data engineer, a real schema, a series of questions)
Round 3: 60 min data modeling round (design a star schema for a fictional but plausible product)
Round 4: 60 min pipeline reliability round (debugging a broken pipeline diagram)
Round 5: 45 min behavioral with the engineering manager

I am going to walk through round 2 in detail because the rest of the loop was standard. Anyone who has done a few data engineering loops knows the model.

What the SQL round actually was

The interviewer shared a Postgres-flavored schema in a doc. Three tables, anonymized but real-feeling:

CREATE TABLE orders (
  order_id        BIGINT PRIMARY KEY,
  customer_id     BIGINT NOT NULL,
  placed_at       TIMESTAMPTZ NOT NULL,
  total_cents     INTEGER NOT NULL,
  status          TEXT NOT NULL  -- 'pending', 'confirmed', 'cancelled', 'refunded'
);

CREATE TABLE order_events (
  event_id        BIGINT PRIMARY KEY,
  order_id        BIGINT NOT NULL REFERENCES orders(order_id),
  event_type      TEXT NOT NULL,  -- 'placed', 'confirmed', 'cancelled', 'refunded', 'partial_refund'
  occurred_at     TIMESTAMPTZ NOT NULL,
  amount_cents    INTEGER          -- only set for refund events
);

CREATE TABLE customers (
  customer_id     BIGINT PRIMARY KEY,
  signed_up_at    TIMESTAMPTZ NOT NULL,
  segment         TEXT NOT NULL    -- 'enterprise', 'mid-market', 'self-serve'
);

The first three questions were the warm-up. "Daily revenue, last 30 days." "Customer count by segment." "Top 10 customers by lifetime spend, excluding refunds." Standard. I wrote each one in 4 to 6 minutes.

The fourth question was the round. The interviewer said: "compute net revenue per day for the last 90 days, broken out by customer segment, factoring in partial refunds correctly".

I started writing. Then I stopped. The sentence had a lot of joints. "Net revenue" could be revenue minus refunds. But how do partial refunds work in this data, exactly? amount_cents was only set for refund events. The orders.total_cents was set at order time. Was an order with a partial refund still in the confirmed status, or did the status change? Was the refund attributed to the day of the original order or the day of the refund event?

I asked. The interviewer said "those are the right questions; what would you do if I were not here to answer them".

The pivot

This is where the round actually started. The SQL round was not testing whether I could write window functions. It was testing whether I would write a query against a schema I did not understand and confidently produce a wrong number, or whether I would stop and probe the data first.

I asked if I could see some sample rows from order_events for an order with a partial refund. He shared a Snowflake-style preview:

order_id  event_type        occurred_at    amount_cents
40219     placed            2025-01-12     null
40219     confirmed         2025-01-12     null
40219     partial_refund    2025-01-20     1500

The partial_refund row had amount_cents = 1500 but the original order's total_cents was 8000. So a partial refund of $15 against an $80 order. The order's status in the orders table was still confirmed (a partial refund did not flip the status; only a full refund did).

Now the question made sense. Net revenue is the order's total_cents minus the sum of any partial refunds for that order, plus a separate handling for fully-refunded orders. The attribution day is a business choice: I asked the interviewer which it was, and he said "both views are valid; pick one and tell me why". I picked attribution to the original order day, on the argument that a leadership dashboard wants to see net revenue trends over the days the orders happened, not the days the refunds happened. He said "that is the choice we made; the alternative is also a defensible answer; what matters is that you said why".

The SQL itself was not hard once the question was actually understood:

WITH refund_totals AS (
  SELECT
    order_id,
    SUM(amount_cents) AS refunded_cents
  FROM order_events
  WHERE event_type IN ('partial_refund', 'refunded')
  GROUP BY order_id
),
net_per_order AS (
  SELECT
    o.order_id,
    o.placed_at::date AS order_date,
    c.segment,
    o.total_cents - COALESCE(r.refunded_cents, 0) AS net_cents
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
  LEFT JOIN refund_totals r ON r.order_id = o.order_id
  WHERE o.placed_at >= NOW() - INTERVAL '90 days'
    AND o.status != 'cancelled'
)
SELECT
  order_date,
  segment,
  SUM(net_cents) / 100.0 AS net_revenue_dollars
FROM net_per_order
GROUP BY order_date, segment
ORDER BY order_date DESC, segment;

The interviewer asked one follow-up that I want to flag because it was the third hidden grade: "what if the same order has both a partial refund and a full refund event". The answer is that the data model probably should not allow that, but if it does, my query would over-count refunds. I admitted it would, and proposed handling it by ranking refund events per order and taking the most informative one. He said "good; that is the kind of thing we find in our data once a week".

What the SQL round was grading underneath the SQL

Three things, in order:

First, would you stop and probe the data before you write SQL against a schema you do not understand. Most candidates do not.

Second, when there is an ambiguity that the data does not resolve, do you make a defensible business choice and name it, or do you produce one number and ship it.

Third, do you anticipate the dirty cases (overlapping refunds, missing data, late-arriving events) that any real production data engineer will see in the first month.

The SQL was the smallest part of the round. The interviewer told me later that the team had been burned twice by data engineers who wrote technically-correct queries against the wrong understanding of the data, and had decided to rebuild the SQL round around that failure mode.

The rest of the loop and outcome

The data modeling round was a clean star-schema design for an e-commerce-shaped product (facts: orders, events; dimensions: customers, products, time, segments). I drew the schema, walked through the grain, defended my choice of slowly-changing-dimension for the customer segment table. The pipeline reliability round was a debugging round on a broken Airflow DAG; I narrowed the breakage to a backfill job that was running concurrently with the daily run and stomping on a partition. The behavioral round was 45 minutes; nothing surprising. The recruiter screen earlier in the loop was 30 minutes and the question that decided the rest of the loop was "tell me about a time you owned a pipeline that broke at 3am, what you did, and what you changed about how you operated afterward." I had a story about a partition-explosion incident from a previous role. I told it in three minutes. The recruiter wrote down "yes" in the doc she was sharing on screen, which was the first signal that the loop was going to keep moving. The Airflow debugging round had its own pivot inside it: the partition collision was the diagnosis, but the interviewer pushed once more with "what monitor would have caught this earlier", and the answer that landed was a per-DAG run-overlap metric on the scheduler, with an alert if two runs of the same DAG were active outside a known backfill window.

I got the offer and accepted. The single thing I would do differently if I ran this loop again is the data modeling round. I had defaulted to a star schema because that was what the round seemed to want, but the right answer for their actual product (which the interviewer described to me later) was probably an event-sourced fact table with materialized views per consumer. I had not had the courage to suggest it because the prompt sounded like it wanted star schema. The round would have been stronger if I had said "here is the star schema you asked for, here is the event-sourced version that is probably better for your actual product, and here is when each one wins". Sequencing the answer that way is the thing I want to walk in with next time.

Back to Interview Experiences