You are working with a weekly meal-kit subscription service. Every week, a delivery window opens and each active subscriber has an order record created for that week. Subscribers can choose to skip a delivery, in which case their order record is marked with skip = True and no line items are generated. If they do not skip, skip = False and the order includes one or more line items representing the meals they selected.
The business does not have a formal "cancel" flow. Instead, subscribers who want to leave simply skip every week indefinitely. The company tracks these as Shadow Cancellations internally, and is also very interested in Re-engagements -- subscribers who come back after shadow cancelling.
Your job is to build a dbt pipeline that identifies these two events from the raw order data.
A subscriber enters a shadow cancelled state when they have skipped 3 or more consecutive delivery weeks. The shadow cancellation event occurs on the 3rd consecutive skipped week. This is a one-time state transition per skip streak -- a subscriber who skips weeks 10 through 30 has exactly one shadow cancellation event (at week 12, their 3rd consecutive skip), not seven.
For example, if a subscriber skips weeks 10, 11, and 12, they become shadow cancelled as of week 12.
A re-engagement occurs when a subscriber who is currently in a shadow cancelled state places a successful order (skip = False). The re-engagement event occurs on the week of that first non-skipped order.
For example, if a subscriber is shadow cancelled (skipped weeks 10-12) and then orders in week 13, week 13 is a re-engagement.
- A subscriber who skips exactly 2 weeks and then orders again has not shadow cancelled.
- A subscriber who shadow cancels and never returns is still counted as a single shadow cancellation (on their 3rd consecutive skip week).
- A subscriber who re-engages and then shadow cancels again should be counted as a new shadow cancellation.
- Missing order records: Not every account has an order record for every delivery week. Your model must handle this and document its assumption -- are missing weeks skips? Data gaps? Something else? Hint: at least one account in the dataset has weeks with no order record at all.
The seed data consists of 5 tables:
accounts deliveries variants
+-----------+ +-------------+ +-----------+
| account_id| | delivery_id | | variant_id|
| first_name| | week_number | | name |
| last_name | | delivery_date| | price_cents|
| email | +-------------+ +-----------+
| created_at|
+-----------+
\ |
\ |
v v
orders
+------------+
| order_id |
| account_id | --> accounts
| delivery_id| --> deliveries
| skip | (boolean)
| created_at |
+------------+
|
v
line_items (only when skip = False)
+--------------+
| line_item_id |
| order_id | --> orders
| variant_id | --> variants
| quantity |
+--------------+
There are 50 accounts and 52 delivery weeks. The first 5 accounts have been hardcoded with specific patterns to test edge cases. Accounts 6-50 have randomized skip behavior (~20% skip rate).
Build a dbt pipeline organized into three layers:
-
Staging (
models/staging/) -- Already provided. Clean, typed versions of the raw seed tables. -
Intermediate (
models/intermediate/) -- This is where the core logic lives. Build models that:- Join orders with deliveries to get a timeline per account.
- Identify consecutive skip streaks (the "gaps and islands" problem).
- Flag when a shadow cancellation occurs (3rd consecutive skip).
- Flag when a re-engagement occurs (first non-skip after shadow cancellation).
-
Marts (
models/marts/) -- Final output table(s). At minimum, produce a model that shows:- Per week: the count of new shadow cancellations and the count of re-engagements.
- This is the table you will visualize in Rill.
- Python 3.10+
- Install the Python dependencies:
pip install dbt-duckdb pandas faker- Rill Developer (for visualization):
curl -s https://cdn.rilldata.com/install.sh | bashWindows users: check the Rill installation docs for alternative instructions.
python generate_data.pyThis creates 5 CSV files in the seeds/ directory.
dbt seed --profiles-dir .This loads the CSVs into a local data.duckdb file. The --profiles-dir . flag tells dbt to use the profiles.yml in the project root rather than ~/.dbt/.
Build out the intermediate and marts models in:
models/intermediate/-- your transformation logicmodels/marts/-- your final output tables
The staging layer is already provided as a reference for the pattern. Use {{ ref('stg_orders') }} etc. to reference upstream models.
dbt build --profiles-dir .This runs seeds, models, and tests. Fix any failures before proceeding.
rill startRill will start a local server. Connect it to the data.duckdb file and build a line chart showing:
- New shadow cancellations per week
- Re-engagements per week
If you are short on time, this step is optional. A screenshot, a brief write-up describing what you would build, or even just the final SQL query backing the chart is acceptable.
.
├── README.md
├── generate_data.py
├── dbt_project.yml
├── profiles.yml
├── seeds/
│ ├── accounts.csv
│ ├── deliveries.csv
│ ├── variants.csv
│ ├── orders.csv
│ └── line_items.csv
└── models/
├── staging/
│ ├── schema.yml
│ ├── stg_accounts.sql
│ ├── stg_deliveries.sql
│ ├── stg_variants.sql
│ ├── stg_orders.sql
│ └── stg_line_items.sql
├── intermediate/
│ └── (your models here)
└── marts/
└── (your models here)
| Criteria | What We're Looking For |
|---|---|
| Correctness | Does your logic correctly handle all edge cases, including the hardcoded test accounts? |
| SQL Proficiency | Clean, readable SQL. Effective use of window functions and CTEs. |
| dbt Best Practices | Proper use of ref(), layered model organization, schema documentation. |
| Data Modeling | Thoughtful intermediate models that break the problem into logical steps. |
| Visualization (Bonus) | A clear Rill dashboard showing the two metrics over time. |
| Documentation | Brief comments or a short write-up explaining your approach and any assumptions. |
Good luck! We are excited to see your approach.