-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathsetup_database.py
More file actions
464 lines (343 loc) · 19.6 KB
/
Copy pathsetup_database.py
File metadata and controls
464 lines (343 loc) · 19.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
# -*- coding: utf-8 -*-
"""setup_database.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1GC0hIH9jGifJTNo9erF-fqcCMmIoWgIG
# **E-Commerce Database Setup Project**
## **Introduction**
This project focuses on building a relational database for a simple e-commerce application using SQLite and Python. The objective is to design a structured database, import transactional data from CSV files, and ensure data integrity through the use of primary keys, foreign keys, and constraints.
By organizing customer, product, and order data into separate but related tables, this project demonstrates core database design principles and practical data engineering workflows. The completed database serves as a foundation for efficient querying, analysis, and further development in subsequent tasks.
## **Part 1: Database Creation and Data Import**
In this project, we create a relational **SQLite database** for a simple e-commerce application using Python. The goal of this part is to build a clean database structure, import the provided CSV data, and confirm that everything loads correctly.
The dataset includes three files:
- **customers.csv** — customer information
- **products.csv** — product catalog
- **orders.csv** — order transactions
This notebook completes the following steps:
1. Create a new SQLite database file using Python (`sqlite3`).
2. Define tables with proper **primary keys**, **foreign keys**, and constraints such as `NOT NULL` and `UNIQUE` to maintain data integrity.
3. Import data from each CSV file into the correct table.
4. Verify the import by printing **row counts** for each table.
This setup ensures relationships between customers, products, and orders are consistent and ready for querying in later tasks.
## **Upload CSV Files**
The code below opens a file picker dialog. Please upload the following three CSV files from computer:
- **customers.csv**
- **orders.csv**
- **products.csv**
These files contain the core data required to build the e-commerce database:
| File Name | Description |
|------------------|-----------------------------------------------------------------------------|
| customers.csv | Customer information (customer ID, name, email, city, join date) |
| products.csv | Product catalog details (product ID, name, category, price, cost) |
| orders.csv | Order transaction data (order ID, customer ID, product ID, quantity, date) |
Once uploaded, these files will be used to populate the corresponding tables in the SQLite database.
## **Upload File 1: Customers**
Click the **Choose Files** button below and select **customers.csv** from local Downloads folder.
This file contains customer-level information such as customer ID, name, email address, city, and join date.
The data from this file will be imported into the `customers` table of the SQLite database and will serve as the foundation for linking customer records to order transactions in later steps.
"""
from google.colab import files
print("Please select 'customers (1).csv' file")
uploaded = files.upload()
print("Customers file uploaded!")
"""The `customers.csv` file has been successfully uploaded and is now available in the Colab runtime environment. This file will be used to populate the `customers` table in the SQLite database.
## **Upload File 2: Products**
Click the **Choose Files** button below and select **products.csv** from local Downloads folder.
This file contains product-level information such as product ID, product name, category, price, and cost.
The data from this file will be used to populate the `products` table in the SQLite database and will later be linked to order transactions through product identifiers.
"""
print("Please select 'products (1).csv' file")
uploaded = files.upload()
print("Products file uploaded!")
"""The `products.csv` file has been successfully uploaded and is now available in the Colab runtime environment. This file will be used to populate the `products` table in the SQLite database.
## **Upload File 3: Orders**
Click the **Choose Files** button below and select **orders.csv** from your local Downloads folder.
This file contains order-level transaction data, including order ID, customer ID, product ID, quantity, and order date.
The data from this file will be used to populate the `orders` table in the SQLite database and establish relationships between customers and products through foreign key constraints.
"""
print("Please select 'orders (1).csv' file")
uploaded = files.upload()
print("Orders file uploaded!")
"""The `orders.csv` file has been successfully uploaded and is now available in the Colab runtime environment. This file contains transaction-level data that will be used to populate the `orders` table in the SQLite database.
## **Create Database and Define Table Schemas**
Now we will create a SQLite database file named `ecommerce.db` and define three relational tables: `customers`, `products`, and `orders`.
Each table is created with appropriate data types, primary keys, and constraints to maintain data quality and enforce consistent relationships across the dataset.
## **Table Schemas**
### 1) `customers` table
| Column | Data Type | Constraints |
|-------------|-----------|--------------------------|
| customer_id | INTEGER | PRIMARY KEY |
| name | TEXT | NOT NULL |
| email | TEXT | NOT NULL, UNIQUE |
| city | TEXT | NOT NULL |
| join_date | DATE | NOT NULL |
### 2) `products` table
| Column | Data Type | Constraints |
|------------|-----------|----------------------------------|
| product_id | INTEGER | PRIMARY KEY |
| name | TEXT | NOT NULL |
| category | TEXT | NOT NULL |
| price | REAL | NOT NULL, CHECK (price >= 0) |
| cost | REAL | NOT NULL, CHECK (cost >= 0) |
### 3) `orders` table
| Column | Data Type | Constraints |
|-------------|-----------|-------------------------------------------|
| order_id | INTEGER | PRIMARY KEY |
| customer_id | INTEGER | NOT NULL, FOREIGN KEY |
| product_id | INTEGER | NOT NULL, FOREIGN KEY |
| quantity | INTEGER | NOT NULL, CHECK (quantity > 0) |
| order_date | DATE | NOT NULL |
## **Foreign Key Relationships**
- `orders.customer_id` → references → `customers.customer_id`
- `orders.product_id` → references → `products.product_id`
These foreign key links ensure that every order is associated with a valid customer and a valid product, supporting relational integrity in the database.
"""
import sqlite3
def create_database(db_path: str) -> None:
"""
Create (or recreate) an SQLite database and define tables for a simple
e-commerce schema: customers, products, and orders.
This function:
1) Connects to SQLite (creates the DB file if it doesn't exist)
2) Enables foreign key enforcement
3) Drops existing tables (to avoid schema conflicts on rerun)
4) Creates tables with primary keys, constraints, and foreign keys
5) Creates helpful indexes for faster joins and filtering
"""
# Connect to the SQLite database file (creates file if missing)
with sqlite3.connect(db_path) as conn:
cursor = conn.cursor()
# IMPORTANT: SQLite requires this pragma to actually enforce foreign keys
cursor.execute("PRAGMA foreign_keys = ON;")
# Reset existing schema
# Drop child table first (orders) to avoid foreign key dependency issues
cursor.execute("DROP TABLE IF EXISTS orders;")
cursor.execute("DROP TABLE IF EXISTS products;")
cursor.execute("DROP TABLE IF EXISTS customers;")
# Create customers table
cursor.execute("""
CREATE TABLE IF NOT EXISTS customers (
customer_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
city TEXT NOT NULL,
join_date DATE NOT NULL
);
""")
print(" ✓ Created 'customers' table")
# Create products table
cursor.execute("""
CREATE TABLE IF NOT EXISTS products (
product_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL CHECK(price >= 0),
cost REAL NOT NULL CHECK(cost >= 0)
);
""")
print(" ✓ Created 'products' table")
# Create orders table (transaction table)
# This table links customers and products using foreign keys
cursor.execute("""
CREATE TABLE IF NOT EXISTS orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL CHECK(quantity > 0),
order_date DATE NOT NULL,
-- Foreign key to customers table
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE RESTRICT
ON UPDATE CASCADE,
-- Foreign key to products table
FOREIGN KEY (product_id) REFERENCES products(product_id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);
""")
print(" ✓ Created 'orders' table")
# Indexes
# These indexes speed up common join/filter operations
cursor.execute("CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders(customer_id);")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_orders_product ON orders(product_id);")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_orders_date ON orders(order_date);")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_products_category ON products(category);")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_customers_city ON customers(city);")
print(" ✓ Created indexes for optimized queries")
# Commit all schema changes
conn.commit()
print(f"\n✓ Database created successfully: {db_path}")
# Run the database setup
DB_PATH = "ecommerce.db"
print("Creating database and tables...")
print("-" * 40)
create_database(DB_PATH)
"""## **Database Creation Confirmation**
The SQLite database and all required tables have been successfully created. The `customers`, `products`, and `orders` tables are now defined with the appropriate primary keys, constraints, and foreign key relationships.
In addition, indexes have been created to improve query performance for common join and filtering operations. The database file `ecommerce.db` is now ready for data import in the next step.
## **Import Data from CSV Files**
In this step, we read data from the uploaded CSV files and insert it into the corresponding tables in the SQLite database.
## **Import Order**
The tables must be populated in the following order to maintain referential integrity:
1. **customers** (parent table)
2. **products** (parent table)
3. **orders** (child table – contains foreign keys referencing `customers` and `products`)
This import sequence is required because the `orders` table depends on existing records in both the `customers` and `products` tables through foreign key constraints. Importing the parent tables first ensures that all referenced keys exist before order records are inserted.
"""
import sqlite3
import csv
def import_csv(db_path: str, csv_file: str, table_name: str) -> int:
"""
Import data from a CSV file into a specified SQLite table.
Assumptions:
- The CSV column order matches the table schema order.
- The CSV contains a header row.
- Each row has the same number of columns as the header.
Returns:
- Number of rows successfully inserted.
"""
rows_imported = 0
# Connect to the database (and ensure FK enforcement)
with sqlite3.connect(db_path) as conn:
conn.execute("PRAGMA foreign_keys = ON;")
cursor = conn.cursor()
# Open the CSV file for reading
with open(csv_file, "r", newline="", encoding="utf-8") as file:
csv_reader = csv.reader(file)
# Read the header row (column names) and infer the number of columns
header = next(csv_reader)
num_columns = len(header)
# Build an INSERT statement with the right number of placeholders
# Example: INSERT INTO customers VALUES (?, ?, ?, ?, ?)
placeholders = ", ".join(["?" for _ in range(num_columns)])
insert_sql = f"INSERT INTO {table_name} VALUES ({placeholders})"
# Loop through each data row and insert into the table
for row in csv_reader:
try:
cursor.execute(insert_sql, row)
rows_imported += 1
except sqlite3.IntegrityError as e:
# IntegrityError covers UNIQUE/NOT NULL/CHECK/FK constraint failures
# We skip problematic rows but keep importing the rest
print(f" ⚠ Skipping row due to constraint violation: {e}")
# Commit all inserts for this file
conn.commit()
print(f" ✓ Imported {rows_imported} rows into '{table_name}'")
return rows_imported
# Import data from CSV files
print("Importing data from CSV files...")
print("-" * 40)
# IMPORTANT: Import parent tables first, then child table (orders)
import_csv(DB_PATH, "customers (1).csv", "customers")
import_csv(DB_PATH, "products (1).csv", "products")
import_csv(DB_PATH, "orders (1).csv", "orders")
print("\n✓ All data imported successfully!")
"""## **Data Import Confirmation**
All CSV files have been successfully imported into the SQLite database. The `customers`, `products`, and `orders` tables have been populated with the expected number of records, confirming that the import process completed without errors.
With the data now stored in the database, the next step is to verify the contents of each table and proceed with querying and analysis tasks.
## **Verify Data Import**
In this final step, we verify that the data was imported into the SQLite database correctly. This verification ensures both data completeness and referential integrity across all tables.
The verification process includes the following checks:
1. **Checking row counts** – Confirming that each table contains the expected number of records after the import process.
2. **Displaying sample data** – Showing the first few rows from each table to confirm that values were inserted correctly.
3. **Validating foreign keys** – Ensuring that all records in the `orders` table reference valid `customers` and `products` entries.
These checks help confirm that the database is ready for querying and further analysis.
"""
import sqlite3
def verify_data(db_path: str) -> None:
"""
Verify that data was imported correctly by:
1) Printing row counts for each table
2) Displaying the first 3 rows from each table
3) Validating foreign key relationships in the orders table
"""
# Header for a clean verification report
print("=" * 60)
print("DATA VERIFICATION REPORT")
print("=" * 60)
# Connect to the database for verification queries
with sqlite3.connect(db_path) as conn:
cursor = conn.cursor()
# 1) Row Counts
tables = ["customers", "products", "orders"]
print("\nRow Counts:")
print("-" * 40)
total_rows = 0
for table in tables:
# Count records in each table
cursor.execute(f"SELECT COUNT(*) FROM {table}")
count = cursor.fetchone()[0]
total_rows += count
# Print formatted row count per table
print(f" {table:15} : {count:5} rows")
print("-" * 40)
print(f" {'TOTAL':15} : {total_rows:5} rows")
# 2) Sample Data (First 3 rows)
print("\n" + "-" * 60)
print("Sample Data (First 3 rows from each table):")
print("-" * 60)
# Customers sample
print("\nCustomers:")
cursor.execute("SELECT * FROM customers LIMIT 3")
for row in cursor.fetchall():
print(f" {row}")
# Products sample
print("\nProducts:")
cursor.execute("SELECT * FROM products LIMIT 3")
for row in cursor.fetchall():
print(f" {row}")
# Orders sample
print("\nOrders:")
cursor.execute("SELECT * FROM orders LIMIT 3")
for row in cursor.fetchall():
print(f" {row}")
# 3) Foreign Key Verification
# We confirm that every order references a valid customer and a valid product
print("\n" + "-" * 60)
print("Foreign Key Verification:")
print("-" * 60)
# Count total orders
cursor.execute("SELECT COUNT(*) FROM orders")
total_orders = cursor.fetchone()[0]
# Count orders that match an existing customer_id
cursor.execute("""
SELECT COUNT(*)
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
""")
valid_customer_orders = cursor.fetchone()[0]
# Count orders that match an existing product_id
cursor.execute("""
SELECT COUNT(*)
FROM orders o
INNER JOIN products p
ON o.product_id = p.product_id
""")
valid_product_orders = cursor.fetchone()[0]
# Print validation results
print(f" Orders with valid customer references: {valid_customer_orders}/{total_orders}")
print(f" Orders with valid product references: {valid_product_orders}/{total_orders}")
# Final pass/fail message
if valid_customer_orders == total_orders and valid_product_orders == total_orders:
print("\n ✓ All foreign key relationships are valid!")
else:
print("\n ⚠ Warning: Some foreign key relationships are invalid!")
# Run verification + final message
verify_data(DB_PATH)
print("\n" + "=" * 60)
print("DATABASE SETUP COMPLETE!")
print("=" * 60)
"""## **Verification Summary**
The data verification process confirms that the database setup and import steps were completed successfully.
- **Row counts** match the expected values across all tables:
- 30 customers
- 20 products
- 100 orders
- **Sample records** from each table display correctly, indicating that data was inserted in the proper format.
- **Foreign key validation** confirms that every order references a valid customer and a valid product, ensuring full referential integrity.
With all checks passing, the SQLite database is correctly structured, fully populated, and ready for querying and further analysis.
## **Conclusion**
In this project, we successfully designed and implemented a relational SQLite database for a simple e-commerce application. The database schema was carefully defined with appropriate primary keys, foreign keys, and constraints to ensure data integrity. Data from the provided CSV files was imported in the correct order, and comprehensive verification confirmed accurate row counts, valid sample records, and fully enforced foreign key relationships.
With the database structure complete and all data validated, the system is now ready for querying, analysis, and further extension in subsequent tasks.
"""