poyrazK · May 15, 2026
diff --git a/docs/adr/003-btree-multi-level-growth.md b/docs/adr/003-btree-multi-level-growth.md
@@ -0,0 +1,101 @@
+# ADR 003: B+ Tree Multi-Level Growth
+
+## Status
+Accepted
+
+## Date
+2026-05-05
+
+## Context
+
+The cloudSQL storage engine needed a durable on-disk B+ tree index capable of multi-level growth. Earlier phases implemented the slot array format (Phase 1) and find_leaf() traversal (Phase 2), but an insert into a full leaf would either fail silently or corrupt the tree structure.
+
+The problem: a B+ tree must grow to arbitrary depth through a cascade of splits. A leaf split propagates a separator to its parent internal node, which may itself split, recursively up to a new root.
+
+## Decision
+
+Implement a five-phase approach to multi-level B+ tree growth:
+
+### Phase 1: Slot Array Format
+- **Entries grow backward** from the end of the page (offset PAGE_SIZE)
+- **Slots grow forward** from just after the NodeHeader
+- Slot array: `SlotEntry { uint16_t offset, uint16_t length }`, 4 bytes each
+- Fixed-size slots give O(1) access to any entry without deserializing the other entries
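
A minimal sketch of the slotted-page layout described above. It assumes a 4096-byte page and the 12-byte NodeHeader listed under Entry Format. `kPageSize`, `slot_pos()` and `append_entry()` are hypothetical names for illustration. Unlike the real helper, which uses an out-parameter, this `get_slot()` returns the slot by value:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed values for this sketch; the real ones are Page::PAGE_SIZE and sizeof(NodeHeader).
constexpr uint32_t kPageSize = 4096;
constexpr uint32_t kHeaderSize = 12;

struct SlotEntry {
    uint16_t offset;  // byte offset of the entry from the start of the page
    uint16_t length;  // entry size in bytes
};
constexpr uint32_t kSlotSize = sizeof(SlotEntry);  // 4 bytes

// Slot i lives at a fixed position right after the header, so locating it is O(1).
inline uint32_t slot_pos(uint16_t slot_idx) { return kHeaderSize + slot_idx * kSlotSize; }

inline SlotEntry get_slot(const char* page, uint16_t slot_idx) {
    SlotEntry s{};
    std::memcpy(&s, page + slot_pos(slot_idx), kSlotSize);  // memcpy avoids unaligned access
    return s;
}

inline void put_slot(char* page, uint16_t slot_idx, const SlotEntry& s) {
    std::memcpy(page + slot_pos(slot_idx), &s, kSlotSize);
}

// Copies `len` entry bytes just below the current data area (which grows backward
// from kPageSize) and records them in slot `num_keys`. `data_start` is the lowest
// offset used by entry data (kPageSize on an empty page) and is updated in place.
// Returns false if the new slot and the new entry would overlap.
inline bool append_entry(char* page, uint16_t num_keys, uint32_t& data_start,
                         const char* entry, uint16_t len) {
    assert(data_start <= kPageSize);
    const uint32_t slots_end = slot_pos(num_keys) + kSlotSize;  // slot array end after adding one slot
    if (slots_end + len > data_start) return false;             // page full: caller must split
    data_start -= len;
    std::memcpy(page + data_start, entry, len);
    put_slot(page, num_keys, SlotEntry{static_cast<uint16_t>(data_start), len});
    return true;
}
```

Because each slot sits at a fixed offset, keeping keys in sorted order only requires shifting 4-byte slots. The variable-length entry bytes never move.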
+
+### Phase 2: find_leaf() with Binary Search
+- Traverse from root to leaf, binary-searching the separator slots of each internal node
+- `compare_separator()` compares the search key against the separator stored at a given slot
+- Returns the leaf page number directly; each level costs O(log n) separator comparisons instead of a linear scan
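
The descent can be sketched with an in-memory stand-in for internal pages. The sketch assumes integer keys. It also assumes that a key equal to a separator goes to the right child; the ADR does not state the tie rule. `InternalNodeView`, `child_index_for_key()` and the `std::map` page table are hypothetical. In the real code, each comparison would be a `compare_separator()` call on the page buffer:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical in-memory view of an internal page: n sorted separators, n + 1 children.
// Assumed convention: children[i] covers keys k with seps[i-1] <= k < seps[i].
struct InternalNodeView {
    std::vector<int> seps;
    std::vector<uint32_t> children;
};

// Binary search: returns the number of separators <= key, which is the index of
// the child to descend into. O(log n) comparisons per level.
inline std::size_t child_index_for_key(const std::vector<int>& seps, int key) {
    std::size_t lo = 0, hi = seps.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (seps[mid] <= key) {
            lo = mid + 1;  // separator <= key: the answer lies to the right
        } else {
            hi = mid;
        }
    }
    return lo;
}

// Walks from the root until it reaches a page that is not an internal node.
// Pages absent from `internal` are treated as leaves.
inline uint32_t find_leaf(const std::map<uint32_t, InternalNodeView>& internal,
                          uint32_t root, int key) {
    uint32_t page = root;
    for (auto it = internal.find(page); it != internal.end(); it = internal.find(page)) {
        page = it->second.children[child_index_for_key(it->second.seps, key)];
    }
    return page;
}
```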
+
+### Phase 3: Leaf Split (split_leaf)
+- Split at the midpoint: the upper half of the entries moves to a new right leaf
+- The `next_leaf` chain is preserved for range scans: the new right leaf inherits the old leaf's `next_leaf`, and the left leaf points to the new right leaf
+- `pending_separator_` stores the separator key for insertion into the parent
+- Returns the new right page number so the caller can link it into the parent
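
A minimal in-memory sketch of the leaf split, including the `next_leaf` rewiring. It assumes integer keys with RIDs omitted. It takes the new page number as a parameter instead of calling `allocate_page()`. It uses the first key of the new right leaf as the separator, which is standard B+ tree practice; the ADR only says the separator is kept in `pending_separator_`. `Leaf` and this `split_leaf()` signature are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Leaf {
    std::vector<int> keys;   // sorted keys (RIDs omitted in this sketch)
    uint32_t next_leaf = 0;  // 0 marks the end of the chain (assumed sentinel)
};

// Moves the upper half of leaf `page` into a new right leaf at `new_page`,
// keeps the leaf chain intact, and reports the separator for the parent.
inline uint32_t split_leaf(std::map<uint32_t, Leaf>& pages, uint32_t page,
                           uint32_t new_page, int& separator) {
    Leaf& left = pages.at(page);
    assert(left.keys.size() >= 2);
    const auto mid = static_cast<std::ptrdiff_t>(left.keys.size() / 2);

    Leaf right;
    right.keys.assign(left.keys.begin() + mid, left.keys.end());
    left.keys.erase(left.keys.begin() + mid, left.keys.end());

    right.next_leaf = left.next_leaf;  // right inherits left's old successor
    left.next_leaf = new_page;         // chain is now left -> right -> old successor
    separator = right.keys.front();    // copied up; the key also stays in the right leaf

    pages[new_page] = right;  // std::map insertion does not invalidate `left`
    return new_page;
}
```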
+
+### Phase 4: Parent Propagation (insert_into_parent / split_internal)
+- **Separator promotion**: entry at split_point is **promoted** to parent, not copied to children
+- Left node: slots [0, split_point), children [0, split_point+1)
+- Right node: slots [split_point+1, num_keys), children [split_point+1, num_keys+1)
+- Child at split_point+1 becomes leftmost child of right node after split
+- `update_child_parent()` updates parent_page pointers on all affected children
+- Split cascade: if parent is also full, recurse with promoted separator
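
The index ranges above can be checked with a small in-memory sketch. The sketch assumes integer separators and `split_point = num_keys / 2`. It also assumes a `children` vector that holds all n + 1 child pages; on disk, the rightmost child lives in `next_leaf`. `Internal` and this two-argument `split_internal()` are hypothetical simplifications of the real function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Internal {
    std::vector<int> keys;           // n sorted separators
    std::vector<uint32_t> children;  // n + 1 child pages
};

// Splits an overfull internal node in place. Following the ADR's ranges with sp = n / 2:
// left keeps keys [0, sp) and children [0, sp + 1); right receives keys [sp + 1, n)
// and children [sp + 1, n + 1). keys[sp] is returned for the parent and kept in neither node.
inline int split_internal(Internal& left, Internal& right) {
    const std::size_t n = left.keys.size();
    assert(n >= 3 && left.children.size() == n + 1);
    const std::size_t sp = n / 2;
    const auto cut = static_cast<std::ptrdiff_t>(sp + 1);
    const int promoted = left.keys[sp];

    right.keys.assign(left.keys.begin() + cut, left.keys.end());
    right.children.assign(left.children.begin() + cut, left.children.end());
    left.keys.resize(sp);          // drops keys[sp] and everything after it
    left.children.resize(sp + 1);  // child sp becomes left's rightmost child
    return promoted;
}
```

With keys {10, 20, 30, 40, 50} and children {1, 2, 3, 4, 5, 6}, 30 is promoted. Child page 4 (index sp + 1) becomes the leftmost child of the right node. Pages 4, 5 and 6 are the children whose `parent_page` must then be rewritten by `update_child_parent()`.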
+
+### Phase 5: Root Split Handling
+- Root split detected when `parent_page == 0` (root has no parent)
+- `create_new_root()` allocates new root as internal node with 1 separator
+- Both split children updated to point to new root
+- `root_page_` updated to new root page number
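
A sketch of root-split handling under the ADR's convention that `parent_page == 0` means "no parent". This only works if page 0 is never a tree node, which the sketch guarantees by allocating from page 1. `Node`, `Tree` and `is_root()` are hypothetical, as is the `next_free` counter, which stands in for `allocate_page()`:

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Node {
    uint32_t parent_page = 0;        // 0 = no parent, i.e. this node is the root
    std::vector<int> keys;
    std::vector<uint32_t> children;  // empty for leaves
};

struct Tree {
    std::map<uint32_t, Node> pages;
    uint32_t root_page = 0;
    uint32_t next_free = 1;  // hypothetical allocator; page 0 is never handed out

    bool is_root(uint32_t page) const { return pages.at(page).parent_page == 0; }

    // Called after the root `left` split into (left, right) with separator `sep`:
    // builds an internal root with one separator and two children, growing depth by one.
    void create_new_root(int sep, uint32_t left, uint32_t right) {
        const uint32_t root = next_free++;
        Node n;
        n.keys = {sep};
        n.children = {left, right};
        pages[root] = n;
        pages.at(left).parent_page = root;   // update_child_parent(left, root)
        pages.at(right).parent_page = root;  // update_child_parent(right, root)
        root_page = root;
    }
};
```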
+
+### Entry Format
+- **Leaf entry**: `type(1) + key_len(4) + key_data(N) + page_num(4) + slot_num(2)` = 11+N bytes
+- **Internal entry**: `type(1) + key_len(4) + key_data(N) + child_page_num(4)` = 9+N bytes
+- `NodeHeader`: 12 bytes: type + num_keys + parent_page + next_leaf (for internal nodes, next_leaf holds the rightmost child)
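
A sketch of the two entry layouts that makes the byte counts concrete. It makes three assumptions: the key is already available as raw bytes, integers are copied in host byte order with `memcpy`, and the type tag is passed in as a parameter. The real `serialize_entry()` / `serialize_internal_entry()` may differ on all three points. `append_header()` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Writes the shared prefix: type(1) + key_len(4) + key_data(N).
inline void append_header(std::vector<char>& out, uint8_t type_tag, const std::string& key) {
    const uint32_t key_len = static_cast<uint32_t>(key.size());
    const std::size_t pos = out.size();
    out.resize(pos + 1 + 4 + key_len);
    std::memcpy(out.data() + pos, &type_tag, 1);
    std::memcpy(out.data() + pos + 1, &key_len, 4);
    std::memcpy(out.data() + pos + 5, key.data(), key_len);
}

// Leaf entry: prefix + page_num(4) + slot_num(2) = 11 + N bytes (the RID is page + slot).
inline std::vector<char> serialize_leaf_entry(uint8_t type_tag, const std::string& key,
                                              uint32_t page_num, uint16_t slot_num) {
    std::vector<char> out;
    append_header(out, type_tag, key);
    const std::size_t pos = out.size();
    out.resize(pos + 4 + 2);
    std::memcpy(out.data() + pos, &page_num, 4);
    std::memcpy(out.data() + pos + 4, &slot_num, 2);
    return out;
}

// Internal entry: prefix + child_page_num(4) = 9 + N bytes.
inline std::vector<char> serialize_internal_entry(uint8_t type_tag, const std::string& key,
                                                  uint32_t child_page) {
    std::vector<char> out;
    append_header(out, type_tag, key);
    const std::size_t pos = out.size();
    out.resize(pos + 4);
    std::memcpy(out.data() + pos, &child_page, 4);
    return out;
}
```

For a 3-byte key, the leaf entry is 14 bytes: key_len sits at offset 1, page_num at offset 8 and slot_num at offset 12.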
+
+### Slot Access
+- `get_slot(buffer, slot_idx, out)`: reads the SlotEntry at slot_idx into `out`; returns false on failure
+- `put_slot(buffer, slot_idx, entry)`: writes the SlotEntry at slot_idx
+- `get_data_start_offset(num_keys)`: returns the start of the entry data area, which grows backward from the end of the page
+- `compute_entry_size(key)`: computes the serialized entry size for a key
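
The benefit of going through slots is that a comparison touches only one slot and one entry's key bytes. The sketch below illustrates this under two assumptions: keys are strings compared bytewise, and entries use the Entry Format layout (a type byte, then a 4-byte key_len, then the key). Typed keys such as little-endian integers cannot be compared bytewise. The real code presumably decodes just the one key (for example via `extract_key_from_entry()`) and calls `compare_keys()`. The sign convention here (separator relative to key) is also an assumption:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

constexpr std::size_t kHeaderSize = 12;  // NodeHeader size per the ADR

struct SlotEntry {
    uint16_t offset;
    uint16_t length;
};

// Returns <0, 0 or >0 as the separator in slot `idx` is less than, equal to, or
// greater than `key`. Reads only slot `idx` and that entry's key bytes.
inline int compare_separator(const char* page, uint16_t idx, const std::string& key) {
    SlotEntry s{};
    std::memcpy(&s, page + kHeaderSize + idx * sizeof(SlotEntry), sizeof(SlotEntry));
    uint32_t key_len = 0;
    std::memcpy(&key_len, page + s.offset + 1, 4);  // skip the 1-byte type tag
    const char* key_data = page + s.offset + 5;

    const std::size_t n = std::min<std::size_t>(key_len, key.size());
    const int c = std::memcmp(key_data, key.data(), n);  // bytewise comparison
    if (c != 0) return c;
    if (key_len < key.size()) return -1;  // separator is a proper prefix of key
    if (key_len > key.size()) return 1;
    return 0;
}
```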
+
+## Consequences
+
+### Positive
+- Multi-level tree growth handled correctly through split cascade
+- Root split case properly distinguished from non-root splits
+- Range scans remain correct via next_leaf chain maintained on split
+- Slot array format enables binary search without full entry deserialization
+
+### Negative
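+- A split cascade can cause many page writes per insert in the worst case: each split level writes the split node and its new sibling, `update_child_parent()` rewrites every child that moves to a new internal node, and a root split adds the new root
+- Internal node entries do not store slot_num (unlike leaf entries, which store page_num + slot_num as the RID)
+- No redistribution between siblings: nodes always split at the midpoint, so freshly split nodes are only about half full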
+
+### Neutral
+- Depth grows only when the root splits, so it increases one level at a time and all leaves remain at the same depth
+- All children of split internal nodes get correct parent pointers via update_child_parent()
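
Back-of-the-envelope capacity numbers show why depth stays small. This sketch assumes a 4096-byte page, 8-byte keys, completely full nodes, and an internal fan-out of num_keys + 1 (the extra child being the rightmost child in `next_leaf`). Nodes are often only about half full right after a midpoint split, so real capacity is lower. The function names are hypothetical:

```cpp
#include <cstdint>

constexpr uint32_t kPageBytes = 4096;  // assumed page size
constexpr uint32_t kHeaderBytes = 12;  // NodeHeader
constexpr uint32_t kSlotBytes = 4;     // SlotEntry

// Each key costs its serialized entry (9 + N or 11 + N bytes) plus one slot.
constexpr uint32_t max_internal_keys(uint32_t key_len) {
    return (kPageBytes - kHeaderBytes) / (9 + key_len + kSlotBytes);
}
constexpr uint32_t max_leaf_keys(uint32_t key_len) {
    return (kPageBytes - kHeaderBytes) / (11 + key_len + kSlotBytes);
}

// Upper bound on indexed entries with `internal_levels` levels of internal nodes above the leaves.
constexpr uint64_t max_entries(uint32_t key_len, uint32_t internal_levels) {
    uint64_t leaves = 1;
    for (uint32_t i = 0; i < internal_levels; ++i) {
        leaves *= max_internal_keys(key_len) + 1;  // fan-out = keys + 1 children
    }
    return leaves * max_leaf_keys(key_len);
}
```

Under these assumptions, an internal node holds 194 keys (fan-out 195) and a leaf holds 177 entries. A root plus one more internal level above the leaves can therefore index about 6.7 million entries at full occupancy.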
+
+## Alternatives Considered
+
+### Alternative 1: Always split at first available slot, redistribute later
+**Why rejected:** Redistribution adds complexity and requires additional writes. Midpoint split is deterministic and provides good balance.
+
+### Alternative 2: Store full entries in internal nodes (not just separators)
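+**Why rejected:** Storing full entries (keys plus RIDs) in internal nodes makes each entry larger, which lowers the branching factor and increases tree depth. Keeping only separator keys in internal nodes, with all data in the leaves, keeps internal nodes lean.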
+
+### Alternative 3: Top-down splitting (split during descent)
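+**Why rejected:** Top-down splitting splits every full node encountered during descent, even when the insert would not have propagated a split that far, which causes extra splits and page writes. Bottom-up splitting splits only nodes that actually overflow, so an insert touches only the pages it must change.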
+
+## Implementation Phases
+
+| Phase | Feature | Status |
+|-------|---------|--------|
+| 1 | Slot array format | Done |
+| 2 | find_leaf() traversal | Done |
+| 3 | split_leaf() | Done |
+| 4 | insert_into_parent() / split_internal() | Done |
+| 5 | Root split handling | Done |
+
+## Test Results
+- 29/29 BTreeIndexTests pass
+- 1 pre-existing failure: BTreeIndexNextLeafTests.ScanIterator_NextLeaf (page format mismatch: the test builds raw pages in the layout that predates the slot array format)
diff --git a/include/storage/btree_index.hpp b/include/storage/btree_index.hpp
@@ -34,9 +34,23 @@ class BTreeIndex {
         NodeType type;
         uint16_t num_keys;
         uint32_t parent_page;
-        uint32_t next_leaf;  // For leaf nodes
+        uint32_t next_leaf;  // For leaf nodes: next leaf page. For internal: rightmost child.
     };
 
+    /**
+     * @brief Slot entry — points to an entry in the data area of a page.
+     * Slot array grows forward from after NodeHeader.
+     * Entry data grows backward from end of page.
+     */
+    struct SlotEntry {
+        uint16_t offset;   // Byte offset from start of page to entry data
+        uint16_t length;   // Entry size in bytes
+    };
+
+    static constexpr uint16_t kSlotSize = sizeof(SlotEntry);  // 4 bytes per slot
+    static constexpr uint16_t kMaxSlots =
+        (Page::PAGE_SIZE - sizeof(NodeHeader)) / sizeof(SlotEntry);  // Upper bound; ignores space taken by entry data
+
     /**
      * @brief Index entry (Key + TupleId)
      */
@@ -71,6 +85,7 @@ class BTreeIndex {
     BufferPoolManager& bpm_;
     common::ValueType key_type_;
     uint32_t root_page_ = 0;
+    common::Value pending_separator_;  // Separator from the last leaf split, awaiting insertion into the parent
 
    public:
     BTreeIndex(std::string index_name, BufferPoolManager& bpm, common::ValueType key_type);
@@ -87,6 +102,7 @@ class BTreeIndex {
 
     [[nodiscard]] const std::string& index_name() const { return index_name_; }
     [[nodiscard]] common::ValueType key_type() const { return key_type_; }
+    [[nodiscard]] uint32_t root_page() const { return root_page_; }
 
     bool create();
     bool open();
@@ -103,12 +119,47 @@ class BTreeIndex {
    private:
     /* Internal B-tree logic */
     [[nodiscard]] uint32_t find_leaf(const common::Value& key) const;
-    void split_leaf(uint32_t page_num, char* buffer);
-    // void split_internal(...) // TODO phase 2
+    [[nodiscard]] uint32_t split_leaf(uint32_t page_num, char* buffer);
+    bool split_internal(uint32_t page_num, char* buffer, uint16_t insert_pos,
+                        uint32_t left_child, uint32_t right_child,
+                        uint32_t& out_right_page);
 
     bool read_page(uint32_t page_num, char* buffer) const;
     bool write_page(uint32_t page_num, const char* buffer);
     [[nodiscard]] uint32_t allocate_page();
+
+    /* Slot array helpers */
+    [[nodiscard]] uint16_t get_data_start_offset(uint16_t num_keys) const;
+    [[nodiscard]] uint16_t compute_entry_size(const common::Value& key) const;
+    [[nodiscard]] bool get_slot(const char* buffer, uint16_t slot_idx, SlotEntry& out) const;
+    bool put_slot(char* buffer, uint16_t slot_idx, const SlotEntry& entry);
+    bool append_entry_at(char* buffer, uint16_t slot_idx, const SlotEntry& entry,
+                         const common::Value& key, HeapTable::TupleId tuple_id);
+
+    /* Entry serialization */
+    [[nodiscard]] bool serialize_entry(const common::Value& key, HeapTable::TupleId tuple_id,
+                                      char* out_buf, uint16_t buf_size,
+                                      uint16_t& bytes_written) const;
+    [[nodiscard]] bool deserialize_entry(const char* buf, uint16_t buf_size,
+                                        common::Value& out_key,
+                                        HeapTable::TupleId& out_tuple_id) const;
+
+    /* Key comparison */
+    [[nodiscard]] int compare_keys(const common::Value& a, const common::Value& b) const;
+
+    /* Internal node navigation */
+    [[nodiscard]] uint32_t find_child_for_key(const char* buffer, const common::Value& key, uint16_t num_keys) const;
+    [[nodiscard]] uint32_t get_child_page(const char* buffer, uint16_t slot_idx) const;
+    [[nodiscard]] int compare_separator(const char* buffer, uint16_t sep_idx, const common::Value& key) const;
+
+    /* Internal node insertion (Phase 4/5) */
+    [[nodiscard]] common::Value extract_key_from_entry(const char* entry_ptr, uint16_t entry_length) const;
+    [[nodiscard]] bool serialize_internal_entry(const common::Value& key, uint32_t child_page_num,
+                                                char* out_buf, uint16_t buf_size,
+                                                uint16_t& bytes_written) const;
+    bool insert_into_parent(const common::Value& sep_key, uint32_t left_page, uint32_t right_page);
+    bool create_new_root(const common::Value& sep_key, uint32_t left_child, uint32_t right_child);
+    bool update_child_parent(uint32_t child_page, uint32_t parent_page);
 };
 
 }  // namespace cloudsql::storage