Conversation
This PR addresses tasks (TODO 1-4):
- Add implementation for loading the Titanic dataset, cleaning irrelevant columns, renaming `Pclass`, and creating a stratified train-test split.
- Implement a naive baseline model for accuracy comparison.
- Update environment dependencies by removing the `pandasgui` package, which is incompatible with Windows.
- Configure the project for `uv` package management.
Code Review
This pull request adds several new sections to the machine learning course, including anomaly detection methods, hyperparameter tuning with GridSearchCV, and ensemble techniques like Voting and Stacking. It also introduces a Titanic dataset analysis script and removes the pandasgui dependency. The review feedback identifies a data leakage issue in the LocalOutlierFactor implementation, suggesting the use of novelty=True for proper test set evaluation. Other recommendations include moving imports to the top level to comply with PEP 8, simplifying model selection logic using a dictionary, and adopting a majority-class baseline for more stable performance benchmarking.
```python
# Define the anomaly detection methods
methods = {
    'IsolationForest': ensemble.IsolationForest(contamination=0.1, random_state=42),
    'LocalOutlierFactor': neighbors.LocalOutlierFactor(n_neighbors=20),
```
To use LocalOutlierFactor for detecting anomalies on new data (the test set), you should set novelty=True. This allows you to fit the model on the training data and then call predict on new, unseen data. The current approach of using fit_predict on the test set (in lines 317 and 325) leads to data leakage.
Suggested change:
```diff
- 'LocalOutlierFactor': neighbors.LocalOutlierFactor(n_neighbors=20),
+ 'LocalOutlierFactor': neighbors.LocalOutlierFactor(n_neighbors=20, novelty=True),
```
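To illustrate the suggested fix, here is a minimal self-contained sketch (on synthetic data, not the PR's Titanic features) of how `novelty=True` separates fitting from prediction:

```python
# Minimal sketch with synthetic data (not the PR's features): novelty=True
# lets LOF be fitted once on training data and then score unseen points,
# avoiding the leakage of calling fit_predict on the test set.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X_train = rng.normal(0, 1, size=(200, 2))            # inlier cluster
X_test = np.vstack([rng.normal(0, 1, size=(5, 2)),   # more inliers
                    [[8.0, 8.0]]])                   # an obvious outlier

lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)               # fit on training data only
y_pred = lof.predict(X_test)   # +1 = inlier, -1 = outlier
```

Without `novelty=True`, calling `predict` on a fitted `LocalOutlierFactor` raises an error, which is why the original code resorted to `fit_predict` on the test set.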
```python
if method_name == 'LocalOutlierFactor':
    method.fit(X_train_features)
    y_pred_train = method.fit_predict(X_train_features)
    y_pred_test = method.fit_predict(X_test_features)
else:
    method.fit(X_train_features)
    y_pred_train = method.predict(X_train_features)
    y_pred_test = method.predict(X_test_features)
```
```python
# Predict on mesh
if method_name == 'LocalOutlierFactor':
    Z = method.fit_predict(np.c_[xx.ravel(), yy.ravel()])
else:
    Z = method.predict(np.c_[xx.ravel(), yy.ravel()])
```
After setting novelty=True for LocalOutlierFactor (as suggested for line 291), this block can be greatly simplified. The special if/else branch for LocalOutlierFactor is no longer necessary, and it introduces data leakage by re-fitting the model on evaluation data. A single, unified approach works for all models. Additionally, the y_pred_train variable is unused.
```python
method.fit(X_train_features)
y_pred_test = method.predict(X_test_features)
# Predict on mesh
Z = method.predict(np.c_[xx.ravel(), yy.ravel()])
```
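A runnable sketch of this unified path, using IsolationForest on synthetic data (variable names mirror the snippet above but the data is illustrative):

```python
# Sketch: once every detector exposes predict() (LOF via novelty=True),
# a single code path handles the test set and the plotting mesh alike.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train_features = rng.normal(0, 1, size=(100, 2))

method = IsolationForest(contamination=0.1, random_state=42)
method.fit(X_train_features)

# Build a mesh over the feature plane and classify every grid point
xx, yy = np.meshgrid(np.linspace(-4, 4, 50), np.linspace(-4, 4, 50))
Z = method.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

`Z` can then be passed straight to a contour plot; no per-model branching is needed.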
```python
print_function_name()

from sklearn import covariance
```
According to the PEP 8 style guide, all imports should be at the top of the file. This improves readability and makes it easier to see the script's dependencies at a glance. I've noticed several imports inside functions (e.g., in todo_6, todo_7, todo_10). Please consider moving them to the top-level of the module.
References
- "Imports should usually be on separate lines. Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants." (PEP 8)
```python
if best_model_name == 'Voting (Soft)':
    best_model = voting_soft
elif best_model_name == 'Voting (Hard)':
    best_model = voting_hard
elif best_model_name == 'Stacking':
    best_model = stacking_clf
elif best_model_name == 'SVM':
    best_model = clf_svm
elif best_model_name == 'Decision Tree':
    best_model = clf_dt
elif best_model_name == 'KNN':
    best_model = clf_knn
else:  # Logistic Regression
    best_model = clf_lr
```
This if/elif chain can be simplified by using a dictionary to map model names to model objects. This makes the code more concise and easier to maintain if you add more models.
```python
model_map = {
    'Voting (Soft)': voting_soft,
    'Voting (Hard)': voting_hard,
    'Stacking': stacking_clf,
    'SVM': clf_svm,
    'Decision Tree': clf_dt,
    'KNN': clf_knn,
    'Logistic Regression': clf_lr,
}
best_model = model_map[best_model_name]
```

```python
def todo_4(y_test):
    print("\n--- TODO 4 ---")
    # Naive baseline: random guess based on random probability
    y_pred_random = np.random.choice([0, 1], size=len(y_test))
    accuracy = metrics.accuracy_score(y_test, y_pred_random)
    print(f"Random baseline accuracy: {accuracy:.4f}")
```
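One caveat on the dictionary-dispatch suggestion: plain indexing raises KeyError for any name outside the map, whereas the original else branch silently fell back to Logistic Regression. A hypothetical sketch (strings stand in for the PR's fitted classifier objects) showing how `dict.get` preserves that fallback:

```python
# Hypothetical sketch of the dictionary dispatch; strings stand in for the
# PR's fitted classifier objects (clf_svm, clf_lr, ...).
clf_svm = "svm-model"
clf_lr = "logreg-model"

model_map = {
    'SVM': clf_svm,
    'Logistic Regression': clf_lr,
}

# model_map[name] raises KeyError for unknown names; .get() keeps the
# original else-branch behaviour of defaulting to Logistic Regression.
best_model = model_map.get('SVM', clf_lr)
fallback = model_map.get('Gradient Boosting', clf_lr)
```

Whether a loud KeyError or a silent fallback is preferable depends on whether an unknown model name should ever reach this point.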
A random baseline's performance varies with each run, which isn't ideal for a stable benchmark. A more robust and standard naive baseline is to always predict the majority class from the training data. This provides a deterministic score to compare against.
Note: You'll need to update the function signature to accept y_train and update the call in main to todo_4(y_train, y_test).
Suggested change:
```diff
- def todo_4(y_test):
-     print("\n--- TODO 4 ---")
-     # Naive baseline: random guess based on random probability
-     y_pred_random = np.random.choice([0, 1], size=len(y_test))
-     accuracy = metrics.accuracy_score(y_test, y_pred_random)
-     print(f"Random baseline accuracy: {accuracy:.4f}")
+ def todo_4(y_train, y_test):
+     print("\n--- TODO 4 ---")
+     # Naive baseline: always predict the most frequent class from the training set
+     most_frequent_class = y_train.mode()[0]
+     y_pred_baseline = np.full_like(y_test, fill_value=most_frequent_class)
+     accuracy = metrics.accuracy_score(y_test, y_pred_baseline)
+     print(f"Majority class baseline accuracy: {accuracy:.4f}")
```
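Worth noting: scikit-learn ships this exact baseline as `DummyClassifier`, which avoids hand-rolling `mode()`/`np.full_like`. A minimal sketch on toy arrays (not the PR's Titanic split):

```python
# Sketch: DummyClassifier(strategy='most_frequent') implements the
# majority-class baseline; features are ignored, only y_train matters.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn import metrics

y_train = np.array([0, 0, 0, 1, 1])    # majority class is 0
y_test = np.array([0, 1, 0, 0])
X_train = np.zeros((len(y_train), 1))  # dummy features, unused by the model
X_test = np.zeros((len(y_test), 1))

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_train, y_train)
y_pred = baseline.predict(X_test)                  # predicts 0 everywhere
accuracy = metrics.accuracy_score(y_test, y_pred)  # 3 of 4 correct -> 0.75
```

Like the suggested rewrite, this is deterministic, so the benchmark score is stable across runs.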
Description
This PR addresses TODO tasks 1 through 4 for Lab 05, establishing a functional baseline for the Titanic classification experiment.
Changes
- Removed irrelevant columns (`boat`, `body`) and non-essential ones (`home.dest`).
- Renamed `Pclass` to `TicketClass` for clarity.
- Updated `pyproject.toml` to transition to `uv` for dependency management.
- Removed `pandasgui` from dependencies to resolve installation issues on Windows.

Verification
Ran the script with `uv run` to confirm successful data processing and baseline accuracy calculation.

/gemini review