🌾 Smallholder Agricultural Yield Prediction & Causal Inference Pipeline

Developed by Unity Aaron

🔗 Live Interactive Web Application: Click Here to Launch the App

An end-to-end data science and machine learning system engineered to analyze, predict, and optimize smallholder farmer crop yields in Edo State. This repository transitions an initial exploratory analysis into a production-ready, automated predictive pipeline featuring advanced feature engineering and an interactive web deployment interface.

📊 Project Architecture & System Overview

The project is built using a modular framework designed for scalability and professional code separation:

exploratory_analysis/: Contains the statistical identification of data structures, distribution shifts, and bimodal yield characteristics.
statistical_inference/: Houses multiple OLS regression modeling used to isolate the true treatment effects of agricultural extension training while strictly controlling for confounding physical resource factors (Farm Size, Input Volumes).
machine_learning_model/: Houses the predictive engines, including the optimized Random Forest Regressor champion model, cross-validation scripts, and frozen deployment assets (.pkl).
run_pipeline.py: An automated, single-click ETL pipeline that ingests raw operational spreadsheets, runs category mapping, executes feature calculations, and exports live predictions.
app.py: A public-facing interactive web application allowing non-technical stakeholders to interface with the predictive model in real time.

🧠 Key Analytical Discoveries & Performance Metrics

1. Causal Inference Check (Combating the Umbrella Illusion)

Through multiple linear regression, the intervention isolated the standalone impact of extension training. The training treatment effect remained statistically robust ($p < 0.001$) even when physical factor endowments were mathematically controlled, proving that yield gains are actively driven by capacity building rather than baseline farmer resource wealth.

2. Feature Engineering Multiplier Effect

By introducing an engineered interaction variable ($\text{Fertilizer Used} \times \text{Training Attended}$), the Random Forest model isolated the combined impact of physical inputs and technical knowledge.

Primary Predictive Driver: The interaction feature achieved the single highest predictive weight (40.5% Feature Importance), proving that fertilizer efficiency is structurally unlocked when combined with extension education.

3. Model Validation Stability

To guard against lucky data splits, the system was validated using a strict 5-Fold Cross-Validation protocol:

Out-of-Sample Performance: Achieved a cross-validated Average Mean Absolute Error (MAE) of 2.41%.
System Consistency: Recorded an exceptionally low Standard Deviation of 0.37%, demonstrating immense model stability across varying farmer sub-populations.

🛠️ Installation & Operational Deployment

Follow these terminal instructions on your local machine to deploy the automated pipeline and interface dashboard:

1. Clone the Architecture

git clone [https://github.com/unityaaron/DATA_SCIENCE_MACHINE_LEARNING.git](https://github.com/unityaaron/DATA_SCIENCE_MACHINE_LEARNING.git)
cd DATA_SCIENCE_MACHINE_LEARNING

2. Run the Automated Production Pipeline

To process raw farmer documents and export an automated predictions spreadsheet automatically, execute:

python run_pipeline.py

3. Launch the Interactive Web Dashboard

To deploy the live local user interface on your browser, execute:

streamlit run app.py

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
exploratory_analysis		exploratory_analysis
machine_learning_model		machine_learning_model
statistical_inference		statistical_inference
README.md		README.md
app.py		app.py
final_automated_predictions.xlsx		final_automated_predictions.xlsx
jpil_farmers_dataset.xlsx		jpil_farmers_dataset.xlsx
requirements.txt		requirements.txt
run_pipeline.py		run_pipeline.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌾 Smallholder Agricultural Yield Prediction & Causal Inference Pipeline

Developed by Unity Aaron

📊 Project Architecture & System Overview

🧠 Key Analytical Discoveries & Performance Metrics

1. Causal Inference Check (Combating the Umbrella Illusion)

2. Feature Engineering Multiplier Effect

3. Model Validation Stability

🛠️ Installation & Operational Deployment

1. Clone the Architecture

2. Run the Automated Production Pipeline

3. Launch the Interactive Web Dashboard

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🌾 Smallholder Agricultural Yield Prediction & Causal Inference Pipeline

Developed by Unity Aaron

📊 Project Architecture & System Overview

🧠 Key Analytical Discoveries & Performance Metrics

1. Causal Inference Check (Combating the Umbrella Illusion)

2. Feature Engineering Multiplier Effect

3. Model Validation Stability

🛠️ Installation & Operational Deployment

1. Clone the Architecture

2. Run the Automated Production Pipeline

3. Launch the Interactive Web Dashboard

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages