A Data Science Project by Javin Chutani
Author: Javin Chutani
Project Type: Machine Learning & Data Analytics
Status: Active
This project explores mental health challenges faced by employees in the tech industry using machine learning and data analytics. By analyzing survey data from tech workers, the project develops predictive models and insights to help organizations create better mental health support systems.
- Classification Task - Predict whether an individual is likely to seek mental health treatment based on workplace and personal factors
- Regression Task - Predict the age of individuals to design age-targeted interventions
- Clustering Analysis - Segment tech employees into distinct groups based on mental health indicators for tailored HR policies
- Exploratory Data Analysis (EDA) - Comprehensive visualization and statistical analysis
- Machine Learning Models
  - Random Forest Classifier
  - XGBoost Classifier
  - Logistic Regression
  - Random Forest Regressor
  - K-Means Clustering
- Interactive Dashboard - Streamlit web application for model predictions and insights
- Model Performance Metrics - ROC curves, confusion matrices, and detailed evaluation
- Data Visualizations - Univariate, bivariate, and multivariate analysis
- Docker Support - Containerized deployment for easy setup
188nmv/
│
├── Images/                            # Visualization outputs
│   ├── bivariate1.png
│   ├── bivariate2.png
│   ├── cluster0.png
│   ├── cluster1.png
│   ├── cluster2.png
│   ├── cluster3.png
│   ├── dimred.png
│   ├── multivariate1.png
│   ├── multivariate2.png
│   ├── ROC Curve - Classification.png
│   ├── univariate1.png
│   └── univariate2.png
│
├── Models & Dataset/                  # Trained models and processed data
│   ├── classification_model.pkl
│   ├── regression_model.pkl
│   └── df.pkl
│
├── Notebooks/                         # Jupyter notebooks for analysis
│   ├── EDA.ipynb                      # Exploratory Data Analysis
│   ├── classification_model.ipynb     # Classification model training
│   ├── regression_model.ipynb         # Regression model training
│   └── clustering.ipynb               # Clustering analysis
│
├── .devcontainer/                     # Development container config
│   └── devcontainer.json
│
├── app.py                             # Streamlit web application
├── survey.csv                         # Raw dataset
├── requirements.txt                   # Python dependencies
├── Dockerfile                         # Docker configuration
├── .dockerignore                      # Docker ignore file
└── README.md                          # Project documentation
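The pickled files in Models & Dataset/ are the artifacts the Streamlit app would need at startup. The sketch below loads them with joblib, which the project lists for model serialization. It assumes the files were written with joblib.dump; the load_artifacts helper and its short key names are hypothetical and are not taken from app.py.

```python
from pathlib import Path

import joblib

# Directory and file names copied from the project tree; paths are relative to the repo root.
ARTIFACT_DIR = Path("Models & Dataset")
ARTIFACT_FILES = {
    "classifier": "classification_model.pkl",
    "regressor": "regression_model.pkl",
    "data": "df.pkl",
}


def load_artifacts(base_dir: Path = ARTIFACT_DIR) -> dict:
    """Load every pickled artifact from base_dir, keyed by a short name.

    Hypothetical helper: assumes each file was saved with joblib.dump.
    """
    artifacts = {}
    for name, filename in ARTIFACT_FILES.items():
        path = Path(base_dir) / filename
        if not path.exists():
            # Fail early with the full path instead of a less helpful unpickling error.
            raise FileNotFoundError(f"Missing artifact: {path}")
        artifacts[name] = joblib.load(path)
    return artifacts
```

Called from the repository root, load_artifacts() would return a dict holding the two trained models and the processed DataFrame.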
- Python 3.11 or higher
- pip package manager
- Docker (optional, for containerized deployment)
- Clone the repository
  git clone https://github.com/javin1106/188nmv.git
  cd 188nmv
- Install dependencies
  pip install -r requirements.txt
- Run the Streamlit app
  streamlit run app.py
- Access the application
  - Open your browser and navigate to http://localhost:8501
The Docker image is built on a Python 3.11 base image, matching the minimum Python version listed in the prerequisites.
- Pull the prebuilt image from Docker Hub
  # Pull the latest image
  docker pull javin1106/mental-health-app:latest
  # Run the container
  docker run -p 8501:8501 javin1106/mental-health-app:latest
- Or build the image locally
  # Build the image
  docker build -t mental-health-app .
  # Run the container
  docker run -p 8501:8501 mental-health-app
- Or use Docker Compose
  docker-compose up
- Access the application
  - Open your browser and navigate to http://localhost:8501
Source: Mental Health in Tech Survey - Kaggle
The dataset contains responses from tech employees regarding:
- Demographics (age, gender, country)
- Work environment characteristics
- Mental health history
- Workplace mental health benefits
- Attitudes toward mental health treatment
Data preprocessing:
- Handling missing values
- Feature engineering
- Encoding categorical variables
- Data normalization and scaling
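The preprocessing steps above map naturally onto a scikit-learn ColumnTransformer. This is a minimal sketch, not the notebooks' actual code: the column lists use names from the public Kaggle survey (Age, Gender, family_history, remote_work, benefits) as assumptions, and the imputation strategies are illustrative choices.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists: adjust them to the columns the notebooks actually use.
NUMERIC_COLS = ["Age"]
CATEGORICAL_COLS = ["Gender", "family_history", "remote_work", "benefits"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute and one-hot encode categorical columns."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing numbers with the median
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill missing labels with the mode
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # unseen categories encode as all zeros
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])
```

Wrapping this preprocessor and an estimator in one Pipeline would apply identical transformations at training time and when the dashboard makes predictions.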
Exploratory data analysis:
- Univariate, bivariate, and multivariate analysis
- Correlation analysis
- Distribution plots and statistical summaries
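Two small pandas helpers illustrate the bivariate and correlation analysis described above: the share of respondents who sought treatment within each level of a feature, and a correlation matrix over Yes/No answers. The column names and the "Yes"/"No" coding are assumptions based on the public survey, and both helpers are hypothetical rather than taken from EDA.ipynb.

```python
import pandas as pd


def treatment_rate_by(df: pd.DataFrame, column: str, target: str = "treatment") -> pd.Series:
    """Share of respondents whose target answer is "Yes", for each level of column."""
    sought = df[target].eq("Yes")  # boolean mask; its mean is the Yes rate
    return sought.groupby(df[column]).mean().sort_values(ascending=False)


def yes_no_correlation(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Pearson correlation between Yes/No columns after mapping Yes -> 1 and No -> 0."""
    # Any other answer (e.g. "Don't know") becomes NaN and is excluded pairwise by corr().
    encoded = df[columns].apply(lambda s: s.map({"Yes": 1, "No": 0}))
    return encoded.corr()
```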
Classification model:
- Target Variable: Treatment-seeking behavior
- Algorithms: Logistic Regression, Random Forest, XGBoost
- Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC
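A compact sketch of the evaluation loop implied by the metrics above: fit each candidate, then compute accuracy, precision, recall, F1 and ROC-AUC on a held-out split. It runs on synthetic data from make_classification as a stand-in for the encoded survey features, so its numbers say nothing about the project's results. XGBoost is left out to keep the example dependency-light; xgboost.XGBClassifier exposes the same fit/predict/predict_proba interface and could be added to the candidates dict.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate_classifier(model, X_train, X_test, y_train, y_test) -> dict:
    """Fit a binary classifier and score it on the held-out split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard 0/1 labels for the threshold metrics
    y_score = model.predict_proba(X_test)[:, 1]  # P(class 1), needed for ROC-AUC
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }


# Synthetic stand-in for the encoded survey (1 = sought treatment); not the real data.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
results = {
    name: evaluate_classifier(model, X_train, X_test, y_train, y_test)
    for name, model in candidates.items()
}
```

Stratifying the split keeps the share of treatment-seekers the same in the training and test sets, which keeps precision and recall comparable across models.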
Regression model:
- Target Variable: Age
- Algorithms: Linear Regression, Random Forest Regressor
- Evaluation Metrics: RMSE, MAE, R² Score
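For the age-regression task, the three metrics can be computed as below. The sketch compares Linear Regression and a Random Forest Regressor on synthetic data from make_regression, which stands in for the survey features and the Age target; the regression_report helper is hypothetical. RMSE is taken as the square root of mean_squared_error rather than through the squared=False flag, which recent scikit-learn releases deprecated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def regression_report(y_true, y_pred) -> dict:
    """RMSE, MAE and R^2 for one set of predictions."""
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),  # in the target's units (years for Age)
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),  # 1.0 is perfect; 0.0 matches always predicting the mean
    }


# Synthetic stand-in for encoded features and an Age target; not the real data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reports = {}
for name, model in {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}.items():
    model.fit(X_train, y_train)
    reports[name] = regression_report(y_test, model.predict(X_test))
```

RMSE is always at least as large as MAE; a wide gap between them points to a few large errors, for example on unusually young or old respondents.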
Clustering:
- Algorithm: K-Means Clustering
- Purpose: Segmentation of employees based on mental health patterns
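A minimal K-Means segmentation sketch, assuming the features are already numeric (encoded). The default of four clusters is an assumption inferred from the cluster0.png to cluster3.png figures in Images/, and silhouette_by_k shows one way to compare other values of k. Synthetic blobs from make_blobs stand in for the survey data, and both helpers are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def segment(X, n_clusters: int = 4, random_state: int = 42):
    """Standardize features, fit K-Means, and return (labels, silhouette score)."""
    # K-Means uses Euclidean distance, so features must share a common scale.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X_scaled)
    # Silhouette closer to 1 means tighter, better-separated segments.
    return labels, silhouette_score(X_scaled, labels)


def silhouette_by_k(X, k_values=range(2, 7)) -> dict:
    """Silhouette score for each candidate number of clusters."""
    return {k: segment(X, n_clusters=k)[1] for k in k_values}


# Synthetic stand-in for encoded mental-health indicators; not the real data.
X, _ = make_blobs(n_samples=400, centers=4, n_features=5, random_state=7)
labels, score = segment(X)
scores = silhouette_by_k(X)
```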
The models demonstrate strong predictive capabilities in identifying:
- Employees at risk of mental health issues
- Key workplace factors influencing mental wellness
- Distinct employee segments requiring different support strategies
For detailed results and insights, please refer to the Technical Report.
Experience the interactive dashboard: Launch App
- Python - Core programming language
- Pandas & NumPy - Data manipulation and analysis
- Scikit-learn - Machine learning algorithms
- XGBoost - Gradient boosting framework
- Matplotlib & Seaborn - Data visualization
- Streamlit - Web application framework
- Joblib - Model serialization
- Docker - Containerization and deployment
- Family history of mental health issues is a strong predictor of treatment seeking
- Remote work policies impact mental wellness differently across demographics
- Company size and benefits significantly influence employee mental health
- Age-specific interventions can improve support program effectiveness
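One way to check a finding such as "family history is a strong predictor" is to rank features by permutation importance on a held-out split. The sketch below does this with scikit-learn's permutation_importance and a Random Forest. The data is synthetic: the target is built from a hypothetical family_history column with 10% label noise, and the other column names are also hypothetical, so it only demonstrates the mechanics and does not reproduce the project's findings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_features(X: pd.DataFrame, y, random_state: int = 0) -> pd.Series:
    """Rank columns by the mean drop in held-out accuracy when each one is shuffled."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    model.fit(X_train, y_train)
    result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=random_state
    )
    return pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)


# Synthetic data: the target mostly follows the hypothetical family_history column.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "family_history": rng.integers(0, 2, n),
    "remote_work": rng.integers(0, 2, n),
    "company_size": rng.integers(0, 6, n),
})
flip = rng.random(n) < 0.1  # 10% label noise
y = np.where(flip, 1 - X["family_history"], X["family_history"])
ranking = rank_features(X, y)
```

Permutation importance is measured on held-out data, so it is less prone than impurity-based importances to favoring features with many distinct values.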
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
This project is open source and available for educational and research purposes.
- Dataset Source: OSMI Mental Health in Tech Survey
- Kaggle Community - For making valuable datasets accessible
Javin Chutani
- GitHub: @javin1106
- Medium: @javin.chutani
- Docker Hub: javin1106/mental-health-app
Made with ❤️ for improving mental health awareness in tech
⭐ Star this repository if you find it helpful!