Welcome to the Data Visualization with Python repository! This project is a step-by-step guide to the fundamentals and advanced techniques of data visualization using two of Python's most widely used plotting libraries: Matplotlib and Seaborn.
Whether you are a beginner looking to understand basic charts or an intermediate developer wanting to create complex subplots, heatmaps, and publication-ready figures, this repository covers the entire learning path.
The project follows a standardized layout designed for ease of use and clean version control:
Data_Visualization/
│
├── data/                              # Placeholder for external datasets (CSV, JSON, etc.)
│
├── notebooks/                         # Step-by-step Jupyter Notebook tutorials
│   ├── 1-matplotlib_tutorial.ipynb    # Part 1 & Part 2: Lines, Bars, Scatters, Pies, Histograms, Box/Stack/Subplots
│   └── 2-seaborn_tutorial.ipynb       # Part 2: Seaborn basics, Relational, Categorical, Distribution Plots & Heatmaps
│
├── outputs/                           # Directory where generated and saved plots will be stored
│
├── .gitattributes                     # Git attribute configurations
├── .gitignore                         # Standard gitignore for Python and Jupyter artifacts
├── LICENSE                            # Standard MIT open-source license
├── requirements.txt                   # Dependencies needed to run the notebooks
└── README.md                          # Main repository guide (you are here!)
The tutorial is split into three parts, ordered sequentially to build your skills:
- What is Data Visualization? & How to Plot Data.
- Introduction to Matplotlib & Important Plot Methods.
- Line Plots: Multiple datasets, format strings (colors/styles/markers), and grid settings.
- Vertical & Horizontal Bar Charts: Bar labels, multiple datasets, and categories.
- Scatter Plots: Customizations (colors/sizes), overlays, and annotations.
- Pie Charts: Exploded slices, custom color palettes, shadows, and percentages.
- Styling & Saving Plots: Implementing legends, grid lines, titles, and high-res exports (savefig()).
- Histograms: Understanding frequency distributions, plotting multiple datasets, and adding threshold markers (axvline).
- Box Plots: Visualizing data quartiles, medians, and outliers, along with common box plot operations.
- Stack Plots: Displaying cumulative changes over time for multiple variables.
- Subplots: Creating grids of plots (multiple axes in a single figure) using plt.subplots().
- Modern Matplotlib Styles: Incorporating styling rules and modern design aesthetics.
- Practice Task (Weekly Temperature): Practical exercise putting all Matplotlib concepts to work.
- Introduction to Seaborn: Why Seaborn is used, and how it simplifies plotting complex data frames.
- Creating Plots with Seaborn: Custom styles, grid configurations, and built-in theme presets.
- Relational Plots (relplot): Visualizing statistical relationships (scatter plots and line plots) with color/size semantics.
- Categorical Plots (catplot): Bar plots, box plots, violin plots, and strip plots grouped by category.
- Distribution Plots (displot): KDE (Kernel Density Estimate) plots, histograms, and cumulative distributions.
- Heatmaps (heatmap): Visualizing correlation matrices, pivot tables, and relational grid weights with color bars.
- Best Practices for Data Visualization: Final tips on layout structure, font scaling, palette choices, and statistical communication.
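As one illustration of the heatmap topic listed above, the following is a minimal sketch of a correlation-matrix heatmap. It is not taken from the notebooks: the helper name `correlation_heatmap`, the synthetic data, and the column names are all assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def correlation_heatmap(df):
    """Draw an annotated heatmap of the pairwise correlations in df (hypothetical helper)."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    # Fix the color scale to [-1, 1] so colors are comparable between figures.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation Matrix")
    fig.tight_layout()
    return corr, ax


# Synthetic data, invented for illustration only.
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=200)
demo = pd.DataFrame({
    "hours_studied": hours,
    "exam_score": 50 + 4 * hours + rng.normal(0, 5, size=200),
    "noise": rng.normal(0, 1, size=200),
})

corr, ax = correlation_heatmap(demo)
```

In a notebook, following this with `plt.show()` displays the figure, and `ax.figure.savefig(...)` writes it to disk.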
Follow these steps to run the project locally on your machine:
Make sure you have Python 3.8+ installed. You can check your Python version by running:
    python --version
- Clone this repository:
    git clone https://github.com/your-username/your-repo-name.git
    cd your-repo-name
- Create and activate a virtual environment (optional but recommended):
  - On macOS/Linux:
      python -m venv venv
      source venv/bin/activate
  - On Windows:
      python -m venv venv
      venv\Scripts\activate
- Install the required dependencies:
    pip install -r requirements.txt
- Launch Jupyter Notebook:
    jupyter notebook
Navigate to the notebooks/ directory and open matplotlib_tutorial.ipynb, 1-matplotlib_tutorial.ipynb, or 2-seaborn_tutorial.ipynb to begin!
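After installing, a quick sanity check can confirm that the interpreter and packages are in place. The sketch below uses only the standard library. The package list in `REQUIRED` is an assumption (requirements.txt in the repository is the source of truth), and both helper names are hypothetical.

```python
import importlib.util
import sys

# Assumed package list for illustration; requirements.txt is authoritative.
REQUIRED = ["matplotlib", "seaborn", "numpy", "pandas"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python OK:", python_is_supported())
print("Missing packages:", missing_packages(REQUIRED) or "none")
```

If any packages are reported missing, re-running `pip install -r requirements.txt` inside the activated virtual environment is the likely fix.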
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), dpi=300)
# Left Plot: Sine
ax1.plot(x, np.sin(x), color='#2ca02c', linewidth=2)
ax1.set_title('Sine Wave')
ax1.grid(True, linestyle=':', alpha=0.6)
# Right Plot: Cosine
ax2.plot(x, np.cos(x), color='#d62728', linewidth=2, linestyle='--')
ax2.set_title('Cosine Wave')
ax2.grid(True, linestyle=':', alpha=0.6)
plt.tight_layout()
plt.savefig('outputs/matplotlib_subplots.png')
plt.show()

import seaborn as sns
import matplotlib.pyplot as plt
# Load a built-in dataset
tips = sns.load_dataset("tips")
# Create a relational scatter plot
plt.figure(figsize=(8, 5))
sns.scatterplot(
    data=tips,
    x="total_bill",
    y="tip",
    hue="smoker",
    style="time",
    size="size",
    palette="Set2"
)
plt.title('Tips Analysis by Bill Amount & Smoker Status', fontsize=12, fontweight='bold')
plt.savefig('outputs/seaborn_relational_plot.png', bbox_inches='tight')
plt.show()

When styling your plots, follow these rules:
- Colors & Contrast: Avoid high-contrast primary colors. Use custom hex codes or Seaborn's preset palettes (like muted, deep, or custom color maps).
- Readability: Use plt.tight_layout() or bbox_inches='tight' to avoid cutting off axis labels when exporting figures.
- Information Hierarchy: Highlight critical points (anomalies or limits) using line markers like axvline or text annotations.
- Density plots: Use Seaborn's kdeplot or reduced marker opacity (alpha) when plotting overlapping scatter distributions.
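Several of the rules above can be combined in one figure. The following is a minimal sketch, assuming synthetic data and a made-up threshold value: two overlapping histograms drawn with partial transparency, a dashed `axvline` marking the limit, custom hex colors, and `tight_layout()` before export.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic scores for two groups, invented for illustration only.
rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=60, scale=8, size=500)
group_b = rng.normal(loc=70, scale=8, size=500)
threshold = 75  # hypothetical limit to highlight

fig, ax = plt.subplots(figsize=(8, 5))
bins = np.linspace(30, 100, 36)  # 35 shared bins so both groups are comparable

# alpha < 1 keeps both distributions visible where they overlap.
ax.hist(group_a, bins=bins, alpha=0.5, color="#4c72b0", label="Group A")
ax.hist(group_b, bins=bins, alpha=0.5, color="#dd8452", label="Group B")

# A dashed vertical line draws the eye to the critical value.
marker = ax.axvline(threshold, color="black", linestyle="--", linewidth=1.5,
                    label=f"Threshold = {threshold}")

ax.set_xlabel("Score")
ax.set_ylabel("Frequency")
ax.set_title("Overlapping Distributions with a Threshold Marker")
ax.legend()
fig.tight_layout()  # prevents labels from being clipped on export
```

Sharing one `bins` array between both calls is a deliberate choice here: with independently computed bins, bar widths differ and the two histograms are harder to compare.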
For more detailed information, API references, and plotting guides, refer to the following official resources:
- Matplotlib:
  - Official Matplotlib Homepage: Reference manual and quickstart templates.
  - Matplotlib Tutorials & User Guide: Beginner to advanced visualization walkthroughs.
  - Matplotlib Plot Types Gallery: Example plots categorized by design types.
- Seaborn:
  - Official Seaborn Homepage: Statistical data visualization tool documentation.
  - Seaborn Tutorial Guide: Detailed breakdown of Seaborn APIs and usage patterns.
  - Seaborn Example Gallery: Showcase of styled statistical charts.
- Data Science Core Libraries:
  - Pandas Documentation: Python Data Analysis library (DataFrames/Series structures).
  - NumPy Documentation: N-dimensional arrays and mathematical utilities.
  - Jupyter Documentation: Guidelines on running Jupyter notebook editor environments.
- Visualization Standards & Galleries:
  - Financial Times Visual Vocabulary: A structured blueprint to select the most appropriate graph styles.
  - The Python Graph Gallery: Inspiration and templates for Python charts.
This project is licensed under the MIT License - see the LICENSE file for details.