A data-driven investigation into why U.S. college costs keep rising, analyzing 20,603 records across 3,700+ institutions using IPEDS data.
Students pay more but get less — where does the money go?
U.S. college tuition has risen steadily while state funding collapsed during COVID-19. This project quantifies exactly how university budgets shifted between 2018 and 2023, separating the signal (structural administrative bloat) from the noise (inflation, enrollment swings).
| Metric | 2018 | 2023 | Change | Significance |
|---|---|---|---|---|
| Admin spending share | 18.5% | 19.2% | +3.8% | p=0.0018 ** |
| Instruction spending share | 47.7% | 45.7% | -4.2% | p=0.0074 ** |
| Real admin per FTE | baseline | -9.0% | p=0.0185 * | |
| Real instruction per FTE | baseline | -12.6% | p=0.0033 ** |
Private vs. Public (2023): Private universities spend 25.5% on administration vs. 13.5% at public institutions — a 1.9x gap — yet allocate less to instruction (42.3% vs. 48.7%).
State disinvestment: Average state funding per public-university student fell 51.7% between 2018 and 2023, shifting costs from governments to students.
Bottom line: Universities cut teaching investment before cutting administrative overhead when budgets tightened. Students bear higher tuition while receiving proportionally less instruction.
- Python: Pandas, NumPy, SciPy, Matplotlib, Seaborn
- SQL: SQLite (window functions, CTEs, NTILE segmentation)
- Visualization: Matplotlib (12-panel dashboard), Plotly, Tableau
- Web App: Streamlit (multipage interactive platform)
- AI: Anthropic Claude API (conversational advisor)
- Data: IPEDS (U.S. Dept. of Education), College Scorecard API
A Streamlit web app that makes the research accessible and actionable for students, parents, and advisors.
Run it locally:
streamlit run app.pyThe app supports both English and Traditional Chinese (繁體中文).
| Page | Description |
|---|---|
| 🔍 University Search | Search any U.S. college and get an integrated profile: tuition, employment outcomes, CP value score, and scholarship recommendations |
| 📊 CP Value Dashboard | Compare up to 5 colleges side-by-side on cost, earnings, debt-to-income, and ROI score with interactive charts |
| 🎓 Scholarship & Aid | Browse federal, state, and external scholarship options with eligibility filters and loan repayment calculator |
| 📈 Spending Trends | Explore the IPEDS spending analysis interactively — filter by year range and institution type |
| 🤖 AI Advisor | Conversational advisor powered by Claude that queries real College Scorecard data and answers questions in plain language |
The CP Value score is calculated as: 10-year median earnings ÷ annual net price × graduation rate. Scores are visualized with animal mascot tiers (Tiger 🐯, Fox 🦊, Koala 🐨, Sloth 🦥).
college-tuition-analysis/
├── analyze_spending_trends.py # Main Python analysis script
├── sql_data_pipeline.py # SQL analytics pipeline (SQLite)
├── app.py # Streamlit app entry point
├── requirements.txt # Python dependencies
│
├── pages/ # Streamlit multipage app
│ ├── 1_University_Search.py
│ ├── 2_CP_Value_Dashboard.py
│ ├── 3_Scholarship_Aid.py
│ ├── 4_Spending_Trends.py
│ └── 5_AI_Advisor.py
│
├── utils/ # App utilities and helpers
│ ├── __init__.py
│ ├── advisors.py
│ ├── api.py
│ ├── calculations.py
│ ├── scholarships.py
│ ├── theme.py
│ └── translations.py
│
├── data/
│ └── panel_2018_2023.csv # IPEDS panel data
│
├── queries/
│ ├── 01_state_spending_gap_ranking.sql
│ ├── 02_yoy_spending_changes.sql
│ └── 03_institution_tier_analysis.sql
│
├── outputs/ # Auto-generated analysis results
├── reports/
│ └── final_report.md
├── visualizations/ # Pre-generated charts
└── docs/
├── USAGE_EXAMPLES_EN.md
└── COLAB_GUIDE.md
The AI agent orchestrator uses the Anthropic Claude API to answer complex questions about colleges, costs, and financial aid.
- Go to console.anthropic.com
- Sign in and navigate to API Keys
- Click Create Key, copy it, and add it to your
.envfile:ANTHROPIC_API_KEY=sk-ant-...
Set DEMO_MODE=true in your .env to run the agent without making any API calls. This is useful for UI testing and demos:
DEMO_MODE=true
When enabled, the agent returns a canned response immediately — no tokens are consumed.
We recommend setting a monthly spend limit in your Anthropic Console to prevent unexpected charges:
- Go to console.anthropic.com → Plans & Billing
- Set a monthly spend limit (e.g. $5–$10 for personal use)
- The session-level limit (
DAILY_AGENT_RUN_LIMIT, default 5) provides an additional in-app guardrail
pip install -r requirements.txtDownload the IPEDS panel data and place it at data/panel_2018_2023.csv.
python analyze_spending_trends.pyOutputs are written to outputs/:
panel_2018_2023_corrected.csv— FTE-corrected, inflation-adjusted datasettrend_analysis_summary.csv— annual trend statisticstrend_analysis_comprehensive.png— 12-panel visualization
python sql_data_pipeline.pyOutputs are written to outputs/:
query1_state_spending_gap.csv— state-level admin vs. instruction gap rankingquery2_yoy_spending_changes.csv— year-over-year changes per institutionquery3_institution_tier_analysis.csv— institution tier segmentation
See docs/COLAB_GUIDE.md for step-by-step Colab instructions.
- FTE correction: 2020 FTE data shows a system-wide +258% spike (affecting 92.2% of institutions), likely from COVID-era methodology changes. Anomalous values are replaced by extrapolating each institution's 2018–2019 growth rate forward.
- Inflation adjustment: All nominal spending converted to constant 2018 dollars using BLS CPI deflators (cumulative inflation 2018–2023: +20%).
- Robustness: Medians used throughout to resist outlier distortion from elite research universities.
| Test | Use case |
|---|---|
| Linear regression | Time-trend significance (slope ≠ 0) |
| Paired t-test | Same-institution 2018 vs. 2023 comparison |
| Independent t-test | Public vs. private differences |
| Cohen's d | Effect size for practical significance |
Significance thresholds: *** p < 0.001, ** p < 0.01, * p < 0.05.
- State spending gap ranking — CTE-based 2018→2023 admin vs. instruction delta, ranked with
RANK()window function - Year-over-year changes —
LAG()andFIRST_VALUE()window functions for per-institution temporal comparison - Institution tier segmentation —
NTILE(3)dynamic percentile cutoffs to classify institutions as high/medium/low admin spenders, then aggregate by tier
IPEDS (Integrated Postsecondary Education Data System)
- Provider: U.S. Department of Education, National Center for Education Statistics
- URL: https://nces.ed.gov/ipeds/
- Surveys: Finance (F1A/F2), Institutional Characteristics (HD), Enrollment (EFFY)
- Coverage: 2018–2023 academic years, all federally-funded institutions
Key variables:
admin— Administrative and general institutional support spendinginstruction— Instructional spending (faculty salaries, course costs)research— Research spendingfte— Full-Time Equivalent student enrollmenttype— Public (GASB) or Private Nonprofit (FASB)state— State appropriations (public institutions only)
- Descriptive only — Trend correlations are identified; causal relationships require further study.
- Self-reported data — Relies on institutional IPEDS submissions; measurement error is possible.
- Missing variables — Federal funding, endowment income, and faculty counts are not included.
- U.S. scope — Findings apply to U.S. higher education; international comparison is out of scope.
Yun-Ting Su. (2026). U.S. Higher Education Spending Trends Analysis (2018-2023).
GitHub: https://github.com/Katherine-code-web/college-tuition-analysis
Yun-Ting Su | Boston University MSBA kt0704@bu.edu | linkedin.com/in/yun-ting-su-867b70364
This project is licensed under the MIT License — see LICENSE for details.
Data source: U.S. Department of Education IPEDS | CPI data: U.S. Bureau of Labor Statistics