Extract skills from your CV, map their relationships as an interactive graph, and find the shortest path to your target job role.
SkillGraph takes a CV as input, runs it through a three-strategy NLP pipeline to extract technical skills, then builds a weighted graph where nodes are skills and edges represent semantic or domain relationships. Given a target job role, it computes which skills you're missing and ranks them by proximity to skills you already have -- so you learn in the most efficient order.
git clone https://github.com/<your-username>/skillgraph.git
cd skillgraph
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
streamlit run app.pyVisit http://localhost:8501. Use "Load sample CV" to try it without uploading anything.
- Skill extraction: Three complementary strategies (substring, regex, spaCy NER) with per-skill confidence scores
- Relationship graph: Edges built from sentence-transformer semantic similarity plus 25+ hardcoded domain rules (Python-Flask, Docker-Kubernetes, etc.)
- Job role matching: Compare extracted skills against 18+ predefined roles; see matched and missing skills grouped by category
- Learning recommendations: Missing skills ranked by shortest-path distance to your current skills in the graph
- Graph metrics: Network density, average degree, PageRank-based centrality
- File upload: Accepts PDF, DOCX, and plain text CVs
- Export: Download results as JSON or CSV
skill-graph/
├── app.py # Streamlit application
├── skill_extractor.py # Multi-strategy skill extraction
├── graph_builder.py # Graph construction, metrics, recommendations
├── visualise.py # Plotly network graph and confidence chart
├── job_roles.json # 18+ job role definitions
├── config.yaml # Configuration
└── test_suite.py # pytest test suite
| Layer | Library |
|---|---|
| NLP | spaCy, sentence-transformers |
| Graph | NetworkX |
| Visualisation | Plotly |
| Frontend | Streamlit |
| Testing | pytest |
| Component | Time | Space |
|---|---|---|
| Skill extraction | O(n*m) | O(n) |
| Semantic similarity | O(n^2) | O(n^2) |
| Graph building | O(n^2 + k) | O(n + e) |
| Recommendations | O(n*log n) | O(n) |
n = skills, m = vocabulary size, e = edges, k = domain relationships
Three strategies run in sequence; each assigns a confidence score:
- Substring matching (0.95): Direct lookup of known skills in the input text. Fast and precise for explicit mentions.
- Regex patterns (0.85): Catches common abbreviations like
ML(Machine Learning),NLP,RESTful. - spaCy NER (0.80): Named Entity Recognition over PRODUCT and ORG entities, cross-referenced against the skill database to reduce false positives.
- Encode all extracted skills with
all-MiniLM-L6-v2 - Compute cosine similarity; add an edge between any two skills above the threshold (default 0.3)
- Overlay domain knowledge edges at higher weights (e.g. Python-Flask: 0.95)
- Compute PageRank to identify the most central skill
For each missing skill, score it by proximity to your current skills:
score = sum(1 / (shortest_path_distance + 1)) for each current skill
Skills closer to what you already know score higher. Skills with no graph path score 0.
pytest -v --cov=.The test suite covers unit tests for each module, integration tests for the full pipeline, edge cases (empty input, unicode, special characters), and a performance test asserting extraction completes in under 5 seconds.
Edit config.yaml to change the similarity threshold, transformer model, graph layout parameters, or logging level.
PDF and DOCX CV parsing(done)- Skill proficiency levels (Beginner / Intermediate / Expert) with weighted graph influence
- GitHub Actions CI badge
- Shareable results via URL params