raph-bl/Alice-In-Wonderland

Alice in Wonderland

NLP-based system for analyzing literature from Project Gutenberg using topic modeling, similarity analysis, clustering, and more.

Install

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Project structure

├── lib/
│   ├── __init__.py
│   ├── card.py
│   ├── entities.py
│   ├── lexdiv.py
│   ├── scrapper.py
│   ├── similar.py
│   ├── summarize.py
│   ├── tools_nlp.py
│   └── topics_f.py
├── books/
├── venv/
├── .gitignore
├── banner.png
├── bookworm.py
├── README.md
└── requirements.txt
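The layout suggests that bookworm.py is a thin command-line entry point and that lib/ holds one module per analysis. Below is a minimal sketch of how such an entry point could parse its one-flag-per-call interface with argparse. The flag names are the ones listed under Usage. The mutually exclusive group, the `ACTIONS` list and the `parse()` helper are assumptions for illustration, not the repository's code.

```python
import argparse
from typing import List, Optional, Tuple

# Flag names copied from the Usage section; the parsing approach is an assumption.
ACTIONS = ["info", "download", "topics", "topics_lda", "topics_lsa",
           "similar", "entities", "summarize", "card", "lexdiv"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bookworm.py")
    # Exactly one action per invocation, each taking a Gutenberg book id.
    group = parser.add_mutually_exclusive_group(required=True)
    for action in ACTIONS:
        group.add_argument(f"--{action}", type=int, metavar="BOOK_ID")
    return parser


def parse(argv: Optional[List[str]] = None) -> Tuple[str, int]:
    """Return (action, book_id); argparse exits on missing or conflicting flags."""
    args = build_parser().parse_args(argv)
    for action in ACTIONS:
        book_id = getattr(args, action)
        if book_id is not None:
            return action, book_id
    # Not reached: the required group guarantees that one flag was given.
    raise AssertionError("no action flag parsed")
```

For example, `parse(["--similar", "11"])` returns `("similar", 11)`. A real entry point would then call the matching module in lib/.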

Usage

Bootstrap

python bookworm.py --info <book_id>        # Get book metadata
python bookworm.py --download <book_id>    # Download book from Project Gutenberg
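`--download` fetches a book from Project Gutenberg. Below is a minimal sketch of that step. It assumes the public cache URL pattern `https://www.gutenberg.org/cache/epub/<id>/pg<id>.txt` and the usual `*** START OF ... PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` markers that wrap the book text. It is not the code in lib/scrapper.py, and the function names are hypothetical.

```python
import re
import urllib.request

# Assumed plain-text URL pattern; lib/scrapper.py may use another endpoint.
def gutenberg_txt_url(book_id: int) -> str:
    return f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"

# Newer files say "THE PROJECT GUTENBERG EBOOK", older ones "THIS PROJECT ...".
START_RE = re.compile(r"\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
                      re.IGNORECASE)
END_RE = re.compile(r"\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK",
                    re.IGNORECASE)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the START and END markers, if present."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


def download_book(book_id: int, timeout: float = 30.0) -> str:
    """Fetch a book and return its body without the license header/footer."""
    with urllib.request.urlopen(gutenberg_txt_url(book_id), timeout=timeout) as resp:
        raw = resp.read().decode("utf-8-sig")  # "-sig" drops a leading BOM
    return strip_gutenberg_boilerplate(raw)
```

`download_book(11)` needs network access. The marker stripping works offline on any string, so it can be tested separately.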

Analysis

python bookworm.py --topics <book_id>      # Extract top topics per section
python bookworm.py --topics_lda <book_id>  # Topic modeling with LDA
python bookworm.py --topics_lsa <book_id>  # Topic modeling with LSA
python bookworm.py --similar <book_id>     # Find similar books
python bookworm.py --entities <book_id>    # Extract named entities
python bookworm.py --summarize <book_id>   # Summarize a book
python bookworm.py --card <book_id>        # Book info card
python bookworm.py --lexdiv <book_id>      # Lexical diversity metrics
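The README does not say which metrics `--lexdiv` reports. Below is a minimal sketch of common lexical diversity measures, assuming a simple regex tokenizer:

* type-token ratio (TTR)
* root TTR (types / sqrt(tokens), Guiraud's index)
* hapax legomena (words used once) divided by total tokens
* moving-average TTR (MATTR)

MATTR averages TTR over fixed-size windows, so it drifts less with text length than raw TTR, which falls as a text grows. The function names and the default window of 100 tokens are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Lowercase words, keeping simple contractions such as "alice's".
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> List[str]:
    return WORD_RE.findall(text.lower())


def lexical_diversity(tokens: List[str], window: int = 100) -> Dict[str, float]:
    n = len(tokens)
    if n == 0:
        raise ValueError("no tokens")
    counts = Counter(tokens)
    types = len(counts)
    hapax = sum(1 for c in counts.values() if c == 1)
    if n <= window:
        mattr = types / n  # text shorter than one window: plain TTR
    else:
        # Slide a window of `window` tokens, updating its type counts in O(1).
        win = Counter(tokens[:window])
        ttrs = [len(win) / window]
        for i in range(window, n):
            win[tokens[i]] += 1
            old = tokens[i - window]
            win[old] -= 1
            if win[old] == 0:
                del win[old]
            ttrs.append(len(win) / window)
        mattr = sum(ttrs) / len(ttrs)
    return {
        "tokens": n,
        "types": types,
        "ttr": types / n,
        "root_ttr": types / math.sqrt(n),
        "hapax_ratio": hapax / n,
        "mattr": mattr,
    }
```

For `"the cat and the hat"` this gives 5 tokens, 4 types, a TTR of 0.8 and a hapax ratio of 0.6.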

Example

Project Gutenberg book 11 is Lewis Carroll's Alice's Adventures in Wonderland.

python bookworm.py --download 11
python bookworm.py --similar 11
python bookworm.py --topics 11
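The example downloads book 11 and then runs `--similar` and `--topics` on it. The README does not describe the methods behind these commands, so here is a stand-in under stated assumptions:

* TF-IDF vectors compared by cosine similarity to rank related documents.
* The highest-weighted TF-IDF terms of each section as a rough list of top terms.

This is plain TF-IDF, not the LDA or LSA models behind `--topics_lda` and `--topics_lsa`. The stopword list and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

WORD_RE = re.compile(r"[a-z]+")
# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it", "she", "he", "i"}

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return [t for t in WORD_RE.findall(text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """One sparse TF-IDF vector (term -> weight) per document or section."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            # Relative term frequency times smoothed IDF ln((1+N)/(1+df)) + 1.
            term: (c / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_terms(vec: Vector, k: int = 5) -> List[str]:
    # Highest weight first; ties broken alphabetically for stable output.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def most_similar(query: int, vectors: List[Vector], k: int = 3) -> List[Tuple[int, float]]:
    """Indices of the k documents closest to `query`, with cosine scores."""
    scores = [(i, cosine(vectors[query], v)) for i, v in enumerate(vectors) if i != query]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]
```

Passing whole books to `tfidf_vectors` gives a similarity ranking, and passing the sections of one book gives per-section top terms. Terms that appear in many sections receive a lower IDF, so each section's list favors the words that set it apart.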

About

Academic NLP project to analyze Project Gutenberg books: topic modeling, similarity, named entities, summarization and clustering.
