Transform any PDF into an engaging, illustrated children's storybook using Google's Gemini API.
- main.py: Contains the core logic of PDF processing, gemini initialisation text and image API calls
- chat.py: Contains the logic of chatbot query and answering
NovelAI is an intelligent application that leverages Google's Gemini 2.5 Flash API to automatically convert complex PDF documents into child-friendly storybooks complete with AI generated illustrations. The application simplifies text for young readers (ages 6-8) and creates colorful, engaging images that bring stories to life.
This project fundamentally relies on two distinct Gemini models to achieve its core functionality:
- Purpose: Simplifies complex text into child-friendly language
- Why it's necessary:
- Transforms hard english language content into simple english content for kids
- Generates concise, meaningful chapter titles (2-3 words)
- Gemini chatbot answers from uploaded novel pdf as core memory
- Creates detailed image prompts for illustration generation
- Maintains story coherence while simplifying vocabulary
- Uses structured JSON output for reliable parsing
API Implementation:
response = client.models.generate_content(
model="gemini-2.5-flash",
contents=prompt,
config=types.GenerateContentConfig(
response_mime_type="application/json",
temperature=0.6
)
)- Purpose: Creates colorful children's book illustrations
- Why it's necessary:
- Generates unique, context aware images for each chapter
- Produces child friendly, vibrant illustrations
- Maintains consistent artistic style throughout the book
- No external image APIs or databases required
API Implementation:
response = client.models.generate_content(
model="gemini-2.5-flash-image",
contents=f"Children's book illustration, colorful: {prompt}",
config=types.GenerateContentConfig(response_modalities=["IMAGE"])
)- PDF Text Extraction: Automatically extracts and chunks content into chapters
- ChatBot: Answers kids questions in simple language based on the novel
- Text Simplification: Converts complex text into simple, engaging stories
- AI Generated Illustrations: Creates unique images for each chapter
- Lazy Loading: On demand chapter processing for optimised performance
- Image Caching: Reduces redundant API calls and improves speed
- Retry Mechanism: Robust error handling with exponential backoff
- Rate Limiting: Semaphore based concurrency control
┌─────────────────────────────────────────────────────────────┐
│ Frontend │
│ (Next.js + React) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PDF Upload │ │ Book Display │ │ Story View │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
│ HTTP/REST API
▼
┌─────────────────────────────────────────────────────────────┐
│ Backend │
│ (FastAPI + Python) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PDF Parser │ │ Text Processor│ │ Image Gen │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
│ API Calls
▼
┌─────────────────────────────────────────────────────────────┐
│ Google Gemini API │
│ ┌──────────────────────────┐ ┌──────────────────────┐ │
│ │ Gemini 2.5 Flash │ │ Gemini 2.5 Flash │ │
│ │ (Text Simplification) │ │ Image (Illustrations)│ │
│ └──────────────────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- FastAPI: High-performance async web framework
- Google Gemini API: AI text and image generation
- PyPDF2: PDF text extraction
- Pydantic: Data validation and serialization
- Python-dotenv: Environment variable management
- Next.js 16: React framework with server-side rendering
- React 19: Modern UI library
- Framer Motion: Smooth animations and transitions
- TailwindCSS 4: Utility-first CSS framework
- TypeScript: Type-safe development
- Upload a PDF: Click the upload button and select any PDF document
- Wait for Processing: The app extracts text and splits it into chapters
- View Chapters: Navigate through chapters using the interactive UI
- Lazy Loading: Chapters are simplified and illustrated on-demand as you view them
- Enjoy: Read the simplified story with beautiful AI-generated illustrations!
TEXT_SEMAPHORE = asyncio.Semaphore(2) # Max 2 concurrent text requests
IMAGE_SEMAPHORE = asyncio.Semaphore(1) # Max 1 concurrent image requestIMAGE_CACHE: dict[str, str] = {} # Cache generated images by promptasync def retry_with_backoff(func, *args, max_retries=5, initial_delay=1, **kwargs):
for attempt in range(max_retries):
try:
return await func(*args, **kwargs)
except ServerError as e:
if e.code == 503: # Model overloaded
delay = initial_delay * (2 ** attempt) + random.uniform(0, 1)
await asyncio.sleep(delay)config=types.GenerateContentConfig(
response_mime_type="application/json", # Ensures parseable output
temperature=0.6 # Balanced creativity
)- Chapters are only processed when viewed by the user
- Reduces initial load time and API costs
- Improves user experience with progressive loading
- PDF Processing: ~10-30 seconds (depending on size)
- Text Simplification: ~2-5 seconds per chapter
- Image Generation: ~3-8 seconds per image
- Total Time (10 chapters): ~5-10 minutes (with lazy loading)
- 503 Errors: Automatic retry with exponential backoff
- Rate Limiting: Semaphore-based concurrency control
- Network Failures: Graceful degradation with user feedback
- Invalid PDFs: Clear error messages
- API Quota: Informative error responses