BaptisteBlouin/README.md

Baptiste Blouin

PhD in Computer Science  ·  Data Scientist & AI Engineer  ·  NLP Specialist
Aix-en-Provence, France  ·  Open to new opportunities


About

Data Scientist and AI Engineer with a PhD in Computer Science (Aix-Marseille Université, 2022), specialising in NLP, LLMs, RAG and generative AI. I combine published ML research with hands-on product engineering.

  • 🎓 PhD in Computer Science — Aix-Marseille Université, LIS / IrAsia — ERC Advanced Grant ENP-China (n° 788476)
  • 📄 7 peer-reviewed publications — LREC-COLING, NLP4DH, TALN, PACLIC, JDMDH, JHNR
  • 🌍 Presented research in 7 countries (France, USA, China, Italy, Japan, India, Vietnam)
  • 🔭 Currently building a Data & AI SaaS platform — RAG pipeline, text-to-SQL, multi-tenant architecture

Tech Stack

Languages

Python · Rust · TypeScript · R · SQL · Java · C++

ML / AI

PyTorch · TensorFlow · HuggingFace · scikit-learn · LangChain

LLM & Generative AI

litellm · LangGraph · Langfuse · RAG / GraphRAG / RAPTOR · Hybrid Search (BM25 + pgvector) · Reranking · Text-to-SQL · Prompt Engineering · VLM

Full-Stack & Infrastructure

FastAPI · React · Docker · PostgreSQL · Redis · Elasticsearch · GCP


Featured Projects

| Project | Description |
| --- | --- |
| HistText | Full-stack platform for large-scale analysis of historical Chinese texts (billions of tokens). Rust backend, React UI, Apache Solr, multilingual NER pipeline, R package on CRAN. Deployed for the international digital humanities community. 🌐 Live |
| EventExtractionPapers | Curated and actively maintained list of NLP papers, datasets and models for event extraction. Widely used reference in the research community. ⭐ 580 |

Selected Publications

| Year | Title | Venue |
| --- | --- | --- |
| 2024 | HistText: An Application for Leveraging Large-Scale Historical Textbases | JDMDH |
| 2024 | A Dataset for NER and Entity Linking in Chinese Historical Newspapers | LREC-COLING |
| 2023 | Unlocking Transitional Chinese: Word Segmentation in Modern Historical Texts | NLP4DH |
| 2022 | Simulation d'erreurs d'OCR dans les systèmes de TAL | TALN |
| 2021 | Transferring Modern NER to the Historical Domain | NLP4DH |
| 2021 | Creating Biographical Networks from Chinese and English Wikipedia | JHNR |
| 2020 | Contextual Characters with Segmentation Representation for NER in Chinese | PACLIC 34 |

baptisteblouin.fr

Pinned

  1. EventExtractionPapers Public

    A list of NLP resources focused on the event extraction task

    580 92

  2. HistText Public

    HistText – A full-stack Rust & React platform for large-scale historical text analysis, featuring full-text search, named entity recognition, and data visualization.

    Rust 8 2

  3. AI-resources Public

    A collection of AI tools, libraries, papers, learning resources, and more.

    Python 3

  4. BaptisteBlouin.github.io Public

    Personal CV & Portfolio — Baptiste Blouin

    HTML 1