A biodiversity information extraction pipeline using NLP techniques.
This project provides tools for extracting and classifying biodiversity-related entities from text documents using:
- BERT-based Named Entity Recognition (NER). Work in progress!
- LLM-based extraction for biodiversity entity classification with structured schemas (Demo version).
- spaCy for text processing and noun phrase extraction
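Both extraction scripts write JSON Lines output (one JSON object per line, see the `--out_jsonl` flag below). As a rough sketch of what a record might look like — the field names here are illustrative, not the pipeline's actual schema:

```python
import json

# Hypothetical example of a single JSONL record; the real pipeline's
# field names and label set may differ.
record = {
    "doc_id": "sample_001.txt",
    "entities": [
        {"text": "Quercus robur", "label": "TAXON", "start": 17, "end": 30},
    ],
}

# Each record is serialized as exactly one line of JSON.
line = json.dumps(record)
print(line)
```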
# Start the development container
docker-compose up -d
# Run commands inside the container
docker-compose exec biodiv python src/ner/bert_ner_baseline.py --in_dir data --out_jsonl output/ner_results.jsonl
docker-compose exec biodiv python src/demo/demo.py --in_dir data --out_jsonl output/demo_results.jsonl
# Run one-off tasks without starting the persistent container
docker-compose run --rm biodiv python src/ner/bert_ner_baseline.py --help
# Stop the container
docker-compose down

If installing locally, refer to the Dockerfile for the exact dependencies:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

Download the spaCy model:
python -m spacy download en_core_web_trf

For OpenAI integration, set your API key in the environment on the machine that runs the code:
export OPENAI_API_KEY="your-api-key-here"

For the remote interpreter workflow, keep secrets out of tracked files:
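A minimal sketch of how code can pick up the key at runtime — this is a generic fail-fast pattern, not the project's exact loading code:

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fetch an API key from the environment, failing fast if it is missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the pipeline")
    return key
```

Failing at startup with a clear message is usually preferable to a confusing authentication error deep inside an API call.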
cp .env.example .env
# fill in OPENAI_API_KEY and/or OPEN_WEB_UI_API_KEY on the remote machine

The code now auto-loads .env from the repo root when present. If you prefer to keep the secrets file elsewhere on the remote host, set:
export MOBIKO_ENV_FILE="/absolute/path/to/remote-secrets.env"
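The auto-loading behavior can be sketched roughly like this — a simplified stand-in for the project's actual loading code (which may well use python-dotenv instead):

```python
import os
from pathlib import Path

def load_env_file(repo_root: Path) -> None:
    """Load KEY=VALUE pairs from MOBIKO_ENV_FILE if set, else <repo_root>/.env.

    Variables already present in the environment are never overwritten.
    """
    path = Path(os.environ.get("MOBIKO_ENV_FILE", repo_root / ".env"))
    if not path.is_file():
        return
    for raw in path.read_text().splitlines():
        stripped = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Note the `setdefault`: values exported in the shell take precedence over the .env file, so an explicit `export OPENAI_API_KEY=...` always wins.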