bot.zet PoS tagger

Bidirectional LSTM tagger with subword embeddings for the EmpiriST/PostWITA datasets. The project now targets Python 3.12+ and modern TensorFlow/Keras, and uses uv for dependency management.

Quickstart

Install dependencies:
```
uv sync
```
Configure embedding paths in tagger.ini or via env vars:
- TAGGER_W2V_SMALL=/path/to/small.vec
- TAGGER_W2V_BIG=/path/to/big.vec
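
The interplay between tagger.ini and the TAGGER_ environment variables can be sketched roughly as follows. This is a minimal illustration, not the package's actual loader: the `load_settings` helper, the `[tagger]` section name, and the `w2v_small`/`w2v_big` key names are assumptions made for the example; only the TAGGER_ prefix and the rule that env vars override file values come from this README.

```python
import configparser
import os

ENV_PREFIX = "TAGGER_"  # prefix named in this README


def load_settings(ini_text, environ=None, section="tagger"):
    """Merge INI values with TAGGER_-prefixed environment overrides.

    Hypothetical helper: the section name and key names are assumptions.
    """
    environ = os.environ if environ is None else environ
    # interpolation=None keeps '%' characters in paths from being expanded
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    settings = dict(parser[section]) if parser.has_section(section) else {}
    # Environment variables win over values from the file
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX):
            settings[name[len(ENV_PREFIX):].lower()] = value
    return settings


example_ini = "[tagger]\nw2v_small = /ini/small.vec\nw2v_big = /ini/big.vec\n"
example = load_settings(example_ini, environ={"TAGGER_W2V_BIG": "/env/big.vec"})
```

Here `example` keeps the small-embedding path from the file and takes the big-embedding path from the environment.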

Train:

uv run tagger train --task postwita --config tagger.ini --epochs 5

Predict:

uv run tagger predict --model-path artifacts/tagger.keras --output-ext .pred

Project layout

- src/tagger/: package with config handling, data utilities, and the BiLSTM model.
- tagger.ini: optional config; env vars prefixed with TAGGER_ override values.
- pyproject.toml: defines runtime/dev dependencies; uv.lock produced by uv sync.

Notes

- Default data root is <repo>/data; override with --data-root or TAGGER_DATA_ROOT.
- Saved models use Keras' .keras format, which contains architecture and weights.
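
The data-root lookup described above might be resolved along these lines. This is a sketch under stated assumptions: `resolve_data_root` is a hypothetical helper, and the precedence of --data-root over TAGGER_DATA_ROOT is assumed, since the README only says that either one overrides the default.

```python
import os
from pathlib import Path


def resolve_data_root(cli_value=None, environ=None, repo_root="."):
    """Pick the data directory (hypothetical helper).

    Assumed order: --data-root value, then TAGGER_DATA_ROOT, then <repo>/data.
    """
    environ = os.environ if environ is None else environ
    if cli_value:
        return Path(cli_value)
    env_value = environ.get("TAGGER_DATA_ROOT")
    if env_value:
        return Path(env_value)
    # Fall back to the default location inside the repository
    return Path(repo_root) / "data"
```

For example, `resolve_data_root(None, {}, "/repo")` falls back to `/repo/data`, while passing a CLI value or setting TAGGER_DATA_ROOT replaces that default.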
