The code is a PoC showing how you can use TensorFlow in a Solr query parser. It includes:
- /2023/config - the configuration files for Solr
- /2023/solr - the final PoC code that extends Solr
- /2023/tf-java-poc - the work in progress code for loading and working with the created model
- /2023/tf-python-poc - the code for creating the model used for PoC
- /2023demo - the script for the demo during the talk
The video of the talk is available on YouTube.
The code in this repository was used to prepare the data for the Berlin Buzzwords 2026 talk.
It indexes the MovieLens ml-25m dataset into a locally running Vespa instance, creates three document types (movie, rating, and tag), and deploys the full Vespa application package automatically.
- Python 3.10+
- Vespa running in Docker with ports 19071 (config server) and 8080 (feed/query) exposed
- The ml-25m/ dataset directory (containing movies.csv, ratings.csv, tags.csv, links.csv)
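
Because indexing takes a long time, it can help to confirm that the dataset directory is complete before starting. The following is a minimal sketch, assuming the four files listed above sit directly inside the directory. The helper name `missing_dataset_files` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# File names taken from the prerequisites list above.
REQUIRED_FILES = ("movies.csv", "ratings.csv", "tags.csv", "links.csv")


def missing_dataset_files(data_dir):
    """Return the required CSV files that are absent from data_dir.

    Hypothetical helper for illustration; batch.py may validate its
    input differently (or not at all).
    """
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_dataset_files("./ml-25m")` returns an empty list when all four files are present.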
docker run --detach \
--name vespa \
--hostname vespa-container \
--publish 8080:8080 \
--publish 19071:19071 \
vespaengine/vespa

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

DATA_DIR=./ml-25m python3 batch.py

The script will:
- Generate the Vespa application package (schemas + services.xml)
- Deploy it to the config server
- Feed all movies (~62K docs)
- Feed all ratings (~25M docs)
- Feed all tags (~1M docs)
- Print a final summary
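
To illustrate the feeding steps above, here is a minimal sketch of turning rows of movies.csv into Vespa JSON feed operations. It is not the code from batch.py. The namespace `movielens` and the field names `movie_id`, `title`, and `genres` are assumptions made for this example, and the real schema may differ. The column names `movieId`, `title`, and `genres` are those of the MovieLens file.

```python
import csv
import io


def movie_to_feed_op(row, namespace="movielens"):
    """Map one movies.csv row to a Vespa put operation.

    The namespace and field names are assumptions for illustration.
    """
    return {
        # Vespa document IDs have the form id:<namespace>:<doctype>::<key>.
        "put": f"id:{namespace}:movie::{row['movieId']}",
        "fields": {
            "movie_id": int(row["movieId"]),
            "title": row["title"],
            # MovieLens separates genres with '|'.
            "genres": row["genres"].split("|"),
        },
    }


def read_feed_ops(csv_file):
    """Lazily yield feed operations so a large file never has to fit in memory."""
    for row in csv.DictReader(csv_file):
        yield movie_to_feed_op(row)


# A tiny in-memory sample standing in for the real movies.csv.
sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
)
ops = list(read_feed_ops(sample))
print(ops[0]["put"])
```

Yielding operations one at a time matters most for the ratings file, where loading roughly 25 million rows at once would be wasteful.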
Note: It can take quite some time to index the data.
You can run verify.py while indexing is in progress to check its status:
python3 verify.py
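
One way to check progress by hand is to ask the Vespa query API how many documents of each type are currently searchable. The sketch below assumes the query endpoint on port 8080 from the prerequisites and the document type names movie, rating, and tag. It is not necessarily how verify.py works, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(doc_type, endpoint="http://localhost:8080"):
    """Build a query URL that matches every document of one type and returns no hits."""
    params = {"yql": f"select * from {doc_type} where true", "hits": 0}
    return f"{endpoint}/search/?{urlencode(params)}"


def total_count(response):
    """Extract the matched-document count from a parsed Vespa query response."""
    return response.get("root", {}).get("fields", {}).get("totalCount", 0)


def fetch_count(doc_type):
    """Query the running Vespa instance (requires access to port 8080)."""
    with urlopen(count_url(doc_type), timeout=10) as resp:
        return total_count(json.load(resp))
```

With Vespa running, `fetch_count("movie")` should approach the number of movies fed as indexing proceeds.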