ch007m/java-tree-sitter

Java tree-sitter client

Interactive CLI for parsing source code with tree-sitter and querying the resulting AST. Built with Aesh and Quarkus.

The project contains three modules, each using a different Java tree-sitter library to explore the trade-offs between JNI bindings, managed language packs, and pure-Java WASM runtimes.

Prerequisites

  • Maven 3.9+
  • JDK version depends on the module (see below)

Modules

| Module | Library | Binding | Languages |
|---|---|---|---|
| treesitter4j-client | treesitter4j (io.roastedroot) + Chicory WASM | Pure Java (WASM) | Java, YAML, JSON, Properties, HTML, XML, Markdown |
| bonede | tree-sitter-ng (io.github.bonede) | JNI | Java |
| languagepack | tree-sitter-language-pack (dev.kreuzberg) | JNI + auto-download | Java, YAML, JSON, XML, HTML, JS, ... |

treesitter4j client

Uses the treesitter4j library which runs tree-sitter entirely in pure Java -- no native C libraries or JNI required. The tree-sitter core and language grammars are compiled to WASM and executed via the Chicory WebAssembly runtime.

  • Polyglot: supports Java, YAML, JSON, XML, Properties, HTML, Markdown
  • Automatic language detection from file extension
  • Parses source files and exports AST trees as JSON (using ASTExporter / ASTJsonSerializer)
  • Query persisted AST nodes by type, file path, or text content
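
The extension-based detection listed above could look roughly like the following. This is a minimal sketch, not the module's actual code: the class name `LanguageDetector`, the extension table, and the returned language identifiers are all assumptions made for illustration.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps a file extension to one of the languages listed above.
final class LanguageDetector {

    // Assumed extension table; the real module may recognise more or different extensions.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "java", "java",
            "yaml", "yaml",
            "yml", "yaml",
            "json", "json",
            "xml", "xml",
            "properties", "properties",
            "html", "html",
            "md", "markdown");

    // Returns the language for a file, or empty when the extension is missing or unknown.
    static Optional<String> detect(Path file) {
        Path fileName = file.getFileName();
        if (fileName == null) {
            return Optional.empty();
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        // No dot, a leading dot only (hidden file), or a trailing dot: no usable extension.
        if (dot <= 0 || dot == name.length() - 1) {
            return Optional.empty();
        }
        String extension = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(detect(Path.of("src/main/java/Customer.java")));
        System.out.println(detect(Path.of("README")));
    }
}
```

Returning an `Optional` lets a caller skip files with no known grammar instead of failing the whole parse.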

Requirements:

| Item | Version / Details |
|---|---|
| JDK | 21+ |
| Runtime | None -- pure Java |

# Build and run
mvn clean install -pl treesitter4j-client
java -jar treesitter4j-client/target/treesitter4j-client-1.0.0-SNAPSHOT-runner.jar

Using jbang

jbang app install --force --name ts4j dev.snowdrop:treesitter4j-client:1.0.0-SNAPSHOT:runner

ts4j parse /path/to/project
ts4j query class --app /path/to/project
ts4j query "class = Customer" --app /path/to/project
ts4j query "method contains get" --app /path/to/project
ts4j query "property = quarkus.datasource.*" --app /path/to/project
ts4j query "pom-dependency = io.quarkus:*" --app /path/to/project
ts4j query class_declaration --app /path/to/project --reload
ts4j types
ts4j types -L java

| Option | Short | Applies to | Description |
|---|---|---|---|
| --app <path> | -a <path> | query, types | Path to the application directory (defaults to current directory) |
| --file <filter> | -f <filter> | query | Filter results by file path (substring) |
| --text <text> | -t <text> | query | Filter by node text (case-insensitive) |
| --reload | -r | query | Force re-parse of source files before querying |
| --language <lang> | -L <lang> | types | Filter node types by language (e.g., java, yaml, json) |
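
To make the semantics of the --file and --text filters concrete, here is a minimal sketch of how they could be applied to a list of stored nodes. The `NodeFilter` and `AstNode` types are hypothetical and are not the module's classes, and the sketch assumes the --file substring match is case-sensitive, which the option description does not state.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch of the --file and --text filters; not the module's actual code.
final class NodeFilter {

    // Assumed shape of a persisted AST node: only the two fields the filters need.
    static final class AstNode {
        final String filePath;
        final String text;

        AstNode(String filePath, String text) {
            this.filePath = filePath;
            this.text = text;
        }
    }

    // A null filter means "not specified on the command line" and matches everything.
    static List<AstNode> filter(List<AstNode> nodes, String fileFilter, String textFilter) {
        String needle = textFilter == null ? null : textFilter.toLowerCase(Locale.ROOT);
        return nodes.stream()
                // --file: substring of the file path (case-sensitivity is an assumption)
                .filter(n -> fileFilter == null || n.filePath.contains(fileFilter))
                // --text: case-insensitive substring of the node text
                .filter(n -> needle == null || n.text.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }
}
```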

Query syntax

The query command supports human-friendly aliases with optional operators. Raw tree-sitter node types (e.g. class_declaration) are also accepted.

| Form | Description | Example |
|---|---|---|
| alias | List all matches | class |
| alias = value | Exact match | class = Customer |
| alias = glob* | Wildcard match (* matches any characters) | property = quarkus.datasource.* |
| alias contains text | Case-insensitive substring match | method contains get |
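
The three query forms can be illustrated with a small matcher. This is a sketch under stated assumptions rather than the tool's parser: the class `QueryExpression` is hypothetical, exact matches are assumed to be case-sensitive, and tokens are assumed to be separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical model of "alias", "alias = value", "alias = glob*" and "alias contains text".
final class QueryExpression {
    final String alias;
    final String operator; // "" (list all), "=" or "contains"
    final String operand;

    private QueryExpression(String alias, String operator, String operand) {
        this.alias = alias;
        this.operator = operator;
        this.operand = operand;
    }

    static QueryExpression parse(String input) {
        String trimmed = input.trim();
        String[] parts = trimmed.split("\\s+", 3);
        if (parts.length == 3 && (parts[1].equals("=") || parts[1].equalsIgnoreCase("contains"))) {
            return new QueryExpression(parts[0], parts[1].toLowerCase(Locale.ROOT), parts[2]);
        }
        // No operator: a bare alias (or raw node type) that lists every match.
        return new QueryExpression(trimmed, "", "");
    }

    boolean matches(String name) {
        switch (operator) {
            case "=":
                return operand.contains("*")
                        ? globToRegex(operand).matcher(name).matches()
                        : operand.equals(name); // assumed case-sensitive
            case "contains":
                return name.toLowerCase(Locale.ROOT).contains(operand.toLowerCase(Locale.ROOT));
            default:
                return true;
        }
    }

    // Quote every literal piece and let each '*' match any run of characters.
    private static Pattern globToRegex(String glob) {
        List<String> quoted = new ArrayList<>();
        for (String literal : glob.split("\\*", -1)) {
            quoted.add(Pattern.quote(literal));
        }
        return Pattern.compile(String.join(".*", quoted));
    }
}
```

Quoting the literal pieces keeps characters such as `.` in `quarkus.datasource.*` from being interpreted as regex metacharacters.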

Available aliases

Java:

| Alias | Captures |
|---|---|
| class | Class names |
| interface | Interface names |
| enum | Enum names |
| method | Method names |
| constructor | Constructor names |
| field | Field names |
| annotation | Annotation names |
| import | Import declarations |
| package | Package declarations |

Properties:

| Alias | Captures |
|---|---|
| property | Property keys |

XML:

| Alias | Captures |
|---|---|
| element | XML element tag names |
| attribute | XML attribute names |

POM (Maven):

POM aliases compose GAV coordinates (groupId:artifactId[:version]) from child XML elements. The version component is included only when a <version> element is present.

| Alias | Captures |
|---|---|
| pom-dependency | <dependency> GAV coordinates |
| pom-plugin | <plugin> GAV coordinates |
| pom-parent | <parent> GAV coordinates |
| pom-extension | <extension> GAV coordinates |

# List all dependencies
ts4j query pom-dependency --app /path/to/project

# Find a specific dependency
ts4j query "pom-dependency = io.quarkus:quarkus-rest" --app /path/to/project

# Wildcard on groupId
ts4j query "pom-dependency = io.quarkus:*" --app /path/to/project

# Search by substring
ts4j query "pom-dependency contains hibernate" --app /path/to/project
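
The GAV composition rule (groupId:artifactId, plus :version only when a <version> child exists) can be sketched as follows. Note the swap in technique: the tool works on tree-sitter XML nodes, whereas this sketch uses the JDK's DOM parser as a stand-in so it runs without extra dependencies. The class `PomCoordinates` is hypothetical, and rendering a missing groupId as an empty string is an assumption, since the source does not say how that case is handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: composes groupId:artifactId[:version] from child elements.
final class PomCoordinates {

    // tagName is e.g. "dependency", "plugin", "parent" or "extension".
    static List<String> coordinates(String pomXml, String tagName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable POM", e);
        }
        NodeList elements = doc.getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            String groupId = childText(element, "groupId");
            String artifactId = childText(element, "artifactId");
            String version = childText(element, "version");
            // Assumption: a missing groupId is rendered as an empty string.
            String gav = (groupId == null ? "" : groupId) + ":" + artifactId;
            // The version component is appended only when a <version> element is present.
            result.add(version == null ? gav : gav + ":" + version);
        }
        return result;
    }

    // Text of the first direct child element with the given name, or null if absent.
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```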

Excluding directories during parsing

The parse command skips certain files/directories by default. This is controlled by the ts4j.parser.exclude-dirs configuration property, which accepts a comma-separated list of directory names or prefix patterns (using a trailing *).

Defaults (defined in application.properties):

ts4j.parser.exclude-dirs=.*,target,node_modules

This skips hidden directories (names starting with .), target, and node_modules.
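
The matching rule described above (exact directory names, or prefix patterns with a trailing *) can be sketched like this. The class `ExcludeDirs` is hypothetical and only illustrates the rule; it is not the parser's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the exclude-dirs rule: exact names or "prefix*" patterns.
final class ExcludeDirs {
    private final List<String> patterns;

    // Takes the raw property value, e.g. ".*,target,node_modules".
    ExcludeDirs(String commaSeparated) {
        this.patterns = Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .collect(Collectors.toList());
    }

    // dirName is a single directory name, not a full path.
    boolean isExcluded(String dirName) {
        for (String pattern : patterns) {
            if (pattern.endsWith("*")) {
                // Prefix pattern: ".*" excludes every name starting with "."
                String prefix = pattern.substring(0, pattern.length() - 1);
                if (dirName.startsWith(prefix)) {
                    return true;
                }
            } else if (pattern.equals(dirName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExcludeDirs defaults = new ExcludeDirs(".*,target,node_modules");
        for (String dir : List.of(".git", "target", "src")) {
            System.out.println(dir + " excluded: " + defaults.isExcluded(dir));
        }
    }
}
```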

To override the defaults, pass the property on the command line:

java -Dts4j.parser.exclude-dirs=".*,target,node_modules,build,dist" \
     -jar treesitter4j-client/target/treesitter4j-client-1.0.0-SNAPSHOT-runner.jar

# Or with jbang
ts4j -Dts4j.parser.exclude-dirs=".*,build" parse /path/to/project

You can also set it via the TS4J_PARSER_EXCLUDE_DIRS environment variable (MicroProfile Config convention):

export TS4J_PARSER_EXCLUDE_DIRS=".*,target,node_modules,build"
ts4j parse /path/to/project
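
The environment variable name follows from the MicroProfile Config mapping rule: every character that is not alphanumeric or an underscore is replaced with an underscore, and the result is upper-cased. A minimal sketch of that derivation (the class `EnvNames` is hypothetical, for illustration only):

```java
import java.util.Locale;

// Hypothetical helper showing how a property name maps to its environment variable.
final class EnvNames {

    static String toEnvName(String propertyName) {
        // Replace anything outside [A-Za-z0-9_] with '_', then upper-case.
        return propertyName.replaceAll("[^a-zA-Z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("ts4j.parser.exclude-dirs"));
    }
}
```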

bonede

Uses the bonede tree-sitter-ng JNI bindings which ship pre-built native libraries for each platform. Each language grammar is a separate Maven artifact (e.g. tree-sitter-java).

  • Parses .java files under a directory and builds an in-memory AST store
  • Query the AST with predefined query names or raw S-expression patterns
  • Persist/reload parsed projects to disk

Requirements:

| Item | Version / Details |
|---|---|
| JDK | 21+ |
| Platform | macOS (aarch64/x86_64), Linux (x86_64/aarch64), Windows -- native libs bundled in the Maven artifact |

# Build and run
mvn clean install -pl bonede
java -jar bonede/target/aesh-tree-sitter-bonede-1.0.0-SNAPSHOT-runner.jar

Using jbang

jbang app install --force --name ng dev.snowdrop:aesh-tree-sitter-bonede:1.0.0-SNAPSHOT:runner

ng parse /path/to/project
ng query classes
ng query "(method_declaration name: (identifier) @name)"

Predefined queries: classes, methods, constructors, imports, fields, interfaces, enums, annotations, packages, strings, method-calls

| Option | Short | Description |
|---|---|---|
| --limit N | -l N | Max results to display |
| --file <filter> | -f <filter> | Filter by file path (substring) |
| --name <name> | -n <name> | Filter annotations by name |
| --list-queries | -L | List predefined query names |

languagepack

Uses the Kreuzberg tree-sitter-language-pack which bundles grammars for many languages in a single dependency with the help of JDK Panama. Language detection is automatic based on file extension, and grammars are downloaded on first use.

  • Polyglot: supports Java, YAML, JSON, XML, Properties, Markdown, HTML, JavaScript
  • Automatic language detection from file paths
  • Extracts structural items (classes, methods, imports) per language

Requirements:

| Item | Version / Details |
|---|---|
| JDK | 25+ |
| Network | Required on first run to download grammar native libraries |

mvn clean install -pl languagepack
java -jar languagepack/target/aesh-tree-sitter-languagepack-1.0.0-SNAPSHOT-runner.jar

jbang app install --force --name lp dev.snowdrop:aesh-tree-sitter-languagepack:1.0.0-SNAPSHOT:runner

lp parse /path/to/project
lp query annotation

Build all modules

mvn clean install

To be reviewed

Install the tree-sitter CLI, the tree-sitter library, and Binaryen (WebAssembly tooling) using brew:

brew install tree-sitter-cli
brew install tree-sitter
brew install binaryen

Using dylib

  • Create the extensions directory: mkdir -p ~/Library/Java/Extensions
  • Symlink the tree-sitter library into ~/Library/Java/Extensions:
ln -s /opt/homebrew/Cellar/tree-sitter/0.26.9/lib/libtree-sitter.0.26.dylib ~/Library/Java/Extensions/libtree-sitter.dylib
  • Each grammar/language used on your machine needs its own dylib, built from a git clone of the grammar repository
  • Run the following bash script, which clones and builds the dylib and wasm files for the required grammars:
./scripts/build-grammar-lib.sh
# The dylib and wasm files are created under: lib/wasm_dylib_output
# Copy them to: ~/Library/Java/Extensions
cp ./lib/wasm_dylib_output/*.dylib ~/Library/Java/Extensions
cp ./lib/wasm_dylib_output/*.wasm ~/Library/Java/Extensions

Build tree-sitter wasm

Note: the tree-sitter core library itself was installed with the brew command above.

The tree-sitter core project cannot be built to wasm the same way as the language grammars, so build it with the help of a skill:

git clone https://github.com/tree-sitter/tree-sitter.git "lib/tree-sitter"
cd "lib/tree-sitter/lib"

# To build the wasm file, we need the help of a skill
git clone https://github.com/andreaTP/skill-compile-to-wasm.git
cp -r skill-compile-to-wasm ~/.claude/skills/compile-to-wasm
claude "Build this tree-sitter project to wasm"

# When done
cp *.wasm ~/Library/Java/Extensions/

Alternatively, use the following bash script:

./scripts/build-tree-sitter-wasm.sh
  • If you prefer to build a grammar manually, follow these instructions:
git clone https://github.com/tree-sitter/tree-sitter-java && cd tree-sitter-java
make all
cp libtree-sitter-java.* ~/Library/Java/Extensions/

The same steps apply to the other grammars:

git clone https://github.com/tree-sitter-grammars/tree-sitter-yaml.git
cd tree-sitter-yaml
make all
cp libtree-sitter-yaml.* ~/Library/Java/Extensions/

git clone https://github.com/tree-sitter-grammars/tree-sitter-properties.git
cd tree-sitter-properties
make
cp libtree-sitter-properties.* ~/Library/Java/Extensions/

git clone https://github.com/tree-sitter-grammars/tree-sitter-markdown.git
cd tree-sitter-markdown/
make
cp tree-sitter-markdown-inline/libtree-sitter-markdown-inline.* ~/Library/Java/Extensions/
cp tree-sitter-markdown/libtree-sitter-markdown.* ~/Library/Java/Extensions/

git clone https://github.com/tree-sitter/tree-sitter-html.git
cd tree-sitter-html
make all
cp libtree-sitter-html.* ~/Library/Java/Extensions/
