Document JBang UDF support with examples #1854

@velo

(We need to add JBang-specific information to function.md, where we describe how Java UDFs are supported, and also mention it in the preprocessor section of deepdive.md.)

Summary

DataSQRL now supports writing Flink User-Defined Functions (UDFs) as single-file Java scripts using JBang. This feature needs user-facing documentation explaining how to use it, with practical examples.

Background

JBang UDFs allow users to write custom functions as simple .java files without needing a full Maven/Gradle project. The DataSQRL CLI automatically detects, compiles, and packages these into the pipeline.

Key implementation details:

  • JBang files are detected by the shebang line: ///usr/bin/env jbang "$0" "$@" ; exit $?
  • Flink dependencies are provided automatically — users must NOT declare them in //DEPS
  • Multiple JBang files are batched into a single jbang-udfs.jar for efficiency
  • Regular .java files without the shebang are ignored by the preprocessor
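
The detection rule above can be sketched in plain Java. This is only an illustration of the idea, not the preprocessor's actual code: the class and method names are hypothetical, and it assumes the first line must match the shebang exactly (ignoring surrounding whitespace).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not DataSQRL's preprocessor code. */
final class JBangDetector {

    // The shebang line quoted in this issue.
    static final String JBANG_SHEBANG = "///usr/bin/env jbang \"$0\" \"$@\" ; exit $?";

    /** Returns true if the given first line is the JBang shebang. */
    static boolean hasJBangShebang(String firstLine) {
        return firstLine != null && firstLine.strip().equals(JBANG_SHEBANG);
    }

    /** Reads only the first line of a .java file and checks it for the shebang. */
    static boolean isJBangFile(Path file) throws IOException {
        if (!file.getFileName().toString().endsWith(".java")) {
            return false;
        }
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return hasJBangShebang(reader.readLine());
        }
    }

    public static void main(String[] args) throws IOException {
        // One file with the shebang, one regular .java file without it.
        Path udf = Files.createTempFile("MyScalarFunction", ".java");
        Files.writeString(udf, JBANG_SHEBANG + "\npublic class MyScalarFunction {}\n");
        Path plain = Files.createTempFile("Helper", ".java");
        Files.writeString(plain, "public class Helper {}\n");

        System.out.println(udf.getFileName() + " -> " + isJBangFile(udf));
        System.out.println(plain.getFileName() + " -> " + isJBangFile(plain));
    }
}
```

Under this sketch, a regular .java file without the shebang is simply skipped, which matches the last bullet above.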

What to Document

1. Getting Started

  • JBang must be installed on the user's system (curl -Ls https://sh.jbang.dev | bash)
  • UDF files go in the usrlib/ directory of the SQRL project

2. Writing a Scalar Function

///usr/bin/env jbang "$0" "$@" ; exit $?
import org.apache.flink.table.functions.ScalarFunction;

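public class MyScalarFunction extends ScalarFunction {

  // Flink calls eval once per row; boxed parameters are null for SQL NULL inputs.
  public Long eval(Long a, Long b) {
    if (a == null || b == null) {
      return null;
    }
    return a + b;
  }
}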

3. Writing an Async Scalar Function

///usr/bin/env jbang "$0" "$@" ; exit $?
import org.apache.flink.table.functions.AsyncScalarFunction;
import org.apache.flink.table.functions.FunctionContext;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MyAsyncScalarFunction extends AsyncScalarFunction {

    private transient ExecutorService executor;

    @Override
    public void open(FunctionContext context) throws Exception {
        this.executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void close() throws Exception {
        if (executor != null && !executor.isShutdown()) {
            executor.shutdown();
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        }
    }

    public void eval(CompletableFuture<String> result, String param1, int param2) {
        executor.submit(() -> {
            try {
                // Simulate a slow external call (for example, a remote lookup).
                Thread.sleep(1000);
                String response = "Processed " + param1 + " with " + param2;
                result.complete(response);
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
    }
}
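
The completion pattern used by eval above can be tried without a Flink runtime. The sketch below is a hypothetical stand-in (the class name AsyncEvalSketch and the helper runOnce are not part of DataSQRL or Flink): it drops the AsyncScalarFunction base class and the simulated delay, and keeps only the executor plus CompletableFuture hand-off.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the UDF above, with the Flink base class removed. */
final class AsyncEvalSketch {

    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Same shape as the UDF's eval: the result is delivered through the future,
    // not through the method's return value.
    void eval(CompletableFuture<String> result, String param1, int param2) {
        executor.submit(() -> {
            try {
                result.complete("Processed " + param1 + " with " + param2);
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
    }

    // Mirrors close() in the UDF: stop accepting work, then wait briefly.
    void close() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    /** Runs one call end to end and returns the completed value. */
    static String runOnce(String param1, int param2) {
        AsyncEvalSketch fn = new AsyncEvalSketch();
        try {
            CompletableFuture<String> result = new CompletableFuture<>();
            fn.eval(result, param1, param2);
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("async call failed", e);
        } finally {
            fn.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce("a", 1)); // prints "Processed a with 1"
    }
}
```

A harness like this may be handy for unit-testing the body of eval before packaging the UDF.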

4. Using UDFs in SQRL Scripts

IMPORT usrlib.MyScalarFunction;
IMPORT usrlib.MyAsyncScalarFunction;
-- Aliases are also supported:
IMPORT usrlib.MyScalarFunction AS AddFunction;

MyTable := SELECT val, MyScalarFunction(val, val) AS sum
           FROM (VALUES ((1)), ((2)), ((3))) AS t(val);

MyAsyncTable := SELECT val, MyAsyncScalarFunction(val, ival) AS result
                FROM (VALUES (('a'), (1)), (('b'), (2))) AS t(val, ival);

5. Supported UDF Types

  • ScalarFunction
  • AsyncScalarFunction
  • TableFunction
  • AsyncTableFunction
  • AggregateFunction
  • TableAggregateFunction

6. Important Rules

  • The shebang line ///usr/bin/env jbang "$0" "$@" ; exit $? must be the first line
  • Do NOT add Flink //DEPS — they cause build errors since Flink is on the classpath already
  • External (non-Flink) //DEPS are allowed (e.g., //DEPS com.google.code.gson:gson:2.10)
  • Each file must contain exactly one public class that extends a Flink UDF class
  • Package declarations are optional
  • Function names are case-insensitive in SQRL scripts
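
As an illustration of the //DEPS rules, the following sketch flags Flink coordinates in a UDF source file. It is not part of the DataSQRL CLI; the class name is hypothetical, and it assumes that "Flink dependency" means any coordinate whose group ID is org.apache.flink. The artifact and version in the sample are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-commit style check; not part of the DataSQRL CLI. */
final class JBangDepsCheck {

    // Assumption: a Flink dependency is any coordinate in the org.apache.flink group.
    private static final String FLINK_GROUP_PREFIX = "org.apache.flink:";

    /** Returns every //DEPS coordinate in the source that belongs to Flink. */
    static List<String> flinkDeps(String source) {
        List<String> offending = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.strip();
            if (!trimmed.startsWith("//DEPS ")) {
                continue;
            }
            // A //DEPS line may list several space-separated coordinates.
            for (String coord : trimmed.substring("//DEPS ".length()).strip().split("\\s+")) {
                if (coord.startsWith(FLINK_GROUP_PREFIX)) {
                    offending.add(coord);
                }
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "///usr/bin/env jbang \"$0\" \"$@\" ; exit $?",
            "//DEPS com.google.code.gson:gson:2.10",
            "//DEPS org.apache.flink:flink-table-common:1.19.0", // illustrative coordinate
            "public class MyScalarFunction {}");
        System.out.println(flinkDeps(source)); // prints "[org.apache.flink:flink-table-common:1.19.0]"
    }
}
```

The gson line passes because external (non-Flink) dependencies are allowed; only the Flink coordinate is reported.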

7. How It Works Under the Hood

  • The preprocessor scans .java files for the JBang shebang
  • All detected JBang UDF files are batched into a single jbang export fatjar invocation
  • The resulting jbang-udfs.jar is placed in the lib directory
  • A .function.json manifest is generated for each UDF, enabling SQRL script imports

Reference Implementation

See the integration test at sqrl-testing/sqrl-testing-container/src/test/resources/jbang/ for a working example with both sync and async UDFs.

Labels

documentation: Improvements or additions to documentation

Projects

Status
Backlog
