feat: inject workspace file context into LLM prompts for data-aware code generation by TristanSchneider-dev · Pull Request #53 · ISG-Siegen/AutoRecLab

TristanSchneider-dev · 2026-05-28T16:05:42Z

🚀 Description

This PR introduces a new utility module, utils/workspace_context.py, that dynamically scans the local execution workspace (out/workspace) for user-provided data files (such as .csv, .tsv, .parquet, .json, and .jsonl). It then formats their metadata (filename, file type, and size) and injects it into every LLM interaction prompt.

By giving the model immediate visibility into exactly what custom files are available and how to reference them, this enhancement prevents the planner from guessing paths or hallucinating filenames.
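A minimal sketch of the scanning step, assuming a flat (non-recursive) scan and the extension list from the description above. The names find_data_files, DATA_EXTENSIONS, and NOISE_FILES are hypothetical and not taken from utils/workspace_context.py. With an extension whitelist, the pipeline files mentioned under "Noise Filtering" are already excluded, so the blocklist mostly documents intent.

```python
from __future__ import annotations

from pathlib import Path

# Assumed whitelist, taken from the formats named in the PR description.
DATA_EXTENSIONS = {".csv", ".tsv", ".parquet", ".json", ".jsonl"}

# Hypothetical blocklist of pipeline artifacts; the whitelist already excludes
# these, so the set mainly documents intent and guards against future changes.
NOISE_FILES = {"save.pkl", "runfile.py", "out.log"}


def find_data_files(workspace: Path) -> list[Path]:
    """Return user-provided data files directly inside `workspace`, sorted by name.

    A missing workspace yields an empty list instead of raising.
    """
    if not workspace.is_dir():
        return []
    found = []
    for path in workspace.iterdir():  # flat scan, no recursion (assumption)
        if not path.is_file() or path.name in NOISE_FILES:
            continue
        if path.suffix.lower() in DATA_EXTENSIONS:
            found.append(path)
    return sorted(found, key=lambda p: p.name)
```

Sorting by name keeps the injected list stable across runs, so identical workspaces produce identical prompts.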

🛡️ Edge Cases Handled

Empty/Missing Workspace: Features a graceful fallback that prints "(No custom data files found.)" or "Not yet created" instead of crashing.
Noise Filtering: Internal pipeline artifacts and log files (e.g., save.pkl, runfile.py, out.log) are filtered out automatically.
Environment Agnostic: Paths are resolved robustly across environments, whether running inside Docker (/app/...), from a local relative path (./out/...), or via an environment variable ($ARL_out_dir/...).
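The description does not spell out the resolution order, so the following is only a sketch under assumptions: the environment variable wins, then the Docker mount, then the local relative path. The variable name ARL_out_dir comes from the PR. The /app/out location and the helpers resolve_workspace and describe_workspace are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_workspace(
    env_var: str = "ARL_out_dir",
    docker_root: Path = Path("/app/out"),  # assumed Docker mount location
    local_root: Path = Path("./out"),
) -> Path:
    """Pick the output root in an assumed priority order, then append `workspace`.

    1. the directory named by the environment variable, if set and non-empty
    2. the Docker mount, if it exists
    3. a path relative to the current working directory
    """
    env_value = os.environ.get(env_var)
    if env_value:
        root = Path(env_value)
    elif docker_root.is_dir():
        root = docker_root
    else:
        root = local_root
    # resolve() normalizes relative paths without requiring them to exist
    return (root / "workspace").resolve()


def describe_workspace(workspace: Path) -> str:
    """Render the prompt line for the workspace, falling back instead of raising."""
    suffix = "" if workspace.is_dir() else " (Not yet created)"
    return f"- **Code execution workspace:** `{workspace}`{suffix}"
```

Returning an absolute path matters here because the prompt tells the model that the working directory is the workspace, so the path shown should be unambiguous.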

📁 Files Changed

utils/workspace_context.py — Added path resolution, directory scanning, filtering logic, and context formatting.
treesearch/minimal_agent.py — Updated LLM prompt injection hooks to include the workspace context.
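The PR text does not show the hook itself. The following is a minimal sketch of what the injection in treesearch/minimal_agent.py might look like, assuming the rendered context is appended as its own section of the system prompt. inject_workspace_context is a hypothetical name, not the PR's actual API.

```python
def inject_workspace_context(system_prompt: str, workspace_context: str) -> str:
    """Hypothetical helper: append the workspace section to a system prompt.

    The context goes in its own section after the existing instructions;
    an empty context leaves the prompt unchanged.
    """
    context = workspace_context.strip()
    if not context:
        return system_prompt
    return f"{system_prompt.rstrip()}\n\n{context}\n"
```

Keeping the context in a clearly delimited section, after the agent's core instructions, makes it easy to log or strip while debugging.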

🧪 Proof of Functionality

1. User Prompt vs. Injected Context

When a user requests a custom dataset, the agent scans the workspace, detects the file, and embeds its metadata directly into the system prompt context:

Prompt: [screenshot]

Workspace with custom .csv: [screenshot]

2. Shared Options & Path Detection

The LLM is presented with both options (downloading the data or using the provided CSV):

MinimalAgent: Getting plan and code
                    INFO     === Workspace Context injected into LLM prompt ===                                                                     minimal_agent.py:326
                    INFO     WORKSPACE_CTX: ## Workspace & File Context                                                                             minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: - **Project root:** `/home/pete-fed/PycharmProjects/AutoRecLab-group4`                                  minimal_agent.py:328
                    INFO     WORKSPACE_CTX: - **Code execution workspace:** `/home/pete-fed/PycharmProjects/AutoRecLab-group4/out/workspace`        minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: ### How to access data files                                                                            minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: When your code runs, the working directory is the code execution workspace listed above.                minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: You have two options to load datasets:                                                                  minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: 1. **Use the `dataloader` package** (recommended for standard datasets):                                minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    ```python                                                                                            minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    from dataloader.loaders.registry import _run_loader                                                  minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    df = _run_loader('MovieLens100K')  # Downloads & caches automatically                                minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    ```                                                                                                  minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    Available datasets include: MovieLens100K, MovieLens1M, MovieLens10M, MovieLens20M,                  minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    MovieLens25M, MovieLensLatest, MovieLensLatestSmall, MovieLens1BSynthetic,                           minimal_agent.py:328
                    INFO     WORKSPACE_CTX:    Amazon2014*, Amazon2018*, Amazon2023*, Yelp2023, Gowalla, BeerAdvocate, etc.                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: 2. **Use custom files placed in the workspace** (listed below):                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: ## Custom data files in workspace                                                                       minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: - `groesste_studenten_uni_siegen.csv` — CSV data file (932.0 B)                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX:                                                                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: To use a custom file in your code, reference it by its filename                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: (the code runs inside the workspace directory):                                                         minimal_agent.py:328
                    INFO     WORKSPACE_CTX: ```python                                                                                               minimal_agent.py:328
                    INFO     WORKSPACE_CTX: df = pd.read_csv('filename.csv')                                                                        minimal_agent.py:328
                    INFO     WORKSPACE_CTX: ```                                                                                                     minimal_agent.py:328
                    INFO     === End workspace context ===                                                                                          minimal_agent.py:329

Note: the log print above was a quick-and-dirty debugging aid and is not committed.
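The log shows the rough shape of the rendered file list, one bullet per file with a type label and a human-readable size. Below is a sketch of that formatting step. It assumes 1024-based size units and uses a hypothetical extension-to-label table, and it approximates the log's separator with a colon. format_size, render_file_section, and FILE_KINDS are illustrative names only.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical extension -> label table for the prompt text.
FILE_KINDS = {
    ".csv": "CSV",
    ".tsv": "TSV",
    ".parquet": "Parquet",
    ".json": "JSON",
    ".jsonl": "JSON Lines",
}


def format_size(num_bytes: float) -> str:
    """Human-readable size with one decimal place, using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def render_file_section(files: list[Path]) -> str:
    """Render the custom-file list, with a fallback line when it is empty."""
    lines = ["## Custom data files in workspace", ""]
    if not files:
        lines.append("(No custom data files found.)")
    for path in files:
        kind = FILE_KINDS.get(path.suffix.lower(), "Data")
        size = format_size(path.stat().st_size)
        lines.append(f"- `{path.name}`: {kind} data file ({size})")
    return "\n".join(lines)
```

For a 932-byte CSV this yields a line such as "- `groesste_studenten_uni_siegen.csv`: CSV data file (932.0 B)".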

3. Execution Result (Successfully Created Row List)

Equipped with accurate file metadata, the agent generates a valid runfile.py script and executes it successfully, yielding the top 5 rows from the target CSV:

Matrikelnummer;Vorname;Nachname;Geschlecht;Alter;Studiengang;Fachsemester;Koerpergroesse_cm
1575400;Tim;Wagner;M;27;Wirtschaftsingenieurwesen (B.Sc.);18;209
1351722;Lukas;Becker;M;20;Informatik (M.Sc.);3;207
1591650;Julian;Schulz;M;28;Lehramt an Grundschulen;20;207
1341530;Noah;Bauer;M;27;Physik (B.Sc.);17;205
1216663;Ben;Richter;M;19;Wirtschaftsingenieurwesen (B.Sc.);1;202
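This CSV is semicolon-delimited. A generic `pd.read_csv('filename.csv')` call, as in the injected hint, would therefore parse each row into a single column unless `sep=';'` is passed (or `sep=None, engine='python'`, which sniffs the delimiter). The output above is ordered by Koerpergroesse_cm, descending. Whether runfile.py sorted explicitly or the file was already ordered is not shown. The following stdlib-only sketch sniffs the delimiter and makes the ranking explicit; top_n_by is a hypothetical name.

```python
from __future__ import annotations

import csv
import io


def top_n_by(text: str, column: str, n: int = 5) -> list[dict]:
    """Parse delimited text with a sniffed delimiter and return the n rows
    with the largest integer value in `column`."""
    # Sniff on complete lines only, so a truncated last line cannot skew detection.
    sample = "\n".join(text.splitlines()[:20])
    dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
    rows = list(csv.DictReader(io.StringIO(text), dialect=dialect))
    # sorted() is stable (also with reverse=True), so ties keep their file order.
    return sorted(rows, key=lambda row: int(row[column]), reverse=True)[:n]
```

With pandas, a comparable one-liner would be `pd.read_csv(path, sep=';').nlargest(5, 'Koerpergroesse_cm')`.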

The change worked like a charm: in three cases, it found the right file immediately without guessing the path :)


TristanSchneider-dev · 2026-05-28T17:10:08Z

Edit: If desired, instead of using the workspace as dataset storage, a dedicated context-aware folder for loading data could be created and used.

eisenbahnhero

Looks good from my side. Thx!

feat: inject workspace file context into LLM prompts for data-aware code generation (b23c656)

eisenbahnhero reviewed May 29, 2026

TristanSchneider-dev wants to merge 1 commit into ISG-Siegen:develop from leonlenz:feat/workspace_context
