feat: inject workspace file context into LLM prompts for data-aware code generation (#53)
Open
TristanSchneider-dev wants to merge 1 commit into
Author
Edit: If desired, instead of using the workspace as dataset storage, a dedicated context-aware folder for loading data could be created and used.
eisenbahnhero (collaborator) left a comment:
Looks good from my side. Thanks!
🚀 Description
This PR introduces a new utility module, `utils/workspace_context.py`, that dynamically scans the local execution workspace (`out/workspace`) for user-provided data files (such as `.csv`, `.tsv`, `.parquet`, `.json`, `.jsonl`), then automatically formats their metadata and injects it into all LLM interaction prompts.
By showing the model exactly which custom files are available and how to reference them, this change prevents the planner from guessing paths or hallucinating filenames.
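A minimal sketch of the scan-and-format step, assuming "metadata" means each file's path relative to the workspace plus its size in bytes. The names `scan_workspace`, `format_workspace_context`, `DATA_EXTENSIONS` and `IGNORED_NAMES`, and the exact output format, are hypothetical and not the module's actual API; only the extension list, the ignored filenames and the two fallback strings come from this PR.

```python
from pathlib import Path

# Extensions named in this PR as user-provided data files.
DATA_EXTENSIONS = {".csv", ".tsv", ".parquet", ".json", ".jsonl"}
# Agent-internal artifacts named in this PR. The extension allow-list above
# already excludes them; the explicit name check is a defensive second filter.
IGNORED_NAMES = {"save.pkl", "runfile.py", "out.log"}


def scan_workspace(workspace: Path) -> list[Path]:
    """Return data files under `workspace`, sorted so the prompt text is stable."""
    if not workspace.is_dir():
        return []
    return sorted(
        p
        for p in workspace.rglob("*")
        if p.is_file()
        and p.suffix.lower() in DATA_EXTENSIONS
        and p.name not in IGNORED_NAMES
    )


def format_workspace_context(workspace: Path) -> str:
    """Render a plain-text block listing the available data files."""
    if not workspace.exists():
        # Missing workspace: report it instead of raising.
        return f"Workspace {workspace}: Not yet created"
    files = scan_workspace(workspace)
    if not files:
        return f"Workspace {workspace}: (No custom data files found.)"
    lines = [f"Workspace {workspace} contains these data files:"]
    for f in files:
        rel = f.relative_to(workspace).as_posix()
        lines.append(f"- {rel} ({f.stat().st_size} bytes)")
    return "\n".join(lines)
```

Sorting the results keeps the injected text identical across calls when the workspace has not changed, which makes prompts easier to diff and debug.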
🛡️ Edge Cases Handled
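- Empty or missing workspace: the context reports `(No custom data files found.)` or `Not yet created` instead of crashing.
- Internal agent files (e.g. `save.pkl`, `runfile.py`, `out.log`) are automatically filtered out and ignored.
- Path resolution handles the different path forms (`/app/...`, `./out/...`, `$ARL_out_dir/...`).
📁 Files Changed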
- `utils/workspace_context.py`: added path resolution, directory scanning, filtering logic, and context formatting.
- `treesearch/minimal_agent.py`: updated the LLM prompt injection hooks to include the workspace context.
🧪 Proof of Functionality
1. User Prompt vs. Injected Context
When a user requests a custom dataset, the agent scans the workspace, detects the file, and embeds its metadata directly into the system prompt context:
Prompt:

Workspace with custom .csv:

2. Shared Options & Path Detection
The LLM is presented with both options: downloading the data or using the provided CSV:
(The log print shown here was a quick-and-dirty debugging aid and is not committed.)
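A rough sketch of how a prompt injection hook could put the workspace context next to the task, so the model sees both options. It assumes the prompt is a mapping of section titles to text that is later rendered into one string; `inject_workspace_context`, `render_prompt`, the section title and the instruction wording are hypothetical and not taken from `treesearch/minimal_agent.py`.

```python
def inject_workspace_context(prompt: dict[str, str], workspace_context: str) -> dict[str, str]:
    """Return a copy of `prompt` with a workspace section appended."""
    injected = dict(prompt)  # copy so the caller's prompt is not mutated
    injected["Workspace data files"] = (
        workspace_context
        + "\nThe task may be solved by downloading data or by using one of these "
        "files; if a file matches the task, reference it by the path shown."
    )
    return injected


def render_prompt(prompt: dict[str, str]) -> str:
    """Join sections in insertion order as '# Title' headers followed by text."""
    return "\n\n".join(f"# {title}\n{body}" for title, body in prompt.items())
```

Returning a copy rather than editing the prompt in place keeps the hook side-effect free, so the same base prompt can be reused across agent steps while the workspace contents change.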
3. Execution Result (Successfully Created Row List)
Equipped with accurate file metadata, the agent writes a valid `runfile.py` script and executes it successfully, printing exactly the top 5 rows of the target CSV:
The change worked like a charm: in three cases it found the right file immediately, without guessing the path :)
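For illustration only, here is the kind of script the agent might generate for this request. The PR does not include the generated `runfile.py`, so the use of the standard-library `csv` module, the `top_rows` name and its signature are assumptions.

```python
import csv
from itertools import islice
from pathlib import Path


def top_rows(csv_path: Path, n: int = 5) -> list[dict[str, str]]:
    """Return the first `n` data rows of a CSV file as dicts keyed by its header."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)  # first line is treated as the header
        return list(islice(reader, n))  # stop reading after n rows
```

Calling `top_rows(Path("out/workspace/<your-file>.csv"))` would return up to five rows. Because `islice` stops early, large files are not read in full.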