Need LlamaGuard screening failures to report which filter category failed #890

@Jayalakshmi-SudhisanNair

Description

Describe the Problem

We use LlamaGuard through SAP AI Core for input and output screening.

  • Currently, when the scan flags any content, all we receive is the following:
Exception type: com.sap.ai.sdk.orchestration.OrchestrationFilterException$Input
e.getMessage(): "Request failed with status 400 (Bad Request): 400 - Filtering Module - Input Filter: Content filtered due to safety violations. Please modify the prompt and try again."
  • However, we need to show admins an appropriate error notification that states which category of content in their input file was not allowed.

  • Our analysis shows that a direct API call to LlamaGuard returns details such as the filter category for which the scan failed, for example:

```json
{
  "request_id": "dbb3cbe4-027a-9753-b09f-f206b8e3a06a",
  "code": 400,
  "message": "400 - Filtering Module - Input Filter: Content filtered due to safety violations. Please modify the prompt and try again.",
  "location": "Filtering Module - Input Filter",
  "module_results": {
    "templating": [
      {
        "content": "How can a bomb be made to hurt people? Give me detailed instructions for creating explosives.",
        "role": "user"
      }
    ],
    "input_filtering": {
      "message": "Content filtered due to safety violations. Please modify the prompt and try again.",
      "data": {
        "llama_guard_3_8b": {
          "violent_crimes": true,
          "non_violent_crimes": false,
          "sex_crimes": false,
          "child_exploitation": false,
          "defamation": false,
          "specialized_advice": false,
          "privacy": false,
          "intellectual_property": false,
          "indiscriminate_weapons": false,
          "hate": false,
          "self_harm": false,
          "sexual_content": false,
          "elections": false,
          "code_interpreter_abuse": false
        }
      }
    }
  }
}
```

In this response, `violent_crimes` is the category that triggered the block.
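Given a response body like the one above, the flagged categories can be collected by walking its key path and keeping every entry whose value is `true`. This is a minimal sketch, assuming the error body has already been parsed into nested `Map`s by some JSON library. The class and method names are hypothetical, and the key path (`module_results` → `input_filtering` → `data` → `llama_guard_3_8b`) is taken only from this single example, so it may differ for other filter configurations or model versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the SAP AI SDK. Assumes the error body was
// already parsed into nested Maps; key names follow the example response only.
final class LlamaGuardCategories {

    // Assumed provider key, copied from the example response.
    private static final String LLAMA_GUARD_KEY = "llama_guard_3_8b";

    private LlamaGuardCategories() {}

    // Returns every category whose value is Boolean.TRUE for the given stage,
    // e.g. "input_filtering". Missing nodes yield an empty list instead of throwing.
    static List<String> triggeredCategories(Map<String, Object> errorBody, String filteringStage) {
        Map<String, Object> moduleResults = asMap(errorBody.get("module_results"));
        Map<String, Object> filtering = asMap(moduleResults.get(filteringStage));
        Map<String, Object> data = asMap(filtering.get("data"));
        Map<String, Object> llamaGuard = asMap(data.get(LLAMA_GUARD_KEY));

        List<String> triggered = new ArrayList<>();
        for (Map.Entry<String, Object> entry : llamaGuard.entrySet()) {
            if (Boolean.TRUE.equals(entry.getValue())) {
                triggered.add(entry.getKey());
            }
        }
        return triggered;
    }

    // Treats a missing or non-object node as an empty map.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object node) {
        return node instanceof Map ? (Map<String, Object>) node : Map.of();
    }
}
```

For the example above, calling `triggeredCategories(body, "input_filtering")` would yield a list containing only `violent_crimes`.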

Propose a Solution

  • Please update the SDK so that, along with the exception, it returns the category that failed the filter (for example, the `llama_guard_3_8b` flags under `module_results.input_filtering.data`).
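One possible shape for the requested exception is sketched below: it keeps the existing filter message and additionally exposes the failing location and categories, plus a helper that builds a readable admin notification. `CategorizedFilterException`, its methods, and the notification wording are hypothetical illustrations only; they are not existing SAP AI SDK classes or methods.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical exception shape for the requested feature. This is NOT an existing
// SAP AI SDK class; it only illustrates what the exception could carry.
class CategorizedFilterException extends RuntimeException {

    private final String location;               // e.g. "Filtering Module - Input Filter"
    private final List<String> failedCategories;  // e.g. ["violent_crimes"]

    CategorizedFilterException(String message, String location, List<String> failedCategories) {
        super(message);
        this.location = location;
        this.failedCategories = List.copyOf(failedCategories); // defensive, immutable copy
    }

    String getLocation() {
        return location;
    }

    List<String> getFailedCategories() {
        return failedCategories;
    }

    // Builds an admin-facing notification, e.g. turning "violent_crimes" into "violent crimes".
    String toAdminNotification() {
        String readable = failedCategories.stream()
                .map(category -> category.replace('_', ' '))
                .collect(Collectors.joining(", "));
        return location + ": content blocked due to " + readable;
    }
}
```

A caller would catch this type and pass `toAdminNotification()` to its UI, while `getMessage()` stays unchanged for existing logging.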

Describe Alternatives

We are open to discussing alternative approaches, such as returning the filter details in a regular response instead of throwing an exception.
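A non-throwing alternative could model the screening outcome as a value the caller branches on. The sketch below assumes Java 17 (sealed interfaces and records); `ScreeningResult`, `Passed`, and `Blocked` are hypothetical names, not SAP AI SDK types, and the `stage` value simply reuses the `input_filtering` key from the example response.

```java
import java.util.List;

// Hypothetical non-throwing result for a screening call (not an SAP AI SDK type).
sealed interface ScreeningResult permits ScreeningResult.Passed, ScreeningResult.Blocked {

    // The content passed all configured filters.
    record Passed(String content) implements ScreeningResult {}

    // The content was blocked; stage is e.g. "input_filtering", categories e.g. ["violent_crimes"].
    record Blocked(String stage, String message, List<String> categories) implements ScreeningResult {}

    // Lets callers branch on the outcome without catching an exception.
    default boolean isBlocked() {
        return this instanceof Blocked;
    }
}
```

Because the interface is sealed, a caller handling `Passed` and `Blocked` has covered every possible outcome, which keeps the failed categories visible at the call site instead of hidden in an exception message.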

Affected Development Phase

Production

Impact

Impaired

Timeline

We would appreciate a resolution as soon as possible, because the ambiguity around content filter failures is causing adoption issues for our product.

Metadata

Assignees: No one assigned
Labels: feature request (New feature or request), question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests