Hi there 👋

I’m Arth Singh, an AI Safety & Red Teaming researcher from Mumbai, India 🇮🇳. I currently work at AIM Intelligence as a Research Engineer in the AI Safety department. I am also collaborating with the Seoul National University PI Lab on red teaming mobile-use agents, and with Rishub Jain from Google DeepMind on AI oversight research. Previously, I was a Research Collaborator with FAR.AI, where I helped build their Red Teaming Toolkit.

  • 🧨 I enjoy red teaming AI models, but lately I’m more focused on AI alignment & safety

📫 Let’s connect:

Always down to talk alignment, adversarial evals, or half-baked research ideas that can turn into collaborations.

Pinned

  1. arth-finds-weird-model-behaviours

     A repository documenting the weird findings I come across while working with LLMs.

     Python

  2. vlm-compression-circuits

     Code for mechanistically analyzing and improving task-specific model compression in VLMs.

     Python

  3. A-Red-Team-Havoc

     A red teaming toolkit I built for running attacks on LLMs. More to add soon.

     Python

  4. readibility-controbility

     TeX

  5. SuperAdditive_Vector_Collusion

     Forked from ranabir/SuperAdditive_Vector_Collusion

     Recent safety research investigates whether AI systems could exhibit heightened risks when multiple individually benign failure modes or steering vectors combine. This project explores whether "two…

     Python

  6. AI4Collaboration/in-context-scheming-env

     Forked from noise-field/self-preservation-env

     A configurable environment for tool-augmented LLMs to evaluate their self-preservation capabilities.

     Python