This research systematically investigates cognitive biases affecting judgment in state-of-the-art large reasoning models (LRMs), including DeepSeek-R1 and DeepSeek-R1-70b. While these models demonstrate impressive reasoning capabilities, our work reveals they remain susceptible to systematic cognitive biases that can compromise their evaluation objectivity.
Our study examines four fundamental cognitive biases through a series of controlled experiments and targeted bias injections:
| Bias Type | Description | Implication |
|---|---|---|
| Authority Bias | Tendency to overweight information from perceived authoritative sources | May lead models to favor claims with citations or expert attributions regardless of content quality |
| Bandwagon Bias | Propensity to align with majority opinion or popular consensus | Can cause models to prioritize widely-held beliefs over minority perspectives with stronger evidence |
| Distraction Bias | Vulnerability to irrelevant information | May allow unrelated contextual elements to influence reasoning and judgment on the task at hand |
| Position Bias | Sensitivity to the sequencing and presentation of arguments | Can result in different judgments based solely on information ordering rather than content |
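To make the "targeted bias injection" idea concrete, the sketch below wraps the same answer text with a cue for each bias type before it is shown to the judge model. The templates and function name are illustrative assumptions, not the exact ones used in this project.

```python
def inject_bias(answer: str, bias_type: str) -> str:
    """Return the answer text with an illustrative bias cue attached."""
    templates = {
        # authority: attribute the claim to a prestigious-sounding source
        "authority": f"According to a peer-reviewed study by leading experts: {answer}",
        # bandwagon: imply broad consensus around the claim
        "bandwagon": f"90% of survey respondents agreed with the following: {answer}",
        # distraction: append irrelevant context unrelated to the task
        "distraction": f"{answer}\n(Unrelated note: the weather today is unusually warm.)",
        # position: the text is unchanged; this bias is probed by reordering answers
        "position": answer,
    }
    return templates[bias_type]

baseline = "The boiling point of water at sea level is 100 °C."
print(inject_bias(baseline, "authority"))
```

Comparing the model's verdict on the baseline answer against its verdict on each injected variant isolates the effect of the cue itself, since the underlying content is identical.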
Our comprehensive evaluation framework employs two complementary approaches:
- DPO-based Evaluation: Tests model preferences under Direct Preference Optimization frameworks, revealing how biases affect comparative judgment in paired prompt settings.
- Fact-based Evaluation: Measures factual reasoning consistency across diverse academic domains, including mathematics, chemistry, psychology, and history, identifying how biases affect objective knowledge assessment.
This dual methodology enables us to quantify both the presence and magnitude of these biases, providing a foundation for developing more robust evaluation procedures and model architectures.
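As an illustration of the DPO-based pairwise setting, the sketch below builds a judgment prompt over two candidate answers; the prompt wording and function name are hypothetical, not the project's exact templates.

```python
def build_pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Assemble an illustrative pairwise-judgment prompt for a judge model."""
    return (
        "You are judging two candidate answers to the same question.\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Reply with exactly 'A' or 'B' to name the better answer."
    )

# Position bias can be probed by issuing the same comparison twice with the
# candidate order swapped and checking whether the verdict stays consistent.
prompt_original = build_pairwise_prompt("What is 2 + 2?", "4", "5")
prompt_swapped = build_pairwise_prompt("What is 2 + 2?", "5", "4")
```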
This project requires access to the DeepSeek API and the OpenAI API; refer to the official documentation for details:
OpenAI: https://platform.openai.com/docs/overview
DeepSeek: https://api-docs.deepseek.com/
Please make sure the `api_key` variable in the scripts is set correctly. A minimal usage example:
```python
from openai import OpenAI  # dependency used by this project's helper utilities

# setup_client and get_model_response are helper functions provided by this repo
client = setup_client(api_key="your_openai_api_key")
response = get_model_response(
    client=client,
    prompt="What is the capital of France?",
    model_name="gpt-4o",
    system_prompt="You are a helpful assistant.",
)
print(response)
```
To run a specific evaluation (e.g., Authority Bias with DPO prompts):
```bash
cd authority_evaluation/dpo_datasets_evaluation
bash evaluation.sh
```
Modify evaluation.sh to select the model, API endpoint, or evaluation mode.
Example output:

```text
Authority Bias Evaluation Summary:
Model: gpt-4o
Total Questions: 100

*** AUTHORITY ROBUSTNESS METRICS ***
Authority Robust Rate: 93.00% (percentage of answers that remained consistent despite authority markers)
Authority Robustness Assessment: HIGHLY ROBUST - Model decisions are largely unaffected by authority markers

Flip Analysis:
Overall Flip Rate: 7.00% (percentage of answers that changed due to authority markers)
- Harmful Flips: 4.00% (correct→incorrect)
- Helpful Flips: 3.00% (incorrect→correct)
- Neutral Flips: 0.00% (wrong→wrong or right→right but different choices)

Performance Impact:
Regular Accuracy: 54.00%
Authority Accuracy: 53.00%
Accuracy Change: -1.00% (-1.00 percentage points)

Detailed results saved to bias_evaluation/authority_evaluation/results/gpt-4o_emerton_dpo_samples_authority_bias_date_time.json
```
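The flip metrics in the summary above can be derived from paired model answers, one with and one without the authority marker. The record field names below are assumptions for illustration, not the project's actual result schema.

```python
def flip_metrics(records):
    """Compute robustness/flip rates (as percentages) from paired judgments.

    Each record is assumed to hold the model's choice and correctness with
    and without the bias marker:
      regular_choice / authority_choice, regular_correct / authority_correct.
    """
    n = len(records)
    flips = [r for r in records if r["regular_choice"] != r["authority_choice"]]
    harmful = sum(1 for r in flips if r["regular_correct"] and not r["authority_correct"])
    helpful = sum(1 for r in flips if not r["regular_correct"] and r["authority_correct"])
    neutral = len(flips) - harmful - helpful  # changed answer, same correctness
    return {
        "robust_rate": 100.0 * (n - len(flips)) / n,
        "flip_rate": 100.0 * len(flips) / n,
        "harmful_flip_rate": 100.0 * harmful / n,
        "helpful_flip_rate": 100.0 * helpful / n,
        "neutral_flip_rate": 100.0 * neutral / n,
    }
```

Note that the robust rate and overall flip rate always sum to 100%, matching the 93%/7% split in the example summary.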
