InsightCode Benchmarks Documentation

This directory contains benchmark results and analysis of popular open-source projects.

🎯 Purpose

Validate InsightCode's scoring algorithm against real-world codebases
Demonstrate performance and accuracy on diverse project types
Provide reference scores for comparison and calibration
Build credibility through transparent, reproducible analysis

📊 Latest Results

The most recent benchmark results are generated by running:

npm run benchmark              # Full codebase analysis
npm run benchmark:production    # Production code only

Latest Benchmark: July 21, 2025 (v0.7.0)

9 projects analyzed (Angular, Chalk, ESLint, Express, Jest, Lodash, TypeScript, UUID, Vue)
677,099 lines processed in 70.31 seconds
Analysis speed: 9,630 lines/second
Success rate: 100% (9/9 projects)

Results are automatically saved to benchmarks/ with detailed individual reports.

📐 Methodology (v0.7.0+)

Health Score System

InsightCode applies progressive penalties with no upper cap, following the Pareto Principle (a small share of the code typically accounts for most of the problems).

Core Formula

Health Score = 100 - Σ(penalties)
Final Score = Math.max(0, Health Score)
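As a minimal sketch of the formula above, the snippet below subtracts a list of penalties from 100 and floors the result at 0. The function name `healthScore` and the `penalties` array are illustrative assumptions, not InsightCode's actual API.

```typescript
// Sketch of the core formula: subtract all penalties from 100, floor at 0.
// `healthScore` and `penalties` are hypothetical names used for illustration.
function healthScore(penalties: number[]): number {
  const total = penalties.reduce((sum, penalty) => sum + penalty, 0);
  // Penalties are uncapped, so the raw score can go negative; clamp at 0.
  return Math.max(0, 100 - total);
}

console.log(healthScore([12, 6]));      // 82
console.log(healthScore([80, 45, 20])); // 0 (the raw score would be -45)
```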

Progressive Penalty System

Complexity Penalties (McCabe-based):

≤10: No penalty (100 points) - McCabe "excellent"
11-20: Linear penalty (100→70 points) - 3 points per unit
21-50: Quadratic penalty (70→30 points) - maintenance burden grows faster than linearly
51+: Exponential penalty (30→0 points) - extreme complexity gets extreme penalties
100+: Additional catastrophic penalties - no artificial caps
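One way to picture the bands above is as a piecewise function from complexity to score. The band endpoints (100, 70, 30) and the 3-points-per-unit slope come from the list; the exact quadratic interpolation, the decay constant `DECAY`, and the omission of the extra 100+ penalty are assumptions made for this sketch, not the documented implementation.

```typescript
// ASSUMPTION: decay constant for the 51+ band. The text only says the score
// falls from 30 toward 0; the real curve may differ.
const DECAY = 25;

// Hypothetical helper mapping a complexity value to a 0-100 score.
function complexityScore(complexity: number): number {
  if (complexity <= 10) return 100;                          // no penalty
  if (complexity <= 20) return 100 - 3 * (complexity - 10);  // linear: 100 -> 70
  if (complexity <= 50) {
    const t = (complexity - 20) / 30;                        // 0..1 across the band
    return 70 - 40 * t * t;                                  // quadratic: 70 -> 30 (assumed shape)
  }
  // Exponential decay toward 0 (assumed shape; the 100+ extra penalty is not modelled).
  return 30 * Math.exp(-(complexity - 50) / DECAY);
}

console.log(complexityScore(15)); // 85
console.log(complexityScore(35)); // 60
console.log(complexityScore(50)); // 30
```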

Duplication Penalties (Mode-aware):

Legacy Mode (default): ≤15% excellent, ≤30% acceptable, ≤50% critical
Strict Mode (--strict-duplication): ≤3% excellent, ≤8% acceptable, ≤15% critical
Progressive penalties without caps for extreme duplication
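The two threshold sets can be read as a lookup keyed by mode. The sketch below classifies a duplication percentage into the bands listed above; the names `duplicationBand` and `DUPLICATION_THRESHOLDS`, and the `"beyond-critical"` label for values past the last threshold, are hypothetical and only serve the illustration.

```typescript
type DuplicationMode = "legacy" | "strict";
type DuplicationBand = "excellent" | "acceptable" | "critical" | "beyond-critical";

// Thresholds (in percent) taken from the list above.
const DUPLICATION_THRESHOLDS: Record<
  DuplicationMode,
  { excellent: number; acceptable: number; critical: number }
> = {
  legacy: { excellent: 15, acceptable: 30, critical: 50 },
  strict: { excellent: 3, acceptable: 8, critical: 15 },
};

// Hypothetical classifier; "beyond-critical" is an assumed label.
function duplicationBand(
  percent: number,
  mode: DuplicationMode = "legacy"
): DuplicationBand {
  const limits = DUPLICATION_THRESHOLDS[mode];
  if (percent <= limits.excellent) return "excellent";
  if (percent <= limits.acceptable) return "acceptable";
  if (percent <= limits.critical) return "critical";
  return "beyond-critical";
}

console.log(duplicationBand(6));           // "excellent"
console.log(duplicationBand(6, "strict")); // "acceptable"
```

The same 6% duplication rate lands in different bands depending on the mode, which is why reports should state which mode produced them.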

Size Penalties (Clean Code inspired):

≤200 LOC: No penalty - optimal file size
201-500 LOC: Linear penalty - gentle degradation
500+ LOC: Exponential penalty - massive files severely penalized

Issue Penalties (Severity-weighted):

Critical: 20 points each
High: 12 points each
Medium: 6 points each
Low: 2 points each
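The severity weights above amount to a weighted sum over issue counts. A minimal sketch, assuming issues arrive as per-severity counts (the `issuePenalty` function and its input shape are illustrative, not the tool's actual interface):

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Points per issue, taken from the list above.
const ISSUE_PENALTY: Record<Severity, number> = {
  critical: 20,
  high: 12,
  medium: 6,
  low: 2,
};

// Hypothetical helper: total penalty for a set of per-severity counts.
function issuePenalty(counts: Partial<Record<Severity, number>>): number {
  return (Object.keys(ISSUE_PENALTY) as Severity[]).reduce(
    (sum, severity) => sum + ISSUE_PENALTY[severity] * (counts[severity] ?? 0),
    0
  );
}

console.log(issuePenalty({ critical: 1, medium: 2, low: 3 })); // 38
```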

Project-Level Scoring

Weighted Aggregation (internal hypothesis requiring validation):

45% Complexity - Primary defect predictor
30% Maintainability - Development velocity impact
25% Duplication - Technical debt indicator
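Read as arithmetic, the weights combine three 0-100 dimension scores into one project score. The sketch below shows only that combination; the `DimensionScores` shape and `projectScore` name are assumptions, and the per-file criticality weighting described next is deliberately left out.

```typescript
// Hypothetical input shape: one 0-100 score per dimension.
interface DimensionScores {
  complexity: number;
  maintainability: number;
  duplication: number;
}

// Weights from the list above (they sum to 1).
const DIMENSION_WEIGHTS: DimensionScores = {
  complexity: 0.45,
  maintainability: 0.3,
  duplication: 0.25,
};

function projectScore(scores: DimensionScores): number {
  return (
    scores.complexity * DIMENSION_WEIGHTS.complexity +
    scores.maintainability * DIMENSION_WEIGHTS.maintainability +
    scores.duplication * DIMENSION_WEIGHTS.duplication
  );
}

// 80 * 0.45 + 90 * 0.30 + 100 * 0.25 = 36 + 27 + 25
console.log(
  projectScore({ complexity: 80, maintainability: 90, duplication: 100 }).toFixed(1)
); // "88.0"
```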

Architectural Criticality Weighting: Each file receives a "criticism score" determining its weight in final project scores:

CriticismScore = (Dependencies × 2.0) + (WeightedIssues × 0.5) + 1

Where:

Dependencies = incomingDeps + outgoingDeps + (isInCycle ? 5 : 0)
WeightedIssues = (critical×4) + (high×3) + (medium×2) + (low×1)

Note: Complexity is NOT included to avoid double-counting since it's already weighted at 45% in health score calculation.
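The CriticismScore formula translates directly into code. In the sketch below, the `FileStats` shape is a hypothetical container for the inputs named in the formula, and `weightedProjectScore` assumes that "weight" means a criticism-weighted mean of per-file scores, which the text implies but does not spell out.

```typescript
// Hypothetical container for the inputs named in the formula.
interface FileStats {
  incomingDeps: number;
  outgoingDeps: number;
  isInCycle: boolean;
  issues: { critical: number; high: number; medium: number; low: number };
}

function criticismScore(file: FileStats): number {
  const dependencies =
    file.incomingDeps + file.outgoingDeps + (file.isInCycle ? 5 : 0);
  const weightedIssues =
    file.issues.critical * 4 +
    file.issues.high * 3 +
    file.issues.medium * 2 +
    file.issues.low * 1;
  return dependencies * 2.0 + weightedIssues * 0.5 + 1;
}

// ASSUMPTION: project score = criticism-weighted mean of per-file scores.
function weightedProjectScore(
  files: { score: number; weight: number }[]
): number {
  const totalWeight = files.reduce((sum, f) => sum + f.weight, 0);
  if (totalWeight === 0) return 0;
  return files.reduce((sum, f) => sum + f.score * f.weight, 0) / totalWeight;
}

const hub: FileStats = {
  incomingDeps: 3,
  outgoingDeps: 2,
  isInCycle: true,
  issues: { critical: 1, high: 0, medium: 2, low: 1 },
};
console.log(criticismScore(hub)); // 25.5 = (10 * 2.0) + (9 * 0.5) + 1
```

The trailing `+ 1` keeps an isolated, issue-free file at weight 1 rather than 0, so it still counts toward the project score.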

Grade Thresholds (Academic standard):

A: 90-100 points
B: 80-89 points
C: 70-79 points
D: 60-69 points
F: 0-59 points
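The thresholds map a score to a letter as in the sketch below. The function name `gradeFor` is illustrative, and treating fractional scores by lower bound (for example, 89.5 is a B) is an assumption, since the table only lists whole-point ranges.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

// Hypothetical helper; fractional scores are assigned by lower bound (assumed).
function gradeFor(score: number): Grade {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

console.log(gradeFor(97)); // "A"
console.log(gradeFor(84)); // "B"
console.log(gradeFor(0));  // "F"
```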

Measurement Accuracy

Complexity Calculation

Method: Extended Cyclomatic Complexity (McCabe, 1976)
Base: Every file starts at complexity 1
+1 for each decision point:
- if, else if (but NOT else alone)
- for, while, do-while, for-in, for-of
- case in switch (but NOT default)
- catch in try-catch
- &&, || (logical operators as implicit branching)
- ? : (ternary operator)
Validation: 100% accurate against a comprehensive test suite
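To make the counting rules concrete, here is a hand-annotated sample. The function `describeValue` is made up for this illustration; each comment shows the increment that the rules above would assign to that line, counted by hand rather than produced by InsightCode.

```typescript
// A file containing only this function.            // base complexity: 1
function describeValue(n: number, tags: string[]): string {
  let label: string;
  if (n < 0 || tags.length === 0) {                 // +1 (if), +1 (||)
    return "invalid";
  } else if (n === 0) {                             // +1 (else if)
    return "zero";
  } else {                                          // +0 (else alone)
    label = "#";
  }
  for (const tag of tags) {                         // +1 (for-of)
    switch (tag) {
      case "a":                                     // +1 (case)
        label += "A";
        break;
      case "b":                                     // +1 (case)
        label += "B";
        break;
      default:                                      // +0 (default)
        label += "?";
    }
  }
  return n > 10 && label ? `big:${label}` : `small:${label}`; // +1 (&&), +1 (?:)
}
// Total: 1 + 2 + 1 + 1 + 2 + 2 = 9
```

At 9, this file would still fall in the "no penalty" band (≤10), even though it already contains eight decision points.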

Duplication Detection

Method: 8-line sliding window with content-based analysis
Philosophy: Focuses on actionable copy-paste, not structural similarity
Accuracy: ~85%, deliberately conservative (prefers false negatives to false positives)
Block size: 8 lines minimum for significance
Token threshold: 8+ tokens to filter trivial matches
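The sliding-window idea can be sketched as follows. The 8-line window and 8-token minimum come from the list above; everything else is an assumption for illustration: "content-based" is modelled as an exact match after trimming whitespace and dropping blank lines, tokens are counted as word-like runs, and the names `duplicationRatio`, `WINDOW_LINES`, and `MIN_TOKENS` are hypothetical. The real detector may normalize and match differently.

```typescript
const WINDOW_LINES = 8; // block size from the text
const MIN_TOKENS = 8;   // token threshold from the text

// ASSUMPTION: exact match on trimmed, non-blank lines stands in for
// "content-based" comparison.
function duplicationRatio(files: Record<string, string>): number {
  const firstSeen = new Map<string, { file: string; start: number }>();
  const duplicated = new Set<string>(); // entries look like "file:lineIndex"
  let totalLines = 0;

  for (const file of Object.keys(files)) {
    const lines = files[file]
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
    totalLines += lines.length;

    for (let start = 0; start + WINDOW_LINES <= lines.length; start++) {
      const block = lines.slice(start, start + WINDOW_LINES).join("\n");
      // Skip blocks that are mostly punctuation (too few word-like tokens).
      const tokenCount = (block.match(/\w+/g) ?? []).length;
      if (tokenCount < MIN_TOKENS) continue;

      const first = firstSeen.get(block);
      if (first === undefined) {
        firstSeen.set(block, { file, start });
        continue;
      }
      // Ignore overlapping windows in the same file (runs of identical lines).
      if (first.file === file && start - first.start < WINDOW_LINES) continue;

      // Mark every line of both occurrences as duplicated.
      for (let k = 0; k < WINDOW_LINES; k++) {
        duplicated.add(`${first.file}:${first.start + k}`);
        duplicated.add(`${file}:${start + k}`);
      }
    }
  }
  return totalLines === 0 ? 0 : duplicated.size / totalLines;
}
```

Because only identical 8-line blocks match, two files that share a structure but differ in identifiers or arguments report no duplication under this sketch, which mirrors the stated preference for false negatives.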

📈 Latest Benchmark Results (July 21, 2025)

Production Code Analysis - Grade Distribution

| Grade | Projects | Percentage |
| --- | --- | --- |
| A | 3 projects | 33% |
| B | 4 projects | 44% |
| C | 2 projects | 22% |
| D | 0 projects | 0% |
| F | 0 projects | 0% |

Top Performers

UUID (Grade A, 97/100)
- Excellent: Low complexity, minimal duplication
- 978 LOC across 29 files
- Average complexity: 4.6
Express (Grade A, 94/100)
- Excellent: Well-structured web framework
- 1,135 LOC across 7 files
- Average complexity: 7.4
Chalk (Grade A, 93/100)
- Excellent: Clean terminal styling library
- 475 LOC across 5 files
- Average complexity: 8.0

Performance Statistics

Analysis speed: 9,630 lines/second
Average code quality: 86/100
Average duplication rate: 1.7%
Most common issue: Deep nesting (2,829 occurrences)
Largest project: TypeScript (316,214 LOC, 697 files)

Project Size Analysis

Small Projects (lodash, chalk, uuid):

Average score: 91/100
Average complexity: 37.0 (pulled up by Lodash; UUID and Chalk average 4.6 and 8.0)
Observation: Small projects achieving excellent quality

Medium Projects (express, vue, jest):

Average score: 84/100
Average complexity: 18.3
Observation: Best balance of features vs maintainability

Large Projects (angular, eslint, typescript):

Average score: 82/100
Average complexity: 43.4
Observation: Large projects maintaining high quality standards

🔍 Understanding the Results: Context Matters

Case Study: Lodash's Deliberate Architecture

Current Analysis (July 2025):

File Complexity: 1,818 (extreme)
Individual Health Score: 0/100 (F grade)
Project Complexity Score: 7/100 (weighted by architectural criticality)
Critical Issues: 25

Historical Context: Lodash's monolithic design was deliberate:

Pre-ES6 Era Constraints:
- Single file compatibility (browsers, Node.js, Rhino)
- No module bundlers or tree-shaking
- CDN distribution priority
User Experience Trade-offs:
- ✅ Zero configuration for developers
- ✅ Universal compatibility maintained
- ❌ Extreme complexity penalty (1,818)
- ❌ Maintenance burden

Modern Context:

<!-- Legacy usage (still dominant): one script tag loads the whole library -->
<script src="lodash.js"></script>

// Modern modular usage is also available: import only what you need
import map from 'lodash/map';
import filter from 'lodash/filter';

Key Insight: F grades can reflect legitimate architectural trade-offs, not just poor engineering. Consider business constraints and historical context when interpreting scores.

🧪 Self-Analysis Disclaimer

InsightCode analyzes itself for:

✅ Feature validation and testing
✅ Documentation examples
✅ Output format demonstration
❌ NOT for quality judgment

Analysis tools inherently have high complexity due to AST manipulation and parsing logic. Self-analysis scores demonstrate the tool's capabilities; they are not an assessment of its code quality.

🔄 Duplication Detection Philosophy

InsightCode vs SonarQube: Different Approaches

| Aspect | SonarQube | InsightCode |
| --- | --- | --- |
| Method | Token/Structure-based | Content-based |
| Philosophy | "Repetitive structure" | "Actionable copy-paste" |
| Lodash Result | 70.4% duplication | ~6% duplication |
| Focus | Structural patterns | Actual code copying |

Example: Benchmark suites with repetitive structure:

// SonarQube flags this as duplication
suites.push(Benchmark.Suite('`_.map`'))
suites.push(Benchmark.Suite('`_.filter`'))

// InsightCode: Different methods = different code
// Structure similarity ≠ actionable duplication

We prioritize actionable insights over structural pattern detection.

📁 Benchmark History

Recent Benchmarks

benchmark-report-prod-2025-07-21.md - Latest v0.7.0 production analysis
benchmark-summary-prod-2025-07-21.json - Machine-readable summary
benchmark-report-full-2025-07-21.md - Complete codebase analysis (including tests)

🎯 Benchmark Validation

Academic Foundation

McCabe (1976): Complexity ≤10 recommended as the upper limit per module; >50 treated as critical in later risk classifications
Martin Clean Code (2008): File size recommendations ≤200 LOC optimal
NASA/SEL Standards: Current NPR 7150.2D requires ≤15 complexity for critical software
ISO/IEC 25010: Maintainability quality model compliance

Industry Alignment

SonarQube Quality Gates: Strict duplication mode aligns with 3% threshold
Google Code Practices: Duplication tolerance ~2-3% maintained
NIST Guidelines: >20 complexity = high defect probability

Statistical Validation

677K+ lines analyzed across diverse project types
100% success rate on complex, real-world codebases
Consistent results across multiple benchmark runs
Measured throughput of 9K+ lines/second with the v0.7.0 enhancements

For complete methodology details, see docs/FILE_HEALTH_SCORE_METHODOLOGY.md and docs/SCORING_ARCHITECTURE.md
