Stats Command

The stats command analyzes annotation datasets and generates comprehensive statistics reports including category distribution, annotation counts, dimension analysis, and more.

Usage

panlabel stats <INPUT> [OPTIONS]

Parameters

input

path

required

Path to the dataset to analyze. Can be a file or directory depending on the format.

--format

string

Input format. If omitted, Panlabel auto-detects the format.

Supported values: ir-json, coco, cvat, label-studio, tfod, yolo, voc

Aliases: coco-json, cvat-xml, label-studio-json, ls, tfod-csv, ultralytics, yolov8, yolov5, pascal-voc, voc-xml

When auto-detection fails for a JSON file, stats falls back to reading it as ir-json.

--top

number

default:"10"

Number of top labels and label pairs to show in the report. Useful for large datasets with many categories.

--tolerance

float

default:"0.5"

Tolerance in pixels for out-of-bounds checks. Annotations within this tolerance of the image boundary are not flagged as out-of-bounds.
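The exact comparison panlabel performs is not spelled out here. The Python sketch below shows one plausible reading of the tolerance rule, assuming boxes are stored as pixel corner coordinates (x_min, y_min, x_max, y_max); the function name is hypothetical and not part of panlabel.

```python
def is_out_of_bounds(bbox, img_w, img_h, tolerance=0.5):
    """Hypothetical helper: True if any box edge lies more than `tolerance`
    pixels outside the image. Assumes (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = bbox
    return (
        x_min < -tolerance
        or y_min < -tolerance
        or x_max > img_w + tolerance
        or y_max > img_h + tolerance
    )


# A box ending 0.3 px past the right edge passes with the default tolerance...
print(is_out_of_bounds((10, 10, 640.3, 100), 640, 480))                 # False
# ...but is flagged when the tolerance is tightened to 0.1 px.
print(is_out_of_bounds((10, 10, 640.3, 100), 640, 480, tolerance=0.1))  # True
```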

--output

string

default:"text"

Output format for the statistics report. Options:

text - Human-readable text report with ASCII visualizations
json - Machine-readable JSON with full statistics
html - Self-contained HTML report with interactive charts

Statistics Included

The stats report includes:

Dataset Overview

Total images, annotations, and categories
Images with/without annotations
Average annotations per image
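As a rough illustration of how the overview figures relate to each other, the sketch below computes them from a hypothetical list of image IDs and annotation dicts with assumed "image_id" and "category" keys (not panlabel's IR schema). The average is taken over all images, including unannotated ones, which matches the sample text report further down (1,234 / 150 ≈ 8.2).

```python
def overview(image_ids, annotations):
    """Hypothetical helper computing overview figures for a minimal dataset.

    `image_ids` lists every image; each annotation is a dict with assumed
    "image_id" and "category" keys.
    """
    annotated = {ann["image_id"] for ann in annotations}
    n_images = len(image_ids)
    return {
        "images": n_images,
        "annotations": len(annotations),
        # Counts categories actually used; a real report may also count
        # categories that are declared but never annotated.
        "categories": len({ann["category"] for ann in annotations}),
        # Averaged over all images, including those with no annotations.
        "avg_annotations_per_image": len(annotations) / n_images if n_images else 0.0,
        "images_without_annotations": sum(1 for i in image_ids if i not in annotated),
    }


stats = overview(
    [1, 2, 3, 4],
    [
        {"image_id": 1, "category": "person"},
        {"image_id": 1, "category": "car"},
        {"image_id": 2, "category": "person"},
    ],
)
print(stats["avg_annotations_per_image"], stats["images_without_annotations"])  # 0.75 2
```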

Category Distribution

Annotation count per category
Visual bar charts (text mode) or interactive charts (HTML mode)
Top N most frequent categories (controlled by --top)
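A minimal Python sketch of the category distribution, assuming you already have one label string per annotation. Percentages are relative to the total annotation count, and the `top` argument plays the role of --top. This is an illustration only, not panlabel's implementation.

```python
from collections import Counter


def category_distribution(labels, top=10):
    """Hypothetical helper: the `top` most frequent labels as
    (name, count, percentage of all annotations)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        (name, count, round(100.0 * count / total, 1))
        for name, count in counts.most_common(top)
    ]


labels = ["person"] * 5 + ["car"] * 3 + ["bus"] * 2
print(category_distribution(labels, top=2))  # [('person', 5, 50.0), ('car', 3, 30.0)]
```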

Dimension Analysis

Image dimension distribution
Bounding box size statistics (min, max, average)
Aspect ratio analysis
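The bounding box figures can be approximated as follows. This sketch assumes corner-format pixel boxes (x_min, y_min, x_max, y_max) and defines aspect ratio as width divided by height; both conventions are assumptions, and panlabel's definitions may differ.

```python
from statistics import mean


def bbox_stats(bboxes):
    """Hypothetical helper: min/max/avg of box width, height, and aspect ratio."""
    widths = [x_max - x_min for x_min, _, x_max, _ in bboxes]
    heights = [y_max - y_min for _, y_min, _, y_max in bboxes]
    # Skip zero-height boxes so the aspect ratio never divides by zero.
    ratios = [w / h for w, h in zip(widths, heights) if h > 0]

    def summary(values):
        if not values:
            return None
        return {"min": min(values), "max": max(values), "avg": mean(values)}

    return {"width": summary(widths), "height": summary(heights), "aspect_ratio": summary(ratios)}


print(bbox_stats([(0, 0, 100, 50), (10, 10, 40, 70)])["aspect_ratio"])
# {'min': 0.5, 'max': 2.0, 'avg': 1.25}
```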

Quality Metrics

Out-of-bounds annotations (beyond --tolerance)
Empty or zero-area bounding boxes
Images with duplicate annotations
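The sketch below shows one way to count zero-area boxes and images with duplicate annotations. It assumes a duplicate means an identical category and box on the same image, and it treats inverted boxes (x_max < x_min) as empty; panlabel may use a different matching rule. The helper name and dict keys are hypothetical.

```python
from collections import Counter


def quality_flags(annotations):
    """Hypothetical helper counting zero-area boxes and images with duplicates.

    Each annotation is a dict with assumed "image_id", "category", and
    "bbox" (x_min, y_min, x_max, y_max) keys.
    """
    zero_area = 0
    seen = Counter()
    for ann in annotations:
        x_min, y_min, x_max, y_max = ann["bbox"]
        # Non-positive width or height means the box encloses no area.
        if x_max - x_min <= 0 or y_max - y_min <= 0:
            zero_area += 1
        seen[(ann["image_id"], ann["category"], tuple(ann["bbox"]))] += 1
    images_with_duplicates = {key[0] for key, n in seen.items() if n > 1}
    return {"zero_area": zero_area, "images_with_duplicates": len(images_with_duplicates)}


anns = [
    {"image_id": 1, "category": "car", "bbox": (0, 0, 10, 10)},
    {"image_id": 1, "category": "car", "bbox": (0, 0, 10, 10)},
    {"image_id": 2, "category": "bus", "bbox": (5, 5, 5, 20)},
]
print(quality_flags(anns))  # {'zero_area': 1, 'images_with_duplicates': 1}
```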

Co-occurrence Analysis

Top N category pairs that appear together (controlled by --top)
Useful for understanding object relationships in your dataset
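Co-occurrence counts how many images contain both labels of a pair, regardless of how many instances of each label appear. A minimal sketch, assuming annotation dicts with hypothetical "image_id" and "category" keys; pairs are keyed alphabetically here, so their order may differ from panlabel's output.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_cooccurrences(annotations, top=10):
    """Hypothetical helper: the `top` label pairs by number of shared images."""
    labels_per_image = defaultdict(set)
    for ann in annotations:
        labels_per_image[ann["image_id"]].add(ann["category"])
    pairs = Counter()
    for labels in labels_per_image.values():
        # sorted() gives each unordered pair a single canonical (a, b) key.
        pairs.update(combinations(sorted(labels), 2))
    return pairs.most_common(top)


anns = [
    {"image_id": 1, "category": "person"},
    {"image_id": 1, "category": "car"},
    {"image_id": 1, "category": "person"},
    {"image_id": 2, "category": "car"},
    {"image_id": 2, "category": "person"},
    {"image_id": 3, "category": "bus"},
    {"image_id": 3, "category": "car"},
]
print(top_cooccurrences(anns, top=2))  # [(('car', 'person'), 2), (('bus', 'car'), 1)]
```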

Examples

Basic Statistics

panlabel stats dataset.json

Auto-detects the format and prints a text report.

Explicit Format with JSON Output

panlabel stats dataset.json --format coco --output json > stats.json

Generates a machine-readable JSON report.

HTML Report

panlabel stats dataset.json --output html > report.html

Creates a self-contained HTML file with interactive visualizations.

Show Top 20 Categories

panlabel stats large_dataset.json --top 20

Displays the 20 most frequent categories and pairs.

YOLO Dataset Statistics

panlabel stats /data/yolo_dataset --format yolo --output text

Analyzes a YOLO directory structure.

Custom Tolerance for OOB Checks

panlabel stats dataset.json --tolerance 1.0

Uses a 1-pixel tolerance instead of the default 0.5 pixels.

Output Examples

Text Report

Dataset Statistics

Overview:
  Images:              150
  Annotations:         1,234
  Categories:          8
  Avg annotations/img: 8.2
  Images w/o annot:    12 (8.0%)

Top 10 Categories:
  person      ████████████████████ 456 (37.0%)
  car         ███████████████      298 (24.2%)
  bicycle     ██████               124 (10.1%)
  motorcycle  ████                  89 (7.2%)
  bus         ███                   67 (5.4%)
  truck       ██                    54 (4.4%)
  traffic     ██                    48 (3.9%)
  stop_sign   █                     32 (2.6%)

Top 10 Category Co-occurrences:
  person × car         89 images
  person × bicycle     54 images
  car × traffic        42 images
  person × motorcycle  38 images
  car × truck          28 images
  bus × car            24 images
  bicycle × traffic    19 images
  person × bus         15 images
  motorcycle × traffic 12 images
  truck × traffic      11 images

Dimension Analysis:
  Image sizes: 640×480 (95), 1920×1080 (45), 800×600 (10)
  BBox width:  min=12px, max=580px, avg=156px
  BBox height: min=18px, max=420px, avg=142px

Quality:
  Out-of-bounds: 3 (0.2%)
  Zero-area:     0 (0.0%)

JSON Report Structure

{
  "overview": {
    "images": 150,
    "annotations": 1234,
    "categories": 8,
    "avg_annotations_per_image": 8.227,
    "images_without_annotations": 12
  },
  "categories": [
    {
      "name": "person",
      "count": 456,
      "percentage": 37.0
    },
    {
      "name": "car",
      "count": 298,
      "percentage": 24.2
    }
  ],
  "dimensions": {
    "image_sizes": {
      "640x480": 95,
      "1920x1080": 45,
      "800x600": 10
    },
    "bbox_stats": {
      "width": {"min": 12, "max": 580, "avg": 156.4},
      "height": {"min": 18, "max": 420, "avg": 142.1}
    }
  },
  "quality": {
    "out_of_bounds": 3,
    "zero_area": 0
  },
  "co_occurrences": [
    {"pair": ["person", "car"], "count": 89},
    {"pair": ["person", "bicycle"], "count": 54}
  ]
}
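Because the JSON report is plain JSON, it is easy to consume from scripts. The Python sketch below turns the quality section into a simple pass/fail check, for example in CI. It assumes the field names shown in the sample structure above, which may vary between panlabel versions; load_report and quality_gate are hypothetical helpers.

```python
import json


def load_report(path):
    """Read a report written by `panlabel stats ... --output json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def quality_gate(report, max_out_of_bounds=0, max_zero_area=0):
    """Hypothetical helper: list of threshold violations (empty means pass).

    Field names follow the sample report structure and are assumptions.
    """
    quality = report.get("quality", {})
    problems = []
    if quality.get("out_of_bounds", 0) > max_out_of_bounds:
        problems.append(f"out_of_bounds={quality['out_of_bounds']}")
    if quality.get("zero_area", 0) > max_zero_area:
        problems.append(f"zero_area={quality['zero_area']}")
    return problems
```

For the sample report above, quality_gate(load_report("stats.json")) would return ["out_of_bounds=3"], which a CI script could turn into a non-zero exit code.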

HTML Report

The HTML output creates a self-contained report with:

Interactive bar charts for category distribution
Searchable/sortable tables
Collapsible sections
Responsive design for mobile viewing
No external dependencies (all CSS/JS embedded)

panlabel stats dataset.json --output html > report.html
open report.html  # View in browser (macOS; use xdg-open on Linux)

Use Cases

Dataset Quality Assessment

# Check if dataset is balanced
panlabel stats training_data.json --top 50 --output text

Quickly identify imbalanced categories that may need more samples.
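To put a number on imbalance, you can compare the most and least frequent category counts from the JSON report. A rough sketch, assuming the "categories" array shown in the JSON Report Structure example; if your report lists only the top N categories, the ratio covers only those. The helper name is hypothetical.

```python
def imbalance_ratio(categories):
    """Hypothetical helper: most frequent count / least frequent count.

    `categories` is the list under the report's "categories" key. A value of
    1.0 means perfectly balanced; empty categories are ignored so the ratio
    stays finite.
    """
    counts = [c["count"] for c in categories if c["count"] > 0]
    if not counts:
        return None
    return max(counts) / min(counts)


# Counts from the sample text report: person (456) vs. stop_sign (32).
sample = [{"name": "person", "count": 456}, {"name": "stop_sign", "count": 32}]
print(imbalance_ratio(sample))  # 14.25
```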

Pre-Training Analysis

# Generate comprehensive report before training
panlabel stats dataset.json --output html > analysis.html

Share with team members or include in documentation.

Automated Monitoring

# Track stats over time (-c writes one compact JSON object per line)
panlabel stats current_dataset.json --output json | \
  jq -c '.overview' >> stats_history.jsonl

Monitor how your dataset evolves during annotation.
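If you want each snapshot to carry a timestamp, a small Python script can append one JSON line per run. This sketch assumes the "overview" field from the JSON report; snapshot_line and append_snapshot are hypothetical names.

```python
import json
from datetime import datetime, timezone


def snapshot_line(report, now=None):
    """Serialize the report's overview as one timestamped JSONL line."""
    now = now or datetime.now(timezone.utc)
    record = {"timestamp": now.isoformat(), **report["overview"]}
    return json.dumps(record) + "\n"


def append_snapshot(report, history_path):
    """Append the snapshot to a JSONL history file (created if missing)."""
    with open(history_path, "a", encoding="utf-8") as f:
        f.write(snapshot_line(report))
```

Passing the parsed output of panlabel stats ... --output json to append_snapshot yields one line per run, each with the same keys as the overview plus a UTC timestamp.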

Compare Dataset Versions

# Generate stats for before/after
panlabel stats v1.json --output json > v1_stats.json
panlabel stats v2.json --output json > v2_stats.json
jd v1_stats.json v2_stats.json  # jd is a JSON diff tool; any diff tool works
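For a targeted comparison rather than a raw JSON diff, the sketch below reports per-field changes in the overview and per-category count changes between two parsed reports. It assumes the JSON structure shown earlier; if the reports are truncated by --top, a category missing from one side shows up as a drop to zero. Both helpers are hypothetical.

```python
def diff_overview(old, new):
    """Changes in numeric overview fields between two parsed reports."""
    old_ov, new_ov = old["overview"], new["overview"]
    return {
        key: new_ov[key] - old_ov[key]
        for key in old_ov.keys() & new_ov.keys()
        if isinstance(old_ov[key], (int, float))
        and isinstance(new_ov[key], (int, float))
        and new_ov[key] != old_ov[key]
    }


def diff_category_counts(old, new):
    """Per-category count change; a name missing on one side counts as 0."""
    old_counts = {c["name"]: c["count"] for c in old["categories"]}
    new_counts = {c["name"]: c["count"] for c in new["categories"]}
    changes = {}
    for name in sorted(old_counts.keys() | new_counts.keys()):
        delta = new_counts.get(name, 0) - old_counts.get(name, 0)
        if delta:
            changes[name] = delta
    return changes
```

Load v1_stats.json and v2_stats.json with json.load and pass the two dicts to either function.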

Performance Notes

Stats computation is fast even for large datasets (millions of annotations)
JSON output is more verbose but easier to parse programmatically
HTML generation adds minimal overhead and produces self-contained files
Use --top to limit output size for datasets with hundreds of categories

See Also

Validate Command

Validate dataset quality

Diff Command

Compare two datasets

Sample Command

Create balanced subsets based on stats
