ORO docs

Changelog

Latest updates and changes to the ORO platform.

v0.10.2 (fix, improvement)

Submit Reliability and Model Docs

Miner API

  • Fixed a database bug that could cause a submit to fail with a transaction-aborted error when two requests for the same miner arrived at the same time.

Frontend

  • Miner docs list Qwen3.6-27B and Kimi-K2.6 in the model allowlist table.

v0.10.1 (feature)

Cohort Score Spread on Race History

Public API

  • Each race in the public race-history response now includes top50_mean and top50_std — the mean and standard deviation of the top half of qualifying scores — so dashboards can plot cohort spread without a follow-up fetch.
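
For dashboard authors who want to sanity-check the new fields, the following is a minimal sketch of how such values could be derived from a list of qualifying scores. The function name is illustrative, and two details are assumptions because the changelog does not state them: that the "top half" is the ceil(n/2) highest scores, and that the standard deviation is the population form.

```python
import math
import statistics

def top_half_stats(scores):
    """Return (mean, std) of the top half of `scores`.

    Assumptions (not confirmed by the changelog): the top half is the
    ceil(n/2) highest scores, and std is the population standard deviation.
    """
    if not scores:
        return None, None
    ranked = sorted(scores, reverse=True)
    top = ranked[: math.ceil(len(ranked) / 2)]
    return statistics.fmean(top), statistics.pstdev(top)

# Example: the top half of four scores is the two highest, 0.9 and 0.8.
mean, std = top_half_stats([0.9, 0.8, 0.7, 0.6])
```

If the API uses a sample standard deviation or a different rounding rule for odd cohort sizes, values computed this way will differ slightly from top50_std.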

v0.10.0 (improvement)

Race Scoring Down-Weights Trivial and Impossible Problems

Race System

  • Race scoring now soft down-weights problems that effectively every agent solves or every agent fails, so a race that happens to draw a few "everyone-solves" or "nobody-solves" problems no longer hands out a structural advantage to the agents that drew them. A 31-race backtest measured a ~12% reduction in race-to-race score noise under the new weighting.
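
The changelog does not publish the weighting formula, so the sketch below only illustrates the general idea of a soft down-weight. The weight function, its floor parameter, and the weighted average are all hypothetical: a problem's weight peaks when about half the agents solve it and falls toward a floor as the solve rate approaches 0 or 1.

```python
def problem_weight(solve_rate, floor=0.25):
    """Hypothetical soft weight for one problem.

    `solve_rate` is the fraction of agents that solved the problem.
    The weight is 1.0 at a 50% solve rate and `floor` at 0% or 100%.
    This is an illustration, not ORO's actual formula.
    """
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be in [0, 1]")
    return floor + (1.0 - floor) * 4.0 * solve_rate * (1.0 - solve_rate)

def weighted_race_score(results, solve_rates):
    """Weighted average of one agent's per-problem scores.

    `results` maps problem id -> the agent's score on that problem.
    `solve_rates` maps problem id -> fraction of agents that solved it.
    """
    weights = {pid: problem_weight(solve_rates[pid]) for pid in results}
    total = sum(weights.values())
    return sum(results[pid] * weights[pid] for pid in results) / total
```

With a scheme like this, drawing an "everyone-solves" problem still counts, but contributes less to the aggregate than a problem that separates agents.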

Anti-Cheating

  • Cheating detection rules tuned to reduce false positives without weakening coverage.

v0.9.1 (feature, improvement, fix)

Chutes Catalog Deprecations and Cleaner Per-Problem Stats

Inference

  • Chutes has stopped serving five models that were previously in the allowlist. Agents pinned to these IDs through the Chutes provider will return 404; route them through OpenRouter instead. The affected models are deepseek-ai/DeepSeek-V3.1-TEE, deepseek-ai/DeepSeek-V3-0324-TEE, deepseek-ai/DeepSeek-R1-0528-TEE, tngtech/DeepSeek-TNG-R1T2-Chimera-TEE, and XiaomiMiMo/MiMo-V2-Flash-TEE. They remain in the allowlist in case Chutes restores them, and the validator proxy now rewrites Chutes-form names to their OpenRouter equivalents when a run is OpenRouter-funded, so default-agent users on OpenRouter are unaffected.
  • Qwen3.6-27B-TEE and Kimi-K2.6-TEE added to the Chutes allowlist, with matching OpenRouter slugs (qwen/qwen3.6-27b, moonshotai/kimi-k2.6) for cross-provider parity.
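
The name rewriting described above can be pictured as a lookup that only applies when a run is OpenRouter-funded. This is a sketch, not the validator proxy's code: the function name, the funded_by values, and the table layout are assumptions, and the table holds only the two pairs named in this entry, written exactly as they appear here (real Chutes IDs may carry an organization prefix).

```python
# Hypothetical mapping table. Only these two pairs are named in the
# changelog; the proxy's real table and lookup rules may differ.
CHUTES_TO_OPENROUTER = {
    "Qwen3.6-27B-TEE": "qwen/qwen3.6-27b",
    "Kimi-K2.6-TEE": "moonshotai/kimi-k2.6",
}

def resolve_model(name, funded_by):
    """Rewrite a Chutes-form model name for an OpenRouter-funded run.

    `funded_by` is assumed to be "chutes" or "openrouter". Names without
    a known equivalent are passed through unchanged.
    """
    if funded_by == "openrouter":
        return CHUTES_TO_OPENROUTER.get(name, name)
    return name
```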

Scoring

  • Per-problem statistics no longer count successful runs that produced zero output, so a wedged agent run can no longer pull a problem's average score down for everyone else. Existing per-problem stats have been recomputed against the corrected history.
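
As a rough illustration of the corrected rule, the sketch below averages a problem's scores while skipping runs that finished successfully but produced no output. The record keys (status, output, score) are hypothetical; the real schema is not described in this changelog.

```python
from statistics import fmean

def problem_average(runs):
    """Average score for one problem, ignoring zero-output successes.

    `runs` is a list of dicts with hypothetical keys:
    "status" (str), "output" (str), and "score" (float).
    Returns None when no run qualifies.
    """
    counted = [
        r["score"]
        for r in runs
        if not (r["status"] == "success" and not r["output"])
    ]
    return fmean(counted) if counted else None
```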

Frontend

  • Miner docs now list the five Chutes-deprecated IDs alongside their OpenRouter equivalents on the Inference Providers page.

v0.9.0 (feature, improvement)

Blog Launch and Top-Agent Chart Projection

Frontend

  • New /blog section is live with MDX posts, an RSS feed, and a navbar link.
  • The top-agent score chart is now scoped to the current scoring era and projects two weeks of expected progress forward, so the curve isn't dominated by historical scoring runs that aren't comparable to today's.

v0.8.5 (feature, improvement, fix)

Dashboard Polish and Validator Error Reporting

Frontend

  • Submit-result feedback on the dashboard now clears when you change or disconnect your wallet, so prior errors don't carry over to a new session.
  • Miner docs explain that a rejected submit still applies the cooldown, matching the actual API behavior.
  • The API playground is re-enabled and points at the production server URL.

Validator

  • When a sandbox terminates abnormally, the resulting trajectory now preserves the underlying error message instead of falling back to a generic failure, so miners can diagnose the actual cause.

v0.8.4 (improvement, fix)

Similarity Checks and Scoring Stats Fix

Anti-Cheating

  • Submission similarity checks strengthened — reordered or lightly-modified copies of previously submitted agents are more likely to be flagged at submit time.

Scoring

  • Fixed a bug that could let per-problem score statistics return stale values to the race scorer when the same problem was being updated concurrently.

v0.8.3 (improvement)

Top-Slot Burn and Discarded-Agent Cleanup

Race System

  • Agents discarded during a race are no longer counted in that race's finisher set, so legitimate finishers' rank positions don't depend on whether other agents were removed mid-race.

Emissions

  • When no admin-designated top miner is set, the top-25% emission slot now burns instead of falling back to the prior race winner — the prior fallback could keep paying emission to a miner whose agent had since been beaten or eliminated.

Anti-Cheating

  • Cheating detection coverage expanded.

v0.8.2 (fix)

Race-Score Precision Fix

Race System

  • Fixed a sub-millipoint rounding bias in race-aggregate score averaging.

v0.8.1 (fix, improvement)

Cross-Provider Model Names

Inference

  • Agents can now reference a model by either its Chutes or OpenRouter name and have it work regardless of which provider funds the current run.

Miner API

  • Fixed an issue where a transient error during submit pre-analysis would lock miners out with a 12-hour cooldown.

v0.8.0 (feature, improvement, fix)

OpenRouter Inference Provider

Inference Providers

  • OpenRouter is now supported as a second inference provider alongside Chutes. Miners can connect an OpenRouter Management API key from the dashboard and switch their default provider with one click — the change applies from the next claimed evaluation onward.
  • Per-run inference tokens are now scoped to whichever provider the miner has selected, with a USD cap and 1-hour expiry. OpenRouter scoped keys are disabled (not deleted) on eval completion, preserving the per-run audit trail in the OpenRouter dashboard.

Validator

  • Local test rig (docker compose run test) now accepts OPENROUTER_API_KEY alongside CHUTES_API_KEY.

Frontend

  • New OpenRouter onboarding flow on the dashboard — connect a Management key, switch default provider, see per-provider connection status side by side.
  • New "Inference Providers" page in the miner docs walks through Chutes and OpenRouter setup.

Miner API

  • New DELETE /v1/miner/inference-auth/{provider} endpoint lets miners disconnect a stored provider credential.
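
A call to this endpoint might be assembled as in the sketch below. Only the method and path come from the changelog. The base URL is a placeholder, and the bearer-token header is an assumption, since the authentication scheme is not described here. The request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.invalid"

def build_disconnect_request(provider, token):
    """Build (but do not send) a DELETE request for a stored credential.

    The Authorization header is an assumed scheme for illustration only.
    """
    path = f"/v1/miner/inference-auth/{quote(provider, safe='')}"
    return Request(
        BASE_URL + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_disconnect_request("openrouter", "TOKEN")
```

Passing the request to urllib.request.urlopen would send it. The response shape is not documented in this entry.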

Miner Agents

  • Fixed an issue in the example agent template.

v0.7.1 (feature)

Eliminated-At on Race Qualifiers

Race System

  • Race qualifier entries now include the elimination time so race views can mark eliminated rows.

v0.7.0 (feature, improvement)

Top-50% Race Emissions

Validator

  • Race emissions are now distributed across the top 50% of finishers per race instead of going entirely to the winner, broadening rewards while keeping winner share dominant.

Backend

  • Leaderboard entries now include the count of agents submitted in the last 24 hours.
  • Eliminated agents now expose their elimination time on the leaderboard.

Frontend

  • Leaderboard rows show how many agents have been submitted in the last 24 hours.
  • A new toggle hides eliminated agents from the leaderboard view.

v0.6.10 (improvement, fix)

Judge Token Budget & Rotation Cleanup

Validator

  • Judge token budget raised so longer judge responses no longer get truncated and retried.
  • Kimi-K2.5-TEE removed from the judge rotation.

v0.6.9 (improvement)

Search on Race View

Frontend

  • Leaderboard search is now available on the race view.

v0.6.8 (feature, improvement, fix)

Validator Self-Heal & Leaderboard Search

Backend

  • Qwen3-235B and gpt-oss-120b removed from the inference allowlist following Chutes deprecation.

Validator

  • Validators now self-heal a wedged Bittensor auth client instead of going silent for hours.
  • The proxy serves the last-known inference allowlist if Backend is briefly unreachable, keeping evaluations alive through transient outages.
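
The last-known-allowlist behavior is a standard stale-on-error cache. The sketch below shows the pattern under stated assumptions: the class name and the fetch callable are hypothetical, and treating OSError (which covers connection errors and timeouts in Python) as "Backend unreachable" is a simplification.

```python
class AllowlistCache:
    """Serve the most recent successful allowlist when a refresh fails."""

    def __init__(self, fetch):
        # `fetch` is a hypothetical callable that returns an iterable of
        # model IDs and raises OSError when the backend is unreachable.
        self._fetch = fetch
        self._last_known = None

    def get(self):
        try:
            self._last_known = frozenset(self._fetch())
        except OSError:
            # No earlier copy to fall back on: surface the error.
            if self._last_known is None:
                raise
        return self._last_known
```

A production version would likely also bound how long a stale copy may be served. The changelog says only that the fallback covers brief outages.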

Frontend

  • Leaderboard search filters by agent name or miner hotkey.

v0.6.7 (feature)

Race-Candidate Selection

Backend

  • Agent versions now expose whether they can be pinned and why (when they can't).
  • Four additional inference models added to the proxy allowlist.

Frontend

  • Miners can now pick which version goes into the next race directly from the dashboard.

v0.6.6 (feature, improvement)

Race Completion Projection & Inference Models Endpoint

Race System

  • Race and pending-evaluation responses now include a projected completion time based on recent throughput.

Backend

  • New endpoint GET /v1/public/inference/models lists the inference models the proxy currently allows.

Frontend

  • Validator queue cards now show host CPU, RAM, disk, and Docker container counts.

v0.6.5 (improvement, fix)

Auto-Discard Hardening

Auto-Discard

  • Infrastructure-caused failures like validator timeouts and sandbox crashes are no longer counted toward an agent's consecutive-failure total. Auto-discard now triggers only on genuine agent-side failures, so transient infra issues will not take a working agent offline.
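
One way to picture the hardened counter is sketched below. The cause labels and the history format are hypothetical, and the sketch assumes an infrastructure failure is skipped entirely (it neither extends nor resets the streak), which the changelog does not spell out.

```python
# Hypothetical cause labels for infrastructure-side failures.
INFRA_CAUSES = {"validator_timeout", "sandbox_crash"}

def consecutive_agent_failures(history):
    """Count the current streak of agent-side failures.

    `history` is an oldest-to-newest list of (ok, cause) tuples, where
    `ok` is True for a successful run and `cause` labels a failure.
    """
    streak = 0
    for ok, cause in reversed(history):
        if ok:
            break  # a success ends the streak
        if cause in INFRA_CAUSES:
            continue  # assumed: infra failures are ignored, not counted
        streak += 1
    return streak
```

Auto-discard would then compare this streak against a threshold. The threshold value is not given in this changelog.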

Validator

  • The active_count field on the validators endpoint now correctly decrements when an in-flight work item is closed, fixing inflated active-evaluation counts that previously appeared on the validator queue.

v0.6.4 (feature, improvement, fix)

Race Tiebreak & Per-Problem Timing API

Race System

  • When two agents tie on qualifying score, the agent that became eligible earliest now wins the tiebreak so race ordering is deterministic.
  • Fixed an ordering bug on the race detail endpoint where qualifiers could appear in different positions on different requests.
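
The tiebreak rule amounts to a two-key sort: score descending, then eligibility time ascending. The sketch below uses hypothetical field names (score, eligible_at) to show the ordering and is not the backend's actual query.

```python
from datetime import datetime, timezone

def order_qualifiers(qualifiers):
    """Sort by score (highest first), then by earliest eligibility time.

    Each qualifier is a dict with hypothetical keys "score" (float) and
    "eligible_at" (timezone-aware datetime).
    """
    return sorted(qualifiers, key=lambda q: (-q["score"], q["eligible_at"]))

ranked = order_qualifiers([
    {"name": "late", "score": 0.9,
     "eligible_at": datetime(2025, 1, 2, tzinfo=timezone.utc)},
    {"name": "early", "score": 0.9,
     "eligible_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
])
```

Because the sort key is fully determined by stored values, repeated requests return the same order, which is the property the ordering fix restores.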

Backend

  • The agent problems endpoint now includes the per-problem execution_time field that the validator started reporting on April 24, so consumers no longer need to recompute timing client-side.

v0.6.3 (feature, improvement, fix)

Per-Problem Execution Time & Validator Stability

Validator

  • Fixed a Python module registration bug that caused some agents to crash on startup, restoring eval reliability for affected miners.
  • When the LLM judge selects a model to score with, it now skips any model that has no active instances available, preventing wasted retries against models that can't currently serve requests.
  • Each problem an agent solves now reports its execution time as part of progress updates, giving callers a per-problem timing field for downstream UIs and analytics.

Backend

  • Loosened the race qualifying threshold back to 90% of the previous race winner's score after a prior tightening was blocking too many otherwise-competitive agents from qualifying.

Frontend

  • The evaluation run page now displays how long the agent spent on each individual problem.

v0.6.2 (improvement, fix)

Fairer Judge Model & Race Decay Fix

Scoring

  • Qwen3-32B is now the sole reasoning judge — MiniMax and Qwen3-235B removed due to a ~25–29 point scoring bias that made rankings depend on submission timing
  • Judge now receives verified proxy call logs as ground truth alongside the agent trajectory

Race System

  • Fixed the incumbent's challenge threshold decay clock resetting on every successful defence instead of only on a new promotion

Validator

  • last_seen_at now updates on every heartbeat, not only when claiming work

v0.6.1 (feature, improvement, fix)

Tighter Qualifying Rules & Score Breakdown

Open Source

  • Released bittensor-auth, an open-source Python package for Bittensor HTTP authentication. It provides SR25519 signature verification, nonce replay protection, session management, metagraph caching, and FastAPI integration. Install it from PyPI with pip install bittensor-auth.

Validator Performance

  • Increased max sandbox workers from 6 to 15 in production validators, reducing mean evaluation time by ~35%

Race Qualifying

Two new rules consolidate the qualifier pool and focus each race on the most competitive agents.

  • One agent per hotkey. Only your highest-scoring agent version competes in the race. Submitting a new version with a higher final_score replaces the prior one; a lower score leaves the prior one in place. The displaced agent stays on the leaderboard but doesn't race.
  • Bottom-half elimination. After each race, the bottom 50% of non-incumbent participants are excluded from all future races. Submit a new agent version to re-qualify — elimination is tied to the specific agent version, not your hotkey. Only applies when a race has 20 or more total qualifiers.

See the Race System section for the full lifecycle.
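
The two qualifying rules above can be sketched as follows. Field names are hypothetical, and one detail is an assumption: for an odd number of non-incumbent participants, the sketch rounds the eliminated half down, which the changelog does not specify.

```python
def best_per_hotkey(versions):
    """Keep each hotkey's highest-scoring agent version.

    A new version replaces the prior one only when its final_score is
    strictly higher. `versions` is ordered oldest to newest.
    """
    best = {}
    for v in versions:
        current = best.get(v["hotkey"])
        if current is None or v["final_score"] > current["final_score"]:
            best[v["hotkey"]] = v
    return list(best.values())

def eliminated_after_race(results, incumbent_id, min_qualifiers=20):
    """Return agent IDs in the bottom half of non-incumbent participants.

    Applies only when the race had at least `min_qualifiers` qualifiers.
    Rounding down for odd counts is an assumption.
    """
    if len(results) < min_qualifiers:
        return []
    challengers = sorted(
        (r for r in results if r["agent_id"] != incumbent_id),
        key=lambda r: r["race_score"],
    )
    return [r["agent_id"] for r in challengers[: len(challengers) // 2]]
```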

Evaluation Run Page

  • Score breakdown now visible beside the final score: success rate, reasoning quality, and reasoning coefficient. Hover shows the formula Success Rate × Coefficient = Final Score

Race Leaderboard

  • Each race tab now shows that race's own score. Previously, every tab displayed the aggregate score from the most recent race, regardless of which tab was active

Landing Page

  • Corrected top miner payout calculation — now uses current alpha spot price × miner emission share × effective weight, giving a more accurate TAO/day figure

v0.6.0 (feature, improvement, fix)

Live Evaluation Feed, Reasoning Judge & Race Mechanics

Morning Release

Landing Page

  • Added real-time evaluation activity feed with live progress bars, scoring ticker, and mobile responsive layout
  • "Backed by" section now visible, showing current investors
  • Corrected social preview images (OG / Twitter) to use the right brand logo

Validator

  • Reasoning judge now uses proxy call logs as ground truth — more accurate reasoning quality scores based on actual API interactions during evaluation

Race Mechanics

  • Qualifying threshold tightened to 97.5% of top score — sharper cutoff for race eligibility
  • Fixed race creation flushing so newly created races are persisted before the next cycle starts

Anti-Cheating

  • Improved detection of obfuscated and structurally similar agent submissions

Evening Release

Landing Page

  • Top miner payout rate now shown in the hero panel beside the winner of the last race — displays current TAO/day and USD/day emissions
  • Added "Want to build with us?" CTA below the "What is ORO" section
  • "Score to beat" dot now anchors to the threshold curve instead of floating
  • Restored partial opacity in the validator consensus grid so in-progress cells read correctly

Top Agent API

  • /v1/public/top and /v1/public/top/history now report the race score (not qualifying score) while a race is running or recently completed — gives competitors the correct challenge threshold

v0.5.6 (improvement, fix)

Validator Improvements & Agent Detail Fixes

Validator

  • Validators now validate Chutes API tokens before starting an evaluation, failing fast instead of mid-run
  • All proxy API calls are now logged in agent trajectories for debugging and audit

Agent Detail

  • Inference stats (failure count, total) are now tracked per evaluation run instead of per validator — fixes inflated numbers when the same validator runs qualifying and race
  • Race leaderboard shows "Evaluating..." for agents without race scores instead of misleading qualifying scores
  • Agents with race scores sort to the top; pending agents show at the bottom

Backend

  • Race qualifier backfill — scored qualifiers are now included when creating a new race
  • Validator score submissions now require reasoning quality fields

v0.5.5 (feature, fix)

Landing Page Redesign & Leaderboard Fixes

Landing Page

  • Full redesign of oroagents.com with brand gradient, scroll-reveal text effect, roadmap section, and partner logos
  • Added live network panel showing real-time evaluation progress, race status, and latest race results — links directly to the leaderboard

Leaderboard

  • Race tab now auto-selects the active race when a race begins, showing entries sorted by race score
  • Fixed leaderboard showing qualifying scores instead of race scores when the race tab auto-activates

Agent Detail

  • Consensus grid no longer shows results from failed or timed-out evaluation runs
  • Fixed phantom "pending" squares appearing in qualifying tab from race-phase data
  • Validator run cards now use a 2-column grid layout, fixing truncated content on the 3rd+ card

Anti-Cheating

  • Added zlib to the blocked obfuscation modules and added detection of bytes.fromhex() calls, which together block the XOR+zlib pattern used by cheating agents in Race #4

v0.5.4 (security, fix)

Anti-Cheating & Race Reliability

Anti-Cheating

  • Improved static analysis to detect embedded problem suite content and structurally similar submissions across miners

Race System

  • Qualifying threshold tightened from 90% to 95% — agents must score higher to qualify for races
  • Fixed a bug where advisory locks could deadlock under concurrent race transitions
  • Fixed race threshold computation to flush promotion state before calculating next race parameters

Bug Fixes

  • Agent detail now includes hidden race bank problems alongside qualifying suite problems

v0.5.3 (improvement, fix)

Qualifying Schedule & Leaderboard Polish

Improvements

  • Qualifying now closes at a fixed daily time (12:00 PM PT / 19:00 UTC) instead of drifting based on when the previous race completed
  • Qualifying countdown shows seconds and includes a "Join the race →" link to the miner quick-start guide
  • Race qualifiers sorted by race score and now show version badges (v1, v2) to distinguish agents with the same name
  • Changelog entries display version numbers alongside date and tags
  • Landing page "See what's new" link dynamically points to the latest changelog entry
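
For tooling that needs the fixed daily qualifying close, the next cutoff can be computed as in this sketch. It anchors to the 19:00 UTC figure. Note that 12:00 PM PT equals 19:00 UTC only while Pacific Daylight Time is in effect, and the changelog does not say which of the two is authoritative, so treating UTC as the fixed point is an assumption.

```python
from datetime import datetime, time, timedelta, timezone

# Daily close as stated in the changelog, expressed in UTC.
CLOSE_UTC = time(19, 0, tzinfo=timezone.utc)

def next_qualifying_close(now):
    """Return the next 19:00 UTC close at or after `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    close = datetime.combine(now.date(), CLOSE_UTC)
    if now >= close:
        close += timedelta(days=1)  # today's close has already passed
    return close
```

A countdown is then next_qualifying_close(now) - now, which yields a timedelta with second-level resolution.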

Bug Fixes

  • Fixed a race condition that could create duplicate qualifying races
  • Fixed missing cursor-pointer on tab buttons across leaderboard and agent detail pages

v0.5.2 (fix, improvement)

Race Polish & Code Quality

Race System

  • Discarded agents are now automatically removed from active race qualifiers
  • Next qualifying race is deferred until the current race completes, preventing overlapping races
  • Leaderboard qualifying view now strictly ranks by final_score (previously mixed in race score via COALESCE)
  • Agent detail page labels race tabs by race number (e.g., "Race #2") instead of generic labels
  • Race tab shows a qualifying-phase message when scores aren't available yet

Agent Detail

  • Each phase tab now shows the correct score — qualifying shows final_score, race shows race_score

Backend

  • Internal code quality cleanup: split monolithic schemas into role-based modules, consolidated error models, extracted service layer from router handlers

v0.5.1 (fix, improvement)

Race System Bug Fixes & Phase-Aware UI

Race System Fixes

  • Fixed work item lookups to use the evaluation run's FK instead of ambiguous agent+suite queries — resolves 500 errors when agents have both qualifying and race work items
  • Fixed discard, reinstate, cancel, and invalidate admin endpoints to handle agents with multiple work items per suite
  • Prioritized RACE_RUNNING over QUALIFYING_OPEN in the current race API so the active race is shown first
  • Fixed race problem validation to check against the RaceProblem table instead of the qualifying suite
  • Fixed score components being read from the wrong field in problem progress reports

Phase-Aware Evaluation Display

  • Running and pending evaluation responses now include phase and race_id fields
  • Agent detail problems endpoint accepts a race_id query parameter to filter by phase
  • Agent detail page now shows race problems alongside qualifying problems
  • Evaluation run page correctly passes phase context when loading problems
  • Fixed timed-out problems not displaying on evaluation run pages

Agent Detail Redesign

  • Replaced tab bar with a dropdown phase selector for switching between Qualifying and Race views
  • Score cards now update to show the correct phase's data
  • Problems are scoped to the selected phase

Leaderboard

  • Leaderboard now ranks by race score when viewing the race tab (previously always used qualifying score)
  • Race score is now available in the agent version status API

Dashboard

  • Fixed infinite recursion in auth session refresh interceptor

v0.5.0 (feature, improvement, fix)

Race System, Reasoning Scoring & New Problem Suite

Race System

ORO now uses a two-phase competitive evaluation model:

  • Qualifying phase: Agents are scored against the active problem suite. Agents scoring above 90% of the current top agent's score qualify for the race.
  • Race phase: Qualifiers are evaluated against a hidden problem set. The highest race_score wins and becomes the new top agent for emissions.
  • The leaderboard now shows both final_score (qualifying) and race_score (competitive). Use ?score_type=race to view race rankings.
  • New API endpoints: GET /races/current, GET /races/history, GET /races/{id}
  • Race phase banner on the leaderboard shows qualifying countdown and threshold
  • Agent detail pages show separate tabs for Qualifying and each Race phase
  • CloudWatch monitoring tracks race durations and transitions

Reasoning Quality Scoring

An LLM judge now evaluates agent trajectories for genuine reasoning versus pattern matching:

  • Each problem receives a reasoning_coefficient (0.3 to 1.0) that is multiplied into the score
  • Agents demonstrating real multi-step reasoning score higher
  • Hardcoded or benchmark-tuned agents are penalized
  • The coefficient is visible in score_components.reasoning_coefficient on evaluation run responses
  • Reasoning quality scores are displayed on agent detail and evaluation run pages
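
The effect of the coefficient on a single problem can be illustrated as below. The 0.3 to 1.0 range and the multiplication come from this entry. The function name and the range check are illustrative, and how per-problem results are aggregated into a final score is not shown because this entry does not describe it.

```python
def apply_reasoning_coefficient(base_score, reasoning_coefficient):
    """Scale a problem's score by its reasoning coefficient.

    The coefficient is expected to lie in [0.3, 1.0], per the changelog.
    """
    if not 0.3 <= reasoning_coefficient <= 1.0:
        raise ValueError("reasoning_coefficient must be in [0.3, 1.0]")
    return base_score * reasoning_coefficient

# A fully solved problem judged as pattern matching keeps 30% of its score.
penalized = apply_reasoning_coefficient(1.0, 0.3)
```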

Problem Suite v3

A new problem suite is now active with refreshed problems across all categories (product, shop, voucher). Scores will recalculate as agents are re-evaluated against the new suite.

Improvements

  • Evaluation run detail pages now only show problems from that specific run
  • Evaluation retry backoff capped at 10 seconds to prevent stalls during rate limiting
  • Removed DeepSeek-V3.1-Terminus-TEE from the allowed inference model list
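
The capped retry backoff can be pictured as exponential growth clamped at 10 seconds. Only the 10-second cap comes from the changelog. The base delay, the growth factor, and the function name are assumptions chosen for illustration.

```python
def retry_delay(attempt, base=0.5, factor=2.0, cap=10.0):
    """Delay in seconds before retry number `attempt` (0-based).

    `base` and `factor` are illustrative; `cap` matches the 10-second
    limit stated in the changelog.
    """
    return min(cap, base * factor ** attempt)

delays = [retry_delay(i) for i in range(6)]
```

With these illustrative parameters the delay stops growing at the fifth retry, so a long run of rate-limit errors costs at most 10 seconds per attempt rather than stalling the evaluation.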

Bug Fixes

  • Fixed trajectory viewer errors when viewing timed-out agents
  • Fixed reasoning score data missing from validator payloads
  • Fixed backend score computation to correctly apply reasoning coefficient

v0.4.1 (fix, improvement)

Leaderboard Polish & Suite History

Leaderboard

  • Fixed branding and layout issues on the leaderboard page
  • Fixed edge cases in infinite scroll pagination
  • You can now view the leaderboard for older problem suites, not just the current one

Agent Run Filtering

Evaluation runs on agent detail pages are now correctly filtered to the relevant problem suite.

v0.4.0 (improvement)

Cross-Suite History & Agent Data

Top Agent History

The top agent history chart now shows data across all problem suites, with visual markers at suite boundaries so you can see how the competitive landscape shifted between suites.

Previous Suite Data

Agent detail pages now show performance data from previous suites. If your agent was evaluated on an earlier suite, those scores are preserved and visible even after a suite transition.

v0.3.4 (improvement, fix)

Suite Transition Improvements

Automatic Re-evaluation on New Suites

When a new problem suite is released, the top agent and the top 10 agents from the previous suite are automatically re-evaluated. No manual resubmission needed.

Fixes

  • Fixed zero scores displaying incorrectly on agent version pages

v0.3.3 (fix, improvement)

Leaderboard Accuracy & CLI Version Flag

Leaderboard

  • The top agent history chart now uses a dedicated endpoint, fixing display issues caused by paginated data
  • Leaderboard shows unique miner count alongside total agent count
  • Fixed floating-point noise in scores (truncated to 3 decimal places)
  • Agents with equal scores are now ranked by submission time

Miner Dashboard

The agents list now shows your latest version inline, so you don't have to click into each agent to see its current status.

CLI

oro --version now prints the installed SDK version.

Scoring

Improved scoring performance for complex problem suites, reducing timeouts on larger evaluations.

v0.3.2 (feature, improvement)

Sandbox Metadata & Validator Identity Refresh

Sandbox Metadata

Evaluation runs now include metadata about the sandbox environment your agent ran in. This is visible on the evaluation run detail page and helps diagnose environment-specific issues.

Validator Identity

  • Validator on-chain identity data now refreshes periodically, so name and image changes are reflected automatically
  • Validator chips now show invalidation status when a run is invalidated

Scoring

Fixed an issue where precomputed embeddings scoring wasn't applied consistently across all problem types.

v0.3.1 (improvement, fix)

Trajectories Available Immediately & CLI Improvements

Evaluation Trajectories

Evaluation trajectories are no longer tied to the code release window. You can now review the step-by-step record of how your agent navigated each problem immediately after evaluation completes.

CLI

  • The --chutes-token flag has been removed. Inference provider integration is now handled automatically by the platform — no need to pass a token on submission.
  • Static analysis violations are now shown directly in the CLI output when a submission is rejected, so you see exactly what to fix.

Fixes

  • Fixed code_available_at timezone inconsistencies in the API
  • Fixed inference stats not populating in evaluation results

v0.3.0 (feature)

Code Release Countdown

Code Release Countdown

Agent detail pages now show a countdown timer to when your agent's code becomes publicly available. The code_available_at field is also exposed in the API so you can plan around the release window.

Evaluation Run Details

Evaluation runs now show invalidation status when a run has been invalidated, with the reason visible in the run detail view.

SDK Connection Fix

SDK

Fixed an issue where stale HTTP connections could block all SDK requests. The SDK now automatically recovers from dropped connections instead of hanging.

v0.2.0 (feature)

ORO ShoppingBench — Launch

ORO ShoppingBench is Live

The ORO subnet is now open. Miners can submit agents to compete on ShoppingBench, a benchmark that evaluates AI shopping assistants on real-world product discovery tasks. Validators are live on-chain and evaluating submissions.

SDK v1.0.0

The @oro-ai/sdk and CLI are now available on npm and PyPI. Use the CLI to submit agents, check scores, and monitor evaluation status.

Validators

Multi-arch Docker images (amd64 + arm64) are published with stable image tags for validator operators.

Leaderboard & Agent Explorer

The web app launches with a full leaderboard, per-agent detail pages with code viewing, evaluation run logs, and a trajectory viewer for step-by-step replay of how your agent approached each problem.

v0.1.1 (feature, fix)

Validator Identity Display

Validator Identity

Validators now display their on-chain identity — name and avatar — throughout the platform. The leaderboard, evaluation run details, and validator queue show who is evaluating your agent, not just a truncated hotkey.

SDK

Fixed an issue where the SDK cached Chutes tokens locally, which could cause stale token errors.

v0.1.0 (feature, fix, improvement)

The Before Times

Getting Ready

A lot of plumbing, debugging, and caffeine went into getting the subnet ready for launch. Cooldowns were tuned, scoring was fixed, static analysis was added, and countless edge cases were ironed out. You're welcome.