AgentNash

The Superforecaster

Research-first structured decomposition for Kalshi and Polymarket.

The Superforecaster is a research-first prediction strategy that grounds every forecast in sourced, real-time evidence. Before the reasoning model ever sees a market question, Perplexity Sonar Pro searches the live web for recent developments, historical base rates, stakeholder signals, and arguments on both sides of the outcome. The reasoning model then works from cited findings rather than stale training data, so each claim it uses can be traced back to a source.

Available on both Kalshi and Polymarket, the Superforecaster takes the opposite approach from the Council's multi-model debate. Where the Council uses breadth (five specialist models debating), the Superforecaster uses depth — a single powerful reasoning model armed with comprehensive, pre-gathered research. Its structured decomposition methodology comes directly from the superforecasting literature: decompose the question, anchor to base rates with real sample sizes, apply inside and outside views independently, then synthesize a calibrated probability weighted by evidence quality.

The default reasoning model is Claude Opus 4.6, selectable from the bot settings dropdown. The research model is always Perplexity Sonar Pro, which performs agentic multi-step web searches with built-in citation tracking.

The Pipeline at a Glance

Every market opportunity flows through 8 stages before any capital moves:

  1. Market Scanning: Fetch active binary markets from the exchange API, filter by volume, expiry, category, and price extremes.
  2. Selection & Dedup: Skip already-decided markets (configurable cooldown), enforce daily AI budget, and filter by allowed categories.
  3. Web Research: Perplexity Sonar Pro searches the live web for recent developments, base rate data, stakeholder signals, and arguments for/against. Runs in batches of 3 to respect rate limits.
  4. Superforecaster Analysis: Two-phase reasoning. Phase 1 audits research for contradictions and hallucinations (quality score 1-10). Phase 2 applies structured decomposition to produce a calibrated probability.
  5. Edge Calculation: Compare AI probability to market price. Required edge: 4% at high confidence (>=80%), 6% at medium (60-79%), 8% at low (<60%). Always skip below 50% confidence.
  6. Position Sizing: Tier-based system by account size with Kelly Criterion scaling (quarter-Kelly). Cash reserves, position limits, and exchange minimum order sizes enforced.
  7. Risk Checks: 11 bot-level + 6 account-level rules validated by the backend orchestrator. Any single failure blocks the trade.
  8. Execution & Settlement: Order intercepted, validated, and routed. Training mode saves paper trades; live mode places real orders on the exchange.

Step 1: Market Scanning

The agent fetches active binary markets from the exchange's data API. On Kalshi, it uses cursor-based pagination (up to 1,000 markets per page, 5 pages max) with server-side filters for close time and status. On Polymarket, it queries the data API with offset pagination filtered by volume, active status, and expiry window.
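
The page cap on the Kalshi side can be sketched as a bounded cursor loop. This is a minimal illustration, not the agent's actual client: fetch_page is a hypothetical callable standing in for the exchange request, and the (markets, next_cursor) return shape is an assumption.

```python
from typing import Callable, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (markets, next_cursor). An empty or None cursor means "last page".
PageFetcher = Callable[[Optional[str]], tuple]


def scan_markets(fetch_page: PageFetcher, max_pages: int = 5) -> list:
    """Collect markets from at most max_pages cursor-paginated pages."""
    markets: list = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page, cursor = fetch_page(cursor)
        markets.extend(page)
        if not cursor:  # the exchange reported no further pages
            break
    return markets


def fake_fetch(cursor: Optional[str]):
    """Stand-in data source with 7 pages of one market each."""
    n = int(cursor or 0)
    next_cursor = str(n + 1) if n < 6 else None
    return [{"ticker": f"M{n}"}], next_cursor


print(len(scan_markets(fake_fetch)))  # stops at the 5-page cap
```

Injecting the fetcher keeps the loop testable without network access; the real request parameters (page size, close-time and status filters) would live inside that callable.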

Only markets matching all of these criteria survive the scan:

  • Market type: Binary only — scalar, combo, and multivariate markets are excluded.
  • Status: Active and open only. No closed, settled, or suspended markets.
  • Price range: YES price must be between $0.03 and $0.97. Markets outside this range are effectively resolved.
  • Volume: Must meet the minimum volume threshold (default: 50). Configurable per bot.
  • Expiry: Must close within the configured window (default: 7 days) and have at least 1 hour remaining.
  • Category: If allowed categories are configured, only those categories pass. Otherwise all categories are eligible.
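
The scan criteria above can be read as a single predicate. The sketch below is an illustration under assumptions: the Market dataclass, its field names, and the status strings are hypothetical and do not reflect either exchange's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Market:  # hypothetical shape, for illustration only
    market_type: str
    status: str
    yes_price: float
    volume: float
    close_time: datetime
    category: str


def passes_scan(
    m: Market,
    now: datetime,
    min_volume: float = 50,
    max_expiry_days: int = 7,
    allowed_categories: Optional[set] = None,
) -> bool:
    """Return True only if the market meets every scan criterion."""
    remaining = m.close_time - now
    return (
        m.market_type == "binary"
        and m.status in ("active", "open")  # assumed status labels
        and 0.03 <= m.yes_price <= 0.97
        and m.volume >= min_volume
        and timedelta(hours=1) <= remaining <= timedelta(days=max_expiry_days)
        and (not allowed_categories or m.category in allowed_categories)
    )


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
example = Market("binary", "active", 0.40, 120, now + timedelta(days=2), "Crypto")
print(passes_scan(example, now))
```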

Category Inference

Markets are automatically categorized by scanning event tickers and titles for known keywords:

Category | Detected Keywords
Sports | NBA, NFL, MLB, NHL, NCAA, UFC, Soccer, Tennis, PGA, Golf
Crypto | Bitcoin, BTC, ETH, Crypto, SOL, DOGE, XRP, DeFi
Economics | Fed, CPI, GDP, Inflation, Jobs, Tariff, Oil, Gold, Treasury
Politics | Trump, Biden, Election, Vote, Congress, Senate, President
Weather | Weather, Temperature, Hurricane, Climate, Storm
Tech | AI, Apple, Google, Meta, Microsoft, Tesla, OpenAI
Other | Default when no keywords match
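
A keyword lookup of this kind might look like the sketch below. The keyword lists come from the table; the matching rules (case-insensitive, whole-word, first matching category in table order wins) are assumptions, since the page does not specify them.

```python
import re

CATEGORY_KEYWORDS = {
    "Sports": ["NBA", "NFL", "MLB", "NHL", "NCAA", "UFC", "Soccer", "Tennis", "PGA", "Golf"],
    "Crypto": ["Bitcoin", "BTC", "ETH", "Crypto", "SOL", "DOGE", "XRP", "DeFi"],
    "Economics": ["Fed", "CPI", "GDP", "Inflation", "Jobs", "Tariff", "Oil", "Gold", "Treasury"],
    "Politics": ["Trump", "Biden", "Election", "Vote", "Congress", "Senate", "President"],
    "Weather": ["Weather", "Temperature", "Hurricane", "Climate", "Storm"],
    "Tech": ["AI", "Apple", "Google", "Meta", "Microsoft", "Tesla", "OpenAI"],
}

# One compiled pattern per category; \b keeps "AI" from matching inside "Ukraine".
_PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b",
        re.IGNORECASE,
    )
    for category, words in CATEGORY_KEYWORDS.items()
}


def infer_category(text: str) -> str:
    """Return the first category whose keywords appear in text, else "Other"."""
    for category, pattern in _PATTERNS.items():
        if pattern.search(text):
            return category
    return "Other"


print(infer_category("Will the Fed cut rates in March?"))
```

Whole-word matching avoids false positives on short keywords, but it also means a compact ticker such as KXBTC-25MAR21 would not match BTC; a production matcher would likely treat tickers and titles differently.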

Price Handling

On Kalshi, prices arrive as dollar strings. The agent computes midpoint prices from the bid/ask spread: yes_price = (yes_bid + yes_ask) / 2. On Polymarket, prices come from the outcomePrices array as floats (0.0 to 1.0) representing YES and NO probabilities.
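
As a small illustration of the two price paths, the sketch below parses dollar strings with Decimal to avoid float rounding in the midpoint. The exact string format is not specified on this page, so stripping a leading "$" is a defensive assumption, and both function names are hypothetical.

```python
from decimal import Decimal


def kalshi_mid(yes_bid: str, yes_ask: str) -> Decimal:
    """Midpoint of the YES bid/ask, parsed from dollar strings."""
    bid = Decimal(yes_bid.strip().lstrip("$"))
    ask = Decimal(yes_ask.strip().lstrip("$"))
    return (bid + ask) / 2


def polymarket_prices(outcome_prices) -> tuple:
    """Unpack an outcomePrices pair into (yes, no) floats."""
    yes, no = (float(p) for p in outcome_prices)
    return yes, no


print(kalshi_mid("0.56", "0.60"))
print(polymarket_prices([0.22, 0.78]))
```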

Step 2: Selection & Deduplication

Before spending AI credits, the agent applies pre-checks to avoid redundant work. Markets that fail any check are skipped without calling any AI model.

  1. Backend Dedup: Queries the backend for markets already decided within the cooldown window (default: 6 hours). Markets that were executed, skipped, or rejected are excluded.
  2. Daily AI Budget: Checks if the daily AI spending limit has been exceeded (default: $10/day). If over budget, all analysis stops until the next day.
  3. Category Filter: If allowed_categories is configured, markets outside those categories are excluded before any AI call.

After filtering, the top N markets by volume are selected for research and analysis in the current cycle, where N is max_markets_per_cycle (default: 10).

Setting | Default | What It Does
Reanalyze Cooldown | 6 hours | Min hours before the same market can be re-analyzed
Daily AI Budget | $10 | Max daily spend on AI API calls (research + reasoning combined)
Max Markets per Cycle | 10 | Top N markets by volume to analyze each cycle
Allowed Categories | All | Comma-separated category whitelist (empty = trade all)
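
Taken together, the pre-checks and the volume ranking can be sketched as one selection function. This is an illustration only: the dictionary keys, the recently_decided set (standing in for the backend dedup query), and treating spend equal to the budget as "over budget" are all assumptions.

```python
from typing import Optional


def select_for_cycle(
    markets: list,
    recently_decided: set,
    spent_today: float,
    daily_budget: float = 10.0,
    allowed_categories: Optional[set] = None,
    max_markets_per_cycle: int = 10,
) -> list:
    """Apply the pre-checks, then keep the top N markets by volume."""
    if spent_today >= daily_budget:
        return []  # budget exhausted: no analysis until the next day
    candidates = [
        m
        for m in markets
        if m["ticker"] not in recently_decided
        and (not allowed_categories or m["category"] in allowed_categories)
    ]
    candidates.sort(key=lambda m: m["volume"], reverse=True)
    return candidates[:max_markets_per_cycle]


markets = [
    {"ticker": "A", "volume": 100, "category": "Crypto"},
    {"ticker": "B", "volume": 500, "category": "Sports"},
    {"ticker": "C", "volume": 300, "category": "Crypto"},
]
picked = select_for_cycle(markets, recently_decided={"C"}, spent_today=2.0)
print([m["ticker"] for m in picked])
```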

Step 3: Web Research (Perplexity Sonar Pro)

This is the Superforecaster's defining advantage. Before the reasoning model forms any opinion, Perplexity Sonar Pro performs agentic, multi-step web searches and returns structured findings with source citations. The research model always runs at temperature 0.0 to keep output as repeatable and fact-focused as possible.

For each market, the research prompt requests six categories of evidence:

  1. Recent Developments: Key news from the last 7 days directly relevant to the outcome, with dates, sources, and specific facts. If the event has not occurred yet, the model must state this clearly.
  2. Base Rate Data: Historical frequency of similar events. How often have comparable situations resolved YES vs. NO? Specific numbers and sample sizes required.
  3. Key Stakeholders & Signals: What have relevant decision-makers, experts, or officials said? Scheduled events (votes, hearings, deadlines) that could force resolution.
  4. Arguments for YES: The strongest factual evidence and reasoning that the market resolves YES.
  5. Arguments for NO: The strongest factual evidence and reasoning that the market resolves NO.
  6. Expert & Statistical Signals: Domain expert opinions, statistical models, polls, and historical patterns. Explicitly excludes prediction market prices — the research focuses on independent evidence only.

Research runs in parallel batches of 3 markets to stay within rate limits, with a 1-second delay between batches. Each research call has a 90-second timeout with up to 3 retries on server errors. The model receives up to 3,000 tokens of research per market.
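
The batching pattern described here can be sketched with asyncio. This is a simplified illustration: research_one is a hypothetical coroutine standing in for the Perplexity call, and the retry-on-server-error logic is left out for brevity.

```python
import asyncio


async def research_in_batches(
    markets: list,
    research_one,  # hypothetical coroutine function: market -> findings
    batch_size: int = 3,
    delay_s: float = 1.0,
    timeout_s: float = 90.0,
) -> list:
    """Research markets in concurrent batches, pausing between batches."""
    results: list = []
    for start in range(0, len(markets), batch_size):
        batch = markets[start:start + batch_size]
        # Each call in the batch runs concurrently with its own timeout.
        findings = await asyncio.gather(
            *(asyncio.wait_for(research_one(m), timeout=timeout_s) for m in batch)
        )
        results.extend(findings)
        if start + batch_size < len(markets):
            await asyncio.sleep(delay_s)  # spacing between batches
    return results


async def fake_research(market: str) -> str:
    """Stand-in for a real research call."""
    await asyncio.sleep(0)
    return f"findings for {market}"


print(asyncio.run(research_in_batches(["m1", "m2", "m3", "m4"], fake_research, delay_s=0)))
```

Because asyncio.gather preserves argument order, results line up with the input markets even though calls within a batch finish in any order.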

The research prompt includes a critical anti-hallucination guardrail: it injects today's date and instructs the model to never fabricate outcomes or claim events have occurred without verifiable sources. If no confirmed result exists, the model must state that explicitly.

Step 4: Superforecaster Analysis

The core analysis happens in two mandatory phases within a single model call to the user-selected reasoning model (default: Claude Opus 4.6). The model must complete Phase 1 before beginning Phase 2 — this ordering prevents the model from anchoring to the market price before evaluating evidence quality.

Phase 1: Research Audit

The model acts as an adversarial reviewer, examining the research with the goal of finding errors rather than confirming conclusions. It checks for five categories of problems:

  • Internal Contradictions: Do any data points conflict with each other? A claimed "52-week low" higher than the current price suggests a data error or stock split.
  • Logical Consistency: Are YES/NO outcome labels applied correctly? Does any argument accidentally support the opposite outcome from what it claims?
  • Suspicious Data: Do any numbers, dates, or claims seem implausible in context? The model cross-checks figures against each other.
  • Hallucination Signals: Does the research claim an event has "already happened" while the market price suggests it has not resolved? Anything dated after today is speculation, not fact.
  • Missing Context: What important information is absent from the research? Key gaps are flagged.

After the audit, the model explicitly states which findings it trusts and will use, which it discards and why, and assigns an overall quality score from 1 to 10. If research quality scores below 3, the system logs a warning — the model should anchor more heavily to base rates than to thin evidence.

Phase 2: Probability Estimation via Structured Decomposition

Using only the findings marked as trusted in Phase 1, the model follows a 7-step methodology drawn from the superforecasting literature:

  1. Decompose — Break the question into independent sub-questions that can be assessed separately.
  2. Establish Base Rates — For each sub-question, find the historical frequency of similar events. Must include sample sizes — "3 out of 8 comparable periods since 2017" not just "it has happened before." If no hard data exists, the model must reason from first principles and state that the estimate is a reasoned guess, not an empirical finding.
  3. Inside View — What specific current evidence shifts probability from the base rate? Each adjustment must reference sourced research from Phase 1.
  4. Outside View — What does the reference class of similar events suggest, ignoring the specific details? This is the probability anchor.
  5. Synthesize — Weight inside and outside views independently, then combine. Strong corroborated evidence justifies larger departures from the base rate. Weak or contradictory evidence means staying close to the outside view.
  6. Calibrate — Express as a precise probability (0.00 to 1.00), never using vague words like "likely."
  7. Compare to Market — Only after forming an independent estimate, compare it to the market price. If they agree, the market is fairly priced. If they diverge, explain why with specific evidence. The model must not force an edge where none exists.

Output Format

The model returns structured JSON containing:

  • research_quality — Score (1-10), issues found, trusted findings, discarded findings with reasons.
  • probability (0.00-1.00) — Calibrated P(YES) from structured decomposition.
  • confidence (0.00-1.00) — How well-calibrated the estimate is. Higher when evidence is strong and corroborated.
  • side ("YES" or "NO") — Which side has positive expected value.
  • limit_price (0.01-0.99) — Maximum price the model would pay for the recommended side.
  • position_size_pct (1-25) — Suggested percent of available capital, lower when uncertain.
  • should_trade — Boolean flag indicating whether the model believes genuine edge exists backed by specific evidence.
  • reasoning — Full audit summary followed by step-by-step decomposition, base rate, evidence, probability, and market comparison.
  • key_factors — List of the most important factors driving the estimate.
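
A consumer of this JSON would typically check the documented ranges before acting on it. The validator below is a sketch under assumptions: it checks only the top-level fields and ranges listed above, and it does not assume any particular key names inside research_quality.

```python
def validate_forecast(data: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    def in_range(key: str, lo: float, hi: float) -> None:
        value = data.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not lo <= value <= hi:
            problems.append(f"{key} must be a number in [{lo}, {hi}]")

    in_range("probability", 0.0, 1.0)
    in_range("confidence", 0.0, 1.0)
    in_range("limit_price", 0.01, 0.99)
    in_range("position_size_pct", 1, 25)
    if data.get("side") not in ("YES", "NO"):
        problems.append('side must be "YES" or "NO"')
    if not isinstance(data.get("should_trade"), bool):
        problems.append("should_trade must be a boolean")
    if not isinstance(data.get("reasoning"), str):
        problems.append("reasoning must be a string")
    if not isinstance(data.get("key_factors"), list):
        problems.append("key_factors must be a list")
    if "research_quality" not in data:
        problems.append("research_quality is missing")
    return problems


sample = {
    "research_quality": {},  # inner structure intentionally left unspecified
    "probability": 0.24,
    "confidence": 0.72,
    "side": "NO",
    "limit_price": 0.78,
    "position_size_pct": 2,
    "should_trade": False,
    "reasoning": "audit summary, then decomposition",
    "key_factors": ["ETF inflows", "base rate"],
}
print(validate_forecast(sample))
```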

Key Rules

  • Independence first. The model forms its probability estimate before comparing to the market. It starts from base rates and evidence, not from the current price.
  • Never fabricate statistics. If no hard data exists, reason from first principles and say so explicitly.
  • Confidence reflects evidence quality, not probability extremity. A 90% probability with 55% confidence means the outcome appears very likely but the analysis rests on weak evidence. The edge calculation demands a larger mispricing to compensate, and below 50% confidence the trade is skipped outright.
  • Honesty about uncertainty. High confidence requires strong, corroborated evidence. The model must not force an edge where the market is fairly priced.

Step 5: Edge Calculation & Decision

After analysis, the agent calculates the edge — the absolute difference between the AI's probability estimate and the market price for the traded side:

  • edge = |ai_probability - market_price| for the side being traded.
  • For YES trades: ai_prob = probability, market_price = yes_price.
  • For NO trades: ai_prob = 1 - probability, market_price = no_price.

The trade proceeds only if confidence meets the minimum threshold (50%) AND the edge meets or exceeds the requirement for that confidence level:

Required Edge by Confidence

AI Confidence | Edge Required | Reasoning
>=80% | >=4% | High certainty: strong corroborated evidence justifies a tighter threshold
60-79% | >=6% | Medium certainty: need a clearer mispricing before committing capital
50-59% | >=8% | Low certainty: only trade obvious mispricings with wide margins
<50% | Always skip | Confidence too low to act regardless of apparent edge

Example: AI confidence 75%, AI probability 65%, market YES price $0.58. Edge = |0.65 - 0.58| = 7%. Required for 75% confidence = 6%. Since 7% > 6%, this trade passes the edge check and proceeds to position sizing.
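
The edge rule and the example above can be expressed in a few lines. This is a minimal sketch using the default thresholds from the table; the function names are illustrative, and rounding the edge to four decimals is an added assumption to keep float noise from flipping a borderline comparison.

```python
from typing import Optional


def required_edge(confidence: float) -> Optional[float]:
    """Edge required for a given confidence, or None when the trade is always skipped."""
    if confidence < 0.50:
        return None
    if confidence >= 0.80:
        return 0.04
    if confidence >= 0.60:
        return 0.06
    return 0.08


def edge_check(probability: float, confidence: float, side: str,
               yes_price: float, no_price: float) -> tuple:
    """Return (edge, required, passes) for the side being traded."""
    ai_prob = probability if side == "YES" else 1.0 - probability
    market_price = yes_price if side == "YES" else no_price
    edge = round(abs(ai_prob - market_price), 4)
    needed = required_edge(confidence)
    passes = needed is not None and edge >= needed
    return edge, needed, passes


# The example above: confidence 75%, probability 65%, YES at $0.58.
print(edge_check(0.65, 0.75, "YES", yes_price=0.58, no_price=0.42))
```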

Step 6: Position Sizing

Position sizing adapts automatically to account size using a tier-based system. Larger accounts allocate a smaller percentage per trade and can hold more contracts per order.

Tier Table

Account Size | Base % | Max % | Max Contracts/Order
Under $100 | 20% | 40% | 10
$100 - $1K | 5% | 15% | 50
$1K - $10K | 3% | 8% | 250
$10K - $100K | 2% | 5% | 1,000
$100K+ | 1% | 3% | 5,000

How the Calculation Works

  1. Determine available cash: Subtract the cash reserve (5% of balance) from total cash. If available cash is zero or negative, no trades are placed.
  2. Look up tier: Find the row matching the current balance.
  3. Base investment: available_cash x base_pct
  4. Edge-scaled multiplier: scaler = 1.0 + (kelly_multiplier x signed_edge), clamped between 0.1x and 3.0x. Stronger edges produce larger positions.
  5. Apply scaler: investment = base_investment x scaler
  6. Cap at max: The lesser of available_cash x max_pct and balance x max_position_pct / 100
  7. Convert to contracts: int(investment / market_price), capped at tier_max_contracts
  8. Enforce minimums: If the position cost is below min_position_size ($1 default), bump to the minimum viable quantity if cash permits.
  9. Kelly criterion cap: If the risk manager recommends a lower size, the position is reduced accordingly. Uses quarter-Kelly (0.25 multiplier) for production safety.

On Polymarket, an additional check enforces the exchange's per-market orderMinSize — the minimum number of shares the CLOB will accept for that specific market.
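
Steps 1 through 8 can be sketched as follows. This is an illustration under assumptions, not the agent's implementation: tier boundaries are treated as lower-inclusive (so exactly $1,000 falls in the $1K - $10K tier), the minimum-size bump in step 8 is one possible reading of the rule, and the risk-manager cap in step 9 and the Polymarket orderMinSize check are omitted.

```python
import math

# (exclusive upper bound on balance, base_pct, max_pct, max_contracts)
TIERS = [
    (100, 0.20, 0.40, 10),
    (1_000, 0.05, 0.15, 50),
    (10_000, 0.03, 0.08, 250),
    (100_000, 0.02, 0.05, 1_000),
    (math.inf, 0.01, 0.03, 5_000),
]


def lookup_tier(balance: float) -> tuple:
    """Return (base_pct, max_pct, max_contracts) for the balance."""
    for upper, base_pct, max_pct, max_contracts in TIERS:
        if balance < upper:
            return base_pct, max_pct, max_contracts
    return TIERS[-1][1:]


def size_position(balance: float, cash: float, market_price: float,
                  signed_edge: float, kelly_multiplier: float = 0.25,
                  cash_reserve_pct: float = 0.05, max_position_pct: float = 30.0,
                  min_position_size: float = 1.0) -> int:
    """Return the number of contracts to buy (0 means no trade)."""
    available = cash - balance * cash_reserve_pct               # step 1
    if available <= 0:
        return 0
    base_pct, max_pct, max_contracts = lookup_tier(balance)     # step 2
    base_investment = available * base_pct                      # step 3
    scaler = min(max(1.0 + kelly_multiplier * signed_edge, 0.1), 3.0)  # step 4
    investment = base_investment * scaler                       # step 5
    cap = min(available * max_pct, balance * max_position_pct / 100)   # step 6
    investment = min(investment, cap)
    contracts = min(int(investment / market_price), max_contracts)     # step 7
    if contracts * market_price < min_position_size:            # step 8 (assumed reading)
        needed = math.ceil(min_position_size / market_price)
        affordable = needed * market_price <= available and needed <= max_contracts
        contracts = needed if affordable else 0
    return contracts


# $1,000 account, YES at $0.12, 12% edge.
print(size_position(balance=1_000, cash=1_000, market_price=0.12, signed_edge=0.12))
```

With these inputs the intermediate values are $950 available, a $28.50 base investment, and a 1.03x scaler, which matches the arithmetic in the "What If the Market Were at $0.12?" walkthrough later on this page.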

Safety Caps

Cap | Default | Description
Cash reserves | 5% | Must keep 5% of balance in cash at all times
Max single position | 30% | No single trade can use more than 30% of portfolio
Max concurrent positions | 5 | Cannot open more positions until one closes
Kelly multiplier | 0.25 | Quarter-Kelly for conservative production sizing
Min position size | $1.00 | Trades costing less than this are not placed

Step 7: Risk Checks

Every trade must pass three independent layers of validation before execution. Any single failure at any layer blocks the trade entirely. For the full breakdown of all rules, see the Safeguards page.

Layer 1: Agent-Level Guards

Check | Default | What Happens
Max concurrent positions | 5 | Cannot open more positions until one closes
Max position size | 30% of portfolio | Single position cannot exceed this
Cash reserves minimum | 5% | Must keep 5% of balance in cash at all times

Layer 2: Backend Rules Engine (11 Rules)

# | Rule | Default | What It Does
1 | Trade size | $100 max | Rejects trades exceeding this cost
2 | Capital per agent | $2,000 | Max capital a single agent can deploy
3 | Daily loss limit | $500 | Kill switch: pauses agent if daily losses exceed this
4 | Min confidence | 60% | Rejects trades below this confidence score
5 | Allowed categories | All | Whitelist of tradeable categories
6 | Blocked tickers | None | Blacklist of specific market tickers
7 | Max positions | 10 | Concurrent open position limit
8 | Duplicate prevention | On | Blocks duplicate position on same ticker
9 | Opposing position | Blocked | Prevents YES and NO on same market
10 | Max trades/day | Unlimited | Daily trade count limit per agent
11 | Sell requires position | On | Cannot sell if you don't hold the position

Layer 3: Account-Level Validation

Check | Default | Scope
Max trades/day (global) | 50 | All agents combined
Global daily loss | $500 | Across all agents
Max trades per market | Unlimited | Any single market
Cooldown | 0 hours | Min time between same-market trades
Active hours | Always | UTC hours when trading is allowed
Daily AI budget | $50 | Global AI API spend cap

You can configure all rules from Settings → Safeguards. Changes take effect on the next validation cycle — no restart or redeployment needed.
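
The "any single failure blocks the trade" behavior can be sketched as a validator that collects failures and allows the trade only when the list is empty. The example covers a handful of the Layer 2 rules with their documented defaults; the Rules class, the trade dictionary keys, and the open_positions mapping (ticker to held side) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:  # defaults taken from the Layer 2 table
    max_trade_cost: float = 100.0
    min_confidence: float = 0.60
    max_positions: int = 10
    blocked_tickers: frozenset = frozenset()


def validate_trade(trade: dict, open_positions: dict, rules: Rules = Rules()) -> list:
    """Return the names of failed rules; the trade is allowed only if none fail."""
    failures = []
    if trade["contracts"] * trade["limit_price"] > rules.max_trade_cost:
        failures.append("trade size")
    if trade["confidence"] < rules.min_confidence:
        failures.append("min confidence")
    if trade["ticker"] in rules.blocked_tickers:
        failures.append("blocked ticker")
    held_side = open_positions.get(trade["ticker"])
    if held_side == trade["side"]:
        failures.append("duplicate position")
    elif held_side is not None:
        failures.append("opposing position")
    elif len(open_positions) >= rules.max_positions:
        failures.append("max positions")
    return failures


trade = {"ticker": "KXBTC-25MAR21", "side": "YES", "contracts": 244,
         "limit_price": 0.12, "confidence": 0.72}
print(validate_trade(trade, open_positions={}))
print(validate_trade(trade, open_positions={"KXBTC-25MAR21": "NO"}))
```

Returning every failed rule rather than stopping at the first makes rejected trades easier to diagnose, while the block-on-any-failure decision stays a simple emptiness check.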

Step 8: Execution & Settlement

Once a trade passes all risk checks, it enters the execution pipeline. The agent never talks directly to the exchange — every order is intercepted by the backend, queued, and validated before execution.

Execution

  • Training mode: Saved as a paper trade. No exchange API call. P&L is calculated against real market prices.
  • Live mode (Kalshi): A real limit order is placed on Kalshi using cryptographically signed authentication.
  • Live mode (Polymarket): A signed order is submitted to the Polymarket order book with the appropriate token ID, tick size, and neg-risk flag.
  • Order interception: All orders are intercepted and validated by the orchestrator before execution.

Settlement

A settlement checker runs periodically in the backend and polls both exchanges for resolved markets.

  • Kalshi: Markets settle internally when the event outcome is confirmed. Typically minutes to hours after the event. Winning contracts pay $1.00 USD.
  • Polymarket: Uses the UMA Optimistic Oracle for decentralized resolution. Undisputed: 2 hours after proposal. Disputed: 4-6 days (UMA token holder vote). Winning tokens pay $1.00 USDC.

Example: Full Superforecaster Walkthrough

Here is how the Superforecaster analyzes a hypothetical market: "Will Bitcoin exceed $150K by June 30?" — currently trading at $0.22 YES (market implies 22% probability).

Step 3: Web Research

Perplexity Sonar Pro searches the live web and returns structured findings:

  • Recent developments: Bitcoin trading at ~$97K. Spot ETF inflows averaging $400M/week. Halving supply shock already priced in (April 2024). Fed holding rates steady.
  • Base rate data: Bitcoin achieved a 50%+ move in 3 of 8 comparable 6-month periods since 2017 (37.5%). A move from $97K to $150K requires a 55% gain.
  • Stakeholder signals: Standard Chartered targets $150K year-end. Most sell-side analysts project $100-120K range.
  • Arguments for YES: Structural institutional ETF demand, post-halving supply squeeze peaks 12-18 months after halving, improving regulatory clarity.
  • Arguments for NO: 55% move needed in under 3 months, no imminent rate cuts, ETF inflows have plateaued from peak levels, sub-40% base rate for moves of this magnitude.

Step 4 Phase 1: Research Audit

  • Contradiction found: One source says ETF inflows are "accelerating," another says "plateaued from peak levels." Both technically true — flows are positive but below all-time highs. Nuance noted.
  • Suspicious claims: None — all statistics trace to verifiable sources.
  • Hallucination check: No claims of events that have not yet occurred.
  • Quality rating: 8/10 — strong sourcing, minor contradiction handled appropriately.

Step 4 Phase 2: Structured Decomposition

Sub-questions:

  1. Can Bitcoin sustain its trajectory? Base rate: post-halving bull runs sustained in 5 of 7 cycles (71%). Inside view: ETF inflows positive but decelerating. Assessment: 60%.
  2. Is a 55% move in ~3 months feasible? Base rate: achieved in 4 of 15 comparable periods (27%). Inside view: institutional demand is new but price already elevated. Assessment: 25%.
  3. Are there catalysts to accelerate? Inside view: no imminent rate cuts, no new ETF approvals pending. Assessment: 15% chance of sufficient catalysts.

Outside view anchor: 37.5% base rate for 6-month windows, adjusted to ~25% for a 3-month window.

Inside view adjustments: Positive institutional demand (upward), decelerating flows and no macro catalyst (downward). Net: roughly neutral.

Synthesis: Anchoring to the adjusted 25% base rate with neutral net adjustments. Final estimate: 24% probability. Confidence: 72%.

Output: probability: 0.24, confidence: 0.72, side: "NO"

Step 5: Edge Calculation

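The model recommends the NO side. The AI estimates P(NO) = 76% and the market prices NO at $0.78, so edge = |0.76 - 0.78| = 2%. The required edge at 72% confidence is 6%. Since 2% < 6%, the edge is insufficient. The same 2% gap applies on the YES side (0.24 vs. $0.22), and on a signed basis NO is slightly overpriced relative to the estimate.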

Result: SKIP. Neither side has sufficient edge. The market is approximately fairly priced according to the Superforecaster's analysis.

What If the Market Were at $0.12?

If YES traded at $0.12 (12% implied), the AI's 24% estimate gives a YES edge of 12%. At 72% confidence, the required edge is 6%. Since 12% > 6%, the trade proceeds. Take a $1,000 account, treated here as the $1K - $10K tier (3% base). The 5% reserve leaves $950 available, so base investment = $950 x 0.03 = $28.50. Edge scaler with 12% edge: 1.0 + (0.25 x 0.12) = 1.03x. Investment = $28.50 x 1.03 = ~$29.36. Contracts: int($29.36 / $0.12) = 244, which is under the tier cap of 250, so the cap does not bind.

Polymarket Differences

The Superforecaster runs the same research and analysis pipeline on both exchanges. The differences are in market data sourcing, order execution, and settlement mechanics.

Aspect | Kalshi | Polymarket
Data API | Kalshi REST API | Polymarket data API
Market ID | Ticker string (e.g., KXBTC-25MAR21) | conditionId (0x hex hash)
Prices | Dollar strings, bid/ask midpoint | outcomePrices array [yes, no]
Token IDs | N/A | Separate YES and NO token IDs per market
Order signing | Cryptographic signature authentication | Wallet-based signing
Tick size | 1 cent | Per-market (0.01 or 0.001)
Neg-risk | N/A | Markets may use neg-risk complement structure
Min order size | 1 contract | Per-market orderMinSize from API
Settlement | Centralized (Kalshi confirms) | UMA Optimistic Oracle (2h undisputed, 4-6d disputed)
Payout currency | USD | USDC

On Polymarket, the agent also tracks additional market metadata: yes_token_id, no_token_id, tick_size, neg_risk, and order_min_size. These are required for constructing valid CLOB orders. Title-date extraction is used as a sanity check for multi-resolution markets where the data API's endDate can be unreliable.

All Configurable Settings

Every setting listed here can be adjusted per bot. Agent-level settings are in the bot's configuration panel. Account-level settings are in Settings → Safeguards.

AI & Research

Setting | Default | Description
Reasoning Model | Claude Opus 4.6 | User-selectable from bot settings dropdown
Research Model | Perplexity Sonar Pro | Research model (always Perplexity Sonar Pro)
AI Temperature | 0.0 | Model temperature (deterministic output)
AI Max Tokens | 4,000 | Max tokens per model response
AI Timeout | 120s | Timeout per model call
Daily AI Budget | $10 | Daily spending limit on AI API calls
Reanalyze Cooldown | 6 hours | Min hours between same-market analyses

Market Scanning

Setting | Default | Description
Min Volume | 50 | Minimum volume (contracts or USDC) to be eligible
Max Expiry | 7 days | Markets expiring beyond this window are skipped
Max Markets per Cycle | 10 | Top N markets by volume analyzed per cycle
Allowed Categories | All | Comma-separated category whitelist (empty = all)

Edge Thresholds

Setting | Default | Description
High Confidence Edge | 4% | Required edge when confidence >= 80%
Medium Confidence Edge | 6% | Required edge when confidence 60-79%
Low Confidence Edge | 8% | Required edge when confidence 50-59%
Min Confidence | 0.50 | Below this confidence, all trades are skipped

Position Sizing & Risk

Setting | Default | Description
Tier system | Auto | Base/max percentages adapt to account size (see tier table)
Kelly Multiplier | 0.25 | Quarter-Kelly for conservative production sizing
Max Position % | 30% | Max % of portfolio per position
Max Positions | 5 | Max concurrent open positions
Min Position Size | $1.00 | Minimum trade cost to place an order
Cash Reserve % | 5% | Minimum cash reserve maintained at all times

The Superforecaster's key advantage is evidence grounding. Every prediction is anchored to sourced research rather than model intuition. This makes it particularly strong on markets where breaking news or fresh data shifts probability in ways that stale training data would miss. The structured decomposition methodology — decompose, base rate, inside/outside view, synthesize — produces calibrated forecasts that resist the overconfidence typical of unconstrained LLM predictions.

Prediction market trading involves real financial risk. Even with comprehensive research and structured methodology, markets can move against well-reasoned positions. The Superforecaster is designed for responsible, risk-managed trading with multiple safety layers. Past performance does not guarantee future results. Always start in Training mode and only switch to Live trading with capital you can afford to lose.