The Superforecaster
Research-first structured decomposition for Kalshi and Polymarket.
The Superforecaster is a research-first prediction strategy that grounds every forecast in sourced, real-time evidence. Before the reasoning model ever sees a market question, Perplexity Sonar Pro searches the live web for recent developments, historical base rates, stakeholder signals, and arguments on both sides of the outcome. The reasoning model receives sourced findings rather than relying on stale training data, so every claim can be traced back to a citation.
Available on both Kalshi and Polymarket, the Superforecaster takes the opposite approach from the Council's multi-model debate. Where the Council uses breadth (five specialist models debating), the Superforecaster uses depth — a single powerful reasoning model armed with comprehensive, pre-gathered research. Its structured decomposition methodology comes directly from the superforecasting literature: decompose the question, anchor to base rates with real sample sizes, apply inside and outside views independently, then synthesize a calibrated probability weighted by evidence quality.
The default reasoning model is Claude Opus 4.6, selectable from the bot settings dropdown. The research model is always Perplexity Sonar Pro, which performs agentic multi-step web searches with built-in citation tracking.
The Pipeline at a Glance
Every market opportunity flows through 8 stages before any capital moves:
Market Scanning
Fetch active binary markets from the exchange API, filter by volume, expiry, category, and price extremes.
Selection & Dedup
Skip already-decided markets (configurable cooldown), enforce daily AI budget, and filter by allowed categories.
Web Research
Perplexity Sonar Pro searches the live web for recent developments, base rate data, stakeholder signals, and arguments for/against. Runs in batches of 3 to respect rate limits.
Superforecaster Analysis
Two-phase reasoning: Phase 1 audits research for contradictions and hallucinations (quality score 1-10). Phase 2 applies structured decomposition to produce a calibrated probability.
Edge Calculation
Compare AI probability to market price. Required edge: 4% at high confidence (>=80%), 6% at medium (60-79%), 8% at low (50-59%). Always skip below 50% confidence.
Position Sizing
Tier-based system by account size with Kelly Criterion scaling (quarter-Kelly). Cash reserves, position limits, and exchange minimum order sizes enforced.
Risk Checks
11 bot-level + 6 account-level rules validated by the backend orchestrator. Any single failure blocks the trade.
Execution & Settlement
Order intercepted, validated, and routed. Training mode saves paper trades; live mode places real orders on the exchange.
Step 1: Market Scanning
The agent fetches active binary markets from the exchange's data API. On Kalshi, it uses cursor-based pagination (up to 1,000 markets per page, 5 pages max) with server-side filters for close time and status. On Polymarket, it queries the data API with offset pagination filtered by volume, active status, and expiry window.
Only markets matching all of these criteria survive the scan:
- Market type: Binary only — scalar, combo, and multivariate markets are excluded.
- Status: Active and open only. No closed, settled, or suspended markets.
- Price range: YES price must be between $0.03 and $0.97. Markets outside this range are effectively resolved.
- Volume: Must meet the minimum volume threshold (default: 50). Configurable per bot.
- Expiry: Must close within the configured window (default: 7 days) and have at least 1 hour remaining.
- Category: If allowed categories are configured, only those categories pass. Otherwise all categories are eligible.
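The criteria above can be sketched as a single predicate. This is a minimal illustration, not the agent's actual code: the `Market` record, its field names, and the `passes_scan` function are hypothetical, and treating the $0.03 and $0.97 bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class Market:
    # Hypothetical normalized market record; field names are illustrative.
    ticker: str
    market_type: str      # e.g. "binary", "scalar"
    status: str           # e.g. "active", "closed"
    yes_price: float      # dollars, 0.0 to 1.0
    volume: float
    close_time: datetime  # same timezone convention as `now`
    category: str


def passes_scan(
    m: Market,
    now: datetime,
    min_volume: float = 50,
    max_expiry: timedelta = timedelta(days=7),
    min_remaining: timedelta = timedelta(hours=1),
    allowed_categories: Optional[Set[str]] = None,
) -> bool:
    """Return True only if the market meets every scan criterion."""
    if m.market_type != "binary":
        return False
    if m.status != "active":
        return False
    # Inclusive bounds are an assumption; the text only says "between".
    if not (0.03 <= m.yes_price <= 0.97):
        return False
    if m.volume < min_volume:
        return False
    remaining = m.close_time - now
    if remaining < min_remaining or remaining > max_expiry:
        return False
    # An empty or missing whitelist means every category is eligible.
    if allowed_categories and m.category not in allowed_categories:
        return False
    return True
```

Because every check is a hard filter, the order only affects speed; cheap comparisons run first here.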
Category Inference
Markets are automatically categorized by scanning event tickers and titles for known keywords:
| Category | Detected Keywords |
|---|---|
| Sports | NBA, NFL, MLB, NHL, NCAA, UFC, Soccer, Tennis, PGA, Golf |
| Crypto | Bitcoin, BTC, ETH, Crypto, SOL, DOGE, XRP, DeFi |
| Economics | Fed, CPI, GDP, Inflation, Jobs, Tariff, Oil, Gold, Treasury |
| Politics | Trump, Biden, Election, Vote, Congress, Senate, President |
| Weather | Weather, Temperature, Hurricane, Climate, Storm |
| Tech | AI, Apple, Google, Meta, Microsoft, Tesla, OpenAI |
| Other | Default when no keywords match |
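A keyword lookup along these lines would reproduce the table. It is a sketch under assumptions: the function name `infer_category`, the first-match-wins ordering, and the whole-word, case-insensitive matching are all illustrative choices, not confirmed behavior.

```python
import re

# Keyword lists copied from the table above. Checking categories in table
# order and returning the first match is an assumption.
CATEGORY_KEYWORDS = {
    "Sports": ["NBA", "NFL", "MLB", "NHL", "NCAA", "UFC", "Soccer", "Tennis", "PGA", "Golf"],
    "Crypto": ["Bitcoin", "BTC", "ETH", "Crypto", "SOL", "DOGE", "XRP", "DeFi"],
    "Economics": ["Fed", "CPI", "GDP", "Inflation", "Jobs", "Tariff", "Oil", "Gold", "Treasury"],
    "Politics": ["Trump", "Biden", "Election", "Vote", "Congress", "Senate", "President"],
    "Weather": ["Weather", "Temperature", "Hurricane", "Climate", "Storm"],
    "Tech": ["AI", "Apple", "Google", "Meta", "Microsoft", "Tesla", "OpenAI"],
}


def infer_category(ticker: str, title: str) -> str:
    """Return the first category whose keyword appears as a whole word."""
    text = f"{ticker} {title}"
    for category, keywords in CATEGORY_KEYWORDS.items():
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return category
    return "Other"
```

Whole-word matching keeps short keywords such as "AI" or "ETH" from firing inside unrelated words like "rain" or "whether". The trade-off is that a keyword embedded in a longer ticker string will not match, so a real implementation may treat tickers differently.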
Price Handling
On Kalshi, prices arrive as dollar strings. The agent computes midpoint prices from the bid/ask spread: yes_price = (yes_bid + yes_ask) / 2. On Polymarket, prices come from the outcomePrices array as floats (0.0 to 1.0) representing YES and NO probabilities.
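As a small illustration of the two price formats, the helpers below normalize both into floats. The function names are hypothetical, and the sketch assumes Kalshi's dollar strings are plain decimals such as "0.57" with no currency symbol.

```python
from decimal import Decimal
from typing import Sequence, Tuple


def kalshi_yes_midpoint(yes_bid: str, yes_ask: str) -> float:
    """Midpoint of the YES bid/ask, parsed from dollar strings."""
    # Decimal avoids binary rounding noise when adding the two strings.
    return float((Decimal(yes_bid) + Decimal(yes_ask)) / 2)


def polymarket_prices(outcome_prices: Sequence) -> Tuple[float, float]:
    """Split an outcomePrices pair into (yes_price, no_price)."""
    yes, no = (float(p) for p in outcome_prices)
    return yes, no
```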
Step 2: Selection & Deduplication
Before spending AI credits, the agent applies pre-checks to avoid redundant work. Markets that fail any check are skipped without calling any AI model.
Backend Dedup
Queries the backend for markets already decided within the cooldown window (default: 6 hours). Markets that were executed, skipped, or rejected are excluded.
Daily AI Budget
Checks if the daily AI spending limit has been exceeded (default: $10/day). If over budget, all analysis stops until the next day.
Category Filter
If allowed_categories is configured, markets outside those categories are excluded before any AI call.
After filtering, the top N markets by volume (max_markets_per_cycle, default: 10) are selected for research and analysis in the current cycle.
| Setting | Default | What It Does |
|---|---|---|
| Reanalyze Cooldown | 6 hours | Min hours before the same market can be re-analyzed |
| Daily AI Budget | $10 | Max daily spend on AI API calls (research + reasoning combined) |
| Max Markets per Cycle | 10 | Top N markets by volume to analyze each cycle |
| Allowed Categories | All | Comma-separated category whitelist (empty = trade all) |
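The pre-checks and the top-N cut can be sketched as follows. This is an illustration only: `select_markets`, the dictionary keys, and the `recently_decided` set (standing in for the backend dedup query) are hypothetical, and treating a spend equal to the budget as exhausted is an assumption.

```python
from typing import Iterable, List, Optional, Set


def select_markets(
    markets: Iterable[dict],
    recently_decided: Set[str],
    spent_today: float,
    daily_budget: float = 10.0,
    allowed_categories: Optional[Set[str]] = None,
    max_markets_per_cycle: int = 10,
) -> List[dict]:
    """Apply dedup, budget, and category pre-checks, then keep the top N by volume."""
    # Budget check first: if the day's AI spend is used up, analyze nothing.
    if spent_today >= daily_budget:
        return []
    eligible = [
        m for m in markets
        if m["ticker"] not in recently_decided
        and (not allowed_categories or m["category"] in allowed_categories)
    ]
    eligible.sort(key=lambda m: m["volume"], reverse=True)
    return eligible[:max_markets_per_cycle]
```

None of these checks calls an AI model, which is the point of running them before research.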
Step 3: Web Research (Perplexity Sonar Pro)
This is the Superforecaster's defining advantage. Before the reasoning model forms any opinion, Perplexity Sonar Pro performs agentic, multi-step web searches and returns structured findings with source citations. The research model always runs at temperature 0.0 for deterministic, fact-focused output.
For each market, the research prompt requests six categories of evidence:
- Recent Developments: Key news from the last 7 days directly relevant to the outcome, with dates, sources, and specific facts. If the event has not occurred yet, the model must state this clearly.
- Base Rate Data: Historical frequency of similar events. How often have comparable situations resolved YES vs. NO? Specific numbers and sample sizes required.
- Key Stakeholders & Signals: What have relevant decision-makers, experts, or officials said? Scheduled events (votes, hearings, deadlines) that could force resolution.
- Arguments for YES: The strongest factual evidence and reasoning that the market resolves YES.
- Arguments for NO: The strongest factual evidence and reasoning that the market resolves NO.
- Expert & Statistical Signals: Domain expert opinions, statistical models, polls, and historical patterns. Explicitly excludes prediction market prices — the research focuses on independent evidence only.
Research runs in parallel batches of 3 markets to stay within rate limits, with a 1-second delay between batches. Each research call has a 90-second timeout with up to 3 retries on server errors. The model receives up to 3,000 tokens of research per market.
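The batching, timeout, and retry behavior described above might look like this with asyncio. It is a sketch under assumptions: `fetch` is a hypothetical coroutine wrapping the research call, `ServerError` is a stand-in for a 5xx response, and "up to 3 retries" is read as one initial attempt plus three retries.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


class ServerError(Exception):
    """Hypothetical exception standing in for a server-side (5xx) failure."""


async def research_with_retry(
    fetch: Callable[[str], Awaitable[str]],
    market: str,
    timeout_s: float = 90.0,
    max_retries: int = 3,
) -> str:
    # One initial attempt plus up to `max_retries` retries on server errors.
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(fetch(market), timeout=timeout_s)
        except ServerError:
            if attempt == max_retries:
                raise
    raise RuntimeError("unreachable")


async def research_in_batches(
    fetch: Callable[[str], Awaitable[str]],
    markets: Sequence[str],
    batch_size: int = 3,
    delay_s: float = 1.0,
) -> List[str]:
    """Research markets in parallel batches, pausing between batches."""
    results: List[str] = []
    for start in range(0, len(markets), batch_size):
        batch = markets[start:start + batch_size]
        results.extend(
            await asyncio.gather(*(research_with_retry(fetch, m) for m in batch))
        )
        # Sleep only when another batch follows.
        if start + batch_size < len(markets):
            await asyncio.sleep(delay_s)
    return results
```

A timeout is not retried in this sketch because the text only mentions retries on server errors; a real client may choose differently.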
The research prompt includes a critical anti-hallucination guardrail: it injects today's date and instructs the model to never fabricate outcomes or claim events have occurred without verifiable sources. If no confirmed result exists, the model must state that explicitly.
Step 4: Superforecaster Analysis
The core analysis happens in two mandatory phases within a single model call to the user-selected reasoning model (default: Claude Opus 4.6). The model must complete Phase 1 before beginning Phase 2 — this ordering prevents the model from anchoring to the market price before evaluating evidence quality.
Phase 1: Research Audit
The model acts as an adversarial reviewer, examining the research with the goal of finding errors rather than confirming conclusions. It checks for five categories of problems:
- Internal Contradictions: Do any data points conflict with each other? A claimed "52-week low" higher than the current price suggests a data error or stock split.
- Logical Consistency: Are YES/NO outcome labels applied correctly? Does any argument accidentally support the opposite outcome from what it claims?
- Suspicious Data: Do any numbers, dates, or claims seem implausible in context? The model cross-checks figures against each other.
- Hallucination Signals: Does the research claim an event has "already happened" while the market price suggests it has not resolved? Anything dated after today is speculation, not fact.
- Missing Context: What important information is absent from the research? Key gaps are flagged.
After the audit, the model explicitly states which findings it trusts and will use, which it discards and why, and assigns an overall quality score from 1 to 10. If research quality scores below 3, the system logs a warning — the model should anchor more heavily to base rates than to thin evidence.
Phase 2: Probability Estimation via Structured Decomposition
Using only the findings marked as trusted in Phase 1, the model follows a 7-step methodology drawn from the superforecasting literature:
- Decompose — Break the question into independent sub-questions that can be assessed separately.
- Establish Base Rates — For each sub-question, find the historical frequency of similar events. Must include sample sizes — "3 out of 8 comparable periods since 2017" not just "it has happened before." If no hard data exists, the model must reason from first principles and state that the estimate is a reasoned guess, not an empirical finding.
- Inside View — What specific current evidence shifts probability from the base rate? Each adjustment must reference sourced research from Phase 1.
- Outside View — What does the reference class of similar events suggest, ignoring the specific details? This is the probability anchor.
- Synthesize — Weight inside and outside views independently, then combine. Strong corroborated evidence justifies larger departures from the base rate. Weak or contradictory evidence means staying close to the outside view.
- Calibrate — Express as a precise probability (0.00 to 1.00), never using vague words like "likely."
- Compare to Market — Only after forming an independent estimate, compare it to the market price. If they agree, the market is fairly priced. If they diverge, explain why with specific evidence. The model must not force an edge where none exists.
Output Format
The model returns structured JSON containing:
- research_quality: Score (1-10), issues found, trusted findings, and discarded findings with reasons.
- probability (0.00-1.00): Calibrated P(YES) from structured decomposition.
- confidence (0.00-1.00): How well-calibrated the estimate is. Higher when evidence is strong and corroborated.
- side ("YES" or "NO"): Which side has positive expected value.
- limit_price (0.01-0.99): Maximum price the model would pay for the recommended side.
- position_size_pct (1-25): Suggested percent of available capital, lower when uncertain.
- should_trade: Boolean flag indicating whether the model believes genuine edge exists backed by specific evidence.
- reasoning: Full audit summary followed by step-by-step decomposition, base rate, evidence, probability, and market comparison.
- key_factors: List of the most important factors driving the estimate.
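To make the shape concrete, here is an illustrative payload and a range check for it. The top-level field names and ranges come from the list above; the nested keys inside research_quality, the sample values, and the `validate_forecast` helper are assumptions made for the example.

```python
import json

# Illustrative payload; values are made up and nested key names are assumed.
SAMPLE = """
{
  "research_quality": {
    "score": 8,
    "issues": ["ETF inflow wording differs between sources"],
    "trusted": ["base rate: 3 of 8 six-month periods"],
    "discarded": []
  },
  "probability": 0.24,
  "confidence": 0.72,
  "side": "NO",
  "limit_price": 0.74,
  "position_size_pct": 2,
  "should_trade": false,
  "reasoning": "Audit summary, then decomposition and market comparison.",
  "key_factors": ["55% move required", "decelerating ETF inflows"]
}
"""

REQUIRED = (
    "research_quality", "probability", "confidence", "side", "limit_price",
    "position_size_pct", "should_trade", "reasoning", "key_factors",
)


def validate_forecast(raw: str) -> list:
    """Return a list of problems; an empty list means the payload is in range."""
    data = json.loads(raw)
    missing = [k for k in REQUIRED if k not in data]
    if missing:
        return [f"missing field: {k}" for k in missing]
    errors = []
    if not 1 <= data["research_quality"].get("score", 0) <= 10:
        errors.append("research_quality score must be 1-10")
    for key in ("probability", "confidence"):
        if not 0.0 <= data[key] <= 1.0:
            errors.append(f"{key} must be 0.00-1.00")
    if data["side"] not in ("YES", "NO"):
        errors.append('side must be "YES" or "NO"')
    if not 0.01 <= data["limit_price"] <= 0.99:
        errors.append("limit_price must be 0.01-0.99")
    if not 1 <= data["position_size_pct"] <= 25:
        errors.append("position_size_pct must be 1-25")
    if not isinstance(data["should_trade"], bool):
        errors.append("should_trade must be a boolean")
    return errors
```

Rejecting an out-of-range payload before the edge calculation keeps a malformed model response from ever reaching position sizing.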
Key Rules
- Independence first. The model forms its probability estimate before comparing to the market. It starts from base rates and evidence, not from the current price.
- Never fabricate statistics. If no hard data exists, reason from first principles and say so explicitly.
- Confidence reflects evidence quality, not probability extremity. A 90% probability with 40% confidence means the outcome appears very likely but the analysis rests on weak evidence. The edge calculation demands a larger mispricing to compensate.
- Honesty about uncertainty. High confidence requires strong, corroborated evidence. The model must not force an edge where the market is fairly priced.
Step 5: Edge Calculation & Decision
After analysis, the agent calculates the edge — the absolute difference between the AI's probability estimate and the market price for the traded side:
edge = |ai_prob - market_price| for the side being traded.
- For YES trades: ai_prob = probability, market_price = yes_price.
- For NO trades: ai_prob = 1 - probability, market_price = no_price.
The trade only proceeds if confidence meets the minimum threshold (50%) AND the edge meets or exceeds the tier requirement for that confidence level:
Required Edge by Confidence
| AI Confidence | Edge Required | Reasoning |
|---|---|---|
| >=80% | >=4% | High certainty — strong corroborated evidence justifies a tighter threshold |
| 60-79% | >=6% | Medium certainty — need a clearer mispricing before committing capital |
| 50-59% | >=8% | Low certainty — only trade obvious mispricings with wide margins |
| <50% | Always skip | Confidence too low to act regardless of apparent edge |
Example: AI confidence 75%, AI probability 65%, market YES price $0.58. Edge = |0.65 - 0.58| = 7%. Required for 75% confidence = 6%. Since 7% > 6%, this trade passes the edge check and proceeds to position sizing.
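The threshold table and the example can be expressed in a few lines. This is a sketch: the function names are hypothetical, the comparison uses >= as in the table, and rounding the edge to four decimals is an added assumption to keep floating-point noise from flipping a borderline decision.

```python
from typing import Optional, Tuple


def required_edge(confidence: float) -> Optional[float]:
    """Edge required for a confidence level, or None when the trade is always skipped."""
    if confidence < 0.50:
        return None
    if confidence >= 0.80:
        return 0.04
    if confidence >= 0.60:
        return 0.06
    return 0.08


def edge_decision(
    probability: float,
    confidence: float,
    side: str,
    yes_price: float,
    no_price: float,
) -> Tuple[float, Optional[float], bool]:
    """Return (edge, required edge, whether the trade passes the edge check)."""
    ai_prob = probability if side == "YES" else 1.0 - probability
    market_price = yes_price if side == "YES" else no_price
    edge = round(abs(ai_prob - market_price), 4)
    needed = required_edge(confidence)
    return edge, needed, needed is not None and edge >= needed


# The example above: 75% confidence, 65% probability, YES at $0.58.
print(edge_decision(0.65, 0.75, "YES", 0.58, 0.42))
```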
Step 6: Position Sizing
Position sizing adapts automatically to account size using a tier-based system. Larger accounts allocate a smaller percentage per trade and can hold more contracts per order.
Tier Table
| Account Size | Base % | Max % | Max Contracts/Order |
|---|---|---|---|
| Under $100 | 20% | 40% | 10 |
| $100 - $1K | 5% | 15% | 50 |
| $1K - $10K | 3% | 8% | 250 |
| $10K - $100K | 2% | 5% | 1,000 |
| $100K+ | 1% | 3% | 5,000 |
How the Calculation Works
- Determine available cash: Subtract the cash reserve (5% of balance) from total cash. If available cash is zero or negative, no trades are placed.
- Look up tier: Find the row matching the current balance.
- Base investment: available_cash x base_pct
- Edge-scaled multiplier: scaler = 1.0 + (kelly_multiplier x signed_edge), clamped between 0.1x and 3.0x. Stronger edges produce larger positions.
- Apply scaler: investment = base_investment x scaler
- Cap at max: The lesser of available_cash x max_pct and balance x max_position_pct / 100
- Convert to contracts: int(investment / market_price), capped at tier_max_contracts
- Enforce minimums: If the position cost is below min_position_size ($1 default), bump to the minimum viable quantity if cash permits.
- Kelly criterion cap: If the risk manager recommends a lower size, the position is reduced accordingly. Uses quarter-Kelly (0.25 multiplier) for production safety.
On Polymarket, an additional check enforces the exchange's per-market orderMinSize — the minimum number of shares the CLOB will accept for that specific market.
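The sizing steps above can be sketched as one function. Several details are assumptions: the name `size_position`, treating each tier's upper bound as exclusive (so a $1,000 balance falls in the 3% tier), returning 0 when the minimum cannot be met, and modeling the risk manager's recommendation as an optional `kelly_cap_contracts` argument. The Polymarket orderMinSize check is omitted.

```python
import math
from typing import Optional

# (exclusive upper balance bound, base_pct, max_pct, max_contracts) per tier.
TIERS = [
    (100, 0.20, 0.40, 10),
    (1_000, 0.05, 0.15, 50),
    (10_000, 0.03, 0.08, 250),
    (100_000, 0.02, 0.05, 1_000),
    (float("inf"), 0.01, 0.03, 5_000),
]


def size_position(
    balance: float,
    cash: float,
    market_price: float,
    signed_edge: float,
    kelly_multiplier: float = 0.25,
    reserve_pct: float = 0.05,
    max_position_pct: float = 30.0,
    min_position_size: float = 1.0,
    kelly_cap_contracts: Optional[int] = None,
) -> int:
    """Return the number of contracts to buy, or 0 if no trade is possible."""
    available = cash - balance * reserve_pct
    if available <= 0:
        return 0
    base_pct, max_pct, max_contracts = next(t[1:] for t in TIERS if balance < t[0])
    # Edge-scaled multiplier, clamped to the 0.1x-3.0x range.
    scaler = min(3.0, max(0.1, 1.0 + kelly_multiplier * signed_edge))
    investment = available * base_pct * scaler
    investment = min(investment, available * max_pct, balance * max_position_pct / 100)
    contracts = min(int(investment / market_price), max_contracts)
    # Bump to the smallest quantity meeting the minimum cost, if cash permits.
    if contracts * market_price < min_position_size:
        needed = math.ceil(min_position_size / market_price)
        contracts = needed if needed * market_price <= available else 0
    # Optional lower size recommended by the risk manager.
    if kelly_cap_contracts is not None:
        contracts = min(contracts, kelly_cap_contracts)
    return contracts
```

With a $1,000 balance held entirely in cash, a $0.12 price, and a 12% edge, this returns 244 contracts: $950 available, a $28.50 base investment, and a 1.03x scaler.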
Safety Caps
| Cap | Default | Description |
|---|---|---|
| Cash reserves | 5% | Must keep 5% of balance in cash at all times |
| Max single position | 30% | No single trade can use more than 30% of portfolio |
| Max concurrent positions | 5 | Cannot open more positions until one closes |
| Kelly multiplier | 0.25 | Quarter-Kelly for conservative production sizing |
| Min position size | $1.00 | Trades costing less than this are not placed |
Step 7: Risk Checks
Every trade must pass three independent layers of validation before execution. Any single failure at any layer blocks the trade entirely. For the full breakdown of all rules, see the Safeguards page.
Layer 1: Agent-Level Guards
| Check | Default | What Happens |
|---|---|---|
| Max concurrent positions | 5 | Cannot open more positions until one closes |
| Max position size | 30% of portfolio | Single position cannot exceed this |
| Cash reserves minimum | 5% | Must keep 5% of balance in cash at all times |
Layer 2: Backend Rules Engine (11 Rules)
| # | Rule | Default | What It Does |
|---|---|---|---|
| 1 | Trade size | $100 max | Rejects trades exceeding this cost |
| 2 | Capital per agent | $2,000 | Max capital a single agent can deploy |
| 3 | Daily loss limit | $500 | Kill switch — pauses agent if daily losses exceed this |
| 4 | Min confidence | 60% | Rejects trades below this confidence score |
| 5 | Allowed categories | All | Whitelist of tradeable categories |
| 6 | Blocked tickers | None | Blacklist of specific market tickers |
| 7 | Max positions | 10 | Concurrent open position limit |
| 8 | Duplicate prevention | On | Blocks duplicate position on same ticker |
| 9 | Opposing position | Blocked | Prevents YES and NO on same market |
| 10 | Max trades/day | Unlimited | Daily trade count limit per agent |
| 11 | Sell requires position | On | Cannot sell if you don't hold the position |
Layer 3: Account-Level Validation
| Check | Default | Scope |
|---|---|---|
| Max trades/day (global) | 50 | All agents combined |
| Global daily loss | $500 | Across all agents |
| Max trades per market | Unlimited | Any single market |
| Cooldown | 0 hours | Min time between same-market trades |
| Active hours | Always | UTC hours when trading is allowed |
| Daily AI budget | $50 | Global AI API spend cap |
You can configure all rules from Settings → Safeguards. Changes take effect on the next validation cycle — no restart or redeployment needed.
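The "any single failure blocks the trade" behavior can be pictured as a chain of rule functions grouped into layers. This sketch is illustrative only: the rule factories, the trade dictionary keys, and `validate_trade` are hypothetical, and only three of the rules from the tables are shown.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

# A rule returns a rejection reason, or None when the trade passes.
Rule = Callable[[dict], Optional[str]]


def max_trade_cost(limit: float = 100.0) -> Rule:
    return lambda t: f"trade cost above ${limit:.2f}" if t["cost"] > limit else None


def min_confidence(threshold: float = 0.60) -> Rule:
    return lambda t: "confidence below minimum" if t["confidence"] < threshold else None


def blocked_tickers(blocked: FrozenSet[str] = frozenset()) -> Rule:
    return lambda t: "ticker is blocked" if t["ticker"] in blocked else None


def validate_trade(trade: dict, layers: List[List[Rule]]) -> Tuple[bool, Optional[str]]:
    """Run every layer in order; the first failing rule blocks the trade."""
    for layer in layers:
        for rule in layer:
            reason = rule(trade)
            if reason is not None:
                return False, reason
    return True, None
```

Building rules from factories with default arguments mirrors the idea that thresholds are settings: changing a default changes the next validation without touching the chain itself.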
Step 8: Execution & Settlement
Once a trade passes all risk checks, it enters the execution pipeline. The agent never talks directly to the exchange — every order is intercepted by the backend, queued, and validated before execution.
Execution
- Training mode: Saved as a paper trade. No exchange API call. P&L is calculated against real market prices.
- Live mode (Kalshi): A real limit order is placed on Kalshi using cryptographically signed authentication.
- Live mode (Polymarket): A signed order is submitted to the Polymarket order book with the appropriate token ID, tick size, and neg-risk flag.
- Order interception: All orders are intercepted and validated by the orchestrator before execution.
Settlement
A settlement checker runs periodically in the backend and polls both exchanges for resolved markets.
- Kalshi: Markets settle internally when the event outcome is confirmed. Typically minutes to hours after the event. Winning contracts pay $1.00 USD.
- Polymarket: Uses the UMA Optimistic Oracle for decentralized resolution. Undisputed: 2 hours after proposal. Disputed: 4-6 days (UMA token holder vote). Winning tokens pay $1.00 USDC.
Example: Full Superforecaster Walkthrough
Here is how the Superforecaster analyzes a hypothetical market: "Will Bitcoin exceed $150K by June 30?" — currently trading at $0.22 YES (market implies 22% probability).
Step 3: Web Research
Perplexity Sonar Pro searches the live web and returns structured findings:
- Recent developments: Bitcoin trading at ~$97K. Spot ETF inflows averaging $400M/week. Halving supply shock already priced in (April 2024). Fed holding rates steady.
- Base rate data: Bitcoin achieved a 50%+ move in 3 of 8 comparable 6-month periods since 2017 (37.5%). A move from $97K to $150K requires a 55% gain.
- Stakeholder signals: Standard Chartered targets $150K year-end. Most sell-side analysts project $100-120K range.
- Arguments for YES: Structural institutional ETF demand, post-halving supply squeeze peaks 12-18 months after halving, improving regulatory clarity.
- Arguments for NO: 55% move needed in under 3 months, no imminent rate cuts, ETF inflows have plateaued from peak levels, sub-40% base rate for moves of this magnitude.
Step 4 Phase 1: Research Audit
- Contradiction found: One source says ETF inflows are "accelerating," another says "plateaued from peak levels." Both technically true — flows are positive but below all-time highs. Nuance noted.
- Suspicious claims: None — all statistics trace to verifiable sources.
- Hallucination check: No claims of events that have not yet occurred.
- Quality rating: 8/10 — strong sourcing, minor contradiction handled appropriately.
Step 4 Phase 2: Structured Decomposition
Sub-questions:
- Can Bitcoin sustain its trajectory? Base rate: post-halving bull runs sustained in 5 of 7 cycles (71%). Inside view: ETF inflows positive but decelerating. Assessment: 60%.
- Is a 55% move in ~3 months feasible? Base rate: achieved in 4 of 15 comparable periods (27%). Inside view: institutional demand is new but price already elevated. Assessment: 25%.
- Are there catalysts to accelerate? Inside view: no imminent rate cuts, no new ETF approvals pending. Assessment: 15% chance of sufficient catalysts.
Outside view anchor: 37.5% base rate for 6-month windows, adjusted to ~25% for a 3-month window.
Inside view adjustments: Positive institutional demand (upward), decelerating flows and no macro catalyst (downward). Net: roughly neutral.
Synthesis: Anchoring to the adjusted 25% base rate with neutral net adjustments. Final estimate: 24% probability. Confidence: 72%.
Output: probability: 0.24, confidence: 0.72, side: "NO"
Step 5: Edge Calculation
The model recommends the NO side. AI thinks P(NO) = 76%, market prices NO at $0.78. Edge = |0.76 - 0.78| = 2%. The YES side shows the same 2% gap (AI 24% vs. $0.22). Required at 72% confidence = 6%. Since 2% < 6%, the edge is insufficient on either side.
Result: SKIP. Neither side has sufficient edge. The market is approximately fairly priced according to the Superforecaster's analysis.
What If the Market Were at $0.12?
If YES traded at $0.12 (12% implied), the AI's 24% estimate gives a YES edge of 12%. At 72% confidence, the required edge is 6%. Since 12% > 6%, the trade proceeds. With a $1,000 account (3% base tier, 5% reserve = $950 available), base investment = $28.50. Edge scaler with 12% edge: 1.0 + (0.25 x 0.12) = 1.03x. Investment = ~$29.36. Contracts: int($29.36 / $0.12) = 244 contracts, which is under the tier cap of 250, so the cap does not bind.
Polymarket Differences
The Superforecaster runs the same research and analysis pipeline on both exchanges. The differences are in market data sourcing, order execution, and settlement mechanics.
| Aspect | Kalshi | Polymarket |
|---|---|---|
| Data API | Kalshi REST API | Polymarket data API |
| Market ID | Ticker string (e.g., KXBTC-25MAR21) | conditionId (0x hex hash) |
| Prices | Dollar strings, bid/ask midpoint | outcomePrices array [yes, no] |
| Token IDs | N/A | Separate YES and NO token IDs per market |
| Order signing | Cryptographic signature authentication | Wallet-based signing |
| Tick size | 1 cent | Per-market (0.01 or 0.001) |
| Neg-risk | N/A | Markets may use neg-risk complement structure |
| Min order size | 1 contract | Per-market orderMinSize from API |
| Settlement | Centralized (Kalshi confirms) | UMA Optimistic Oracle (2h undisputed, 4-6d disputed) |
| Payout currency | USD | USDC |
On Polymarket, the agent also tracks additional market metadata: yes_token_id, no_token_id, tick_size, neg_risk, and order_min_size. These are required for constructing valid CLOB orders. Title-date extraction is used as a sanity check for multi-resolution markets where the data API's endDate can be unreliable.
All Configurable Settings
Every setting listed here can be adjusted per bot. Agent-level settings are in the bot's configuration panel. Account-level settings are in Settings → Safeguards.
AI & Research
| Setting | Default | Description |
|---|---|---|
| Reasoning Model | Claude Opus 4.6 | User-selectable from bot settings dropdown |
| Research Model | Perplexity Sonar Pro | Fixed; not user-selectable |
| AI Temperature | 0.0 | Model temperature (deterministic output) |
| AI Max Tokens | 4,000 | Max tokens per model response |
| AI Timeout | 120s | Timeout per model call |
| Daily AI Budget | $10 | Daily spending limit on AI API calls |
| Reanalyze Cooldown | 6 hours | Min hours between same-market analyses |
Market Scanning
| Setting | Default | Description |
|---|---|---|
| Min Volume | 50 | Minimum volume (contracts or USDC) to be eligible |
| Max Expiry | 7 days | Markets expiring beyond this window are skipped |
| Max Markets per Cycle | 10 | Top N markets by volume analyzed per cycle |
| Allowed Categories | All | Comma-separated category whitelist (empty = all) |
Edge Thresholds
| Setting | Default | Description |
|---|---|---|
| High Confidence Edge | 4% | Required edge when confidence >= 80% |
| Medium Confidence Edge | 6% | Required edge when confidence 60-79% |
| Low Confidence Edge | 8% | Required edge when confidence 50-59% |
| Min Confidence | 0.50 | Below this confidence, all trades are skipped |
Position Sizing & Risk
| Setting | Default | Description |
|---|---|---|
| Tier system | Auto | Base/max percentages adapt to account size (see tier table) |
| Kelly Multiplier | 0.25 | Quarter-Kelly for conservative production sizing |
| Max Position % | 30% | Max % of portfolio per position |
| Max Positions | 5 | Max concurrent open positions |
| Min Position Size | $1.00 | Minimum trade cost to place an order |
| Cash Reserve % | 5% | Minimum cash reserve maintained at all times |
The Superforecaster's key advantage is evidence grounding. Every prediction is anchored to sourced research rather than model intuition. This makes it particularly strong on markets where breaking news or fresh data shifts probability in ways that stale training data would miss. The structured decomposition methodology — decompose, base rate, inside/outside view, synthesize — produces calibrated forecasts that resist the overconfidence typical of unconstrained LLM predictions.
Prediction market trading involves real financial risk. Even with comprehensive research and structured methodology, markets can move against well-reasoned positions. The Superforecaster is designed for responsible, risk-managed trading with multiple safety layers. Past performance does not guarantee future results. Always start in Training mode and only switch to Live trading with capital you can afford to lose.