Council V2
Sequential 5-agent debate with Trader decision gate, live research, and edge filtering.
Council V2 is the flagship trading strategy powering AgentNash. It deploys a sequential 5-agent adversarial debate where each AI builds on the previous agent's output — no parallel groupthink, no single point of failure. A dedicated research phase gathers live web intelligence before the debate begins, and a final Trader agent acts as the decision gate: no trade executes unless the math checks out and the Trader confirms edge.
Council V2 is available on both Kalshi (CFTC-regulated US exchange) and Polymarket (decentralized prediction market on Polygon). The AI analysis pipeline is identical across both — only the execution layer and edge thresholds differ.
Key Differences from Council V1
V2 is a ground-up rewrite that replaces the V1 parallel ensemble with a sequential debate architecture. Every design choice targets a specific weakness observed in V1.
| Aspect | Council V1 | Council V2 |
|---|---|---|
| Architecture | Parallel — agents run simultaneously | Sequential — each agent sees and responds to prior output |
| Agent Count | 6 agents (incl. News Analyst) | 5 agents + dedicated Research phase |
| Research | RSS feeds + optional Perplexity | Perplexity Sonar Deep Research on every market |
| Forecaster Weight | 30% | 35% |
| Bull Model | OpenAI o4-mini | Claude Opus 4.6 |
| Bear Model | Gemini 3.1 Pro Preview | Claude Sonnet 4.6 |
| Risk Manager | DeepSeek V3.2 | Claude Opus 4.6 |
| Trader | Grok 4.1 Fast | Claude Sonnet 4.6 |
| Edge (High Conf) | 6% | 4% (Polymarket) / 6% (Kalshi) |
| Edge (Medium Conf) | 8% | 6% (Polymarket) / 8% (Kalshi) |
| Edge (Low Conf) | 12% | 10% (Polymarket) / 12% (Kalshi) |
| News Analyst | Dedicated agent (Claude) | Removed — replaced by Perplexity research phase |
| Consensus Gate | 3 of 5 agents must agree | Trader has final authority (Risk Manager advises) |
The Pipeline
Every market opportunity flows through 10 stages. The pipeline is deterministic — the same inputs always produce the same sequence of checks and gates. A trade only executes when every stage passes.
Market Ingestion
Fetch active binary markets from the exchange API. Filter by volume, expiry, order book status, and price bounds (3%-97%).
Category Inference
Classify each market by scanning the title for known keywords — Sports, Crypto, Economics, Politics, Weather, Tech, or Other. Optional category allowlist narrows focus.
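As a rough illustration of keyword-based classification, the sketch below matches whole words in the title against a per-category keyword set. The keyword lists, `infer_category`, and `is_allowed` are hypothetical; the actual keywords used by the bot are not listed on this page.

```python
import re

# Hypothetical keyword sets, for illustration only.
CATEGORY_KEYWORDS = {
    "Sports": {"nba", "nfl", "mlb", "playoffs"},
    "Crypto": {"btc", "bitcoin", "eth", "ethereum"},
    "Economics": {"fed", "cpi", "inflation", "gdp"},
    "Politics": {"election", "senate", "president"},
    "Weather": {"hurricane", "rainfall", "snowfall"},
    "Tech": {"openai", "apple", "nvidia"},
}


def infer_category(title):
    """Return the first category whose keywords appear as whole words in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "Other"


def is_allowed(category, allowlist):
    """An empty allowlist means every category is allowed."""
    return not allowlist or category in allowlist
```

Matching on whole words rather than substrings avoids false hits such as "eth" inside "something".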
Cooldown & Dedup
Skip markets already analyzed within the cooldown window (default: 6 hours). Prevents wasting AI credits on unchanged conditions.
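A minimal sketch of the cooldown check, assuming the bot keeps a mapping from market ID to the time of its last analysis. The names `should_analyze` and `last_analyzed` are illustrative, not taken from the product.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(hours=6)  # default cooldown window from the docs


def should_analyze(market_id, last_analyzed, now, cooldown=COOLDOWN):
    """Analyze a market only if it is new or its cooldown has elapsed."""
    previous = last_analyzed.get(market_id)
    return previous is None or now - previous >= cooldown


now = datetime(2025, 4, 10, 12, 0, tzinfo=timezone.utc)
seen = {"btc-120k": now - timedelta(hours=2)}
```

With these values, `should_analyze("btc-120k", seen, now)` is false because only two hours have passed, while an unseen market ID is analyzed immediately.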
Research (Perplexity Sonar Deep Research)
Gather live web intelligence for each market — recent developments, base rate data, stakeholder signals, and arguments for/against. Runs in parallel batches of 3 with rate limiting.
Forecaster Debate
Grok 4.1 Fast estimates the true YES probability using a 6-step method: research audit, base rate, current conditions, market structure analysis, calibration adjustment, and EV check.
Bull & Bear Debate
Claude Opus 4.6 argues the YES case with 3-5 evidence-backed arguments. Then Claude Sonnet 4.6 sees the Bull's case and counters every argument. The Bear also checks if the Bull fabricated any data.
Risk Manager
Claude Opus 4.6 evaluates both sides, calculates EV for BUY YES and BUY NO, picks the better side, recommends position sizing via fractional Kelly, and issues a should_trade verdict.
Trader Decision Gate
Claude Sonnet 4.6 reviews the full debate transcript and makes the final BUY or SKIP decision. Default stance is to BUY when edge exists — only skips with a specific, concrete reason.
Edge Filter & Position Sizing
Verify the AI's edge over market price meets the confidence-tiered threshold. Calculate position size using tier-based rules, Kelly criterion cap, and exchange minimum order size.
Order Execution
Route the order through the intercept pipeline. Training mode saves a paper trade. Live mode places a real limit order on the exchange.
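The "every stage must pass" behavior can be pictured as a chain of gates that stops at the first failure. This is only a sketch: `run_pipeline`, the two stand-in checks, and the context keys are hypothetical, and the real stages do far more work than these one-line checks.

```python
def run_pipeline(stages, context):
    """Run (name, check) pairs in order.

    A check returns None to pass, or a reason string to stop the pipeline.
    """
    for name, check in stages:
        reason = check(context)
        if reason is not None:
            return {"executed": False, "stopped_at": name, "reason": reason}
    return {"executed": True, "stopped_at": None, "reason": None}


# Hypothetical checks standing in for two of the ten stages.
def price_bounds(ctx):
    return None if 0.03 <= ctx["yes_price"] <= 0.97 else "price out of bounds"


def trader_gate(ctx):
    return None if ctx["trader_action"] == "BUY" else "trader skipped"


STAGES = [
    ("Market Ingestion", price_bounds),
    ("Trader Decision Gate", trader_gate),
]
```

Because the stage order is fixed, a rejected market always reports the earliest gate it failed.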
The 5 Agents
V2 uses models from two providers — xAI and Anthropic — routed through OpenRouter. Each agent has a specific role in the sequential debate. The Forecaster, Bull, and Bear contribute probability estimates with confidence-adjusted weights. The Risk Manager and Trader do not contribute to probability aggregation — they govern sizing and execution.
| Role | Model | Weight | Purpose |
|---|---|---|---|
| Forecaster | Grok 4.1 Fast (xAI) | 35% | Anchors the debate — estimates true P(YES) using base rates and structured reasoning |
| Bull Researcher | Claude Opus 4.6 (Anthropic) | 25% | Builds the strongest evidence-based YES case with 3-5 arguments and probability floor |
| Bear Researcher | Claude Sonnet 4.6 (Anthropic) | 20% | Counters every Bull argument — estimates probability ceiling and flags fabricated data |
| Risk Manager | Claude Opus 4.6 (Anthropic) | — | Calculates EV for both sides, assigns risk score (1-10), recommends sizing via Kelly |
| Trader | Claude Sonnet 4.6 (Anthropic) | — | Final decision gate — BUY or SKIP with limit price and position size |
Weights apply only to probability aggregation. The ensemble probability is a confidence-adjusted weighted average: each agent's weight is multiplied by its self-reported confidence (floored at 0.1) before averaging. This means a high-confidence Forecaster naturally dominates a low-confidence Bull.
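The aggregation rule can be sketched as follows. The sketch assumes the effective weights are renormalized by their sum (the three base weights add up to 80%, so a weighted average needs a divisor) and that each agent reports one probability and one confidence. `ensemble_probability` and the input shape are illustrative.

```python
WEIGHTS = {"forecaster": 0.35, "bull": 0.25, "bear": 0.20}
CONFIDENCE_FLOOR = 0.1


def ensemble_probability(estimates):
    """Confidence-adjusted weighted average.

    estimates maps role -> (probability, confidence), both in [0, 1].
    Assumption: effective weights are renormalized by their sum.
    """
    numerator = 0.0
    denominator = 0.0
    for role, (probability, confidence) in estimates.items():
        effective = WEIGHTS[role] * max(confidence, CONFIDENCE_FLOOR)
        numerator += effective * probability
        denominator += effective
    return numerator / denominator


# A fully confident Forecaster at 0.20 against a zero-confidence Bull at 0.80:
# effective weights are 0.35 and 0.025, so the result stays near 0.20.
example = ensemble_probability({"forecaster": (0.20, 1.0), "bull": (0.80, 0.0)})
```

Here `example` works out to 0.24, which shows how the confidence floor lets a low-confidence agent still nudge the estimate without dominating it.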
Research Phase: Perplexity Sonar Deep Research
Before the debate begins, every market undergoes a dedicated research step using Perplexity Sonar Deep Research — an agentic multi-step web search model. This replaced V1's RSS-based News Analyst with live, targeted intelligence gathering.
For each market, Perplexity is prompted to gather six categories of information:
- Recent Developments — Key news from the last 7 days with dates, sources, and specific facts. If the event has not happened yet, it must say so explicitly.
- Base Rate Data — Historical frequency of similar events with sample sizes.
- Key Stakeholders & Signals — Statements from decision-makers, experts, or officials. Scheduled events that could force resolution.
- Arguments for YES — Strongest evidence and reasoning supporting YES.
- Arguments for NO — Strongest evidence and reasoning supporting NO.
- Expert & Statistical Signals — Domain expert opinions, statistical models, polls, and historical patterns. Explicitly excludes prediction market prices to avoid circular reasoning.
Research runs in parallel batches of 3 markets with a 600-second timeout per request and automatic retry on server errors (up to 3 attempts with exponential backoff). The generous timeout accommodates sonar-deep-research, which can spend several minutes gathering and synthesizing sources for complex questions. The research output is injected into every subsequent agent's prompt as shared context — clearly labeled as pre-gathered data that may contain errors, prompting agents to cross-check.
The research prompt includes today's date and instructs Perplexity to never fabricate outcomes. If an event is scheduled for today or later, it must state that no confirmed result exists. This prevents hallucinated resolution data from contaminating the debate.
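The batching and retry behavior described above could look roughly like this with `asyncio`. It is a sketch under assumptions: `fetch` stands for whatever coroutine calls the research API, `ServerError` is a placeholder for a 5xx response, and the one-second base delay is invented for illustration.

```python
import asyncio

BATCH_SIZE = 3          # markets researched in parallel
MAX_ATTEMPTS = 3        # total attempts per market
TIMEOUT_SECONDS = 600   # per-request timeout


class ServerError(Exception):
    """Placeholder for a server-side (5xx) failure from the research API."""


async def research_with_retry(fetch, market, base_delay=1.0):
    """Call fetch(market), retrying server errors with exponential backoff."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return await asyncio.wait_for(fetch(market), timeout=TIMEOUT_SECONDS)
        except ServerError:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def research_all(fetch, markets, base_delay=1.0):
    """Research markets in sequential batches; each batch runs concurrently."""
    results = []
    for start in range(0, len(markets), BATCH_SIZE):
        batch = markets[start:start + BATCH_SIZE]
        results.extend(await asyncio.gather(
            *(research_with_retry(fetch, m, base_delay) for m in batch)
        ))
    return results
```

Only server errors are retried in this sketch; a timeout propagates to the caller, which matches the description of retries applying to server errors.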
Agent Roles in Detail
1. Forecaster (Grok 4.1 Fast)
The Forecaster anchors the entire debate. It receives the market data and Perplexity research, then applies a strict 6-step analytical method:
- Research Audit — Note contradictions or suspicious claims in the research. State what can be trusted.
- Base Rate — Historical frequency of this type of event, with specific sample sizes.
- Current Conditions — Specific, verifiable evidence that shifts probability from the base rate.
- Market Structure — Is this a single binary question or part of a multi-outcome event?
- Calibration — Adjust toward the base rate when uncertain. Overconfidence is the default failure mode.
- EV Check — Compare estimated probability to market price. Only flag edge if the difference exceeds 5 percentage points.
Output: probability (0.0-1.0), confidence (0.0-1.0), base_rate, side (yes/no), key_factors, and step-by-step reasoning.
Anti-hallucination rule: Must not fabricate base rates, statistics, or studies. When hard data is unavailable, reason from first principles and explicitly say so.
2. Bull Researcher (Claude Opus 4.6)
The Bull receives the market data, Perplexity research, and the Forecaster's probability estimate. Its mandate is to construct the strongest possible YES case — but with strict evidentiary standards:
- Thesis — One sentence on why this will happen.
- 3-5 Key Arguments — Each must cite specific evidence from the research or verifiable first principles. No fabricated statistics.
- Probability Floor — The minimum reasonable YES probability even if the Bear is right about some things.
- Catalysts — Near-term events (1-7 days) that could push probability higher. Only verifiable or scheduled events.
Key constraint: It is better to make 2-3 honest arguments than 5 fabricated ones. The prompt explicitly prohibits inventing future events or statistics.
3. Bear Researcher (Claude Sonnet 4.6)
The Bear sees everything the Bull produced and must directly counter it. This is the adversarial core of the system — the Bear is specifically instructed to check whether the Bull fabricated any data and call it out.
- Counter-Thesis — One sentence on why this will not happen.
- Counter-Arguments — 3-5 reasons directly addressing the Bull's specific claims.
- Probability Ceiling — The maximum reasonable YES probability even if the Bull is right about some things.
- Risk Factors — What could go wrong for YES holders?
- Structural Analysis — Base rates, market mechanics, and structural arguments (not narrative-driven).
Key constraint: Arguments must be statistical and structural. Single observations are treated as high-variance noise — the Bear must use base rates and sample sizes.
4. Risk Manager (Claude Opus 4.6)
The Risk Manager receives the full debate output and portfolio context. It performs a quantitative evaluation of both sides:
- True Probability — Pick a single P(YES) anchored on the Forecaster, adjusted by Bull/Bear bounds. One number — no rambling.
- Expected Value (both sides) — `EV(BUY YES) = (true_prob x $1.00) - market_price_yes`. `EV(BUY NO) = ((1 - true_prob) x $1.00) - market_price_no`. Pick the side with higher positive EV.
- Risk Score — Rate 1-10 across liquidity, time risk, information quality, and model disagreement.
- Position Size — Fractional Kelly: `size_pct = (edge / odds) x 0.25`. Always round down.
- Edge Durability — Will this edge persist? Fast-moving news means trade smaller.
Critical rule: should_trade must be true if best EV exceeds $0.03 per share. The Risk Manager cannot override the math with subjective conservatism — it uses recommended_size_pct to manage risk instead.
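The EV comparison and the $0.03 rule can be checked with a few lines of arithmetic. `evaluate_ev` and its return fields are illustrative names. The Kelly sizing step is left out because the page does not define how `odds` is computed.

```python
def evaluate_ev(true_prob, price_yes, price_no, min_ev=0.03):
    """Compute per-share EV for both sides and apply the should_trade rule."""
    ev_yes = true_prob * 1.00 - price_yes
    ev_no = (1.0 - true_prob) * 1.00 - price_no
    side, best_ev = ("yes", ev_yes) if ev_yes >= ev_no else ("no", ev_no)
    return {
        "side": side,
        "ev_yes": ev_yes,
        "ev_no": ev_no,
        "should_trade": best_ev > min_ev,  # true only when best EV exceeds $0.03
    }


# Walkthrough numbers: P(YES) = 0.24, YES at $0.35, NO at $0.65.
verdict = evaluate_ev(0.24, 0.35, 0.65)
```

For the walkthrough inputs this picks the NO side with an EV of about +$0.11 per share, so `should_trade` is true.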
5. Trader (Claude Sonnet 4.6)
The Trader is the final decision gate. It receives every agent's complete output and makes the authoritative BUY or SKIP call. Its default stance is to execute when edge exists — it should only skip with a concrete, specific reason.
Decision rules hardcoded into the Trader's prompt:
- If Risk Manager says `should_trade=true` AND Forecaster shows >5pp edge, default is BUY.
- Bull-Bear disagreement is expected by design — it is NOT a reason to skip.
- If Forecaster and Risk Manager agree on direction, that is strong conviction — BUY.
- Only SKIP when: edge <5pp, market is mispriced in the opposite direction, or a specific flaw in the analysis is identified (e.g., Bull fabricated data).
- Set limit price at or slightly below estimated fair probability for the traded side.
- Size: 5-10% for marginal edge (5-8pp), 15-25% for strong edge (>10pp).
Fallback: If the Trader returns empty or invalid JSON but the Risk Manager approved the trade, the system falls back to the Risk Manager's recommended side and executes automatically.
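One way to picture the fallback path is shown below. The JSON field names (`action`, `side`) and the function `final_decision` are assumptions for illustration. The sketch also assumes the system skips when the Trader output is invalid and the Risk Manager did not approve, a case this page does not spell out.

```python
import json


def final_decision(trader_raw, risk_manager):
    """Use the Trader's decision when it parses; otherwise fall back."""
    try:
        decision = json.loads(trader_raw) if trader_raw else None
    except json.JSONDecodeError:
        decision = None

    if isinstance(decision, dict) and decision.get("action") in ("BUY", "SKIP"):
        return decision

    # Trader output was empty or invalid: defer to the Risk Manager verdict.
    if risk_manager.get("should_trade"):
        return {
            "action": "BUY",
            "side": risk_manager["side"],
            "source": "risk_manager_fallback",
        }
    # Assumption: with no valid decision and no approval, do nothing.
    return {"action": "SKIP", "source": "no_valid_decision"}
```

A valid Trader SKIP is always respected in this sketch; the fallback only applies when the Trader produced nothing usable.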
Edge Filtering
After the debate concludes, the system checks whether the AI ensemble found sufficient edge over the market price. Edge is the absolute difference between the AI's probability estimate for the traded side and the current market price. The required edge threshold varies by the Forecaster's confidence level — higher confidence permits thinner edges.
| Confidence Tier | Forecaster Confidence | Polymarket Edge | Kalshi Edge |
|---|---|---|---|
| High | >= 80% | 4% | 6% |
| Medium | >= 60% | 6% | 8% |
| Low | < 60% | 10% | 12% |
A minimum ensemble confidence of 50% is required regardless of edge size. Below 50% confidence, the trade is always rejected — the agents are not sufficiently certain about their own estimates.
Polymarket's lower thresholds reflect its deeper liquidity and narrower spreads compared to Kalshi. The same strategy can trade more frequently on Polymarket because smaller edges are still profitable after execution costs.
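The tiered check can be expressed compactly. In this sketch the function names are illustrative, "meets the threshold" is read as greater than or equal, and the Forecaster confidence (which selects the tier) is passed separately from the ensemble confidence (which is compared to the 50% floor).

```python
EDGE_THRESHOLDS = {
    "polymarket": {"high": 0.04, "medium": 0.06, "low": 0.10},
    "kalshi": {"high": 0.06, "medium": 0.08, "low": 0.12},
}
MIN_CONFIDENCE = 0.50


def confidence_tier(forecaster_confidence):
    if forecaster_confidence >= 0.80:
        return "high"
    if forecaster_confidence >= 0.60:
        return "medium"
    return "low"


def passes_edge_filter(exchange, ai_prob, market_price,
                       forecaster_confidence, ensemble_confidence):
    """ai_prob and market_price both refer to the traded side."""
    if ensemble_confidence < MIN_CONFIDENCE:
        return False
    required = EDGE_THRESHOLDS[exchange][confidence_tier(forecaster_confidence)]
    return abs(ai_prob - market_price) >= required
```

For example, an 11-point edge at 0.75 Forecaster confidence clears the medium tier on both exchanges, while a 5-point edge at 0.90 confidence clears Polymarket's 4% bar but not Kalshi's 6%.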
Position Sizing
Position sizing uses a tier-based system scaled by account balance. Smaller accounts take proportionally larger positions (up to 40% of balance) because minimum order sizes require it. Larger accounts are constrained to avoid concentrated risk.
Sizing Tiers
| Account Balance | Base % | Max % | Max Contracts |
|---|---|---|---|
| < $100 | 20% | 40% | 10 |
| < $1,000 | 5% | 15% | 50 |
| < $10,000 | 3% | 8% | 250 |
| < $100,000 | 2% | 5% | 1,000 |
| $100,000+ | 1% | 3% | 5,000 |
Sizing Formula
The base percentage is scaled by edge strength using a Kelly-inspired multiplier:
- `edge = ai_probability - market_price` (signed, for the traded side)
- `scaler = 1.0 + (kelly_multiplier x edge)`, clamped between 0.1x and 3.0x
- `investment = available_cash x base_pct x scaler`
- Cap at `max_pct`, `max_contracts`, and `max_position_pct` (default 30% of portfolio)
- If the Risk Manager recommended a specific `recommended_size_pct`, cap at that value (Kelly cap)
- Enforce exchange minimum order size and minimum position value (default $1.00)
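The kelly_multiplier defaults to 0.25 (quarter-Kelly). This is deliberately conservative: full Kelly sizing is theoretically growth-optimal but assumes perfect probability estimates, which no AI system achieves. Quarter-Kelly gives up part of the theoretical growth rate (under the standard continuous approximation it retains roughly 44% of full-Kelly growth) in exchange for much lower variance and more tolerance for estimation error.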
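The sizing steps can be sketched end to end as below. Several details are assumptions: percentage caps are applied to available cash (consistent with the walkthrough, where an 8% cap on $475 is $38), contracts are rounded down, and the exchange minimum order size is omitted. `size_position` and `TIERS` are illustrative names.

```python
import math

# (balance upper bound, base_pct, max_pct, max_contracts)
TIERS = [
    (100, 0.20, 0.40, 10),
    (1_000, 0.05, 0.15, 50),
    (10_000, 0.03, 0.08, 250),
    (100_000, 0.02, 0.05, 1_000),
    (math.inf, 0.01, 0.03, 5_000),
]


def size_position(balance, available_cash, ai_prob, price,
                  kelly_multiplier=0.25, max_position_pct=0.30,
                  recommended_size_pct=None, min_position_value=1.00):
    """Return (contracts, cost) for the traded side, or (0, 0.0) if too small."""
    base_pct, max_pct, max_contracts = next(
        (base, cap, contracts)
        for limit, base, cap, contracts in TIERS
        if balance < limit
    )
    edge = ai_prob - price  # signed edge for the traded side
    scaler = min(max(1.0 + kelly_multiplier * edge, 0.1), 3.0)
    investment = available_cash * base_pct * scaler

    caps = [available_cash * max_pct, available_cash * max_position_pct]
    if recommended_size_pct is not None:
        caps.append(available_cash * recommended_size_pct)  # Risk Manager cap
    investment = min(investment, *caps)

    contracts = min(math.floor(investment / price), max_contracts)
    if contracts * price < min_position_value:
        return 0, 0.0
    return contracts, round(contracts * price, 2)


# $500 account, $475 after the 5% reserve, buying NO at $0.65 with P(NO) = 0.76.
contracts, cost = size_position(500, 475, 0.76, 0.65, recommended_size_pct=0.08)
```

With the unrounded scaler of 1.0275 the investment comes to about $24.40, which rounds down to 37 contracts.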
Pre-Trade Guards
- Position count limit: Maximum 5 concurrent open positions (configurable in Settings).
- Cash reserve: 5% of balance is always held back. No trade can dip into the reserve.
- Minimum position size: Orders below $1.00 are rejected — not worth the execution overhead.
- Exchange minimum: Each market has a CLOB minimum order size (from the API). Orders below this are rounded up or rejected.
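The first three guards can be sketched as a single check that returns a rejection reason. `pre_trade_check` is a hypothetical helper. The exchange-minimum guard is omitted because the rule for rounding up versus rejecting is not specified here.

```python
def pre_trade_check(balance, open_positions, order_value,
                    max_positions=5, cash_reserve_pct=0.05,
                    min_position_value=1.00):
    """Return None if the order may proceed, else a rejection reason."""
    if open_positions >= max_positions:
        return "position limit reached"
    if order_value < min_position_value:
        return "below minimum position size"
    spendable = balance * (1 - cash_reserve_pct)  # reserve is never touched
    if order_value > spendable:
        return "would dip into cash reserve"
    return None
```

On a $500 balance the spendable amount is $475, so a $24.05 order passes while a $480 order is rejected.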
Example Walkthrough
A concrete example of the full pipeline in action:
Ingest
Market: "Will BTC exceed $120K by April 15?" — YES price $0.35, NO price $0.65, volume $180K USDC, 5 days to expiry. Passes all filters.
Research
Perplexity gathers: BTC at $108K, ETF inflows accelerating, halving supply shock still unfolding, macro uncertainty from Fed rate decision next week. No confirmed breakout above $115K yet.
Forecaster
Grok estimates P(YES) = 0.22 (22%), confidence 0.75. Base rate for 11%+ BTC moves in 5 days is ~8%. Current momentum and ETF flows push it higher, but $120K is a major psychological resistance.
Bull Researcher
Claude Opus argues YES: ETF inflows at record pace, halving supply constraint, historical precedent of rapid moves near round numbers. Probability floor: 0.15. Catalyst: Fed decision could trigger risk-on rally.
Bear Researcher
Claude Sonnet counters: $120K has never been tested, 11% move in 5 days is 92nd percentile, Fed uncertainty cuts both ways, ETF flows can reverse quickly. Probability ceiling: 0.30. Calls out Bull's catalyst as speculative.
Risk Manager
Claude Opus calculates: P(YES) = 0.24, EV(BUY NO) = (0.76 x $1.00) - $0.65 = +$0.11, EV(BUY YES) = (0.24 x $1.00) - $0.35 = -$0.11. Recommends BUY NO, should_trade=true, size 8%.
Trader
Claude Sonnet confirms: BUY NO at limit $0.72. Edge is 11pp on the NO side, Risk Manager approved, Forecaster and Bear align. Position size: 8% of available capital.
Edge Filter
Forecaster confidence 0.75 (medium tier). Required edge on Polymarket: 6%. Actual edge: |0.76 - 0.65| = 11%. Passes.
Position Sizing
Account balance $500 (tier 2: 5% base, 15% max). Scaler = 1.0 + (0.25 x 0.11) = 1.03x. Investment = $475 x 0.05 x 1.03 = $24.46. At $0.65/share = 37 shares. Risk Manager cap: 8% = $38, no cap hit.
Execution
Limit order placed: BUY 37 NO shares at $0.72. Routed through intercept pipeline. If live mode, order hits the exchange CLOB.
Polymarket-Specific Behavior
Polymarket operates as a decentralized prediction market on the Polygon blockchain. While the AI analysis pipeline is identical to Kalshi, the execution layer has significant differences.
Market Data
Market data is fetched from Polymarket's data API. The bot requests active, open binary markets sorted by volume descending, filtered by:
- Order book enabled (CLOB markets only)
- Binary market type (excludes scalar/combo)
- Minimum volume threshold (default: 50 USDC)
- Expiry within the configured window (default: 7 days)
- YES price between $0.03 and $0.97 (no edge possible at extremes)
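The filter list above can be sketched as one predicate over a market record. The dictionary keys are hypothetical and do not reflect the actual Polymarket API field names. Treating the price bounds as inclusive is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def passes_filters(market, now, min_volume=50.0, max_expiry_days=7):
    """Apply the ingestion filters to one market record (hypothetical keys)."""
    latest_expiry = now + timedelta(days=max_expiry_days)
    return bool(
        market["order_book_enabled"]
        and market["is_binary"]
        and market["volume"] >= min_volume
        and now < market["expires_at"] <= latest_expiry
        and 0.03 <= market["yes_price"] <= 0.97
    )


now = datetime(2025, 4, 10, tzinfo=timezone.utc)
btc_market = {
    "order_book_enabled": True,
    "is_binary": True,
    "volume": 180_000.0,
    "expires_at": now + timedelta(days=5),
    "yes_price": 0.35,
}
```

The walkthrough market (volume $180K, five days to expiry, YES at $0.35) passes every check.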
Order Signing & CLOB
Polymarket uses a hybrid Central Limit Order Book (CLOB): orders are matched off-chain by an operator and settled on-chain, with cryptographic signing for order authentication. Orders are signed by the wallet's private key, typically a MetaMask-derived key. The bot handles order construction, signing, and submission through the exchange API.
UMA Oracle Settlement
Polymarket markets settle via the UMA Optimistic Oracle. Resolution is proposed on-chain, and there is a dispute window before finalization. This means settlement can take longer than Kalshi's centralized resolution, and in rare cases, resolutions can be disputed.
Polymarket vs Kalshi Comparison
| Aspect | Polymarket | Kalshi |
|---|---|---|
| Market API | Polymarket data API | Kalshi REST API |
| Price Format | Dollars (0.0-1.0) | Cents (1-99) |
| Order Auth | Cryptographic wallet signing | Cryptographic signature authentication |
| Settlement | UMA Optimistic Oracle (on-chain) | Centralized (Kalshi resolves) |
| Currency | USDC on Polygon | USD |
| Edge (High Conf) | 4% | 6% |
| Edge (Medium Conf) | 6% | 8% |
| Edge (Low Conf) | 10% | 12% |
| Neg Risk | Supported (multi-outcome markets) | N/A |
| Token IDs | Separate YES/NO token IDs per market | Single ticker per market |
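Because the two exchanges quote prices differently, a shared pipeline needs a conversion at the execution layer. The helpers below are a hypothetical sketch of that conversion and are not part of either exchange's SDK.

```python
def kalshi_cents_to_prob(cents):
    """Convert a Kalshi price in cents (1-99) to a 0.0-1.0 probability."""
    if not 1 <= cents <= 99:
        raise ValueError("Kalshi prices are quoted from 1 to 99 cents")
    return cents / 100


def prob_to_kalshi_cents(prob):
    """Convert a 0.0-1.0 price to whole cents, clamped to the 1-99 range."""
    return min(99, max(1, round(prob * 100)))
```

A $0.72 limit price on Polymarket corresponds to 72 cents on Kalshi, and values outside the tradable range are clamped to 1 or 99.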
Models & Costs
All AI calls route through OpenRouter, which provides a single API key for models across xAI and Anthropic (plus Perplexity for research). Temperature is set to 0.0 across all agents to keep output as repeatable as possible, although hosted models are not guaranteed to return identical text on every call. Max tokens per call: 4,000 (debate agents) or 8,000 (research). Timeout: 120 seconds per debate agent, 600 seconds for research (deep-research can take several minutes per market).
| Agent | Model | Provider | Role in Pipeline |
|---|---|---|---|
| Research | Perplexity Sonar Deep Research | Perplexity | Live web search and evidence gathering |
| Forecaster | Grok 4.1 Fast | xAI | Probability estimation with base rate anchoring |
| Bull Researcher | Claude Opus 4.6 | Anthropic | Evidence-based YES advocacy |
| Bear Researcher | Claude Sonnet 4.6 | Anthropic | Adversarial counter-arguments |
| Risk Manager | Claude Opus 4.6 | Anthropic | EV calculation and position sizing |
| Trader | Claude Sonnet 4.6 | Anthropic | Final BUY/SKIP decision gate |
A full pipeline run (research + 5 agents) typically costs $0.10-$0.30 depending on market complexity and response length. The daily AI budget (default: $300.00) caps total spending across all markets analyzed in a 24-hour window.
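A budget cap over a 24-hour window might be tracked as in the sketch below. It assumes a rolling window rather than a calendar-day reset, and that an estimated cost is checked before each run. `DailyBudget` is an illustrative class, not the product's implementation.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class DailyBudget:
    """Rolling-window spend tracker (assumed behavior, for illustration)."""

    def __init__(self, limit_usd=300.00, window=timedelta(hours=24)):
        self.limit_usd = limit_usd
        self.window = window
        self._spend = deque()  # (timestamp, cost) pairs, oldest first

    def spent(self, now):
        # Drop entries that have aged out of the window.
        while self._spend and now - self._spend[0][0] >= self.window:
            self._spend.popleft()
        return sum(cost for _, cost in self._spend)

    def try_spend(self, cost_usd, now):
        """Record the cost and return True only if it fits under the cap."""
        if self.spent(now) + cost_usd > self.limit_usd:
            return False
        self._spend.append((now, cost_usd))
        return True
```

At the stated $0.10 to $0.30 per run, the default $300.00 cap allows roughly 1,000 to 3,000 full pipeline runs per window.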
All Configurable Settings
Every setting can be configured per bot from the dashboard. Changes take effect on the next cycle.
Market Filtering
| Setting | Default | Description |
|---|---|---|
| Min Volume | 50 | Minimum market volume (USDC/contracts) to consider |
| Max Expiry Days | 7 | Skip markets expiring beyond this window |
| Allowed Categories | All | Comma-separated list of categories to trade (empty = all) |
| Max Markets per Cycle | 10 | Top N markets by volume to analyze each cycle |
Position Sizing & Risk
| Setting | Default | Description |
|---|---|---|
| Max Positions | 5 | Maximum concurrent open positions |
| Kelly Multiplier | 0.25 | Fraction of Kelly criterion (quarter-Kelly) |
| Max Position % | 30 | Maximum single position as % of portfolio |
| Min Position Size | $1.00 | Orders below this value are rejected |
| Cash Reserve | 5% | Percentage of balance always held back |
AI & Budget
| Setting | Default | Description |
|---|---|---|
| Daily AI Budget | $300.00 | Maximum daily spend on AI API calls |
| Reanalyze Cooldown | 6 hours | Minimum hours between analyzing the same market |
| AI Temperature | 0.0 | All agents use temperature 0 for near-deterministic output |
| AI Max Tokens | 4,000 / 8,000 | Debate agents / research (deep-research needs headroom) |
| AI Timeout | 120s | Per-agent timeout (600s for research — deep-research can run minutes) |
Edge Thresholds (Polymarket)
| Setting | Value | Description |
|---|---|---|
| edge_high_confidence | 4% | Required edge when forecaster confidence >= 80% |
| edge_medium_confidence | 6% | Required edge when forecaster confidence >= 60% |
| edge_low_confidence | 10% | Required edge when forecaster confidence < 60% |
| min_confidence | 50% | Ensemble confidence floor — below this, always skip |
Edge Thresholds (Kalshi)
| Setting | Value | Description |
|---|---|---|
| edge_high_confidence | 6% | Required edge when forecaster confidence >= 80% |
| edge_medium_confidence | 8% | Required edge when forecaster confidence >= 60% |
| edge_low_confidence | 12% | Required edge when forecaster confidence < 60% |
| min_confidence | 50% | Ensemble confidence floor — below this, always skip |
Risk Disclaimer
Council V2 is an experimental AI trading system. Past performance does not guarantee future results. AI models can hallucinate, fabricate data, or produce overconfident estimates despite the safeguards described above. Prediction markets carry inherent risk of total loss on any individual position. Never deploy capital you cannot afford to lose. Always start in Training mode to evaluate performance before switching to live trading. See Safeguards for the full safety architecture.