CQ-GESUS Technical Overview v1.1

Technical Overview · Version 1.1 · June 2026

CQ-GESUS: A Parser-First Intelligence & Forecasting System for Counter-Strike 2

CounterQuant Generalised Esports Sequence & Uncertainty System

Author: CounterQuant
Published: June 2026
Status: Live (shadow-betting, intelligence pages active)
Version: 1.1 (June 2026 refresh; live track record growing)

Disclaimer: This document is for informational and research purposes only. Nothing herein is betting advice, a tip service, or a solicitation to wager. All bets shown on this platform are paper (virtual) bets. Past simulated performance does not guarantee future results. Esports betting carries real financial risk; bet only what you can afford to lose.

Contents

Executive Summary
Why CS2 Prediction Is Hard
System Architecture
Data Foundation & Parser-First Governance
Rating Methodology — CQE, CQR & Team Elo
Forecasting Methodology — the GESUS Family
Validation & Honest Results — match outcome, map-level, in-round
Shadow-Betting Environment & Audit Trail
Risk Management & Failure Modes
Roadmap & Conclusion

Executive Summary

One parsed-demo dataset, held to a research standard

CounterQuant is a Counter-Strike 2 intelligence platform built on a single foundation: every statistic we publish is derived from demo files we parse ourselves, never from scraped third-party numbers. That one dataset feeds three layers — proprietary player and team ratings (CQE, CQR, Team Elo), a family of match-outcome forecasting models (the GESUS suite), and a fully public paper shadow-betting ledger that records exactly how those forecasts would have performed against real markets.

The forecasting layer is a regime-aware ensemble. For each upcoming match a single model is selected: NEXUS when a liquid betting market exists for that match (it blends our model with the live market price), otherwise ORACLE, our market-independent base model. Every prediction is written once, frozen, timestamped, and attributed to the exact model and version that made it — so the number you see never silently changes.

Core design principles: Parser-first data over scraped data. Point-in-time, leakage-free features over hindsight. Honest per-tier metrics over a single flattering headline number. One immutable prediction per match over moving goalposts. No paid product until a live track record earns it.

Metric	Value
Demo kill events	16,633,076
Matches with parsed demos	48,468
Players tracked	1,380
Teams tracked	11,083
Engineered features	509+
Demo parser generation	v4
GESUS model family	8 models
Tiers covered	3 (T1–T3)

Section 1

Why CS2 Prediction Is Hard

1.1 The variance problem

Counter-Strike is a high-variance game decided in short series. A best-of-one is close to a coin-flip between evenly matched sides; even a best-of-three can swing on a handful of clutch rounds, an eco that hits, or one player's off-day. Published esports-prediction work tends to report accuracy on cherry-picked top-tier datasets and quietly omits the long tail of unpredictable lower-tier matches. We refuse to do that — we predict all three tiers and report each separately.

1.2 Why scraped stats are not enough

Most CS2 analytics sites display numbers scraped from match pages. Those numbers are inconsistent across sources, miss per-round context, and cannot be audited. CounterQuant instead downloads and parses the actual demo files, reconstructing every kill, every round of damage, every economy state and every side-switch from the ground truth of the game server. This is the hard, slow path — and it is the moat. The metrics that drive our ratings and models simply cannot be reproduced by anyone who has not built the parsing pipeline.

1.3 Roster churn and a shifting meta

CS2 lineups change constantly, maps rotate in and out of the active pool, and balance patches alter how the game is played. A model that treats a team as a fixed entity will misfire the moment a star player is benched. Our features are built point-in-time: every match is scored using only what was known before it started, with recency-weighted form and roster-aware aggregation, so the system tracks teams as they actually are on the day — not as they were a season ago.
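As an illustration of recency-weighted, point-in-time form, here is a minimal sketch. The exponential half-life weighting, the 60-day default, and the function name are assumptions made for this example; the actual weighting scheme is not published.

```python
from datetime import date

def recency_weighted_win_rate(results, as_of, half_life_days=60.0):
    """results: list of (match_date, won) tuples.

    Only matches strictly before `as_of` count, and each one is
    down-weighted by half for every `half_life_days` of age.
    """
    num = 0.0
    den = 0.0
    for match_date, won in results:
        if match_date >= as_of:
            continue  # point-in-time: ignore anything not yet known
        age_days = (as_of - match_date).days
        weight = 0.5 ** (age_days / half_life_days)
        num += weight * (1.0 if won else 0.0)
        den += weight
    return num / den if den > 0 else None

history = [
    (date(2026, 1, 10), False),
    (date(2026, 3, 1), True),
    (date(2026, 5, 20), True),
    (date(2026, 6, 15), True),  # played after the match being scored
]
form = recency_weighted_win_rate(history, as_of=date(2026, 6, 1))
```

The old loss counts for less than the two recent wins, so the weighted form sits above the plain 2-of-3 win rate, and the June 15 result has no influence at all.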

Section 2

System Architecture

2.1 End-to-end pipeline

CQ-GESUS: Demo-to-Forecast Pipeline

┌──────────────────────────────────────────────────────────────────┐
│ SOURCE: raw .dem demo files (primary data source)                │
│ + match metadata only: scheduling / indexing (HLTV)              │
└───────────────────────────────┬──────────────────────────────────┘
                                ↓ download
┌───────────────────────────────┴──────────────────────────────────┐
│ PARSE FLEET: distributed CS2 workers · demoparser2 · v4          │
│ kills · damage · player-rounds · rounds · economy · sides        │
└───────────────────────────────┬──────────────────────────────────┘
                                ↓ validated only
┌───────────────────────────────┴──────────────────────────────────┐
│ GOLD TABLES: the single canonical source for all analytics       │
│ demo_kill_events · demo_damage_events · demo_player_rounds ·     │
│ demo_rounds (scraped data is isolated, never mixed in)           │
└──────────────┬───────────────────────────────┬───────────────────┘
               ↓                               ↓
┌──────────────┴───────────┐ ┌─────────────────┴───────────────────┐
│ RATING ENGINES           │ │ FEATURE ENGINE: 509+ signals        │
│ CQE · CQR · Team Elo     │ │ point-in-time · leakage-free        │
└──────────────────────────┘ └─────────────────┬───────────────────┘
                                               ↓
┌──────────────────────────────────────────────┴───────────────────┐
│ GESUS MODELS: gradient-boosted ensembles + calibration           │
│ ORACLE · PHANTOM · MIRAGE · NEXUS · APEX                         │
│ per-match selector → one model makes the call                    │
└───────────────────────────────┬──────────────────────────────────┘
                                ↓ once per match, frozen
┌───────────────────────────────┴──────────────────────────────────┐
│ LEDGER: immutable prediction + one Kelly-sized shadow bet        │
│ settled against the real result · published live on the site     │
└──────────────────────────────────────────────────────────────────┘

2.2 Parse fleet

CS2 demos are parsed by a cloud worker fleet (10+ concurrent instances) running demoparser2 under a pinned parser version (v4). Only demos that pass validation are written to the gold tables; partial or corrupt parses are rejected rather than silently half-ingested. Versioning the parser means we always know exactly which code produced any given row. Historical CS:GO demos are processed on a separate dedicated server with its own parser version to prevent cross-game contamination of the CS2 gold tables.

2.3 Rating engines vs forecasting models

Two consumers read the gold tables. The rating engines (Section 4) produce the player and team ratings shown across the site. The feature engine turns the same gold rows into the 509+ point-in-time signals that the GESUS models (Section 5) consume to forecast match outcomes. Both read the identical canonical source, so a player's rating and the model's view of them never disagree about the underlying facts.

Section 3

Data Foundation & Parser-First Governance

3.1 Coverage

All analytics derive from demos we parse and the match metadata we scrape for scheduling only. Live coverage (estimated row counts, refreshed every 10 minutes):

Gold dataset	What it records	Approx. rows
demo_kill_events	Every kill: attacker, victim, weapon, round, side	16,633,076
demo_damage_events	Per-round damage exchanges	63,453,184
demo_player_rounds	Per-player, per-round state (the basis of every player metric)	24,589,028
demo_rounds	Per-round outcome, economy, and side	2,313,333
matches	Match schedule, teams, tier, result (2006-05-18 → 2026-07-01)	119,957

Matches with at least one fully-parsed demo driving their stats: 48,468 · teams tracked: 11,083 · players tracked: 1,380.

3.2 Parser-first governance

A strict data-isolation rule governs everything downstream: only validated, v4-parsed demo data may drive any player or match metric, rating, or model. Scraped third-party numbers (such as external rating figures) are physically separated — stored under an external_ namespace — and are never allowed to feed a rating, an achievement, or a forecast. This guarantees that no model can accidentally learn from, or be contaminated by, numbers we did not compute from ground truth.

Why this matters: it makes the entire system internally consistent and auditable. Every published number traces back to a demo file and a known parser version. There is no opaque blend of "our data and theirs".
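One way such an isolation rule could be enforced in code is a guard that rejects any feature whose source table sits in the external_ namespace. This is a hedged sketch: the function name, the feature names, and the external_ratings table are hypothetical; only the external_ prefix and demo_kill_events come from this document.

```python
EXTERNAL_PREFIX = "external_"

def assert_parser_first(feature_sources):
    """feature_sources maps feature name -> source table name.

    Raises ValueError if any feature reads from the external_ namespace.
    """
    offenders = sorted(
        name for name, table in feature_sources.items()
        if table.startswith(EXTERNAL_PREFIX)
    )
    if offenders:
        raise ValueError(f"features reading scraped data: {offenders}")
    return True

# Passes: the feature is derived from a gold table.
assert_parser_first({"kills_per_round": "demo_kill_events"})
```

Running a check like this at feature-registration time, rather than at training time, would surface a violation before any model can consume the offending column.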

3.3 Feature engineering philosophy

The feature engine is the most heavily protected part of the system. We publish the categories and approximate counts, not the specific transformations. The 509+ point-in-time features span six broad families:

Feature family	Approx. count	What it covers
Strength & Rating	~110	Team Elo & per-map Elo (point-in-time), roster CQE/CQR aggregates, strength-of-schedule
Recent Form	~95	Recency-weighted win/round form across multiple windows, momentum, streak structure
Economy	~85	Per-team economy attribution from gold rounds: eco/force/full-buy conversion, save discipline
Map & Pool	~90	Per-map performance, map-pool overlap, T/CT side balance, veto-relevant signals
Head-to-Head	~70	Direct and common-opponent history, recency-decayed, tier-adjusted
Context	~60	Tier, LAN vs online, series format, event stage, schedule density

Engineering principle: every feature is computed strictly from information available before the match starts, using a chronological state walk. The state is updated only after a match emits its feature row, which structurally prevents look-ahead leakage.
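The emit-then-update ordering can be shown with a toy walk. This is a sketch under stated assumptions: the two counters stand in for the real (unpublished) feature state, and the dictionary keys are illustrative. Matches sharing the same timestamp would need an explicit tie-break that this example does not attempt.

```python
from collections import defaultdict

def walk(matches):
    """matches: iterable of dicts with 'date', 'team_a', 'team_b', 'winner'."""
    played = defaultdict(int)
    wins = defaultdict(int)
    rows = []
    for m in sorted(matches, key=lambda m: m["date"]):
        a, b = m["team_a"], m["team_b"]
        # 1) Emit the feature row from state as it stood BEFORE this match.
        rows.append({
            "date": m["date"],
            "a_prior_matches": played[a],
            "a_prior_wins": wins[a],
            "b_prior_matches": played[b],
            "b_prior_wins": wins[b],
        })
        # 2) Only now fold this match's result into the state.
        played[a] += 1
        played[b] += 1
        wins[m["winner"]] += 1
    return rows

matches = [
    {"date": "2026-05-02", "team_a": "B", "team_b": "A", "winner": "A"},
    {"date": "2026-05-01", "team_a": "A", "team_b": "B", "winner": "A"},
]
rows = walk(matches)
```

Because the row is appended before the counters change, a match can never see its own result, regardless of the order in which the input arrives.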

3.4 Data-quality controls

Real demo data is never perfectly clean. Controls include: a delete-before-insert guarantee on kill events (the gold tables carry no silent duplicates); per-round side resolution so a player's team is read from the side they actually played each round, correctly handling halftime and overtime swaps; point-in-time Elo snapshots taken before each match; and tolerance for known parser edge-cases (e.g. warmup-round artefacts) rather than letting them corrupt aggregates. Settlement of bets resolves the winner by team identity and name, which absorbs the occasional duplicate team-ID emitted upstream.
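A delete-before-insert guarantee can be sketched with SQLite as a stand-in database. The column names and the demo_id key are assumptions for illustration; only the demo_kill_events table name comes from this document.

```python
import sqlite3

def ingest_kills(conn, demo_id, kills):
    """Replace all kill rows for one demo in a single transaction,
    so re-parsing the same demo never leaves duplicates behind."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM demo_kill_events WHERE demo_id = ?", (demo_id,))
        conn.executemany(
            "INSERT INTO demo_kill_events (demo_id, round_no, attacker, victim, weapon) "
            "VALUES (?, ?, ?, ?, ?)",
            [(demo_id, *k) for k in kills],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_kill_events "
    "(demo_id TEXT, round_no INTEGER, attacker TEXT, victim TEXT, weapon TEXT)"
)
kills = [(1, "p1", "p2", "ak47"), (1, "p3", "p1", "awp")]
ingest_kills(conn, "demo-001", kills)
ingest_kills(conn, "demo-001", kills)  # re-parse of the same demo
count = conn.execute("SELECT COUNT(*) FROM demo_kill_events").fetchone()[0]
```

Keeping the delete and the insert in one transaction matters: a crash between the two would otherwise leave the demo with no rows at all.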

Section 4

Rating Methodology — CQE, CQR & Team Elo

CounterQuant publishes three proprietary ratings. We describe their design principles here; the exact weightings and constants are vault-tier (see the tiered-disclosure model).

4.1 CQE — player impact rating

CQE is an opponent-adjusted player rating computed only from validated demo data. It combines round-survival, multi-kill and trade impact, per-round damage, and consistency into a single number — then applies Bayesian shrinkage so that a player with a small sample is pulled toward a sensible prior rather than rocketing to the top off a handful of pug rounds. The headline failure we engineered against: a low-sample player should never outrank a proven elite simply for having one hot series.
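The effect of shrinkage on small samples can be illustrated with the simplest form of the idea, a sample-size-weighted average of the raw rating and a prior. The prior mean of 1.0, the prior strength of 400 rounds, and the ratings below are invented for this sketch; the real CQE constants are not published.

```python
def shrink(raw_rating, n_rounds, prior_mean=1.0, prior_strength=400):
    """Pull a raw rating toward the prior; the pull weakens as rounds accumulate."""
    weight = n_rounds / (n_rounds + prior_strength)
    return weight * raw_rating + (1 - weight) * prior_mean

hot_newcomer = shrink(raw_rating=1.60, n_rounds=40)    # one hot series
proven_elite = shrink(raw_rating=1.30, n_rounds=6000)  # long track record
```

With these illustrative numbers the newcomer's 1.60 shrinks to roughly 1.05 while the elite player's 1.30 barely moves, so the established player stays ahead.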

4.2 CQR — eligibility-gated leaderboard rating

CQR turns CQE into a ranked, percentile-tiered leaderboard, but only for players who clear an eligibility gate (a verified competitive identity plus minimum match and round volume). The gate is the clean discriminator that keeps mangled or one-off accounts off the board, so the leaderboard reflects real established players.

4.3 Team Elo + lobby context

Team Elo is a results-driven, opponent-adjusted rating with a tier-weighted update — beating a top team at a LAN moves your rating more than beating a weaker side online. It is reconstructed point-in-time from match history, and a lobby-strength context term scales a team's effective rating by the quality of the field it is competing in. Team Elo is also the backbone strength feature consumed by the forecasting models.
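A tier-weighted Elo update can be sketched as a standard Elo step whose K-factor is scaled by tier and venue. The multipliers and base K below are hypothetical, and the lobby-strength context term is not modelled here.

```python
TIER_WEIGHT = {1: 1.5, 2: 1.0, 3: 0.7}  # hypothetical multipliers
LAN_BONUS = 1.2                          # hypothetical multiplier

def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, a_won, tier, is_lan, base_k=24.0):
    """Return the new (rating_a, rating_b) after one match."""
    k = base_k * TIER_WEIGHT[tier] * (LAN_BONUS if is_lan else 1.0)
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

lan_t1 = update_elo(1500.0, 1500.0, a_won=True, tier=1, is_lan=True)
online_t3 = update_elo(1500.0, 1500.0, a_won=True, tier=3, is_lan=False)
```

The opponent adjustment comes from the expectation term (an upset moves ratings more than an expected win), while the tier and LAN multipliers scale how much any single result is allowed to count.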

Section 5

Forecasting Methodology — the GESUS Family

5.1 Base learner family

All GESUS forecasting models are built on gradient-boosted tree ensembles with a post-hoc probability calibration stage. This family was chosen over deep learning for reasons specific to the problem: it handles the heavy feature correlation inherent in our 509+ signals natively; it is sample-efficient given a finite history of professional matches; and its feature contributions are interpretable, which lets us publish the top-3 drivers behind every prediction.

5.2 The model family

The family splits into three groups by task:

Model	Task	Role
MATCH OUTCOME — pre-match win probability
ORACLE	Full feature set incl. clean economy + Team Elo	Market-independent base — the everyday forecaster
NEXUS	ORACLE ⊕ live market price (log-odds blend)	Selected when a liquid betting market exists
APEX	Earlier-generation baseline (290 features)	Deployed baseline; retained for provenance comparison
MAP SPECIALISTS — per-map outcome prediction
MIRAGE	Per-map win probability (7 independent map models)	Feeds veto predictor + map-level intelligence panels
IN-ROUND SPECIALISTS — within-match signal extraction
CLUTCH	Clutch-situation outcome (kill sequence replay)	Player clutch rating signal; feeds achievement engine
SPECTER	Round kill/death sequence classification	Per-round performance signal; economy context
CIPHER	Player performance regression (CQE prediction)	Roster strength signal for match-outcome models

5.3 Per-match model selection

For each upcoming match, exactly one model makes the call — we never stack several models onto the same match. If a liquid betting market exists for the match, NEXUS is selected: it blends ORACLE's view with the live market price in log-odds space, weighted by market liquidity, because a sharp market is itself a strong predictor. Otherwise ORACLE makes the market-independent call. The chosen model name and version are stored on the immutable prediction, so you always know who made it.
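The selection rule and the log-odds blend can be sketched as follows. The liquidity threshold, the weight curve, and its 0.8 cap are assumptions made for this example, and the market probability is assumed to have had the bookmaker margin removed already.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nexus_blend(p_model, p_market, market_weight):
    """Weighted average of model and market in log-odds space."""
    z = (1.0 - market_weight) * logit(p_model) + market_weight * logit(p_market)
    return sigmoid(z)

def select_and_predict(p_oracle, p_market=None, liquidity=0.0, min_liquidity=1000.0):
    """Return (model_name, probability); exactly one model makes the call."""
    if p_market is None or liquidity < min_liquidity:
        return "ORACLE", p_oracle
    # Hypothetical curve: more liquidity means more trust in the market, capped at 0.8.
    weight = min(0.8, liquidity / (liquidity + 5000.0))
    return "NEXUS", nexus_blend(p_oracle, p_market, weight)

no_market = select_and_predict(0.60)
with_market = select_and_predict(0.60, p_market=0.70, liquidity=5000.0)
```

Blending in log-odds space rather than averaging probabilities keeps the result well-behaved near 0 and 1, where a small probability gap represents a large difference in implied odds.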

5.4 Leakage-free validation discipline

Models are trained on a strict chronological train/validation/test split — no shuffling, no future information. Features are generated by the same point-in-time walk used at serving time, giving zero train/serve skew: the feature vector a match receives in production is identical to the one it would have received in training. Calibration is fit on validation data only; the test period is held out until final evaluation.
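A chronological split is simple to express. In this sketch the boundary dates are invented and the rows carry only an ISO date string; the real split boundaries are not published.

```python
def chronological_split(rows, train_end, valid_end):
    """Split rows by ISO 'date' string; both boundaries are exclusive upper bounds.

    Nothing is shuffled: every validation row is later than every training
    row, and every test row is later than every validation row.
    """
    ordered = sorted(rows, key=lambda r: r["date"])
    train = [r for r in ordered if r["date"] < train_end]
    valid = [r for r in ordered if train_end <= r["date"] < valid_end]
    test = [r for r in ordered if r["date"] >= valid_end]
    return train, valid, test

rows = [{"date": d} for d in
        ["2026-03-01", "2025-11-15", "2026-01-10", "2026-05-20", "2025-12-31"]]
train, valid, test = chronological_split(rows, train_end="2026-01-01", valid_end="2026-04-01")
```

Calibration would then be fit on `valid` only, with `test` left untouched until the final evaluation.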

5.5 Overfitting controls

With 509+ features and a finite match history, controlling overfitting is the central concern. The ensembles use conservative tree depth, minimum-leaf population floors, row and column subsampling, and early stopping on a held-out validation set — training halts when validation loss stops improving, and the selected iteration count sits well below the budget, confirming the regularisation is binding rather than cosmetic.
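The early-stopping rule can be illustrated independently of any boosting library. The loss values and the patience of three rounds are invented for this sketch; a real training loop would compute each validation loss after adding a tree.

```python
def early_stop(valid_losses, patience=3):
    """Return (best_iteration, best_loss) for a sequence of validation losses.

    Stops scanning once `patience` consecutive iterations fail to improve.
    """
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_iter, best_loss = i, loss
        elif i - best_iter >= patience:
            break
    return best_iter, best_loss

# Budget of 9 iterations; validation loss bottoms out at iteration 5.
losses = [0.693, 0.671, 0.660, 0.655, 0.654, 0.656, 0.657, 0.659, 0.700]
best_iter, best_loss = early_stop(losses, patience=3)
```

Here the scan halts at iteration 8 and selects iteration 5, below the budget of 9, which is the pattern described above as the regularisation being binding.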

Section 6

Validation & Honest Results

The number that matters is per-tier AUC, not a single headline. A blended all-tiers figure is dragged down by Tier-3, which makes up the bulk of matches and is inherently the noisiest. We report each tier separately rather than hiding behind an average.

6.1 Match-outcome model (APEX baseline / ORACLE)

Segment	Held-out test AUC	Reading
Tier 1	0.70–0.72	The genuine, usable signal — top-tier matches are the most predictable
Tier 2	0.64–0.67	Moderate signal; smaller team-history samples
Tier 3	0.63–0.66	Noisy by nature; bet sizing is capped hardest here

Probabilities are calibrated with isotonic regression fit on the validation fold. The deployed baseline (APEX) achieved a calibrated test AUC of 0.702 across all tiers. ORACLE extends that baseline with full economy and Team Elo history signals.
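Isotonic calibration fits a non-decreasing step function from raw scores to observed win rates. The following is a minimal pool-adjacent-violators sketch with toy data, not the production calibrator; it assumes distinct scores and uses a simple step lookup rather than interpolation.

```python
def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators. Returns (upper_score, calibrated_prob) steps
    in increasing score order, with non-decreasing probabilities."""
    blocks = []  # each block: [sum_of_outcomes, count, highest_score_in_block]
    for s, y in sorted(zip(scores, outcomes)):
        blocks.append([float(y), 1, s])
        # Merge backwards while an earlier block has a higher mean than the latest.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]

def calibrate(steps, score):
    """Map a raw score to the probability of the first step that covers it."""
    for top, prob in steps:
        if score <= top:
            return prob
    return steps[-1][1]

# Toy validation fold: raw model scores and match outcomes (1 = predicted side won).
val_scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
val_outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
steps = fit_isotonic(val_scores, val_outcomes)
```

Because the mapping is monotone, it can change the stated probabilities without reordering predictions, which is why calibration and ranking quality are assessed separately.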

6.2 Map-level model (MIRAGE family)

The MIRAGE family runs 7 independent map-specialist models. Because each map is a different tactical game, a single model trained across all maps loses resolution. Each specialist uses the same feature categories but with map-specific CT/T side win rates, economy conversion, and player lineup signals.

Map	Best version AUC	Note
de_nuke	0.641	CT-side dominance and side balance are strong predictors
de_inferno	0.632	Economy conversion rate dominant
de_dust2	0.635	High data volume; stable estimates
de_ancient	0.611	Newer map — growing training set
de_overpass	0.613	Limited data (≈1,400 train rows); lineup signals help
de_mirage	0.608	Economy and CT/T balance dominant
de_anubis	0.591	Newest map — smallest training set

6.3 In-round specialist models

These models operate at round level, not match level. Their outputs feed player rating and achievement calculations — they are not used directly for pre-match win probability.

Model	Task	Metric	Value
CLUTCH	Clutch-situation win prediction (kill sequence replay)	AUC	0.877
SPECTER	Round kill/death sequence outcome	AUC	0.904
CIPHER	Player performance regression	R²	0.307

CLUTCH and SPECTER handle different, narrower prediction tasks than full match outcome — their higher AUCs reflect that. Direct comparison to match-outcome AUC would be misleading.

6.4 Calibration and honest limits

Why ~0.70 is the honest ceiling for match outcome. Pre-match CS2 outcome prediction has a real upper bound around 0.70–0.72 AUC at the top tier — the residual is genuine in-game variance that no pre-match feature can resolve. Claims meaningfully above this on realistic, leakage-free, all-tier data should be treated with suspicion. We would rather publish a credible 0.70 than an inflated number that quietly leaks the future.

Calibration is not the same as AUC. AUC measures ranking; calibration measures whether a stated 70% actually wins ~70% of the time. We calibrate explicitly and judge betting value on calibrated probabilities, not raw scores.
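The distinction can be made concrete with toy data: two sets of probabilities that rank matches identically, and so share an AUC, can still differ sharply in calibration. The numbers and the fourth-root distortion below are invented for illustration.

```python
def auc(probs, outcomes):
    """Chance that a random winner is scored above a random loser (ties count half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_table(probs, outcomes, n_buckets=5):
    """Per non-empty bucket: (mean predicted probability, observed win rate, count)."""
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in buckets if b
    ]

outcomes = [0, 0, 1, 0, 1, 1]
raw = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
overconfident = [p ** 0.25 for p in raw]  # monotone transform: same ranking

same_ranking = auc(raw, outcomes) == auc(overconfident, outcomes)
table = calibration_table(overconfident, outcomes)
```

Both lists score the same AUC, yet the distorted probabilities average well above the observed 50% win rate, which would matter directly for stake sizing.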

The market is hard to beat. On matches with a liquid market, the market price is typically sharper than our standalone model — which is precisely why NEXUS folds the market in rather than ignoring it. We are honest that our independent edge is clearest where markets are thin or absent.

Live track record (auto-updated from the ledger): 67 of 125 settled predictions correct (53.6%). Shadow-bet bankroll 926.76u from a 1000u start · ROI -8.1% · bet hit-rate 53.6% over 125 settled bets.

Section 7

Shadow-Betting Environment & Audit Trail

The shadow-betting layer turns forecasts into an honest, public scorecard. It places paper bets only — no real money — starting from a 1000-unit virtual bankroll.

7.1 The invariants

Each upcoming match gets exactly one immutable prediction, written the first time the match is seen and never altered afterwards (probabilities, model and timestamp are frozen). Each prediction gets exactly one shadow bet, and that bet is always on the predicted winner — enforced in code, not by convention. We bet all three tiers.
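Invariants like these can be pushed down into the database so that application code cannot violate them. The sketch below uses SQLite as a stand-in; the table layout, column names, and version string are assumptions, not the production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    " match_id TEXT PRIMARY KEY,"  # at most one prediction per match
    " model TEXT NOT NULL, version TEXT NOT NULL,"
    " predicted_winner TEXT NOT NULL, probability REAL NOT NULL,"
    " created_at TEXT NOT NULL)"
)
# Frozen after creation: the database itself rejects any UPDATE.
conn.execute(
    "CREATE TRIGGER predictions_frozen BEFORE UPDATE ON predictions "
    "BEGIN SELECT RAISE(ABORT, 'predictions are immutable'); END"
)

def record_prediction(conn, match_id, model, version, winner, prob, created_at):
    """Write the prediction the first time a match is seen; later calls change
    nothing. Always returns the row that is actually stored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO predictions VALUES (?, ?, ?, ?, ?, ?)",
            (match_id, model, version, winner, prob, created_at),
        )
    return conn.execute(
        "SELECT * FROM predictions WHERE match_id = ?", (match_id,)
    ).fetchone()

first = record_prediction(conn, "m1", "ORACLE", "1.0", "TeamA", 0.62, "2026-06-01T10:00:00Z")
second = record_prediction(conn, "m1", "NEXUS", "1.0", "TeamB", 0.55, "2026-06-01T12:00:00Z")
```

The second call returns the original ORACLE row unchanged. A one-bet-per-prediction rule could be enforced the same way, with a bets table keyed on match_id.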

7.2 Sizing and pricing

Stakes are fractional-Kelly (a conservative quarter-Kelly) against the best available price: the live market when it is liquid, otherwise an Elo-implied baseline. Per-tier caps allow the largest stakes on Tier-1 conviction and the smallest on noisy Tier-3. Every bet records its model probability, market probability, edge, price source, stake and the bankroll it was placed against.
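Quarter-Kelly sizing with per-tier caps can be sketched as below. The cap values are hypothetical, decimal odds are assumed as the price format, and returning a zero stake when there is no edge is a choice made for this example; how the live system handles that case is not stated here.

```python
TIER_CAP = {1: 0.03, 2: 0.02, 3: 0.01}  # hypothetical max stake as a bankroll fraction

def shadow_stake(p_win, decimal_odds, bankroll, tier, kelly_fraction=0.25):
    """Fractional-Kelly stake in units, capped per tier."""
    b = decimal_odds - 1.0  # net payout per unit staked
    full_kelly = (p_win * b - (1.0 - p_win)) / b
    if full_kelly <= 0:
        return 0.0  # no edge at this price (assumption: stake nothing)
    fraction = min(kelly_fraction * full_kelly, TIER_CAP[tier])
    return round(bankroll * fraction, 2)

# Model says 55% at even money (decimal 2.0): full Kelly is 10% of bankroll.
tier1_stake = shadow_stake(0.55, 2.0, bankroll=1000.0, tier=1)  # quarter-Kelly, under the cap
tier3_stake = shadow_stake(0.55, 2.0, bankroll=1000.0, tier=3)  # same edge, cap binds
```

The same edge produces a smaller stake on Tier-3 purely because of the cap, which is the mechanism the per-tier limits rely on.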

7.3 Why this is verifiable

Because predictions are immutable and one-per-match, the numbers on every page — home, predictions, bets, match detail — are read from the same ledger rows rather than recomputed. There is no surface where a different probability or a different winner can appear. Settlement compares the bet's team to the match's real result, so a win is a win everywhere at once.

Section 8

Risk Management & Failure Modes

We document how this system can fail, in detail and in public, on a dedicated page. A summary of the matrix:

Failure mode	Severity	Primary mitigation
Roster changes	High	Point-in-time, roster-aware, recency-weighted form
Meta / patch shifts	High	Rolling windows; per-map signals; periodic retrain
Tier-3 noise	Medium	Per-tier honesty; hardest Kelly caps on T3
Market beats model	High if misapplied	NEXUS blends the market in; honest edge accounting
Demo parse gaps	Medium	Validation gate; parser-first isolation; delete-before-insert
Overfitting	Medium	Chronological split; calibration; early stopping
Format variance (bo1)	Medium	Format feature; confidence floor before betting
Small live sample	Documented	Paper-only until a significant track record exists

Full narrative descriptions are at counterquant.com/transparency/why-it-fails/.

Section 9

Roadmap & Conclusion

What is live now

Shipped and live on counterquant.com (June 2026):

Immutable prediction ledger — one APEX/ORACLE prediction per match, frozen at creation, public audit trail with shadow-bet bankroll
Team intelligence pages — TITAN score (Team CQ), ORACLE prediction history, live map pool win rates (score-based, 8 active pool maps), CIPHER roster stats (avg Rating / ADR / Kills from parsed demos)
Match detail — GESUS prediction panel, per-team map pool heat-map, BO3 veto predictor (bans/picks simulated from historical win rates, only maps with ≥3 parsed appearances)
Player pages — CQE, CQR, demo-derived stats (ADR, K/D, rating), achievement system (45 unlockable CS2 achievements across 8 categories, sourced entirely from demo events)
Per-team and per-match predictions shown on home page, predictions list, and team detail — with live probability bars and settled/correct markers

Near term

Deploy ORACLE and NEXUS to replace the APEX baseline in the prediction path (same immutable ledger, better model)
Publish a calibration curve (win rate by predicted-probability bucket) as the settled-bet sample grows
Add SHAP top-3 feature contributions to each prediction card so users can see why the model made each call

The hard gate

Our commitment: no paid signals, API tier, or premium product until the GESUS system has a statistically significant live track record — measured by settled-bet count and calibrated profitability after fees, not by a flattering backtest. Until then, every prediction and every bet stays public and paper.

Conclusion

CQ-GESUS is a bet that the honest path wins: parse the real data yourself, build leakage-free features, report per-tier truth, freeze every prediction, and let a public ledger keep score. We do not claim to have beaten Counter-Strike's variance — we claim to measure ourselves against it in the open. Read the methodology, check the ledger, question the numbers. That is exactly what we want.


Citation: CounterQuant (2026). "CQ-GESUS: A Parser-First Intelligence & Forecasting System for Counter-Strike 2." Technical Overview v1.1.
Version history:
v1.0 (June 2026): initial publication.
v1.1 (June 2026): expanded metrics (live pipeline counts), full GESUS model family table (8 models), per-map MIRAGE AUCs, in-round specialist results, "What is live now" roadmap section, EC2 parse fleet detail.