⚠ Failure Analysis · CQ-GESUS Forecasting System

Why CQ-GESUS Might Fail

The known ways our CS2 prediction system could be wrong, lose paper money, or mislead. We publish this because almost no prediction product does, and all of them should.

This is not a legal disclaimer. It is a genuine technical analysis of the system's failure modes, written by the people who built it. If a risk listed here actually materialises, we will update this page and say so in the open rather than quietly deprecating an underperforming model.

1. Roster Changes

High risk

CS2 teams swap players constantly: a benching, a stand-in, a full rebuild. The "team" that plays tomorrow may share only a name and a logo with the one our history describes. A model that treats a team as a fixed entity will confidently misprice a roster that no longer exists.

↳What we do: features are point-in-time and roster-aware, built from recency-weighted form so a lineup's recent matches dominate. Player ratings (CQE) are aggregated per current roster, not per historical brand. The model tracks teams as they are now, but a brand-new lineup with no shared history remains a genuine blind spot.
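One way to combine "recency-weighted" and "per current roster" is to decay each player's past map ratings by age and then average only over the five players currently on the team. The sketch below assumes exponential decay with a hypothetical 60-day half-life and a hypothetical `PlayerMap` record; the actual CQE formula and weighting scheme are not described on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlayerMap:
    """One player's rating on one map (hypothetical schema)."""
    player_id: str
    played_on: date
    rating: float  # placeholder for a per-map CQE-style rating


def recency_weight(played_on: date, as_of: date, half_life_days: float) -> float:
    """Exponential decay: a map played `half_life_days` ago counts half as much as a fresh one."""
    age_days = (as_of - played_on).days
    return 0.5 ** (age_days / half_life_days)


def roster_rating(
    history: Iterable[PlayerMap],
    current_roster: Iterable[str],
    as_of: date,
    half_life_days: float = 60.0,  # hypothetical half-life
) -> Optional[float]:
    """Mean of recency-weighted ratings over the players on the current roster only.

    Point-in-time: maps played on or after `as_of` are ignored, so no future data leaks in.
    Players who have left the team contribute nothing, whatever the brand's history.
    """
    history = list(history)
    per_player = []
    for pid in current_roster:
        num = den = 0.0
        for m in history:
            if m.player_id != pid or m.played_on >= as_of:
                continue
            w = recency_weight(m.played_on, as_of, half_life_days)
            num += w * m.rating
            den += w
        if den > 0:
            per_player.append(num / den)
    if not per_player:
        return None  # no current player has any rated map before `as_of`
    return sum(per_player) / len(per_player)
```

Because the rating is keyed on player IDs rather than a team ID, a rebrand or a mid-season swap changes the inputs immediately; what it cannot supply is any signal about how a new set of players performs together.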

2. Meta & Patch Shifts

High risk

Valve changes the game: the active map pool rotates, weapons get rebalanced, utility behaviour changes. Patterns learned on the old meta can become invalid overnight; a team strong on a map that just left the pool tells you little about the new one.

↳What we do: rolling training windows and recency weighting let recent matches dominate stale ones; per-map signals localise map-specific strength; periodic retraining rolls the window forward. A sudden, large meta break is still a real risk between retrains.
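A rolling training window simply drops matches older than a fixed horizon before each retrain, so a previous meta ages out of the data on a schedule. The sketch assumes each match is a dict with a `date` field; the window length and the retrain step are hypothetical parameters, since the values CQ-GESUS uses are not stated here.

```python
from datetime import date, timedelta


def rolling_window(matches, cutoff: date, window_days: int):
    """Matches dated in [cutoff - window_days, cutoff), oldest first.

    Anything older than the window (for example, from a previous meta) drops out.
    """
    start = cutoff - timedelta(days=window_days)
    in_window = [m for m in matches if start <= m["date"] < cutoff]
    return sorted(in_window, key=lambda m: m["date"])


def retrain_cutoffs(first: date, last: date, step_days: int):
    """Dates at which the window rolls forward and the model is retrained."""
    cutoffs, c = [], first
    while c <= last:
        cutoffs.append(c)
        c += timedelta(days=step_days)
    return cutoffs
```

The gap between two cutoffs is exactly the exposure described above: a patch that lands the day after a retrain is invisible to the model until the next one.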

3. Tier-3 Noise

Medium risk

Lower-tier matches are the bulk of the schedule and the least predictable: thin history, volatile rosters, inconsistent stakes. Our Tier-3 AUC (~0.65) is modest, and any single flattering Tier-3 streak is more likely luck than edge.

↳What we do: we report Tier-3 separately instead of burying it in a blended average, and the shadow-bet sizing caps are tightest on Tier-3, so noise there cannot dominate the bankroll. We still predict it — because hiding the hard tier would be dishonest.
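The page does not say which sizing rule sits underneath the caps. As an illustration only, the sketch below assumes fractional Kelly sizing, then clips the result by a per-tier ceiling that is tightest on Tier-3; the cap values and the 0.25 Kelly multiplier are hypothetical.

```python
# Hypothetical caps: maximum stake as a fraction of bankroll, tightest on Tier-3.
TIER_CAP = {1: 0.03, 2: 0.02, 3: 0.01}


def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly fraction for one bet at decimal odds; 0 when there is no edge."""
    b = decimal_odds - 1.0  # net profit per unit staked
    return max(0.0, (b * p_win - (1.0 - p_win)) / b)


def shadow_stake(bankroll: float, p_win: float, decimal_odds: float, tier: int,
                 kelly_multiplier: float = 0.25) -> float:
    """Fractional Kelly, then clipped by the per-tier cap."""
    fraction = kelly_multiplier * kelly_fraction(p_win, decimal_odds)
    return bankroll * min(fraction, TIER_CAP[tier])
```

For a 60% estimate at even odds, full Kelly suggests 20% of bankroll and quarter Kelly 5%; with these example caps a Tier-1 bet is cut to 3% and a Tier-3 bet to 1%, so a run of noisy Tier-3 results moves the bankroll far less.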

4. The Market Beats the Model

High risk if misapplied

On matches with a liquid betting market, the market price is usually sharper than our standalone model. Betting against a sharp market on the strength of a 0.71-AUC model is a good way to lose money slowly.

↳What we do: where a liquid market exists we use NEXUS, which blends our model with the market price rather than fighting it; the standalone model leads only where markets are thin or absent. The bets page splits P&L by price source so any false edge is visible.
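How NEXUS weights the two sources is not specified on this page. A common way to blend a model with a market is to remove the bookmaker margin from the odds and then take a weighted average in log-odds space; the sketch below does that, with a hypothetical model weight of 0.3 and a fallback to the standalone model when there is no market price.

```python
import math
from typing import Optional, Tuple


def devig_two_way(odds_a: float, odds_b: float) -> float:
    """Market probability for side A from two-way decimal odds, margin removed proportionally."""
    implied_a, implied_b = 1.0 / odds_a, 1.0 / odds_b
    return implied_a / (implied_a + implied_b)


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blended_probability(p_model: float,
                        market_odds: Optional[Tuple[float, float]] = None,
                        w_model: float = 0.3) -> float:
    """Blend model and market in log-odds space; use the model alone when no market exists."""
    if market_odds is None:
        return p_model  # thin or absent market: the standalone model leads
    p_market = devig_two_way(*market_odds)
    return _sigmoid(w_model * _logit(p_model) + (1.0 - w_model) * _logit(p_market))
```

With `w_model=0` the output is the de-margined market price and with `w_model=1` it is the raw model; anything in between pulls the model toward the market instead of betting against it.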

5. Demo Parse Gaps & Data Quality

Medium risk

Demos go missing, arrive late, or hit parser edge cases: warmup-round artefacts, ambiguous side attribution, and occasional duplicate team IDs emitted upstream. Bad rows feeding a rating or a feature would quietly degrade everything downstream.

↳What we do: only validated parses reach the gold tables; parser-first isolation keeps scraped numbers out of analytics entirely; kill events are loaded delete-before-insert, so re-parsing a demo replaces its rows instead of duplicating them; per-round side resolution handles halftime and overtime side swaps; and bet settlement resolves the winner by both team identity and team name, so duplicate IDs cannot leave a result unsettled.
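Delete-before-insert makes a load idempotent: running it twice for the same match leaves the same rows as running it once. The sketch uses Python's standard `sqlite3` module and a hypothetical `kill_events` schema, since the real gold-table layout is not published here; the key point is that the delete and the insert share one transaction.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kill_events (
    match_id  TEXT    NOT NULL,
    round_no  INTEGER NOT NULL,
    tick      INTEGER NOT NULL,
    killer_id TEXT    NOT NULL,
    victim_id TEXT    NOT NULL
)
"""


def load_kill_events(conn: sqlite3.Connection, match_id: str, events: list) -> None:
    """Replace every kill event for one match in a single transaction.

    Deleting the match's rows first means re-parsing the same demo replaces
    its kills rather than appending a second copy.
    """
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM kill_events WHERE match_id = ?", (match_id,))
        conn.executemany(
            "INSERT INTO kill_events (match_id, round_no, tick, killer_id, victim_id) "
            "VALUES (?, ?, ?, ?, ?)",
            [(match_id, e["round_no"], e["tick"], e["killer_id"], e["victim_id"])
             for e in events],
        )
```

Because of the transaction, a parse that fails halfway through leaves the previous rows for that match untouched rather than a half-deleted match.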

6. Overfitting

Medium risk

With 509+ features and a finite history of professional matches, some apparent signal may reflect quirks of our specific training window rather than a durable pattern — and could decay out-of-sample.

↳What we do: strict chronological train/validation/test split (no shuffling, no leakage), conservative regularisation, early stopping on held-out data (the selected iteration count lands well below the training budget), and explicit calibration. The public live ledger is the real out-of-sample test that no backtest can fake.
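A chronological split assigns each match to train, validation, or test purely by date, so every evaluation match lies strictly in the future of everything the model was fit on. The sketch assumes dict records with a `date` field and caller-chosen cutoff dates (hypothetical parameters). The split only prevents leakage if each row's features were themselves computed point-in-time.

```python
from datetime import date


def chronological_split(matches, train_end: date, val_end: date):
    """Split by date, never by shuffling.

    train: date < train_end
    val:   train_end <= date < val_end
    test:  date >= val_end
    """
    if not train_end < val_end:
        raise ValueError("train_end must be before val_end")
    train = [m for m in matches if m["date"] < train_end]
    val = [m for m in matches if train_end <= m["date"] < val_end]
    test = [m for m in matches if m["date"] >= val_end]
    return train, val, test
```

Early stopping would then monitor loss on `val` only, and `test` is scored once at the end.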

7. Format Variance (bo1)

Medium risk

A best-of-one sits much closer to a coin-flip than a longer series, even between mismatched sides: one map, one pistol round gone wrong, and the favourite is out. The model can be "right" about the better team and still see it lose a single map, and short series amplify variance.

↳What we do: series format is an explicit feature, and a confidence floor prevents placing a shadow bet on a near-coin-flip with no real model lean. We accept that bo1 outcomes will be noisier and size accordingly.
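Both mechanisms are small. The sketch below assumes the format is one-hot encoded and that the confidence floor is a minimum win probability for the favoured side; the encoding and the 0.55 floor are hypothetical, since neither is specified on this page.

```python
SERIES_FORMATS = ("bo1", "bo3", "bo5")


def format_features(fmt: str) -> list:
    """One-hot encoding of the series format, so the model can learn that bo1 is noisier."""
    if fmt not in SERIES_FORMATS:
        raise ValueError(f"unknown series format: {fmt}")
    return [1.0 if fmt == f else 0.0 for f in SERIES_FORMATS]


def has_real_lean(p_win: float, floor: float = 0.55) -> bool:
    """True if the model favours one side by at least `floor`; otherwise place no shadow bet."""
    return max(p_win, 1.0 - p_win) >= floor
```

A 52/48 prediction fails the floor and is published without a shadow bet, while a 30/70 prediction passes because the lean toward the other side is strong enough.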

8. Small Live Sample

Lower risk, documented

With 125 settled paper bets so far, the live track record is still too small for statistical confidence — a good or bad run at this stage is dominated by variance, not skill.

↳What we do: everything stays paper — no real capital — and no paid product launches until a statistically significant settled-bet sample exists. We publish the live numbers anyway, because transparency means showing them before they look good, not after.
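The arithmetic behind "too small" is short. At n = 125 and a hit rate near 50%, the standard error of the hit rate is √(0.25 / 125) ≈ 0.045, so a 95% interval spans roughly ±8.8 percentage points on each side. The sketch below adds an exact one-sided binomial test against a break-even hit rate; it assumes all bets are at similar odds so that a single break-even rate applies, which is a simplification (mixed odds call for testing ROI instead).

```python
from math import comb, sqrt


def binomial_tail(wins: int, n: int, p_breakeven: float) -> float:
    """P(X >= wins) for X ~ Binomial(n, p_breakeven).

    The probability of doing at least this well with zero edge: a one-sided
    exact binomial test of the observed hit rate against the break-even rate.
    """
    return sum(comb(n, i) * p_breakeven**i * (1 - p_breakeven)**(n - i)
               for i in range(wins, n + 1))


def hit_rate_half_width_95(n: int, p: float = 0.5) -> float:
    """Normal-approximation 95% half-width of a hit rate measured over n bets."""
    return 1.96 * sqrt(p * (1 - p) / n)
```

A small p-value from `binomial_tail` over a large settled sample is the kind of evidence that would count as "statistically significant"; over a few dozen bets, most runs will not get there regardless of skill.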

Our commitment to radical transparency

Every prediction is frozen and timestamped at creation, so it cannot be backfilled after the fact
The live bankroll, ROI, hit-rate and drawdown are public and read straight from the ledger
Per-tier and per-price-source results are published separately — the hard cases are not hidden
If CQ-GESUS fails to show a calibrated edge over a meaningful sample, we will say so plainly and keep it paper
We update this page whenever a new failure mode is identified or an existing one manifests
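The page does not describe how predictions are frozen. One standard technique for making a ledger tamper-evident is hash chaining: each record stores the hash of the previous one, so editing an old prediction or slipping a backfilled one into the middle breaks every later link. A minimal sketch using only the standard library follows; the record fields are hypothetical, and on its own a chain only detects tampering if its latest hash is also published somewhere the operator cannot rewrite.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_prediction(ledger: list, match_id: str, p_win: float) -> dict:
    """Freeze a prediction: timestamp it and chain it to the previous record's hash."""
    body = {
        "match_id": match_id,
        "p_win": p_win,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "prev_hash": ledger[-1]["hash"] if ledger else GENESIS,
    }
    record = {**body, "hash": _digest(body)}
    ledger.append(record)
    return record


def verify_ledger(ledger: list) -> bool:
    """False if any record was edited, or a record was inserted or removed before the latest one."""
    prev = GENESIS
    for record in ledger:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```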
