Rabbit Hole Tornado
Macro-sentiment intelligence platform that treats word-frequency ratios as tradeable tickers. Scans 244 news sources in 14 languages, plots 100 narrative pairs, and surfaces anomalies, whale moves, and cascade predictions — narrative shifts before markets react.

Most decision-makers read the news to understand what is happening. By the time the story is on Bloomberg, the market has already moved. Rabbit Hole Tornado turns global language into a tradeable feed: 100 word-ratio tickers across 244 news sources in 14 languages, with anomaly and cascade alerts in seconds.
The problem it solves
Over 300,000 news articles are published every day across major outlets, in dozens of languages. A single analyst can read maybe 100 of them. The signals that move markets and policy ("sanctions" rising relative to "trade", "mobilization" replacing "rhetoric", "layoffs" overtaking "hiring") are buried in the other 299,900. By the time a Bloomberg headline summarizes the shift, the move is over. Hedge fund desks pay around US$25,000 per Bloomberg seat per year for price data, and on top of that each analyst spends two to three hours a day reading news to gauge sentiment. The data they pay for tells them what already happened. The narrative shift that preceded it goes unmeasured.
The same gap shows up in geopolitical risk, corporate strategy, defense intelligence, and investigative journalism. Everyone wants to know what is about to happen. Everyone reads the news manually to find out. Nobody quantifies the narrative the way they quantify price.
Who needs this most
- Hedge fund analysts and prop desk strategists burning 2-3 hours a day reading news across multiple time zones to vet a position, who need the read to be defensible in a Monday-morning risk meeting. The moment this hurts: any week where two analysts disagree about a macro position and there is no quantitative narrative evidence to settle it.
- Corporate risk officers at multinationals exposed to 10-20 jurisdictions who watch sanctions, trade, energy, and labor narratives drift across regions and have to brief the C-suite before the FT writes the story. The moment this hurts: every quarterly board pack where the geopolitical-risk page is built from memory because no platform tracks the full narrative surface in their language coverage.
- Government and defense intelligence analysts running multi-language collection who need a self-hosted, air-gapped option that handles 14 languages — including Ukrainian and Russian — without sending source material to a cloud LLM provider. The moment this hurts: any tasking that requires correlating Russian-language and English-language narrative drift on the same topic in the same week.
- Geopolitical and macro consultants writing weekly client letters who need defensible narrative evidence to back a thesis, not screenshots and gut feel.
The solution — in plain terms
The platform reads the global news for you and turns it into a financial dashboard. Instead of stock prices, you see narrative prices — the ratio of war-words to peace-words, of recession-words to growth-words, of sanctions to trade. When one of those ratios moves three standard deviations above its own historical mean, the platform flags it as an anomaly, names the keywords driving the move, and walks the cascade down to the sectors and commodities the move historically predicts.
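The three-standard-deviation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's code: the function name `zscore_anomaly`, the `threshold` parameter, and the sample ratios are hypothetical, and the baseline is assumed to be a plain list of past ratio values for one ticker.

```python
from statistics import mean, pstdev


def zscore_anomaly(history, current, threshold=3.0):
    """Return (z, is_anomaly) for `current` against a ticker's own history.

    `history` is a list of past ratio values for one ticker (hypothetical input).
    """
    if len(history) < 2:
        return 0.0, False      # not enough history to form a baseline
    mu = mean(history)
    sigma = pstdev(history)    # population standard deviation of the baseline
    if sigma == 0:
        return 0.0, False      # flat history: the z-score is undefined
    z = (current - mu) / sigma
    return z, z >= threshold   # one-sided: only moves above the mean are flagged


# Hypothetical WAR/PEACE ratios from earlier windows
baseline = [1.0, 1.2, 0.8, 1.1, 0.9]
z, flagged = zscore_anomaly(baseline, 1.6)
```

With this made-up baseline, a reading of 1.6 sits a little over four standard deviations above the mean and is flagged, while a reading of 1.2 is not.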
Day to day, the platform pulls articles every few minutes from 244 sources spanning nine macro-regions (Anglosphere, EU-West, East Europe, MENA, South Asia, Southeast Asia, East Asia, Latin America, Africa), deduplicates them with SHA-256 URL hashing, extracts full text with Trafilatura, and counts keywords against 100 user-configurable ticker pairs. Five-minute ratio windows feed three engines: a z-score anomaly engine, a domino-chain engine that maps anomalies through cause-effect graphs, and a four-source Tornado prediction system. The Tornado system combines a curated seed dictionary, a learned co-occurrence matrix, real-time velocity detection, and a 916 MB local ConceptNet knowledge graph with 10 million semantic relationships across 304 languages.
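The deduplication and keyword-counting steps described above can be illustrated roughly as follows. This is a sketch under stated assumptions: the helper names (`url_key`, `count_keywords`, `window_ratio`), the whole-word regex matching, and the example URLs and texts are all hypothetical, and fetching and text extraction are left out.

```python
import hashlib
import re


def url_key(url):
    """SHA-256 hex digest of the URL, used as a deduplication key."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


def count_keywords(text, keywords):
    """Count whole-word, case-insensitive occurrences of any keyword."""
    if not keywords:
        return 0
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return len(pattern.findall(text))


def window_ratio(articles, numerator, denominator):
    """Ratio of numerator hits to denominator hits over one window.

    `articles` is an iterable of (url, text) pairs; a repeated URL is counted once.
    Returns None when the denominator keywords never appear.
    """
    seen = set()
    num = den = 0
    for url, text in articles:
        key = url_key(url)
        if key in seen:
            continue               # same URL already counted in this window
        seen.add(key)
        num += count_keywords(text, numerator)
        den += count_keywords(text, denominator)
    return num / den if den else None


# Hypothetical articles for one window; the second entry repeats the first URL
articles = [
    ("https://example.com/a", "Sanctions widen as trade talks stall. New sanctions loom."),
    ("https://example.com/a", "Sanctions widen as trade talks stall. New sanctions loom."),
    ("https://example.com/b", "Trade volumes recover."),
]
ratio = window_ratio(articles, ["sanctions"], ["trade"])
```

Returning `None` for an empty denominator is one possible design choice; it keeps a window with no denominator hits from being read as an infinite ratio.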
Where it lives in the operator's workflow: it replaces the morning news read, the manual sentiment spreadsheet, and the patchwork of Google News searches across translated outlets — and adds a quantitative audit trail none of those produce.
Value delivered — what you get
- Compresses the morning news read from 2-3 hours to 90 seconds — the Dashboard tab answers "what's happening, what changed, what should I watch" with the same numerical rigor a trader expects from a price feed.
- Quantifies narrative shifts in a form that survives a risk meeting — every anomaly is a z-score against the ticker's own history, every cascade is a named dependency, every domino is a timestamped article trail. No "I have a feeling."
- Catches the moves the English-language press has not yet picked up — 14 languages and 9 macro-regions cover Ukrainian, Russian, Spanish, Portuguese, French, German, Arabic, Turkish, Hebrew, Italian, Dutch, Malay, and Thai narratives the same way they cover English. ConceptNet's 304-language graph extends predictions cross-linguistically.
- Cuts cost from a US$25,000-per-seat-per-year alternative-data subscription to roughly US$5-10 per month in compute and optional LLM costs. 95 per cent of the pipeline is algorithmic regex and z-score work that runs on a commodity VPS. The 5 per cent LLM layer (Claude Haiku) is opt-in, async, and costs about US$0.001 per article when enabled.
- Tracks 87 institutional whales and 160 named world leaders by name, role, and emotion — institutional silence, person-level sentiment drift, and dominant-emotion classification across a 12-emotion spectrum, all derived from article text, no NLP service contract required.
- Self-hosted, air-gappable, no SaaS dependency — the full pipeline runs on a single SQLite database and a single Docker compose file. Suitable for government and intelligence buyers that cannot send source material to third-party clouds.
- Audit-grade evidence on every signal — every anomaly, every cascade, every signal links back to the underlying articles, the source, the timestamp, and the keyword that triggered it. Nothing comes from a black box.
Where it delivers outsized value
- Macro and global-macro hedge funds running multi-region books — the platform's strength is correlating narrative drift across language and region, which is exactly the analytical surface a macro book trades on.
- Government intelligence and defense buyers with multi-language collection needs and an on-premise mandate — the 304-language semantic graph, the self-hosted architecture, and the 14-language source set fit the profile no cloud-only competitor (Dataminr, Recorded Future, Predata) addresses.
- Geopolitical risk consultancies and corporate strategy teams at multinationals — the watchlist, the 8-dimensional conflict enrichment per country pair, and the Heartbeat builder for arbitrary sectors turn weekly client letters and quarterly board packs into a repeatable production process.
- Investigative newsrooms covering disinformation, propaganda, and narrative warfare — the Source Network Rating (SNR) gives a 4-dimensional reliability score per source (entropy, vulnerability, behavioral burst, deception divergence) that maps directly onto information-warfare analysis.
Distinctive features — why this over the alternatives
- Word-ratio tickers as a first-class instrument — no other platform exposes WAR/PEACE, INFLATION/SAVINGS, AI/JOBS as quantifiable, chartable, alertable instruments with their own historical baselines and z-scores. This is the platform's novel IP.
- Four-source Tornado prediction with confidence boosting — curated seed dictionary plus learned co-occurrence matrix plus live velocity detection plus ConceptNet semantic graph, recursed three levels deep. When multiple sources agree, confidence climbs from 0.50 to 0.95. Mainstream sentiment tools have one of these signals at best.
- 15 analytical tabs, not 15 disconnected dashboards — Dashboard, Chart, Cycle, Chains, Heartbeat, Cloud, Cascade, SNR, Psychology, Anomalies, Dominos, Geopolitical, Whales, Signals, Articles. Each tab targets a specific question a specific operator role asks.
- 9-block, 14-language source taxonomy with balance scoring — the platform doesn't just scrape feeds; it knows which block each source belongs to and surfaces narrative imbalance across regions as a first-class signal.
- Honest 95 / 5 algorithmic split — 95 per cent of the pipeline is regex, set matching, z-score statistics, and graph traversal that costs nothing per article. The 5 per cent LLM layer is fully optional, fully async, and never blocks ingestion. Unlike AI-wrapper products, the unit economics survive scale.
- Operator-grade settings surface: every ticker, source, whale, person, and chain can be enabled, disabled, added, or deleted through the Settings tab. The auto-rematch system rebuilds historical data for newly enabled tickers in the background with stop and resume controls.
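The confidence boosting mentioned for the four-source Tornado prediction above might look something like the sketch below. Only the 0.50 and 0.95 endpoints come from this page; the linear step between them, the source labels in `SOURCES`, and the function name `boosted_confidence` are assumptions made for illustration.

```python
# Hypothetical labels for the four prediction sources named in the text
SOURCES = ("seed", "cooccurrence", "velocity", "conceptnet")


def boosted_confidence(agreeing, base=0.50, cap=0.95):
    """Confidence for a predicted term, given the sources that proposed it.

    Assumption: confidence rises linearly from `base` (one source) to `cap`
    (all four sources agree). The platform's actual schedule is not documented here.
    """
    n = len(set(agreeing) & set(SOURCES))   # count each known source once
    if n == 0:
        return 0.0
    step = (cap - base) / (len(SOURCES) - 1)
    return round(min(cap, base + step * (n - 1)), 2)


single = boosted_confidence({"seed"})
full = boosted_confidence(set(SOURCES))
```

Under this assumed linear schedule, one source yields 0.50, two yield 0.65, three yield 0.80, and all four yield 0.95.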
Under the hood — built to last
The backend runs on Python 3.12 and FastAPI with aiosqlite, APScheduler, and Trafilatura: mature, well-supported, boring choices that will still be running in five years. The frontend is React 19 with TypeScript, Vite, Tailwind v4, Radix UI primitives, TradingView Lightweight Charts, and TanStack Query. Persistence is SQLite in WAL mode with a read/write connection split, which is production-credible on a single VPS and trivial to migrate to PostgreSQL when multi-tenancy becomes the bottleneck. The ConceptNet graph ships as a 916 MB local SQLite file with no network dependency. The whole stack stands up under one Docker compose file; nothing in the critical path requires a SaaS account.
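As a rough illustration of WAL mode with a read/write connection split, the sketch below uses the standard-library `sqlite3` module rather than aiosqlite so that it runs without extra dependencies. The helper names, the `ratios` table, and the temporary database path are hypothetical and not taken from the project.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_writer(path):
    """Read/write connection; switches the database file to WAL mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def open_reader(path):
    """Read-only connection opened through a SQLite URI (mode=ro)."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Throwaway demo database in a temporary directory
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = open_writer(db_path)
writer.execute("CREATE TABLE ratios (ticker TEXT, value REAL)")
writer.execute("INSERT INTO ratios VALUES (?, ?)", ("WAR/PEACE", 1.6))
writer.commit()

reader = open_reader(db_path)
rows = reader.execute("SELECT ticker, value FROM ratios").fetchall()
mode = writer.execute("PRAGMA journal_mode").fetchone()[0]
```

In WAL mode readers do not block the writer, so dashboard queries can go through read-only connections while ingestion keeps a single writer. Any write attempted on the reader connection raises `sqlite3.OperationalError`.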
Current maturity
Working platform. The codebase is roughly 31,000 lines of source: 11,400 lines of Python across 22 API routers and a similar number of service modules, plus 19,900 lines of TypeScript across 15 tabs and 22+ data hooks. 244 sources are configured and live across 9 macro-regions. 100 ticker pairs, 87 whale entities, 160 key persons, 105 cause-effect chains, and 29 trade-signal rules are all in production data. The ConceptNet integration is fully wired and persistent. Six in-repo documents (whitepaper, business model, investor pitch, market analysis, infrastructure scaling, and capital strategy) cover the productization plan. Last development activity: 2026-05-25. The platform is pre-revenue: usable end-to-end on the operator's own infrastructure, not yet packaged as a hosted multi-tenant SaaS.
Roadmap — what's next
The next milestone is the Professional tier — managed cloud hosting, multi-user workspaces, anomaly webhooks to Slack and Telegram, an API tier for programmatic consumers, and a backtesting engine that lets desks replay historical narrative regimes against their own strategies. Beyond that, the Enterprise tier opens custom source integration (private feeds, internal data, classified sources for the government track) and white-label deployment for consultancies that want to ship the platform under their own brand. The longer-term arc is a Government tier with on-premise, air-gapped deployment, multi-language analyst tooling, and propaganda-detection modules tuned for state-aligned information operations.
Working with the architect
Three engagement modes apply to this project. A fund, consultancy, or agency can commission a custom build modeled on Rabbit Hole Tornado, tuned to a specific keyword universe, source set, and language coverage, which is the typical scope for a desk with proprietary sources or regulated data. An in-house data team can engage the architect for strategic advisory on narrative-quantification methodology, the four-source cascade architecture, and the 95/5 algorithmic split so they can build the capability inside their own stack. Government, defense, and intelligence buyers can scope an on-premise extension with custom source integration, classified-friendly deployment, and language coverage tuned to their collection mandate. Reach out via sintegrium.io or LinkedIn for a 30-minute scoping call.
Built by Yurii Staryk · Solution Ecosystem Architect