Q: What data sources feed the network?

150+ sources across three types: 151 curated RSS feeds across 12 content lanes, NewsAPI.ai (Event Registry) for global wire/agency articles, and Mediastack for business news. A source diversity cap of 3 items per domain prevents any single outlet from dominating.

Q: How is entity extraction performed?

Natural-language entity recognition processes every individual signal (not just published articles), extracting organizations, people, and locations with confidence scores above 0.85. Each signal gets a unique content-hash key so entity co-occurrence reflects real editorial relationships. Currently tracking 1,800+ entities across 200+ unique signal sources.

Intelligence Network Methodology

Q: How often is the network updated?

The network is re-rendered daily at 8 AM ET. Signal ingestion runs every 2 hours from 150+ sources. Entity enrichment occurs on every signal ingestion and on article publish, plus a weekly batch re-enrichment.

Q: What is BICS and why is it used?

BICS (Bloomberg Industry Classification System) is a 4-level institutional taxonomy used by global asset managers. It provides consistent sector classification: Sector, Industry Group, Industry, and Sub-Industry.

Q: What does the daily insight tell me?

An AI-generated briefing highlighting which entities and sectors dominate the network, surprising cross-sector connections, and what relationships are strengthening or emerging.

Q: Can I access detailed entity data?

Detailed signal timelines, entity relationship graphs, and coverage analysis are available to Plocamium clients and partners. Click any entity node on the network to request access.

Q: What are the ML models and how do they work?

Two proprietary models retrain weekly: an Entity Momentum Scorer (weighted composite of 9 features producing a 0-100 score per entity) and a Sector Clustering model (KMeans on entity features revealing non-obvious groupings across BICS sectors). Momentum drives node sizing; clusters drive node coloring.

How Plocamium maps entity relationships across healthcare, industrials, and geopolitics - from raw signal ingestion through ML momentum scoring and unsupervised clustering to institutional-grade network visualization. Proprietary models drive topic selection, article quality scoring, and social post prioritization in a closed-loop intelligence pipeline. The Plocamium Signal Index (PSI) provides a daily composite z-score measuring narrative disruption, while Entity Intelligence Cards deliver five proprietary metrics per entity alongside every published article.

Signal Collection

Plocamium continuously ingests from three complementary source types spanning thirteen content lanes. Signals are polled every 2 hours and streamed to archival storage, producing a compressed time-series archive of global market activity. A source diversity cap limits each domain to a maximum of 3 items per run, preventing any single outlet from dominating the signal pool.

172 curated RSS feeds across 13 content lanes, shuffled randomly each run for source diversity
NewsAPI.ai (Event Registry) - short keyword queries per lane, returning global wire/agency articles (Reuters, AP, AFP, etc.)
Mediastack - business news API covering the 3 highest-value lanes (Healthcare, Finance, Industrials)
13 content lanes: Healthcare M&A, Industrials & Defense, GCC/LATAM Geopolitics, Healthcare Marketing, World/Geopolitical, Technology, AI/ML, Finance, Government & Regulatory, Intelligence & Security, Deep-Dive (Foreign Policy, Intelligence & Defense)
Source diversity cap: max 3 items per domain to prevent any single outlet from dominating
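
The shuffle-and-cap step described above can be sketched as follows. This is a minimal illustration, not the production code: the item shape (a dict with a "url" key) and the function name are assumptions, and only the cap of 3 per domain comes from the text.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse

MAX_ITEMS_PER_DOMAIN = 3  # cap stated in the text


def apply_diversity_cap(items, cap=MAX_ITEMS_PER_DOMAIN, rng=None):
    """Shuffle items, then keep at most `cap` items per source domain.

    `items` is a list of dicts with a "url" key (hypothetical shape).
    """
    rng = rng or random.Random()
    shuffled = list(items)
    rng.shuffle(shuffled)  # random order so no feed is favored by position
    kept = []
    per_domain = defaultdict(int)
    for item in shuffled:
        domain = urlparse(item["url"]).netloc.lower()
        if per_domain[domain] < cap:
            per_domain[domain] += 1
            kept.append(item)
    return kept
```

Shuffling before capping means that the three items kept from a busy domain vary from run to run instead of always being the first three in feed order.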

Topic Discovery & Trend Acceleration

Twice daily (7 AM and 5 PM ET), the Trend Acceleration Engine scans the signal archive to identify topics with accelerating mention velocity. Acceleration is calculated as the ratio of 24-hour mentions to the 7-day daily average. Topics exceeding 5× acceleration with no existing coverage automatically trigger the content pipeline - ensuring the network captures emerging entities before they become consensus.

Named entity extraction via regex on signal titles (capitalized multi-word phrases)
Lane classification per topic (Healthcare, Industrials, GCC/LATAM)
Coverage gap detection against published article corpus
Auto-pilot: max 2 pipeline triggers per scan for high-acceleration uncovered topics
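
As a rough sketch of the acceleration ratio and the auto-pilot limit described above: the topic dictionary shape, the function names, and the handling of a zero baseline are assumptions made for illustration. The 5× threshold and the limit of 2 triggers per scan come from the text.

```python
def acceleration(mentions_24h, mentions_7d):
    """Ratio of 24-hour mentions to the 7-day daily average."""
    daily_avg = mentions_7d / 7
    if daily_avg == 0:
        # Assumption: with no baseline, any mention counts as unbounded acceleration.
        return float("inf") if mentions_24h > 0 else 0.0
    return mentions_24h / daily_avg


def select_triggers(topics, covered, threshold=5.0, max_triggers=2):
    """Return up to `max_triggers` uncovered topic names above the threshold.

    `topics` is a list of dicts with "name", "mentions_24h", "mentions_7d"
    (hypothetical shape); `covered` is a set of already-covered topic names.
    """
    candidates = [
        (acceleration(t["mentions_24h"], t["mentions_7d"]), t["name"])
        for t in topics
        if t["name"] not in covered
    ]
    hot = sorted((c for c in candidates if c[0] > threshold), reverse=True)
    return [name for _, name in hot[:max_triggers]]
```

For example, a topic with 12 mentions in the last 24 hours against 14 mentions over 7 days (a daily average of 2) has an acceleration of 6× and would qualify if it has no existing coverage.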

Entity Extraction

Entities are now extracted from every individual signal - not just published articles - using natural-language entity recognition. Each signal receives a unique content-hash-based article key (e.g., HEALTHCARE-3a5e65588bb5), ensuring co-occurrence reflects real editorial relationships rather than artificial batching. Entities are extracted with a confidence threshold of >0.85 and classified as ORGANIZATION, PERSON, or LOCATION. Entity extraction runs on every signal ingestion (every 2 hours) and again on published articles.

First 4,900 characters processed per signal (NER API limit)
HTML stripped to plain text before analysis
Each signal gets a unique hash-based key so co-occurrence reflects real editorial relationships
Entities deduplicated and upserted with mention counts and signal/article references
Currently tracking 1,800+ entities across 200+ unique signal sources
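
The keying and pre-processing steps above might look roughly like this. The hash function (SHA-256), the fields that are hashed, the crude tag stripping, and the entity dictionary shape are all assumptions. The 12-hex-character suffix mirrors the example key, and the 4,900-character limit and 0.85 threshold come from the text.

```python
import hashlib
import re

NER_CHAR_LIMIT = 4900   # per the text
MIN_CONFIDENCE = 0.85   # per the text
ALLOWED_TYPES = {"ORGANIZATION", "PERSON", "LOCATION"}


def signal_key(lane, title, url):
    """Build a key shaped like HEALTHCARE-3a5e65588bb5 (lane + 12 hex chars)."""
    # Assumption: the hash covers title and URL; the real inputs are not documented.
    digest = hashlib.sha256(f"{title}|{url}".encode("utf-8")).hexdigest()
    return f"{lane.upper()}-{digest[:12]}"


def prepare_text(html):
    """Strip tags (crudely) and truncate to the NER input limit."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:NER_CHAR_LIMIT]


def keep_entities(entities):
    """Keep entities above the confidence threshold with an allowed type."""
    return [
        e for e in entities
        if e["score"] > MIN_CONFIDENCE and e["type"] in ALLOWED_TYPES
    ]
```

Because the key is derived from the content, the same story always maps to the same key, which is what lets co-occurrence be counted per story rather than per batch.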

Entity Enrichment (BICS Classification)

Each entity is enriched via AI classification using the Bloomberg Industry Classification System (BICS) - a 4-level institutional taxonomy: Sector → Industry Group → Industry → Sub-Industry. Enrichment runs on two triggers: immediately on article publish (async, zero pipeline latency) and weekly batch re-enrichment with fresh signal context.

BICS Level 1 (Sector): Health Care, Industrials, Financials, Technology, Energy, Materials, Communications, Consumer Discretionary, Utilities, Government
BICS Level 2-4: Industry Group, Industry, Sub-Industry (e.g., Health Care → HC Equipment & Services → HC Facilities & Services → Hospitals)
Deal context: acquirer, target, regulator, competitor, partner, investor, subsidiary
Geography: headquarters city/country
Market cap tier: mega-cap through private/government
Relationships: up to 5 related entities with relationship type
Re-enrichment: weekly with latest signal titles for temporal accuracy


Machine Learning Models

Two proprietary ML models run on the entity corpus, retrained weekly. These models transform the static entity catalog into a dynamic intelligence signal - surfacing which entities are gaining momentum and revealing non-obvious groupings that BICS classification alone cannot detect.

Entity Momentum Scorer: Weighted composite of 9 features - mention velocity, article breadth, relationship density, deal context presence, recency, market cap, enrichment completeness, mention concentration, and entity type. Produces a 0-100 momentum score per entity. Drives node sizing in the visualization.
Sector Clustering (KMeans): Unsupervised clustering on entity features (BICS sector, geography, market cap, deal activity, mention volume). Optimal K selected via silhouette score. Reveals data-driven entity groupings - e.g., “resource geography” clusters (Permian Basin, Haynesville Shale) or “deal-active heavyweights” (UPMC, Boeing, GE Aerospace) that span multiple BICS sectors.
Retrain schedule: Weekly (Sundays 05:00 UTC). Auto-triggers chart re-render on completion.
Rising Entities panel: Top 10 entities by momentum score surfaced on the intelligence page, giving investors an at-a-glance view of where attention is accelerating.
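
To make the idea of a weighted composite concrete, here is a minimal sketch. The feature names follow the list above, but the equal weights, the assumption that each feature is pre-scaled to the 0..1 range, and the function names are hypothetical; the actual weights and scaling are not described in this document.

```python
FEATURES = [
    "mention_velocity", "article_breadth", "relationship_density",
    "deal_context", "recency", "market_cap",
    "enrichment_completeness", "mention_concentration", "entity_type",
]

# Hypothetical equal weights, for illustration only.
WEIGHTS = {name: 1.0 for name in FEATURES}


def momentum_score(features, weights=WEIGHTS):
    """Weighted average of features (each assumed in 0..1), mapped to 0..100."""
    total = sum(weights.values())
    raw = sum(
        w * min(max(features.get(name, 0.0), 0.0), 1.0)  # clamp to 0..1
        for name, w in weights.items()
    )
    return round(100.0 * raw / total, 1)


def rising_entities(scores, top_n=10):
    """Names of the `top_n` entities by momentum score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```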


ML-Driven Pipeline Integration

The ML models don’t just power the visualization - they feed back into the entire content pipeline, influencing which topics get covered, how articles are scored, and how social posts are framed. This creates a closed loop: entity signals inform content production, which generates new entity data, which retrains the models.

Momentum-weighted topic selection: The Trend Acceleration Engine weights topics using a blended score: 60% signal acceleration + 40% ML momentum. An entity at momentum 85 with 3× acceleration outranks one at momentum 10 with 5× acceleration - ensuring the pipeline prioritizes structurally important topics over noise spikes.
Article scoring bonus: After AI evaluates an article draft on 6 quality dimensions, the scoring Lambda checks whether the article covers high-momentum entities. Articles mentioning 2+ top-20 momentum entities receive a +1 point bonus; coverage of 1 top-20 or 3+ top-50 entities yields +0.5. This makes articles about rising entities less likely to be rejected.
Social post enrichment: Every social post generation prompt is injected with a momentum context block identifying which high-momentum entities appear in the source article. The model is instructed to prioritize naming these entities. When entities from different ML clusters co-occur, the prompt flags this as a cluster-crossing signal - directing the model to emphasize the cross-sector convergence.
Closed-loop retraining: As new articles are published and entities accumulate mentions, the weekly retrain captures updated momentum patterns. Topics that were rising last week but plateau will naturally decay in momentum, while newly accelerating entities surface automatically.
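
The blended topic score and the article bonus above can be sketched as below. The 60/40 blend and the bonus tiers come from the text. How acceleration is normalized is not stated, so dividing by the 5× trigger threshold and capping at 1 is an assumption; it does reproduce the worked example (momentum 85 at 3× outranks momentum 10 at 5×). Treating the top-50 set as including the top-20 is also an assumption.

```python
ACCEL_NORMALIZER = 5.0  # assumption: reuse the 5x trigger threshold as the scale


def blended_score(accel, momentum):
    """60% normalized signal acceleration + 40% ML momentum (0-100 scaled to 0..1)."""
    accel_norm = min(accel / ACCEL_NORMALIZER, 1.0)
    return 0.6 * accel_norm + 0.4 * (momentum / 100.0)


def momentum_bonus(article_entities, ranked_entities):
    """Scoring bonus for an article, given entities ranked by momentum (best first)."""
    mentioned = set(article_entities)
    n_top20 = len(mentioned & set(ranked_entities[:20]))
    n_top50 = len(mentioned & set(ranked_entities[:50]))
    if n_top20 >= 2:
        return 1.0
    if n_top20 == 1 or n_top50 >= 3:
        return 0.5
    return 0.0
```

Under this normalization, the first entity in the worked example scores 0.6 × 0.6 + 0.4 × 0.85 = 0.70, and the second scores 0.6 × 1.0 + 0.4 × 0.10 = 0.64.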

Network Construction

The entity network is a co-occurrence graph: two entities are connected when they appear together in at least two signals or articles. Edge weight reflects the number of shared items. Each news signal gets a unique hash-based key (e.g., HEALTHCARE-3a5e65588bb5), so entities co-occur only when they appear in the same news story. The graph is stored in a purpose-built database.

Vertices: enriched entity profiles (name, BICS classification, deal context, geography)
Edges: co-occurrence relationships with signal/article-count weights
Enrichment-derived edges: relationships inferred by AI during BICS enrichment are also used as graph edges, pulling related entities into the network even without co-occurrence
Degree centrality computed per node (connection count)
Centrality movers: rising and falling centrality tracked day-over-day with tooltips showing degree change
New edge detection: daily diff against previous graph state
Currently: ~877 nodes, ~3,800 edges
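
A minimal sketch of the co-occurrence rule and degree centrality described above. The input shape (a mapping from signal or article key to the set of entities found in it) and the function name are assumptions; the threshold of two shared items is from the text. Enrichment-derived edges are not modeled here.

```python
from collections import Counter
from itertools import combinations

MIN_SHARED_ITEMS = 2  # per the text


def build_graph(items):
    """Return (edges, degree) from a mapping of item key -> set of entity names.

    `edges` maps an alphabetically ordered entity pair to its weight
    (number of shared items); `degree` counts connections per entity.
    """
    pair_counts = Counter()
    for entities in items.values():
        for a, b in combinations(sorted(set(entities)), 2):
            pair_counts[(a, b)] += 1
    edges = {pair: w for pair, w in pair_counts.items() if w >= MIN_SHARED_ITEMS}
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return edges, degree
```

Pairs that share only one item are counted but never become edges, which filters out one-off mentions.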

Variance & Historical Analysis

Network state is archived daily as JSON snapshots in S3. This enables temporal analysis: which connections are strengthening, which are new, and which entities are gaining or losing centrality over time. The daily insight narrative explicitly highlights new edges and momentum shifts, providing investors with directional signal rather than static state.

Daily graph.json snapshots archived to S3 for historical comparison
New edge detection: set difference between today's and yesterday's edge sets
Centrality drift: tracked via degree count changes across snapshots
Entity staleness: last_enriched timestamp ensures profiles stay current (<7 day TTL)
Future: temporal edge decay weighting, rolling 30-day network animation
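
The snapshot comparison above reduces to set operations. The sketch below assumes each snapshot's edges are available as a list of entity-name pairs; the function names are hypothetical.

```python
from collections import Counter


def normalize(edges):
    """Treat edges as undirected: (A, B) and (B, A) are the same edge."""
    return {tuple(sorted(e)) for e in edges}


def new_edges(today, yesterday):
    """Edges present today that were absent yesterday (set difference)."""
    return normalize(today) - normalize(yesterday)


def degree_counts(edges):
    """Number of connections per entity."""
    degree = Counter()
    for a, b in normalize(edges):
        degree[a] += 1
        degree[b] += 1
    return degree


def centrality_drift(today, yesterday):
    """Map entity -> change in degree, for entities whose degree changed."""
    d_today, d_yesterday = degree_counts(today), degree_counts(yesterday)
    drift = {}
    for node in set(d_today) | set(d_yesterday):
        delta = d_today[node] - d_yesterday[node]
        if delta != 0:
            drift[node] = delta
    return drift
```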

AI-Generated Daily Insight

Each morning at 8 AM ET, an AI model generates a 3-paragraph network intelligence briefing. The prompt is structured for institutional investors: which entities dominate, what cross-sector connections are emerging, and what's changing. Every sentence must answer “so what” - implications, not just descriptions.

Input: top 10 entities by centrality, new edges in 24h, sector distribution
Output: 3 paragraphs - dominance, cross-sector, momentum
Tone: institutional wire-service style, no jargon
Audience: institutional investors, PE professionals, corporate strategists

Visualization

The network is rendered as an interactive D3.js v7 force-directed graph with custom SVG rendering. Nodes are sized by ML momentum score (square root scale) and colored by ML cluster assignment using jewel-tone radial gradients - with BICS sector color as fallback for unclustered entities. The visualization is pre-rendered to static HTML + JSON on S3 and served via CloudFront for zero-latency page loads.

Force simulation: forceLink (distance by weight), forceManyBody (charge), forceCollide (collision detection)
Node sizing: ML momentum score (0-100), square root scale for perceptual accuracy
Node coloring: ML cluster assignment (KMeans), sector fallback for unclustered
Node styling: 3-stop radial gradients, cluster-colored borders, feGaussianBlur glow on high-centrality
Interactions: hover highlighting (connected nodes stay, others fade), drag, zoom/pan
Sector filter pills: remove/add nodes from simulation dynamically
Label pills: dark background behind text for readability on complex graphs
Rising Entities panel: top 10 by momentum score with hover tooltips explaining the momentum formula
Centrality Movers panel: rising and falling entities with degree-change tooltips showing connection gains/losses
Entity Watchlists: users can subscribe to watch specific entities with 4 alert triggers (momentum threshold, momentum delta, cluster change, new connection)
Cross-cluster article trigger: when new edges cross ML cluster boundaries with momentum > 50, the system auto-generates articles (max 2/day)
Pre-rendered: zero compute at request time, 1-hour cache TTL

Plocamium Signal Index (PSI)

The Plocamium Signal Index (PSI) is a daily composite z-score measuring narrative disruption across the entity graph. It is not a sentiment score. It is not a volume count. PSI measures how much the graph’s behavior deviates from its own learned baseline - capturing the structural shifts that precede consensus.

Five orthogonal input signals:

Attention Cascade Intensity - Hawkes process residuals. Measures surprising coverage volume after accounting for self-excitation (one story triggers more stories). The residual is the unexpected component - the attention that cannot be explained by prior attention.
Narrative Embedding Drift - Cosine distance between the vector-embedding centroid of this week’s entity-level text and last week’s. Measures change in what is being said, independent of how much is being said.
Graph Spectral Shift - Eigenvalues of the graph Laplacian tracked daily. Captures structural community reorganization - merging clusters, splitting groups, new bridge entities. Orthogonal to node-level metrics by construction.
Sentiment-Momentum Divergence - Residual of targeted sentiment regressed on momentum score. The part of sentiment not explained by attention volume. An entity getting quietly negative while staying prominent is a contrarian signal.
Source Concentration - Herfindahl-Hirschman Index (HHI) across source domains per entity. High HHI means one source is driving the narrative. Low HHI means consensus is forming across multiple outlets.

Signal processing pipeline:

Orthogonalization: PCA decomposition on the 5-signal matrix. Components with eigenvalue > 1 are retained. VIF screening rejects any input with VIF > 5, so that no signal is close to a linear combination of the others.
Regime Detection: Bayesian Online Changepoint Detection (BOCPD, Adams & MacKay 2007). Maintains a posterior distribution over run length. On changepoint detection, PSI enters a 7-day transition window with widened confidence bands and a visible “regime shift detected” flag.
Composite: Inverse-variance weighted sum of orthogonal components. More precise signals receive higher weight automatically - no manual tuning.
Smoothing: Kalman filter separates three components: trend (reported as PSI), mean-reverting deviations (trigger alerts), and noise (discarded). Produces native confidence intervals on every estimate.
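
The inverse-variance composite step can be illustrated with a short sketch. It assumes each orthogonal component has a history of values from which a sample variance is estimated; the function names and the use of the sample variance are assumptions.

```python
from statistics import variance


def inverse_variance_weights(histories):
    """Weights proportional to 1/variance of each component, summing to 1."""
    inverse = []
    for history in histories:
        var = variance(history)  # sample variance; needs at least 2 points
        if var <= 0:
            raise ValueError("component has zero variance; weight is undefined")
        inverse.append(1.0 / var)
    total = sum(inverse)
    return [w / total for w in inverse]


def composite(latest_values, weights):
    """Weighted sum of the latest value of each component."""
    return sum(w * x for w, x in zip(weights, latest_values))
```

A component whose variance is four times larger receives one quarter of the weight, which is the sense in which more precise signals count for more without manual tuning.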

Output & calibration:

Z-score with plain-English labels: < −1.0 “Unusually Quiet” · −1.0 to +1.0 “Normal” · +1.0 to +2.0 “Elevated” · +2.0 to +3.0 “High Activity” · > +3.0 “Extreme”
Calibration: The first 30 days use percentile ranking (non-parametric). On day 30, the full model trains and the validation suite runs. If any test fails, PSI stays in calibration mode until all tests pass.
Audit trail: Every daily computation writes raw signals, PCA loadings, regime state, Kalman estimate + confidence interval, feature attribution, and driving entities to the database. Every PSI value traces back to source feeds and timestamps.
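
The label bands above map to a simple lookup. The text does not say which band owns the exact boundary values (for example, z = +1.0), so the choice below, where each boundary belongs to the lower band except at −1.0, is an assumption.

```python
def psi_label(z):
    """Plain-English label for a PSI z-score."""
    if z < -1.0:
        return "Unusually Quiet"
    if z <= 1.0:
        return "Normal"
    if z <= 2.0:
        return "Elevated"
    if z <= 3.0:
        return "High Activity"
    return "Extreme"
```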

Entity Intelligence Cards

A visual data card embedded alongside every published article, showing five proprietary metrics for each primary entity. Article prose stays clean - the card provides the quantitative layer. Each card carries the current PSI value as a market-context watermark, connecting the entity-level view to the system-level signal.

Card metrics:

→ Momentum Score - Existing 0-100 weighted composite. Full weight at 30+ observations, linear ramp below.
→ Sentiment Trajectory - 7-day sparkline from targeted sentiment EMA. Suppressed below 5 datapoints (direction arrow only).
→ Network Position - Personalized PageRank conditioned on sector (not raw degree centrality - PPR captures influence relative to sector context). Requires 5+ edges.
→ Narrative Drift - Vector embedding cosine distance week-over-week. Labeled “emerging” below 4 weeks of data.
→ Sentiment-Momentum Divergence - The contrarian indicator. When sentiment and momentum decouple, something is changing beneath the surface.

Eligibility & confidence:

→ Eligibility thresholds: 5+ co-occurrence edges, 3+ signal appearances in 14 days, momentum > 0
→ Confidence indicator: Each metric shows data sufficiency - “dense data” vs. “emerging” - so readers can assess reliability at a glance
→ PSI watermark: Every card displays the current PSI value in the corner, connecting the entity-level view to the market-level signal

Statistical Validation

Plocamium holds its own models to institutional quantitative research standards. Every signal must earn its place through statistical validation - not intuition, not backtesting convenience, not narrative appeal. PSI does not leave calibration until it passes five independent gates.

Five validation gates:

Permutation test (p < 0.01, 1,000 shuffles) - Proves the signal is not random noise. If shuffled inputs produce equivalent output, the signal has no information content.
Subsample stability - Consistent direction across 3 non-overlapping time periods. A signal that works only in one sub-period is overfit to that regime, not generalizable.
Walk-forward cross-validation - Train on months 1-N, test on month N+1. Signal must show positive information coefficient in >60% of out-of-sample windows. No look-ahead bias.
Factor regression - Significant alpha after controlling for raw news volume and sector momentum. Proves PSI contains proprietary information, not repackaged public data.
Decay analysis - Characterized half-life. Narrative signals typically decay 1-5 days for breaking news, 2-8 weeks for structural shifts. PSI must have a well-defined decay profile to be actionable.
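
The first gate can be sketched as a standard permutation test. The test statistic used here (absolute Pearson correlation between the signal and some outcome series) is an assumption chosen for illustration; the document does not specify what PSI is tested against. The 1,000 shuffles and the p < 0.01 bar come from the text.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(signal, outcome, n_shuffles=1000, seed=0):
    """Share of shuffles whose |correlation| matches or beats the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(signal, outcome))
    shuffled = list(signal)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys any real alignment with the outcome
        if abs(pearson(shuffled, outcome)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


def passes_permutation_gate(signal, outcome, alpha=0.01):
    return permutation_p_value(signal, outcome) < alpha
```

With 1,000 shuffles the smallest attainable p-value is 1/1001, so the p < 0.01 bar is reachable only when fewer than ten shuffles match the observed statistic.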

Cold start & confidence:

Hierarchical Bayesian priors: New entities borrow baseline distributions from their BICS sector cohort. As data accumulates, the entity’s posterior migrates toward its own data (James-Stein shrinkage). No entity starts from zero.
Confidence ramps: Below 30 observations, signal weight scales linearly (n/30). Below 10 observations, the signal is suppressed entirely. No thin-data artifacts.
No black boxes: Every PSI value traces back to source signals, feed URLs, and timestamps. The audit trail is the proof. If a value cannot be explained, it is not published.
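
The confidence ramp and the cold-start shrinkage above can be sketched as follows. The ramp (suppressed below 10 observations, n/30 up to 30, full weight after) follows the text. The shrinkage formula, a simple weighted blend of the entity mean and its sector mean with a hypothetical prior strength of 30, is an assumption that stands in for the full hierarchical model.

```python
def confidence_weight(n_obs, full_at=30, suppress_below=10):
    """Signal weight as a function of observation count."""
    if n_obs < suppress_below:
        return 0.0  # too little data: suppress the signal entirely
    return min(n_obs / full_at, 1.0)


def shrunk_mean(entity_mean, sector_mean, n_obs, prior_strength=30):
    """Blend an entity's own mean with its sector cohort mean.

    With no data the estimate equals the sector mean; as observations
    accumulate it moves toward the entity's own mean.
    """
    w = n_obs / (n_obs + prior_strength)
    return w * entity_mean + (1.0 - w) * sector_mean
```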

Technology Stack

NLP Entity Recognition · AI Classification · BICS Taxonomy · KMeans Clustering · Isolation Forest · Kalman Filter · BOCPD · Vector Embeddings · Hawkes Process · Personalized PageRank · D3.js v7

Pipeline Flow

End-to-End Intelligence Pipeline

Signal Ingestion (150+ sources, every 2 hours) → Topic Discovery (acceleration scoring) → Entity Extraction (NLP NER) → BICS Enrichment (AI classifier) → ML Scoring (momentum + clustering) → PSI Signal Index (z-score, 5 signals) → Network Graph (graph database) → Daily Insight (AI narrative, 8 AM ET) → D3.js Visualization (interactive network) → Watchlists (email alerts, 4 triggers) → Entity Cards (5 metrics per entity)

↻ Closed Loop: ML scores feed back into Topic Selection (momentum weighting) · Article Scoring (quality bonus) · Social Posts (prompt enrichment)

Frequently Asked Questions

What is the Plocamium Intelligence Network?

An interactive visualization mapping entity relationships across healthcare, industrials, and geopolitics. It shows how companies, regulators, and decision-makers connect through shared news signals and published analysis.

How often is the network updated?

The network visualization is re-rendered daily at 8 AM ET. Signal ingestion runs every 2 hours from 150+ sources via Kinesis Firehose. Entity extraction runs on every signal ingestion and again on published articles. Entity enrichment occurs immediately on article publish and in a weekly batch re-enrichment cycle.

What is BICS and why is it used?

BICS (Bloomberg Industry Classification System) is the institutional-grade taxonomy used by global asset managers for sector classification. It provides four levels of granularity: Sector → Industry Group → Industry → Sub-Industry. Using BICS ensures compatibility with how institutional investors already categorize companies.

How are entities connected in the graph?

Two entities are connected when they co-occur in two or more shared signals or articles. Each signal gets a unique content-hash key (e.g., HEALTHCARE-3a5e65588bb5) so co-occurrence reflects real editorial relationships. Additionally, relationships inferred by AI during BICS enrichment create edges even without co-occurrence. The graph is stored in the database (Neptune was retired March 2026). Edge weight reflects the number of shared items. This approach surfaces non-obvious relationships - entities appearing together in deal coverage, regulatory filings, and market signals before formal announcements.

What does the daily insight tell me?

An AI-generated briefing written for institutional investors, highlighting three things: which entities and sectors currently dominate the network, any surprising cross-sector connections worth watching, and what relationships are strengthening or emerging. Every sentence answers “so what” - implications for capital allocation, not just descriptions.

Can I access detailed entity data?

Detailed signal timelines, entity relationship graphs, BICS profiles, and coverage analysis are available to Plocamium clients and partners. Click any entity node on the network to request access, or contact us directly.

How does variance analysis work?

Network state is archived daily as JSON snapshots. New edges are detected by comparing today’s graph against yesterday’s. Centrality drift tracks which entities are gaining or losing connections over time. The daily insight narrative explicitly highlights these directional signals.

What are the ML models and how do they work?

Two proprietary models retrain weekly: an Entity Momentum Scorer computes a 0-100 score from 9 features (mention velocity, deal context, recency, relationship density, market cap, enrichment completeness, and more). A Sector Clustering model (KMeans) groups entities by behavioral similarity rather than just BICS label - revealing clusters like “deal-active heavyweights” or “resource geographies” that span traditional sector boundaries. Momentum drives node sizing; clusters drive node coloring.

What is the Rising Entities panel?

The Rising Entities panel surfaces the top 10 entities by ML momentum score. These are entities where multiple signals converge: increasing mention velocity, recent deal context, growing relationship networks, and fresh enrichment. It gives investors an at-a-glance view of where attention is accelerating - before consensus catches up.

How do ML models affect content production?

The ML models feed into three pipeline stages. Topic selection: the Trend Acceleration Engine blends signal acceleration (60%) with ML momentum (40%), so structurally important entities outrank noise spikes. Article scoring: drafts covering high-momentum entities receive a scoring bonus (+0.5 to +1 point), making them less likely to be rejected. Social posts: generation prompts are enriched with momentum context and cluster-crossing flags, directing the AI to emphasize the most newsworthy entities and cross-sector convergences.

What is a cluster-crossing signal?

When an article covers entities from different ML clusters - data-driven groupings that often span traditional BICS sector boundaries - the system flags this as a cluster-crossing signal. For example, an article connecting a “deal-active heavyweights” cluster entity (UPMC) with a “resource geography” cluster entity (Permian Basin) would trigger this flag. These cross-sector convergences often indicate emerging deal activity or structural market shifts that single-sector analysis would miss.

What is an entity watchlist?

Users can subscribe to watch specific entities and receive a daily email digest when a tracked entity crosses a momentum threshold, shows a significant momentum shift, changes ML cluster, or gains a new graph connection. These are the four alert trigger types. Watchlists use double opt-in confirmation via email.

How does source diversity work?

RSS feeds are shuffled randomly each run and capped at 3 items per domain, preventing any single outlet from dominating the signal pool. Three complementary source types (151 RSS feeds, NewsAPI.ai, and Mediastack) ensure broad geographic and editorial diversity. This produces more representative entity co-occurrence data and reduces bias toward any particular news source.

What are centrality movers?

The Rising/Falling Centrality panel tracks which entities are gaining or losing graph connections compared to the previous day. Rising centrality means more co-occurrence edges formed - the entity appeared alongside more peers in recent signals. Falling centrality means connections dropped. Tooltips show the exact degree change, giving investors a quick read on which entities are becoming more or less central to the network.

What is the Plocamium Signal Index (PSI)?

A daily composite z-score measuring how much the entity graph is deviating from its own learned baseline. It combines five orthogonal signals - attention cascades (Hawkes process), narrative drift (vector embeddings), graph structure (spectral shift), sentiment divergence, and source concentration (HHI) - into a single regime-aware indicator. PSI uses Bayesian Online Changepoint Detection to distinguish permanent shifts from temporary anomalies, and a Kalman filter to separate signal from noise. Every value comes with a confidence interval and a full audit trail.

How is PSI different from a sentiment score?

Sentiment measures tone (positive/negative). PSI measures structural disruption in the entity network - new relationships forming, communities reorganizing, narratives shifting. A market can be highly negative (bearish sentiment) but structurally quiet (low PSI), or neutral in tone but structurally chaotic (high PSI). They measure fundamentally different things.

What are Entity Intelligence Cards?

Visual data cards showing 5 proprietary metrics - momentum score, sentiment trajectory (7-day sparkline), network position (Personalized PageRank), narrative drift (embedding distance), and sentiment-momentum divergence - for each primary entity in an article. Cards include confidence indicators so readers can assess data sufficiency. Each card carries the current PSI value as a market-context watermark.

How does PSI handle regime changes?

BOCPD (Bayesian Online Changepoint Detection) maintains a posterior distribution over how long since the last structural break. When a changepoint is detected, PSI enters a 7-day transition window with widened confidence bands and a visible “regime shift detected” flag. After transition, baseline parameters are re-estimated from post-break data only. No arbitrary rolling windows - the model adapts to the data.

What validation does PSI undergo?

Five statistical gates before leaving calibration: permutation testing (p < 0.01, 1,000 shuffles), subsample stability across non-overlapping periods, walk-forward cross-validation with >60% positive information coefficient, factor regression proving proprietary alpha beyond raw volume, and decay analysis characterizing signal half-life. These standards are consistent with institutional quantitative research practices.

Plocamium Signal Index (PSI)

The Plocamium Signal Index is a composite z-score that measures the prevailing market intelligence regime for each entity. It combines five orthogonal signals into a single dimensionless score, processed through six layers of statistical refinement - from raw signal ingestion through Kalman-smoothed output. PSI is the proprietary intelligence layer embedded in every article and entity card published on the platform, giving institutional readers a quantitative anchor to supplement qualitative analysis.

Composite of 5 orthogonal signals, each measuring a distinct dimension of entity activity
Six processing layers: raw signals → PCA orthogonalization → BOCPD regime detection → inverse-variance weighting → Kalman smoothing → z-score output
Published as a labeled score on every article and entity intelligence card
Regime classification from QUIET to EXTREME provides immediate interpretive context
Auto-validates at day 30 via five independent statistical gates; weekly thereafter


Five Orthogonal Signals

Each signal is designed to capture a distinct, non-redundant dimension of entity behavior. Orthogonality is enforced at the processing layer via PCA, but the five signals are conceptually independent by construction: attention volume, narrative direction, structural network position, sentiment-momentum divergence, and source breadth.

S1 - Attention Cascade Intensity

Models coverage volume as a Hawkes self-exciting process: each mention increases the probability of subsequent mentions
Computes the residual between observed coverage volume and the volume predicted by the entity’s own historical excitation pattern
A high residual indicates the entity is attracting more attention than its own momentum predicts - an external catalyst, not self-sustaining noise
Interpretation: positive residual = structural attention surge; negative = momentum decay
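
The residual idea in S1 can be illustrated with a discrete-time approximation of a self-exciting process. This is a simplified sketch, not a fitted Hawkes model: the baseline rate `mu`, the excitation strength `alpha`, and the per-step `decay` are hypothetical fixed values, whereas in practice they would be estimated from each entity's history.

```python
def hawkes_residuals(counts, mu=1.0, alpha=0.5, decay=0.5):
    """Observed minus expected mention counts per time step.

    Expected count = baseline `mu` plus `alpha` times an exponentially
    decayed sum of all earlier counts (the self-excitation term).
    """
    residuals = []
    excitation = 0.0  # decayed memory of past mentions
    for observed in counts:
        expected = mu + alpha * excitation
        residuals.append(observed - expected)
        # Fold the current count into the memory, then decay it one step.
        excitation = decay * (excitation + observed)
    return residuals
```

A steady stream of mentions produces small residuals because the model expects follow-on coverage, while a sudden jump produces a large positive residual that the entity's own history does not explain.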

S2 - Narrative Embedding Drift

Generates weekly entity description centroids using text embeddings (1536-dimensional)
Computes cosine distance between consecutive weekly centroids
High drift signals that the story surrounding the entity is changing direction - reputational shift, strategic pivot, or emerging deal narrative
Low drift over time = stable narrative; sudden high drift = context break worth investigating
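
The drift computation in S2 amounts to a cosine distance between two centroids. The sketch below uses short plain-Python vectors for readability; real embeddings would have 1536 dimensions as noted above, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def narrative_drift(last_week_vectors, this_week_vectors):
    """Cosine distance between consecutive weekly centroids."""
    return cosine_distance(centroid(last_week_vectors), centroid(this_week_vectors))
```

Because cosine distance ignores vector length, the measure responds to a change in the direction of the coverage rather than to the volume of text.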

S3 - Graph Spectral Shift

Applies Laplacian eigenvalue analysis to the entity co-occurrence graph snapshot
Tracks changes in the Laplacian spectrum week-over-week to detect structural changes in community topology
Captures cluster merges, splits, and entity migrations between communities that are invisible to node-level metrics
Interpretation: a rising spectral gap indicates strengthening community structure; a collapsing gap signals structural fragmentation

S4 - Sentiment-Momentum Divergence (SMD)

Runs OLS regression of sentiment on momentum across the entity corpus to establish the expected sentiment at a given momentum level
Computes the residual: actual sentiment minus predicted sentiment
A large positive residual means sentiment is running ahead of what momentum would predict - potential contrarian signal
A large negative residual means sentiment is deteriorating despite strong momentum - early-warning divergence often preceding corrections
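
The regression-and-residual step in S4 can be sketched with a single-variable least-squares fit. The function names are hypothetical, and the example assumes one momentum value and one sentiment value per entity.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def smd_residuals(momentum, sentiment):
    """Actual sentiment minus the sentiment predicted from momentum, per entity."""
    slope, intercept = ols_fit(momentum, sentiment)
    return [s - (intercept + slope * m) for m, s in zip(momentum, sentiment)]
```

OLS residuals sum to zero across the corpus, so each entity's value reads as a deviation relative to its peers at the same momentum level.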

S5 - Source Concentration (HHI)

Computes the Herfindahl-Hirschman Index across news source domains covering the entity
High HHI (coverage concentrated in few outlets) indicates potential information asymmetry - institutional or specialist sources breaking stories before broader dissemination
Low HHI (broad source diversity) indicates consensus coverage - widely known, low information edge
Contrarian interpretation: high-HHI entities may offer earlier signal; low-HHI entities are already in consensus
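
The index itself is simple to compute. A minimal sketch, assuming one source domain per signal and shares expressed as fractions (so HHI ranges from 1/N for N equally represented domains up to 1.0 for a single domain):

```python
from collections import Counter

def source_hhi(domains):
    """Herfindahl-Hirschman Index over the source domains covering an entity."""
    counts = Counter(domains)
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())

concentrated = source_hhi(["a.example"] * 4)
diverse = source_hhi(["a.example", "b.example", "c.example", "d.example"])
```

Here `concentrated` is 1.0 and `diverse` is 0.25; the domain names are placeholders.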

09b

Signal Processing Pipeline

Raw signals pass through six processing layers before producing a PSI score. Each layer serves a specific statistical purpose: removing redundancy, detecting regime changes, weighting by reliability, and smoothing noise without obscuring genuine transitions.

Layer 1 - Raw Signal Computation: Each of the five signals computed independently per entity using signal history and graph state
Layer 2 - PCA Orthogonalization: The Kaiser criterion (eigenvalue > 1) determines which principal components are retained; signals with VIF > 5 are rejected to eliminate multicollinearity. This ensures the composite score reflects genuinely independent information
Layer 3 - BOCPD Regime Detection: Bayesian Online Changepoint Detection (Adams & MacKay, 2007) using Normal-Inverse-Gamma conjugate prior. Identifies structural breaks in the signal distribution with a 7-day transition window, allowing the model to adapt to new regimes without overfitting to noise
Layer 4 - Inverse-Variance Weighted Composite: Signals are combined using inverse-variance weights, with weights updating automatically as signal reliability evolves. More stable signals receive proportionally higher weight
Layer 5 - Kalman Smoothing: Harvey (1989) local linear trend model with Rauch-Tung-Striebel backward pass. Decomposes the composite into trend, mean-reverting, and noise components. Produces a smoothed score that lags genuine transitions minimally while filtering measurement noise
Layer 6 - Z-Score Output: Kalman-smoothed composite standardized against the rolling entity-level distribution. Output is a dimensionless z-score with regime label assigned per the classification table
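
Layers 4 and 6 lend themselves to a short illustration. The sketch below assumes each signal's variance is estimated from its own recent history and that standardization uses the sample mean and standard deviation of a rolling window; the window lengths and the intermediate Kalman step are omitted, and all names are illustrative:

```python
import statistics

def inverse_variance_weights(signal_histories):
    """Weights proportional to 1 / variance, normalized to sum to 1."""
    inverse = {
        name: 1.0 / statistics.variance(values)
        for name, values in signal_histories.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def composite(latest, weights):
    """Weighted sum of the latest value of each signal."""
    return sum(weights[name] * latest[name] for name in weights)

def z_score(value, history):
    """Standardize a value against a rolling entity-level history."""
    return (value - statistics.mean(history)) / statistics.stdev(history)

weights = inverse_variance_weights({"s1": [1, 2, 3], "s2": [0, 2, 4]})
score = composite({"s1": 1.0, "s2": 6.0}, weights)
```

In this toy case s1 has variance 1 and s2 has variance 4, so s1 receives 80% of the weight and the composite is 2.0.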

09c

Regime Classification

The PSI z-score is mapped to five labeled regimes, providing immediate interpretive context for non-quant readers. Regime labels appear on entity cards and in article metadata.

PSI Range	Regime	Interpretation
< −1	QUIET	Below-normal signal activity across all five dimensions
−1 to +1	NORMAL	Baseline market conditions; no anomalous signal detected
+1 to +2	ELEVATED	Heightened activity across signal dimensions; monitor closely
+2 to +3	HIGH	Significant regime shift detected; multiple signals converging
> +3	EXTREME	Rare event - maximum signal intensity across all dimensions
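
The mapping can be expressed as a small lookup function. The table does not say which regime owns the shared boundary values at +1 and +2, so this sketch assumes a boundary value falls in the lower regime; that choice is an assumption, not documented behavior:

```python
def classify_regime(psi):
    """Map a PSI z-score to a regime label (boundary values go to the lower regime)."""
    if psi < -1:
        return "QUIET"
    if psi <= 1:
        return "NORMAL"
    if psi <= 2:
        return "ELEVATED"
    if psi <= 3:
        return "HIGH"
    return "EXTREME"
```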

09d

Cold Start Handling

Newly tracked entities have insufficient signal history to produce reliable PSI scores. The cold start protocol applies Bayesian shrinkage and graduated confidence ramps to avoid publishing noisy scores for entities with sparse data.

James-Stein shrinkage: raw signals are shrunk toward BICS sector priors derived from all entities in the same sector. New entities inherit partial sector-level expectations rather than starting from zero
Linear confidence ramp: PSI confidence scales linearly from 0 to 1 over the first 30 observations (n/30). Score is suppressed entirely below 10 observations to prevent spurious early signals from surfacing in article metadata
Percentile calibration: during the first 30 days, scores are reported as sector-relative percentile ranks rather than absolute z-scores, providing interpretable output before full distribution estimation is possible
Sector inheritance: until an entity has sufficient history, its regime classification defaults to the regime implied by the sector median PSI, clearly labeled as an estimated value
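
Two of these steps can be sketched directly. The confidence ramp follows the n/30 rule and the 10-observation floor stated above. The shrinkage function is the textbook positive-part James-Stein estimator applied to an entity's signal vector, with the sector prior as the shrinkage target and a known noise variance; the exact variant and variance estimate used in production are not stated, so treat those as assumptions:

```python
def psi_confidence(n_obs, ramp=30, min_obs=10):
    """Linear confidence n/ramp, capped at 1.0; None means the score is suppressed."""
    if n_obs < min_obs:
        return None
    return min(n_obs / ramp, 1.0)

def james_stein_shrink(raw, sector_prior, noise_var):
    """Positive-part James-Stein shrinkage of a signal vector toward a sector prior.

    Requires at least 3 dimensions; noise_var is the assumed per-signal noise variance.
    """
    p = len(raw)
    dist_sq = sum((r - q) ** 2 for r, q in zip(raw, sector_prior))
    if dist_sq == 0:
        return list(raw)
    factor = max(0.0, 1.0 - (p - 2) * noise_var / dist_sq)
    return [q + factor * (r - q) for r, q in zip(raw, sector_prior)]
```

For a five-signal vector [2, 0, 0, 0, 0], a zero prior, and unit noise variance, the shrinkage factor is 1 - 3/4 = 0.25, so the first signal is pulled from 2.0 to 0.5.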

09e

Entity Intelligence Cards

Every qualifying entity in the network receives an intelligence card surfaced alongside related articles. Cards present five computed metrics that give investors an at-a-glance quantitative profile without requiring them to interpret the underlying graph or signal time series.

Momentum: ML momentum score (0-100), the entity’s weighted composite of 9 behavioral features including mention velocity, deal activity, and relationship density
Sentiment Trajectory: 7-day rolling directional change in targeted entity-level sentiment, extracted via NLP sentiment analysis on per-signal text windows
Network Position (PPR): Personalized PageRank score measuring structural importance in the co-occurrence graph - entities connected to other well-connected entities score higher
Narrative Drift: S2 signal value (cosine distance between weekly description centroids) indicating how much the entity’s story is changing
SMD: Sentiment-Momentum Divergence residual (S4), flagging when sentiment departs from what momentum predicts

Eligibility Criteria

Minimum 5 graph edges (sufficient network embeddedness for PPR to be meaningful)
At least 3 distinct signals received in the prior 14 days
Momentum score above 0 (entity must show at least baseline activity)

09f

Validation Framework

PSI is subject to a five-gate statistical validation protocol that fires automatically at day 30 and runs weekly thereafter. The framework is designed to catch signal decay, data-snooping bias, and spurious correlations before they propagate into published intelligence.

Gate 1 - Permutation Test: PSI scores are permuted against entity outcomes 1,000 times. The observed information coefficient must exceed the 99th percentile of the null distribution (p < 0.01) to pass
Gate 2 - Subsample Stability: The full entity corpus is split into three temporal subsamples. Signal weights must be directionally consistent across all three periods to confirm robustness rather than sample-specific fitting
Gate 3 - Walk-Forward Cross-Validation: Rolling origin backtests with expanding training windows. PSI must achieve positive information coefficient (IC) on more than 60% of out-of-sample periods to pass
Gate 4 - Factor Regression: PSI is regressed on news volume and sector momentum to isolate alpha. The model must retain significant explanatory power after controlling for these known confounders
Gate 5 - Decay Analysis: Signal half-life is estimated by regressing IC on forward horizon. A half-life below 3 days triggers a weight recalibration; below 1 day triggers a full signal review
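
Gate 1 can be illustrated with a short permutation test. This sketch uses Pearson correlation as a stand-in for the information coefficient (the production IC definition is not given here), a one-sided comparison, and the standard (count + 1) / (permutations + 1) p-value; the seed parameter is only there to make the example reproducible:

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(scores, outcomes, n_perm=1000, seed=0):
    """Share of shuffled outcome orderings whose correlation matches or beats the observed one."""
    rng = random.Random(seed)
    observed = pearson(scores, outcomes)
    shuffled = list(outcomes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(scores, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

With 1,000 permutations the smallest attainable p-value is 1/1001, which is why the gate's p < 0.01 threshold is meaningful at that permutation count.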

09g

Link Prediction Beta

The link prediction model forecasts which entities will become co-occurrence graph neighbors before they appear together in articles - identifying emerging relationships from structural and behavioral signals rather than waiting for editorial co-mention. This predictive layer is in beta, validated via edge holdout against accumulated graph history.

Three-model ensemble: graph heuristics (common neighbors, Jaccard similarity, Adamic-Adar index), logistic regression on entity feature pairs, and a Graph Convolutional Network (GCN) trained on the co-occurrence graph topology
Feature inputs: BICS sector alignment, deal context overlap, momentum score proximity, PPR proximity, geographic co-location, shared signal lane history
Validation: 20% edge holdout test split from the accumulated graph history. Ensemble precision, recall, and AUC-ROC evaluated monthly
Output: A daily list of high-probability predicted edges surfaced to the intelligence dashboard as “Emerging Connections” - relationships that do not yet exist in the graph but are structurally likely within a 7-30 day window
Status: Accumulating 3 months of edge history for initial training; target production deployment Q2 2026
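
The three graph heuristics named above have standard definitions, sketched here on a graph stored as a dictionary of neighbor sets. The node names are placeholders and the graph representation is an assumption of this example:

```python
import math

def common_neighbors(adj, u, v):
    """Set of nodes adjacent to both u and v."""
    return adj[u] & adj[v]

def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors; rare neighbors count more."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1
    )

# A and B are not yet linked but share neighbors C and D.
graph = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"B"},
}
```

For the unlinked pair A and B, the Jaccard score is 2/3 and the Adamic-Adar score is 2 / ln 2, both of which would rank the pair as a likely future edge.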

09h

Advanced Analytics

Beyond the core PSI score, the intelligence platform supports a suite of advanced analytical layers for clients requiring deeper quantitative context. These modules operate on the same underlying signal infrastructure and are available via the client portal.

Signal decomposition: Per-entity breakdown of which of the five signals contributes most to the current PSI score, enabling attribution analysis (e.g., “PSI is ELEVATED primarily due to Narrative Drift and Source Concentration”)
Sector rotation heatmap: Aggregate PSI by BICS sector, updated daily, showing which sectors are in ELEVATED or HIGH regimes simultaneously - a macro-level intelligence view for portfolio allocation decisions
Lead-lag detection: Cross-correlation analysis between connected entity PSI series identifies which entities systematically lead or lag their peers in the co-occurrence graph - useful for identifying bellwether entities within a sector
Peer-relative PSI: Entity PSI expressed as a z-score versus the sector median, normalizing for sector-wide regime shifts and isolating entity-specific signal from broad market conditions
Event study framework: Tracks PSI behavior in the 30 days before and after confirmed events (deal announcements, regulatory decisions, earnings) to calibrate signal lead times and validate predictive utility
Clinical trials bridge: Trial status transitions (Phase I → II, enrollment open/closed, primary endpoint readout) are ingested as discrete signals, enabling PSI to reflect pipeline events for healthcare entities not yet captured in general news flow
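
As an example of one of these modules, lead-lag detection can be sketched as a search over lagged correlations between two PSI series. The sketch assumes equally spaced daily scores, series long enough that every lagged slice still varies, and Pearson correlation as the similarity measure; a positive best lag means the first series leads the second:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return pearson(a, b)

def best_lead_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: lagged_correlation(x, y, lag))

leader = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 1.0, 0.0, 2.0, 0.0, 1.0, 3.0]
follower = [0.0, 0.0] + leader[:-2]  # same pattern, two days later
```

Here the follower series repeats the leader's pattern two days later, so the best lag is +2.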

PSI Technology Stack

Hawkes Process · Vector Embeddings · Laplacian Spectral Analysis · OLS Regression · Herfindahl-Hirschman Index · PCA Orthogonalization · BOCPD (Adams & MacKay 2007) · Kalman Smoothing (Harvey 1989) · James-Stein Shrinkage · Personalized PageRank · Graph Convolutional Network · Walk-Forward Cross-Validation

Proprietary Defensibility

PSI’s defensibility rests on four compounding advantages that increase over time.

1 - Signal Combination Is Novel

Individual signals (Hawkes processes, spectral analysis, HHI) are well-documented in academic literature. PSI’s contribution is the specific five-signal combination with PCA orthogonalization, Bayesian changepoint detection, and Kalman smoothing as a unified pipeline. No published system combines these methods for healthcare intelligence.

The VIF rejection step ensures signal independence - adding a sixth signal (e.g., clinical trial transitions) only strengthens the composite if it provides genuinely new information.

2 - Graph Topology Is Earned

The entity co-occurrence graph underlying PSI took months of continuous ingestion from 172 RSS feeds, NewsAPI.ai, and Mediastack to construct. With 400+ active nodes, 800+ edges, and daily temporal history stored in the database, the graph captures relationship dynamics that cannot be replicated from static data.

A competitor starting today would need months of identical signal processing to build a comparable graph.

3 - Temporal Depth Compounds

Every day PSI runs, the model becomes more valuable:

Kalman smoother improves estimates with each observation
BOCPD regime detection requires history to identify changepoints
Lead-lag detection needs 7+ days of paired PSI scores
Event study backtesting grows more statistically powerful with each deal event
Link prediction accuracy improves as edge formation patterns accumulate (see also the weekly Fit Health Study)
Validation gates (day 30 auto-fire) build auditable statistical evidence

A 30-day PSI has materially different capabilities than a day-1 PSI. At 90 days, the link prediction model gains enough edge history for temporal features. At 180 days, event study results carry statistical significance. This temporal moat widens continuously.

4 - Six-Model Ensemble Is Redundant by Design

The link prediction system ensembles six models spanning three methodological families:

Classical graph theory: link heuristics (Jaccard, Adamic-Adar), counted as a single model
Feature-based machine learning: logistic regression on PSI features
Deep graph learning: GCN, Node2Vec, VGAE, TransE knowledge graph embeddings

Each model captures different structural patterns. When 5 or 6 models agree on a prediction, the confidence is qualitatively different from any single model’s output. This ensemble architecture is standard at institutional quantitative firms but unprecedented in healthcare intelligence.

Alternative Data Integration

PSI incorporates non-news signals that create information asymmetry:

SEC EDGAR filing detection: 8-K material events, 13-D activist disclosures, S-1 IPO registrations
Search velocity acceleration: mention momentum from 24,000+ daily signals
Clinical trial status transitions: recruiting → completed, suspended, terminated

These signals arrive hours to days before news coverage, providing temporal advantage over news-only intelligence systems.

Each of these advantages is backed by ongoing empirical work: ablation study, event study, validation gates, link prediction benchmarks, and the weekly fit health study are documented below with methodology and live result links. Full validation reports and ablation tables are available on request to qualified parties — see the Request Full Study Results block.

Studies & Validation

PSI’s statistical rigor is validated through continuous studies that measure component contribution and predictive accuracy.

Ablation Study

Each of the 6 link prediction models and 5 PSI signals is systematically removed to measure its marginal contribution to ensemble performance. This identifies which components are most critical and whether any are redundant.

Event Study

PSI scores are measured in 6 time windows around known deal announcements (pre-14d, pre-7d, pre-3d, event day, post-3d, post-7d) to quantify the index’s anticipatory power. Hit rate: percentage of events where PSI was ELEVATED or higher before the announcement.

Validation Gates (5 Tests)

Auto-fire at day 30 and weekly thereafter:

1. Permutation test (p < 0.01) - is PSI-outcome correlation real?
2. Subsample stability - consistent across time periods?
3. Walk-forward CV (>60% positive IC) - predictive out-of-sample?
4. Factor regression - adds alpha beyond news volume and sector momentum?
5. Decay analysis - how quickly does the signal decay (half-life)?

Link Prediction

6-model ensemble predicting future entity connections: classical heuristics (Jaccard, Adamic-Adar), logistic regression, GCN, Node2Vec (Grover & Leskovec 2016), VGAE (Kipf & Welling 2016), TransE (Bordes et al. 2013). Auto-optimized weights via Dirichlet search.
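
One plausible reading of "Dirichlet search" is a random search over ensemble weights sampled from a Dirichlet distribution, keeping the weights with the best validation ROC-AUC. The sketch below follows that reading; the sampling scheme, sample count, and objective are assumptions rather than the documented procedure:

```python
import random

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sample_dirichlet(k, rng, alpha=1.0):
    """Draw k non-negative weights summing to 1 by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def blend(model_scores, weights):
    """Weighted average of per-model scores for each candidate edge."""
    return [sum(w * m[j] for w, m in zip(weights, model_scores))
            for j in range(len(model_scores[0]))]

def dirichlet_search(model_scores, labels, n_samples=500, seed=0):
    """Random search over Dirichlet-sampled weights, starting from equal weights."""
    rng = random.Random(seed)
    k = len(model_scores)
    best_w = [1.0 / k] * k
    best_auc = roc_auc(blend(model_scores, best_w), labels)
    for _ in range(n_samples):
        w = sample_dirichlet(k, rng)
        auc = roc_auc(blend(model_scores, w), labels)
        if auc > best_auc:
            best_w, best_auc = w, auc
    return best_w, best_auc
```

When one model ranks held-out edges well and another ranks them poorly, the search shifts weight toward the stronger model instead of averaging the two equally.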

Request Full Study Results

Validation gate reports, ablation tables and event study data are available to qualified parties.


11b

Fit Health Study

Monthly diagnostic of under/overfit on the link-prediction stack. For each of nine models (Jaccard, Adamic-Adar, Common Neighbors, Preferential Attachment, Logistic Regression, GCN, Node2Vec, VGAE, TransE) plus the Dirichlet-weighted ensemble, we plot train-set and holdout-set ROC-AUC across five training-window sizes (7, 14, 30, 45, 60 days) using rolling-origin time-series cross-validation with five folds and a 7-day holdout.

The headline metric is fit-gap: train ROC-AUC minus holdout ROC-AUC. A fit-gap above 0.10 flags overfit; a gap below 0.05 with reasonable absolute AUC is healthy. The study also surfaces the underfit signal: when both curves stay low and flat across training sizes, the model likely has insufficient capacity.

Status Thresholds

OVERFIT — fit-gap > 0.10
WATCH — fit-gap between 0.05 and 0.10
HEALTHY — fit-gap < 0.05 with reasonable absolute AUC
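
The thresholds translate into a small status function. "Reasonable absolute AUC" is not quantified above, so the min_auc floor of 0.70 and the LOW_AUC label are hypothetical placeholders for this example, as is treating exact boundary values as WATCH:

```python
def fit_status(train_auc, holdout_auc, min_auc=0.70):
    """Classify a model's fit health from train and holdout ROC-AUC."""
    gap = train_auc - holdout_auc
    if gap > 0.10:
        return "OVERFIT"
    if gap >= 0.05:
        return "WATCH"
    # Small gap: healthy only if the holdout AUC itself is acceptable.
    return "HEALTHY" if holdout_auc >= min_auc else "LOW_AUC"
```

A model with train and holdout AUC both near 0.55 has a tiny fit-gap but is not healthy, which is the underfit case the study is meant to surface.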

Secondary Credibility Panels (top-50)

Two panels are reported alongside the learning curves but do not drive the fit-gap diagnosis. EDGAR co-filing hit rate measures whether the model surfaces real-world ties confirmed by SEC filings. PSI co-movement hit rate measures whether high-confidence predicted pairs move in lockstep on PSI regime change. Both serve as credibility checks on model output quality.

Explore the Network

See the live entity intelligence map with momentum scoring, cluster analysis, and daily AI-generated insights.
