Authors: James Tannahill, Plocamium Holdings
Version: Draft v1.1
Date: 2026-06-21
Abstract

Keywords: healthcare intelligence, entity networks, link prediction, graph neural networks, knowledge graph embeddings, Bayesian changepoint detection, Kalman filtering, composite indices

Draft v1.1 — 2026-06-21. Results current as of the 2026-06-21 Attention Diagnostic Suite, Phase-0 forward-holdout panel, GNN promotion audit, fit_study staging run, and the 2026-06-08 LF-1 anchored test (attempt #2, closed FAIL).


01

Introduction

The healthcare sector generates thousands of news signals daily across regulatory actions, M&A transactions, clinical trial outcomes, executive movements, and policy changes. Traditional intelligence systems process these signals reactively — Bloomberg reports the deal, Reuters covers the announcement, sentiment scores measure the reaction.

We propose the Plocamium Signal Index (PSI), an intelligence system that characterizes the information structure around healthcare entities. PSI was designed on the hypothesis that events in healthcare M&A, regulatory action, and strategic partnerships are preceded by observable changes in information flow — attention cascades, narrative shifts, graph restructuring, sentiment-momentum divergence, source concentration — days to weeks before formal announcements. That forecasting hypothesis was tested externally and did not hold (§6.4: on SEC-filing-dated deal events, PSI does not beat a naive mention-spike baseline). PSI’s demonstrated value is therefore as a structural and enrichment system — quantifying and explaining an entity’s information profile — rather than as an event forecaster. This paper reports that distinction honestly; §6 documents both what is and is not supported.

1.1 Contributions

  1. A framework of five orthogonal signals combining self-exciting processes (Hawkes), embedding drift (Titan), spectral analysis (Laplacian), regression residuals (OLS), and concentration metrics (HHI) for healthcare entity monitoring
  2. Six-layer processing pipeline with PCA orthogonalization, BOCPD regime detection, inverse-variance weighting, Kalman smoothing, James-Stein cold-start handling, and percentile calibration
  3. Reproducible eight-model link-prediction ensemble (TransE excluded for non-reproducibility) with auto-tuned weights and leave-one-out ablation; headline AUC + fit-gap measured by the monthly fit_study Fargate job
  4. Diagnostic-aligned production ensemble: eight models (Jaccard, Adamic-Adar, Common Neighbors, Preferential Attachment, Logistic Regression, GCN, Node2Vec, VGAE), Dirichlet-search weight optimizer, leave-one-out ablation reporting marginal contribution per model — deployed scoring matches the fit_study diagnostic exactly
  5. Production deployment on a live ~400-entity healthcare graph (150-entity display subset) processing ~140 real-time feeds with SEC EDGAR integration
  6. Leave-one-out ablation quantifying each model’s marginal contribution — and honestly surfacing that the production ensemble is predictively one-model (Node2Vec)
03

Data

3.1 Signal Sources

  • ~140 RSS feeds across 12 content lanes (Healthcare M&A, Industrials, GCC/LATAM, Government, Tech, etc.; count fluctuates as the feed-health system retires failing feeds — 140 as of 2026-06-11)
  • NewsAPI.ai (Event Registry) — 10 articles per query, 6-hour lookback
  • Mediastack — supplementary global news coverage
  • Total daily signal volume: ~24,000 signals/day

3.2 Entity Graph

  • 6,371 entities extracted via AWS Comprehend (targeted sentiment)
  • 400 entities / 12,791 edges in the backend co-occurrence graph (threshold >= 2 shared articles; as of 2026-06-11), pruned to a 150-entity visualization subset
  • Entity enrichment: BICS 4-level classification (Sector → Industry Group → Industry → Sub-Industry), geography, market cap tier, deal context

3.3 External Data

  • SEC EDGAR EFTS: 8-K material events, SC 13D activist disclosures, S-1 IPO registrations
  • Search velocity: Mention acceleration computed from the signal stream
  • Amazon Titan embeddings: 1,536-dimensional text embeddings for narrative drift measurement
04

Methodology

4.1 Five Orthogonal Signals

4.1.1 Attention Cascade Intensity (ACI)

We model entity mention arrivals as a self-exciting Hawkes process. For entity e, the conditional intensity function is:

λ(t) = μ_e + α Σ_{t_i < t} β exp(-β(t - t_i))

where μ_e is the baseline mention rate estimated from the trailing 14-day window, α is the self-excitation parameter (fixed at 0.5), and β is the decay rate (fixed at 1.0 day⁻¹). The ACI signal for entity e on day d is the standardized residual:

ACI_e(d) = (observed_count - expected_count) / σ_e

where expected_count is the integral of λ(t) over day d and σ_e is the historical standard deviation of residuals. Positive ACI indicates attention exceeding the entity’s own self-exciting baseline — a signal that exogenous information is driving coverage beyond endogenous momentum.
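The residual above can be sketched in Python as follows. This assumes mention timestamps are fractional days, day d is the interval [d, d+1), and σ_e is supplied from the entity's residual history; the function names are illustrative, not the production psi_compute API. Because the kernel is exponential, the integral of λ(t) over the day has a closed form, so no numerical quadrature is needed.

```python
import math
from typing import Sequence


def hawkes_expected_count(
    times: Sequence[float],
    day_start: float,
    mu: float,
    alpha: float = 0.5,
    beta: float = 1.0,
) -> float:
    """Integral over [day_start, day_start + 1) of
    lambda(t) = mu + alpha * sum_{t_i < t} beta * exp(-beta * (t - t_i)).

    Times are in days; events at or after the end of the window are ignored.
    """
    day_end = day_start + 1.0
    total = mu * (day_end - day_start)  # baseline contribution
    for t_i in times:
        if t_i >= day_end:
            continue  # future events cannot excite this window
        # The kernel only acts after t_i, so integrate from max(day_start, t_i).
        lower = max(day_start, t_i)
        # Closed form of the integral of alpha*beta*exp(-beta*(t - t_i)) from lower to day_end.
        total += alpha * (math.exp(-beta * (lower - t_i)) - math.exp(-beta * (day_end - t_i)))
    return total


def aci_residual(times, day_start, mu, sigma, alpha=0.5, beta=1.0):
    """Standardized residual (observed - expected) / sigma for one entity-day."""
    observed = sum(1 for t in times if day_start <= t < day_start + 1.0)
    expected = hawkes_expected_count(times, day_start, mu, alpha, beta)
    return (observed - expected) / sigma
```

Mentions from the previous day still contribute to today's expected count through the decaying kernel, which is what separates the Hawkes baseline from a flat moving average.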

4.1.2 Narrative Embedding Drift (NED)

For each entity e, we compute the centroid of Amazon Titan (1,536-dim) embeddings from all articles mentioning e in the trailing 7-day window. NED is the cosine distance between the current week’s centroid and the prior week’s centroid:

NED_e(d) = 1 - cos(c_e^{current}, c_e^{prior})

High NED indicates that the narrative context surrounding an entity has shifted: the same entity is being discussed in materially different terms. For emerging entities (first appearance), NED is set to 1.0 (the drift between orthogonal centroids), reflecting complete novelty. NED captures qualitative narrative change that volume-based signals miss entirely: an entity can maintain constant mention volume while undergoing a complete narrative reframing.
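A minimal sketch of the drift computation, assuming each window's article embeddings arrive as a NumPy array of shape (n_articles, 1536) and that the current window is non-empty. The function name is hypothetical; a missing prior window stands in for an emerging entity.

```python
from typing import Optional

import numpy as np


def narrative_drift(current: np.ndarray, prior: Optional[np.ndarray]) -> float:
    """NED = 1 - cos(centroid_current, centroid_prior).

    `current` and `prior` are (n_articles, dim) arrays of article embeddings.
    A missing or empty prior window marks an emerging entity and returns 1.0.
    """
    if prior is None or len(prior) == 0:
        return 1.0
    c_cur = current.mean(axis=0)      # centroid of this week's articles
    c_prior = prior.mean(axis=0)      # centroid of last week's articles
    cos = float(c_cur @ c_prior / (np.linalg.norm(c_cur) * np.linalg.norm(c_prior)))
    return 1.0 - cos
```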

4.1.3 Graph Spectral Shift (GSS)

We construct the entity co-occurrence graph G_d = (V, E_d) where edges are weighted by co-mention frequency in the trailing 14-day window. The normalized Laplacian is:

L = I - D^{-1/2} A D^{-1/2}

where A is the adjacency matrix and D is the degree matrix. We compute the top-k eigenvalues (k = min(50, |V|)) of L for consecutive days and measure:

GSS(d) = ||λ(d) - λ(d-1)||_2

For per-entity attribution, we compute the change in each entity’s eigenvector centrality between consecutive snapshots. GSS is designed to capture structural reorganization of the entity network: mergers, new alliances, and cluster dissolution are hypothesized to manifest as spectral shifts before they appear in headlines.
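A NumPy sketch of GSS for two snapshots. It assumes both adjacency matrices are indexed over the same node ordering, and it reads "top-k" as the k largest eigenvalues; the text does not say which end of the spectrum is kept, so that choice is an assumption. Isolated nodes get zero scaling in D^{-1/2} A D^{-1/2}, so their diagonal entry in L stays at 1.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get zero scaling."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    scaled = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.eye(adj.shape[0]) - scaled


def top_k_eigenvalues(adj: np.ndarray, k: int = 50) -> np.ndarray:
    """The k largest eigenvalues of the normalized Laplacian, in descending order."""
    k = min(k, adj.shape[0])
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))  # ascending; L is symmetric
    return eigvals[::-1][:k]


def graph_spectral_shift(adj_today: np.ndarray, adj_prev: np.ndarray, k: int = 50) -> float:
    """GSS(d) = ||lambda(d) - lambda(d-1)||_2 over the top-k spectra."""
    return float(np.linalg.norm(top_k_eigenvalues(adj_today, k) - top_k_eigenvalues(adj_prev, k)))
```

As a sanity check, a triangle has normalized-Laplacian spectrum {1.5, 1.5, 0} and a three-node path has {2, 1, 0}, so swapping one for the other gives GSS = √0.5 ≈ 0.71.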

4.1.4 Sentiment-Momentum Divergence (SMD)

For each entity e, we fit an OLS regression of sentiment on momentum (trailing 7-day mention velocity):

sentiment_e(d) = β_0 + β_1 · momentum_e(d) + ε_e(d)

The SMD signal is the standardized residual ε̂_e(d). Positive SMD indicates sentiment exceeding what momentum alone would predict (the market is more positive than attention warrants). Negative SMD indicates sentiment lagging momentum (rising attention with deteriorating narrative). SMD is intended as a contrarian signal: extreme positive SMD is hypothesized to precede corrections, and extreme negative SMD (attention rising, sentiment falling) to precede adverse events such as regulatory actions or earnings misses; per-signal validation is still pending (§7.2).
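A sketch of the per-entity residual, assuming `sentiment` and `momentum` are aligned daily arrays for one entity. It scales residuals by the residual standard error with n − 2 degrees of freedom. The text does not say whether production uses this scaling or leverage-adjusted (studentized) residuals, so treat that choice as an assumption.

```python
import numpy as np


def smd_residuals(sentiment: np.ndarray, momentum: np.ndarray) -> np.ndarray:
    """Standardized OLS residuals of sentiment ~ b0 + b1 * momentum for one entity."""
    X = np.column_stack([np.ones(len(momentum)), momentum])   # intercept + momentum
    beta, *_ = np.linalg.lstsq(X, sentiment, rcond=None)
    resid = sentiment - X @ beta
    # Two fitted parameters, so use n - 2 degrees of freedom for the residual scale.
    scale = np.sqrt(resid @ resid / (len(resid) - 2))
    return resid / scale
```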

4.1.5 Source Concentration (SC)

We compute the Herfindahl-Hirschman Index across source domains for each entity’s mentions:

SC_e(d) = Σ_i s_i^2

where s_i is the share of mentions from source domain i. SC ranges from 1/N (perfect diversification) to 1.0 (single source). High SC indicates information asymmetry — when a story is concentrated in one or two sources, it may represent a leak, exclusive, or planted narrative rather than broad consensus. SC serves as a signal quality modifier: high-SC signals should be treated with higher uncertainty.
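The HHI reduces to squared shares over domain counts. This sketch (hypothetical function name) takes the list of source domains for an entity's mentions in the scoring window.

```python
from collections import Counter
from typing import Iterable


def source_concentration(domains: Iterable[str]) -> float:
    """Herfindahl-Hirschman Index over the source-domain shares of an entity's mentions."""
    counts = Counter(domains)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mentions to score")
    return sum((c / total) ** 2 for c in counts.values())
```

For example, two mentions from one domain and one from another give (2/3)² + (1/3)² = 5/9 ≈ 0.56, while four mentions spread across four domains give the 1/N floor of 0.25.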

4.2 Six-Layer Processing Pipeline

Layer 1: PCA Orthogonalization

The five raw signals are orthogonalized via PCA to remove residual correlation. We apply the Kaiser criterion (retain components with eigenvalue > 1) followed by VIF rejection (remove any component with Variance Inflation Factor > 5). This ensures that the composite score is not dominated by correlated signal pairs. The retained principal components are rotated back to the original signal space for interpretability.
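A sketch of the orthogonalization step under stated assumptions: signals are standardized before PCA (so the Kaiser eigenvalue-above-1 rule applies to the correlation matrix), at least one component is always retained, and "rotated back" is read as projecting the retained component scores back through the loadings. The VIF helper uses the standard identity VIF_i = [R⁻¹]_ii for a correlation matrix R; which matrix the pipeline screens with the > 5 cut is not specified here.

```python
from typing import Tuple

import numpy as np


def kaiser_pca_orthogonalize(signals: np.ndarray) -> Tuple[np.ndarray, int]:
    """Project standardized signals onto PCs with eigenvalue > 1, then map back.

    `signals` is (n_days, 5) with columns ACI, NED, GSS, SMD, SC.
    Returns the reconstruction in original-signal coordinates and the number of PCs kept.
    """
    z = (signals - signals.mean(axis=0)) / signals.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))  # ascending order
    keep = eigvals > 1.0                      # Kaiser criterion
    if not keep.any():
        keep[np.argmax(eigvals)] = True       # always retain at least one component
    loadings = eigvecs[:, keep]
    scores = z @ loadings                     # mutually uncorrelated component scores
    return scores @ loadings.T, int(keep.sum())


def vif(x: np.ndarray) -> np.ndarray:
    """Variance inflation factors via the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(x, rowvar=False)))
```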

Layer 2: BOCPD Regime Detection

We implement Adams and MacKay (2007) with a Normal-Inverse-Gamma conjugate prior on the composite signal stream. The prior parameters are:

  • μ_0 = 0 (centered z-score)
  • κ_0 = 1, α_0 = 1, β_0 = 1 (weakly informative)
  • Hazard function: constant 1/250 (expected regime length ~250 days)

Regime classification uses cumulative probability mass at short run lengths (< 7 days). A 7-day transition window smooths regime boundaries to prevent flip-flopping. Regimes are labeled: QUIET, NORMAL, ELEVATED, HIGH, EXTREME.
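A compact sketch of the Adams and MacKay recursion with the Normal-Inverse-Gamma prior listed above. Under that prior the posterior predictive for each run length is a Student-t with 2α degrees of freedom, location μ, and squared scale β(κ+1)/(ακ). The function returns the probability mass at run lengths below 7 for each observation. The mapping from that mass to the five regime labels and the 7-day transition window are not reproduced, and the run-length vector is left unpruned for clarity.

```python
import math

import numpy as np


def _student_t_logpdf(x, df, loc, scale2):
    """Elementwise Student-t log-density; scale2 is the squared scale parameter."""
    log_norm = np.array([math.lgamma((d + 1) / 2) - math.lgamma(d / 2) for d in df])
    z2 = (x - loc) ** 2 / (df * scale2)
    return log_norm - 0.5 * np.log(df * math.pi * scale2) - (df + 1) / 2 * np.log1p(z2)


def bocpd_short_run_mass(data, hazard=1 / 250, mu0=0.0, kappa0=1.0,
                         alpha0=1.0, beta0=1.0, short=7):
    """Adams & MacKay (2007) BOCPD with a Normal-Inverse-Gamma prior.

    Returns P(run length < `short` | data so far) for every observation.
    """
    R = np.array([1.0])                        # run-length posterior; index = run length
    mu, kappa = np.array([mu0]), np.array([kappa0])
    alpha, beta = np.array([alpha0]), np.array([beta0])
    short_mass = []
    for x in data:
        # Posterior predictive per run length: Student-t(2*alpha, mu, beta*(kappa+1)/(alpha*kappa)).
        pred = np.exp(_student_t_logpdf(x, 2 * alpha, mu, beta * (kappa + 1) / (alpha * kappa)))
        growth = R * pred * (1 - hazard)       # run continues: r -> r + 1
        cp = np.sum(R * pred * hazard)         # changepoint: any r -> 0
        R = np.concatenate([[cp], growth])
        R /= R.sum()
        # Conjugate NIG update for continuing runs; a new run restarts from the prior.
        beta = np.concatenate([[beta0], beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1))])
        mu = np.concatenate([[mu0], (kappa * mu + x) / (kappa + 1)])
        kappa = np.concatenate([[kappa0], kappa + 1])
        alpha = np.concatenate([[alpha0], alpha + 0.5])
        short_mass.append(R[:short].sum())
    return np.array(short_mass)
```

With a constant hazard of 1/250, short-run mass stays near zero inside a stable regime and jumps toward 1 within a step or two of a level shift.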

Layer 3: Inverse-Variance Weighted Composite

Each signal component receives weight inversely proportional to its trailing variance:

w_i = (1/σ_i^2) / Σ_j (1/σ_j^2)

This auto-updating scheme prioritizes stable, informative signals and downweights noisy components. Weights are recalculated daily on a 30-day trailing window.
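A sketch of the daily weighting, assuming the component history is a (days, components) array with the latest day last. Function names are hypothetical, and the sample variance (ddof = 1) over the trailing window is an assumption.

```python
import numpy as np


def inverse_variance_weights(history: np.ndarray, window: int = 30) -> np.ndarray:
    """w_i = (1/var_i) / sum_j (1/var_j) over each component's trailing `window` values."""
    recent = history[-window:]
    inv_var = 1.0 / recent.var(axis=0, ddof=1)   # sample variance per component
    return inv_var / inv_var.sum()


def composite(history: np.ndarray, window: int = 30) -> float:
    """Today's composite: inverse-variance-weighted sum of the latest component values."""
    return float(inverse_variance_weights(history, window) @ history[-1])
```

A component whose trailing variance is four times another's receives a quarter of its weight, so the pair splits 0.2 / 0.8.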

Layer 4: Kalman Smoothing

We apply Harvey’s (1989) local linear trend model:

State:    x_t = x_{t-1} + v_{t-1} + η_t,  η_t ~ N(0, Q_level)
Velocity: v_t = v_{t-1} + ζ_t,              ζ_t ~ N(0, Q_trend)
Obs:      y_t = x_t + ε_t,                  ε_t ~ N(0, R)

The Rauch-Tung-Striebel backward pass decomposes each entity’s score into:

  • Trend (smoothed state): reported as the PSI score
  • Mean-reverting component (innovation residual): used for alert generation
  • Noise (observation error): discarded

Confidence intervals are derived directly from the smoothed state covariance matrix P_{t|T}.
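A NumPy sketch of the filter and the Rauch-Tung-Striebel pass for the local linear trend model. The noise variances Q_level, Q_trend, and R are not given in the text, so the defaults below are placeholders, as is the near-diffuse initial covariance. The smoothed level plays the role of the PSI trend, and an approximate 95% band follows from ±1.96·sqrt(P_{t|T}[0,0]).

```python
import numpy as np


def local_linear_trend_smoother(y, q_level=0.01, q_trend=0.001, r=1.0):
    """Kalman filter plus RTS smoother for Harvey's local linear trend model.

    State s_t = [x_t, v_t]. Returns smoothed means (n, 2) and covariances (n, 2, 2).
    q_level, q_trend and r are placeholder variances.
    """
    y = np.asarray(y, dtype=float)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])    # x_t = x_{t-1} + v_{t-1};  v_t = v_{t-1}
    H = np.array([[1.0, 0.0]])                # y_t observes the level only
    Q = np.diag([q_level, q_trend])
    n = len(y)
    m_pred, P_pred = np.zeros((n, 2)), np.zeros((n, 2, 2))
    m_filt, P_filt = np.zeros((n, 2)), np.zeros((n, 2, 2))
    m, P = np.array([y[0], 0.0]), np.eye(2) * 1e3   # near-diffuse start
    for t in range(n):
        if t > 0:                                    # predict step
            m, P = F @ m, F @ P @ F.T + Q
        m_pred[t], P_pred[t] = m, P
        S = H @ P @ H.T + r                          # innovation variance, shape (1, 1)
        K = P @ H.T / S                              # Kalman gain, shape (2, 1)
        m = m + (K * (y[t] - H @ m)).ravel()         # update with the innovation
        P = P - K @ H @ P
        m_filt[t], P_filt[t] = m, P
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for t in range(n - 2, -1, -1):                   # RTS backward pass
        G = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        m_s[t] = m_filt[t] + G @ (m_s[t + 1] - m_pred[t + 1])
        P_s[t] = P_filt[t] + G @ (P_s[t + 1] - P_pred[t + 1]) @ G.T
    return m_s, P_s
```

Usage: `m_s, P_s = local_linear_trend_smoother(y)` followed by `band = 1.96 * np.sqrt(P_s[:, 0, 0])` gives the trend and its interval.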

Layer 5: Cold-Start Handling

Entities with fewer than 30 observations receive James-Stein shrinkage toward their BICS sector mean:

PSI_shrunk = (1 - B) · PSI_raw + B · PSI_sector

where B is the James-Stein shrinkage factor and confidence ramps linearly from 0 at n=0 to 1 at n=30. Entities with fewer than 10 observations are suppressed entirely (score = 0, flagged as insufficient data).
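The text does not specify how B is computed or how it interacts with the linear confidence ramp, so this sketch makes its assumptions explicit. B is taken to be the positive-part James-Stein factor min(1, (k−3)σ²/Σ(x_i − x̄)²) over the k ≥ 4 entities of one BICS sector. σ² is a supplied sampling variance. Only entities with fewer than 30 observations are shrunk, and the confidence ramp is reported alongside the scores rather than folded into B.

```python
import numpy as np


def james_stein_shrink(raw, n_obs, sigma2, min_obs=10, full_obs=30):
    """Shrink one sector's immature raw PSI scores toward the sector mean.

    raw, n_obs: per-entity raw scores and observation counts (k >= 4 entities).
    sigma2: assumed sampling variance of a raw score (hypothetical input).
    Returns (scores, confidence).
    """
    raw = np.asarray(raw, dtype=float)
    n_obs = np.asarray(n_obs)
    k = len(raw)
    sector_mean = raw.mean()
    spread = np.sum((raw - sector_mean) ** 2)
    # Positive-part James-Stein factor, capped at 1 (full shrinkage).
    B = min(1.0, (k - 3) * sigma2 / spread) if spread > 0 else 1.0
    shrunk = (1 - B) * raw + B * sector_mean
    scores = np.where(n_obs < full_obs, shrunk, raw)      # only immature entities are shrunk
    scores = np.where(n_obs < min_obs, 0.0, scores)        # suppressed: insufficient data
    confidence = np.clip(n_obs / full_obs, 0.0, 1.0)       # linear ramp, 0 at n=0 to 1 at n=30
    return scores, confidence
```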

Layer 6: Output

The final PSI score is a z-score with regime classification:

Table 1: PSI regime classification thresholds (z-score bands).

Regime | Z-Score Range | Interpretation
QUIET | < -1 | Below-normal activity; entity fading from coverage
NORMAL | -1 to +1 | Baseline activity; no actionable signal
ELEVATED | +1 to +2 | Above-normal activity; worth monitoring
HIGH | +2 to +3 | Significant disruption; designed as an event precursor (not supported by LF-1, §6.4)
EXTREME | > +3 | Rare signal intensity; immediate attention required

For the first 30 days of system operation, percentile calibration supplements the z-score thresholds to account for limited distributional data.
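A sketch of the Table 1 mapping. The table does not say which band owns a boundary value such as z = +1, so the inclusive upper bounds below are an assumption. The percentile helper is likewise hypothetical. It shows one plausible form of the early-life calibration, mapping a value's empirical percentile in the available history to a standard-normal z via Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def classify_regime(z: float) -> str:
    """Map a PSI z-score to the Table 1 labels (inclusive upper bounds are an assumption)."""
    if z < -1:
        return "QUIET"
    if z <= 1:
        return "NORMAL"
    if z <= 2:
        return "ELEVATED"
    if z <= 3:
        return "HIGH"
    return "EXTREME"


def percentile_z(value, history):
    """Hypothetical early-life calibration: empirical percentile mapped to a standard-normal z."""
    below = sum(h < value for h in history)
    ties = sum(h == value for h in history)
    pct = (below + 0.5 * ties + 0.5) / (len(history) + 1)   # strictly inside (0, 1)
    return NormalDist().inv_cdf(pct)
```

With only ten days of history, a value above every prior observation maps to roughly the 95th percentile (z ≈ 1.7), so it reads as ELEVATED rather than jumping straight to EXTREME.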

4.3 Link Prediction

4.3.1 Classical Heuristics

We compute four classical link prediction scores for all non-adjacent entity pairs:

  • Jaccard Coefficient: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
  • Adamic-Adar Index: Σ_{w ∈ N(u) ∩ N(v)} 1/log|N(w)|
  • Common Neighbors: |N(u) ∩ N(v)|
  • Preferential Attachment: |N(u)| · |N(v)|

These heuristics capture structural proximity from different perspectives — local neighborhood overlap (Jaccard, CN), weighted overlap penalizing high-degree intermediaries (Adamic-Adar), and global popularity (PA).
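The four heuristics reduce to set operations on neighbor sets. This sketch uses a plain dict-of-sets graph rather than a graph library; the helper names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of neighbours


def jaccard(g: Graph, u: str, v: str) -> float:
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g: Graph, u: str, v: str) -> float:
    # Every common neighbour w is adjacent to both u and v, so |N(w)| >= 2 and log > 0.
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v])


def common_neighbors(g: Graph, u: str, v: str) -> int:
    return len(g[u] & g[v])


def preferential_attachment(g: Graph, u: str, v: str) -> int:
    return len(g[u]) * len(g[v])


def score_non_adjacent(g: Graph):
    """(Jaccard, Adamic-Adar, CN, PA) for every non-adjacent pair."""
    return {
        (u, v): (jaccard(g, u, v), adamic_adar(g, u, v),
                 common_neighbors(g, u, v), preferential_attachment(g, u, v))
        for u, v in combinations(sorted(g), 2)
        if v not in g[u]
    }
```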

4.3.2 Feature-Based ML

A logistic regression model trained on 10 pair features:

  1. Sector match (binary: same BICS sector)
  2. PSI momentum difference (|PSI_u - PSI_v|)
  3. PSI momentum product (PSI_u × PSI_v)
  4. Degree product (deg_u × deg_v)
  5. Degree sum (deg_u + deg_v)
  6. PSI score difference
  7. PSI score product
  8. Mention ratio (min mentions / max mentions)
  9. Common neighbors count
  10. Jaccard coefficient

Training uses an 80/20 stratified split with class-balanced sampling.

4.3.3 Graph Convolutional Network

A 2-layer GCN encoder following Kipf and Welling (2017):

H^{(1)} = ReLU(Â X W^{(0)})
Z = Â H^{(1)} W^{(1)}

where Â = D̃^{-1/2} Ã D̃^{-1/2} is the normalized adjacency with self-loops (Ã = A + I, with D̃ the degree matrix of Ã) and X is the node feature matrix (PSI scores, degree, sector encoding). Link scores are computed via cosine similarity: score(u,v) = cos(z_u, z_v). Training minimizes binary cross-entropy on held-out edges.
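A NumPy sketch of the forward pass and the cosine scorer. The weight matrices are placeholders: the binary cross-entropy training loop described above is omitted, so this illustrates the propagation rule rather than a trained encoder.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Â = D̃^{-1/2} (A + I) D̃^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]


def gcn_embed(adj, X, W0, W1):
    """Two-layer GCN forward pass: Z = Â ReLU(Â X W0) W1."""
    a_hat = normalized_adjacency(adj)
    H1 = np.maximum(a_hat @ X @ W0, 0.0)          # ReLU
    return a_hat @ H1 @ W1


def cosine_link_score(Z, u, v):
    """score(u, v) = cos(z_u, z_v)."""
    return float(Z[u] @ Z[v] / (np.linalg.norm(Z[u]) * np.linalg.norm(Z[v])))
```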

4.3.4 Node2Vec (Grover & Leskovec 2016)

Biased random walks with return parameter p=1 and in-out parameter q=0.5. Because q < 1, walks are biased toward outward, DFS-like exploration (q > 1 would instead favor local, BFS-like exploration). Walk parameters:

  • Walk length: 30
  • Walks per node: 10
  • Embedding dimension: 32
  • Context window: 5

Training uses skip-gram with negative sampling, optimized by SGD. Link scores are computed as the dot product of learned embeddings.
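A sketch of one second-order walk on an unweighted graph. It follows the Grover and Leskovec transition rule: weight 1/p to return to the previous node, 1 to move to a node that is also adjacent to the previous node, and 1/q to move outward. Only walk generation is shown. The resulting walks would then be passed to a skip-gram trainer (for example gensim's `Word2Vec` with `sg=1`), which is not reproduced here.

```python
import random
from typing import Dict, List, Optional, Set


def node2vec_walk(
    g: Dict[str, Set[str]],
    start: str,
    length: int = 30,
    p: float = 1.0,
    q: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """One second-order biased random walk on an unweighted graph."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(g[cur])
        if not nbrs:
            break                                  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))          # first step has no previous node: uniform
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if x == prev                   # return to previous node
            else 1.0 if x in g[prev]               # stay at distance 1 from prev
            else 1.0 / q                           # move outward
            for x in nbrs
        ]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

With q = 0.5 an outward step carries twice the weight of a step back toward the previous node's neighborhood, which is what produces the DFS-like bias.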

4.3.5 Variational Graph Autoencoder (Kipf & Welling 2016)

A 2-layer GCN encoder produces mean (μ) and log-variance (log σ²) vectors for each node:

μ = GCN_μ(X, A)
log σ² = GCN_σ(X, A)
z = μ + σ ⊙ ε,  ε ~ N(0, I)

The decoder reconstructs the adjacency matrix via an inner product, Â = σ(Z Z^T), where σ(·) here denotes the logistic sigmoid (not the encoder standard deviation). Training maximizes the ELBO:

L = E_q[log p(A|Z)] - KL[q(Z|X,A) || p(Z)]
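A NumPy sketch of the reparameterized sample and a single-sample ELBO, assuming `mu` and `logvar` come from the two GCN heads. It scores every adjacency entry with an unweighted Bernoulli likelihood. Practical VGAE training usually reweights positive edges and normalizes the loss, which is omitted here.

```python
import numpy as np


def vgae_elbo(adj, mu, logvar, rng):
    """Single-sample ELBO for a VGAE with an inner-product decoder.

    mu, logvar: (n, d) encoder outputs. Returns (elbo, reconstructed edge probabilities).
    """
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps           # reparameterization: z = mu + sigma * eps
    logits = z @ z.T
    probs = 1.0 / (1.0 + np.exp(-logits))         # logistic sigmoid of inner products
    # Bernoulli log-likelihood of every adjacency entry, in numerically stable form.
    log_lik = np.sum(adj * logits - np.logaddexp(0.0, logits))
    # KL[N(mu, sigma^2) || N(0, I)], summed over nodes and latent dimensions.
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
    return log_lik - kl, probs
```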

4.3.6 TransE Knowledge Graph Embeddings (Bordes et al. 2013)

Entities and relations are embedded in ℝ^d such that h + r ≈ t for valid triples (h, r, t). We define four relation types derived from graph context:

  1. co_occurrence — entities co-mentioned in articles
  2. same_sector — entities sharing BICS sector classification
  3. deal_related — entities connected by M&A/partnership signals
  4. geographic_peer — entities in the same geographic market

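Training minimizes the margin-based ranking loss with negative sampling. Link prediction scores are computed as -||h + r - t|| for each candidate relation type, taking the maximum. TransE was removed from the production ensemble in A3 for non-reproducibility (§7.1) and is described here for completeness; it is not one of the eight ensemble models.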
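For reference only (TransE is not in the deployed ensemble), here is a sketch of the scoring rule and the margin loss. It assumes entity and relation embeddings are stored in dicts of NumPy vectors; the container layout and names are hypothetical.

```python
import numpy as np

RELATIONS = ("co_occurrence", "same_sector", "deal_related", "geographic_peer")


def transe_link_score(ent, rel, h, t):
    """Max over relation types of -||e_h + r - e_t||_2 (higher means more plausible)."""
    return max(-float(np.linalg.norm(ent[h] + rel[r] - ent[t])) for r in RELATIONS)


def margin_loss(pos_dist, neg_dist, margin=1.0):
    """Margin ranking loss [margin + d(pos) - d(neg)]_+, averaged over pairs."""
    return float(np.mean(np.maximum(0.0, margin + pos_dist - neg_dist)))
```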

4.3.7 Ensemble

The eight model scores are combined via weighted average:

score_ensemble(u,v) = Σ_i w_i · score_i(u,v)

Weights are optimized through a two-phase process:

  1. Dirichlet search: sample 200 random weight vectors from Dir(α=1) and evaluate each on validation edges
  2. Perturbation refinement: take the best Dirichlet sample, perturb each weight by ±0.05, and keep improvements

Post-processing filters:

  • BICS sector filter: both entities must have a sector classification
  • Entity deduplication: substring pairs are excluded (e.g., “UnitedHealth” / “UnitedHealth Group”)
  • Confidence threshold: model agreement (how many of the 8 models rank the pair in their top 20)
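A sketch of the two-phase weight search, assuming each model's validation scores are already on comparable scales (for example, rank-normalized) and stacked as columns. AUC is computed from the Mann-Whitney statistic. The perturbation phase makes a single pass over the weights and renormalizes onto the simplex after each ±0.05 step; whether production iterates to convergence is not stated.

```python
import numpy as np


def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """ROC-AUC via the Mann-Whitney statistic (ties count as half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def optimize_weights(model_scores, labels, n_samples=200, step=0.05, seed=0):
    """Phase 1: best of Dir(1) samples. Phase 2: one +/-step pass per weight.

    model_scores: (n_pairs, n_models) validation scores; labels: 0/1 edge indicators.
    """
    rng = np.random.default_rng(seed)
    n_models = model_scores.shape[1]
    candidates = rng.dirichlet(np.ones(n_models), size=n_samples)
    best_w = max(candidates, key=lambda w: auc(model_scores @ w, labels))
    best_auc = auc(model_scores @ best_w, labels)
    for i in range(n_models):
        for delta in (step, -step):
            w = best_w.copy()
            w[i] = max(0.0, w[i] + delta)
            if w.sum() == 0:
                continue
            w = w / w.sum()                          # stay on the probability simplex
            a = auc(model_scores @ w, labels)
            if a > best_auc:                         # keep only strict improvements
                best_w, best_auc = w, a
    return best_w, best_auc
```

Because the search only needs a ranking metric, a single dominant model can legitimately absorb all the weight, which is the pattern reported for Node2Vec in §6.2.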

4.4 Interpretation Layer

Each entity receives a natural-language interpretation generated via a template-based system enhanced by Bedrock Haiku:

  1. What — which signal is the primary driver of the current PSI score (highest absolute z-component)
  2. Why — peer context including sector percentile rank and temporal direction (rising/falling/stable over 7 days)
  3. Action — regime-appropriate recommendation calibrated to the entity’s current state

Interpretations are regenerated daily and surfaced on the intelligence dashboard alongside the entity card metrics.

05

Experimental Setup

5.1 Link Prediction Holdout Protocol

  • Holdout: 20% of existing edges removed for testing
  • Negative sampling: Equal number of non-edges sampled as negative examples
  • Metric: AUC (Area Under ROC Curve)
  • Baseline: Random prediction (AUC = 0.5)

5.2 Attention Diagnostic Suite (Weekly)

The Attention Diagnostic Suite (formerly “validation gates”) runs five statistical probes weekly on PSI history (MIN_VALIDATION_DATE = 2026-04-27). It characterizes signal structure — it is not a deployment go/no-go, does not authorize deal-forecast claims, and does not gate link-prediction promotion (those use separate metrics; see §6.1b–§6.1c and psi-production-model-policy.md).

  1. Permutation test (p < 0.01, 1,000 shuffles) — PSI vs forward mention growth contains more structure than random label assignment
  2. Subsample stability (3 non-overlapping periods) — sign consistency of the PSI–mention relationship across sub-windows
  3. Walk-forward cross-validation (>60% positive IC windows) — rolling information coefficient stability
  4. Factor regression (controlling news volume + sector momentum) — PSI beta after volume/sector controls
  5. Decay analysis (ACF half-life) — mean-reversion timescale for the composite signal

Diagnostic summary threshold (≥4/5 gates) is informational only; current result 2/5 (2026-06-21) does not block production PSI publication.
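A sketch of two of the probes: the permutation test (probe 1) and the decay analysis (probe 5). The permutation p-value uses the common add-one convention, under which 1,000 shuffles cannot produce a p-value below 1/1,001. The half-life is read as the first lag at which the sample autocorrelation drops to 0.5 or below, which is an assumption about how the suite defines it.

```python
import numpy as np


def permutation_test(psi, growth, n_perm=1000, seed=0):
    """Correlation of PSI with forward growth and a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    actual = np.corrcoef(psi, growth)[0, 1]
    null = np.array([np.corrcoef(psi, rng.permutation(growth))[0, 1] for _ in range(n_perm)])
    # Add-one convention: the smallest attainable p-value is 1 / (n_perm + 1).
    p = (np.sum(np.abs(null) >= abs(actual)) + 1) / (n_perm + 1)
    return actual, p


def acf_half_life(x, max_lag=60):
    """First lag whose sample autocorrelation is <= 0.5, or None if none within max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    for lag in range(1, min(max_lag, len(x) - 1) + 1):
        if (x[:-lag] @ x[lag:]) / denom <= 0.5:
            return lag
    return None
```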

5.3 Event Study Design

LF-1 pre-registration. External deal-forecast evaluation uses a locked protocol (LF-1): events anchored to SEC EDGAR filing timestamps, PSI regime flags within ~20 days before filing, success defined as beating a mention-spike baseline with median lead ≥ 3 days. Attempt #2 (2026-06-08) was pre-registered before inspection; attempt #3+ requires new pre-registration. See §6.4 and artifact models/psi/lf1_anchored_test.json.

The internal event study (§6.4), which is not externally anchored, proceeds as follows:

  • Scan published articles for M&A keywords across 27 event types (acquisition, merger, IPO, activist stake, regulatory approval, partnership, divestiture, etc.)
  • Measure PSI in 6 time windows: pre-14d, pre-7d, pre-3d, event day, post-3d, post-7d
  • Primary metric: percentage of events where PSI was ELEVATED or higher before formal announcement
  • Secondary metric: average PSI lead time (days before event that PSI first reached ELEVATED)
06

Results

6.1 Link Prediction (fit_study Holdout)

The production link-prediction layer runs eight reproducible models on the ~400-entity production graph (built by global_chart_render from per-article co-occurrence >= 2). Per-model holdout ROC-AUC and the full Dirichlet-weighted ensemble are measured monthly by the psi-fit-study Fargate job and reported here from models/psi/fit_study.json.

Values come from the controlled prod ECS run, run_id 2026-06-22T01:50:03Z (87 days of history, 400-entity universe, 5 CV folds, 7-day holdout; task fea0dbb4, exit 0). Holdout ROC-AUC and fit-gap are from fit_study.json; the Weight column is the production Dirichlet weight from the daily link_predictions.json optimizer.

Table 2: Link prediction holdout metrics from fit_study (2026-06-21 prod run).

Model | Holdout ROC-AUC | Fit-Gap | Weight
Logistic Regression | 0.677 | 0.048 | 0.00
Adamic-Adar | 0.555 | 0.061 | 0.00
Common Neighbors | 0.552 | 0.062 | 0.00
Jaccard | 0.578 | 0.053 | 0.00
GCN | 0.499 | 0.061 | 0.00
Node2Vec | 0.570 | 0.043 | 1.00
Preferential Attachment | 0.472 | 0.069 | 0.00
Ensemble (Dirichlet-weighted) | 0.676 | 0.048 | 1.00

Three honest observations from this run:

  1. The 2026-06-21 prod artifact is HEALTHY. Ensemble fit-gap is 0.0483 (overall.status = "HEALTHY"), consistent with the staging run the same day (trend last runs: 0.078 → 0.052 → 0.049 → 0.048). Worst single fit is preferential_attachment (fit-gap ~0.069); Node2Vec is tightest (~0.043).
  2. Prod and staging keys now align. Controlled prod ECS task fea0dbb4 wrote models/psi/fit_study.json (~31 KB, run_id 2026-06-22T01:50:03Z, 87 history days). Staging artifact fit_study_STAGING.json (run_id 2026-06-21T22:41:31Z) reports ensemble fit-gap 0.0494 — within sampling noise of prod. Weekly schedule psi-fit-study-weekly is ENABLED (Sundays 09:00 UTC). Treat the monthly fit_study as an overfit diagnostic only; it does not gate GNN promotion (pinned holdout AUC vs 0.4293) or authorize deal-forecast claims.
  3. fit_study’s best single model and the production weight diverge. In the monthly CV-holdout evaluation, Logistic Regression is the strongest model (holdout 0.677) and matches the ensemble (0.676). Yet the daily production optimizer — fit on the separate psi_compute link-prediction eval set — assigns weight 1.00 to Node2Vec and 0.00 to everything else. Phase-0 forward holdout (§6.1b) ranks GBM above Node2Vec. These are different evaluation sets; “best model” is eval-set dependent.

6.1b Phase-0 Forward Holdout Panel

Forward link prediction for production scoring policy is evaluated on a pinned 14-day holdout with HeaRT-style hard negatives, three rolling-origin anchors, and five fixed seeds (phase0_baseline_panel.json, generated_for: 2026-06-21). This is Mode E (GBM parity): the graph has not yet shown GNN attention beating a gradient-boosted baseline on forward edge formation.

Table 3: Phase-0 forward holdout panel — production link policy (2026-06-21).

Arm | Mean AUPRC | Mean AUC | Notes
GBM | 0.717 | ~0.71 | Production-first policy
GNN (GATv2 encoder) | 0.589 | ~0.46 | Below GBM and Node2Vec on 2/3 windows
Node2Vec-only | 0.580 | ~0.60 | Ablation-backed prod arm; beats GNN
Heuristics (best: Adamic-Adar) | 0.556 | ~0.46 | Above GNN AUPRC on aggregate
EdgeBank | 0.500 | 0.500 | Recurrence floor

Baseline hierarchy (mean AUPRC): EdgeBank (0.50) → heuristics (~0.56) → Node2Vec (0.58) → GNN (0.59) → GBM (0.72). Production policy is GBM / Node2Vec-first; the eight-model Dirichlet ensemble remains a reproducibility and fit_study diagnostic construct and should not be claimed to beat GBM on forward holdout.

6.1c GNN Attention Refinement (Holdout Gate Only)

GNN attention (models/psi-gnn/latest.json, May 4 four-key manifest) is a separate refinement layer from link-prediction scoring. Promotion is gated solely on pinned forward holdout ROC-AUC — val AUC during GPU training does not gate promotion.

Table 4: GNN attention promotion gate (holdout AUC vs production bar).

Field | Value (2026-06-21)
Production bar | 0.4293 (encoder_heads_20260504_154154.npz)
Holdout window | 2026-06-07 .. 2026-06-21, n_pos 1932
Latest candidate holdout | 0.4205 (Fargate gnn_retrain, 2026-06-21)
Decision | SKIPPED: candidate ≤ prod; latest.json untouched
GPU val AUC (same run) | 0.7217 (informational only); holdout 0.4286 still below bar

Weekly auto-retrain (EventBridge) remains disabled until a manual run PROMOTES above the holdout bar. GNN attention does not beat GBM on the Phase-0 panel (§6.1b); scaling GNN is not the validated path for link value.

6.2 Ablation Study

Per-model marginal contribution is measured by leave-one-out: for each of the eight models, the ensemble is re-optimized on the remaining seven features and the resulting AUC is compared to the full ensemble AUC.
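A self-contained sketch of the leave-one-out loop. For brevity the re-optimizer is Dirichlet sampling only (the perturbation phase is dropped), which is an assumption. The `pct` field follows the Pct Contribution convention in the table that follows, namely drop divided by full-ensemble AUC.

```python
import numpy as np


def auc(scores, labels):
    """ROC-AUC via the Mann-Whitney statistic (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def best_dirichlet_auc(S, labels, n_samples=200, seed=0):
    """Best validation AUC over Dir(1) weight samples (perturbation phase omitted)."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(S.shape[1]), size=n_samples)
    return max(auc(S @ w, labels) for w in W)


def leave_one_out(S, labels, names, **kwargs):
    """Re-optimize without each model in turn; report LOO AUC, drop, and percent of full AUC."""
    full = best_dirichlet_auc(S, labels, **kwargs)
    report = {}
    for i, name in enumerate(names):
        loo = best_dirichlet_auc(np.delete(S, i, axis=1), labels, **kwargs)
        report[name] = {"loo_auc": loo, "drop": full - loo, "pct": 100 * (full - loo) / full}
    return full, report
```

Unlike ablate-by-zeroing, this re-fits the survivors each time, so a model with zero production weight can still show a nonzero drop.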

Leave-one-out results from ablation_study.json (2026-06-08 run; full-ensemble AUC 0.822 on 1,524 positive / 7,482 negative held-out edges):

Table 5: Leave-one-out ablation study (2026-06-08).

Model | Full AUC | LOO AUC | Drop | Pct Contribution
Node2Vec | 0.822 | 0.006 | 0.816 | 99.3%
Logistic Regression | 0.822 | 0.727 | 0.095 | 11.5%
GCN | 0.822 | 0.731 | 0.091 | 11.1%
VGAE | 0.822 | 0.733 | 0.089 | 10.8%
Adamic-Adar | 0.822 | 0.735 | 0.087 | 10.6%
Common Neighbors | 0.822 | 0.735 | 0.087 | 10.6%
Jaccard | 0.822 | 0.736 | 0.086 | 10.5%
Preferential Attachment | 0.822 | 0.737 | 0.085 | 10.4%

The previous ablate-by-zeroing logic in psi_compute/ablation.py reported auc_drop = 0 for any model whose full-ensemble weight was already zero (an artifact of the daily Dirichlet search). Leave-one-out, landed in A3, reports each model’s marginal value honestly.

The honest reading: production link prediction is effectively a one-model ensemble. Removing Node2Vec collapses the AUC from 0.822 to 0.006; it carries 99.3% of the contribution. The other seven models are near-redundant: each leave-one-out drop is ~0.085–0.095, because when any one of them is dropped the re-optimizer simply re-fits the survivors (Node2Vec included) to ~0.73. All eight models are run and scored (the “8-model reproducible ensemble” claim is structurally true), but predictively the ensemble is Node2Vec.

6.3 Attention Diagnostic Suite

The psi_validate Lambda runs weekly (Sun 08:00 UTC) as the Attention Diagnostic Suite — characterizing PSI vs forward 7-day log growth in entity mention count. It is not a deployment go/no-go and does not authorize deal-forecast claims (external anchor: LF-1 only, §6.4).

All runs cited here use the single post-GNN-blend regime (MIN_VALIDATION_DATE = 2026-04-27). Earlier weekly runs mixed pre-Apr-15 sparse graph-cap records with post-Apr-27 GNN-blended records; numbers from those mixed runs should not be cited.

Table 6: Attention Diagnostic Suite summary (2026-06-21).

Validation date | 2026-06-21
N records | 48
N dates (artifact) |
Date range | 2026-04-27 → 2026-06-13
Gates passed | 2 / 5 (overall_passed: false)

Per-gate results:

  • Permutation test — PASS; actual correlation ≈ −0.91, p-value reported as 0.0 (no shuffle out of 1,000 was as extreme, i.e. p < 0.001), threshold 0.01, n_observations 48.
  • Subsample stability — FAIL; sign not stable across three sub-windows.
  • Walk-forward CV — FAIL; 33% of rolling IC windows positive (threshold 60%).
  • Factor regression — FAIL; PSI beta ≈ −0.56, controlled for news_volume + sector_momentum.
  • Decay — PASS; half-life 11 observations.

Honest interpretation. The negative correlation is the substantive finding, not a failure mode: PSI captures attention peaking ahead of mean reversion in next-7-day mention growth. This is a different claim from link-prediction AUC (§6.1–§6.1b). The suite’s 2/5 result is diagnostic characterization; it does not block psi-compute, enrichment, or link prediction in production.

6.4 Event Study Results

The event study (models/psi/event_study.json, 2026-06-08 run) tests whether PSI was ELEVATED in the days before a deal/announcement event. From the published corpus it extracted 171 events spanning 232 entity-event pairs (after dropping 2,618 non-PSI entities, 316 duplicate pairs, and 27 events that predate PSI coverage, which begins 2026-03-23). For each pair PSI is sampled in six windows: pre-14d, pre-7d, pre-3d, event day, post-3d, post-7d.

Table 7: Internal event study coverage (2026-06-08; not an external anchor).

Metric | Value
Entity-event pairs | 232
Pairs with ELEVATED PSI before the event | 0 / 232 (0.0%)
Mean pre-event PSI | -1.0

This is not yet an interpretable result, and should not be cited as a null. The 0/232 hit rate is dominated by coverage, not signal: most pre-event windows resolve to NO_DATA because the event corpus largely predates dense daily PSI history (the GNN-blend regime only begins 2026-04-27; see §6.3). Where data exists, pre-event PSI sits near the -1.0 floor — consistent with the §6.3 finding that PSI is depressed (not elevated) around attention peaks, but the sample of events with complete pre-event windows is currently too small to separate that effect from missing data. The event study becomes evaluable only once the clean-regime corpus is long enough to contain events with fully populated 14-day lookbacks — estimated Day 90+ of the post-Apr-27 regime (see §8.3, §9 limitation 5).

LF-1 Pre-Registration (Attempt #2)

The internal event study above is circular: it measures PSI against the same coverage stream PSI is built from. The deal-forecast hypothesis was therefore tested under a pre-registered, externally anchored protocol, locked before results inspection (artifact: models/psi/lf1_anchored_test.json). Deal events are dated by their SEC EDGAR filing timestamp (8-K Item 1.01/2.01, SC-13D, S-4, DEFM14A), which is exogenous to the news stream, and joined to PSI entities by resolved CIK. Pre-registered rule: an entity’s deal is “flagged” if PSI was ELEVATED/CRITICAL within ~20 calendar days before the filing; success requires beating a naive mention-spike baseline (bootstrap-CI margin) with a median lead of at least 3 days. Attempt #2 (2026-06-08) reused the locked rule from attempt #1; only harness fixes (CIK join and trailing baseline) changed, so the goalposts did not move.

Result: CLOSED FAIL — PSI does NOT lead deal events. PSI hit-rate 10.8% (CI 4.8–18.1%) vs naive mention-spike baseline 49.4% (CI 39.8–60.2%) — non-overlapping, PASS: False (n=83 evaluable clean-regime events). A trivial attention trigger anticipates ~half of deal events; PSI’s regime flag catches ~1 in 9. Do not reopen without new pre-registration (attempt #3+) and explicit sign-off. This corroborates §6.1b (GBM beats GNN on forward holdout), §6.2 (link prediction is Node2Vec-dominated in production), and §6.3 (PSI is a forward-attention contra-indicator). The deal-forecasting hypothesis is refuted on the one target with a clean external anchor. PSI’s validated value is enrichment — explaining why an entity matters — not event forecasting.
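A sketch of a percentile bootstrap for the hit rates, assuming each arm is a 0/1 vector over the same evaluable events. The LF-1 harness's exact resampling scheme is not described here, so this sketch will not necessarily reproduce the reported intervals.

```python
import numpy as np


def bootstrap_hit_rate_ci(hits, n_boot=10_000, level=0.95, seed=0):
    """Hit rate and percentile-bootstrap CI over events (hits is a 0/1 vector)."""
    hits = np.asarray(hits, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(hits), size=(n_boot, len(hits)))  # resample events with replacement
    rates = hits[idx].mean(axis=1)
    lo, hi = np.quantile(rates, [(1 - level) / 2, (1 + level) / 2])
    return hits.mean(), lo, hi


def cis_disjoint(a, b):
    """True when two (rate, lo, hi) intervals do not overlap."""
    return a[2] < b[1] or b[2] < a[1]
```

Applied to the PSI-flag vector and the spike-flag vector, `cis_disjoint` gives the non-overlap check behind the reported PASS: False.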

6.5 EDGAR Signal Integration

The EDGAR layer (models/psi/edgar_signals.json, 2026-06-07 run) cross-references PSI entities against recent SEC filings: 200 entities scanned, 155 (77.5%) with at least one filing in the lookback window.

The integration is wired and producing per-entity signals, but the current signal quality is low and is reported here honestly rather than as a validated result. The top-scoring “entities” are generic terms — Iran, President, China — matched to unrelated registrants (e.g. Iran → a net-lease REIT, President → a SPAC), with form_type and filing description frequently empty and material_filings at 0. The per-entity score is therefore driven by raw filing volume, not filing materiality or genuine entity identity. The bottleneck is entity resolution: PSI entities are extracted from news narrative (including geopolitical and role nouns) and do not map cleanly to SEC registrant names or CIKs.

Update (2026-06-08): a high-precision entity→CIK resolver now exists (lambdas/psi_compute/entity_resolver.py; 0.47% false-positive rate, ~1,000 entities mapped), so EDGAR signals can be keyed on resolved CIKs rather than the substring match. The resolver also exposed a structural ceiling: only ~10–25% of PSI entities resolve to any CIK, because the universe is dominated by non-company nouns (people, places, geopolitics). This is consistent with the entity triage (only ~25% of the graph is market-linkable) and reframes EDGAR/deal data as usable for a minority enrichment slice, not a universe-wide forecasting signal.

6.6 Defensibility Summary (2026-06-21)

Aligned with psi-poc-scorecard.md and the methodology page “Studies & Validation” section. All numbers cite dated S3 artifacts; clean-corpus floor ≥ 2026-04-27.

Claims we support

Claim | Evidence
GBM beats GNN on forward link holdout | Phase-0 panel: GBM AUPRC 0.717 vs Node2Vec 0.580 vs GNN 0.589
Production link ensemble is predictively Node2Vec-only | Dirichlet weight 1.00 Node2Vec; LOO ablation 99.3% (2026-06-08)
PSI composite is a forward-attention contra-indicator (internal) | Attention Diagnostic Suite: permutation PASS, r ≈ −0.91, p < 0.001 (2026-06-21)
Signal decay is measurable | Decay gate PASS; half-life 11 observations
GNN promotion gate works (audit trail) | 2026-06-21 retrain SKIPPED at holdout 0.4205 ≤ prod 0.4293
fit_study prod run is healthy | fit_study.json: fit-gap 0.0483, status HEALTHY (87 history days; weekly schedule enabled)

Claims we cannot support

Claim | Result
PSI leads deal events (external) | LF-1 attempt #2 closed FAIL: 10.8% vs spike 49.4%
Attention Diagnostic Suite “passes” | 2 / 5 gates; diagnostic only, not a prod block
GNN attention beats GBM on forward holdout | GNN AUPRC 0.589 < GBM 0.717
Promote latest GNN retrain on val AUC | GPU val 0.7217 → holdout 0.4286 < prod 0.4293
Eight-model ensemble beats GBM on forward edges | Reproducibility construct; Phase-0 ranks GBM first
Internal event study as anticipatory null | 0 / 232 ELEVATED; not interpretable (coverage)

Metric separation (do not conflate): fit_study holdout AUC → overfit bands only; Phase-0 AUPRC → link-structure / GBM-first policy; GNN pinned holdout AUC → promote.py only; Attention Diagnostic Suite → signal characterization; LF-1 → deal-forecast hypothesis only.

07

Ablation Analysis

7.1 Model Ablation

The per-model leave-one-out ablation is reported in §6.2. The deployed ensemble is the eight reproducible models (four heuristics + Logistic + GCN + Node2Vec + VGAE); TransE was removed in A3 for non-reproducibility and is not ablated here. The headline result: Node2Vec carries ~99% of the contribution, the other seven models are near-redundant — see §6.2 for the full table.

7.2 Signal Ablation

Not yet computed. Unlike the model ablation (§6.2), which is produced daily by the link-prediction pipeline, there is currently no artifact that decomposes the composite PSI into its five constituent signals — ACI (Hawkes self-excitation), NED (embedding drift), GSS (spectral shift), SMD (sentiment-momentum divergence), and SC (source concentration) — and measures each one’s marginal contribution. The intended methodology is:

| Signal | Variance explained | Correlation w/ PSI | IC (rank) |
|---|---|---|---|
| ACI (Hawkes) | pending | pending | pending |
| NED (Embedding Drift) | pending | pending | pending |
| GSS (Spectral Shift) | pending | pending | pending |
| SMD (Sent-Mom Divergence) | pending | pending | pending |
| SC (Source Concentration) | pending | pending | pending |

To fill this honestly requires a per-component analysis run over the daily PSI history that (a) rank-correlates each component with forward 7-day mention growth to obtain its rank IC (the Spearman correlation), (b) computes each component’s correlation with the composite, and (c) attributes variance via PCA or a leave-one-signal-out recomputation of the James-Stein composite. That computation does not exist yet; the numbers are deliberately left as pending rather than estimated. This is the natural companion to the signal-decomposition work and is the most defensible next addition to the validation suite.
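
Steps (a) and (b), plus a leave-one-signal-out form of step (c), can be sketched as follows. The sketch assumes the daily history is a (days × 5) array aligned with forward 7-day mention growth, and it stands in an equal-weight mean of z-scored components for the production James-Stein composite. All names and the synthetic data are hypothetical; it produces no values for the pending table above.

```python
import numpy as np
from scipy.stats import spearmanr

SIGNALS = ["ACI", "NED", "GSS", "SMD", "SC"]


def zscore(x):
    # Column-wise standardization; zero-range columns are divided by 1, not 0.
    sd = np.where(np.ptp(x, axis=0) > 0, x.std(axis=0), 1.0)
    return (x - x.mean(axis=0)) / sd


def composite(components):
    # Hypothetical stand-in for the James-Stein composite: equal-weight mean.
    return zscore(components).mean(axis=1)


def per_signal_report(components, fwd_growth):
    """components: (n_days, 5) daily values; fwd_growth: forward 7-day growth."""
    psi = composite(components)
    base_ic = spearmanr(psi, fwd_growth)[0]
    report = {}
    for j, name in enumerate(SIGNALS):
        rest = np.delete(components, j, axis=1)
        report[name] = {
            "rank_ic": spearmanr(components[:, j], fwd_growth)[0],       # (a)
            "corr_with_psi": np.corrcoef(components[:, j], psi)[0, 1],   # (b)
            # (c) leave-one-signal-out: composite IC minus IC without signal j
            "loso_ic_delta": base_ic - spearmanr(composite(rest), fwd_growth)[0],
        }
    return base_ic, report


# Synthetic demo: forward growth moves against ACI; other signals are noise.
rng = np.random.default_rng(0)
components = rng.normal(size=(250, 5))
fwd_growth = -components[:, 0] + 0.5 * rng.normal(size=250)
base_ic, report = per_signal_report(components, fwd_growth)
```

A strongly negative rank_ic flags a component that moves against forward attention, and a negative loso_ic_delta means that removing the component makes the composite's IC less negative, i.e. that component is driving contra-indicator behavior.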

7.3 Discussion

Models. The link-prediction ensemble is, predictively, a one-model system: the §6.2 leave-one-out ablation shows Node2Vec carrying ~99% of the ensemble AUC, with the remaining seven models near-redundant (each LOO drop ~0.085–0.095 because the optimizer re-fits the survivors). The honest implication is that the “eight-model ensemble” is a robustness and reproducibility claim, not a performance one — seven of the eight could be dropped from production scoring with negligible AUC loss, though they are cheap to compute and retaining them guards against Node2Vec degrading on a future graph. The fit_study diagnostic (§6.1) tells a different story — there Logistic Regression is the strongest single model — which underlines that “most critical model” is eval-set dependent and should not be over-read from either run alone.

Signals. The five-signal ablation (§7.2) is not yet computed, so no claim is made here about which of ACI/NED/GSS/SMD/SC is most critical. What §6.3 does establish is a property of the composite: it is a statistically significant forward-attention contra-indicator (permutation p≈0, factor-regression psi_beta -0.65), not a continuation signal. Determining which constituent signals drive that behavior is exactly what §7.2 is meant to answer and remains open.
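
The permutation p-value in §6.3 can be illustrated with a minimal sketch: shuffle the forward-attention series to break its pairing with PSI, recompute the correlation, and count how often a shuffle is at least as extreme as the observed value. The function name, the Pearson statistic, the two-sided test, and the +1 correction are assumptions, not the psi_validation.py implementation. With the +1 correction the smallest attainable p-value is 1/(n_perm + 1); without it, a run in which no shuffle is as extreme reports exactly p = 0.

```python
import numpy as np


def permutation_corr_test(psi, fwd_attention, n_perm=5000, seed=0):
    """Two-sided permutation test for Pearson r between PSI and forward attention."""
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    fwd = np.asarray(fwd_attention, dtype=float)
    r_obs = np.corrcoef(psi, fwd)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling breaks any pairing between PSI and forward attention.
        null[i] = np.corrcoef(psi, rng.permutation(fwd))[0, 1]
    # +1 in numerator and denominator keeps p strictly positive.
    p = (np.sum(np.abs(null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p


# Synthetic demo: forward attention that moves strongly against PSI.
rng = np.random.default_rng(7)
psi = rng.normal(size=200)
fwd = -psi + 0.3 * rng.normal(size=200)
r, p = permutation_corr_test(psi, fwd, n_perm=2000)
```

Because daily PSI and attention series are autocorrelated, an i.i.d. shuffle can overstate significance; a block permutation that shuffles contiguous windows is a common, more conservative alternative.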

Net. The defensible, evidence-backed claims today are narrow: (1) a reproducible link-prediction layer whose predictive content is concentrated in Node2Vec, and (2) a composite PSI that is a significant negative predictor of forward attention. Broader claims about per-signal contribution and event lead-time are not yet supported by the artifacts and are marked pending rather than asserted.

08

Defensibility and Temporal Moat

Scope note (2026-06-21). This section argued a forecasting moat. LF-1 attempt #2 (§6.4) closed FAIL: PSI does not lead deal events better than a naive baseline. Phase-0 (§6.1b) ranks GBM above GNN and Node2Vec on forward holdout; GNN promotion remains gated at holdout 0.4293 with 2026-06-21 candidates SKIPPED. The defensible moat is the earned graph + multi-signal enrichment asset (§8.1, §8.2), not event prediction. §8.3 timeline items on deal probability are contradicted by evidence — read as research agenda only.

8.1 Signal Combination Novelty

To our knowledge, no published system combines Hawkes processes, spectral graph analysis, BOCPD, and Kalman smoothing for healthcare intelligence. Individual components are well studied; the novelty claimed is the specific combination and its domain application. The closest related work in financial signal processing (e.g., Bloomberg’s event-driven analytics) operates on price and volume data rather than narrative structure.

8.2 Graph Topology as Earned Data

The entity co-occurrence graph required months of continuous ingestion from ~140 feeds. With a 400-node backend graph (12,791 edges as of 2026-06-11; 150-entity display subset) and daily temporal history, this graph cannot be replicated from static data. The temporal dimension — how edges form, strengthen, weaken, and dissolve over time — represents information that can only be accumulated through sustained operation.
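
The co-occurrence construction can be sketched as follows, assuming each article has already been reduced to a set of resolved entity names. The ≥ 2 shared-article threshold is the one stated in Limitation 6; the function name, the input shape, and the example entities are illustrative, and the production build may weight or prune edges differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles, min_shared=2):
    """Count entity pairs that appear together in the same article.

    articles: iterable of sets of entity names (one set per article).
    Returns {(a, b): shared_article_count} for pairs meeting the threshold.
    """
    counts = Counter()
    for entities in articles:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}


# Illustrative articles; entity names are examples only.
articles = [
    {"FDA", "Pfizer"},
    {"FDA", "Pfizer", "CMS"},
    {"CMS", "UnitedHealth Group"},
]
edges = cooccurrence_edges(articles)
print(edges)  # {('FDA', 'Pfizer'): 2}
```

Repeating the count over daily (or rolling-window) article batches gives each edge a time series, which is the temporal record of edges forming, strengthening, weakening, and dissolving that this section argues cannot be backfilled from static data.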

8.3 Compounding Temporal Advantage

System capability increases non-linearly with data accumulation:

| Milestone | Capability unlocked |
|---|---|
| Day 1 | Signal computation only |
| Day 7 | Lead-lag detection activates |
| Day 30 | Validation gates fire; link prediction gets temporal features |
| Day 90 | Event study reaches statistical significance (but see §6.4: the externally anchored event study already ran, and PSI failed to beat a naive baseline) |
| Day 180 | Deal-probability model viable (the first attempt, LF-1, is already refuted; not on track) |
| Day 365 | Seasonal pattern detection, annual cycle modeling |

Each additional day of operation adds to the baseline rate estimates (Hawkes), the embedding history (NED), the spectral trajectory (GSS), and the regression training data (SMD), creating a widening moat against systems that start later.

8.4 Ensemble Redundancy

Eight models from three methodological families (classical heuristics, ML, deep graph) provide reproducibility and diagnostic redundancy. Leave-one-out ablation (§6.2) shows Node2Vec dominates (~99% contribution) in production scoring; Phase-0 forward holdout (§6.1b) ranks GBM first. Production policy is GBM / Node2Vec-first — not “eight models contribute equally.” Model agreement tiers (6–8 models agree) remain a qualitative confidence signal for predicted pairs, but should not be read as evidence that all eight models carry predictive content on forward holdout.
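
An agreement tier of the kind mentioned above can be computed by counting how many of the eight models score a pair at or above a cutoff. The 0.5 cutoff, the tier names, and the tiers below six are hypothetical; only the 6–8 agreement band comes from the text.

```python
import numpy as np


def agreement_tier(model_probs, cutoff=0.5):
    """model_probs: per-model link probabilities for one candidate pair.

    Returns (n_agree, tier). Cutoff and tier names are illustrative.
    """
    n_agree = int(np.sum(np.asarray(model_probs) >= cutoff))
    if n_agree >= 6:
        tier = "high"      # the 6-8 agreement band described in the text
    elif n_agree >= 3:
        tier = "medium"    # hypothetical
    else:
        tier = "low"       # hypothetical
    return n_agree, tier


print(agreement_tier([0.91, 0.72, 0.66, 0.58, 0.55, 0.61, 0.40, 0.12]))  # (6, 'high')
```

A tier built this way measures consensus, not correctness: eight correlated models can agree on a pair that the forward holdout later rejects.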

09

Limitations

  1. Data dependency: PSI requires continuous news flow; coverage gaps (weekends, holidays) produce signal dropouts that the Kalman smoother partially but not fully compensates for
  2. Healthcare focus: Methodology validated only on healthcare entities; generalization to other sectors (defense, energy, technology) is hypothesized but unproven
  3. Cold start: New entities require 10+ observations before scoring and 30+ for full confidence; rapidly emerging entities may be underweighted during their most informative period
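  4. Causal claims: PSI characterizes structural conditions around entities but does not establish causation, and the one externally anchored test (§6.4) found that it does not lead deal events better than a naive baseline; elevated PSI may reflect information leakage, genuine signal, or coincidental attention patterns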
  5. Backtest limitation: Currently ~45 days of clean single-regime live data (since 2026-04-27, as of 2026-06-11; ~80 days including the pre-break regime that cannot be used for IID validation); long-horizon validation pending. Event study results will reach statistical significance only after sufficient deal events accumulate within the clean-regime window (estimated Day 90+)
  6. Graph sparsity: The 150-entity pruned graph excludes long-tail entities that may carry important signals; the co-occurrence threshold (>= 2 shared articles) trades recall for precision
  7. Embedding model dependency: NED relies on Amazon Titan embeddings; model updates or API changes could shift the embedding space and invalidate historical centroids
  8. Single-language limitation: All signal processing operates on English-language sources; healthcare intelligence in non-English markets requires separate ingestion infrastructure
  9. Pre-A3 TransE state: Until 2026-05-28, the production link-prediction ensemble included TransE (a knowledge-graph embedding model) at a non-trivial weight despite TransE being verified chaotically non-reproducible (1e-12 init perturbation → 0.64 output delta). A3 removed TransE from production and aligned the deployed ensemble with the eight-model fit_study diagnostic. Pre-A3 published probabilities should be treated as approximate; post-A3 probabilities are reproducible against the deployed weights.
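
Limitation 1 can be made concrete with a local-level (random-walk) Kalman filter plus Rauch-Tung-Striebel smoother that skips the measurement update on days with no observation. This is a minimal sketch, not the psi_pipeline.py configuration: the random-walk state model, the noise variances q and r, and the diffuse prior are assumptions chosen for illustration.

```python
import numpy as np


def local_level_smoother(y, q=0.05, r=0.5, m0=0.0, p0=1e6):
    """Kalman filter + Rauch-Tung-Striebel smoother for a random-walk level.

    y: daily signal values with np.nan for coverage gaps (weekends, holidays).
    q, r: assumed process and observation noise variances (illustrative).
    Returns smoothed means and variances.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    x_pred, p_pred = np.empty(n), np.empty(n)
    x_filt, p_filt = np.empty(n), np.empty(n)
    m, p = m0, p0
    for t in range(n):
        # Predict: the level is a random walk, so only the variance grows.
        x_pred[t], p_pred[t] = m, p + q
        if np.isnan(y[t]):
            # Gap: no update, so uncertainty accumulates across the dropout.
            m, p = x_pred[t], p_pred[t]
        else:
            k = p_pred[t] / (p_pred[t] + r)
            m = x_pred[t] + k * (y[t] - x_pred[t])
            p = (1.0 - k) * p_pred[t]
        x_filt[t], p_filt[t] = m, p
    # Backward RTS pass: later observations inform the estimates inside gaps.
    x_s, p_s = x_filt.copy(), p_filt.copy()
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        x_s[t] = x_filt[t] + c * (x_s[t + 1] - x_pred[t + 1])
        p_s[t] = p_filt[t] + c * c * (p_s[t + 1] - p_pred[t + 1])
    return x_s, p_s


signal = [1.0, 1.2, np.nan, np.nan, 1.1, 0.9, 1.0]
mean, var = local_level_smoother(signal)
```

The backward pass lets observations after a gap pull the estimates inside it, which is the partial compensation the limitation describes; the gap days nevertheless keep a larger posterior variance than they would have had with coverage, so the dropout is bridged rather than recovered.
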
10

Conclusion

PSI assembles statistical signal processing, graph learning, and NLP into a single operational pipeline that runs daily at production scale (~400-entity graph, 1,900+ scored entities, clean single-regime history since 2026-04-27). This draft reports what the artifacts currently support — and, deliberately, no more.

What is established. Evidence-backed claims as of 2026-06-21:

  1. GBM / Node2Vec-first link prediction (§6.1b): Phase-0 forward holdout AUPRC GBM 0.717, Node2Vec 0.580, GNN 0.589; production Dirichlet weights remain Node2Vec-only with LOO ablation 99.3% (§6.2).
  2. GNN attention gated, not promoted (§6.1c): prod holdout bar 0.4293 (May 4 manifest); 2026-06-21 candidate SKIPPED; weekly auto-retrain disabled.
  3. Attention Diagnostic Suite (§6.3): 2/5 gates; significant contra-indicator (permutation p≈0, r≈−0.91); not a deployment go/no-go.
  4. LF-1 attempt #2 closed FAIL (§6.4): PSI 10.8% vs spike 49.4% — deal forecasting hypothesis refuted on the clean external anchor.
  5. Fit study prod HEALTHY (§6.1): fit_study.json fit-gap 0.0483; weekly Fargate schedule enabled (psi-fit-study:13, image f1a03d6).

What is not yet established. Per-signal ablation (§7.2) uncomputed; internal event study (§6.4) coverage-limited; EDGAR integration (§6.5) entity-resolution-bound. The eight-model ensemble does not beat GBM on forward holdout. Fit-study stability is also open: the 2026-06-07 prod fit_study OVERFIT band (0.078) was a single monthly snapshot, and the current prod run (fit-gap 0.0483) and staging (0.0494) are both HEALTHY, which suggests the diagnostic recovers with longer history but does not yet show that it stays in band.

The honest position is that PSI is a working, defensible enrichment system (a reproducible relationship-prediction layer plus a five-signal characterization of an entity’s information structure) but not an event forecaster. Its one internally significant relationship, the forward-attention contra-indicator (§6.3), is endogenous and is not external validation; the single test with a clean external anchor (§6.4, SEC-filing-dated deal events) refuted the forecasting hypothesis (PSI 10.8% vs naive baseline 49.4%). The Attention Diagnostic Suite (formerly framed as “validation gates”) characterizes signal structure and is not a deployment go/no-go. The value of the validation work is that it surfaces this distinction, separating a real enrichment asset from an unsupported forecasting claim rather than obscuring the difference.

Future work, in priority order:

  1. Lean into the validated direction: ship the graph + signals as enrichment/retrieval (the “why an entity matters” use case). The first such product outcome shipped 2026-06-11: the mention-spike baseline itself (the rule that beat PSI in §6.4) is now a watchlist alert trigger, with its pre-registered constants unchanged.
  2. Key EDGAR/deal data on the new entity→CIK resolver to strengthen that enrichment layer.
  3. Compute the §7.2 per-signal ablation.
  4. If forecasting is revisited, do it only against a fresh external anchor (e.g. regulatory/FDA events) under the same pre-registration discipline, not by scaling the GNN, which Phase 0 showed trailing GBM on forward holdout (AUPRC 0.589 vs 0.717) and roughly level with Node2Vec (0.580).

The previously planned “deal-probability model” is removed from the roadmap: its premise was tested early (LF-1) and did not hold.

Ref

References

Literature

  1. Adams, R.P. and MacKay, D.J.C. (2007). “Bayesian Online Changepoint Detection.” arXiv:0710.3742.
  2. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. (2013). “Translating Embeddings for Modeling Multi-relational Data.” NIPS 2013.
  3. Grover, A. and Leskovec, J. (2016). “node2vec: Scalable Feature Learning for Networks.” KDD 2016.
  4. Harvey, A.C. (1989). “Forecasting, Structural Time Series Models and the Kalman Filter.” Cambridge University Press.
  5. Hawkes, A.G. (1971). “Spectra of some self-exciting and mutually exciting point processes.” Biometrika, 58(1), 83-90.
  6. James, W. and Stein, C. (1961). “Estimation with Quadratic Loss.” Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 361-379.
  7. Kipf, T.N. and Welling, M. (2016). “Variational Graph Auto-Encoders.” NIPS Workshop on Bayesian Deep Learning.
  8. Kipf, T.N. and Welling, M. (2017). “Semi-Supervised Classification with Graph Convolutional Networks.” ICLR 2017.
  9. Sun, Z., Deng, Z., Nie, J., and Tang, J. (2019). “RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space.” ICLR 2019.

Production Artifacts (S3)

All paths relative to s3://plocamium-content/models/psi/ unless noted.

  • fit_study.json — monthly link-prediction CV diagnostic (prod key; staging: fit_study_STAGING.json)
  • phase0_baseline_panel.json — Phase-0 forward holdout panel (GBM / Node2Vec / GNN arms)
  • ablation_study.json — leave-one-out ensemble ablation (2026-06-08)
  • validation/latest.json — Attention Diagnostic Suite weekly output
  • lf1_anchored_test.json — LF-1 pre-registered external deal anchor (attempt #2, closed FAIL)
  • event_study.json — internal event-study coverage artifact
  • edgar_signals.json — SEC EDGAR cross-reference signals
  • psi-gnn/latest.json — GNN attention promotion manifest (holdout gate 0.4293)
AA

Appendix A: System Architecture

Four AWS Lambda functions (arm64, Python 3.12):

| Lambda | Purpose | Schedule | Timeout |
|---|---|---|---|
| psi-compute | 5 signals, PCA, BOCPD, Kalman, James-Stein, EDGAR boost | Daily 13:30 UTC | 900 s |
| card-compute | Entity intelligence cards with 5 metrics | Async (post-compute) | 900 s |
| psi-validate | Attention Diagnostic Suite (5 probes) | Weekly | 900 s |
| psi-enrichment | Link prediction, EDGAR scan, velocity, event study, lead-lag, peer-relative, interpretations | Async (post-compute) | 900 s |

All Lambdas share a common layer with NumPy, SciPy, scikit-learn, and graph libraries. State is persisted in S3 (signal history) and DynamoDB (entity metadata, scores).

AB

Appendix B: Reproducibility

All code is available at github.com/jtannahill/plocamium-content-engine. The system comprises 16 Python modules totaling ~4,000 lines of signal processing, graph learning, and statistical analysis code, with 145+ unit tests.

Key modules:

  • psi_signals.py: five signal computations (ACI, NED, GSS, SMD, SC)
  • psi_pipeline.py: six-layer processing pipeline (PCA, BOCPD, IVW, Kalman, James-Stein, output)
  • link_prediction.py: eight-model reproducible ensemble with auto-weight optimization (leave-one-out ablation)
  • psi_validation.py: Attention Diagnostic Suite (five statistical probes)
  • psi_enrichment.py: EDGAR integration, event study, lead-lag analysis
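
The IVW and James-Stein layers listed for psi_pipeline.py, together with the cold-start rule in Limitation 3, can be illustrated with a minimal sketch. It uses an empirical-Bayes variant of James-Stein shrinkage (each entity's mean is pulled toward the grand mean in proportion to its sampling variance, so entities with few observations are pulled harder) and a simple method-of-moments estimate of between-entity variance. The function names, the noise variance sigma2, and the estimator details are assumptions; only the 10-observation scoring minimum comes from the text, and the 30-observation full-confidence threshold is not modeled.

```python
import numpy as np

MIN_OBS = 10  # Limitation 3: entities need 10+ observations before scoring


def inverse_variance_combine(values, variances):
    """Inverse-variance weighted combination of signal components (IVW layer)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))


def shrink_to_grand_mean(means, n_obs, sigma2):
    """Empirical-Bayes (James-Stein style) shrinkage of per-entity mean scores.

    means: per-entity average scores; n_obs: observations per entity;
    sigma2: assumed per-observation noise variance.
    Returns (shrunk means with NaN for unscored entities, grand mean).
    """
    means = np.asarray(means, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    eligible = n_obs >= MIN_OBS
    se2 = sigma2 / n_obs                        # sampling variance of each mean
    grand = np.average(means[eligible], weights=n_obs[eligible])
    # Method-of-moments between-entity variance, floored at zero.
    tau2 = max(0.0, np.var(means[eligible]) - np.mean(se2[eligible]))
    b = se2 / (se2 + tau2)                      # shrinkage weight: 1 = fully pooled
    shrunk = grand + (1.0 - b) * (means - grand)
    return np.where(eligible, shrunk, np.nan), grand


out, grand = shrink_to_grand_mean([2.0, 0.0, -1.0, 1.5], [60, 40, 12, 5], sigma2=1.0)
```

The entity with 12 observations is pulled proportionally further toward the grand mean than the one with 60, and the entity with 5 observations is left unscored.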

AC

Appendix C: Entity Intelligence Card Example

The card-compute Lambda has produced its first batches; cards_latest.json (2026-06-08) holds 87 live entity cards. A real card, rendered from production data, follows. The original draft showed a hypothetical “UnitedHealth Group” card; it is replaced here with an actual one to avoid presenting fabricated numbers as output.

┌─────────────────────────────────────────┐
│  ENTITY: FDA                            │
│  PSI Score: -0.70 (NORMAL)              │
│  Momentum: 60.4   Degree: 46            │
│  Network position: 0.159                │
├─────────────────────────────────────────┤
│  ACI (Attention cascade):   -0.79       │
│  NED (Narrative drift):      0.00       │
│  GSS (Spectral shift):     464.70       │
│  SMD (Sent-Mom divergence):  0.00       │
│  SC  (Source concentration): 0.91       │
├─────────────────────────────────────────┤
│  Dominant signal: spectral_shift        │
└─────────────────────────────────────────┘

Two honest observations from the live cards, both consistent with limitations already noted:

  1. Entity quality. The top production entities are news nouns — FDA, Iran, China, “President”, “Strait of Hormuz” — not the healthcare companies the card format was designed around. This is the same entity-resolution gap flagged in §6.5: PSI extracts entities from narrative, and the corpus is currently geopolitics-heavy.
  2. GSS is not yet a per-entity contribution. The spectral-shift value (464.70) is identical across FDA, Iran, and other entities, and therefore dominates dominant_signal for nearly every card. It reads as a shared, un-normalized global spectral metric rather than an entity-specific contribution. The per-component normalization needed to make this column comparable is the same work blocking the §7.2 signal ablation, and the card interpretation text is intentionally omitted until it lands.
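
The second observation can be reproduced in a few lines. The sketch assumes dominant_signal is the argmax of absolute component values (the production rule is not specified here) and uses cross-sectional z-scoring as one possible normalization, not the pending production fix. Only the FDA row comes from the card above; the other two rows are hypothetical entities sharing the same GSS value.

```python
import numpy as np

COMPONENTS = ["ACI", "NED", "GSS", "SMD", "SC"]

# Row 0 is the FDA card above; rows 1-2 are hypothetical entities that share
# the same global GSS value, as the live cards do.
raw = np.array([
    [-0.79, 0.00, 464.70, 0.00, 0.91],   # FDA (from the card)
    [1.40, 0.20, 464.70, -0.30, 0.55],   # hypothetical entity B
    [0.10, -0.60, 464.70, 0.80, 0.20],   # hypothetical entity C
])


def dominant(matrix):
    # Hypothetical rule: component with the largest absolute value per row.
    return [COMPONENTS[i] for i in np.argmax(np.abs(matrix), axis=1)]


def cross_sectional_z(matrix):
    # Standardize each component across entities. Columns with zero range
    # (identical for every entity) carry no entity-specific information -> 0.
    z = np.zeros_like(matrix)
    varying = np.ptp(matrix, axis=0) > 0
    cols = matrix[:, varying]
    z[:, varying] = (cols - cols.mean(axis=0)) / cols.std(axis=0)
    return z


print(dominant(raw))                     # ['GSS', 'GSS', 'GSS']
print(dominant(cross_sectional_z(raw)))  # GSS can no longer win
```

Cross-sectional z-scoring removes a shared level entirely, which is appropriate only if GSS really is a global metric; if it is meant to be entity-specific, the fix belongs in how GSS is computed rather than in post-hoc scaling.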