Section A
The numbers at a glance
The "Reading the Room" essay describes the intuition. This page is the evidence. Every figure below comes from walk-forward testing with strict separation between training and evaluation data.
Events Scored
2,896
Earnings calls
t-Statistic
7.08
p < 10^-12
Info Coefficient
0.115
"Exceptional" > 0.10
126-Day Spread
21%
Top vs bottom 20%
Win Rate
92%
Long top 20%, 126d
To put the t-statistic in context: academic finance papers generally treat 3.0 as highly significant. At 7.08, the chance of seeing a result this strong if the signal were pure noise is less than one in a trillion. The information coefficient of 0.115 exceeds the 0.10 threshold that quantitative equity research treats as "exceptional"; values above that level are rare.
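For readers who want to see what these two statistics measure, here is a minimal sketch. It assumes the information coefficient is a Spearman rank correlation between composite scores and subsequent returns, and that the t-statistic tests the mean of per-period ICs against zero. Both are common conventions, not a statement of the exact calculation behind the figures above, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_ic(scores, forward_returns):
    """Spearman rank correlation between scores and subsequent returns."""
    return pearson(average_ranks(scores), average_ranks(forward_returns))


def ic_t_stat(period_ics):
    """t-statistic of the mean IC across evaluation periods (assumed method)."""
    return mean(period_ics) / (stdev(period_ics) / math.sqrt(len(period_ics)))
```

A rank-based correlation is used in this sketch because it depends only on the ordering of returns, so a handful of extreme outcomes cannot dominate the result.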
Section B
The three questions
Each earnings transcript is scored on three dimensions. The composite score drives the filter. Here is each question and its individual contribution to the overall signal.
Question 1
Creator Energy
Is this leader building the future or defending the past? The AI reads for language that initiates -- new frameworks, forward investments, willingness to cannibalize existing revenue -- versus language that reacts, deflects, or frames maintenance as innovation. A cautious CEO can still score high. The signal is directional: where is the leader's attention pointed?
Question 2
Adaptation Speed
When challenged, does the leader pivot fluidly or resist? The analyst Q&A is the test. CEOs who absorb difficult questions, reframe them, and answer substantively are demonstrating real-time cognitive flexibility. CEOs who repeat prepared language or redirect are demonstrating rigidity. Adaptive organizations outperform rigid ones -- and the earnings call is where rigidity is most visible.
Question 3
Conviction Depth
Is the confidence genuine or performed? Genuine conviction shows in specific, operationally grounded language -- "14% conversion improvement in our Southeast Asian mobile channel." Performed conviction shows in abstract, aspirational language -- "we are incredibly excited about the enormous opportunity." The market cannot easily distinguish between these in real time. The AI can.
Individual ICs do not sum to 0.115 because the composite captures interaction effects between dimensions. A CEO who scores high on conviction but low on adaptation signals a different profile than one who scores moderately on both.
Section C
Three approaches. Two dead. One alive.
The contextual psychology approach was not the first thing we tried. It was the third. Publishing the failures is how we demonstrate that the surviving method was not cherry-picked from a buffet of attempts. Two approaches consumed significant research time and produced nothing.
Approach 1 -- Dead
Bag-of-Words Features
Count positive words, negative words, uncertainty phrases, forward-looking statement ratios. Extract sentiment scores from raw text. The classic NLP-in-finance playbook.
Why it failed: This edge was saturated years ago. Every quant shop with a Bloomberg terminal runs word-count sentiment. The same word -- "challenging" -- means entirely different things in "we are addressing these challenging conditions" versus "we face unprecedented challenges." Counting words destroys context, and context is where the signal lives.
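The context problem can be shown in a few lines. This is a toy sketch, not the feature set that was actually tested: the three-word lexicon and the function name are hypothetical, and the two sentences are the ones quoted above.

```python
import re

# Hypothetical mini-lexicon; real sentiment dictionaries are far larger.
NEGATIVE_WORDS = {"challenge", "challenges", "challenging"}


def negative_count(text):
    """Count lexicon hits in the text, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in NEGATIVE_WORDS)


managing = "We are addressing these challenging conditions."
struggling = "We face unprecedented challenges."

# Both sentences produce the same count, although one describes a problem
# being handled and the other describes a problem being suffered.
print(negative_count(managing), negative_count(struggling))
```

A word counter sees the two statements as identical. Whatever separates them sits in the surrounding words, which is exactly the information the count throws away.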
Approach 2 -- Dead
Outcome-Supervised Scoring
Train the model to predict stock returns directly. Feed it transcripts and their subsequent 6-month performance. Let the model learn the mapping between language and price.
~0
Information Coefficient
Why it failed: Circular reasoning. A stock's 6-month return is determined by hundreds of factors -- macro conditions, sector rotation, earnings surprises, geopolitics. Trying to reverse-engineer the management quality signal from this noise is asking the model to find a needle in a haystack by describing the haystack. You need to know what you are measuring before you can measure it.
Approach 3 -- Alive
Contextual Psychology
Don't predict prices. Don't count words. Instead, have AI read for the psychological state of the speaker -- creator energy, adaptation speed, conviction depth -- the same social signals humans detect unconsciously but cannot scale.
Why it works: It measures something real. The AI does not predict the stock. It reads the person. Companies led by adaptive, creative, genuinely convicted leaders tend to outperform -- not because the model learned that from price data, but because it understands management quality as a concept. The stock follows naturally.
"Two dead. One alive. Publishing the failures is how you know the survivor wasn't cherry-picked."
Section D
The quintile breakdown
Every quarter, all scored companies are ranked by their composite NLP score and divided into five equal groups. The table below shows what happened to each group over the subsequent 126 trading days (approximately six months).
| Quintile | Avg 126-Day Return | Win Rate | Count |
| --- | --- | --- | --- |
| Top 20% (best CEOs) | +14.2% | 92% | ~580 |
| Q2 | +9.8% | 76% | ~580 |
| Q3 | +6.4% | 61% | ~580 |
| Q4 | +1.7% | 48% | ~580 |
| Bottom 20% (worst CEOs) | -6.8% | 34% | ~580 |
| Spread (Top - Bottom) | 21.0% | -- | 2,896 |
The monotonic decline from Q1 to Q5 is the strongest evidence of a real signal. A random factor would show noise across quintiles. A curve-fit signal might show edge at the extremes but chaos in the middle. A real signal shows a clean, ordered gradient. In the table above, average return falls at every step, and win rate declines almost linearly, by 13 to 16 points per quintile.
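The ranking-and-bucketing procedure can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: the function names are hypothetical, a "win" is assumed to mean a strictly positive forward return, and each call is assumed to cover a single ranking period with at least five events.

```python
from statistics import mean


def quintile_table(events):
    """Summarize (score, forward_return) pairs by score quintile.

    Returns five rows ordered from the highest-scoring group to the lowest.
    """
    ranked = sorted(events, key=lambda e: e[0], reverse=True)
    n = len(ranked)
    rows = []
    for q in range(5):
        # Integer boundaries keep the five groups as equal as possible.
        group = ranked[q * n // 5:(q + 1) * n // 5]
        returns = [r for _, r in group]
        rows.append({
            "quintile": q + 1,
            "avg_return": mean(returns),
            # Assumption: a "win" is a strictly positive forward return.
            "win_rate": sum(r > 0 for r in returns) / len(returns),
            "count": len(returns),
        })
    return rows


def is_monotonic_decline(rows):
    """True if average return falls at every step from Q1 to Q5."""
    avgs = [row["avg_return"] for row in rows]
    return all(a > b for a, b in zip(avgs, avgs[1:]))
```

Applied to the published averages (+14.2%, +9.8%, +6.4%, +1.7%, -6.8%), `is_monotonic_decline` returns `True`.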
Top Quintile CAGR
+27.9%
Annualized from 126-day returns
Correlation to Sentinel
-0.03
Essentially zero -- free diversification
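As a reference point for the CAGR card above, this is the textbook way to compound a holding-period return up to one year. It is a sketch only: the 252-day trading year is a convention assumed here, and the page does not state the exact calculation behind the published figure.

```python
TRADING_DAYS_PER_YEAR = 252  # assumption: conventional trading-day count


def annualize(period_return, holding_days):
    """Compound a holding-period return up to a one-year horizon."""
    return (1 + period_return) ** (TRADING_DAYS_PER_YEAR / holding_days) - 1


# 126 days is half of 252, so this squares the growth factor 1.142.
naive = annualize(0.142, 126)
print(f"{naive:.1%}")
```

Compounding the 14.2% arithmetic average this way gives roughly 30.4%, a little above the published +27.9%. A CAGR built from the realized sequence of returns generally comes out below the compounded arithmetic average, because a geometric mean never exceeds an arithmetic one.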
Section E
How it's used in the portfolio
Integration Model
A Filter, Not a Strategy
The NLP system is not a standalone strategy. It is a filter layered across all equity engines in the portfolio. This is a critical architectural choice -- it means the NLP edge amplifies every equity strategy without adding complexity or turnover.
- Every quarter, 500+ companies are scored on the three-question composite
- Top 20% receive favored allocation -- existing strategy signals on these names get larger position sizes
- Bottom 20% are "shadow banned" -- no equity engine can see or trade them until the next quarterly scoring
- Middle 60% are traded normally -- the NLP filter neither helps nor hinders
- Result: systematic removal of deteriorating leadership, systematic overweight of strong leadership
- Correlation to Sentinel engine: -0.03 -- essentially zero, meaning free diversification
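The filter logic in the list above can be sketched as a small lookup that sits between an equity engine and its position sizing. Everything named here is hypothetical, including the tier names and the 1.5x multiplier for favored names; the actual sizing rules are not stated on this page.

```python
from enum import Enum


class Tier(Enum):
    FAVORED = "favored"
    NORMAL = "normal"
    EXCLUDED = "excluded"


# Hypothetical sizing multipliers; the page does not state the actual values.
SIZE_MULTIPLIER = {Tier.FAVORED: 1.5, Tier.NORMAL: 1.0, Tier.EXCLUDED: 0.0}


def assign_tiers(composite_scores):
    """Map {ticker: composite score} to a Tier using 20% / 60% / 20% cuts."""
    ranked = sorted(composite_scores, key=composite_scores.get, reverse=True)
    cut = len(ranked) // 5  # size of the top and bottom 20% buckets
    tiers = {}
    for position, ticker in enumerate(ranked):
        if position < cut:
            tiers[ticker] = Tier.FAVORED
        elif position >= len(ranked) - cut:
            tiers[ticker] = Tier.EXCLUDED
        else:
            tiers[ticker] = Tier.NORMAL
    return tiers


def filtered_size(ticker, base_size, tiers):
    """Scale an equity engine's proposed position by the NLP tier."""
    return base_size * SIZE_MULTIPLIER[tiers[ticker]]
```

Because the tiers are recomputed only once per quarter, the engines can treat them as a static lookup between scoring dates.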
The quarterly cadence is important. Management quality changes slowly. A CEO who scored well in Q1 is overwhelmingly likely to score well in Q2. The filter operates on a different timescale than the fast-moving signals that drive daily position changes -- a structural layer beneath the tactical execution.
~$0.50
Per quarter for 500 companies. Three API calls per transcript. The most cost-effective edge in the portfolio -- a signal that would cost a fundamental research team millions of dollars annually to replicate manually, for less than the price of a coffee.
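The cost figure works out as follows, assuming one transcript per company per quarter. That assumption is ours; the other inputs are the numbers quoted above.

```python
companies = 500
calls_per_transcript = 3
quarterly_cost_usd = 0.50

# Assumption: each company contributes one transcript per quarter.
calls_per_quarter = companies * calls_per_transcript    # 1,500 API calls
cost_per_call = quarterly_cost_usd / calls_per_quarter  # about $0.00033
annual_cost_usd = quarterly_cost_usd * 4                # $2.00 per year

print(calls_per_quarter, round(cost_per_call, 5), annual_cost_usd)
```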
Every number on this page is available for independent verification. The methodology documentation, scoring rubric, and raw quintile data are available to qualified investors during due diligence. We welcome the scrutiny.