Evidence

NLP Evidence: The Data Behind "Reading the Room"

2,896 earnings events. 500+ companies. Three questions. Every number, including what didn't work.

Data document | Prism Capital Research | Updated June 2026

Section A

The numbers at a glance

The Reading the Room essay describes the intuition. This page is the evidence. Every figure below comes from walk-forward testing with strict separation between training and evaluation data.
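
As an illustration of what walk-forward separation means in practice, the sketch below generates expanding-window splits by quarter, so each evaluation quarter is only ever paired with earlier training quarters. The function name, the quarter labels, and the `min_train` warm-up length are assumptions for illustration, not the actual test harness.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    quarters: Sequence[str], min_train: int = 4
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training quarters, evaluation quarter) pairs in time order.

    `quarters` must already be sorted chronologically. `min_train` is a
    hypothetical warm-up length, not a value taken from the source.
    """
    for i in range(min_train, len(quarters)):
        # Training data is strictly earlier than the evaluation quarter.
        yield list(quarters[:i]), quarters[i]


quarters = ["2023Q1", "2023Q2", "2023Q3", "2023Q4", "2024Q1", "2024Q2"]
splits = list(walk_forward_splits(quarters, min_train=4))
for train, test in splits:
    print(f"train={train[0]}..{train[-1]}  evaluate={test}")
```

The property that matters is that no evaluation quarter ever appears in its own training window.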

Companies: 500+ (quarterly)
Info Coefficient: 0.115 ("exceptional" is > 0.10)
Win Rate: 92% (long top 20%, 126d)

To put the t-statistic in context: academic finance papers generally consider 3.0 to be highly significant. At 7.08, the chance of seeing a result this strong if the signal were pure noise is on the order of one in a trillion. The information coefficient of 0.115 exceeds the "exceptional" threshold used in quantitative equity research, where anything above 0.10 is rare.
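
The order of magnitude can be checked with a few lines of arithmetic. The sketch below converts a t-statistic into a tail probability using the normal approximation, which is an assumption that is reasonable only for large samples; the function name is illustrative.

```python
import math


def normal_tail_probability(t_stat: float, two_sided: bool = True) -> float:
    """Tail probability of a standard normal at |t_stat|.

    Uses the normal approximation to the t-distribution (an assumption).
    """
    one_sided = 0.5 * math.erfc(abs(t_stat) / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


for t in (3.0, 7.08):
    print(f"t={t}: two-sided p ~ {normal_tail_probability(t):.2e}")
```

A tail probability is the chance of observing a t-statistic this large when the true effect is zero. It is not the probability that the signal itself is noise.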

Section B

The three questions

Each earnings transcript is scored on three dimensions. The composite score drives the filter. Here is each question and its individual contribution to the overall signal.

Question 1
Creator Energy
Is this leader building the future or defending the past? The AI reads for language that initiates -- new frameworks, forward investments, willingness to cannibalize existing revenue -- versus language that reacts, deflects, or frames maintenance as innovation. A cautious CEO can still score high. The signal is directional: where is the leader's attention pointed?
IC Contribution: 0.044
Question 2
Adaptation Speed
When challenged, does the leader pivot fluidly or resist? The analyst Q&A is the test. CEOs who absorb difficult questions, reframe them, and answer substantively are demonstrating real-time cognitive flexibility. CEOs who repeat prepared language or redirect are demonstrating rigidity. Adaptive organizations outperform rigid ones -- and the earnings call is where rigidity is most visible.
IC Contribution: 0.037
Question 3
Conviction Depth
Is the confidence genuine or performed? Genuine conviction shows in specific, operationally grounded language -- "14% conversion improvement in our Southeast Asian mobile channel." Performed conviction shows in abstract, aspirational language -- "we are incredibly excited about the enormous opportunity." The market cannot easily distinguish between these in real time. The AI can.
IC Contribution: 0.034

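The three contributions above add up to the composite IC (0.044 + 0.037 + 0.034 = 0.115). They are shares of the composite, not standalone ICs: the composite also captures interaction effects between dimensions, so each dimension's IC measured in isolation need not sum to 0.115. A CEO who scores high on conviction but low on adaptation signals a different profile than one who scores moderately on both.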
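
For readers unfamiliar with the metric, an information coefficient is commonly computed as the rank (Spearman) correlation between a score and the return that follows it. The sketch below assumes that definition, which this page does not state explicitly; the function names and sample values are made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def rank_ic(scores: Sequence[float], forward_returns: Sequence[float]) -> float:
    """Spearman rank correlation between scores and subsequent returns."""
    return pearson(average_ranks(scores), average_ranks(forward_returns))


# Made-up sample: five events with a score and a subsequent return.
sample_scores = [0.9, 0.4, 0.7, 0.1, 0.5]
sample_returns = [0.12, 0.03, 0.02, -0.05, 0.08]
print(round(rank_ic(sample_scores, sample_returns), 3))  # 0.7
```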

Section C

Three approaches. Two dead. One alive.

The contextual psychology approach was not the first thing we tried. It was the third. Publishing the failures is how we demonstrate that the surviving method was not cherry-picked from a buffet of attempts. Two approaches consumed significant research time and produced nothing.

Approach 1 -- Dead
Bag-of-Words Features
Count positive words, negative words, uncertainty phrases, forward-looking statement ratios. Extract sentiment scores from raw text. The classic NLP-in-finance playbook.
Features tested: 13
Max t-stat: 0.75
Significant: 0 / 13
Why it failed: This edge was saturated years ago. Every quant shop with a Bloomberg terminal runs word-count sentiment. The same word family -- "challenging," "challenges" -- means entirely different things in "we are addressing these challenging conditions" versus "we face unprecedented challenges." Counting words destroys context, and context is where the signal lives.
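
The context problem is easy to reproduce. The sketch below is a toy word-count scorer with hypothetical three-word lexicons (real sentiment dictionaries are far larger, and this is not the feature set that was tested). It gives the two example sentences the same score.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"strong", "growth", "improve"}
NEGATIVE = {"challenging", "challenges", "decline"}


def word_count_sentiment(text: str) -> int:
    """Positive word count minus negative word count, ignoring word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


addressing = "We are addressing these challenging conditions."
facing = "We face unprecedented challenges."
print(word_count_sentiment(addressing), word_count_sentiment(facing))  # -1 -1
```

One sentence describes a management team acting on a problem and the other describes a team confronted by one, yet a bag-of-words feature cannot tell them apart.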
Approach 2 -- Dead
Outcome-Supervised Scoring
Train the model to predict stock returns directly. Feed it transcripts and their subsequent 6-month performance. Let the model learn the mapping between language and price.
Max t-stat: 0.11
Information Coefficient: ~0
Verdict: Noise
Why it failed: The target is mostly noise. A stock's 6-month return is determined by hundreds of factors -- macro conditions, sector rotation, earnings surprises, geopolitics. Trying to reverse-engineer a management quality signal from that outcome is asking the model to find a needle in a haystack by describing the haystack. You need to know what you are measuring before you can measure it.
Approach 3 -- Alive
Contextual Psychology
Don't predict prices. Don't count words. Instead, have AI read for the psychological state of the speaker -- creator energy, adaptation speed, conviction depth -- the same social signals humans detect unconsciously but cannot scale.
t-statistic: 7.08
Info Coefficient: 0.115
Top-Bottom Spread: 21% (126 trading days)
Why it works: It measures something real. The AI does not predict the stock. It reads the person. Companies led by adaptive, creative, genuinely convicted leaders tend to outperform -- not because the model learned that from price data, but because it understands management quality as a concept. The stock follows naturally.

"Two dead. One alive. Publishing the failures is how you know the survivor wasn't cherry-picked."

Section D

The quintile breakdown

Every quarter, all scored companies are ranked by their composite NLP score and divided into five equal groups. The table below shows what happened to each group over the subsequent 126 trading days (approximately six months).

Quintile                   Avg 126-Day Return   Win Rate   Count
Top 20% (best CEOs)        +14.2%               92%        ~580
Q2                         +9.8%                76%        ~580
Q3                         +6.4%                61%        ~580
Q4                         +1.7%                48%        ~580
Bottom 20% (worst CEOs)    -6.8%                34%        ~580
Spread (Top - Bottom)      21.0%                --         2,896

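The monotonic decline from the top quintile to the bottom is the strongest evidence of a real signal. A random factor would show noise across quintiles. A curve-fit signal might show edge at the extremes but chaos in the middle. A real signal shows a clean, ordered gradient -- and the gradient above is strictly ordered on both measures: win rate falls in near-equal steps of 13 to 16 points, and average return falls at every step, with the largest drop (8.5 points) between Q4 and the bottom quintile.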
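
The bucketing procedure described above can be sketched as follows. This is a minimal illustration rather than the production code: the function names are hypothetical, the demo events are synthetic, and a "win" is assumed to mean a positive forward return, which the page does not define.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def quintile_report(
    events: Sequence[Tuple[float, float]]
) -> List[Dict[str, float]]:
    """Split (score, forward_return) pairs into five score-ranked groups.

    Group 0 holds the highest scores. Requires at least five events.
    """
    ranked = sorted(events, key=lambda e: e[0], reverse=True)
    n = len(ranked)
    report = []
    for q in range(5):
        # Integer boundaries keep group sizes within one event of each other.
        group = ranked[q * n // 5 : (q + 1) * n // 5]
        returns = [r for _, r in group]
        report.append({
            "avg_return": mean(returns),
            # Assumption: a win is a strictly positive forward return.
            "win_rate": sum(r > 0 for r in returns) / len(returns),
            "count": len(returns),
        })
    return report


def is_monotonic_decline(values: Sequence[float]) -> bool:
    """True when every value is strictly lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Synthetic demo: ten events whose returns fall as the score falls.
demo_events = [(10 - i, 0.10 - 0.03 * i) for i in range(10)]
report = quintile_report(demo_events)

# Published quintile averages from the table above, in percent.
published = [14.2, 9.8, 6.4, 1.7, -6.8]
print(is_monotonic_decline(published))         # True
print(round(published[0] - published[-1], 1))  # 21.0
```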

Correlation to Sentinel: -0.03 (essentially zero -- free diversification)

Section E

How it's used in the portfolio

Integration Model
A Filter, Not a Strategy

The NLP system is not a standalone strategy. It is a filter layered across all equity engines in the portfolio. This is a critical architectural choice -- it means the NLP edge amplifies every equity strategy without adding complexity or turnover.

The quarterly cadence is important. Management quality changes slowly. A CEO who scored well in one quarter is overwhelmingly likely to score well in the next. The filter operates on a different timescale than the fast-moving signals that drive daily position changes -- a structural layer beneath the tactical execution.
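
One way to picture a filter layered over an equity engine: the engine proposes candidates, and the filter removes names whose most recent quarterly score falls below a cutoff. The sketch below is hypothetical throughout. The cutoff value, the treatment of unscored names, the tickers, and the function name are all assumptions, since this page does not specify the actual rule.

```python
from typing import Dict, Iterable, List


def apply_nlp_filter(
    candidates: Iterable[str],
    latest_scores: Dict[str, float],
    min_score: float,
) -> List[str]:
    """Keep candidates whose latest quarterly score is at least min_score.

    Unscored tickers are dropped here; that is an illustrative choice,
    not a rule taken from the source.
    """
    return [
        ticker
        for ticker in candidates
        if latest_scores.get(ticker, float("-inf")) >= min_score
    ]


engine_picks = ["AAA", "BBB", "CCC", "DDD"]              # made-up tickers
quarterly_scores = {"AAA": 0.8, "BBB": 0.2, "CCC": 0.6}  # made-up scores
kept = apply_nlp_filter(engine_picks, quarterly_scores, min_score=0.5)
print(kept)  # ['AAA', 'CCC']
```

Because the score table only changes once a quarter, it behaves as a slow-moving lookup that any faster engine can consult without generating trades of its own.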

~$0.50 per quarter for 500 companies. Three API calls per transcript. The most cost-effective edge in the portfolio -- a signal that would cost a fundamental research team millions of dollars annually to replicate manually, for less than the price of a coffee.

Every number on this page is available for independent verification. The methodology documentation, scoring rubric, and raw quintile data are available to qualified investors during due diligence. We welcome the scrutiny.
