NLP Evidence: The Data Behind "Reading the Room"

Section A

The numbers at a glance

The "Reading the Room" essay describes the intuition. This page is the evidence. Every figure below comes from walk-forward testing with strict separation between training and evaluation data.

Events Scored

2,896

Earnings calls

Companies

500+

Quarterly

t-Statistic

7.08

p < 10^-12

Info Coefficient

0.115

"Exceptional" > 0.10

Annual Spread

21%

Top vs bottom 20%

Win Rate

92%

Long top 20%, 126d

To put the t-statistic in context: academic finance papers generally treat a t-statistic of 3.0 as highly significant. At 7.08, a result this strong would arise by chance less than once in a trillion trials if the signal were pure noise (p < 10^-12). The information coefficient of 0.115 exceeds the "exceptional" threshold used in quantitative equity research, where anything above 0.10 is rare.
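The link between a t-statistic and its tail probability can be sketched in a few lines. This is a minimal illustration under a large-sample normal approximation, which is an assumption: the page does not state which test or distribution produced its p-value, and the function name is illustrative.

```python
import math

def normal_tail_probability(t_stat: float, two_sided: bool = False) -> float:
    """Tail probability of a standard normal beyond |t_stat|.

    Assumes the sample is large enough that the t-distribution is
    effectively normal (an assumption, not the page's stated method).
    """
    one_sided = 0.5 * math.erfc(abs(t_stat) / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided

p_one = normal_tail_probability(7.08)
p_two = normal_tail_probability(7.08, two_sided=True)
print(f"one-sided p = {p_one:.2e}")
print(f"two-sided p = {p_two:.2e}")
```

Under this approximation the one-sided value falls below 10^-12, and the two-sided value is exactly twice the one-sided one.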

Section B

The three questions

Each earnings transcript is scored on three dimensions. The composite score drives the filter. Here is each question and its individual contribution to the overall signal.

Question 1

Creator Energy

Is this leader building the future or defending the past? The AI reads for language that initiates -- new frameworks, forward investments, willingness to cannibalize existing revenue -- versus language that reacts, deflects, or frames maintenance as innovation. A cautious CEO can still score high. The signal is directional: where is the leader's attention pointed?

IC Contribution

0.044

Question 2

Adaptation Speed

When challenged, does the leader pivot fluidly or resist? The analyst Q&A is the test. CEOs who absorb difficult questions, reframe them, and answer substantively are demonstrating real-time cognitive flexibility. CEOs who repeat prepared language or redirect are demonstrating rigidity. Adaptive organizations outperform rigid ones -- and the earnings call is where rigidity is most visible.

IC Contribution

0.037

Question 3

Conviction Depth

Is the confidence genuine or performed? Genuine conviction shows in specific, operationally grounded language -- "14% conversion improvement in our Southeast Asian mobile channel." Performed conviction shows in abstract, aspirational language -- "we are incredibly excited about the enormous opportunity." The market cannot easily distinguish between these in real time. The AI can.

IC Contribution

0.034

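The three contributions listed above add up to the composite IC (0.044 + 0.037 + 0.034 = 0.115). Standalone ICs, measured with each dimension scored in isolation, would not generally sum this way, because the composite captures interaction effects between dimensions. A CEO who scores high on conviction but low on adaptation signals a different profile than one who scores moderately on both.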
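A minimal sketch of how an information coefficient could be computed for each dimension and for a composite. It assumes the common convention of IC as the Spearman rank correlation between a score and the subsequent return, and an equal-weight sum as the composite. Neither is confirmed by this page, and every number in the example is made up for illustration.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def information_coefficient(scores, forward_returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(scores), ranks(forward_returns))

# Hypothetical scores for eight companies (illustrative only).
creator = [0.9, 0.2, 0.6, 0.4, 0.8, 0.1, 0.5, 0.7]
adaptation = [0.3, 0.6, 0.8, 0.1, 0.7, 0.4, 0.2, 0.9]
conviction = [0.5, 0.1, 0.7, 0.6, 0.9, 0.3, 0.2, 0.4]
forward_returns = [0.08, -0.03, 0.12, 0.01, 0.15, -0.06, 0.02, 0.10]

# Assumed composite: equal-weight sum of the three dimensions.
composite = [a + b + c for a, b, c in zip(creator, adaptation, conviction)]

for name, scores in [("creator", creator), ("adaptation", adaptation),
                     ("conviction", conviction), ("composite", composite)]:
    print(f"{name:>10}: IC = {information_coefficient(scores, forward_returns):.3f}")
```

On toy data like this, the standalone ICs generally do not add up to the composite IC, which is why a standalone IC and a contribution to the composite are different quantities.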

Section C

Three approaches. Two dead. One alive.

The contextual psychology approach was not the first thing we tried. It was the third. Publishing the failures is how we demonstrate that the surviving method was not cherry-picked from a buffet of attempts. Two approaches consumed significant research time and produced nothing.

Approach 1 -- Dead

Bag-of-Words Features

Count positive words, negative words, uncertainty phrases, forward-looking statement ratios. Extract sentiment scores from raw text. The classic NLP-in-finance playbook.

13

Features tested

0.75

Max t-stat

0 / 13

Significant

Why it failed: This edge was saturated years ago. Every quant shop with a Bloomberg terminal runs word-count sentiment. The same root -- "challeng-" -- means entirely different things in "we are addressing these challenging conditions" versus "we face unprecedented challenges." Counting words destroys context, and context is where the signal lives.
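The context problem is easy to reproduce. The sketch below scores the two example sentences with a toy negative-word counter. The one-stem lexicon and the function name are hypothetical, and real sentiment lexicons are far larger, but the failure mode is the same.

```python
import re

# Hypothetical mini-lexicon of negative stems (illustrative only).
NEGATIVE_STEMS = ("challeng",)

def negative_word_count(sentence: str) -> int:
    """Count tokens that start with any stem in the negative lexicon."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for token in tokens if token.startswith(NEGATIVE_STEMS))

addressing = "we are addressing these challenging conditions"
facing = "we face unprecedented challenges"

print(negative_word_count(addressing), negative_word_count(facing))
```

Both sentences receive the same count, so any feature built on that count cannot tell them apart.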

Approach 2 -- Dead

Outcome-Supervised Scoring

Train the model to predict stock returns directly. Feed it transcripts and their subsequent 6-month performance. Let the model learn the mapping between language and price.

0.11

Max t-stat

Information Coefficient

Noise

Verdict

Why it failed: The target is mostly noise. A stock's 6-month return is determined by hundreds of factors -- macro conditions, sector rotation, earnings surprises, geopolitics. Trying to reverse-engineer the management quality signal from this noise is asking the model to find a needle in a haystack by describing the haystack. You need to know what you are measuring before you can measure it.

Approach 3 -- Alive

Contextual Psychology

Don't predict prices. Don't count words. Instead, have AI read for the psychological state of the speaker -- creator energy, adaptation speed, conviction depth -- the same social signals humans detect unconsciously but cannot scale.

7.08

t-statistic

0.115

Info Coefficient

21%

Annual Spread

Why it works: It measures something real. The AI does not predict the stock. It reads the person. Companies led by adaptive, creative, genuinely convicted leaders tend to outperform -- not because the model learned that from price data, but because it understands management quality as a concept. The stock follows naturally.

"Two dead. One alive. Publishing the failures is how you know the survivor wasn't cherry-picked."

Section D

The quintile breakdown

Every quarter, all scored companies are ranked by their composite NLP score and divided into five equal groups. The table below shows what happened to each group over the subsequent 126 trading days (approximately six months).

Quintile	Avg 126-Day Return	Win Rate	Count
Top 20% (best CEOs)	+14.2%	92%	~580
Q2	+9.8%	76%	~580
Q3	+6.4%	61%	~580
Q4	+1.7%	48%	~580
Bottom 20% (worst CEOs)	-6.8%	34%	~580
Spread (Top - Bottom)	21.0%	--	2,896

The monotonic decline from the top quintile to the bottom is the strongest evidence of a real signal. A random factor would show noise across quintiles. A curve-fit signal might show edge at the extremes but chaos in the middle. A real signal shows a clean, ordered gradient -- and the gradient above is strictly ordered and close to linear, with the steepest drop between Q4 and the bottom quintile.
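The ordering can be checked directly against the table. This is a minimal sketch that uses only the five average returns and win rates listed above; the variable and function names are illustrative.

```python
# Average 126-day returns (percent) and win rates (percent), top to bottom.
quintile_returns = [14.2, 9.8, 6.4, 1.7, -6.8]
win_rates = [92, 76, 61, 48, 34]

def is_strictly_decreasing(values):
    """True when every value is lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

# Change in average return between adjacent quintiles, in percentage points.
steps = [round(b - a, 1) for a, b in zip(quintile_returns, quintile_returns[1:])]

# Top-minus-bottom spread over the same 126-day horizon.
spread = round(quintile_returns[0] - quintile_returns[-1], 1)

print("returns ordered:", is_strictly_decreasing(quintile_returns))
print("win rates ordered:", is_strictly_decreasing(win_rates))
print("steps between quintiles:", steps)
print("top-minus-bottom spread:", spread)
```

The step sizes show how even the gradient is between adjacent quintiles, and the spread reproduces the last row of the table.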

Top Quintile CAGR

+27.9%

Annualized from 126-day returns

Correlation to Sentinel

-0.03

Essentially zero -- free diversification

Section E

How it's used in the portfolio

Integration Model

A Filter, Not a Strategy

The NLP system is not a standalone strategy. It is a filter layered across all equity engines in the portfolio. This is a critical architectural choice -- it means the NLP edge amplifies every equity strategy without adding complexity or turnover.

Every quarter, 500+ companies are scored on the three-question composite
Top 20% receive favored allocation -- existing strategy signals on these names get larger position sizes
Bottom 20% are "shadow banned" -- no equity engine can see or trade them until the next quarterly scoring
Middle 60% are traded normally -- the NLP filter neither helps nor hinders
Result: systematic removal of deteriorating leadership, systematic overweight of strong leadership
Correlation to Sentinel engine: -0.03 -- essentially zero, meaning free diversification
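The tiering rules above can be sketched as a small function. This is an illustration under stated assumptions, not the production filter: the tier labels, function names, tickers, and scores are hypothetical, and the size of the "favored" position increase is left out because the page does not specify it.

```python
from typing import Dict, List

def assign_tiers(scores: Dict[str, float], tail: float = 0.20) -> Dict[str, str]:
    """Label the top `tail` fraction favored, the bottom `tail` shadow-banned."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n_tail = int(len(ranked) * tail)
    tiers = {}
    for position, ticker in enumerate(ranked):
        if position < n_tail:
            tiers[ticker] = "favored"
        elif position >= len(ranked) - n_tail:
            tiers[ticker] = "shadow_banned"
        else:
            tiers[ticker] = "normal"
    return tiers

def tradable_universe(tiers: Dict[str, str]) -> List[str]:
    """Names the equity engines are allowed to see this quarter."""
    return [ticker for ticker, tier in tiers.items() if tier != "shadow_banned"]

# Hypothetical composite scores for ten made-up tickers.
example_scores = {
    "AAA": 0.91, "BBB": 0.15, "CCC": 0.55, "DDD": 0.72, "EEE": 0.33,
    "FFF": 0.08, "GGG": 0.64, "HHH": 0.47, "III": 0.83, "JJJ": 0.26,
}

tiers = assign_tiers(example_scores)
print(tiers)
print(tradable_universe(tiers))
```

Keeping the filter as a pure function of the quarterly scores means each equity engine only needs to consult the resulting universe and tier labels, which matches the "filter, not a strategy" design described above.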

The quarterly cadence is important. Management quality changes slowly. A CEO who scored well in Q1 is overwhelmingly likely to score well in Q2. The filter operates on a different timescale than the fast-moving signals that drive daily position changes -- a structural layer beneath the tactical execution.

~$0.50

Per quarter for 500 companies. Three API calls per transcript. The most cost-effective edge in the portfolio -- a signal that would cost a fundamental research team millions of dollars annually to replicate manually, for less than the price of a coffee.
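The unit economics follow from the figures above. This is a back-of-envelope sketch that assumes one transcript per company per quarter and treats the approximate $0.50 figure as exact; the constant names are illustrative.

```python
COST_PER_QUARTER_USD = 0.50   # approximate figure stated on this page
COMPANIES = 500               # companies scored each quarter
CALLS_PER_TRANSCRIPT = 3      # one API call per question

# Assumption: one earnings transcript per company per quarter.
calls_per_quarter = COMPANIES * CALLS_PER_TRANSCRIPT
cost_per_transcript = COST_PER_QUARTER_USD / COMPANIES
cost_per_call = COST_PER_QUARTER_USD / calls_per_quarter
cost_per_year = COST_PER_QUARTER_USD * 4

print(f"API calls per quarter: {calls_per_quarter}")
print(f"cost per transcript:   ${cost_per_transcript:.4f}")
print(f"cost per API call:     ${cost_per_call:.5f}")
print(f"cost per year:         ${cost_per_year:.2f}")
```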

Every number on this page is available for independent verification. The methodology documentation, scoring rubric, and raw quintile data are available to qualified investors during due diligence. We welcome the scrutiny.