Reading the Room: What CEOs Say vs How They Say It

Movement I

Thirty Seconds in a Room

Imagine you walk into a room and someone is telling you about their company. They are standing at the front, speaking clearly, presenting numbers on a slide. You have never met this person. You know nothing about their industry, their revenue, their margins.

Within thirty seconds, you know something. You can feel whether this person is creating the future or reacting to events. You can hear whether they are adaptive -- pivoting fluidly when challenged -- or rigid, deflecting every question back to a rehearsed talking point. You can tell whether their confidence is real, rooted in knowledge of their business, or performed, a mask stretched over uncertainty.

That is not intuition. It is data. Humans process social energy signals constantly -- vocal cadence, word choice, response latency, the gap between what is said and what is meant. We evolved to read these signals because our survival depended on knowing who to trust. Every person in that room is running the same unconscious algorithm. Most people call it a "gut feeling." We call it an unmeasured variable.

The question that launched this research was simple: what if we could measure it?

"The numbers tell you what happened. The voice tells you what's coming."

Movement II

The Observation

Earnings calls are the richest source of unstructured information in public markets. Four times a year, the CEO and CFO of nearly every publicly traded company sit on a call with analysts and answer questions about their business. The financial data they present -- revenue, earnings, guidance -- is immediately reflected in the stock price. Algorithms parse the numbers in milliseconds. That information is priced before a human can finish reading the headline.

But the way they present the numbers -- the language patterns, the framing choices, the responses to adversarial questions -- takes longer to process. Most analysts focus on the numbers and treat the words as decoration. The market processes the quantitative content within seconds and the qualitative content over days, weeks, sometimes never.

That gap is the edge.

The system reads the earnings call transcript of every company in the S&P 500 and scores it on three dimensions. Not the content of what was said -- the content is already priced -- but the psychological signature of how it was said. Three questions, each scored by AI that can process language at a depth and consistency no human analyst could sustain across 500 companies, four times per year.

Movement III

The Three Questions

Question 1

Creator Energy

Is this leader building the future or defending the past? Creator energy is visible in language that initiates -- new frameworks, forward-looking investments, willingness to cannibalize existing revenue for long-term positioning. Defensive energy is visible in language that reacts -- explaining away misses, blaming externalities, framing maintenance as innovation. The distinction is not about optimism versus pessimism. A CEO can be cautious and still exhibit creator energy. The signal is directional: where is the leader's attention pointed?

Question 2

Adaptation Speed

When challenged, does the leader pivot fluidly or resist and deflect? Earnings calls always include an analyst Q&A session. Some analysts ask sharp questions. The CEO's response -- not the content but the cognitive flexibility it reveals -- is enormously informative. Leaders who absorb a difficult question, reframe it, and offer a substantive answer are demonstrating real-time adaptation. Leaders who repeat prepared language, redirect to a different topic, or become defensive are demonstrating rigidity. Adaptive organizations outperform rigid ones. The earnings call is where you can see which is which.

Question 3

Conviction Depth

Is the confidence genuine or carefully performed? Genuine conviction manifests in specific, operationally grounded language -- the CEO who says "we're seeing 14% conversion improvement in our Southeast Asian mobile channel" is speaking from direct knowledge. Performed conviction manifests in abstract, aspirational language -- the CEO who says "we are incredibly excited about the enormous opportunity ahead of us" is speaking from a communications playbook. The market cannot easily distinguish between these in real time. The AI can.
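To make the idea of a three-question composite concrete, here is a minimal sketch of how the three scores might be stored and combined. Everything in it is an assumption for illustration: the `TranscriptScore` name, the 0-10 scale, the placeholder tickers, and above all the equal weighting, since the text does not say how the three dimensions are weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptScore:
    """Hypothetical record of one transcript's three scores (0-10 scale assumed)."""
    ticker: str
    creator_energy: float
    adaptation_speed: float
    conviction_depth: float

    def composite(self) -> float:
        # Assumption: equal weights. The actual weighting is not stated.
        total = self.creator_energy + self.adaptation_speed + self.conviction_depth
        return total / 3.0


# Placeholder tickers and made-up scores, purely for illustration.
scores = [
    TranscriptScore("AAA", 8.0, 7.0, 9.0),
    TranscriptScore("BBB", 3.0, 4.0, 2.0),
    TranscriptScore("CCC", 6.0, 6.0, 6.0),
]

# Rank companies from strongest to weakest composite.
ranked = sorted(scores, key=lambda s: s.composite(), reverse=True)
ranked_tickers = [s.ticker for s in ranked]
print(ranked_tickers)
```

Keeping the three dimensions as separate fields, rather than storing only the composite, would let each question be tested on its own later.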

Movement IV

The Numbers

Every claim in quantitative finance must be accompanied by its statistical foundation. The following figures represent the NLP scoring system tested across multiple years of earnings events, with strict separation between the data used to develop the scoring methodology and the data used to measure its performance.

2,896

Total earnings events scored and measured across multiple years

7.08

t-statistic (above 3.0 is highly significant)

0.115

Information Coefficient

92%

Win rate, top quintile, 126-day horizon

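To put these numbers in context: a t-statistic of 7.08 means that if the signal were pure noise, a result this strong would turn up by chance roughly once in a trillion trials. That is a statement about how surprising the data would be under randomness, not a guarantee that the signal is real. The conventional bar for statistical significance is a t-statistic of about 2.0, and stricter standards proposed for new financial signals call for 3.0. The NLP composite is more than twice the stricter threshold.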
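The link between a t-statistic and a chance probability can be checked with a few lines of arithmetic. The sketch below assumes a standard normal approximation, which is reasonable with thousands of observations but is not necessarily the exact test behind the reported figure. The function name is hypothetical.

```python
import math


def two_sided_p_from_t(t_stat: float) -> float:
    """Two-sided tail probability of a t-statistic under a normal approximation."""
    # For a standard normal Z, P(|Z| >= t) = erfc(t / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


p_conventional = two_sided_p_from_t(2.0)   # the usual 5% bar
p_strict = two_sided_p_from_t(3.0)         # the stricter 3.0 bar
p_signal = two_sided_p_from_t(7.08)        # the reported t-statistic

for label, p in [("t=2.0", p_conventional), ("t=3.0", p_strict), ("t=7.08", p_signal)]:
    print(f"{label}: p = {p:.3g}")
```

Under this approximation the tail probability for 7.08 is on the order of one in a trillion, many orders of magnitude smaller than the probability at either threshold.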

The Information Coefficient of 0.115 measures the correlation between the signal's prediction and the actual subsequent return. In equity research, an IC of 0.05 is considered useful and an IC of 0.10 is considered exceptional. The NLP composite exceeds the exceptional threshold.
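As a rough illustration of what an Information Coefficient measures, the sketch below computes a rank correlation between scores and subsequent returns on toy data. Using rank (Spearman) correlation is an assumption; the text does not say whether the reported IC is rank-based or a plain Pearson correlation. All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def rank_ic(scores: Sequence[float], forward_returns: Sequence[float]) -> float:
    """Rank correlation between signal scores and the returns that followed."""
    return pearson(ranks(scores), ranks(forward_returns))


# Toy data: five companies, scores and their subsequent returns.
toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_returns = [0.01, 0.03, 0.02, 0.05, 0.04]
ic = rank_ic(toy_scores, toy_returns)
print(round(ic, 3))
```

The toy data is deliberately far cleaner than real markets. An IC near 0.1 means the ordering of scores only loosely matches the ordering of returns, which is still valuable when applied across thousands of events.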

The practical translation: companies whose management scored in the top 20% on the three-question composite outperformed companies in the bottom 20% by 21 percentage points annually. Going long the top quintile and short the bottom quintile at a horizon of 126 trading days (approximately six months) produced a win rate of 92%.
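The quintile spread and win rate described above can be sketched as follows. This is an illustration on made-up data, not the backtest itself. In particular, defining the win rate as the share of periods in which the top quintile beat the bottom quintile is an assumption, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Event = Tuple[float, float]  # (composite score, forward return over the horizon)


def quintile_spread(events: List[Event]) -> float:
    """Mean forward return of the top 20% by score minus that of the bottom 20%."""
    if len(events) < 5:
        raise ValueError("need at least five events to form quintiles")
    ordered = sorted(events, key=lambda e: e[0])
    k = len(ordered) // 5
    bottom = mean(r for _, r in ordered[:k])
    top = mean(r for _, r in ordered[-k:])
    return top - bottom


def win_rate(spreads: List[float]) -> float:
    """Assumed definition: fraction of periods with a positive long-short spread."""
    if not spreads:
        raise ValueError("need at least one period")
    return sum(1 for s in spreads if s > 0) / len(spreads)


# Toy data: two periods where high scores led to high returns, one where they did not.
rising = [(float(s), 0.01 * s) for s in range(1, 11)]
falling = [(float(s), 0.01 * (11 - s)) for s in range(1, 11)]
periods = [rising, falling, rising]

spreads = [quintile_spread(p) for p in periods]
rate = win_rate(spreads)
print([round(s, 3) for s in spreads], round(rate, 3))
```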

21%

Annual spread, top vs bottom quintile

-0.03

Correlation to Sentinel engine

The correlation figure is arguably the most important number on this page. The NLP signal has a correlation of -0.03 to the system's Sentinel engine -- effectively zero. This means the NLP edge is essentially uncorrelated with the Sentinel engine: knowing how one performed in a given period tells you almost nothing about how the other did. In portfolio mathematics, an uncorrelated source of return with a positive Sharpe ratio is among the rarest and most valuable assets available. It is the closest thing to free diversification -- the added return raises portfolio risk far less than proportionally, so the combined Sharpe ratio improves.
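The value of a near-zero correlation can be shown with the standard two-asset volatility formula. The sketch below assumes two return streams that are each scaled to the same volatility and blended in equal parts. The Sharpe ratios of 1.0 are hypothetical placeholders, not figures for the system's engines.

```python
import math


def combined_sharpe(sharpe_a: float, sharpe_b: float, rho: float) -> float:
    """Sharpe ratio of an equal-risk blend of two return streams.

    Each stream is scaled to unit volatility, so its expected excess
    return equals its Sharpe ratio. The blend's variance is then
    1 + 1 + 2 * rho.
    """
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must be in (-1, 1]")
    expected = sharpe_a + sharpe_b
    volatility = math.sqrt(2.0 + 2.0 * rho)
    return expected / volatility


# Hypothetical Sharpe ratios of 1.0 for each stream.
perfectly_correlated = combined_sharpe(1.0, 1.0, 1.0)
nearly_uncorrelated = combined_sharpe(1.0, 1.0, -0.03)

print(round(perfectly_correlated, 3), round(nearly_uncorrelated, 3))
```

With perfect correlation the blend is no better than either stream alone. At a correlation of -0.03 the combined Sharpe ratio is about 1.4 times higher, which is the diversification benefit the paragraph describes.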

"An IC of 0.05 is useful. An IC of 0.10 is exceptional. The NLP composite scores 0.115 across nearly three thousand earnings events."

Movement V

What Did Not Work

The NLP edge was not the first thing we tried. It was the third. The first two approaches consumed months of research and produced nothing. Publishing what failed is how we demonstrate that the thing that works was not cherry-picked from a buffet of attempts.

Approach 1 -- Dead

Bag-of-Words Features

The first attempt used traditional natural language processing: counting word frequencies, sentiment scores, readability metrics. Thirteen features were extracted from each transcript -- positive word count, negative word count, uncertainty language, forward-looking statement ratio, and so on. The maximum t-statistic across all thirteen features was 0.75. Not one crossed the significance threshold. Counting words does not capture meaning. The same word -- "challenging" -- means something completely different when preceded by "we are addressing these" versus "we face unprecedented." Context is everything, and bag-of-words features destroy context by design.

Max t-statistic: 0.75 -- 13 features, all insignificant
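The loss of context is easy to demonstrate. The sketch below is a minimal bag-of-words counter, not the thirteen-feature pipeline described above. The two example sentences are invented around the phrases quoted in the text.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word, discarding word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Invented sentences built around the phrases quoted in the text.
proactive = "We are addressing these challenging conditions."
reactive = "We face unprecedented challenging conditions."

count_proactive = bag_of_words(proactive)["challenging"]
count_reactive = bag_of_words(reactive)["challenging"]

# Word order is lost entirely: a scrambled sentence yields the same bag.
scrambled = "Conditions challenging these addressing are we."
same_bag = bag_of_words(proactive) == bag_of_words(scrambled)

print(count_proactive, count_reactive, same_bag)
```

Both sentences score the same on the "challenging" feature, and a scrambled sentence is indistinguishable from the original, so any signal carried by phrasing or framing never reaches the model.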

Approach 2 -- Dead

Outcome-Supervised Scoring

The second attempt trained the scoring model to predict stock price outcomes directly. The AI was given transcripts and corresponding returns and asked to learn the mapping between language and performance. The maximum t-statistic was 0.11 -- statistically indistinguishable from zero. This approach failed because it was asking the model to do something impossible: learn the relationship between language and price movement from noisy, multivariate data. A stock's six-month return is determined by hundreds of factors. Isolating the management quality signal from that noise requires a model that understands management quality as a concept -- not one that tries to reverse-engineer it from prices.

Max t-statistic: 0.11 -- indistinguishable from noise

The approach that worked -- the three-question contextual scoring -- succeeded precisely because it did not try to predict prices. It tried to measure something real: the quality of the human being running the company. The price prediction follows naturally, not because the model learned it, but because companies led by adaptive, creative, genuinely convicted leaders tend to outperform. The AI does not predict the stock. It reads the person. The stock follows.

Movement VI

How It Is Used

Integration Model

A Filter, Not a Strategy

The NLP scoring system is not deployed as a standalone strategy. It is deployed as a filter across all equity strategies in the portfolio. The distinction matters. A standalone strategy would go long top-scoring companies and short bottom-scoring ones. That works -- the 21-percentage-point annual spread is evidence of it. But the system achieves more by using the signal as an overlay on strategies that are already generating alpha from other sources.

Companies scoring in the bottom 20% on the quarterly NLP composite are "shadow banned" -- no equity strategy in the system trades them until the next quarterly score. The strategy engines that select stocks for mean reversion, momentum, or regime rotation simply cannot see these companies. They are invisible to the portfolio.

Companies scoring in the top 20% receive favored allocation. When a strategy engine generates a signal to buy a stock, and that stock happens to be in the top NLP quintile, it receives a larger position size than it otherwise would. The NLP score acts as a confidence multiplier on existing signals.
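The two rules above, exclusion for the bottom quintile and a larger position for the top quintile, can be sketched as a small overlay function. The multiplier of 1.5, the function names, and the placeholder tickers are all assumptions for illustration; the text does not state how much larger a favored position is.

```python
from typing import Dict, Optional

# Hypothetical value: the text gives no figure for the size boost.
TOP_QUINTILE_MULTIPLIER = 1.5


def quintile_of(ticker: str, composite_scores: Dict[str, float]) -> int:
    """Return 1 (lowest 20%) through 5 (highest 20%) for a ticker's score."""
    ordered = sorted(composite_scores, key=composite_scores.get)
    position = ordered.index(ticker)
    return position * 5 // len(ordered) + 1


def filtered_position(
    ticker: str, base_size: float, composite_scores: Dict[str, float]
) -> Optional[float]:
    """Apply the NLP overlay to a position size proposed by another engine."""
    quintile = quintile_of(ticker, composite_scores)
    if quintile == 1:
        return None  # shadow banned: invisible until the next quarterly score
    if quintile == 5:
        return base_size * TOP_QUINTILE_MULTIPLIER  # favored allocation
    return base_size  # middle quintiles pass through unchanged


# Ten placeholder tickers with scores 0.0 through 9.0.
quarterly_scores = {f"T{i}": float(i) for i in range(10)}

banned = filtered_position("T0", 100.0, quarterly_scores)
neutral = filtered_position("T5", 100.0, quarterly_scores)
favored = filtered_position("T9", 100.0, quarterly_scores)
print(banned, neutral, favored)
```

Note that the function only reshapes a size proposed elsewhere. It never originates a trade, which matches the description of the signal as a filter and not a strategy.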

This architecture means the NLP edge amplifies every equity strategy without adding complexity. No new trades are generated by the NLP filter alone. It simply makes the existing trades better -- avoiding companies with poor management quality and overweighting companies with strong management quality. The cost is zero additional turnover. The benefit is a measurable improvement in Sharpe ratio across the equity book.

The quarterly cadence is important. Unlike technical signals that update every second, management quality changes slowly. A CEO who scored well in Q1 is overwhelmingly likely to score well in Q2. The filter updates four times per year, and each update reshuffles the universe that equity strategies can see. This slow update cycle means the NLP edge does not interfere with the fast-moving signals that drive daily position changes. It operates on a different timescale entirely -- a structural layer beneath the tactical execution.

Movement VII

Why AI Sees What Analysts Cannot

A human equity analyst covers eight to fifteen companies. They listen to every earnings call for their coverage universe, read every transcript, and form judgments about management quality. Many of them are very good at it. The best fundamental analysts have been reading management for decades and their pattern recognition is genuinely impressive.

But they cannot do it at scale. No human can read 500 earnings transcripts in a quarter and maintain consistent scoring criteria across all of them. The analyst covering technology companies develops different standards from the analyst covering healthcare. The analyst who heard a brilliant Microsoft call at 9 AM unconsciously adjusts their scoring of a mediocre Intel call at 2 PM. Fatigue, recency bias, sector familiarity, personal mood -- all of these contaminate human judgment at scale.

The AI reads every transcript with the same three questions, the same scoring criteria, the same lack of fatigue or mood or sector bias. It processes the 500th transcript with the same precision as the first. And because it understands language contextually -- not as word frequencies but as meaning -- it can detect the subtle differences between genuine conviction and performed confidence that even experienced analysts miss when they are tired or distracted.

This is not a story about AI replacing human judgment. It is a story about measuring something that humans already perceive but cannot scale. The analyst who covers twelve companies and says "I just have a bad feeling about this CEO" is detecting the same signal. The system detects it across 500 companies, four times per year, with mathematical consistency.

"The analyst who covers twelve companies and says 'I just have a bad feeling about this CEO' is detecting the same signal. The system detects it across five hundred companies with mathematical consistency."

Movement VIII

The Edge That Grows

Most quantitative edges decay. The more people who discover a signal, the more capital chases it, the more the edge gets arbitraged away. Momentum was a powerful factor in the 1990s. It still works, but the premium has shrunk as thousands of funds now trade it. Value was Warren Buffett's secret. Now it is a commodity signal available on every Bloomberg terminal.

The NLP edge has the opposite property. It grows as AI comprehension improves.

The three-question scoring system was built using the current generation of large language models. These models understand context, tone, and meaning at a level that was impossible two years ago, and that level will itself look crude compared to what is possible two years from now. As AI language models improve -- as they become better at detecting subtle differences between performed and genuine confidence, between strategic adaptation and defensive deflection -- the signal will become more precise. The measurement instrument is getting sharper, not duller.

Meanwhile, the source of the signal -- human psychology -- does not change. CEOs will continue to reveal their management quality through their language patterns on earnings calls. They cannot help it. The social signals that our system measures are deeply embedded in how humans communicate. No CEO, no matter how well coached, can completely mask rigidity or convincingly perform genuine creativity. The signal is hardwired into how language works.

A growing edge with a permanent source. In a world where most quantitative signals are decaying, this combination is unusual. It is also, in our view, the single strongest argument for the system's long-term competitive advantage. Every other edge in the portfolio -- mean reversion, momentum, carry -- faces arbitrage pressure. This one faces the opposite: technology is making the measurement better while the phenomenon being measured stays the same.

500+

Companies scored every quarter

3

Questions per transcript

Four times a year, the system reads every earnings transcript in its coverage universe. Three questions are asked. Scores are computed. The equity universe is reshuffled. Companies with strong management quality become visible. Companies with weak management quality disappear. No trades are generated. No positions are forced. The filter simply changes what the other engines can see.

Then the other engines do what they do -- finding mean reversion opportunities, riding momentum, rotating with the regime. But they do it in a universe that has been quietly curated by a signal that measures the one thing financial statements cannot capture: the quality of the mind making the decisions.

"The numbers tell you what happened. The voice tells you what's coming. The system reads both."