The Graveyard: What We Killed and Why

Movement I

Why Publish Failures

There is a reason most funds only show their winners. Showing failures feels like weakness. It invites the obvious question: if you killed 147 strategies, how do you know the surviving eleven are real?

That question is exactly the point. The fact that we killed 147 strategies is the strongest evidence that the survivors are genuine. A team that never kills anything is a team that is not testing honestly. A track record without dead strategies behind it is a track record built on luck, survivorship bias, or both.

Trust is built by what you are willing to show, not what you choose to hide.

What follows is a selection of the most instructive failures -- strategies that looked promising, consumed weeks or months of research, and ultimately produced negative returns when tested honestly. For each one: the thesis, the data, what went wrong, and what the system learned from its death.

158 strategies built and tested

147 killed with documentation

11 survivors in production

A survival rate of 11 in 158, or about 7%. For context, pharmaceutical drug development has a comparable attrition rate: roughly 90% of candidates that enter clinical trials fail. The analogy is not accidental. Like drug development, strategy research involves a hypothesis, controlled testing, and a strict threshold that most candidates cannot clear. The difference is that in strategy research the testing environment is adversarial: markets actively punish ideas that do not carry genuine edge.

Movement II

The Dead

Strategy #1

Value + Hurst Hybrid

Sharpe ratio: -2.345

Thesis: Stocks with low Hurst exponents (mean-reverting price behavior) combined with deep value metrics should outperform. Buy mean-reverting cheap stocks, short trending expensive ones.

The Hurst exponent measures whether a time series tends to revert to its mean or trend persistently. The idea was elegant: identify stocks whose prices naturally oscillate, buy them when they are cheap, and capture the reversion. On paper, it combined two well-documented anomalies. In practice, it lost 26.7% annually. The Hurst signal, when applied to individual equities at daily frequency, was indistinguishable from noise. The "mean-reverting" stocks were not reverting to a stable mean -- they were reverting to a declining trend.
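The Hurst exponent can be estimated several ways, and the source does not say which estimator was used here. One common shortcut, sketched below as an assumption, looks at how the dispersion of lagged differences grows with the lag: for a series with exponent H, std(x[t + lag] - x[t]) scales roughly like lag^H, so H is the slope of log-dispersion against log-lag. A random walk gives H near 0.5, values below 0.5 suggest mean reversion, and values above 0.5 suggest trending. This is a variogram-style estimator, not the classic rescaled-range (R/S) method; the function name and lag range are illustrative.

```python
import math
import random
import statistics


def hurst_exponent(series, min_lag=2, max_lag=50):
    """Estimate H from how the dispersion of lagged differences scales with lag.

    Assumes std(x[t + lag] - x[t]) ~ lag ** H, so H is the OLS slope of
    log(std) regressed on log(lag).
    """
    log_lags, log_stds = [], []
    for lag in range(min_lag, max_lag + 1):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        log_lags.append(math.log(lag))
        log_stds.append(math.log(statistics.pstdev(diffs)))
    # Ordinary least-squares slope of log_stds on log_lags.
    mean_x = statistics.fmean(log_lags)
    mean_y = statistics.fmean(log_stds)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_stds))
    var = sum((x - mean_x) ** 2 for x in log_lags)
    return cov / var


if __name__ == "__main__":
    rng = random.Random(7)
    steps = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    walk = [0.0]
    for s in steps:
        walk.append(walk[-1] + s)  # random walk: H should land near 0.5
    # i.i.d. noise around a fixed mean is maximally "mean-reverting": H near 0.
    print(f"random walk H = {hurst_exponent(walk):.2f}")
    print(f"white noise H = {hurst_exponent(steps):.2f}")
```

A low H only describes how the historical series behaved. It says nothing about whether the level the price reverts toward is itself stable, which is exactly the failure described above.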

Lesson: Statistical properties measured on price histories are not the same as tradeable signals. A stock can have a low Hurst exponent and still lose money systematically. The math described the data. It did not describe the opportunity.

CAGR: -26.7% | Test period: 16 years | Long/short equity

Strategy #2

Commodity Momentum

Sharpe ratio: -1.053

Thesis: Cross-sectional momentum on commodity futures. Buy the commodities rising fastest, short the laggards. A well-documented factor in academic literature.

Four separate versions were built and tested. All four produced negative Sharpe ratios. During debugging, the team discovered a bug in the Panama Canal roll adjustment, the back-adjustment step that stitches successive contracts into one continuous futures series. The contracts were not being stitched correctly, which created phantom gaps at roll dates. After the bug was fixed, the results were still negative. The academic momentum premium in commodities appears to have been concentrated in a specific historical period and does not replicate in modern markets with realistic transaction costs.
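The roll bug described above lived in the stitching of contracts into a continuous series. Below is a minimal sketch of additive back-adjustment (the "Panama Canal" method). It assumes each contract's closing prices arrive as a list and that consecutive contracts overlap on exactly one roll date. Every older contract is shifted by the cumulative price gap at all later rolls, so the most recent contract stays unadjusted and day-over-day changes never include the artificial jump between contracts. The function name and input layout are hypothetical.

```python
def panama_canal_adjust(segments):
    """Back-adjust successive futures contracts into one continuous series.

    Hypothetical layout: segments[i] holds closing prices of contract i, and
    segments[i][-1] and segments[i + 1][0] are the old and new contract's
    closes on the same roll date.
    """
    continuous = list(segments[-1])  # latest contract stays unadjusted
    offset = 0.0
    for i in range(len(segments) - 2, -1, -1):
        gap = segments[i + 1][0] - segments[i][-1]  # price jump at this roll
        offset += gap  # older contracts absorb every later gap
        # Drop the old contract's roll-date close: the new contract covers that day.
        continuous = [p + offset for p in segments[i][:-1]] + continuous
    return continuous


# Naive concatenation would show 101 -> 105 on the roll date, a phantom +4 move
# no position ever earned. The adjusted series shows the old contract's real
# +1 move (101 -> 102) on that day instead.
print(panama_canal_adjust([[100, 101, 102], [105, 106, 107]]))
# [103.0, 104.0, 105, 106, 107]
```

Additive adjustment keeps price differences exact but can push very old prices negative over long histories. Ratio adjustment is the usual alternative when percentage returns matter more than point differences.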

Lesson: Academic papers often present gross-of-cost results on data that includes periods of structural market inefficiency. A strategy that worked in the 1990s commodity pits does not necessarily work on electronic markets with tight spreads and algorithmic competition. Also: always check the roll logic.

CAGR: -21.0% | 4 versions tested | Commodity futures

Strategy #3

Crisis Alpha (Standalone)

Sharpe ratio: -0.694

Thesis: Profit from tail risk using VIX derivatives. Systematically buy volatility protection while it is cheap, and profit when crises arrive.

Six versions were constructed, ranging from simple long-VIX strategies to sophisticated VIX curve shape trades. All six produced negative Sharpe ratios. The fundamental problem: volatility insurance costs money every day that there is no crisis. Crises are rare. The daily cost of carry overwhelms the occasional payoff. More importantly, the system's regime detector already rotates into crisis-appropriate assets when volatility spikes -- achieving the protective effect without the ongoing premium drain. The standalone crisis alpha strategy was paying for protection the system already had.
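The carry problem is plain arithmetic. The sketch below compares the cumulative premium paid for continuously held protection with the payoff collected over a multi-year window. Every number in it is a hypothetical placeholder, not a figure from this research.

```python
def net_insurance_pnl(annual_premium, crisis_payoff, years, crises):
    """Net P&L, as a fraction of the portfolio, of protection held throughout.

    annual_premium: yearly cost of carrying the hedge (hypothetical).
    crisis_payoff:  gain from the hedge in each crisis (hypothetical).
    """
    return crises * crisis_payoff - years * annual_premium


def breakeven_payoff(annual_premium, years, crises):
    """Payoff each crisis must deliver just to repay the accumulated carry."""
    return years * annual_premium / crises


# Hypothetical: 4% of NAV per year in premium, one crisis in five years.
print(round(net_insurance_pnl(0.04, 0.15, years=5, crises=1), 4))  # -0.05
print(round(breakeven_payoff(0.04, years=5, crises=1), 4))         # 0.2
```

This understates the problem the paragraph ends on. If the regime detector already provides much of the protection, the hedge's effective payoff is only its increment over what the base system earns in the crisis, which pushes the break-even higher still.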

Lesson: Insurance is a losing trade when the underlying system already adapts to crises. Buying protection against a scenario you already handle is paying twice for the same coverage.

CAGR: -8.4% | 6 versions tested | VIX derivatives

Strategy #4

Crypto Microstructure

Sharpe on holdout: -1.430

Thesis: Order-flow and microstructure signals on BTC and ETH. Patterns in order book depth, trade aggression, and funding rates should predict short-term price movements.

This was the closest the system came to deploying a false positive into production. The initial scanner identified 68 distinct patterns with in-sample Sharpe ratios above 1.3. The signals looked robust -- consistent across time periods, significant t-statistics, clean equity curves. A circular validation process (walk-forward on overlapping data) confirmed the results with Sharpe 1.32. Then the team ran a true holdout test on data the scanner had never touched. Sharpe: -1.43. The entire signal set was noise. Every single pattern was an artifact of overfitting to the specific characteristics of the training data. Capital was weeks away from deployment.
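How can dozens of patterns all be false? A pure-noise simulation shows the mechanism. It generates hypothetical strategies whose daily returns have zero true mean. It keeps those whose in-sample annualized Sharpe clears 1.3, the threshold quoted above, and then scores the survivors on a later, non-overlapping holdout. The parameters are illustrative assumptions: 500 candidates, one year in-sample, one year of holdout, and 252 trading days per year. This is not the team's scanner.

```python
import math
import random
import statistics

TRADING_DAYS = 252


def annualized_sharpe(returns):
    """Annualized Sharpe ratio of daily returns (risk-free rate taken as zero)."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def noise_search(n_candidates=500, threshold=1.3, seed=0):
    """Select zero-edge strategies on in-sample Sharpe, then score them on a holdout."""
    rng = random.Random(seed)
    selected_in, selected_out = [], []
    for _ in range(n_candidates):
        # Zero-edge strategy: every daily return is pure noise.
        returns = [rng.gauss(0.0, 0.01) for _ in range(2 * TRADING_DAYS)]
        in_sample = returns[:TRADING_DAYS]  # discovery window
        holdout = returns[TRADING_DAYS:]    # never touched during the search
        sr_in = annualized_sharpe(in_sample)
        if sr_in > threshold:
            selected_in.append(sr_in)
            selected_out.append(annualized_sharpe(holdout))
    return selected_in, selected_out


if __name__ == "__main__":
    sr_in, sr_out = noise_search()
    print(f"{len(sr_in)} of 500 noise strategies 'pass' in-sample")
    print(f"mean in-sample Sharpe of passers: {statistics.fmean(sr_in):.2f}")
    print(f"mean holdout Sharpe of passers:   {statistics.fmean(sr_out):.2f}")
```

Re-scoring the passers on any window that overlaps the first 252 days would inherit their lucky draws. That is how a circular check can "confirm" pure noise.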

Lesson: Validation on overlapping data is not validation. 68 "confirmed" patterns can all be false positives when the search space is large enough. True holdout testing -- on data completely separated from the discovery process -- is not optional. It is the only thing standing between research and ruin.

68 patterns found | All false positives | BTC/ETH microstructure

Strategy #5

Multi-Factor Market Neutral

Sharpe ratio: -0.159

Thesis: Classic four-factor long/short equity -- value, momentum, quality, and low volatility. Dollar-neutral construction. The most researched strategy in quantitative finance.
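The dollar-neutral construction named in the thesis can be sketched as follows. The sketch assumes each stock already has raw scores for the four factors, oriented so that higher is better. It z-scores each factor across the cross-section and averages the z-scores into a composite. It then goes long the top names and short the bottom names with equal and opposite dollar exposure. The equal factor weights, the 20% cutoff, and the function names are illustrative assumptions, not the strategy's actual parameters.

```python
import statistics


def zscores(values):
    """Cross-sectional z-scores; a constant column maps to all zeros."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def dollar_neutral_weights(factor_scores, quantile=0.2):
    """factor_scores: {ticker: (value, momentum, quality, low_vol)}, higher = better.

    Returns {ticker: weight}: longs sum to +0.5 and shorts to -0.5,
    so net exposure is zero and gross exposure is 1.
    """
    tickers = list(factor_scores)
    n_factors = len(next(iter(factor_scores.values())))
    # Standardize each factor across stocks, then average per stock.
    per_factor = [zscores([factor_scores[t][k] for t in tickers]) for k in range(n_factors)]
    composite = {t: statistics.fmean([col[i] for col in per_factor]) for i, t in enumerate(tickers)}
    ranked = sorted(tickers, key=composite.get, reverse=True)
    k = max(1, int(len(ranked) * quantile))
    weights = {t: 0.0 for t in tickers}
    for t in ranked[:k]:
        weights[t] = 0.5 / k   # equal-weight longs
    for t in ranked[-k:]:
        weights[t] = -0.5 / k  # equal-weight shorts
    return weights
```

Because the value and quality inputs come from fundamentals, those columns only change when new filings arrive, however often the function is called.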

Over a sixteen-year backtest, the strategy lost 59% of its starting capital. Its Sharpe ratio of -0.159 means it earned less than the risk-free rate: it did worse than holding cash. The root cause was not the factors themselves but the data. Fundamental data from a commercial provider (Morningstar) is too coarse for daily rebalancing of a market-neutral portfolio. Point-in-time fundamental data changes quarterly, but the strategy was attempting to trade daily. The result was a portfolio that churned through transaction costs while its signal updated only four times per year.

Lesson: The frequency of the signal must match the frequency of the trading. Daily trading on quarterly data is a guaranteed way to convert capital into broker commissions. The strategy might work at monthly frequency on higher-quality data. But "might work with different assumptions" is not a strategy -- it is a wish.
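The frequency mismatch shows up directly as cost drag. Annual drag is roughly rebalances per year × one-way turnover per rebalance × cost per unit traded. The sketch below compares a daily schedule with a quarterly one. The turnover and cost figures are hypothetical placeholders.

```python
def annual_cost_drag(rebalances_per_year, turnover_per_rebalance, cost_per_unit):
    """Approximate yearly return lost to trading costs, as a fraction of capital."""
    return rebalances_per_year * turnover_per_rebalance * cost_per_unit


# Hypothetical: 10 bps per unit traded. Daily rebalancing turns over 5% of the
# book each day; quarterly rebalancing turns over 40% each quarter.
daily = annual_cost_drag(252, 0.05, 0.001)
quarterly = annual_cost_drag(4, 0.40, 0.001)
print(f"daily: {daily:.2%}, quarterly: {quarterly:.2%}")
```

When the underlying signal only changes quarterly, most of the daily schedule's extra turnover trades on price noise rather than new information.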

CAGR: -5.9%/yr | 16-year test | L/S equity, 4 factors

Strategy #6

0DTE Options

Sharpe ratio: negative in all 20 versions

Thesis: Zero-days-to-expiration options offer extreme gamma. Systematic selling (or buying) of 0DTE puts and calls on the S&P 500 should capture vol premium or directional moves.

Twenty versions were tested. Sellers. Buyers. Condors. Strangles. With filters. Without filters. Regime-conditional. Static. Every combination produced negative Sharpe ratios when tested at realistic bid-ask spreads. The theoretical edge in 0DTE options exists in academic models that assume mid-price execution. In reality, the bid-ask spread on a 0DTE option with four hours of life remaining is typically 5-15% of the premium. The spread eats the edge. The more sophisticated the filter used to select when to trade, the fewer trades executed, and the higher the variance. Twenty versions, zero survivors.
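Here is the arithmetic behind "the spread eats the edge." Suppose a trade's expected profit at mid-price is some fraction of the premium. Crossing half the quoted spread on entry comes straight out of that profit, and crossing it again on exit does too if the position is closed rather than left to expire. The 3% edge below is a hypothetical placeholder. The spread values span the 5-15% range quoted above.

```python
def net_edge_after_spread(edge_at_mid, spread, crossings=1):
    """Expected edge per trade, as a fraction of premium, after paying the spread.

    edge_at_mid: hypothetical expected profit at mid-price execution.
    spread:      full bid-ask spread as a fraction of premium.
    crossings:   times half the spread is paid (1 = held to expiry, 2 = closed early).
    """
    return edge_at_mid - crossings * spread / 2


for spread in (0.05, 0.10, 0.15):
    held = net_edge_after_spread(0.03, spread)
    closed = net_edge_after_spread(0.03, spread, crossings=2)
    print(f"spread {spread:.0%}: held {held:+.1%}, closed early {closed:+.1%}")
```

Under these assumptions, only the tightest spread with a hold-to-expiry exit leaves any edge at all, and even that margin is thin.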

Lesson: Execution costs are not a nuisance to be modeled away. They are the market's way of telling you whether an edge is real. If the edge disappears at realistic spreads, the edge belongs to the market maker, not to you.

20 versions tested | All negative at real spreads | S&P 500 options

Strategy #7

VIX Tail-Risk Overlay

Sharpe ratio: negative in all 6 versions

Thesis: A permanent VIX call overlay to protect the portfolio during stress events. Systematically roll long VIX calls at varying strikes and tenors.

Six versions were constructed with different strike selection rules, roll timing, and notional sizing. All six lost money. The VIX call overlay is essentially a more expensive version of Crisis Alpha (Strategy #3 above) with the same fundamental flaw: the daily cost of protection exceeds the occasional payoff. Worse, the system's regime detector already shifts allocation during stress, making the overlay redundant. VIX calls during the 2020 crash paid handsomely -- but the profits from those calls did not offset the cumulative cost of carrying them during the preceding four years of low volatility.

Lesson: Never add a hedging overlay without first measuring what the base system already does in the scenario being hedged. With the regime detector's rotation, the base system already delivered +2.8% during COVID and +7.3% during 2022. Paying an annual premium to improve on those numbers is a negative expected value trade.

6 versions tested | All negative Sharpe | VIX calls overlay

Strategy #8

Commodity Seasonality

Sharpe ratio: negative in all 4 versions

Thesis: Commodity futures exhibit calendar-based seasonality -- natural gas rises in winter, agricultural commodities move with planting cycles. Systematic trading of seasonal patterns should produce consistent returns.

Four versions were tested on calendar-based patterns in commodity futures. All produced negative Sharpe ratios. The seasonal patterns that appear in historical data are well known and have been traded by commodity specialists for decades. Whatever edge they once contained has been arbitraged away by the traders who arrived first. The scanner found "significant" seasonal patterns in backtest -- but when tested on data the scanner had not seen, the patterns vanished. Seasonal commodity trading is another example of a strategy that works in a textbook and dies in a market.

Lesson: Well-known patterns in liquid markets have already been priced in by the participants who trade them professionally. A seasonal edge that any Bloomberg terminal can display is not an edge -- it is a consensus position.

4 versions tested | All negative Sharpe | Commodity futures

Movement III

The Pattern in the Graveyard

Step back from the individual failures and a pattern emerges. Nearly every dead strategy shares one or more of three characteristics:

1. The signal was too slow for the trading frequency. Multi-Factor Market Neutral attempted daily trading on quarterly data. Commodity Seasonality attempted daily trading on annual patterns. When the signal updates slower than the portfolio rebalances, the strategy pays transaction costs on noise.

2. The edge belonged to someone else. In 0DTE Options, the profits go to market makers who can execute at the mid. Commodity Momentum was arbitraged away by specialists decades ago. Commodity Seasonality is priced in by every agricultural trader alive. When an edge is visible to everyone, it is no longer an edge.

3. The validation was circular. Crypto Microstructure found 68 "confirmed" patterns that were all false positives. The scanner that found them validated them using overlapping data. The discovery process and the validation process shared the same information. When a truly independent test was applied, the signal was negative. This is the most dangerous failure mode because it produces the highest confidence in results that do not exist.

"The system got better not because we found more things that work -- but because we found more ways to be wrong."

Movement IV

What Survives and Why

The eleven engines that survived did not survive by being more clever. They survived by being structurally different from the dead. Each surviving engine has an edge that comes from a source the market cannot easily replicate or arbitrage away:

Mean reversion profits from the behavioral fact that humans panic and overshoot. This edge does not diminish because it is rooted in human psychology, not market structure.

NLP scoring profits from the fact that management language contains information that traditional financial analysis does not capture. This edge grows as AI comprehension improves.

Regime rotation profits from the fact that most portfolios are static in a dynamic world. This edge persists because the majority of capital is managed by institutions that cannot change allocation quickly.

Each surviving engine was subjected to the same tests that killed the others: walk-forward validation on truly unseen data, realistic transaction costs, stress-period analysis, and correlation measurement against all other engines. The bar for survival is the same bar that 93% of candidates failed to clear.

The graveyard is not a source of embarrassment. It is the foundation of conviction. Every dead strategy is a hypothesis that was tested honestly and found wanting. The strategies that remain are the ones that survived a process designed to kill them.

Movement V

The Ongoing Process

The graveyard is not closed. The research pipeline continuously generates new strategy candidates, and most of them will die. The same process that produced these eight failures also produced the eleven survivors -- and it will produce the twelfth, thirteenth, and fourteenth engines when candidates emerge that can clear the bar.

The system improves in two directions simultaneously: forward, by discovering new strategies that genuinely work, and backward, by killing existing strategies the moment they stop working. An engine that degrades past its minimum threshold is removed from production, documented, and added to this record. Nothing is permanent. Nothing is sacred. Nothing survives on reputation.

There is a common belief in the investment industry that the best teams are the ones that find the most edges. That belief is wrong. The best teams are the ones that are most rigorous about killing ideas that do not work. Discovery is easy. Discipline is rare.

This document will be updated as the graveyard grows. We expect it to grow significantly. That is not a failure of the research process. It is the research process working exactly as designed.

"Discovery is easy. Discipline is rare. The graveyard is not a record of failure. It is a record of honesty."