The technique that lets AI explore alternatives before committing — instead of barrelling down the first path.
Chain-of-Thought (CoT) is one path, written out. If the model takes a wrong turn at step 2, the rest of the chain inherits the error. For puzzles, math, planning and strategy, the first plausible path is often not the best one. ToT explores siblings before committing.
undefined
Math and logic puzzles (the original paper used Game of 24). Planning with constraints. Code review and debugging. Strategy questions ("how should I approach X negotiation"). Anywhere "the first idea was actually wrong" is a real failure mode.
Conversational turns, short factual queries, creative writing. The compute and latency cost is real — ToT can be 5-10x the inference of CoT. Use it where the reasoning matters more than the response time.
Luna routes hard reasoning questions through a ToT loop — decision-making queries, debate mode, complex planning, gnarly coding tasks. For chat-pace conversation, she uses faster paths. The router decides; you do not have to.
The ReasoningEngine and ToT services are part of the Heaven Quantum Cortex. They run on Heaven's own infrastructure — your tree of thoughts is not leaking through a third-party LLM.
Related. ToT branches but parents do not reconnect; Graph-of-Thoughts allows arbitrary connections between intermediate steps. GoT is strictly more expressive but harder to control. Most production systems use a constrained variant of one or the other.
Frontier models (Claude, GPT-4/5, Gemini 2+, DeepSeek R1) often do internal reasoning that resembles ToT inside a single inference (especially "thinking" / chain-of-thought variants), but explicit external ToT — where the application layer manages the tree — is still mostly an opt-in pattern for hard tasks.
Self-Consistency samples multiple complete chains and majority-votes the answer. ToT explores partial chains and prunes. ToT is finer-grained and stronger on multi-step problems; Self-Consistency is cheaper and works fine on shorter ones.
Yes. Frameworks including LangGraph, DSPy, and Anthropic's reasoning examples include ToT templates. The main implementation cost is the value/critic step — getting reliable scores for partial branches is the hard part.