
Chain of Thought

Chain-of-thought prompting to elicit step-by-step reasoning from language models

Quick Summary
You are an expert in Chain-of-Thought (CoT) prompting for crafting effective AI prompts that elicit explicit, step-by-step reasoning.

## Key Points

- **Match granularity to difficulty.** Simple factual lookups do not need CoT; multi-step math, logic, and planning tasks benefit greatly.
- **Be explicit.** Phrases like "think step by step", "show your reasoning", or "explain before answering" are reliable triggers.
- **Provide format guidance.** If you need the final answer separated from the reasoning, instruct the model to put it after a delimiter such as `ANSWER:` or in a JSON field.
- **Combine with few-shot examples.** Demonstrated reasoning chains are more reliable than zero-shot directives alone for complex tasks.
- **Use self-consistency for high-stakes answers.** Multiple independent CoT samples with majority voting reduce errors.
- **Keep reasoning on-topic.** If the model starts producing irrelevant tangents, add a constraint like "Keep each step concise and directly relevant to the question."
## Common Pitfalls

- **Overthinking simple tasks.** CoT adds latency and token cost. Do not apply it to straightforward retrieval or classification where it provides no accuracy gain.
- **Trusting the reasoning blindly.** Models can produce plausible-sounding but incorrect reasoning chains. Always verify critical steps.
- **Letting CoT become unstructured.** Without formatting cues, the model may produce a wall of text. Use numbered steps or labeled sections.

## Quick Example

```
Prompt:
A store sells apples for $1.50 each and oranges for $2.00 each.
If Maria buys 4 apples and 3 oranges, how much does she spend in total?

Let's think step by step.
```

Chain-of-Thought — Prompt Engineering

You are an expert in Chain-of-Thought (CoT) prompting for crafting effective AI prompts that elicit explicit, step-by-step reasoning.

Overview

Chain-of-thought prompting instructs a language model to break down its reasoning into intermediate steps before arriving at a final answer. Rather than asking for a direct output, the prompt explicitly requests (or demonstrates) a thinking process. This dramatically improves accuracy on tasks that require arithmetic, logic, multi-hop reasoning, or any form of compositional problem-solving.

Core Concepts

Zero-Shot CoT

Adding a simple directive like "Let's think step by step" to the end of a prompt triggers the model to produce reasoning traces without any examples.

Few-Shot CoT

Providing one or more worked examples that include explicit reasoning chains. The model learns to mirror the demonstrated thought process.

Self-Consistency

Sampling multiple CoT paths and selecting the answer that appears most frequently across them. This reduces variance from any single reasoning trace.

Tree-of-Thought

An extension where the model explores multiple reasoning branches at each step, evaluates partial solutions, and backtracks when a path fails. Useful for search-heavy problems like puzzles or planning.

Structured Reasoning Markers

Using explicit labels such as Step 1:, Therefore:, Given:, Because: to anchor each phase of the reasoning.
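Markers like these also make the trace machine-parseable. A minimal sketch, where the `response` string is a hand-written stand-in for real model output using the markers this section recommends:

```python
import re

# Stand-in for a model's structured reasoning trace.
response = """Step 1: 4 apples cost 4 * 1.50 = 6.00.
Step 2: 3 oranges cost 3 * 2.00 = 6.00.
Therefore: total = 6.00 + 6.00 = 12.00.
ANSWER: 12.00"""

# Pull out the labeled steps and the delimited final answer.
steps = re.findall(r"^Step \d+:.*$", response, flags=re.MULTILINE)
match = re.search(r"^ANSWER:\s*(.+)$", response, flags=re.MULTILINE)
final = match.group(1).strip() if match else None

print(len(steps), final)  # → 2 12.00
```

Because the answer sits on its own delimited line, downstream code never has to guess where the reasoning ends and the result begins.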

Implementation Patterns

Zero-Shot CoT

Prompt:
A store sells apples for $1.50 each and oranges for $2.00 each.
If Maria buys 4 apples and 3 oranges, how much does she spend in total?

Let's think step by step.

The model will produce intermediate arithmetic before giving the final total.
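The intermediate arithmetic a correct trace should contain can be checked directly; a quick sketch of the expected steps:

```python
# The steps a correct chain of thought should walk through:
apples = 4 * 1.50   # cost of apples: 6.00
oranges = 3 * 2.00  # cost of oranges: 6.00
total = apples + oranges
print(total)  # → 12.0
```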

Few-Shot CoT with a Worked Example

Prompt:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?

A: Roger started with 5 balls. He bought 2 cans × 3 balls = 6 balls.
5 + 6 = 11. The answer is 11.

Q: A store sells apples for $1.50 each and oranges for $2.00 each.
If Maria buys 4 apples and 3 oranges, how much does she spend in total?

A:

CoT for Code Debugging

Prompt:
The following Python function is supposed to return the second-largest
element in a list, but it has a bug. Walk through the logic step by step
with the input [5, 3, 1] and identify the error.

def second_largest(nums):
    first = second = float('-inf')
    for n in nums:
        if n > first:
            second = first
            first = n
    return second
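A careful trace shows that `second` is only ever updated when a new maximum is found, so for any input whose maximum appears first (e.g. [5, 3, 1]) the function returns -inf. A corrected version the model should arrive at might look like this (a sketch, with the missing branch marked):

```python
def second_largest(nums):
    # Track the two largest distinct-position values seen so far.
    first = second = float('-inf')
    for n in nums:
        if n > first:
            second = first
            first = n
        elif first > n > second:
            # The buggy version is missing this branch, so values smaller
            # than the current maximum never update `second`.
            second = n
    return second
```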

Self-Consistency Pattern

Prompt:
Answer the following question. Provide your step-by-step reasoning,
then give your final answer on the last line prefixed with "ANSWER:".

I will ask you this question 5 times. Give an independent chain of thought
each time.

Question: If a train travels 60 km in 45 minutes, what is its speed in km/h?

After collecting 5 answers, take the majority vote.

Planning with CoT

Prompt:
You need to plan a database migration that renames column "usr_name" to
"username" in a production PostgreSQL table with 50 million rows and zero
downtime.

Think through each step carefully:
1. What risks exist?
2. What is the safest sequence of DDL operations?
3. What rollback plan should be in place?

Lay out your reasoning before giving the final migration script.

Best Practices

  • Match granularity to difficulty. Simple factual lookups do not need CoT; multi-step math, logic, and planning tasks benefit greatly.
  • Be explicit. Phrases like "think step by step", "show your reasoning", or "explain before answering" are reliable triggers.
  • Provide format guidance. If you need the final answer separated from the reasoning, instruct the model to put it after a delimiter such as ANSWER: or in a JSON field.
  • Combine with few-shot examples. Demonstrated reasoning chains are more reliable than zero-shot directives alone for complex tasks.
  • Use self-consistency for high-stakes answers. Multiple independent CoT samples with majority voting reduce errors.
  • Keep reasoning on-topic. If the model starts producing irrelevant tangents, add a constraint like "Keep each step concise and directly relevant to the question."

Core Philosophy

Chain-of-thought prompting works because language models are sequence predictors: the quality of the next token depends on the quality of all preceding tokens. When a model is asked to jump directly to an answer, it must compress multi-step reasoning into a single prediction -- a task that is inherently lossy. When it is asked to write out intermediate steps first, each step provides context that makes the next step more accurate. The reasoning trace is not just an explanation for the human; it is a scaffold that improves the model's own computation.

CoT is a tool with an optimal operating range. It dramatically improves performance on tasks that require multiple reasoning steps: arithmetic, logic puzzles, multi-hop knowledge retrieval, planning, and code debugging. For tasks that do not require reasoning -- simple factual recall, classification, or text reformatting -- CoT adds latency and token cost without improving accuracy. Applying CoT to everything is as wrong as applying it to nothing; the key is matching the technique to the task's cognitive complexity.

The value of CoT is amplified when combined with structured output and verification. A free-form reasoning trace is better than no trace, but a trace with labeled steps (Given:, Step 1:, Therefore:) is easier to parse, audit, and debug. When the model's reasoning is visible and structured, you can identify exactly where it went wrong and refine the prompt to address that specific failure mode. This is the difference between prompt engineering by intuition and prompt engineering by diagnosis.

Anti-Patterns

  • Applying CoT to trivial tasks: Requesting step-by-step reasoning for simple factual questions like "What is the capital of France?" This wastes tokens and latency without any accuracy benefit, and it can actually introduce errors by giving the model room to overthink.

  • Trusting the reasoning trace as proof of correctness: Assuming that because the model produced a plausible-looking reasoning chain, the final answer must be correct. Models can generate convincing but logically flawed reasoning. Always verify critical outputs independently.

  • Unstructured reasoning walls: Asking the model to "think step by step" without providing any format guidance. The model may produce a rambling paragraph instead of discrete, labeled steps, making it difficult to audit the reasoning or extract the final answer programmatically.

  • Using CoT as a substitute for task decomposition: Asking a single prompt to reason through a 10-step process when the task should be broken into separate prompts (prompt chaining). Long reasoning chains accumulate errors, and a single monolithic CoT prompt cannot be debugged at the step level.

  • Ignoring token budget impact: Adding CoT to every prompt in a high-volume system without accounting for the additional tokens consumed by the reasoning trace. In production, CoT can double or triple token usage, directly increasing cost and latency.

Common Pitfalls

  • Overthinking simple tasks. CoT adds latency and token cost. Do not apply it to straightforward retrieval or classification where it provides no accuracy gain.
  • Trusting the reasoning blindly. Models can produce plausible-sounding but incorrect reasoning chains. Always verify critical steps.
  • Letting CoT become unstructured. Without formatting cues, the model may produce a wall of text. Use numbered steps or labeled sections.
  • Ignoring token limits. Long reasoning chains consume context window space. For very complex problems, break the task into sub-prompts rather than one massive CoT.
  • Confusing CoT with correctness. A step-by-step trace does not guarantee the right answer; it increases the probability but is not infallible.
