What is a list comprehension?
A list comprehension is a concise way to build a new list by applying an expression to every item in an iterable — optionally filtering items along the way — all within a single line of code. Instead of starting with an empty list, calling append() inside a loop, and ending up with three or four lines to express a simple idea, you write the entire transformation in one readable expression.
The concept is borrowed from set-builder notation in mathematics. In math, you might write a set as { x² | x ∈ ℕ, x < 10 } — meaning "the square of every natural number less than 10." Python's syntax mirrors that idea almost exactly.
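To make the correspondence concrete, here is a minimal sketch of that set-builder expression written in Python. It assumes ℕ starts at 0, which is what range(10) produces; the variable names are illustrative only.

```python
# Set-builder notation { x² | x ∈ ℕ, x < 10 } as a set comprehension.
# Assumption: ℕ starts at 0 here, matching what range(10) yields.
squares_set = {x ** 2 for x in range(10)}

# The list form of the same idea keeps insertion order.
squares_list = [x ** 2 for x in range(10)]

print(sorted(squares_set) == squares_list)  # True
```

The output expression sits on the left and the membership condition on the right, in the same order as the mathematical notation.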
Here is the same task written three ways. All three produce identical output:
# Method 1: classic for loop
squares = []
for x in range(10):
    squares.append(x ** 2)
# Method 2: map() + lambda
squares = list(map(lambda x: x ** 2, range(10)))
# Method 3: list comprehension
squares = [x ** 2 for x in range(10)]
print(squares)
The list comprehension version is shorter, reads from left to right like a sentence, and usually runs faster than the equivalent loop. That combination of readability and performance is why experienced Python developers reach for it by default.
Understanding comprehensions also unlocks Python's other comprehension forms — dictionary comprehensions, set comprehensions, and generator expressions — which all share the same mental model but produce different output types.
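As a rough illustration of that shared mental model, the sketch below applies the same expression and for clause in all four forms. The sample data and variable names are made up for the example.

```python
words = ["apple", "banana", "cherry", "date"]

as_list = [len(w) for w in words]     # list: ordered, keeps duplicates
as_set = {len(w) for w in words}      # set: unique values only
as_dict = {w: len(w) for w in words}  # dict: key: value pairs
as_gen = (len(w) for w in words)      # generator: lazy, yields values on demand

print(as_list)              # [5, 6, 6, 4]
print(as_set == {4, 5, 6})  # True
print(as_dict["banana"])    # 6
print(sum(as_gen))          # 21
```

Only the delimiters and the shape of the expression change; the for clause is identical in every form.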
Basic syntax explained
Every list comprehension follows this template:
[expression for item in iterable]
- expression: what you want each element of the new list to be. It usually involves item, but it doesn't have to.
- item: the loop variable, the same name you would use in a for loop.
- iterable: any sequence, iterator, or other object that supports iteration, such as lists, tuples, strings, ranges, generators, file handles, dictionary views, and so on.
Let's walk through a handful of examples with different iterables to make the pattern concrete:
# 1. Doubling numbers from a range
doubled = [n * 2 for n in range(1, 6)]
print(doubled) # [2, 4, 6, 8, 10]
# 2. Uppercasing every string in a list
languages = ["python", "javascript", "rust", "go"]
upper = [lang.upper() for lang in languages]
print(upper) # ['PYTHON', 'JAVASCRIPT', 'RUST', 'GO']
# 3. Getting the length of each string
lengths = [len(word) for word in languages]
print(lengths) # [6, 10, 4, 2]
# 4. Iterating over a string character by character
chars = [ch for ch in "Python"]
print(chars) # ['P', 'y', 't', 'h', 'o', 'n']
# 5. Calling a math function on every element
import math
roots = [round(math.sqrt(n), 2) for n in [1, 4, 9, 16, 25]]
print(roots) # [1.0, 2.0, 3.0, 4.0, 5.0]
# 6. Accessing dictionary values
users = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]
names = [user["name"] for user in users]
print(names) # ['Alice', 'Bob']
# 7. Unpacking tuples using tuple unpacking in the for clause
points = [(1, 2), (3, 4), (5, 6)]
sums = [x + y for x, y in points]
print(sums) # [3, 7, 11]
Notice that the expression can be any valid Python expression: arithmetic, method calls, function calls, attribute access, subscripting, f-strings, even another comprehension. If it returns a value, it can go in the expression slot.
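A few of those expression kinds, sketched with invented sample data (the names people, greetings, next_year and table are illustrative, not from the original examples):

```python
people = [("ada", 36), ("grace", 45)]

# An f-string with a method call in the expression slot
greetings = [f"{name.title()} is {age}" for name, age in people]

# Subscripting plus arithmetic
next_year = [p[1] + 1 for p in people]

# Another comprehension as the expression: one inner list per outer item
table = [[row * col for col in range(1, 4)] for row in range(1, 4)]

print(greetings)  # ['Ada is 36', 'Grace is 45']
print(next_year)  # [37, 46]
print(table)      # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```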
The loop variable can also be a tuple to unpack structured data directly in the for clause, as shown in example 7. This pattern is especially useful when iterating over enumerate(), zip(), or dict.items():
# enumerate(): access both index and value
items = ["a", "b", "c"]
indexed = [f"{i}:{v}" for i, v in enumerate(items)]
print(indexed) # ['0:a', '1:b', '2:c']
# zip(): pair elements from two lists
names = ["Alice", "Bob", "Carol"]
scores = [92, 87, 95]
report = [f"{name}: {score}" for name, score in zip(names, scores)]
print(report) # ['Alice: 92', 'Bob: 87', 'Carol: 95']
# dict.items(): work with key-value pairs
config = {"host": "localhost", "port": "5432", "db": "myapp"}
env_vars = [f"DB_{k.upper()}={v}" for k, v in config.items()]
print(env_vars)
# ['DB_HOST=localhost', 'DB_PORT=5432', 'DB_DB=myapp']
Conditional filtering (if clause)
Adding an if clause at the end of a comprehension filters the iterable — only items that pass the condition are processed by the expression:
[expression for item in iterable if condition]
The condition is evaluated first. If it is truthy, the expression runs and the result is added to the output list. If it is falsy, that item is skipped entirely. Think of it as a gate between the iterable and the expression:
# Even numbers from 0 to 19
evens = [n for n in range(20) if n % 2 == 0]
print(evens)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
# Words longer than 4 characters
words = ["list", "comprehension", "is", "very", "powerful"]
long_words = [w for w in words if len(w) > 4]
print(long_words)
# ['comprehension', 'powerful']
# Filter out None values from a mixed list
raw = [1, None, 3, None, 5, 6]
clean = [x for x in raw if x is not None]
print(clean)
# [1, 3, 5, 6]
# Multiple conditions combined with 'and'
divisible = [n for n in range(1, 101) if n % 3 == 0 and n % 5 == 0]
print(divisible)
# [15, 30, 45, 60, 75, 90]
# Multiple conditions combined with 'or'
extremes = [n for n in range(20) if n < 3 or n > 16]
print(extremes)
# [0, 1, 2, 17, 18, 19]
# Filter and transform in one step: square only the even numbers
even_squares = [n**2 for n in range(10) if n % 2 == 0]
print(even_squares)
# [0, 4, 16, 36, 64]
# Using a method in the condition: keep non-empty strings after stripping
lines = ["hello", " ", "world", "", "!"]
non_empty = [line.strip() for line in lines if line.strip()]
print(non_empty)
# ['hello', 'world', '!']
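The gate behaviour described above can be spelled out as an explicit loop. This is a sketch with a hypothetical helper name, keep_even_squares, written only to show the order of evaluation:

```python
def keep_even_squares(numbers):
    """Explicit-loop equivalent of [n ** 2 for n in numbers if n % 2 == 0]."""
    result = []
    for n in numbers:
        if n % 2 == 0:             # the gate: evaluated first
            result.append(n ** 2)  # the expression: runs only for items that pass
    return result

print(keep_even_squares(range(10)))  # [0, 4, 16, 36, 64]
```

Items that fail the condition never reach the expression, so an expression that would raise for those items is safe behind the filter.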
The filtering if clause lives at the end of the comprehension, after the iterable. This is different from the if/else form you'll see in the next section, which sits in the expression position at the front. This positional difference is the most common source of confusion for people learning comprehensions.
If/else for value transformation
Sometimes you don't want to remove items from the output — you want to produce different values depending on a condition, while keeping every item. For that, you use a ternary expression in the expression position, before the for keyword:
[value_if_true if condition else value_if_false for item in iterable]
Every item appears in the output list, but the value differs depending on the condition. The output list always has the same length as the input iterable:
# Label each number as 'even' or 'odd'
labels = ["even" if n % 2 == 0 else "odd" for n in range(6)]
print(labels)
# ['even', 'odd', 'even', 'odd', 'even', 'odd']
# Clamp negative numbers to 0, leave positives unchanged
values = [-5, 3, -1, 7, 0, -2]
clamped = [v if v >= 0 else 0 for v in values]
print(clamped)
# [0, 3, 0, 7, 0, 0]
# Replace None with a default value, keep everything else
data = ["Alice", None, "Bob", None, "Carol"]
filled = [name if name is not None else "Unknown" for name in data]
print(filled)
# ['Alice', 'Unknown', 'Bob', 'Unknown', 'Carol']
# Absolute value without abs() — for illustration
nums = [-3, -1, 0, 2, -4]
abs_vals = [n if n >= 0 else -n for n in nums]
print(abs_vals)
# [3, 1, 0, 2, 4]
The if/else ternary must come before the for keyword. If you put an if after for without an else, it acts as a filter. These two forms look similar but do completely different things; the placement is the key.
nums = [1, 2, 3, 4, 5]
# FILTER: only even numbers enter the output. Length < input.
filtered = [n for n in nums if n % 2 == 0] # [2, 4]
# TRANSFORM: every number, mapped to 'E' or 'O'. Length == input.
mapped = ["E" if n % 2 == 0 else "O" for n in nums] # ['O', 'E', 'O', 'E', 'O']
You can combine both forms in one comprehension — filter items at the end, then transform the ones that pass using a ternary at the front:
# From 0–19, take only even numbers, then label them 'small' or 'large'
result = [
    "small" if n < 10 else "large"
    for n in range(20)
    if n % 2 == 0
]
print(result)
# ['small', 'small', 'small', 'small', 'small', 'large', 'large', 'large', 'large', 'large']
Spreading a complex comprehension over multiple lines like this is consistent with PEP 8, which prefers implicit line continuation inside brackets. Python ignores newlines between an opening bracket and its matching close, so no backslash is needed.
Nested comprehensions
A nested comprehension contains more than one for clause. The order of the clauses mirrors the order of equivalent nested for loops — outer loop first, inner loop second. This is one of the most common points of confusion, so let's work through it step by step.
# Flatten a 2D matrix into a 1D list
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# Equivalent nested for loop:
# flat = []
# for row in matrix:        ← outer loop (comes first in comprehension)
#     for cell in row:      ← inner loop (comes second)
#         flat.append(cell)
flat = [cell for row in matrix for cell in row]
print(flat)
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
Read the comprehension aloud as: "give me cell, for each row in the matrix, for each cell in that row." The for clauses read exactly like nested for loops: left to right means outer to inner.
# Cartesian product: all (color, size) combinations
colors = ["red", "green", "blue"]
sizes = ["S", "M", "L"]
variants = [(color, size) for color in colors for size in sizes]
print(variants)
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
# ('green', 'S'), ('green', 'M'), ('green', 'L'),
# ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
# Transpose: rows become columns, columns become rows
transposed = [[row[i] for row in matrix] for i in range(3)]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# Cartesian product with filter: only pairs where x != y
pairs = [(x, y) for x in range(4) for y in range(4) if x != y]
print(pairs[:6])
# [(0, 1), (0, 2), (0, 3), (1, 0), (1, 2), (1, 3)]
# Flatten a list of strings into individual characters, skipping spaces
words = ["hello world", "foo bar"]
chars = [ch for word in words for ch in word if ch != " "]
print(chars)
# ['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd', 'f', 'o', 'o', 'b', 'a', 'r']
The transpose example uses a comprehension inside a comprehension. The inner comprehension [row[i] for row in matrix] picks column i from every row. The outer comprehension repeats this for every column index. This is also achievable with list(zip(*matrix)), which is idiomatic Python, although it yields each row as a tuple rather than a list. The comprehension version makes the intent more transparent to readers who don't immediately recognize the zip-unpack transpose trick.
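For comparison, here is a small sketch of both transpose approaches side by side. It uses len(matrix[0]) instead of a hard-coded 3, which assumes a non-empty matrix whose rows all have the same length.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Assumption: matrix is non-empty and rectangular
by_comprehension = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
by_zip = list(zip(*matrix))

print(by_comprehension)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(by_zip)            # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Convert the tuples if list rows are required
by_zip_lists = [list(col) for col in zip(*matrix)]
print(by_zip_lists == by_comprehension)  # True
```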
Keep nesting to two levels. Three-level nesting becomes difficult to reason about quickly and usually signals that a helper function would be clearer.
Dict and set comprehensions
Python extends the comprehension syntax to two other fundamental types. The mental model is identical to list comprehensions — the only differences are delimiters and expression format.
Dictionary comprehensions
Use curly braces with a key: value expression to build a dictionary in a single pass:
# Word → length mapping
words = ["apple", "banana", "cherry", "date"]
word_lengths = {word: len(word) for word in words}
print(word_lengths)
# {'apple': 5, 'banana': 6, 'cherry': 6, 'date': 4}
# Invert a dictionary (swap keys and values)
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted)
# {1: 'a', 2: 'b', 3: 'c'}
# Build a square lookup table: fast O(1) access later
squares_map = {n: n**2 for n in range(1, 11)}
print(squares_map)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}
# Filter a dictionary: keep only items where the value meets a condition
scores = {"Alice": 92, "Bob": 58, "Carol": 87, "Dave": 44}
passing = {name: score for name, score in scores.items() if score >= 60}
print(passing)
# {'Alice': 92, 'Carol': 87}
# Normalize messy keys: strip whitespace and lowercase
messy = {" Name ": "Alice", "AGE ": 30, " CITY": "London"}
clean = {k.strip().lower(): v for k, v in messy.items()}
print(clean)
# {'name': 'Alice', 'age': 30, 'city': 'London'}
# Create a dict from two parallel lists using zip
keys = ["host", "port", "db"]
values = ["localhost", 5432, "myapp"]
config = {k: v for k, v in zip(keys, values)}
print(config)
# {'host': 'localhost', 'port': 5432, 'db': 'myapp'}
# (equivalent to dict(zip(keys, values)) but more flexible)
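One caveat worth sketching for the inversion pattern: it only round-trips cleanly when the values are unique. The example below uses invented data (grades) to show what happens otherwise, along with one possible way to group keys instead of overwriting them.

```python
grades = {"Alice": "A", "Bob": "B", "Carol": "A"}

# Plain inversion: a later key overwrites an earlier one that shares its value
inverted = {grade: name for name, grade in grades.items()}
print(inverted)  # {'A': 'Carol', 'B': 'Bob'}

# Grouping keeps every name: one list of keys per distinct value
grouped = {
    grade: [name for name, g in grades.items() if g == grade]
    for grade in set(grades.values())
}
print(grouped["A"])  # ['Alice', 'Carol']
```

The grouping version rescans the dictionary once per distinct value, which is fine for small inputs; for large ones a plain loop with setdefault() is likely a better fit.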
Set comprehensions
Use curly braces with a single expression (no colon) to build a set. Sets automatically deduplicate, so this pattern is ideal for extracting unique values:
# Unique string lengths
words = ["cat", "dog", "elephant", "ant", "bee"]
unique_lengths = {len(w) for w in words}
print(unique_lengths) # {3, 8} — order may vary
# All unique characters in a string (like a frequency analysis first pass)
unique_chars = {ch.lower() for ch in "Hello World" if ch.isalpha()}
print(unique_chars)
# {'h', 'e', 'l', 'o', 'w', 'r', 'd'} — unordered
# Unique domains from a list of email addresses
emails = ["alice@gmail.com", "bob@yahoo.com", "carol@gmail.com", "dave@outlook.com"]
domains = {email.split("@")[1] for email in emails}
print(domains)
# {'gmail.com', 'yahoo.com', 'outlook.com'} — no duplicate gmail.com
{} in Python creates a dict, not a set. To create an empty set you must write set(). A set comprehension written as {expr for x in iterable} is unambiguous because its expression has no key: value colon, so Python knows it's a set rather than a dict.
Walrus operator (Python 3.8+)
The walrus operator := (named for its resemblance to walrus eyes and tusks on their side) assigns a value to a variable inside an expression. Inside comprehensions, this solves the "compute-filter-reuse" problem: when you need to compute an expensive result, use it in the filter condition, and then also use it as the output value — without calling the function twice.
import math
# WITHOUT walrus: sqrt() runs once in the filter, then again for every item that passes
results_double_work = [
    math.sqrt(n)  # sqrt computed a second time here
    for n in [9, -4, 16, -1, 25]
    if n >= 0 and math.sqrt(n) > 3  # sqrt computed first here, to filter
]
# WITH walrus: compute once, assign to 'root', reuse in condition AND expression
results_efficient = [
    root
    for n in [9, -4, 16, -1, 25]
    if n >= 0 and (root := math.sqrt(n)) > 3
]
print(results_efficient) # [4.0, 5.0]
# Practical: parse JSON strings, keep only the successfully parsed ones
import json
raw = [
    '{"status": 200, "body": "OK"}',
    'not valid json',
    '{"status": 404, "body": "Not Found"}',
    '{"status": 200, "body": "Created"}',
]
# Return the decoded object, or None if s is not valid JSON
def safe_parse(s):
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None
# Parse once per item, filter on status 200
successful = [
    parsed
    for r in raw
    if (parsed := safe_parse(r)) is not None
    and parsed["status"] == 200
]
print(successful)
# [{'status': 200, 'body': 'OK'}, {'status': 200, 'body': 'Created'}]
Without the walrus operator, you'd call safe_parse() twice on every item, or fall back to a regular for loop. The walrus operator makes the comprehension the right tool without doubling the work.
One caution: a walrus assignment inside a comprehension binds its target in the scope that contains the comprehension. This is by design, but it can be surprising. After the comprehension above runs, parsed exists in the surrounding scope and holds the last value it was assigned.
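The scoping difference can be observed directly. The following is a minimal sketch with a hypothetical function, first_letters, that contrasts the walrus target with the ordinary loop variable:

```python
def first_letters(words):
    # 'w' is the loop variable and stays private to the comprehension.
    # 'last' is bound with := so it becomes a local of first_letters().
    initials = [last for w in words if (last := w[:1])]
    try:
        w  # looked up outside the comprehension: not defined here
        loop_var_leaked = True
    except NameError:
        loop_var_leaked = False
    return initials, last, loop_var_leaked

initials, leaked, loop_var_leaked = first_letters(["alpha", "beta", "gamma"])
print(initials)         # ['a', 'b', 'g']
print(leaked)           # g
print(loop_var_leaked)  # False
```

Note that if the input were empty, last would never be assigned and reading it after the comprehension would raise an error, which is one more reason to treat the leaked name with care.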
Real-world examples
Textbook examples with range() are fine for learning syntax, but list comprehensions are most valuable when processing real, messy data. Here are patterns you'll encounter in production Python code.
Parsing structured text
# Parse CSV lines into dicts, skip rows with missing name
csv_lines = [
    "Alice,30,Engineer",
    "Bob,25,Designer",
    "Carol,35,Manager",
    " ,28,Unknown",  # empty name: should be skipped
]
records = [
    {"name": p[0].strip(), "age": int(p[1]), "role": p[2]}
    for line in csv_lines
    if (p := line.split(",")) and p[0].strip()
]
for r in records:
    print(r)
# {'name': 'Alice', 'age': 30, 'role': 'Engineer'}
# {'name': 'Bob', 'age': 25, 'role': 'Designer'}
# {'name': 'Carol', 'age': 35, 'role': 'Manager'}
Processing files
import os
# All Python source files in a directory tree
py_files = [
    os.path.join(root, f)
    for root, dirs, files in os.walk(".")
    for f in files
    if f.endswith(".py")
]
# Non-comment, non-empty lines from a config file
with open("config.ini") as f:
    settings = [
        line.strip()
        for line in f
        # lstrip() so that indented comment lines are also skipped
        if line.strip() and not line.lstrip().startswith("#")
    ]
Working with objects and dataclasses
from dataclasses import dataclass
@dataclass
class Product:
    name: str
    price: float
    stock: int
inventory = [
    Product("Keyboard", 79.99, 15),
    Product("Monitor", 399.00, 0),
    Product("Mouse", 29.99, 42),
    Product("Webcam", 89.00, 3),
]
# Product names that are currently in stock
available = [p.name for p in inventory if p.stock > 0]
print(available)
# ['Keyboard', 'Mouse', 'Webcam']
# Apply 10% discount to items priced over $50
discounted = [
    Product(p.name, round(p.price * 0.9, 2), p.stock)
    if p.price > 50
    else p
    for p in inventory
]
for p in discounted:
    print(f"{p.name}: ${p.price}")
# Keyboard: $71.99 — discounted
# Monitor: $359.1 — discounted
# Mouse: $29.99 — unchanged
# Webcam: $80.1 — discounted
Flattening nested API data
# Typical nested structure from a REST API response
departments = [
    {"name": "Engineering", "employees": ["Alice", "Bob", "Carol"]},
    {"name": "Design", "employees": ["Dave", "Eve"]},
    {"name": "Marketing", "employees": ["Frank"]},
]
# Flat list of all employees
all_employees = [
    emp
    for dept in departments
    for emp in dept["employees"]
]
print(all_employees)
# ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank']
# Add department context to each employee record
with_context = [
    {"name": emp, "dept": dept["name"]}
    for dept in departments
    for emp in dept["employees"]
]
print(with_context[:2])
# [{'name': 'Alice', 'dept': 'Engineering'},
# {'name': 'Bob', 'dept': 'Engineering'}]
Performance benchmarks
The performance advantage of list comprehensions over for loops is real and measurable, not just folklore. Here is why and by how much.
Under the hood, CPython compiles a list comprehension to bytecode that appends with a dedicated LIST_APPEND opcode, which pushes onto the list directly in C without an attribute lookup or a method call. A for loop calling list.append() must look up the append method and call it on every iteration, which adds a small fixed cost per item. The size of the gap depends on the CPython version.
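The opcode difference can be checked with the standard dis module. The sketch below is CPython-specific and uses a hypothetical helper, opnames, that also walks nested code objects, since before Python 3.12 a comprehension body is compiled into its own code object. Other opcode names vary between versions, so only LIST_APPEND is inspected.

```python
import dis

def opnames(source):
    """Collect opcode names from source, including nested code objects."""
    names = set()
    stack = [compile(source, "<demo>", "exec")]
    while stack:
        code = stack.pop()
        names.update(instr.opname for instr in dis.get_instructions(code))
        # Nested code objects (e.g. a comprehension body) are stored in co_consts
        stack.extend(c for c in code.co_consts if hasattr(c, "co_code"))
    return names

loop_ops = opnames("r = []\nfor x in range(3):\n    r.append(x * 2)")
comp_ops = opnames("r = [x * 2 for x in range(3)]")

print("LIST_APPEND" in comp_ops)  # True
print("LIST_APPEND" in loop_ops)  # False
```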
import timeit
N = 100_000
# Benchmark 1: classic for loop
loop_time = timeit.timeit(
    "r=[]\nfor x in range(N):\n r.append(x*2)",
    globals={"N": N}, number=50
)
# Benchmark 2: pre-bind append to avoid repeated lookup
bind_time = timeit.timeit(
    "r=[]; a=r.append\nfor x in range(N):\n a(x*2)",
    globals={"N": N}, number=50
)
# Benchmark 3: list comprehension
comp_time = timeit.timeit(
    "r=[x*2 for x in range(N)]",
    globals={"N": N}, number=50
)
# Benchmark 4: map() with a named function
map_time = timeit.timeit(
    "r=list(map(double, range(N)))",
    setup="def double(x): return x*2",
    globals={"N": N}, number=50
)
print(f"For loop: {loop_time:.3f}s")
print(f"For loop + bound append: {bind_time:.3f}s")
print(f"List comprehension: {comp_time:.3f}s")
print(f"map() + named fn: {map_time:.3f}s")
Key takeaways from these numbers (exact figures will vary with hardware and Python version):
- The comprehension is 35% faster than the naive for loop.
- Pre-binding append to a local variable closes the gap by 22%, confirming that attribute lookup is a significant cost in tight loops.
- map() with a named function is the theoretical ceiling for pure Python, but the difference from a comprehension is small (9%) and comes at the cost of readability for anything beyond trivial transformations.
When the expression involves a significant function call, the relative speedup shrinks because the function call overhead dominates:
import math, timeit
N = 50_000
setup = "import math; data=list(range(1,N+1))"
loop_time = timeit.timeit(
    "r=[]\nfor x in data:\n r.append(math.sqrt(x))",
    setup=setup, globals={"N": N}, number=100
)
comp_time = timeit.timeit(
    "r=[math.sqrt(x) for x in data]",
    setup=setup, globals={"N": N}, number=100
)
print(f"Loop: {loop_time:.3f}s")
print(f"Comprehension: {comp_time:.3f}s")
print(f"Speedup: {loop_time/comp_time:.2f}x")
The speedup drops to 11% when the expression is a real function call. The practical rule: comprehensions are generally at least as fast as the equivalent loop, and meaningfully faster for simple arithmetic. If performance is truly critical, use NumPy vectorized operations, which are typically 10–100× faster than any Python loop or comprehension for numerical work.
Common mistakes to avoid
1. Using a comprehension purely for side effects
# Wrong: builds a throwaway list of None values
_ = [print(x) for x in items]
# Right: use a for loop for side effects
for x in items:
    print(x)
2. Mutating a list while iterating it with a for loop
numbers = [1, 2, 3, 4, 5]
# Wrong: skips elements because the list shrinks mid-iteration
for n in numbers:
    if n % 2 == 0:
        numbers.remove(n)
print(numbers) # [1, 3, 5], but only by luck; the logic is broken
# Right: build a new filtered list — comprehension makes this natural
numbers = [1, 2, 3, 4, 5]
numbers = [n for n in numbers if n % 2 != 0]
print(numbers) # [1, 3, 5] — correct, clear intent
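To see the bug rather than the lucky result, try an input with consecutive even numbers. This sketch uses made-up variable names (buggy, fixed) and the same removal logic as above:

```python
buggy = [2, 4, 6, 8]
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)  # shifts later items left, so the next one is skipped
print(buggy)  # [4, 8]

fixed = [n for n in [2, 4, 6, 8] if n % 2 != 0]
print(fixed)  # []
```

Each removal moves the following element into the slot the loop has already visited, so every second even number survives.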
3. Using a list comprehension when a generator expression is enough
# Wasteful: allocates a list of 1,000,000 integers just to sum them
total = sum([x**2 for x in range(1_000_000)])
# Efficient: generator yields one value at a time, near-zero memory overhead
total = sum(x**2 for x in range(1_000_000))
# The same applies to any function that accepts an iterable:
maximum = max(x.strip() for x in lines if x.strip())
exists = any(n > 100 for n in numbers)
count = sum(1 for w in words if w.startswith("a"))
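A rough way to see the memory difference is sys.getsizeof. The sketch below measures only the container objects themselves (not the integers they refer to), which is enough to show that the list grows with n while the generator does not:

```python
import sys

n = 100_000
as_list = [x ** 2 for x in range(n)]  # every result stored at once
as_gen = (x ** 2 for x in range(n))   # results produced one at a time

list_bytes = sys.getsizeof(as_list)  # size of the list object, excluding its items
gen_bytes = sys.getsizeof(as_gen)    # small and independent of n

print(list_bytes > gen_bytes)       # True
print(sum(as_gen) == sum(as_list))  # True
```

Keep in mind that a generator can be consumed only once; if the values are needed again, a list is the right choice.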
4. Over-nesting into unreadability
# Hard to read: a comprehension inside a comprehension, with two filters on one line
result = [[c*2 for c in row if c>0] for row in matrix if sum(row)>10]
# Readable: extract a named helper
def double_positives(row):
    return [c * 2 for c in row if c > 0]
result = [double_positives(row) for row in matrix if sum(row) > 10]
When not to use them
List comprehensions are not a universal replacement for all loops. Here is a decision table to guide the choice:
| Situation | Use comprehension? | Alternative |
|---|---|---|
| Building a new list by transforming or filtering | ✅ Yes — primary use case | — |
| Iterating to produce side effects (print, write, mutate external state) | ❌ No | for loop |
| Reducing to a single value (sum, max, count) | ⚠️ Use a generator | sum(x for x in ...) |
| Three or more levels of nesting | ❌ No | Helper function + comprehension |
| Loop needs break or continue | ❌ No | for loop |
| Building a dict | ✅ Use dict comprehension | {k: v for k, v in ...} |
| Deduplicating a sequence | ✅ Use set comprehension | {x for x in ...} |
| Large dataset, iterating only once | ⚠️ Prefer generator | (x for x in ...) |
| Multi-step logic that needs intermediate variables | ❌ No | for loop or walrus if simple |
| Numerical operations on large arrays | ⚠️ Only for small arrays | NumPy vectorized operations |
The most important principle is that code communicates intent. A list comprehension says: "I am building a new list." A for loop says: "I am doing something repeatedly." When the loop's purpose is side effects, not list construction, a comprehension obscures that intent — even if it technically works.
Python also offers itertools as an alternative to complex comprehensions. Functions like itertools.compress(), itertools.filterfalse(), and itertools.chain.from_iterable() replace common comprehension patterns with named, composable operations that become clearer as complexity grows. When a comprehension starts to feel like a puzzle, reach for itertools or a well-named helper function instead.
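As a brief sketch of those three functions, each is shown next to the comprehension it could replace. The sample data is invented for the example.

```python
from itertools import chain, compress, filterfalse

matrix = [[1, 2, 3], [4, 5, 6]]
# chain.from_iterable ~ [cell for row in matrix for cell in row]
flat = list(chain.from_iterable(matrix))

names = ["Alice", "Bob", "Carol"]
selected = [True, False, True]
# compress ~ [n for n, keep in zip(names, selected) if keep]
picked = list(compress(names, selected))

# filterfalse ~ [n for n in range(10) if not (n % 2 == 0)]
odds = list(filterfalse(lambda n: n % 2 == 0, range(10)))

print(flat)    # [1, 2, 3, 4, 5, 6]
print(picked)  # ['Alice', 'Carol']
print(odds)    # [1, 3, 5, 7, 9]
```

All three return lazy iterators, so the list() calls here are only for display; in a pipeline they can be passed straight to the next step.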