What are window functions?
A window function performs a calculation across a set of rows that are related to the current row — without collapsing those rows into a single output row the way GROUP BY does. The set of rows the function sees is called the window, defined by the OVER clause.
The best way to understand why window functions exist is to start with a problem that is awkward without them. Suppose you have a sales table and you want to show each sale alongside the department total. With a plain GROUP BY, you lose the individual rows:
CREATE TABLE sales (
id INT,
employee VARCHAR(50),
department VARCHAR(50),
amount NUMERIC(10, 2),
sale_date DATE
);
INSERT INTO sales VALUES
(1, 'Alice', 'Engineering', 1200.00, '2025-01-05'),
(2, 'Bob', 'Engineering', 850.00, '2025-01-08'),
(3, 'Carol', 'Marketing', 600.00, '2025-01-03'),
(4, 'Dave', 'Marketing', 950.00, '2025-01-10'),
(5, 'Eve', 'Marketing', 720.00, '2025-01-12'),
(6, 'Frank', 'Engineering', 1500.00, '2025-01-15'),
(7, 'Grace', 'Sales', 400.00, '2025-01-04'),
(8, 'Henry', 'Sales', 880.00, '2025-01-09'),
(9, 'Iris', 'Sales', 1100.00, '2025-01-11'),
(10, 'Jack', 'Engineering', 970.00, '2025-01-14');
-- This gives you one row per department
-- You can't see individual employees anymore
SELECT department, SUM(amount) AS dept_total
FROM sales
GROUP BY department;
-- Window function: each row gets the department total alongside it
SELECT
employee,
department,
amount,
SUM(amount) OVER (PARTITION BY department) AS dept_total,
ROUND(amount / SUM(amount) OVER (PARTITION BY department) * 100, 1) AS pct_of_dept
FROM sales
ORDER BY department, amount DESC;
Every row is present, each showing its individual sale, the department total, and the individual employee's percentage contribution to that department — computed in a single query without any joins or subqueries.
The OVER clause anatomy
The OVER clause defines the window — the set of rows available to the function for each row being processed. It has three optional sub-clauses:
function_name() OVER (
[PARTITION BY column1, column2, ...] -- divide rows into groups
[ORDER BY column3 ASC|DESC, ...] -- define row order within group
[frame_clause] -- define the physical frame
)
Each sub-clause is independent and optional:
- No OVER sub-clauses —
OVER ()means the window is all rows in the result set. ASUM() OVER ()gives you the grand total on every row. - PARTITION BY only — divides rows into groups, restarts calculation for each group. Like
GROUP BYbut keeps all rows. - ORDER BY only — defines an ordering across all rows. For aggregate functions, this implicitly creates a running/cumulative calculation (see frame clause section).
- Both — partitions first, then orders within each partition.
SELECT
employee,
department,
amount,
-- 1. OVER () — grand total of all rows
SUM(amount) OVER () AS grand_total,
-- 2. PARTITION BY — total per department (no ordering)
SUM(amount) OVER (PARTITION BY department) AS dept_total,
-- 3. ORDER BY only — running total across all rows by date
SUM(amount) OVER (ORDER BY sale_date) AS running_total,
-- 4. PARTITION BY + ORDER BY — running total per department
SUM(amount) OVER (PARTITION BY department
ORDER BY sale_date) AS dept_running_total
FROM sales
ORDER BY sale_date;
PARTITION BY
PARTITION BY splits the result set into independent groups before the window function runs. The function restarts its calculation for each partition. You can partition by any expression — one column, multiple columns, or even a computed value:
-- Average and max per department alongside each row
SELECT
employee,
department,
amount,
AVG(amount) OVER (PARTITION BY department) AS dept_avg,
MAX(amount) OVER (PARTITION BY department) AS dept_max,
amount - AVG(amount) OVER (PARTITION BY department) AS vs_dept_avg
FROM sales
ORDER BY department, amount DESC;
-- Partition by month extracted from date
SELECT
sale_date,
employee,
amount,
SUM(amount) OVER (
PARTITION BY DATE_TRUNC('month', sale_date) -- PostgreSQL
-- DATE_FORMAT(sale_date, '%Y-%m') for MySQL
) AS monthly_total
FROM sales
ORDER BY sale_date;
ROW_NUMBER, RANK, DENSE_RANK, NTILE
The four ranking functions assign a number to each row based on position within a window. Understanding when to use each is important for correct results:
SELECT
employee,
department,
amount,
-- ROW_NUMBER: unique sequential number, no gaps, no ties
ROW_NUMBER() OVER (PARTITION BY department ORDER BY amount DESC) AS row_num,
-- RANK: gaps after ties (1, 2, 2, 4 — position 3 is skipped)
RANK() OVER (PARTITION BY department ORDER BY amount DESC) AS rank_num,
-- DENSE_RANK: no gaps after ties (1, 2, 2, 3 — consecutive)
DENSE_RANK() OVER (PARTITION BY department ORDER BY amount DESC) AS dense_num
FROM sales
ORDER BY department, amount DESC;
To see the difference between RANK and DENSE_RANK clearly, here is a dedicated example with intentional ties:
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank_pos,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_pos,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_pos
FROM (
VALUES ('Alice', 95), ('Bob', 88), ('Carol', 88),
('Dave', 72), ('Eve', 72), ('Frank', 65)
) AS t(name, score);
NTILE — divide rows into buckets
NTILE(n) divides the ordered rows within a partition into n approximately equal buckets, assigning each row a bucket number from 1 to n. This is useful for percentile analysis — quartiles, deciles, quintiles:
-- Divide employees into 4 quartiles by sales amount
SELECT
employee,
amount,
NTILE(4) OVER (ORDER BY amount) AS quartile,
CASE NTILE(4) OVER (ORDER BY amount)
WHEN 1 THEN 'Bottom 25%'
WHEN 2 THEN 'Lower mid'
WHEN 3 THEN 'Upper mid'
WHEN 4 THEN 'Top 25%'
END AS tier
FROM sales
ORDER BY amount;
The classic "top N per group" pattern
One of the most frequent uses of ROW_NUMBER is selecting the top N rows per partition. Because window functions cannot appear in WHERE, wrap in a CTE:
WITH ranked AS (
SELECT
employee,
department,
amount,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY amount DESC
) AS rn
FROM sales
)
SELECT employee, department, amount
FROM ranked
WHERE rn <= 2
ORDER BY department, amount DESC;
Aggregate functions as window functions
All standard aggregate functions — SUM, AVG, COUNT, MIN, MAX — can be used as window functions by appending an OVER clause. This gives you the aggregate result while keeping individual row visibility:
SELECT
employee,
department,
amount,
-- Count of employees in same department
COUNT(*) OVER (PARTITION BY department) AS dept_headcount,
-- Total sales company-wide
SUM(amount) OVER () AS company_total,
-- Highest single sale across all departments
MAX(amount) OVER () AS top_sale,
-- Each row's share of company total
ROUND(
100.0 * amount / SUM(amount) OVER (), 2
) AS pct_of_company
FROM sales
ORDER BY department, amount DESC;
Running totals and cumulative sums
A running total accumulates values as you move through rows in a defined order. Add ORDER BY to an aggregate window function, and the database automatically computes the cumulative value from the first row up to and including the current row:
-- Running total of sales ordered by date
SELECT
sale_date,
employee,
amount,
SUM(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM sales
ORDER BY sale_date;
-- Cumulative sales within each department, ordered by date
SELECT
sale_date,
employee,
department,
amount,
SUM(amount) OVER (
PARTITION BY department
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS dept_running_total,
-- Running count of transactions per department
COUNT(*) OVER (
PARTITION BY department
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS dept_tx_count
FROM sales
ORDER BY department, sale_date;
Moving averages and frame clauses
The frame clause gives you precise control over which rows are included in the window for each row's calculation. It defines the frame as a range of rows relative to the current row.
The full frame clause syntax:
{ROWS | RANGE} BETWEEN frame_start AND frame_end
-- Frame boundary options:
UNBOUNDED PRECEDING -- first row of the partition
N PRECEDING -- N rows before the current row
CURRENT ROW -- the current row itself
N FOLLOWING -- N rows after the current row
UNBOUNDED FOLLOWING -- last row of the partition
ROWS counts physical rows. RANGE counts rows by value — when using RANGE, all rows with the same ORDER BY value as the current row are included in the same frame boundary.
-- 3-sale moving average (current row + 2 preceding)
SELECT
sale_date,
employee,
amount,
ROUND(
AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2
) AS moving_avg_3
FROM sales
ORDER BY sale_date;
-- Centered 5-row window: 2 before + current + 2 after
SELECT
sale_date,
employee,
amount,
ROUND(
AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
), 2
) AS centered_avg_5,
-- Cumulative max (highest sale seen so far)
MAX(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_max
FROM sales
ORDER BY sale_date;
ORDER BY is present in a window function but no frame clause is specified, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — a cumulative window. This is why SUM() OVER (ORDER BY date) gives a running total without an explicit frame clause. Always write the frame clause explicitly in production code to avoid ambiguity.LAG and LEAD
LAG accesses data from a previous row without a self-join. LEAD accesses a following row. Both are invaluable for period-over-period comparisons, change calculations, and gap analysis:
LAG (column [, offset [, default]]) OVER (...)
LEAD (column [, offset [, default]]) OVER (...)
-- offset: how many rows back/forward (default 1)
-- default: value to use when the offset goes out of bounds (default NULL)
SELECT
sale_date,
employee,
amount,
-- Previous row's amount (within the same date ordering)
LAG(amount) OVER (ORDER BY sale_date) AS prev_amount,
-- Amount two rows back
LAG(amount, 2, 0) OVER (ORDER BY sale_date) AS amount_2_back,
-- Change vs previous sale
amount - LAG(amount, 1, amount) OVER (ORDER BY sale_date) AS delta,
-- Next sale's amount
LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales
ORDER BY sale_date;
WITH monthly AS (
SELECT
DATE_TRUNC('month', sale_date) AS month, -- PostgreSQL
SUM(amount) AS total
FROM sales
GROUP BY 1
)
SELECT
month,
total,
LAG(total) OVER (ORDER BY month) AS prev_month_total,
ROUND(
100.0 * (total - LAG(total) OVER (ORDER BY month))
/ LAG(total) OVER (ORDER BY month),
2
) AS mom_growth_pct
FROM monthly
ORDER BY month;
SELECT
employee,
sale_date,
LAG(sale_date) OVER (
PARTITION BY employee
ORDER BY sale_date
) AS prev_sale_date,
sale_date - LAG(sale_date) OVER (
PARTITION BY employee
ORDER BY sale_date
) AS days_since_last_sale -- PostgreSQL date arithmetic
FROM sales
ORDER BY employee, sale_date;
FIRST_VALUE, LAST_VALUE, NTH_VALUE
These functions return the value of a specified expression from the first, last, or nth row of the window frame:
SELECT
employee,
department,
amount,
-- Top earner in the same department
FIRST_VALUE(employee) OVER (
PARTITION BY department
ORDER BY amount DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_top_earner,
-- Lowest earner in the same department
LAST_VALUE(employee) OVER (
PARTITION BY department
ORDER BY amount DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_lowest_earner,
-- Second highest earner in the department
NTH_VALUE(employee, 2) OVER (
PARTITION BY department
ORDER BY amount DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_second_earner
FROM sales
ORDER BY department, amount DESC;
LAST_VALUE is frequently misused. Without an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, the default frame ends at the current row, so LAST_VALUE returns the current row's own value — not the partition's last row. Always specify the full frame when using LAST_VALUE or NTH_VALUE.Named windows (WINDOW clause)
When the same window definition appears multiple times in a query, the WINDOW clause lets you name it once and reference it by name. This avoids repetition and keeps the SELECT readable:
-- Without named window: repetitive
SELECT
employee,
department,
amount,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY amount DESC) AS rn,
RANK() OVER (PARTITION BY department ORDER BY amount DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY amount DESC) AS dense_rnk,
SUM(amount) OVER (PARTITION BY department ORDER BY amount DESC) AS running_sum
FROM sales;
-- With named window: define once, reuse cleanly
SELECT
employee,
department,
amount,
ROW_NUMBER() OVER dept_window AS rn,
RANK() OVER dept_window AS rnk,
DENSE_RANK() OVER dept_window AS dense_rnk,
SUM(amount) OVER dept_window AS running_sum
FROM sales
WINDOW dept_window AS (
PARTITION BY department
ORDER BY amount DESC
);
Named windows also support inheritance — one window can extend another by referencing its name in the OVER clause with additional clauses:
SELECT
employee,
department,
amount,
SUM(amount) OVER (by_dept) AS dept_total,
SUM(amount) OVER (by_dept ORDER BY sale_date) AS dept_running_total
FROM sales
WINDOW by_dept AS (PARTITION BY department);
-- The second OVER inherits PARTITION BY from by_dept and adds ORDER BY
Real-world query patterns
Deduplicate: keep the latest record per entity
-- Table with multiple rows per user — keep only the most recent
WITH latest AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY created_at DESC
) AS rn
FROM user_events
)
SELECT * FROM latest WHERE rn = 1;
Median without a dedicated function
-- Median salary using PERCENTILE_CONT (PostgreSQL, SQL Server)
SELECT
department,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_sale
FROM sales
GROUP BY department;
-- Median using ROW_NUMBER for databases without PERCENTILE_CONT
WITH ordered AS (
SELECT
amount,
ROW_NUMBER() OVER (ORDER BY amount) AS rn,
COUNT(*) OVER () AS total
FROM sales
)
SELECT AVG(amount) AS median
FROM ordered
WHERE rn IN ((total + 1) / 2, (total + 2) / 2);
Identify consecutive streaks
-- The "islands and gaps" technique using ROW_NUMBER
-- A consecutive streak is where date - row_number is constant
WITH ranked AS (
SELECT
employee,
sale_date,
sale_date - INTERVAL (ROW_NUMBER() OVER (
PARTITION BY employee
ORDER BY sale_date
) || ' days')::interval AS streak_group -- PostgreSQL
FROM (
SELECT DISTINCT employee, sale_date FROM sales
) d
)
SELECT
employee,
MIN(sale_date) AS streak_start,
MAX(sale_date) AS streak_end,
COUNT(*) AS streak_length
FROM ranked
GROUP BY employee, streak_group
ORDER BY streak_length DESC;
Year-over-year comparison in a single query
WITH yearly AS (
SELECT
EXTRACT(YEAR FROM sale_date) AS yr,
department,
SUM(amount) AS total
FROM sales
GROUP BY 1, 2
)
SELECT
yr,
department,
total,
LAG(total) OVER (
PARTITION BY department
ORDER BY yr
) AS prior_yr_total,
ROUND(
100.0 * (total - LAG(total) OVER (PARTITION BY department ORDER BY yr))
/ LAG(total) OVER (PARTITION BY department ORDER BY yr),
2
) AS yoy_growth_pct
FROM yearly
ORDER BY department, yr;
Performance and indexes
Window functions are evaluated late in the query pipeline — after WHERE, GROUP BY, and HAVING — so they never benefit directly from indexes to reduce the rows scanned. However, indexes can still help in two important ways:
1. Speeding up the sort for ORDER BY inside OVER: If a window function's ORDER BY matches an existing index, the database can avoid a sort step entirely by reading from the index in order. A partial or covering index that matches the PARTITION BY and ORDER BY columns is ideal:
-- This window function benefits from an index on (department, amount DESC)
ROW_NUMBER() OVER (PARTITION BY department ORDER BY amount DESC)
-- Create a matching index
CREATE INDEX idx_sales_dept_amount
ON sales (department, amount DESC);
2. Reducing rows early with WHERE/CTE filtering: Place as many filters as possible before the window function evaluates. Wrapping in a CTE that filters first means the window function operates on a smaller dataset:
-- Better: filter first, then rank
WITH recent_sales AS (
SELECT * FROM sales
WHERE sale_date >= '2025-01-01' -- filter rows before windowing
)
SELECT
employee,
amount,
RANK() OVER (ORDER BY amount DESC) AS overall_rank
FROM recent_sales;
-- Always use EXPLAIN / EXPLAIN ANALYZE to verify the execution plan
EXPLAIN ANALYZE
SELECT
employee,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY amount DESC) AS rn
FROM sales;
For very large datasets, be aware that window functions may require the database to materialize the sorted partition in memory. If the partition exceeds work_mem (PostgreSQL) or the equivalent sort buffer, the database will spill to disk, which dramatically slows the query. Increasing the sort buffer for a specific query session or rewriting with batched CTEs can help for extremely large datasets.
Finally, avoid computing the same window expression multiple times in a single query. The optimizer may not deduplicate them. Use a CTE or a subquery to compute the window once, then reference the result:
-- Less efficient: same window computed twice (optimizer may not deduplicate)
SELECT
amount / SUM(amount) OVER (PARTITION BY department) AS pct,
amount - AVG(amount) OVER (PARTITION BY department) AS vs_avg
FROM sales;
-- More explicit: compute in a subquery, reuse the column aliases
SELECT
amount / dept_total AS pct,
amount - dept_avg AS vs_avg
FROM (
SELECT
amount,
SUM(amount) OVER (PARTITION BY department) AS dept_total,
AVG(amount) OVER (PARTITION BY department) AS dept_avg
FROM sales
) sub;