Multi-level Aggregation Using MySQL GROUP BY WITH ROLLUP
A means to retrieve multi-level aggregation metrics in a single query...
Hey hey! 👍
So glad to have you here reading my thoughts as I continue to learn MySQL and PHP while navigating this self-taught developers’ path. Check out some of the cool stuff I've been up to lately…
Kofi Shop Freebie
I’ve added a free PDF to my Kofi shop with a mantra/slogan/saying that maps out to the specific MySQL query clause syntax order. I hope this free resource helps you remember the MySQL query clause order. I know it sure does help me!
Call it premonition or just my curiosity, but for whatever reason, I have been studying - of my own accord - the WITH ROLLUP modifier for the MySQL GROUP BY clause.
Having never used WITH ROLLUP in any queries in real-world projects, I still found it quite interesting as a means to retrieve multi-level aggregation metrics in a single query.
And so, some amount of time passed…
Lo and behold I would need this functionality sooner than I anticipated for a query requirement so kudos to my curious persistence!
Nevertheless, what does GROUP BY WITH ROLLUP do?
That is what we are going to learn about in this post using some simple exploratory queries and examples.
ROLLUP What?
I'm using a more manageable subset of data based on this query against the Sakila database 'payment' table:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date);These query results show that we have a summation total for all payments in the months of February and May on a 'per weekday' basis.
Suppose you need a finer level of aggregation, say at the single month and the dual months' combined levels.
Well, you could determine these values in several different ways:
Calculate them yourself: Error-prone and just an all-around bad idea.
Execute an additional query: Definitely a better option.
Use the
WITH ROLLUPmodifier of theGROUP BYclause: The best option yet.
MySQL GROUP BY WITH ROLLUP Modifier
In this next query, I use the GROUP BY clause and the WITH ROLLUP modifier on 2 SELECT list columns that are not part of an aggregate function call; the MONTHNAME(payment_date) and DAYNAME(payment_date) expressions respectively:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP;Viewing these query results, you can see that:
Rows with a
NULLvalue for the 'day_name' column has summary totals at the month level in the 'total' column. The MySQL documentation refers to rows like these as having super-aggregation.Rows having a non-
NULLvalue in the 'month_name' and 'day_name' columns have a standard aggregation level summation.There is one row with both the 'month_name' and 'day_name' columns having a
NULLvalue in each, which contains the summary total calculation for all months combined (February and May in this example) in the 'total' column. This is also a super-aggregate row.
Filtering Super-Aggregate Rows
If you wanted to filter for just the row containing the 'all_months' summary total, you need to move the SELECT query to the FROM clause as a Derived Table and then use the IS NULL operator in the WHERE clause from the outer query as shown in this example:
SELECT IFNULL(t.month_name, 'ALL MONTHS') AS all_months, t.total AS all_months_totals
FROM (
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP) AS t
WHERE t.month_name IS NULL;The reason we must use a Derived Table to look at only those super-aggregate rows is best explained in the online GROUP BY Modifiers documentation:
“Because the
NULLvalues in the super-aggregate rows are placed into the result set at such a late stage in query processing, you can test them asNULLvalues only in the select list orHAVINGclause. You cannot test them asNULLvalues in join conditions or theWHEREclause to determine which rows to select.”
MySQL GROUPING() Function
Even though using a Derived Table for this query requirement absolutely works, MySQL provides additional mechanisms you can use in this instance (and many others), one of them being the GROUPING() function.
The GROUPING() function tests whether NULL values in the result set indicate any of the super-aggregate values.
However, the GROUPING() function is only allowed in these 3 places in a query:
The
SELECTlistThe
HAVINGclauseThe
ORDER BYclause
📚 Discover in-depth articles, guides, and videos in the “MySQL Learning Tier” membership over on my Kofi page. All levels of learning and understanding are welcome there.
As shown by the result set in the following query, calling the GROUPING() function on the expressions MONTHNAME(payment_date) and DAYNAME(payment_date), returns 1 for any NULL columns which represent the super-aggregation level or 0 otherwise:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total,
GROUPING(MONTHNAME(payment_date)) AS mnth_grp,
GROUPING(DAYNAME(payment_date)) AS day_name_grp
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP;To target the specific super-aggregate rows only, we can again use a Derived Table (such a handy feature in all of SQL) and filter just those rows from the outer query WHERE clause:
SELECT t.month_name, t.day_name, t.total
FROM
(SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total,
GROUPING(MONTHNAME(payment_date)) AS mnth_grp,
GROUPING(DAYNAME(payment_date)) AS day_name_grp
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP
) AS t
WHERE t.mnth_grp = 1 OR t.day_name_grp = 1;
An Easier Way
While the previous query we looked at does work, MySQL provides a simpler means to obtain just those super-aggregate rows without the need to use a Derived Table.
We can filter the rows, looking at only super-aggregate NULLs using the GROUPING() function, in the HAVING clause as shown in this next query:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name,
SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP
HAVING GROUPING(month_name, day_name) <> 0;From the online documentation:
"With multiple expression arguments, GROUPING() returns a result representing a bitmask the combines the results for each expression, with the lowest-order bit corresponding to the result for the rightmost expression."
Knowing this information, we are able to filter in the HAVING clause with this expression because GROUPING() returns a nonzero value for the super-aggregate NULLs:
HAVING GROUPING(month_name, day_name) <> 0More To Come
While I learned a great deal about the WITH ROLLUP modifier, I won't pretend that everything was all chocolate and roses.
I’ll cover some of the problems and obstacles I ran into in a follow-up post as I continue to use and explore this handy feature.
Thank you for reading!
Until next time,
Joshua Otwell
Visit my blog Digital Owl’s Prose, where I write regularly on CodeIgniter, PHP, and SQL. I’m also active on Medium, LinkedIn, Twitter, and Instagram.
If you have any questions or see any mistakes in the code, please let me know via the comments. Constructive comments help me provide accurate content and are much appreciated.
Want to support this newsletter and my work? Drop some spare change in my Tip Jar.💰
Disclosure: Some of the services and products links in this email are affiliate links. At no additional cost to you, should you make a purchase by clicking through one of them, I will receive a commission.








