Multi-level Aggregation Using MySQL GROUP BY WITH ROLLUP
A means to retrieve multi-level aggregation metrics in a single query...
Hey hey! 👍
So glad to have you here reading my thoughts as I continue to learn MySQL and PHP while navigating this self-taught developers’ path. Check out some of the cool stuff I've been up to lately…
Kofi Shop Freebie
I’ve added a free PDF to my Kofi shop with a mantra/slogan/saying that maps out to the specific MySQL query clause syntax order. I hope this free resource helps you remember the MySQL query clause order. I know it sure does help me!
Call it premonition or just my curiosity, but for whatever reason, I have been studying - of my own accord - the WITH ROLLUP
modifier for the MySQL GROUP BY
clause.
Having never used WITH ROLLUP
in any queries in real-world projects, I still found it quite interesting as a means to retrieve multi-level aggregation metrics in a single query.
And so, some amount of time passed…
Lo and behold I would need this functionality sooner than I anticipated for a query requirement so kudos to my curious persistence!
Nevertheless, what does GROUP BY WITH ROLLUP
do?
That is what we are going to learn about in this post using some simple exploratory queries and examples.
ROLLUP What?
I'm using a more manageable subset of data based on this query against the Sakila database 'payment' table:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date);
These query results show that we have a summation total for all payments in the months of February and May on a 'per weekday' basis.
Suppose you need a finer level of aggregation, say at the single month and the dual months' combined levels.
Well, you could determine these values in several different ways:
Calculate them yourself: Error-prone and just an all-around bad idea.
Execute an additional query: Definitely a better option.
Use the
WITH ROLLUP
modifier of theGROUP BY
clause: The best option yet.
MySQL GROUP BY WITH ROLLUP Modifier
In this next query, I use the GROUP BY
clause and the WITH ROLLUP
modifier on 2 SELECT
list columns that are not part of an aggregate function call; the MONTHNAME(payment_date)
and DAYNAME(payment_date)
expressions respectively:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP;
Viewing these query results, you can see that:
Rows with a
NULL
value for the 'day_name' column has summary totals at the month level in the 'total' column. The MySQL documentation refers to rows like these as having super-aggregation.Rows having a non-
NULL
value in the 'month_name' and 'day_name' columns have a standard aggregation level summation.There is one row with both the 'month_name' and 'day_name' columns having a
NULL
value in each, which contains the summary total calculation for all months combined (February and May in this example) in the 'total' column. This is also a super-aggregate row.
Filtering Super-Aggregate Rows
If you wanted to filter for just the row containing the 'all_months' summary total, you need to move the SELECT
query to the FROM
clause as a Derived Table and then use the IS NULL
operator in the WHERE
clause from the outer query as shown in this example:
SELECT IFNULL(t.month_name, 'ALL MONTHS') AS all_months, t.total AS all_months_totals
FROM (
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP) AS t
WHERE t.month_name IS NULL;
The reason we must use a Derived Table to look at only those super-aggregate rows is best explained in the online GROUP BY Modifiers documentation:
“Because the
NULL
values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you can test them asNULL
values only in the select list orHAVING
clause. You cannot test them asNULL
values in join conditions or theWHERE
clause to determine which rows to select.”
MySQL GROUPING() Function
Even though using a Derived Table for this query requirement absolutely works, MySQL provides additional mechanisms you can use in this instance (and many others), one of them being the GROUPING()
function.
The GROUPING()
function tests whether NULL
values in the result set indicate any of the super-aggregate values.
However, the GROUPING()
function is only allowed in these 3 places in a query:
The
SELECT
listThe
HAVING
clauseThe
ORDER BY
clause
📚 Discover in-depth articles, guides, and videos in the “MySQL Learning Tier” membership over on my Kofi page. All levels of learning and understanding are welcome there.
As shown by the result set in the following query, calling the GROUPING()
function on the expressions MONTHNAME(payment_date)
and DAYNAME(payment_date)
, returns 1 for any NULL columns which represent the super-aggregation level or 0 otherwise:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total,
GROUPING(MONTHNAME(payment_date)) AS mnth_grp,
GROUPING(DAYNAME(payment_date)) AS day_name_grp
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP;
To target the specific super-aggregate rows only, we can again use a Derived Table (such a handy feature in all of SQL) and filter just those rows from the outer query WHERE
clause:
SELECT t.month_name, t.day_name, t.total
FROM
(SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name, SUM(amount) AS total,
GROUPING(MONTHNAME(payment_date)) AS mnth_grp,
GROUPING(DAYNAME(payment_date)) AS day_name_grp
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP
) AS t
WHERE t.mnth_grp = 1 OR t.day_name_grp = 1;
An Easier Way
While the previous query we looked at does work, MySQL provides a simpler means to obtain just those super-aggregate rows without the need to use a Derived Table.
We can filter the rows, looking at only super-aggregate NULL
s using the GROUPING()
function, in the HAVING
clause as shown in this next query:
SELECT
MONTHNAME(payment_date) AS month_name, DAYNAME(payment_date) AS day_name,
SUM(amount) AS total
FROM payment
WHERE MONTH(payment_date) IN(2,5)
GROUP BY MONTHNAME(payment_date), DAYNAME(payment_date) WITH ROLLUP
HAVING GROUPING(month_name, day_name) <> 0;
From the online documentation:
"With multiple expression arguments, GROUPING() returns a result representing a bitmask the combines the results for each expression, with the lowest-order bit corresponding to the result for the rightmost expression."
Knowing this information, we are able to filter in the HAVING
clause with this expression because GROUPING()
returns a nonzero value for the super-aggregate NULL
s:
HAVING GROUPING(month_name, day_name) <> 0
More To Come
While I learned a great deal about the WITH ROLLUP
modifier, I won't pretend that everything was all chocolate and roses.
I’ll cover some of the problems and obstacles I ran into in a follow-up post as I continue to use and explore this handy feature.
Thank you for reading!
Until next time,
Joshua Otwell
Visit my blog Digital Owl’s Prose, where I write regularly on CodeIgniter, PHP, and SQL. I’m also active on Medium, LinkedIn, Twitter, and Instagram.
If you have any questions or see any mistakes in the code, please let me know via the comments. Constructive comments help me provide accurate content and are much appreciated.
Want to support this newsletter and my work? Drop some spare change in my Tip Jar.💰
Disclosure: Some of the services and products links in this email are affiliate links. At no additional cost to you, should you make a purchase by clicking through one of them, I will receive a commission.