How to Generate Summary Reports in SQL with GROUP BY and Aggregate Functions
One of the most common tasks that SQL developers are asked to do is build summary reports from a data set. In this guide we'll examine how to combine the GROUP BY and various aggregate functions to generate summary reports.
Guide Tasks
  • Read Tutorial
  • Watch Guide Video
Video locked
This video is viewable to users with a Bottega Bootcamp license

I'm excited about this guide because this guide is going to be one where we put a lot of the things that we've learned so far together. Hopefully where you'll be able to see some of the practicality of where SQL comes into play In being able, to sum up, and summarize data.

Now we're going to leverage functions, and we're also going to learn about something new called the GROUP BY clause, and that's going to allow us to create summary reports right In the SQL.

I'm going to use a couple of different databases for the examples, but right now I'm going to start with a more basic one with addresses. I'm going to say select addresses state, and we're going to include one of our aggregate functions count. We're going to count the states and so I'd say addresses_state and then say from addresses.

Now I can GROUP BY addresses_state.

SELECT addresses_state, COUNT(addresses_state)
FROM addresses
GROUP BY addresses_state;

Now if I run this, this is going to group our states so we're going to say that we can see that we have one Arizona State and we have three for New York. All of this works, and this is a pretty cool way of being able to grab a quick peek into your data.

large

So being able to see if you did have an application where you collected states and addresses from users. You could within three lines of code right here be able to take a look and see how many users you had in a particular location. Now obviously we could also change this up and say we're looking to group by city and let me leave this here so that in the show notes you can use it. So here I'm going to say city and copy in each one of these spots.

SELECT addresses_city, COUNT(addresses_city)
FROM addresses
GROUP BY addresses_city;

Now if I run this it is going to show that we have two users in Manhattan one in Phoenix and one in Queens.

large

This is helpful one count is a great way of being able to summarize the data. Now let's look at another example, and this one is I think even a little bit more practical. Let's say that we have our guides tables. They select all from guides just so that we can see it.

SELECT *
FROM guides

Right here you can see that we have each one of these guides and we have revenue, and we know the user that is associated with it and then the title.

large

Now for this one, I don't care that much about the title but what if I asked you to go out and build a query that told me how much money each user-generated. That may or may not be something that seems very intuitive to you if you've never done that kind of thing before, but by leveraging aggregate functions, we can create that summary very quickly.

What I'm going to do here is I'm going to say select and then guides_users_id and then sum up the guides revenue and then that's going to be from guides, and then we're going to group by the guides_users_id.

SELECT guides_users_id, SUM(guides_revenue)
FROM guides
GROUP BY guides_users_id;

Now when I run this, this is going to do something very cool. It is going to group all of our guide_users_id just like we told it to do right here and then it's going, to sum up, all of the revenue per user.

large

So within just once again three lines of code, we're able to go in and query to see how much money each one of our users were able to generate from their guides. This is something that if you didn't use aggregate functions. It would take quite a bit more work to do.

If you are doing this say in a programming language where you ran a query you grabbed all the values and then you had to go sum them up yourself. SQL actually gives you the ability to just return those values, and then you can use them, and because SQL is so incredibly fast, it allows you to do it and to do it more efficiently.

Whenever I can offload some computational power such as some mean items up or counting items, and I can pass that to SQL. I try to take advantage of that as much as I can because it will allow me to really, kind of take and delegate some of the computing power back into the data instead of the application itself. I definitely recommend for you to play with all kinds of different combinations with GROUP BY and passing in aggregate functions to see how it works and so you can understand how you could use it in your own use cases.

Code

USE devcamp_sql_course_schema;

SELECT addresses_state, COUNT(addresses_state)
FROM addresses
GROUP BY addresses_state;

SELECT addresses_city, COUNT(addresses_city)
FROM addresses
GROUP BY addresses_city;

SELECT guides_users_id, SUM(guides_revenue)
FROM guides
GROUP BY guides_users_id;