Used in select statements to divide a table into groups and to return only groups that match conditions in the having clause.
Start of select statement
[group by [all] aggregate_free_expression [, aggregate_free_expression]...]
[having search_conditions]
End of select statement
specifies the groups into which the table will be divided, and if aggregate functions are included in the select list, finds a summary value for each group. These summary values appear as columns in the results, one for each group. You can refer to these summary columns in the having clause.
You can use the avg, count, max, min, and sum aggregate functions in the select list before group by (the expression is usually a column name). For more information, see “Aggregate functions”.
A table can be grouped by any combination of columns—that is, groups can be nested within each other, as in Example 2.
is a Transact-SQL extension that includes all groups in the results, even those excluded by a where clause. For example:
select type, avg(price) from titles where advance > 7000 group by all type
type ----------------- ---------- UNDECIDED NULL business 2.99 mod_cook 2.99 popular_comp 20.00 psychology NULL trad_cook 14.99 (6 rows affected)
“NULL” in the aggregate column indicates groups that would be excluded by the where clause. A having clause negates the meaning of all.
is an expression that includes no aggregates. A Transact-SQL extension allows grouping by an aggregate-free expression as well as by a column name.
You cannot group by column heading or alias. This example is correct:
select Price=avg(price), Pay=avg(advance), Total=price * $1.15 from titles group by price * $1.15
sets conditions for the group by clause, similar to the way in which where sets conditions for the select clause.
having search conditions can include aggregate expressions; otherwise, having search conditions are identical to where search conditions. Following is an example of a having clause with aggregates:
select pub_id, total = sum(total_sales) from titles where total_sales is not null group by pub_id having count(*)>5
When Adaptive Server optimizes queries, it evaluates the search conditions in where and having clauses, and determines which conditions are search arguments (SARGs) that can be used to choose the best indexes and query plan. All of the search conditions are used to qualify the rows. For more information on search arguments, see the Performance and Tuning Guide.
Calculates the average advance and the sum of the sales for each type of book:
select type, avg(advance), sum(total_sales) from titles group by type
Groups the results by type, then by pub_id within each type:
select type, pub_id, avg(advance), sum(total_sales) from titles group by type, pub_id
Calculates results for all groups, but displays only groups whose type begins with “p”:
select type, avg(price) from titles group by type having type like 'p%'
Calculates results for all groups, but displays results for groups matching the multiple conditions in the having clause:
select pub_id, sum(advance), avg(price) from titles group by pub_id having sum(advance) > $15000 and avg(price) < $10 and pub_id > "0700"
Calculates the total sales for each group (publisher) after joining the titles and publishers tables:
select p.pub_id, sum(t.total_sales) from publishers p, titles t where p.pub_id = t.pub_id group by p.pub_id
Displays the titles that have an advance of more than $1000 and a price that is more than the average price of all titles:
select title_id, advance, price from titles where advance > 1000 having price > avg(price)
You can use a column name or any expression (except a column heading or alias) after group by. You can use group by to calculate results or display a column or an expression that does not appear in the select list (a Transact-SQL extension described in “Transact-SQL extensions to group by and having”).
The maximum number of columns or expressions allowed in a group by clause is 31, the same as the maximum number of indexes allowed on a table.
The group by clause on large columns, and on all columns specified by the group by clause, is limited by the maximum size of the index for a given logical page size. This is because Adaptive Server generates a worktable with a key when grouping data results. For more information on index sizes, see create index.
An index size limitation may cause errors when you process a group by clause. For instance, a group by clause in a 1024-byte column on a 2K page size server causes an error if the index size limitation is 600 bytes.
Null values in the group by column are put into a single group.
You cannot name text or image columns in group by and having clauses.
You cannot use a group by clause in the select statement of an updatable cursor.
Aggregate functions can be used only in the select list or in a having clause. They cannot be used in a where or group by clause.
Aggregate functions are of two types. Aggregates applied to all the qualifying rows in a table (producing a single value for the whole table per function) are called scalar aggregates. An aggregate function in the select list with no group by clause applies to the whole table; it is one example of a scalar aggregate.
Aggregates applied to a group of rows in a specified column or expression (producing a value for each group per function) are called vector aggregates. For either aggregate type, the results of the aggregate operations are shown as new columns that the having clause can refer to.
You can nest a vector aggregate inside a scalar aggregate. See “Aggregate functions.” for more information.
The where clause excludes rows that do not meet its search conditions; its function remains the same for grouped or nongrouped queries.
The group by clause collects the remaining rows into one group for each unique value in the group by expression. Omitting group by creates a single group for the whole table.
Aggregate functions specified in the select list calculate summary values for each group. For scalar aggregates, there is only one value for the table. Vector aggregates calculate values for the distinct groups.
The having clause excludes groups from the results that do not meet its search conditions. Even though the having clause tests only rows, the presence or absence of a group by clause may make it appear to be operating on groups:
When the query includes group by, having excludes result group rows. This is why having seems to operate on groups.
When the query has no group by, having excludes result rows from the (single-group) table. This is why having seems to operate on rows (the results are similar to where clause results).
All group by and having queries in the Examples section adhere to the SQL standard. It dictates that queries using group by, having, and vector aggregate functions produce one row and one summary value per group, using these guidelines:
Columns in a select list must also be in the group by expression, or they must be arguments of aggregate functions.
A group by expression can contain only column names that are in the select list. However, columns used only as arguments of aggregate functions in the select list do not qualify.
Columns in a having expression must be single-valued —arguments of aggregates, for instance — and they must be in the select list or group by clause. Queries with a select list aggregate and a having clause must have a group by clause. If you omit the group by for a query without a select list aggregate, all the rows not excluded by the where clause are considered to be a single group (see Example 6).
In nongrouped queries, the principle that “where excludes rows” seems straightforward. In grouped queries, the principle expands to “where excludes rows before group by, and having excludes rows from the display of results.”
The SQL standard allows queries that join two or more tables to use group by and having, if they also adhere to the above guidelines. When specifying joins or other complex queries, use the standard syntax of group by and having until you fully comprehend the effect of the Transact-SQL extensions to both clauses, as described in “Transact-SQL extensions to group by and having.”
To help you avoid problems with extensions, Adaptive Server provides the fipsflagger option to the set command that issues a nonfatal warning for each occurrence of a Transact-SQL extension in a query. See set for more information.
Transact-SQL extensions to standard SQL make displaying data more flexible, by allowing references to columns and expressions that are not used for creating groups or summary calculations:
A select list that includes aggregates can include extended columns that are not arguments of aggregate functions and are not included in the group by clause. An extended column affects the display of final results, since additional rows are displayed.
The group by clause can include columns or expressions that are not in the select list.
The group by all clause displays all groups, even those excluded from calculations by a where clause. See the example for the keyword all in the “Parameters” section.
The having clause can include columns or expressions that are not in the select list and not in the group by clause.
When the Transact-SQL extensions add rows and columns to a display, or if group by is omitted, query results can be hard to interpret. The examples that follow can help you understand how Transact-SQL extensions can affect query results.
The following examples illustrate the differences between queries that use standard group by and having clauses and queries that use the Transact-SQL extensions:
An example of a standard grouping query:
select type, avg(price) from titles group by type
type ---------------------- ---------- UNDECIDED NULL business 13.73 mod_cook 11.49 popular_comp 21.48 psychology 13.50 trad_cook 15.96 (6 rows affected)
The Transact-SQL extended column, price (in the select list, but not an aggregate and not in the group by clause), causes all qualified rows to display in each qualified group, even though a standard group by clause produces a single row per group. The group by still affects the vector aggregate, which computes the average price per group displayed on each row of each group (they are the same values that were computed for example a):
select type, price, avg(price) from titles group by type
type price ------------ ---------------- -------------- business 19.99 13.73 business 11.95 13.73 business 2.99 13.73 business 19.99 13.73 mod_cook 19.99 11.49 mod_cook 2.99 11.49 UNDECIDED NULL NULL popular_comp 22.95 21.48 popular_comp 20.00 21.48 popular_comp NULL 21.48 psychology 21.59 13.50 psychology 10.95 13.50 psychology 7.00 13.50 psychology 19.99 13.50 psychology 7.99 13.50 trad_cook 20.95 15.96 trad_cook 11.95 15.96 trad_cook 14.99 15.96 (18 rows affected)
The way Transact-SQL extended columns are handled can make it look as if a query is ignoring a where clause. This query computes the average prices using only those rows that satisfy the where clause, but it also displays rows that do not match the where clause.
Adaptive Server first builds a worktable containing only the type and aggregate values using the where clause. This worktable is joined back to the titles table in the grouping column type to include the price column in the results, but the where clause is not used in the join.
The only row in titles that is not in the results is the lone row with type = “UNDECIDED” and a NULL price, that is, a row for which there were no results in the worktable. If you also want to eliminate the rows from the displayed results that have prices of less than $10.00, you must add a having clause that repeats the where clause, as shown in Example 4:
select type, price, avg(price) from titles where price > 10.00 group by type
type price ------------ ---------------- -------------- business 19.99 17.31 business 11.95 17.31 business 2.99 17.31 business 19.99 17.31 mod_cook 19.99 19.99 mod_cook 2.99 19.99 popular_comp 22.95 21.48 popular_comp 20.00 21.48 popular_comp NULL 21.48 psychology 21.59 17.51 psychology 10.95 17.51 psychology 7.00 17.51 psychology 19.99 17.51 psychology 7.99 17.51 trad_cook 20.95 15.96 trad_cook 11.95 15.96 trad_cook 14.99 15.96 (17 rows affected)
If you are specifying additional conditions, such as aggregates, in the having clause, be sure to also include all conditions specified in the where clause. Adaptive Server will appear to ignore any where clause conditions that are missing from the having clause:
select type, price, avg(price) from titles where price > 10.00 group by type having price > 10.00
type price ----------- ---------------- -------------- business 19.99 17.31 business 11.95 17.31 business 19.99 17.31 mod_cook 19.99 19.99 popular_comp 22.95 21.48 popular_comp 20.00 21.48 psychology 21.59 17.51 psychology 10.95 17.51 psychology 19.99 17.51 trad_cook 20.95 15.96 trad_cook 11.95 15.96 trad_cook 14.99 15.96 (12 rows affected)
This is an example of a standard grouping query using a join between two tables. It groups by pub_id, then by type within each publisher ID, to calculate the vector aggregate for each row:
select p.pub_id, t.type, sum(t.total_sales) from publishers p, titles t where p.pub_id = t.pub_id group by p.pub_id, t.type
pub_id type ------ ------------ -------- 0736 business 18722 0736 psychology 9564 0877 UNDECIDED NULL 0877 mod_cook 24278 0877 psychology 375 0877 trad_cook 19566 1389 business 12066 1389 popular_comp 12875 (8 rows affected)
It may seem that it is only necessary to specify group by for the pub_id and type columns to produce the results, and add extended columns as follows:
select p.pub_id, p.pub_name, t.type, sum(t.total_sales) from publishers p, titles t where p.pub_id = t.pub_id group by p.pub_id, t.type
However, the results for the above query are much different from the results for the first query in this example. After joining the two tables to determine the vector aggregate in a worktable, Adaptive Server joins the worktable to the table (publishers) of the extended column for the final results. Each extended column from a different table invokes an additional join.
As you can see, using the extended column extension in queries that join tables can easily produce results that are difficult to comprehend. In most cases, you should use the standard group by to join tables in your queries.
This example uses the Transact-SQL extension to group by to include columns that are not in the select list. Both the pub_id and type columns are used to group the results for the vector aggregate. However, the final results do not include the type within each publisher. In this case, you may only want to know how many distinct title types are sold for each publisher:
select p.pub_id, sum(t.total_sales) from publishers p, titles t where p.pub_id = t.pub_id group by p.pub_id, t.type
pub_id ------ -------- 0736 18722 0736 9564 0877 NULL 0877 24278 0877 375 0877 19566 1389 12066 1389 12875 (8 rows affected)
This example combines two Transact-SQL extension effects. First, it omits the group by clause while including an aggregate in the select list. Second, it includes an extended column. By omitting the group by clause:
The table becomes a single group. The scalar aggregate counts three qualified rows.
pub_id becomes a Transact-SQL extended column because it does not appear in a group by clause. No having clause is present, so all rows in the group are qualified to be displayed.
select pub_id, count(pub_id) from publishers
pub_id ---------- --------- 0736 3 0877 3 1389 3 (3 rows affected)
The where clause excludes publishers with a pub_id of 1000 or more from the single group, so the scalar aggregate counts two qualified rows. The extended column pub_id displays all qualified rows from the publishers table:
select pub_id, count(pub_id) from publishers where pub_id < "1000"
pub_id -------------- ----------- 0736 2 0877 2 1389 2 (3 rows affected)
This example illustrates an effect of a having clause used without a group by clause.
The table is considered a single group. No where clause excludes rows, so all the rows in the group (table) are qualified to be counted.
The rows in this single-group table are tested by the having clause.
These combined effects display the two qualified rows.
select pub_id, count(pub_id) from publishers having pub_id < "1000"
pub_id -------------- --------- 0736 3 0877 3 (2 rows affected)
This example uses the extension to having that allows columns or expressions not in the select list and not in the group by clause. It determines the average price for each title type, but it excludes those types that do not have more than $10,000 in total sales, even though the sum aggregate does not appear in the results:
select type, avg(price) from titles group by type having sum(total_sales) > 10000
type ------------ ---------- business 13.73 mod_cook 11.49 popular_comp 21.48 trad_cook 15.96 (4 rows affected)
If your server has a case-insensitive sort order, group by ignores the case of the grouping columns. For example, given this data on a case-insensitive server:
select lname, amount from groupdemo
lname amount ---------- ------------------ Smith 10.00 smith 5.00 SMITH 7.00 Levi 9.00 Lévi 20.00
grouping by lname produces these results:
select lname, sum(amount) from groupdemo
lname
lname ---------- ------------------ Levi 9.00 Lévi 20.00 Smith 22.00
The same query on a case- and accent-insensitive server produces these results:
lname ---------- ------------------ Levi 29.00 Smith 22.00
SQL92 – Compliance level: Entry-level compliant.
The use of columns within the select list that are not in the group by list and have no aggregate functions is a Transact-SQL extension.
The use of the all keyword is a Transact-SQL extension.
Commands compute clause, declare, select, where clause
Functions Aggregate functions