Ranking functions let you compile a list of values from the data set in ranked order, as well as compose single-statement SQL queries that answer questions such as, “Name the top 10 products shipped this year by total sales,” or “Give the top 5% of salespersons who sold orders to at least 15 different companies.” The functions include RANK(), DENSE_RANK(), PERCENT_RANK(), and NTILE() with a PARTITION BY clause. See “Ranking functions”.
SQL/OLAP defines four functions that are categorized as ranking functions:
<RANK FUNCTION TYPE> ::= RANK | DENSE RANK | PERCENT RANK | NTILE
Ranking functions let you compute a rank value for each row in a result set based on the order specified in the query. For example, a sales manager might need to identify the top or bottom sales people in the company, the highest- or lowest-performing sales region, or the best- or worst-selling products. Ranking functions can provide this information.
The RANK function returns a number that indicates the rank of the current row among the rows in the row’s partition, as defined by the ORDER BY clause. The first row in a partition has a rank of 1, and the last rank in a partition containing 25 rows is 25. RANK is specified as a syntax transformation, which means that an implementation can choose to actually transform RANK into its equivalent, or it can merely return a result equivalent to the result that transformation would return.
In the following example, ws1 indicates the window specification that defines the window named w1.
RANK() OVER ws
is equivalent to
( COUNT (*) OVER ( ws RANGE UNBOUNDED PRECEDING ) - COUNT (*) OVER ( ws RANGE CURRENT ROW ) + 1 )
The transformation of the RANK function uses logical aggregation (RANGE). As a result, two or more records that are tied—or have equal values in the ordering column—will have the same rank.The next group in the partition that has a different value will have a rank that is more than one greater than the rank of the tied rows. For example, if there are rows whose ordering column values are 10, 20, 20, 20, 30, the rank of the first row is 1 and the rank of the second row is 2. The rank of the third and fourth row is also 2, but the rank of the fifth row is 5. There are no rows whose rank is 3 or 4. This algorithm is sometimes known as sparse ranking.
See also “RANK function [Analytical],” Chapter 5, “SQL Functions,” in the Sybase IQ Reference Manual.
While RANK returns duplicate values in the ranking sequence when there are ties between values, DENSE_RANK returns ranking values without gaps. The values for rows with ties are still equal, but the ranking of the rows represents the positions of the clusters of rows having equal values in the ordering column, rather than the positions of the individual rows. As in the RANK example, where rows ordering column values are 10, 20, 20, 20, 30, the rank of the first row is still 1 and the rank of the second row is still 2, as are the ranks of the third and fourth rows. The last row, however, is 3, not 5.
DENSE_RANK is computed through a syntax transformation, as well.
DENSE_RANK() OVER ws
is equivalent to
COUNT ( DISTINCT ROW ( expr_1, . . ., expr_n ) ) OVER ( ws RANGE UNBOUNDED PRECEDING )
In the above example, expr_1 through expr_n represent the list of value expressions in the sort specification list of window w1.
See also “DENSE_RANK function [Analytical],” Chapter 5, “SQL Functions,” in the Sybase IQ Reference Manual.
The PERCENT_RANK function calculates a percentage for the rank, rather than a fractional amount, and returns a decimal value between 0 and 1. In other words, PERCENT_RANK returns the relative rank of a row, which is a number that indicates the relative position of the current row within the window partition in which it appears. For example, in a partition that contains 10 rows having different values in the ordering columns, the third row would be given a PERCENT_RANK value of 0.222 …, because you have covered 2/9 (22.222...%) of rows following the first row of the partition. PERCENT_RANK of a row is defined as one less than the RANK of the row divided by one less than then umber of rows in the partition, as seen in the following example (where “ANT” stands for an approximate numeric type, such as REAL or DOUBLE PRECISION).
PERCENT_RANK() OVER ws
is equivalent to
CASE WHEN COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) = 1 THEN CAST (0 AS ANT) ELSE ( CAST ( RANK () OVER ( ws ) AS ANT ) -1 / ( COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) - 1 ) END
See also Chapter , “PERCENT_RANK function [Analytical],” Chapter 5, “SQL Functions,” in the Sybase IQ Reference Manual.
Ranking example 1 The SQL query that follows finds the male and female employees from Utah and ranks them in descending order according to salary.
SELECT emp_lname, salary, sex, RANK() OVER (ORDER BY salary DESC) AS Rank FROM employee WHERE state IN ('MA') AND dept_id =100 ORDER BY salary DESC;
The following are the results from the above query:
emp_lname salary sex rank --------- -------- --- ---- Scott 96300.000 M 1 Lull 87900.000 M 2 Pastor 74500.000 F 3 Shishov 72995.000 F 4 Wang 68400.000 M 5 Sullivan 67890.000 F 6 Rabkin 64500.000 M 7 Cobb 62000.000 M 8 Gowda 59840.000 M 9 Breault 57490.000 M 10 Diaz 54900.000 M 11 Melkisetian 48500.000 F 12 Driscoll 48023.690 M 13 Whitney 45700.000 F 14 Guevara 42998.000 M 15 Marshall 42500.000 M 16 Siperstein 39875.500 F 17 Soo 39075.000 M 18 Samuels 37400.000 M 19
Ranking example 2 Using the query from Ranking example 1, you can change the data by partitioning it by gender. The following example ranks employees in descending order by salary and partitions by gender:
SELECT emp_lname, salary, sex, RANK() OVER (PARTITION BY sex ORDER BY salary DESC) AS RANK FROM employee WHERE state IN ('MA', 'AZ') AND dept_id IN (100, 200) ORDER BY sex, salary DESC;
The following are the results from the above query:
emp_lname salary sex rank --------- --------- --- ---- Kelly 87500.000 F 1 Pastor 74500.000 F 2 Shishov 72995.000 F 3 Sullivan 67890.000 F 4 Melkisetian 48500.000 F 5 Pickett 47653.000 F 6 Poitras 46200.000 F 7 Whitney 45700.000 F 8 Siperstein 39875.500 F 9 Scott 96300.000 M 1 Lull 87900.000 M 2 Wang 68400.000 M 3 Sterling 64900.000 M 4 Rabkin 64500.000 M 5 Cobb 62000.000 M 6 Gowda 59840.000 M 7 Breault 57490.000 M 8 Martel 55700.000 M 9 Diaz 54900.000 M 10 Dill 54800.000 M 11 Powell 54600.000 M 12 Driscoll 48023.690 M 13 Guevara 42998.000 M 14 Marshall 42500.000 M 15 Soo 39075.000 M 16 Goggin 37900.000 M 17 Preston 37803.000 M 18 Samuels 37400.000 M 19 Singer 34892.000 M 20 Chao 33890.000 M 21
Ranking example 3 This example takes a list of female employees in California and Texas and ranks them in descending order according to salary. The PERCENT_RANK function is used to provide a cumulative total in descending order.
SELECT emp_lname, salary, sex, CAST(PERCENT_RANK() OVER (ORDER BY salary DESC) AS numeric (4, 2)) AS RANK FROM employee WHERE state IN ('CA', 'TX') AND sex ='F' ORDER BY salary DESC;
The following are the results from the above query:
emp_lname salary sex percent --------- --------- --- ---------- Savarino 72300.000 F 0.00 Smith 51411.000 F 0.33 Clark 45000.000 F 0.66 Garcia 39800.000 F 1.00
Ranking example 4 You can use the PERCENT_RANK function to find the top or bottom percentiles in the data set. In this example, the query returns male employees whose salary is in the top five percent of the data set.
SELECT * FROM (SELECT emp_lname, salary, sex, CAST(PERCENT_RANK() OVER (ORDER BY salary DESC) as numeric (4, 2)) AS percent FROM employee WHERE state IN ('MA') AND sex ='F' ) AS DT where percent > 0.5 ORDER BY salary DESC;
The following are the results from the above query:
emp_lname salary sex percent --------- ---------- --- --------- Whitney 45700.000 F 0.51 Barletta 45450.000 F 0.55 Higgins 43700.000 F 0.59 Siperstein 39875.500 F 0.62 Coe 36500.000 F 0.66 Espinoza 36490.000 F 0.70 Wetherby 35745.000 F 0.74 Braun 34300.000 F 0.77 Butterfield 34011.000 F 0.81 Bigelow 31200.000 F 0.85 Bertrand 29800.000 F 0.88 Lambert 29384.000 F 0.92 Kuo 28200.000 F 0.96 Romero 27500.000 F 1.00