Ranking functions

Ranking functions let you compile a list of values from the data set in ranked order, as well as compose single-statement SQL queries that answer questions such as, “Name the top 10 products shipped this year by total sales,” or “Give the top 5% of salespersons who sold orders to at least 15 different companies.” The functions include RANK(), DENSE_RANK(), PERCENT_RANK(), and NTILE() with a PARTITION BY clause. See “Ranking functions”.

SQL/OLAP defines four functions that are categorized as ranking functions:

<RANK FUNCTION TYPE> ::=
  RANK | DENSE RANK | PERCENT RANK | NTILE

Ranking functions let you compute a rank value for each row in a result set based on the order specified in the query. For example, a sales manager might need to identify the top or bottom sales people in the company, the highest- or lowest-performing sales region, or the best- or worst-selling products. Ranking functions can provide this information.

RANK() function

The RANK function returns a number that indicates the rank of the current row among the rows in the row’s partition, as defined by the ORDER BY clause. The first row in a partition has a rank of 1, and the last rank in a partition containing 25 rows is 25. RANK is specified as a syntax transformation, which means that an implementation can choose to actually transform RANK into its equivalent, or it can merely return a result equivalent to the result that transformation would return.

In the following example, ws1 indicates the window specification that defines the window named w1.

RANK() OVER ws

is equivalent to

( COUNT (*) OVER ( ws RANGE UNBOUNDED PRECEDING )
- COUNT (*) OVER ( ws RANGE CURRENT ROW ) + 1 )

The transformation of the RANK function uses logical aggregation (RANGE). As a result, two or more records that are tied—or have equal values in the ordering column—will have the same rank.The next group in the partition that has a different value will have a rank that is more than one greater than the rank of the tied rows. For example, if there are rows whose ordering column values are 10, 20, 20, 20, 30, the rank of the first row is 1 and the rank of the second row is 2. The rank of the third and fourth row is also 2, but the rank of the fifth row is 5. There are no rows whose rank is 3 or 4. This algorithm is sometimes known as sparse ranking.

See also “RANK function [Analytical],” Chapter 5, “SQL Functions,” in the Sybase IQ Reference Manual.

DENSE_RANK() function

While RANK returns duplicate values in the ranking sequence when there are ties between values, DENSE_RANK returns ranking values without gaps. The values for rows with ties are still equal, but the ranking of the rows represents the positions of the clusters of rows having equal values in the ordering column, rather than the positions of the individual rows. As in the RANK example, where rows ordering column values are 10, 20, 20, 20, 30, the rank of the first row is still 1 and the rank of the second row is still 2, as are the ranks of the third and fourth rows. The last row, however, is 3, not 5.

DENSE_RANK is computed through a syntax transformation, as well.

DENSE_RANK() OVER ws

is equivalent to

COUNT ( DISTINCT ROW ( expr_1, . . ., expr_n ) )
 OVER ( ws RANGE UNBOUNDED PRECEDING )

In the above example, expr_1 through expr_n represent the list of value expressions in the sort specification list of window w1.

See also “DENSE_RANK function [Analytical],” Chapter 5, “SQL Functions,” in the Sybase IQ Reference Manual.

PERCENT_RANK() function

The PERCENT_RANK function calculates a percentage for the rank, rather than a fractional amount, and returns a decimal value between 0 and 1. In other words, PERCENT_RANK returns the relative rank of a row, which is a number that indicates the relative position of the current row within the window partition in which it appears. For example, in a partition that contains 10 rows having different values in the ordering columns, the third row would be given a PERCENT_RANK value of 0.222 …, because you have covered 2/9 (22.222...%) of rows following the first row of the partition. PERCENT_RANK of a row is defined as one less than the RANK of the row divided by one less than then umber of rows in the partition, as seen in the following example (where “ANT” stands for an approximate numeric type, such as REAL or DOUBLE PRECISION).

PERCENT_RANK() OVER ws

is equivalent to

CASE
  WHEN COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED
  PRECEDING AND UNBOUNDED FOLLOWING ) = 1
  THEN CAST (0 AS ANT)
  ELSE
   ( CAST ( RANK () OVER ( ws ) AS ANT ) -1 /
   ( COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED
   PRECEDING AND UNBOUNDED FOLLOWING ) - 1 )
END

Ranking examples

Ranking example 1 The SQL query that follows finds the male and female employees from Utah and ranks them in descending order according to salary.

SELECT emp_lname, salary, sex, RANK() OVER (ORDER BY
  salary DESC) AS Rank 
FROM employee WHERE state IN ('MA') AND dept_id =100
ORDER BY salary DESC;

The following are the results from the above query:

emp_lname      salary        sex    rank
---------      --------      ---    ----
Scott         96300.000      M         1
Lull          87900.000      M         2
Pastor        74500.000      F         3
Shishov       72995.000      F         4
Wang          68400.000      M         5
Sullivan      67890.000      F         6
Rabkin        64500.000      M         7
Cobb          62000.000      M         8
Gowda         59840.000      M         9
Breault       57490.000      M        10
Diaz          54900.000      M        11
Melkisetian   48500.000      F        12
Driscoll      48023.690      M        13
Whitney       45700.000      F        14
Guevara       42998.000      M        15
Marshall      42500.000      M        16
Siperstein    39875.500      F        17
Soo           39075.000      M        18
Samuels       37400.000      M        19

Ranking example 2 Using the query from Ranking example 1, you can change the data by partitioning it by gender. The following example ranks employees in descending order by salary and partitions by gender:

SELECT emp_lname, salary, sex, RANK() OVER (PARTITION
  BY sex ORDER BY salary DESC) AS RANK 
FROM employee WHERE state IN ('MA', 'AZ') AND dept_id
  IN (100, 200) 
ORDER BY sex, salary DESC;

The following are the results from the above query:

emp_lname            salary      sex     rank
---------         ---------      ---     ----
Kelly             87500.000      F          1
Pastor            74500.000      F          2
Shishov           72995.000      F          3
Sullivan          67890.000      F          4
Melkisetian       48500.000      F          5
Pickett           47653.000      F          6
Poitras           46200.000      F          7
Whitney           45700.000      F          8
Siperstein        39875.500      F          9
Scott             96300.000      M          1
Lull              87900.000      M          2
Wang              68400.000      M          3
Sterling          64900.000      M          4
Rabkin            64500.000      M          5
Cobb              62000.000      M          6
Gowda             59840.000      M          7
Breault           57490.000      M          8
Martel            55700.000      M          9
Diaz              54900.000      M         10
Dill              54800.000      M         11
Powell            54600.000      M         12
Driscoll          48023.690      M         13
Guevara           42998.000      M         14
Marshall          42500.000      M         15
Soo               39075.000      M         16
Goggin            37900.000      M         17
Preston           37803.000      M         18
Samuels           37400.000      M         19
Singer            34892.000      M         20
Chao              33890.000      M         21

Ranking example 3 This example takes a list of female employees in California and Texas and ranks them in descending order according to salary. The PERCENT_RANK function is used to provide a cumulative total in descending order.

SELECT emp_lname, salary, sex, CAST(PERCENT_RANK() OVER
  (ORDER BY salary DESC) AS numeric (4, 2)) AS RANK 
FROM employee WHERE state IN ('CA', 'TX') AND sex ='F'
ORDER BY salary DESC;

The following are the results from the above query:

emp_lname         salary   sex      percent
---------      ---------   ---   ----------
Savarino       72300.000   F           0.00 
Smith          51411.000   F           0.33
Clark          45000.000   F           0.66
Garcia         39800.000   F           1.00

Ranking example 4 You can use the PERCENT_RANK function to find the top or bottom percentiles in the data set. In this example, the query returns male employees whose salary is in the top five percent of the data set.

SELECT * FROM (SELECT emp_lname, salary, sex,
  CAST(PERCENT_RANK() OVER (ORDER BY salary DESC) as numeric (4, 2)) AS percent 
FROM employee WHERE state IN ('MA') AND sex ='F' ) AS
  DT where percent > 0.5 
ORDER BY salary DESC;

The following are the results from the above query:

emp_lname          salary    sex     percent
---------      ----------    ---   ---------
Whitney        45700.000     F          0.51
Barletta       45450.000     F          0.55
Higgins        43700.000     F          0.59
Siperstein     39875.500     F          0.62
Coe            36500.000     F          0.66
Espinoza       36490.000     F          0.70
Wetherby       35745.000     F          0.74
Braun          34300.000     F          0.77
Butterfield    34011.000     F          0.81
Bigelow        31200.000     F          0.85
Bertrand       29800.000     F          0.88
Lambert        29384.000     F          0.92
Kuo            28200.000     F          0.96
Romero         27500.000     F          1.00