Obtaining analytical data

This section tells how to construct queries that give you analytical information. There are two types of analytical functions: rank and inverse distribution. The rank analytical functions rank items in a group, compute distribution, and divide a result set into a number of groupings. The inverse distribution analytical functions return a k-th percentile value, which can be used to help establish a threshold acceptance value for a set of data.

The rank analytical functions are RANK, DENSE_RANK, PERCENT_RANK, and NTILE. The inverse distribution analytical functions are PERCENTILE_CONT and PERCENTILE_DISC.

Suppose you want to determine the sale status of car dealers. The NTILE function divides the dealers into four groups based on the number of cars each dealer sold. The dealers with ntile = 1 are in the top 25% for car sales.

SELECT dealer_name, sales,
NTILE(4) OVER ( ORDER BY sales DESC )
FROM carSales;

dealer_name        sales       ntile
Boston             1000        1
Worcester          950         1
Providence         950         1
SF                 940         1
Lowell             900         2
Seattle            900         2
Natick             870         2
New Haven          850         2
Portland           800         3
Houston            780         3
Hartford           780         3
Dublin             750         3
Austin             650         4
Dallas             640         4
Dover              600         4

To find the top 10% of car dealers by sales, you specify NTILE(10) in the example SELECT statement. Similarly, to find the top 50% of car dealers by sales, specify NTILE(2).

NTILE is a rank analytical function that distributes query results into a specified number of buckets and assigns the bucket number to each row in the bucket. You can divide a result set into tenths (deciles), fourths (quartiles), and other numbers of groupings.

The rank analytical functions require an OVER (ORDER BY) clause. The ORDER BY clause specifies the parameter on which ranking is performed and the order in which the rows are sorted in each group. Note that this ORDER BY clause is used only within the OVER clause and is not an ORDER BY for the SELECT.

The OVER clause indicates that the function operates on a query result set. The result set is the rows that are returned after the FROM, WHERE, GROUP BY, and HAVING clauses have all been evaluated. The OVER clause defines the data set of the rows to include in the computation of the rank analytical function.

Similarly, the inverse distribution functions require a WITHIN GROUP (ORDER BY) clause. The ORDER BY specifies the expression on which the percentile function is performed and the order in which the rows are sorted in each group. This ORDER BY clause is used only within the WITHIN GROUP clause and is not an ORDER BY for the SELECT. The WITHIN GROUP clause distributes the query result into an ordered data set from which the function calculates a result.

For more details on the analytical functions, see the section “Analytical functions” in Chapter 5, “SQL Functions” of the Sybase IQ Reference Manual For information on individual analytical functions, see the section for each function in the “SQL Functions” chapter.