Distribution functions

SQL/OLAP defines several functions that deal with ordered sets. The two inverse distribution functions are named PERCENTILE_CONT and PERCENTILE_DISC. These analytical functions take a percentile value as the function argument and operate on a group of data specified in the WITHIN GROUP clause or operate on the entire data set.

These functions return one value per group. For PERCENTILE_DISC (discrete), the data type of the results is the same as the data type of its ORDER BY item specified in the WITHIN GROUP clause. For PERCENTILE_CONT (continuous), the data type of the results is either numeric, if the ORDER BY item in the WITHIN GROUP clause is a numeric, or double, if the ORDER BY item is an integer or floating point.

The inverse distribution analytical functions require a WITHIN GROUP (ORDER BY) clause. For example:

PERCENTILE_CONT ( expression1 )
WITHIN GROUP ( ORDER BY expression2 [ ASC | DESC ] )

The value of expression1 must be a constant of numeric data type and range from 0 to 1 (inclusive). If the argument is NULL, then a “wrong argument for percentile” error is returned. If the argument value is less than 0, or greater than 1, then a “data value out of range” error is returned.

The ORDER BY clause, which must be present, specifies the expression on which the percentile function is performed and the order in which the rows are sorted in each group. This ORDER BY clause is used only within the WITHIN GROUP clause and is not an ORDER BY for the SELECT statement.

The WITHIN GROUP clause distributes the query result into an ordered data set from which the function calculates a result.

The value expression2 is a sort specification that must be a single expression involving a column reference. Multiple expressions are not allowed and no rank analytical functions, set functions, or subqueries are allowed in this sort expression.

The ASC or DESC parameter specifies the ordering sequence as ascending or descending. Ascending order is the default.

Inverse distribution analytical functions are allowed in a subquery, a HAVING clause, a view, or a union. The inverse distribution functions can be used anywhere the simple non analytical aggregate functions are used. The inverse distribution functions ignore the NULL value in the data set.

PERCENTILE_CONT example This example uses the PERCENTILE_CONT function to determine the 10th percentile value for car sales in a region using the following data set:

sales       region        dealer_name
-----       ---------     -----------
900         Northeast     Boston
800         Northeast     Worcester
800         Northeast     Providence
700         Northeast     Lowell
540         Northeast     Natick
500         Northeast     New Haven
450         Northeast     Hartford
800         Northwest     SF
600         Northwest     Seattle
500         Northwest     Portland
400         Northwest     Dublin
500         South         Houston
400         South         Austin
300         South         Dallas
200         South         Dover

In the following example query, the SELECT statement contains the PERCENTILE_CONT function:

SELECT region, PERCENTILE_CONT(0.1)
WITHIN GROUP ( ORDER BY sales DESC )
FROM carSales GROUP BY region;

The result of the SELECT statement lists the 10th percentile value for car sales in a region:

region           percentile_cont
---------        ---------------
Northeast        840
Northwest        740
South            470

PERCENTILE_DISC example This example uses the PERCENTILE_DISC function to determine the 10th percentile value for car sales in a region, using the following data set:

sales       region        dealer_name
-----       ---------     -----------
900         Northeast     Boston
800         Northeast     Worcester
800         Northeast     Providence
700         Northeast     Lowell
540         Northeast     Natick
500         Northeast     New Haven
450         Northeast     Hartford
800         Northwest     SF
600         Northwest     Seattle
500         Northwest     Portland
400         Northwest     Dublin
500         South         Houston
400         South         Austin
300         South         Dallas
200         South         Dover

In the following query, the SELECT statement contains the PERCENTILE_DISC function:

SELECT region, PERCENTILE_DISC(0.1) WITHIN GROUP 
  (ORDER BY sales DESC )
FROM carSales GROUP BY region;

The result of the SELECT statement lists the 10th percentile value for car sales in each region:

region           percentile_cont
---------        ---------------
Northeast        900
Northwest        800
South            500

For more information about the distribution functions, see “PERCENTILE_CONT function [Analytical]” and “PERCENTILE_DISC function [Analytical],” Chapter 5, “SQL Functions,” in the Sybase IQ Reference Manual.