For a given expression, the WIDTH_BUCKET function returns the bucket number that the result of this expression will be assigned after it is evaluated.
WIDTH_BUCKET ( expression, min_value, max_value, num_buckets)
expression is the expression for which the histogram is being created. This expression must evaluate to a numeric or datetime value or to a value that can be implicitly converted to a numeric or datetime value. If expr evaluates to null, then the expression returns null.
min_value An expression that resolves to the end points of the acceptable range for expr. Must also evaluate to numeric or datetime values and cannot evaluate to null.
max_value An expression that resolves to the end points of the acceptable range for expr. Must also evaluate to numeric or datetime values and cannot evaluate to null.
num_buckets Is an expression that resolves to a constant indicating the number of buckets. This expression must evaluate to a positive integer.
The following example creates a ten-bucket histogram on the credit_limit column for customers in Massachusetts in the sample table and returns the bucket number (“Credit Group”) for each customer. Customers with credit limits greater than the maximum value are assigned to the overflow bucket, 11:
SELECT customer_id, cust_last_name, credit_limit, WIDTH_BUCKET(credit_limit, 100, 5000, 10) "Credit Group" FROM customers WHERE territory = 'MA' ORDER BY "Credit Group";
CUSTOMER_ID CUST_LAST_NAME CREDIT_LIMIT Credit Group ----------- -------------- ------------ ------------ 825 Dreyfuss 500 1 826 Barkin 500 1 853 Palin 400 1 827 Siegel 500 1 843 Oates 700 2 844 Julius 700 2 835 Eastwood 1200 3 840 Elliott 1400 3 842 Stern 1400 3 841 Boyer 1400 3 837 Stanton 1200 3 836 Berenger 1200 3 848 Olmos 1800 4 849 Kaurusmdki 1800 4 828 Minnelli 2300 5 829 Hunter 2300 5 852 Tanner 2300 5 851 Brown 2300 5 850 Finney 2300 5 830 Dutt 3500 7 831 Bel Geddes 3500 7 832 Spacek 3500 7 838 Nicholson 3500 7 839 Johnson 3500 7 833 Moranis 3500 7 834 Idle 3500 7 845 Fawcett 5000 11 846 Brando 5000 11 847 Streep 5000 11
When the bounds are reversed, the buckets are open-closed intervals. For example: WIDTH_BUCKET (credit_limit, 5000, 0, 5). In this example, bucket number 1 is (4000, 5000], bucket number 2 is (3000, 4000], and bucket number 5 is (0, 1000]. The overflow bucket is numbered 0 (5000, +infinity), and the underflow bucket is numbered 6 (-infinity, 0].
You can generate equiwidth histograms with the WIDTH_BUCKET function. Equiwidth histograms divide data sets into buckets whose interval size (highest value to lowest value) is equal. The number of rows held by each bucket will vary. A related function, NTILE, creates equiheight buckets.
Equiwidth histograms can be generated only for numeric, date or datetime data types; therefore, the first three parameters should be all numeric expressions or all date expressions. Other types of expressions are not allowed. If the first parameter is NULL, the result is NULL. If the second or the third parameter is NULL, an error message is returned, as a NULL value cannot denote any end point (or any point) for a range in a date or numeric value dimension. The last parameter (number of buckets) should be a numeric expression that evaluates to a positive integer value; 0, NULL, or a negative value will result in an error.
Buckets are numbered from 0 to (n+1). Bucket 0 holds the count of values less than the minimum. Bucket(n+1) holds the count of values greater than or equal to the maximum specified value.
“NTILE function [Analytical]”, which creates equiheight histograms.
Chapter 4, “Using OLAP” in the Sybase IQ Performance and Tuning Guide