Histograms for columns with
highly duplicated values look very different from histograms for
columns with a large number of discrete values. In histograms for
columns with highly duplicated values, a single cell, called a **frequency
cell**, represents the duplicated value.

The weight of the frequency cell shows the percentage of columns that have matching values.

Histogram output for frequency cells varies, depending on whether the column values represent one of the following:

A dense frequency count, where values in the column are contiguous in the domain. For example, 1, 2, 3 are contiguous integer values

A sparse frequency count, where the domain of possible values contains values not represented by the discrete set of values in the table

A mix of dense and sparse frequency counts.

Histogram output for some columns includes a mix of frequency cells and range cells.

The following output shows the histogram for a column that has 6 distinct integer values, 1–6, and some null values:

Step Weight Value 1 0.13043478 < 1 2 0.04347826 = 1 3 0.17391305 <= 2 4 0.30434781 <= 3 5 0.13043478 <= 4 6 0.17391305 <= 5 7 0.04347826 <= 6

The
histogram above shows a **dense frequency count**,
because all the values for the column are contiguous integers.

The first cell represents null values. Since there are null values, the weight for this cell represents the percentage of null values in the column.

The “Value” column for the first step displays
the minimum column value in the table and the **<** operator.

In
a histogram representing a column with a *sparse frequency
count*, the highly duplicated values are represented by
a step showing the discrete values with the **=** operator
and the weight for the cell.

Preceding each step, there is a step with a weight of 0.0,
the same value, and the **<** operator, indicating
that there are no rows in the table with intervening values. For
columns with null values, the first step will have a nonzero weight
if there are null values in the table.

The following histogram represents the *type* column
of the *titles* table. Since there are only 9
distinct types, they are represented by 18 steps.

Step Weight Value 1 0.00000000 < "UNDECIDED " 2 0.11500000 = "UNDECIDED " 3 0.00000000 < "adventure " 4 0.11000000 = "adventure " 5 0.00000000 < "business " 6 0.11040000 = "business " 7 0.00000000 < "computer " 8 0.11640000 = "computer " 9 0.00000000 < "cooking " 10 0.11080000 = "cooking " 11 0.00000000 < "news " 12 0.10660000 = "news " 13 0.00000000 < "psychology " 14 0.11180000 = "psychology " 15 0.00000000 < "romance " 16 0.10800000 = "romance " 17 0.00000000 < "travel " 18 0.11100000 = "travel "

For example, 10.66% of the values in the type column are “news,” so for a table with 5000 rows, the optimizer estimates that 533 rows will be returned.

For tables with some values that are highly duplicated, and others that have distributed values, the histogram output shows a combination of operators and a mix of frequency cells and range cells.

The column represented in the histogram below has a value of 30.0 for a large percentage of rows, a value of 50.0 for a large percentage of rows, and a value 100.0 for another large percentage of rows.

There are two steps in the histogram for each of these values:
one step representing the highly duplicated value has the **=** operator
and a weight showing the percentage of columns that match the value.
The other step for each highly duplicated value has the **<** operator
and a weight of 0.0. The datatype for this column is *numeric(5,1)*.

Step Weight Value 1 0.00000000 <= 0.9 2 0.04456094 <= 20.0 3 0.00000000 < 30.0 4 0.29488859 = 30.0 5 0.05996068 <= 37.0 6 0.04292267 <= 49.0 7 0.00000000 < 50.0 8 0.19659241 = 50.0 9 0.06028834 <= 75.0 10 0.05570118 <= 95.0 11 0.01572739 <= 99.0 12 0.00000000 < 100.0 13 0.22935779 = 100.0

Since the lowest value in the column is 1.0, the step for the null values is represented by 0.9.