Result-set size and index use

A range query that returns a small number of rows performs well with the index, however, range queries that return a large number of rows may not use the index—it may be more expensive to perform the logical and physical I/O for a large number of index pages plus a large number of data pages. The lower the data row cluster ratio, the more expensive it is to use the index.

At the leaf level of a nonclustered index or a clustered index on a data-only-locked table, the keys are stored sequentially. For a search argument on a value that matches 100 rows, the rows on the index leaf level fit on perhaps one or two index pages. The actual data rows might all be on different data pages. The following queries show how different data row cluster ratios affect I/O estimates. The authors table uses datarows locking, and has these indexes:

Each of these queries returns about 100 rows:

select au_lname, phone 
from authors 
where au_lname like "E%"
select au_id, au_lname, phone 
from authors 
where state = "NC"

The following table shows the data row cluster ratio for each index, and the optimizer’s estimate of the number of rows to be returned and the number of pages required.

SARG on

Data row cluster ratio

Row estimate

Page estimate

Data I/O size

au_lname

.999789

101

8

16K

state

.232539

103

83

2K

The basic information on the table is:

While each of the queries has its search clauses in valid search-argument form, and each of the clauses matches an index, only the first query uses the index: for the other query, a table scan is cheaper than using the index. With 262 pages, the cost of the table scan is:


Closer look at the Search Argument costing

Looking more closely at the tables, cluster ratios, and search arguments explains why the table scan is chosen: