The set statistics plancost option simplifies
query analysis. It displays the estimated values for logical I/O,
physical I/O, and row counts compared to the actual ones
evaluated at each operator, and reports on CPU and sort buffer cost (cpu:
0 bufct: 16
in the example below). To enable
this display, enter:
dbcc traceon(3604) go set statistics plancost on go
For example, this query:
select S.service_key, M.year, M.fiscal_period,count(*) from telco_facts T,month M, service S where T.month_key=M.month_key and T.service_key = S.service_key and S.call_waiting_flag='Y' and S.caller_id_flag='Y' and S.voice_mail_flag='Y' group by M.year, M.fiscal_period, S.service_key order by M.year, M.fiscal_period, S.service_key
Produces this query tree:
Emit (VA = 7) 12 rows est: 1200 cpu: 500 / GroupSorted (VA = 6) 12 rows est: 1200 / NestLoopJoin Inner Join (VA = 5) 242704 rows est: 244857 / \ Sort IndexScan (VA = 3) month_svc_idx (T) 72 rows est: 24 (VA = 4) lio: 6 est: 6 242704 rows est: 244857 pio: 0 est: 0 lio: 1116 est: 0 cpu: 0 bufct: 16 pio: 0 est: 0 / NestLoopJoin Inner Join (VA = 2) 72 rows est: 24 / \ TableScan TableScan month (M) service (S) (VA = 0) (VA = 1) 24 rows est: 24 72 rows est: 24 lio: 1 est: 0 lio: 24 est: 0 pio: 0 est: 0 pio: 0 est: 0
If set statistics planview estimates the wrong row counts, the optimizer estimates are also off. This may be caused by missing or stale statistics. This is described in the following query (which is also run with show_missing_stats enabled):
dbcc traceon(3604) go set option show_missing_stats on go set statistics plancost on go select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= dateadd(day, 79, '1998-12-01') group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus go
==================== Lava Operator Tree ==================== Emit (VA = 4) 4 rows est: 100 cpu: 800 / Restrict (0)(13)(0)(0) (VA = 3) 4 rows est: 100 / GroupSorted (VA = 2) 4 rows est: 100 / Sort (VA = 1) 60175 rows est: 19858 lio: 2470 est: 284 pio: 2355 est: 558 cpu: 1900 bufct: 21 / TableScan lineitem (VA = 0) 60175 rows est: 19858 lio: 4157 est: 4157 pio: 1205 est: 4157 ============================================================ NO STATS on column lineitem.l_shipdate (4 rows affected)
The estimated number of rows is incorrect at the scan level. The query does have a predicate:
l_shipdate <= dateadd(day, 79, '1998-12-01')
If there are no statistics on l_shipdate
,
the query processor uses an arbitrary value. In this example, the
query processor uses a value that creates an estimated row count
of 19858 rows, which is far from the actual row count of 60175 rows,
and is why the query processor decided to run a sort; the cost of
a sort appears cheaper if the query processor estimates that the
number of rows streamed into the sorter is a third of the actual
count. The row reduction of the GroupSorted operator
is also significant; reduced from 60175 to 4 rows, so the advantages
of the GroupSorted algorithm are overridden
by a hash-based grouping algorithm, which would probably cache all
the data in memory.
Based on the value from the show_missing_stats option, the query processor decided to run update statistics on the l_shipdate column:
update statistics lineitem(l_shipdate)
If you re-run the query, this query tree is produced:
==================== Lava Operator Tree ==================== Emit (VA = 4) 4 rows est: 100 cpu: 0 / Restrict (0)(13)(0)(0) (VA = 3) 4 rows est: 100 / Sort (VA = 2) 4 rows est: 100 lio: 6 est: 6 pio: 0 est: 0 cpu: 800 bufct: 16 / HashVectAgg Count (VA = 1) 4 rows est: 100 lio: 5 est: 5 pio: 0 est: 0 bufct: 16 / TableScan lineitem (VA = 0) 60175 rows est: 60175 lio: 4157 est: 4157 pio: 1039 est: 4157
After update statistics runs, the estimated row count for the TableScan operator is the same as the actual row count, and the query plan changes to use the HashVectAgg (that is, hash-based vector aggregation) instead of the Sort and GroupSorted combination in the earlier example. The query also runs much faster.
You can perform more work to improve the query. The output of the HashVectAgg operator shows an estimated rowcount of 100, but the actual rowcount is 4. Because the grouping columns are on l_returnflag and l_linestatus, you can create a density on this pair of columns:
use tpcd go update statistics lineitem(l_returnflag, l_linestatus) go set showplan on go set statistics plancost on go
If you re-run the query, you get this query plan:
QUERY PLAN FOR STATEMENT 1 (at line 2). 4 operator(s) under root The type of query is SELECT. ROOT:EMIT Operator |RESTRICT Operator | | |SORT Operator | | Using Worktable2 for internal storage. | | | | |HASH VECTOR AGGREGATE Operator | | | GROUP BY | | | Evaluate Grouped COUNT AGGREGATE. | | | Evaluate Grouped SUM OR AVERAGE AGGREGATE. | | | Evaluate Grouped COUNT AGGREGATE. | | | Evaluate Grouped SUM OR AVERAGE AGGREGATE. | | | Evaluate Grouped SUM OR AVERAGE AGGREGATE. | | | Evaluate Grouped SUM OR AVERAGE AGGREGATE. | | | Evaluate Grouped SUM OR AVERAGE AGGREGATE. | | | Using Worktable1 for internal storage. | | | | | | | SCAN Operator | | | | FROM TABLE | | | | lineitem | | | | Table Scan. | | | | Forward Scan. | | | | Positioning at start of table. | | | | Using I/O Size 2 Kbytes for data pages. | | | | With MRU Buffer Replacement Strategy for data pages. ==================== Lava Operator Tree ==================== Emit (VA = 4) 4 rows est: 4 cpu: 0 / Restrict (0)(13)(0)(0) (VA = 3) 4 rows est: 4 / Sort (VA = 2) 4 rows est: 4 lio: 6 est: 6 pio: 0 est: 0 cpu: 700 bufct: 16 / HashVectAgg Count (VA = 1) 4 rows est: 4 lio: 5 est: 5 pio: 0 est: 0 bufct: 16 / TableScan lineitem (VA = 0) 60175 rows est: 60175 lio: 4157 est: 4157 pio: 1264 est: 4157
The estimated row count for HashVectAgg is same as that of the actual row count.