like optimization enhancements

Version 12.0 changed the costing for like clauses not migrated into search arguments, using a technique of generating more accurate cost estimates for queries that include a leading wildcard in a like clause.

This provided better selectivity estimates, resulting in better query plans. Previously, a like clause with a leading wildcard was estimated to qualify all rows in the column (a selectivity of 1.0) since there was no way to search for a match. This was not the most accurate way to cost such a clause.

Example, in this query in a version 12.0 server:

select ... from part, partsupp, lineitem
where l_partkey = p_partkey
and l_partkey = ps_partkey
and p_title like ’%Topographic%’

The like string is compared with histogram cell boundaries.

A match occurs when the like term is found in a cell boundary by means of a pattern match search.

If no pattern match is found, selectivity is assumed to be 1 divided by the number of steps in the histogram. If the default of 20 cells is used, selectivity is 0.05 if there is now pattern match in the histogram boundary values.

If a pattern match is found in the cell boundaries, the selectivity is estimated to be the sum of the weights of all cells with a pattern match.

In either case, the resulting selectivity estimate is more accurate than in versions earilier than 12.0.

This also applies to queries with like clauses of the type like “_abc”, or like “[ ]abc”.