If I know an index will have unique values, how will declaring it as unique affect performance on inserts or selects?
If the optimiser knows the index is unique, how will that affect the query plan?
I understand that specifying uniqueness can serve to preserve integrity, but leaving that discussion aside for the moment, what are the performance consequences?
-
Of course the optimizer will take uniqueness into consideration. It affects the expected row count in query plans.
-
Yes, it will be taken into consideration by the query engine.
-
Perhaps more important: the uniqueness will protect the data integrity. Performance would not be a reason to ignore this.
Performance could be affected positively, negatively, or not at all: it depends on the query, whether the index is used, etc.
-
Performance is negatively affected when inserting data. It needs to check the uniqueness.
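To make the "check" concrete, here is a minimal sketch of what the uniqueness check means at insert time. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the thread is about SQL Server and Oracle), and the table and index names are made up for illustration. It only shows the behaviour, not the cost:

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_unique (val INTEGER NOT NULL);
    CREATE UNIQUE INDEX ux_t_unique_val ON t_unique (val);
    CREATE TABLE t_plain (val INTEGER NOT NULL);
    CREATE INDEX ix_t_plain_val ON t_plain (val);
""")

def try_insert(table, value):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute(f"INSERT INTO {table} (val) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "unique_first": try_insert("t_unique", 1),
    "unique_duplicate": try_insert("t_unique", 1),  # rejected by the UNIQUE index
    "plain_first": try_insert("t_plain", 1),
    "plain_duplicate": try_insert("t_plain", 1),    # accepted, duplicates allowed
}
print(results)
```

Both inserts have to locate the key's position in the index; the unique one additionally decides whether to reject the row. How much that costs in a given engine is something only a benchmark on that engine can tell.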
kquinn : And positively affected when selecting data: the optimizer can exploit the uniqueness.
Quassnoi : There is no performance difference between inserting a field into UNIQUE and non-UNIQUE index. The engine should parse the B-tree anyway, uniqueness just affects the decision whether to insert this value into given place in the B-tree or not.
Michael Haren : I'm very curious about this, too. Benchmarks or credible sources would be much appreciated.
Jonathan Leffler : Performance is negatively affected when inserting data into a non-unique index; it has to check the uniqueness or not, and deal with adding the new row into the pre-existing slot or creating a new slot. There isn't much difference.
Stefan Steinegger : Found this thread: http://www.sqlservercentral.com/Forums/Topic651562-360-1.aspx#bm652904 "The optimizer will take into account when an index is unique and it can improve performance, but it really does depend on the query." (...) "It will slow down inserts slightly, but probably not enough to notice." I think it doesn't matter much.
-
Long story short: if your data are intrinsically UNIQUE, you will benefit from creating a UNIQUE index on them.
See the article in my blog for detailed explanation:
Now, the gory details.
As @Mehrdad said, uniqueness affects the estimated row count in the plan builder.
A UNIQUE index has the maximum possible selectivity, which is why:
SELECT * FROM table1 t1, table2 t2 WHERE t1.id = :myid AND t2.unique_indexed_field = t1.value
almost surely will use NESTED LOOPS, while
SELECT * FROM table1 t1, table2 t2 WHERE t1.id = :myid AND t2.non_unique_indexed_field = t1.value
may benefit from a HASH JOIN if the optimizer thinks that non_unique_indexed_field is not selective.
If your index is CLUSTERED (i.e. the rows themselves are contained in the index leaves) and non-UNIQUE, then a special hidden column called uniquifier is added to each index key, thus making the key larger and the index slower.
That's why a UNIQUE CLUSTERED index is in fact a little more efficient than a non-UNIQUE CLUSTERED one.
In Oracle, a join on a UNIQUE INDEX is required for so-called key preservation, which ensures that each row from a table will be selected at most once and makes a view updatable.
This query:
UPDATE ( SELECT * FROM mytable t1, mytable t2 WHERE t2.reference = t1.unique_indexed_field ) SET value = other_value
will work in Oracle, while this one:
UPDATE ( SELECT * FROM mytable t1, mytable t2 WHERE t2.reference = t1.non_unique_indexed_field ) SET value = other_value
will fail.
This is not an issue with SQL Server, though.
One more thing: for a table like this,
CREATE TABLE t_indexer (id INT NOT NULL PRIMARY KEY, uval INT NOT NULL, ival INT NOT NULL)
CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval)
CREATE INDEX ix_indexer_ux ON t_indexer (ival)
this query:
/* Sorts on the non-unique index first */ SELECT TOP 1 * FROM t_indexer ORDER BY ival, uval
will use a TOP N SORT, while this one:
/* Sorts on the unique index first */ SELECT TOP 1 * FROM t_indexer ORDER BY uval, ival
will use just an index scan.
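For anyone who wants to poke at this without a SQL Server instance, here is a rough sketch that rebuilds the same table in SQLite via Python's sqlite3 module and prints whatever plan SQLite reports for each ordering. Several things here are assumptions: SQLite is a different engine with a different planner, so its plans need not match the ones described in this answer; `LIMIT 1` stands in for `TOP 1`; and the row count is cut to 2,000 to keep it quick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_indexer (
        id   INTEGER NOT NULL PRIMARY KEY,
        uval INTEGER NOT NULL,
        ival INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval);
    CREATE INDEX ix_indexer_ux ON t_indexer (ival);
""")

# Sample data with id == uval == ival, as in the answer (fewer rows, by assumption).
conn.executemany(
    "INSERT INTO t_indexer (id, uval, ival) VALUES (?, ?, ?)",
    ((n, n, n) for n in range(1, 2001)),
)

queries = {
    # LIMIT 1 is SQLite's spelling of SQL Server's TOP 1.
    "non_unique_first": "SELECT * FROM t_indexer ORDER BY ival, uval LIMIT 1",
    "unique_first": "SELECT * FROM t_indexer ORDER BY uval, ival LIMIT 1",
}

rows, plans = {}, {}
for name, sql in queries.items():
    rows[name] = conn.execute(sql).fetchone()
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plans[name] = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, rows[name], plans[name])
```

Both queries return the same row; the interesting part is whether the printed plan for each one mentions an extra sorting step.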
For the latter query, there is no point in additional sorting on ival, since the uval values are unique anyway, and the optimizer takes this into account.
On sample data of 200,000 rows (id == uval == ival), the former query runs for 15 seconds, while the latter one is instant.
Michael Haren : Is there a significant difference between hash joins and nested loop joins? It's not clear if you're suggesting that the distinction justifies one or the other.
Quassnoi : For the query above, HASH JOINs are more efficient on non-selective indexes, NESTED LOOPS are more efficient on selective ones. A UNIQUE index is the most selective index possible, and the optimizer will take the index uniqueness into account when estimating selectivity and choosing the join algorithm.
Michael Haren : Are you saying then that there's not a general answer (it depends heavily on the query)? Is there no easy answer to this?: if the index *could* be unique, should I make it unique or not?
Quassnoi : Yes, if the index could be unique, you certainly should make it unique. There is no benefit from using a non-UNIQUE index on intrinsically UNIQUE data. UNIQUE helps SQL Server to understand that the data are really unique and optimize the algorithms.
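On the hash join versus nested loops question in the comments, a toy model may help. The sketch below is pure Python and is not how any database engine implements its joins; the function names and sample rows are invented for illustration. A dict plays the role of a unique index (at most one match per probe), while the hash join builds buckets that can hold several matches per key.

```python
from collections import defaultdict

outer_rows = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
inner_rows = [
    {"key": 10, "payload": "a"},
    {"key": 20, "payload": "b"},
    {"key": 30, "payload": "c"},
]

def nested_loop_join(outer, unique_index):
    """For each outer row, probe the unique index: at most one match per probe."""
    joined = []
    for o in outer:
        match = unique_index.get(o["value"])
        if match is not None:
            joined.append((o["id"], match["payload"]))
    return joined

def hash_join(outer, inner):
    """Build a hash table over all inner rows once, then probe it per outer row."""
    buckets = defaultdict(list)
    for i in inner:
        buckets[i["key"]].append(i)
    return [(o["id"], i["payload"]) for o in outer for i in buckets.get(o["value"], [])]

# Unique keys: a dict can act as the unique index.
unique_index = {row["key"]: row for row in inner_rows}
nl = nested_loop_join(outer_rows, unique_index)
hj = hash_join(outer_rows, inner_rows)

# Non-unique keys: one outer row may now match several inner rows.
inner_dupes = inner_rows + [{"key": 10, "payload": "d"}]
hj_dupes = hash_join(outer_rows, inner_dupes)
print(nl, hj, hj_dupes)
```

With unique keys each probe touches one entry, so a handful of index lookups is cheap and there is no reason to read the whole inner table. With many rows per key, building the buckets once can pay off, which is roughly the trade-off the optimizer weighs when it estimates selectivity.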