I want to join two tables on a char column. The simplest way I use is:
SELECT * FROM a JOIN b ON (a.text = b.text)
This method is quite fast but it the comparison of field is case insensitive. I have tried STRCMP(.., ..) and .. LIKE .. instead .. = .. but both are far to slow. What is the best solution to join two tables on char column with respect to the letter case?
-
don’t join on strings, use surrogate keys instead
czuk : Every entry has its own surrogate key. However, couple entries can have the same text and I need to join using that text. I already have the database and I cannot redesign it. -
I've no possibility to run a benchmark, but have you tried:
SELECT * FROM a JOIN b ON (BINARY a.text = BINARY b.text)EDIT
Just as a sidenote: When using the
BINARYoperator both columnsa.textandb.textmust use the same character set as comparison is done on the byte-level.Mitch Wheat : That's a good idea!czuk : It works, but it took 8 min 0.73 sec, while the simple comparison only 2 sec. It is strange for me. Is there a solution that is as fast as insensitive comparison?Stefan Gehrig : The problem is that even if you've indexed both columns correctly, these indexes won't be used as you apply an operator to the column content (same as applying a function). In my opinion you're stuck here if you don't have a chance to modify the database schema at all. -
You could create your tables using certain character sets and collations (f.e. latin1_bin) that are case sensitive. Just look at the create table syntax. in this case the = operator should be as fast as on standard settings (like f.e. latin1 and latin1_swedish_ci).
-
If that
textcolumns are always compared case-sensitively, give them a case sensitive collation in the schema (VARCHAR BINARYis one way of doing that) and your original query will work.Now give them each an index to make the joins fast, if they're not already keys.
knittl : case sensitive collations is a good idea! -
First of all, check indexes on these columns. This join takes fractions of a second on thousands of rows in both tables, provided that the fields are indexed properly.
Second, make sure you use same collations on both table. If you don't, specify the collation for the table column you want to be leading in the join.
Note that collation conversion makes condition not sargable and the indexes not usable. If you want case sensitive comparison to be fast, make sure your collation is set to
binary(likeUTF8_BIN) in both fields.
0 comments:
Post a Comment