Bug 830326
| Summary: | mysql-server poor correlated subquery performance | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Charles Kloote <ckloote> | ||||
| Component: | mysql | Assignee: | Jakub Dorňák <jdornak> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-daemons | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.0 | CC: | byte, databases-maint, hhorak, praiskup, psklenar | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-10-14 06:48:06 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 947775, 1070830, 1159820 | ||||||
| Attachments: |
|
||||||
|
Description
Charles Kloote
2012-06-08 20:57:27 UTC
That mysql bug is pretty devoid of any concrete information :-(. Can you put together a self-contained test case? Yeah, we'll see if we can make it happen with a dummy schema and data, then update this bug. May take a few days to put something together. Thanks. Just to clarify the situation: the mysql 5.6.5 release notes (http://dev.mysql.com/doc/refman/5.6/en/news-5-6-5.html) indicate that quite a lot of work was done in the area of optimizing "foo IN (sub-query)" constructs; it looks like a near-total rewrite of that section of the optimizer. If that's what fixed your problem, there's no chance of maintaining a back-port into 5.1 or even 5.5. The only thing that gives me some hope is that the test query shown in upstream bug 64557 includes GROUP BY in the sub-query, but the documentation at http://dev.mysql.com/doc/refman/5.6/en/semi-joins.html says that the new semi-join handling doesn't cover such cases. So for this specific case, the fix might not have come from the semi-join rewrite but from some more localized change that we might stand a chance of back-porting. So in short, we need to see a test case that pretty exactly embodies the specific pain point you're having, and then we can try to trace what fixed it. It may well end up that the only sane answer will be "you gotta move to 5.6 when it's out", but we'll take a look at it. Created attachment 599002 [details]
DB schema for bug recreation test
I have uploaded a small gzipped MySQL dump. Here's a query against that dump that demonstrates the problem: SELECT foo.foo_id, foo.data, bar.bar_id, bar.timestamp FROM foo JOIN bar on foo.foo_id = bar.foo_id WHERE ( bar.foo_id = ( SELECT bar.foo_id FROM bar WHERE bar.foo_id = foo.foo_id ORDER BY bar.timestamp DESC LIMIT 1 ) ) LIMIT 1000; On RHEL5 with MySQL 5.0.45, the results return nearly instantly. On RHEL6 with MySQL 5.1.61 however: 1000 rows in set (1.93 sec) On much larger tables with real production data, that execution time gets much much longer. Let me know if you have any questions. Yeah, if this isn't something that can be backported, we do have workarounds by breaking this up into two queries, but I'd still be curious to learn the results of your testing. Thanks for the test case! Just by eyeball, it does not look like a candidate to be a semi-join, so maybe there's a chance of doing something for you. Honza Horak had expressed interest in studying this, so I'm going to let him do the initial research. Well, I did some testing and it seems the reality is a bit different than we expected. I used dataset and test query from comment #6 from Charles tried it on mysql-5.0.95, mysql-5.5.24 and mysql-5.6.6-m9, which came right from upstream repository. Here are results: ---------------------------- | mysql-5.0.95 | ~0.02 s | | mysql-5.5.24 | ~4.00 s | | mysql-5.6.6-m9 | ~2.00 s | ---------------------------- So the performance issue doesn't seem to be actually fixed even in the present source code, which means Charles probably hit another issue than #64557. Going a bit deeper, EXPLAIN shows the same result in mysql-5.5.24 and in mysql-5.6.6: +----+--------------------+-------+-------+---------------+ | id | select_type | table | type | possible_keys | +----+--------------------+-------+-------+---------------+ | 1 | PRIMARY | foo | ALL | PRIMARY | | 1 | PRIMARY | bar | ref | foo_id | | 2 | DEPENDENT SUBQUERY | bar | index | foo_id | +----+--------------------+-------+-------+---------------+ -----------+---------+------+------+-------------+ key | key_len | ref | rows | Extra | -----------+---------+------+------+-------------+ NULL | NULL | NULL | 885 | Using where | foo_id | 4 | func | 49 | Using where | timestamp | 4 | NULL | 1 | Using where | -----------+---------+------+------+-------------+ ...meanwhile mysql-5.0.95 shows a bit different EXPLAIN: +----+--------------------+-------+------+---------------+--------+ | id | select_type | table | type | possible_keys | key | +----+--------------------+-------+------+---------------+--------+ | 1 | PRIMARY | foo | ALL | PRIMARY | NULL | | 1 | PRIMARY | bar | ref | foo_id | foo_id | | 2 | DEPENDENT SUBQUERY | bar | ref | foo_id | foo_id | +----+--------------------+-------+------+---------------+--------+ ---------+----------------------+------+-----------------------------+ key_len | ref | rows | Extra | ---------+----------------------+------+-----------------------------+ NULL | NULL | 1099 | Using where | 4 | func | 50 | Using where | 4 | bug830326.foo.foo_id | 50 | Using where; Using filesort | ---------+----------------------+------+-----------------------------+ That means that the use of timestamp index significantly slows down the query, which is what one wouldn't expect. I'd say it is a bug, but I haven't found such bug report at bugs.mysql.com. Charles, would you mind reporting that performance issue or shall I? Anyway, using "IGNORE INDEX (timestamp)" for bar table in the subquery I got the same fast results as in mysql-5.0.95, so it could be a fix for Charles at least. Since you're working directly with the upstream MySQL source it probably makes the most sense for you to file the bug, if that works for you. I can confirm that doing IGNORE INDEX (timestamp) on that query improves performance to 0.08 sec, so that definitely gives us a good workaround for the time being. Reported as http://bugs.mysql.com/bug.php?id=65957. This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux. Upstream only fixed this in 5.6+ releases and doesn't plan to backport it to older releases. Due to big differences in planner implementation we're not able to backport the fix into 5.1 and since using downstrea-only patch would be too risky, I'm closing this bug and don't plan to fix it. Please, mind that there is still a work-around to specify "IGNORE INDEX" to force planner to choose better execution plan. |