Bug 853532 - stop slave sql_thread hangs due to user defined variable being replicated to same server
stop slave sql_thread hangs due to user defined variable being replicated to ...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: mysql (Show other bugs)
6.3
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Tom Lane
qe-baseos-daemons
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-31 15:41 EDT by Dan Ragle
Modified: 2013-03-29 06:37 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-03-29 06:37:50 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Dan Ragle 2012-08-31 15:41:02 EDT
Description of problem:

In a circular master/slave environment (two servers that replicate back and forth to one another) setting a user-defined variable on one server will replicate to the other and then BACK to the original with the SLAVE's server-id (and thus the var is set in the slave thread on the original server as well). In certain situations this appears to cause a STOP SLAVE SQL_THREAD statement on the original server to hang until another statement is received from the second server; and in the case of a general STOP SLAVE statement may cause the statement to hang indefinitely (since the IO slave is stopped and thus no further IO statements are received from the second server). 

The STOP SLAVE SQL_THREAD part of this bug sounds a lot like bug 53985, I wasn't sure if it was the same beast or not. Also I'm requesting if that particular bug fix can be back ported to 5.1. It's serious for us since it potentially causes STOP SLAVE as well as a regular server shutdown to hang, requiring us to force-kill the mysql thread to stop the server.

In the example I give below, the replication mode is MIXED. 

Version-Release number of selected component (if applicable):

RHEL 6.3 (2.6.32-279.5.2.el6.x86_64)
MySQL 5.1.61-log

How reproducible:

Always, as below

Steps to Reproduce:
  
1.) Setup two servers, both replicating to one another. For this example I'll call them "A" and "B". "A" has served-id 1 and "B" has server-id 2.

2.) Create TESTTABLE with an int (primary key) and a varchar as a MyISAM

CREATE TABLE `TESTTABLE` (
    `testthis` int(3) unsigned NOT NULL,
    `testthat` varchar(10) NOT NULL,
    PRIMARY KEY (`testthis`)) 
ENGINE=MyISAM DEFAULT CHARSET=latin1

3.) On "B", insert two new records into the table, using two separate statements (i.e.):

mysql> insert into TESTTABLE (testthis,testthat) values (1,'first');
Query OK, 1 row affected (0.00 sec)

mysql> insert into TESTTABLE (testthis,testthat) values (2,'second');
Query OK, 1 row affected (0.00 sec)

4.) On "A", set a user-defined variable and then use it to insert a third record into the test table:

mysql> set @mine = 'third';
Query OK, 0 rows affected (0.00 sec)

mysql> insert into TESTTABLE (testthis,testthat) values (3,@mine);
Query OK, 1 row affected (0.00 sec)

5.) At this point, if you view the relay log on "A" you see something like this:

# at 3179
#120830 15:39:17 server id 2  end_log_pos 5644  Query   thread_id=12    exec_time=0     error_code=0
SET TIMESTAMP=1346355557/*!*/;
insert into TESTTABLE (testthis,testthat) values (1,'first')
/*!*/;
# at 3304
#120830 15:39:26 server id 2  end_log_pos 5770  Query   thread_id=12    exec_time=0     error_code=0
SET TIMESTAMP=1346355566/*!*/;
insert into TESTTABLE (testthis,testthat) values (2,'second')
/*!*/;
# at 3430
#120830 15:40:04 server id 2  end_log_pos 5812  User_var
SET @`mine`:=_latin1 0x7468697264 COLLATE `latin1_swedish_ci`/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Notice that the set @mine statement replicated to "B" which processed it (and the subsequent insert). It ("B") then wrote its own set @mine statement into its binlog, identified with its own server-id (2) and then sent it back to "A" to be processed, now identified as server-id 2 (so "A" did in fact process it, as is noted in "A"'s relay log above). Is this desired behavior? 

6.) If you now issue a STOP SLAVE SQL_THREAD on "A", it will hang. You can force it to clear by sending through ANOTHER statement from "B"; for example, if you submit this on "B":

mysql> insert into TESTTABLE (testthis,testthat) values (4,'fourth');
Query OK, 1 row affected (0.00 sec)

the STOP SLAVE SQL_THREAD on "A" will complete and return as soon as the slave executes this new statement. 

Actual results:

As described above

Expected results:

stop slave sql_thread (or stop slave, or service mysqld stop, all of which can hang in the above example; since stop slave potentially kills the io_thread before the sql_thread) should not hang on a potentially never completing statement. I'm not sure what the "right" fix is, since I'm not sure if the resetting of the server-id on user-defined variables is desired behavior or not.

Thanks!

Dan Ragle
Comment 2 Dan Ragle 2012-09-01 15:41:40 EDT
FYI, please also see the bug report on bugs.mysql.com:

http://bugs.mysql.com/bug.php?id=66626
Comment 3 RHEL Product and Program Management 2012-12-14 03:03:16 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 4 Honza Horak 2013-03-29 06:37:50 EDT
Since upstream states it is fixed in mysql-5.1.66 [1] and we have 5.1.67 now in RHEL-6, I believe it is fixed in the current release.

[1] http://bugs.mysql.com/bug.php?id=66626

Note You need to log in before you can comment on or make changes to this bug.