Bug 204595 - relay log gets corrupted when communication with master is lost in the middle of a large update/insert (usually blobs)
Product: Fedora
Classification: Fedora
Component: mysql
Hardware: x86_64 Linux
Priority: medium, Severity: urgent
Assigned To: Tom Lane
David Lawrence
Reported: 2006-08-30 06:17 EDT by Pau Aliagas
Modified: 2013-07-02 23:10 EDT

Doc Type: Bug Fix
Last Closed: 2006-08-30 12:42:48 EDT
Description Pau Aliagas 2006-08-30 06:17:28 EDT
Description of problem:

The scenario is:
-we have a remote master and a replica
-communication between both is lost temporarily (1-2 minutes)
-the replica is in the middle of a long read, AFAICT: for example, reading an
INSERT of a blob

When this happens, the replica writes garbage to its relay log and, when it
tries to execute it, it finds that the SQL at that position is corrupted.

I'd say that if it can recover the socket connection, it does not write garbage,
but when it loses it, it writes the unfinished position to the log, actually
creating the problem.

Version-Release number of selected component (if applicable):

Latest Fedora 5 release up to date: mysql-server-5.0.22-1.FC5.1

How reproducible:

I'd say that every time that the connection is lost and the socket connection
times out.

Steps to Reproduce:
1. Start master and replica
2. Create a long INSERT that takes time to cross the network (better to have a
slow enough connection between them to make it easy to catch it)
3. Cut communication for 2 minutes

Actual results:

Garbage in the replica relay log. For example, inserting a 62576-byte blob:

060829 21:36:50 [Note] Slave I/O thread: connected to master
'satchmo@satchmo.smsarena.com:3306',  replication started in log
'satchmo-bin.000032' at position 808604296
060830  8:08:57 [ERROR] Error reading packet from server: Lost connection to
MySQL server during query ( server_errno=2013)
060830  8:08:57 [Note] Slave I/O thread: Failed reading log event, reconnecting
to retry, log 'satchmo-bin.000033' position 563828083
060830  8:08:57 [ERROR] Slave: Error 'You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the right
syntax to use near 'EMISeruanSuci', tamany='62576', id_tipus_contingut='599',
id_content_type='13', ' at line 1' on query. Default database: 'smsarena'.
Query: 'INSERT INTO contingut SET codi_contingut='EMIGerimis', tamany='47558',
id_tipus_contingut='590', id_content_type='1', contingut='#!AMR
>*^P^Y<88><96>>     <80>^G^W\"<B6>z<D6>2<C3>j<97>^B<80>&<A4><F6>_^N<C1>}I<F0><
q<9A>H<B2><D9> ^^^W<83><B8>`<DD><C8><F9>^Dm^Y^T<9C><F8><BD><C7>^Cq^US<C2>6
à <^Z>!y<BC>F
  Y^?<99><81> <<DE><U+0243>3
060830  8:08:57 [ERROR] Error running query, slave SQL thread aborted. Fix the
problem, and restart the slave SQL thread with "SLAVE START". We stopped at log
'satchmo-bin.000033' position 563779430

Expected results:

I'd expect that if the position (in mysql relay file terms) is not fully read,
nothing would be written to the relay log.
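
To illustrate the behaviour I'd expect, here is a minimal sketch in Python. It
is not MySQL's actual replication code or wire format: the 4-byte length prefix
and the names read_exact and copy_complete_events are assumptions made up for
this example, and a lost connection is modelled simply as the stream ending
early. The point is only that an event is appended to the relay log after it
has been read completely, and a partial event is dropped.

```python
import io
import struct

# Hypothetical framing for this sketch: a 4-byte little-endian length,
# followed by that many payload bytes. Not the real binlog event format.
HEADER = struct.Struct("<I")


def read_exact(stream, n):
    """Read exactly n bytes, or return None if the stream ends first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:  # end of stream, standing in for a lost connection
            return None
        buf.extend(chunk)
    return bytes(buf)


def copy_complete_events(stream, relay_log):
    """Append only fully received events to relay_log; return how many."""
    written = 0
    while True:
        header = read_exact(stream, HEADER.size)
        if header is None:
            break
        (length,) = HEADER.unpack(header)
        payload = read_exact(stream, length)
        if payload is None:
            break  # partial event: discard it and write nothing
        relay_log.write(header + payload)
        written += 1
    return written
```

With a stream holding one complete event followed by a truncated one, only the
complete event ends up in the log, so a reconnect can resume from the end of
the last whole event.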

Additional info:

It has happened to me 4 times in one week.
Our master and replica run on fast, idle machines.
The communication channel is a full 2-way 2Mb pipe that has some hiccups, from
what I can see.

If you need additional info, please feel free to ask.
Comment 1 Tom Lane 2006-08-30 08:51:16 EDT
I'd suggest filing this upstream at bugs.mysql.com.  I don't think we have the expertise in-house to deal with it.
Comment 2 Pau Aliagas 2006-08-30 10:12:16 EDT
(In reply to comment #1)
> I'd suggest filing this upstream at bugs.mysql.com.  I don't think we have the
> expertise in-house to deal with it.

Thanks Tom.

Reported upstream on:
Comment 3 Tom Lane 2006-08-30 12:42:48 EDT
OK, closing this entry as duly reported upstream.
