204595 – relay log gets corrupted when communication with master is lost in the middle of a large update/insert (usually blobs)

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mysql
Sub Component:
Version:	5
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Assignee:	Tom Lane
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-08-30 10:17 UTC by Pau Aliagas
Modified:	2013-07-03 03:10 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-08-30 16:42:48 UTC
Type:	---
Embargoed:
Dependent Products:

Description Pau Aliagas 2006-08-30 10:17:28 UTC

Description of problem:

The scenario is:
-we have a remote master and a replica
-communication between both is lost temporarily (1-2 minutes)
-the replica is, as far as I can tell, in the middle of a long read: for
example, reading an INSERT of a blob

When this happens, the replica writes garbage to its relay log and, when it
tries to execute it, it finds that the SQL at that position is corrupted.

I'd say that if it can recover the socket connection, it does not write garbage,
but when it loses the connection, it writes the unfinished position to the log,
which is what actually creates the problem.

Version-Release number of selected component (if applicable):

Latest Fedora 5 release up to date: mysql-server-5.0.22-1.FC5.1

How reproducible:

I'd say that every time that the connection is lost and the socket connection
times out.

Steps to Reproduce:
1. Start master and replica
2. Create a long INSERT that takes time to cross the network (better to have a
slow enough connection between them to make it easy to catch)
3. Cut communication for 2 minutes
  
Actual results:

Garbage in the replica's relay log. For example, when inserting a 62576-byte blob:

060829 21:36:50 [Note] Slave I/O thread: connected to master
'satchmo.com:3306',  replication started in log
'satchmo-bin.000032' at position 808604296
060830  8:08:57 [ERROR] Error reading packet from server: Lost connection to
MySQL server during query ( server_errno=2013)
060830  8:08:57 [Note] Slave I/O thread: Failed reading log event, reconnecting
to retry, log 'satchmo-bin.000033' position 563828083
060830  8:08:57 [ERROR] Slave: Error 'You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the right
syntax to use near 'EMISeruanSuci', tamany='62576', id_tipus_contingut='599',
id_content_type='13', ' at line 1' on query. Default database: 'smsarena'.
Query: 'INSERT INTO contingut SET codi_contingut='EMIGerimis', tamany='47558',
id_tipus_contingut='590', id_content_type='1', contingut='#!AMR
<|<F4><82><B4>nX\0\0ao<BA>\0^S03<C0>\0\0\0<FE>~ESC<D4>\0\0\0\0=<FF><FB><90>*   
P<B0>T#$*n<F2><DD>c<FA><A5><E6>^?
<81><C9>$<A4>nP1^_<F6>*<B7>4<97>0<&l_D^Y<8F><98><E1><FE>}<C9><D7>&ã£*<F7><EE><E7>c<F0>/.Þ(<FA>llp<
7<A0>QY<80><C0>
^R<CC>~<B6><8F>it$P<8D>1Þ©v<A2>Þª%\"^W<85>.<DF>p<^\<88>\'^X^Xt<94><BE>^A<83>ZB<AF>n<E5>*^X<CA>Q?<E0>&<D3>:<B6><84><<D5>Z5^X^Y
^P<BE>^D<DD>|9M<86>\\D<C5>;{<B3>v<A0><AF><AA><E4><E7>t{>=0<^^m<9A>Z<B8><C0>^?>^A<87><9A>B_<FD><DB><FE>
<A1>v<FA><DE>!G<91>^W<BE><E9>\\<95><E0>#<80><<D8>?<90>\"<B3>^C9^^Q<81>5w<DC>Lr^Q!WH.<99>-<CE>m><U+07EF>^^1<E1><90><&m<9A>P^X<F0>
<FE>^S*^Z<A6><9F><C1><98><F3><F6>4@<F1><A7><A6>^\<C9><EA>^^^YE<BC><97><A0><
>*^P^Y<88><96>>     <80>^G^W\"<B6>z<D6>2<C3>j<97>^B<80>&<A4><F6>_^N<C1>}I<F0><
q<9A>H<B2><D9> ^^^W<83><B8>`<DD><C8><F9>^Dm^Y^T<9C><F8><BD><C7>^Cq^US<C2>6
<C7><F0><^\<C8>^Q@^Y<80>D~^U(<C6>g^Q^Y^X<80><8B>^O&<A4><92>><A8><AB><AA>{P$
Ã <^Z>!y<BC>F
~^W)<DA>(<B1><FB>^M^N<A0><FF><D1><C8>^D<D4>^S<90>1<C0>MqZ<A1><F0><<D8>É<91>ï¼~^O<87><C6>I+<E1>)N><A3>^\<91>^B&<E5>
<80>sV<CC>^E<8A>(<E0><<D8>ÉM<E5>D^Q<FE>^Wm<BB><E7><B1>O<97>M<F8><94><F0><EB>ESCU78+
  Y^?<99><81> <<DE><U+0243>3
<B6>b^W^,<9C>^EH<99><A4><A2><8E><86><87><C0><BE><CF>e.^K2<8E><AE><DF>q#<B0><D?<90>\'<ED>L4<9E>^OTH<E4>Ê£<8C>^SCi<A8>
Ñ¼G<AE>!XL<96>P<AC><8C><80><<D8>>$9^]&^P<FE>^_/<FB>|9.<C0>w<BB><F1>l<D4>Ze^\mESCm<D2>.<C1>;<F0>
060830  8:08:57 [ERROR] Error running query, slave SQL thread aborted. Fix the
problem, and restart the slave SQL thread with "SLAVE START". We stopped at log
'satchmo-bin.000033' position 563779430

Expected results:

I'd expect that if the position (in mysql relay file terms) is not fully read,
nothing would be written to the relay log.
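
To illustrate the behaviour I'd expect, here is a minimal Python sketch. It is
not MySQL code: the 4-byte length prefix, the function names and the
IncompleteEvent exception are all assumptions made up for the example. The idea
is simply to buffer a whole event in memory and append it to the relay log only
once every byte has arrived.

```python
import io
import struct


class IncompleteEvent(Exception):
    """Raised when the stream ends before a whole event has arrived."""


def read_exact(stream, n):
    """Read exactly n bytes from stream, or raise IncompleteEvent."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF, standing in here for a dropped connection
            raise IncompleteEvent(f"wanted {n} bytes, missing {remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def relay_one_event(stream, relay_log):
    """Buffer one length-prefixed event, then append it to relay_log.

    The 4-byte big-endian length prefix is a hypothetical framing for this
    sketch, not the real MySQL replication protocol.
    """
    header = read_exact(stream, 4)
    (length,) = struct.unpack(">I", header)
    payload = read_exact(stream, length)
    # Only reached when the whole event is in memory, so a lost connection
    # can never leave a partial event in the log.
    relay_log.write(header + payload)
    return length
```

With this approach a connection lost halfway through a blob raises
IncompleteEvent and leaves the relay log untouched, so the replica could
reconnect and request the same position again.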

Additional info:

It has happened to me 4 times in one week.
Our master and replica run on fast, idle machines.
The communication channel is a full 2-way 2Mb pipe that has some hiccups, from
what I can see.

If you need additional info, please feel free to ask.

Comment 1 Tom Lane 2006-08-30 12:51:16 UTC

I'd suggest filing this upstream at bugs.mysql.com.  I don't think we have the
expertise in-house to deal with it.

Comment 2 Pau Aliagas 2006-08-30 14:12:16 UTC

(In reply to comment #1)
> I'd suggest filing this upstream at bugs.mysql.com.  I don't think we have the
> expertise in-house to deal with it.

Thanks Tom.

Reported upstream on:
http://bugs.mysql.com/21923

Comment 3 Tom Lane 2006-08-30 16:42:48 UTC

OK, closing this entry as duly reported upstream.
