Bug 200341

Summary: possible bad interaction between pvmove and mysqld
Product: Red Hat Enterprise Linux 4
Component: lvm2
Version: 4.0
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Reporter: Mark Tinberg <mtinberg>
Assignee: Milan Broz <mbroz>
CC: agk, dwysocha, mbroz, pvrabec
Doc Type: Bug Fix
Last Closed: 2008-01-14 14:19:40 UTC
Attachments:
Description                                   Flags
sysreport from affected system                none
vgdisplay verbose output                      none
mysql configuration for affected database     none
mysql error log from affected database        none

Description Mark Tinberg 2006-07-27 00:22:29 UTC

Description of problem:
I apologize in advance for not having sufficient factual data with which to make a proper bug report.

On 7 Jun 06, a db server that I manage experienced massive mysql innodb tablespace corruption at
approximately the same time that a pvmove of its most recently added disk space completed. The
corruption seemed to affect only the most recently added data, from the last few minutes or hours;
data from the previous day and earlier did not seem to be corrupted. I'm not sure whether this was
a mysql bug, a problem in pvmove, or some kind of cache or buffer inconsistency that arises when the
two are used together or when innodb tablespaces are accessed with the O_DIRECT option. There were no
kernel errors for this time frame. The system has not been rebooted since the incident and continues
to run a new database (with the old data backfilled in) on a different disk without incident.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-34.EL lvm2-2.02.01-1.3.RHEL4 mysql-server-4.1.12-3.RHEL4.1

How reproducible:
Couldn't Reproduce


Steps to Reproduce:
I tried to reproduce the problem on test hardware by running pvmove in a continuous loop between two
disk devices while running mysql benchmarks for several weeks, but the corruption never recurred. The
test hardware was a Dell PE850 with two SATA disks. Unfortunately, I don't have an FC test setup that
would more closely replicate the production config.
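
The pvmove loop described above could be driven by something like the following. This is only a minimal Python sketch of the idea, not the script that was actually used: the function names, the device paths in the usage note, and the dry_run switch are all assumptions made for illustration. It relies only on the basic "pvmove SourcePV DestinationPV" form of the command.

```python
import subprocess


def pvmove_commands(dev_a, dev_b, rounds):
    """Yield pvmove argument lists that move extents back and forth.

    dev_a and dev_b are hypothetical PV paths; each round swaps
    source and destination so the data ping-pongs between them.
    """
    src, dst = dev_a, dev_b
    for _ in range(rounds):
        yield ["pvmove", src, dst]
        src, dst = dst, src


def run_loop(dev_a, dev_b, rounds, dry_run=True):
    """Run each pvmove in sequence, or only print it when dry_run is set."""
    executed = []
    for cmd in pvmove_commands(dev_a, dev_b, rounds):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop on the first failing pvmove
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed
```

For example, run_loop("/dev/sda3", "/dev/sdb1", 4) prints four alternating pvmove commands without touching any disk (the device paths are placeholders). Passing dry_run=False would actually run them, which needs root and should only be done on a scratch volume group while the database benchmark runs separately.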

Actual Results:


Expected Results:


Additional info:
System: Dell PE1850 x86_64 w/ 2x 2.8GHz Xeon  w/HT 4GB RAM
HBA: 2x QLogic QLA2312 w/ Optical SFP
SAN: Infortrend A16F-R1211+JBOD & Infortrend A16F-R2221

Comment 1 Mark Tinberg 2006-07-27 00:24:07 UTC
Created attachment 133120
sysreport from affected system

Comment 2 Mark Tinberg 2006-07-27 00:24:52 UTC
Created attachment 133121
vgdisplay verbose output

Comment 3 Mark Tinberg 2006-07-27 00:26:01 UTC
Created attachment 133122
mysql configuration for affected database

this config includes the force_recovery option added during db recovery

Comment 4 Mark Tinberg 2006-07-27 00:27:24 UTC
Created attachment 133123
mysql error log from affected database

this has sensitive and redundant data removed

Comment 5 Milan Broz 2008-01-14 14:19:40 UTC
Well, there is not much information about the physical volumes used during
the pvmove operation, but I guess that this could be another manifestation of
bug 179201 (not with a kernel oops, but only with in-memory corruption of a
temporary structure).

From the log I see that many PVs are used in the usagelog VG with the mysqlold
and mysql LVs, so these are multi-segment LVs, and exactly this triggers the
kernel bug during the pvmove operation.
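
Whether an LV is multi-segment can be read from the segment count that lvs reports. The following is a minimal Python sketch, assuming the input text comes from something like "lvs --noheadings -o lv_name,seg_count usagelog". The helper name and the sample values in the usage note are hypothetical and are not taken from the attached vgdisplay output.

```python
def multisegment_lvs(lvs_output):
    """Return the names of LVs whose reported segment count exceeds 1.

    Expects two whitespace-separated columns per line: LV name and
    segment count. Lines that do not match that shape are skipped.
    """
    names = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, count = fields
        if count.isdigit() and int(count) > 1:
            names.append(name)
    return names
```

With a made-up input such as "  mysqlold 3\n  mysql 5\n  scratch 1\n", the helper returns the first two names and skips the single-segment LV.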

Closing as a duplicate of the recently fixed bug.


*** This bug has been marked as a duplicate of 179201 ***