Created attachment 913802 [details]
e2image of /home/metavera tdcslb1

Description of problem:

We have two Linux test VMs. Each has its own /home/metavera file system, and one directory in that file system has gluster replication on top of it (df output from each node):

# df -h /home/metavera/
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootvol  2.5G  2.3G  109M  96% /

# df -h /home/metavera/
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootvol  2.5G  2.0G  322M  87% /

# gluster volume info

Volume Name: datastore
Type: Replicate
Volume ID: c73bd310-8be7-4021-ade5-a5cfb1e91fac
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: tdcslb1:/home/metavera/enterprisecarshare.com/test
Brick2: tdcslb2:/home/metavera/enterprisecarshare.com/test

On each of these systems the gluster volume "datastore" is mounted as /mnt/glusterfs. On tdcslb1 the following iozone job was running:

[root@tdcslb1 current]# ./iozone -Ra -g 2G -i0 -i 1 -i 2 -f /mnt/glusterfs/iozone_test
        Iozone: Performance Test of File I/O
                Version $Revision: 3.424 $
                Compiled for 64 bit mode.
                Build: linux

        Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
                      Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
                      Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                      Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                      Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                      Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                      Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                      Vangel Bojaxhi, Ben England, Vikentsi Lapa.

        Run began: Mon Jun 30 13:44:16 2014

        Excel chart generation enabled
        Auto Mode
        Using maximum file size of 2097152 kilobytes.
        Command line used: ./iozone -Ra -g 2G -i0 -i 1 -i 2 -f /mnt/glusterfs/iozone_test
        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                        random    random     bkwd    record    stride
      kB  reclen    write  rewrite     read   reread      read     write     read   rewrite      read   fwrite frewrite    fread  freread
      64       4    35512    36507    92634  2006158  1105556    16252
      64       8    26325    23364    78817  2923952  2052169    24586
...etc...
  524288    2048   108769   115576   224147  1699914  1705063   121318
  524288    4096   110032   111976   213880  1440644  1783924   124043
  524288    8192   <------------------------------------------------- reboot
                   103925   120128   231561  1352328  1454694   122545
  524288   16384   117790    16484   193729  1403840  1478762    74662
 1048576      64    99938    73909   135913    67536    34160    61833
 1048576     128   104751   106741
...etc...

At the start of the test, gluster was replicating data as expected to tdcslb2. At the point marked "<--- reboot" above, tdcslb2 was rebooted. When it came back up, glusterfs was started and a self heal was started. It was at this point that we noticed ext4 errors on both nodes, including "JBD: Spotted dirty metadata buffer":

Jun 30 13:52:40 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_add_entry: bad entry in directory #18890585: directory entry across blocks - block=18982400, offset=0(0), inode=4109694196, rec_len=62708, name_len=244
Jun 30 13:53:15 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 449: 2015 blocks in bitmap, 32736 in gd
Jun 30 13:53:15 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 450: 0 blocks in bitmap, 32768 in gd
Jun 30 13:53:32 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 641: 31341 blocks in bitmap, 31346 in gd
Jun 30 13:53:32 tdcslb2 kernel: JBD: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Jun 30 13:54:45 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 579: 24482 blocks in bitmap, 24481 in gd
Jun 30 13:54:45 tdcslb2 kernel: JBD: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

Shortly after this, iozone failed with a data mismatch on tdcslb1: it expected one data pattern but received another. Oddly, when the filesystem was cleaned, no errors were reported on the iozone file "iozone_test"; most of the errors appeared to be on directories and files that were already there and were not being accessed/used by this test.

Version-Release number of selected component (if applicable):

glusterfs-server-3.5.0-2.el6.x86_64
glusterfs-fuse-3.5.0-2.el6.x86_64
RHEL 6.5, kernel 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64

How reproducible:

This happened on our first attempt to bring one of the nodes that had replicated data down hard and bring it back up. This suggests it is reproducible, but we have not yet tried again.

Steps to Reproduce:
1. Set up two nodes, each with its own file system. Replicate one directory of this file system between the nodes.
2. Start the iozone test on one node.
3. After the test has run for a while (it was at about 1GB written when we tried), reboot the other node.
4. After the node that was brought down hard is back up and gluster is running on it again, start a self heal.

Actual results:

iozone expected one data pattern but received another.

File system errors:

Jun 30 13:53:15 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 450: 0 blocks in bitmap, 32768 in gd
Jun 30 13:53:32 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 641: 31341 blocks in bitmap, 31346 in gd
Jun 30 13:53:32 tdcslb2 kernel: JBD: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Jun 30 13:54:45 tdcslb2 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 579: 24482 blocks in bitmap, 24481 in gd
Jun 30 13:54:45 tdcslb2 kernel: JBD: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Expected results:

The iozone test finishes, and the data is replicated successfully to the other node.

Additional info:
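For reference, the environment and reproduction can be sketched as the shell sequence below. This is a reconstruction from the volume info and iozone output above, not the exact commands we ran: the sysrq trigger is one way to approximate a hard reboot, and "force" may be needed on the volume create since the bricks sit on the root filesystem.

```shell
# Create and start the 1 x 2 replica volume (run from tdcslb1 after peering)
gluster peer probe tdcslb2
gluster volume create datastore replica 2 \
    tdcslb1:/home/metavera/enterprisecarshare.com/test \
    tdcslb2:/home/metavera/enterprisecarshare.com/test
gluster volume start datastore

# Mount the volume via the FUSE client on each node
mkdir -p /mnt/glusterfs
mount -t glusterfs localhost:/datastore /mnt/glusterfs

# Step 2: run the iozone workload on tdcslb1 against the replicated mount
./iozone -Ra -g 2G -i0 -i 1 -i 2 -f /mnt/glusterfs/iozone_test &

# Step 3: after ~1GB has been written, hard-reboot the other node
# (immediate reboot, no clean shutdown)
ssh tdcslb2 'echo b > /proc/sysrq-trigger'

# Step 4: once tdcslb2 is back up and glusterd is running, trigger the self heal
gluster volume heal datastore full
```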
Created attachment 913803 [details]
e2image tdcslb2

Created attachment 913804 [details]
gluster log files tdcslb1

Created attachment 913805 [details]
gluster log files tdcslb2

Created attachment 913806 [details]
sosreport tdcslb1

Created attachment 913807 [details]
sosreport tdcslb2

Created attachment 913808 [details]
clean tdcslb1

Created attachment 913809 [details]
clean tdcslb2
I have forced crash dumps available; if they are useful, I can figure out a way to link them to the case (they are too big to attach).
This looks like an upstream bug and should be moved to the GlusterFS product.
Changing product/version to upstream as the issue is reported on glusterfs 3.5.0. Please correct if I am mistaken.
John, is there any way you could test with 3.5.1 instead of 3.5.0? Just asking, as we fixed a bunch of bugs in 3.5.1. :)
Sure. I guess that must have been released after my initial install. I see the software; I will go get it installed today/tomorrow and re-run.
Okay, I am now running:

[root@tdcslb2 reserve]# rpm -qa | grep -i gluster
glusterfs-server-3.5.1-1.el6.x86_64
glusterfs-libs-3.5.1-1.el6.x86_64
glusterfs-fuse-3.5.1-1.el6.x86_64
glusterfs-api-3.5.1-1.el6.x86_64
glusterfs-3.5.1-1.el6.x86_64
glusterfs-cli-3.5.1-1.el6.x86_64

I ran through the test and was able to bring down each node hard; self heal ran automatically (so when I ran it by hand it had already run) and there were no issues. I will repeat several more times today and tomorrow and close this if no issues are found.
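As a note on verification: with the 3.5 CLI, a clean heal after each crash/restart cycle can be confirmed along these lines. This is a sketch of typical checks, not a transcript; the md5sum comparison of the brick copies is an extra sanity check, with the path taken from the brick layout above.

```shell
# Pending heal entries should drain to zero after the rebooted node returns
gluster volume heal datastore info
gluster volume heal datastore info split-brain

# Both bricks should show Online "Y"
gluster volume status datastore

# On each node, compare the brick's copy of the test file directly
md5sum /home/metavera/enterprisecarshare.com/test/iozone_test
```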
John,

Could you please update the bug with your findings?

Thanks,
Pranith
This bug can be closed. I have not been able to reproduce the issue since I updated gluster, and we have now been running successfully for 2 months.