Description of problem:
EXT4 FS IO errors and device IO errors are seen in application VMs (created with their image files on the glusterfs volume, vm-imagestore) after a rebalance operation.

Version-Release number of selected component (if applicable):
RHS 2.1 - glusterfs-3.4.0.12beta6-1

How reproducible:
Haven't tried to reproduce it

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume
   (i.e) gluster volume create drvol replica 2 <brick1> ... <brick12>
2. Fuse mount the volume on the client (RHEL6.4)
   (i.e) mount.glusterfs 10.70.37.200:drvol /mnt/dr-vmstore
3. Create a raw image of size 20G
4. Create a VM running RHEL6.3 with the image created in step 3. The VM can be created with the "virt-install" command
   (i.e) virt-install --name appvm1 --ram 2048 --vcpus 2 --pxe --disk <img>,io=threads,device=disk,bus=virtio --vnc
5. Once installation is complete, power off the VM and remove machine-dependent information using "virt-sysprep"
   (i.e) virt-sysprep --add <img.file>
6. Copy the created image to create 7 more VM images
   (i.e) for i in 2 3 4 5 6 7 8; do cp /mnt/dr-vmstore/appvm1 /mnt/dr-vmstore/appvm${i}; done
7. Create 7 more VMs using the images created in step 6
   (i.e) virt-install --name appvm${i} --ram 2048 --vcpus 2 --disk <img>,io=threads,device=disk,bus=virtio --vnc --import
   NOTE: start all the VMs once installation is complete; also start appvm1
8. When all VMs are up and running, run the script attached in https://bugzilla.redhat.com/show_bug.cgi?id=983737#c4, which simulates continuous IO running inside the VMs
9. At the gluster end, add a pair of bricks
   (i.e) gluster volume add-brick drvol <brick1> <brick2>
10. Start the rebalance operation
    (i.e) gluster volume rebalance drvol start
11. Check the status of the rebalance operation
    (i.e) gluster volume rebalance drvol status
12. Check the state of the VMs post rebalance

Actual results:
VM consoles are flooded with device IO errors and EXT4 FS IO errors.

Expected results:
VMs should be healthy and there should not be any device errors or EXT4 FS IO errors.

Additional info:

1. RHS Nodes
=============
10.70.37.200
10.70.37.82
10.70.37.204
10.70.37.188

All RHS VMs configured as follows,
Installed with ISO : RHS-2.1-20130719.n.0-RHS-x86_64-DVD1.iso
VCPUS : 4
RAM : 8192 MB

2. RHEL-H Node
==============
rhs-client8.lab.eng.blr.redhat.com - 10.70.36.32
installed with : RHEL6.4

3. All gluster commands are executed from : 10.70.37.200

4. Mountpoint at Client
========================
Mount point : /mnt/dr-vmstore

[root@rhs-client8 ~]# mount
/dev/mapper/vg_rhsclient8-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext4 (rw)
/dev/mapper/vg_rhsclient8-lv_home on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.70.37.200:drvol on /mnt/dr-vmstore type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

5. df output
=============
[root@rhs-client8 ~]# df -Th
Filesystem                        Type            Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhsclient8-lv_root ext4             50G  2.9G   44G   7% /
tmpfs                             tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                         ext4            485M   65M  396M  14% /boot
/dev/mapper/vg_rhsclient8-lv_home ext4            1.8T  196M  1.7T   1% /home
10.70.37.200:drvol                fuse.glusterfs  595G  147G  449G  25% /mnt/dr-vmstore

6. Volume Information
======================
Volume Name : drvol

[Thu Jul 25 05:54:14 UTC 2013 root.37.200:~ ] # gluster volume info drvol

Volume Name: drvol
Type: Distributed-Replicate
Volume ID: 84ba2d55-3f40-4bb4-8ae9-9db8f0e98b9f
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.200:/rhs/brick1/dir1
Brick2: 10.70.37.82:/rhs/brick1/dir1
Brick3: 10.70.37.200:/rhs/brick2/dir2
Brick4: 10.70.37.82:/rhs/brick2/dir2
Brick5: 10.70.37.200:/rhs/brick3/dir3
Brick6: 10.70.37.82:/rhs/brick3/dir3
Brick7: 10.70.37.204:/rhs/brick1/dir1
Brick8: 10.70.37.188:/rhs/brick1/dir1
Brick9: 10.70.37.204:/rhs/brick2/dir2
Brick10: 10.70.37.188:/rhs/brick2/dir2
Brick11: 10.70.37.204:/rhs/brick3/dir3
Brick12: 10.70.37.188:/rhs/brick3/dir3
Brick13: 10.70.37.200:/rhs/brick4/spare1 --> extra brick added for rebalance
Brick14: 10.70.37.82:/rhs/brick4/spare1 --> extra brick added for rebalance
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Thu Jul 25 07:43:54 UTC 2013 root.37.200:~ ] # gluster volume status drvol
Status of volume: drvol
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.200:/rhs/brick1/dir1            49159   Y       4567
Brick 10.70.37.82:/rhs/brick1/dir1             49153   Y       4496
Brick 10.70.37.200:/rhs/brick2/dir2            49160   Y       4580
Brick 10.70.37.82:/rhs/brick2/dir2             49154   Y       4515
Brick 10.70.37.200:/rhs/brick3/dir3            49161   Y       4593
Brick 10.70.37.82:/rhs/brick3/dir3             49155   Y       4528
Brick 10.70.37.204:/rhs/brick1/dir1            49153   Y       4203
Brick 10.70.37.188:/rhs/brick1/dir1            49152   Y       4141
Brick 10.70.37.204:/rhs/brick2/dir2            49154   Y       4216
Brick 10.70.37.188:/rhs/brick2/dir2            49153   Y       4154
Brick 10.70.37.204:/rhs/brick3/dir3            49155   Y       4229
Brick 10.70.37.188:/rhs/brick3/dir3            49154   Y       4167
Brick 10.70.37.200:/rhs/brick4/spare1          49162   Y       6472
Brick 10.70.37.82:/rhs/brick4/spare1           49156   Y       6339
NFS Server on localhost                        2049    Y       6486
Self-heal Daemon on localhost                  N/A     Y       6494
NFS Server on 10.70.37.204                     2049    Y       6035
Self-heal Daemon on 10.70.37.204               N/A     Y       6042
NFS Server on 10.70.37.82                      2049    Y       6354
Self-heal Daemon on 10.70.37.82                N/A     Y       6361
NFS Server on 10.70.37.188                     2049    Y       5976
Self-heal Daemon on 10.70.37.188               N/A     Y       5983

     Task                                   ID       Status
     ----                                   --       ------
Rebalance c2a4fe38-299d-4c46-b45a-f5a246c4c284            3

7. Mount point information
==========================
[root@rhs-client8 ~]# ls -ld /mnt/dr-vmstore/
drwxr-xr-x. 3 qemu qemu 359 Jul 25 05:58 /mnt/dr-vmstore/

[root@rhs-client8 ~]# ls -lh /mnt/dr-vmstore/
total 147G
-rw-r--r--. 1 qemu qemu 21G Jul 25 03:48 appvm1
-rw-r--r--. 1 qemu qemu 21G Jul 25  2013 appvm2
-rw-r--r--. 1 qemu qemu 21G Jul 25  2013 appvm3
-rw-r--r--. 1 qemu qemu 21G Jul 25  2013 appvm4
-rw-r--r--. 1 qemu qemu 21G Jul 25 02:45 appvm5
-rw-r--r--. 1 qemu qemu 20G Jul 25 04:14 appvm6.img
-rw-r--r--. 1 qemu qemu 20G Jul 25 05:10 appvm7.img
-rw-r--r--. 1 qemu qemu 20G Jul 25  2013 appvm8.img
8. Images related information
=============================
[root@rhs-client8 ~]# for i in 1 2 3 4 5; do qemu-img info /mnt/dr-vmstore/appvm$i; done
image: /mnt/dr-vmstore/appvm1
file format: raw
virtual size: 20G (21478375424 bytes)
disk size: 20G
image: /mnt/dr-vmstore/appvm2
file format: raw
virtual size: 20G (21478375424 bytes)
disk size: 20G
image: /mnt/dr-vmstore/appvm3
file format: raw
virtual size: 20G (21478375424 bytes)
disk size: 20G
image: /mnt/dr-vmstore/appvm4
file format: raw
virtual size: 20G (21478375424 bytes)
disk size: 20G
image: /mnt/dr-vmstore/appvm5
file format: raw
virtual size: 20G (21478375424 bytes)
disk size: 20G

[root@rhs-client8 ~]# for i in 6 7 8; do qemu-img info /mnt/dr-vmstore/appvm${i}.img; done
image: /mnt/dr-vmstore/appvm6.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 15G
image: /mnt/dr-vmstore/appvm7.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 15G
image: /mnt/dr-vmstore/appvm8.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 16G

9. VM information
==================
[root@rhs-client8 ~]# for i in 1 2 3 4 5 6 7 8; do virsh domblklist appvm$i; done
Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm1

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm2

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm3

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm4

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm5

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm6.img

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm7.img

Target     Source
------------------------------------------------
vda        /mnt/dr-vmstore/appvm8.img

10. Corrupted VMs
==================
The following application VMs got corrupted: appvm1, appvm5, appvm6, appvm7

11. Screenshot of corruption and errors is attached.
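For reference, the reproduction flow above condenses into a short shell sketch (the hostnames, brick paths and IO-script invocation below are placeholders for this setup, not verbatim commands from the run):

# On an RHS node: create and start the 6x2 distributed-replicate volume
gluster volume create drvol replica 2 <node1>:/rhs/brick1/dir1 ... <node4>:/rhs/brick3/dir3
gluster volume start drvol

# On the RHEL-H client: mount the volume and provision the guests
mount.glusterfs 10.70.37.200:drvol /mnt/dr-vmstore
qemu-img create -f raw /mnt/dr-vmstore/appvm1 20G
virt-install --name appvm1 --ram 2048 --vcpus 2 --pxe \
    --disk /mnt/dr-vmstore/appvm1,io=threads,device=disk,bus=virtio --vnc
virt-sysprep -a /mnt/dr-vmstore/appvm1
for i in 2 3 4 5 6 7 8; do cp /mnt/dr-vmstore/appvm1 /mnt/dr-vmstore/appvm${i}; done

# Back on the RHS node, while the IO script runs inside all guests:
gluster volume add-brick drvol <node1>:/rhs/brick4/spare1 <node2>:/rhs/brick4/spare1
gluster volume rebalance drvol start
gluster volume rebalance drvol status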
Created attachment 778110 [details]
EXT4 corruption as seen in appvm6
1. Info from Backend
=================
[Thu Jul 25 08:32:29 UTC 2013 root.37.200:~ ] # ls -lh /rhs/brick{1,2,3,4}/*
/rhs/brick1/dir1:
total 0
---------T 2 qemu qemu 0 Jul 25 00:28 appvm5

/rhs/brick2/dir2:
total 0
---------T 2 qemu qemu 0 Jul 25 07:47 appvm1

/rhs/brick3/dir3:
total 0
---------T 2 qemu qemu 0 Jul 25 07:47 appvm6.img
---------T 2 qemu qemu 0 Jul 25 07:47 appvm7.img

/rhs/brick4/spare1:
total 31G
---------T 2 qemu qemu 21G Jul 24 21:57 appvm2
---------T 2 qemu qemu 21G Jul 24 21:57 appvm3
---------T 2 qemu qemu 21G Jul 24 21:57 appvm4
-rw-r--r-- 2 qemu qemu 20G Jul 24 22:44 appvm6.img
-rw-r--r-- 2 qemu qemu 20G Jul 24 23:40 appvm7.img
---------T 2 qemu qemu 20G Jul 24 21:57 appvm8.img

[Thu Jul 25 08:35:00 UTC 2013 root.37.82:~ ] # ls -lh /rhs/brick{1,2,3,4}/*
/rhs/brick1/dir1:
total 0
---------T 2 qemu qemu 0 Jul 25 00:28 appvm5

/rhs/brick2/dir2:
total 0
---------T 2 qemu qemu 0 Jul 25 07:47 appvm1

/rhs/brick3/dir3:
total 0
---------T 2 qemu qemu 0 Jul 25 07:47 appvm6.img
---------T 2 qemu qemu 0 Jul 25 07:47 appvm7.img

/rhs/brick4/spare1:
total 31G
---------T 2 qemu qemu 21G Jul 24 21:57 appvm2
---------T 2 qemu qemu 21G Jul 24 21:57 appvm3
---------T 2 qemu qemu 21G Jul 24 21:57 appvm4
-rw-r--r-- 2 qemu qemu 20G Jul 24 22:44 appvm6.img
-rw-r--r-- 2 qemu qemu 20G Jul 24 23:40 appvm7.img
---------T 2 qemu qemu 20G Jul 24 21:57 appvm8.img

[Thu Jul 25 08:35:54 UTC 2013 root.37.204:~ ] # ls -lh /rhs/brick{1,2,3}/*
/rhs/brick1/dir1:
total 21G
-rw-r--r-- 2 qemu qemu 21G Jul 24 22:18 appvm1

/rhs/brick2/dir2:
total 76G
-rw-r--r-- 2 qemu qemu 21G Jul 25 08:36 appvm2
-rw-r--r-- 2 qemu qemu 21G Jul 25 08:36 appvm3
-rw-r--r-- 2 qemu qemu 21G Jul 25 08:36 appvm4
-rw-r--r-- 2 qemu qemu 20G Jul 25 08:36 appvm8.img

/rhs/brick3/dir3:
total 21G
-rw-r--r-- 2 qemu qemu 21G Jul 24 21:15 appvm5

[Thu Jul 25 08:36:47 UTC 2013 root.37.188:~ ] # ls -lh /rhs/brick{1,2,3}/*
/rhs/brick1/dir1:
total 21G
-rw-r--r-- 2 qemu qemu 21G Jul 24 22:18 appvm1

/rhs/brick2/dir2:
total 76G
-rw-r--r-- 2 qemu qemu 21G Jul 25 08:36 appvm2
-rw-r--r-- 2 qemu qemu 21G Jul 25 08:36 appvm3
-rw-r--r-- 2 qemu qemu 21G Jul 25 08:36 appvm4
-rw-r--r-- 2 qemu qemu 20G Jul 25 08:36 appvm8.img

/rhs/brick3/dir3:
total 21G
-rw-r--r-- 2 qemu qemu 21G Jul 24 21:15 appvm5

2. xattrs on bricks
===================
[Thu Jul 25 08:39:04 UTC 2013 root.37.200:~ ] # getfattr -d -m . -e hex /rhs/brick{1,2,3,4}/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000002492492449249247
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick2/dir2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000492492486db6db6b
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick3/dir3
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000006db6db6c9249248f
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick4/spare1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000024924923
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

[Thu Jul 25 08:39:45 UTC 2013 root.37.82:~ ] # getfattr -d -m . -e hex /rhs/brick{1,2,3,4}/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000002492492449249247
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick2/dir2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000492492486db6db6b
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick3/dir3
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000006db6db6c9249248f
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick4/spare1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000024924923
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

[Thu Jul 25 08:40:44 UTC 2013 root.37.204:~ ] # getfattr -d -m . -e hex /rhs/brick{1,2,3}/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.afr.drvol-client-6=0x000000000000000000000000
trusted.afr.drvol-client-7=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000092492490b6db6db3
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick2/dir2
trusted.afr.drvol-client-8=0x000000000000000000000000
trusted.afr.drvol-client-9=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick3/dir3
trusted.afr.drvol-client-10=0x000000000000000000000000
trusted.afr.drvol-client-11=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000db6db6d8ffffffff
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

[Thu Jul 25 08:41:18 UTC 2013 root.37.188:~ ] # getfattr -d -m . -e hex /rhs/brick{1,2,3}/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.afr.drvol-client-6=0x000000000000000000000000
trusted.afr.drvol-client-7=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000092492490b6db6db3
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick2/dir2
trusted.afr.drvol-client-8=0x000000000000000000000000
trusted.afr.drvol-client-9=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

# file: rhs/brick3/dir3
trusted.afr.drvol-client-10=0x000000000000000000000000
trusted.afr.drvol-client-11=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000db6db6d8ffffffff
trusted.glusterfs.volume-id=0x84ba2d553f404bb48ae99db8f0e98b9f

3. Rebalance status
==================
[Thu Jul 25 08:39:12 UTC 2013 root.37.200:~ ] # gluster volume rebalance drvol status
        Node  Rebalanced-files        size  scanned  failures     status  run time in secs
   ---------       -----------  ----------  -------  --------  ---------  ----------------
   localhost                 3      60.0GB       11         0  completed          11580.00
 10.70.37.82                 0      0Bytes        8         0  completed              1.00
10.70.37.188                 0      0Bytes        8         0  completed              1.00
10.70.37.204                 1      20.0GB        9         4  completed           2493.00
volume rebalance: drvol: success:
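As an aside, the trusted.glusterfs.dht values above can be decoded by hand to check whether the per-brick hash ranges tile 0x00000000-0xffffffff without gaps. A small sketch, assuming the usual on-disk layout format of four big-endian 32-bit fields (count, type, start, stop):

for b in /rhs/brick{1,2,3,4}/*; do
    v=$(getfattr -n trusted.glusterfs.dht -e hex --only-values "$b" 2>/dev/null) || continue
    # the value is "0x" + 32 hex chars; the last 16 chars are the start/stop of the range
    echo "$b: start=0x${v:18:8} stop=0x${v:26:8}"
done

A healthy layout covers the full 32-bit hash space exactly once; the "holes" and "overlaps" counts in the dht_layout_normalize message below correspond to gaps and duplications in these ranges.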
Observation
===========
I could see the following in the rebalance log (/var/log/glusterfs/drvol-rebalance.log) on 10.70.37.82:

<snip>
[2013-07-24 21:15:36.297303] I [afr-common.c:2181:afr_discovery_cbk] 0-drvol-replicate-4: selecting local read_child drvol-client-8
[2013-07-24 21:15:36.300858] I [dht-layout.c:636:dht_layout_normalize] 0-drvol-dht: found anomalies in /. holes=1 overlaps=2 missing=0 down=0 misc=0
</snip>
(In reply to SATHEESARAN from comment #4)
> Observation
> ===========
> I could see the following in the rebalance log
> (/var/log/glusterfs/drvol-rebalance.log) on 10.70.37.82:
>
> <snip>
> [2013-07-24 21:15:36.297303] I [afr-common.c:2181:afr_discovery_cbk]
> 0-drvol-replicate-4: selecting local read_child drvol-client-8
> [2013-07-24 21:15:36.300858] I [dht-layout.c:636:dht_layout_normalize]
> 0-drvol-dht: found anomalies in /. holes=1 overlaps=2 missing=0 down=0 misc=0
> </snip>

I could see the same errors on the client side (10.70.36.32), in /var/log/glusterfs/mnt-dr-vmstore.log, as well:

<snip>
[2013-07-24 21:14:34.885009] I [dht-layout.c:636:dht_layout_normalize] 1-drvol-dht: found anomalies in /. holes=1 overlaps=0 missing=0 down=1 misc=0
</snip>
Seems to be an fd migration issue:

[2013-07-24 21:14:35.034612] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf2fb4) to newfd (0xe645fc) (inode-gfid:16b92d60-1621-4d0b-9752-c090505da239)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.036723] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf2edc) to newfd (0xe64668) (inode-gfid:f60fc19c-627b-4c1a-95de-3d47f45fcab5)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.039936] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf2f48) to newfd (0xe646d4) (inode-gfid:cf62eaa8-7923-4cc9-b145-29f7fd359c86)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.084415] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf3020) to newfd (0xe64740) (inode-gfid:cb5dc5f7-9f42-4712-84e7-3e85aa03ffe0)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.088213] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf30f8) to newfd (0xe647ac) (inode-gfid:4f080784-cf1d-4f5d-9422-1fe8f2dac8be)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.268606] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf3164) to newfd (0xe64818) (inode-gfid:500bdfeb-d5bd-458d-8111-76da7428eef4)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.270745] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf308c) to newfd (0xe64884) (inode-gfid:f24d7373-b952-4f63-a258-a60fbd15c017)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:35.272783] I [fuse-bridge.c:4892:fuse_migrate_fd_open] 0-glusterfs-fuse: migrated basefd (0xbf323c) to newfd (0xe648f0) (inode-gfid:4f9990da-74b9-4526-8f1c-fd4a52726cd7)(old-subvolume:drvol-0 new-subvolume:drvol-1)
[2013-07-24 21:14:38.449142] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-drvol-client-12: changing port to 49162 (from 0)
.........
[2013-07-24 21:15:15.649436] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-drvol-client-6: remote operation failed: Bad file descriptor
[2013-07-24 21:15:15.656641] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-drvol-client-7: remote operation failed: Bad file descriptor
[2013-07-24 21:15:15.656680] W [fuse-bridge.c:2631:fuse_writev_cbk] 0-glusterfs-fuse: 2490994: WRITE => -1 (Bad file descriptor)
[2013-07-24 21:15:15.671073] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-drvol-client-6: remote operation failed: Bad file descriptor
[2013-07-24 21:15:15.679869] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-drvol-client-7: remote operation failed: Bad file descriptor
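To correlate the fd migrations at the graph switch with the EBADF write failures that follow, a simple grep over the client log is enough (log path taken from this setup; a sketch, not part of the original triage):

# how many fds were migrated at the graph switch, vs. writes failing afterwards
grep -c 'fuse_migrate_fd_open' /var/log/glusterfs/mnt-dr-vmstore.log
grep -c 'WRITE => -1 (Bad file descriptor)' /var/log/glusterfs/mnt-dr-vmstore.log
# first few failing writes, to see which client xlators report EBADF
grep 'writev_cbk.*Bad file descriptor' /var/log/glusterfs/mnt-dr-vmstore.log | head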
Server logs:

[2013-07-24 18:13:07.294544] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from rhs-client8.lab.eng.blr.redhat.com-26442-2013/07/24-18:12:45:38843-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 18:24:15.488082] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from rhs-client8.lab.eng.blr.redhat.com-26442-2013/07/24-18:12:45:38843-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 18:24:15.488232] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection rhs-client8.lab.eng.blr.redhat.com-26442-2013/07/24-18:12:45:38843-drvol-client-6-0
[2013-07-24 18:24:15.488280] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of rhs-client8.lab.eng.blr.redhat.com-26442-2013/07/24-18:12:45:38843-drvol-client-6-0
[2013-07-24 18:28:54.770477] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from rhs-client8.lab.eng.blr.redhat.com-7995-2013/07/24-18:28:32:669057-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:14:56.482542] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from rhs-client8.lab.eng.blr.redhat.com-7995-2013/07/24-18:28:32:669057-drvol-client-6-1 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:14:56.581174] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4658-2013/07/24-18:09:15:362646-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:14:56.581251] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4658-2013/07/24-18:09:15:362646-drvol-client-6-0
[2013-07-24 21:14:56.581286] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4658-2013/07/24-18:09:15:362646-drvol-client-6-0
[2013-07-24 21:14:57.656278] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4613-2013/07/24-18:08:49:227528-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:14:57.656368] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4613-2013/07/24-18:08:49:227528-drvol-client-6-0
[2013-07-24 21:14:57.656408] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4613-2013/07/24-18:08:49:227528-drvol-client-6-0
[2013-07-24 21:14:59.010729] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2013-07-24 21:14:59.062835] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4587-2013/07/24-18:09:16:565428-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:14:59.062931] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4587-2013/07/24-18:09:16:565428-drvol-client-6-0
[2013-07-24 21:14:59.062974] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4587-2013/07/24-18:09:16:565428-drvol-client-6-0
[2013-07-24 21:14:59.240440] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4290-2013/07/24-18:09:16:917203-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:14:59.240491] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4290-2013/07/24-18:09:16:917203-drvol-client-6-0
[2013-07-24 21:14:59.240608] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4290-2013/07/24-18:09:16:917203-drvol-client-6-0
[2013-07-24 21:14:59.351492] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4226-2013/07/24-18:09:16:766361-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:14:59.351609] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4226-2013/07/24-18:09:16:766361-drvol-client-6-0
[2013-07-24 21:14:59.351640] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4226-2013/07/24-18:09:16:766361-drvol-client-6-0
[2013-07-24 21:15:00.082308] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4547-2013/07/24-18:08:49:538873-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:15:00.082378] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4547-2013/07/24-18:08:49:538873-drvol-client-6-0
[2013-07-24 21:15:00.082432] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4547-2013/07/24-18:08:49:538873-drvol-client-6-0
[2013-07-24 21:15:00.266400] I [glusterfsd-mgmt.c:1545:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2013-07-24 21:15:00.279650] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6493-2013/07/24-21:14:58:232896-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:00.308110] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4248-2013/07/24-18:08:49:885957-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:15:00.308168] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4248-2013/07/24-18:08:49:885957-drvol-client-6-0
[2013-07-24 21:15:00.308199] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4248-2013/07/24-18:08:49:885957-drvol-client-6-0
[2013-07-24 21:15:00.587673] I [server.c:773:server_rpc_notify] 0-drvol-server: disconnecting connection from localhost.localdomain-4186-2013/07/24-18:08:50:262229-drvol-client-6-0, Number of pending operations: 1
[2013-07-24 21:15:00.587758] I [server-helpers.c:752:server_connection_put] 0-drvol-server: Shutting down connection localhost.localdomain-4186-2013/07/24-18:08:50:262229-drvol-client-6-0
[2013-07-24 21:15:00.587998] I [server-helpers.c:585:server_log_conn_destroy] 0-drvol-server: destroyed connection of localhost.localdomain-4186-2013/07/24-18:08:50:262229-drvol-client-6-0
[2013-07-24 21:15:01.346994] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6485-2013/07/24-21:14:57:218199-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:01.423570] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6353-2013/07/24-21:14:59:723372-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:01.440864] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6041-2013/07/24-21:15:01:305761-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:01.467482] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6360-2013/07/24-21:15:00:737721-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:01.636060] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6034-2013/07/24-21:15:00:312372-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:01.731995] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-5982-2013/07/24-21:15:01:315529-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:02.068744] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-5975-2013/07/24-21:15:00:345084-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:36.490013] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6530-2013/07/24-21:15:26:4843-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:36.616477] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6380-2013/07/24-21:15:31:101716-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:36.929038] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6001-2013/07/24-21:15:31:344034-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:37.251672] I [server-handshake.c:569:server_setvolume] 0-drvol-server: accepted client from localhost.localdomain-6065-2013/07/24-21:15:31:451247-drvol-client-6-0 (version: 3.4.0.12rhs.beta6)
[2013-07-24 21:15:37.314065] E [posix.c:2150:posix_writev] 0-drvol-posix: write failed: offset 6094622720, Bad file descriptor
[2013-07-24 21:15:37.314102] I [server-rpc-fops.c:1439:server_writev_cbk] 0-drvol-server: 2322: WRITEV 1 (4f080784-cf1d-4f5d-9422-1fe8f2dac8be) ==> (Bad file descriptor)
[2013-07-24 21:15:37.335589] E [posix.c:2150:posix_writev] 0-drvol-posix: write failed: offset 6094622720, Bad file descriptor
[2013-07-24 21:15:37.335669] I [server-rpc-fops.c:1439:server_writev_cbk] 0-drvol-server: 2325: WRITEV 1 (4f080784-cf1d-4f5d-9422-1fe8f2dac8be) ==> (Bad file descriptor)
[2013-07-24 21:15:37.358656] E [posix.c:2150:posix_writev] 0-drvol-posix: write failed: offset 6095114240, Bad file descriptor
Spent some time on this issue today with Avati and Shishir. It seems this particular case has come up in Big Bend because of 'open-behind' (and, if you notice, 'gluster volume set <VOL> group virt' didn't disable open-behind). With build glusterfs-3.4.0.13rhs, open-behind is off by default on the volume; hence, can we get a final round of testing done on this?
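For anyone re-running the test on an older build, open-behind can also be turned off explicitly (a short sketch; on builds where it is already off by default this is a no-op):

# disable the open-behind performance translator on the volume
gluster volume set drvol performance.open-behind off
# the "Options Reconfigured" section of the info output confirms the setting
gluster volume info drvol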
Amar, yes. I will test this particular scenario with the latest glusterfs [3.4.0.14rhs-1] and update this bug.
Can you please update the bug if the test ran without open-behind enabled (in the latest glusterfs, post 3.4.0.13, open-behind is off by default)?
Shishir, I will update soon after testing this.
I re-tested the same with glusterfs-3.4.0.17rhs-1, after installing the ISO RHS-2.1-20130806.n.2-RHS-x86_64-DVD1.iso.
I am still seeing the same issue in 2 out of 7 VMs.

Steps Performed
===============
1. Create a 6x2 distributed-replicate volume on a 4-node cluster, each node having 3 bricks
   (i.e) gluster volume create <vol-name> replica 2 <brick1> .. <brick12>
2. Set the volume with profile virt
   (i.e) gluster volume set <vol-name> group virt
3. Set owner-uid and owner-gid to 107
   (i.e) gluster volume set <vol-name> storage.owner-uid 107
         gluster volume set <vol-name> storage.owner-gid 107
4. Start the volume
   (i.e) gluster volume start <vol-name>
5. Fuse mount the volume on the client [RHEL 6.4 in my case]
   (i.e) mount.glusterfs <RHS-Server>:<vol-name> <mount-point>
6. Create 4 qcow2 images (thick provisioned) and 3 raw images, each of size 20GB
   (i.e) for i in 1 2 3 4; do qemu-img create -f qcow2 -o preallocation=metadata <mount-point>/appvm${i}.qcow2 20G; done
         for i in 5 6 7; do qemu-img create -f raw <mount-point>/appvm${i}.img 20G; done
7. Create 7 VMs running RHEL6.4, each using an image created in step 6 (a sketch of the corresponding loop for the raw images appears at the end of this comment)
   (i.e) for i in 1 2 3 4; do virt-install --name appvm$i --ram 2048 --vcpus 2 --pxe --disk path=/mnt/dr-vmstore/appvm${i}.qcow2,format=qcow2,bus=virtio,device=disk,io=threads --vnc
8. Execute the script in https://bugzilla.redhat.com/show_bug.cgi?id=983737#c3, which simulates IO running inside the VMs
9. Add a pair of bricks on the RHS server
   (i.e) gluster volume add-brick <vol-name> <brick1> <brick2>
10. Start the rebalance
    (i.e) gluster volume rebalance <vol-name> start
11. Check the rebalance status
    (i.e) gluster volume rebalance <vol-name> status

Additional Information
======================
1. RHS Nodes
============
10.70.37.188
10.70.37.73
10.70.37.106
10.70.37.204

2. Client [ RHEL 6.4 ]
======================
10.70.36.32 - rhs-client8.lab.eng.blr.redhat.com

3. Mount point
===============
volume is mounted at /mnt/dr-vmstore

4. All commands are executed from the RHS node 10.70.37.188

5. Console logs on Client
=========================
[Wed Aug 7 08:14:23 UTC 2013 root.36.32:~ ] # mkdir /mnt/dr-vmstore
[Wed Aug 7 08:14:29 UTC 2013 root.36.32:~ ] # ls -ld /mnt/dr-vmstore
drwxr-xr-x. 2 root root 4096 Aug 7 13:44 /mnt/dr-vmstore
[Wed Aug 7 08:14:34 UTC 2013 root.36.32:~ ] # mount.glusterfs 10.70.37.188:dr-vmstore /mnt/dr-vmstore
[Wed Aug 7 08:15:08 UTC 2013 root.36.32:~ ] # ls -ld /mnt/dr-vmstore
drwxr-xr-x. 3 qemu qemu 138 Aug 7 13:33 /mnt/dr-vmstore
[Wed Aug 7 08:15:11 UTC 2013 root.36.32:~ ] # df -Th
Filesystem                        Type            Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhsclient8-lv_root ext4             50G  3.2G   44G   7% /
tmpfs                             tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                         ext4            485M   65M  396M  14% /boot
/dev/mapper/vg_rhsclient8-lv_home ext4            1.8T  7.7G  1.7T   1% /home
10.70.37.188:dr-vmstore           fuse.glusterfs  510G  206M  510G   1% /mnt/dr-vmstore

[Wed Aug 7 08:16:01 UTC 2013 root.36.32:~ ] # mount
/dev/mapper/vg_rhsclient8-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext4 (rw)
/dev/mapper/vg_rhsclient8-lv_home on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.70.37.188:dr-vmstore on /mnt/dr-vmstore type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

[Wed Aug 7 08:16:05 UTC 2013 root.36.32:~ ] # ps aux | grep dr-vmstore
root 14875 0.2 0.4 336800 71748 ? Ssl 13:45 0:00 /usr/sbin/glusterfs --volfile-id=dr-vmstore --volfile-server=10.70.37.188 /mnt/dr-vmstore
root 14900 0.0 0.0 103244 856 pts/0 S+ 13:46 0:00 grep dr-vmstore

6. RHS Console logs on 10.70.37.188
===================================
[Wed Aug 7 08:13:51 UTC 2013 satheesaran@unused:~/sas/simplefs ] # ssh root.37.188
Last login: Wed Aug 7 07:26:57 2013 from dhcp-0-179.blr.redhat.com
[Wed Aug 7 09:22:41 UTC 2013 root.37.188:~ ] # gluster volume info dr-vmstore

Volume Name: dr-vmstore
Type: Distributed-Replicate
Volume ID: 6b5b534a-d714-4d6e-91b2-71423a8ff7f1
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.188:/rhs/brick1/dir1
Brick2: 10.70.37.73:/rhs/brick1/dir1
Brick3: 10.70.37.188:/rhs/brick2/dir2
Brick4: 10.70.37.73:/rhs/brick2/dir2
Brick5: 10.70.37.188:/rhs/brick3/dir3
Brick6: 10.70.37.73:/rhs/brick3/dir3
Brick7: 10.70.37.106:/rhs/brick1/dir1
Brick8: 10.70.37.204:/rhs/brick1/dir1
Brick9: 10.70.37.106:/rhs/brick2/dir2
Brick10: 10.70.37.204:/rhs/brick2/dir2
Brick11: 10.70.37.106:/rhs/brick3/dir3
Brick12: 10.70.37.204:/rhs/brick3/dir3
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Wed Aug 7 09:22:51 UTC 2013 root.37.188:~ ] # gluster volume status dr-vmstore
Status of volume: dr-vmstore
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.188:/rhs/brick1/dir1            49152   Y       4657
Brick 10.70.37.73:/rhs/brick1/dir1             49152   Y       2865
Brick 10.70.37.188:/rhs/brick2/dir2            49153   Y       4668
Brick 10.70.37.73:/rhs/brick2/dir2             49153   Y       2876
Brick 10.70.37.188:/rhs/brick3/dir3            49154   Y       4679
Brick 10.70.37.73:/rhs/brick3/dir3             49154   Y       2887
Brick 10.70.37.106:/rhs/brick1/dir1            49152   Y       4405
Brick 10.70.37.204:/rhs/brick1/dir1            49152   Y       4364
Brick 10.70.37.106:/rhs/brick2/dir2            49153   Y       4416
Brick 10.70.37.204:/rhs/brick2/dir2            49153   Y       4375
Brick 10.70.37.106:/rhs/brick3/dir3            49154   Y       4427
Brick 10.70.37.204:/rhs/brick3/dir3            49154   Y       4387
NFS Server on localhost                        2049    Y       4752
Self-heal Daemon on localhost                  N/A     Y       4700
NFS Server on 10.70.37.204                     2049    Y       4454
Self-heal Daemon on 10.70.37.204               N/A     Y       4407
NFS Server on 10.70.37.106                     2049    Y       4497
Self-heal Daemon on 10.70.37.106               N/A     Y       4448
NFS Server on 10.70.37.73                      2049    Y       2954
Self-heal Daemon on 10.70.37.73                N/A     Y       2906

There are no active volume tasks

[Wed Aug 7 09:23:02 UTC 2013 root.37.188:~ ] # gluster volume add-brick dr-vmstore 10.70.37.188:/rhs/brick4/newbrick1 10.70.37.73:/rhs/brick4/newbrick1
volume add-brick: success

[Wed Aug 7 09:24:46 UTC 2013 root.37.188:~ ] # gluster volume rebalance dr-vmstore start
volume rebalance: dr-vmstore: success: Starting rebalance on volume dr-vmstore has been successful.
ID: bfd9c472-2e89-4f88-b7e1-76bf427f2ef2

[Wed Aug 7 09:25:17 UTC 2013 root.37.188:~ ] # gluster volume rebalance dr-vmstore status
        Node  Rebalanced-files        size  scanned  failures  skipped       status  run time in secs
   ---------       -----------  ----------  -------  --------  -------  -----------  ----------------
   localhost                 0      0Bytes        1         0        0  in progress              9.00
 10.70.37.73                 0      0Bytes        7         0        0    completed              0.00
10.70.37.106                 0      0Bytes        5         0        0  in progress              9.00
10.70.37.204                 0      0Bytes        7         0        0    completed              2.00
volume rebalance: dr-vmstore: success:

[Wed Aug 7 09:25:26 UTC 2013 root.37.188:~ ] # gluster volume status dr-vmstore
Status of volume: dr-vmstore
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.188:/rhs/brick1/dir1            49152   Y       4657
Brick 10.70.37.73:/rhs/brick1/dir1             49152   Y       2865
Brick 10.70.37.188:/rhs/brick2/dir2            49153   Y       4668
Brick 10.70.37.73:/rhs/brick2/dir2             49153   Y       2876
Brick 10.70.37.188:/rhs/brick3/dir3            49154   Y       4679
Brick 10.70.37.73:/rhs/brick3/dir3             49154   Y       2887
Brick 10.70.37.106:/rhs/brick1/dir1            49152   Y       4405
Brick 10.70.37.204:/rhs/brick1/dir1            49152   Y       4364
Brick 10.70.37.106:/rhs/brick2/dir2            49153   Y       4416
Brick 10.70.37.204:/rhs/brick2/dir2            49153   Y       4375
Brick 10.70.37.106:/rhs/brick3/dir3            49154   Y       4427
Brick 10.70.37.204:/rhs/brick3/dir3            49154   Y       4387
Brick 10.70.37.188:/rhs/brick4/newbrick1       49156   Y       32034
Brick 10.70.37.73:/rhs/brick4/newbrick1        49155   Y       11254
NFS Server on localhost                        2049    Y       32046
Self-heal Daemon on localhost                  N/A     Y       32053
NFS Server on 10.70.37.204                     2049    Y       31672
Self-heal Daemon on 10.70.37.204               N/A     Y       31679
NFS Server on 10.70.37.73                      2049    Y       11710
Self-heal Daemon on 10.70.37.73                N/A     Y       12086
NFS Server on 10.70.37.106                     2049    Y       31713
Self-heal Daemon on 10.70.37.106               N/A     Y       31720

     Task                                   ID       Status
     ----                                   --       ------
Rebalance bfd9c472-2e89-4f88-b7e1-76bf427f2ef2            1

[Wed Aug 7 09:25:46 UTC 2013 root.37.188:~ ] # gluster volume info dr-vmstore

Volume Name: dr-vmstore
Type: Distributed-Replicate
Volume ID: 6b5b534a-d714-4d6e-91b2-71423a8ff7f1
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.188:/rhs/brick1/dir1
Brick2: 10.70.37.73:/rhs/brick1/dir1
Brick3: 10.70.37.188:/rhs/brick2/dir2
Brick4: 10.70.37.73:/rhs/brick2/dir2
Brick5: 10.70.37.188:/rhs/brick3/dir3
Brick6: 10.70.37.73:/rhs/brick3/dir3
Brick7: 10.70.37.106:/rhs/brick1/dir1
Brick8: 10.70.37.204:/rhs/brick1/dir1
Brick9: 10.70.37.106:/rhs/brick2/dir2
Brick10: 10.70.37.204:/rhs/brick2/dir2
Brick11: 10.70.37.106:/rhs/brick3/dir3
Brick12: 10.70.37.204:/rhs/brick3/dir3
Brick13: 10.70.37.188:/rhs/brick4/newbrick1
Brick14: 10.70.37.73:/rhs/brick4/newbrick1
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

7. Partition information for all VMs
=====================================
NAME                          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                           252:0    0   20G  0 disk
├─vda1                        252:1    0  500M  0 part /boot
└─vda2                        252:2    0 19.5G  0 part
  ├─vg_appvm2-lv_root (dm-0)  253:0    0 15.6G  0 lvm  /
  └─vg_appvm2-lv_swap (dm-1)  253:1    0    4G  0 lvm  [SWAP]

8. Screenshot of VMs showing the EXT4 error is attached.

Rebalance is still in progress; I will upload sosreports once it is completed.
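For completeness, the corresponding loop for the raw images in step 7 above would look something like the following (a hypothetical reconstruction; the exact options used for appvm5-7 were not captured in this comment):

for i in 5 6 7; do
    virt-install --name appvm$i --ram 2048 --vcpus 2 --pxe \
        --disk path=/mnt/dr-vmstore/appvm${i}.img,format=raw,bus=virtio,device=disk,io=threads --vnc
done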
Created attachment 783801 [details]
EXT4 corruption as seen in appvm1
Created attachment 783802 [details]
EXT4 corruption as seen in appvm5
Observation
===========
I see some errors in the fuse mount log (/var/log/glusterfs/mnt-dr-vmstore.log):

[2013-08-07 11:08:26.158105] W [fuse-bridge.c:2695:fuse_writev_cbk] 0-glusterfs-fuse: 2898608: WRITE => -1 (Bad file descriptor)
[2013-08-07 11:08:26.163813] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-dr-vmstore-client-6: remote operation failed: Bad file descriptor
[2013-08-07 11:08:26.167022] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-dr-vmstore-client-7: remote operation failed: Bad file descriptor
[2013-08-07 11:08:26.167070] W [fuse-bridge.c:2695:fuse_writev_cbk] 0-glusterfs-fuse: 2898613: WRITE => -1 (Bad file descriptor)
[2013-08-07 11:08:26.172347] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-dr-vmstore-client-6: remote operation failed: Bad file descriptor
[2013-08-07 11:08:26.190515] W [client-rpc-fops.c:866:client3_3_writev_cbk] 1-dr-vmstore-client-7: remote operation failed: Bad file descriptor
[2013-08-07 11:08:26.190553] W [fuse-bridge.c:2695:fuse_writev_cbk] 0-glusterfs-fuse: 2898616: WRITE => -1 (Bad file descriptor)
Additional Information
======================
1. Files in all bricks
=======================
NODE 1
======
[Wed Aug 7 15:53:09 UTC 2013 root.37.188:~ ] # ls -Rl /rhs/brick*
/rhs/brick1:
total 0
drwxr-xr-x 3 qemu qemu 59 Aug 7 10:15 dir1

/rhs/brick1/dir1:
total 24002736
-rw-r--r-- 2 qemu qemu 21478375424 Aug 7 10:29 appvm4.qcow2
-rw-r--r-- 2 qemu qemu 21474836480 Aug 7 10:29 appvm5.img

/rhs/brick2:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 13:28 dir2

/rhs/brick2/dir2:
total 0

/rhs/brick3:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 08:03 dir3

/rhs/brick3/dir3:
total 0

/rhs/brick4:
total 0
drwxr-xr-x 3 qemu qemu 76 Aug 7 09:25 newbrick1

/rhs/brick4/newbrick1:
total 53080120
-rw-r--r-- 2 qemu qemu 21478375424 Aug 7 12:42 appvm2.qcow2
-rw-r--r-- 2 qemu qemu 21474836480 Aug 7 11:30 appvm6.img
-rw-r--r-- 2 qemu qemu 21474836480 Aug 7 11:40 appvm7.img

NODE 2
=======
[Wed Aug 7 15:57:59 UTC 2013 root.37.73:~ ] # ls -Rlh /rhs/brick*
/rhs/brick1:
total 0
drwxr-xr-x 3 qemu qemu 59 Aug 7 10:15 dir1

/rhs/brick1/dir1:
total 23G
-rw-r--r-- 2 qemu qemu 21G Aug 7 10:29 appvm4.qcow2
-rw-r--r-- 2 qemu qemu 20G Aug 7 10:29 appvm5.img

/rhs/brick2:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 13:28 dir2

/rhs/brick2/dir2:
total 0

/rhs/brick3:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 08:03 dir3

/rhs/brick3/dir3:
total 0

/rhs/brick4:
total 0
drwxr-xr-x 3 qemu qemu 76 Aug 7 09:25 newbrick1

/rhs/brick4/newbrick1:
total 51G
-rw-r--r-- 2 qemu qemu 21G Aug 7 12:42 appvm2.qcow2
-rw-r--r-- 2 qemu qemu 20G Aug 7 11:30 appvm6.img
-rw-r--r-- 2 qemu qemu 20G Aug 7 11:40 appvm7.img

NODE 3
======
[Wed Aug 7 15:58:00 UTC 2013 root.37.106:~ ] # ls -Rlh /rhs/brick*
/rhs/brick1:
total 0
drwxr-xr-x 3 qemu qemu 61 Aug 7 12:26 dir1

/rhs/brick1/dir1:
total 24G
-rw-r--r-- 2 qemu qemu 21G Aug 7 09:25 appvm1.qcow2
-rw-r--r-- 2 qemu qemu 21G Aug 7 12:26 appvm3.qcow2

/rhs/brick2:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 08:03 dir2

/rhs/brick2/dir2:
total 0

/rhs/brick3:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 13:06 dir3

/rhs/brick3/dir3:
total 0

/rhs/brick4:
total 0

NODE 4
=======
[Wed Aug 7 15:57:59 UTC 2013 root.37.204:~ ] # ls -Rlh /rhs/brick*
/rhs/brick1:
total 0
drwxr-xr-x 3 qemu qemu 61 Aug 7 12:26 dir1

/rhs/brick1/dir1:
total 24G
-rw-r--r-- 2 qemu qemu 21G Aug 7 09:25 appvm1.qcow2
-rw-r--r-- 2 qemu qemu 21G Aug 7 12:26 appvm3.qcow2

/rhs/brick2:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 08:03 dir2

/rhs/brick2/dir2:
total 0

/rhs/brick3:
total 0
drwxr-xr-x 3 qemu qemu 23 Aug 7 13:06 dir3

/rhs/brick3/dir3:
total 0

/rhs/brick4:
total 0

2. xattrs on all bricks
========================
NODE-1
======
[Wed Aug 7 16:05:06 UTC 2013 root.37.188:~ ] # for i in 1 2 3; do getfattr -d -m . -e hex /rhs/brick${i}/dir${i}; done
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000db6db6d8ffffffff
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/dir2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000092492490b6db6db3
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/dir3
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

[Wed Aug 7 16:08:13 UTC 2013 root.37.188:~ ] # getfattr -d -m . -e hex /rhs/brick4/newbrick1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/newbrick1/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000006db6db6c9249248f
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

NODE-2
======
[Wed Aug 7 16:05:05 UTC 2013 root.37.73:~ ] # for i in 1 2 3; do getfattr -d -m . -e hex /rhs/brick${i}/dir${i}; done
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000db6db6d8ffffffff
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/dir2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000092492490b6db6db3
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/dir3
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

[Wed Aug 7 16:08:13 UTC 2013 root.37.73:~ ] # getfattr -d -m . -e hex /rhs/brick4/newbrick1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/newbrick1/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000006db6db6c9249248f
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

NODE-3
=======
[Wed Aug 7 16:05:06 UTC 2013 root.37.106:~ ] # for i in 1 2 3; do getfattr -d -m . -e hex /rhs/brick${i}/dir${i}; done
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000492492486db6db6b
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/dir2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000024924923
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/dir3
trusted.afr.dr-vmstore-client-10=0x000000000000000000000000
trusted.afr.dr-vmstore-client-11=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000002492492449249247
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

NODE-4
======
[Wed Aug 7 16:05:05 UTC 2013 root.37.204:~ ] # for i in 1 2 3; do getfattr -d -m . -e hex /rhs/brick${i}/dir${i}; done
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/dir1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000492492486db6db6b
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/dir2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000024924923
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/dir3
trusted.afr.dr-vmstore-client-10=0x000000000000000000000000
trusted.afr.dr-vmstore-client-11=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000002492492449249247
trusted.glusterfs.volume-id=0x6b5b534ad7144d6e91b271423a8ff7f1
3. Corrupted VMs
================
I could see FS errors in 6 out of 7 VMs (all except appvm6).

4. Rebalance status post completion
============================================
[Wed Aug 7 15:07:22 UTC 2013 root.37.188:~ ] # gluster volume rebalance dr-vmstore status
        Node  Rebalanced-files        size  scanned  failures  skipped     status  run time in secs
   ---------       -----------  ----------  -------  --------  -------  ---------  ----------------
   localhost                 4      80.0GB        9         0        0  completed          14585.00
 10.70.37.73                 0      0Bytes        7         0        0  completed              0.00
10.70.37.106                 3      60.0GB       10         0        0  completed          13278.00
10.70.37.204                 0      0Bytes        7         0        0  completed              2.00
volume rebalance: dr-vmstore: success:

5. Since the sosreports are more than 20MB, I have copied them to
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/988262-1/

6. Rebooting a VM affected by the EXT4 FS IO errors succeeds after the filesystems are checked. Attached is a screenshot of the same.
Created attachment 783989 [details]
Filesystem check while rebooting an application VM affected with EXT4 FS IO errors during rebalance
https://code.engineering.redhat.com/gerrit/11398 fixes the issue.
Created attachment 788342 [details]
tar-ed sosreports
I tested this bug with glusterfs-3.4.0.20rhs-2 and there were no EXT4 IO errors on the application VMs.

But I see 2 issues:
1. https://bugzilla.redhat.com/show_bug.cgi?id=979901 - is this still expected in Big Bend?
2. Migration took much longer than usual in my setup: about ~9.5 hours for 80GB.

I am providing info for issue 2:

[Tue Aug 20 06:11:52 UTC 2013 root.37.216:~ ] # gluster volume status
Status of volume: dr-vmstore
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.216:/rhs/brick1/dir1            49152   Y       3007
Brick 10.70.37.98:/rhs/brick1/dir1             49152   Y       2982
Brick 10.70.37.216:/rhs/brick2/dir2            49153   Y       3018
Brick 10.70.37.98:/rhs/brick2/dir2             49153   Y       2993
Brick 10.70.37.216:/rhs/brick3/dir3            49154   Y       3029
Brick 10.70.37.98:/rhs/brick3/dir3             49154   Y       3004
Brick 10.70.37.123:/rhs/brick1/dir1            49152   Y       2924
Brick 10.70.37.164:/rhs/brick1/dir1            49152   Y       2976
Brick 10.70.37.123:/rhs/brick2/dir2            49153   Y       2935
Brick 10.70.37.164:/rhs/brick2/dir2            49153   Y       2987
Brick 10.70.37.123:/rhs/brick3/dir3            49154   Y       2946
Brick 10.70.37.164:/rhs/brick3/dir3            49154   Y       2998
Brick 10.70.37.216:/rhs/brick4/newdir1         49155   Y       3817
Brick 10.70.37.98:/rhs/brick4/newdir1          49155   Y       3771
NFS Server on localhost                        2049    Y       3829
Self-heal Daemon on localhost                  N/A     Y       3845
NFS Server on 10.70.37.164                     2049    Y       3961
Self-heal Daemon on 10.70.37.164               N/A     Y       3968
NFS Server on 10.70.37.123                     2049    Y       3912
Self-heal Daemon on 10.70.37.123               N/A     Y       3919
NFS Server on 10.70.37.98                      2049    Y       3783
Self-heal Daemon on 10.70.37.98                N/A     Y       3790

     Task                                   ID       Status
     ----                                   --       ------
Rebalance 190512d5-1a68-49ad-9db4-13162a642bd6            3

[Tue Aug 20 06:51:11 UTC 2013 root.37.216:~ ] # gluster volume info

Volume Name: dr-vmstore
Type: Distributed-Replicate
Volume ID: 28ce4337-c3a3-4f2f-9db7-b0e5c2c57959
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.216:/rhs/brick1/dir1
Brick2: 10.70.37.98:/rhs/brick1/dir1
Brick3: 10.70.37.216:/rhs/brick2/dir2
Brick4: 10.70.37.98:/rhs/brick2/dir2
Brick5: 10.70.37.216:/rhs/brick3/dir3
Brick6: 10.70.37.98:/rhs/brick3/dir3
Brick7: 10.70.37.123:/rhs/brick1/dir1
Brick8: 10.70.37.164:/rhs/brick1/dir1
Brick9: 10.70.37.123:/rhs/brick2/dir2
Brick10: 10.70.37.164:/rhs/brick2/dir2
Brick11: 10.70.37.123:/rhs/brick3/dir3
Brick12: 10.70.37.164:/rhs/brick3/dir3
Brick13: 10.70.37.216:/rhs/brick4/newdir1
Brick14: 10.70.37.98:/rhs/brick4/newdir1
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Tue Aug 20 06:51:18 UTC 2013 root.37.216:~ ] # gluster volume rebalance dr-vmstore status
        Node  Rebalanced-files        size  scanned  failures  skipped     status  run time in secs
   ---------       -----------  ----------  -------  --------  -------  ---------  ----------------
   localhost                 4      80.0GB        9         0        0  completed          34151.00
10.70.37.164                 0      0Bytes        7         0        0  completed              0.00
10.70.37.123                 3      60.0GB        9         0        0  completed          23658.00
 10.70.37.98                 0      0Bytes        7         0        0  completed              0.00
volume rebalance: dr-vmstore: success:

RHS Node Details
================
10.70.37.216
10.70.37.98
10.70.37.123
10.70.37.164

RHEL 6.4 Client : 10.70.36.32
gluster volume fuse mounted at : /mnt/dr-vmstore

sosreports are attached and also available at
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/988262-2/

Is a duration of ~9.5 hours to rebalance 80GB expected?

Setting NEEDINFO to address the above query.
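To put the query in numbers, a back-of-the-envelope estimate from the status output above (figures taken from this run; just a rough average, not a measured throughput):

# 80 GB migrated by the localhost rebalance process in 34151 seconds
echo "scale=2; 80 * 1024 / 34151" | bc    # ~2.39 MB/s average migration rate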
Marking this bug as VERIFIED, as there are no more EXT4 FS IO errors seen in application VMs while a rebalance of the glusterfs volume is in progress. The issue mentioned in comment 22 will probably be handled in a separate bug.

Tested with glusterfs-3.4.0.20rhs-2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html