Description of problem:
Towards the end of a remove-brick operation on a distribute-replicate volume used as a VM image store on RHEV, a VM got paused and had to be manually recovered.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Add a distribute-replicate volume to RHEV as a POSIXFS storage domain.
2. Create and run VMs on the storage domain.
3. Perform 'remove-brick ... start' on the volume (see the command sketch below).
4. Keep exercising VM functionality until 'remove-brick ... commit'.

Actual results:
The VM got paused, and had to be manually recovered.

Expected results:
Functioning of the VMs should not be impacted during the operation.

Additional info:
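For reference, a minimal sketch of the remove-brick sequence referred to in steps 3 and 4; VOLNAME and the HOST:/path/brick pair are placeholders, not the actual test setup:

---------------------------------------------------------------------
# start decommissioning one replica pair; this triggers data migration off the bricks
gluster volume remove-brick VOLNAME HOST1:/path/brick HOST2:/path/brick start

# poll until every node reports 'completed'
gluster volume remove-brick VOLNAME HOST1:/path/brick HOST2:/path/brick status

# only then remove the bricks from the volume definition
gluster volume remove-brick VOLNAME HOST1:/path/brick HOST2:/path/brick commit
---------------------------------------------------------------------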
Rejy,

Could you please mention the steps used to manually recover the VM?
(In reply to comment #2)
> Rejy,
>
> Could you please mention the steps used to manually recover the VM?

The VM remained in the paused state until I manually clicked the option to run the VM, and then it came out of the paused state.

- rejy (rmc)
Version Info:

RHEV-M: 3.1.0-49.el6ev

Hypervisors:
RHEV-H 6.4 (20130306.2.el6_4)
RHEL 6.4
RHEL 6.3

RHS servers: RHS-2.0-20130317.0-RHS-x86_64-DVD1.iso

gluster related rpms:
glusterfs-fuse-3.3.0.6rhs-4.el6rhs.x86_64
vdsm-gluster-4.9.6-19.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-account-1.4.8-5.el6rhs.noarch
glusterfs-3.3.0.6rhs-4.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-server-3.3.0.6rhs-4.el6rhs.x86_64
glusterfs-rdma-3.3.0.6rhs-4.el6rhs.x86_64
gluster-swift-object-1.4.8-5.el6rhs.noarch
gluster-swift-container-1.4.8-5.el6rhs.noarch
gluster-swift-doc-1.4.8-5.el6rhs.noarch
gluster-swift-1.4.8-5.el6rhs.noarch
gluster-swift-proxy-1.4.8-5.el6rhs.noarch
glusterfs-geo-replication-3.3.0.6rhs-4.el6rhs.x86_64

I have the sosreport from RHEV-M, Hypervisors, and RHS servers if required.
Please provide the sos reports
(In reply to comment #5)
> Please provide the sos reports

sosreport attached from all systems that were part of the environment.

- rejy (rmc)
(In reply to comment #14)
> (In reply to comment #5)
> > Please provide the sos reports
>
> sosreport attached from all systems that were part of the environment
>
> - rejy (rmc)

I have to warn you that there may be a lot of information in the logs from the system, since the issue occurred towards the end of a test run of 100+ test cases. Look towards the latter part of the logs for information regarding this issue.

---------

The next round of the same test, on RHS-2.0-20130320.2-RHS-x86_64-DVD1.iso (glusterfs*-3.3.0.7rhs-1.el6rhs.x86_64), led to the Data Center and the Storage being unavailable for a long time, with the VMs initially accessible but not accessible after shutdown. That issue is reported in Bug 928054.

- rejy (rmc)
The following are log snippets from the sos-reports:

<snip1>
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3317:[2013-03-20 10:31:13.253518] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-0: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3318:[2013-03-20 10:31:13.253941] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-1: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3319:[2013-03-20 10:31:13.254023] E [dht-helper.c:652:dht_migration_complete_check_task] 0-RHS_VM_imagestore-dht: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta: failed to get the 'linkto' xattr Permission denied
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3321:[2013-03-20 10:31:13.255275] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-0: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3322:[2013-03-20 10:31:13.255708] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-1: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3323:[2013-03-20 10:31:13.255761] E [dht-helper.c:652:dht_migration_complete_check_task] 0-RHS_VM_imagestore-dht: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta: failed to get the 'linkto' xattr Permission denied
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3349:[2013-03-20 10:32:13.359447] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-0: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3350:[2013-03-20 10:32:13.359962] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-1: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhel6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:3351:[2013-03-20 10:32:13.360090] E [dht-helper.c:652:dht_migration_complete_check_task] 0-RHS_VM_imagestore-dht: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta: failed to get the 'linkto' xattr Permission denied
</snip1>

<snip2>
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2500:[2013-03-20 05:01:20.722058] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-1: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2501:[2013-03-20 05:01:20.722634] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-0: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2502:[2013-03-20 05:01:20.722734] E [dht-helper.c:652:dht_migration_complete_check_task] 0-RHS_VM_imagestore-dht: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta: failed to get the 'linkto' xattr Permission denied
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2504:[2013-03-20 05:01:20.723463] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-1: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2505:[2013-03-20 05:01:20.723943] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-0: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2506:[2013-03-20 05:01:20.723992] E [dht-helper.c:652:dht_migration_complete_check_task] 0-RHS_VM_imagestore-dht: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta: failed to get the 'linkto' xattr Permission denied
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2507:[2013-03-20 05:02:20.835389] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-1: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2508:[2013-03-20 05:02:20.835927] W [client3_1-fops.c:1120:client3_1_getxattr_cbk] 0-RHS_VM_imagestore-client-0: remote operation failed: Permission denied. Path: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta (8bde5e62-bd6d-4cd4-ac33-49d4747f3183). Key: trusted.glusterfs.dht.linkto
rhevh6.4/var/log/glusterfs/rhev-data-center-mnt-rhs-client45.lab.eng.blr.redhat.com:_RHS__VM__imagestore.log:2509:[2013-03-20 05:02:20.835996] E [dht-helper.c:652:dht_migration_complete_check_task] 0-RHS_VM_imagestore-dht: /b949cede-515e-483f-a15a-4983e2e5241c/images/94af07c1-3b89-4403-9b89-cdaac51a4c8f/d8320b26-d471-4a79-b6d9-38022b461f95.meta: failed to get the 'linkto' xattr Permission denied
</snip2>

From the above we can see that the clients are getting Permission denied when they perform a getxattr fop for the 'linkto' xattr. This was caused by the ACL translator incorrectly checking for permissions on an already-opened fd. This has been fixed in RHS glusterfs version 3.3.0.7 as part of the fix for bug 918567.
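To re-verify this on a newer build, a minimal sketch for checking whether the linkto xattr lookup still fails through the mount; the mountpoint, brick, and image path placeholders below are illustrative, not taken from the sosreports:

---------------------------------------------------------------------
# on a hypervisor, through the fuse mount (use the .meta path reported in the logs)
getfattr -n trusted.glusterfs.dht.linkto -e text \
  /rhev/data-center/mnt/<mountpoint>/b949cede-515e-483f-a15a-4983e2e5241c/images/<image-id>/<vol-id>.meta

# on an RHS server, directly against the brick copy of the same file
getfattr -n trusted.glusterfs.dht.linkto -e text /rhs/<brick>/<volume>/<same relative path>

# "No such attribute" is normal when the file is not a link file / not under migration;
# "Permission denied" from the mount while the brick-side query succeeds would point at
# the same ACL-translator check described above.
---------------------------------------------------------------------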
Is this bug still reproducible on version 3.3.0.7? The fix for bug 918567 has been merged to handle permission-denied issues for getxattr calls.
(In reply to comment #17)
> Is this bug still reproducible on version 3.3.0.7? The fix for bug 918567
> has been merged to handle permission-denied issues for getxattr calls.

The next round of the same test, on RHS-2.0-20130320.2-RHS-x86_64-DVD1.iso (glusterfs*-3.3.0.7rhs-1.el6rhs.x86_64), led to the Data Center and the Storage being unavailable for a long time, with the VMs initially accessible but not accessible after shutdown. That issue is reported in Bug 928054.
Issue still reproducible on glusterfs*3.4.0.8rhs-1.el6rhs.x86_64 During the multiple runs of remove-brick operations, it was observed that the VMs were brought to paused state in those operations that involved data migration. Environment: RHEV+RHS RHEVM: 3.2.0-10.21.master.el6ev Hypervisor: RHEL 6.4 RHS: 4 nodes running gluster*3.4.0.8rhs-1.el6rhs.x86_64 Volume Name: RHEV-BigBend_extra --------------------------------------------------------------------- [Thu May 16 17:08:23 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra start volume remove-brick start: success ID: ec497c4e-4b5d-4761-990e-a6a080797116 [Thu May 16 17:10:27 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 completed 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 17 0 completed 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client15.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 [Thu May 16 17:18:45 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra commit Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y volume remove-brick commit: success [Thu May 16 17:19:19 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra start volume remove-brick start: success ID: ce758707-d15d-4a6c-b01c-693ff4e140b6 [Thu May 16 17:19:51 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 16 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 3 2.0MB 7 0 in progress 9.00 ..... 
[Thu May 16 17:28:54 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 16 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 4 5.0GB 16 0 in progress 576.00 [Thu May 16 17:29:27 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 16 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 6 30.0GB 18 0 completed 648.00 [Thu May 16 17:32:04 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra commit Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y volume remove-brick commit: success [Thu May 16 17:37:01 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra start volume remove-brick start: success ID: c6231879-f33e-43f3-b1cf-b095fc351149 [Thu May 16 17:44:22 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 completed 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 16 0 completed 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client15.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 [Thu May 16 17:44:34 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra commit Removing brick(s) can result in data loss. Do you want to Continue? 
(y/n) y volume remove-brick commit: success [Thu May 16 17:45:48 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra start volume remove-brick start: success ID: 0eae71da-3866-4ba6-bc60-32e7134cffb1 [Thu May 16 17:46:26 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 11 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 2 1.0MB 6 0 in progress 4.00 .... [Thu May 16 17:51:28 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 11 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 5 30.0GB 12 0 completed 381.00 [Thu May 16 17:53:02 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra commit Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y volume remove-brick commit: success ---------------------------------------------------------------------
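Not part of the original test steps, but a hedged triage check: before running 'remove-brick ... commit' in a sequence like the one above, it may be worth confirming that nothing except the .glusterfs metadata remains on the bricks being removed, since anything still listed there is lost by the commit. A minimal sketch, assuming a brick path like the ones used in this run:

---------------------------------------------------------------------
# run on each server whose brick is being removed, after status shows 'completed'
BRICK=/rhs/brick9/RHEV-BigBend_extra
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print
# an empty listing (apart from zero-byte ---------T link files) suggests migration drained the brick
---------------------------------------------------------------------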
The VMs that were brought to a paused state, described in comment 19, were not recoverable even after forcefully shutting them down. On trying to start the VMs, the following type of messages were displayed, and the VMs remained down.

---------------------------------------------------------------------
2013-May-16, 18:32  Failed to run VM virtBB03 (User: admin@internal). 35d701bd oVirt
2013-May-16, 18:32  Failed to run VM virtBB03 on Host rhs-gp-srv15. 35d701bd oVirt
2013-May-16, 18:32  VM virtBB03 is down. Exit message: 'truesize'. oVirt
2013-May-16, 18:32  VM virtBB03 was started by admin@internal (Host: rhs-gp-srv15). 35d701bd oVirt
---------------------------------------------------------------------

So this Bug has led to *VM data corruption* and *VM loss*.
Additional Info:

The VMs that were brought to a paused state, described in comment 19, are also no longer removable, and the status of the disks of the VMs is shown as 'Illegal'.
These error messages are seen at the tail end of the logs. The current graph count is 6.

[2013-05-16 12:23:26.243298] E [client-handshake.c:1741:client_query_portmap_cbk] 3-RHEV-BigBend_extra-client-14: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2013-05-16 12:23:26.243361] W [socket.c:515:__socket_rwv] 3-RHEV-BigBend_extra-client-14: readv on 10.70.36.39:24007 failed (No data available)
[2013-05-16 12:23:26.247349] E [client-handshake.c:1741:client_query_portmap_cbk] 4-RHEV-BigBend_extra-client-14: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2013-05-16 12:23:26.247401] W [socket.c:515:__socket_rwv] 4-RHEV-BigBend_extra-client-14: readv on 10.70.36.39:24007 failed (No data available)
[2013-05-16 12:23:26.264549] W [socket.c:515:__socket_rwv] 5-RHEV-BigBend_extra-client-13: readv on 10.70.36.28:24007 failed (No data available)
[2013-05-16 12:23:26.264582] I [client.c:2103:client_rpc_notify] 5-RHEV-BigBend_extra-client-13: disconnected from 10.70.36.28:24007. Client process will keep trying to connect to glusterd until brick's port is available.
[2013-05-16 12:23:26.268634] W [socket.c:515:__socket_rwv] 3-RHEV-BigBend_extra-client-12: readv on 10.70.36.69:24007 failed (No data available)
[2013-05-16 12:23:26.272808] W [socket.c:515:__socket_rwv] 4-RHEV-BigBend_extra-client-12: readv on 10.70.36.69:24007 failed (No data available)
[2013-05-16 12:23:27.277194] W [socket.c:515:__socket_rwv] 4-RHEV-BigBend_extra-client-13: readv on 10.70.36.61:24007 failed (No data available)
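My reading (an assumption, not confirmed from these logs) is that client-12 through client-14 in the older graphs map to bricks that are no longer part of the volume, so the retries themselves may be expected. A minimal sketch of how one might confirm that, using the same volume name; the volfile path and name can vary by version:

---------------------------------------------------------------------
# check whether any brick process is actually down, as the log message suggests
gluster volume status RHEV-BigBend_extra

# on one of the RHS servers, map client-N names back to bricks via the generated fuse volfile
grep -A2 "RHEV-BigBend_extra-client-1[234]" \
  /var/lib/glusterd/vols/RHEV-BigBend_extra/trusted-RHEV-BigBend_extra-fuse.vol
---------------------------------------------------------------------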
The issue seen in release glusterfs*3.4.0.8rhs-1.el6rhs.x86_64 might be a duplicate of bug 963896. That regression could cause data to be migrated away from subvolumes that are not actually under decommission, while the bricks being removed are not drained; on a commit, the data still present on the removed bricks would then be lost.
The fix for bug 963896 is merged downstream. The issue was that remove-brick was marking the incorrect brick/subvolume as decommissioned, which leads to data loss after a remove-brick commit.
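Not from the fix itself, but a rough way to sanity-check on a test volume which replica pair actually got marked for decommission. My understanding (an assumption, not stated in this bug) is that 'remove-brick ... start' records the pair through the dht 'decommissioned-bricks' option in the generated volfiles:

---------------------------------------------------------------------
# after 'remove-brick ... start', on one of the RHS servers (VOLNAME is a placeholder):
grep -r "decommissioned-bricks" /var/lib/glusterd/vols/VOLNAME/
# the subvolumes listed here should correspond exactly to the replica pair passed to remove-brick
---------------------------------------------------------------------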
Issue remains. VMs irrecoverably went into paused state, during and after remove-brick operations, on 10X2 Distribute-Replicate volume, used as image-store in RHEVM+RHS environment with - RHEVM 3.3 : 3.3.0-0.4.master.el6ev RHS 2.1 : 3.4.0.12rhs-1.el6rhs.x86_64 Hypervisor: RHEL 6.4 + glusterfs*-3.4.0.12rhs-1.el6rhs.x86_64 Two separate runs were attempted during the test. In the first run, the 'remove-brick start' command finished very quickly, and the VMs went into paused state after the 'remove-brick commit' command was run. It seems that the data from the bricks removed was failed to be migrated. The VMs were irrecoverable from the paused state, or after powered down. Given below is the output of commands from the RHS server. --------------------------------------------------------------------------- [root@rhs-client45 ~]# gluster volume info Volume Name: BendVol Type: Distributed-Replicate Volume ID: c2158e6b-4072-417a-8259-b9b073e0c3c4 Status: Started Number of Bricks: 10 x 2 = 20 Transport-type: tcp Bricks: Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick4: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick5: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick6: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick7: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick8: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick9: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick10: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick11: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick12: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick13: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick14: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick15: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick16: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick17: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Brick18: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Brick19: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Brick20: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 network.remote-dio: enable cluster.eager-lock: enable performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off [root@rhs-client45 ~]# ls /rhs/brick8/BendVol/ 1071de86-7917-48a7-8063-7a9cc82c598f/ __DIRECT_IO_TEST__ .glusterfs/ [root@rhs-client45 ~]# ls /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/ 2146b854-28a9-473e-80ad-47224a798619 51af6f3d-4e84-4cf1-8a1f-2762e4b42487 94382a3a-8554-43cc-a3ee-ddc2b4d595ef d81d9131-2c29-42d3-9bfa-d7071030e739 225cb764-22aa-435d-b802-794cc5e1bdc7 8f775a74-e300-4512-9ed8-a4d5b852eb3d 95c77039-aca1-43c4-bf20-46852fe6d1de [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/51af6f3d-4e84-4cf1-8a1f-2762e4b42487/ total 12G -rw-rw----. 2 vdsm kvm 12G Jun 28 12:18 42ee340f-3086-4721-b726-4adf5966c9f6 [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/8f775a74-e300-4512-9ed8-a4d5b852eb3d/ total 15G -rw-rw----. 
2 vdsm kvm 15G Jun 28 12:18 6840ccf1-9787-44e8-9907-6cad97a6e2ea [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/95c77039-aca1-43c4-bf20-46852fe6d1de/ total 4.0K -rw-r--r--. 2 vdsm kvm 274 Jun 27 18:43 3a55e8ef-5a4d-4358-ac33-436f2cba2a13.meta [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/BendVol start volume remove-brick start: success ID: 9144c90b-f578-4129-a8fc-be10fff1dd2c [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/BendVol status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 1 0Bytes 2 0 completed 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 34 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/BendVol status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 1 0Bytes 2 0 completed 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 34 0 completed 0.00 rhs-client15.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/BendVol commit Removing brick(s) can result in data loss. Do you want to Continue? 
(y/n) y volume remove-brick commit: success [root@rhs-client45 ~]# gluster volume info Volume Name: BendVol Type: Distributed-Replicate Volume ID: c2158e6b-4072-417a-8259-b9b073e0c3c4 Status: Started Number of Bricks: 9 x 2 = 18 Transport-type: tcp Bricks: Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick4: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick5: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick6: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick7: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick8: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick9: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick10: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick11: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick12: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick13: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick14: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick15: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick16: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick17: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Brick18: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 network.remote-dio: enable cluster.eager-lock: enable performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off [root@rhs-client45 ~]# gluster volume status Status of volume: BendVol Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 9702 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 9786 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 10080 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/Be ndVol 49155 Y 10217 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 9711 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 9795 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 10089 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/Be ndVol 49156 Y 10226 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 9720 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 9804 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 10098 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/Be ndVol 49157 Y 10235 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 9729 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 9813 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 10107 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/Be ndVol 49158 Y 10244 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/B endVol 49159 Y 10116 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/Be ndVol 49159 Y 10253 NFS Server on localhost 2049 Y 19422 Self-heal Daemon on localhost N/A Y 19429 NFS Server on 9c1c5f38-19d0-475e-897c-d88f651a54ba 2049 Y 19440 Self-heal Daemon on 9c1c5f38-19d0-475e-897c-d88f651a54b a N/A Y 19447 NFS Server on 49257f07-7344-4b00-9ff3-544959419579 2049 Y 20241 Self-heal Daemon on 49257f07-7344-4b00-9ff3-54495941957 9 N/A Y 20248 NFS 
Server on 22b94b39-514d-4986-8e37-36322a08b9c1 2049 Y 20112 Self-heal Daemon on 22b94b39-514d-4986-8e37-36322a08b9c 1 N/A Y 20119 There are no active volume tasks [root@rhs-client45 ~]# -------------------------------------------------------------^C [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/ total 0 drwxr-xr-x. 2 vdsm kvm 6 Jun 27 19:57 2146b854-28a9-473e-80ad-47224a798619 drwxr-xr-x. 2 vdsm kvm 6 Jun 27 18:41 225cb764-22aa-435d-b802-794cc5e1bdc7 drwxr-xr-x. 2 vdsm kvm 49 Jun 27 18:53 51af6f3d-4e84-4cf1-8a1f-2762e4b42487 drwxr-xr-x. 2 vdsm kvm 49 Jun 27 18:40 8f775a74-e300-4512-9ed8-a4d5b852eb3d drwxr-xr-x. 2 vdsm kvm 6 Jun 27 18:55 94382a3a-8554-43cc-a3ee-ddc2b4d595ef drwxr-xr-x. 2 vdsm kvm 54 Jun 27 18:43 95c77039-aca1-43c4-bf20-46852fe6d1de drwxr-xr-x. 2 vdsm kvm 6 Jun 27 18:38 d81d9131-2c29-42d3-9bfa-d7071030e739 [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/51af6f3d-4e84-4cf1-8a1f-2762e4b42487/ total 12G -rw-rw----. 2 vdsm kvm 12G Jun 28 12:26 42ee340f-3086-4721-b726-4adf5966c9f6 [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/8f775a74-e300-4512-9ed8-a4d5b852eb3d/ total 15G -rw-rw----. 2 vdsm kvm 15G Jun 28 12:26 6840ccf1-9787-44e8-9907-6cad97a6e2ea [root@rhs-client45 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/95c77039-aca1-43c4-bf20-46852fe6d1de/ total 4.0K -rw-r--r--. 2 vdsm kvm 274 Jun 27 18:43 3a55e8ef-5a4d-4358-ac33-436f2cba2a13.meta --------------------------------------------------------------------------- Given below are the messages noticed on the RHEVM UI during the events, and finally when one of the VMs was removed. --------------------------------------------------------------------------- ID 139 Time 2013-Jun-28, 12:27 Message VM Snaf4 has paused due to unknown storage error. ID 139 Time 2013-Jun-28, 12:27 Message VM VQ2 has paused due to unknown storage error. ID 119 Time 2013-Jun-28, 12:33 Message VM VQ2 is down. Exit message: 'truesize'. ID 119 Time 2013-Jun-28, 12:34 Message VM Snaf4 is down. Exit message: 'truesize'. --------------------------------------------------------------------------- On the second run of the test, the 'remove-brick start' command took some time to finish. But during its run, the VM got irrecoverably paused. The 'remove-brick commit' was run after the completion of the first stage. The VM was successfully resumed from the p[paused state, but after a shut down, the VM was irrecoverable. Given below is the output of commands from the RHS server. --------------------------------------------------------------------------- [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/ 1071de86-7917-48a7-8063-7a9cc82c598f/ __DIRECT_IO_TEST__ .glusterfs/ [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/ total 0 drwxr-xr-x. 2 vdsm kvm 6 Jun 27 19:57 2146b854-28a9-473e-80ad-47224a798619 drwxr-xr-x. 2 vdsm kvm 54 Jun 27 18:42 225cb764-22aa-435d-b802-794cc5e1bdc7 drwxr-xr-x. 2 vdsm kvm 55 Jun 28 12:27 51af6f3d-4e84-4cf1-8a1f-2762e4b42487 drwxr-xr-x. 2 vdsm kvm 54 Jun 27 18:42 8f775a74-e300-4512-9ed8-a4d5b852eb3d drwxr-xr-x. 2 vdsm kvm 6 Jun 27 18:55 94382a3a-8554-43cc-a3ee-ddc2b4d595ef drwxr-xr-x. 2 vdsm kvm 6 Jun 28 12:27 95c77039-aca1-43c4-bf20-46852fe6d1de drwxr-xr-x. 
2 vdsm kvm 55 Jun 27 18:43 d81d9131-2c29-42d3-9bfa-d7071030e739 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/225cb764-22aa-435d-b802-794cc5e1bdc7/ total 4.0K -rw-r--r--. 2 vdsm kvm 274 Jun 27 18:42 6fdc1ab8-7f03-4726-9b07-2ef528f8132e.meta [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/51af6f3d-4e84-4cf1-8a1f-2762e4b42487/ total 1.0M -rw-rw----. 2 vdsm kvm 1.0M Jun 27 18:55 42ee340f-3086-4721-b726-4adf5966c9f6.lease [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/8f775a74-e300-4512-9ed8-a4d5b852eb3d/ total 0 ---------T. 2 root root 0 Jun 27 18:42 6840ccf1-9787-44e8-9907-6cad97a6e2ea.meta [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/d81d9131-2c29-42d3-9bfa-d7071030e739/ total 1.0M -rw-rw----. 2 vdsm kvm 1.0M Jun 27 18:43 52e9b2b6-f7c0-4e50-b333-51413571e487.lease [root@rhs-client45 ~]# -----------------------------------------------------------------------^C [root@rhs-client45 ~]# gluster volume info Volume Name: BendVol Type: Distributed-Replicate Volume ID: c2158e6b-4072-417a-8259-b9b073e0c3c4 Status: Started Number of Bricks: 9 x 2 = 18 Transport-type: tcp Bricks: Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick4: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick5: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick6: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick7: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick8: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick9: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick10: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick11: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick12: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick13: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick14: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick15: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick16: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick17: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Brick18: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 network.remote-dio: enable cluster.eager-lock: enable performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off [root@rhs-client45 ~]# gluster volume status Status of volume: BendVol Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 9702 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 9786 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 10080 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/Be ndVol 49155 Y 10217 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 9711 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 9795 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 10089 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/Be ndVol 49156 Y 10226 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 9720 Brick 
rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 9804 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 10098 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/Be ndVol 49157 Y 10235 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 9729 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 9813 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 10107 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/Be ndVol 49158 Y 10244 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/B endVol 49159 Y 10116 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/Be ndVol 49159 Y 10253 NFS Server on localhost 2049 Y 19422 Self-heal Daemon on localhost N/A Y 19429 NFS Server on 49257f07-7344-4b00-9ff3-544959419579 2049 Y 20241 Self-heal Daemon on 49257f07-7344-4b00-9ff3-54495941957 9 N/A Y 20248 NFS Server on 22b94b39-514d-4986-8e37-36322a08b9c1 2049 Y 20112 Self-heal Daemon on 22b94b39-514d-4986-8e37-36322a08b9c 1 N/A Y 20119 NFS Server on 9c1c5f38-19d0-475e-897c-d88f651a54ba 2049 Y 19440 Self-heal Daemon on 9c1c5f38-19d0-475e-897c-d88f651a54b a N/A Y 19447 There are no active volume tasks [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol start volume remove-brick start: success ID: c9d75e81-f6ec-4967-8e6b-2b24559d51bc [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client15.lab.eng.blr.redhat.com 6 2.0MB 13 0 in progress 9.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 30 0 completed 0.00 [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client15.lab.eng.blr.redhat.com 15 12.0GB 36 0 completed 239.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 30 0 completed 0.00 [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol commit Removing brick(s) can result in data loss. Do you want to Continue? 
(y/n) y volume remove-brick commit: success [root@rhs-client45 ~]# gluster volume remove-brick BendVol rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/BendVol rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/BendVol status Node Rebalanced-files size scanned failures status run-time in secs --------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 not started 0.00 rhs-client37.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client15.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 rhs-client4.lab.eng.blr.redhat.com 0 0Bytes 0 0 not started 0.00 [root@rhs-client45 ~]# gluster volume info Volume Name: BendVol Type: Distributed-Replicate Volume ID: c2158e6b-4072-417a-8259-b9b073e0c3c4 Status: Started Number of Bricks: 8 x 2 = 16 Transport-type: tcp Bricks: Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick4: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/BendVol Brick5: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick6: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick7: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick8: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/BendVol Brick9: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick10: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick11: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick12: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/BendVol Brick13: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick14: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick15: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Brick16: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/BendVol Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 network.remote-dio: enable cluster.eager-lock: enable performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off [root@rhs-client45 ~]# gluster volume status Status of volume: BendVol Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 9702 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 9786 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/B endVol 49155 Y 10080 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/Be ndVol 49155 Y 10217 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 9711 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 9795 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/B endVol 49156 Y 10089 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/Be ndVol 49156 Y 10226 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 9720 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 9804 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/B endVol 49157 Y 10098 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/Be ndVol 49157 Y 10235 Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 9729 Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 9813 Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick7/B endVol 49158 Y 10107 Brick rhs-client4.lab.eng.blr.redhat.com:/rhs/brick7/Be ndVol 49158 Y 10244 NFS Server on localhost 2049 Y 19656 Self-heal Daemon on localhost 
N/A Y 19663 NFS Server on 9c1c5f38-19d0-475e-897c-d88f651a54ba 2049 Y 19634 Self-heal Daemon on 9c1c5f38-19d0-475e-897c-d88f651a54b a N/A Y 19641 NFS Server on 49257f07-7344-4b00-9ff3-544959419579 2049 Y 20426 Self-heal Daemon on 49257f07-7344-4b00-9ff3-54495941957 9 N/A Y 20433 NFS Server on 22b94b39-514d-4986-8e37-36322a08b9c1 2049 Y 20304 Self-heal Daemon on 22b94b39-514d-4986-8e37-36322a08b9c 1 N/A Y 20311 There are no active volume tasks [root@rhs-client15 ~]# ---------------------------------------------------^C [root@rhs-client15 ~]# [root@rhs-client15 ~]# [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/2146b854-28a9-473e-80ad-47224a798619/ total 0 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/225cb764-22aa-435d-b802-794cc5e1bdc7/ total 0 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/51af6f3d-4e84-4cf1-8a1f-2762e4b42487/ total 0 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/8f775a74-e300-4512-9ed8-a4d5b852eb3d/ total 0 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/94382a3a-8554-43cc-a3ee-ddc2b4d595ef/ total 0 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/95c77039-aca1-43c4-bf20-46852fe6d1de/ total 0 [root@rhs-client15 ~]# ls -lh /rhs/brick8/BendVol/1071de86-7917-48a7-8063-7a9cc82c598f/images/d81d9131-2c29-42d3-9bfa-d7071030e739/ total 0 [root@rhs-client15 ~]# --------------------------------------------------------------------------- Given below are the messages noticed on the RHEVM UI during the events. There was no error during removal of VM in this case. --------------------------------------------------------------------------- ID 139 Time 2013-Jun-28, 12:40 Message VM VQ1 has paused due to unknown storage error. ID 119 Time 2013-Jun-28, 12:49 Message VM VQ1 is down. Exit message: 'truesize'. ---------------------------------------------------------------------------
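In the listings above, entries like the zero-byte '---------T. 2 root root 0 ... 6840ccf1-9787-44e8-9907-6cad97a6e2ea.meta' have the mode bits of DHT link files (sticky bit only, no data) rather than real content. Not part of the original report, but a minimal sketch for telling link files apart from unmigrated data on a brick, assuming a brick path like the ones shown:

---------------------------------------------------------------------
BRICK=/rhs/brick8/BendVol
# zero-byte, sticky-bit-only entries are DHT link files; they should carry a linkto xattr
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -perm 1000 -size 0 -print |
while read -r f; do
    getfattr -n trusted.glusterfs.dht.linkto -e text "$f" 2>/dev/null
done
---------------------------------------------------------------------

Anything else of non-zero size on a brick that is about to be removed is real data that has not been migrated and would be lost on commit.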
This BZ was initially opened against RHS 2.0+, but it has since evolved into tracking a different issue, also caused by the remove-brick operation but valid only on RHS 2.1, in which the VMs get corrupted. BZ 983145 has therefore been opened to track the original issue afresh on RHS 2.0+; that issue is still reproducible, leads to intermittent instances of paused VMs, and may lead to loss of data that has not been synced.
Reproduced issue on glusterfs-server-3.4.0.12rhs.beta6-1.el6rhs.x86_64 The xattrs information are given below: ------------------------------------------------------------------------ [2013-07-24 11:11:49.503538] D [afr-common.c:1385:afr_lookup_select_read_child] 0-Hacker-replicate-5: Source selected as 0 for / [2013-07-24 11:11:49.503545] D [afr-common.c:1122:afr_lookup_build_response_params] 0-Hacker-replicate-5: Building lookup response from 0 [2013-07-24 11:11:49.503577] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-Hacker-replicate-4: pending_matrix: [ 0 0 ] [2013-07-24 11:11:49.503586] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-Hacker-replicate-4: pending_matrix: [ 0 0 ] [2013-07-24 11:11:49.503592] D [afr-self-heal-common.c:887:afr_mark_sources] 0-Hacker-replicate-4: Number of sources: 0 [2013-07-24 11:11:49.503598] D [afr-self-heal-data.c:929:afr_lookup_select_read_child_by_txn_type] 0-Hacker-replicate-4: returning read_child: 0 [2013-07-24 11:11:49.503605] D [afr-common.c:1385:afr_lookup_select_read_child] 0-Hacker-replicate-4: Source selected as 0 for / [2013-07-24 11:11:49.503611] D [afr-common.c:1122:afr_lookup_build_response_params] 0-Hacker-replicate-4: Building lookup response from 0 [2013-07-24 11:11:49.503654] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-Hacker-replicate-6: pending_matrix: [ 0 0 ] [2013-07-24 11:11:49.503663] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-Hacker-replicate-6: pending_matrix: [ 0 0 ] [2013-07-24 11:11:49.503670] D [afr-self-heal-common.c:887:afr_mark_sources] 0-Hacker-replicate-6: Number of sources: 0 [2013-07-24 11:11:49.503676] D [afr-self-heal-data.c:929:afr_lookup_select_read_child_by_txn_type] 0-Hacker-replicate-6: returning read_child: 1 [2013-07-24 11:11:49.503682] D [afr-common.c:1385:afr_lookup_select_read_child] 0-Hacker-replicate-6: Source selected as 1 for / [2013-07-24 11:11:49.503689] D [afr-common.c:1122:afr_lookup_build_response_params] 0-Hacker-replicate-6: Building lookup response from 1 [2013-07-24 11:11:49.503720] I [dht-rebalance.c:1106:gf_defrag_migrate_data] 0-Hacker-dht: migrate data called on / [2013-07-24 11:11:49.504432] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-0: /: no entries found in Hacker-client-0 [2013-07-24 11:11:49.504494] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-0: /: no entries found in Hacker-client-1 [2013-07-24 11:11:49.504697] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-5: /: no entries found in Hacker-client-10 [2013-07-24 11:11:49.504724] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-2: /: no entries found in Hacker-client-5 [2013-07-24 11:11:49.504920] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-5: /: no entries found in Hacker-client-11 [2013-07-24 11:11:49.504944] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-4: /: no entries found in Hacker-client-8 [2013-07-24 11:11:49.504957] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-2: /: no entries found in Hacker-client-4 [2013-07-24 11:11:49.504972] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-1: /: no entries found in Hacker-client-3 [2013-07-24 11:11:49.504998] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-3: /: no entries found in Hacker-client-6 [2013-07-24 11:11:49.505011] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-1: /: no entries 
found in Hacker-client-2 [2013-07-24 11:11:49.505027] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-3: /: no entries found in Hacker-client-7 [2013-07-24 11:11:49.505043] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-6: /: no entries found in Hacker-client-12 [2013-07-24 11:11:49.505055] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-6: /: no entries found in Hacker-client-13 [2013-07-24 11:11:49.505082] D [afr-dir-read.c:126:afr_examine_dir_readdir_cbk] 0-Hacker-replicate-4: /: no entries found in Hacker-client-9 [2013-07-24 11:11:49.505160] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-0: Returning 0, call_child: 0, last_index: -1 [2013-07-24 11:11:49.505484] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-1: Returning 0, call_child: 1, last_index: -1 [2013-07-24 11:11:49.505783] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-2: Returning 0, call_child: 0, last_index: -1 [2013-07-24 11:11:49.505958] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-3: Returning 0, call_child: 1, last_index: -1 [2013-07-24 11:11:49.506242] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-4: Returning 0, call_child: 0, last_index: -1 [2013-07-24 11:11:49.506489] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-5: Returning 0, call_child: 0, last_index: -1 [2013-07-24 11:11:49.506652] D [afr-common.c:745:afr_get_call_child] 0-Hacker-replicate-6: Returning 0, call_child: 1, last_index: -1 [2013-07-24 11:11:49.507006] I [dht-rebalance.c:1311:gf_defrag_migrate_data] 0-Hacker-dht: Migration operation on dir / took 0.00 secs ------------------------------------------------------------------------ ------------------------------------------------------------------------ [root@rhs-client4 glusterfs]# ls -l /rhs/brick*/Hacker /rhs/brick1/Hacker: total 0 drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da /rhs/brick3/Hacker: total 0 drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da /rhs/brick5/Hacker: total 0 drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da /rhs/brick7/Hacker: total 0 drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da [root@rhs-client4 glusterfs]# ls -l /rhs/brick*/Hacker/* /rhs/brick1/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da: total 0 drwxr-xr-x 2 vdsm kvm 28 Jul 24 10:30 dom_md drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master /rhs/brick3/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da: total 0 drwxr-xr-x 2 vdsm kvm 29 Jul 24 16:41 dom_md drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master /rhs/brick5/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da: total 0 drwxr-xr-x 2 vdsm kvm 34 Jul 24 16:28 dom_md drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master /rhs/brick7/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da: total 0 drwxr-xr-x 2 vdsm kvm 21 Jul 24 10:42 dom_md drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master [root@rhs-client4 glusterfs]# getfattr -m . 
-d -e hex /rhs/brick*/Hacker/ getfattr: Removing leading '/' from absolute path names # file: rhs/brick1/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000555555547ffffffd trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick3/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000d5555552ffffffff trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick5/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000007ffffffeaaaaaaa7 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick7/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000000000000000000000 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 [root@rhs-client10 glusterfs]# getfattr -m . -d -e hex /rhs/brick*/Hacker/ getfattr: Removing leading '/' from absolute path names # file: rhs/brick1/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000555555547ffffffd trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick3/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000d5555552ffffffff trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick5/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000007ffffffeaaaaaaa7 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick7/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000000000000000000000 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 [root@rhs-client15 ~]# getfattr -m . -d -e hex /rhs/brick*/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da/master/tasks/* getfattr: /rhs/brick*/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da/master/tasks/*: No such file or directory [root@rhs-client15 ~]# getfattr -m . -d -e hex /rhs/brick*/Hacker/ getfattr: Removing leading '/' from absolute path names # file: rhs/brick2/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000002aaaaaaa55555553 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick4/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick6/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000000000000000000000 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick8/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 [root@rhs-client37 ~]# getfattr -m . 
-d -e hex /rhs/brick*/Hacker/ getfattr: Removing leading '/' from absolute path names # file: rhs/brick2/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000002aaaaaaa55555553 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick4/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick6/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x00000001000000000000000000000000 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 # file: rhs/brick8/Hacker/ trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551 trusted.glusterfs.volume-id=0x50fa570f609c4bd89ebd53e4ff360c14 ------------------------------------------------------------------------
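A note on reading the trusted.glusterfs.dht values above. As far as I understand the on-disk layout format (this interpretation is an assumption, not stated in this bug), the value is four big-endian 32-bit fields: count, hash type, range start, range end. For example:

---------------------------------------------------------------------
# rhs-client4, /rhs/brick1/Hacker (assumed field layout: cnt | type | start | end)
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd
                        cnt=1   type=0   start=0x55555554   end=0x7ffffffd

# the bricks showing 0x00000001000000000000000000000000 carry a zero-length hash range,
# which would mean no part of the directory layout is assigned to that replica pair,
# as one would expect for a pair under decommission
---------------------------------------------------------------------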
Information on directory layout from the RHS servers client10, 15, and 37 was missing from comment 35. Adding it here.

---------------------------------------------------------------
[root@rhs-client10 ~]# ls -l /rhs/brick*/Hacker/*
/rhs/brick1/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 16 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick3/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 29 Jul 24 16:41 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick5/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 21 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick7/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 19 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

[root@rhs-client10 ~]# ls -l /rhs/brick*/Hacker
/rhs/brick1/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick3/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick5/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick7/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

[root@rhs-client10 ~]#
---------------------------------------------------------------

---------------------------------------------------------------
[root@rhs-client15 ~]# ls -l /rhs/brick*/Hacker/*
/rhs/brick2/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 19 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick4/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 6 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick6/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 6 Jul 24 16:04 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick8/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 18 Jul 24 16:04 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

[root@rhs-client15 ~]# ls -l /rhs/brick*/Hacker
/rhs/brick2/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick4/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick6/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick8/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

[root@rhs-client15 ~]#
---------------------------------------------------------------

---------------------------------------------------------------
[root@rhs-client37 ~]# ls -l /rhs/brick*/Hacker/*
/rhs/brick2/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 19 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick4/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 6 Jul 24 17:36 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick6/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 6 Jul 24 16:04 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

/rhs/brick8/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da:
total 0
drwxr-xr-x 2 vdsm kvm 18 Jul 24 16:04 dom_md
drwxr-xr-x 10 vdsm kvm 350 Jul 24 11:08 images
drwxr-xr-x 4 vdsm kvm 28 Jul 24 10:30 master

[root@rhs-client37 ~]# ls -l /rhs/brick*/Hacker
/rhs/brick2/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick4/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick6/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

/rhs/brick8/Hacker:
total 0
drwxr-xr-x 5 vdsm kvm 45 Jul 24 10:30 fc59491a-b214-4023-a7d3-59ff6b0f25da

[root@rhs-client37 ~]#
---------------------------------------------------------------
It looks like the rebalance/remove-brick process is not getting any entries reported in the readdir fop. I have verified the back-end bricks and can confirm that the files/directories are present on all of them. It appears that the underlying subvolume is not returning any entries.
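As a hedged sketch of how this can be cross-checked (the mount point below is a placeholder, not taken from the sos-reports), one can compare what readdir returns through the client mount with what is present directly on the bricks:

---------------------------------------------------------------
MOUNT=/path/to/fuse/mount/of/Hacker    # placeholder: client mount point of the volume
ls "$MOUNT"/fc59491a-b214-4023-a7d3-59ff6b0f25da/images

for b in /rhs/brick*/Hacker/fc59491a-b214-4023-a7d3-59ff6b0f25da/images; do
    echo "== $b =="
    ls "$b"
done
---------------------------------------------------------------

If the bricks list the image directories but the mount (or the rebalance process) sees an empty directory, that would match the behaviour described above.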
Since all earlier issue reproductions involved Red Hat Storage servers on physical systems, the issue was tested in an environment with the Red Hat Storage servers installed on Virtual Machines. The issue was *reproduced* in this set-up as well.
Spent some time on this issue today with Avati and Shishir, and it seems that this particular case has come up in Big Bend now because of 'open-behind' (and if you notice, 'gluster volume set <VOL> group virt' did not disable open-behind). With build glusterfs-3.4.0.13rhs, open-behind is off by default on the volume, so can we get a final round of testing done on this?
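A hedged sketch of how the option state can be checked and, if necessary, forced off for such a test (volume name is a placeholder; performance.open-behind is the option name assumed here):

---------------------------------------------------------------
gluster volume info <VOL>                              # open-behind shows up only if it was reconfigured
gluster volume set <VOL> performance.open-behind off   # explicitly disable it for the test run
gluster volume info <VOL>                              # confirm the option is now listed as 'off'
---------------------------------------------------------------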
Bug 988262 seems to be similar to this one.
As per Rich and Sayan's triage, removing the blocker flag
Rejy, the reason for removing this blocker is: "Introduced by open-behind; not blocker, since not default nor recommended for virtualization case"
(In reply to Sachidananda Urs from comment #48)
> Rejy, the reason for removing this blocker is:
> "Introduced by open-behind; not blocker, since not default nor recommended
> for virtualization case"

The test environment that reproduces the bug uses the 'virt' group to set the volume options, as is recommended. The rest of the volume options are left at the defaults for the particular build. No volume options are being set manually.

If the default volume options have changed in the latest build, it needs to be tested whether that change fixes the reported issue. If a fix for the reported bug is considered to be in place with any new patch, the BZ needs to be moved to 'ON_QA', with information on the build that contains the patch that fixes the issue.
(In reply to Rejy M Cyriac from comment #49)
> (In reply to Sachidananda Urs from comment #48)
> > Rejy, the reason for removing this blocker is:
> > "Introduced by open-behind; not blocker, since not default nor recommended
> > for virtualization case"
>
> The test environment that reproduces the bug uses the 'virt' group to set
> the volume options, as is recommended. The rest of the volume options are
> left at the defaults for the particular build. No volume options are being
> set manually.
>
> If the default volume options have changed in the latest build, it needs to
> be tested whether that change fixes the reported issue. If a fix for the
> reported bug is considered to be in place with any new patch, the BZ needs
> to be moved to 'ON_QA', with information on the build that contains the
> patch that fixes the issue.

I need to add that two volume options are in fact being set manually: the ones that set the user and group ownership to 36:36. These are also set as per the recommendation.

storage.owner-uid 36
storage.owner-gid 36

No other volume options are being set manually.
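For completeness, a sketch of how this configuration is applied, using only the options referred to in this report (volume name is a placeholder):

---------------------------------------------------------------
gluster volume set <VOL> group virt            # recommended virt option group
gluster volume set <VOL> storage.owner-uid 36  # ownership set to 36:36 as per the recommendation
gluster volume set <VOL> storage.owner-gid 36
gluster volume info <VOL>                      # verify the reconfigured options
---------------------------------------------------------------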
Verified that the remove-brick operation now migrates the data as expected, and the VMs stay online and available after completion of the operation.

Test environment versions:
RHS: glusterfs-server-3.4.0.21rhs-1.el6rhs.x86_64
6x2 Distribute-Replicate volume used as a Storage Domain
Red Hat Enterprise Virtualization Manager Version: 3.2.2-0.41.el6ev
RHEV-H 6.4 Hypervisor with glusterfs-3.4.0.21rhs-1.el6_4.x86_64
RHEL 6.4 Hypervisor with glusterfs-3.4.0.21rhs-1.el6_4.x86_64
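For reference, a hedged sketch of the remove-brick workflow that was exercised, with placeholder volume and brick names (on a distribute-replicate volume the bricks of a replica pair are removed together):

---------------------------------------------------------------
gluster volume remove-brick <VOL> <SERVER1>:<BRICK> <SERVER2>:<BRICK> start
gluster volume remove-brick <VOL> <SERVER1>:<BRICK> <SERVER2>:<BRICK> status   # wait until migration shows 'completed'
gluster volume remove-brick <VOL> <SERVER1>:<BRICK> <SERVER2>:<BRICK> commit
---------------------------------------------------------------

VM I/O on the storage domain should continue uninterrupted throughout, which is what was verified above.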
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html