Created attachment 698915 [details]
sosreport on which the problem occurred

Description of problem:
The gluster self-heal daemon is not operational after a few gluster operations such as add-brick and self-healing. /var/log/glusterfs/glustershd.log also contains the following error messages:

[2013-02-18 16:21:58.853780] E [options.c:166:xlator_option_validate_bool] 0-rep-qcow2-replicate-0: option eager-lock ^A: '^A' is not a valid boolean value
[2013-02-18 16:21:58.853806] W [options.c:771:xl_opt_validate] 0-rep-qcow2-replicate-0: validate of eager-lock returned -1
[2013-02-18 16:21:58.853836] E [graph.c:272:glusterfs_graph_validate_options] 0-rep-qcow2-replicate-0: validation failed: option eager-lock ^A: '^A' is not a valid boolean value
[2013-02-18 16:21:58.853861] E [graph.c:476:glusterfs_graph_activate] 0-graph: validate options failed
[2013-02-18 16:21:58.854198] W [glusterfsd.c:924:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x335100f0c5] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0xe0) [0x40ca30] (-->/usr/sbin/glusterfs(glusterfs_process_volfp+0x198) [0x405a88]))) 0-: received signum (0), shutting down

Version-Release number of selected component (if applicable):
RHS 2.0+ [glusterfs-3.3.0.5rhs40]

How reproducible:
Once. I have not tried it again.

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume.
2. Fuse mount the volume.
3. Create a qcow2 image on the volume.
4. Create a VM [appvm] using the above image as its backing store, i.e.
   virt-install --name vm1 --ram 4096 --vcpus 4 --location <iso-location> --disk path=<path-to-image-file-on-fuse-mnt>,format=qcow2,bus=virtio --vnc
5. Create a snapshot of the appvm, while saving one of the RHS nodes:
   <snapshot>  : virsh snapshot-create-as --name snap1 --domain appvm
   <saving VM> : virsh save <domain-name> --file <file-name-to-save>
   NOTE: After saving, the VM will be shut down.
6. Once the snapshot is created, create a few files from inside the VM.
7. Add a pair of bricks to the volume.
8. Start the VM which was shut down earlier after saving:
   virsh start <domain-name>
9. Start the snapshot revert on the appvm:
   virsh snapshot-revert --domain appvm --snapshotname snap1
10. Start the rebalance operation:
    gluster volume rebalance <vol-name> fix-layout start
    gluster volume rebalance <vol-name> start
11. Initiate self-heal as well:
    gluster volume heal <vol-name>
12. Check the status of the rebalance:
    gluster volume rebalance <vol-name> status
13. Check the status of healing:
    gluster volume heal <vol-name> info
14. Remove the previously added bricks again [safe removal]:
    gluster volume remove-brick <vol-name> start
15. Monitor the remove-brick operation as it moves data across bricks; the IN PROGRESS status should change to COMPLETED:
    gluster volume remove-brick <vol-name> status
16. Complete the remove-brick operation:
    gluster volume remove-brick <vol-name> commit
17. After a few days [4 in my case], try healing the volume again:
    gluster volume heal <vol-name>

Actual results:
The self-heal daemon was not operational, and cluster.eager-lock was set to a control character [^A].

Expected results:
Self-heal should happen successfully.

Additional info:
The test bed was designed as follows:
1. Create 6 logical volumes, each on a 550GB hard disk.
2. Format all the logical volumes with XFS.
3. Mount all the volumes under /home/rhsvms/rhsvm{1..6}.
4. On 3 of the logical volumes, create four 100GB raw images and one 50GB raw image.
5. On the other 3 logical volumes, create four 100GB qcow2 images and one 50GB raw image.
6. Create 3 VMs, each using the 4 raw images as additional disks [bricks] and the 50GB disk for the RHS installation.
7. Create 3 VMs, each using the 4 qcow2 images as additional disks [bricks] and the 50GB disk for the RHS installation.
8. Create a 6x2 distributed-replicate volume with the following brick pairs: [raw-raw, raw-raw, qcow2-qcow2, qcow2-qcow2, raw-qcow2, raw-qcow2]
9. Create 3 more 1x2 replica volumes, with raw-raw, raw-qcow2, and qcow2-qcow2 pairs.
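The '^A' in the glustershd.log errors suggests the generated volfile carries a raw 0x01 byte as the value of the eager-lock option. A minimal sketch of how such a corrupted boolean value could be spotted in a volfile (the volfile fragment below is fabricated for illustration; the real files live under /var/lib/glusterd/vols/<vol-name>/, and the helper name is mine, not a gluster tool):

```shell
#!/bin/sh
# Sketch: scan a volfile for an eager-lock value that is not a valid
# boolean. cat -v renders stray control bytes visibly (0x01 shows as ^A).
find_bad_eager_lock() {
    grep -n 'option eager-lock' "$1" \
      | grep -Ev 'eager-lock +(on|off|yes|no|true|false|enable|disable) *$' \
      | cat -v
}

# Fabricated volfile fragment reproducing the corruption seen in the log:
# eager-lock set to a raw 0x01 byte instead of a boolean.
volfile=$(mktemp)
printf 'volume rep-qcow2-replicate-0\n    type cluster/replicate\n    option eager-lock \001\nend-volume\n' > "$volfile"

find_bad_eager_lock "$volfile"   # prints: 3:    option eager-lock ^A
rm -f "$volfile"
```

A clean volfile (eager-lock set to on/off/enable/disable) produces no output from the helper, so it could be run across all volfiles on a node to locate the mangled one.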
Gluster volume info command output:

Volume Name: distrep-vmstore
Type: Distributed-Replicate
Volume ID: b0df694d-aaea-4a2e-8ffd-7773dda51177
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.105:/bricks/dist-repl-brick1
Brick2: 10.70.37.157:/bricks/dist-repl-brick1
Brick3: 10.70.37.105:/bricks/spare-brick1
Brick4: 10.70.37.157:/bricks/spare-brick1
Brick5: 10.70.37.162:/bricks/dist-repl-brick1
Brick6: 10.70.37.112:/bricks/dist-repl-brick1
Brick7: 10.70.37.162:/bricks/spare-brick1
Brick8: 10.70.37.112:/bricks/spare-brick1
Brick9: 10.70.37.150:/bricks/dist-repl-brick1
Brick10: 10.70.37.124:/bricks/dist-repl-brick1
Brick11: 10.70.37.150:/bricks/spare-brick1
Brick12: 10.70.37.124:/bricks/spare-brick1
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
storage.linux-aio: disable
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
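Assuming the corrupted value can simply be overwritten through the normal CLI (an assumption on my part, not something verified in this report), one possible recovery path would be to re-set cluster.eager-lock to a valid boolean, which regenerates the volfiles, and then respawn the self-heal daemon. Sketched here as a dry run that only prints the commands; VOL defaults to the volume name from the info output above:

```shell
#!/bin/sh
# Hypothetical recovery sketch, not a verified procedure: rewrite the
# mangled cluster.eager-lock option and restart the self-heal daemon.
# Dry run by default; set APPLY=1 to actually execute the commands.
VOL=${VOL:-distrep-vmstore}

run() { echo "+ $*"; if [ -n "$APPLY" ]; then "$@"; fi; }

run gluster volume set "$VOL" cluster.eager-lock enable
# 'volume start ... force' respawns the volume's daemons (including
# glustershd) without disturbing bricks that are already running.
run gluster volume start "$VOL" force
run gluster volume heal "$VOL" info
```

If the self-heal daemon still refuses to start with the same validation error after this, the volfile on disk would need to be inspected directly.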
Targeting for Arches.
Per 04-10-2013 Storage bug triage meeting, targeting for Big Bend.
We were not able to reproduce this bug. Can you please provide the steps to reproduce it?
Hi Venkatesh, I also tried to reproduce this issue, but was not able to.
Since QE is unable to reproduce this bug, closing it for now. If it is found again in the future, we shall take it up.