Description of problem:
=======================
On a 3-node brick-mux enabled cluster, on a replica 3 volume with the group profile set to 'gluster-block', running gluster-block create/delete in a loop intermittently fails for one seemingly random block in the middle of the loop, with this error:

"Version check failed between block servers. (host 10.70.46.176 returned -1)"

All the block creates/deletes before and after the affected block succeed. After the loop finishes its run, retriggering the same command for the block that failed succeeds.

I could not get much from the gluster-block logs, and I don't see a pattern in the failures. But it has happened about 3 times in the past week. Raising a relatively low priority bug for now, as the same command succeeds when attempted again.

Sosreports were collected immediately after the latest occurrence and will be copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/swetas/<bugnumber>.

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.8.4-54.13
gluster-block-0.2.1-20
tcmu-runner-1.2.0-20

How reproducible:
================
Intermittent

Additional info:
===============
[root@dhcp46-50 ozone0]# gluster-block list ozone
ob2
ob3
ob4
ob5
ob6
ob7
[root@dhcp46-50 ozone0]# for i in {2..7}; do gluster-block delete ozone/ob$i; done
SUCCESSFUL ON: 10.70.46.50 10.70.46.176 10.70.46.102
RESULT: SUCCESS
SUCCESSFUL ON: 10.70.46.50 10.70.46.176 10.70.46.102
RESULT: SUCCESS
Version check failed between block servers. (host 10.70.46.176 returned -1)
RESULT:FAIL
SUCCESSFUL ON: 10.70.46.50 10.70.46.102 10.70.46.176
RESULT: SUCCESS
SUCCESSFUL ON: 10.70.46.102 10.70.46.50 10.70.46.176
RESULT: SUCCESS
SUCCESSFUL ON: 10.70.46.176 10.70.46.102 10.70.46.50
RESULT: SUCCESS
[root@dhcp46-50 ozone0]# gluster-block list ozone
ob4
[root@dhcp46-50 ozone0]# gluster-block info ozone/ob4
NAME: ob4
VOLUME: ozone
GBID: 2e58d99f-c212-4d85-9390-b4d017d1a544
SIZE: 448.0 MiB
HA: 3
PASSWORD: b109ffff-3898-4fbe-97fb-9bab52f919db
EXPORTED ON: 10.70.46.50 10.70.46.102 10.70.46.176
[root@dhcp46-50 ozone0]#
[root@dhcp46-50 ozone0]#
[root@dhcp46-50 ozone0]# gluster-block delete ozone/ob4
SUCCESSFUL ON: 10.70.46.50 10.70.46.102 10.70.46.176
RESULT: SUCCESS
[root@dhcp46-50 ozone0]#
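Possible stopgap for test loops:
Since a second attempt of the same command succeeds, a loop could retry the failed block instead of leaving it behind. The following is only a sketch under assumptions, not a fix for the underlying version check failure: run_until_success and RETRY_DELAY are hypothetical names that are not part of gluster-block, and success is detected by looking for the "RESULT: SUCCESS" string seen in the output above rather than relying on the exit status, which this report does not show.

```shell
# Hypothetical helper (not part of gluster-block): rerun a command until its
# output contains "RESULT: SUCCESS" or the attempt limit is reached.
# Usage: run_until_success <max_attempts> <command> [args...]
# Note: plain POSIX sh, so the working variables below are global.
run_until_success() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Capture stdout and stderr so the result line can be inspected.
        output=$("$@" 2>&1)
        printf '%s\n' "$output"
        case $output in
            *"RESULT: SUCCESS"*) return 0 ;;
        esac
        # Pause before the next attempt (RETRY_DELAY is an assumed knob,
        # defaulting to 1 second); skip the pause after the final attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "${RETRY_DELAY:-1}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example use with the loop from this report (left commented out here):
# for i in 2 3 4 5 6 7; do
#     run_until_success 3 gluster-block delete "ozone/ob$i"
# done
```

The helper returns 0 as soon as one attempt reports success and 1 if every attempt fails, so a calling script can still flag blocks that never go through.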
This must be fixed in 3.10. Giving devel-ack. Please provide QE-ack.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691