Bug 1601341 - On a 4 node setup heketi blockvolume creation with HA=3 fails when one node is powered off
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Sven Anderson
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1568862
Reported: 2018-07-16 06:57 UTC by Neha Berry
Modified: 2018-09-12 09:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-12 09:23:49 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | heketi heketi pull 1280 | 0 | None | None | None | 2018-07-24 14:48:58 UTC
Red Hat Bugzilla | 1596035 | 0 | unspecified | CLOSED | On a 4 node setup heketi block volume creation fails when a node is powered off | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHEA-2018:2686 | 0 | None | None | None | 2018-09-12 09:25:05 UTC

Internal Links: 1596035

Description Neha Berry 2018-07-16 06:57:25 UTC
Description of problem:
++++++++++++++++++++++++++
We had a 4-node CNS cluster. We created a blockvolume with HA=4, and the creation succeeded. After this, we brought down one of the 4 nodes.
With 3 nodes still up and running, we tried creating blockvolumes with HA=3, HA=2, and HA=1 in separate attempts. All creation requests failed with the following error message:
 
# date ; heketi-cli blockvolume create --size=10 --name=volHA3 --ha=3; date
Mon Jul 16 10:37:19 IST 2018
Error: insufficient block hosts online
Mon Jul 16 10:37:27 IST 2018

+++++++++++++++++++++++++++++++++++++++
Note: This bug appears to be related to BZ#1595531.
+++++++ 
https://bugzilla.redhat.com/show_bug.cgi?id=1595531

We noted down the host list used by heketi during blockvolume creation. The order of nodes passed by heketi is:
+++++++++
Hosts: [10.70.46.181 10.70.46.132 10.70.46.233 10.70.46.150]

Hence, we tried some corner cases, mainly for creating HA=3 volumes, with the following results:
i) On bringing down any one of the first 3 nodes (10.70.46.181, 10.70.46.132, 10.70.46.233), blockvolume creation with HA=3 fails.
ii) On bringing down the last host in the list (10.70.46.150), no issue was observed and volumes with HA=3 were created successfully.
iii) With the 1st node in the list brought down, blockvolume creation with HA=1 and HA=2 failed.
iv) With the 3rd node in the list brought down, HA=1 and HA=2 succeeded, but HA=3 still failed.

Hence, the issue can go unnoticed: if the node that is down happens to be the last node in the above list, creating an HA=3 volume still succeeds.

++++++++++++++++++++++++++++++++++++++++

# heketi-cli blockvolume info  844f58788fc0339e659ed8c621d62da5
Name: volHA4
Size: 10
Volume Id: 844f58788fc0339e659ed8c621d62da5
Cluster Id: 3fbc2bfba517118dcf3fa4a29bda4a19
Hosts: [10.70.46.181 10.70.46.132 10.70.46.233 10.70.46.150]
IQN: iqn.2016-12.org.gluster-block:34858a17-be29-46d3-939d-1d919150104c
LUN: 0
Hacount: 4
Username: 
Password: 
Block Hosting Volume: 983d934c4f9a1c69843892542e290f6f

++++++++++++++++++++++++++++++++++++++


Version-Release number of selected component (if applicable):
++++++++++++++++++++++++++++++++++

# oc rsh heketi-storage-1-r44vr rpm -qa|grep heketi
python-heketi-7.0.0-3.el7rhgs.x86_64
heketi-client-7.0.0-3.el7rhgs.x86_64
heketi-7.0.0-3.el7rhgs.x86_64




How reproducible:
+++++++++++++++++++
3x3

Steps to Reproduce:
1. Create a 4 node CNS setup and confirm that all gluster-blockd services are UP and running.

2. Create a blockvolume with HA=4. The blockvolume creation succeeds. Note the order of hosts in the "Hosts" list of the blockvolume info.

# date ; heketi-cli blockvolume create --size=10 --name=volHA4 --ha=4; date
Mon Jul 16 10:22:47 IST 2018
Name: volHA4
Size: 10
Volume Id: 844f58788fc0339e659ed8c621d62da5
Cluster Id: 3fbc2bfba517118dcf3fa4a29bda4a19
Hosts: [10.70.46.181 10.70.46.132 10.70.46.233 10.70.46.150]
IQN: iqn.2016-12.org.gluster-block:34858a17-be29-46d3-939d-1d919150104c
LUN: 0
Hacount: 4
Username: 
Password: 
Block Hosting Volume: 983d934c4f9a1c69843892542e290f6f
Mon Jul 16 10:23:54 IST 2018


E.g., nodes A, B, C and D, in the order in which they appear in the Hosts[] list:
    Hosts: [A B C D]


3. Bring down one CNS node, preferably any of the first 3 nodes (A, B, C) in the above list.

4. Try creating a blockvolume with HA=3
# date ; heketi-cli blockvolume create --size=10 --name=volHA3 --ha=3; date
Mon Jul 16 10:37:19 IST 2018
Error: insufficient block hosts online
Mon Jul 16 10:37:27 IST 2018

 

Actual results:
+++++++++++++++
Blockvolume creation with HA=3 fails even though 3 nodes are UP in a 4-node CNS cluster.

Expected results:
+++++++++++++
Heketi should be able to pick the 3 nodes that are UP and running to create new blockvolumes, even when the 4th node is down.


Additional info:
+++++++++++++++

All steps will be detailed in the next comment.

Comment 10 Raghavendra Talur 2018-07-19 06:03:43 UTC
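Root cause identified. The loop terminates at i == ha count instead of length(identified_hosts) == ha count, so a host that is down among the first "ha count" entries of the host list is never replaced by an online host later in the list.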

I will work on this patch.
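
For illustration, the loop condition described above can be modelled with a minimal Go sketch. This is not the actual heketi code: the function names pickHostsBuggy and pickHostsFixed, the "up" map, and the host labels A-D are hypothetical and exist only to reproduce the behaviour reported in this bug. Only the error string is taken from the heketi-cli output in the description.

```go
package main

import (
	"errors"
	"fmt"
)

// Error text as printed by heketi-cli in the bug description.
var errInsufficientHosts = errors.New("insufficient block hosts online")

// pickHostsBuggy (hypothetical) stops after inspecting the first `ha`
// candidates, whether or not they were online.
func pickHostsBuggy(hosts []string, up map[string]bool, ha int) ([]string, error) {
	var picked []string
	for i, h := range hosts {
		if i == ha { // terminates on the candidate index
			break
		}
		if up[h] {
			picked = append(picked, h)
		}
	}
	if len(picked) < ha {
		return nil, errInsufficientHosts
	}
	return picked, nil
}

// pickHostsFixed (hypothetical) keeps scanning until `ha` online hosts
// have been collected or the list is exhausted.
func pickHostsFixed(hosts []string, up map[string]bool, ha int) ([]string, error) {
	var picked []string
	for _, h := range hosts {
		if len(picked) == ha { // terminates on the number of hosts found
			break
		}
		if up[h] {
			picked = append(picked, h)
		}
	}
	if len(picked) < ha {
		return nil, errInsufficientHosts
	}
	return picked, nil
}

func main() {
	hosts := []string{"A", "B", "C", "D"}
	// Corner case iii) from the description: the first node is down.
	up := map[string]bool{"A": false, "B": true, "C": true, "D": true}
	for ha := 1; ha <= 3; ha++ {
		_, errBuggy := pickHostsBuggy(hosts, up, ha)
		got, errFixed := pickHostsFixed(hosts, up, ha)
		fmt.Printf("ha=%d buggy err=%v | fixed hosts=%v err=%v\n", ha, errBuggy, got, errFixed)
	}
}
```

Under these assumptions the index-based loop matches all four corner cases in the description: it fails whenever a down node sits among the first "ha" entries and succeeds when the down node comes after them, while the count-based loop skips the down node and uses the remaining online hosts.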

Comment 11 Sven Anderson 2018-07-24 14:44:54 UTC
Upstream fix: https://github.com/heketi/heketi/pull/1280

Comment 19 errata-xmlrpc 2018-09-12 09:23:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

