Bug 1441675
| Summary: | adding a node to CNS may fail if one of the existing nodes is down | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | heketi | Assignee: | Mohamed Ashiq <mliyazud> |
| Status: | CLOSED ERRATA | QA Contact: | Apeksha <akhakhar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.5 | CC: | asriram, divya, fcami, hchiramm, madam, mliyazud, pprakash, rcyriac, rhs-bugs, rtalur, srmukher, storage-qa-internal, vinug |
| Target Milestone: | --- | | |
| Target Release: | CNS 3.6 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | heketi-5.0.0-7 rhgs-volmanager-docker-5.0.0-9 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this update, heketi performed the 'gluster peer probe' operation only from the first node in the trusted pool. Hence, adding a new node failed if the first node of the pool was not reachable. With this fix, if the first node in the trusted pool is not reachable, heketi runs the 'gluster peer probe' operation on the next online node. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-11 07:07:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1415606, 1445447 | | |
| Attachments: | | | |

Created attachment 1271135
topologyinfo

Created attachment 1271136
heketi_logs

Raghavendra Talur, could you please add the Known Issues text for this bug?
Regards,
Divya

I have provided the doc text for the known issue. We don't know of a workaround, hence that part is skipped.

Upstream patch:

https://github.com/heketi/heketi/pull/819

Unit test pending

Will add them soon.

(In reply to Mohamed Ashiq from comment #9)
> Upstream patch:
>
> https://github.com/heketi/heketi/pull/819
>
> Unit test pending
>
> Will add them soon.

Patch merged upstream.

Adding a new node when one of the nodes is down now works fine.

Build: cns-deploy-5.0.0-37.el7rhgs.x86_64, heketi-client-5.0.0-11.el7rhgs.x86_64

    [root@dhcp46-156 ~]# oc get nodes
    NAME                                STATUS                     AGE   VERSION
    dhcp46-14.lab.eng.blr.redhat.com    Ready                      17h   v1.6.1+5115d708d7
    dhcp46-223.lab.eng.blr.redhat.com   NotReady                   17h   v1.6.1+5115d708d7
    dhcp47-127.lab.eng.blr.redhat.com   Ready                      17h   v1.6.1+5115d708d7
    dhcp47-169.lab.eng.blr.redhat.com   Ready,SchedulingDisabled   17h   v1.6.1+5115d708d7
    dhcp47-184.lab.eng.blr.redhat.com   Ready                      17h   v1.6.1+5115d708d7

    [root@dhcp46-156 ~]# heketi-cli node add --zone=1 --cluster=dbdaa75a2da75a7bd6a5fe368725d92c --management-host-name=dhcp47-127.lab.eng.blr.redhat.com --storage-host-name=10.70.47.127
    Node information:
    Id: 4e2e1425c5cf7fbdd30fd2d6c55a6087
    State: online
    Cluster Id: dbdaa75a2da75a7bd6a5fe368725d92c
    Zone: 1
    Management Hostname dhcp47-127.lab.eng.blr.redhat.com
    Storage Hostname 10.70.47.127

I changed the type from known issue to bug fix. Please recheck the doc text.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879

Description of problem:
When CNS has three nodes N1, N2, and N3, and N1 is down, adding a new node N4 to the cluster might fail, because heketi tries to perform the peer probe from N1, which is the first node in the list.

    # heketi-cli topology load -j=topology.json
    Found node dhcp47-106.lab.eng.blr.redhat.com on cluster 796e6db1981f369ea0340913eeea4c9a
    Found device /dev/sdd
    Creating node dhcp47-81.lab.eng.blr.redhat.com ... Unable to create node: Unable to execute command on glusterfs-d3qp1:
    Found node dhcp47-74.lab.eng.blr.redhat.com on cluster 796e6db1981f369ea0340913eeea4c9a
    Found device /dev/sdd
    Found node dhcp47-82.lab.eng.blr.redhat.com on cluster 796e6db1981f369ea0340913eeea4c9a
    Found device /dev/sdd

    # oc get pods -o wide
    NAME                             READY   STATUS    RESTARTS   AGE   IP             NODE
    glusterfs-6gsv2                  1/1     Running   1          36m   10.70.47.74    dhcp47-74.lab.eng.blr.redhat.com
    glusterfs-d3qp1                  1/1     Running   1          36m   10.70.47.106   dhcp47-106.lab.eng.blr.redhat.com
    glusterfs-hwxn0                  1/1     Running   1          36m   10.70.47.82    dhcp47-82.lab.eng.blr.redhat.com
    glusterfs-l0g6c                  1/1     Running   1          36m   10.70.47.81    dhcp47-81.lab.eng.blr.redhat.com
    heketi-1-z0q5n                   1/1     Running   3          29m   10.130.0.7     dhcp47-82.lab.eng.blr.redhat.com
    storage-project-router-1-46m1r   1/1     Running   1          48m   10.70.47.74    dhcp47-74.lab.eng.blr.redhat.com

Version-Release number of selected component (if applicable):
heketi-client-4.0.0-6.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have 3 nodes in a CNS setup: Node{1..3}.
2. Bring down the first node in the topology file.
3. Add a new node (Node4) in order to replace Node1.
4. Update the topology file and try to load it.

Actual results:
Adding the new node fails because heketi tries to do a peer probe from Node-1.

Expected results:
When Node-1 is down, heketi should try to peer probe from Node-2.

Additional info:
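
The expected result above, and the fix described in the Doc Text, amount to a fallback loop: try the peer probe from each existing node in turn instead of giving up when the first one is down. The Python sketch below illustrates that idea under stated assumptions. `probe_from_any_node`, `PeerProbeError`, and the `peer_probe` callback are hypothetical names, not heketi's API (heketi itself is written in Go, and the actual change is upstream pull request 819). The callback stands in for running `gluster peer probe <new host>` on an existing node and is assumed to raise when that node is unreachable.

```python
from typing import Callable, Dict, Sequence


class PeerProbeError(Exception):
    """Raised when no existing node in the pool could peer probe the new node."""


def probe_from_any_node(
    existing_hosts: Sequence[str],
    new_host: str,
    peer_probe: Callable[[str, str], None],
) -> str:
    """Try peer_probe(from_host, new_host) on each existing node, in order.

    Returns the first host whose probe succeeded. A host whose probe raises
    (assumed here to mean the node is down or unreachable) is skipped and the
    next host is tried, instead of failing on the first node.
    """
    failures: Dict[str, Exception] = {}
    for host in existing_hosts:
        try:
            peer_probe(host, new_host)
        except Exception as exc:  # assumption: any failure means "try the next node"
            failures[host] = exc
            continue
        return host
    raise PeerProbeError(f"no existing node could probe {new_host}: {failures!r}")


# Usage: simulate the scenario from this report, with N1 down.
down_hosts = {"N1"}


def fake_peer_probe(from_host: str, new_host: str) -> None:
    # Hypothetical stand-in for running "gluster peer probe <new_host>" on from_host.
    if from_host in down_hosts:
        raise ConnectionError(f"{from_host} is unreachable")


print(probe_from_any_node(["N1", "N2", "N3"], "N4", fake_peer_probe))  # prints N2
```

With N1 down, the probe succeeds from N2, which matches the expected result. If every node fails, the collected errors are reported together; a real implementation would likely also need to tell "source node down" apart from "new node unreachable", since only the former is worth retrying from another node.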