Description of problem:
Host UUIDs are all zeros in the gluster vol info --xml output. Because of this, bricks are removed from the volumes in the console.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-3.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Add a cluster with two hosts, with one interface carrying glusterd (management) traffic and another interface carrying data traffic.
2. Create volumes and start using them.
3. Now add another node to the cluster.

Actual results:
1) gluster vol info --xml returns the host UUIDs as zeros.
2) All the bricks from the volume are removed in the console, and an event message says the bricks were deleted from the volume.

Expected results:
Host UUIDs should not be zero in the gluster vol info --xml output.

Additional info:
Sosreports can be found at the link below. http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1233213/
RamaKasturi, the sosreport doesn't have the cmd_history.log file. It would help to have it for all the nodes. If you no longer have the setup, could you list the important commands (like peer probe, volume-create and volume-start), the nodes where they were issued, and their relative order? I know this is not easy to recollect :-( Alternatively, could you try recreating this issue?
During a peer probe the following steps take place.
1. The existing peer sends its volume details to the new peer.
2. The new peer compares the received details with its own and imports the volumes it is missing. (It will import all received volumes, as it doesn't have any yet.)
3. The new peer confirms that its volume information is the same as the existing peer's information.
4. The existing peer sends the other peers' information to the new peer.
5. The new peer imports the other peers.

In steps 1 and 2, a brick's UUID is not exported or imported. Instead, the new peer sets it by doing a brick-resolve. During a brick-resolve, glusterd searches for the brick's hostname in its peer list, and if a match is found, sets the UUID of that peer as the brick's UUID. But when a new peer is being probed, the import of volumes happens before the import of peers. So the peer list is empty when brick-resolve happens, and the brick UUIDs are left as null (see the sketch below).

This should have always happened, even before rhgs-3.1, but we noticed it this time because of RHSC. RHSC chose the new peer to run the `volume info` command to update its UI. As the brick UUIDs came out as null, it couldn't match them to any bricks in its database and panicked.

There are several possible solutions and workarounds for this.

Solution 1 - Fix the peer state-machine flow to import friends before importing volumes. Doing this would cause huge backwards-compatibility issues, and is not something we'd like to do even if breaking backwards compatibility were acceptable.

Solution 2 - Do a brick-resolve every time volume info is run. This is simpler to implement, but you could still get null UUIDs if the command is run in the tiny gap between importing volumes and importing friends.

Workaround - Run any volume command other than info and get before running volume info. This forces a brick-resolve to happen for bricks with null UUIDs. The simplest command to run would be volume status.

We're considering implementing solution 2 for now. But as there aren't any serious consequences of this bug, like data loss or crashes, we don't consider it a blocker for rhgs-3.1. We'll have the fix ready for the next rhgs-3.1.z release. I'm removing KP's devel-ack from this bug for rhgs-3.1.
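To make the ordering problem concrete, here is a minimal Python sketch (hypothetical names, not glusterd source) of the brick-resolve step and why it yields null UUIDs when volumes are imported before the peer list exists:

    # Sketch (hypothetical names) of why brick UUIDs end up null: bricks are
    # resolved against a peer list that is still empty during a probe.
    NULL_UUID = "00000000-0000-0000-0000-000000000000"

    def resolve_bricks(volumes, peers):
        """Set each brick's uuid by matching its hostname against the peer list."""
        for vol in volumes:
            for brick in vol["bricks"]:
                match = next((p for p in peers if p["hostname"] == brick["hostname"]), None)
                brick["uuid"] = match["uuid"] if match else NULL_UUID

    # During a probe, the new peer imports and resolves volumes first ...
    peers = []                                   # peer list is still empty here
    volumes = [{"name": "vol0",
                "bricks": [{"hostname": "host1", "uuid": None},
                           {"hostname": "host2", "uuid": None}]}]
    resolve_bricks(volumes, peers)               # every brick resolves to NULL_UUID

    # ... and only afterwards imports the peers.
    peers = [{"hostname": "host1", "uuid": "aaaa-1111"},
             {"hostname": "host2", "uuid": "bbbb-2222"}]

    # Solution 2 amounts to re-running the resolve whenever volume info is served,
    # so any bricks still carrying NULL_UUID get fixed up:
    resolve_bricks(volumes, peers)
    print(volumes[0]["bricks"])                  # uuids are now non-zero

The workaround works for the same reason: any other volume command (e.g. volume status) triggers the equivalent of the second resolve_bricks() call once the peers have been imported.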
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
The doc text has been edited. Please sign off so that it can be included in Known Issues.
The doc-text looks good to me.
This issue reproduces very rarely. The patch http://review.gluster.org/#/c/13047/, which is already posted for review, will fix it, so I am moving the status of the bug to POST.
Doc text looks good!
The fix is now available in the rhgs-3.1.3 branch (commit 9a64e5f), hence moving the state to Modified.
Verified this bug using the build glusterfs-server-3.7.9-1. With this fix, the issue is not seen; below are the verification steps.
1. Created a two-node cluster.
2. Created and started a 2x2 volume using bricks from both nodes.
3. Probed a new node.
4. Checked "volume info --xml" on the newly probed node.

Observation:
============
volume info --xml had non-zero "hostUuid" and "brick uuid" values.

Based on the above, moving this bug to verified state.
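For anyone re-verifying, a rough way to script the same check on the newly probed node (a sketch only; it assumes the hostUuid elements appear in the --xml output as described in this bug):

    # Sketch: run "gluster volume info --xml" and confirm no hostUuid is all zeros.
    # The "hostUuid" element name is taken from the observation above.
    import subprocess
    import xml.etree.ElementTree as ET

    NULL_UUID = "00000000-0000-0000-0000-000000000000"

    out = subprocess.check_output(["gluster", "volume", "info", "--xml"])
    root = ET.fromstring(out)

    uuids = [e.text for e in root.iter("hostUuid")]
    if not uuids:
        print("No hostUuid elements found - check the XML layout")
    elif any(u == NULL_UUID for u in uuids):
        print("FAIL: zero host UUIDs found:", uuids)
    else:
        print("PASS: all host UUIDs are non-zero")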
I don't think we'd need a doc text here. Apologies.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240