Bug 1233213 - [New] - volume info --xml gives host UUID as zeros
Summary: [New] - volume info --xml gives host UUID as zeros
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Satish Mohan
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1216951 1268895 1299184
 
Reported: 2015-06-18 12:55 UTC by RamaKasturi
Modified: 2016-07-13 05:19 UTC
CC List: 10 users

Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Peer update operations happen after volume update operations. This meant that when a node was probed to be added to an existing cluster, but did not yet have information about other cluster members because the cluster had just formed, the new node was unable to update the UUID of its brick when importing volumes from the probing node. This resulted in a host UUID of zero when the 'gluster volume info --xml' command was run on the newly added node. This update ensures that brick UUIDs are exported from cluster members and imported into the newly added node so that this issue no longer occurs.
Clone Of:
Environment:
Last Closed: 2016-06-23 04:54:22 UTC
Embargoed:


Attachments


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 1293273 | 0 | unspecified | CLOSED | [GlusterD]: Peer detach happening with a node which is hosting volume bricks | 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2016:1240 | 0 | normal | SHIPPED_LIVE | Red Hat Gluster Storage 3.1 Update 3 | 2016-06-23 08:51:28 UTC

Internal Links: 1293273

Description RamaKasturi 2015-06-18 12:55:30 UTC
Description of problem:
Host UUIDs are zeros in the gluster vol info --xml output. Due to this, bricks are removed from the volumes in the console.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-3.el6rhs.x86_64

How reproducible:


Steps to Reproduce:
1. Add a cluster with two hosts, with one interface managing glusterd traffic and another interface managing data traffic.
2. Create volumes and start using them.
3. Now add another node to the cluster (see the command sketch below).
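A minimal shell sketch of these steps, assuming hypothetical hostnames (host1, host2, host3), a hypothetical volume name (testvol), and placeholder brick paths; the separate management/data interfaces are configured outside of these commands:

    # From host1: form a two-node cluster over the management interface
    gluster peer probe host2

    # Create a volume spanning both hosts and start using it
    gluster volume create testvol host1:/bricks/b1 host2:/bricks/b1
    gluster volume start testvol

    # Later, probe a third node into the existing cluster
    gluster peer probe host3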

Actual results:
1) gluster vol info --xml returns the host UUIDs as zeros.

2) All the bricks from the volume are removed in the console, and the event message says the bricks were deleted from the volume.
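As an illustration (not from the original report), the zero UUIDs can be spotted on the affected node by filtering the XML output; the exact element names may vary by glusterfs version:

    # Run on the newly added node
    gluster volume info --xml | grep -i uuid
    # Broken case: brick UUID fields show up as 00000000-0000-0000-0000-000000000000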


Expected results:
Host UUIDs should not be zero in the gluster vol info --xml output.

Additional info:

Comment 2 RamaKasturi 2015-06-18 13:04:17 UTC
sos reports can be found in the link below.

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1233213/

Comment 3 krishnan parthasarathi 2015-06-19 11:47:36 UTC
RamaKasturi,

The sosreport doesn't have the cmd_history.log file. It would help to have it for all the nodes. In the event you don't have the setup, could you list the important commands (like peer probe, volume-create, and volume-start), the nodes where they were issued, and their relative order? I know this is not easy to recollect :-( Alternatively, could you try recreating this issue?

Comment 4 Kaushal 2015-06-23 12:32:52 UTC
During a peer probe the following steps take place.

1. The existing peer sends its volume details to the new peer.
2. The new peer compares the received details with its own and imports the volumes it is missing. (It will import all received volumes, as it doesn't have any yet.)
3. The new peer confirms that its volume information is the same as the existing peer's information.
4. The existing peer sends the other peers' information to the new peer.
5. The new peer imports the other peers.

In steps 1 and 2, a brick's UUID is not exported or imported. Instead, the new peer sets it by doing a brick-resolve. During a brick-resolve, glusterd searches for the brick's hostname in its peer list, and if a match is found, sets the brick's UUID to that peer's UUID.

But when a new peer is being probed, the import of volumes happens before the import of peers. So the peer list is empty when the brick-resolve happens, and the brick UUIDs are left as null.
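To make the window concrete, here is a rough sketch of what can be observed on the newly probed node during that gap (timing-dependent, so it may not always reproduce):

    # On the new node, immediately after being probed from an existing cluster member:
    gluster peer status        # the other cluster members may not be listed yet
    gluster volume info --xml  # volumes are already imported, but brick UUIDs are all zeros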

This has always happened, even before rhgs-3.1. But we noticed it this time because of RHSC. RHSC chose the new peer to run the `volume info` command to update its UI. As the brick UUIDs came out as null, it couldn't match them to any bricks in its database and panicked.


There are several possible solutions and workarounds for this.

Solution 1 - Fix the peer state-machine flow to import friends before importing volumes. Doing this would cause huge backwards-compatibility issues, and it is not something we'd like to do even if breaking backwards compatibility were acceptable.

Solution 2 - Do a brick-resolve every time volume info is run. This is simpler to implement, but you could still get null UUIDs if the command was run in the tiny gap between importing volumes and importing friends.

Workaround - Run any volume command other than info and get before running volume info. This forces a brick-resolve to happen for bricks with null UUIDs. The simplest command to run would be volume status.
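A quick sketch of the workaround on the affected node:

    # Any volume command other than 'info' and 'get' forces a brick-resolve;
    # 'volume status' is the simplest one.
    gluster volume status

    # The XML output should now contain real (non-zero) brick UUIDs.
    gluster volume info --xml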


We're considering implementing solution 2 for now. But as there aren't really any serious consequences of this bug, like data loss or crashes, we don't consider it a blocker for rhgs-3.1. We'll have the fix ready for the next rhgs-3.1.z release. I'm removing KP's devel-ack from this bug for rhgs-3.1.

Comment 5 RHEL Program Management 2015-06-23 12:47:33 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 7 monti lawrence 2015-07-22 20:45:29 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 8 Kaushal 2015-07-27 05:07:10 UTC
The doc-text looks good to me.

Comment 10 Gaurav Kumar Garg 2016-01-04 09:39:18 UTC
The frequency of reproducing this issue is very low.
Patch http://review.gluster.org/#/c/13047/, which has already been posted for review, will fix this issue, so I am moving the status of the bug to POST.

Comment 11 Atin Mukherjee 2016-02-26 06:31:48 UTC
Doc text looks good!

Comment 13 Atin Mukherjee 2016-03-22 12:15:34 UTC
The fix is now available in the rhgs-3.1.3 branch (commit 9a64e5f), hence moving the state to Modified.

Comment 15 Byreddy 2016-04-04 05:10:32 UTC
Verified this bug using the build glusterfs-server-3.7.9-1.

With this fix, the issue is not seen. Below are the verification steps.

1. Created a two-node cluster.
2. Created and started a 2x2 volume using bricks from both nodes.
3. Probed a new node.
4. Checked "volume info --xml" on the newly probed node.

Observation:
============
volume info --xml had non-zero "hostUuid" and "brick uuid" values.
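For reference, a hedged sketch of the verification flow described above, assuming hypothetical hostnames (node1, node2, node3), placeholder brick paths, and a volume name (distrep):

    # From node1: two-node cluster with a 2x2 distributed-replicate volume
    gluster peer probe node2
    gluster volume create distrep replica 2 \
        node1:/bricks/b1 node2:/bricks/b1 node1:/bricks/b2 node2:/bricks/b2
    gluster volume start distrep

    # Probe a new node, then inspect the XML output on that node
    gluster peer probe node3
    # On node3:
    gluster volume info --xml | grep -i uuid   # expect non-zero hostUuid and brick uuid values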


Based on the above info, moving this bug to the Verified state.

Comment 18 Atin Mukherjee 2016-06-10 04:04:40 UTC
I don't think we'd need a doc text here. Apologies.

Comment 20 errata-xmlrpc 2016-06-23 04:54:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

