Bug 874565 - 'gluster peer probe' command does not ensure UUID of probed server is unique in trusted pool
Summary: 'gluster peer probe' command does not ensure UUID of probed server is unique ...
Keywords:
Status: CLOSED DUPLICATE of bug 811493
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: krishnan parthasarathi
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-11-08 13:13 UTC by Rejy M Cyriac
Modified: 2018-12-02 17:22 UTC
CC: 4 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-26 11:44:10 UTC
Embargoed:



Description Rejy M Cyriac 2012-11-08 13:13:03 UTC
Description of problem:

The UUIDs of Red Hat Storage servers are meant to be unique, at least among the servers in a trusted pool. If the UUID of a server being added to the trusted pool duplicates the UUID of an existing server in the pool, this must be made known as early as possible, to prevent issues from surfacing after the new server is well into production.

The earliest a duplicate UUID can be detected is at the point of adding the new server to the trusted pool, with the 'gluster peer probe' command. The command should fail at that point with useful information in its output. Currently, however, it is possible to add a server with a duplicate UUID to the trusted pool and perform many operations involving it, including volume creation and use of the volume. This leaves the entire trusted pool at risk of the hidden issue surfacing later and disrupting production.

The operations that failed due to the duplicate UUID were:

1) Creating a volume using bricks of the same name on the two servers fails. The output message, given below, gives no hint that a duplicate UUID is the root cause of the failure.

"Brick: RHS6:/mir, RHS7:/mir one of the bricks contain the other"

This failure does not occur if the brick names are different.

2) Removing one of the servers with the duplicate UUID from the other server fails. The output message, given below, likewise does not reveal the root cause of the failure.

"
[root@RHS6 ~]# gluster peer detach RHS7
RHS7 is localhost
[root@RHS6 ~]# 
"

Version-Release number of selected component (if applicable):


How reproducible:

Clone a Red Hat Storage VM server from an existing Red Hat Storage VM server, and add the cloned Red Hat Storage VM server to the same trusted pool.

Steps to Reproduce:
1. Clone a Red Hat Storage VM server from an existing Red Hat Storage VM server; the clone retains the original's UUID in /var/lib/glusterd/glusterd.info.
2. From the original server, run 'gluster peer probe <cloned server>'.
3. Check 'gluster peer status' on both servers.
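
The missing check could be performed by an administrator today. Below is a minimal sketch of a hypothetical pre-probe helper (not part of gluster); it assumes glusterd.info contains a single 'UUID=<uuid>' line, as shown in the transcript below, and that the files have been fetched from each server beforehand.

```shell
# Hypothetical helper: extract the UUID from a glusterd.info file.
uuid_of() {
    sed -n 's/^UUID=//p' "$1"
}

# Hypothetical pre-probe check: compare the candidate server's UUID
# against the UUIDs of the existing pool members.
# $1: candidate glusterd.info; remaining args: pool members' glusterd.info files
# Returns 0 (and reports) if the candidate's UUID already exists in the pool.
is_duplicate_uuid() {
    local candidate pool
    candidate=$(uuid_of "$1"); shift
    for pool in "$@"; do
        if [ "$candidate" = "$(uuid_of "$pool")" ]; then
            echo "duplicate UUID $candidate" >&2
            return 0
        fi
    done
    return 1
}
```

Running such a check before 'gluster peer probe' would have flagged the cloned VM immediately, instead of letting the probe succeed silently.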

----------------------------------------------------------

[root@RHS6 ~]# cat /var/lib/glusterd/glusterd.info 
UUID=3ffb89ae-1b5e-46c1-8d10-9185753464ee  <--- same UUID as for other server
[root@RHS6 ~]# service glusterd status
glusterd (pid  1341) is running...

[root@RHS7 ~]# cat /var/lib/glusterd/glusterd.info 
UUID=3ffb89ae-1b5e-46c1-8d10-9185753464ee  <--- same UUID as for other server
[root@RHS7 ~]# service glusterd status
glusterd (pid  1354) is running...

[root@RHS6 ~]# gluster peer probe RHS7
Probe successful     <-- does not point out duplicate UUIDs
[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee  <--- same UUID as for other server
State: Peer in Cluster (Connected)
[root@RHS6 ~]# 

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee  <--- same UUID as for other server
State: Peer in Cluster (Connected)
[root@RHS7 ~]# 


[root@RHS6 ~]# df -Th /mir
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/GlusVol01-GlusLV01
               xfs    1.5G   33M  1.5G   3% /mir

[root@RHS6 ~]# gluster volume create data replica 2 RHS6:/mir RHS7:/mir
Brick: RHS6:/mir, RHS7:/mir one of the bricks contain the other <-- does not point out duplicate UUIDs as reason for failure


[root@RHS6 ~]# gluster volume create data replica 2 RHS6:/mir RHS7:/new
Creation of volume data has been successful. Please start the volume to access data.  <---- allows creation of volume even though UUIDs of servers are duplicate

[root@RHS6 ~]# gluster volume start data
Starting volume data has been successful
[root@RHS6 ~]# gluster volume info
 
Volume Name: data
Type: Replicate
Volume ID: 0c5416b3-9988-4757-bbe8-84e809611f51
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: RHS6:/mir
Brick2: RHS7:/new


[root@RHS6 ~]# gluster volume stop data
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Stopping volume data has been successful
[root@RHS6 ~]# gluster volume delete data
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
Deleting volume data has been successful
[root@RHS6 ~]# gluster volume info
No volumes present
[root@RHS6 ~]# 


[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS6 ~]# gluster peer detach RHS7
RHS7 is localhost   <---- failure message not useful - RHS7 is not localhost, but has duplicate UUID 

[root@RHS6 ~]# gluster peer detach RHS6
RHS6 is localhost   <----- failure message same as above, but correct this time. Same failure message for both cases complicates trouble-shooting.

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS7 ~]# gluster peer detach RHS6
RHS6 is localhost   <---- failure message not useful - RHS6 is not localhost, but has duplicate UUID

[root@RHS7 ~]# gluster peer detach 10.70.1.218
10.70.1.218 is localhost   <---- failure message not useful - 10.70.1.218(RHS7) is not localhost, but has duplicate UUID

[root@RHS7 ~]# hostname -I
10.70.1.15 
[root@RHS7 ~]# gluster peer detach RHS7
RHS7 is localhost   <----- failure message same as above, but correct this time. Same failure message for both cases complicates trouble-shooting.

-------Issue resolved by removing /var/lib/glusterd/glusterd.info ---------- 

[root@RHS7 ~]# service glusterd stop
Stopping glusterd:                                         [  OK  ]
[root@RHS7 ~]# rm /var/lib/glusterd/glusterd.info 
rm: remove regular file `/var/lib/glusterd/glusterd.info'? y

[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Disconnected)



[root@RHS7 ~]# service glusterd start
Starting glusterd:                                         [  OK  ]
[root@RHS7 ~]# cat /var/lib/glusterd/glusterd.info
UUID=c03174ff-5d0e-4861-b3dc-bb80370206bf
[root@RHS7 ~]# 



[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee <----- UUID unchanged from before
State: Peer in Cluster (Connected)
[root@RHS6 ~]# 

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS7 ~]# gluster peer detach RHS6
Detach successful  <--------- peer detach works now from the VM with new UUID
[root@RHS7 ~]# gluster peer status
No peers present
[root@RHS7 ~]# 


[root@RHS6 ~]# gluster peer status
No peers present
[root@RHS6 ~]# 


[root@RHS6 ~]# gluster peer probe RHS7
Probe successful
[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: c03174ff-5d0e-4861-b3dc-bb80370206bf <-------- new UUID updated
State: Peer in Cluster (Connected)
[root@RHS6 ~]# 

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS7 ~]# 


[root@RHS6 ~]# gluster peer detach RHS7
Detach successful
[root@RHS6 ~]# gluster peer status
No peers present
[root@RHS6 ~]# 
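
The workaround applied above (stop glusterd, remove glusterd.info, restart so glusterd generates a fresh UUID) can be sketched as a small helper. This is a hypothetical convenience wrapper, not an official tool; the path parameter lets the destructive steps be exercised against a scratch directory first.

```shell
# Hypothetical workaround helper: back up and remove glusterd.info so that
# glusterd generates a fresh UUID on its next start.
# $1 (optional): path to glusterd.info; defaults to the standard location.
reset_glusterd_uuid() {
    local info="${1:-/var/lib/glusterd/glusterd.info}"
    [ -f "$info" ] || { echo "no glusterd.info at $info" >&2; return 1; }
    cp "$info" "$info.bak"   # keep a backup of the old identity
    rm -f "$info"            # glusterd recreates this file with a new UUID
}

# On a real server the full sequence would be:
#   service glusterd stop && reset_glusterd_uuid && service glusterd start
```

Note that, as the transcript shows, the other pool members still cache the old UUID; stale peer entries must be detached and the server re-probed.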

----------------------------------------------------------
  
Actual results:

Two Red Hat Storage servers with duplicate UUIDs can be added to the same trusted pool, and various operations involving them are allowed to proceed. Because the duplicate UUID is not identified at an early stage, the entire trusted pool is put at high risk later, when the servers may be in production.

Expected results:

The duplicate UUID must be identified while the server is being added, and the operation should fail with useful output on the cause. This prevents the issue from striking when the servers are well into production.
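
In the meantime, an existing pool can be audited for the condition described here. The sketch below is a hypothetical audit (not a gluster feature): it deduplicates a list of UUIDs, which would be fed the local UUID from glusterd.info plus the 'Uuid:' lines from 'gluster peer status' output, as both appear in the transcript above.

```shell
# Hypothetical audit: read one UUID per line on stdin (local UUID followed
# by peer UUIDs) and print any value that occurs more than once.
find_duplicate_uuids() {
    sort | uniq -d
}

# On a live server (illustrative):
#   { sed -n 's/^UUID=//p' /var/lib/glusterd/glusterd.info;
#     gluster peer status | sed -n 's/^Uuid: //p'; } | find_duplicate_uuids
```

An empty result means every UUID in the pool view is unique; any output names a duplicated UUID.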

Additional info:
One possible scenario of duplicate UUIDs being created is covered in
https://bugzilla.redhat.com/show_bug.cgi?id=811493

Even if the issue is resolved, we still need to guard against the possibility of a server with duplicate UUID being added to a trusted pool.

Comment 2 krishnan parthasarathi 2012-12-26 11:44:10 UTC
Adding 2.0.z? flag to BZ 811493 in order to track rmc's flag in this bug.

*** This bug has been marked as a duplicate of bug 811493 ***

