Description of problem:
The UUIDs of Red Hat Storage servers are meant to be unique, at least among the servers in a trusted pool. If the UUID of a server being added to the trusted pool duplicates the UUID of an existing member, this must be reported as early as possible, to prevent issues from surfacing after the new server is well into production. The earliest point at which a duplicate UUID can be detected is when the new server is added to the trusted pool with the 'gluster peer probe' command; the command should fail at that point with useful information in its output. Currently, however, a server with a duplicate UUID can be added to the trusted pool, and many operations involving it succeed, including volume creation and use of the volume. This leaves the entire trusted pool at risk of this hidden issue surfacing later and disrupting production.

The operations that did fail due to the duplicate UUID were:

1) Creating a volume using bricks of the same name on the two servers. The output message, given below, was of no help in identifying the root cause of the failure:

"Brick: RHS6:/mir, RHS7:/mir one of the bricks contain the other"

This failure does not occur if the brick names are different.

2) Detaching one of the servers with a duplicate UUID from the other server. The output message, given below, is not good enough to identify the root cause of the failure:

"
[root@RHS6 ~]# gluster peer detach RHS7
RHS7 is localhost
[root@RHS6 ~]#
"

Version-Release number of selected component (if applicable):

How reproducible:
Clone a Red Hat Storage VM server from an existing Red Hat Storage VM server, and add the cloned Red Hat Storage VM server to the same trusted pool.

Steps to Reproduce:
1.
2.
3.
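Until 'gluster peer probe' performs this check itself, the duplicate can be caught by hand before probing, by comparing the two servers' glusterd.info files. A minimal sketch, assuming copies of each server's /var/lib/glusterd/glusterd.info are at hand (fetching the remote copy, e.g. over ssh, is left out; uuid_of and check_duplicate are hypothetical helpers, not gluster commands):

```shell
# Extract the UUID= value from a glusterd.info file.
uuid_of() {
    awk -F= '$1 == "UUID" { print $2 }' "$1"
}

# Compare the UUIDs in two glusterd.info files; warn and return
# non-zero if they are identical (i.e. probing would be unsafe).
check_duplicate() {
    if [ "$(uuid_of "$1")" = "$(uuid_of "$2")" ]; then
        echo "Duplicate UUID $(uuid_of "$1") - do not probe this peer"
        return 1
    fi
    echo "UUIDs differ - safe to probe"
}
```

Run against the two glusterd.info files captured in the transcript below, this would flag the shared UUID before any probe is attempted.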
----------------------------------------------------------
[root@RHS6 ~]# cat /var/lib/glusterd/glusterd.info
UUID=3ffb89ae-1b5e-46c1-8d10-9185753464ee   <--- same UUID as on the other server
[root@RHS6 ~]# service glusterd status
glusterd (pid 1341) is running...

[root@RHS7 ~]# cat /var/lib/glusterd/glusterd.info
UUID=3ffb89ae-1b5e-46c1-8d10-9185753464ee   <--- same UUID as on the other server
[root@RHS7 ~]# service glusterd status
glusterd (pid 1354) is running...

[root@RHS6 ~]# gluster peer probe RHS7
Probe successful   <-- does not point out duplicate UUIDs
[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee   <--- same UUID as on the other server
State: Peer in Cluster (Connected)
[root@RHS6 ~]#

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee   <--- same UUID as on the other server
State: Peer in Cluster (Connected)
[root@RHS7 ~]#

[root@RHS6 ~]# df -Th /mir
Filesystem                     Type  Size  Used Avail Use% Mounted on
/dev/mapper/GlusVol01-GlusLV01 xfs   1.5G   33M  1.5G   3% /mir
[root@RHS6 ~]# gluster volume create data replica 2 RHS6:/mir RHS7:/mir
Brick: RHS6:/mir, RHS7:/mir one of the bricks contain the other   <-- does not point out duplicate UUIDs as the reason for failure
[root@RHS6 ~]# gluster volume create data replica 2 RHS6:/mir RHS7:/new
Creation of volume data has been successful
Please start the volume to access data.   <---- allows volume creation even though the servers' UUIDs are duplicates
[root@RHS6 ~]# gluster volume start data
Starting volume data has been successful
[root@RHS6 ~]# gluster volume info

Volume Name: data
Type: Replicate
Volume ID: 0c5416b3-9988-4757-bbe8-84e809611f51
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: RHS6:/mir
Brick2: RHS7:/new
[root@RHS6 ~]# gluster volume stop data
Stopping volume will make its data inaccessible. Do you want to continue?
(y/n) y
Stopping volume data has been successful
[root@RHS6 ~]# gluster volume delete data
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
Deleting volume data has been successful
[root@RHS6 ~]# gluster volume info
No volumes present
[root@RHS6 ~]#

[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS6 ~]# gluster peer detach RHS7
RHS7 is localhost   <---- failure message not useful - RHS7 is not localhost, but has a duplicate UUID
[root@RHS6 ~]# gluster peer detach RHS6
RHS6 is localhost   <----- same failure message as above, but correct this time. The same message for both cases complicates troubleshooting.

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS7 ~]# gluster peer detach RHS6
RHS6 is localhost   <---- failure message not useful - RHS6 is not localhost, but has a duplicate UUID
[root@RHS7 ~]# gluster peer detach 10.70.1.218
10.70.1.218 is localhost   <---- failure message not useful - 10.70.1.218 (RHS6) is not localhost, but has a duplicate UUID
[root@RHS7 ~]# hostname -I
10.70.1.15
[root@RHS7 ~]# gluster peer detach RHS7
RHS7 is localhost   <----- same failure message as above, but correct this time. The same message for both cases complicates troubleshooting.

------- Issue resolved by removing /var/lib/glusterd/glusterd.info ----------
[root@RHS7 ~]# service glusterd stop
Stopping glusterd:                                         [  OK  ]
[root@RHS7 ~]# rm /var/lib/glusterd/glusterd.info
rm: remove regular file `/var/lib/glusterd/glusterd.info'?
y
[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Disconnected)

[root@RHS7 ~]# service glusterd start
Starting glusterd:                                         [  OK  ]
[root@RHS7 ~]# cat /var/lib/glusterd/glusterd.info
UUID=c03174ff-5d0e-4861-b3dc-bb80370206bf
[root@RHS7 ~]#

[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee   <----- UUID unchanged from before
State: Peer in Cluster (Connected)
[root@RHS6 ~]#

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS7 ~]# gluster peer detach RHS6
Detach successful   <--------- peer detach works now from the VM with the new UUID
[root@RHS7 ~]# gluster peer status
No peers present
[root@RHS7 ~]#

[root@RHS6 ~]# gluster peer status
No peers present
[root@RHS6 ~]#
[root@RHS6 ~]# gluster peer probe RHS7
Probe successful
[root@RHS6 ~]# gluster peer status
Number of Peers: 1

Hostname: RHS7
Uuid: c03174ff-5d0e-4861-b3dc-bb80370206bf   <-------- new UUID updated
State: Peer in Cluster (Connected)
[root@RHS6 ~]#

[root@RHS7 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.1.218
Uuid: 3ffb89ae-1b5e-46c1-8d10-9185753464ee
State: Peer in Cluster (Connected)
[root@RHS7 ~]#

[root@RHS6 ~]# gluster peer detach RHS7
Detach successful
[root@RHS6 ~]# gluster peer status
No peers present
[root@RHS6 ~]#
----------------------------------------------------------

Actual results:
Two Red Hat Storage servers with duplicate UUIDs can be added to the same trusted pool, and various operations involving these servers are allowed to be performed. Because the duplicate UUID is not identified at an early stage, the entire trusted pool is put at high risk later, when the servers may be in the production phase.
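Since the misleading "is localhost" failures trace back to glusterd finding its own UUID in a peer entry, the same condition can be checked by an admin from the on-disk state. A rough sketch, assuming the peers/ entries use the uuid= key=value layout seen in this version (the directory is a parameter so the logic can be tried on a copy of /var/lib/glusterd; find_self_peer is a hypothetical helper, not a gluster command):

```shell
# Print the path of any peer entry whose uuid matches the local
# glusterd.info UUID - the condition behind the "is localhost" errors.
# $1 is the glusterd state directory (normally /var/lib/glusterd).
find_self_peer() {
    dir=$1
    local_uuid=$(awk -F= '$1 == "UUID" { print $2 }' "$dir/glusterd.info")
    grep -l "^uuid=$local_uuid$" "$dir"/peers/* 2>/dev/null
}
```

A non-empty result means a peer in the pool shares the local UUID, which explains why detach operations in the transcript above kept reporting "is localhost".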
Expected results:
The duplicate UUID must be identified while the server is being added, and the operation should fail with useful output on the cause. This prevents the risk of the issue striking when the servers are well into the production phase.

Additional info:
One possible scenario in which duplicate UUIDs are created is covered in https://bugzilla.redhat.com/show_bug.cgi?id=811493. Even if that issue is resolved, we still need to guard against the possibility of a server with a duplicate UUID being added to a trusted pool.
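Until such a probe-time check exists, one stop-gap is a periodic sweep over the UUIDs gathered from every pool member (one UUID per line; the gathering step, e.g. reading each server's glusterd.info over ssh, is omitted here). A minimal sketch with a hypothetical helper:

```shell
# Print any UUID that appears more than once in a gathered list.
# $1 is a file containing one UUID per line, one line per server.
find_duplicate_uuids() {
    sort "$1" | uniq -d
}
```

An empty result means every server in the list has a distinct UUID; any line printed is a UUID shared by two or more servers and should be investigated before those servers are relied on in production.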
Adding 2.0.z? flag to BZ 811493 in order to track rmc's flag in this bug.

*** This bug has been marked as a duplicate of bug 811493 ***