Bug 1069040
| Summary: | Procedure for replacing a completely failed peer with one of the same hostname and IP does not work | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Brad Hubbard <bhubbard> |
| Component: | glusterfs | Assignee: | Raghavendra G <rgowdapp> |
| Status: | CLOSED NOTABUG | QA Contact: | Sudhir D <sdharane> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1 | CC: | abelur, asrivast, hamiller, nlevinki, nsathyan, pkarampu, ravishankar, sasundar, sauchter, spandura, vbellur, vumrao |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-04-01 02:23:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1073815 | | |
| Attachments: | | | |
Description
Brad Hubbard
2014-02-24 02:43:50 UTC
Created attachment 866848 [details]
sosreport2
Created attachment 866850 [details]
sosreport3
This works for me. From any of the two healthy peers:

# gluster peer detach <IP_of_the_peer_that_went_down> force
# gluster peer probe <new_peer_with_same_IP>

Then check whether the brick and self-heal daemon processes on the new peer are alive using:

# gluster volume status <vol_name>

If not, run:

# gluster volume start <vol_name> force

Now we can run "gluster volume heal <vol_name> full" to trigger the heal to the bricks of the new peer.

Also, the "store.c:1957:glusterd_store_retrieve_volume] 0-: Unknown key: brick-X" messages are just spurious messages and not really errors. A patch that fixes them has just been merged upstream: http://review.gluster.org/#/c/7314/

I am not sure this addresses the replacing of the bricks. Let's try that one as well before confirming the steps. Pranith

Tried out the steps given in comment #6 for a couple of iterations and verified that they work fine. It was observed that if "gluster volume heal <vol_name> full" was issued before the connections between processes were established (after "gluster volume start force"), self-heal did not happen. In such a case, just wait for a couple of minutes and run the command again. This will trigger the heal.

That is considerably different from the community procedure mentioned and not really intuitive for our customers. Have we documented it anywhere?

Here is the possible cause for the issue:
==========================================
Steps used to re-create
~~~~~~~~~~~~~~~~~~~~~~~~
1. Set up a 2-node cluster (node1 {king} and node2 {hicks}).
2. Create a 1 x 2 replicate volume 'vol_rep'. Start the volume. Create a FUSE mount and add files/dirs to it.
3. Bring node2 offline (crash/re-provision).
[root@king vol_rep]# gluster peer status
Number of Peers: 1
Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Peer in Cluster (Disconnected)
Note: when node2 comes back online, it has the same hostname/IP but a different glusterd UUID.
Also, by default glusterd is started automatically when the node is rebooted. In this case, when node2 comes online, node1 sees node2 and changes its state from "Disconnected" to "Connected".
[root@king vol_rep]# gluster peer status
Number of Peers: 1
Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Peer in Cluster (Connected)
Even though node2 had a different UUID, node1 established the connection to node2.
4. Stop glusterd on node2.
[root@hicks ~]# cat /var/lib/glusterd/glusterd.info
UUID=32e87425-8309-45bc-9d91-2cbb3e431326
operating-version=2
[root@hicks ~]#
[root@hicks ~]# service glusterd stop
Stopping glusterd: [ OK ]
[root@hicks ~]#
5. On node2, edit /var/lib/glusterd/glusterd.info and change the glusterd UUID back to the old value: "093ebaa2-3dc2-4317-a2b5-3461ff08b0e7".
6. Create the bricks on node2 and set the "volume-id" extended attribute on them.
[root@hicks ~]# mkdir /rhs/bricks/b2
[root@hicks ~]# setfattr -n trusted.glusterfs.volume-id -v "0x1d105460d1e9478f97d0d44fa2068114" /rhs/bricks/b2/
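The 0x value passed to setfattr is simply the volume's UUID with the dashes removed. A minimal sketch of deriving it, using the vol_rep Volume ID from this report:

```shell
#!/bin/sh
# Derive the trusted.glusterfs.volume-id value from a volume UUID:
# strip the dashes and prefix "0x". The UUID below is the vol_rep
# Volume ID shown later in `gluster v info` in this report.
VOLUME_ID="1d105460-d1e9-478f-97d0-d44fa2068114"
XATTR_VALUE="0x$(printf '%s' "$VOLUME_ID" | tr -d '-')"
echo "$XATTR_VALUE"   # prints 0x1d105460d1e9478f97d0d44fa2068114
# Then, on the rebuilt node:
#   setfattr -n trusted.glusterfs.volume-id -v "$XATTR_VALUE" /rhs/bricks/b2/
```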
7. Restart glusterd on node2.
8. As soon as glusterd on node2 is restarted, node1 puts node2's glusterd in the "Peer Rejected" state.
[root@king vol_rep]# gluster peer status
Number of Peers: 1
Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Peer Rejected (Disconnected)
9. From node2, peer probe node1.
[root@hicks ~]# service glusterd start
Starting glusterd: [ OK ]
[root@hicks ~]# gluster peer status
Number of Peers: 0
[root@hicks ~]#
[root@hicks ~]# gluster v info
No volumes present
[root@hicks ~]# gluster peer probe king
peer probe: success.
[root@hicks ~]# gluster peer status
Number of Peers: 1
Hostname: king
Uuid: 62c5a2a0-c058-47c1-af1f-bac54a33d2d8
State: Accepted peer request (Connected)
[root@king vol_rep]# gluster peer status
Number of Peers: 1
Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Accepted peer request (Connected)
NOTE: On node1, node2 is in the "Accepted peer request" state, and on node2, node1 is in the "Accepted peer request" state. For proper functionality, the peers must be in the "Peer in Cluster" state.
10. Restart glusterd on both nodes (node1 and node2).
NODE1:
======
[root@king vol_rep]# service glusterd restart
Starting glusterd: [ OK ]
[root@king vol_rep]#
[root@king vol_rep]# gluster peer status
Number of Peers: 1
Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Sent and Received peer request (Connected)
[root@king vol_rep]#
[root@king vol_rep]#
NODE2:
======
[root@hicks ~]# service glusterd restart
Starting glusterd: [ OK ]
[root@hicks ~]#
[root@hicks ~]# gluster peer status
Number of Peers: 1
Hostname: king
Uuid: 62c5a2a0-c058-47c1-af1f-bac54a33d2d8
State: Sent and Received peer request (Connected)
NOTE: At this point, even after glusterd is restarted, node1's NFS and glustershd processes are not started.
[root@king vol_rep]# gluster v status
Status of volume: vol_rep
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick king:/rhs/bricks/b1 49152 Y 3810
NFS Server on localhost N/A N N/A
Self-heal Daemon on localhost N/A N N/A
Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks
11. From node2, sync the volume information from node1.
[root@hicks ~]# gluster v info
No volumes present
[root@hicks ~]#
[root@hicks ~]#
[root@hicks ~]# gluster v status
No volumes present
[root@hicks ~]#
[root@hicks ~]#
[root@hicks ~]# gluster volume sync king all
Sync volume may make data inaccessible while the sync is in progress. Do you want to continue? (y/n) y
volume sync: success
[root@hicks ~]#
[root@hicks ~]# gluster v info
Volume Name: vol_rep
Type: Replicate
Volume ID: 1d105460-d1e9-478f-97d0-d44fa2068114
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b1
Brick2: hicks:/rhs/bricks/b2
[root@hicks ~]# gluster v status
Status of volume: vol_rep
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick hicks:/rhs/bricks/b2 49152 Y 3391
NFS Server on localhost N/A N N/A
Self-heal Daemon on localhost N/A N N/A
NOTE: Even after the volume sync, the NFS and glustershd processes on node1 and node2 are not started.
glusterd on both nodes is still in the "Sent and Received peer request (Connected)" state; the expected state is "Peer in Cluster (Connected)".
For the "Unknown key:" errors, refer to these bugs:
==================================================
https://bugzilla.redhat.com/show_bug.cgi?id=1036551
https://bugzilla.redhat.com/show_bug.cgi?id=1056910
For recovering a failed peer as mentioned in comment 6
=============================================================
For any node that crashed / was re-provisioned and went offline:
1. Peer detach the crashed node
gluster peer detach <hostname_of_node_that_crashed> force
2. Note down the volume-id of the volume
VOLUME_ID=$(gluster v info | grep "Volume ID" | cut -d ":" -f 2 | tr -d ' ')
3. When node that had previously crashed comes online execute the following on that node.
a. create the brick directories
mkdir <brick_directories>
b. Set the extended attribute "trusted.glusterfs.volume-id" on the brick directories (the value is the volume ID with the dashes removed, prefixed with "0x"):
setfattr -n "trusted.glusterfs.volume-id" -v "0x$(echo "$VOLUME_ID" | tr -d '-')" <brick_directories>
4. From any of the storage nodes, peer probe the previously crashed node.
gluster peer probe <hostname_of_node_that_crashed>
5. Trigger the heal: "gluster volume heal <volume_name> full"
Validation
===========
1. Check the peer status. The peer should be in the "Peer in Cluster (Connected)" state.
2. Check the volume status. All the brick, NFS, and glustershd processes should be started.
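The recovery steps above can be sketched as shell, grouped into one function for readability even though, as the comments note, the individual commands run on different nodes. This is an illustrative sketch, not a supported tool; volume, peer, and brick names are placeholders.

```shell
#!/bin/sh
# Sketch of the recovery steps above. The commands run on different
# nodes (see the comments); they are grouped here only for readability.

# Parse the Volume ID out of `gluster volume info` text on stdin.
# (Avoids the `tr -d ' \s'` pitfall: \s is not a tr escape.)
volume_id() { awk -F': *' '/^Volume ID/ { print $2 }'; }

recover_crashed_peer() {
    vol=$1; peer=$2; brick=$3
    # 1. On a healthy node: forget the crashed glusterd instance.
    gluster peer detach "$peer" force
    # 2. Note down the volume-id before rebuilding the node.
    volid=$(gluster volume info "$vol" | volume_id)
    # 3. On the rebuilt node: recreate the brick directory and
    #    restore its volume-id extended attribute.
    mkdir -p "$brick"
    setfattr -n trusted.glusterfs.volume-id \
             -v "0x$(printf '%s' "$volid" | tr -d '-')" "$brick"
    # 4. From any storage node: probe the rebuilt peer.
    gluster peer probe "$peer"
    # 5. Once peers show "Peer in Cluster (Connected)", trigger the heal.
    gluster volume heal "$vol" full
}
```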
The procedure outlined in comment 6 seems to work fine. I tested it twice on a three-node pool and it worked both times, no problems. I'll get this documented in KCS in the next day or so, but this needs to be included in the documentation.

I've created the following KCS documents for the two scenarios:

- How can I replace a completely failed peer with a machine with a different hostname/IP in Red Hat Storage 2 Update 1
  https://access.redhat.com/site/solutions/720413
- How can I replace a completely failed peer with a machine with the same hostname/IP in Red Hat Storage 2 Update 1
  https://access.redhat.com/site/solutions/773533

I will update the documentation bug. Thanks for your efforts.

Closing as NOTABUG.

Dev ack to 3.0 RHS BZs
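The comment-6 procedure, including the wait-before-heal caveat noted earlier, can be sketched as below. The `all_online` helper and the polling interval are illustrative additions, not part of the documented steps; the volume and peer arguments are placeholders.

```shell
#!/bin/sh
# Sketch of the same-hostname/IP replacement steps from comment 6,
# to be run from a healthy peer.

# True (exit 0) when every Brick/NFS/Self-heal row of
# `gluster volume status` output on stdin shows Y in the Online column.
all_online() {
    awk '/^(Brick|NFS|Self-heal)/ && $(NF-1) != "Y" { bad=1 } END { exit bad }'
}

replace_failed_peer() {
    vol=$1; peer=$2
    gluster peer detach "$peer" force    # forget the dead instance
    gluster peer probe "$peer"           # re-add the rebuilt node (same IP)
    # Restart brick/self-heal processes if they did not come up.
    gluster volume status "$vol" | all_online ||
        gluster volume start "$vol" force
    # Heal can silently no-op until connections are established, so
    # wait (as comment 6 advises, a couple of minutes may be needed).
    until gluster volume status "$vol" | all_online; do sleep 10; done
    gluster volume heal "$vol" full
}
```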