Created attachment 866847 [details]
sosreport1

Description of problem:

Trying to follow the procedure here to replace a completely failed peer with one of the same name and IP:

http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server

Using the following configuration:

# gluster pool list
UUID                                    Hostname                                State
59ffab21-cef8-425c-9f88-059e7954c8fe    sun-x6220.gsslab.rdu2.redhat.com        Connected
ee611deb-7376-45da-84b5-d0325dadffdd    sun-x4100m2-1.gsslab.rdu2.redhat.com    Connected
41275c0e-e4d1-4c81-b567-65a522099839    localhost                               Connected

[root@sun-x6275-1 ~]# gluster volume status
Status of volume: test_dist_rep
Gluster process                                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick sun-x6275-1.gsslab.rdu2.redhat.com:/gluster/brick1        49152   Y       7509
Brick sun-x6220.gsslab.rdu2.redhat.com:/gluster/brick1          49152   Y       7048
Brick sun-x4100m2-1.gsslab.rdu2.redhat.com:/gluster/brick1      49152   Y       7116
Brick sun-x6275-1.gsslab.rdu2.redhat.com:/gluster/brick2        49153   Y       7520
Brick sun-x6220.gsslab.rdu2.redhat.com:/gluster/brick2          49153   Y       7059
Brick sun-x4100m2-1.gsslab.rdu2.redhat.com:/gluster/brick2      49153   Y       7127
NFS Server on localhost                                         2049    Y       7533
Self-heal Daemon on localhost                                   N/A     Y       7540
NFS Server on sun-x6220.gsslab.rdu2.redhat.com                  2049    Y       7073
Self-heal Daemon on sun-x6220.gsslab.rdu2.redhat.com            N/A     Y       7079
NFS Server on sun-x4100m2-1.gsslab.rdu2.redhat.com              2049    Y       7142
Self-heal Daemon on sun-x4100m2-1.gsslab.rdu2.redhat.com        N/A     Y       7148

There are no active volume tasks

Once that is set up and running properly, reprovision one of the peers (in this case sun-x4100m2-1.gsslab.rdu2.redhat.com). Once the machine comes up after the new provisioning, run the following commands.
# service glusterd stop
# pkill glusterfs

On an original peer:

# gluster pool list | gawk '/sun-x4100m2-1.gsslab.rdu2.redhat.com/{print $1}'
7f8e9c8f-664d-4492-ad2f-fc113da895b2

On the replacement peer:

# echo UUID=7f8e9c8f-664d-4492-ad2f-fc113da895b2 > /var/lib/glusterd/glusterd.info
# mkdir -p /gluster/brick{1,2}
# service glusterd start
# gluster peer probe sun-x6275-1.gsslab.rdu2.redhat.com
# service glusterd restart
# gluster pool list
UUID                                    Hostname                                State
fb4b201d-edd4-4e56-93d3-a7c4885c9d5e    sun-x6275-1.gsslab.rdu2.redhat.com      Connected
7f8e9c8f-664d-4492-ad2f-fc113da895b2    localhost                               Connected

If peers are missing, probe them explicitly, then restart glusterd again:

# gluster peer probe sun-x6220.gsslab.rdu2.redhat.com
# service glusterd restart
# gluster pool list
# gluster volume info

If the volume configuration is missing, run:

# gluster volume sync sun-x6275-1.gsslab.rdu2.redhat.com all
Sync volume may make data inaccessible while the sync is in progress. Do you want to continue? (y/n) y
volume sync: success

# gluster volume info
# setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/disaster_recovery/info | cut -d= -f2 | sed 's/-//g') /gluster/brick1
# setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/disaster_recovery/info | cut -d= -f2 | sed 's/-//g') /gluster/brick2
# gluster volume heal test_dist_rep full
Self-heal daemon is not running. Check self-heal daemon log file.

There is no glustershd.log file present. Running self-heal from one of the other hosts appears to work, but no files appear on the replacement bricks. Gluster doesn't appear to be starting the correct processes on the new peer.

[root@sun-x4100m2-1 ~]# ps -ef | grep gluster
root      1603     1  0 21:22 ?        00:00:01 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root      3137  3123  0 21:34 pts/0    00:00:00 grep gluster

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
100%

Attaching sosreports from the three peers.
Created attachment 866848 [details] sosreport2
Created attachment 866850 [details] sosreport3
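As an aside, the UUID-lookup step from the description above can be exercised offline. This is a sketch against canned `gluster pool list` output (fed in via a heredoc so no cluster is needed); plain awk is used here, which behaves the same as gawk for this pattern:

```shell
# Sketch: recovering the failed peer's old UUID from `gluster pool list`
# output. The heredoc stands in for the live command's output.
uuid=$(awk '/sun-x4100m2-1.gsslab.rdu2.redhat.com/{print $1}' <<'EOF'
59ffab21-cef8-425c-9f88-059e7954c8fe sun-x6220.gsslab.rdu2.redhat.com Connected
7f8e9c8f-664d-4492-ad2f-fc113da895b2 sun-x4100m2-1.gsslab.rdu2.redhat.com Connected
41275c0e-e4d1-4c81-b567-65a522099839 localhost Connected
EOF
)
echo "$uuid"   # this is the value written into /var/lib/glusterd/glusterd.info
```

The hostname acts as the awk pattern, so the first field of the matching row (the UUID) is all that is printed.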
This works for me. From any of the 2 healthy peers:

# gluster peer detach <IP_of_the_peer_that_went_down> force
# gluster peer probe <new_peer_with_same_IP>

Then check whether the brick and self-heal daemon processes on the new peer are alive using:

# gluster volume status <vol_name>

If not, run:

# gluster volume start <vol_name> force

Now we can run:

# gluster volume heal <vol_name> full

to trigger the heal to the bricks of the new peer.

Also, the "store.c:1957:glusterd_store_retrieve_volume] 0-: Unknown key: brick-X" messages are just spurious messages and not really errors. A patch that fixes them has just been merged upstream: http://review.gluster.org/#/c/7314/
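The steps above can be consolidated into a small script. This is a dry-run sketch: the volume and peer names are placeholders taken from this report, and `run` only echoes each command so the sequence can be exercised without a gluster cluster (swap its body for `"$@"` to execute for real):

```shell
# Dry-run sketch of the recovery sequence from a healthy peer.
# VOL and PEER are placeholders; run() echoes instead of executing.
VOL=test_dist_rep
PEER=sun-x4100m2-1.gsslab.rdu2.redhat.com
run() { echo "+ $*"; }                    # replace body with "$@" on a live cluster

run gluster peer detach "$PEER" force     # drop the dead peer's old identity
run gluster peer probe "$PEER"            # re-add the reprovisioned node
run gluster volume status "$VOL"          # are brick/self-heal processes up?
run gluster volume start "$VOL" force     # if not, force-start them
run gluster volume heal "$VOL" full       # then trigger a full heal
```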
I am not sure this addresses replacement of the bricks. Let's try that one as well before confirming the steps.

Pranith
Tried out the steps given in comment #6 for a couple of iterations and verified that it works fine. It was observed that if "gluster volume heal <vol_name> full" was issued before the connections between processes were established (after gluster volume start force), self-heal did not happen. In such a case, just wait for a couple of minutes and run the command again. This will trigger the heal.
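The wait-and-retry advice above can be wrapped in a small loop. This is a dry-run sketch only: `heal_cmd` is a stub that echoes instead of invoking the gluster CLI, and the 120-second interval and attempt cap are assumptions, not tested values:

```shell
# Dry-run sketch: retry "volume heal full" until it succeeds, pausing
# between attempts so brick/shd connections can come up after a
# "volume start force". Interval and attempt count are assumptions.
VOL=vol_name
attempt=0
heal_cmd() { echo "+ gluster volume heal $VOL full"; }   # stub for the real CLI
until heal_cmd; do
    attempt=$((attempt + 1))
    [ "$attempt" -ge 5 ] && break      # give up after a few tries
    sleep 120                          # let the self-heal daemon connect
done
```

In the dry run the stub succeeds on the first call, so the loop body never executes; against a live cluster the real command's exit status drives the retries.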
That is considerably different to the community procedure mentioned and not really intuitive for our customers. Have we documented it anywhere?
Here is the possible cause for the issue:
==========================================

Steps used to re-create
~~~~~~~~~~~~~~~~~~~~~~~~
1. Create a replicate volume 1 x 2 with 2 nodes (node1 {king} and node2 {hicks}).

2. Create a volume 'vol_rep'. Start the volume. Create a FUSE mount and add files/dirs to it.

3. Bring node2 offline (crash/re-provision).

[root@king vol_rep]# gluster peer status
Number of Peers: 1

Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Peer in Cluster (Disconnected)

Note: node2, when it comes back online, has the same hostname/IP but a different glusterd UUID. Also, by default glusterd is started automatically when the node is rebooted. If this is the case, when node2 comes online node1 sees node2 and changes its state from "Disconnected" to "Connected":

[root@king vol_rep]# gluster peer status
Number of Peers: 1

Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Peer in Cluster (Connected)

Even though node2 had a different UUID, node1 established the connection to node2.

4. Stop glusterd on node2.

[root@hicks ~]# cat /var/lib/glusterd/glusterd.info
UUID=32e87425-8309-45bc-9d91-2cbb3e431326
operating-version=2
[root@hicks ~]# service glusterd stop
Stopping glusterd:                                         [  OK  ]

5. On node2, edit /var/lib/glusterd/glusterd.info. Change the UUID of glusterd to the older value: "093ebaa2-3dc2-4317-a2b5-3461ff08b0e7".

6. Create the bricks on node2. Set the "volume-id" extended attribute on the bricks.

[root@hicks ~]# mkdir /rhs/bricks/b2
[root@hicks ~]# setfattr -n trusted.glusterfs.volume-id -v "0x1d105460d1e9478f97d0d44fa2068114" /rhs/bricks/b2/

7. Restart glusterd on node2.

8. As soon as node2's glusterd is restarted, node1 puts node2's glusterd in the "Peer Rejected" state.

[root@king vol_rep]# gluster peer status
Number of Peers: 1

Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Peer Rejected (Disconnected)

9. From node2, peer probe node1.

[root@hicks ~]# service glusterd start
Starting glusterd:                                         [  OK  ]
[root@hicks ~]# gluster peer status
Number of Peers: 0
[root@hicks ~]# gluster v info
No volumes present
[root@hicks ~]# gluster peer probe king
peer probe: success.
[root@hicks ~]# gluster peer status
Number of Peers: 1

Hostname: king
Uuid: 62c5a2a0-c058-47c1-af1f-bac54a33d2d8
State: Accepted peer request (Connected)

[root@king vol_rep]# gluster peer status
Number of Peers: 1

Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Accepted peer request (Connected)

NOTE: For node1, node2 is in the "Accepted peer request" state, and for node2, node1 is in the "Accepted peer request" state. For proper functionality, nodes must be in the "Peer in Cluster" state.

10. Restart glusterd on both node1 and node2.

NODE1:
======
[root@king vol_rep]# service glusterd restart
Starting glusterd:                                         [  OK  ]
[root@king vol_rep]# gluster peer status
Number of Peers: 1

Hostname: hicks
Uuid: 093ebaa2-3dc2-4317-a2b5-3461ff08b0e7
State: Sent and Received peer request (Connected)

NODE2:
======
[root@hicks ~]# service glusterd restart
Starting glusterd:                                         [  OK  ]
[root@hicks ~]# gluster peer status
Number of Peers: 1

Hostname: king
Uuid: 62c5a2a0-c058-47c1-af1f-bac54a33d2d8
State: Sent and Received peer request (Connected)

NOTE: At this state, when glusterd is restarted, node1's NFS and glustershd processes are not started.

[root@king vol_rep]# gluster v status
Status of volume: vol_rep
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick king:/rhs/bricks/b1                       49152   Y       3810
NFS Server on localhost                         N/A     N       N/A
Self-heal Daemon on localhost                   N/A     N       N/A

Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks

11. From node2, sync the volume information from node1.
[root@hicks ~]# gluster v info
No volumes present
[root@hicks ~]# gluster v status
No volumes present
[root@hicks ~]# gluster volume sync king all
Sync volume may make data inaccessible while the sync is in progress. Do you want to continue? (y/n) y
volume sync: success
[root@hicks ~]# gluster v info

Volume Name: vol_rep
Type: Replicate
Volume ID: 1d105460-d1e9-478f-97d0-d44fa2068114
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b1
Brick2: hicks:/rhs/bricks/b2

[root@hicks ~]# gluster v status
Status of volume: vol_rep
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick hicks:/rhs/bricks/b2                      49152   Y       3391
NFS Server on localhost                         N/A     N       N/A
Self-heal Daemon on localhost                   N/A     N       N/A

NOTE: At this state, even after the volume sync, node1's and node2's NFS and glustershd processes are not started. glusterd on both nodes is still in the "Sent and Received peer request (Connected)" state; the expected state is "Peer in Cluster (Connected)".

For the "Unknown key:" errors, refer to the bugs:
==================================================
https://bugzilla.redhat.com/show_bug.cgi?id=1036551
https://bugzilla.redhat.com/show_bug.cgi?id=1056910

For recovering as mentioned in comment 6:
=============================================================
For every node that crashed / was re-provisioned and went offline:

1. Peer detach the crashed node:

   gluster peer detach <hostname_of_node_that_crashed> force

2. Note down the volume-id of the volume:

   VOLUME_ID=$(gluster v info | grep "Volume ID" | cut -d ":" -f 2 | tr -d ' ')

3. When the node that had previously crashed comes online, execute the following on that node:

   a. Create the brick directories:

      mkdir <brick_directories>

   b. Set the extended attribute "trusted.glusterfs.volume-id" on the brick directories.
      setfattr -n "trusted.glusterfs.volume-id" -v 0x$(echo "$VOLUME_ID" | tr -d '-') <brick_directories>

      (The xattr value must be the 0x-prefixed hex form of the volume-id, with the dashes removed.)

4. From any of the storage nodes, peer probe the previously crashed node:

   gluster peer probe <hostname_of_node_that_crashed>

Validation
===========
1. Check the peer status. The status of the peer should be "Peer in Cluster (Connected)".
2. Check the volume status. All the brick, NFS, and glustershd processes should be started.

5. Trigger the heal:

   gluster volume heal <volume_name> full
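For the record, the "Volume ID" printed by `gluster v info` contains dashes, while the value passed to setfattr needs to be the 0x-prefixed hex string without them (as in the setfattr commands earlier in this bug). A sketch of that normalization, simulated with a sample line from the vol_rep output so it can be checked without a cluster:

```shell
# Sketch: turning the "Volume ID" line from `gluster v info` into the
# value setfattr expects. The sample line stands in for live output.
vinfo='Volume ID: 1d105460-d1e9-478f-97d0-d44fa2068114'
VOLUME_ID=$(echo "$vinfo" | grep "Volume ID" | cut -d ":" -f 2 | tr -d ' ')
XATTR_VALUE=0x$(echo "$VOLUME_ID" | tr -d '-')
echo "$XATTR_VALUE"
# On the real node:
# setfattr -n trusted.glusterfs.volume-id -v "$XATTR_VALUE" <brick_directory>
```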
The procedure outlined in comment 6 seems to work fine. I tested it twice on a three node pool and it worked both times, no problems. I'll get this documented in KCS in the next day or so but this needs to be included in the documentation.
I've created the following KCS documents for the two scenarios.

- How can I replace a completely failed peer with a machine with a different hostname/IP in Red Hat Storage 2 Update 1
  https://access.redhat.com/site/solutions/720413

- How can I replace a completely failed peer with a machine with the same hostname/IP in Red Hat Storage 2 Update 1
  https://access.redhat.com/site/solutions/773533

I will update the documentation bug. Thanks for your efforts.

Closing as NOTABUG.
Dev ack to 3.0 RHS BZs