Description of problem:
========================
Consider the case where a storage node is stopped and, on restart, gets a new IP address or hostname. (When Amazon EC2 instances are stopped and brought back online, their hostname and IP address change. The storage node still has the cluster information, but the brick paths change.)

On restart of the storage node, glusterd tries to start and fails because it is not able to resolve the brick path.

The error message we get is:

[2013-11-28 10:14:28.928388] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2013-11-28 10:14:28.928424] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2013-11-28 10:14:28.928441] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2013-11-28 10:14:28.928457] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2013-11-28 10:14:28.928472] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
[2013-11-28 10:14:28.928488] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5

glusterd should still start even though it is not able to resolve the brick path. Without a running glusterd we are not able to do any "peer" or "volume" operations.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.44.1u2rhs built on Nov 25 2013 08:17:39

How reproducible:
==================
Often

Steps to Reproduce:
====================
1. Create a 1 x 2 replicate volume. Start the volume.
2. Stop one of the storage nodes (shutdown). Restart the node. (The restart should change the IP/hostname of the node.)
After updating from glusterfs 3.4.0.33rhs to glusterfs 3.4.0.44rhs I was experiencing the same errors. I found that the peer definitions for the volumes in question were using aliases instead of FQDNs. I edited the peer files on each server to include the FQDN and glusterd now starts without error.
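
For anyone wanting to check their peer files before editing them, here is a rough sketch in Python. It is not a glusterd tool. It assumes the peer files are plain `key=value` text with `hostnameN=` entries and that they live under `/var/lib/glusterd/peers` (the usual default, but verify on your install). The function names and the sample values are made up for illustration, and "no dot in the name" is only a crude heuristic for "alias rather than FQDN".

```python
from pathlib import Path


def find_short_hostnames(peer_file_text):
    """Return hostname values that contain no dot (likely aliases, not FQDNs).

    Assumes the peer file is plain key=value text with hostnameN= keys.
    IP addresses contain dots, so they are not flagged by this heuristic.
    """
    short = []
    for line in peer_file_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        if key.startswith("hostname") and value and "." not in value:
            short.append(value)
    return short


def scan_peer_dir(peer_dir="/var/lib/glusterd/peers"):
    """Map each peer file name to its suspicious hostnames.

    The default path is an assumption; returns {} if the directory is missing.
    """
    directory = Path(peer_dir)
    if not directory.is_dir():
        return {}
    report = {}
    for peer_file in sorted(directory.iterdir()):
        if peer_file.is_file():
            found = find_short_hostnames(peer_file.read_text())
            if found:
                report[peer_file.name] = found
    return report


# Made-up sample content, for illustration only.
sample = """uuid=00000000-0000-0000-0000-000000000000
state=3
hostname1=node2
"""

print(find_short_hostnames(sample))  # prints ['node2']
```

Running `scan_peer_dir()` on each server lists the peer files that may still carry an alias. It only reads the files; any edits are still done by hand, with glusterd stopped.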