Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1067733

Summary: Rename temp file created in /var/lib/glusterd/peers/ during peer probe
Product: [Community] GlusterFS
Reporter: Anirban Ghoshal <a.ghoshal>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.4.2
CC: bugs, gluster-bugs, joe
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-07 13:49:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Anirban Ghoshal 2014-02-21 00:10:21 UTC
Description of problem:

I found that when the user performs a peer probe, mgmt/glusterd writes a file named after the hostname of the peer in question under /var/lib/glusterd/peers/. On a successful probe, this file is replaced with a file named after the UUID of the glusterd instance on the peer; on a failed probe, the temp file is simply deleted.

Here's an illustration:

root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host &
[1] 25918
root@someserver:/var/lib/glusterd/peers] cat some_non_host
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=some_non_host
root@someserver:/var/lib/glusterd/peers]
root@someserver:/var/lib/glusterd/peers] peer probe: failed: Probe returned with unknown errno 107

[1]+  Exit 1                  gluster peer probe some_non_host
root@someserver:/var/lib/glusterd/peers] ls
root@someserver:/var/lib/glusterd/peers] 
 
Here's the deal: if glusterd is killed before it gets a chance to clean up the temp file (say, for a peer that doesn't actually exist), and the machine is then rebooted, the leftover temp file breaks mgmt/glusterd's recovery graph, and glusterd cannot initialize any of the existing volumes until the temp file is deleted manually.
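As a workaround on the affected releases, the stale file can be identified by name: as described above, genuine peer files are named after UUIDs, while probe leftovers are named after the peer's hostname. A minimal sketch, assuming glusterd is stopped and that every legitimate file in peers/ is UUID-named:

# list entries whose names don't match the UUID pattern; review before deleting
ls /var/lib/glusterd/peers | grep -Ev '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'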

Version-Release number of selected component (if applicable):
Observed this on 2 releases: 3.4.0 and 3.4.2.

How reproducible:
100%

Steps to Reproduce:
1. Probe for a peer (preferably one that does not exist)
2. In parallel, kill glusterd as soon as the temp file is created (a scripted sketch follows these steps)
3. Once glusterd is dead, reboot the machine.
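A hedged sketch of these steps as a script, assuming a disposable test machine and a hostname that does not resolve (the sleep may need tuning to hit the window):

gluster peer probe some_non_host &   # glusterd creates peers/some_non_host
sleep 1                              # give glusterd time to write the temp file
pkill -9 glusterd                    # kill glusterd before it can clean up
ls /var/lib/glusterd/peers/          # the hostname-named file is left behind
reboot                               # after reboot, volume recovery fails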

Actual results:
When the machine comes back up, glusterd does not automatically start the brick processes for any of the gluster volumes created prior to the reboot.

Expected results:
mgmt/glusterd should have the intelligence to distinguish between a genuine peer file and a temp file created during a probe, so that the temp file cannot affect the recovery graph after a reboot. Something like a <peer-name>.tmp?
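A minimal sketch of the suggested scheme, using hypothetical file names rather than glusterd's actual code: write probe state under a clearly-marked temp name first, and promote it to the UUID-named peer file only once the probe succeeds:

# during the probe: state goes to a name the recovery path can ignore
printf 'uuid=00000000-0000-0000-0000-000000000000\nstate=0\nhostname1=some_non_host\n' \
    > /var/lib/glusterd/peers/some_non_host.tmp
# on success: rename to the peer's real UUID (placeholder below)
mv /var/lib/glusterd/peers/some_non_host.tmp /var/lib/glusterd/peers/<peer-uuid>
# on failure: only the obviously-temporary file needs removing
rm -f /var/lib/glusterd/peers/some_non_host.tmp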

Additional info:
Preferably, also delete any temp file discovered during recovery at startup?
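With a .tmp naming scheme like the one sketched above, the startup cleanup becomes a one-liner; a sketch, assuming it runs before glusterd loads its peer list:

rm -f /var/lib/glusterd/peers/*.tmp   # discard leftovers from interrupted probes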

Comment 1 Niels de Vos 2015-05-17 21:58:00 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 2 Kaleb KEITHLEY 2015-10-07 13:49:43 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen it and change the version, or open a new bug.
