Bug 1067733 - Rename temp file created in /var/lib/glusterd/peers/ during peer probe
Summary: Rename temp file created in /var/lib/glusterd/peers/ during peer probe
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.4.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-02-21 00:10 UTC by Anirban Ghoshal
Modified: 2015-10-07 13:50 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-07 13:49:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anirban Ghoshal 2014-02-21 00:10:21 UTC
Description of problem:

I found that when the user performs a peer probe, mgmt/glusterd writes a file in /var/lib/glusterd/peers/ named after the hostname of the peer in question. On a successful probe this file is replaced with one named after the UUID of the glusterd instance on the peer, while a failed probe simply causes the temp file to be deleted.

Here's an illustration:

root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host &
[1] 25918
root@someserver:/var/lib/glusterd/peers] cat some_non_host
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=some_non_host
root@someserver:/var/lib/glusterd/peers]
root@someserver:/var/lib/glusterd/peers] peer probe: failed: Probe returned with unknown errno 107

[1]+  Exit 1                  gluster peer probe some_non_host
root@someserver:/var/lib/glusterd/peers] ls
root@someserver:/var/lib/glusterd/peers] 
 
Here's the deal: if, for some reason, glusterd is killed before it gets a chance to clean up the temp file (say, for a peer that does not actually exist), and the machine is then rebooted, the leftover file breaks mgmt/glusterd's recovery at startup, and glusterd cannot initialize any of the existing volumes until the temp file is deleted manually.
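
For reference, the manual workaround implied above is roughly the following; this is only a sketch that assumes the stale file name from the illustration, and the restart command depends on the init system in use:

rm /var/lib/glusterd/peers/some_non_host   # stale temp file left by the interrupted probe
service glusterd restart                   # or: systemctl restart glusterd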

Version-Release number of selected component (if applicable):
Observed this on 2 releases: 3.4.0 and 3.4.2.

How reproducible:
100%

Steps to Reproduce:
1. Probe for a peer (preferably one that does not exist)
2. In parallel, kill glusterd as soon as the temp file is created (see the sketch after this list)
3. Once glusterd is dead, reboot the machine.
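
For illustration, steps 1 and 2 can be driven from a shell roughly as follows; the sleep only gives glusterd time to write the temp file, and pkill is assumed to be available (any other way of killing glusterd at that moment will do):

gluster peer probe some_non_host &   # step 1: probe a peer that does not exist
sleep 1                              # allow glusterd to write the temp file
pkill -9 glusterd                    # step 2: kill glusterd before it can clean up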

Actual results:
When the machine comes back up, glusterd does not automatically start the brick processes for any of the gluster volumes created prior to the reboot.

Expected results:
mgmt/glusterd should be able to distinguish between a genuine peer file and a temp file created during a probe; the temp file should not affect recovery after a reboot. Something like a <peer-name>.tmp?

Additional info:
Preferably, glusterd could also delete any such temp file it discovers during recovery at startup.
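
For illustration, assuming the <peer-name>.tmp naming suggested above, the cleanup at startup could be as simple as the following sketch (this is a suggestion, not what glusterd does today):

find /var/lib/glusterd/peers -maxdepth 1 -name '*.tmp' -delete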

Comment 1 Niels de Vos 2015-05-17 21:58:00 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 2 Kaleb KEITHLEY 2015-10-07 13:49:43 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen it and change the version, or open a new bug.

Comment 3 Kaleb KEITHLEY 2015-10-07 13:50:53 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen it and change the version, or open a new bug.

