Bug 1051927

Summary: Gluster "volume log rotate" hangs the 'glusterd' process.
Product: [Community] GlusterFS
Reporter: Jeff Byers <jbyers>
Component: logging
Assignee: bugs <bugs>
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Version: 3.4.2
CC: bugs, gluster-bugs, jbyers
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-10-07 13:50:53 UTC
Type: Bug

Description Jeff Byers 2014-01-12 22:45:14 UTC
Gluster "volume log rotate" hangs the 'glusterd' process.

In GlusterFS 3.4.2, the "volume log rotate" command does not
work for either distributed or replicated volumes when there
are two bricks.

The command hangs for 2 minutes and then returns with exit
status 146. Note that no log rotation is performed before
the command hangs.

All subsequent 'gluster' CLI commands will then fail.

The 'glusterd' process is still running, but appears to be
stuck on a mutex. CLI commands will not work until
'glusterd' is restarted.
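
One way to inspect the stuck daemon without restarting it is
GlusterFS's statedump facility: sending SIGUSR1 to a gluster
process makes it dump its internal state, including lock
information, to a file (typically under /var/run/gluster). A
sketch; the exact dump filename pattern varies by version:

kill -USR1 $(pidof glusterd)   # request a statedump from the running daemon
ls -l /var/run/gluster/        # look for a glusterdump.<pid>.dump.* file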

Gluster "volume log rotate" does work when the volume is
only one brick on a single node. It also works with two node
volumes when one of the nodes is down.

Not sure why the other bricks/nodes are involved at all,
since the log files are local to each node and should only
be rotated locally.
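
Until this is fixed, a possible local workaround is to rotate
the Gluster logs on each node with logrotate in copytruncate
mode, which needs no cooperation from 'glusterd'. A minimal
sketch, assuming the default /var/log/glusterfs layout; the
paths and schedule are illustrative:

# /etc/logrotate.d/glusterfs-local -- example only; adjust paths to taste
/var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    # truncate the file in place so the daemons keep writing
    # to their already-open file descriptors
    copytruncate
}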

~ Jeff Byers ~

[root@SC-10-10-200-71 log]# gluster --version
glusterfs 3.4.2 built on Jan  6 2014 07:37:13

[root@SC-10-10-200-71 log]# gluster volume list
nas-volume-0003

[root@SC-10-10-200-71 log]# gluster volume info nas-volume-0003

Volume Name: nas-volume-0003
Type: Distribute
Volume ID: edc1f521-1695-4605-8b87-ded21c4c47bc
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.5.71:/exports/nas-segment-0003/nas-volume-0003
Brick2: 192.168.5.72:/exports/nas-segment-0001/nas-volume-0003
Options Reconfigured:
nfs.rpc-auth-allow: *
performance.read-ahead: off
performance.write-behind: off
performance.stat-prefetch: off
nfs.disable: off
nfs.addr-namelookup: off

[root@SC-10-10-200-71 log]# gluster volume log rotate nas-volume-0003
[root@SC-10-10-200-71 log]# echo $?
146

[root@SC-10-10-200-71 log]# time gluster volume list

real    2m0.198s
user    0m0.124s
sys     0m0.094s
[root@SC-10-10-200-71 log]# echo $?
146
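
The 2m0.198s wall time matches the gluster CLI's default
command timeout of 120 seconds, which suggests that 'glusterd'
never answers and the CLI simply gives up.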

[root@SC-10-10-200-71 log]# ps -elf |grep glusterd
5 S root     10183     1  0  89   9 - 114455 futex_ 13:56 ?       00:00:04 /usr/local/sbin/glusterd --pid-file=/var/run/glusterd.pid

[root@SC-10-10-200-71 log]# strace -f -p 10183 2>&1 |tail
[pid 10184] rt_sigtimedwait([HUP INT USR1 USR2 TERM], NULL, NULL, 8 <unfinished ...>
[pid 10183] futex(0xefab54, FUTEX_WAIT_PRIVATE, 91, NULL <unfinished ...>
[pid 10184] rt_sigtimedwait([HUP INT USR1 USR2 TERM], NULL, NULL, 8 <unfinished ...>
[pid 10183] futex(0xefab54, FUTEX_WAIT_PRIVATE, 91, NULL <unfinished ...>
[pid 10206] futex(0xf13eb4, FUTEX_WAIT_PRIVATE, 67, NULL <unfinished ...>
[pid 10206] futex(0xf13eb4, FUTEX_WAIT_PRIVATE, 67, NULL <unfinished ...>
[pid 10205] restart_syscall(<... resuming interrupted call ...>) = 0
[pid 10205] restart_syscall(<... resuming interrupted call ...>) = 0
[pid 10205] nanosleep({1, 0}, NULL)     = 0
[pid 10205] nanosleep({1, 0}, NULL)     = 0
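
The strace output shows threads parked in futex waits, but not
which mutexes those addresses belong to. A symbolic backtrace
would name the blocked functions; a minimal sketch, assuming
gdb (and ideally glusterfs debuginfo) is installed:

# Dump backtraces from every glusterd thread; the thread(s)
# sitting in pthread_mutex_lock() point at the deadlocked path.
gdb -p 10183 -batch -ex 'thread apply all bt' > /tmp/glusterd-bt.txt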

[root@SC-10-10-200-71 log]# /etc/init.d/glusterd restart
Starting glusterd:                                         [  OK  ]
[root@SC-10-10-200-71 log]# gluster volume list
nas-volume-0003

Comment 2 Niels de Vos 2015-05-17 21:57:21 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. If updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 3 Kaleb KEITHLEY 2015-10-07 13:50:53 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen this bug and change the version, or open a new bug.