Bug 963658 - OOM: was trying to execute command "gluster volume status" in loop
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64 Linux
Priority: high   Severity: urgent
Assigned To: Krutika Dhananjay
QA Contact: Rahul Hinduja
Reported: 2013-05-16 06:34 EDT by Rahul Hinduja
Modified: 2013-09-23 18:43 EDT

Fixed In Version: glusterfs-3.4.0.12rhs-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 18:39:48 EDT
Type: Bug

Description Rahul Hinduja 2013-05-16 06:34:49 EDT
Description of problem:
=======================

Powered down one of the servers in a 4-node cluster and tried to execute "gluster volume status" from another server. "gluster volume status" kept failing because the cluster lock was held by the server that had gone down.

Ran "gluster volume status" multiple times over roughly 20 minutes and observed an OOM kill on the server issuing the command.
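
For reference, a loop along the following lines reproduces the pattern described above while sampling glusterd's resident memory (a sketch only; the iteration count, interval, and use of pidof are assumptions and are not taken from this report):

#!/bin/bash
# Sketch: repeatedly run "gluster volume status" while one peer is down
# and sample glusterd's resident memory (VmRSS, in kB) after each attempt.
GLUSTERD_PID=$(pidof glusterd)      # assumes a single glusterd instance

for i in $(seq 1 240); do           # ~20 minutes at a 5-second interval
    gluster volume status           # expected to fail while the lock is held
    grep VmRSS "/proc/${GLUSTERD_PID}/status"
    sleep 5
done

On the affected build, glusterd's memory kept climbing under this workload until the kernel OOM killer intervened (see the process table below).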


Version-Release number of selected component (if applicable):
=============================================================

[root@rhs-client11 ~]# rpm -qa | grep gluster | grep 3.4.0
glusterfs-fuse-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.8rhs-1.el6rhs.x86_64
[root@rhs-client11 ~]# 


Steps Carried:
==============
1. A 6x2 volume was created across four servers (rhs-client11 to rhs-client14)
2. Brought down rhs-client13
3. Executed "gluster volume status" repeatedly on rhs-client11
4. OOM kill of glusterd observed on rhs-client11

[root@rhs-client11 ~]# gluster volume status 
Another transaction is in progress. Please try again after sometime.
 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume status 
Another transaction could be in progress. Please try again after sometime.
 
[root@rhs-client11 ~]# gluster volume status 
[root@rhs-client11 ~]# gluster volume status 
Another transaction is in progress. Please try again after sometime.
 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume status 
Connection failed. Please check if gluster daemon is operational.
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# service glusterd status
glusterd dead but pid file exists
[root@rhs-client11 ~]# 
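
The "glusterd dead but pid file exists" state above is consistent with glusterd having been killed by the kernel OOM killer. A quick way to confirm this is to check the kernel ring buffer or syslog (a sketch; the exact log file path depends on the syslog configuration):

dmesg | grep -i "out of memory"
grep -i "killed process" /var/log/messages

The OOM killer report captured below names glusterd (pid 3382) as the victim.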



4067128 pages non-shared
[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 1052]     0  1052     2670       61   2     -17         -1000 udevd
[ 3188]     0  3188     2279       36   0       0             0 dhclient
[ 3247]     0  3247    47282      624  18       0             0 vdsm-reg-setup
[ 3255]     0  3255    62269      160   6       0             0 rsyslogd
[ 3267]     0  3267     2763      164  12       0             0 irqbalance
[ 3286]    32  3286     4743      151  12       0             0 rpcbind
[ 3359]     0  3359     6290       62  19       0             0 rpc.idmapd
[ 3382]     0  3382  5936597  3966387   3       0             0 glusterd
[ 3412]    81  3412     5350       88   1       0             0 dbus-daemon
[ 3480]     0  3480   101246      339   6       0             0 glusterfs
[ 3486]     0  3486   137041      248   1       0             0 glusterfsd
[ 3491]     0  3491   120656      249  12       0             0 glusterfsd
[ 3496]     0  3496   120399      248   5       0             0 glusterfsd
[ 3553]    68  3553     6439      423   0       0             0 hald
[ 3554]     0  3554     4526      130   7       0             0 hald-runner
[ 3582]     0  3582     5055      118   0       0             0 hald-addon-inpu
[ 3599]    68  3599     4451      158   6       0             0 hald-addon-acpi
[ 3713]     0  3713    16029       85   1     -17         -1000 sshd
[ 3737]    38  3737     7540      201  12       0             0 ntpd
[ 3753]     0  3753    24469      285   2       0             0 sshd
[ 3754]     0  3754    21683      138  18       0             0 sendmail
[ 3762]    51  3762    19538       94  19       0             0 sendmail
[ 3785]     0  3785    27544      144   0       0             0 abrtd
[ 3799]     0  3799    27051       97   6       0             0 ksmtuned
[ 3811]     0  3811    43663      282   4       0             0 tuned
[ 3821]     0  3821    29302      163   0       0             0 crond
[ 3844]     0  3844    25972       77  18       0             0 rhsmcertd
[ 3856]     0  3856     3387      829   6       0             0 wdmd
[ 3868]   179  3868    65809     4366   7       0             0 sanlock
[ 3869]     0  3869     5769       45  20       0             0 sanlock-helper
[ 3977]     0  3977   107573     1109  20     -17         -1000 multipathd
[ 4276]     0  4276     7238     5701  13       0           -17 iscsiuio
[ 4281]     0  4281     1219      102  15       0             0 iscsid
[ 4282]     0  4282     1344      832  20       0           -17 iscsid
[ 4390]     0  4390   215266      363  16       0             0 libvirtd
[ 4745]     0  4745     2669       68   0     -17         -1000 udevd
[ 4746]     0  4746     2669       63   6     -17         -1000 udevd
[ 4784]    36  4784     2309       48   1       0             0 respawn
[ 4789]    36  4789   392625     2679  16       0             0 vdsm
[ 4794]     0  4794     1015      114   6       0             0 mingetty
[ 4796]     0  4796     1015      114   8       0             0 mingetty
[ 4798]     0  4798     1015      114   1       0             0 mingetty
[ 4800]     0  4800     1015      114   6       0             0 mingetty
[ 4802]     0  4802     1015      114   3       0             0 mingetty
[ 4979]     0  4979    19105      222   6       0             0 sudo
[ 4994]     0  4994   149839      831   6       0             0 python
[ 5158]     0  5158    27111      259   0       0             0 bash
[ 5700]     0  5700    24469      252   0       0             0 sshd
[ 5702]     0  5702    27084      199   6       0             0 bash
[ 6319]     0  6319    24819      700   6       0             0 sshd
[ 6321]     0  6321    24469      285   2       0             0 sshd
[ 6323]     0  6323    24469      285   1       0             0 sshd
[ 6325]     0  6325    27115      198   6       0             0 bash
[ 6336]     0  6336    27084      198   6       0             0 bash
[ 6347]     0  6347    27115      198   6       0             0 bash
[ 6358]     0  6358    25234      119  15       0             0 tail
[ 6365]     0  6365    25234      119  13       0             0 tail
[ 6372]     0  6372    25234      126   7       0             0 tail
[ 6514]     0  6514   101246      347   9       0             0 glusterfs
[ 6530]     0  6530    52871      184   0       0             0 smbd
[ 6533]     0  6533    53000       90   0       0             0 smbd
[ 8080]     0  8080   270107      244   0       0             0 glusterfsd
[ 8089]     0  8089   253461      244  12       0             0 glusterfsd
[ 8098]     0  8098   253467      248   2       0             0 glusterfsd
[ 8108]     0  8108   105758     1940   6       0             0 glusterfs
[ 8117]     0  8117   184401     4925   6       0             0 glusterfs
[ 8132]    29  8132     6621      174   1       0             0 rpc.statd
[ 8502]     0  8502    25226      126   6       0             0 sleep
Out of memory: Kill process 3382 (glusterd) score 921 or sacrifice child
Killed process 3382, UID 0, (glusterd) total-vm:23746388kB, anon-rss:15864332kB, file-rss:1216kB
[root@rhs-client11 ~]#
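
The kill message agrees with the process table above: total_vm and rss in the table are counted in 4 KiB pages, so glusterd's figures convert exactly to the kB values in the kill line:

# total_vm: 5936597 pages * 4 kB = 23746388 kB  ("total-vm:23746388kB")
echo $(( 5936597 * 4 ))
# rss: 3966387 pages * 4 kB = 15865548 kB  (= anon-rss 15864332 kB + file-rss 1216 kB)
echo $(( 3966387 * 4 ))

In other words, glusterd had grown to roughly 15 GB of resident memory, far more than any other process in the table, which is why it scored 921 and was selected as the OOM victim.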
Comment 3 Amar Tumballi 2013-07-12 01:42:53 EDT
Not happening with the latest codebase. Can we reverify?

Our speculation is that this was caused by the excessive re-work done before the rebase, and that the downstream rebase to the proper point in time of the upstream code resolved it.
Comment 5 Scott Haines 2013-09-23 18:39:48 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
