
Bug 880406

Summary: corosync take over most memory and cluster get down
Product: Red Hat Enterprise Linux 6
Reporter: davidyangyi <davidyangyi>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.2
CC: davidyangyi, fdinitto
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-13 10:53:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: log
Flags: none

Description davidyangyi 2012-11-26 22:19:25 UTC
Description of problem:
The cluster had been running for a few months. Recently one node went down because corosync took over most of the memory and was killed by the kernel OOM killer.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 6.2 (Santiago)

How reproducible:


Steps to Reproduce:
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 14575 total pagecache pages
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 1704 pages in swap cache
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: Swap cache stats: add 8700304, delete 8698600, find 4246093/6748636
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: Free swap  = 66472748kB
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: Total swap = 67108856kB
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 16777200 pages RAM
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 284791 pages reserved
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 39627 pages shared
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 16428456 pages non-shared
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [  702]     0   702     2838       58   4     -17         -1000 udevd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 1738]     0  1738   107025     1104   8     -16          -941 multipathd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 1809]     0  1809     9811     8274  12       0           -17 iscsiuio
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 1815]     0  1815     1231      114   4       0             0 iscsid
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 1816]     0  1816     2213     1193   1       0           -17 iscsid
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2372]     0  2372    23306      123   5     -17         -1000 auditd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2437]   155  2437    15198      497   0       0             0 stap-serverd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2462]     0  2462    62714      176   0       0             0 rsyslogd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2474]     0  2474     2326       88   4       0             0 irqbalance
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2610]     0  2610    34474      202   8       0             0 pbx_exchange
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2735]     0  2735 16285299 16186051  13       0             0 corosync
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2795]     0  2795    49297      208  13     -16          -941 fenced
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2821]     0  2821    54569      166   0     -16          -941 dlm_controld
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2870]     0  2870    32293      151   2     -16          -941 gfs_controld
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3012]    81  3012     5540      327  11       0             0 dbus-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3024]     0  3024     9554      440  11       0             0 corosync-notify
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3040]     0  3040    28551     8616  13       0             0 clvmd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3099]    68  3099     6651      236  12       0             0 hald
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3100]     0  3100     4540      174  12       0             0 hald-runner
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3143]     0  3143     5069      163   1       0             0 hald-addon-inpu
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3153]    68  3153     4464      206  12       0             0 hald-addon-acpi
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3154]     0  3154     5068      164   0       0             0 hald-addon-stor
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3168]    28  3168   199021      188  12       0             0 nscd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3184]     0  3184     1104       78   9       0             0 mcelog
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3193]     0  3193    53868      617   8       0             0 snmpd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3204]     0  3204    49497      278   3       0             0 foghorn
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3215]     0  3215    15544       77   1       0             0 sshd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3223]    38  3223     6130      144   4       0             0 ntpd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3247]     0  3247    39400      168   0       0             0 vnetd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3254]     0  3254    43610      172  12       0             0 bpcd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3312]   496  3312   345850      189   1       0             0 qpidd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3366]     0  3366    28840       87   4       0             0 crond
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3408]     0  3408    20783      293   9       0             0 Xvnc
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3413]     0  3413     3180      159   7       0             0 ck-xinit-sessio
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3417]     0  3417    11956      136   9       0             0 vncconfig
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3434]     0  3434     4548       61   5       0             0 dbus-launch
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3435]     0  3435     5005       81   0       0             0 dbus-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3480]     0  3480   106211      439   1       0             0 libvirtd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3555]     0  3555     2837       61   4     -17         -1000 udevd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3562]     0  3562  1028472      232   0       0             0 console-kit-dae
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3656]     0  3656    78021      372   8       0             0 gnome-session
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3658]    99  3658     2814       85   8       0             0 dnsmasq
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3659]     0  3659    13799       20   4       0             0 ssh-agent
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3701]     0  3701    11295      234   4       0             0 devkit-power-da
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3711]     0  3711    32897      231  12       0             0 gconfd-2
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3750]     0  3750   127479      377  12       0             0 gnome-settings-
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3752]     0  3752    40304      110   5       0             0 gnome-keyring-d
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3755]     0  3755    72282      272  15       0             0 seahorse-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3760]     0  3760    33209      227  14       0             0 gvfsd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3766]     0  3766    67539      221  13       0             0 gvfs-fuse-daemo
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3774]     0  3774    71912      290   1       0             0 metacity
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3780]     0  3780    86341      347   0       0             0 gnome-panel
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3807]     0  3807    16642      227   0       0             0 saslauthd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3808]     0  3808    16642      208  12       0             0 saslauthd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3809]     0  3809    16063        8   5       0             0 saslauthd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3810]     0  3810    16063        7   5       0             0 saslauthd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3811]     0  3811    16063        7   1       0             0 saslauthd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3814]     0  3814    64448      211  13       0             0 nautilus
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3816]     0  3816   157267      300  15       0             0 bonobo-activati
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3833]     0  3833     9247     1572   4       0             0 rgmanager
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3834]     0  3834   104755      407   4       0             0 rgmanager
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3869]     0  3869    79963      340   2       0             0 wnck-applet
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3871]     0  3871    77449      324  13       0             0 trashapplet
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3873]     0  3873    35534      261   6       0             0 gvfs-gdu-volume
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3879]     0  3879    10196      247   4       0             0 udisks-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3881]     0  3881    34939      259   8       0             0 gvfsd-trash
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3882]     0  3882     5769       46   5       0             0 oddjobd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3885]     0  3885    10099       39   0       0             0 udisks-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3902]     0  3902    57992      234   9       0             0 gvfs-afc-volume
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3905]   140  3905    15419      119   8       0             0 ricci
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3909]     0  3909    29977      218   4       0             0 gdm-binary
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3912]     0  3912    37074      237   1       0             0 gvfs-gphoto2-vo
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3917]     0  3917     1029      110   0       0             0 mingetty
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3919]     0  3919     1029      110  12       0             0 mingetty
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3921]     0  3921     1029      110   8       0             0 mingetty
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3923]     0  3923     1029      110   9       0             0 mingetty
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3924]     0  3924     2837       53  10     -17         -1000 udevd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3926]     0  3926     1029      110   1       0             0 mingetty
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3948]     0  3948    37624      241   0       0             0 gdm-simple-slav
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3950]     0  3950    32222      316   4       0             0 Xorg
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3953]     0  3953   102207      333  15       0             0 gdm-user-switch
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3955]     0  3955   100117      404  15       0             0 gnote
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3957]     0  3957   120354      819   9       0             0 clock-applet
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3958]     0  3958    70384      305  12       0             0 notification-ar
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3971]     0  3971    12046      257  11       0             0 polkitd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3992]    42  3992     4548       60   8       0             0 dbus-launch
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3993]    42  3993     4972       81   1       0             0 dbus-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3994]    42  3994    65025      268   9       0             0 gnome-session
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3997]    42  3997    32854      235   7       0             0 gconfd-2
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4000]    42  4000    29441      265   4       0             0 at-spi-registry
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4001]    42  4001    86592      368   8       0             0 gnome-settings-
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4006]    42  4006    89170      300   0       0             0 bonobo-activati
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4009]    42  4009    33156      221   5       0             0 gvfsd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4022]    42  4022    69515      281   8       0             0 metacity
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4033]    42  4033    99942      827  12       0             0 gdm-simple-gree
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4038]    42  4038    77400      365   5       0             0 gnome-power-man
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4041]    42  4041    59820      246  13       0             0 polkit-gnome-au
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4126]    42  4126    85393      241  14       0             0 pulseaudio
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4128]   498  4128    41653      210   9       0             0 rtkit-daemon
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 4245]     0  4245    34344      216   8       0             0 gdm-session-wor
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [15224]     0 15224    19414      155  10       0             0 httpd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 5175]    99  5175    71507      312   4       0             0 gmond
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [11990]     0 11990     1569       59   8       0             0 rdisc
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [12042]     0 12042  5158773      276   0       0             0 java
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 9522]     2  9522   105479      141  12       0             0 httpd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 9523]     2  9523   105479      139   5       0             0 httpd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 9524]     2  9524   105479      139   1       0             0 httpd
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 9649]   141  9649   259916      431   8       0             0 paster
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: Out of memory: Kill process 2735 (corosync) score 456 or sacrifice child
Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: Killed process 2735, UID 0, (corosync) total-vm:65141196kB, anon-rss:64692136kB, file-rss:52068kB
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 dlm_controld[2821]: cluster is down, exiting
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 dlm_controld[2821]: daemon cpg_dispatch error 2
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 rgmanager[3834]: #67: Shutting down uncleanly
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 gfs_controld[2870]: cluster is down, exiting
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 gfs_controld[2870]: daemon cpg_dispatch error 2
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 fenced[2795]: cluster is down, exiting
Nov 21 20:02:09 NFJD-PSC-SGM-SV2 fenced[2795]: daemon cpg_dispatch error 2
Nov 21 20:02:10 NFJD-PSC-SGM-SV2 rgmanager[25597]: [ip] Removing IPv4 address 10.11.200.39/25 from bond2
Nov 21 20:02:12 NFJD-PSC-SGM-SV2 ntpd[3223]: Deleting interface #12 bond2, 10.11.200.39#123, interface stats: received=0, sent=0, dropped=0, active_time=2329056 secs
Nov 21 20:02:15 NFJD-PSC-SGM-SV2 kernel: dlm: closing connection to node 2
Nov 21 20:02:15 NFJD-PSC-SGM-SV2 kernel: dlm: rgmanager: no userland control daemon, stopping lockspace
Nov 21 20:02:15 NFJD-PSC-SGM-SV2 kernel: dlm: clvmd: no userland control daemon, stopping lockspace
Nov 21 20:02:20 NFJD-PSC-SGM-SV2 rgmanager[25623]: [script] Executing /etc/init.d/obc_tomcat stop
Nov 21 20:02:22 NFJD-PSC-SGM-SV2 rgmanager[25713]: [ip] Removing IPv4 address 172.16.200.39/25 from bond0
Nov 21 20:02:24 NFJD-PSC-SGM-SV2 ntpd[3223]: Deleting interface #11 bond0, 172.16.200.39#123, interface stats: received=0, sent=0, dropped=0, active_time=2329079 secs  
Actual results:


Expected results:


Additional info:
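For anyone reading the OOM report above: the total_vm and rss columns of the process table are in pages, not kB. The following is a minimal sketch, not part of the original report, that parses such a table and ranks processes by resident memory. It assumes 4 kB pages (the usual size on x86_64) and the column layout shown in this log; the function and field names are illustrative only.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the usual size on x86_64

# Matches process rows such as:
# "... kernel: [ 2735]     0  2735 16285299 16186051  13       0             0 corosync"
# The "[ pid ]" header row does not match because it contains no digits in brackets.
ROW = re.compile(
    r"kernel: \[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<cpu>\d+)\s+"
    r"(?P<oom_adj>-?\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)"
)
RAM = re.compile(r"kernel: (?P<pages>\d+) pages RAM")


def parse_oom_report(lines):
    """Return (ram_kb, processes), with processes sorted by RSS, largest first."""
    ram_kb = None
    procs = []
    for line in lines:
        m = RAM.search(line)
        if m:
            ram_kb = int(m.group("pages")) * PAGE_KB
            continue
        m = ROW.search(line)
        if m:
            procs.append({
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "total_vm_kb": int(m.group("total_vm")) * PAGE_KB,
                "rss_kb": int(m.group("rss")) * PAGE_KB,
            })
    procs.sort(key=lambda p: p["rss_kb"], reverse=True)
    return ram_kb, procs


# Three lines copied from the log above.
sample = [
    "Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: 16777200 pages RAM",
    "Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 2735]     0  2735 16285299 16186051  13       0             0 corosync",
    "Nov 21 20:02:06 NFJD-PSC-SGM-SV2 kernel: [ 3040]     0  3040    28551     8616  13       0             0 clvmd",
]
ram_kb, procs = parse_oom_report(sample)
top = procs[0]
print(top["name"], top["rss_kb"], round(100.0 * top["rss_kb"] / ram_kb, 1))
```

On these lines the sketch reports corosync at 64744204 kB resident, which equals the anon-rss plus file-rss figures in the "Killed process 2735" line (64692136 kB + 52068 kB) and is roughly 96.5% of the 64 GiB of RAM.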

Comment 1 davidyangyi 2012-11-26 22:22:52 UTC
Created attachment 652346 [details]
log

Comment 3 Jan Friesse 2012-11-27 07:56:29 UTC
Hi,
6.2 is quite an old release and we have fixed (at least) these bugs (which may be related):
- rhbz#752951
- rhbz#848210

Can you please test with the newest version of corosync?
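
When retesting, it may help to record how the resident memory of the corosync process changes over time, so that any remaining growth is visible long before the OOM killer triggers. The following is a minimal sketch under stated assumptions, not a tool shipped with corosync: it reads the VmRSS field from /proc/<pid>/status on Linux, and the function names, sampling interval, and sample count are hypothetical choices for illustration.

```python
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t    4416 kB"; the second field is the number.
            return int(line.split()[1])
    return None


def sample_rss(pid, interval_s=60, count=5):
    """Collect (timestamp, rss_kb) pairs for one process at a fixed interval."""
    samples = []
    for i in range(count):
        with open("/proc/%d/status" % pid) as f:
            samples.append((time.time(), parse_vmrss_kb(f.read())))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling sample_rss with the corosync PID (2735 in the log above, but it will differ after a restart) and comparing the first and last values gives a rough growth rate to attach to the bug.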

Comment 5 Jan Friesse 2013-04-08 10:02:12 UTC
Hi,
any news there?

Comment 7 Jan Friesse 2013-11-13 08:35:26 UTC
David,
any news there?

Comment 8 Jan Friesse 2014-01-06 13:30:51 UTC
David,
any news there?

Comment 9 Red Hat Bugzilla 2023-09-14 01:39:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days