Bug 797192 - corosync filling up /dev/shm
Summary: corosync filling up /dev/shm
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 797922 810915 810916 810917
Reported: 2012-02-24 14:18 UTC by Patrick Van Gilst
Modified: 2020-05-14 14:53 UTC (History)
CC List: 8 users

Fixed In Version: corosync-1.4.1-6.el6
Doc Type: Bug Fix
Doc Text:
Previously, the underlying library of corosync did not delete temporary buffers used for Inter-Process Communication (IPC) that are stored in the /dev/shm shared memory file system. Therefore, if the user without proper privileges attempted to establish an IPC connection, the attempt failed with an error message as expected but memory allocated for temporary buffers was not released. This could eventually result in /dev/shm being fully used and Denial of Service. This update modifies the coroipcc library to let applications delete temporary buffers if the buffers were not deleted by the corosync server. The /dev/shm file system is no longer cluttered with needless data in this scenario and IPC connections can be established as expected.
Clone Of:
: 797922 (view as bug list)
Environment:
Last Closed: 2012-06-20 12:23:20 UTC
Target Upstream Version:
Embargoed:


Attachments
cluster.conf (10.57 KB, application/octet-stream), 2012-02-27 11:17 UTC, Patrick Van Gilst
bind_mount.sh (3.78 KB, application/x-sh), 2012-02-27 12:33 UTC, Patrick Van Gilst
Proposed patch (1.63 KB, patch), 2012-02-27 14:27 UTC, Jan Friesse


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 79313 | 0 | None | None | None | Never
Red Hat Product Errata | RHBA-2012:0777 | 0 | normal | SHIPPED_LIVE | corosync bug fix and enhancement update | 2012-06-19 20:35:04 UTC

Description Patrick Van Gilst 2012-02-24 14:18:29 UTC
Description of problem:

When running a two-node active-passive NFS cluster with RHCS 6.2, /dev/shm becomes 100% full after about one hour. SELinux is in Permissive mode.

# df -h /dev/shm/
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                  16G     -     -   -  /dev/shm

# ls -l /dev/shm/ | wc -l
21275

# ls /dev/shm
control_buffer-geIdyN  control_buffer-T04M9s  dispatch_buffer-9UF86u  dispatch_buffer-mhchTW  dispatch_buffer-yWbMFr  request_buffer-GkctZk   request_buffer-SU8WQu  response_buffer-AEBaB3  response_buffer-n1BCmK 
.........

# ls -l /dev/shm | less
total 16409808
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 13:22 control_buffer-00AIgE
-rw-------. 1 root    root       8192 Feb 24 14:51 control_buffer-00oq6Q
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 13:48 control_buffer-00R0LQ
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 13:00 control_buffer-01AsLt
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 12:16 control_buffer-01D8op
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 13:16 control_buffer-01wU0Z
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 12:20 control_buffer-01XgT8
-rw-------. 1 rpcuser rpcuser    8192 Feb 24 14:14 control_buffer-01xmph
......

It seems that coroipcc.c does not free the shared memory.
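
To see which accounts the leftover buffers belong to without paging through `ls -l`, the listing above can be summarized with a short script. This is only a sketch: the function name `summarize_shm_buffers` is hypothetical, and the four prefixes are the ones visible in the listing above.

```python
import os
from collections import Counter

# Buffer name prefixes as they appear in the listing above.
BUFFER_PREFIXES = (
    "control_buffer-",
    "request_buffer-",
    "response_buffer-",
    "dispatch_buffer-",
)

def summarize_shm_buffers(shm_dir="/dev/shm"):
    """Count leftover IPC buffer files per (buffer kind, owner uid)."""
    counts = Counter()
    for entry in os.scandir(shm_dir):
        for prefix in BUFFER_PREFIXES:
            if entry.name.startswith(prefix):
                uid = entry.stat(follow_symlinks=False).st_uid
                counts[(prefix.rstrip("-"), uid)] += 1
                break
    return counts
```

Calling `summarize_shm_buffers()` on an affected node and printing the result sorted by count should show at a glance whether most buffers belong to one account, such as rpcuser in the listing above.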


Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 6.2 (Santiago)
cman-3.0.12.1-23.el6.x86_64
rgmanager-3.0.12.1-5.el6.x86_64
resource-agents-3.9.2-7.el6.x86_64
corosync-1.4.1-4.el6.x86_64
nfs-utils-lib-1.1.5-4.el6.x86_64
nfs-utils-1.2.3-15.el6.x86_64

How reproducible:
Always


Steps to Reproduce:

1. Create an RHCS cluster for NFS in active-passive mode.
2. Start the service.
3. Watch /dev/shm fill up.
  
Actual results:

/dev/shm fills up. As a consequence, "cman_tool version -r" segfaults and the corosync-* utilities can no longer be used:
# corosync-objctl -a
Could not initialize objdb library. Error 2

Expected results:
/dev/shm does not fill up, no segfault occurs, and the corosync-* utilities work without errors.

Comment 2 Jan Friesse 2012-02-27 10:37:50 UTC
Can you please provide your cluster.conf?

Comment 3 Patrick Van Gilst 2012-02-27 11:17:50 UTC
Created attachment 566012 [details]
cluster.conf

Comment 4 Jan Friesse 2012-02-27 12:25:08 UTC
(In reply to comment #3)
> Created attachment 566012 [details]
> cluster.conf

Thanks. Can you please also provide "/usr/local/bin/bind_mount.sh" (even though I can roughly imagine what it does)?

Comment 5 Patrick Van Gilst 2012-02-27 12:33:08 UTC
Created attachment 566027 [details]
bind_mount.sh

Thanks for working on this issue.

Comment 6 Jan Friesse 2012-02-27 13:28:17 UTC
I believe I have found the main problem in corosync. Please confirm that you are seeing "Invalid IPC credentials." in /var/log/messages.

What I don't understand is which process running as rpcuser is trying to connect to corosync. This does not matter for fixing the bug, but it may become a problem for your environment in the future.

Comment 7 Patrick Van Gilst 2012-02-27 13:51:25 UTC
Yes, I confirm that we have a lot of "Invalid IPC credentials." messages in /var/log/messages.
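
For anyone who wants to put a number on "a lot", a minimal sketch of the check follows. The function names are hypothetical; the message text and the log path are the ones quoted in this bug.

```python
def count_invalid_credentials(lines, needle="Invalid IPC credentials."):
    """Return how many of the given log lines contain the needle."""
    return sum(1 for line in lines if needle in line)

def count_in_file(path="/var/log/messages"):
    # errors="replace" keeps odd bytes in the log from aborting the scan
    with open(path, errors="replace") as log:
        return count_invalid_credentials(log)
```

Reading /var/log/messages normally requires root, so `count_in_file()` would have to run with matching privileges.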

Concerning rpcuser, maybe this can help:

# ps aux | grep [r]pcuser
rpcuser  30140  0.0  0.0  27424  1396 ?        S<s  Feb24   0:03 rpc.statd -H /usr/share/cluster/nfsserver.sh -d

Comment 10 Jan Friesse 2012-02-27 14:27:21 UTC
Created attachment 566051 [details]
Proposed patch

Unlink shm buffers if init fails

If IPC init failed, the buffers were unlinked neither by the client (lib) side
nor by the server (corosync) side. This can fill all available space, so that
no further connections are accepted. A typical example is a user running any
corosync IPC binary (like corosync-objctl) without a correct uid/gid entry in
the corosync configuration, resulting in a DoS.
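
The idea of the patch description above, client-side cleanup when init fails, can be illustrated outside corosync. The following is not the attached patch (which changes the C library) but a minimal Python sketch of the same pattern. The names `connect`, `IpcInitError`, and `server_accepts` are hypothetical stand-ins, with `server_accepts` simulating the server's credential check.

```python
import os
import tempfile

class IpcInitError(Exception):
    """Raised when the (simulated) server rejects the connection."""

def connect(shm_dir, server_accepts):
    """Create the temporary buffers, then try to initialise IPC.

    On any failure the client side unlinks every buffer it created,
    so a rejected connection leaves nothing behind in shm_dir.
    """
    paths = []
    try:
        for kind in ("control", "request", "response", "dispatch"):
            fd, path = tempfile.mkstemp(prefix=f"{kind}_buffer-", dir=shm_dir)
            os.close(fd)
            paths.append(path)
        if not server_accepts:
            raise IpcInitError("Invalid IPC credentials.")
        return paths
    except Exception:
        for path in paths:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass  # already removed by the server side
        raise
```

In this sketch a successful call returns the buffer paths and leaves their removal to the caller. Tolerating `FileNotFoundError` covers the case where the other side has already removed a buffer.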

Comment 13 Patrick Van Gilst 2012-02-27 15:14:16 UTC
Thanks a lot for your efficiency!
I'm waiting for suggestions from rgmanager's maintainers before I give it a try.

Comment 15 Jan Friesse 2012-03-07 08:57:26 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
A user account without the privileges to do IPC tries to do corosync IPC.

Consequence
The application is correctly informed that it lacks the privileges to do IPC and an error message is correctly logged, but the temporary buffers in /dev/shm used for IPC are not deleted, so /dev/shm keeps filling up.

Fix
Applications delete the temporary buffers in /dev/shm (implemented in the library) if the corosync server did not do so.

Result
The library properly deletes the temporary buffers in /dev/shm if corosync did not do so, and /dev/shm does not fill up.

Comment 20 Miroslav Svoboda 2012-04-26 12:32:15 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause
+Previously, the underlying library of corosync did not delete temporary buffers used for Inter-Process Communication (IPC) that are stored in the /dev/shm shared memory file system. Therefore, if the user without proper privileges attempted to establish an IPC connection, the attempt failed with an error message as expected but memory allocated for temporary buffers was not released. This could eventually result in /dev/shm being fully used and Denial of Service. This update modifies the coroipcc library to let applications delete temporary buffers if the buffers were not deleted by the corosync server. The /dev/shm file system is no longer cluttered with needless data in this scenario and IPC connections can be established as expected.
-Trying to do corosync IPC with user account without privileges to do IPC.
-
-Consequence
-Application is correctly informed about no privileges to do IPC, error message is correctly logged, but temporary buffers in /dev/shm used for IPC are not deleted and /dev/shm is keep filling.
-
-Fix
-Delete temporary buffers in /dev/shm by applications (implemented in lib) if corosync server didn't did so.
-
-Result
-Library properly deletes temporary buffers in /dev/shm if corosync didn't did so and /dev/shm is not filled up.

Comment 22 errata-xmlrpc 2012-06-20 12:23:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0777.html

