Created attachment 526680
portion of log file where error starts occurring
Description of problem:
After running the RHEV 3.0 beta 2 hypervisor for 10 days, it was marked unresponsive in the RHEV Manager and vdsm had to be restarted. This hypervisor was the only one in the RHEV installation, so it was running as SPM, with an iSCSI storage back-end. The hypervisor's vdsm log shows:
Dummy-4421::DEBUG::2011-10-04 21:09:49,836::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/bf5008ea-b174-4ad8-8f35-20e1003c92c8/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-4421::ERROR::2011-10-04 21:09:49,837::storage_mailbox::669::Storage.MailBox.SpmMailMonitor::(run) Traceback (most recent call last):
File "/usr/share/vdsm/storage/storage_mailbox.py", line 665, in run
self._checkForMail()
File "/usr/share/vdsm/storage/storage_mailbox.py", line 623, in _checkForMail
(rc, in_mail, err) = misc.execCmd(cmd, sudo=False, raw=True)
File "/usr/share/vdsm/storage/misc.py", line 172, in execCmd
env=env)
File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
errread, errwrite) = self._get_handles(stdin, stdout, stderr)
File "/usr/lib64/python2.6/subprocess.py", line 1055, in _get_handles
p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files
[root@rhev-h9 vdsm]# lsof |grep 2431 |grep pipe |wc -l
4070
[root@rhev-h9 vdsm]# lsof |grep 2431 |grep pipe |grep w |wc -l
9
[root@rhev-h9 vdsm]# lsof |grep 2431 |grep pipe |grep r |wc -l
4061
lsof shows that the vdsm process has 4070 open file descriptors pertaining to pipes: 4061 read ends and 9 write ends.
Version-Release number of selected component (if applicable):
RHEV 3.0 beta 2, hypervisor running on RHEL (rather than RHEV-H).
[root@rhev-h9 vdsm]# rpm -q vdsm
vdsm-4.9-96.1.el6.x86_64
How reproducible:
Unknown, but it looks like there is a leak of pipe file descriptors when vdsm exec's a child process; specifically, the child processes' stdout pipes appear to be leaking. Perhaps vdsm is holding an object open in Python somewhere, which prevents the pipe from being closed, and eventually it runs out of fds?
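To illustrate the suspected failure mode (this is illustrative only, not vdsm code), the pattern below keeps the read end of each child's stdout pipe referenced instead of closing it; the parent's pipe fds then accumulate until subprocess creation fails with the same "Errno 24" seen in the traceback above:

# Illustrative reproduction of the suspected leak pattern (not vdsm code).
# Each Popen(stdout=PIPE) opens a read end in the parent; holding on to it
# without closing it leaks one pipe fd per child.
import subprocess

leaked = []
try:
    while True:
        p = subprocess.Popen(["true"], stdout=subprocess.PIPE)
        p.wait()
        leaked.append(p.stdout)   # keep the read end alive instead of closing it
except OSError as e:
    print("subprocess creation failed after %d leaked pipes: %s" % (len(leaked), e))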
Comment 3 RHEL Program Management
2011-10-07 16:11:02 UTC
Since RHEL 6.2 External Beta has begun and this bug remains unresolved, it has been rejected, as it is not proposed as an exception or blocker.
Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Comment 4 Federico Simoncelli
2011-10-11 11:36:40 UTC
commit 09881a22f82786b4b3321630925c200929d1c846
Author: Federico Simoncelli <fsimonce>
Date: Tue Oct 11 11:04:20 2011 +0000
BZ#743887 Close lifeline pipe when oop is killed
Change-Id: Ife49c57106642b5d5c823867af594d5e4c5bc470
http://gerrit.usersys.redhat.com/1024
Comment 5 Federico Simoncelli
2011-10-11 13:05:17 UTC
Looking at the attached vdsm logs, you can see many "Operation Stuck" messages for the domain 0d681641-1b9d-4691-9356-d6e536eb9067 (in particular for the getReadDelay method).
I reproduced the problem and noticed that the number of open file descriptors is proportional to the number of "Operation Stuck" messages found in the logs. I also noticed that in this situation the ProcessPool kills the Helper but doesn't explicitly close the lifeline fd (opened with os.open).
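Based on that description, the shape of the fix is to close the lifeline fd explicitly on the kill path. A minimal sketch of that idea follows; the class and attribute names here are hypothetical and do not reflect vdsm's actual ProcessPool/Helper code:

# Sketch of the fix direction only; names are hypothetical, not vdsm's API.
import os
import signal

class Helper(object):
    def __init__(self, pid, lifeline_fd):
        self.pid = pid                # child process running out-of-process operations
        self.lifeline = lifeline_fd   # fd opened with os.open(), kept as the "lifeline"

    def kill(self):
        try:
            os.kill(self.pid, signal.SIGKILL)
        finally:
            # Killing the child does not release the parent's fd; closing the
            # lifeline here is, conceptually, what the BZ#743887 commit does
            # ("Close lifeline pipe when oop is killed").
            try:
                os.close(self.lifeline)
            except OSError:
                pass  # already closed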
For reproduction and verification purposes you can speed up the "Operation Stuck" messages by configuring vdsm.conf with:
[irs]
process_pool_timeout = 4
sd_health_check_delay = 2
Restart vdsm, connect to a storage pool with an NFS domain, and block the connection; the messages:
Storage.StatsThread::(run) Could not figure out delay for domain `fd3cd061-792f-463f-9707-5cc9d2bb746f` (Operation Stuck)
should begin to appear in vdsm.log.
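To verify, a small watcher (hypothetical, not a vdsm tool) can poll the vdsm process's pipe fd count from /proc while the connection is blocked; without the fix the count grows roughly in step with the "Operation Stuck" messages, and with the fix it should stay flat:

# Hypothetical verification aid: poll a process's pipe fd count via /proc.
import os
import sys
import time

def pipe_fd_count(pid):
    fd_dir = "/proc/%d/fd" % pid
    n = 0
    for fd in os.listdir(fd_dir):
        try:
            if os.readlink(os.path.join(fd_dir, fd)).startswith("pipe:"):
                n += 1
        except OSError:
            pass  # fd vanished between listdir() and readlink()
    return n

if __name__ == "__main__":
    pid = int(sys.argv[1])  # pid of the vdsm process
    while True:
        print("%s pipes=%d" % (time.strftime("%H:%M:%S"), pipe_fd_count(pid)))
        time.sleep(5)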
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2011-1782.html