Bug 1150585 - numad dies after number of "Could not write 1 to /cgroup/cpuset/libvirt/qemu/vm_name/emulator/cpuset.mems -- errno: 13" errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: numad
Version: 6.5
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Bill Gray
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-08 13:08 UTC by Alexandros Gkesos
Modified: 2019-07-11 08:15 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1235164 (view as bug list)
Environment:
Last Closed: 2015-07-22 07:46:24 UTC


Attachments
Hypervisor's sosreport (15.05 MB, application/x-xz)
2014-10-09 07:21 UTC, Alexandros Gkesos


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2015:1441 | normal | SHIPPED_LIVE | numad bug fix update | 2015-07-20 18:05:46 UTC

Description Alexandros Gkesos 2014-10-08 13:08:15 UTC
Description of problem:

After upgrading from numad-0.5-9.20130814git.el6.x86_64 to numad-0.5-10.20140620git.el6_5.x86_64 on a KVM hypervisor, the following errors appear in numad.log, and numad later crashes.

...
Tue Oct  7 15:00:02 2014: Advising pid 7952 (qemu-kvm) move from nodes (1) to nodes (1)
Tue Oct  7 15:00:02 2014: Could not write 1 to /cgroup/cpuset/libvirt/qemu/zevis2/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:00:02 2014: Could not write 0-1 to /cgroup/cpuset/libvirt/qemu/zevis2/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:00:02 2014: PID 7952 moved to node(s) 1 in 0.0 seconds
Tue Oct  7 15:01:37 2014: Advising pid 8033 (qemu-kvm) move from nodes (1) to nodes (1)
Tue Oct  7 15:01:37 2014: Could not write 1 to /cgroup/cpuset/libvirt/qemu/tcn1/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:01:37 2014: Could not write 0-1 to /cgroup/cpuset/libvirt/qemu/tcn1/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:01:37 2014: PID 8033 moved to node(s) 1 in 0.0 seconds
Tue Oct  7 15:02:57 2014: Advising pid 11520 (qemu-kvm) move from nodes (1) to nodes (1)
Tue Oct  7 15:02:57 2014: Could not write 1 to /cgroup/cpuset/libvirt/qemu/ntvmeuc02/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:02:57 2014: Could not write 0-1 to /cgroup/cpuset/libvirt/qemu/ntvmeuc02/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:02:57 2014: PID 11520 moved to node(s) 1 in 0.0 seconds
Tue Oct  7 15:03:02 2014: Could not open stat file: /proc/1068/stat
Tue Oct  7 15:03:02 2014: Advising pid 26413 (qemu-kvm) move from nodes (1) to nodes (1)
Tue Oct  7 15:03:02 2014: Could not write 1 to /cgroup/cpuset/libvirt/qemu/zevisk1/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:03:02 2014: Could not write 0-1 to /cgroup/cpuset/libvirt/qemu/zevisk1/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:03:02 2014: PID 26413 moved to node(s) 1 in 0.0 seconds
Tue Oct  7 15:03:07 2014: Could not get node meminfo
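
For readers decoding the log above: on Linux, errno 13 is EACCES ("Permission denied"). The short C sketch below formats an errno value the way one might annotate these log lines. The helper name format_errno is hypothetical and is not part of numad.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from numad): render an errno value together
 * with its standard description, e.g. for annotating a log excerpt. */
int format_errno(int err, char *buf, size_t len) {
    return snprintf(buf, len, "errno %d: %s", err, strerror(err));
}
```

Called with 13 on a Linux system, this should produce "errno 13: Permission denied", which matches the failed writes to cpuset.mems shown in the log.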


Behaviour with the previous numad version:

Tue Oct  7 08:37:53 2014: Advising pid 16553 (qemu-kvm) move from nodes (0) to nodes (1)
Tue Oct  7 08:37:53 2014: Including task: 16553
Tue Oct  7 08:37:53 2014: Including task: 16632
Tue Oct  7 08:37:53 2014: Including task: 16633
Tue Oct  7 08:37:53 2014: Including task: 16634
Tue Oct  7 08:37:53 2014: Including task: 16635
Tue Oct  7 08:37:53 2014: PID 16553 moved to node(s) 1 in 0.0 seconds
Tue Oct  7 08:37:58 2014: Advising pid 11520 (qemu-kvm) move from nodes (0) to nodes (1)
Tue Oct  7 08:37:58 2014: Including task: 11520
Tue Oct  7 08:37:58 2014: Including task: 11592
Tue Oct  7 08:37:58 2014: Including task: 11593
Tue Oct  7 08:37:58 2014: PID 11520 moved to node(s) 1 in 0.1 seconds


Version-Release number of selected component (if applicable): numad-0.5-10.20140620git.el6_5.x86_64


How reproducible: Every time (reported by customer)


Steps to Reproduce:
1. Upgrade from numad-0.5-9.20130814git.el6.x86_64 to numad-0.5-10.20140620git.el6_5.x86_64

Actual results:

Tue Oct  7 15:00:02 2014: Could not write 1 to /cgroup/cpuset/libvirt/qemu/zevis2/emulator/cpuset.mems -- errno: 13
Tue Oct  7 15:00:02 2014: Could not write 0-1 to /cgroup/cpuset/libvirt/qemu/zevis2/emulator/cpuset.mems -- errno: 13

and numad crashes after some time.

Expected results:

No errors (logs from a normal run are included in the description above).
numad does not crash.

Comment 4 Alexandros Gkesos 2014-10-09 07:21:25 UTC
Created attachment 945210
Hypervisor's sosreport

Comment 6 Jürgen Thomann 2015-03-13 10:04:33 UTC
We have the same problem and it is caused by too many open file descriptors. I'm now testing the following patch:

--- a/numad.c
+++ b/numad.c
@@ -1111,6 +1111,7 @@ int write_to_cpuset_file(char *fname, char *s) {
     numad_log(LOG_DEBUG, "Writing %s to: %s\n", s, fname);
     if (write(fd, s, strlen(s)) <= 0) {
         numad_log(LOG_CRIT, "Could not write %s to %s -- errno: %d\n", s, fname, errno);
+        close(fd);
         return -1;
     }
     close(fd);
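
To illustrate why the missing close() matters, here is a minimal, self-contained C sketch. It is not numad's code: write_value, its close_on_error flag and lowest_free_fd are hypothetical names used only for this demonstration. Each failed write that returns without closing leaks one descriptor, so a daemon that hits the error repeatedly will eventually reach its per-process descriptor limit, after which open() fails with EMFILE. That would be consistent with the later "Could not open stat file" and "Could not get node meminfo" messages in the description, although the report itself does not confirm the exact sequence.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only, not numad's code. Writes s to fname. When
 * close_on_error is 0, a failed write returns without closing fd,
 * which is the leak that the patch above removes. */
int write_value(const char *fname, const char *s, int close_on_error) {
    int fd = open(fname, O_WRONLY);
    if (fd < 0) {
        return -1;
    }
    if (write(fd, s, strlen(s)) <= 0) {
        if (close_on_error) {
            close(fd);
        }
        return -1;  /* without the close above, fd stays open */
    }
    close(fd);
    return 0;
}

/* Returns the lowest free descriptor number. open() always returns the
 * lowest available descriptor, so this value rises by one for every
 * descriptor leaked since the previous call. */
int lowest_free_fd(void) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd >= 0) {
        close(fd);
    }
    return fd;
}
```

On Linux, /dev/full accepts open() but fails every write with ENOSPC, which makes it a convenient stand-in for the cpuset file that rejected writes here. Calling write_value("/dev/full", "1", 0) ten times should raise lowest_free_fd() by ten, while the same calls with close_on_error set to 1 should leave it unchanged.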

Comment 7 Bill Gray 2015-06-01 12:33:55 UTC
It looks like the patch in Comment 6 would have prevented numad from crashing: it closes the file on the error path, which was previously skipped when a write to the file failed. Thanks.

The new version of numad no longer uses cpusets, entirely eliminating the writes to the cpuset control files.

Comment 14 errata-xmlrpc 2015-07-22 07:46:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1441.html

