| Summary: | [rhel6.1] [libcgroup] service cgconfig start fails on permission error (service restart works) | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> |
| Component: | libcgroup | Assignee: | Ivana Varekova <varekova> |
| Status: | CLOSED WORKSFORME | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.1 | CC: | abaron, danken, dnaori, jsafrane, mgoldboi, rvokal, yeylon, ykaul |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-04-13 11:08:03 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Haim
2011-03-03 13:03:13 UTC
What's getenforce? Are there interesting AVCs in audit.log?

no,
[root@rhev-i32c-01 ~]# service cgconfig start
Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
[root@rhev-i32c-01 ~]# getenforce
Permissive
[root@rhev-i32c-01 ~]# service cgconfig restart
Stopping cgconfig service: [ OK ]
Starting cgconfig service: [ OK ]
[root@rhev-i32c-01 ~]#

(In reply to comment #3)
> [root@rhev-i32c-01 ~]# service cgconfig start
> Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
> Cgroup mounting failed
> Failed to parse /etc/cgconfig.conf [FAILED]
>
> [root@rhev-i32c-01 ~]# service cgconfig restart
> Stopping cgconfig service: [ OK ]
> Starting cgconfig service: [ OK ]
> [root@rhev-i32c-01 ~]#
This can indicate that you have already mounted some cgroups somewhere. What does cat /proc/mounts and cat /proc/cgroups tell you?

[root@camel-vdsa ~]# cat /proc/mounts && cat /proc/cgroups
rootfs / rootfs rw 0 0
/proc /proc proc rw,relatime 0 0
/sys /sys sysfs rw,seclabel,relatime 0 0
udev /dev devtmpfs rw,seclabel,relatime,size=8158948k,nr_inodes=2039737,mode=755 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,seclabel,relatime 0 0
/dev/mapper/vg0-lv_root / ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
none /selinux selinuxfs rw,relatime 0 0
udev /dev devtmpfs rw,seclabel,relatime,size=8158948k,nr_inodes=2039737,mode=755 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
/dev/sda1 /boot ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/mapper/vg0-lv_home /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
/etc/auto.misc /misc autofs rw,relatime,fd=7,pgrp=2046,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,relatime,fd=13,pgrp=2046,timeout=300,minproto=5,maxproto=5,indirect 0 0
#subsys_name hierarchy num_cgroups enabled
cpuset 1 4 1
ns 0 1 1
cpu 2 4 1
cpuacct 3 4 1
memory 4 4 1
devices 5 4 1
freezer 6 4 1
net_cls 7 1 1
blkio 8 4 1

Just as I thought, cgroups are still mounted after cgconfig stop. Question is why? Do you have the controllers mounted twice?
(Almost) the only thing "service cgconfig stop" does is calling /sbin/cgclear.
1) Please check, that cgclear is called (using "bash -x /etc/init.d/cgconfig stop"). It should *not* leave any cgroup mounted (check with the /proc/mounts).
2) If cgclear keeps something mounted, please provide its strace
Please note that second call to cgclear probably succeeds (that's why service cgconfig restart works), so you may need to re-start the service to the weird state.
Provide also your /etc/cgconfig.conf (although it seems it is the default one).
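The check requested above (does anything of type cgroup remain in /proc/mounts after "service cgconfig stop"?) can be scripted. The following is a minimal sketch, not part of libcgroup; the function names list_cgroup_mounts and cgroups_cleared are hypothetical helpers introduced here for illustration, and the only assumption is the usual /proc/mounts field order (device, mount point, filesystem type, options).

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with libcgroup.

# Print "mountpoint options" for every cgroup filesystem in a mounts table.
# Argument 1: path to the mounts table (defaults to /proc/mounts).
list_cgroup_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    # Field 3 is the filesystem type, field 2 the mount point, field 4 the options.
    awk '$3 == "cgroup" { print $2, $4 }' "$mounts_file"
}

# Succeed (exit status 0) only when no cgroup hierarchy is mounted.
cgroups_cleared() {
    [ -z "$(list_cgroup_mounts "$1")" ]
}
```

Run as root after stopping the service, something like `service cgconfig stop; cgroups_cleared || list_cgroup_mounts` would print whatever cgclear left behind.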
First, I'm able to start the service when bash -xv precedes the initial service start command; without it, the start fails.
Now, for your questions:
1) cgclear is called and all cgroup mounts are cleared.
stop() {
    echo -n "Stopping cgconfig service: "
    cgclear
    rm -f /var/lock/subsys/$servicename
    log_success_msg
}
2) It does not.
3) config:
mount {
    cpuset = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio = /cgroup/blkio;
}
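Since the suspicion in this thread is that "service cgconfig start" fails when the configured controllers are already mounted, it may help to compare the mount section above against the live mount table. The sketch below is an illustration only: conf_mounts and already_mounted are hypothetical names, and the parser assumes the simple "controller = path;" layout shown above (one entry per line, "#" comments), not the full cgconfig.conf grammar.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with libcgroup.

# Print "controller mountpoint" for each entry of the mount { ... } section.
# Argument 1: config file (defaults to /etc/cgconfig.conf).
conf_mounts() {
    local conf="${1:-/etc/cgconfig.conf}"
    awk '
        /^[ \t]*#/              { next }                  # skip comment lines
        /^[ \t]*mount[ \t]*[{]/ { in_mount = 1; next }    # section start
        in_mount && /[}]/       { in_mount = 0; next }    # section end
        in_mount {
            line = $0
            gsub(/[ \t;]/, "", line)      # drop whitespace and the trailing ";"
            if (split(line, kv, "=") == 2) print kv[1], kv[2]
        }
    ' "$conf"
}

# Report every configured mount point that already holds a cgroup filesystem.
# Argument 1: config file; argument 2: mounts table (defaults to /proc/mounts).
already_mounted() {
    local conf="${1:-/etc/cgconfig.conf}" mounts_file="${2:-/proc/mounts}" ctrl path
    conf_mounts "$conf" | while read -r ctrl path; do
        if awk -v p="$path" '$2 == p && $3 == "cgroup" { found = 1 } END { exit !found }' "$mounts_file"; then
            echo "$ctrl already mounted at $path"
        fi
    done
}
```

Empty output from `already_mounted` before "service cgconfig start" would suggest that leftover mounts are not the cause of a failed start.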
I can't reproduce this particular bug; however, I've noticed that one of the LTP cgroup regression tests started failing intermittently, starting with about the -117 kernel. The only symptom of the failure I get is rmdir failing to remove a directory because it wasn't empty. The tests themselves seem to produce identical results and they all still pass; the failure happens during the cleanup phase. I suspect that sometimes one of the cgroups doesn't get unmounted cleanly. The LTP tests do not use libcgroup. Would it be possible for you to try to reproduce the problem with the -116 kernel or something older? I'm wondering if we are seeing a regression in the kernel.

Well, it's quite problematic for me to go back. I'm currently using kernel-2.6.32-118.el6.x86_64.

OK, I looked further into my failing test today. It basically just creates a cgroup mount point directory, mounts it, and unmounts it over and over in a loop while checking for kernel oopses in the logs. When I added more instrumentation to the test to see what was failing and when, the problem went away. This smells like a race condition, and I'm not sure whether in my case it's in the kernel or in my test. I don't think it's a test issue, since older kernels don't seem to have the problem. I'm thinking that sometimes we get into a situation where a cgroup doesn't unmount when it is supposed to.

Hello Haim,
can you try to reproduce the problem with the newest version of the libcgroup package, libcgroup-0.37-2.el6, and write here whether it is affected?

(In reply to comment #11)
> Hello Haim,
> can you try to reproduce the problem with the newest version of libcgroup
> package - libcgroup-0.37-2.el6 and write here whether it is affected.
Hi, I tested again with libcgroup-0.37-2.el6.x86_64 and the problem did not reproduce! I managed to stop/start/stop the service with 0 errors. What is the difference? How come this version resolves the issue?
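The mount/umount loop described earlier in this thread can be approximated as follows. This is a rough sketch written for illustration, not the actual LTP test: run and stress_cgroup_mount are hypothetical names, the DRY_RUN switch is an assumption added so the loop can be inspected without root, and the kernel oops check from the original test is left out.

```shell
#!/bin/sh
# Hypothetical reproduction aid, not the LTP test itself.

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# stress_cgroup_mount CONTROLLER DIR [ITERATIONS]
# Repeatedly create DIR, mount one cgroup controller on it, unmount it and
# remove DIR again. Stops at the first failing step.
stress_cgroup_mount() {
    local ctrl="$1" dir="$2" count="${3:-100}" i=1
    while [ "$i" -le "$count" ]; do
        run mkdir -p "$dir"                          || return 1
        run mount -t cgroup -o "$ctrl" cgroup "$dir" || return 1
        run umount "$dir"                            || return 1
        # A failing rmdir here matches the symptom reported for the LTP cleanup phase.
        run rmdir "$dir" || { echo "iteration $i: rmdir $dir failed" >&2; return 1; }
        i=$((i + 1))
    done
}
```

For a real run, call it as root with a controller that is not currently mounted elsewhere, for example `stress_cgroup_mount cpu /tmp/cgtest 1000`; with `DRY_RUN=1` set it only prints the commands.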
There are no significant differences between libcgroup-0.37-2 and libcgroup-0.37-1, I have no explanation why it works now. I suppose the kernel got updated too... Keep an eye on it and reopen the bug if it happens again.