Bug 681860 - [rhel6.1] [libcgroup] service cgconfig start fails on permission error (service restart works)
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libcgroup
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ivana Varekova
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-03-03 13:03 UTC by Haim
Modified: 2014-01-13 00:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-04-13 11:08:03 UTC
Target Upstream Version:



Description Haim 2011-03-03 13:03:13 UTC
Description of problem:

Unable to start the cgconfig service after stopping it; I get the following error:

[root@rhev-i32c-01 ~]# /etc/init.d/cgconfig start                              
Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
Permission denied                                                              
Failed to parse /etc/cgconfig.conf                         [FAILED]  

At first I thought some other process was holding the lock (I checked with lsof, but it was free). However, when I used
the 'restart' option (after the service was stopped), it worked.

Not sure if it's a design issue, but it's an issue for sure.

libcgroup-0.37-1.el6.x86_64
kernel-2.6.32-118.el6.x86_64
2.6.32-118.el6.x86_64

Comment 2 Dan Kenigsberg 2011-03-03 13:20:23 UTC
What does getenforce report? Are there any interesting AVCs in audit.log?

Comment 3 Haim 2011-03-03 13:28:13 UTC
No.

[root@rhev-i32c-01 ~]# service cgconfig start
Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
Cgroup mounting failed
Failed to parse /etc/cgconfig.conf                         [FAILED]

[root@rhev-i32c-01 ~]# getenforce
Permissive

[root@rhev-i32c-01 ~]# service cgconfig restart
Stopping cgconfig service:                                 [  OK  ]
Starting cgconfig service:                                 [  OK  ]
[root@rhev-i32c-01 ~]#

Comment 4 Jan Safranek 2011-03-03 13:45:24 UTC
(In reply to comment #3)
> [root@rhev-i32c-01 ~]# service cgconfig start
> Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
> Cgroup mounting failed
> Failed to parse /etc/cgconfig.conf                         [FAILED]
> 
> [root@rhev-i32c-01 ~]# service cgconfig restart
> Stopping cgconfig service:                                 [  OK  ]
> Starting cgconfig service:                                 [  OK  ]
> [root@rhev-i32c-01 ~]#

This can indicate that you already have some cgroups mounted somewhere. What do "cat /proc/mounts" and "cat /proc/cgroups" tell you?

Comment 5 Haim 2011-03-03 14:13:36 UTC
[root@camel-vdsa ~]# cat /proc/mounts && cat /proc/cgroups 
rootfs / rootfs rw 0 0                                     
/proc /proc proc rw,relatime 0 0                           
/sys /sys sysfs rw,seclabel,relatime 0 0                   
udev /dev devtmpfs rw,seclabel,relatime,size=8158948k,nr_inodes=2039737,mode=755 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,seclabel,relatime 0 0
/dev/mapper/vg0-lv_root / ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
none /selinux selinuxfs rw,relatime 0 0
udev /dev devtmpfs rw,seclabel,relatime,size=8158948k,nr_inodes=2039737,mode=755 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
/dev/sda1 /boot ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/mapper/vg0-lv_home /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
/etc/auto.misc /misc autofs rw,relatime,fd=7,pgrp=2046,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,relatime,fd=13,pgrp=2046,timeout=300,minproto=5,maxproto=5,indirect 0 0
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       4       1
ns      0       1       1
cpu     2       4       1
cpuacct 3       4       1
memory  4       4       1
devices 5       4       1
freezer 6       4       1
net_cls 7       1       1
blkio   8       4       1
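
For reading the /proc/cgroups table above: in the cgroup v1 interface, a hierarchy ID of 0 means the controller is not attached to any mounted hierarchy. A minimal shell sketch that lists the attached controllers follows; the function name attached_controllers is hypothetical and not part of libcgroup.

```shell
#!/bin/sh
# Print the name of every controller whose hierarchy ID is non-zero,
# i.e. every controller currently attached to a mounted hierarchy.
# Reads /proc/cgroups by default; pass another file to inspect a saved copy.
attached_controllers() {
    # Skip the "#subsys_name ..." header line; column 2 is the hierarchy ID.
    awk '!/^#/ && $2 != 0 { print $1 }' "${1:-/proc/cgroups}"
}
```

Run against the table above, this would print every controller except ns, which matches the eight cgroup entries in the /proc/mounts output.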

Comment 6 Jan Safranek 2011-03-03 14:33:29 UTC
Just as I thought, cgroups are still mounted after cgconfig stop. The question is why. Do you have the controllers mounted twice?

(Almost) the only thing "service cgconfig stop" does is call /sbin/cgclear.
1) Please check that cgclear is called (using "bash -x /etc/init.d/cgconfig stop"). It should *not* leave any cgroup mounted (check /proc/mounts).
2) If cgclear keeps something mounted, please provide its strace output.

Please note that a second call to cgclear probably succeeds (that's why "service cgconfig restart" works), so you may need to start the service again to get it back into the weird state.

Please also provide your /etc/cgconfig.conf (although it seems to be the default one).
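
The two checks above could be scripted roughly as follows. This is only a sketch: the function name leftover_cgroup_mounts is hypothetical, and it assumes cgroup v1 mounts show up in /proc/mounts with the filesystem type "cgroup", as they do on this kernel.

```shell
#!/bin/sh
# Report cgroup mount points that remain in a mounts table
# (default: /proc/mounts).
# Exit status 0: nothing is left mounted.
# Exit status 1: leftover mount points were printed, one per line.
leftover_cgroup_mounts() {
    # Column 3 of /proc/mounts is the filesystem type, column 2 the mount point.
    leftovers=$(awk '$3 == "cgroup" { print $2 }' "${1:-/proc/mounts}")
    [ -z "$leftovers" ] && return 0
    printf '%s\n' "$leftovers"
    return 1
}
```

After "bash -x /etc/init.d/cgconfig stop", running "leftover_cgroup_mounts || strace -f -o /tmp/cgclear.strace /sbin/cgclear" would capture a trace only when something is still mounted (the output path is just an example).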

Comment 7 Haim 2011-03-03 21:16:42 UTC
First, I'm able to start the service when "bash -xv" precedes the initial service start command; without it, the start fails.
Now, for your questions:

1) cgclear is called and all cgroup mounts are cleared.

stop() {
    echo -n "Stopping cgconfig service: "
    cgclear
    rm -f /var/lock/subsys/$servicename
    log_success_msg
}


2) It doesn't; nothing is left mounted.
3) Config:

mount {
        cpuset  = /cgroup/cpuset;
        cpu     = /cgroup/cpu;
        cpuacct = /cgroup/cpuacct;
        memory  = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio   = /cgroup/blkio;
}

Comment 8 Mike Gahagan 2011-03-08 19:41:26 UTC
I can't reproduce this particular bug; however, I've noticed that one of the LTP cgroup regression tests started failing intermittently, starting with about the -117 kernel. The only symptom of the failure is rmdir failing to remove a directory because it wasn't empty. The tests themselves seem to produce identical results and they all still pass; the failure happens during the cleanup phase. I suspect that sometimes one of the cgroups doesn't get unmounted cleanly. The LTP tests do not use libcgroup.

Would it be possible for you to try and reproduce the problem with the -116 kernel or something older? I'm wondering if we are seeing a regression in the kernel.

Comment 9 Haim 2011-03-09 07:39:07 UTC
Well, it's quite problematic for me to go back.
I'm currently using kernel-2.6.32-118.el6.x86_64.

Comment 10 Mike Gahagan 2011-03-09 20:37:58 UTC
OK, I looked further into my failing test today. It basically just creates a cgroup mount point directory, mounts it, and unmounts it over and over in a loop while checking for kernel oopses in the logs. When I added more instrumentation to the test to see what was failing and when, the problem went away. This smells like a race condition, and I'm not sure whether in my case it's in the kernel or in my test. I don't think it's a test issue, since older kernels don't seem to have the problem. I suspect we sometimes get into a situation where a cgroup doesn't unmount when it is supposed to.
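
A loop of the kind described could look roughly like the sketch below. It is not the LTP test itself: the function name stress_cgroup_mount, the choice of the cpu controller, and the optional dry-run argument are all assumptions made for illustration, and the check for kernel oopses in the logs is left out.

```shell
#!/bin/sh
# Hypothetical stress loop: create a mount point, mount a cgroup v1 hierarchy
# on it, unmount it, and remove the directory, repeated N times.
# Usage: stress_cgroup_mount DIR ITERATIONS [echo]
# Passing "echo" as the third argument prints the commands instead of
# running them (dry run); a real run needs root.
stress_cgroup_mount() {
    dir=$1
    iterations=$2
    run=${3:-}
    i=0
    while [ "$i" -lt "$iterations" ]; do
        $run mkdir -p "$dir" || return 1
        # "cpu" is an assumed controller; any v1 controller would do.
        $run mount -t cgroup -o cpu none "$dir" || return 1
        $run umount "$dir" || return 1
        # If the unmount did not complete cleanly, rmdir is where it shows up.
        $run rmdir "$dir" || { echo "rmdir failed on iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
}
```

Stopping at the first failing rmdir keeps the leftover state in place for inspection, which matters when extra instrumentation makes the problem disappear.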

Comment 11 Ivana Varekova 2011-03-11 12:29:19 UTC
Hello Haim,
Can you try to reproduce the problem with the newest version of the libcgroup package, libcgroup-0.37-2.el6, and report here whether it is affected?

Comment 12 Haim 2011-03-11 13:48:13 UTC
(In reply to comment #11)
> Hello Haim,
> can you try to reproduce the problem with the newest version of libcgroup
> package - libcgroup-0.37-2.el6 and write here whether it is affected.

Hi, 

I tested again with libcgroup-0.37-2.el6.x86_64 and the problem did not reproduce!
I managed to stop/start/stop the service with 0 errors.
What is the difference? How come this version resolves the issue?

Comment 13 Jan Safranek 2011-04-13 11:08:03 UTC
There are no significant differences between libcgroup-0.37-2 and libcgroup-0.37-1, so I have no explanation for why it works now. I suppose the kernel got updated too...

Keep an eye on it and reopen the bug if it happens again.

