Bug 681860

Summary: [rhel6.1] [libcgroup] service cgconfig start fails on permission error (service restart works)
Product: Red Hat Enterprise Linux 6
Reporter: Haim <hateya>
Component: libcgroup
Assignee: Ivana Varekova <varekova>
Status: CLOSED WORKSFORME
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: unspecified
Version: 6.1
CC: abaron, danken, dnaori, jsafrane, mgoldboi, rvokal, yeylon, ykaul
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-04-13 11:08:03 UTC

Description Haim 2011-03-03 13:03:13 UTC

Description of problem:

Unable to start the cgconfig service after stopping it; I get the following error:

[root@rhev-i32c-01 ~]# /etc/init.d/cgconfig start                              
Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
Permission denied                                                              
Failed to parse /etc/cgconfig.conf                         [FAILED]  

At first I thought some other process was holding the lock (I checked with lsof, but it was free). However, when I used
the 'restart' option (after the service was stopped), it worked.

Not sure if it's a design issue, but it's an issue for sure.

libcgroup-0.37-1.el6.x86_64
kernel-2.6.32-118.el6.x86_64
2.6.32-118.el6.x86_64

Comment 2 Dan Kenigsberg 2011-03-03 13:20:23 UTC

What's getenforce? Are there interesting AVCs in audit.log?

Comment 3 Haim 2011-03-03 13:28:13 UTC

No.

[root@rhev-i32c-01 ~]# service cgconfig start
Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
Cgroup mounting failed
Failed to parse /etc/cgconfig.conf                         [FAILED]

[root@rhev-i32c-01 ~]# getenforce
Permissive

[root@rhev-i32c-01 ~]# service cgconfig restart
Stopping cgconfig service:                                 [  OK  ]
Starting cgconfig service:                                 [  OK  ]
[root@rhev-i32c-01 ~]#

Comment 4 Jan Safranek 2011-03-03 13:45:24 UTC

(In reply to comment #3)
> [root@rhev-i32c-01 ~]# service cgconfig start
> Starting cgconfig service: Loading configuration file /etc/cgconfig.conf failed
> Cgroup mounting failed
> Failed to parse /etc/cgconfig.conf                         [FAILED]
> 
> [root@rhev-i32c-01 ~]# service cgconfig restart
> Stopping cgconfig service:                                 [  OK  ]
> Starting cgconfig service:                                 [  OK  ]
> [root@rhev-i32c-01 ~]#

This can indicate that you already have some cgroups mounted somewhere. What do cat /proc/mounts and cat /proc/cgroups tell you?

Comment 5 Haim 2011-03-03 14:13:36 UTC

[root@camel-vdsa ~]# cat /proc/mounts && cat /proc/cgroups 
rootfs / rootfs rw 0 0                                     
/proc /proc proc rw,relatime 0 0                           
/sys /sys sysfs rw,seclabel,relatime 0 0                   
udev /dev devtmpfs rw,seclabel,relatime,size=8158948k,nr_inodes=2039737,mode=755 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,seclabel,relatime 0 0
/dev/mapper/vg0-lv_root / ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
none /selinux selinuxfs rw,relatime 0 0
udev /dev devtmpfs rw,seclabel,relatime,size=8158948k,nr_inodes=2039737,mode=755 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
/dev/sda1 /boot ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/mapper/vg0-lv_home /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
//10.35.116.10/log/ /mnt/log cifs rw,relatime,unc=\\10.35.116.10\log,username=administrator,uid=0,noforceuid,gid=0,noforcegid,addr=10.35.116.10,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=57344 0 0
/etc/auto.misc /misc autofs rw,relatime,fd=7,pgrp=2046,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,relatime,fd=13,pgrp=2046,timeout=300,minproto=5,maxproto=5,indirect 0 0
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       4       1
ns      0       1       1
cpu     2       4       1
cpuacct 3       4       1
memory  4       4       1
devices 5       4       1
freezer 6       4       1
net_cls 7       1       1
blkio   8       4       1
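
Editorial note: in /proc/cgroups the second column is the hierarchy ID, and a non-zero value means the controller is currently attached to a mounted hierarchy. The following is a minimal sketch for filtering such a listing, not part of libcgroup; the function name attached_controllers is made up for illustration.

```shell
#!/bin/bash
# attached_controllers [FILE]
# Print every controller whose hierarchy ID (column 2) is non-zero.
# FILE defaults to /proc/cgroups; the function name is hypothetical.
attached_controllers() {
    local cgroups_file="${1:-/proc/cgroups}"
    # Skip the "#subsys_name ..." header and any short or blank lines.
    awk '!/^#/ && NF >= 2 && $2 != 0 { print $1 " (hierarchy " $2 ")" }' "$cgroups_file"
}
```

Called without arguments it reads the live /proc/cgroups. An empty result after "service cgconfig stop" would suggest that no controller is still attached.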

Comment 6 Jan Safranek 2011-03-03 14:33:29 UTC

Just as I thought, cgroups are still mounted after cgconfig stop. The question is why. Do you have the controllers mounted twice?

(Almost) the only thing "service cgconfig stop" does is call /sbin/cgclear.
1) Please check that cgclear is called (using "bash -x /etc/init.d/cgconfig stop"). It should *not* leave any cgroup mounted (check /proc/mounts).
2) If cgclear leaves something mounted, please provide its strace.

Please note that a second call to cgclear probably succeeds (that's why service cgconfig restart works), so you may need to restart the service to get it back into the weird state.

Please also provide your /etc/cgconfig.conf (although it seems to be the default one).
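
Editorial note: the /proc/mounts check requested above can be scripted. This is a rough sketch under the assumption that any entry with filesystem type "cgroup" counts as a leftover; the function names list_cgroup_mounts and check_cgroups_cleared are hypothetical and do not come from libcgroup.

```shell
#!/bin/bash
# list_cgroup_mounts [FILE]
# Print the mount point of every entry whose filesystem type is "cgroup".
# FILE defaults to /proc/mounts (field 2 = mount point, field 3 = type).
list_cgroup_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    awk '$3 == "cgroup" { print $2 }' "$mounts_file"
}

# check_cgroups_cleared [FILE]
# Return 0 if no cgroup hierarchy is mounted, 1 otherwise.
check_cgroups_cleared() {
    local leftover
    leftover="$(list_cgroup_mounts "$1")"
    if [ -n "$leftover" ]; then
        echo "cgroup hierarchies still mounted:"
        echo "$leftover"
        return 1
    fi
    echo "no cgroup hierarchies mounted"
}
```

Running check_cgroups_cleared right after "service cgconfig stop" would show whether cgclear left anything behind before the next start is attempted.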

Comment 7 Haim 2011-03-03 21:16:42 UTC

First, I'm able to start the service when bash -xv precedes the initial service start command. Without it, it fails.
Now, for your questions:

1) cgclear is called and all cgroup mounts are cleared.

stop() {
    echo -n "Stopping cgconfig service: "
    cgclear
    rm -f /var/lock/subsys/$servicename
    log_success_msg
}


2) It's not.
3) Config:

mount {
        cpuset  = /cgroup/cpuset;
        cpu     = /cgroup/cpu;
        cpuacct = /cgroup/cpuacct;
        memory  = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio   = /cgroup/blkio;
}

Comment 8 Mike Gahagan 2011-03-08 19:41:26 UTC

I can't reproduce this particular bug; however, I've noticed that one of the LTP cgroup regression tests started failing intermittently, starting with about the -117 kernel. The only symptom of the failure is rmdir failing to remove a directory because it isn't empty. The tests themselves seem to produce identical results and they all still pass; the failure happens during the cleanup phase. I suspect that sometimes one of the cgroups doesn't get unmounted cleanly. The LTP tests do not use libcgroup.

Would it be possible for you to try and reproduce the problem with the -116 kernel or something older? I'm wondering if we are seeing a regression in the kernel.

Comment 9 Haim 2011-03-09 07:39:07 UTC

Well, it's quite problematic for me to go back.
I'm currently using kernel-2.6.32-118.el6.x86_64.

Comment 10 Mike Gahagan 2011-03-09 20:37:58 UTC

OK, I looked into my failing test more today. It basically just creates a cgroup mount point directory, mounts it, and unmounts it over and over in a loop while checking for kernel oopses in the logs. When I added some more instrumentation to the test to see what was failing when, the problem went away. Something smells like a race condition here, and I'm not sure whether in my case it's in the kernel or in my test. I don't think it's a test issue, since older kernels don't seem to have the problem. I'm thinking we sometimes get into a situation where a cgroup doesn't unmount when it is supposed to.
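
Editorial note: the loop described above might look roughly like the sketch below. This is not the LTP test itself. The function names, the DRY_RUN switch, and the choice of controller and directory are assumptions made for illustration, and the oops check in the kernel log is left out. A real run needs root.

```shell
#!/bin/bash
# run CMD...
# Execute the command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cgroup_mount_loop CONTROLLER DIR COUNT
# Repeatedly create DIR, mount a cgroup hierarchy with CONTROLLER on it,
# unmount it, and remove DIR again. Stops at the first failing step.
cgroup_mount_loop() {
    local controller="$1" dir="$2" count="$3" i=1
    while [ "$i" -le "$count" ]; do
        run mkdir -p "$dir" || return 1
        run mount -t cgroup -o "$controller" cgroup "$dir" || return 1
        run umount "$dir" || return 1
        # rmdir fails if the directory is still busy or not empty, which is
        # the symptom reported in comment 8.
        run rmdir "$dir" || { echo "iteration $i: rmdir $dir failed" >&2; return 1; }
        i=$((i + 1))
    done
}
```

For example, "DRY_RUN=1 cgroup_mount_loop cpu /tmp/cgtest 2" only prints the commands it would run. Without DRY_RUN, a failure message naming the iteration would point at a hierarchy that did not unmount cleanly.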

Comment 11 Ivana Varekova 2011-03-11 12:29:19 UTC

Hello Haim,
can you try to reproduce the problem with the newest version of libcgroup package - libcgroup-0.37-2.el6 and write here whether it is affected.

Comment 12 Haim 2011-03-11 13:48:13 UTC

(In reply to comment #11)
> Hello Haim,
> can you try to reproduce the problem with the newest version of libcgroup
> package - libcgroup-0.37-2.el6 and write here whether it is affected.

Hi, 

Tested again with libcgroup-0.37-2.el6.x86_64 and the problem did not reproduce!
I managed to stop/start/stop the service with 0 errors.
What is the difference? How come this version resolves the issue?

Comment 13 Jan Safranek 2011-04-13 11:08:03 UTC

There are no significant differences between libcgroup-0.37-2 and libcgroup-0.37-1, so I have no explanation for why it works now. I suppose the kernel got updated too...

Keep an eye on it and reopen the bug if it happens again.