Bug 862803

Summary: UV: numad fails on SGI - libcgroup issue
Product: Red Hat Enterprise Linux 6
Reporter: George Beshers <gbeshers>
Component: libcgroup
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED NOTABUG
QA Contact: Mike Gahagan <mgahagan>
Severity: high
Docs Contact:
Priority: medium
Version: 6.4
CC: ccui, ctatman, gbeshers, jsafrane, mmilgram, ovasik, pschiffe, tlavigne, varekova
Target Milestone: rc
Target Release: 6.5
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-06-24 06:55:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 961026, 996235, 1056239, 1075802, 1164899

Description George Beshers 2012-10-03 15:12:16 UTC
Description of problem:
I am working on case 00696618.

The customer attempted to run "numad" on an SGI uv100 machine, but it failed because /cgroup/cpuset/cpuset.cpus doesn't exist.

With a default configuration on standard hardware, that file exists.

I notice that there are several cpuset and numa packages installed that are not from Red Hat, but from SGI.  For example:

cpuset-utils-2.0-sgi706r1.rhel6.x86_64                      Thu 09 Aug 2012 01:55:26 PM CEST
kmod-numatools-2.0-sgi706r3.rhel6.x86_64                    Mon 20 Aug 2012 05:52:33 PM CEST
libcpuset-1.0-sgi706r1.rhel6.x86_64                         Thu 09 Aug 2012 01:45:29 PM CEST
libnuma-3.0sgi-sgi706r1.rhel6.x86_64                        Thu 09 Aug 2012 01:55:35 PM CEST
numatools-2.0-sgi706r8.rhel6.x86_64                         Mon 20 Aug 2012 06:07:31 PM CEST

There are numerous other SGI packages. It is not obvious to me what effect they have on cgroups, but they might be affecting the layout of /cgroup such that it doesn't have /cgroup/cpuset/cpuset.cpus.


numad is currently in tech preview, but we probably want to fix the issue.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 George Beshers 2012-10-03 15:18:11 UTC
/etc/init.d/cgconfig is failing to start because the default config, which
lives in /etc/cgconfig.conf, tries to mount the cgroup memory controller
(among other controllers). Because we boot UV with cgroup_disable=memory,
the mount fails (more accurately, cgconfigparser -l /etc/cgconfig.conf fails),
and the cgconfig service is designed such that you get either everything
or nothing.

As a test, I moved the original /etc/cgconfig.conf to /etc/cgconfig.conf.orig,
removed the "memory = /cgroup/memory;" line from /etc/cgconfig.conf, and now
things are working as expected, i.e. you get all cgroup controllers aside from
the memory one.
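
The edit described above can be sketched as a small filter. This is only an illustration, assuming the mount block has one "controller = path;" entry per line; the function name drop_controller and the abbreviated SAMPLE text are hypothetical and are not part of libcgroup or the shipped config file.

```python
import re


def drop_controller(conf_text, controller):
    """Return conf_text with the 'controller = path;' line removed.

    Assumes one mount entry per line (an assumption for this sketch).
    """
    # Match the exact controller name, so "cpu" does not also hit "cpuset".
    pattern = re.compile(r"^\s*" + re.escape(controller) + r"\s*=\s*[^;]*;\s*$")
    kept = [line for line in conf_text.splitlines() if not pattern.match(line)]
    return "\n".join(kept) + "\n"


# Abbreviated, illustrative mount block (not the full default file).
SAMPLE = """mount {
    cpuset = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    memory = /cgroup/memory;
}
"""

if __name__ == "__main__":
    print(drop_controller(SAMPLE, "memory"), end="")
```

On a real system the input would be the contents of /etc/cgconfig.conf, and the original file should be kept as a backup, as was done here with /etc/cgconfig.conf.orig.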

Without "cgroup_disable=memory" there is a scalability problem.
This may not be an issue for a UV100 system.

Also, I have not tested this recently, so it is worth checking
for RHEL 7.

Comment 3 George Beshers 2012-10-03 15:37:42 UTC
Note: they will need to edit /etc/sysconfig/uvconfig to prevent
it from editing /boot/efi/efi/redhat/grub.conf.

Comment 4 RHEL Program Management 2012-12-14 07:00:25 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 5 Peter Schiffer 2013-08-13 17:22:17 UTC
George,

So, what would be the ideal solution for this? Just skip the controllers defined in /etc/cgconfig.conf that fail to mount?

Thanks,

peter

Comment 6 RHEL Program Management 2013-10-14 04:45:16 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 7 Jan Chaloupka 2014-02-25 12:33:47 UTC
Is there any progress in solving this issue? Ping George Beshers. Peter Schiffer, what is the latest information about it?

Comment 8 Peter Schiffer 2014-02-25 16:26:28 UTC
I never got a reply to my question from comment 5, so there's not much on my side.

peter

Comment 10 Jan Chaloupka 2014-03-05 13:45:03 UTC
Skipping unmounted controllers/subsystems is the same as not having them in cgconfig.conf; we just need to know which ones to skip. But if this is just the memory controller, is there any problem with commenting out or removing the "memory = /cgroup/memory" line in /etc/cgconfig.conf?
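
One possible way to find out which controllers to skip is to read /proc/cgroups, where the kernel lists each subsystem with an "enabled" column. The sketch below is an assumption about how that could be done, not something libcgroup does; the function name disabled_controllers and the SAMPLE text are hypothetical.

```python
def disabled_controllers(proc_cgroups_text):
    """Return names of subsystems whose 'enabled' column is 0.

    Expects text in the /proc/cgroups layout:
    subsys_name, hierarchy, num_cgroups, enabled (whitespace separated).
    """
    disabled = []
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the '#subsys_name ...' header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # Not a line in the expected four-column layout.
        if fields[3] == "0":
            disabled.append(fields[0])
    return disabled


# Illustrative input resembling a boot with cgroup_disable=memory.
SAMPLE = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpuset\t0\t1\t1\n"
    "cpu\t0\t1\t1\n"
    "memory\t0\t1\t0\n"
)

if __name__ == "__main__":
    print(disabled_controllers(SAMPLE))
```

On a live machine the text would come from reading /proc/cgroups; any controller reported here could then be left out of the mount section of /etc/cgconfig.conf.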

Comment 11 Jan Chaloupka 2014-08-05 07:09:52 UTC
ping

Comment 13 George Beshers 2015-06-24 00:32:26 UTC
This is no longer a problem.

This BZ should be closed.

Comment 14 Jan Chaloupka 2015-06-24 06:55:11 UTC
Thank you George for letting us know.

Kind Regards

Jan