Bug 1000174 - [oo-accept-node] Fails to check if users are properly in cgroups
Summary: [oo-accept-node] Fails to check if users are properly in cgroups
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jason DeTiberus
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1001151
Reported: 2013-08-22 21:07 UTC by Kenny Woodson
Modified: 2015-05-14 23:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1001151 (view as bug list)
Environment:
Last Closed: 2013-09-19 16:47:54 UTC
Target Upstream Version:
xtian: needinfo-



Description Kenny Woodson 2013-08-22 21:07:45 UTC
Description of problem:

While researching how to check a process's memory usage, I found a process that was _not_ confined by cgroups, yet oo-accept-node did not report it.

After some investigation I found a definite bug: oo-accept-node does not correctly detect gear processes that are missing from their cgroup.

Version-Release number of selected component (if applicable):
Current

How reproducible:
Easy

Steps to Reproduce:
1. Start a process.
2. Remove the process's PIDs from /cgroup/all/openshift/UUID/cgroup.procs
3. Run oo-accept-node -v

Actual results:
The check fails to detect gear processes that appear in the ps table but are missing from the cgroup.procs file.

Expected results:
Properly detect processes that are running but not in cgroups.

Additional info:

There are a few problems here.

The ENV['GEAR_MIN_UID'] is '500'.
-This is a string.
-This is _also_ incorrect, as the minimum gear UID should be 1000.  There should never be a gear UID below 1000.

The ENV['GEAR_MAX_UID'] is '6500'.
-This is a string.
-This is _also_ incorrect, as the maximum does not match the default district value.

FIX:
min_uid = ENV['GEAR_MIN_UID'].to_i
max_uid = ENV['GEAR_MAX_UID'].to_i



uid and pid were strings.
Fix:
    all_user_procs.each do |line|
        uid,pid = line.split
        uid = uid.to_i
        pid = pid.to_i
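
To illustrate why the string values matter, here is a minimal sketch. The helper name gear_uid? is hypothetical and is not taken from oo-accept-node; it only shows that Ruby compares strings lexicographically, so a four-digit UID string sorts before '500', while the integer comparison behaves as intended.

```ruby
# Hypothetical helper (not from oo-accept-node): converts every value to an
# Integer before testing whether the UID lies inside the gear range.
def gear_uid?(uid, min_uid, max_uid)
  uid.to_i.between?(min_uid.to_i, max_uid.to_i)
end

# String comparison is lexicographic: "1" sorts before "5", so this is false.
puts "1000" >= "500"

# Integer comparison after conversion gives the intended answer: true.
puts gear_uid?("1000", "500", "6500")
```

Converting once, at the point where the values are read, avoids repeating .to_i in every comparison.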

Let's also keep in mind that some of our nodes have 3000+ users, so this script needs to perform well.
It would be nice if $USERS were a hash:
$USERS['uuid'] = #old user data

passwd_lines = $USERS.select { |u| u.uid == uid }
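
One possible shape for that hash, as a sketch only: the User struct and index_users_by_uid below are hypothetical stand-ins, since the real structure of $USERS is not shown here. Keying by integer UID is an assumption based on the select above, which filters on uid; it replaces a scan of every user per process with a single hash lookup.

```ruby
# Stand-in for the records held in $USERS; the real objects may differ.
User = Struct.new(:name, :uid)

# Build the index once: integer UID => user record.
def index_users_by_uid(users)
  users.each_with_object({}) { |user, index| index[user.uid.to_i] = user }
end

users = [
  User.new("521b31aaddde1c0acd000003", 1000),
  User.new("another-gear", 1001)          # made-up entry for illustration
]
users_by_uid = index_users_by_uid(users)

# Lookup by UID is now a hash access instead of a full select.
puts users_by_uid[1000].name
```

With 3000+ users and many processes per user, building the index once and reusing it keeps the check roughly linear in the number of processes.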

Comment 1 Jason DeTiberus 2013-08-23 15:24:41 UTC
https://github.com/openshift/origin-server/pull/3483

Comment 2 openshift-github-bot 2013-08-23 22:59:48 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/2003bc01f12cccd54b9e61390e8ea3931f889a2c
<oo-accept-node> Bug 1000174 - oo-accept-node fixes

https://bugzilla.redhat.com/show_bug.cgi?id=1000174

In check_cgroups_procs: Convert uid string values to integer before
comparisons, test for all defined cgroups controllers (not just all or
memory)

Remove unnecessary call to $USERS.dup

Fix an issue where 3 digit uids would not be verified in
check_cgroups_procs (this is the case for a non-district node with the
default node.conf)

Update default node.conf values to match the default district values for
min/max uids

Comment 3 Jianwei Hou 2013-08-26 09:58:39 UTC
Tested on devenv-stage_457
The updated node.conf has not been merged into the environment; I still get:

GEAR_MIN_UID=500                                             # Lower bound of UID used to create gears
GEAR_MAX_UID=6500


In my test, I found that /cgroup/all/openshift/UUID/cgroup.procs is not writable; whenever I tried to update the file, the write was rejected. Is there any way to perform step 2 from the bug description?

Please also help to move the bug to on_qa, thanks!

Comment 4 Xiaoli Tian 2013-08-26 10:55:21 UTC
Tested on devenv-stage_457. After stopping and starting the cgconfig service, the existing processes are removed from the cgroup.procs files:

[root@ip-10-40-54-111 ~]# service cgconfig stop
Stopping cgconfig service:                                 [  OK  ]
[root@ip-10-40-54-111 ~]# service cgconfig start
Starting cgconfig service:                                 [  OK  ]
[root@ip-10-40-54-111 ~]# cat  /cgroup/all/openshift/521b31aaddde1c0acd000003/cgroup.procs 
[root@ip-10-40-54-111 ~]# 

If a process is missing from /cgroup/all/openshift/UUID/cgroup.procs but present in the ps table, oo-accept-node reports the error:

[root@ip-10-40-54-111 ~]# oo-accept-node 
FAIL: 521b31aaddde1c0acd000003 has a process missing from cgroups: 16951 cgroups controller: all
FAIL: 521b31aaddde1c0acd000003 has a process missing from cgroups: 16952 cgroups controller: all
FAIL: 521b31aaddde1c0acd000003 has a process missing from cgroups: 16953 cgroups controller: all
FAIL: 521b31aaddde1c0acd000003 has a process missing from cgroups: 16967 cgroups controller: all
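
The comparison behind these messages can be sketched roughly as follows. This is not the oo-accept-node implementation; pids_missing_from_cgroup is a hypothetical helper that takes the PIDs seen in the process table and the text of a cgroup.procs file, and returns the PIDs that are not confined. The message format simply mirrors the output above.

```ruby
require 'set'

# Hypothetical helper: PIDs present in the process table for a gear but
# absent from the contents of its cgroup.procs file.
def pids_missing_from_cgroup(ps_pids, cgroup_procs_text)
  confined = cgroup_procs_text.split.map(&:to_i).to_set
  ps_pids.map(&:to_i).reject { |pid| confined.include?(pid) }
end

uuid = "521b31aaddde1c0acd000003"
ps_pids = %w[16951 16952 16953 16967]   # as strings, the way ps output arrives
cgroup_procs_text = ""                  # empty file, as after the cgconfig restart

pids_missing_from_cgroup(ps_pids, cgroup_procs_text).each do |pid|
  puts "FAIL: #{uuid} has a process missing from cgroups: #{pid} cgroups controller: all"
end
```

Both sides are converted to integers before comparing, which avoids the string mismatch described in this bug.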

After running the following command, the cgroup configuration returns to normal:
[root@ip-10-40-54-111 ~]# oo-cgroup-enable --with-all-containers
[root@ip-10-40-54-111 ~]# cat  /cgroup/all/openshift/521b31aaddde1c0acd000003/cgroup.procs
16951
16952
16953
16967
[root@ip-10-40-54-111 ~]# oo-accept-node 
PASS

The only remaining issue in this bug is that the new min and max gear UIDs are not built into the devenv-stage image; not sure why.

[root@ip-10-40-54-111 ~]# cat /etc/openshift/node.conf|grep GEAR_M
GEAR_MIN_UID=500                      # Lower bound of UID used to create gears
GEAR_MAX_UID=6500                     # Upper bound of UID used to create gears

The package version is
rubygem-openshift-origin-node-1.13.12-1.el6oso.noarch

Comment 5 Jason DeTiberus 2013-08-26 13:23:16 UTC
Step 2 can also be replicated by using cgclassify: 'cgclassify -g cpu,cpuacct,memory,net_cls,freezer:/ <pidlist>'

The node.conf file is listed as noreplace in the spec file, so it will not be updated by just updating the RPMs.  

Also, the devenv RPM copies node.conf.libra (in the li repo) to node.conf; I submitted PR https://github.com/openshift/li/pull/1857 to address this.

For the other environments, Ops will need to make any changes needed to the config files that already exist in production.

Comment 6 openshift-github-bot 2013-08-29 20:11:10 UTC
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/f15cc4308622ac1f86c7d93a393d9ab79840729b
Bug 1000174 - Update node.conf.libra for GEAR_MIN_UID and GEAR_MAX_UID

https://bugzilla.redhat.com/show_bug.cgi?id=1000174

Update default node.conf.libra values to match the default district
values for GEAR_MIN_UID and GEAR_MAX_UID

Comment 7 Meng Bo 2013-09-02 06:49:32 UTC
[root@ip-10-40-93-30 ~]# cat /etc/openshift/node.conf|grep GEAR|grep UID
GEAR_MIN_UID=1000                                            # Lower bound of UID used to create gears
GEAR_MAX_UID=6999                                            # Upper bound of UID used to create gears
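
For reference, lines in this format can be read into integer bounds with a sketch like the one below. The parse_uid_bounds helper is hypothetical and is not how the node code loads node.conf; it assumes plain KEY=VALUE lines with optional trailing "#" comments, as in the grep output above.

```ruby
# Hypothetical parser: extracts GEAR_MIN_UID and GEAR_MAX_UID as Integers
# from KEY=VALUE text, ignoring trailing comments and unrelated keys.
def parse_uid_bounds(conf_text)
  conf_text.each_line.with_object({}) do |line, bounds|
    key, value = line.sub(/#.*/, '').strip.split('=', 2)
    next unless %w[GEAR_MIN_UID GEAR_MAX_UID].include?(key)
    # Tolerate optional quotes around the value, then convert strictly.
    bounds[key] = Integer(value.delete(%q("')))
  end
end

conf = <<~CONF
  GEAR_MIN_UID=1000        # Lower bound of UID used to create gears
  GEAR_MAX_UID=6999        # Upper bound of UID used to create gears
CONF

bounds = parse_uid_bounds(conf)
puts bounds["GEAR_MIN_UID"]
puts bounds["GEAR_MAX_UID"]
```

Using Integer() rather than to_i makes a malformed value raise an error instead of silently becoming 0.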

The gear uid range has been updated on devenv_3734.

Moving bug to VERIFIED.

