Bug 1005421
Summary: | oo-accept-node throwing "split" error during cgroup check | ||
---|---|---|---|
Product: | OpenShift Online | Reporter: | Matt Woodson <mwoodson> |
Component: | Containers | Assignee: | Rob Millner <rmillner> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 2.x | CC: | bmeng, chunchen, dmcphers, mfisher, mwoodson, rmillner, xtian |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-10-17 13:28:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
Description
Matt Woodson
2013-09-06 21:26:17 UTC
Interesting. It appears the bug is that

    /bin/ps -p #{pid} -o uid,pid,ppid,etime,cmd

is returning characters outside US-ASCII. The only field that should be able to return non-ASCII characters is "cmd" (the whole command and its arguments). I'd be really curious what command has Unicode arguments; that seems suspicious. Next time you see the issue, save the output of the following command to a file:

    ps -e -o uid,pid,ppid,etime,cmd

I'll modify the function so that it does not rely on split. Thanks!

Putting in NEEDINFO to collect the output of the following command next time the issue shows up. It needs to be written directly to a file rather than to a pastebin, to preserve the original encoding. Thanks!

    ps -e -o uid,pid,ppid,etime,cmd

Pull request to scrub non-ASCII characters out of the command result: https://github.com/openshift/origin-server/pull/3587

Skipping the needinfo.

Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/70d7b0c718225f505be206cde27a5c9795f02d25
Bug 1005421 - the ps command was returning unicode characters, strip them out.

Hi Rob,
Any suggestion for how to verify this bug? I am not sure why the cgroup check uses US-ASCII encoding. During my test I used a command containing Chinese characters, which are certainly outside ASCII, but the check still passes without the fix.
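A minimal Ruby sketch of the failure diagnosed in the comments above. It assumes, as the later discussion of LANG="C" suggests, that the ps output reaches Ruby tagged as US-ASCII even though its bytes are UTF-8. The helper names are hypothetical, and this is not the code from the pull request or the commit. It only shows why `String#split` raises and two ways around it: stripping non-ASCII bytes, which is the approach the pull request describes, or re-tagging the bytes as the UTF-8 they really are.

```ruby
# Hypothetical helpers; not the actual oo-accept-node or PR 3587 code.

# A ps line whose cmd contains UTF-8 text ("测试"), tagged the way Ruby tags
# command output when Encoding.default_external is US-ASCII (LANG=C).
def ps_line_under_lang_c
  "1002 15460 vim \u6D4B\u8BD5.txt".b.force_encoding(Encoding::US_ASCII)
end

# Workaround 1: drop every byte outside 7-bit ASCII so the string becomes
# valid US-ASCII before it is parsed.
def strip_non_ascii(str)
  str.bytes.select { |b| b < 0x80 }.pack("C*").force_encoding(Encoding::US_ASCII)
end

# Workaround 2: reinterpret the same bytes as UTF-8.
def as_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8)
end

line = ps_line_under_lang_c
begin
  line.split # whitespace split must decode every character, so it raises here
rescue ArgumentError => e
  puts "split failed: #{e.message}" # invalid byte sequence in US-ASCII
end

p strip_non_ascii(line).split # ["1002", "15460", "vim", ".txt"]
p as_utf8(line).split         # cmd field keeps its UTF-8 characters
```

Stripping is lossy, because the cmd field loses its non-ASCII characters. Re-tagging keeps them but relies on the bytes actually being UTF-8.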
    # ps -p 15460 -o uid,pid,cmd
    UID PID CMD
    1002 15460 vim 测试.txt

It does not throw an error during oo-accept-node:

    INFO: checking cgroups processes
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 717 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 14880 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 14881 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 15460 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 30480 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 30481 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 30482 cgroups controller: all
    FAIL: 522ec6dab57b356b3a000001 has a process missing from cgroups: 30501 cgroups controller: all
    INFO: checking presence of tc qdisc

The problem is only reproducible with LANG="C", and devenv normally runs with LANG="en_US.UTF-8". This script does a lot of text parsing. I believe a better fix would be to force Ruby to use Unicode internally for text processing, or to set LANG in its running environment. Also, determine what other scripts are running under LANG=C in INT/STG/PROD. We have run into LANG being set improperly before, during gear moves.

The issue in INT/STG/PROD likely has broader ramifications; chasing that down instead.

We haven't been able to root-cause why this script occasionally, spontaneously fails to handle Unicode input properly. I'm hesitant to hard-code the workaround in the Origin source, but we should change how oo-accept-node runs in INT/STG/PROD to force Unicode handling by default:

    /usr/bin/env oo-ruby -E UTF-8:UTF-8 /usr/sbin/oo-accept-node

The release ticket has been updated to make that change.

Running Q/E on this will be difficult since it requires a set of coincidences, but this should work:

1. Create an app.
2. Log into the app and run the following Ruby script:
       $0 = "Cédric"
       sleep
3. Run ps and observe that the process name has changed to "Cédric".
4. Use cgclassify to move the process from step 2 into the root cgroup:
       cgclassify -g cpu,cpuacct,memory,freezer,net_cls:/ [pid]
5. Run the following to observe the failure:
       LANG="C" /usr/sbin/oo-accept-node
6. Run the following to observe it working even though the language is set incorrectly:
       LANG="C" /usr/bin/env oo-ruby -E UTF-8:UTF-8 /usr/sbin/oo-accept-node

Tested on devenv_3776 with the method in comment#11. Step 5 produces the expected error and step 6 produces the expected result:

    # LANG="C" /usr/sbin/oo-accept-node
    /usr/sbin/oo-accept-node:450:in `split': invalid byte sequence in US-ASCII (ArgumentError)
        from /usr/sbin/oo-accept-node:450:in `block (3 levels) in check_cgroup_procs'
        from /usr/sbin/oo-accept-node:449:in `each'
        from /usr/sbin/oo-accept-node:449:in `block (2 levels) in check_cgroup_procs'
        from /usr/sbin/oo-accept-node:446:in `each'
        from /usr/sbin/oo-accept-node:446:in `block in check_cgroup_procs'
        from /usr/sbin/oo-accept-node:445:in `each'
        from /usr/sbin/oo-accept-node:445:in `check_cgroup_procs'
        from /usr/sbin/oo-accept-node:841:in `<main>'
    # LANG="C" /usr/bin/env oo-ruby -E UTF-8:UTF-8 /usr/sbin/oo-accept-node
    FAIL: 523152a76849bb753100011c has a process missing from cgroups: 30977 cgroups controller: all

Moving the bug to VERIFIED.

It's facter. Line 44 of /opt/rh/ruby193/root/usr/share/ruby/vendor_ruby/facter.rb sets LANG to "C".

Re-opening the ticket to review the implications for mcollective.

Loading facter does not appear to change the internal string management; however, it affects oo_spawn.
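The difference between steps 5 and 6 of the Q/E procedure comes down to how Ruby picks `Encoding.default_external`, which is the encoding it attaches to text read from pipes such as ps output. Ruby derives it from the locale, so LANG="C" yields US-ASCII. `-E UTF-8:UTF-8` overrides both the external and the internal default. The sketch below starts a fresh interpreter through `RbConfig.ruby` to show this; the helper name is hypothetical. It sets `LC_ALL` as well as `LANG` because `LC_ALL` takes precedence when it is present.

```ruby
require "rbconfig"

# Hypothetical helper: report the default external encoding that a fresh
# Ruby interpreter chooses under the given environment and extra flags.
def child_default_external(env, *ruby_flags)
  cmd = [RbConfig.ruby, *ruby_flags, "-e", "print Encoding.default_external"]
  IO.popen(env, cmd, &:read)
end

c_locale = { "LANG" => "C", "LC_ALL" => "C" }

puts child_default_external(c_locale)                      # US-ASCII
puts child_default_external(c_locale, "-E", "UTF-8:UTF-8") # UTF-8
```

This is also why step 6 succeeds even though LANG is still "C": the flag replaces the locale-derived default before the script reads any ps output.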
    irb(main):001:0> require 'rubygems'
    => false
    irb(main):002:0> require 'openshift-origin-node'
    => true
    irb(main):003:0> ::OpenShift::Runtime::Utils::oo_spawn('echo $LANG')
    => ["en_US.UTF-8\n", "", 0]
    irb(main):004:0> require 'facter'
    => true
    irb(main):005:0> ::OpenShift::Runtime::Utils::oo_spawn('echo $LANG')
    => ["C\n", "", 0]

Changing severity to low since a workaround was provided for the original problem. We don't currently know of any other issues related to facter changing LANG, but it seems like the sort of thing that causes subtle issues in the future.

Puppet Labs fixed the problem in facter 1.7.0 and 2.0.0: http://projects.puppetlabs.com/issues/12012

Built a new facter package for 1.7.3 that no longer sets LANG="C":
https://brewweb.devel.redhat.com/buildinfo?buildID=296881
Waiting for it to be tagged into the release.

The newer facter package has been tagged for the release: ruby193-facter-1.7.3-4.el6oso

    irb(main):003:0> require 'openshift-origin-node'
    => true
    irb(main):004:0> ::OpenShift::Runtime::Utils::oo_spawn('echo $LANG')
    => ["en_US.UTF-8\n", "", 0]
    irb(main):005:0> require 'facter'
    => true
    irb(main):006:0> ::OpenShift::Runtime::Utils::oo_spawn('echo $LANG')
    => ["en_US.UTF-8\n", "", 0]

Tested on devenv_3837; the issue has been fixed. The facter package version is ruby193-facter-1.7.3-4.el6oso.x86_64.
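The irb sessions above illustrate the mechanism. A library that sets LANG in the process environment at load time changes the environment inherited by every command spawned afterwards. Ruby's own `Encoding.default_external` is fixed at startup and does not change. Below is a minimal sketch of that mechanism. `load_library_that_sets_lang_c` is a hypothetical stand-in for `require 'facter'`, and `sh -c 'echo $LANG'` stands in for `oo_spawn('echo $LANG')`. The last call shows that passing an explicit environment hash to the child is one way to keep its LANG independent of such global changes.

```ruby
# Hypothetical helper: run `echo $LANG` in a child shell, optionally with
# extra environment variables layered over the inherited environment.
def child_lang(env = {})
  IO.popen(env, ["sh", "-c", "echo $LANG"], &:read)
end

# Hypothetical stand-in for loading a library that sets LANG="C" in-process.
def load_library_that_sets_lang_c
  ENV["LANG"] = "C"
end

saved = ENV["LANG"]
begin
  ENV["LANG"] = "en_US.UTF-8"
  puts child_lang                              # en_US.UTF-8
  load_library_that_sets_lang_c
  puts child_lang                              # C
  puts child_lang({ "LANG" => "en_US.UTF-8" }) # en_US.UTF-8 (explicit override)
ensure
  ENV["LANG"] = saved # restores the original value; assigning nil deletes the key
end
```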