Bug 988822

Summary: [origin_runtime_191] Gear with high cpu usage will not be throttled automatically
Product: OpenShift Online
Reporter: Meng Bo <bmeng>
Component: Containers
Assignee: Fotios Lindiakos <fotios>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 2.x
CC: jkeck
Doc Type: Bug Fix
Last Closed: 2013-08-07 22:57:24 UTC
Type: Bug

Description Meng Bo 2013-07-26 13:36:06 UTC
Description of problem:
Create a gear and make sure the watchman is running. Use a script to keep the gear's CPU usage high, and watch /var/log/messages to see whether the gear is throttled for the abuse.

After a while, check the gear's cpu.cfs_quota_us setting.

Version-Release number of selected component (if applicable):
fork_ami_origin_runtime_183_and_191_724

How reproducible:
always

Steps to Reproduce:
1. Create an app.
2. SSH into the gear and run the following script to generate high CPU load (it starts 10 background busy loops):
for i in $(seq 1 10); do
  ( while true; do true; done ) &
done
3. Check whether the gear's cgroup CPU setting gets throttled.
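Step 2 leaves ten busy subshells running in the background, and they later have to be killed by hand. A minimal bash sketch that starts the same kind of busy loops while recording their PIDs, so they can be stopped cleanly afterwards; `burn_start`, `burn_stop` and `BURN_PIDS` are hypothetical names, not part of OpenShift:

```shell
#!/usr/bin/env bash
# Sketch: start N busy-loop CPU burners like the step-2 reproducer and
# remember their PIDs. burn_start/burn_stop/BURN_PIDS are hypothetical names.

BURN_PIDS=()

burn_start() {
  local n=${1:-10}   # number of busy loops; the report uses 10
  local i
  for ((i = 0; i < n; i++)); do
    ( while true; do true; done ) &   # each subshell spins, keeping one CPU busy
    BURN_PIDS+=("$!")                 # $! is the PID of the last background job
  done
}

burn_stop() {
  # Terminate every recorded burner and reap it so no zombies remain.
  local pid
  for pid in "${BURN_PIDS[@]}"; do
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
  done
  BURN_PIDS=()
}
```

For example, run `burn_start 10`, watch the watchman messages, then `burn_stop`. Keeping `$!` for each loop avoids hunting for the processes with `ps` afterwards.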


Actual results:
While CPU usage stays high, the gear's cgroup CPU setting does not change.

Expected results:
The cgroup CPU setting should be reduced in response to the abuse.

Additional info:
[php1-bmengdev.dev.rhcloud.com 51f2668b9e3e140a3e000001]\> top

top - 09:29:18 up  1:30,  0 users,  load average: 24.59, 17.76, 9.75
Tasks:  27 total,  19 running,   8 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.3%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3717184k total,  1259260k used,  2457924k free,    55232k buffers
Swap:  1023992k total,        0k used,  1023992k free,   315864k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9178 501       20   0  106m  972  300 R  5.6  0.0   0:27.64 bash
 9196 501       20   0  106m  972  300 R  5.6  0.0   0:25.21 bash
12504 501       20   0  106m  956  296 R  5.6  0.0   0:03.50 bash
12506 501       20   0  106m  960  300 R  5.6  0.0   0:03.50 bash
12507 501       20   0  106m  960  300 R  5.6  0.0   0:03.50 bash
12508 501       20   0  106m  960  300 R  5.6  0.0   0:03.50 bash
12509 501       20   0  106m  964  300 R  5.6  0.0   0:03.50 bash
12510 501       20   0  106m  964  300 R  5.6  0.0   0:03.50 bash
12511 501       20   0  106m  964  300 R  5.6  0.0   0:03.50 bash
12512 501       20   0  106m  964  300 R  5.6  0.0   0:03.50 bash
 9176 501       20   0  106m  972  300 R  5.3  0.0   0:27.63 bash
 9177 501       20   0  106m  972  300 R  5.3  0.0   0:27.63 bash
 9179 501       20   0  106m  972  300 R  5.3  0.0   0:27.63 bash
 9193 501       20   0  106m  972  300 R  5.3  0.0   0:25.21 bash
 9194 501       20   0  106m  972  300 R  5.3  0.0   0:25.20 bash
 9195 501       20   0  106m  972  300 R  5.3  0.0   0:25.21 bash
12505 501       20   0  106m  960  300 R  5.3  0.0   0:03.50 bash
12513 501       20   0  106m  964  300 R  5.3  0.0   0:03.49 bash
 9644 501       20   0 14892 1248 1008 R  0.3  0.0   0:00.56 top
 1134 501       20   0  100m 1896  848 S  0.0  0.1   0:00.21 sshd
 1135 501       20   0  106m 2232 1556 S  0.0  0.1   0:00.14 bash
 5534 501       20   0  390m  13m 7928 S  0.0  0.4   0:00.20 httpd
 5538 501       20   0 32208 1200  964 S  0.0  0.0   0:00.00 rotatelogs
 5541 501       20   0 32208 1076  836 S  0.0  0.0   0:00.00 rotatelogs
 5550 501       20   0  390m 6596  452 S  0.0  0.2   0:00.00 httpd
12369 501       20   0  100m 1880  844 S  0.0  0.1   0:00.01 sshd
12370 501       20   0  106m 2196 1532 S  0.0  0.1   0:00.13 bash
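To quantify the load, the %CPU column can be summed per command. A sketch in awk, assuming the column layout shown above (%CPU in field 9, COMMAND in field 12) and `top -b -n 1` batch output on stdin; `sum_cpu` is a hypothetical helper name, and other top versions may lay the columns out differently:

```shell
#!/usr/bin/env bash
# Sum the %CPU column for processes whose COMMAND matches $1.
# Assumes the top layout above: field 9 = %CPU, field 12 = COMMAND.
sum_cpu() {
  local cmd=$1
  awk -v cmd="$cmd" '
    # Data rows start with a numeric PID; header and summary lines do not.
    $1 ~ /^[0-9]+$/ && $12 == cmd { total += $9 }
    END { printf "%.1f\n", total }
  '
}
```

Usage: `top -b -n 1 | sum_cpu bash`. Over the listing above, the 18 spinning `bash` processes (10 at 5.6 and 8 at 5.3) sum to 98.4, consistent with the 99.3%us summary line; the two idle interactive shells contribute 0.0.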



#tailf /var/log/messages
Jul 26 09:27:58 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:28:18 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:28:38 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:28:58 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:29:18 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:29:38 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:29:58 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:30:18 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:30:38 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 26 09:30:58 ip-10-40-78-68 rhc-watchman[1958]: Running rhc-watchman => delay: 20s, exception threshold: 10

Comment 1 Fotios Lindiakos 2013-07-26 23:07:40 UTC
This is being merged into master tonight and should be ready for testing.

Comment 2 Meng Bo 2013-07-29 08:35:18 UTC
Checked on devenv_3574. The throttler and restorer work fine now.


After burning the CPU, check /var/log/messages:
Jul 29 04:20:32 ip-10-152-150-219 rhc-watchman[1940]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 29 04:20:32 ip-10-152-150-219 rhc-watchman[1940]: Throttler: throttle => 51f623c5a721920a7d000001 (871.476)
Jul 29 04:20:52 ip-10-152-150-219 rhc-watchman[1940]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 29 04:20:52 ip-10-152-150-219 rhc-watchman[1940]: Throttler: over_threshold => 51f623c5a721920a7d000001 (986.79)
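The Throttler lines follow a fixed shape, `Throttler: <action> => <gear UUID> (<value>)`, with the actions `throttle`, `over_threshold` and `restore` seen in this report. A sed sketch that extracts those three fields, assuming that shape holds; `parse_throttler` is a hypothetical name, and the log does not say what the parenthesized value measures:

```shell
#!/usr/bin/env bash
# Print "<action> <gear-uuid> <value>" for each rhc-watchman Throttler line
# on stdin; all other lines (e.g. "Running rhc-watchman ...") are dropped.
parse_throttler() {
  sed -n -E 's/.*Throttler: ([a-z_]+) => ([0-9a-f]+) \(([0-9.]+)\).*/\1 \2 \3/p'
}
```

Usage: `grep rhc-watchman /var/log/messages | parse_throttler`, which reduces the log to one line per throttle, over-threshold or restore event.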

Check the gear's cgroup setting:
[php1-bmengdev1.dev.rhcloud.com 51f623c5a721920a7d000001]\> oo-cgroup-read cpu.cfs_quota_us
30000

Kill the script, then check /var/log/messages:
Jul 29 04:29:12 ip-10-152-150-219 rhc-watchman[1940]: Running rhc-watchman => delay: 20s, exception threshold: 10
Jul 29 04:29:12 ip-10-152-150-219 rhc-watchman[1940]: Throttler: restore => 51f623c5a721920a7d000001 (9.355)

Check the gear's cgroup setting again:
[php1-bmengdev1.dev.rhcloud.com 51f623c5a721920a7d000001]\> oo-cgroup-read cpu.cfs_quota_us
100000
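The two readings translate into a CPU share once `cpu.cfs_period_us` is known. The report does not show the gear's period, so this sketch assumes the kernel default of 100000 µs (100 ms); under that assumption 30000 is 30% of one CPU while throttled and 100000 is one full CPU after restore. `quota_pct` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Express cpu.cfs_quota_us as an integer percentage of one CPU.
# $2 is cpu.cfs_period_us; it defaults to 100000 us, the kernel default
# (assumed here, since the gear's actual period is not shown).
# A quota of -1 means the CFS bandwidth controller imposes no limit.
quota_pct() {
  local quota=$1 period=${2:-100000}
  if [ "$quota" -lt 0 ]; then
    echo unlimited
  else
    echo $(( quota * 100 / period ))
  fi
}
```

With the values above, `quota_pct 30000` gives 30 and `quota_pct 100000` gives 100.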

Moving the bug to VERIFIED.