Bug 1280438 - haproxy_ctld error on a close-to-quota gear
Summary: haproxy_ctld error on a close-to-quota gear
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Timothy Williams
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On: 1261540
Blocks: 1267325
 
Reported: 2015-11-11 18:03 UTC by Timothy Williams
Modified: 2015-12-17 17:11 UTC
10 users

Fixed In Version: openshift-origin-cartridge-haproxy-1.31.4.1-1.el6op rubygem-openshift-origin-node-1.38.4.1-1.el6op
Doc Type: Bug Fix
Doc Text:
When a gear was almost at its quota limit, a warning message was returned from commands used by haproxy_ctld. This caused the haproxy_ctld log to incorrectly fill up with the quota warning message, which was not intended to be logged by haproxy_ctld. This bug fix filters the warning messages when logging for haproxy_ctld, and as a result quota limit warning messages are no longer logged to the haproxy_ctld log file.
Clone Of: 1261540
Environment:
Last Closed: 2015-12-17 17:11:27 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:2666 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 2.2.8 security, bug fix, and enhancement update 2015-12-17 22:07:54 UTC

Description Timothy Williams 2015-11-11 18:03:51 UTC
+++ This bug was initially created as a clone of Bug #1261540 +++

Description of problem:

I have an app on rhcloud.com that's close to its quota. During builds and when ssh-ing, it tells me:
`Warning: Gear 55e73c8c0c1e66c54c00000d is using 93.9% of inodes allowed`
That's fine.

But I saw this nonsense in haproxy_ctld.log:
```
E, [2015-09-08T13:12:44.608192 #408384] ERROR -- : Failed to get stats from Warning:
E, [2015-09-08T13:12:44.608655 #408384] ERROR -- : Failed to get stats from Gear
E, [2015-09-08T13:12:44.609350 #408384] ERROR -- : Failed to get stats from 55e73c8c0c1e66c54c00000d
E, [2015-09-08T13:12:44.609828 #408384] ERROR -- : Failed to get stats from is
E, [2015-09-08T13:12:44.610308 #408384] ERROR -- : Failed to get stats from using
E, [2015-09-08T13:12:44.610412 #408384] ERROR -- : Failed to get stats from 93.9%
E, [2015-09-08T13:12:44.610906 #408384] ERROR -- : Failed to get stats from of
E, [2015-09-08T13:12:44.611436 #408384] ERROR -- : Failed to get stats from inodes
E, [2015-09-08T13:12:44.612098 #408384] ERROR -- : Failed to get stats from allowed
```
Digging into the source, I see that `get_remote_sessions_count()` is called with URLs from app_haproxy_status_urls.conf, which turns out to be corrupted already:
```
[mathdown468cf5aaf3prod-cben.rhcloud.com 55e73c8c0c1e66c54c00000d]\> cat haproxy/conf/app_haproxy_status_urls.conf 
Warning:
Gear
55e73c8c0c1e66c54c00000d
is
using
93.9%
of
inodes
allowed
```

From there I'm not sure where to debug...
This is a throwaway app, feel free to inspect or mutate it in any way, or tell me what to check.
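The log pattern fits a whitespace-splitting failure: if the status-URL file is tokenized on whitespace rather than read one URL per line, every word of a stray quota warning becomes a bogus "URL". A minimal Ruby sketch of that hypothesis (illustrative only, not the actual haproxy_ctld code):

```ruby
require 'logger'

# Illustrative sketch, not the real haproxy_ctld source: if the status-URL
# list is split on whitespace, each word of the quota warning is treated as
# a separate URL, producing one "Failed to get stats" error per word.
log = Logger.new($stdout)

conf = 'Warning: Gear 55e73c8c0c1e66c54c00000d is using 93.9% of inodes allowed'

conf.split.each do |url|
  # Every bogus "URL" fails the stats fetch, matching the log flood above.
  log.error("Failed to get stats from #{url}")
end
```

This emits one error line per token of the warning, matching the nine "Failed to get stats from ..." entries in the log excerpt above.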

Version-Release number of selected component (if applicable):
?

How reproducible:
Tried once, reproduced.

Steps to Reproduce:
1. create scalable app
2. dd if=/dev/zero ... to fill up to >90% quota.
   (the original discovery was for inodes quota, this repro is for size quota)
3. restart
4. scale to 2 gears
5. or maybe I restarted here?  not sure.
6. profit: 

Actual results:
```
[tmp-mathdown.rhcloud.com 55ef20d37628e14a5300019c]\> cat haproxy/conf/app_haproxy_status_urls.conf 
Warning:
Gear
55ef20d37628e14a5300019c
is
using
94.4%
of
disk
quota
[tmp-mathdown.rhcloud.com 55ef20d37628e14a5300019c]\> tail app-root/logs/haproxy_ctld.log 
E, [2015-09-09T11:02:35.844932 #280135] ERROR -- : Failed to get stats from quota
E, [2015-09-09T11:02:40.847626 #280135] ERROR -- : Failed to get stats from Warning:
E, [2015-09-09T11:02:40.849385 #280135] ERROR -- : Failed to get stats from Gear
E, [2015-09-09T11:02:40.849737 #280135] ERROR -- : Failed to get stats from 55ef20d37628e14a5300019c
E, [2015-09-09T11:02:40.850333 #280135] ERROR -- : Failed to get stats from is
E, [2015-09-09T11:02:40.850822 #280135] ERROR -- : Failed to get stats from using
E, [2015-09-09T11:02:40.850912 #280135] ERROR -- : Failed to get stats from 94.4%
E, [2015-09-09T11:02:40.851988 #280135] ERROR -- : Failed to get stats from of
E, [2015-09-09T11:02:40.852337 #280135] ERROR -- : Failed to get stats from disk
E, [2015-09-09T11:02:40.852719 #280135] ERROR -- : Failed to get stats from quota
```

Expected results:
```
[prod-mathdown.rhcloud.com 55ee99710c1e662f8b000013]\> cat haproxy/conf/app_haproxy_status_urls.conf 
cat: haproxy/conf/app_haproxy_status_urls.conf: No such file or directory
[prod-mathdown.rhcloud.com 55ee99710c1e662f8b000013]\> tail app-root/logs/haproxy_ctld.log
I, [2015-09-08T04:19:04.773447 #492161]  INFO -- : Starting haproxy_ctld
I, [2015-09-08T04:29:09.878269 #492161]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2015-09-08T04:29:11.263448 #492161]  INFO -- : remove-gear - exit_code: 1  output: Cannot remove gear because min limit '2' reached.
```
(not sure why app_haproxy_status_urls.conf is missing, that's also a 2-gear app.)

Additional info:

--- Additional comment from John W. Lamb on 2015-09-29 11:22:06 EDT ---

I was able to reproduce this in Online. Working on it now.

--- Additional comment from openshift-github-bot on 2015-10-14 12:03:06 EDT ---

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/bb92573aaa7b56069ab25d8f558756bc9a67e451
fix rhcsh error output, clean up cart sub hooks

The `welcome` function in `rhcsh` was sending messages to `stdout` which
should have gone to `stderr`.

The subscriber hooks for `haproxy`, `jbosseap` and `jbossas` carts were
modified at first to fix Bug 1261540, but since Miciah has addressed
that in the runtime code, the scripts are now simply optimized to use
bash built-ins to avoid forking new processes.

--- Additional comment from openshift-github-bot on 2015-10-14 14:07:06 EDT ---

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/5733b053f7f3fb358130abff677c040cd18634ed
controller: execute_connections: nix client output

Filter client output out from the publish hooks output and chomp it
before providing it as input to the subscribe hooks.

This commit fixes bug 1261540.
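The commit above describes the runtime fix: client-facing output is filtered out of the publish-hook output and the remainder chomped before it reaches the subscribe hooks. A hedged sketch of that idea in Ruby (the method name and the "Warning:" prefix rule are illustrative, not the actual origin-server code):

```ruby
# Illustrative sketch of the fix described above, not the actual
# origin-server implementation: drop client-facing warning lines from the
# publish-hook output and chomp the trailing newline before handing the
# result to the subscribe hooks.
def filter_hook_output(raw)
  raw.lines
     .reject { |line| line.start_with?('Warning:') }
     .join
     .chomp
end

puts filter_hook_output("Warning: Gear abc is using 94.4% of disk quota\nhttp://gear-1.example.com/status\n")
```

With the warning line stripped, only the real status URL survives, so app_haproxy_status_urls.conf is never populated with warning text in the first place.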

--- Additional comment from DeShuai Ma on 2015-10-18 23:21:54 EDT ---

Tested on devenv_5667 and verified this bug.

When ssh-ing into the app, the following warning appears:
Warning: Gear 562459ec5b9ed6aec8000011 is using 92.9% of disk quota

[rb20-dma.dev.rhcloud.com 562459ec5b9ed6aec8000011]\> dd if=/dev/zero of=app-root/repo/test bs=1M count=950
[rb20-dma.dev.rhcloud.com 562459ec5b9ed6aec8000011]\> cat app-root/logs/
haproxy_ctld.log  haproxy.log       ruby.log          
[rb20-dma.dev.rhcloud.com 562459ec5b9ed6aec8000011]\> cat app-root/logs/haproxy_ctld.log 
I, [2015-10-18T22:48:32.644300 #13821]  INFO -- : Starting haproxy_ctld
I, [2015-10-18T22:58:37.735770 #13821]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2015-10-18T22:58:46.234788 #13821]  INFO -- : remove-gear - exit_code: 0  output:

Comment 4 Gaoyun Pei 2015-11-19 09:22:35 UTC
Verified this bug on the OpenShiftEnterpriseErrata/2.2/2015-11-17.1 puddle.

[root@broker ~]# rhc ssh scaletest
Connecting to yes-scaletest-1.com.cn ...
...
Warning: Gear yes-scaletest-1 is using 93.9% of disk quota
[scaletest-yes.ose22-auto.com.cn yes-scaletest-1]\> cat haproxy/conf/app_haproxy_status_urls.conf
cat: haproxy/conf/app_haproxy_status_urls.conf: No such file or directory
[scaletest-yes.ose22-auto.com.cn yes-scaletest-1]\> cat app-root/logs/haproxy_ctld.log 
I, [2015-11-19T04:05:35.947921 #8941]  INFO -- : Starting haproxy_ctld
I, [2015-11-19T04:09:24.814787 #11560]  INFO -- : Starting haproxy_ctld

Comment 6 errata-xmlrpc 2015-12-17 17:11:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2666.html

