Bug 1261540

Summary:	haproxy_ctld error on a close-to-quota gear
Product:	OpenShift Online	Reporter:	Beni Paskin-Cherniavksy <beni.cherniavsky>
Component:	Image	Assignee:	John W. Lamb <jolamb>
Status:	CLOSED WONTFIX	QA Contact:	DeShuai Ma <dma>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	2.x	CC:	abhgupta, aos-bugs, bperkins, jokerman, mmccomas
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1267325 1280438		Environment:
Last Closed:	2017-05-31 18:22:11 UTC	Type:	Bug
Bug Depends On:
Bug Blocks:	1267325, 1280438

Description Beni Paskin-Cherniavksy 2015-09-09 15:08:45 UTC

Description of problem:

I have an app on rhcloud.com that's close to its quota. During builds and when I ssh in, it tells me:
`Warning: Gear 55e73c8c0c1e66c54c00000d is using 93.9% of inodes allowed`
That's fine.

But I saw this nonsense in haproxy_ctld.log:
```
E, [2015-09-08T13:12:44.608192 #408384] ERROR -- : Failed to get stats from Warning:
E, [2015-09-08T13:12:44.608655 #408384] ERROR -- : Failed to get stats from Gear
E, [2015-09-08T13:12:44.609350 #408384] ERROR -- : Failed to get stats from 55e73c8c0c1e66c54c00000d
E, [2015-09-08T13:12:44.609828 #408384] ERROR -- : Failed to get stats from is
E, [2015-09-08T13:12:44.610308 #408384] ERROR -- : Failed to get stats from using
E, [2015-09-08T13:12:44.610412 #408384] ERROR -- : Failed to get stats from 93.9%
E, [2015-09-08T13:12:44.610906 #408384] ERROR -- : Failed to get stats from of
E, [2015-09-08T13:12:44.611436 #408384] ERROR -- : Failed to get stats from inodes
E, [2015-09-08T13:12:44.612098 #408384] ERROR -- : Failed to get stats from allowed
```
Digging into the source, I see `get_remote_sessions_count()` is called with URLs from app_haproxy_status_urls.conf, which turns out to be already corrupted:
```
[mathdown468cf5aaf3prod-cben.rhcloud.com 55e73c8c0c1e66c54c00000d]\> cat haproxy/conf/app_haproxy_status_urls.conf 
Warning:
Gear
55e73c8c0c1e66c54c00000d
is
using
93.9%
of
inodes
allowed
```
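The one-word-per-line shape of the file is what unquoted word splitting would produce if the quota warning were mixed into a whitespace-separated URL list. Below is a minimal shell sketch of that suspected mechanism. `write_status_urls` is a hypothetical stand-in, not the cartridge's actual code, and the assumption that the warning arrives as hook input is only a guess at this point.

```shell
# Hypothetical stand-in for a hook that stores one status URL per line;
# this is not the cartridge's actual code.
write_status_urls() {
    out_file=$1
    input=$2
    : > "$out_file"
    # $input is deliberately unquoted, so the shell splits it on whitespace.
    for url in $input; do
        printf '%s\n' "$url" >> "$out_file"
    done
}

conf=$(mktemp)
# Assumed input: the quota warning arrives where a URL list was expected.
write_status_urls "$conf" "Warning: Gear 55e73c8c0c1e66c54c00000d is using 93.9% of inodes allowed"

# Each line is then treated as a URL, giving one failure per word.
while IFS= read -r entry; do
    echo "Failed to get stats from $entry"
done < "$conf"
```

Run as-is, this prints nine "Failed to get stats from ..." lines, one per word of the warning, which matches the pattern in haproxy_ctld.log.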

From there I'm not sure where to debug...
This is a throwaway app; feel free to inspect or mutate it in any way, or tell me what to check.

Version-Release number of selected component (if applicable):
?

How reproducible:
Tried once, reproduced.

Steps to Reproduce:
1. Create a scalable app.
2. Run dd if=/dev/zero ... to fill the gear to >90% of its quota.
   (The original discovery was for the inode quota; this repro is for the disk-size quota.)
3. Restart the app.
4. Scale to 2 gears.
5. (Or maybe I restarted here rather than at step 3; not sure.)
6. profit: 

Actual results:
```
[tmp-mathdown.rhcloud.com 55ef20d37628e14a5300019c]\> cat haproxy/conf/app_haproxy_status_urls.conf 
Warning:
Gear
55ef20d37628e14a5300019c
is
using
94.4%
of
disk
quota
[tmp-mathdown.rhcloud.com 55ef20d37628e14a5300019c]\> tail app-root/logs/haproxy_ctld.log 
E, [2015-09-09T11:02:35.844932 #280135] ERROR -- : Failed to get stats from quota
E, [2015-09-09T11:02:40.847626 #280135] ERROR -- : Failed to get stats from Warning:
E, [2015-09-09T11:02:40.849385 #280135] ERROR -- : Failed to get stats from Gear
E, [2015-09-09T11:02:40.849737 #280135] ERROR -- : Failed to get stats from 55ef20d37628e14a5300019c
E, [2015-09-09T11:02:40.850333 #280135] ERROR -- : Failed to get stats from is
E, [2015-09-09T11:02:40.850822 #280135] ERROR -- : Failed to get stats from using
E, [2015-09-09T11:02:40.850912 #280135] ERROR -- : Failed to get stats from 94.4%
E, [2015-09-09T11:02:40.851988 #280135] ERROR -- : Failed to get stats from of
E, [2015-09-09T11:02:40.852337 #280135] ERROR -- : Failed to get stats from disk
E, [2015-09-09T11:02:40.852719 #280135] ERROR -- : Failed to get stats from quota
```

Expected results:
```
[prod-mathdown.rhcloud.com 55ee99710c1e662f8b000013]\> cat haproxy/conf/app_haproxy_status_urls.conf 
cat: haproxy/conf/app_haproxy_status_urls.conf: No such file or directory
[prod-mathdown.rhcloud.com 55ee99710c1e662f8b000013]\> tail app-root/logs/haproxy_ctld.log
I, [2015-09-08T04:19:04.773447 #492161]  INFO -- : Starting haproxy_ctld
I, [2015-09-08T04:29:09.878269 #492161]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2015-09-08T04:29:11.263448 #492161]  INFO -- : remove-gear - exit_code: 1  output: Cannot remove gear because min limit '2' reached.
```
(Not sure why app_haproxy_status_urls.conf is missing; that's also a 2-gear app.)

Additional info:

Comment 1 John W. Lamb 2015-09-29 15:22:06 UTC

I was able to reproduce this in Online. Working on it now.

Comment 2 openshift-github-bot 2015-10-14 16:03:06 UTC

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/bb92573aaa7b56069ab25d8f558756bc9a67e451
fix rhcsh error output, clean up cart sub hooks

The `welcome` function in `rhcsh` was sending messages to `stdout` which
should have gone to `stderr`.
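The stream matters because anything that captures a command's stdout also captures whatever else was printed there. A minimal shell sketch of the difference follows. The helper functions and the example URL are hypothetical, and neither helper is the real `welcome` function from `rhcsh`.

```shell
# Hypothetical helpers; neither is the real rhcsh `welcome` function.
welcome_on_stdout() {
    echo "Warning: Gear 55ef20d37628e14a5300019c is using 94.4% of disk quota"
}
welcome_on_stderr() {
    echo "Warning: Gear 55ef20d37628e14a5300019c is using 94.4% of disk quota" >&2
}
# Placeholder for the data a caller actually wants to capture.
emit_data() {
    echo "http://gear.example.invalid/haproxy-status"
}

# Command substitution captures stdout only; stderr is discarded here.
polluted=$( { welcome_on_stdout; emit_data; } 2>/dev/null )
clean=$( { welcome_on_stderr; emit_data; } 2>/dev/null )

printf 'polluted: %s\n' "$polluted"
printf 'clean:    %s\n' "$clean"
```

With the warning on stderr, the captured value holds only the data line. With the warning on stdout, the captured value starts with the warning text.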

The subscriber hooks for `haproxy`, `jbosseap` and `jbossas` carts were
modified at first to fix Bug 1261540, but since Miciah has addressed
that in the runtime code, the scripts are now simply optimized to use
bash built-ins to avoid forking new processes.
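As an illustration of the kind of optimization described, the sketch below splits a string once with external commands and once with parameter expansion. The `entry` value and its `key=value` shape are assumptions made for the example and are not taken from the actual hook scripts.

```shell
# Hypothetical "key=value" input; the real hook input format may differ.
entry="gear1=http://gear.example.invalid:8080"

# Forking approach: each command substitution starts a subshell plus `cut`.
key_forked=$(echo "$entry" | cut -d= -f1)
value_forked=$(echo "$entry" | cut -d= -f2-)

# Built-in approach: parameter expansion runs inside the current shell.
key_builtin=${entry%%=*}    # drop the longest suffix starting at the first '='
value_builtin=${entry#*=}   # drop the shortest prefix ending at the first '='

echo "$key_builtin -> $value_builtin"
```

Both approaches yield the same key and value; the second one does so without creating any child processes.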

Comment 3 openshift-github-bot 2015-10-14 18:07:06 UTC

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/5733b053f7f3fb358130abff677c040cd18634ed
controller: execute_connections: nix client output

Filter client output out from the publish hooks output and chomp it
before providing it as input to the subscribe hooks.

This commit fixes bug 1261540.
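The idea of the fix can be illustrated with a short shell sketch. The commit touches the controller, so the real change is not a shell script. The `Warning:` prefix used below to recognize client output is an assumption made for the example, and all function names and the URL are hypothetical.

```shell
# Hypothetical publish hook output: a client-facing warning plus the real data.
emit_publish_output() {
    printf '%s\n' \
        "Warning: Gear 562459ec5b9ed6aec8000011 is using 92.9% of disk quota" \
        "http://gear.example.invalid/haproxy-status"
}

# Hypothetical filter: treats lines starting with "Warning:" as client output.
# The criterion the controller really uses is not shown in this report.
strip_client_output() {
    grep -v '^Warning:'
}

# Command substitution drops trailing newlines, which roughly plays the
# role of the "chomp" step mentioned in the commit message.
subscribe_input=$(emit_publish_output | strip_client_output)
printf '[%s]\n' "$subscribe_input"
```

After filtering, the subscribe side receives only the data line, with no warning words and no trailing newline.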

Comment 4 DeShuai Ma 2015-10-19 03:21:54 UTC

Tested on devenv_5667; the bug is verified.

When I ssh into the app, I see this warning:
Warning: Gear 562459ec5b9ed6aec8000011 is using 92.9% of disk quota

[rb20-dma.dev.rhcloud.com 562459ec5b9ed6aec8000011]\> dd if=/dev/zero of=app-root/repo/test bs=1M count=950
[rb20-dma.dev.rhcloud.com 562459ec5b9ed6aec8000011]\> cat app-root/logs/
haproxy_ctld.log  haproxy.log       ruby.log          
[rb20-dma.dev.rhcloud.com 562459ec5b9ed6aec8000011]\> cat app-root/logs/haproxy_ctld.log 
I, [2015-10-18T22:48:32.644300 #13821]  INFO -- : Starting haproxy_ctld
I, [2015-10-18T22:58:37.735770 #13821]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2015-10-18T22:58:46.234788 #13821]  INFO -- : remove-gear - exit_code: 0  output:
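A quick way to repeat this check is to count the error lines in the log. The sketch below is only an illustration: `count_stats_errors` is a hypothetical helper, and the two temporary files stand in for real logs, each holding one line copied from this report.

```shell
# Count the error lines this bug produced; the message text is taken
# from the logs in the description.
count_stats_errors() {
    grep -c 'ERROR -- : Failed to get stats from' "$1"
}

# Temporary sample logs, each holding one line copied from this report.
good_log=$(mktemp)
printf '%s\n' 'I, [2015-10-18T22:48:32.644300 #13821]  INFO -- : Starting haproxy_ctld' > "$good_log"
bad_log=$(mktemp)
printf '%s\n' 'E, [2015-09-09T11:02:40.852719 #280135] ERROR -- : Failed to get stats from quota' > "$bad_log"

echo "good: $(count_stats_errors "$good_log")"
echo "bad:  $(count_stats_errors "$bad_log")"
```

On a gear, the same function can be pointed at app-root/logs/haproxy_ctld.log; a count of 0 means none of these errors were logged.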

Comment 5 Eric Paris 2017-05-31 18:22:11 UTC

We apologize, but we do not plan to address this report at this time. The majority of our active development is for the v3 version of OpenShift. If you would like Red Hat to reconsider this decision, please reach out to your support representative. We are very sorry for any inconvenience this may cause.