Bug 1016428 - [origin_broker_98] App is not accessible after deleting the head HA web_framework gear with 'oo-admin-repair --removed-nodes'
Status: CLOSED NOTABUG
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: All
OS: All
Priority: medium
Severity: medium
Assigned To: Ravi Sankar
QA Contact: libra bugs
Depends On:
Blocks:
Reported: 2013-10-08 03:29 EDT by zhaozhanqi
Modified: 2015-05-14 20:21 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-09 07:03:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
application_zqphp_mongo (12.27 KB, text/plain)
2013-10-08 23:09 EDT, zhaozhanqi

Description zhaozhanqi 2013-10-08 03:29:08 EDT
Description of problem:

Given a scalable app with HA enabled, scale it up so that there are 2 HA gears, then stop the node hosting the head HA gear. Running 'oo-admin-repair --removed-nodes' will delete the app.

Version-Release number of selected component (if applicable):
devenv_stage_488

How reproducible:
always

Steps to Reproduce:
1. Create a scalable app and enable HA
2. Scale up the app so that it has at least 2 HA gears
3. Stop the node hosting the head HA gear (5253a0a58942e1c5ba0000b7), while the node hosting the other HA gear (d847925e2fdf11e3ad6a22000a9047d8) stays alive:
   /etc/init.d/ruby193-mcollective stop
4. Run 'oo-admin-repair --removed-nodes' (a rough command sequence for steps 1-4 is sketched below)
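
A rough command sequence for steps 1-4 (the broker host and credentials are placeholders; the app name, domain and cartridge match the output below, and the make-ha event is the one described in comment 2):

   # 1. create a scalable app and enable HA
   rhc app create zqpy26s python-2.6 -s
   curl -k --user 'user:pass' https://broker.example.com/broker/rest/domains/zqd/applications/zqpy26s/events -X POST -d event=make-ha
   # 2. scale up so that at least 2 gears carry haproxy + the web framework
   rhc cartridge scale -a zqpy26s -c python-2.6 --min 5
   rhc app show zqpy26s -g
   # 3. stop mcollective on the node hosting the head HA gear
   /etc/init.d/ruby193-mcollective stop
   # 4. run the repair tool on the broker
   oo-admin-repair --removed-nodes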

Actual results:
after step 2:

rhc app show zqpy26s -g
ID                               State   Cartridges             Size  SSH URL
-------------------------------- ------- ---------------------- ----- -------------------------------------------------------------------------------------
5253a0a58942e1c5ba0000b7         started python-2.6 haproxy-1.4 small 5253a0a58942e1c5ba0000b7@zqpy26s-zqd.dev.rhcloud.com
800989830652802600271872         started python-2.6 haproxy-1.4 small 800989830652802600271872@800989830652802600271872-zqd.dev.rhcloud.com
d84152e02fdf11e3ad6a22000a9047d8 started python-2.6 haproxy-1.4 small d84152e02fdf11e3ad6a22000a9047d8@d84152e02fdf11e3ad6a22000a9047d8-zqd.dev.rhcloud.com
d847925e2fdf11e3ad6a22000a9047d8 started python-2.6 haproxy-1.4 small d847925e2fdf11e3ad6a22000a9047d8@d847925e2fdf11e3ad6a22000a9047d8-zqd.dev.rhcloud.com
843156847076750150074368         started python-2.6 haproxy-1.4 small 843156847076750150074368@843156847076750150074368-zqd.dev.rhcloud.com
[zqzhao@dhcp-13-222 non_scalable]$ rhc ssh zqpy26s --gear ls
=== 800989830652802600271872 python-2.6+haproxy-1.4
app-root
git
python
=== 5253a0a58942e1c5ba0000b7 python-2.6+haproxy-1.4
app-root
git
haproxy
python
=== d847925e2fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root
git
haproxy
python
=== d84152e02fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root
git
python
=== 843156847076750150074368 python-2.6+haproxy-1.4
app-root
git
python



after step 4:
the app will be deleted if 'no' is entered at the 'skip all of them' prompt

oo-admin-repair --removed-nodes
Started at: 2013-10-08 02:13:21 -0400
Time to fetch mongo data: 0.023s
Total gears found in mongo: 9
Servers that are unresponsive:
	Server: ip-10-195-198-222 (district: dist1), Confirm [yes/no]: 
yes
Check failed.
Some servers are unresponsive: ip-10-195-198-222


Do you want to delete unresponsive servers from their respective districts [yes/no]: no
Found 1 unresponsive scalable apps that can not be recovered but framework/db backup available.
zqpy26s (id: 5253a0a58942e1c5ba0000b7, backup-gears: 5253a1048942e1c5ba0000dc, 5253a1048942e1c5ba0000dd, 5253a1048942e1c5ba0000de)
Do you want to skip all of them [yes/no]:(Warning: entering 'no' will delete the apps) 

Expected results:
Only the head HA gear should be deleted, and the app should still be accessible.

Additional info:
Comment 1 Ravi Sankar 2013-10-08 16:57:17 EDT
Tried the reproduction steps a couple of times but was unable to recreate the issue.

When the app is in HA mode and scaled, for the app to recover and stay accessible we need at least one framework gear alive that has both the *haproxy* and *web-framework* carts, not just the *web-framework* cart (gears with only the web-framework cart can occur because of scale-up).
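
For reference, one way to check that condition is the gear listing already used in this bug (the app name is a placeholder):
   rhc ssh <app> --gear ls   # at least one responsive gear must list both a 'haproxy' directory and the web cartridge directory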

If you are still able to reproduce, please attach the oo-admin-repair output and the mongo record for the app, along with the exact reproduction steps.
Comment 2 Ravi Sankar 2013-10-08 17:18:02 EDT
Your test set-up is probably incorrect.
Step 2: scale up this app and make it have at least 2 HA gears
==> a scale-up event won't make the app HA

You need to issue a 'make-ha' event to make the app HA, i.e. 2 gears with web-framework + haproxy carts
example: curl -k --user 'ravip:nopass' https://localhost/broker/rest/domains/ravip/applications/app3/events -X POST -d event=make-ha
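
For reference, a quick way to confirm the app really is HA, using the same commands seen elsewhere in this bug (the app name is a placeholder):
   rhc app show <app> -g     # lists every gear and its cartridges
   rhc ssh <app> --gear ls   # gears whose listing contains a 'haproxy' directory are the proxy (HA) gears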
Comment 3 zhaozhanqi 2013-10-08 22:24:55 EDT
(In reply to Ravi Sankar from comment #2)
> Probably, your test set-up might be incorrect.
> Step-2. scale-up this app and make it have 2 HA gears at least 
> ==> scale up event won't make the app HA
> 
> You need to issue 'make-ha' event to make the app HA i.e 2 gears with
> web-framework + ha-proxy carts
> example: curl -k --user 'ravip:nopass'
> https://localhost/broker/rest/domains/ravip/applications/app3/events -X POST
> -d event=make-ha

My detailed steps are as below:

1) Change /usr/libexec/openshift/cartridges/haproxy/metadata/manifest.yml:
     Scaling:
       Min: 1
       Max: 5
       Multiplier: 2
2) Restart mcollective: /etc/init.d/ruby193-mcollective restart
3) Clear the broker cache:
       oo-admin-broker-cache -c
4) Create a scalable app and scale it up:
    rhc app create zqphps php-5.3 -s
    rhc cartridge scale -a zqphps -c php-5.3 --min 5
Comment 4 zhaozhanqi 2013-10-08 23:06:58 EDT
(In reply to Ravi Sankar from comment #1)
> Tried reproduction steps couple of times but unable to recreate the issue.
> 
> When the app is in HA mode and scaled, For the app to
> recover/make-it-accessible, we need at least one of the framework gear alive
> that has both *ha-proxy and web-framework* carts and not just
> *web-framework* cart (can occur because of scale up).
> 
> If you are still able to reproduce, please attach oo-admin-repair output,
> mongo record for the app along with exact reproduction steps.

I can still reproduce this issue.

Result of comment 3, step 4:

rhc app show zqphps -g
ID                               State   Cartridges          Size  SSH URL
-------------------------------- ------- ------------------- ----- -------------------------------------------------------------------------------------
5254c0bf0779d50baf000007         started php-5.3 haproxy-1.4 small 5254c0bf0779d50baf000007@zqphps-zqd.dev.rhcloud.com
148018120590275179970560         started php-5.3 haproxy-1.4 small 148018120590275179970560@148018120590275179970560-zqd.dev.rhcloud.com
81c6b7a0308b11e39dc312313d2d21dc started php-5.3 haproxy-1.4 small 81c6b7a0308b11e39dc312313d2d21dc@81c6b7a0308b11e39dc312313d2d21dc-zqd.dev.rhcloud.com
81e2e9de308b11e39dc312313d2d21dc started php-5.3 haproxy-1.4 small 81e2e9de308b11e39dc312313d2d21dc@81e2e9de308b11e39dc312313d2d21dc-zqd.dev.rhcloud.com
5254c1040779d5099c000005         started php-5.3 haproxy-1.4 small 5254c1040779d5099c000005@5254c1040779d5099c000005-zqd.dev.rhcloud.com
[zqzhao@dhcp-13-222 non_scalable]$ rhc ssh zqphps --gear ls
=== 148018120590275179970560 php-5.3+haproxy-1.4
app-root
git
php
=== 81e2e9de308b11e39dc312313d2d21dc php-5.3+haproxy-1.4
app-root
git
haproxy
php
=== 5254c1040779d5099c000005 php-5.3+haproxy-1.4
app-root
git
php
=== 81c6b7a0308b11e39dc312313d2d21dc php-5.3+haproxy-1.4
app-root
git
php
=== 5254c0bf0779d50baf000007 php-5.3+haproxy-1.4
app-root
git
haproxy
php


You can see the two HA & web-framework gears: 5254c0bf0779d50baf000007 (head gear) and 81e2e9de308b11e39dc312313d2d21dc. For the mongo record, refer to the attachment.


5) Take down the node hosting gear 5254c0bf0779d50baf000007 (you can move this gear onto whichever node you want to take down); the node hosting 81e2e9de308b11e39dc312313d2d21dc stays alive
6) Run 'oo-admin-repair --removed-nodes'
Started at: 2013-10-08 22:44:07 -0400
Time to fetch mongo data: 0.022s
Total gears found in mongo: 5
Servers that are unresponsive:
	Server: ip-10-202-17-245 (district: dist1), Confirm [yes/no]: 
yes
Check failed.
Some servers are unresponsive: ip-10-202-17-245


Do you want to delete unresponsive servers from their respective districts [yes/no]: no
Found 1 unresponsive scalable apps that can not be recovered but framework/db backup available.
zqphps (id: 5254c0bf0779d50baf000007, backup-gears: 5254c1040779d50baf00002b, 5254c1040779d50baf00002c, 5254c1040779d50baf00002d, 5254c1040779d50baf00002e)
Do you want to skip all of them [yes/no]:(Warning: entering 'no' will delete the apps) no


Total time: 92.404s
Finished at: 2013-10-08 22:45:40 -0400

7)  rhc app show zqphps
Application 'zqphps' not found.
Comment 5 zhaozhanqi 2013-10-08 23:09:44 EDT
Created attachment 809632 [details]
application_zqphp_mongo
Comment 6 zhaozhanqi 2013-10-09 05:48:46 EDT
Following comment 2, enabled HA for the app via the REST API:

curl -s -k -H 'Content-Type: Application/json' --user xxxxx:xxxx https://ec2-23-20-74-48.compute-1.amazonaws.com/broker/rest/domains/zqd/applications/zqphps/events -X POST -d '{"event":"make-ha"}'

Then performed the same steps as above.

Then ran 'oo-admin-repair --removed-nodes': it deletes the head HA web_framework gear, but the app is no longer accessible. The other HA web_framework gear is also not accessible and returns 503 Service Unavailable.
Comment 7 zhaozhanqi 2013-10-09 07:03:58 EDT
Retested on a new env; cannot reproduce this bug and the HA web_framework gear is accessible, so closing this bug for now.
