Description of problem:
Given a scalable app with HA enabled, scale up the app so that it has at least 2 HA gears, then stop the node hosting the head HA gear. Running 'oo-admin-repair --removed-nodes' will delete the whole app.

Version-Release number of selected component (if applicable):
devenv_stage_488

How reproducible:
Always

Steps to Reproduce:
1. Create a scalable app and enable HA.
2. Scale up the app so that it has at least 2 HA gears.
3. Stop the node hosting the head HA gear (5253a0a58942e1c5ba0000b7) while the node hosting the other HA gear (d847925e2fdf11e3ad6a22000a9047d8) stays alive:
   /etc/init.d/ruby193-mcollective stop
4. Run oo-admin-repair --removed-nodes

Actual results:
After step 2:

rhc app show zqpy26s -g
ID                               State   Cartridges             Size  SSH URL
-------------------------------- ------- ---------------------- ----- ------------------------------------
5253a0a58942e1c5ba0000b7         started python-2.6 haproxy-1.4 small 5253a0a58942e1c5ba0000b7.rhcloud.com
800989830652802600271872         started python-2.6 haproxy-1.4 small 800989830652802600271872.rhcloud.com
d84152e02fdf11e3ad6a22000a9047d8 started python-2.6 haproxy-1.4 small d84152e02fdf11e3ad6a22000a9047d8.rhcloud.com
d847925e2fdf11e3ad6a22000a9047d8 started python-2.6 haproxy-1.4 small d847925e2fdf11e3ad6a22000a9047d8.rhcloud.com
843156847076750150074368         started python-2.6 haproxy-1.4 small 843156847076750150074368.rhcloud.com

[zqzhao@dhcp-13-222 non_scalable]$ rhc ssh zqpy26s --gear ls
=== 800989830652802600271872 python-2.6+haproxy-1.4
app-root  git  python
=== 5253a0a58942e1c5ba0000b7 python-2.6+haproxy-1.4
app-root  git  haproxy  python
=== d847925e2fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root  git  haproxy  python
=== d84152e02fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root  git  python
=== 843156847076750150074368 python-2.6+haproxy-1.4
app-root  git  python

After step 4 (the app will be deleted if 'no' is entered at the final prompt):

oo-admin-repair --removed-nodes
Started at: 2013-10-08 02:13:21 -0400
Time to fetch mongo data: 0.023s
Total gears found in mongo: 9
Servers that are unresponsive:
Server: ip-10-195-198-222 (district: dist1), Confirm [yes/no]: yes
Check failed.
Some servers are unresponsive: ip-10-195-198-222
Do you want to delete unresponsive servers from their respective districts [yes/no]: no
Found 1 unresponsive scalable apps that can not be recovered but framework/db backup available.
zqpy26s (id: 5253a0a58942e1c5ba0000b7, backup-gears: 5253a1048942e1c5ba0000dc, 5253a1048942e1c5ba0000dd, 5253a1048942e1c5ba0000de)
Do you want to skip all of them [yes/no]:(Warning: entering 'no' will delete the apps)

Expected results:
Only the head HA gear should be deleted, and the app should remain accessible.

Additional info:
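For reference, a minimal sketch of the overall reproduction flow, assuming HA is enabled with the make-ha event described in comment 2 and using only commands already quoted in this report (app, domain, user and broker host names are placeholders):

# 1. Create a scalable app (placeholder names)
rhc app create myapp python-2.6 -s

# 2. Make the app HA so that at least 2 gears carry haproxy + the web-framework cart
curl -k --user 'user:pass' \
  https://broker.example.com/broker/rest/domains/mydomain/applications/myapp/events \
  -X POST -d event=make-ha

# 3. On the node hosting the head HA gear, stop mcollective so the node appears dead to the broker
/etc/init.d/ruby193-mcollective stop

# 4. On the broker, run the repair and answer the prompts as shown under "Actual results"
oo-admin-repair --removed-nodes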
Tried the reproduction steps a couple of times but was unable to recreate the issue.

When the app is in HA mode and scaled, for the app to be recovered and kept accessible we need at least one framework gear alive that has both the *haproxy and web-framework* carts, not just the *web-framework* cart (gears with only the web-framework cart can occur because of scale-up).

If you are still able to reproduce, please attach the oo-admin-repair output and the mongo record for the app, along with the exact reproduction steps.
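A quick way to check which gears satisfy that condition, reusing the per-gear listing already shown in this report (app name is a placeholder):

rhc ssh myapp --gear ls
# gears whose listing shows both 'haproxy' and the framework directory (e.g. 'python' or 'php')
# are the HA framework gears; at least one of them must be on a node that is still alive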
Probably, your test set-up might be incorrect.

Step 2: "scale-up this app and make it have 2 HA gears at least"
==> a scale-up event won't make the app HA.

You need to issue a 'make-ha' event to make the app HA, i.e. 2 gears with web-framework + haproxy carts.

Example:
curl -k --user 'ravip:nopass' https://localhost/broker/rest/domains/ravip/applications/app3/events -X POST -d event=make-ha
(In reply to Ravi Sankar from comment #2)
> Probably, your test set-up might be incorrect.
> Step-2. scale-up this app and make it have 2 HA gears at least
> ==> scale up event won't make the app HA
>
> You need to issue 'make-ha' event to make the app HA i.e 2 gears with
> web-framework + ha-proxy carts
> example: curl -k --user 'ravip:nopass'
> https://localhost/broker/rest/domains/ravip/applications/app3/events -X POST
> -d event=make-ha

My detailed steps are as below:

1) Change /usr/libexec/openshift/cartridges/haproxy/metadata/manifest.yml:
   Scaling:
     Min: 1
     Max: 5
     Multiplier: 2

2) Restart mcollective:
   /etc/init.d/ruby193-mcollective restart

3) Clear the broker cache:
   oo-admin-broker-cache -c

4) Create a scalable app and scale it up:
   rhc app create zqphps php-5.3 -s
   rhc cartridge scale -a zqphps -c php-5.3 --min 5
(In reply to Ravi Sankar from comment #1)
> Tried reproduction steps couple of times but unable to recreate the issue.
>
> When the app is in HA mode and scaled, For the app to
> recover/make-it-accessible, we need at least one of the framework gear alive
> that has both *ha-proxy and web-framework* carts and not just
> *web-framework* cart (can occur because of scale up).
>
> If you are still able to reproduce, please attach oo-admin-repair output,
> mongo record for the app along with exact reproduction steps.

I can still reproduce this issue.

Result of comment 3, step 4:

rhc app show zqphps -g
ID                               State   Cartridges          Size  SSH URL
-------------------------------- ------- ------------------- ----- ------------------------------------
5254c0bf0779d50baf000007         started php-5.3 haproxy-1.4 small 5254c0bf0779d50baf000007.rhcloud.com
148018120590275179970560         started php-5.3 haproxy-1.4 small 148018120590275179970560.rhcloud.com
81c6b7a0308b11e39dc312313d2d21dc started php-5.3 haproxy-1.4 small 81c6b7a0308b11e39dc312313d2d21dc.rhcloud.com
81e2e9de308b11e39dc312313d2d21dc started php-5.3 haproxy-1.4 small 81e2e9de308b11e39dc312313d2d21dc.rhcloud.com
5254c1040779d5099c000005         started php-5.3 haproxy-1.4 small 5254c1040779d5099c000005.rhcloud.com

[zqzhao@dhcp-13-222 non_scalable]$ rhc ssh zqphps --gear ls
=== 148018120590275179970560 php-5.3+haproxy-1.4
app-root  git  php
=== 81e2e9de308b11e39dc312313d2d21dc php-5.3+haproxy-1.4
app-root  git  haproxy  php
=== 5254c1040779d5099c000005 php-5.3+haproxy-1.4
app-root  git  php
=== 81c6b7a0308b11e39dc312313d2d21dc php-5.3+haproxy-1.4
app-root  git  php
=== 5254c0bf0779d50baf000007 php-5.3+haproxy-1.4
app-root  git  haproxy  php

You can see two HA & web-framework gears: 5254c0bf0779d50baf000007 (the head gear) and 81e2e9de308b11e39dc312313d2d21dc. For the mongo record, please refer to the attachment.

5) Bring down the node hosting gear 5254c0bf0779d50baf000007 (you can move this gear to the node you want to take down), while the node hosting 81e2e9de308b11e39dc312313d2d21dc stays alive.

6) Run 'oo-admin-repair --removed-nodes':

Started at: 2013-10-08 22:44:07 -0400
Time to fetch mongo data: 0.022s
Total gears found in mongo: 5
Servers that are unresponsive:
Server: ip-10-202-17-245 (district: dist1), Confirm [yes/no]: yes
Check failed.
Some servers are unresponsive: ip-10-202-17-245
Do you want to delete unresponsive servers from their respective districts [yes/no]: no
Found 1 unresponsive scalable apps that can not be recovered but framework/db backup available.
zqphps (id: 5254c0bf0779d50baf000007, backup-gears: 5254c1040779d50baf00002b, 5254c1040779d50baf00002c, 5254c1040779d50baf00002d, 5254c1040779d50baf00002e)
Do you want to skip all of them [yes/no]:(Warning: entering 'no' will delete the apps) no
Total time: 92.404s
Finished at: 2013-10-08 22:45:40 -0400

7) rhc app show zqphps
Application 'zqphps' not found.
Created attachment 809632 [details] application_zqphp_mongo
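For reference, a record like the attached one can typically be dumped from the broker's MongoDB. A minimal sketch, assuming the devenv database name (openshift_broker_dev) and the 'applications' collection, both of which may differ by release:

mongo openshift_broker_dev --eval 'printjson(db.applications.findOne({name: "zqphps"}))'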
According to comment 2, enable HA for the app via the REST API:

curl -s -k -H 'Content-Type: Application/json' --user xxxxx:xxxx https://ec2-23-20-74-48.compute-1.amazonaws.com/broker/rest/domains/zqd/applications/zqphps/events -X POST -d '{"event":"make-ha"}'

Then do the same things as above and run 'oo-admin-repair --removed-nodes'. It only deletes the head HA web_framework gear, but the app is no longer accessible. The other HA web_framework gear is also not accessible and returns 503 Service Unavailable.
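A simple way to observe the 503 from the surviving gear after the repair run, assuming the application's public URL (the hostname below is a placeholder; substitute the app's actual URL):

curl -I http://zqphps-zqd.example.com/
# observed after the repair run described above: HTTP/1.1 503 Service Unavailable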
Retested on a new environment and cannot reproduce this bug; the HA web_framework gear is accessible. Closing this bug for now.