Bug 1016428
Summary: [origin_broker_98] App is no longer accessible after the head HA web_framework gear is deleted by 'oo-admin-repair --removed-nodes'

| Field | Value | Field | Value |
|---|---|---|---|
| Product | OpenShift Online | Reporter | zhaozhanqi <zzhao> |
| Component | Pod | Assignee | Ravi Sankar <rpenta> |
| Status | CLOSED NOTABUG | QA Contact | libra bugs <libra-bugs> |
| Severity | medium | Priority | medium |
| Version | 2.x | CC | rpenta, xtian |
| Target Milestone | --- | Target Release | --- |
| Hardware | All | OS | All |
| Type | Bug | Doc Type | Bug Fix |
| Last Closed | 2013-10-09 11:03:58 UTC | Regression | --- |

Attachments: application_zqphp_mongo (attachment 809632)
Description zhaozhanqi 2013-10-08 07:29:08 UTC

Comment 1 Ravi Sankar

Tried the reproduction steps a couple of times but was unable to recreate the issue.

When the app is HA and scaled, recovering it (making it accessible again) requires that at least one framework gear survive that carries both the *ha-proxy* and *web-framework* carts, not just the *web-framework* cart (gears with only the web-framework cart can result from scale-up).

If you are still able to reproduce, please attach the oo-admin-repair output and the mongo record for the app, along with exact reproduction steps.

Comment 2 Ravi Sankar

Probably your test set-up is incorrect. Regarding step 2, "scale up this app and make it have 2 HA gears at least": a scale-up event won't make the app HA.

You need to issue a 'make-ha' event to make the app HA, i.e. end up with 2 gears carrying the web-framework + ha-proxy carts. Example:

    curl -k --user 'ravip:nopass' https://localhost/broker/rest/domains/ravip/applications/app3/events -X POST -d event=make-ha

Comment 3 zhaozhanqi

(In reply to Ravi Sankar from comment #2)
> You need to issue 'make-ha' event to make the app HA i.e 2 gears with
> web-framework + ha-proxy carts

My detailed steps are as follows:

1) Change /usr/libexec/openshift/cartridges/haproxy/metadata/manifest.yml:

       Scaling:
         Min: 1
         Max: 5
         Multiplier: 2

2) Restart mcollective:

       /etc/init.d/ruby193-mcollective restart

3) Clean the broker cache:

       oo-admin-broker-cache -c

4) Create one scalable app and scale it up:

       rhc app create zqphps php-5.3 -s
       rhc cartridge scale -a zqphps -c php-5.3 --min 5

(In reply to Ravi Sankar from comment #1)
> If you are still able to reproduce, please attach oo-admin-repair output,
> mongo record for the app along with exact reproduction steps.

I can still reproduce this issue; the step 4 result and the mongo record are below.
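(Aside: a quick way to verify which gears carry the haproxy cart after step 4 is to look for a haproxy directory in each gear's home. A minimal sketch, assuming the 'rhc ssh --gear ls' output format shown in the next comment, that is, '===' gear headers followed by each gear's top-level entries:)

    # Sketch: flag gears whose home contains a 'haproxy' entry.
    # Assumes header lines of the form "=== <gear-id> <carts>".
    rhc ssh zqphps --gear ls | awk '
        /^===/               { gear = $2 }   # remember the current gear id
        /haproxy/ && !/^===/ { print gear " carries the haproxy cart" }
    '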
Comment 3, step 4 result:

    rhc app show zqphps -g
    ID                               State   Cartridges          Size  SSH URL
    -------------------------------- ------- ------------------- ----- ---------------------------------------------
    5254c0bf0779d50baf000007         started php-5.3 haproxy-1.4 small 5254c0bf0779d50baf000007.rhcloud.com
    148018120590275179970560         started php-5.3 haproxy-1.4 small 148018120590275179970560.rhcloud.com
    81c6b7a0308b11e39dc312313d2d21dc started php-5.3 haproxy-1.4 small 81c6b7a0308b11e39dc312313d2d21dc.rhcloud.com
    81e2e9de308b11e39dc312313d2d21dc started php-5.3 haproxy-1.4 small 81e2e9de308b11e39dc312313d2d21dc.rhcloud.com
    5254c1040779d5099c000005         started php-5.3 haproxy-1.4 small 5254c1040779d5099c000005.rhcloud.com

    [zqzhao@dhcp-13-222 non_scalable]$ rhc ssh zqphps --gear ls
    === 148018120590275179970560 php-5.3+haproxy-1.4
    app-root  git  php
    === 81e2e9de308b11e39dc312313d2d21dc php-5.3+haproxy-1.4
    app-root  git  haproxy  php
    === 5254c1040779d5099c000005 php-5.3+haproxy-1.4
    app-root  git  php
    === 81c6b7a0308b11e39dc312313d2d21dc php-5.3+haproxy-1.4
    app-root  git  php
    === 5254c0bf0779d50baf000007 php-5.3+haproxy-1.4
    app-root  git  haproxy  php

You can see two HA & web-framework gears: 5254c0bf0779d50baf000007 (the head gear) and 81e2e9de308b11e39dc312313d2d21dc. For the mongo record, refer to the attachment.

5) Take down the node hosting gear 5254c0bf0779d50baf000007 (you can first move this gear to whichever node you want to take down); gear 81e2e9de308b11e39dc312313d2d21dc is on another node, which stays alive.

6) Run 'oo-admin-repair --removed-nodes':

    Started at: 2013-10-08 22:44:07 -0400
    Time to fetch mongo data: 0.022s
    Total gears found in mongo: 5
    Servers that are unresponsive:
    Server: ip-10-202-17-245 (district: dist1), Confirm [yes/no]: yes
    Check failed. Some servers are unresponsive: ip-10-202-17-245
    Do you want to delete unresponsive servers from their respective districts [yes/no]: no
    Found 1 unresponsive scalable apps that can not be recovered but framework/db backup available.
    zqphps (id: 5254c0bf0779d50baf000007, backup-gears: 5254c1040779d50baf00002b, 5254c1040779d50baf00002c, 5254c1040779d50baf00002d, 5254c1040779d50baf00002e)
    Do you want to skip all of them [yes/no]: (Warning: entering 'no' will delete the apps) no
    Total time: 92.404s
    Finished at: 2013-10-08 22:45:40 -0400

7) rhc app show zqphps

    Application 'zqphps' not found.

Created attachment 809632 [details]
application_zqphp_mongo
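(Comment 1 asked for the app's mongo record, attached above. One way to pull such a record from the broker database, as a minimal sketch: the database name 'openshift_broker_dev' and the 'applications' collection are assumptions; check the MONGO_* settings in /etc/openshift/broker.conf for the actual values:)

    # Sketch only: DB name is an assumption (see MONGO_* in /etc/openshift/broker.conf);
    # the v2 broker keeps application documents in an 'applications' collection.
    mongo openshift_broker_dev --eval 'printjson(db.applications.findOne({name: "zqphps"}))'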
According to comment 2, I enabled HA for the app via the REST API:

    curl -s -k -H 'Content-Type: Application/json' --user xxxxx:xxxx https://ec2-23-20-74-48.compute-1.amazonaws.com/broker/rest/domains/zqd/applications/zqphps/events -X POST -d '{"event":"make-ha"}'

Then I did the same thing as above and ran 'oo-admin-repair --removed-nodes'. It does delete the head HA web_framework gear, but afterwards the app is not accessible. The other HA web_framework gear is also not accessible and returns 503 Service Unavailable.

Retested on a fresh environment and could not reproduce this bug; the HA web_framework gear is accessible there. Closing this bug for now.
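(As an aside, the 503 behaviour described above can be spot-checked per gear with something like the following sketch; the hostnames are the gear URLs from the listing in comment 3, and '%{http_code}' makes curl print only the HTTP status:)

    # Sketch: probe each surviving gear's vhost and print the HTTP status code.
    for host in 81e2e9de308b11e39dc312313d2d21dc.rhcloud.com \
                5254c1040779d5099c000005.rhcloud.com; do
        printf '%s -> ' "$host"
        curl -s -o /dev/null -w '%{http_code}\n' "http://$host/"
    done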