Bug 1016418

Summary: [origin_broker_98] Error while deleting an HA gear with oo-admin-repair --removed-nodes
Product: OpenShift Online
Reporter: zhaozhanqi <zzhao>
Component: Pod
Assignee: Ravi Sankar <rpenta>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.x
CC: rpenta, xtian
Target Milestone: ---
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-17 13:33:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description zhaozhanqi 2013-10-08 07:11:23 UTC
Description of problem:
Given an existing scalable application with HA enabled, scale up the app so that it has 2 HA gears, then stop the node that hosts one of the HA gears. Running 'oo-admin-repair --removed-nodes' then shows the error "Unable to analyze application with id: 5253a0a58942e1c5ba0000b7, error: Server identity: ip-10-144-71-216 not in unresponsive servers but 'removed' is set for Gear: 5253a0a58942e1c5ba0000b7"

Version-Release number of selected component (if applicable):
devenv_stage_488

How reproducible:
sometimes

Steps to Reproduce:
1. Create a scalable app and enable HA
2. Scale up the app so that it has at least 2 HA gears
3. Stop mcollective on the node that hosts one of the HA gears (d847925e2fdf11e3ad6a22000a9047d8):
   /etc/init.d/ruby193-mcollective stop
4. Run oo-admin-repair --removed-nodes

Actual results:
after step 2:

rhc app show zqpy26s -g
ID                               State   Cartridges             Size  SSH URL
-------------------------------- ------- ---------------------- ----- -------------------------------------------------------------------------------------
5253a0a58942e1c5ba0000b7         started python-2.6 haproxy-1.4 small 5253a0a58942e1c5ba0000b7.rhcloud.com
800989830652802600271872         started python-2.6 haproxy-1.4 small 800989830652802600271872.rhcloud.com
d84152e02fdf11e3ad6a22000a9047d8 started python-2.6 haproxy-1.4 small d84152e02fdf11e3ad6a22000a9047d8.rhcloud.com
d847925e2fdf11e3ad6a22000a9047d8 started python-2.6 haproxy-1.4 small d847925e2fdf11e3ad6a22000a9047d8.rhcloud.com
843156847076750150074368         started python-2.6 haproxy-1.4 small 843156847076750150074368.rhcloud.com
[zqzhao@dhcp-13-222 non_scalable]$ rhc ssh zqpy26s --gear ls
=== 800989830652802600271872 python-2.6+haproxy-1.4
app-root
git
python
=== 5253a0a58942e1c5ba0000b7 python-2.6+haproxy-1.4
app-root
git
haproxy
python
=== d847925e2fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root
git
haproxy
python
=== d84152e02fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root
git
python
=== 843156847076750150074368 python-2.6+haproxy-1.4
app-root
git
python
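
To see at a glance which gears carry a haproxy directory (the two HA gears in this report), the `rhc ssh zqpy26s --gear ls` output above can be parsed. This is only a minimal Python sketch: the function name `gears_with_dir` is hypothetical, and the parsing assumes the `=== <gear id> <cartridges>` header format shown above. The sample listing is an excerpt of that output.

```python
def gears_with_dir(listing, dirname="haproxy"):
    """Return gear IDs whose directory listing contains `dirname`.

    Assumes each gear section starts with a '=== <gear id> <cartridges>'
    line, as in the `rhc ssh <app> --gear ls` output in this report.
    """
    found = []
    current = None
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("=== "):
            # Header line: the gear ID is the second whitespace-separated token.
            current = line.split()[1]
        elif current is not None and line == dirname and current not in found:
            found.append(current)
    return found


# Excerpt of the listing from this report (three of the five gears).
listing = """\
=== 800989830652802600271872 python-2.6+haproxy-1.4
app-root
git
python
=== 5253a0a58942e1c5ba0000b7 python-2.6+haproxy-1.4
app-root
git
haproxy
python
=== d847925e2fdf11e3ad6a22000a9047d8 python-2.6+haproxy-1.4
app-root
git
haproxy
python
"""

print(gears_with_dir(listing))
# ['5253a0a58942e1c5ba0000b7', 'd847925e2fdf11e3ad6a22000a9047d8']
```

The cartridge column alone does not distinguish the HA gears here, since every gear is listed as python-2.6+haproxy-1.4; only the directory listing does.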

--->

step 4:
[root@ip-10-144-71-216 ~]# oo-admin-repair --removed-nodes
Started at: 2013-10-08 02:41:54 -0400
Time to fetch mongo data: 0.022s
Total gears found in mongo: 9
Servers that are unresponsive:
	Server: ip-10-195-198-222 (district: dist1), Confirm [yes/no]: 
yes
Check failed.
Some servers are unresponsive: ip-10-195-198-222


Do you want to delete unresponsive servers from their respective districts [yes/no]: no
Unable to analyze application with id: 5253a0a58942e1c5ba0000b7, error: Server identity: ip-10-144-71-216 not in unresponsive servers but 'removed' is set for Gear: 5253a0a58942e1c5ba0000b7


Total time: 51.954s
Finished at: 2013-10-08 02:42:46 -0400
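
The error text suggests a consistency check of roughly the following shape: a gear flagged 'removed' is expected to live on a server that is in the unresponsive list. The Python sketch below only illustrates that condition. It is not the actual oo-admin-repair code; the dict keys (`id`, `server_identity`, `removed`) are hypothetical, and placing gear d847925e2fdf11e3ad6a22000a9047d8 on ip-10-195-198-222 is an assumption inferred from the steps above.

```python
def check_removed_gears(gears, unresponsive_servers):
    """Return error strings for gears flagged 'removed' on responsive servers.

    `gears` is a list of dicts with hypothetical keys 'id', 'server_identity'
    and 'removed'; this mirrors the error text only, not the real data model.
    """
    errors = []
    for gear in gears:
        if gear.get("removed") and gear["server_identity"] not in unresponsive_servers:
            errors.append(
                "Server identity: %s not in unresponsive servers "
                "but 'removed' is set for Gear: %s"
                % (gear["server_identity"], gear["id"])
            )
    return errors


# Illustrative data: the first gear reproduces the inconsistent state from
# the error message; the second gear's server is an assumption.
gears = [
    {"id": "5253a0a58942e1c5ba0000b7",
     "server_identity": "ip-10-144-71-216", "removed": True},
    {"id": "d847925e2fdf11e3ad6a22000a9047d8",
     "server_identity": "ip-10-195-198-222", "removed": True},
]

for err in check_removed_gears(gears, {"ip-10-195-198-222"}):
    print(err)
```

Under these assumptions only the gear on ip-10-144-71-216 is reported, which matches the message above and points to a stale 'removed' flag on that gear rather than a problem with the stopped node.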




Expected results:
The error does not occur, and gear d847925e2fdf11e3ad6a22000a9047d8 can be deleted.

Additional info:

Comment 1 Ravi Sankar 2013-10-08 21:20:59 UTC
Unable to reproduce the issue.
My hunch is that you might have reused the gear for multiple test cases. Please try this on a fresh environment.

Comment 2 zhaozhanqi 2013-10-09 10:57:39 UTC
I cannot reproduce this bug either. Maybe it was caused by oo-admin-move. Anyway, I will keep an eye on it. Closing this bug for now.