Bug 876693 - scaled gears do not move correctly
Summary: scaled gears do not move correctly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Rajat Chopra
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-11-14 18:03 UTC by Kenny Woodson
Modified: 2015-05-15 02:08 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-19 19:26:10 UTC
Target Upstream Version:
Embargoed:


Attachments
move log for this gear (1.96 KB, text/plain)
2012-11-14 18:03 UTC, Kenny Woodson

Description Kenny Woodson 2012-11-14 18:03:44 UTC
Created attachment 645039 [details]
move log for this gear

Description of problem:
While recently moving a scaled gear, the gear became out of sync with mongo. The gear in question is listed in the additional info. We moved this gear off of ex-std-node74, but mongo still reports that the scaled gears exist on node74, which they do not; they are now located elsewhere.

For instance, a1fd466408244af08404dde6fc446e98 is on ex-std-node36.

We need the tools to function properly and update mongo when scaled gears are moved. We also need tools to repair scaled gears after problems occur. Currently this gear is in disarray, and we cannot repair it with the current tool set or general ops knowledge.

Version-Release number of selected component (if applicable):

current, 2.0.19.1

How reproducible:

Steps to Reproduce:
1. Create a scaled application.
2. Scale it up so that it has multiple "subgears".
3. Move the application's main gear.
  
Actual results:

The main gear migrated successfully, and the site is still up. Mongo, however, points to incorrect locations for the scaled gears.

Expected results:

oo-admin-move should successfully move the gears and handle scaled gears correctly by updating mongo properly.

Additional info:

    App Name:      redhatchallenge
    App UUID:      aa70ada22855472a8bf5a6d12a1d93b9
    Creation Time: 2012-08-04 05:49:18 AM
    URL:           http://redhatchallenge-rhc.rhcloud.com

    Gear[0]
        Server Identity: ex-std-node21.prod.rhcloud.com
        Gear UUID:       aa70ada22855472a8bf5a6d12a1d93b9
        Gear UID:        4554

    Group Instance[1]:
        Gear[0]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       a1fd466408244af08404dde6fc446e98
            Gear UID:        2565
        Gear[1]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       5e6e28a066e548aca075eaf394b7527f
            Gear UID:        3179
        Gear[2]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       89181dce383e44cfa3355b88caa3c861
            Gear UID:        5110
        Gear[3]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       53a0d8c17e0549168eac836670317413
            Gear UID:        3745
        Gear[4]
            Server Identity: ex-std-node49.prod.rhcloud.com
            Gear UUID:       36b74e9eb31c44b58650444d2a0d0292
            Gear UID:        2798
        Gear[5]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       2e0f630d33614afd9184b0cea886e2ee
            Gear UID:        3180
        Gear[6]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       4ebc1b95f48a401ab452e0154639c2a7
            Gear UID:        3184
        Gear[7]
            Server Identity: ex-std-node74.prod.rhcloud.com
            Gear UUID:       c2db3776227549a5b481c120c6feee1d
            Gear UID:        3825

    Group Instance[2]:

Comment 1 Dan McPherson 2012-11-14 19:20:02 UTC
Rajat, the log looks right. Perhaps your theory about concurrent scale-ups is relevant.

Comment 2 Rajat Chopra 2012-11-16 18:12:20 UTC
The logs suggest nothing wrong with the move itself. One way things can go wrong is with concurrent operations; in this case, a likely guess is that a scale-up operation was in progress while the move (of a different gear of the app) was happening.
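The failure mode described above can be illustrated with a small, self-contained sketch (hypothetical document shape and helper names; this is not OpenShift code). Two operations each read a full snapshot of the app's record, modify their own part, and write the whole snapshot back, so the last writer silently discards the other's change:

```python
# Hypothetical stand-in for the app record stored in mongo: gear -> node.
doc = {"g1": "ex-std-node74", "g2": "ex-std-node74"}

def begin_op(doc):
    """Each operation starts by reading a full snapshot of the document."""
    return dict(doc)

def commit_op(doc, snapshot):
    """...and finishes by writing the whole snapshot back (last writer wins)."""
    doc.clear()
    doc.update(snapshot)

# A move of g1 and a scale-up touching g2 both read BEFORE either writes:
move_view = begin_op(doc)
scale_view = begin_op(doc)

move_view["g1"] = "ex-std-node36"   # the move relocates g1
scale_view["g2"] = "ex-std-node49"  # the scale-up places g2 elsewhere

commit_op(doc, move_view)   # the move commits first...
commit_op(doc, scale_view)  # ...then the scale-up clobbers its update

print(doc)  # g1 is back to ex-std-node74: mongo now points at the old node
```

The scale-up's stale snapshot still says g1 lives on node74, so committing it undoes the move's record update even though the gear itself really did move, which matches the symptom reported here.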

A new method has been added in oo-admin-ctl-app that can remove such a broken gear. More information in bug #876330!

Comment 3 Rajat Chopra 2012-11-16 20:04:00 UTC
The original concurrency problems will be fixed with the model_refactor code. Until then, ops will have to manually check for broken gears and use the updated oo-admin-ctl-app script.
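Continuing the earlier sketch (again with hypothetical names, not the actual model_refactor code), the clobbering goes away if each operation writes only the field it owns, in the style of a mongo $set on a single key rather than a full-document replace:

```python
# Same hypothetical record shape as before: gear -> node.
doc = {"g1": "ex-std-node74", "g2": "ex-std-node74"}

def set_field(doc, field, value):
    """Analogous to a mongo $set on one field: no full-document replace."""
    doc[field] = value

# The same interleaving of a move and a scale-up, but each operation
# touches only its own gear's entry, so neither overwrites the other:
set_field(doc, "g1", "ex-std-node36")  # the move relocates g1
set_field(doc, "g2", "ex-std-node49")  # the scale-up places g2 elsewhere

print(doc)  # both updates survive
```

Because neither operation carries a stale copy of the other gear's location, the ordering of the two commits no longer matters.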

Comment 4 Rony Gong 🔥 2012-11-19 06:38:06 UTC
Verified on devenv_stage_254
[root@ip-10-202-193-65 data]# oo-admin-ctl-app -c removegear -l qgong -a qsphp -g 0fb4256d47cb4b61a0115148a82e8cd6
Success

