Bug 834351 - User's consumed gears and actual gears don't match
Summary: User's consumed gears and actual gears don't match
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Rajat Chopra
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 835144
Depends On:
Blocks:
 
Reported: 2012-06-21 15:40 UTC by Kenny Woodson
Modified: 2015-05-15 01:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-13 23:43:42 UTC
Target Upstream Version:
Embargoed:



Description Kenny Woodson 2012-06-21 15:40:28 UTC
Description of problem:
rhc-admin-chk reports that a user's consumed gears and actual gears do not match.  We believe this is due to a race condition on application create/destroy as data in mongo is getting out of sync with what is actually on the nodes.

Our current workaround is to set the consumed gears to the actual gears found.

Version-Release number of selected component (if applicable):

Current version: rhc-broker-0.93.28-1.el6_2.noarch


How reproducible:
We are seeing this in stg and production.  I would guess that this is reproducible.

Steps to Reproduce:
1. Run rhc-admin-chk in stg
2. View the output of rhc-admin-chk
3. Verify there are FAILURES
  
Actual results:

The data in mongo does not reflect what is on the nodes.  

Expected results:

rhc-admin-chk should pass, with every user's consumed gears matching their actual gears.

Additional info:
Here is the output from rhc-admin-chk:

FAIL - user mmcgrath+nagios has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user jhou has a mismatch in consumed gears (2) and actual gears (3)!
FAIL - user c9stage1155 has a mismatch in consumed gears (1) and actual gears (0)!
FAIL - user mzimen has a mismatch in consumed gears (0) and actual gears (6)!
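
For illustration only, the kind of comparison behind these FAIL lines, together with the workaround described above, could be sketched as follows. This is not the rhc-admin-chk implementation. The "login", "apps" and "consumed_gears" fields appear in the user document quoted later in this bug; the per-app "gears" list and both function names are assumptions made for the sketch.

```python
from typing import List


def count_actual_gears(user: dict) -> int:
    """Count the gears recorded under a user's apps (the "gears" key is assumed)."""
    return sum(len(app.get("gears", [])) for app in user.get("apps", []))


def find_gear_mismatches(users: List[dict]) -> List[str]:
    """Return one FAIL line per user whose counter disagrees with the gear count."""
    failures = []
    for user in users:
        consumed = user.get("consumed_gears", 0)
        actual = count_actual_gears(user)
        if consumed != actual:
            failures.append(
                f"FAIL - user {user['login']} has a mismatch in consumed gears "
                f"({consumed}) and actual gears ({actual})!"
            )
    return failures


def apply_workaround(user: dict) -> None:
    """The workaround from this report: overwrite the counter with the gears found."""
    user["consumed_gears"] = count_actual_gears(user)
```

Note that the workaround only repairs the counter after the fact; it does nothing about whatever let the two values drift apart.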

Comment 1 Dan McPherson 2012-06-21 16:52:03 UTC
To be clear, this has nothing to do with what's on the nodes.  It's a mismatch entirely within mongo.  Can you verify whether these failures still happen with the latest code?  There was at least one fix for this issue in this release.

Comment 2 Thomas Wiest 2012-06-21 22:29:41 UTC
We've cleaned up all of the ones that rhc-admin-chk reported in STG. So, we'll see if more get reported now.

Comment 3 Xiaoli Tian 2012-06-25 05:34:08 UTC
This is hard to reproduce in devenv. Please help check whether it happened again in stage because of bug 834737.

Thanks

Comment 4 Thomas Wiest 2012-06-25 13:24:54 UTC
When we deployed to STG, we made sure rhc-admin-chk was succeeding (clean).

Today when I run it, I see the following mismatched consumed gears:

FAIL - user [REDACTED] has a mismatch in consumed gears (9) and actual gears (5)!

FAIL - user [REDACTED] has a mismatch in consumed gears (0) and actual gears (1)!

FAIL - user [REDACTED] has a mismatch in consumed gears (-1) and actual gears (0)!

FAIL - user [REDACTED] has a mismatch in consumed gears (1) and actual gears (0)!

FAIL - user [REDACTED] has a mismatch in consumed gears (2) and actual gears (0)!


E-mail me if you need the logins and I'll forward you the output from rhc-admin-chk.

So, this bug is definitely still present in the current STG build.

rhc-broker-0.94.17-1.el6_3.noarch

Comment 5 Dan McPherson 2012-06-25 13:33:04 UTC
I need the broker logs covering what these users have done.

Comment 6 Thomas Wiest 2012-06-25 14:08:17 UTC
Sent as a separate e-mail to Dan.

Comment 7 Dan McPherson 2012-06-25 16:57:56 UTC
*** Bug 835144 has been marked as a duplicate of this bug. ***

Comment 8 Dan McPherson 2012-06-25 20:38:41 UTC
Hi Ravi,

  I have been looking over this one today.  I'll let you have it at the end of the day if I don't think I am done with it.

Thanks,

Dan

Comment 9 Rajat Chopra 2012-07-07 00:00:04 UTC
The first part of improving the situation is done with rev#9062edb0f922844306d5eb150624fa2897bda13d in crankcase.repo.
Work undertaken:

1. Save to mongo before we create on the node

2. Raise an exception when destroy fails, after destroy has been called on all gears (and the app should not get deleted subsequently)

3. ngears gets decremented only after destroy succeeds, and the gear is not deleted from group_instance if a failure happens

4. Encapsulate the destroy logic in a single place (cleanup)
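
The ordering in the four items above can be sketched roughly as below. This is a simplified illustration, not the code from the crankcase commit: the flat "gears" list stands in for group_instance, and destroy_on_node, create_on_node, save and GearDestroyError are hypothetical names. Rolling back the record when node creation fails is an assumption of the sketch, not something stated in this bug.

```python
class GearDestroyError(Exception):
    """Raised after every gear has been tried, if any destroy failed."""


def create_gear(user, app, gear, create_on_node, save):
    # Item 1: record the gear and persist it before touching the node.
    app["gears"].append(gear)
    user["consumed_gears"] += 1
    save(user)
    try:
        create_on_node(gear)
    except Exception:
        # Assumption of this sketch: undo the record if the node create fails.
        app["gears"].remove(gear)
        user["consumed_gears"] -= 1
        save(user)
        raise


def destroy_app(user, app, destroy_on_node, save):
    # Item 4: all destroy logic lives in this one place.
    failed = []
    for gear in list(app["gears"]):
        try:
            destroy_on_node(gear)
        except Exception:
            # Item 3: keep the gear and the counter untouched on failure.
            failed.append(gear)
            continue
        app["gears"].remove(gear)
        user["consumed_gears"] -= 1
        save(user)
    if failed:
        # Item 2: raise only after all gears were tried; the app is kept.
        raise GearDestroyError(f"could not destroy gears: {failed}")
    user["apps"].remove(app)
    save(user)
```

With this ordering, a failed destroy leaves the app, the surviving gear and the counter consistent with each other, so the destroy can simply be retried.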



The second part will be part of a User Story. That work should save gears explicitly as an atomic operation, or else prevent concurrent processes on an app by using locks.
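
As a rough illustration of the lock-based option, the sketch below makes the limit check and the counter update a single step. It assumes an in-process threading.Lock as a stand-in for whatever the broker and its data store would actually provide; GearCounter, GearLimitError, reserve and release are hypothetical names.

```python
import threading


class GearLimitError(Exception):
    """Raised when a reservation would exceed the user's gear limit."""


class GearCounter:
    def __init__(self, max_gears: int, consumed: int = 0):
        self.max_gears = max_gears
        self.consumed = consumed
        self._lock = threading.Lock()

    def reserve(self, count: int) -> None:
        # Check and increment under one lock, so concurrent creates cannot
        # both pass the check and then both increment.
        with self._lock:
            if self.consumed + count > self.max_gears:
                raise GearLimitError(
                    f"need {count}, only {self.max_gears - self.consumed} left"
                )
            self.consumed += count

    def release(self, count: int) -> None:
        # Refuse to go negative instead of storing a value such as -1.
        with self._lock:
            if count > self.consumed:
                raise ValueError("cannot release more gears than are consumed")
            self.consumed -= count
```

A request that cannot be satisfied in full leaves the counter unchanged, and release refuses to drive it below zero.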

Comment 10 Johnny Liu 2012-07-11 14:28:42 UTC
Retested this bug on devenv_1884 and can still reproduce it. I am not sure whether this is the second part (part of a User Story); if it is, please move this bug to ON_QA and I will verify it again.


1. Create a user with maxgears=3
2. Change Min to 4 in /usr/libexec/stickshift/cartridges/php-5.3/info/manifest.yml
3. Create a scaling app of the above type via the command line

App creation fails and no gear is saved in mongo, but the user's consumed gears becomes 1.

PRIMARY> db.user.find()
{ "_id" : "jialiu", "apps" : [ ], "capabilities" : { "gear_sizes" : [ "small" ] }, "consumed_gears" : 1, "domains" : [ { "uuid" : "44b83b734bfa4f3ba72daac9f035644e", "namespace" : "jialiu" } ], "env_vars" : null, "gear_usage_records" : [ 	{ 	"event" : "begin", 	"uuid" : "414b89874da84cf48395f0ad2149571c", 	"gear_uuid" : "1606f58fc38c4fe1b9d90a6feaa9e4ca", 	"time" : ISODate("2012-07-11T14:16:17.834Z"), 	"gear_size" : "small", 	"sync_time" : null } ], "login" : "jialiu", "max_gears" : 3, "parent_user_login" : null, "ssh_keys" : { "default" : { "key" : "AAAAB3NzaC1yc2EAAAADAQABAAABAQDSv44aEPcObZkAN5VI8XHW23b7JL0wBftkPwtXwHF6ppxnvpIhQNyBy5crHWGrigEOGLsJWH7hmo/rfkELuhfpdaGIB582AAJ5Eeug+Fv7yQFQodCafALhh/piXXnJ7xsnFpy6Pz5OVuxC2nRoew8oqSIjKaHTdjzuSPNRviEKLTypcREtnQp7nCCTDm3NFjaM40tDA3/i9m708qViQHv5tqkdyrfLMu5Lq+oJMrzP911aCn3F0GTc+T/cUC/R2ay5wLhv9FT+eTrDSOsMFt9BZYFT+mfSyIJhaGuxB7OUQ3qf4RRMo+0hGINRSLldFtRaZiQUyqw6nKPR460MfgMN", "type" : "ssh-rsa" } }, "system_ssh_keys" : null, "uuid" : "307d5fbf4c2c4fd9a21d349d4f177045", "vip" : false }

Comment 11 Rajat Chopra 2012-07-11 21:16:55 UTC
Fixed with rev#870689373c385ce50d395bf92f430b72e3ff5d8a in crankcase.repo

Comment 12 Johnny Liu 2012-07-12 06:51:27 UTC
Verified this bug on devenv-stage_223, PASS.

