Bug 834351

Summary: User's consumed gears and actual gears don't match
Product: OKD
Reporter: Kenny Woodson <kwoodson>
Component: Pod
Assignee: Rajat Chopra <rchopra>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.x
CC: dmcphers, jialiu, mshao, rmillner, rpenta, twiest, xtian
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-13 23:43:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kenny Woodson 2012-06-21 15:40:28 UTC
Description of problem:
rhc-admin-chk reports that a user's consumed gears and actual gears do not match.  We believe this is due to a race condition on application create/destroy as data in mongo is getting out of sync with what is actually on the nodes.

Our current workaround is to set the consumed gears to the actual gears found.
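
For illustration only, the workaround amounts to something like the sketch below. It works on a plain dict rather than on mongo, and it is not the broker's code. The `consumed_gears` and `apps` field names match the user document format; the per-app `gears` list and both function names are assumptions made for the example.

```python
def actual_gear_count(user):
    """Count gears across all of a user's apps.

    The per-app "gears" list is an assumed field, used only for illustration.
    """
    return sum(len(app.get("gears", [])) for app in user.get("apps", []))


def reconcile_consumed_gears(user):
    """Set consumed_gears to the actual count.

    Returns (old, new) when a correction was made, otherwise None.
    """
    actual = actual_gear_count(user)
    recorded = user.get("consumed_gears", 0)
    if recorded == actual:
        return None
    user["consumed_gears"] = actual
    return (recorded, actual)


# Hypothetical user record with a stale counter (2 recorded, 3 gears present).
user = {
    "login": "example",
    "consumed_gears": 2,
    "apps": [{"name": "a", "gears": ["g1", "g2", "g3"]}],
}
result = reconcile_consumed_gears(user)
```

Note that this only repairs the counter after the fact; it does nothing about whatever let the two values drift apart.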

Version-Release number of selected component (if applicable):

Current version: rhc-broker-0.93.28-1.el6_2.noarch


How reproducible:
We are seeing this in stg and production.  I would guess that this is reproducible.

Steps to Reproduce:
1. Run rhc-admin-chk in stg
2. View the output of rhc-admin-chk
3. Verify there are FAILURES
  
Actual results:

The data in mongo does not reflect what is on the nodes.  

Expected results:

rhc-admin-chk should pass, with every user's consumed gears matching their actual gears.

Additional info:
Here is the output from rhc-admin-check:

FAIL - user mmcgrath+nagios has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user jhou has a mismatch in consumed gears (2) and actual gears (3)!
FAIL - user c9stage1155 has a mismatch in consumed gears (1) and actual gears (0)!
FAIL - user mzimen has a mismatch in consumed gears (0) and actual gears (6)!
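
In case it helps with triage, a small script along these lines can turn the FAIL lines into (login, consumed, actual) tuples. It is a sketch that assumes the line format stays exactly as shown above; the function and variable names are made up for the example.

```python
import re

# Matches the FAIL lines exactly as they appear in the output above.
FAIL_LINE = re.compile(
    r"^FAIL - user (?P<login>\S+) has a mismatch in "
    r"consumed gears \((?P<consumed>-?\d+)\) "
    r"and actual gears \((?P<actual>-?\d+)\)!$"
)


def parse_mismatches(output):
    """Return a list of (login, consumed, actual) for every FAIL line."""
    rows = []
    for line in output.splitlines():
        match = FAIL_LINE.match(line.strip())
        if match:
            rows.append(
                (
                    match.group("login"),
                    int(match.group("consumed")),
                    int(match.group("actual")),
                )
            )
    return rows


sample = """\
FAIL - user jhou has a mismatch in consumed gears (2) and actual gears (3)!
FAIL - user c9stage1155 has a mismatch in consumed gears (1) and actual gears (0)!
"""
mismatches = parse_mismatches(sample)
```

The sign of consumed minus actual shows which way each counter drifted, which may help tell leaked increments apart from leaked decrements.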

Comment 1 Dan McPherson 2012-06-21 16:52:03 UTC
To be clear, this has nothing to do with what's on the nodes.  It's a mismatch entirely in mongo.  Can you verify whether these breaks still happen with the latest code?  There was at least one fix for this issue in this release.

Comment 2 Thomas Wiest 2012-06-21 22:29:41 UTC
We've cleaned up all of the ones that rhc-admin-chk reported in STG. So, we'll see if more get reported now.

Comment 3 Xiaoli Tian 2012-06-25 05:34:08 UTC
This is hard to reproduce in devenv. Please help check whether it happened again in stage because of bug 834737.

Thanks

Comment 4 Thomas Wiest 2012-06-25 13:24:54 UTC
When we deployed to STG, we made sure rhc-admin-chk was succeeding (clean).

Today when I run it, I see the following mismatched consumed gears:

FAIL - user [REDACTED] has a mismatch in consumed gears (9) and actual gears (5)!

FAIL - user [REDACTED] has a mismatch in consumed gears (0) and actual gears (1)!

FAIL - user [REDACTED] has a mismatch in consumed gears (-1) and actual gears (0)!

FAIL - user [REDACTED] has a mismatch in consumed gears (1) and actual gears (0)!

FAIL - user [REDACTED] has a mismatch in consumed gears (2) and actual gears (0)!


e-mail me if you need the logins and I'll forward you the output from rhc-admin-chk.

So, this bug is definitely still present in the current STG build.

rhc-broker-0.94.17-1.el6_3.noarch

Comment 5 Dan McPherson 2012-06-25 13:33:04 UTC
I need broker logs from what these users have done.

Comment 6 Thomas Wiest 2012-06-25 14:08:17 UTC
Sent as a separate e-mail to Dan.

Comment 7 Dan McPherson 2012-06-25 16:57:56 UTC
*** Bug 835144 has been marked as a duplicate of this bug. ***

Comment 8 Dan McPherson 2012-06-25 20:38:41 UTC
Hi Ravi,

  I have been looking over this one today.  I'll let you have it at the end of the day if I don't think I am done with it.

Thanks,

Dan

Comment 9 Rajat Chopra 2012-07-07 00:00:04 UTC
Done with first part of improving the situation with rev#9062edb0f922844306d5eb150624fa2897bda13d in crankcase.repo! 
Work undertaken ->

1. Mongo save before we create on node

2. raise an exception when destroy fails, after destroy is called upon all gears (and app should not get deleted subsequently)

3. ngears get decremented only after destroy succeeds, and do not delete the gear from group_instance if a failure happens

4. encapsulate destroy logic into a single place (cleanup)
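
To make items 2 and 3 concrete, here is a rough sketch of the intended ordering. It is Python over plain dicts, not the actual crankcase code, and every name in it (`destroy_app_gears`, `GearDestroyError`, `flaky_destroy`) is hypothetical.

```python
class GearDestroyError(Exception):
    """Raised after every gear was attempted and at least one destroy failed."""


def destroy_app_gears(user, app, destroy_on_node):
    """Try to destroy every gear; decrement the counter only on success."""
    failed = []
    for gear in list(app["gears"]):
        try:
            destroy_on_node(gear)
        except Exception:
            # Keep the gear record so the stored data still reflects reality.
            failed.append(gear)
            continue
        app["gears"].remove(gear)
        user["consumed_gears"] -= 1
    if failed:
        # Raised only after all gears were tried; the caller must not
        # delete the app when this happens.
        raise GearDestroyError(f"could not destroy gears: {failed}")


def flaky_destroy(gear):
    """Stand-in for the node call; fails for one gear to show the behaviour."""
    if gear == "g2":
        raise RuntimeError("node unreachable")


user = {"consumed_gears": 3}
app = {"gears": ["g1", "g2", "g3"]}
try:
    destroy_app_gears(user, app, flaky_destroy)
    app_deleted = True
except GearDestroyError:
    app_deleted = False
```

In this run two gears are destroyed, the counter drops from 3 to 1, the failed gear stays recorded, and the app is not deleted, so the counter and the gear list still agree.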



The second part will be part of a User Story. That work should save gears explicitly as an atomic operation, or else prevent concurrent processes on an app by using locks.
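
As a rough illustration of the locking idea (not a design for the User Story), the sketch below changes the gear list and the counter together under one lock, so concurrent workers cannot leave them out of step. It is in-process Python with a hypothetical `UserRecord` class; the real fix would need an equivalent guarantee at the mongo level.

```python
import threading


class UserRecord:
    """Hypothetical in-memory stand-in for a user document."""

    def __init__(self, login):
        self.login = login
        self.consumed_gears = 0
        self.apps = {}  # app name -> list of gear ids
        self._lock = threading.Lock()

    def add_gear(self, app_name, gear_id):
        # The gear list and the counter change together under one lock.
        with self._lock:
            self.apps.setdefault(app_name, []).append(gear_id)
            self.consumed_gears += 1

    def remove_gear(self, app_name, gear_id):
        with self._lock:
            self.apps[app_name].remove(gear_id)
            self.consumed_gears -= 1

    def actual_gears(self):
        with self._lock:
            return sum(len(gears) for gears in self.apps.values())


record = UserRecord("example")


def worker(worker_id):
    for i in range(100):
        record.add_gear("app", f"gear-{worker_id}-{i}")


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

After the eight workers finish, `record.consumed_gears` and `record.actual_gears()` are both 800. The lock here is per user rather than per app because the counter lives on the user record; that choice is an assumption of the sketch.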

Comment 10 Johnny Liu 2012-07-11 14:28:42 UTC
Retested this bug on devenv_1884 and can still reproduce it. I am not sure whether this is the second part (the part of a User Story); if it is, please move this bug to ON_QA and I will verify it again.


1. Create a user with maxgears=3
2. Change Min to 4 in /usr/libexec/stickshift/cartridges/php-5.3/info/manifest.yml
3. Create a scaling app of the above type via the command line

The app fails to create and no gear is saved in mongo, but the user's consumed gears become 1.

PRIMARY> db.user.find()
{ "_id" : "jialiu", "apps" : [ ], "capabilities" : { "gear_sizes" : [ "small" ] }, "consumed_gears" : 1, "domains" : [ { "uuid" : "44b83b734bfa4f3ba72daac9f035644e", "namespace" : "jialiu" } ], "env_vars" : null, "gear_usage_records" : [ 	{ 	"event" : "begin", 	"uuid" : "414b89874da84cf48395f0ad2149571c", 	"gear_uuid" : "1606f58fc38c4fe1b9d90a6feaa9e4ca", 	"time" : ISODate("2012-07-11T14:16:17.834Z"), 	"gear_size" : "small", 	"sync_time" : null } ], "login" : "jialiu", "max_gears" : 3, "parent_user_login" : null, "ssh_keys" : { "default" : { "key" : "AAAAB3NzaC1yc2EAAAADAQABAAABAQDSv44aEPcObZkAN5VI8XHW23b7JL0wBftkPwtXwHF6ppxnvpIhQNyBy5crHWGrigEOGLsJWH7hmo/rfkELuhfpdaGIB582AAJ5Eeug+Fv7yQFQodCafALhh/piXXnJ7xsnFpy6Pz5OVuxC2nRoew8oqSIjKaHTdjzuSPNRviEKLTypcREtnQp7nCCTDm3NFjaM40tDA3/i9m708qViQHv5tqkdyrfLMu5Lq+oJMrzP911aCn3F0GTc+T/cUC/R2ay5wLhv9FT+eTrDSOsMFt9BZYFT+mfSyIJhaGuxB7OUQ3qf4RRMo+0hGINRSLldFtRaZiQUyqw6nKPR460MfgMN", "type" : "ssh-rsa" } }, "system_ssh_keys" : null, "uuid" : "307d5fbf4c2c4fd9a21d349d4f177045", "vip" : false }
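
What I would expect instead is roughly the behaviour sketched below: check the limit before reserving anything, and roll the counter back if creation fails part way. This is only a Python illustration over plain dicts. `consumed_gears`, `max_gears` and `apps` are fields from the document above; the function name, the exception and the `create_on_node` callback are hypothetical.

```python
class GearLimitError(Exception):
    """Raised when an app would need more gears than the user has left."""


def create_scalable_app(user, name, min_gears, create_on_node):
    """Create an app with min_gears gears, leaving the counter untouched on failure."""
    available = user["max_gears"] - user["consumed_gears"]
    if min_gears > available:
        # Refuse before touching the counter at all.
        raise GearLimitError(f"need {min_gears} gears, {available} available")
    reserved = 0
    created = []
    try:
        for i in range(min_gears):
            user["consumed_gears"] += 1
            reserved += 1
            created.append(create_on_node(f"{name}-{i}"))
    except Exception:
        # Undo every reservation made for this app, then re-raise.
        user["consumed_gears"] -= reserved
        raise
    user["apps"].append({"name": name, "gears": created})


user = {"login": "example", "max_gears": 3, "consumed_gears": 0, "apps": []}

# The scenario from the steps above: a limit of 3 and a cartridge minimum of 4.
try:
    create_scalable_app(user, "scaled", 4, lambda gear_name: gear_name)
    outcome = "created"
except GearLimitError:
    outcome = "rejected"
```

With this ordering the request is rejected and `consumed_gears` stays at 0, with no leftover increment.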

Comment 11 Rajat Chopra 2012-07-11 21:16:55 UTC
Fixed with rev#870689373c385ce50d395bf92f430b72e3ff5d8a in crankcase.repo

Comment 12 Johnny Liu 2012-07-12 06:51:27 UTC
Verified this bug on devenv-stage_223, PASS.