Bug 834351 - User's consumed gears and actual gears don't match
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: x86_64 Unspecified
Priority: medium
Severity: medium
Assigned To: Rajat Chopra
QA Contact: libra bugs
Keywords: Triaged
Duplicates: 835144
Depends On:
Blocks:
Reported: 2012-06-21 11:40 EDT by Kenny Woodson
Modified: 2015-05-14 21:58 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-13 19:43:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Kenny Woodson 2012-06-21 11:40:28 EDT
Description of problem:
rhc-admin-chk reports that a user's consumed gears and actual gears do not match.  We believe this is caused by a race condition during application create/destroy, which lets the data in mongo get out of sync with what is actually on the nodes.

Our current workaround is to set the consumed gears to the actual gears found.

Version-Release number of selected component (if applicable):

Current version: rhc-broker-0.93.28-1.el6_2.noarch


How reproducible:
We are seeing this in stg and production.  I would guess that this is reproducible.

Steps to Reproduce:
1. Run rhc-admin-chk in stg
2. View the output of rhc-admin-chk
3. Verify there are FAILURES
  
Actual results:

The data in mongo does not reflect what is on the nodes.  

Expected results:

rhc-admin-chk should pass, with every user's consumed gears matching their actual gears.

Additional info:
Here is the output from rhc-admin-chk:

FAIL - user mmcgrath+nagios@redhat.com has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user jhou@redhat.com has a mismatch in consumed gears (2) and actual gears (3)!
FAIL - user c9stage1155 has a mismatch in consumed gears (1) and actual gears (0)!
FAIL - user mzimen@redhat.com has a mismatch in consumed gears (0) and actual gears (6)!
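
The FAIL lines above have a regular shape, so the workaround of setting consumed gears to the actual count can be planned from the tool output. This is a minimal sketch that assumes only the line format quoted in this report. `parse_mismatches` and `plan_workaround` are hypothetical helper names, not part of rhc-admin-chk, and nothing here writes to mongo.

```python
import re

# Pattern for the FAIL lines quoted in this report (format taken from the output above).
FAIL_RE = re.compile(
    r"FAIL - user (?P<login>\S+) has a mismatch in "
    r"consumed gears \((?P<consumed>-?\d+)\) and actual gears \((?P<actual>-?\d+)\)!"
)


def parse_mismatches(output):
    """Return a (login, consumed, actual) tuple for each FAIL line in the output."""
    results = []
    for line in output.splitlines():
        match = FAIL_RE.search(line)
        if match:
            results.append(
                (match["login"], int(match["consumed"]), int(match["actual"]))
            )
    return results


def plan_workaround(mismatches):
    """Map each login to the value its consumed gears would be reset to (the actual count)."""
    return {login: actual for login, _consumed, actual in mismatches}


sample = """\
FAIL - user mmcgrath+nagios@redhat.com has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user c9stage1155 has a mismatch in consumed gears (1) and actual gears (0)!
"""
mismatches = parse_mismatches(sample)
print(plan_workaround(mismatches))
```

The plan is only a report of what would change; applying it is left to whatever admin tooling is actually in use.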
Comment 1 Dan McPherson 2012-06-21 12:52:03 EDT
To be clear, this has nothing to do with what's on the nodes.  It's a mismatch entirely in mongo.  Can you verify whether these mismatches still happen with the latest code? There was at least one fix for this issue in this release.
Comment 2 Thomas Wiest 2012-06-21 18:29:41 EDT
We've cleaned up all of the ones that rhc-admin-chk reported in STG. So, we'll see if more get reported now.
Comment 3 Xiaoli Tian 2012-06-25 01:34:08 EDT
This is hard to reproduce in devenv. Please help check whether it happened again in stage because of bug 834737.

Thanks
Comment 4 Thomas Wiest 2012-06-25 09:24:54 EDT
When we deployed to STG, we made sure rhc-admin-chk was succeeding (clean).

Today when I ran it, I saw the following mismatched consumed gears:

FAIL - user [REDACTED] has a mismatch in consumed gears (9) and actual gears (5)!
FAIL - user [REDACTED] has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user [REDACTED] has a mismatch in consumed gears (-1) and actual gears (0)!
FAIL - user [REDACTED] has a mismatch in consumed gears (1) and actual gears (0)!
FAIL - user [REDACTED] has a mismatch in consumed gears (2) and actual gears (0)!

E-mail me if you need the logins and I'll forward you the output from rhc-admin-chk.

So, this bug is definitely still present in the current STG build.

rhc-broker-0.94.17-1.el6_3.noarch
Comment 5 Dan McPherson 2012-06-25 09:33:04 EDT
I need the broker logs showing what these users have done.
Comment 6 Thomas Wiest 2012-06-25 10:08:17 EDT
Sent as a separate e-mail to Dan.
Comment 7 Dan McPherson 2012-06-25 12:57:56 EDT
*** Bug 835144 has been marked as a duplicate of this bug. ***
Comment 8 Dan McPherson 2012-06-25 16:38:41 EDT
Hi Ravi,

  I have been looking over this one today.  I'll let you have it at the end of the day if I don't think I am done with it.

Thanks,

Dan
Comment 9 Rajat Chopra 2012-07-06 20:00:04 EDT
The first part of improving the situation is done, with rev#9062edb0f922844306d5eb150624fa2897bda13d in crankcase.repo.
Work undertaken:

1. Save to mongo before creating on the node.
2. Raise an exception when destroy fails, but only after destroy has been called on all gears (the app should then not be deleted).
3. Decrement ngears only after destroy succeeds, and do not delete the gear from group_instance if a failure happens.
4. Encapsulate the destroy logic in a single place (cleanup).
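
The four items above can be sketched with a small in-memory model. This is an illustration only, not the code from the crankcase commit: `User`, `App`, `create_gear`, `cleanup` and `DestroyError` are hypothetical names, a plain list stands in for group_instance, and the node calls are passed in as callables.

```python
class DestroyError(Exception):
    """Raised when one or more gears could not be destroyed."""


class App:
    def __init__(self, name):
        self.name = name
        self.gears = []  # stands in for the gears recorded in group_instance


class User:
    def __init__(self):
        self.consumed_gears = 0  # the counter that is compared with the actual gears


def create_gear(user, app, gear_id, create_on_node):
    # Item 1: record the gear (the "mongo save") before touching the node.
    app.gears.append(gear_id)
    user.consumed_gears += 1
    create_on_node(gear_id)


def cleanup(user, app, destroy_on_node):
    # Item 4: all destroy logic lives in this one function.
    failed = []
    for gear_id in list(app.gears):
        try:
            destroy_on_node(gear_id)
        except Exception:
            # Item 3: on failure, keep the gear recorded and the counter unchanged.
            failed.append(gear_id)
            continue
        # Item 3: decrement only after the destroy succeeded.
        app.gears.remove(gear_id)
        user.consumed_gears -= 1
    if failed:
        # Item 2: raise only after every gear was attempted; the caller must not delete the app.
        raise DestroyError(f"could not destroy gears: {failed}")


user, app = User(), App("demo")
for gid in ("g1", "g2", "g3"):
    create_gear(user, app, gid, create_on_node=lambda g: None)


def flaky_destroy(gear_id):
    # Simulate one node failing to destroy its gear.
    if gear_id == "g2":
        raise RuntimeError("node unreachable")


try:
    cleanup(user, app, flaky_destroy)
except DestroyError as err:
    print(err)
print(user.consumed_gears, app.gears)
```

In this model the counter always equals the number of recorded gears after a partial failure, which is the invariant the check is looking for.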



The second part will be part of a User Story. That work should save gears explicitly as an atomic operation, or else prevent concurrent processes from operating on the same app by using locks.
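
The locking option can be illustrated with an in-memory counter guarded by a lock. This is a sketch under assumptions, not the planned implementation: `GearAccount`, `reserve` and `release` are hypothetical names, and a `threading.Lock` stands in for whatever mechanism the broker would use. With a MongoDB-backed store a similar effect might be achievable with a single conditional update (for example with the `$inc` operator), but that is not shown or verified here.

```python
import threading


class GearAccount:
    """In-memory stand-in for a user record holding a gear counter."""

    def __init__(self, max_gears):
        self.max_gears = max_gears
        self.consumed_gears = 0
        self._lock = threading.Lock()

    def reserve(self, count=1):
        """Atomically reserve gears; return False if the limit would be exceeded."""
        with self._lock:
            if self.consumed_gears + count > self.max_gears:
                return False
            self.consumed_gears += count
            return True

    def release(self, count=1):
        """Atomically release gears; refuse to drive the counter below zero."""
        with self._lock:
            if count > self.consumed_gears:
                raise ValueError("cannot release more gears than are consumed")
            self.consumed_gears -= count


account = GearAccount(max_gears=3)


def worker():
    # Each successful reserve is paired with exactly one release.
    for _ in range(1000):
        if account.reserve():
            account.release()


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.consumed_gears)
```

Because the limit check and the increment happen under the same lock, two concurrent creates cannot both pass the check and overshoot the limit, and a release can never produce a negative count such as the -1 reported earlier in this bug.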
Comment 10 Johnny Liu 2012-07-11 10:28:42 EDT
Retested this bug on devenv_1884 and can still reproduce it. I am not sure whether this is the second part (the part that belongs to a User Story); if it is, please move this bug to ON_QA and I will verify it again.


1. Create a user with maxgears=3
2. Change Min to 4 in /usr/libexec/stickshift/cartridges/php-5.3/info/manifest.yml
3. Create a scaling app of the above type via the command line

App creation fails and no gear is saved in mongo, but the user's consumed gears count becomes 1.

PRIMARY> db.user.find()
{ "_id" : "jialiu@redhat.com", "apps" : [ ], "capabilities" : { "gear_sizes" : [ "small" ] }, "consumed_gears" : 1, "domains" : [ { "uuid" : "44b83b734bfa4f3ba72daac9f035644e", "namespace" : "jialiu" } ], "env_vars" : null, "gear_usage_records" : [ 	{ 	"event" : "begin", 	"uuid" : "414b89874da84cf48395f0ad2149571c", 	"gear_uuid" : "1606f58fc38c4fe1b9d90a6feaa9e4ca", 	"time" : ISODate("2012-07-11T14:16:17.834Z"), 	"gear_size" : "small", 	"sync_time" : null } ], "login" : "jialiu@redhat.com", "max_gears" : 3, "parent_user_login" : null, "ssh_keys" : { "default" : { "key" : "AAAAB3NzaC1yc2EAAAADAQABAAABAQDSv44aEPcObZkAN5VI8XHW23b7JL0wBftkPwtXwHF6ppxnvpIhQNyBy5crHWGrigEOGLsJWH7hmo/rfkELuhfpdaGIB582AAJ5Eeug+Fv7yQFQodCafALhh/piXXnJ7xsnFpy6Pz5OVuxC2nRoew8oqSIjKaHTdjzuSPNRviEKLTypcREtnQp7nCCTDm3NFjaM40tDA3/i9m708qViQHv5tqkdyrfLMu5Lq+oJMrzP911aCn3F0GTc+T/cUC/R2ay5wLhv9FT+eTrDSOsMFt9BZYFT+mfSyIJhaGuxB7OUQ3qf4RRMo+0hGINRSLldFtRaZiQUyqw6nKPR460MfgMN", "type" : "ssh-rsa" } }, "system_ssh_keys" : null, "uuid" : "307d5fbf4c2c4fd9a21d349d4f177045", "vip" : false }
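
The state shown above (no apps, consumed_gears at 1) is what a failed create looks like when an increment is not undone. The following sketch shows one way to keep the counter unchanged on failure. It is an assumption-laden illustration, not the actual fix: `create_scaling_app` and `AppCreateError` are hypothetical names, the user record is a plain dict using the `max_gears` and `consumed_gears` field names from the document above, and gear creation is a callable passed in by the caller.

```python
class AppCreateError(Exception):
    """Raised when an app cannot be created."""


def create_scaling_app(user, min_gears, create_gear):
    """Create min_gears gears for a new app, leaving the counter untouched on failure."""
    if user["consumed_gears"] + min_gears > user["max_gears"]:
        # Reject before anything is counted, so there is nothing to undo.
        raise AppCreateError("gear limit would be exceeded")
    created = []
    reserved = 0
    try:
        for index in range(min_gears):
            user["consumed_gears"] += 1
            reserved += 1
            created.append(create_gear(index))
    except Exception as err:
        # Undo every increment made for this app before reporting the failure.
        # A real implementation would also have to destroy the gears already
        # created; that part is omitted here.
        user["consumed_gears"] -= reserved
        raise AppCreateError("gear creation failed") from err
    return created


# Mirrors the reproduction steps: max_gears is 3, the cartridge needs 4 gears.
user = {"login": "demo", "max_gears": 3, "consumed_gears": 0}
try:
    create_scaling_app(user, 4, lambda index: f"gear-{index}")
except AppCreateError as err:
    print(err)
print(user["consumed_gears"])
```

Checking the whole request against the limit up front handles the reproduction case, and the rollback covers failures that only show up part way through creation.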
Comment 11 Rajat Chopra 2012-07-11 17:16:55 EDT
Fixed with rev#870689373c385ce50d395bf92f430b72e3ff5d8a in crankcase.repo
Comment 12 Johnny Liu 2012-07-12 02:51:27 EDT
Verified this bug on devenv-stage_223, PASS.
