Bug 834351

Summary:	User's consumed gears and actual gears don't match
Product:	OKD	Reporter:	Kenny Woodson <kwoodson>
Component:	Pod	Assignee:	Rajat Chopra <rchopra>
Status:	CLOSED CURRENTRELEASE	QA Contact:	libra bugs <libra-bugs>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	2.x	CC:	dmcphers, jialiu, mshao, rmillner, rpenta, twiest, xtian
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	x86_64
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-07-13 23:43:42 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Kenny Woodson 2012-06-21 15:40:28 UTC

Description of problem:
rhc-admin-chk reports that a user's consumed gears and actual gears do not match.  We believe this is due to a race condition on application create/destroy as data in mongo is getting out of sync with what is actually on the nodes.

Our current workaround is to set the consumed gears to the actual gears found.

Version-Release number of selected component (if applicable):

Current version: rhc-broker-0.93.28-1.el6_2.noarch


How reproducible:
We are seeing this in stg and production.  I would guess that this is reproducible.

Steps to Reproduce:
1. Run rhc-admin-chk in stg
2. View the output of rhc-admin-chk
3. Verify there are FAILURES
  
Actual results:

The data in mongo does not reflect what is on the nodes.  

Expected results:

rhc-admin-chk should pass, with every user's consumed gears matching their actual gears.

Additional info:
Here is the output from rhc-admin-chk:

FAIL - user mmcgrath+nagios has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user jhou has a mismatch in consumed gears (2) and actual gears (3)!
FAIL - user c9stage1155 has a mismatch in consumed gears (1) and actual gears (0)!
FAIL - user mzimen has a mismatch in consumed gears (0) and actual gears (6)!
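
For illustration, the comparison behind these FAIL lines can be sketched roughly as follows. This is a minimal Python sketch, not the actual rhc-admin-chk implementation. The `login`, `apps`, and `consumed_gears` fields match the user documents in mongo, but the nested `group_instances` and `gears` field names and the example user are assumptions.

```python
def count_actual_gears(user_doc):
    """Count gears across all of a user's apps (nested field names are assumed)."""
    total = 0
    for app in user_doc.get("apps") or []:
        for group in app.get("group_instances") or []:
            total += len(group.get("gears") or [])
    return total


def check_user(user_doc):
    """Return a FAIL message if consumed and actual gears differ, else None."""
    consumed = user_doc.get("consumed_gears", 0)
    actual = count_actual_gears(user_doc)
    if consumed != actual:
        return ("FAIL - user %s has a mismatch in consumed gears (%d) "
                "and actual gears (%d)!" % (user_doc["login"], consumed, actual))
    return None


# Hypothetical user document: one app holding one gear, but a counter of 0.
example = {
    "login": "example_user",
    "consumed_gears": 0,
    "apps": [{"group_instances": [{"gears": [{"uuid": "g1"}]}]}],
}
print(check_user(example))
```

Both numbers come from the same user document, so a check of this kind can fail without looking at the nodes at all.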

Comment 1 Dan McPherson 2012-06-21 16:52:03 UTC

To be clear, this has nothing to do with what's on the nodes.  It's a mismatch entirely in mongo.  Can you verify whether these breaks still happen with the latest code?  There was at least one fix for this issue in this release.

Comment 2 Thomas Wiest 2012-06-21 22:29:41 UTC

We've cleaned up all of the ones that rhc-admin-chk reported in STG. So, we'll see if more get reported now.

Comment 3 Xiaoli Tian 2012-06-25 05:34:08 UTC

This is hard to reproduce in devenv. Please help check whether it happened again in stage because of bug 834737.

Thanks

Comment 4 Thomas Wiest 2012-06-25 13:24:54 UTC

When we deployed to STG, we made sure rhc-admin-chk was succeeding (clean).

Today when I run it, I see the following mismatched consumed gears:

FAIL - user [REDACTED] has a mismatch in consumed gears (9) and actual gears (5)!
FAIL - user [REDACTED] has a mismatch in consumed gears (0) and actual gears (1)!
FAIL - user [REDACTED] has a mismatch in consumed gears (-1) and actual gears (0)!
FAIL - user [REDACTED] has a mismatch in consumed gears (1) and actual gears (0)!
FAIL - user [REDACTED] has a mismatch in consumed gears (2) and actual gears (0)!


E-mail me if you need the logins and I'll forward you the output from rhc-admin-chk.

So, this bug is definitely still present in the current STG build.

rhc-broker-0.94.17-1.el6_3.noarch

Comment 5 Dan McPherson 2012-06-25 13:33:04 UTC

I need broker logs from what these users have done.

Comment 6 Thomas Wiest 2012-06-25 14:08:17 UTC

Sent as a separate e-mail to Dan.

Comment 7 Dan McPherson 2012-06-25 16:57:56 UTC

*** Bug 835144 has been marked as a duplicate of this bug. ***

Comment 8 Dan McPherson 2012-06-25 20:38:41 UTC

Hi Ravi,

  I have been looking over this one today.  I'll let you have it at the end of the day if I don't think I am done with it.

Thanks,

Dan

Comment 9 Rajat Chopra 2012-07-07 00:00:04 UTC

Done with first part of improving the situation with rev#9062edb0f922844306d5eb150624fa2897bda13d in crankcase.repo! 
Work undertaken ->

1. Save to mongo before we create on the node

2. Raise an exception when destroy fails, after destroy has been called on all gears (the app should not get deleted subsequently)

3. ngears is decremented only after destroy succeeds, and the gear is not deleted from group_instance if a failure happens

4. Encapsulate the destroy logic in a single place (cleanup)
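
The ordering in the four items above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the crankcase code: the `App`, `FakeStore`, `FakeNode`, and `DestroyError` names are hypothetical, and rollback when the node create itself fails is not shown.

```python
class DestroyError(Exception):
    """Raised when one or more gears could not be destroyed on their node."""


class FakeStore:
    """Hypothetical stand-in for the mongo save."""
    def __init__(self):
        self.saves = 0

    def save(self, app):
        self.saves += 1


class FakeNode:
    """Hypothetical node client; destroy fails for the gear ids in `broken`."""
    def __init__(self, broken=()):
        self.broken = set(broken)

    def create(self, gear_id):
        pass

    def destroy(self, gear_id):
        if gear_id in self.broken:
            raise RuntimeError("node unreachable")


class App:
    def __init__(self, user, store, node):
        self.user, self.store, self.node = user, store, node
        self.gears = []

    def add_gear(self, gear_id):
        # Item 1: record the gear and save before creating it on the node.
        self.gears.append(gear_id)
        self.user["consumed_gears"] += 1
        self.store.save(self)
        self.node.create(gear_id)

    def cleanup(self):
        # Item 4: all destroy logic lives in this one method.
        failed = []
        for gear_id in list(self.gears):
            try:
                self.node.destroy(gear_id)
            except Exception:
                # Item 3: keep the gear and its count when destroy fails.
                failed.append(gear_id)
                continue
            self.gears.remove(gear_id)
            self.user["consumed_gears"] -= 1
        self.store.save(self)
        if failed:
            # Item 2: raise only after destroy was attempted on every gear,
            # so the caller does not go on to delete the app.
            raise DestroyError("could not destroy gears: " + ", ".join(failed))


user = {"login": "example_user", "consumed_gears": 0}
app = App(user, FakeStore(), FakeNode(broken={"g2"}))
for gear_id in ("g1", "g2", "g3"):
    app.add_gear(gear_id)
try:
    app.cleanup()
except DestroyError as err:
    print(err)
print(user["consumed_gears"], app.gears)
```

In this sketch the counter and the gear list are changed together in each branch, so they still agree after a partial destroy.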



The second part will be part of a User Story. That work should save gears explicitly as an atomic operation, or else prevent concurrent processes from operating on the same app by using locks.
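
As a rough illustration of the locking option, here is a minimal Python sketch, assuming an in-memory stand-in for the user's gear bookkeeping. The `GearCounter` class and its methods are hypothetical. A process-local lock like this would not protect against several broker processes; a real fix would need a shared lock or an atomic update in the datastore.

```python
import threading


class GearCounter:
    """In-memory stand-in for a user's gear bookkeeping (hypothetical)."""

    def __init__(self, max_gears):
        self.max_gears = max_gears
        self.consumed_gears = 0
        self.gears = []
        self._lock = threading.Lock()

    def reserve(self, gear_id):
        # The limit check, counter update, and gear record happen as one unit.
        with self._lock:
            if self.consumed_gears >= self.max_gears:
                return False
            self.consumed_gears += 1
            self.gears.append(gear_id)
            return True

    def release(self, gear_id):
        # The gear record and the counter are also removed as one unit.
        with self._lock:
            if gear_id not in self.gears:
                return False
            self.gears.remove(gear_id)
            self.consumed_gears -= 1
            return True


counter = GearCounter(max_gears=3)
threads = [
    threading.Thread(target=counter.reserve, args=("g%d" % i,))
    for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.consumed_gears, len(counter.gears))
```

Ten concurrent requests are made against a limit of three; only three succeed, and the counter matches the number of recorded gears.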

Comment 10 Johnny Liu 2012-07-11 14:28:42 UTC

Retested this bug on devenv_1884 and can still reproduce it. I am not sure whether this is the second part (part of a User Story); if it is, please move this bug to ON_QA and I will verify it again.


1. Create a user with maxgears=3
2. Change Min to 4 in /usr/libexec/stickshift/cartridges/php-5.3/info/manifest.yml
3. Create a scaling app of the above type via the command line

App creation fails and no gear is saved in mongo, but the user's consumed gears becomes 1.

PRIMARY> db.user.find()
{ "_id" : "jialiu", "apps" : [ ], "capabilities" : { "gear_sizes" : [ "small" ] }, "consumed_gears" : 1, "domains" : [ { "uuid" : "44b83b734bfa4f3ba72daac9f035644e", "namespace" : "jialiu" } ], "env_vars" : null, "gear_usage_records" : [ 	{ 	"event" : "begin", 	"uuid" : "414b89874da84cf48395f0ad2149571c", 	"gear_uuid" : "1606f58fc38c4fe1b9d90a6feaa9e4ca", 	"time" : ISODate("2012-07-11T14:16:17.834Z"), 	"gear_size" : "small", 	"sync_time" : null } ], "login" : "jialiu", "max_gears" : 3, "parent_user_login" : null, "ssh_keys" : { "default" : { "key" : "AAAAB3NzaC1yc2EAAAADAQABAAABAQDSv44aEPcObZkAN5VI8XHW23b7JL0wBftkPwtXwHF6ppxnvpIhQNyBy5crHWGrigEOGLsJWH7hmo/rfkELuhfpdaGIB582AAJ5Eeug+Fv7yQFQodCafALhh/piXXnJ7xsnFpy6Pz5OVuxC2nRoew8oqSIjKaHTdjzuSPNRviEKLTypcREtnQp7nCCTDm3NFjaM40tDA3/i9m708qViQHv5tqkdyrfLMu5Lq+oJMrzP911aCn3F0GTc+T/cUC/R2ay5wLhv9FT+eTrDSOsMFt9BZYFT+mfSyIJhaGuxB7OUQ3qf4RRMo+0hGINRSLldFtRaZiQUyqw6nKPR460MfgMN", "type" : "ssh-rsa" } }, "system_ssh_keys" : null, "uuid" : "307d5fbf4c2c4fd9a21d349d4f177045", "vip" : false }
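
One way such a leaked count could be avoided is to roll back any reserved gears when creation fails. The following is a minimal Python sketch under that assumption, not the actual fix. The `create_scaling_app` function and `GearLimitError` are hypothetical, while `consumed_gears` and `max_gears` are the fields from the user document above.

```python
class GearLimitError(Exception):
    """Raised when an app needs more gears than the user is allowed."""


def create_scaling_app(user, min_gears):
    """Reserve min_gears for a new app, undoing the reservation on failure."""
    reserved = 0
    try:
        for _ in range(min_gears):
            if user["consumed_gears"] >= user["max_gears"]:
                raise GearLimitError(
                    "app needs %d gears, limit is %d"
                    % (min_gears, user["max_gears"]))
            user["consumed_gears"] += 1
            reserved += 1
    except GearLimitError:
        # Give back everything reserved so far so the counter does not leak.
        user["consumed_gears"] -= reserved
        raise
    return reserved


# Same numbers as the reproduction steps: max_gears=3, cartridge Min=4.
user = {"login": "example_user", "consumed_gears": 0, "max_gears": 3}
try:
    create_scaling_app(user, min_gears=4)
except GearLimitError as err:
    print(err)
print(user["consumed_gears"])
```

With the rollback in place, the failed create leaves consumed_gears at its starting value.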

Comment 11 Rajat Chopra 2012-07-11 21:16:55 UTC

Fixed with rev#870689373c385ce50d395bf92f430b72e3ff5d8a in crankcase.repo

Comment 12 Johnny Liu 2012-07-12 06:51:27 UTC

Verified this bug on devenv-stage_223, PASS.