Bug 817663 - rhc-admin-chk reports false positives based on current activity
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Kubernetes
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ---
Target Release: ---
Assigned To: Rajat Chopra
QA Contact: libra bugs
Keywords: Triaged
Reported: 2012-04-30 15:47 EDT by Thomas Wiest
Modified: 2015-05-14 21:52 EDT

Doc Type: Bug Fix
Last Closed: 2012-06-08 13:58:53 EDT
Type: Bug
Description Thomas Wiest 2012-04-30 15:47:06 EDT
Description of problem:

rhc-admin-chk reports gears as missing from mongo when they are actually there. I ran rhc-admin-chk against PROD and got these errors:

Gear 11b2e01f33eb4a21a7747f8a76ca722e exists on node [ex-std-node49.prod.rhcloud.com, uid:4285] but does not exist in mongo database
Gear a0ebaf6ab0d448e18d1dd771b0ec0966 exists on node [ex-std-node43.prod.rhcloud.com, uid:4289] but does not exist in mongo database
Gear 3b739dbc773e45e6885bce96d1829da4 exists on node [ex-std-node52.prod.rhcloud.com, uid:4288] but does not exist in mongo database
Gear e886223b70f24c1283171403c0f07dcb exists on node [ex-std-node52.prod.rhcloud.com, uid:4290] but does not exist in mongo database

However, when I checked mongo, these gears were all in mongo.

NOTE: I checked and these gears are NOT haproxy slave gears.

There were some others that were reported as not being in mongo, which were correct (so it's not 100% wrong).


Version-Release number of selected component (if applicable):
rhc-broker-0.91.18-1.el6_2.noarch

How reproducible:
Unsure. Every time I run rhc-admin-chk, these same entries show up.


Steps to Reproduce:
1. Unknown


Actual results:
rhc-admin-chk incorrectly flags some gears as "does not exist in mongo database" when they actually do.


Expected results:
Should only flag gears that are actually not in mongo.
Comment 1 Thomas Wiest 2012-04-30 16:00:57 EDT
Ok, so I just ran rhc-admin-chk again (about 5 minutes after the last one), and now those gears are not flagged as missing from mongo.

This leads me to believe that rhc-admin-chk can be fooled into printing false positives for actions taken while it's being run.

For instance, if an app is being created while rhc-admin-chk is running, it may only see the mongo portion of the app create, and flag it as being missing from a node.

This means that in PROD, we will likely get a lot of false positives, which is bad because eventually we'd like to write a monitoring check around this call.

I'm not sure how to fix this, as we can't just stop people from creating / destroying apps while rhc-admin-chk is running.
Comment 2 Rajat Chopra 2012-05-25 17:44:00 EDT
rev#dbea76cf10856985b34c4225a7b99d3d96512265 in li.repo

The node gears are now fetched first (because that takes a long time), and the mongo entries are fetched afterwards. This reduces the time gap in which gears had been created but their entries did not yet exist in the mongo data being compared.
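
A minimal sketch of why the fetch order matters, written in Python for illustration only. It is not the code from the revision above: `FakeCluster`, `check_mongo_first`, and `check_nodes_first` are hypothetical names, and the model assumes that an app create has written both its mongo entry and its node gear by the time the second fetch runs.

```python
def gears_missing_from_mongo(node_gears, mongo_gears):
    """Return gear UUIDs seen on nodes but absent from the mongo snapshot."""
    return sorted(set(node_gears) - set(mongo_gears))


class FakeCluster:
    """Toy stand-in for the nodes and mongo (hypothetical, not the broker API)."""

    def __init__(self):
        self.on_nodes = {"gear-a"}
        self.in_mongo = {"gear-a"}

    def create_gear(self, uuid):
        # Assumption: a finished app create is visible in both places.
        self.in_mongo.add(uuid)
        self.on_nodes.add(uuid)

    def fetch_node_gears(self):
        return set(self.on_nodes)

    def fetch_mongo_gears(self):
        return set(self.in_mongo)


def check_mongo_first(cluster, activity_during_check):
    """Old order: the mongo snapshot goes stale while the nodes are queried."""
    mongo = cluster.fetch_mongo_gears()
    activity_during_check()  # apps created while the slow node fetch runs
    nodes = cluster.fetch_node_gears()
    return gears_missing_from_mongo(nodes, mongo)


def check_nodes_first(cluster, activity_during_check):
    """New order: the mongo snapshot is taken last, so it is the freshest."""
    nodes = cluster.fetch_node_gears()
    activity_during_check()
    mongo = cluster.fetch_mongo_gears()
    return gears_missing_from_mongo(nodes, mongo)
```

In this toy model, creating `gear-b` during the check makes `check_mongo_first` flag it, while `check_nodes_first` flags nothing. A gear destroyed between the two fetches could still be flagged in either order, which is why the gap is reduced rather than removed.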

This can only be eliminated completely if we implement a re-check after 'N' seconds, where N is the maximum time taken for app creation, but that's a moving target. Please re-open the bug if we really need that implemented.
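
The re-check idea could look roughly like the following Python sketch. It is not implemented in rhc-admin-chk; `confirmed_missing`, `find_missing`, and `recheck_delay_seconds` are hypothetical names, and the delay stands in for the 'N' above.

```python
import time


def confirmed_missing(find_missing, recheck_delay_seconds, sleep=time.sleep):
    """Report only gears that are flagged by two passes N seconds apart.

    find_missing is any callable returning the gear UUIDs currently
    flagged as "exists on node but does not exist in mongo database".
    """
    first_pass = set(find_missing())
    if not first_pass:
        return []  # nothing flagged, so no need to wait and re-check
    sleep(recheck_delay_seconds)
    second_pass = set(find_missing())
    # A gear flagged only once was probably mid-create or mid-destroy.
    return sorted(first_pass & second_pass)
```

The cost is that every run with at least one flagged gear takes N seconds longer, and N has to track the slowest app creation.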
Comment 3 Rony Gong 2012-05-29 01:49:52 EDT
Verification steps:
1. On the client, run: rhc app create -a app1 -t php-5.3
2. At the same time, on the server, run: rhc-admin-chk
3. After app1 has been created, run rhc-admin-chk again.

Results:
1. app1 was created successfully.
2. The following message was not shown (the gear uuid is the one for app1):
Gear e886223b70f24c1283171403c0f07dcb exists on node [ex-std-node52.prod.rhcloud.com, uid:4290] but does not exist in mongo database

3. The following message was again not shown (the gear uuid is the one for app1):
Gear e886223b70f24c1283171403c0f07dcb exists on node [ex-std-node52.prod.rhcloud.com, uid:4290] but does not exist in mongo database
Comment 4 Xiaoli Tian 2012-05-29 03:55:42 EDT
After removing the user's data from mongodb and running rhc-admin-chk, the correct error is reported:

PRIMARY> use openshift_broker_dev
switched to db openshift_broker_dev
PRIMARY> u = db.user.findOne( { "_id" : "xtian+t91@redhat.com" } )
PRIMARY> u['apps']=[]
[ ]
PRIMARY> db.user.save(u)
PRIMARY> exit
bye

[root@ip-10-123-45-244 stickshift]# rhc-admin-chk

Check failed.
 FAIL - user xtian+t91@redhat.com has a mismatch in consumed gears (1) and actual gears (0)!

Gear 873444994b4742a6920f1c4bb96d9e9f exists on node [ip-10-123-45-244, uid:504] but does not exist in mongo database
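
For readers unfamiliar with the first failure line, here is a rough Python sketch of a consumed-versus-actual gear count comparison. It is not the rhc-admin-chk source; the `consumed_gears` and `gears` field names are assumptions, and only `_id` and `apps` appear in the session above.

```python
def gear_count_mismatches(users):
    """Return one failure line per user whose gear counter disagrees with apps.

    Each user is a dict shaped like a user document. The field names
    "consumed_gears" and "gears" are assumed for illustration.
    """
    failures = []
    for user in users:
        consumed = user.get("consumed_gears", 0)
        # Count the gears actually listed under the user's apps.
        actual = sum(len(app.get("gears", [])) for app in user.get("apps", []))
        if consumed != actual:
            failures.append(
                "FAIL - user %s has a mismatch in consumed gears (%d) "
                "and actual gears (%d)!" % (user["_id"], consumed, actual)
            )
    return failures
```

With the `apps` list emptied as in the session above and a counter still at 1, this yields the same "consumed gears (1) and actual gears (0)" message.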
