Bug 1281912 - oo-admin-chk doesn't recognize when gear no longer exists on node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
: ---
Assignee: Rory Thrasher
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-13 19:04 UTC by Rory Thrasher
Modified: 2015-12-07 03:06 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-07 03:06:53 UTC
Target Upstream Version:
Embargoed:



Description Rory Thrasher 2015-11-13 19:04:36 UTC
Description of problem:

When oo-admin-chk runs, it has each node loop through its gear users, but it does not check that the gear itself exists. If a gear's directory is deleted without removing its user, oo-admin-chk does not notice that the gear is missing from every node.
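
To make the gap concrete, here is a minimal Python sketch of the comparison described above. It is not the oo-admin-chk implementation (which is not shown in this report); the function name `missing_gears`, its parameters, and the second gear UUID are hypothetical and exist only for illustration.

```python
def missing_gears(broker_gears, node_users, node_gear_dirs=None):
    """Return gear UUIDs known to the broker that no node reports.

    broker_gears:   UUIDs recorded in the broker datastore (mongo).
    node_users:     UUIDs derived from gear user accounts on the nodes.
    node_gear_dirs: optional UUIDs whose gear directory actually exists.
    All three names are assumptions made for this sketch.
    """
    reported = set(node_users)
    if node_gear_dirs is not None:
        # Stricter variant: a gear only counts if its directory exists too.
        reported &= set(node_gear_dirs)
    return sorted(set(broker_gears) - reported)


# The first UUID is the one from this report; the second is made up.
broker = {"564603c95f4834200d00001f", "564603c95f4834200d000020"}
users = set(broker)                    # both user accounts still exist
dirs = {"564603c95f4834200d000020"}    # one gear directory was deleted

print(missing_gears(broker, users))        # user-only comparison
print(missing_gears(broker, users, dirs))  # comparison that also checks directories
```

With the user-only comparison nothing is flagged, which matches the behavior reported here; only the variant that also looks at directories reports the deleted gear.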

How reproducible:
Always

Steps to Reproduce:
1. Create any app
rhc create-app myapp python-3.3

2. Manually delete the app from the node
rm -rf /var/lib/openshift/GEARUUID

3. Wait 10 minutes (oo-admin-chk doesn't flag failures until 600 seconds after app creation to allow the app to finish starting up)

4. Run oo-admin-chk
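
The 600-second grace period in step 3 can be pictured with a small Python sketch. This is an illustration under assumptions, not code from oo-admin-chk: the names `GRACE_SECONDS` and `eligible_for_check` are hypothetical, and whether the real tool treats exactly 600 seconds as inside or outside the window is not stated in this report.

```python
import time

GRACE_SECONDS = 600  # grace period mentioned in step 3


def eligible_for_check(created_at, now=None, grace=GRACE_SECONDS):
    """Return True once an app is old enough for a failure to be flagged.

    created_at and now are Unix timestamps in seconds. Treating exactly
    `grace` seconds as eligible is an assumption of this sketch.
    """
    if now is None:
        now = time.time()
    return (now - created_at) >= grace


created = time.time() - 120          # app created two minutes ago
print(eligible_for_check(created))   # still inside the grace period
```

This is why running oo-admin-chk immediately after deleting the gear directory would say nothing about the new app either way.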


Actual results:
oo-admin-chk succeeds and does not recognize that the gear is missing.


# oo-admin-chk
Started at: 2015-11-13 16:01:03 UTC

User data populated in 0 seconds

Domain data populated in 0 seconds

District data populated in 0 seconds

Total gears found in mongo: 2
Application data populated in 0 seconds

Usage data populated in 0 seconds

Fetched all gears in 20 seconds
Total gears found on the nodes: 2
Total nodes that responded: 2
Checked application gears on nodes in 0 seconds

Checked application gears on nodes (reverse match) in 0 seconds


Finished at: 2015-11-13 16:01:23 UTC
Total time: 20.434s
SUCCESS


Expected results:
oo-admin-chk should fail and report that the gear UUID does not exist on any node.

# oo-admin-chk
Started at: 2015-11-13 16:01:03 UTC

User data populated in 0 seconds

Domain data populated in 0 seconds

District data populated in 0 seconds

Total gears found in mongo: 2
Application data populated in 0 seconds

Usage data populated in 0 seconds

Fetched all gears in 20 seconds
Total gears found on the nodes: 1
Total nodes that responded: 1
Checked application gears on nodes in 0 seconds

Checked application gears on nodes (reverse match) in 0 seconds


Finished at: 2015-11-13 16:01:23 UTC
Total time: 20.434s
Gear 564603c95f4834200d00001f does not exist on any node
Please see https://access.redhat.com/site/solutions/712593 for more information.
FAILED
Please refer to the oo-admin-repair tool man page to resolve some of these inconsistencies if no suggestion was provided with any error message(s).

Additional info:

The actual output may not contain the same error messages as the expected output above until https://bugzilla.redhat.com/show_bug.cgi?id=1111598 has been completed. It should still report that the gear UUID does not exist on any node.

Comment 1 Johnny Liu 2015-11-18 10:27:27 UTC
Running "rm -rf /var/lib/openshift/<UUID>" does not trigger the error message.

However, deleting the gear UUID entry from /etc/passwd (or running "userdel -f -r <uuid>") does trigger the error.

Comment 2 Rory Thrasher 2015-12-04 20:09:23 UTC
After some exploration, it looks like oo-admin-chk previously had a check for the gear directory. It was removed for speed and because the same check already exists in oo-accept-node <https://github.com/openshift/origin-server/blob/master/node-util/sbin/oo-accept-node#L594>.

QE: Please verify that oo-accept-node catches the case where a user exists in /etc/passwd but has no corresponding gear directory in /var/lib/openshift/.

1. Create any app
rhc create-app myapp python-3.3

2. Manually delete the app from the node
rm -rf /var/lib/openshift/GEARUUID

3. Run oo-accept-node on the node

If oo-accept-node reports a failure like the one below, then the check exists and this bug can be closed as NOTABUG.

FAIL: user {gear_uuid} does not have a home directory /var/lib/openshift/{gear_uuid}
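
For readers who want to see the shape of such a check, the following Python sketch looks for users whose home directory should live under /var/lib/openshift but is missing. It is only an approximation under assumptions: oo-accept-node is a separate tool and its actual logic is at the link above, and the helper name `users_missing_home` and its parameters are hypothetical.

```python
import os
import pwd

GEAR_BASE_DIR = "/var/lib/openshift"  # gear directory root used in this report


def users_missing_home(entries, base_dir=GEAR_BASE_DIR, isdir=os.path.isdir):
    """Return FAIL lines for users whose home under base_dir is missing.

    entries is an iterable of (user_name, home_directory) pairs.
    isdir is injectable so the logic can be exercised without real gears.
    """
    prefix = base_dir.rstrip("/") + "/"
    failures = []
    for name, home in entries:
        # Only consider accounts whose home lives under the gear base dir.
        if home.startswith(prefix) and not isdir(home):
            failures.append(
                "FAIL: user %s does not have a home directory %s" % (name, home)
            )
    return failures


if __name__ == "__main__":
    # Read the local passwd database; prints nothing on a healthy host.
    passwd_entries = [(p.pw_name, p.pw_dir) for p in pwd.getpwall()]
    for line in users_missing_home(passwd_entries):
        print(line)
```

Because the check starts from the passwd entries, it catches exactly the case in this report: the user still exists, but "rm -rf" removed the directory.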

Comment 3 Johnny Liu 2015-12-07 03:06:53 UTC
Yeah, oo-accept-node reports a failure:
#  oo-accept-node
FAIL: user jialiu-php53app-1 does not have a home directory /var/lib/openshift/jialiu-php53app-1
FAIL: Gear does not have an OPENSHIFT_GEAR_DNS variable: 'jialiu-php53app-1'
2 ERRORS


So, per comment 2, closing this as NOTABUG.

