Bug 1281912

Summary: oo-admin-chk doesn't recognize when gear no longer exists on node
Product: OpenShift Container Platform
Reporter: Rory Thrasher <rthrashe>
Component: Node
Assignee: Rory Thrasher <rthrashe>
Status: CLOSED NOTABUG
QA Contact: Jianwei Hou <jhou>
Severity: low
Docs Contact:
Priority: low
Version: 2.2.0
CC: aos-bugs, jialiu, jokerman, mmccomas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-07 03:06:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Description Rory Thrasher 2015-11-13 19:04:36 UTC
Description of problem:

When oo-admin-chk runs, it has each node loop through its gear users without checking whether each gear actually exists on disk.  If a gear's directory is deleted without removing its user, oo-admin-chk will not recognize that the gear no longer exists on any node.
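
The behaviour described above can be illustrated with a minimal Python sketch. It is not the actual oo-admin-chk code: the helper name gear_uuids_from_passwd, the sample passwd entries, and the rule that a gear user is any account whose home directory sits directly under /var/lib/openshift are all assumptions made for illustration.

```python
import posixpath

GEAR_BASE = "/var/lib/openshift"  # gear base directory used in this report


def gear_uuids_from_passwd(passwd_text, gear_base=GEAR_BASE):
    """Return names of users whose home directory sits directly under gear_base.

    Hypothetical helper: only the passwd entries are consulted and the
    filesystem is never touched, so a deleted gear directory goes unnoticed.
    """
    uuids = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        name, home = fields[0], fields[5]
        if posixpath.dirname(home.rstrip("/")) == gear_base:
            uuids.append(name)
    return uuids


# Made-up sample entries; only the gear UUID is taken from this report.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
564603c95f4834200d00001f:x:1000:1000:gear user:/var/lib/openshift/564603c95f4834200d00001f:/bin/bash
"""

print(gear_uuids_from_passwd(SAMPLE_PASSWD))
```

Because the sketch never looks at the filesystem, the gear is still listed after rm -rf removes its directory, and it only disappears once the passwd entry is removed.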







How reproducible:
Always

Steps to Reproduce:
1. Create any app
rhc create-app myapp python-3.3

2. Manually delete the app from the node
rm -rf /var/lib/openshift/GEARUUID

3. Wait 10 minutes (oo-admin-chk doesn't flag failures until 600 seconds after app creation to allow the app to finish starting up)

4. Run oo-admin-chk


Actual results:
oo-admin-chk succeeds and does not recognize that the gear is missing.


# oo-admin-chk
Started at: 2015-11-13 16:01:03 UTC

User data populated in 0 seconds

Domain data populated in 0 seconds

District data populated in 0 seconds

Total gears found in mongo: 2
Application data populated in 0 seconds

Usage data populated in 0 seconds

Fetched all gears in 20 seconds
Total gears found on the nodes: 2
Total nodes that responded: 2
Checked application gears on nodes in 0 seconds

Checked application gears on nodes (reverse match) in 0 seconds


Finished at: 2015-11-13 16:01:23 UTC
Total time: 20.434s
SUCCESS


Expected results:
oo-admin-chk should fail and inform the user that Gear UUID does not exist on any nodes.

# oo-admin-chk
Started at: 2015-11-13 16:01:03 UTC

User data populated in 0 seconds

Domain data populated in 0 seconds

District data populated in 0 seconds

Total gears found in mongo: 2
Application data populated in 0 seconds

Usage data populated in 0 seconds

Fetched all gears in 20 seconds
Total gears found on the nodes: 1
Total nodes that responded: 1
Checked application gears on nodes in 0 seconds

Checked application gears on nodes (reverse match) in 0 seconds


Finished at: 2015-11-13 16:01:23 UTC
Total time: 20.434s
Gear 564603c95f4834200d00001f does not exist on any node
Please see https://access.redhat.com/site/solutions/712593 for more information.
FAILED
Please refer to the oo-admin-repair tool man page to resolve some of these inconsistencies if no suggestion was provided with any error message(s).
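
The failure line in the expected output amounts to comparing the gears recorded in mongo with the gears the nodes report, while skipping recently created gears (the 600-second window from step 3). A rough Python sketch of that comparison follows; find_missing_gears, its parameters, and the data shapes are hypothetical and are not taken from oo-admin-chk.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(seconds=600)  # window mentioned in step 3 of this report


def find_missing_gears(mongo_gears, node_gear_uuids, now):
    """Return UUIDs recorded in mongo that no node reported.

    mongo_gears: dict mapping gear UUID -> creation time (hypothetical shape).
    node_gear_uuids: set of UUIDs collected from all responding nodes.
    Gears younger than GRACE_PERIOD are skipped so that apps which are
    still starting up are not flagged.
    """
    missing = []
    for uuid, created_at in sorted(mongo_gears.items()):
        if now - created_at < GRACE_PERIOD:
            continue
        if uuid not in node_gear_uuids:
            missing.append(uuid)
    return missing


now = datetime(2015, 11, 13, 16, 1, 23, tzinfo=timezone.utc)
mongo_gears = {
    "564603c95f4834200d00001f": now - timedelta(minutes=30),
    "example-new-gear": now - timedelta(seconds=60),   # made-up, inside the grace period
    "example-healthy-gear": now - timedelta(hours=2),  # made-up, present on a node
}
node_gear_uuids = {"example-healthy-gear"}

for uuid in find_missing_gears(mongo_gears, node_gear_uuids, now):
    print(f"Gear {uuid} does not exist on any node")
```

The comparison only helps if the node-side list reflects what is really on disk; if the nodes build that list from gear users alone, a deleted directory never shows up as a difference.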

Additional info:

The actual output may not contain the same error messages as the expected output above until https://bugzilla.redhat.com/show_bug.cgi?id=1111598 has been completed.  It should still mention that the gear UUID does not exist on any node.

Comment 1 Johnny Liu 2015-11-18 10:27:27 UTC
"rm -rf /var/lib/openshift/<UUID>" does not trigger error message.

However, deleting the gear UUID entry from /etc/passwd (or running userdel -f -r <uuid>) does trigger the error.

Comment 2 Rory Thrasher 2015-12-04 20:09:23 UTC
After some exploration, it looks like oo-admin-chk previously had a check for the gear directory.  It was removed for speed and because the same check already exists in oo-accept-node <https://github.com/openshift/origin-server/blob/master/node-util/sbin/oo-accept-node#L594>.

QE:  Please verify that oo-accept-node will catch an instance of a user existing in /etc/passwd but not having a corresponding gear directory in /var/lib/openshift/.

1. Create any app
rhc create-app myapp python-3.3

2. Manually delete the app from the node
rm -rf /var/lib/openshift/GEARUUID

3. Run oo-accept-node on the node host

If oo-accept-node reports a failure like below, then this check exists and this bug is notabug.

FAIL: user {gear_uuid} does not have a home directory /var/lib/openshift/{gear_uuid}
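
The kind of check described in this comment can be sketched in Python as follows. This is an illustration only and not the oo-accept-node code: the function check_gear_homes and its (user, home) pair input are hypothetical, and only the message wording is copied from the FAIL line above.

```python
import os
import tempfile


def check_gear_homes(gear_users):
    """Return one FAIL message per gear user whose home directory is missing.

    gear_users: iterable of (user_name, home_directory) pairs (hypothetical shape).
    """
    failures = []
    for user, home in gear_users:
        if not os.path.isdir(home):
            failures.append(f"FAIL: user {user} does not have a home directory {home}")
    return failures


# Demonstration in a temporary directory standing in for /var/lib/openshift.
with tempfile.TemporaryDirectory() as gear_base:
    present = os.path.join(gear_base, "gear-present")
    deleted = os.path.join(gear_base, "gear-deleted")
    os.mkdir(present)
    os.mkdir(deleted)
    os.rmdir(deleted)  # simulates step 2: the directory is gone, the user remains

    users = [("gear-present", present), ("gear-deleted", deleted)]
    results = check_gear_homes(users)
    for message in results:
        print(message)
```

Only the user whose directory was removed is reported, which is the situation produced by step 2 above.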

Comment 3 Johnny Liu 2015-12-07 03:06:53 UTC
Yeah, oo-accept-node reports a failure:
#  oo-accept-node
FAIL: user jialiu-php53app-1 does not have a home directory /var/lib/openshift/jialiu-php53app-1
FAIL: Gear does not have an OPENSHIFT_GEAR_DNS variable: 'jialiu-php53app-1'
2 ERRORS


So, per comment 2, closing this as "NOTABUG".