Description of problem:
When oo-admin-chk is run, it has the nodes loop through their gear users and doesn't actually check if the gear exists. If the gear is deleted without removing the user, then oo-admin-chk will not recognize that the gear doesn't exist on any nodes.

How reproducible:
Always

Steps to Reproduce:
1. Create any app
   rhc create-app myapp python-3.3
2. Manually delete the app from the node
   rm -rf /var/lib/openshift/GEARUUID
3. Wait 10 minutes (oo-admin-chk doesn't flag failures until 600 seconds after app creation to allow the app to finish starting up)
4. Run oo-admin-chk

Actual results:
oo-admin-chk succeeds and does not recognize that the gear is missing.

# oo-admin-chk
Started at: 2015-11-13 16:01:03 UTC
User data populated in 0 seconds
Domain data populated in 0 seconds
District data populated in 0 seconds
Total gears found in mongo: 2
Application data populated in 0 seconds
Usage data populated in 0 seconds
Fetched all gears in 20 seconds
Total gears found on the nodes: 2
Total nodes that responded: 2
Checked application gears on nodes in 0 seconds
Checked application gears on nodes (reverse match) in 0 seconds
Finished at: 2015-11-13 16:01:23 UTC
Total time: 20.434s
SUCCESS

Expected results:
oo-admin-chk should fail and inform the user that Gear UUID does not exist on any nodes.

# oo-admin-chk
Started at: 2015-11-13 16:01:03 UTC
User data populated in 0 seconds
Domain data populated in 0 seconds
District data populated in 0 seconds
Total gears found in mongo: 2
Application data populated in 0 seconds
Usage data populated in 0 seconds
Fetched all gears in 20 seconds
Total gears found on the nodes: 1
Total nodes that responded: 1
Checked application gears on nodes in 0 seconds
Checked application gears on nodes (reverse match) in 0 seconds
Finished at: 2015-11-13 16:01:23 UTC
Total time: 20.434s
Gear 564603c95f4834200d00001f does not exist on any node
Please see https://access.redhat.com/site/solutions/712593 for more information.
FAILED
Please refer to the oo-admin-repair tool man page to resolve some of these inconsistencies if no suggestion was provided with any error message(s).

Additional info:
May not contain the same error messages as the expected output until https://bugzilla.redhat.com/show_bug.cgi?id=1111598 has been completed. Still should mention that Gear UUID does not exist on any node.
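To illustrate the behaviour described above, here is a minimal Python sketch of a node-side gear listing that is driven only by gear users, and a broker-side comparison against the gears recorded in mongo. It is not taken from the oo-admin-chk source. The function names are hypothetical, and the assumption that a gear user is a passwd entry whose home directory is under /var/lib/openshift is mine, based on the paths in this report.

```python
GEAR_BASE_DIR = "/var/lib/openshift"  # gear base directory as used in this report


def gears_reported_by_node(passwd_lines, base_dir=GEAR_BASE_DIR):
    """Hypothetical node-side listing: derive gears from passwd entries only.

    The filesystem is never consulted, so a gear whose directory was removed
    with rm -rf is still reported as long as its user entry remains.
    """
    prefix = base_dir.rstrip("/") + "/"
    gears = set()
    for line in passwd_lines:
        fields = line.strip().split(":")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        name, home = fields[0], fields[5]
        if home.startswith(prefix):
            gears.add(name)
    return gears


def gears_missing_from_nodes(mongo_gears, node_gears):
    """Hypothetical broker-side check: gears in mongo that no node reported."""
    return sorted(set(mongo_gears) - set(node_gears))
```

Under these assumptions, deleting only the gear directory leaves the comparison empty, while removing the user entry makes the gear show up as missing.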
"rm -rf /var/lib/openshift/<UUID>" does not trigger the error message, but deleting the gear UUID entry from /etc/passwd (or running "userdel -f -r <uuid>") does trigger it.
After some exploration, it looks like there was previously a check for the gear directory in oo-admin-chk. It was removed for speed purposes and because the check existed elsewhere in oo-accept-node <https://github.com/openshift/origin-server/blob/master/node-util/sbin/oo-accept-node#L594>.

QE: Please verify that oo-accept-node will catch an instance of a user existing in /etc/passwd but not having a corresponding gear directory in /var/lib/openshift/

1. Create any app
   rhc create-app myapp python-3.3
2. Manually delete the app from the node
   rm -rf /var/lib/openshift/GEARUUID
3. Run oo-accept-node from the broker

If oo-accept-node reports a failure like the one below, then this check exists and this bug is NOTABUG.

FAIL: user {gear_uuid} does not have a home directory /var/lib/openshift/{gear_uuid}
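For reference, the kind of check being asked about can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the oo-accept-node code: the function name and the injectable `isdir` parameter are hypothetical, and gear users are assumed to be passwd entries whose home directory is under /var/lib/openshift.

```python
import os

GEAR_BASE_DIR = "/var/lib/openshift"  # gear base directory as used in this report


def users_missing_home(passwd_lines, base_dir=GEAR_BASE_DIR, isdir=os.path.isdir):
    """Return failure messages for gear users whose home directory is gone.

    passwd_lines: iterable of lines in /etc/passwd format.
    isdir: directory test, injectable so the check can be exercised
           without touching the real filesystem (hypothetical parameter).
    """
    prefix = base_dir.rstrip("/") + "/"
    failures = []
    for line in passwd_lines:
        fields = line.strip().split(":")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        name, home = fields[0], fields[5]
        # Only users homed under the gear base directory are treated as gears.
        if home.startswith(prefix) and not isdir(home):
            failures.append(
                "FAIL: user %s does not have a home directory %s" % (name, home)
            )
    return failures


if __name__ == "__main__":
    with open("/etc/passwd") as passwd:
        for message in users_missing_home(passwd):
            print(message)
```

The message format mirrors the FAIL line quoted above; everything else in the sketch is an assumption.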
Yeah, oo-accept-node reports a failure:

# oo-accept-node
FAIL: user jialiu-php53app-1 does not have a home directory /var/lib/openshift/jialiu-php53app-1
FAIL: Gear does not have an OPENSHIFT_GEAR_DNS variable: 'jialiu-php53app-1'
2 ERRORS

So according to comment 2, close it as "NOTABUG".