Red Hat Bugzilla – Bug 965285
[oo-accept-node] httpd config references UUID without associated gear
Last modified: 2017-11-08 17:14:46 EST
Description of problem:
oo-accept-node is detecting the following issue and throwing the error:
FAIL: httpd config references UUID without associated gear: '519155be5004466472000232'
The gear is deleted in mongo, and the user and gear directory are not present, but some frontend configuration seems to be left behind.
sudo grep -r -l 519155be5004466472000232 /var/lib/openshift/.httpd.d
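For checking more than one UUID at a time, the same idea can be sketched as a small shell function. This is only an illustration, not how oo-accept-node itself performs the check. It assumes (not confirmed in this report) that entries in the httpd config directory are named "<uuid>_..." and that each live gear has a directory named after its UUID under the gear base directory. The function name find_stale_frontend_uuids is hypothetical.

```shell
# Print each gear UUID that has an httpd config entry but no gear directory.
# Assumption: config entries are named "<uuid>_<rest>", and a live gear has
# a directory "<gear_base>/<uuid>". Neither is confirmed by this report.
find_stale_frontend_uuids() {
    httpd_dir=$1
    gear_base=$2
    for entry in "$httpd_dir"/*; do
        [ -e "$entry" ] || continue
        name=${entry##*/}      # strip the directory part
        uuid=${name%%_*}       # keep everything before the first underscore
        # Skip names that do not start with a plausible hex UUID.
        case $uuid in
            ''|*[!0-9a-f]*) continue ;;
        esac
        [ "${#uuid}" -ge 24 ] || continue
        # Report the UUID when no matching gear directory exists.
        [ -d "$gear_base/$uuid" ] || printf '%s\n' "$uuid"
    done | sort -u
}
```

On a node this would be called as `find_stale_frontend_uuids /var/lib/openshift/.httpd.d /var/lib/openshift`, probably under sudo as with the grep above.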
Please send me any mcollective and broker logs pertaining to that gear for analysis. Thanks!
Got the logs - thanks!
Ok, I'm not seeing a cause for this gear to have a lingering front-end configuration.
The new rhc-fix-stale-frontend script can accept a UUID as an argument. It will go to production at the end of this sprint, but it should work if it's copied to the ex node early to get rid of this gear.
I believe there's nothing else that can be done for this ticket so closing it as errata (use the new rhc-fix-stale-frontend). Please re-open if you would like it explored further.
We're seeing around 4 new instances of this daily in PROD.
Please take a look again at what could possibly be causing this.
It seems to be happening sporadically on gear destroy. Also, this is the only thing left of the gear.
Let's pick the most gear that exhibits this problem and send all of the following:
1. Any broker logs pertaining to that application.
2. The mcollective logs from that ex node.
3. /var/log/messages, /var/log/audit/audit.log, /var/log/secure from the ex node.
4. The complete contents of /var/log/openshift from the ex node.
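Gathering the node-side files listed above could be sketched roughly as follows. The function name collect_logs and the archive path are hypothetical. Paths that do not exist are skipped with a warning, so the same call works on nodes that lack one of the logs.

```shell
# Bundle the given log paths into a gzip-compressed tar archive.
# Usage: collect_logs ARCHIVE PATH...
# Missing paths are skipped with a warning on stderr.
collect_logs() {
    archive=$1
    shift
    # Rotate through the arguments, keeping only the paths that exist.
    for p in "$@"; do
        shift
        if [ -e "$p" ]; then
            set -- "$@" "$p"
        else
            echo "skipping missing path: $p" >&2
        fi
    done
    # Fail rather than create an empty archive.
    [ "$#" -gt 0 ] || return 1
    tar -czf "$archive" "$@"
}
```

For example: `collect_logs /tmp/gear-logs.tar.gz /var/log/messages /var/log/audit/audit.log /var/log/secure /var/log/openshift`. The location of the mcollective logs is not given in this report, so that path would have to be added by whoever runs it.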
Sorry, that should read "most recent gear that exhibits the problem".
Last week we were seeing around 4 of these a day. Now, however, there are far fewer (although it is still happening).
Since the data he requested is both large and very secret, I've sent Rob an e-mail telling him how he can download it.
Removing need info as I gave the info in comment 7.
Taking off the build blocker list but keeping it as the high priority item on my plate.
Complete system logs were provided for a specific gear and it was determined that the gear account was removed by hand (ex: by calling userdel XXXXXXXXXXXX). This would leave a stale front-end Apache configuration in place. I have a query out to find out more about why the gear was deleted.
It looks like the broker purged the application in question from mongodb when the application connector hooks failed to run.
Release ticket updated to request the use of oo-app-destroy instead of userdel to purge stale gears.
Waiting on more information to see why the application hook calls failed.
Need another gear which has a stale front-end configuration but was not destroyed by ops with "userdel".
The second round of fixes for this class of issue addressed problems in oo-accept-node resulting from the v1 -> v2 migration.
This is impossible to Q/E so I'm moving it directly to closed. Please re-open if a large number of them show up; otherwise, we'll deal with small numbers of them as they happen.