Red Hat Bugzilla – Bug 1273658
Failure while deleting gear: '' is not a legal cartridge identifier
Last modified: 2016-01-29 11:46:40 EST
Created attachment 1084935
Summary of affected INT apps
Description of problem:
In the INT environment, I'm seeing these accept-node errors on all std nodes:
OpenShift::MissingElementError error reading /var/lib/openshift/562226120e78864f6700019b/php/metadata/manifest.yml: Version is a required element
The error appears to be occurring with failed app-destroys.
[firstname.lastname@example.org ~]# grep 562226120e78864f6700019b /var/log/openshift/node/platform.log |grep delet
October 20 17:21:48 INFO [request_id=1b4417fdcb3fa725035d2c10d3eeef38,app_uuid=562226120e78864f6700019b] Failure while deleting gear 562226120e78864f6700019b: '' is not a legal cartridge identifier
There are many of these errors in the platform logs, as it tries to delete the app throughout the day.
The issue can be fixed by running 'oo-admin-gear destroygear -c $UUID' to remove the gear. Then the app destroy finishes automatically.
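For a node with many stuck gears, the workaround above could be prepared in bulk. The following is a minimal Ruby sketch, not an existing tool: it assumes the platform.log line format quoted in this report and 24-character hexadecimal gear UUIDs, and the helper names `failed_gear_uuids` and `cleanup_commands` are hypothetical. It only prints the commands so that they can be reviewed before anything is run.

```ruby
# Assumption: gear UUIDs are 24 hex characters, as in the log lines quoted in this report.
FAILED_DELETE = /Failure while deleting gear (\h{24}):/

# Returns the unique gear UUIDs that have a failed-delete log line, in first-seen order.
def failed_gear_uuids(lines)
  lines.map { |line| line[FAILED_DELETE, 1] }.compact.uniq
end

# Builds (but does not execute) one destroygear command per affected gear.
def cleanup_commands(lines)
  failed_gear_uuids(lines).map { |uuid| "oo-admin-gear destroygear -c #{uuid}" }
end

# Sample input taken from the log line quoted in this report.
sample = [
  "October 20 17:21:48 INFO [request_id=1b4417fdcb3fa725035d2c10d3eeef38," \
  "app_uuid=562226120e78864f6700019b] Failure while deleting gear " \
  "562226120e78864f6700019b: '' is not a legal cartridge identifier"
]
puts cleanup_commands(sample)
```

On a real node the same helpers could be fed `File.foreach('/var/log/openshift/node/platform.log')` instead of the sample array.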
Version-Release number of selected component (if applicable):
It appears on the most heavily used nodes in INT: use-std-node1, 2, and 3. It is not yet present in STG.
Steps to Reproduce:
1. Create apps in INT as part of regular QE testing.
2. Ops can run 'oo-accept-node -v' to look for "Version is a required element"
Actual results:
App is never deleted. MCollective keeps retrying the delete indefinitely.
Expected results:
Apps should finish deleting, even if '' is an illegal cartridge identifier.
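Step 2 above can be partly automated. The sketch below is an illustration in Ruby, assuming that the 'oo-accept-node -v' output contains error lines in the same form as the one quoted in the description; the helper name `broken_manifests` is hypothetical. It extracts the gear UUID and cartridge directory from each matching line.

```ruby
# Assumption: error lines look like the MissingElementError line quoted in this report.
MANIFEST_ERROR = %r{error reading (/var/lib/openshift/(\h+)/([^/]+)/metadata/manifest\.yml): Version is a required element}

# Returns one hash per matching line of the given output text.
def broken_manifests(output)
  matches = output.each_line.map { |line| MANIFEST_ERROR.match(line) }
  matches.compact.map { |m| { path: m[1], gear_uuid: m[2], cartridge_dir: m[3] } }
end

# Sample input taken from the error quoted in this report.
sample = "OpenShift::MissingElementError error reading " \
         "/var/lib/openshift/562226120e78864f6700019b/php/metadata/manifest.yml: " \
         "Version is a required element\n"
broken_manifests(sample).each do |entry|
  puts "#{entry[:gear_uuid]} #{entry[:cartridge_dir]} #{entry[:path]}"
end
```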
Commit pushed to master at https://github.com/openshift/origin-server
FrontendHttpServer: Recover from missing manifest
OpenShift::Runtime::FrontendHttpServer#initialize: Rescue any exception
from initializing the cartridge model, and set @standalone_web_proxy to
false in that contingency.
Before this commit, failure to initialize the cartridge model would cause
a failure to initialize the frontend http server, which would cause
a failure to initialize the container plugin, which would prevent the
container plugin's destroy method from finishing. Consequently, it was
impossible to delete a gear with a bad manifest.yml file.
This commit fixes bug 1273658.
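The recovery pattern described in the commit message can be illustrated with a small standalone Ruby sketch. This is not the origin-server code: the classes `MissingElementError`, `CartridgeModelSketch`, and `FrontendSketch`, and the 'standalone_web_proxy' hash key, are hypothetical stand-ins, and rescuing StandardError is an assumption about what "any exception" means in the commit.

```ruby
# Hypothetical stand-in for the error raised on a bad manifest.yml.
class MissingElementError < StandardError; end

# Hypothetical stand-in for the cartridge model: refuses a manifest without Version.
class CartridgeModelSketch
  def initialize(manifest)
    unless manifest.is_a?(Hash) && manifest['Version']
      raise MissingElementError, 'Version is a required element'
    end
    @manifest = manifest
  end

  # 'standalone_web_proxy' is an illustrative key, not a real manifest element.
  def standalone_web_proxy?
    @manifest['standalone_web_proxy'] == true
  end
end

# Hypothetical stand-in for the frontend server object.
class FrontendSketch
  attr_reader :standalone_web_proxy

  def initialize(manifest)
    @standalone_web_proxy = CartridgeModelSketch.new(manifest).standalone_web_proxy?
  rescue StandardError => e
    # Fall back to false so that construction succeeds and destroy can still run.
    warn "cartridge model unavailable: #{e.message}"
    @standalone_web_proxy = false
  end

  def destroy
    :destroyed
  end
end

frontend = FrontendSketch.new(nil) # nil models an empty manifest.yml
puts frontend.standalone_web_proxy  # false
puts frontend.destroy               # destroyed
```

The point of the fallback is that object construction no longer depends on a valid manifest, so the later destroy step stays reachable.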
1. Create an app.
2. Delete the lines below in manifest.yml.
3. rhc delete app PHP
4. The app is deleted successfully.
5. Check the log:
grep 56318c088636d89b2a000051 /var/log/openshift/node/platform.log
October 28 23:01:41 INFO Shell command 'quota -p --always-resolve -w 56318c088636d89b2a000051' ran. rc=0 out=Disk quotas for user 56318c088636d89b2a000051 (uid 1000):
October 28 23:12:36 WARN Failure while deleting gear 56318c088636d89b2a000051: Version is a required element
October 28 23:12:36 INFO Failure while deleting gear 56318c088636d89b2a000051: Version is a required element
October 28 23:12:36 INFO Shell command 'rm /var/lib/openshift/.last_access/56318c088636d89b2a000051' ran. rc=1 out=
1802249 600 56318c088636d89b2a000051 56318c088636d89b2a000051 56318c088636d89b2a000051 56318c088636d89b2a000051
October 28 23:12:38 INFO Shell command 'userdel --remove -f "56318c088636d89b2a000051"' ran. rc=0 out=
Could you help confirm that these steps are OK?
To test, I truncated manifest.yml so that it was empty, which was the initial cause of the problem reported (at least it was for some gears we looked at). However, simply deleting the "Version:" field should trigger the same error.
You should still see the error message in the logs. However, the node runtime should continue after it encounters the error and ultimately delete the gear, so the gear should be gone (and /var/lib/openshift/56318c088636d89b2a000051 should have been removed, along with frontend configuration etc.) after the rhc command finishes.
Other than that, your verification procedure looks good.
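The final check described above, that the gear directory is gone once the rhc command finishes, could be scripted roughly as follows. This is a minimal Ruby sketch: the helper name `gear_removed?` is hypothetical, and it checks only the gear directory under /var/lib/openshift, not the frontend configuration.

```ruby
# Returns true when no directory (or file) for the gear exists under base_dir.
def gear_removed?(uuid, base_dir = '/var/lib/openshift')
  !File.exist?(File.join(base_dir, uuid))
end

uuid = '56318c088636d89b2a000051' # gear UUID from the verification log above
puts gear_removed?(uuid) ? "gear #{uuid} removed" : "gear #{uuid} still present"
```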
Thanks for the info.