Bug 1273658 - Failure while deleting gear: '' is not a legal cartridge identifier
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Miciah Dashiel Butler Masters
QA Contact: chaoyang
Depends On:
Blocks:
Reported: 2015-10-20 18:13 EDT by Stefanie Forrester
Modified: 2016-01-29 11:46 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-29 11:46:40 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments:
Summary of affected INT apps (7.47 KB, text/plain)
2015-10-20 18:13 EDT, Stefanie Forrester
Description Stefanie Forrester 2015-10-20 18:13:41 EDT
Created attachment 1084935
Summary of affected INT apps

Description of problem:
In the INT environment, I'm seeing these accept-node errors on all std nodes:

OpenShift::MissingElementError error reading /var/lib/openshift/562226120e78864f6700019b/php/metadata/manifest.yml: Version is a required element

The error appears to be occurring with failed app-destroys.

[root@use-std-node3.int ~]# grep 562226120e78864f6700019b /var/log/openshift/node/platform.log |grep delet
October 20 17:21:48 INFO [request_id=1b4417fdcb3fa725035d2c10d3eeef38,app_uuid=562226120e78864f6700019b] Failure while deleting gear 562226120e78864f6700019b: '' is not a legal cartridge identifier

There are many of these errors in the platform logs, as it tries to delete the app throughout the day.

The issue can be fixed by running 'oo-admin-gear destroygear -c $UUID' to remove the gear. Then the app destroy finishes automatically.
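
To see how many gears are affected before applying the workaround, the failing UUIDs can be pulled out of the platform log. The following is a minimal Ruby sketch, assuming the log line format quoted above. The helper name `failed_gear_uuids` and the constant `FAILURE_PATTERN` are hypothetical and not part of any OpenShift tool.

```ruby
require 'set'

# Hypothetical helper (not part of OpenShift): collect the UUIDs of gears
# whose deletion failed, based on the log line format quoted in this report.
FAILURE_PATTERN = /Failure while deleting gear (\h+):/

def failed_gear_uuids(lines)
  uuids = Set.new
  lines.each do |line|
    match = FAILURE_PATTERN.match(line)
    # Keep each UUID once, even though the log repeats the failure all day.
    uuids << match[1] if match
  end
  uuids.to_a
end

# Sample lines copied from the log excerpts in this report.
sample = [
  %q{October 20 17:21:48 INFO [request_id=1b4417fdcb3fa725035d2c10d3eeef38,app_uuid=562226120e78864f6700019b] Failure while deleting gear 562226120e78864f6700019b: '' is not a legal cartridge identifier},
  %q{October 28 23:12:38 INFO Shell command 'userdel --remove -f "56318c088636d89b2a000051"' ran. rc=0 out=}
]

puts failed_gear_uuids(sample)
```

Each UUID found this way could then be passed to 'oo-admin-gear destroygear -c $UUID' as described above.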

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.38.4-1.el6oso.noarch

How reproducible: 
It appears on the most heavily used nodes in INT (use-std-node1, 2, and 3). It is not yet present in STG.

Steps to Reproduce:
1. Create apps in INT as part of regular QE testing.
2. Ops can run 'oo-accept-node -v' to look for "Version is a required element"

Actual results:
The app is never deleted. MCollective keeps retrying the deletion indefinitely.

Expected results:
Apps should finish deleting, even if '' is an illegal cartridge identifier.

Additional info:
Comment 1 Miciah Dashiel Butler Masters 2015-10-21 19:55:30 EDT
PR: https://github.com/openshift/origin-server/pull/6285
Comment 2 openshift-github-bot 2015-10-22 17:45:23 EDT
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/650f8368aeeabc0e7a9757b3bdcf993962838efc
FrontendHttpServer: Recover from missing manifest

OpenShift::Runtime::FrontendHttpServer#initialize: Rescue any exception
from initializing the cartridge model, and set @standalone_web_proxy to
false in that contingency.

Before this commit, failure to initialize the cartridge model would cause
a failure to initialize the frontend http server, which would cause
a failure to initialize the container plugin, which would prevent the
container plugin's destroy method from finishing.  Consequently, it was
impossible to delete a gear with a bad manifest.yml file.

This commit fixes bug 1273658.
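
The recovery described in the commit message can be illustrated with a small stand-alone Ruby sketch. This is not the code from the commit: `StubCartridgeModel` and `SketchFrontendServer` are hypothetical stand-ins, and rescuing `StandardError` is an assumption about what "any exception" means here. Only the `@standalone_web_proxy` fallback to false is taken from the commit message.

```ruby
# Hypothetical stand-in for the cartridge model. It raises when the
# manifest is unusable, as happens with a bad manifest.yml.
class StubCartridgeModel
  def initialize(manifest_ok)
    raise ArgumentError, 'Version is a required element' unless manifest_ok
  end

  # Hypothetical query; the real model's interface may differ.
  def standalone_web_proxy?
    true
  end
end

# Hypothetical simplification of a frontend server object.
class SketchFrontendServer
  attr_reader :standalone_web_proxy

  def initialize(manifest_ok)
    model = StubCartridgeModel.new(manifest_ok)
    @standalone_web_proxy = model.standalone_web_proxy?
  rescue StandardError => e
    # Assumption: log and continue, so that a later destroy can proceed.
    warn "cartridge model unavailable: #{e.message}"
    @standalone_web_proxy = false
  end
end

puts SketchFrontendServer.new(true).standalone_web_proxy
puts SketchFrontendServer.new(false).standalone_web_proxy
```

With the rescue in place, constructing the object no longer fails when the model cannot be initialized, which is the property the destroy path depends on.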
Comment 3 chaoyang 2015-10-29 00:44:40 EDT
Verification steps:
1. Create an app.
2. Delete the following line from manifest.yml:
Version: '5.4'
3. rhc delete app PHP
4. The app is deleted successfully.
5. Check the log:
grep 56318c088636d89b2a000051 /var/log/openshift/node/platform.log

October 28 23:01:41 INFO Shell command 'quota -p --always-resolve -w 56318c088636d89b2a000051' ran. rc=0 out=Disk quotas for user 56318c088636d89b2a000051 (uid 1000): 
October 28 23:12:36 WARN Failure while deleting gear 56318c088636d89b2a000051: Version is a required element
October 28 23:12:36 INFO Failure while deleting gear 56318c088636d89b2a000051: Version is a required element
October 28 23:12:36 INFO Shell command 'rm /var/lib/openshift/.last_access/56318c088636d89b2a000051' ran. rc=1 out=
1802249    600        56318c088636d89b2a000051 56318c088636d89b2a000051 56318c088636d89b2a000051 56318c088636d89b2a000051
October 28 23:12:38 INFO Shell command 'userdel --remove -f "56318c088636d89b2a000051"' ran. rc=0 out=

version:
devenv-stage-1188

Could you please confirm that these steps are OK?
Comment 4 Miciah Dashiel Butler Masters 2015-10-29 12:32:27 EDT
To test, I truncated manifest.yml so that it was empty, which was the initial cause of the problem reported (at least it was for some gears we looked at).  However, simply deleting the "Version:" field should trigger the same error.

You should still see the error message in the logs.  However, the node runtime should continue after it encounters the error and ultimately delete the gear, so the gear should be gone (and /var/lib/openshift/56318c088636d89b2a000051 should have been removed, along with frontend configuration etc.) after the rhc command finishes.

Other than that, your verification procedure looks good.
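
Both test variants mentioned above (an empty manifest.yml and one without the "Version:" field) lack the required element, which plain YAML parsing can show. The following is a minimal Ruby sketch; `version_present?` is a hypothetical helper and does not reflect how the node runtime actually validates manifests, and the `Name: php` field in the sample is made up for illustration.

```ruby
require 'yaml'

# Hypothetical check: does the manifest text contain a top-level
# "Version" key? An empty document does not parse to a Hash at all.
def version_present?(manifest_text)
  data = YAML.safe_load(manifest_text)
  data.is_a?(Hash) && data.key?('Version')
end

complete  = "Name: php\nVersion: '5.4'\n"  # illustrative manifest
truncated = ""                              # file truncated to empty
no_field  = "Name: php\n"                   # "Version:" line deleted

puts version_present?(complete)
puts version_present?(truncated)
puts version_present?(no_field)
```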
Comment 5 chaoyang 2015-11-04 00:11:45 EST
Thanks for the info.
