Bug 1273658 - Failure while deleting gear: '' is not a legal cartridge identifier
Summary: Failure while deleting gear: '' is not a legal cartridge identifier
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Miciah Dashiel Butler Masters
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-10-20 22:13 UTC by Stefanie Forrester
Modified: 2016-01-29 16:46 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-29 16:46:40 UTC
Target Upstream Version:
Embargoed:


Attachments
Summary of affected INT apps (7.47 KB, text/plain)
2015-10-20 22:13 UTC, Stefanie Forrester

Description Stefanie Forrester 2015-10-20 22:13:41 UTC
Created attachment 1084935
Summary of affected INT apps

Description of problem:
In the INT environment, I'm seeing these accept-node errors on all std nodes:

OpenShift::MissingElementError error reading /var/lib/openshift/562226120e78864f6700019b/php/metadata/manifest.yml: Version is a required element

The error appears to occur on gears whose app destroy has failed.

[root ~]# grep 562226120e78864f6700019b /var/log/openshift/node/platform.log |grep delet
October 20 17:21:48 INFO [request_id=1b4417fdcb3fa725035d2c10d3eeef38,app_uuid=562226120e78864f6700019b] Failure while deleting gear 562226120e78864f6700019b: '' is not a legal cartridge identifier

There are many of these errors in the platform logs because the delete is retried throughout the day.
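
To see how many distinct gears are stuck, the UUIDs can be pulled out of the "Failure while deleting gear" lines. The following is only a minimal sketch: the helper name failed_delete_uuids and the regular expression are illustrative assumptions, not part of any OpenShift tool, and the sample lines are copied from the logs quoted in this bug.

```ruby
# Illustrative helper (not an OpenShift tool): collect the distinct gear UUIDs
# that appear in "Failure while deleting gear" log lines.
FAILED_DELETE = /Failure while deleting gear (\h+):/

def failed_delete_uuids(lines)
  # String#[] with a regexp and capture index returns nil for non-matching lines.
  lines.map { |line| line[FAILED_DELETE, 1] }.compact.uniq
end

sample = [
  "October 20 17:21:48 INFO [request_id=1b4417fdcb3fa725035d2c10d3eeef38,app_uuid=562226120e78864f6700019b] Failure while deleting gear 562226120e78864f6700019b: '' is not a legal cartridge identifier",
  "October 28 23:12:36 WARN Failure while deleting gear 56318c088636d89b2a000051: Version is a required element",
  "October 28 23:12:36 INFO Failure while deleting gear 56318c088636d89b2a000051: Version is a required element",
  "October 28 23:12:38 INFO Shell command 'userdel --remove -f \"56318c088636d89b2a000051\"' ran. rc=0 out="
]

puts failed_delete_uuids(sample)
```

On a node, the same helper could be fed File.foreach("/var/log/openshift/node/platform.log") instead of the sample array.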

The issue can be worked around by running 'oo-admin-gear destroygear -c $UUID' to remove the gear. The app destroy then finishes automatically.

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.38.4-1.el6oso.noarch

How reproducible:
It appears on the most heavily used nodes in INT (use-std-node1,2,3). It is not yet present in STG.

Steps to Reproduce:
1. Create apps in INT as part of regular QE testing.
2. On a node, run 'oo-accept-node -v' and look for "Version is a required element".

Actual results:
The app is never deleted. MCollective retries the delete indefinitely.

Expected results:
Apps should finish deleting, even if '' is an illegal cartridge identifier.

Additional info:

Comment 1 Miciah Dashiel Butler Masters 2015-10-21 23:55:30 UTC
PR: https://github.com/openshift/origin-server/pull/6285

Comment 2 openshift-github-bot 2015-10-22 21:45:23 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/650f8368aeeabc0e7a9757b3bdcf993962838efc
FrontendHttpServer: Recover from missing manifest

OpenShift::Runtime::FrontendHttpServer#initialize: Rescue any exception
from initializing the cartridge model, and set @standalone_web_proxy to
false in that contingency.

Before this commit, failure to initialize the cartridge model would cause
a failure to initialize the frontend http server, which would cause
a failure to initialize the container plugin, which would prevent the
container plugin's destroy method from finishing.  Consequently, it was
impossible to delete a gear with a bad manifest.yml file.

This commit fixes bug 1273658.
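
The recovery pattern described in the commit message can be sketched roughly as follows. This is not the origin-server code: ManifestError, CartridgeModel, FrontendServer, and the "Standalone-Web-Proxy" key are hypothetical stand-ins, and the sketch rescues StandardError where the commit message says "any exception". It only illustrates how falling back to false in the constructor lets a later destroy call proceed.

```ruby
# Hypothetical stand-ins for illustration; these are NOT the origin-server classes.
class ManifestError < StandardError; end

class CartridgeModel
  def initialize(manifest)
    # Mimic the failure from this bug: a manifest without a Version is rejected.
    raise ManifestError, "Version is a required element" unless manifest.key?("Version")
    @manifest = manifest
  end

  def standalone_web_proxy?
    # "Standalone-Web-Proxy" is an invented key used only for this sketch.
    @manifest.fetch("Standalone-Web-Proxy", false)
  end
end

class FrontendServer
  attr_reader :standalone_web_proxy

  def initialize(manifest)
    @standalone_web_proxy = CartridgeModel.new(manifest).standalone_web_proxy?
  rescue StandardError => e
    # Fall back instead of propagating, so the object can still be constructed.
    warn "cartridge model unavailable (#{e.message}); assuming no standalone web proxy"
    @standalone_web_proxy = false
  end

  def destroy
    :destroyed
  end
end

server = FrontendServer.new({}) # empty manifest, as with a truncated manifest.yml
puts server.standalone_web_proxy
p server.destroy
```

Without the rescue clause, FrontendServer.new({}) would raise, and destroy could never be called on the resulting object.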

Comment 3 Chao Yang 2015-10-29 04:44:40 UTC
Verification steps:
1. Create an app.
2. Delete the following line from manifest.yml:
Version: '5.4'
3. rhc delete app PHP
4. The app is deleted successfully.
5. Check the log:
grep 56318c088636d89b2a000051 /var/log/openshift/node/platform.log

October 28 23:01:41 INFO Shell command 'quota -p --always-resolve -w 56318c088636d89b2a000051' ran. rc=0 out=Disk quotas for user 56318c088636d89b2a000051 (uid 1000): 
October 28 23:12:36 WARN Failure while deleting gear 56318c088636d89b2a000051: Version is a required element
October 28 23:12:36 INFO Failure while deleting gear 56318c088636d89b2a000051: Version is a required element
October 28 23:12:36 INFO Shell command 'rm /var/lib/openshift/.last_access/56318c088636d89b2a000051' ran. rc=1 out=
1802249    600        56318c088636d89b2a000051 56318c088636d89b2a000051 56318c088636d89b2a000051 56318c088636d89b2a000051
October 28 23:12:38 INFO Shell command 'userdel --remove -f "56318c088636d89b2a000051"' ran. rc=0 out=

Version:
devenv-stage-1188

Could you confirm that these steps are OK?

Comment 4 Miciah Dashiel Butler Masters 2015-10-29 16:32:27 UTC
To test, I truncated manifest.yml so that it was empty, which was the initial cause of the problem reported (at least it was for some gears we looked at).  However, simply deleting the "Version:" field should trigger the same error.

You should still see the error message in the logs.  However, the node runtime should continue after it encounters the error and ultimately delete the gear, so the gear should be gone (and /var/lib/openshift/56318c088636d89b2a000051 should have been removed, along with frontend configuration etc.) after the rhc command finishes.

Other than that, your verification procedure looks good.
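
As a rough illustration of why both test setups hit the same error, the sketch below checks whether a manifest's text still carries a Version element. The helper manifest_has_version? and the sample manifest contents are assumptions made for this example. It is not the node's actual validation logic.

```ruby
require "yaml"

# Illustrative check (not the node's real validation): true when the manifest
# text parses to a mapping that has a non-empty Version element.
def manifest_has_version?(text)
  data = YAML.safe_load(text)
  data.is_a?(Hash) && !data["Version"].to_s.empty?
end

puts manifest_has_version?("")                            # truncated, empty manifest
puts manifest_has_version?("Name: php\n")                 # Version line deleted
puts manifest_has_version?("Name: php\nVersion: '5.4'\n") # intact manifest
```

The first two calls return false and the third returns true, which matches the observation that an empty manifest.yml and one with only the Version line removed both produce "Version is a required element".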

Comment 5 Chao Yang 2015-11-04 05:11:45 UTC
Thanks for the info.

