Created attachment 786189
mcollective log for deconfigure operation

Description of problem:
Applications are sometimes left halfway created. When this occurs, the deconfigure operation logs rescue errors:

E, [2013-08-12T23:01:51.094286 #3760] ERROR -- : openshift.rb:302:in `rescue in with_container_from_args'
User does not exist in cgroups: 5207709d5973cad178000078
User does not exist in cgroups: 5207709d5973cad178000078
E, [2013-08-12T23:01:51.236228 #3760] ERROR -- : openshift.rb:963:in `rescue in has_app_cartridge_action'
can't find user for 5207709d5973cad178000078
{"--with-app-uuid"=>"5207709d5973cad178000078", "--with-container-uuid"=>"5207709d5973cad178000078",

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-node-1.12.10-1.el6oso.noarch

How reproducible:
Sometimes

Steps to Reproduce:
1. Wait for an application to be halfway created
2. Review the mcollective logs

Actual results:
Application is left on the node

Expected results:
Application should be removed

Additional info:
See attached logfile
Could you elaborate on what you mean by 'Applications are sometimes halfway created'? Is there a way to reproduce this error? Even steps that do not reproduce it reliably would be better than guessing what you might have done when you saw this error.

The first sign of a problem is not the part you quoted. It is here:

E, [2013-08-12T23:01:27.193240 #3760] ERROR -- : openshift.rb:302:in `rescue in with_container_from_args'
CLIENT_ERROR: Unexpected error: User does not exist in cgroups: 5207709d5973cad178000078
CLIENT_ERROR: Unexpected error: User does not exist in cgroups: 5207709d5973cad178000078
{"--with-app-uuid"=>"5207709d5973cad178000078", "--with-container-uuid"=>"5207709d5973cad178000078",

Subsequent operations involving this user, including "deconfigure", would therefore fail. We need to figure out why the user doesn't exist.

After this is observed, what state is the application in? Does it exist? If so, can it be removed?
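To make it easier to find the earliest failure for a given gear, here is a rough Python sketch that scans a log and returns the first ERROR entry mentioning a UUID. It assumes the entries follow the logger layout seen in the excerpts above (a severity letter, a bracketed timestamp and pid, then the message, with any following unprefixed lines belonging to the same entry). The function name `first_error_for_uuid` and the regular expression are my own illustration, not part of openshift.rb or mcollective.

```python
import re

# Assumed entry header, modeled on the excerpts quoted in this bug, e.g.
# "E, [2013-08-12T23:01:27.193240 #3760] ERROR -- : openshift.rb:302:in ..."
ENTRY_START = re.compile(
    r"^(?P<sev>[A-Z]), \[(?P<ts>\S+) #\d+\]\s+\w+ -- : (?P<msg>.*)$"
)


def first_error_for_uuid(lines, uuid):
    """Return (timestamp, text) of the first ERROR entry mentioning uuid, else None.

    Hypothetical helper: lines without a header are treated as
    continuations of the previous entry.
    """
    entry = None  # [severity, timestamp, accumulated text]

    def is_hit(e):
        return e is not None and e[0] == "E" and uuid in e[2]

    for line in lines:
        m = ENTRY_START.match(line)
        if m:
            # A new header closes the previous entry, so check it now.
            if is_hit(entry):
                return entry[1], entry[2]
            entry = [m["sev"], m["ts"], m["msg"]]
        elif entry is not None:
            entry[2] += "\n" + line.rstrip("\n")

    # The last entry in the file has no following header to close it.
    if is_hit(entry):
        return entry[1], entry[2]
    return None
```

Something like `first_error_for_uuid(open("mcollective.log"), "5207709d5973cad178000078")` would then report the earliest matching entry, where `mcollective.log` is a placeholder for the attached logfile. The sketch relies on the log being in chronological order, so the first match in file order is taken as the earliest.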
Sten, have you had a chance to look at this?
This bz can be closed; we haven't been able to reproduce it at all. If it recurs, we'll open a new bz.