Bug 918764 - Unable to delete an app when userid is missing from /etc/passwd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Paul Morie
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-03-06 20:02 UTC by Thomas Wiest
Modified: 2015-05-14 23:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-15 13:53:20 UTC
Target Upstream Version:
Embargoed:


Description Thomas Wiest 2013-03-06 20:02:39 UTC
Description of problem:
Our monitoring in INT created an application, but is now unable to delete it. 


When I try to manually delete the app, I get this:
$ rhc app delete -k -p $pass --confirm -a chkexsrv2
Deleting application 'chkexsrv2' ... The server did not respond correctly. This may be an issue with the server configuration or with your connection to
the server (such as a Web proxy or firewall). Please verify that you can access the OpenShift server
https://localhost/broker/rest/domains/nagiosmonitor/applications/chkexsrv2



The mcollective log from this host looks like this:
I, [2013-03-06T14:55:51.016675 #13831]  INFO -- : openshift.rb:34:in `cartridge_do_action' cartridge_do_action call / action: cartridge_do, agent=openshift, data={:cartridge=>"openshift-origin-node",
 :action=>"app-destroy",
 :args=>
  {"--with-app-uuid"=>"5136b0c76cec0e399f0000df",
   "--with-app-name"=>"chkexsrv2",
   "--with-container-uuid"=>"5136b0c76cec0e399f0000df",
   "--with-container-name"=>"chkexsrv2",
   "--with-namespace"=>"nagiosmonitor",
   "--with-uid"=>5388,
   "--with-request-id"=>"03e4855dda03cbb6373817bf1052aa96",
   "--cart-name"=>"openshift-origin-node"},
 :process_results=>true}

I, [2013-03-06T14:55:51.017030 #13831]  INFO -- : openshift.rb:35:in `cartridge_do_action' cartridge_do_action validation = openshift-origin-node app-destroy {"--with-app-uuid"=>"5136b0c76cec0e399f0000df", "--with-app-name"=>"chkexsrv2", "--with-container-uuid"=>"5136b0c76cec0e399f0000df", "--with-container-name"=>"chkexsrv2", "--with-namespace"=>"nagiosmonitor", "--with-uid"=>5388, "--with-request-id"=>"03e4855dda03cbb6373817bf1052aa96", "--cart-name"=>"openshift-origin-node"}
I, [2013-03-06T14:55:51.017362 #13831]  INFO -- : openshift.rb:74:in `execute_action' Executing action [app-destroy] using method oo_app_destroy with args [{"--with-app-uuid"=>"5136b0c76cec0e399f0000df", "--with-app-name"=>"chkexsrv2", "--with-container-uuid"=>"5136b0c76cec0e399f0000df", "--with-container-name"=>"chkexsrv2", "--with-namespace"=>"nagiosmonitor", "--with-uid"=>5388, "--with-request-id"=>"03e4855dda03cbb6373817bf1052aa96", "--cart-name"=>"openshift-origin-node"}]
I, [2013-03-06T14:55:51.027281 #13831]  INFO -- : openshift.rb:168:in `rescue in oo_app_destroy' ERROR: unable to destroy user account 5136b0c76cec0e399f0000df
I, [2013-03-06T14:55:51.028915 #13831]  INFO -- : openshift.rb:169:in `rescue in oo_app_destroy' ["/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.5.13/lib/openshift-origin-node/model/unix_user.rb:171:in `destroy'", "/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.5.13/lib/openshift-origin-node/model/v1_cart_model.rb:61:in `destroy'", "/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.5.13/lib/openshift-origin-node/model/application_container.rb:127:in `destroy'", "/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:166:in `oo_app_destroy'", "/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:76:in `execute_action'", "/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:48:in `cartridge_do_action'", "/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'", "/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'", "/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:68:in `timeout'", "/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'"]
I, [2013-03-06T14:55:51.029095 #13831]  INFO -- : openshift.rb:83:in `execute_action' Finished executing action [app-destroy] (-1)
I, [2013-03-06T14:55:51.029263 #13831]  INFO -- : openshift.rb:56:in `cartridge_do_action' cartridge_do_action failed (-1)
------
ERROR: unable to destroy user account 5136b0c76cec0e399f0000df
------)
I, [2013-03-06T14:55:54.171602 #13831]  INFO -- : openshift.rb:34:in `cartridge_do_action' cartridge_do_action call / action: cartridge_do, agent=openshift, data={:cartridge=>"openshift-origin-node",
 :action=>"app-destroy",
 :args=>
  {"--with-app-uuid"=>"5136b0c76cec0e399f0000df",
   "--with-app-name"=>"chkexsrv2",
   "--with-container-uuid"=>"5136b0c76cec0e399f0000df",
   "--with-container-name"=>"chkexsrv2",
   "--with-namespace"=>"nagiosmonitor",
   "--with-uid"=>5388,
   "--with-request-id"=>"387867483a843088764957a26403f486",
   "--cart-name"=>"openshift-origin-node"},
 :process_results=>true}

I, [2013-03-06T14:55:54.172302 #13831]  INFO -- : openshift.rb:35:in `cartridge_do_action' cartridge_do_action validation = openshift-origin-node app-destroy {"--with-app-uuid"=>"5136b0c76cec0e399f0000df", "--with-app-name"=>"chkexsrv2", "--with-container-uuid"=>"5136b0c76cec0e399f0000df", "--with-container-name"=>"chkexsrv2", "--with-namespace"=>"nagiosmonitor", "--with-uid"=>5388, "--with-request-id"=>"387867483a843088764957a26403f486", "--cart-name"=>"openshift-origin-node"}
I, [2013-03-06T14:55:54.172674 #13831]  INFO -- : openshift.rb:74:in `execute_action' Executing action [app-destroy] using method oo_app_destroy with args [{"--with-app-uuid"=>"5136b0c76cec0e399f0000df", "--with-app-name"=>"chkexsrv2", "--with-container-uuid"=>"5136b0c76cec0e399f0000df", "--with-container-name"=>"chkexsrv2", "--with-namespace"=>"nagiosmonitor", "--with-uid"=>5388, "--with-request-id"=>"387867483a843088764957a26403f486", "--cart-name"=>"openshift-origin-node"}]
I, [2013-03-06T14:55:54.178087 #13831]  INFO -- : openshift.rb:168:in `rescue in oo_app_destroy' ERROR: unable to destroy user account 5136b0c76cec0e399f0000df
I, [2013-03-06T14:55:54.178311 #13831]  INFO -- : openshift.rb:169:in `rescue in oo_app_destroy' ["/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.5.13/lib/openshift-origin-node/model/unix_user.rb:171:in `destroy'", "/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.5.13/lib/openshift-origin-node/model/v1_cart_model.rb:61:in `destroy'", "/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.5.13/lib/openshift-origin-node/model/application_container.rb:127:in `destroy'", "/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:166:in `oo_app_destroy'", "/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:76:in `execute_action'", "/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:48:in `cartridge_do_action'", "/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'", "/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'", "/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:68:in `timeout'", "/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'"]
I, [2013-03-06T14:55:54.178579 #13831]  INFO -- : openshift.rb:83:in `execute_action' Finished executing action [app-destroy] (-1)
I, [2013-03-06T14:55:54.178746 #13831]  INFO -- : openshift.rb:56:in `cartridge_do_action' cartridge_do_action failed (-1)
------
ERROR: unable to destroy user account 5136b0c76cec0e399f0000df
------)


Version-Release number of selected component (if applicable):
rhc-node-1.5.8-1.el6oso.x86_64


How reproducible:
Unknown. Once the app is stuck in this state, every attempt to destroy it fails.


Steps to Reproduce:
1. unknown
 

Actual results:
Can't delete the app.


Expected results:
Should always be able to delete an app.

Comment 1 Paul Morie 2013-03-06 22:52:12 UTC
The root cause of the observed error message is that the system is in a state where the user is missing from /etc/passwd but the broker still considers the application to exist. The UnixUser class reads /etc/passwd to get the location of the user's home directory. Because the user's uid is passed in from mcollective, the check that is intended to detect an already existing user passes even though there is no /etc/passwd entry:

https://github.com/openshift/origin-server/blob/master/node/lib/openshift-origin-node/model/unix_user.rb#L164

The exception on the following line is then raised because @homedir is nil.
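
The failure mode described above can be illustrated with a minimal Ruby sketch. This is not the code from unix_user.rb and not the merged fix: the class GearUser and the methods naive_destroyable? and safe_destroyable? are hypothetical names invented for illustration. It only assumes that the uid arrives from the caller while the home directory comes from a passwd lookup, which is what the comment describes.

```ruby
require 'etc'

# Hypothetical stand-in for the node's user model (NOT the real UnixUser class).
class GearUser
  attr_reader :uuid, :uid, :homedir

  # uid may be supplied by the caller (as mcollective does with --with-uid).
  def initialize(uuid, uid = nil)
    @uuid = uuid
    @uid = uid

    # Etc.getpwnam raises ArgumentError when the user has no passwd entry.
    pw = begin
      Etc.getpwnam(uuid)
    rescue ArgumentError
      nil
    end

    @uid = pw.uid if pw
    @homedir = pw ? pw.dir : nil
  end

  # Naive existence check: passes whenever a uid is known, even if that uid
  # was only passed in and the passwd entry is gone.
  def naive_destroyable?
    !@uid.nil?
  end

  # Stricter check: also requires that the passwd lookup found a home directory.
  def safe_destroyable?
    !@uid.nil? && !@homedir.nil?
  end
end

# The uuid and uid from the log above; no such account is expected to exist
# on the machine running this sketch.
user = GearUser.new("5136b0c76cec0e399f0000df", 5388)
puts "naive check passes: #{user.naive_destroyable?}"
puts "homedir is nil:     #{user.homedir.nil?}"
puts "safe check passes:  #{user.safe_destroyable?}"
```

With a caller-supplied uid and no passwd entry, the naive check passes while the home directory is nil, so any later path operation on the home directory would raise. A destroy routine that wants to stay idempotent would need to treat the missing entry as "nothing left to remove on disk" rather than as an error.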

Comment 2 Paul Morie 2013-03-06 22:53:20 UTC
Addendum to above comment:

I am trying to determine how the system got into this state, as it could be due to another bug.

Comment 4 Paul Morie 2013-03-07 16:43:52 UTC
Pull request has been merged.

Comment 5 Meng Bo 2013-03-08 02:43:37 UTC
Checked on devenv-stage_313; the issue has been fixed.

1. Create an app.
2. Remove the gear's user from /etc/passwd.
3. Try to delete the app.

The app is deleted successfully.

Marking the bug as verified.
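
Step 2 of the verification above can be rehearsed safely on passwd-format text rather than on the live /etc/passwd. The sketch below is only an illustration: the helper names passwd_entry? and remove_passwd_entry are hypothetical, and the sample entry's GECOS field, home directory, and shell are made-up placeholders; only the uuid and uid come from the log in the description.

```ruby
# Returns true if passwd-format text contains an entry whose login name is `name`.
def passwd_entry?(passwd_text, name)
  passwd_text.lines.any? { |line| line.split(':', 2).first == name }
end

# Returns a copy of the passwd-format text with the entry for `name` removed.
def remove_passwd_entry(passwd_text, name)
  passwd_text.lines.reject { |line| line.split(':', 2).first == name }.join
end

gear_uuid = "5136b0c76cec0e399f0000df"

# Illustrative sample only; the paths and shell are assumptions.
sample = <<~PASSWD
  root:x:0:0:root:/root:/bin/bash
  #{gear_uuid}:x:5388:5388:example gear:/example/home/#{gear_uuid}:/bin/bash
PASSWD

stripped = remove_passwd_entry(sample, gear_uuid)

puts "entry present before: #{passwd_entry?(sample, gear_uuid)}"
puts "entry present after:  #{passwd_entry?(stripped, gear_uuid)}"
```

Working on a string (or a scratch copy of the file) keeps the reproduction from damaging other accounts on a shared test host.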

