Description of problem:
We're sporadically getting this error when trying to create apps in PROD.

DEBUG: rhc app create -k -a 'd1308070951drupal' -t 'php-5.3'
Application Options
-------------------
Namespace:  openshiftnagios
Cartridges: php-5.3
Gear Size:  default
Scaling:    no

Creating application 'd1308070951drupal' ...
Unable to complete the requested operation due to: Cannot validate input uid: value should be a number.
Reference ID: 58852561d1deebdb2171df7fb28dd1e9
Reference ID: 58852561d1deebdb2171df7fb28dd1e9

DEBUG: rhc cartridge add -k -a 'd1308070951drupal' -c 'mysql-5.1'
Adding mysql-5.1 to application 'd1308070951drupal' ...
Application 'd1308070951drupal' not found.
Adding mysql-5.1 to application 'd1308070951drupal' ...
Application 'd1308070951drupal' not found.

Exception: No such file or directory - /tmp/d20130807-2873-1mwdmsh/d1308070951drupal
/usr/local/lib/rhc_helper.rb:183:in `chdir'
/usr/local/lib/rhc_helper.rb:183:in `create_drupal'
/usr/local/bin/nagios-ctl-app:132:in `main'
/usr/local/bin/nagios-ctl-app:161
Connection to ex-srv1.prod.rhcloud.com closed.
Exit Code: 0

Version-Release number of selected component (if applicable):
We saw it in both 2.0.30 and after upgrading to 2.0.31.

How reproducible:
Very sporadic, only seen in PROD.

Steps to Reproduce:
1. Unknown, just seen in PROD.

Actual results:
Sporadically fails with "Cannot validate input uid"

Expected results:
No error
Can you please provide the broker, mcollective and platform (node) logs for a request that fails with this error?
Providing snippets of logs from the broker and mcollective in the previous comment. It seems useradd is failing because it is unable to lock /etc/passwd.
Most likely some of these files were left behind by a crashing useradd/userdel operation:
/etc/passwd.lock
/etc/shadow.lock
/etc/group.lock
/etc/gshadow.lock
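To check a node for the leftover lock files listed above, something like the following could be run as root. This is only a sketch: the helper name stale_locks is made up for illustration and is not part of any OpenShift tooling. It reports which of the four lock files exist and when they were last modified, which would help correlate them with a failed node call.

```ruby
# Hypothetical helper (not part of OpenShift): list leftover shadow-utils
# lock files together with their modification times.
LOCK_FILES = %w[
  /etc/passwd.lock
  /etc/shadow.lock
  /etc/group.lock
  /etc/gshadow.lock
].freeze

# Returns one hash per path that exists, with its mtime and age in seconds.
def stale_locks(paths = LOCK_FILES, now: Time.now)
  paths.select { |path| File.exist?(path) }.map do |path|
    mtime = File.mtime(path)
    { path: path, mtime: mtime, age_seconds: (now - mtime).round }
  end
end

if __FILE__ == $PROGRAM_NAME
  stale_locks.each do |lock|
    puts "#{lock[:path]}  #{lock[:mtime].utc}  (#{lock[:age_seconds]}s old)"
  end
end
```

A lock file whose timestamp is much older than any running useradd/userdel process would be a likely candidate for having been left behind.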
Have we seen this issue again in PROD? Thanks, Mrunal
We've changed how we're doing these creates now. We're now using the --from-code method of creating a drupal quickstart. So, no, we're no longer seeing this problem, but I don't know if that's because it's fixed or because we changed how we're doing the creates.
Lowering severity for now, since we haven't been able to find the root cause from the logs and we haven't seen the issue again. It will be easier to debug if we can inspect the nodes if/when the issue is seen.
Hey Thomas, I'm at a client site right now and they've encountered this. Can you please tell me in which component this --from-code method was changed? I'd like to settle this one. I've attached a full stack trace in the hope that it helps.
Created attachment 802523
Stack trace from gear creation providing more information on this error
Dan Trainor, did you see any of these files when the issue occurred?
/etc/passwd.lock
/etc/shadow.lock
/etc/group.lock
/etc/gshadow.lock
If so, do you have timestamps so we can correlate them with the node call?
The message "Cannot validate input uid: value should be a number" is being emitted from the MCollective client on the Broker when attempting to query a Node.

From the openshift.ddl:

    action "has_uid_or_gid", :description => "Returns whether this system has already taken the uid or gid" do
        display :always

        input :uid,
            :prompt => "uid/gid",
            :description => "uid/gid",
            :type => :number,
            :optional => false

mcollective_application_container_proxy.rb#has_uid_or_gid? is populating +uid+ with some value that is not considered a :number by the MCollective validators.
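To illustrate the kind of value that would trip this check, here is a minimal standalone sketch. It assumes the :number validation amounts to an is_a?(Numeric) test, which is my reading of the error message and not something confirmed in this thread. Both method names are hypothetical and this is not the actual fix; it only shows that a uid arriving as nil or as a String would be rejected unless it is coerced first.

```ruby
# Assumption: the :number validator accepts only Numeric values.
# This mirrors that check for illustration; it does not call MCollective.
def passes_number_check?(value)
  value.is_a?(Numeric)
end

# Hypothetical guard a caller could apply before issuing the request.
# Integers pass through, numeric strings are converted, anything else
# (nil, "", "abc") raises ArgumentError instead of reaching the validator.
def coerce_uid(value)
  case value
  when Integer then value
  when String  then Integer(value, 10)
  else raise ArgumentError, "uid must be an integer, got #{value.inspect}"
  end
end

if __FILE__ == $PROGRAM_NAME
  [1001, "1001", nil].each do |candidate|
    begin
      uid = coerce_uid(candidate)
      puts "#{candidate.inspect} -> #{uid} (number check: #{passes_number_check?(uid)})"
    rescue ArgumentError => e
      puts "#{candidate.inspect} rejected: #{e.message}"
    end
  end
end
```

Failing early in the caller would at least make the bad value visible in the broker log, rather than surfacing only as a validation error from the client.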
This seems to be a duplicate of bug 1039641 and was fixed by Dan with --> https://github.com/openshift/origin-server/pull/4300
*** This bug has been marked as a duplicate of bug 1039641 ***