Created attachment 691550 [details]
development.log

Description of problem:
After the server is updated and migrated, the "Node execution failure" error appears every time a new scalable application is created. Creating non-scalable applications still works.

Version-Release number of selected component (if applicable):
Upgrading devenv-stage_281 to the latest devenv
rhc-broker-1.4.3-1.el6_3.noarch
rhc-node-1.4.3-1.el6_3.x86_64

How reproducible:
Always (tested 3 times, reproduced 3 times)

Steps to Reproduce:
1. Launch a devenv-stage_281 instance and create scalable apps against it.
2. SSH into the instance and point devenv.repo at candidate so that it upgrades to the latest available devenv:
   sed -i 's/stage/candidate/g' /etc/yum.repos.d/devenv.repo
3. yum -y update
4. Restart rhc-datastore, since mongodb-server is updated as well.
5. cd /var/www/openshift/broker; rake tmp:clear
6. Execute migrate-mongo-2.0.23 and migrate-dynect-2.0.23.
7. rhc-admin-migrate --version 2.0.23 (this script is currently broken, but that does not affect reproducing the bug)
8. Create a new scalable application.

Actual results:
[hjw@hjwlaptop devenv]$ rhc app create php2s php-5.3 -s -px
Application Options
-------------------
Namespace: 281t1
Cartridges: php-5.3
Gear Size: default
Scaling: yes

Creating application 'php2s' ... Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.

Expected results:
Application creation should not fail.

Additional info:
development.log and mcollective.log are attached.
Created attachment 691551 [details] mcollective log
The sequence of events in the MCollective logs indicates some sort of broker issue:

1. The broker sends an app-create message, which is processed successfully by the node library, and a response is sent back to the broker.
2. The broker then sends a duplicate app-create message. The node library fails to create the application because the gear user already exists as of step 1, which is correct behavior.
3. The broker receives the failed reply from step 2 and rolls back app creation, deleting the skeletal app and user created in step 1.

The overall failure appears to be caused by the duplicate app-create messages sent in sequence, which is why I am reassigning this to the broker team for further analysis.
Found the issue. The upgrade steps need to include 'service mcollective restart' on the nodes before the broker cache is cleared. Since the cartridge model has changed, mcollective needs to reload the new models to gather data about the new fields.
Problem solved! mcollective has to be restarted before the broker cache is cleared. After doing that, I was able to create scalable applications. Moving this bug to verified.
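For reference, the corrected order could be sketched as the script below. It is only a sketch assembled from the commands quoted in this bug, not an official upgrade procedure. The `step` helper, the `upgrade_devenv` function and the `UPGRADE_APPLY` variable are hypothetical names introduced here, and `service rhc-datastore restart` is an assumed spelling of the "restart rhc-datastore" step. By default the script only prints the commands. The migration scripts from steps 6 and 7 are left as a comment because their exact invocation is not given above.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set UPGRADE_APPLY=1 (hypothetical switch) to actually run them as root.

step() {
    # Print the command; run it only when UPGRADE_APPLY=1.
    printf '%s\n' "+ $*"
    if [ "${UPGRADE_APPLY:-0}" = "1" ]; then
        sh -c "$*" || exit 1
    fi
}

upgrade_devenv() {
    step "sed -i 's/stage/candidate/g' /etc/yum.repos.d/devenv.repo"
    step "yum -y update"
    # Assumed command form for "restart rhc-datastore".
    step "service rhc-datastore restart"
    # The step that was missing: restart mcollective on the nodes so it
    # reloads the new cartridge model BEFORE the broker cache is cleared.
    step "service mcollective restart"
    step "cd /var/www/openshift/broker && rake tmp:clear"
    # migrate-mongo-2.0.23, migrate-dynect-2.0.23 and rhc-admin-migrate
    # follow here, as in the original steps 6 and 7.
}

upgrade_devenv
```

On a devenv instance the broker and node share one machine, so a single restart is enough there; on a multi-node setup the mcollective restart would have to happen on every node before the cache is cleared on the broker.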