906751 – "Node execution failure" problem when creating new scalable apps after server upgrade and migration


Summary: "Node execution failure" problem when creating new scalable apps after server...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OKD
Classification:	Red Hat
Component:	Pod
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Rajat Chopra
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-02-01 11:33 UTC by Jianwei Hou
Modified:	2015-05-15 02:13 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-02-13 23:00:00 UTC
Target Upstream Version:
Embargoed:

Attachments
development.log (88.31 KB, text/x-log) 2013-02-01 11:33 UTC, Jianwei Hou	no flags
mcollective log (56.91 KB, text/x-log) 2013-02-01 11:33 UTC, Jianwei Hou	no flags

Description Jianwei Hou 2013-02-01 11:33:13 UTC

Created attachment 691550
development.log

Description of problem:
After the server is upgraded and migrated, the "Node execution failure" error occurs every time a new scalable application is created.
Creating non-scalable applications still works.

Version-Release number of selected component (if applicable):
Upgrading devenv-stage_281 to the latest devenv
rhc-broker-1.4.3-1.el6_3.noarch
rhc-node-1.4.3-1.el6_3.x86_64

How reproducible:
Always (tested 3 times, reproduced 3 times)

Steps to Reproduce:
1. Launch a devenv-stage_281 instance and create scalable apps against it.
2. SSH into the instance and switch devenv.repo from stage to candidate so that yum upgrades to the latest available devenv:
sed -i 's/stage/candidate/g' /etc/yum.repos.d/devenv.repo
3. yum -y update
4. Restart rhc-datastore, since mongodb-server is updated as well.
5. cd /var/www/openshift/broker; rake tmp:clear
6. Execute migrate-mongo-2.0.23 and migrate-dynect-2.0.23.
7. rhc-admin-migrate --version 2.0.23 (this script is currently broken, but that does not affect reproducing the bug)
8. Create a new scalable application.
8. Create a new scalable application

  
Actual results:
[hjw@hjwlaptop devenv]$ rhc app create php2s php-5.3 -s -px
Application Options
-------------------
  Namespace:  281t1
  Cartridges: php-5.3
  Gear Size:  default
  Scaling:    yes

Creating application 'php2s' ... Node execution failure (invalid exit code from node).  If the problem persists please contact Red Hat support.


Expected results:
The scalable application should be created without error.

Additional info:
Attached: development.log and the mcollective log.

Comment 1 Jianwei Hou 2013-02-01 11:33:38 UTC

Created attachment 691551
mcollective log

Comment 2 Dan Mace 2013-02-01 19:30:00 UTC

The sequence of events in the MCollective logs points to a broker-side issue:

1. The broker sends an app-create message which is processed successfully by the node library, and a response is sent back to the broker.
2. The broker then sends a duplicate app-create message. The node library fails to create the application because the gear user already exists from step 1, which is correct behavior.
3. The broker receives the failed reply from step 2 and rolls back app creation, deleting the skeletal app and user created in step 1 due to the message payload duplication.

The overall failure appears to be caused by the duplicate app-create messages sent in sequence, which is why I am reassigning this to the broker team for further analysis.
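The sequence in Comment 2 can be illustrated with a small toy model: a node-side create that refuses to run twice for the same gear user, and a broker that rolls back on any failed reply. This is a hypothetical sketch, not OpenShift broker or node code; the names `GEAR_USERS`, `node_app_create`, `node_app_destroy`, and `broker_create_app` are illustrative only, and it assumes bash 4+ for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical toy model of the message flow described in Comment 2.
# NOT OpenShift code; all names are illustrative. Requires bash 4+.

declare -A GEAR_USERS=()  # gear users currently present on the "node"

# Node side: create a gear user; refuse if the user already exists
# (the correct behavior noted in Comment 2).
node_app_create() {
  local uuid=$1
  if [[ -n ${GEAR_USERS[$uuid]+set} ]]; then
    echo "node: gear user $uuid already exists" >&2
    return 1
  fi
  GEAR_USERS[$uuid]=1
}

# Node side: remove a gear user (invoked by the broker's rollback).
node_app_destroy() {
  local uuid=$1
  unset "GEAR_USERS[$uuid]"
}

# Broker side: deliver app-create `sends` times. On the first failed reply
# it rolls back by destroying the gear, which also removes the gear that
# the earlier, successful message created.
broker_create_app() {
  local uuid=$1 sends=$2 i
  for ((i = 1; i <= sends; i++)); do
    if ! node_app_create "$uuid"; then
      echo "broker: app-create failed, rolling back $uuid" >&2
      node_app_destroy "$uuid"
      return 1
    fi
  done
  return 0
}
```

With a single message, `broker_create_app gear-a 1` succeeds and `gear-a` stays in `GEAR_USERS`. With a duplicated message, `broker_create_app gear-b 2` returns 1 and leaves no `gear-b` entry: the first create succeeded, but the rollback triggered by the second one deleted it, matching steps 1 to 3 above.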

Comment 3 Rajat Chopra 2013-02-02 00:25:04 UTC

Found the issue. The upgrade steps need 'service mcollective restart' on the nodes before the broker cache is cleared.
Because the cartridge model has changed, mcollective needs to reload the new models to gather data about the new fields.
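A minimal sketch of the upgrade steps with the missing restart added, assuming a single-host devenv where the broker and node run on the same instance. The `run` helper, the `upgrade_devenv` function, and the `DRY_RUN` switch are hypothetical; invoking step 4 as `service rhc-datastore restart` and running the migration scripts as bare commands on PATH are assumptions about how those steps are executed. With the default `DRY_RUN=1` it only prints the commands in order.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the upgrade steps from "Steps to Reproduce" with the
# mcollective restart (Comment 3) placed before the broker cache is cleared.
# Assumes a single-host devenv (broker and node on the same instance).
# DRY_RUN=1 (the default) only prints each command.

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise run it and abort on failure.
run() {
  if [[ $DRY_RUN == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@" || { echo "failed: $*" >&2; exit 1; }
  fi
}

upgrade_devenv() {
  # Switch devenv.repo from the stage to the candidate channel, then upgrade.
  run sed -i 's/stage/candidate/g' /etc/yum.repos.d/devenv.repo
  run yum -y update
  # mongodb-server is upgraded too; the service name is an assumption.
  run service rhc-datastore restart
  # The missing step: reload the new cartridge models on the node
  # BEFORE the broker cache is cleared.
  run service mcollective restart
  # Clear the broker cache.
  run bash -c 'cd /var/www/openshift/broker && rake tmp:clear'
  # Migrations; running them as bare commands on PATH is an assumption.
  run migrate-mongo-2.0.23
  run migrate-dynect-2.0.23
  run rhc-admin-migrate --version 2.0.23
}

upgrade_devenv
```

Run as-is, it prints the eight commands; `DRY_RUN=0` would execute them (as root on the instance). The only change from the reported steps is the added mcollective restart ahead of `rake tmp:clear`.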

Comment 4 Jianwei Hou 2013-02-04 07:23:24 UTC

Problem solved!

After restarting mcollective before clearing the broker cache, I was able to create scalable applications.
Moving this bug to VERIFIED.
