Bug 869862

Summary:	Tidy error in logs running test suite
Product:	OKD	Reporter:	Dan McPherson <dmcphers>
Component:	Containers	Assignee:	Rob Millner <rmillner>
Status:	CLOSED CURRENTRELEASE	QA Contact:	libra bugs <libra-bugs>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	2.x	CC:	mfisher, xtian
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-11-06 18:48:33 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Dan McPherson 2012-10-25 01:22:44 UTC

Description of problem:

Ex:


https://ci.dev.openshift.redhat.com/jenkins/job/merge_pull_requests/795/artifact/li/rhc/log/broker/development.log



Details:

[11931] DEBUG DEBUG: [#<MCollective::RPC::Result:0x7f46ffa6a408 @results={:statusmsg=>"cartridge_do_action failed 121.  Output CLIENT_MESSAGE: Stopping app...\nCLIENT_MESSAGE: Running 'git prune'\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 14: git: command not found\nCLIENT_MESSAGE: Running 'git gc --aggressive'\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 16: git: command not found\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 29: awk: command not found\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 38: awk: command not found\n/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 846: id: command not found\nFailed to start php-5.3\n", :data=>{:output=>"CLIENT_MESSAGE: Stopping app...\nCLIENT_MESSAGE: Running 'git prune'\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 14: git: command not found\nCLIENT_MESSAGE: Running 'git gc --aggressive'\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 16: git: command not found\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 29: awk: command not found\n/usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 38: awk: command not found\n/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 846: id: command not found\nFailed to start php-5.3\n", :exitcode=>121}, :statuscode=>1, :sender=>"ip-10-98-89-254"}, @action="cartridge_do", @agent="openshift">]
[11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 14: git: command not found
[11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 16: git: command not found
[11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 29: awk: command not found
[11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 38: awk: command not found
[11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/lib/util: line 846: id: command not found
[11931] DEBUG DEBUG: server results: Failed to start php-5.3
[11931] DEBUG DEBUG: Cartridge command php-5.3::tidy exitcode = 121
[11931] ERROR Node execution failure (invalid exit code from node).  If the problem persists please contact Red Hat support.
[11931] ERROR #<OpenShift::NodeException: Node execution failure (invalid exit code from node).  If the problem persists please contact Red Hat support.>
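When triaging a run like this, a small Ruby sketch can pull the "command not found" failures out of a broker log and group them by command. The helper name `missing_commands` and the regex are hypothetical. They are modeled on the "server results:" lines quoted above, not on any documented log format.

```ruby
# Hypothetical triage helper (not part of OpenShift): collect the
# "command not found" failures from broker log text, keyed by command.
# The pattern is modeled on the "server results:" lines in this bug;
# it is an assumption, not a documented log format.
NOT_FOUND_RE = /server results: (\S+): line (\d+): (\S+): command not found/

def missing_commands(log_text)
  found = Hash.new { |hash, cmd| hash[cmd] = [] }
  log_text.each_line do |line|
    match = NOT_FOUND_RE.match(line)
    next unless match
    script, lineno, cmd = match.captures
    found[cmd] << "#{script}:#{lineno}"
  end
  found
end

sample = <<~LOG
  [11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 14: git: command not found
  [11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/bin/tidy.sh: line 29: awk: command not found
  [11931] DEBUG DEBUG: server results: /usr/libexec/openshift/cartridges/abstract/info/lib/util: line 846: id: command not found
  [11931] DEBUG DEBUG: server results: Failed to start php-5.3
LOG

# Print each missing command with the script locations that invoked it.
missing_commands(sample).each do |cmd, locations|
  puts "#{cmd}: #{locations.join(', ')}"
end
```

Matching only the "server results:" lines avoids double counting, since the MCollective result dump repeats the same messages in its `:statusmsg` and `:output` fields.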




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run the devenv test cases
2. Look in the log

Comment 1 Rob Millner 2012-10-25 18:11:17 UTC

It looks like the path disappeared for this app before tidy was called.  I'll see if I can reproduce it on devenv.

Comment 2 Rob Millner 2012-10-25 19:42:51 UTC

It seems we're re-using applications, which causes tests to step on each other.

The following race condition is taken from the broker log.

1. Begin creating domain ci98745438 for cucumber-test_ci98745438

2. Restart app test for account cucumber-test_ci98745438
   - this failed, no app named "test"

3. Finish creating domain ci98745438 for cucumber-test_ci98745438
   - Finishes step 1

4. Restart app test for account cucumber-test_ci98745438
   - this failed, no app named "test"

5. Get user info for cucumber-test_ci98745438

6. Begin creating app test for cucumber-test_ci98745438.

7. Begin restarting app test for cucumber-test_ci98745438.

8. Begin creating gears for app test for cucumber-test_ci98745438.
   - Continuation from step 6

9. Fail restarting app test
   - Fails step 7

10. Mcollective create app test for cucumber-test_ci98745438.
    - Continuation from step 8

11. Begin tidy for app test for cucumber-test_ci98745438.

12. Begin configuring php cartridge for app test for cucumber-test_ci98745438.
    - Continuation from step 10

13. Fail tidy app test
    - Fails step 11

14. Begin tidy for app test for cucumber-test_ci98745438.
    - Retries the tidy that failed in step 13

15. Finish configuring php for app test
    - Finishes step 12

16. Finish tidy for app test
    - Finishes step 14

...it goes on like this at some length.
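To make the interleaving concrete, the opening steps of the timeline can be replayed against a toy model in Ruby. The model is an assumption, not the broker's real logic: an app counts as existing only once its creation has finished, so any restart or tidy issued against "test" before that point fails. `Event` and `replay` are hypothetical names.

```ruby
# Toy model (an assumption, not broker code): restart and tidy only
# succeed once creation of the named app has finished.
Event = Struct.new(:step, :action, :app)

def replay(events)
  created = {}   # app name => true once creation has finished
  failures = []  # step numbers whose restart/tidy hit a missing app
  events.each do |event|
    case event.action
    when :finish_create
      created[event.app] = true
    when :restart, :tidy
      failures << event.step unless created[event.app]
    end
  end
  failures
end

# Steps from the broker log above, numbered by when each action began.
timeline = [
  Event.new(2, :restart, "test"),
  Event.new(4, :restart, "test"),
  Event.new(6, :begin_create, "test"),
  Event.new(7, :restart, "test"),
  Event.new(11, :tidy, "test")
]
p replay(timeline) # => [2, 4, 7, 11]
```

Every operation that another scenario aims at "test" while this scenario is still creating it lands in the failure list, which matches the failed restarts and tidy in the log.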

Comment 3 Rob Millner 2012-10-25 21:51:34 UTC

The command immediately before step 1 above accessed the status of the php cartridge for app "test" in domain "ci83265489" for user "cucumber-test_ci83265489".

Working backward through the logs, I can see user "cucumber-test_ci83265489" progressing through the other tidy steps.  It looks like the cartridge lifecycle test switched application IDs in the middle of a test.

Comment 4 Rob Millner 2012-10-26 17:42:02 UTC

It's actually worse than just a race condition.  Any caller doing "Given an existing $FOO application" will get the one with the randomly generated domain name that sorts the lowest.

So if three apps, A, B and C, are all created and tested in parallel, then all tests will use A.

Further, if three test apps, A, B and C, are created and tested in sequence, all tests will use A.
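A hypothetical reconstruction of that lookup in Ruby shows why every caller converges on A. `AppRecord` and `find_existing_app` are illustrative names, and the domains are made up; the real step definition may be structured differently.

```ruby
# Hypothetical reconstruction of the "Given an existing $FOO application"
# lookup; the real step definition may differ. Each saved app records its
# cartridge type and the randomly generated domain it was created in.
AppRecord = Struct.new(:name, :type, :domain)

# Picks the matching app whose domain name sorts lowest, regardless of
# which test created it.
def find_existing_app(apps, type)
  apps.select { |app| app.type == type }.min_by(&:domain)
end

# Illustrative domains for three apps created by three different tests.
apps = [
  AppRecord.new("B", "php-5.3", "ci55501234"),
  AppRecord.new("C", "php-5.3", "ci98745438"),
  AppRecord.new("A", "php-5.3", "ci12345678")
]

# Whichever test asks, the answer is the same app.
%w[test_a test_b test_c].each do |test_name|
  puts "#{test_name} gets #{find_existing_app(apps, 'php-5.3').name}"
end
```

Nothing in the lookup ties an app to the test that created it, so ordering alone decides the winner.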

Comment 5 Rob Millner 2012-10-26 18:31:19 UTC

A quick audit of all test cases shows that any feature file which looks for an existing app of a certain type also creates and destroys that app.  So each cucumber invocation should get its own unique apps.

Tests show that the two obvious globals for achieving that, @apps and @app, are wiped out between scenarios, as is any other global.  So we can't store the app set inside a global.
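The reason is that Cucumber runs each scenario against a fresh World object, so instance variables such as @apps do not carry over. The Ruby sketch below is a simplified model of that behavior; `World` and `run_scenario` here are stand-ins, not Cucumber's own classes.

```ruby
# Simplified model (an assumption) of Cucumber's per-scenario state:
# every scenario gets a brand-new World, so @apps starts out unset.
class World
  attr_accessor :apps
end

def run_scenario
  world = World.new      # fresh World for this scenario
  seen = world.apps      # anything a previous scenario left behind
  world.apps = ["test"]  # this scenario records the app it created
  seen
end

first  = run_scenario
second = run_scenario
p [first, second] # => [nil, nil]
```

The second scenario never sees the app recorded by the first, which is why the ownership information has to live outside the World.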

Instead, I'm adding the process ID of the creator to the application info saved on disk to assert ownership, while leaving the option for un-owned applications that are intentionally created to be used by tests.
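A minimal Ruby sketch of that ownership check, assuming app info is stored as one JSON file per app. The file layout, the `owner_pid` field, and the helper names are assumptions for illustration; this is not the code from the pull request. Lookups prefer an app owned by the current process and fall back to an intentionally un-owned one.

```ruby
require "json"
require "tmpdir"

# Hypothetical on-disk layout: one JSON file per app in `dir`.
# owner_pid is the creating process's PID, or nil for an app that was
# intentionally left un-owned so any test may use it.
def save_app(dir, name, type, owner_pid: Process.pid)
  info = { "name" => name, "type" => type, "owner_pid" => owner_pid }
  File.write(File.join(dir, "#{name}.json"), JSON.generate(info))
end

# Returns an app of `type` owned by `pid`, else an un-owned one, else nil.
# Apps owned by other processes are never returned.
def find_owned_app(dir, type, pid: Process.pid)
  candidates = Dir.glob(File.join(dir, "*.json")).sort.map do |path|
    JSON.parse(File.read(path))
  end
  candidates.select! { |info| info["type"] == type }
  candidates.find { |info| info["owner_pid"] == pid } ||
    candidates.find { |info| info["owner_pid"].nil? }
end

Dir.mktmpdir do |dir|
  save_app(dir, "a", "php-5.3", owner_pid: Process.pid + 1) # another run's app
  save_app(dir, "b", "php-5.3")                             # this run's app
  save_app(dir, "shared", "php-5.3", owner_pid: nil)        # intentionally un-owned
  puts find_owned_app(dir, "php-5.3")["name"]               # prints b
end
```

Keying on the PID fits the constraint above: it survives the per-scenario reset because it is written to disk, and it still differs between concurrent cucumber invocations.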

Comment 6 Rob Millner 2012-10-26 21:11:37 UTC

Pull request: https://github.com/openshift/origin-server/pull/777

Comment 7 Xiaoli Tian 2012-10-29 02:34:13 UTC

Moving to ON_QA since the above pull request has been merged as of devenv_2392.

Comment 8 Xiaoli Tian 2012-10-29 02:35:18 UTC

According to the latest test result:

https://ci.dev.openshift.redhat.com/jenkins/job/merge_pull_requests/835/artifact/li/rhc/log/broker/development.log

There are no such errors; moving to VERIFIED.