Bug 848639
Summary: | Semaphore Sets of gears that no longer exist are left around causing semaphore set exhaustion | ||
---|---|---|---|
Product: | OKD | Reporter: | Xiaoli Tian <xtian> |
Component: | Containers | Assignee: | Rob Millner <rmillner> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 1.x | CC: | mfisher, qgong, twiest |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libra_ami #2037 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-09-17 21:29:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Xiaoli Tian
2012-08-16 05:47:16 UTC
I have not hit this error today; it appears to be fixed. Tested on the latest INT (devenv_2007):

#rhc app create -a 6x52j3uib3 -t perl-5.10 -l xtian+test6 -p 123456 -d
Submitting form:
rhlogin: xtian+test6
debug: true
Contacting https://int.openshift.redhat.com
Creating application: 6x52j3uib3 in 83sdqiw02s
Contacting https://int.openshift.redhat.com
Response from server:
DEBUG:
Exit Code: 0
broker_c: ["namespace", "rhlogin", "ssh", "app_uuid", "debug", "alter", "cartridge", "cart_type", "action", "app_name", "api"]
api_c: ["placeholder"]
API version: 1.1.3
DEBUG: '6x52j3uib3' creation returned success.
Now your new domain name is being propagated worldwide (this might take a minute)...
Pulling new repo down
git clone --quiet ssh://9ee7f9387b4541a1bd2edd6bfe3403d2.rhcloud.com/~/git/6x52j3uib3.git/ 6x52j3uib3
Warning: Permanently added '6x52j3uib3-83sdqiw02s.int.rhcloud.com' (RSA) to the list of known hosts.
Checking if the application is available #1
Application 6x52j3uib3 is available at: http://6x52j3uib3-83sdqiw02s.int.rhcloud.com/
Git URL: ssh://9ee7f9387b4541a1bd2edd6bfe3403d2.rhcloud.com/~/git/6x52j3uib3.git/
To make changes to '6x52j3uib3', commit to 6x52j3uib3/.
Successfully created application: 6x52j3uib3r

Reopening, since it can be reproduced on INT (devenv_2021) with the same error.

This happens because gears that were destroyed still have their semaphore sets listed in ipcs. Eventually, the ex-node runs out of semaphore sets, and then httpd can't restart because it can't create a semaphore set.
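
For reference, on Linux the system-wide cap on semaphore sets is SEMMNI, the fourth field of /proc/sys/kernel/sem, and semget() fails with ENOSPC ("No space left on device") once it is reached. A minimal sketch of a headroom check follows; the function names and the sample values in the comments are illustrative only and were not taken from the affected node.

```python
def parse_sem_limits(text):
    """Parse the contents of /proc/sys/kernel/sem.

    The file holds four integers in this order:
    SEMMSL (max semaphores per set), SEMMNS (max semaphores system-wide),
    SEMOPM (max operations per semop call), SEMMNI (max semaphore sets).
    """
    semmsl, semmns, semopm, semmni = (int(value) for value in text.split())
    return {"SEMMSL": semmsl, "SEMMNS": semmns, "SEMOPM": semopm, "SEMMNI": semmni}


def sets_remaining(limits, sets_in_use):
    """Return how many more semaphore sets can be created before ENOSPC."""
    return max(limits["SEMMNI"] - sets_in_use, 0)


# Hypothetical usage on a node (sets_in_use would come from counting the
# rows printed by `ipcs -s`):
#
#   with open("/proc/sys/kernel/sem") as handle:
#       limits = parse_sem_limits(handle.read())
#   print(sets_remaining(limits, sets_in_use=120))
```

When the remaining count reaches zero, any process that needs a new semaphore set, such as httpd creating its locks at startup, would be expected to fail in the way described above.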

Here's the log from the gear creation of ef685948d2de4c179a07270179e465c6 (from the description above):

I, [2012-08-16T01:17:56.430621 #23636] INFO -- : stickshift.rb:373:in `cartridge_do_action' cartridge_do_action ERROR (120)
------
Initialized empty Git repository in /var/lib/stickshift/ef685948d2de4c179a07270179e465c6/git/7v0kxgnaag.git/
/var/lib/stickshift/ef685948d2de4c179a07270179e465c6/git/7v0kxgnaag.git
/tmp
/tmp
httpd not running, trying to start
Failed to restart master httpd, please contact support
------)

Here's the http error log:

[Thu Aug 16 01:17:56 2012] [crit] (28)No space left on device: mod_rewrite: Parent could not create RewriteLock file /dev/shm/httpd-rewritelock.lock
^--- This is actually caused by semaphore set exhaustion as there is plenty of space on /dev/shm.

Here's a snippet of ipcs:

------ Semaphore Arrays --------
key semid owner perms nsems
[... SNIP ...]
0x00000000 213549990 3029 600 1
0x00000000 213746599 3054 600 1
0x00000000 214565800 3049 600 1
0x00000000 214008745 3061 600 1
0x00000000 214139818 3056 600 1
0x00000000 214270891 ca717c02ef 600 1 # <--- this gear still exists, the rest in this list don't
0x00000000 214598572 3039 600 1
0x00000000 214729645 3066 600 1
0x00000000 215385006 3075 600 1
0x00000000 215417775 3083 600 1
0x00000000 216859568 3084 600 1
0x00000000 216892337 3088 600 1
0x00000000 216925106 3076 600 1
[... SNIP ...]

Steps to reproduce:
1) Create a lot of gears
2) run ipcs -s
3) notice that all of the semaphores have proper owners
4) Remove most of the gears
5) run ipcs -s
6) notice that the semaphore sets of the gears that were removed are still in the list

Ok, so this seems to be a sporadic bug in that just creating an app and then destroying it doesn't leave around a semaphore set.

Here is a list of cartridges that create semaphore sets when created:
Ruby-1.8
Ruby-1.9
Python-2.6
Php-5.3
Perl-5.10

As you can see, it's basically all of the cartridges that use httpd inside the gear.
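
The check in steps 5 and 6 can be scripted. A rough sketch follows, assuming that ipcs prints a bare numeric UID in the owner column when the account no longer has a passwd entry (as in the snippet above, where only ca717c02ef still resolves to a user name). The function names are hypothetical, and an existing account whose name is all digits would be missed by this heuristic.

```python
import pwd


def parse_ipcs_semaphores(text):
    """Return (semid, owner) pairs from `ipcs -s` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns: key semid owner perms nsems.
        # Header and separator lines do not start with a 0x key.
        if len(fields) == 5 and fields[0].startswith("0x"):
            rows.append((int(fields[1]), fields[2]))
    return rows


def owner_exists(owner):
    """Assumption: a purely numeric owner is a UID that ipcs could not resolve."""
    if not owner.isdigit():
        return True
    try:
        pwd.getpwuid(int(owner))
        return True
    except KeyError:
        return False


def orphaned_semids(text, exists=owner_exists):
    """List the semids whose owner no longer exists on this host."""
    return [semid for semid, owner in parse_ipcs_semaphores(text) if not exists(owner)]
```

Feeding it the output of `ipcs -s` on a node should list the sets left behind by removed gears. The `exists` parameter is only there so the lookup can be swapped out when testing away from a real node.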

So, the steps above to reproduce might not work. This might only happen when the gear httpd fails to stop properly during a gear destroy and leaves around the semaphore set.

The gear destroy now removes IPC entities owned by the gear uuid.
https://github.com/openshift/crankcase/pull/407

Updated pull request.
https://github.com/openshift/crankcase/pull/408

Pull request merged.

Did not hit the above error again after a bunch of testing against INT; moving it to verified.

This bug happened again in INT after updating:

#rhc app create -a g1ergtshrg -t nodejs-0.6 -l xtian+test2 -p 123456 -d
Submitting form:
rhlogin: xtian+test2
debug: true
Contacting https://int.openshift.redhat.com
Creating application: g1ergtshrg in kqnjeqs4yw
Contacting https://int.openshift.redhat.com
Problem reported from server. Response code was 500.
DEBUG:
Initialized empty Git repository in /var/lib/stickshift/af30fe4947144907ad020ccf25311ee5/git/g1ergtshrg.git/
/var/lib/stickshift/af30fe4947144907ad020ccf25311ee5/git/g1ergtshrg.git
/tmp
/tmp
Failed to restart master httpd, please contact support
Cartridge return code: 120
Exit Code: 143
api_c: ["placeholder"]
broker_c: ["namespace", "rhlogin", "ssh", "app_uuid", "debug", "alter", "cartridge", "cart_type", "action", "app_name", "api"]
API version: 1.1.3
RESULT:
Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.
Command Return: 143
WARNING: the 1st attempt failed => let's try it again.
Running Command - rm -rf g1ergtshrg && rhc app create -a g1ergtshrg -t nodejs-0.6 -l xtian+test2 -p 123456 -d
Submitting form:
rhlogin: xtian+test2
debug: true
Contacting https://int.openshift.redhat.com
Creating application: g1ergtshrg in kqnjeqs4yw
Contacting https://int.openshift.redhat.com
Problem reported from server. Response code was 500.
DEBUG:
Initialized empty Git repository in /var/lib/stickshift/00e7cb6d83d24f549d137a58ee7eae91/git/g1ergtshrg.git/
/var/lib/stickshift/00e7cb6d83d24f549d137a58ee7eae91/git/g1ergtshrg.git
/tmp
/tmp
Failed to restart master httpd, please contact support
Cartridge return code: 120
Exit Code: 143
broker_c: ["namespace", "rhlogin", "ssh", "app_uuid", "debug", "alter", "cartridge", "cart_type", "action", "app_name", "api"]
api_c: ["placeholder"]
API version: 1.1.3
RESULT:
Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.
Command Return: 143

After confirming with Thomas: this failure is not caused by semaphore set exhaustion but by a separate configuration error in httpd. Moving it to ON_QA to test again, since that error has already been fixed by Thomas.

It works in current INT now.

Just to add more confirmation, yesterday we cleared out all of the unowned semaphore sets in INT. This morning, I checked and there were 0 unowned semaphore sets in INT, so I believe this bug has been fixed.
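
For anyone who needs to repeat the manual cleanup, a small sketch that builds `ipcrm -s <semid>` commands for a list of unowned semaphore set IDs. This is not the code from the crankcase pull requests above (the gear destroy change itself is not shown in this bug); the helper names are hypothetical, and nothing is executed until the commands are run explicitly.

```python
import shlex


def ipcrm_commands(semids):
    """Build one `ipcrm -s <semid>` argument list per semaphore set ID."""
    return [["ipcrm", "-s", str(semid)] for semid in semids]


def render(commands):
    """Render the commands as shell lines so they can be reviewed first."""
    return "\n".join(
        " ".join(shlex.quote(arg) for arg in command) for command in commands
    )


# Hypothetical usage, as root on the node, once the list has been reviewed:
#
#   import subprocess
#   for command in ipcrm_commands(unowned_semids):
#       subprocess.run(command, check=True)
```

Printing the commands before running them is deliberate: removing a semaphore set that a live gear still uses would break that gear's httpd.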