Bug 1020841 - Get " signal Segmentation fault" error in log and return 500 when accessing python-2.7/2.6 app with medium gear &python-2.6 app with large gear
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware/OS: Unspecified Unspecified
Priority: medium
Severity: medium
Assigned To: mfisher
QA Contact: libra bugs
Depends On:
Blocks:
Reported: 2013-10-18 07:19 EDT by chunchen
Modified: 2016-09-29 22:15 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-23 22:25:19 EST
Type: Bug


Description chunchen 2013-10-18 07:19:16 EDT
Description of problem:
When a python-2.7 or python-2.6 app is created with the medium gear size, accessing the app in a browser returns a 500 page.

Version-Release number of selected component (if applicable):
devenv_3912

How reproducible:
always

Steps to Reproduce:
1. Create a python-2.7/2.6 app with medium gear size
rhc app create cpy27 python-2.7 -g medium --no-git
2. Access this app via browser and tail the log of this app
rhc tail cpy27

Actual results:
1) The browser shows a 500 page (Internal Server Error).
2) The app log shows these errors:
[Fri Oct 18 06:11:20 2013] [error] [client 127.2.1.129] Premature end of script headers: application
[Fri Oct 18 06:11:20 2013] [notice] child pid 31629 exit signal Segmentation fault (11)

Expected results:
A python-2.7/2.6 app with medium gear size should be accessible via browser.

Additional info:
Comment 1 Xiaoli Tian 2013-10-21 06:03:56 EDT
This can also be reproduced with a python-2.6 app on a large gear:

[Mon Oct 21 05:58:05 2013] [notice] Apache/2.2.15 (Unix) mod_wsgi/3.2 Python/2.6.6 configured -- resuming normal operations
[Mon Oct 21 05:59:29 2013] [error] [client 127.1.246.1] Premature end of script headers: application
[Mon Oct 21 05:59:29 2013] [notice] child pid 14415 exit signal Segmentation fault (11)
Comment 2 Rob Millner 2013-10-21 21:26:59 EDT
This is caused by the stack-size setting.  I can reliably confirm it: removing the setting from performance.conf.erb makes the failure go away, and putting it back reproduces the failure.

Looking through a few core dumps, it appears that the setting leads to writes past the end of the stack, which causes the segfault.

The docs on WSGIDaemonProcess mention that the value for stack-size is in bytes.  We appear to be making the following settings:

Gear size:    Memory:         stack-size:
small          512MB           8388 bytes
medium        1024MB          16777 bytes
large         2048MB          33554 bytes

The system limits allow up to 10485760 bytes and the default may well be that value.

Looking at the manpage for pthread_attr_setstack, it fails if you try to set a stack size below 16384 bytes.  I'll bet the only reason why this setting works on small gears is that 8388 is low enough that setstack fails and the default ends up getting used.

Python docs claim it needs 32k stacks just to run the interpreter.
http://docs.python.org/2/library/thread.html
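
As a rough illustration of that documented minimum, the sketch below asks a Python interpreter whether it would accept the small and medium gear values as a thread stack size. It uses the Python 3 `threading.stack_size` call rather than the Python 2 `thread` module linked above, and the helper name is hypothetical. mod_wsgi sets the stack size through Apache rather than through this API, so this only shows where Python itself draws the line and does not reproduce the crash.

```python
import threading


def python_accepts_stack_size(size_bytes):
    """Return True if the interpreter accepts size_bytes as the stack size for new threads."""
    try:
        # stack_size() returns the previous setting (0 means the platform default).
        previous = threading.stack_size(size_bytes)
    except ValueError:
        # Raised for invalid sizes, including anything below 32 KiB.
        return False
    # Restore the earlier setting so later threads are unaffected.
    threading.stack_size(previous)
    return True


# Values taken from the stack-size table above (small and medium gears).
for size in (8388, 16777):
    print(size, python_accepts_stack_size(size))
```

Both values are below 32768 bytes, so the interpreter rejects them.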

Also, the embedded formula produces values that are not aligned to 4k boundaries.
stack-size=<%= (((ENV['OPENSHIFT_GEAR_MEMORY_MB'].to_i * 0.8)/25) * 1024).to_i/2 

The value given is likely being rounded down to the nearest 4k page boundary.
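
The arithmetic can be checked with a short sketch that mirrors the ERB expression in Python. The function names, the 4096-byte page size, and the use of 16384 as the pthread minimum are assumptions taken from this comment for illustration; the rounding step models the suspected behaviour, not something confirmed here.

```python
PAGE_SIZE = 4096           # assumed page size
PTHREAD_STACK_MIN = 16384  # minimum cited from the pthread_attr_setstack manpage


def stack_size_bytes(gear_memory_mb, divisor=25):
    """Mirror the ERB expression: (((MB * 0.8) / 25) * 1024).to_i / 2."""
    return int(((gear_memory_mb * 0.8) / divisor) * 1024) // 2


def round_down_to_page(size, page=PAGE_SIZE):
    """Round a byte count down to the nearest page boundary."""
    return size - (size % page)


for name, mb in [("small", 512), ("medium", 1024), ("large", 2048)]:
    size = stack_size_bytes(mb)
    # Columns: gear, computed value, page-aligned value, below pthread minimum?
    print(name, size, round_down_to_page(size), size < PTHREAD_STACK_MIN)
```

Under these assumptions the computed values are 8388, 16777 and 33554 bytes, which round down to 8192, 16384 and 32768. Only the small gear value falls below the 16384-byte minimum.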

Tweaking the stack size seems to risk blowing things up.  And it doesn't matter much since most of what the script itself is doing will be on the heap which we have no control over.

Perhaps a better alternative would be to tweak down the number of threads we create instead.

We currently set 25 threads for every gear size.  How about something like "10 threads per 1024MB" instead, which would result in the following table:
Gear size:    Memory:         threads:
small          512MB             5
medium        1024MB            10
large         2048MB            20
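
A minimal sketch of the "10 threads per 1024MB" rule, assuming plain integer scaling; the function name is hypothetical and the change that was eventually merged may compute this differently.

```python
def threads_for_gear(gear_memory_mb, threads_per_1024mb=10):
    """Scale the thread count linearly with gear memory."""
    return (gear_memory_mb * threads_per_1024mb) // 1024


for name, mb in [("small", 512), ("medium", 1024), ("large", 2048)]:
    print(name, mb, threads_for_gear(mb))
```

This reproduces the 5, 10 and 20 threads in the table above.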
Comment 3 Michal Fojtik 2013-10-22 07:20:53 EDT
Hi Rob,

Thanks a **lot** for this investigation. Yeah, I agree that the better way would be to set the number of threads OR number of processes. I'll work on this today and make a PR.
Comment 4 Michal Fojtik 2013-10-22 07:31:24 EDT
The PR:

https://github.com/openshift/origin-server/pull/3951
Comment 5 openshift-github-bot 2013-10-22 15:51:25 EDT
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/3cfc6954453fad0952cf53910bc69ea8e0d7abb7
Bug 1020841 - Tune python cartridge by increasing number of threads instead of stack-size
Comment 6 chunchen 2013-10-23 01:47:07 EDT
It's fixed, verified on devenv_3932.
