Bug 219235

Summary: serializer issue with incomplete class
Product: Red Hat Enterprise Linux 4
Component: php
Version: 4.4
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: William Lovaton <williama_lovaton>
Assignee: Joe Orton <jorton>
QA Contact: David Lawrence <dkl>
CC: walovaton
Doc Type: Bug Fix
Last Closed: 2009-05-18 20:32:37 UTC

Attachments:
- core dump of a new crash

Description William Lovaton 2006-12-12 00:25:57 UTC
Description of problem:
Monitoring the Apache error log with "tail -f" shows the following httpd crashes:
[notice] child pid XXXXX exit signal Segmentation fault (11)

After setting the Apache CoreDumpDirectory directive on my production web server,
I debugged the resulting core dumps and this is what I got:
(gdb) bt full
#0  0x00469a25 in memcpy () from /lib/tls/libc.so.6
No symbol table info available.
#1  0x010935bf in zif_var_export () from /etc/httpd/modules/libphp4.so
No symbol table info available.
#2  0x01095ac9 in php_var_serialize () from /etc/httpd/modules/libphp4.so
No symbol table info available.
#3  0x01039838 in ps_srlzr_encode_php () from /etc/httpd/modules/libphp4.so
No symbol table info available.
#4  0x09fa3cb0 in ?? ()
No symbol table info available.
#5  0xbff665e0 in ?? ()
No symbol table info available.
#6  0xbff665d8 in ?? ()
No symbol table info available.
#7  0x00000000 in ?? ()
No symbol table info available.

After searching the Internet I found the following PHP bug report, which Joe
Orton helped fix (it may not be related, but it looks close):
http://bugs.php.net/bug.php?id=34435&edit=2

My production web server is a clean install of Red Hat Enterprise Linux 4
Update 4, without updates, with the default Apache and PHP packages. It also has
the Oracle 9.2.0.4 client correctly installed, Zend Optimizer, TurckMM-Cache 2.4.6
for PHP, FDF support and, obviously, oci8 support.

The site-specific Apache configuration includes the file timers.conf in
/etc/httpd/conf.d with the following content:
RequestHeader set requestTime %t
Header set requestDuration %D
Header set responseTime %t

This is used for several things, such as tracking request duration in the web
browser and, especially, keeping a server-side log of scripts that take longer
than n seconds (n = 20 secs right now) to execute. The requestTime header records
the exact time Apache receives the request (before the PHP script actually
starts), and a function registered with register_shutdown_function() measures the
total time after the script shuts down.
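Apache's mod_headers expands %t to the time the request was received, in microseconds since the epoch, with a "t=" prefix. The timing check described above can be sketched in Python as follows. This is an illustration of the idea, not the application's actual PHP code: the function names are hypothetical, and it assumes the script can read the requestTime header value (PHP normally exposes request headers as $_SERVER['HTTP_REQUESTTIME']) and runs the check from its shutdown hook.

```python
import time
from typing import Optional

# Threshold taken from the report ("n = 20 secs right now").
SLOW_THRESHOLD_SECS = 20.0


def parse_request_time(header_value: str) -> float:
    """Convert a mod_headers %t value such as 't=1165884357000000'
    (microseconds since the epoch) into seconds since the epoch."""
    prefix, sep, micros = header_value.partition("=")
    if prefix != "t" or not sep or not micros.isdigit():
        raise ValueError(f"unexpected %t value: {header_value!r}")
    return int(micros) / 1_000_000


def elapsed_since_request(header_value: str, now: Optional[float] = None) -> float:
    """Seconds between Apache receiving the request and `now`
    (for example, the moment a shutdown hook runs)."""
    if now is None:
        now = time.time()
    return now - parse_request_time(header_value)


def is_slow(header_value: str, now: Optional[float] = None) -> bool:
    """True if the request has taken longer than the logging threshold."""
    return elapsed_since_request(header_value, now) > SLOW_THRESHOLD_SECS
```

For example, elapsed_since_request("t=1165884357000000", now=1165884382.5) returns 25.5, so is_slow() reports that request as one to log. Because the start time comes from Apache rather than from the script, the measurement includes everything that happens before the first line of PHP runs.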

The funny thing is that most (if not all) of the crashes in error_log match
entries in that "performance" log. It's as if the request takes a long time to
execute and then finally crashes. AFAIK the PHP program ends normally and the
user gets the result, but an error_log entry is still written saying that the
child crashed.

With this in mind: I have been tracking a long-standing problem with our heavily
loaded web application. This app has lots of instrumentation to measure the time
spent in database queries or any other potentially slow algorithm. Every time a
crash occurs and the execution takes longer than 20 secs, it shows up in my
performance log, but the detailed instrumentation log says that the total time of
the instrumented code is only, for example, about 0.1 secs. The slowness always
happens before the first checkpoint of that instrumentation, and the only code
before that point is a bunch of requires (just class and function definitions)
and session_start().
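The reasoning above (a request taking more than 20 secs while the instrumented code accounts for only a fraction of a second) amounts to splitting each request's wall-clock time around the first checkpoint. A minimal Python sketch of that split follows; the function name and the timings are hypothetical, chosen only to mirror the shape of the report.

```python
from typing import Dict, List, Tuple


def split_request_time(
    request_start: float,
    checkpoints: List[Tuple[str, float]],
    request_end: float,
) -> Dict[str, float]:
    """Split one request's wall-clock time (in seconds) into the part spent
    before the first checkpoint, the instrumented part, and the total.

    `checkpoints` is a list of (label, timestamp) pairs in time order.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    first_ts = checkpoints[0][1]
    last_ts = checkpoints[-1][1]
    return {
        "before_first_checkpoint": first_ts - request_start,
        "instrumented": last_ts - first_ts,
        "total": request_end - request_start,
    }


# Hypothetical timings: the requires and session_start() run before
# the first checkpoint, so their cost lands in "before_first_checkpoint".
timings = split_request_time(
    request_start=0.0,
    checkpoints=[("first_checkpoint", 21.0), ("last_checkpoint", 21.125)],
    request_end=21.125,
)
```

Here timings reports 21.0 secs before the first checkpoint, 0.125 secs of instrumented time and 21.125 secs in total, which is the pattern that points at the code run before any checkpoint.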

Only today did I manage to see that the crashes in error_log are related to this
unmeasurable slowness.



Version-Release number of selected component (if applicable):
httpd-2.0.52-25.ent
php-4.3.9-3.15
php-ldap-4.3.9-3.15
php-gd-4.3.9-3.15
php-pear-4.3.9-3.15
php-devel-4.3.9-3.15
Oracle EE Client Installation 9.2.0.4
TurckMM-Cache 2.4.6 compiled from source
Zend Optimizer 2.6.2 (I tried without it and it still crashes)



How reproducible:
I don't see any pattern yet. I can only say that it's not very frequent: it
happens roughly every 30 seconds (keep in mind that this is a heavily loaded web
server with about 10 million hits per day). The other thing I can say is that it
doesn't happen on Fedora Core 3 with the latest updates. We still have another
mission-critical web server running FC3, and its httpd processes don't crash.
The crashes started when we migrated one of our production web servers, which
runs one specific web app developed completely in-house, from FC3 to RHEL4U4
(both web apps were developed using the same design patterns).



Additional info:
My web server specs are:
- IBM xSeries 445
- 8 CPUs 2GHz each with HT (16 CPUs as seen by the OS)
- 12 GB of RAM

Comment 1 Joe Orton 2006-12-12 09:33:07 UTC
Thanks for the report.  

- Can you reproduce with this "TurckMM-Cache" also disabled?
- Can you reproduce this with php-4.3.9-3.22, the latest update for RHEL4?
(there are no pertinent fixes but this should always be the first step)

Can you bzip2 and attach one of the core dumps produced? (hit the "Private" link
so it's not publicly available)

Comment 2 William Lovaton 2006-12-12 21:38:23 UTC
Hi Joe, thanks for the reply.

- It's difficult to test without TurckMM-Cache because the web app gets huge
traffic, and the server would melt down compiling PHP programs over and over
again. You can read more about this opcode cache here:
http://turck-mmcache.sourceforge.net/index_old.html

I know eAccelerator is a fork of this project (http://eaccelerator.net/) and it
seems to be actively maintained, but I don't want to switch to a new one, given
that TurckMM-Cache has worked fine with every distro I tried. Our Fedora Core 3
production servers use TurckMM with PHP 4.3.

Do you know a good alternative to Turck? Do you know if eAccelerator is good
enough? Maybe I'll try it and see if it prevents the crashes.

- I'll try to upgrade PHP, but it's very hard for me since this server is not
registered with RHN. Is there an easy way to upgrade httpd and PHP without
fighting too much with dependencies?

- Given the backtrace in my initial post, I can see that the problem occurs when
PHP tries to serialize something. Right now there are 3 places in our apps where
we depend heavily on PHP serialization: 1) PHP sessions, 2) query caching, 3)
performance monitoring and user data.

I think number 1 is not causing any problem.

Number 2 uses several backends to store serialized data (shm, disk, null). I'll
try the null backend to avoid serializing and caching and see what happens (I
believe our app can take the performance hit).

Number 3 is used to store $_REQUEST and $_SESSION data in a special file every
time a request takes too long to execute (more than 20 secs), so that we can
look at the input given by the user, check whether the slowness is reproducible
on a test system, and try to optimize the algorithms. This too can be disabled
to avoid serializing and storing this information.
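Switching the query cache to a null backend, as proposed above, works because every backend sits behind the same interface and the null one never serializes anything. A hypothetical Python sketch of that design: the class and function names are invented, Python's pickle stands in for PHP's serialize(), and an in-memory dict stands in for the shm and disk backends.

```python
import pickle
from typing import Any, Callable, Dict, Optional


class CacheBackend:
    """Hypothetical interface shared by all query-cache backends."""

    def get(self, key: str) -> Optional[Any]:
        raise NotImplementedError

    def put(self, key: str, value: Any) -> None:
        raise NotImplementedError


class MemoryBackend(CacheBackend):
    """Stand-in for the shm/disk backends: stores serialized bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.serialize_calls = 0  # how many times put() serialized a value

    def get(self, key: str) -> Optional[Any]:
        blob = self._store.get(key)
        return None if blob is None else pickle.loads(blob)

    def put(self, key: str, value: Any) -> None:
        self.serialize_calls += 1
        self._store[key] = pickle.dumps(value)


class NullBackend(CacheBackend):
    """Never stores anything, so nothing is ever serialized."""

    def get(self, key: str) -> Optional[Any]:
        return None

    def put(self, key: str, value: Any) -> None:
        pass


def cached_query(backend: CacheBackend, key: str, run_query: Callable[[], Any]) -> Any:
    """Return a cached result if present, otherwise run the query and cache it.
    (A cached None is treated as a miss, which is fine for this sketch.)"""
    hit = backend.get(key)
    if hit is not None:
        return hit
    result = run_query()
    backend.put(key, result)
    return result
```

With MemoryBackend, a second cached_query() call for the same key is served from the cache; with NullBackend every call runs the query again, trading database load for never touching the serializer.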

- I'll attach the bzipped core later.


Thanks again for your help.


Comment 4 Joe Orton 2006-12-13 12:04:22 UTC
That's great, thanks a lot, I have a reproducer which seems to match this exactly.

Comment 5 William Lovaton 2006-12-13 13:19:16 UTC
Cool!

I was thinking about disabling TurckMM-Cache really early in the morning, when
there aren't many users on the system, to see if the crashes are still
reproducible that way, but I am almost sure that it isn't causing any problem.

I really hope you can track this sucker down.

Quick question: which opcode cache for PHP would you recommend that works well
with RHEL 4? Surely there are lots of customers using PHP on heavily loaded web
sites, right?

Cheers.


Comment 6 Joe Orton 2006-12-13 13:52:04 UTC
No, sorry, we don't have a recommendation of any particular opcode cache package.

I'm building packages with the patch for the serializer issue now and will make
them available for testing shortly.

Comment 7 Joe Orton 2006-12-13 14:46:07 UTC
Can you try out the php-4.3.9-3.23 packages from here:

http://people.redhat.com/jorton/Nahant-php/

Note that these packages *have not been through Red Hat QA*: they've only been
tested to not cause regressions in the PHP test suite and to fix the specific
reproduction case I have, which is hopefully the same issue that you are seeing.

Comment 8 William Lovaton 2006-12-13 15:34:20 UTC
Thanks, I'll try them soon, maybe at the end of the day.

- Do I need to upgrade httpd too?
- How do I downgrade if needed? (I've never done that with RPM)

Cheers.

Comment 9 Joe Orton 2006-12-13 15:38:34 UTC
No, just the php packages are needed.  To downgrade to older versions of
packages, find the RPMs on RHN/the CD or wherever and "upgrade" to them using:

  # rpm -Uvh --oldpackage php-4.3.9-3.15.i386.rpm ...



Comment 10 William Lovaton 2006-12-13 16:10:11 UTC
Thanks, will do.

I'm going to ask you an "off-bug" question: do you think it is a good idea to use
the latest oci8 PHP module (the one in PECL) with PHP 4.3.x? I need the pinging
functionality in that version to solve some issues with Oracle: once in a while,
it returns an Oracle error saying that it is not connected to Oracle. We use
persistent connections to the DB, but no one is killing the server-side process
or anything, and I don't know what is happening there.

Comment 11 William Lovaton 2006-12-13 23:24:50 UTC
Hi Joe,

You saved my day! It totally worked. Our production server has been running for
about an hour with the updates and there is not a single crash in the error_log.
Our performance monitoring tool shows good measurements again and all of the
entries there are justified (sometimes a heavy query, sometimes a very slow
network link, etc). Our "slowness" events went from every 10 seconds to every
1.5 minutes, which is normal.

The installed packages are:
php-ldap-4.3.9-3.23
php-4.3.9-3.23
php-pear-4.3.9-3.23
php-gd-4.3.9-3.23
php-devel-4.3.9-3.23

I'll keep monitoring the system for a few days and give you more feedback.

Thanks again.


Comment 12 Joe Orton 2006-12-14 08:58:43 UTC
Thanks a lot for testing out the packages. W.r.t. the oci8 question, I'd
certainly recommend using the latest upstream OCI8 module that works, but
again, this is not something we've done in-house testing on, so I can't give a
more specific recommendation.

Comment 13 RHEL Program Management 2006-12-14 09:04:22 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 15 William Lovaton 2006-12-15 14:46:57 UTC
Hi Joe,

I got a new crash but this seems to be a totally different issue.

PHP has been very stable: there weren't any crashes in the log until 14 Dec at
7:20 AM, and that was the only one. I then enabled the core dump directory in
Apache and waited for another crash to happen. It finally did today, 15 Dec, at
7:27 AM, and the core dump shows the following:

(gdb) bt full
#0  0x001f4666 in malloc_consolidate () from /lib/tls/libc.so.6
No symbol table info available.
#1  0x001f5643 in _int_malloc () from /lib/tls/libc.so.6
No symbol table info available.
#2  0x001f7401 in malloc () from /lib/tls/libc.so.6
No symbol table info available.
#3  0x010fda96 in _emalloc () from /etc/httpd/modules/libphp4.so
No symbol table info available.
#4  0x010f0f5c in php_end_implicit_flush () from /etc/httpd/modules/libphp4.so
No symbol table info available.
#5  0x0000a001 in ?? ()
No symbol table info available.
#6  0x00000000 in ?? ()
No symbol table info available.


It's weird that it happened at almost the same time in the morning; maybe httpd
is restarting or something like that, and it crashes while doing so.

Do you think I should file a separate bug report? I'll attach the bzipped
core later.

Cheers.

Comment 16 William Lovaton 2006-12-15 14:52:23 UTC
Created attachment 143765
core dump of a new crash

This is the new core dump that I got today at 7:27 AM (local time). I assume
this may happen about once a day. It's not a really serious issue for me, but I
still want to know what might be happening there.

Comment 17 Joe Orton 2006-12-15 15:12:15 UTC
That does look like a separate issue; can you file a new bug, and attach your
php.ini too?

Comment 18 William Lovaton 2006-12-15 15:44:38 UTC
Ok, I just got another crash at 10:34 AM with the following message:
[Fri Dec 15 10:34:58 2006] [notice] child pid 16791 exit signal Bus error (7),
possible coredump in /tmp/httpd-coredump

I'll create a new bug report soon with the new information.

Comment 26 Ken Reilly 2007-08-23 13:41:33 UTC
I cleared the rhel-4.6 flag for this BZ and request that you [Joe] do the same
for any others proposed for 4.6. Please don't set the rhel-4.7 flag until you
know which ones you want to proceed with fixes for.

Thanks...

Ken

Comment 27 William Lovaton 2007-08-24 21:09:19 UTC
Sorry, I have to ask: is this fix not included in RHEL 4.5? I thought it was.

Comment 28 RHEL Program Management 2008-02-01 19:11:02 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 29 RHEL Program Management 2008-05-13 15:40:33 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 30 RHEL Program Management 2008-09-05 17:11:12 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 32 William Lovaton 2009-01-26 15:04:09 UTC
Hi, is this going in for the next update? I hope so since I'm still using RHEL 4.

Thanks.

Comment 33 Joe Orton 2009-01-26 15:37:08 UTC
William - the fix for this bug is indeed scheduled to be included in RHEL 4.8.

Comment 35 William Lovaton 2009-04-10 18:33:37 UTC
Hi Joe,

I just upgraded all of my servers and reinstalled RHEL 4, but I no longer have the PHP packages you gave me in comment #7 (php-4.3.9-3.23) that fixed this problem.

Is there a way to get these packages?

Any idea when an updated package will be officially released?

Comment 36 Joe Orton 2009-04-14 08:25:52 UTC
William - I've uploaded the latest package set here:

  http://people.redhat.com/jorton/Nahant-php/

Let me know how testing goes!

Comment 37 William Lovaton 2009-04-16 19:49:21 UTC
So far so good. I installed the updates very early this morning and there is not a single segfault in the logs.

Before, there used to be about 1000 segfaults a day; so far there are none.

Thanks a lot, I'll keep you updated if anything abnormal shows up.

Comment 38 Joe Orton 2009-04-17 07:53:55 UTC
Great to hear - thanks for following up, and sorry this has taken so long to resolve.

Comment 40 errata-xmlrpc 2009-05-18 20:32:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1013.html