Bug 125258 - fsock_open fails 99.5% of the time
Summary: fsock_open fails 99.5% of the time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: php
Version: 2
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Joe Orton
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-06-04 04:10 UTC by Jon Benson
Modified: 2007-11-30 22:10 UTC

Fixed In Version: 4.3.10-2.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-21 22:16:09 UTC
Type: ---
Embargoed:


Attachments
strace output as requested (365.76 KB, application/x-gzip-compressed)
2004-06-28 02:25 UTC, Jon Benson

Description Jon Benson 2004-06-04 04:10:34 UTC
Description of problem:
We recently upgraded a server from RH 7.1 to Fedora Core 2.

A number of customers who use fsockopen() in their PHP scripts have
now found that the scripts do not work at all.

Basically the PHP script times out (NOT the fsockopen() call):
Fatal error: Maximum execution time of 30 seconds exceeded 
in /home/jon/public_html/jon_test.php on line 2

The relevant line of code (the first argument is the IP of the server):
$fp = fsockopen("209.132.177.50", 80, $errno, $errstr, 10);

Now the truly weird thing is that I have managed to make the script
work by renaming the file, but after renaming it back it fails again.
E.g. "mv jon_test.php jon2.php" and then calling jon2.php worked.

However, this is NOT reliable either, as a copied file that was working
for a while eventually stopped working as well.

NOTE: This is something to do with the php module for Apache as 
running "php jon_test.php" from the command line works fine.



Version-Release number of selected component (if applicable):
php-4.3.6-5

How reproducible:
Every time

Steps to Reproduce:
1. create a test file with code similar to the following:
<?php
$fp = fsockopen("209.132.177.50", 80, $errno, $errstr, 10);
if (!$fp) {
   echo "$errstr ($errno)<br />\n";
} else {
   stream_set_timeout($fp, 2);

   $out = "GET / HTTP/1.0\r\n";
   $out .= "Host: fedora.redhat.com\r\n";
   $out .= "Connection: Close\r\n\r\n";

   fwrite($fp, $out);
   while (!feof($fp)) {
       echo fread($fp, 128);
   }
   fclose($fp);
}
?>


2.  Access the file via a webpage hosted by Apache

      
Actual results:
Fatal error: Maximum execution time of 30 seconds exceeded 
in /home/jon/public_html/jon_test.php on line 2


Expected results:
The Fedora homepage should be displayed.


Additional info:
A customer has tried to use curl to work around this, with no luck.
I'll be trying the same myself when I get the chance.

Comment 1 Jon Benson 2004-06-08 05:44:31 UTC
netstat shows that the socket is actually being connected:
tcp        0      0 203.30.164.96:34325     209.132.177.50:80       
ESTABLISHED 2422/httpd

But the script timeout still occurs.

Using curl as follows:
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://fedora.redhat.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close curl resource, and free up system resources
curl_close($ch);
?>

This also manages to create the connection but actually "hangs"
rather than timing out.


Comment 2 Jon Benson 2004-06-25 05:50:53 UTC
Any feedback folks?


Comment 3 Joe Orton 2004-06-25 11:25:51 UTC
I can't reproduce any problems here, both scripts you've posted work
100% of the time both from /usr/bin/php and via httpd.

When the script "hangs", and you've found the httpd child with the
ESTABLISHED connection, what does:

# strace -p <pid>

on that child process ID produce?

Comment 4 Jon Benson 2004-06-28 02:24:44 UTC
It seems it's something peculiar to that server.  I've set up a test
box in the dev environment here and it is working fine.  :(

I will attach the output of strace but be aware it extracts to a 54M 
file.

Shortly after I'd hit the page, I ran:
strace -p `netstat -anp | grep "209.132.177.50" | grep ESTABLISHED | 
cut -dD -f 2 | cut -d " " -f 2 | cut -d "/" -f 1` 2> strace.out

With strace attached the output of the webpage changed from:
Fatal error: Maximum execution time of 30 seconds exceeded 
in /home/httpd/public_html/jon_fsock.php on line 2

to:
Warning: fsockopen(): unable to connect to 209.132.177.50:80 
in /home/redhatozstagingcom/public_html/jon_fsock.php on line 2
Bad file descriptor (9)


Comment 5 Jon Benson 2004-06-28 02:25:52 UTC
Created attachment 101457
strace output as requested

The promised strace output.

Comment 6 Joe Orton 2004-06-29 10:19:33 UTC
That's interesting.  Do you have a lot of vhosts/error logs set up on
that host?

Comment 7 Jon Benson 2004-06-30 00:41:38 UTC
Yes.  Over 1500 vhosts are set up.  Most of them have a combined log.

I take it you have an idea what the issue might be?


Comment 8 Joe Orton 2004-06-30 09:55:30 UTC
But just to confirm absolutely, you have ~1500 individual ErrorLog or
CustomLog directives configured?

If so, yes, this is a rather severe PHP bug.  Technical description:

PHP internally uses the select() system call to wait for I/O on file
descriptors such as the socket opened by fsockopen().  select() uses
a fixed-size array which can only hold file descriptor numbers below
1024; if used with file descriptors >= 1024, then you can get some
random memory corruption.  In this case the arrays get corrupted and
select() goes a bit crazy, producing the crazy strace output you
attached.

httpd uses one fd per ErrorLog/CustomLog directive, so that's an easy
way to get fd numbers up above 1024.

Comment 9 Jon Benson 2004-07-01 00:31:46 UTC
Yes we certainly do, an example:
<VirtualHost 203.30.164.96:80>
ServerName eq.rpgaddicts.net
ServerAlias www.eq.rpgaddicts.net
ServerAdmin webmaster.net
DocumentRoot /home/eqrpgaddictsnet/public_html
SuexecUserGroup eqrpgaddictsnet eqrpgaddictsnet
CustomLog /home/eqrpgaddictsnet/eq.rpgaddicts.net_log combined
</VirtualHost>

We have increased the fds available by adding:
ulimit -n 8192
to /etc/rc.d/init.d/httpd

Since this worked in RH 7.1, and going by your description, I take it
this is a system-level (kernel) bug?

If there is anything else I can do to help with this please let me 
know.


Comment 10 Joe Orton 2004-07-01 10:45:50 UTC
It's a PHP bug and it can be fixed there: it could fail the same way
on any kernel and OS version as far as I can see.  It requires the
conjunction of two things to trigger: a configuration which pushes
fd numbers to 1024 and above, and a script which exercises any part of
PHP which uses select().

(As an aside, putting the ulimit command in /etc/sysconfig/httpd is a
better approach, to avoid your changes being lost during an httpd
upgrade.)


Comment 11 Jon Benson 2004-07-02 00:37:03 UTC
Should I submit a bug report to the PHP team then?

Thanks for the tip.  :)


Comment 12 Joe Orton 2004-07-02 09:16:02 UTC
There is a bug in the PHP database which is probably the same issue
but wasn't analysed fully when reported:

http://bugs.php.net/bug.php?id=24189

I'm working on patches which mitigate the issue.  The real fix is to
use poll() rather than select(), since poll() doesn't have the fd
number limit; but this is rather a lot of work.  Another alternative
is to make a custom build of the PHP RPM, adding "-DFD_SETSIZE=4096"
or so to CFLAGS.


Comment 13 Jon Benson 2004-07-07 07:06:50 UTC
I tried compiling a custom RPM by editing the php.spec to contain:
CFLAGS="$RPM_OPT_FLAGS -Wall -fno-strict-aliasing -DFD_SETSIZE=8192"; 
export CFLAGS

But this doesn't seem to have had any effect.  I guess I'll just have 
to wait until you manage to release a new RPM for it.
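
One possible explanation, which this report does not confirm: on glibc-based systems the size of fd_set is, as far as I know, fixed by the system headers, so passing -DFD_SETSIZE on the compiler command line may not enlarge the structure at all.  The C sketch below shows one way to check what a given build really provides.  Both function names are hypothetical, and the bit count assumes fd_set is a plain bitmap, as it is on glibc.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical check: number of descriptor slots fd_set really has in
 * this build, assuming fd_set is a plain bitmap (one bit per descriptor). */
size_t fd_set_capacity(void)
{
    return sizeof(fd_set) * 8;
}

/* Nonzero if the FD_SETSIZE macro seen by this build claims more slots
 * than fd_set can actually hold, i.e. the override did not take effect
 * on the structure itself. */
int fd_setsize_overclaims(void)
{
    return (size_t)FD_SETSIZE > fd_set_capacity();
}
```

Printing fd_set_capacity() from a small program built with the same CFLAGS as the RPM would show whether the larger value reached the structure.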

Thanks for all your assistance.


Comment 14 Joe Orton 2004-07-17 22:18:01 UTC
The test 4.3.8 RPMs here: http://people.redhat.com/jorton/FedoraC2-php/
include a workaround for a couple of the select() issues, though you
may still hit others - could you try these out?  Behaviour will be at
least no worse than currently.


Comment 15 Jon Benson 2004-07-19 01:50:37 UTC
You don't appear to have the base php RPM package there, so when I
tried up2date I got the following:

Unresolvable chain of dependencies:
php-imap-4.3.8-2.1                       requires php = 4.3.8-2.1
php-ldap-4.3.8-2.1                       requires php = 4.3.8-2.1
php-mysql-4.3.8-2.1                      requires php = 4.3.8-2.1


Comment 16 Joe Orton 2004-07-19 07:03:43 UTC
Sorry, it was missing from the upload for some reason; it's there now.

Comment 17 Jon Benson 2004-07-19 08:01:50 UTC
Update applied and my test page (fsock) worked fine.  Curl still 
hangs.

I'll give it a go again tomorrow to make sure it's not related to a 
fresh start of the server, or some such thing, and will let you know 
the results.


Comment 18 Joe Orton 2004-07-19 08:05:08 UTC
Yes, this won't fix curl.  (Fixing curl would involve changes to curl
and possibly even the curl API, unfortunately.)

Comment 19 Jon Benson 2004-07-20 00:56:55 UTC
Well things are still working so all looks good.  :)

Thanks again.

Comment 20 Joe Orton 2004-10-18 14:55:26 UTC
OK, well this select vs FD_SETSIZE problem has been fixed upstream for
good for PHP 5.1 thanks to the excellent work of Wez Furlong.  But the
workarounds are good for now. 

Comment 21 Jon Benson 2004-12-13 02:37:59 UTC
Do the workarounds cover ftp_connect()?  I'm seeing issues with it
that I can only presume are related.

Thanks again.


Comment 22 Joe Orton 2004-12-14 12:06:42 UTC
Yes, the FTP extension has similar problems.  It looks like we can add
more of the workarounds in a future FC2 update.

Comment 23 Jon Benson 2004-12-15 00:08:20 UTC
Thanks Joe.   I've added a PHP bug here if you wanted to give them 
the sort of detail I can't:
http://bugs.php.net/bug.php?id=31080



Comment 24 Jon Benson 2004-12-17 06:26:49 UTC
Two quick questions.  

Any idea on when I might see the workarounds for the ftp issue?

Are these workarounds making it into other Red Hat builds such as
Red Hat Enterprise Linux?

We are setting up a new server with Enterprise running on it and I'd
hate to think we're going to run into the same problems on there
too.  :)


Comment 25 Joe Orton 2004-12-17 10:18:38 UTC
To the first question: could be soon.

To the second question: yes, bug 132003 is tracking the same issue for
RHEL3 and a php update is due to be issued soon which contains some of
the same workarounds.

Comment 27 Joe Orton 2004-12-21 22:16:09 UTC
Workaround now added for the ftp extension in 4.3.10-2.4.  Jon, please
file new bugs (against RHEL3 as appropriate) for any further issues
you see, rather than reopening this one.

Comment 28 Jon Benson 2004-12-21 23:56:53 UTC
Thanks and will do.


