Description of problem:
We recently upgraded a server from RH 7.1 to Fedora Core 2. A number of customers that use fsockopen() in their PHP scripts have now found that they do not work at all. Basically the PHP script times out (NOT the fsockopen() call):

Fatal error: Maximum execution time of 30 seconds exceeded in /home/jon/public_html/jon_test.php on line 2

The relevant line of code (where the IP address is that of the server being contacted):

$fp = fsockopen("209.132.177.50", 80, $errno, $errstr, 10);

Now the truly weird thing is that I have managed to make the script work by renaming the file, but after renaming it back it stops working again. E.g. "mv jon_test.php jon2.php" and then calling jon2.php worked. However, this is NOT reliable either, as a copied file that was working for a while eventually stopped working as well.

NOTE: This is something to do with the PHP module for Apache, as running "php jon_test.php" from the command line works fine.

Version-Release number of selected component (if applicable):
php-4.3.6-5

How reproducible:
Every time

Steps to Reproduce:
1. Create a test file with code similar to the following:

<?php
$fp = fsockopen("209.132.177.50", 80, $errno, $errstr, 10);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    stream_set_timeout($fp, 2);
    $out = "GET / HTTP/1.0\r\n";
    $out .= "Host: fedora.redhat.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fread($fp, 128);
    }
    fclose($fp);
}
?>

2. Access the file via a webpage hosted by Apache.

Actual results:
Fatal error: Maximum execution time of 30 seconds exceeded in /home/jon/public_html/jon_test.php on line 2

Expected results:
The Fedora homepage should be displayed.

Additional info:
A customer has tried to use curl to work around this with no luck. I'll be trying the same myself when I get the chance.
netstat shows that the socket is actually being connected:

tcp        0      0 203.30.164.96:34325    209.132.177.50:80    ESTABLISHED 2422/httpd

But the script timeout still occurs. Using curl as follows:

<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://fedora.redhat.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close curl resource, and free up system resources
curl_close($ch);
?>

This also manages to create the connection but actually "hangs" rather than timing out.
Any feedback folks?
I can't reproduce any problems here; both scripts you've posted work 100% of the time, both from /usr/bin/php and via httpd. When the script "hangs", and you've found the httpd child with the ESTABLISHED connection, what does:

# strace -p <pid>

on that child process ID produce?
It seems it's something peculiar to that server. I've set up a test box in the dev environment here and it is working fine. :(

I will attach the output of strace, but be aware it extracts to a 54M file. I used:

strace -p `netstat -anp | grep "209.132.177.50" | grep ESTABLISHED | cut -dD -f 2 | cut -d " " -f 2 | cut -d "/" -f 1` 2> strace.out

shortly after I'd hit the page.

With strace attached, the output of the webpage changed from:

Fatal error: Maximum execution time of 30 seconds exceeded in /home/httpd/public_html/jon_fsock.php on line 2

to:

Warning: fsockopen(): unable to connect to 209.132.177.50:80 in /home/redhatozstagingcom/public_html/jon_fsock.php on line 2
Bad file descriptor (9)
Created attachment 101457 [details]
strace output as requested

The promised strace output.
That's interesting. Do you have a lot of vhosts/error logs set up on that host?
Yes. Over 1500 vhosts are set up. Most of them have a combined log. I take it you have an idea what the issue might be?
But just to confirm absolutely: you have ~1500 individual ErrorLog or CustomLog directives configured? If so, yes, this is a rather severe PHP bug.

Technical description: PHP internally uses the select() system call to wait for I/O on file descriptors such as the socket opened by fsockopen(). select() operates on a fixed-size fd_set structure which can only represent file descriptor numbers below FD_SETSIZE (1024); if it is used with file descriptors >= 1024, the macros write outside the structure and you can get some random memory corruption. In this case the fd_sets get corrupted and select() goes a bit crazy, producing the crazy strace output you attached. httpd uses one fd per ErrorLog/CustomLog directive, so that's an easy way to push fd numbers up above 1024.
Yes, we certainly do. An example:

<VirtualHost 203.30.164.96:80>
    ServerName eq.rpgaddicts.net
    ServerAlias www.eq.rpgaddicts.net
    ServerAdmin webmaster.net
    DocumentRoot /home/eqrpgaddictsnet/public_html
    SuexecUserGroup eqrpgaddictsnet eqrpgaddictsnet
    CustomLog /home/eqrpgaddictsnet/eq.rpgaddicts.net_log combined
</VirtualHost>

We have increased the fds available by adding:

ulimit -n 8192

to /etc/rc.d/init.d/httpd.

As this worked in RH 7.1, and given your description, I take it this is a system-level (kernel) bug? If there is anything else I can do to help with this, please let me know.
It's a PHP bug and it can be fixed there: it could fail the same way on any kernel and OS version as far as I can see. It requires the conjunction of two things to trigger: a configuration which pushes fd numbers above 1024, and a script which exercises any part of PHP which uses select(). (As an aside, putting the ulimit command in /etc/sysconfig/httpd is a better approach, to avoid your changes being lost during an httpd upgrade.)
Should I submit a bug report to the PHP team then? Thanks for the tip. :)
There is a bug in the PHP database which is probably the same issue but wasn't analysed fully when reported:

http://bugs.php.net/bug.php?id=24189

I'm working on patches which mitigate the issue. The real fix is to use poll() rather than select(), since poll() doesn't have the fd number limit; but this is rather a lot of work. Another alternative is to make a custom build of the PHP RPM, adding "-DFD_SETSIZE=4096" or so to CFLAGS.
I tried compiling a custom RPM by editing php.spec to contain:

CFLAGS="$RPM_OPT_FLAGS -Wall -fno-strict-aliasing -DFD_SETSIZE=8192"; export CFLAGS

But this doesn't seem to have had any effect. I guess I'll just have to wait until you manage to release a new RPM for it. Thanks for all your assistance.
The test 4.3.8 RPMs here:

http://people.redhat.com/jorton/FedoraC2-php/

include a workaround for a couple of the select() issues, though you may still hit others. Could you try these out? Behaviour will be at least no worse than currently.
You don't appear to have the base RPM package there? Hence, when trying up2date, I got the following:

Unresolvable chain of dependencies:
php-imap-4.3.8-2.1 requires php = 4.3.8-2.1
php-ldap-4.3.8-2.1 requires php = 4.3.8-2.1
php-mysql-4.3.8-2.1 requires php = 4.3.8-2.1
Sorry, it was missed from the upload for some reason; it's there now.
Update applied and my test page (fsock) worked fine. Curl still hangs. I'll give it a go again tomorrow to make sure it's not related to a fresh start of the server, or some such thing, and will let you know the results.
Yes, this won't fix curl. (Fixing curl would involve changes to curl, and possibly even the curl API, unfortunately.)
Well things are still working so all looks good. :) Thanks again.
OK, well, this select() vs FD_SETSIZE problem has been fixed upstream for good in PHP 5.1, thanks to the excellent work of Wez Furlong. But the workarounds are good for now.
Do the workarounds cover ftp_connect()? I'm seeing issues with it that I can only presume are related. Thanks again.
Yes, the FTP extension has similar problems. It looks like we can add more of the workarounds in a future FC2 update.
Thanks Joe. I've added a PHP bug here if you wanted to give them the sort of detail I can't: http://bugs.php.net/bug.php?id=31080
Two quick questions. Any idea on when I might see the workarounds for the ftp issue? Are these workarounds making it into other Red Hat builds such as Red Hat Enterprise? We are setting up a new server with Enterprise running on it, and I'd hate to think we're going to run into the same problems on there too. :)
To the first question: could be soon. To the second question: yes, bug 132003 is tracking the same issue for RHEL3 and a php update is due to be issued soon which contains some of the same workarounds.
Workaround now added for the ftp extension in 4.3.10-2.4. Jon, please file new bugs (against RHEL3 as appropriate) for any further issues you see, rather than reopening this one.
Thanks and will do.