Red Hat Bugzilla – Bug 185078
PHP process hung in D state with flock
Last modified: 2010-01-11 22:10:10 EST
Description of problem:
A field report says that running Apache sometimes causes PHP processes to hang
in D state inside the flock() system call and/or causes lock performance issues.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
I have a rough idea of what could be going wrong. In the RHEL 3 (Linux 2.4
based) kernel, flock has the following logic:
1. lock_kernel (Big Kernel Lock - BKL)
2. call filesystem-specific supplemental lock
3. handle linux vfs flock
That BKL could be the culprit.
Date: Fri, 10 Mar 2006 19:11:21 -0500
From: Wendy Cheng <email@example.com>
To: "Treece, Britt" <Britt.Treece@savvis.net>
CC: "Stanley, Jon" <firstname.lastname@example.org>
Subject: Re: [Linux-cluster] GFS load average and locking
Treece, Britt wrote:
>Did the sysrq-t's that I sent illustrate this problem further? I'm
>hoping that they corroborate the situation that you described below.
(This reply will be logged into our ticket to get everyone in the loop.)
There are three layers of code we're examining:
1. PHP layer
Google searches found there were reports saying php session didn't get handled
Also, by checking the RHEL source RPMs, I find that flock() is
invoked as a blocking call. This can lead to the following two issues:
(sorry hit wrong key ... continue)
2. RHEL3 kernel flock implementation
The flock() is implemented as:
step1: lock_kernel (BKL) - has been removed from RHEL 4.
step2: get filesystem lock (ext3 is noop while GFS calls gfs_lock)
step3: get vfs layer lock (local memory fetch logic, relatively fast)
The BKL is certainly a performance hit. However, since it is dropped whenever
the process is not scheduled (sleeps), it will not serialize the flock call as
significantly as I previously expected. Also, ext3 shows no sign of a
performance hit, so it looks to me that the issue, if we really have one, is
step 2: the GFS glock.
3. GFS Layer
GFS lock is obtained via network (to/from lock server) so it is subject
to network congestion and certain level of overhead must be expected.
It looks to me that the customer's concern is that while a process is
waiting for the lock, it sits in D state (uninterruptible) and pumps up the
system "load". The reason for this is that the flock() is invoked (by
PHP) as a "blocking" call. By design, it (gfs_lock) loops around to wait
for the lock to arrive where it accumulates the CPU consumption that
leads to high "load". We certainly can make cosmetic changes to reduce
this artificial "load" count (with some restrictions) but be aware that
the average wait time will still be largely unchanged (or maybe even
longer). Another solution would be for PHP to use non-blocking lock
calls. Will discuss this issue with Joe Orton, our PHP maintainer. (Joe,
this could be a nice white paper titled as "fine-tune php on a cluster
filesystem" :) ). However, I have a fundamental question: does this
"artificial" load number really affect overall system throughput? I
can send out a test kernel for you to try out (quantify it) if you're
In general, I don't think we have captured the required info while the system
is in its sluggish state, that is, the PHP session thread traces from all GFS
nodes that would show what the threads are really waiting for. What we need to
do is:
1. Check out PHP session handling - is it really bad?
2. Quantify the BKL impact - however, the base kernel team has indicated that
removing it from RHEL3 would be too risky.
3. Make changes to gfs_lock busy wait logic (maybe).
4. "Upgrade" PHP to use a non-blocking flock call (to work better with cluster
5. The current set of sysrq traces does not show the real problem (it shows the
artificial load I explained above). What we really need is the thread trace
from all GFS nodes while the system is in the sluggish state (use crash to do
the job instead of sysrq-t).
Also, as common sense, separating lock traffic from other traffic is a good
thing, and playing around with the GFS tunables would be another good thing.
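
For item 4 above, here is a minimal sketch of what a non-blocking flock call
with a bounded retry could look like. It is written in Python rather than PHP
purely for illustration. The name try_flock, its timeout and interval
parameters, and the temporary file in the demo are hypothetical and are not
taken from PHP's session code or from GFS. The idea is only that the wait
happens in a user-space sleep between LOCK_NB attempts instead of inside one
blocking flock() call. Whether this actually helps on a given GFS setup would
still need to be measured.

```python
import errno
import fcntl
import os
import tempfile
import time


def try_flock(fd, timeout=5.0, interval=0.05):
    """Try to take an exclusive flock() on fd without a blocking call.

    Hypothetical helper: returns True once the lock is held, or False if
    it could not be obtained within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock() fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # EAGAIN (EWOULDBLOCK) or EACCES means someone else holds the lock.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            return False
        # Wait in user space between attempts.
        time.sleep(interval)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    # A second open() gives a separate open file description, so its
    # flock() conflicts with the first one even within a single process.
    second = os.open(path, os.O_RDWR)
    try:
        print("first lock acquired:", try_flock(fd))
        print("second lock acquired:", try_flock(second, timeout=0.3))
    finally:
        os.close(second)
        os.close(fd)
        os.unlink(path)
```

The caller has to decide what to do when try_flock returns False (retry later,
fail the request, and so on). As noted above, the average wait time for a
contended lock is not expected to shrink just because the wait moves out of the
kernel.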
RHEL4 works as expected (2nd thread/process blocks) but not RHEL3. Look like bug
Closing bug as it has been verified fixed by the customer, should be appearing
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
Any chance of getting the SRPM for this erratum on ftp.redhat.com?