Bug 185078

Summary: PHP process hung in D state with flock
Product: Red Hat Cluster Suite Reporter: Wendy Cheng <wcheng>
Component: gfs Assignee: Chris Feist <cfeist>
Status: CLOSED ERRATA QA Contact: GFS Bugs <gfs-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 3 CC: cfeist, rkenna, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: RHBA-2006-0593 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-07-20 09:52:21 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 174689    

Description Wendy Cheng 2006-03-10 07:14:17 EST
Description of problem:
A field report says that running Apache sometimes causes PHP processes to hang
in D state inside the flock() system call and/or causes lock performance issues.
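
To confirm which processes are stuck in D state on a node, something like the following could be used. This is only a minimal sketch, not part of the original report: it assumes a Linux /proc filesystem, and the helper names parse_state and find_d_state are made up for illustration. It reads the state field from /proc/<pid>/stat and lists the tasks in uninterruptible sleep.

```python
import os


def parse_state(stat_line):
    """Return (comm, state) from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the last ')' instead of splitting on spaces.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for tasks currently in uninterruptible sleep (D)."""
    hung = []
    if not os.path.isdir(proc_root):
        return hung  # no /proc here, nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_state(f.read())
        except (OSError, ValueError):
            continue  # process exited or was unreadable while scanning
        if state == "D":
            hung.append((int(entry), comm))
    return hung


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

This only shows that a task is in D state at the moment of the scan; it does not show what the task is waiting for.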

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:

Additional info:

I have a rough idea of what could go wrong. In the RHEL 3 (Linux 2.4 based)
kernel, flock has the following logic:

1. lock_kernel (Big Kernel Lock - BKL)
2. call filesystem-specific supplemental lock
3. handle linux vfs flock
4. unlock_kernel

That BKL could be the culprit.
Comment 5 Wendy Cheng 2006-03-10 19:15:59 EST
Date: Fri, 10 Mar 2006 19:11:21 -0500
From: Wendy Cheng <wcheng@redhat.com>
To: "Treece, Britt" <Britt.Treece@savvis.net>
CC: "Stanley, Jon" <jstanley@savvis.net>
Subject: Re: [Linux-cluster] GFS load average and locking
Treece, Britt wrote:
>Did the sysrq-t's that I sent illustrate this problem further?  I'm
>hoping that they corroborate the situation that you described below.

(This reply will be logged into our ticket to get everyone in the loop.)

There are three layers of code we're examining:
1. PHP layer
Google searches found there were reports saying php session didn't get handled


Also, checking the RHEL source RPMs, I find that flock() is
invoked as a blocking call. This can lead to the following two issues:
Comment 6 Wendy Cheng 2006-03-10 19:24:47 EST
(sorry hit wrong key ... continue)

2. RHEL3 kernel flock implementation
The flock() is implemented as:
   step1: lock_kernel (BKL) - has been removed from RHEL 4.
   step2: get filesystem lock (a no-op on ext3, while GFS calls gfs_lock)
   step3: get vfs layer lock (local memory fetch logic, relatively fast)
   step4: unlock_kernel
The BKL is certainly a performance hit. However, since it is dropped when the
process is not scheduled (sleeps), it will not serialize the flock call as
significantly as I previously expected. Also, ext3 doesn't show signs of a
performance hit, so it looks to me like the issue, if we really have one, is
step 2: the GFS glock.

3. GFS Layer
The GFS lock is obtained via the network (to/from the lock server), so it is
subject to network congestion and a certain level of overhead must be expected.
It looks to me that the (customer's) concern is that while a process is waiting
for the lock, it goes into D state (uninterruptible) and pumps up the
system "load". The reason for this is that flock() is invoked (by
PHP) as a "blocking" call. By design, gfs_lock loops around waiting
for the lock to arrive, and in doing so accumulates the CPU consumption that
leads to high "load". We certainly can make cosmetic changes to reduce
this artificial "load" count (with some restrictions), but be aware that
the average wait time will still be largely unchanged (or maybe even
longer). Another solution would be for PHP to use non-blocking lock
calls. I will discuss this issue with Joe Orton, our PHP maintainer. (Joe,
this could be a nice white paper titled "fine-tune php on a cluster
filesystem" :) ). However, I have a fundamental question: does this
"artificial" load number really affect the overall system throughput? I
can send out a test kernel for you to try out (quantify it) if you're
In general, I don't think we have really captured the required info while the
system is in a sluggish state - that is, the PHP session thread traces from all
GFS nodes, to really understand what the threads are waiting for. What we need
to do is:
1. Check out PHP session handling - is it really bad?
2. Quantify the BKL impact - however, the base kernel team has indicated that
removing it from RHEL3 would be too risky.
3. Make changes to the gfs_lock busy-wait logic (maybe).
4. "Upgrade" PHP to use a non-blocking flock call (to work better with a
cluster filesystem) (maybe).
5. The current set of sysrq output does not show the real problem (it shows the
artificial load as I explained above). What we really need is the thread trace
from all GFS nodes while the system is in a sluggish state (use crash to do the
job instead of sysrq-t).
Also, as common sense, separating lock traffic from other traffic is a good
thing, and experimenting with the GFS tunables would be another good thing.
-- Wendy
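
The non-blocking flock idea raised in this comment can be sketched as follows. This is an illustration only, assuming Python's standard fcntl module on Linux rather than PHP's own session code; the helper name try_flock and its retries/delay parameters are hypothetical. Instead of sleeping inside the kernel until the lock arrives, the caller asks for the lock with LOCK_NB and backs off in user space when it is busy.

```python
import errno
import fcntl
import os
import time


def try_flock(fd, retries=5, delay=0.1):
    """Try to take an exclusive flock without blocking inside the kernel.

    Returns True once the lock is held, False if every attempt found it busy.
    """
    for _ in range(retries):
        try:
            # LOCK_NB makes flock() fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            if exc.errno not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise  # a real error, not just "lock is busy"
            time.sleep(delay)  # back off in user space before retrying
    return False


if __name__ == "__main__":
    import tempfile

    handle, path = tempfile.mkstemp()
    os.close(handle)
    first = os.open(path, os.O_RDWR)
    second = os.open(path, os.O_RDWR)  # separate open file description
    print("first holder:", try_flock(first))
    print("second while busy:", try_flock(second, retries=2, delay=0.01))
    fcntl.flock(first, fcntl.LOCK_UN)
    print("second after release:", try_flock(second, retries=2, delay=0.01))
    os.close(first)
    os.close(second)
    os.unlink(path)
```

As the comment notes, this mainly changes where the waiting happens; the total time spent waiting for a contended lock may be no shorter.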

Comment 11 Wendy Cheng 2006-03-21 15:45:31 EST
RHEL4 works as expected (2nd thread/process blocks) but not RHEL3. Look like bug
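
The expected behaviour mentioned here (a second process blocking while the first holds the lock) can be checked with a small test along these lines. The exact test used for this bug is not recorded in the report, so this is only a sketch under assumptions: it uses Python's fcntl.flock on Linux, and the function name second_locker_blocks and the wait parameter are made up for illustration.

```python
import fcntl
import os
import select
import tempfile


def second_locker_blocks(path, wait=0.5):
    """Return True if a second process blocks in flock() while we hold it."""
    holder = os.open(path, os.O_RDWR)
    fcntl.flock(holder, fcntl.LOCK_EX)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: open the file again (its own open file description) and
        # request the lock with a blocking call.
        try:
            os.close(r)
            fd = os.open(path, os.O_RDWR)
            fcntl.flock(fd, fcntl.LOCK_EX)  # expected to sleep here
            os.write(w, b"x")  # tell the parent the lock was granted
        finally:
            os._exit(0)
    os.close(w)
    # If the child reports success within `wait` seconds, it did not block.
    ready, _, _ = select.select([r], [], [], wait)
    blocked = not ready
    fcntl.flock(holder, fcntl.LOCK_UN)  # release so the child can proceed
    os.read(r, 1)  # returns once the child has the lock (or has exited)
    os.waitpid(pid, 0)
    os.close(r)
    os.close(holder)
    return blocked


if __name__ == "__main__":
    handle, path = tempfile.mkstemp()
    os.close(handle)
    print("second process blocked:", second_locker_blocks(path))
    os.unlink(path)
```

To exercise GFS rather than a local filesystem, the path would need to point at a file on the GFS mount.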
Comment 17 Chris Feist 2006-06-27 17:31:24 EDT
Closing this bug, as the customer has verified the fix. It should appear
in RHEL3U8.
Comment 20 Red Hat Bugzilla 2006-07-20 09:52:21 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

Comment 21 John Newbigin 2006-08-20 19:23:53 EDT
Any chance of getting the SRPM for this errata on ftp.redhat.com?