Red Hat Bugzilla – Bug 198740
statfs() call on a GFS filesystem in recovery is blocking.
Last modified: 2010-01-11 22:12:03 EST
Description of problem:
On a GNBD+GFS cluster, I am running a statfs() system call while crashing
another node. The statfs() system call is blocked (this is normal), but it also
uses 100% of the CPU, as if it were busy-waiting. Once the fencing request
is acknowledged, the system call returns and all is back to normal.
I have looked at gfs-kernel/src/gfs/super.c, in "stat_gfs_async" and
"stat_gfs_sync", for a busy-wait loop, but I am just a GFS user, so I
don't understand everything that code does.
Version-Release number of selected component (if applicable):
I'm currently using a packaged version of GFS-6.1.5, not the latest stable
branch of the CVS.
Steps to Reproduce:
1. Run statfs() (a "df" should do the same).
2. Crash a node at the same time as (1).
3. Run "top" and watch how much CPU "df" consumes.
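Actual results:
All CPU is consumed by "df".
Expected results:
"df" should not consume CPU while it is blocked.
Additional info:
Once the fencing request is answered and GFS recovery has finished, everything works normally again.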
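To tell a sleeping statfs() from a spinning one without relying on "top", the call can be timed for both wall-clock and CPU time. The following is a minimal Python sketch, not part of GFS or of the original report; the names time_statvfs and cpu_ratio are made up for illustration, and os.statvfs() is assumed to end up in the same kernel statfs path that "df" uses.

```python
import os
import time

def time_statvfs(path):
    """Call statvfs(path) once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()  # user + system CPU time of this process
    os.statvfs(path)
    cpu = time.process_time() - cpu_start
    wall = time.monotonic() - wall_start
    return wall, cpu

def cpu_ratio(wall, cpu):
    """Fraction of the elapsed time spent on the CPU (0.0 if nothing elapsed)."""
    return cpu / wall if wall > 0 else 0.0
```

Calling time_statvfs() on the GFS mount point during recovery and passing the result to cpu_ratio() should give a value close to 0 if the call merely sleeps, and close to 1 if it spins for the whole wait.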
I have seen something similar.
There is a utility running which runs 'dt -kt gfs' every 5 seconds. Sometimes
these take a VERY long time (many minutes) to complete and sometimes only a few
seconds (16TB filesystem). We have also seen this occur when deleting several
million files using 'rm -f *'; in that case df hangs indefinitely.
strace shows that it is in 'sys_statfs()'. Unfortunately, we cannot get the
stack because it is consuming all the resources on the processor. We can only
look at it from the other processors on the system.
Sometimes the node running the df command gets fenced during this period.
Unfortunately, the system logs do not provide any information.
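Even when the stack cannot be captured, the process state and accumulated CPU ticks can usually be read from another processor through /proc/<pid>/stat. A rough Python sketch follows; the helper names parse_proc_stat and read_proc_stat are hypothetical, and the field positions follow the proc(5) layout (state is field 3, utime is field 14, stime is field 15).

```python
def parse_proc_stat(line):
    """Extract state and CPU tick counters from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before splitting on whitespace.
    rest = line.rsplit(")", 1)[1].split()
    return {
        "state": rest[0],        # field 3: R = running, D = uninterruptible sleep
        "utime": int(rest[11]),  # field 14: ticks spent in user mode
        "stime": int(rest[12]),  # field 15: ticks spent in kernel mode
    }

def read_proc_stat(pid):
    """Read and parse /proc/<pid>/stat for a live process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())
```

Sampling the 'df' process twice a few seconds apart would show whether stime keeps growing (consistent with spinning inside the kernel) or stays flat while the state is D (consistent with a plain blocking wait).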
(In reply to comment #1)
> There is a utility running which runs 'dt -kt gfs' every 5 seconds.
What is 'dt'?
(In reply to comment #0)
> On a GNBD+GFS cluster, I am running a statfs() system call while crashing
> another node.
How are you crashing the other node?
> The statfs() system call is blocked (this is normal), but it also
> uses 100% of the CPU, as if it were busy-waiting. Once the fencing request
> is acknowledged, the system call returns and all is back to normal.
What fencing method are you using?
Also, are you using dlm or gulm?
Typo... meant to be 'df -kt gfs'
> How are you crashing the other node?
I'm crashing it using "echo b > /proc/sysrq-trigger". The node does not rejoin
the GFS cluster.
> What fencing method are you using?
For the purpose of the experiment, I configured fencing to be manual.
Please note that the node was rebooted and didn't rejoin the cluster
immediately (I did that manually).
> Also, are you using dlm or gulm?
I also forgot to say it was a 3-node cluster.
Hmm. I am using a 5-node cluster with DLM and manual fencing and I didn't see
the problem. I need to continue to investigate.
Note that 'df' is known to be slow on GFS(1) filesystems, and this is even
mentioned in the FAQ. I did see that the CPU utilization was high (~99%) for the
> I didn't see the problem. I need to continue to investigate.
Did you mean that you didn't experience the 99% CPU usage? Or do you mean
that this is not a bug?
In my mind, having "df" take 99% CPU is not normal; I thought it should
- either block and wait for the fencing request to be acknowledged, without taking 99% CPU,
- or not block, compute its results, take 99% CPU for a given period of time (I don't
mind if this is long) and then return.
My problem was that until I replied manually to the fencing request, my "df"
process was blocked and took 99% CPU.
I intentionally waited a long time (a few minutes) before replying to the fencing
request. "df" was blocked for this period, taking 99% CPU.
(In reply to comment #7)
> Did you mean that you didn't experience the 99% CPU usage? Or do you mean
> that this is not a bug?
I did see 'df' using 99% CPU. But 'df' appears to use ~99% CPU even under normal
circumstances (without introducing a fenced node).
I did not 'reply' to the manual fencing request and 'df' returned normally. I
will continue to attempt to replicate this problem.
I've spent some time trying to recreate this problem and have not been able to
do so. Let me explain what I attempted to do: in a 5-node cluster configured for
manual fencing, I ran 'df' and then immediately killed a node. The failed node
was fenced and I did *not* ack the fencing operation. Meanwhile, output from
'top' showed the 'df' process consuming about 99% CPU. I had several
printk statements in the kernel code to watch the progress of the statfs. The
'df' command continued normally and returned successfully without my ack-ing the
fencing operation. It appeared to behave just like a 'df' on a
normal GFS filesystem.
Can you tell me what gfs-kernel rpm you have installed? I'd like to make sure we
have the same kernel bits.
I don't see a gfs-kernel package, but I am assuming it is the same as the
kernelheader (2.6.9-45). I have this code running on my cluster now and the
performance for df is quite good. Is there a heavy I/O load on the filesystem
while the df is running?
In addition, I should point out that running 'df' every 5 seconds on a 16TB
filesystem (or any size for that matter) is a bad idea. This will have a severe
impact on cluster/filesystem performance.
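If free-space figures are needed for monitoring, one option is to cache the result and refresh it far less often than every 5 seconds. The sketch below is only an illustration in Python; the class name CachedStatvfs, the 300-second default, and the injectable clock and statfn parameters are assumptions, not anything taken from GFS or from the utility mentioned above.

```python
import os
import time

class CachedStatvfs:
    """Return a cached statvfs result, refreshing at most once per interval."""

    def __init__(self, path, min_interval=300.0,
                 clock=time.monotonic, statfn=os.statvfs):
        self._path = path
        self._min_interval = min_interval  # seconds between real statvfs calls
        self._clock = clock
        self._statfn = statfn
        self._value = None
        self._stamp = None

    def get(self):
        now = self._clock()
        # Hit the filesystem only on the first call or once the cache is stale.
        if self._stamp is None or now - self._stamp >= self._min_interval:
            self._value = self._statfn(self._path)
            self._stamp = now
        return self._value
```

A monitoring loop can still poll get() every few seconds; only one call per interval reaches the filesystem, at the cost of slightly stale numbers.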
I've reproduced it on the "STABLE" branch. Here is the experiment:
In a 3-node cluster, only 2 nodes have the GFS filesystem mounted. Both nodes
are running I/O (these are performance tests run with bonnie++; I think any
kind of intensive I/O should do). At the same time, I issue:
- "echo b > /proc/sysrq-trigger" on the 1st node
- "df" on the 2nd node
On 1 of the 2 remaining nodes, the fence request appears. At this point, I kill
bonnie++ on the second node. "df" continues to run indefinitely, until I
reply to the fence request.
I've performed the same test with no I/O: "df" exits immediately.
When a node is fenced, all activity on the filesystem is blocked until fencing
succeeds. This should only be apparent when using fence_manual since it requires
a user to manually ack the fence operation. Running "df" while a fence operation
is waiting to be ack'd will cause it to block. However, if "df" can get its
information from cache, it will complete successfully even with a pending fence.
I believe that the real issue being reported here is not that "df" will block
while waiting for a fence operation to be ack'd (that is expected), but rather
that the "df" process spins, consuming ~99% of the cpu.
I've tried to recreate this recently and have been unable to do so. Closing this
for the time being.