Description of problem:
On a GNBD+GFS cluster, I am running a statfs() system call while crashing another node. The statfs() system call blocks (this is normal), but it also uses 100% of the CPU, as if it were busy-waiting. When the fencing request is acknowledged, the system call returns and everything is back to normal. I have looked at "stat_gfs_async" and "stat_gfs_sync" in gfs-kernel/src/gfs/super.c to search for an active wait, but I'm just a GFS user, so I don't understand everything it does.

Version-Release number of selected component (if applicable):
I'm currently using a packaged version of GFS-6.1.5, not the latest stable branch of the CVS.

How reproducible:
100% reproducible

Steps to Reproduce:
1. Run statfs() (a "df" should do the same).
2. Crash a node at the same time as (1).
3. Run "top" and watch how much CPU "df" consumes.

Actual results:
All CPU is consumed by "df".

Expected results:
No workload for "df".

Additional info:
When the fence request is answered, everything goes fine after the end of GFS recovery.
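For reference, here is a minimal userspace sketch of the call that "df" makes; the mount point "/mnt/gfs" is an assumption, adjust it to your setup:

/* Minimal statfs() reproducer, similar to what 'df' does.
 * The mount point "/mnt/gfs" is an assumption. */
#include <stdio.h>
#include <sys/vfs.h>

int main(void)
{
        struct statfs sfs;

        /* Per this report, the call blocks here while a fence
         * operation is pending -- but it should sleep, not spin
         * at ~100% CPU. */
        if (statfs("/mnt/gfs", &sfs) != 0) {
                perror("statfs");
                return 1;
        }
        printf("blocks: total=%ld free=%ld\n",
               (long)sfs.f_blocks, (long)sfs.f_bfree);
        return 0;
}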
I have seen something similar. There is a utility running which runs 'dt -kt gfs' every 5 seconds. Sometimes these take a VERY long time (many minutes) to complete and sometimes only a few seconds (16TB filesystem). We have also seen this occur when deleting several million files using 'rm -f *'; 'df' hangs indefinitely. strace shows that it is in 'sys_statfs()'. Unfortunately, we cannot get the stack because the process is consuming all the resources on its processor; we can only look at it from the other processors on the system. Sometimes the node running the 'df' command gets fenced during this period. Unfortunately, the system logs do not provide any information.
(In reply to comment #1)
> There is a utility running which runs 'dt -kt gfs' every 5 seconds.

What is 'dt'?
(In reply to comment #0)
> On a GNBD+GFS cluster, I am running a statfs() system call while crashing
> another node.

How are you crashing the other node?

> The statfs() system call blocks (this is normal), but it also uses 100% of
> the CPU, as if it were busy-waiting. When the fencing request is
> acknowledged, the system call returns and everything is back to normal.

What fencing method are you using? Also, are you using dlm or gulm?
Typo... meant to be 'df -kt gfs'.

Using DLM.
> How are you crashing the other node?

I'm crashing it using "echo b > /proc/sysrq-trigger". The node does not come back into the GFS cluster.

> What fencing method are you using?

For the purpose of the experiment, I configured the fencing method to be manual. Please note that the node was rebooted and did not rejoin the cluster immediately (I did that manually).

> Also, are you using dlm or gulm?

DLM. I also forgot to say it is a 3-node cluster.
Hmm. I am using a 5-node cluster with DLM and manual fencing, and I didn't see the problem. I need to continue to investigate. Note that 'df' is known to be slow on GFS(1) filesystems, and this is even mentioned in the FAQ. I did see that the CPU utilization was high (~99%) for the 'df' process.
> I didn't see the problem. I need to continue to investigate.

Did you mean that you didn't experience the 99% CPU usage? Or do you mean that this is not a bug? In my mind, having "df" take 99% CPU is not normal; I thought it should:
- either block, wait for the fence request to be acknowledged, and not take 99% CPU;
- or not block, compute its results, take 99% CPU for a given period of time (I don't mind if this is long), and then return.

My problem was that until I manually acknowledged the fence request, my "df" process was blocked and took 99% CPU. I intentionally waited a long time (a few minutes) before replying to the fencing request. "df" was blocked for that whole period, consuming 99% CPU.
(In reply to comment #7)
> Did you mean that you didn't experience the 99% CPU usage? Or do you mean
> that this is not a bug?

I did see 'df' using 99% CPU. But 'df' appears to use ~99% CPU even under normal circumstances (without introducing a fenced node). I did not 'reply' to the manual fencing request, and 'df' returned normally. I will continue to attempt to replicate this problem.
I've spent some time trying to recreate this problem and have not been able to do so. Let me explain what I attempted: in a 5-node cluster configured for manual fencing, I ran 'df' and then immediately killed a node. The failed node was fenced and I did *not* ack the fencing operation. Meanwhile, output from 'top' showed the 'df' process consuming about 99% CPU. I had several printk statements in the kernel code to watch the progress of the statfs. The 'df' command continued normally and returned successfully without my ack-ing the fencing operation. In comparison, it appeared to behave just as 'df' does on a normal GFS filesystem. Can you tell me what gfs-kernel rpm you have installed? I'd like to make sure we have the same kernel bits.
kernel 2.6.9-34
ccs-1.0.3-0.x86_64.rpm
cman-1.0.4-0.x86_64.rpm
cman-kernheaders-2.6.9-41.0.x86_64.rpm
dlm-1.0.0-5.x86_64.rpm
dlm-kernheaders-2.6.9-39.1.x86_64.rpm
dmraid-1.0.0.rc8-1_RHEL4_U2.x86_64.rpm
fence-1.32.18-0.x86_64.rpm
GFS-6.1.5-0.x86_64.rpm
GFS-kernheaders-2.6.9-45.0.x86_64.rpm
gnbd-1.0.2-0.x86_64.rpm
gnbd-kernheaders-2.6.9-9.14.x86_64.rpm
gulm-1.0.6-0.x86_64.rpm
iddev-2.0.0-3.x86_64.rpm
ipvsadm-1.24-6.x86_64.rpm
lvm2-cluster-2.02.01-1.2.RHEL4.x86_64.rpm
magma-1.0.4-U4pre1.x86_64.rpm
magma-plugins-1.0.6-U4pre1.x86_64.rpm
perl-Net-Telnet-3.03-3.noarch.rpm
rgmanager-1.9.46-U4pre1.x86_64.rpm
system-config-cluster-1.0.25-1.0.noarch.rpm
I don't see a gfs-kernel package, but I am assuming it matches the kernheaders version (2.6.9-45). I have this code running on my cluster now, and the performance of 'df' is quite good. Is there a heavy I/O load on the filesystem while the 'df' is running? In addition, I should point out that running 'df' every 5 seconds on a 16TB filesystem (or any size, for that matter) is a bad idea. This will have a severe impact on cluster/filesystem performance.
I've reproduced it on the "STABLE" branch. Here is the experiment: in a 3-node cluster, only 2 nodes have the GFS filesystem mounted. Both nodes are running I/O (these are performance tests run with bonnie++; I think any kind of intensive I/O should do). At the same time, I issue:
- "echo b > /proc/sysrq-trigger" on the 1st node
- "df" on the 2nd node

On one of the two remaining nodes, the fence request appears. At this time, I kill the bonnie++ on the second node. "df" continues to run indefinitely... until I reply to the fence request. I've performed the same test with no I/O: "df" exits immediately.
When a node is fenced, all activity on the filesystem is blocked until fencing succeeds. This should only be apparent when using fence_manual, since it requires a user to manually ack the fence operation. Running "df" while a fence operation is waiting to be ack'd will cause it to block. However, if "df" can get its information from cache, it will complete successfully even with a pending fence operation.

I believe that the real issue being reported here is not that "df" will block while waiting for a fence operation to be ack'd (that is expected), but rather that the "df" process spins, consuming ~99% of the CPU. I've tried to recreate this recently and have been unable to do so. Closing this for the time being.
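For anyone following along, here is an illustrative userspace sketch of the difference between the two waiting patterns being discussed. This is NOT the actual gfs-kernel code; the 'fence_pending' flag and the helpers around it are hypothetical:

/* Illustrative only -- not the actual gfs-kernel code. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  acked = PTHREAD_COND_INITIALIZER;
static bool fence_pending = true;   /* hypothetical state flag */

/* Busy wait: matches the reported symptom.  The thread never sleeps,
 * so it pins a CPU at ~99% until the fence operation is ack'd. */
static void wait_busy(void)
{
        bool pending = true;
        while (pending) {
                pthread_mutex_lock(&lock);
                pending = fence_pending;
                pthread_mutex_unlock(&lock);
                /* no sleep or yield here: this loop spins */
        }
}

/* Blocking wait: the expected behavior.  The thread sleeps on a
 * condition variable and consumes no CPU until it is signalled. */
static void wait_blocking(void)
{
        pthread_mutex_lock(&lock);
        while (fence_pending)
                pthread_cond_wait(&acked, &lock);
        pthread_mutex_unlock(&lock);
}

/* Ack-ing the fence would then wake the waiter: */
static void ack_fence(void)
{
        pthread_mutex_lock(&lock);
        fence_pending = false;
        pthread_cond_broadcast(&acked);
        pthread_mutex_unlock(&lock);
}

If statfs were taking the wait_busy() path somewhere while a fence operation is pending, that would explain a 'df' that is both blocked and at ~99% CPU at the same time.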