Bug 250718
| Summary: | fs.sh inefficient scripting leads to load peaks and disk saturation | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Janne Peltonen <janne.peltonen> |
| Component: | rgmanager | Assignee: | Lon Hohberger <lhh> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.0 | CC: | cluster-maint, cward, h.plankl, tao, uniks |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-09-02 11:04:20 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 487600 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Janne Peltonen
2007-08-03 06:44:05 UTC
Created attachment 160582 [details]
Weekly load graph for one system with the fs.sh problem
Excellent, thanks. There are a number of optimizations we can make fairly quickly, such as replacing pattern matching/substitution utilities (grep/awk/etc.) with pure bash scripting. This will (by itself!) reduce load a bit, but there's more we can do for sure.

Fixing product. All cluster version 5 defects should be reported under the Red Hat Enterprise Linux 5 product name - not Cluster Suite.

Apparently, the load caused by the fs.sh execs wasn't the reason my system got stuck; the reason was plain and simple memory starvation. There were load peaks on a mostly idle system that went away once I added the exit 0 to fs.sh status, but the other system, with real load, stopped getting stuck only after I added more memory. So the real culprit wasn't fs.sh after all, and this means I should lower the severity of this bug, too.

As a side note (and this should perhaps be a separate bugzilla), after adding the memory and being able to see what's actually happening with loads on the busy system, I noticed that there were still small load peaks left, with a height of about +6 (that is, they add about 6 units of load to whatever real load there is) and with about an eleven-hour period. These peaks don't seem to be reflected in any other statistics; I can only assume there is something going on inside the kernel... The not-so-busy system still doesn't have the load peaks. It also doesn't have as many clustered services running as the other one.

Oh yes, the smaller peaks with the 11-hour period aren't caused by ip.sh, or at least they didn't go away when I put an exit 0 at the beginning of ip.sh status.

There are other, additional ways we can limit load here, too. For example, we could disable status checks for the 'service.sh' agent (which is a no-op). That's one less, although that one only happens once per hour by default. One way to make this work is to build a FS replacement agent in C.

I've written a program which might help - it's sort of a drop-in replacement for fs.sh.
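As a rough illustration of the kind of optimization mentioned above - this is a hypothetical sketch, not the actual fs.sh patch, and the helper names and mtab-style input file are invented for the example - a grep/awk pipeline can be replaced with bash built-ins so a status check never forks:

```shell
#!/bin/bash
# Hypothetical sketch (not the shipped fs.sh code): look up the mount
# point for a device in an /etc/mtab-style file ($2), two ways.

# Old style: forks grep and awk on every single status check.
mount_point_forking() {
    grep "^$1 " "$2" | awk '{print $2}'
}

# Pure-bash style: read + word splitting, zero forks.
mount_point_pure_bash() {
    local dev mp rest
    while read -r dev mp rest; do
        if [ "$dev" = "$1" ]; then
            echo "$mp"
            return 0
        fi
    done < "$2"
    return 1
}
```

Run against a real /proc/mounts, both return the same mount point; the difference only shows up under strace, where the pure-bash version spawns no children.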
http://people.redhat.com/lhh/fsc-0.5.tar.gz

Notes:

* This forks to call the 'findfs' utility.
* This *DOES NOT* update /etc/mtab - the standard 'mount' utility is not spawned.
* Specifying your file system type (ext2, ext3) is required in cluster.conf.
* force_unmount, self_fence, etc. are not implemented at this point.
* You must move (or chmod -x) fs.sh if you intend to try this out.
* fsc does no logging whatsoever; you should test with rg_test suitably before trying it in a cluster.

See: http://sources.redhat.com/cluster/wiki/ResourceTrees ...for more information about how to use rg_test to test your services. Let me know if this is the right direction for you.

If you require them for any testing, I can make the following changes fairly easily:

* make self_fence work
* have fstype default to 'ext3'
* alternatively, build the mount(1) command line and fork + exec it. This will reduce performance a lot; however, it will update mtab and make the fstype requirement obsolete.

Created attachment 296043 [details]
A patch to readlinkr.c to prevent handling an absolute link as relative
fsc seems good so far; I've yet to gather the courage to apply it in the production environment. With the patch attached, it seems to work OK with my test setup.
I've applied your patch to my source base. All feedback is appreciated, even if you're not running it in production.

Created attachment 296164 [details]
Weekly load with fs.sh up to 26th and fsc beg. with 27th
I did replace fs.sh with fsc on a not-so-critical production cluster with 48
cluster-controlled ext3 file systems. The results (disappearance of the phantom
load peaks) are clearly visible on the weekly load graphs; see attachment.
*** Bug 474364 has been marked as a duplicate of this bug. ***

Whoops - current agent w/ patch applied: http://people.redhat.com/lhh/fsc-0.5.1.tar.gz

Also missing is a check to see if the file system is still accessible.

http://people.redhat.com/lhh/fsc-0.5.2.tar.gz

Updated agent. Includes an external_mount="[0|1]" option, which forks/execs mount/umount during start/stop. This has the benefit of updating /etc/mtab. Also includes self_fence support and an auto-generated man page.

Created attachment 333878 [details]
fs.sh which has a quick_status option.
This is an updated fs.sh agent with a quick_status option, which trades verbosity for speed. When quick_status="1" is set in cluster.conf for a given file system, fs.sh does not fork().
I verified this using 'strace -vf'.
Note: it can still fork if you are using symbolic links, LABEL=, or UUID=. Also, because it does not fork, it does not log. Lack of logging is a known limitation of fsc, so this new agent does not introduce something that was not already a trade-off for using fsc.

Created attachment 333890 [details]
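A minimal sketch of what a non-forking status check in the spirit of quick_status can look like (this is not the shipped fs.sh code; the function name is invented, and the OCF_RESKEY_* variable names mirror how rgmanager exposes resource parameters to agents but should be treated as assumptions here):

```shell
#!/bin/bash
# Hypothetical quick-status sketch: decide whether the configured device
# is mounted at the configured mount point by scanning an mtab-style file
# with bash built-ins only, so strace shows no clone()/fork().
# MTAB_FILE defaults to /proc/mounts; overriding it is only for testing
# the function outside a cluster node.

quick_status_check() {
    local dev mp rest
    while read -r dev mp rest; do
        if [ "$dev" = "$OCF_RESKEY_device" ] && \
           [ "$mp" = "$OCF_RESKEY_mountpoint" ]; then
            return 0    # mounted where we expect: success, no log output
        fi
    done < "${MTAB_FILE:-/proc/mounts}"
    return 1            # not mounted (or mounted elsewhere): failure
}
```

As in the real agent, the price of never forking is that nothing gets logged; the caller only sees the exit status.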
strace of old fs.sh without quick_status
Created attachment 333891 [details]
strace of new fs.sh using quick_status="1"
Created attachment 333893 [details]
strace of fsc
fsc is still faster and produces less strace output, but fs.sh with quick_status is pretty good and saves a lot of maintenance that would be introduced if we included fsc directly. Also, it's less confusing to 'turn on' quick_status than it is to swap resource agents around.
A lot of the "bloat" in the newer fs.sh strace output is rt_sigprocmask() which occurs many times.
[root@molly ~]# wc -l fs.sh-old.out
8841 fs.sh-old.out
[root@molly ~]# wc -l ./fs.sh-new.out
629 ./fs.sh-new.out
[root@molly ~]# grep -v rt_sig ./fs.sh-new.out | wc -l
205
[root@molly ~]# wc -l fsc.out
39 fsc.out
However, the important parts...
[root@molly ~]# grep ^clone\(Proc ./fs.sh-old.out | wc -l
41
[root@molly ~]# grep ^clone\(Proc ./fs.sh-new.out | wc -l
0
[root@molly ~]# grep ^clone\(Proc ./fsc.out | wc -l
0
Note that the RelaxNG schema doesn't know about the new quick_status parameter and will therefore be upset about it. It cannot be added to the schema until the fs.sh change is deemed acceptable.

I updated fsc based on a patch from Eduardo Damato; he noticed that the format string was wrong if no mount options were specified. Oops :)

http://people.redhat.com/lhh/fsc-0.5.3.tar.gz

Note that fsc was more or less rejected for upstream inclusion on the basis that it's a waste of effort to maintain a second agent to do something we already provide. Furthermore, it's written in C. This is why fs.sh was carefully (and painfully) updated to eliminate fork() and clone().

*** Bug 487600 has been marked as a duplicate of this bug. ***

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=36cecda9ca9b879b631531e80a0278ed8886d893

Also, a related patch here which allows administrators to cap the number of status check children:

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=b90358e8b77d0dfbfed4335757feda76d0b677a9

We have a similar problem on our cluster, which has about 25 ext3 resources. The cluster has only two nodes, and the load on both nodes is consistently about 10... We've just updated fs.sh to the version with the quick_status option, and the first test looks fine. So, is there any chance that this new fs.sh will be released as an official errata?

quick_status is slated for RHEL 5.4 inclusion.

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here at your earliest convenience.

RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request that it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.
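For reference, enabling the option in cluster.conf might look like the fragment below. This is a sketch: the name, device, and mountpoint values are placeholders, and while device, mountpoint, and fstype are standard fs.sh parameters, quick_status is the new parameter from this bug and (as noted above) is not yet accepted by the RelaxNG schema.

```xml
<!-- Sketch only: an fs resource with the new quick_status option enabled.
     All attribute values here are placeholders. -->
<resources>
    <fs name="data-fs" device="/dev/sdb1" mountpoint="/data"
        fstype="ext3" quick_status="1"/>
</resources>
```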
Please do not flip the bug status to VERIFIED. Only post your verification results and, if available, update the Verified field with the appropriate value. Questions can be posted to this bug or to your customer or partner representative.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html