Description of problem:

On at least my x86_64 machines, I get an incomprehensible number of fs.sh execs per day. rgmanager invokes the script as configured, once per 10 seconds for each fs, but so many subshells are created that the number of fs.sh execs balloons. I have 26 file systems to check. In each ten-second period, there are six seconds with 0 fs.sh execs, one second with about 100-200 execs, two seconds with about 1500-2000 execs, and one second with about 100-200 execs. This adds up to some tens of millions of fs.sh execs per day.

Something about these fs.sh execs creates a periodic load fluctuation on my machines. On a mostly idle system, the load increases to about 4 and then decreases again. (See the attached graph. I inserted an exit 0 at the front of the fs.sh status switch-case the day before yesterday (Aug 01), and the periodic peaks disappeared. The smaller peaks yesterday (Aug 02) are 'real' load peaks (that is, I have an explanation for them) and have nothing to do with fs.sh. Also, I caught a beginning load increase in action on another server and inserted the exit 0 - the load promptly fell back from about 10.0 to about 0.5. Sorry, no graph.)

The problem with these load peaks is that if I have a lot of real fs/disk load on the system, it may start acting wildly when the real load peaks coincide with the fs.sh-caused peaks. I've seen loads like 500... and a really badly stuck system. (I still don't understand what it is that periodically increases and decreases. The number of fs.sh invocations is more or less constant all the time. Somehow they add up somewhere, then something seems to overflow, and it slowly starts adding up again.) They also seem to sharpen real load peaks, even when the system doesn't get stuck.

Version-Release number of selected component (if applicable):
2.0.27-2.1lhh.el5 (also 2.0.24-1.el5)

How reproducible:
Enable process accounting. Create a resource group with an fs resource (ext3, on an FC disk; I've got QLogic HBAs and an EVA) and start it.

Actual results:
Ten to twenty thousand fs.sh execs per minute, according to process accounting. Periodic load peaks. System getting stuck on disk operations.

Expected results:
About the same number of fs.sh execs as there are, for example, ip.sh execs. No excess load or disk saturation.

Additional info:
My system is a CentOS 5, but Lon asked me to file a bugzilla anyway... ;)
Created attachment 160582 [details] Weekly load graph for one system with the fs.sh problem
Excellent, thanks. There are a number of optimizations we can make fairly quickly, such as replacing pattern matching/substitution utilities (grep/awk/etc.) with pure bash script. This will (by itself!) reduce the load a bit, but there's more we can do for sure.
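To illustrate the idea (a sketch, not code from fs.sh itself): every external utility in a status check costs a fork+exec, while bash parameter expansion does the same field extraction in-process.

```shell
#!/bin/bash
# Illustrative only: extract fields from a mount-table line without forking.
line="/dev/sda1 /data ext3 rw 0 0"

# forking version:     dev=$(echo "$line" | awk '{print $1}')
dev="${line%% *}"                 # first field, no child process

# forking version:     fstype=$(echo "$line" | awk '{print $3}')
rest="${line#* }"; rest="${rest#* }"
fstype="${rest%% *}"              # third field, no child process

echo "$dev $fstype"
```

At one check per file system every ten seconds, eliminating even a handful of such forks per check adds up quickly.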
Fixing product.
Apparently, the load caused by the fs.sh execs wasn't the reason my system got stuck; the reason was plain and simple memory starvation. There were load peaks on a mostly idle system that went away when I added the exit 0 to fs.sh status, but the other system, with real load, stopped getting stuck only after I added more memory. So the real culprit wasn't fs.sh after all, and this means I should lower the severity of this bug, too.

As a side note (this should perhaps be a separate bugzilla): after adding the memory and being able to see what's actually happening with loads on the busy system, I noticed that there were still small load peaks left, with a height of about +6 (that is, they add about 6 units of load to whatever real load there is), and with a period of about eleven hours. These peaks don't seem to be reflected in any other statistics; I can only assume something is going on inside the kernel... The not-so-busy system still doesn't have the load peaks. It also doesn't run as many clustered services as the other one.
Oh yes, the smaller peaks with the 11-hour period aren't caused by ip.sh, or at least they didn't go away when I put an exit 0 into the beginning of ip.sh status.
There are other, additional ways we can limit load here, too. For example, we could disable status checks for the 'service.sh' agent (which is a no-op). That's one fewer, although that one only happens once per hour by default.
One way to make this work is to build an FS replacement agent in C.
All cluster version 5 defects should be reported under red hat enterprise linux 5 product name - not cluster suite.
I've written a program which might help - it's sort of a drop-in replacement for fs.sh.

http://people.redhat.com/lhh/fsc-0.5.tar.gz

Notes:
* This forks to call the 'findfs' utility.
* This *DOES NOT* update /etc/mtab - the standard 'mount' utility is not spawned.
* Specifying your file system type (ext2, ext3) in cluster.conf is required.
* force_unmount, self_fence, etc. are not implemented at this point.
* You must move (or chmod -x) fs.sh if you intend to try this out.
* fsc does no logging whatsoever; you should test suitably with rg_test before trying it in a cluster.

See: http://sources.redhat.com/cluster/wiki/ResourceTrees ...for more information about how to use rg_test to test your services.
Let me know if this is the right direction for you. If you require them for any testing, I can make the following changes fairly easily:
* make self_fence work
* default fstype to 'ext3'
* alternatively, build the mount(1) command line and fork+exec it. This will reduce performance a lot; however, it will update mtab and make the fstype requirement obsolete.
Created attachment 296043 [details] A patch to readlinkr.c to prevent handling an absolute link as relative

fsc seems good so far; I've yet to gather the courage to apply it in the production environment. With the attached patch, it seems to work OK with my test setup.
I've applied your patch to my source base. All feedback is appreciated; even if you're not running it in production.
Created attachment 296164 [details] Weekly load with fs.sh up to 26th and fsc beg. with 27th

I replaced fs.sh with fsc on a not-so-critical production cluster with 48 cluster-controlled ext3 file systems. The results (the disappearance of the phantom load peaks) are clearly visible on the weekly load graphs; see the attachment.
*** Bug 474364 has been marked as a duplicate of this bug. ***
Whoops - current agent w/ patch applied: http://people.redhat.com/lhh/fsc-0.5.1.tar.gz
Also missing is a check to see if the file system is still accessible.
http://people.redhat.com/lhh/fsc-0.5.2.tar.gz Updated agent. Includes external_mount="[0|1]" option, which forks/execs mount/umount during start/stop. This has the benefit of updating /etc/mtab.
Also includes self_fence support and an auto-generated man page.
Created attachment 333878 [details] fs.sh which has a quick_status option

This agent is an updated fs.sh which has a quick_status option. The quick_status option trades verbosity for speed. When quick_status="1" is set in cluster.conf for a given file system, fs.sh does not fork(). I verified this using 'strace -vf'.
Note: it can still fork if you are using symbolic links, LABEL=, or UUID=. Also, because it does not fork, it does not log.
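The core of a fork-free status check can be sketched like this (an illustration in the spirit of quick_status; the function name and the fake mount table are mine, not from fs.sh): bash's built-in pattern matching scans /proc/mounts with zero child processes, where a `grep "$dev" /proc/mounts` would fork once per check.

```shell
#!/bin/bash
# Sketch: check whether a device is mounted at a mountpoint without forking.
# The third argument defaults to /proc/mounts; a file path is accepted so
# the function can be demonstrated against a fake table.
is_mounted() {
    local dev="$1" mp="$2" table="${3:-/proc/mounts}" line
    while read -r line; do
        # bash pattern match instead of grep: no child process is spawned
        [[ $line == "$dev $mp "* ]] && return 0
    done < "$table"
    return 1
}

# demo against a fake mount table
printf '/dev/sda1 /data ext3 rw 0 0\n' > /tmp/fake_mounts
is_mounted /dev/sda1 /data /tmp/fake_mounts && echo "mounted"
is_mounted /dev/sdb1 /other /tmp/fake_mounts || echo "not mounted"
```

As the comment above notes, this only works when the device is given literally; resolving symlinks, LABEL=, or UUID= still requires external helpers, which is exactly where fs.sh can still fork.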
Lack of logging is a known limitation of fsc, so this new agent does not introduce something which was not already a trade off for using fsc.
Created attachment 333890 [details] strace of old fs.sh without quick_status
Created attachment 333891 [details] strace of new fs.sh using quick_status="1"
Created attachment 333893 [details] strace of fsc

fsc is still faster and produces less strace output, but fs.sh with quick_status is pretty good and saves a lot of the maintenance that would be introduced if we included fsc directly. Also, it's less confusing to 'turn on' quick_status than it is to swap resource agents around. A lot of the "bloat" in the newer fs.sh strace output is rt_sigprocmask(), which occurs many times.

[root@molly ~]# wc -l fs.sh-old.out
8841 fs.sh-old.out
[root@molly ~]# wc -l ./fs.sh-new.out
629 ./fs.sh-new.out
[root@molly ~]# grep -v rt_sig ./fs.sh-new.out | wc -l
205
[root@molly ~]# wc -l fsc.out
39 fsc.out

However, the important parts...

[root@molly ~]# grep ^clone\(Proc ./fs.sh-old.out | wc -l
41
[root@molly ~]# grep ^clone\(Proc ./fs.sh-new.out | wc -l
0
[root@molly ~]# grep ^clone\(Proc ./fsc.out | wc -l
0
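For anyone wanting to repeat this measurement on their own agent (a sketch; the log file and its contents here are fabricated for the demo, whereas in practice you would capture one with `strace -vf -o <file>` as above): each `clone(` line in strace's follow-forks output marks one child process created during the check.

```shell
#!/bin/bash
# Build a tiny fake strace log: two clone() calls among other syscalls.
printf 'clone(Process 1234 attached\nread(3, "x", 1) = 1\nclone(Process 1235 attached\n' > /tmp/agent.out

# Count forks the same way the transcript above does.
grep -c '^clone(Proc' /tmp/agent.out
```

A count of 0 on a status pass is the goal quick_status reaches; anything higher tells you how many forks each check still costs.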
Note that the RelaxNG schema doesn't know about the new quick_status parameter and will therefore be upset about it. It can't be added to the schema until the fs.sh change is deemed acceptable.

I updated fsc based on a patch from Eduardo Damato; he noticed that the format string was wrong if there were no mount options specified. Oops :)

http://people.redhat.com/lhh/fsc-0.5.3.tar.gz
Note that fsc more or less got rejected for upstream inclusion on the basis that it's a waste of effort to maintain a second agent to do something we already provide. Furthermore, it's written in C. This is why fs.sh was carefully (and painfully) updated to eliminate fork() and clone().
*** Bug 487600 has been marked as a duplicate of this bug. ***
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=36cecda9ca9b879b631531e80a0278ed8886d893 Also, a related patch here which allows administrators to cap status check children: http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=b90358e8b77d0dfbfed4335757feda76d0b677a9
We have a similar problem on our cluster, which has about 25 ext3 resources. The cluster has only two nodes, and the load on both nodes is consistently about 10... We've just updated fs.sh to the version with the quick_status option, and the first test looks fine. So, is there any chance that this new fs.sh will be released as an official erratum?
Quick_status is slated for RHEL 5.4 inclusion.
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1339.html