Bug 250718

Summary:

fs.sh inefficient scripting leads to load peaks and disk saturation

Product:

Red Hat Enterprise Linux 5

Reporter:

Janne Peltonen <janne.peltonen>

Component:

rgmanager

Assignee:

Lon Hohberger <lhh>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

5.0

CC:

cluster-maint, cward, h.plankl, tao, uniks

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2009-09-02 11:04:20 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

487600

Bug Blocks:

Attachments:

Description	Flags
Weekly load graph for one system with the fs.sh problem	none
A patch to readlinkr.c to prevent handling an ablosute link as relative	none
Weekly load with fs.sh up to 26th and fsc beg. with 27th	none
fs.sh which has a quick_status option.	none
strace of old fs.sh without quick_status	none
strace of new fs.sh using quick_status="1"	none
strace of fsc	none

Description Janne Peltonen 2007-08-03 06:44:05 UTC

Description of problem:

In at least my x86_64 machines, I get an incomprehensible number of fs.sh execs
per day. Rgmanagers invokes the script ok as configured, once per 10 seconds for
each fs, but there are so many subshells created that the number of fs.sh execs
grows. I have 26 file systems to check. In each ten-second period, there are six
seconds with 0 fs.sh execs, one second with abt 100..200 execs, two seconds with
abt 1500..2000 execs, and one second with abt 100..200 execs. This adds up to
some tens of millions of fs.sh execs per day.

Something with these fs.sh execs creates a periodical load fluctuation for my
machines. In a mostly idle system, the load increases to abt 4 and then
decreases again. (See the attached graph. I inserted an exit 0 to the front of
fs.sh status switch-case the day before yesterday (Aug 01), and the periodical
peaks disappeared. The smaller peaks yesterday (Aug 02) are 'real' load peaks
(that is, I have an explanation for them) and have nothing to do with fs.sh.
Also, I caught a beginning load increase in action on an another server,
inserted the exit 0 - loads promptly fell back from abt 10.0 to abt 0.5. (Sorry,
no graph.))

The problem with these load peaks is, if I have a lot of real fs/disk load on
the system, it might start acting wildly if the real load peaks coincide with
the fs.sh caused peaks. I've seen loads like 500... and a really badly stuck
system. (I still don't get what it is that periodically increases and decreases.
The number of fs.sh invocations is more or less constant all the time. Somehow
they add up somewhere, then something seems to flow over, and starts to slowly
add up again.) Also, they seem to sharpen real load peaks, even when the system
doesn't get stuck.

Version-Release number of selected component (if applicable):

2.0.27-2.1lhh.el5 (also 2.0.24-1.el5)

How reproducible:

Enable process accounting.

Create a resource group with an fs resource (ext3, on an fc disk. I've got
qlogic hba's and eva) and start it.
 
Actual results:

Ten to twenty thousand fs.sh execs per minute, according to process accounting.
Periodical load peaks. System getting stuck on disk operations.

Expected results:

Abt same numbers of fs.sh execs as there are, for example, ip.sh execs. No
excess load or disk saturation.

Additional info:

My system is a Centos 5, but Lon asked me to file a bugzilla anyway... ;)

Comment 1 Janne Peltonen 2007-08-03 06:44:09 UTC

Created attachment 160582 [details]
Weekly load graph for one system with the fs.sh problem

Comment 2 Lon Hohberger 2007-08-13 15:01:18 UTC

Excellent, thanks.  There are a number of optimizations we can make fairly
quickly. such as replacing pattern matching/substitution utilities
(grep/awk/etc) with pure bash script.  This will (by itself!) reduce load a bit,
but there's more we can do for sure.

Comment 3 Lon Hohberger 2007-08-13 20:10:42 UTC

Fixing product.

Comment 4 Janne Peltonen 2007-08-14 07:30:06 UTC

Apparently, the load caused by the fs.sh execs wasn't the reason my system got
stuck; the reason was plain and simple memory starvation. Now, there were the
load peaks on a mostly idle system that went away as I added the exit 0 into
fs.sh status. But the other system, with real load, stopped getting stuck only
after I added more memory. So the real culprit wasn't fs.sh after all. And this
means I should lower the severity of this bug, too.

As a side note, and this should perhaps be a separate bugzilla, after adding the
memory and being able to see what's actually happening with loads on the busy
system, I noticed that there were still small load peaks left, with a height of
abt +6 (that is, they add abt 6 units of load to any real load there is), and
with abt an eleven-hour period. These peaks don't seem to be reflected in any
other statistics; I can only assume there is something going on inside kernel...
The not-so-busy system still doesn't have the load peaks. It also doesn't have
as many clustered services running as the other one.

Comment 5 Janne Peltonen 2007-08-17 19:06:23 UTC

Oh yes, the smaller peaks with the 11-hour period aren't caused by ip.sh, or at
least they didn't go away when I put an exit 0 into the beginning of ip.sh status.

Comment 6 Lon Hohberger 2007-08-21 13:03:46 UTC

There are other, additional ways we can limit load here, too.  For example, if
we disable status checks for the 'service.sh' agent (which is a no-op).  That's
one less, although that one only happens once per hour by default.

Comment 7 Lon Hohberger 2007-09-06 15:03:14 UTC

One way to make this work is to build a FS replacement agent in C.

Comment 9 Kiersten (Kerri) Anderson 2007-11-19 19:57:29 UTC

All cluster version 5 defects should be reported under red hat enterprise linux
5 product name - not cluster suite.

Comment 10 Lon Hohberger 2008-02-14 01:51:40 UTC

I've written a program which might help - it's sort of a drop-in replacement for
fs.sh.

http://people.redhat.com/lhh/fsc-0.5.tar.gz

Notes:
* This forks to call the 'findfs' utility
* This *DOES NOT* update /etc/mtab - the standard 'mount' utility is not spawned.
* Specifying your file system type (ext2, ext3) is required in cluster.conf.
* force_unmount, self_fence, etc. are not implemented at this point
* You must move (or chmod -x) fs.sh if you intend to try this out,
* fsc does no logging whatsoever; you should test with rg_test suitably before
trying it in a cluster. See:

   http://sources.redhat.com/cluster/wiki/ResourceTrees

...for more information about how to use rg_test to test your services.

Comment 11 Lon Hohberger 2008-02-26 18:24:59 UTC

Let me know if this is the right direction for you.

If you require them for any testing, I can make the following changes fairly easily:
 * make self_fence work
 * fstype default to 'ext3'
 * alternatively, we could build the mount(1) command
   line and fork + exec it.  This will reduce performance
   a lot, however, it will update mtab and make the fstype
   requirement obsolete.

Comment 12 Janne Peltonen 2008-02-27 10:25:46 UTC

Created attachment 296043 [details]
A patch to readlinkr.c to prevent handling an ablosute link as relative

fsc seems good so far, I've yet to gather the courage to apply it in the
production environment. With the patch attached, it seems to work OK with my
test setup.

Comment 13 Lon Hohberger 2008-02-27 15:03:39 UTC

I've applied your patch to my source base.  All feedback is appreciated; even if
you're not running it in production.

Comment 14 Janne Peltonen 2008-02-28 07:16:45 UTC

Created attachment 296164 [details]
Weekly load with fs.sh up to 26th and fsc beg. with 27th

I did replace fs.sh with fsc on a not-so-critical production cluster with 48
cluster-controlled ext3 file systems. The results (disappearance of the phantom
load peaks) are clearly visible on the weekly load graphs; see attachment.

Comment 15 Lon Hohberger 2008-12-03 16:59:26 UTC

*** Bug 474364 has been marked as a duplicate of this bug. ***

Comment 16 Lon Hohberger 2008-12-03 17:02:43 UTC

Whoops - current agent w/ patch applied:

http://people.redhat.com/lhh/fsc-0.5.1.tar.gz

Comment 17 Lon Hohberger 2008-12-08 14:54:12 UTC

Also missing is a check to see if the file system is still accessible.

Comment 19 Lon Hohberger 2009-03-02 20:29:58 UTC

http://people.redhat.com/lhh/fsc-0.5.2.tar.gz

Updated agent.  Includes external_mount="[0|1]" option, which forks/execs mount/umount during start/stop.  This has the benefit of updating /etc/mtab.

Comment 20 Lon Hohberger 2009-03-02 20:30:57 UTC

Also includes self_fence support and an auto-generated man page.

Comment 21 Lon Hohberger 2009-03-03 15:19:25 UTC

Created attachment 333878 [details]
fs.sh which has a quick_status option.

This agent is an updated fs.sh agent which has a quick_status option.  The quick_status option trades off verbosity for speed.  When quick_status="1" in cluster.conf for a given file system, fs.sh does not fork().

I verified this using 'strace -vf'.

Comment 22 Lon Hohberger 2009-03-03 15:20:25 UTC

Note: It can fork if you are using symbolic links, LABEL= or UUID=.

Also, because it does not fork, it also does not log.

Comment 23 Lon Hohberger 2009-03-03 15:38:37 UTC

Lack of logging is a known limitation of fsc, so this new agent does not introduce something which was not already a trade off for using fsc.

Comment 24 Lon Hohberger 2009-03-03 16:28:46 UTC

Created attachment 333890 [details]
strace of old fs.sh without quick_status

Comment 25 Lon Hohberger 2009-03-03 16:29:09 UTC

Created attachment 333891 [details]
strace of new fs.sh using quick_status="1"

Comment 26 Lon Hohberger 2009-03-03 16:41:27 UTC

Created attachment 333893 [details]
strace of fsc

fsc is still faster and produces less strace output, but fs.sh with quick_status is pretty good and saves a lot of maintenance that would be introduced if we included fsc directly.  Also, it's less confusing to 'turn on' quick_status than it is to swap resource agents around.

A lot of the "bloat" in the newer fs.sh strace output is rt_sigprocmask() which occurs many times.

[root@molly ~]# wc -l fs.sh-old.out 
8841 fs.sh-old.out
[root@molly ~]#  wc -l ./fs.sh-new.out 
629 ./fs.sh-new.out
[root@molly ~]# grep -v rt_sig ./fs.sh-new.out | wc -l
205
[root@molly ~]# wc -l fsc.out
39 fsc.out

However, the important parts...

[root@molly ~]# grep ^clone\(Proc ./fs.sh-old.out  | wc -l
41
[root@molly ~]# grep ^clone\(Proc ./fs.sh-new.out  | wc -l
0
[root@molly ~]# grep ^clone\(Proc ./fsc.out  | wc -l
0

Comment 27 Lon Hohberger 2009-03-04 20:25:43 UTC

Note that the RelaxNG schema doesn't know about the new quick_status parameter and therefore will be upset about it.

It cannot be added to the schema until the fs.sh change is deemed acceptable.

I updated fsc based on patch from Eduardo Damato; he noticed that the format string was wrong if there were no mount options specified.  Oops :)

http://people.redhat.com/lhh/fsc-0.5.3.tar.gz

Comment 28 Lon Hohberger 2009-03-04 20:43:17 UTC

Note that fsc more or less got rejected for upstream inclusion on the basis that it's a waste of effort to maintain a second agent to do something we already provide.  Furthermore, it's written in C.

This is why fs.sh was carefully (and painfully) updated to eliminate fork() and clone().

Comment 29 Lon Hohberger 2009-03-31 19:13:18 UTC

*** Bug 487600 has been marked as a duplicate of this bug. ***

Comment 30 Lon Hohberger 2009-03-31 19:16:05 UTC

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=36cecda9ca9b879b631531e80a0278ed8886d893

Also, a related patch here which allows administrators to cap status check children:

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=b90358e8b77d0dfbfed4335757feda76d0b677a9

Comment 33 Krzysztof Kopec 2009-06-05 08:30:09 UTC

We have similar problem on our cluster, where are about 25 ext3 resources. Cluster have only two nodes and load on both nodes are consistently about 10...
We've just updated fs.sh to the one with quick_status option and first test looks fine. So, is there any chance that this new fs.sh will be released as a official errata?

Comment 34 Lon Hohberger 2009-06-09 18:03:22 UTC

Quick_status is slated for RHEL 5.4 inclusion.

Comment 35 Chris Ward 2009-07-03 17:57:32 UTC

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 37 errata-xmlrpc 2009-09-02 11:04:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html