Bug 452430

Summary:	Sealert/SeTroubleShoot thrashing
Product:	[Fedora] Fedora	Reporter:	Bevis King <brwk>
Component:	nfs-utils	Assignee:	John Dennis <jdennis>
Status:	CLOSED WONTFIX	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	high	Docs Contact:
Priority:	low
Version:	8
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-01-09 06:37:40 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Bevis King 2008-06-22 21:18:17 UTC

Description of problem:
Just updated to the latest kernel and packages - sealert recommended placing a
file called /.autorelabel in the root directory - did this and rebooted with the
new kernel.  System has been up 1 hr 19 mins - sealert has already clocked up 26
CPU minutes, setroubleshootd has clocked up around 1 min 50 seconds.  System
performance is appallingly slow - load average is 8.23; normal for this system
is around 0.20 maximum.

SELinux status is set to permissive; system is an Athlon 64 box with 2GB RAM,
~2TB of software RAID disc; low traffic Oracle 10i database, web server and NFS
server loads.  NFS mounts are taking 30-60 seconds per mount to (successfully)
mount from the 3 served clients.

Please advise on how to reduce the impact.

Version-Release number of selected component (if applicable):
Kernel 2.6.25.6-27.fc8 
All updates latest as at 22nd Jun 2008.

How reproducible:
All the time.

Steps to Reproduce:
Attempt basic DB, NFS and Web operations on a lightly loaded server with selinux
status set to permissive.
  
Actual results:
Heavily loaded system; unmanageable number of file attribute errors despite
relabeling, appalling NFS mount lead-time.

Expected results:
Normal operation - quick mounts, etc.

Additional info:

Comment 1 John Dennis 2008-06-23 14:55:35 UTC

What is the version of setroubleshoot? Versions less than 2.x can produce
excessive alerts in some situations.

Comment 2 Bevis King 2008-06-23 15:08:07 UTC

setroubleshoot-2.0.5-2.fc8

Update: the system became totally catatonic within 2 hours and had to be
rebooted with SELinux disabled completely to get any user service back.

I do have the log files, but doubt I can upload them - they're massive.
Is there any kind of grep I can perform on them that will help diagnose what
caused this problem?

I guess the key observations are:
1. doing the /.autorelabel suggested by sealert definitely made things much
worse than it had been.
2. sealert should probably detect how many errors it has been reporting and
drop itself out of the loop after maybe a couple of hundred similar events in
quick succession - a "last message repeated 8,000 times" kind of response rather
than trying to queue that many notifications, which is probably how it got
to 26 minutes of CPU time in 1hr 20m uptime.
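
The "last message repeated N times" behaviour suggested in point 2 can be
sketched roughly as below. This is an illustration of the idea only, not how
sealert or setroubleshootd actually work; the name collapse_repeats and the
default threshold of 200 are assumptions taken from the "couple of hundred"
figure above.

```python
def collapse_repeats(messages, threshold=200):
    """Pass messages through, but suppress a run of identical messages
    once it exceeds `threshold`, emitting one summary line when the run ends."""
    sentinel = object()
    previous = sentinel
    run_length = 0
    for msg in messages:
        if msg == previous:
            run_length += 1
            if run_length <= threshold:
                yield msg  # still under the limit: report normally
            continue
        # A different message arrived: close out the previous run, if suppressed.
        if run_length > threshold:
            yield f"last message repeated {run_length - threshold} more times"
        previous = msg
        run_length = 1
        yield msg
    # Input ended while a run was still being suppressed.
    if run_length > threshold:
        yield f"last message repeated {run_length - threshold} more times"
```

With a threshold of 2, five copies of one message followed by a different
message come out as two copies, one summary line, then the new message.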

Hope that helps...

Regards, Bevis.

Comment 3 John Dennis 2008-06-23 15:34:32 UTC

You should be fine with this version; I suspect there are other issues going on.
setroubleshoot does (or should) aggregate identical alerts into a single alert
(incrementing the alert's report count). setroubleshoot should also truncate
unique alerts to a maximum of 50.
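
The aggregation described above (identical alerts folded into one entry with
a report count, and at most 50 unique alerts kept) might look something like
the following sketch. It is not setroubleshoot's code: the AlertStore class,
the use of a plain string as the alert signature, and the choice to evict the
least recently seen alert when the cap is exceeded are all assumptions.

```python
from collections import OrderedDict


class AlertStore:
    """Keep one entry per unique alert signature, with a report count."""

    def __init__(self, max_unique=50):
        self.max_unique = max_unique
        self._alerts = OrderedDict()  # signature -> report count

    def report(self, signature):
        """Record one occurrence and return the alert's current count."""
        if signature in self._alerts:
            self._alerts[signature] += 1
            self._alerts.move_to_end(signature)  # mark as most recently seen
        else:
            self._alerts[signature] = 1
            if len(self._alerts) > self.max_unique:
                # Assumed policy: drop the least recently seen alert.
                self._alerts.popitem(last=False)
        return self._alerts[signature]

    def by_count(self):
        """Alerts sorted by report count, highest first."""
        return sorted(self._alerts.items(), key=lambda kv: kv[1], reverse=True)
```

Note that even with this kind of coalescing, every incoming denial still costs
a lookup, so a constant stream of denials keeps consuming CPU.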

It sounds like you've got a situation where you're constantly getting denials
(denials are reported in permissive mode). setroubleshoot still has to run to
determine whether the current denial is identical to a previous denial, so even
if setroubleshoot is correctly coalescing these into a single alert it is going
to consume resources.

You have two choices: turn off setroubleshoot (service setroubleshoot stop or
chkconfig setroubleshoot off), or better yet fix the problem it's trying to warn
you about. Open the sealert browser (Applications-->System Tools-->SELinux
Troubleshooter) and look for alerts with high alert counts (or click on the
report count column to sort by report count). Then figure out why you're getting
that denial so frequently. We can help you here, after you identify the alert
with the high count use (Edit-->Copy Alert) to put the alert on the clipboard
and paste it into this bugzilla.

Re: large log files, you don't specify which log files you're talking about. The
audit log file should be large in this instance, but not the setroubleshoot log
file (/var/log/setroubleshoot/setroubleshootd.log) unless it's continuously
faulting and writing tracebacks into the log file. Is it?

Comment 4 Bevis King 2008-07-19 11:21:59 UTC

OK, I think I've tracked down where the problem is.  This is actually to do
with a memory leak in the NFS mountd code; rpc.mountd does this about 20,000
times per mount request:

ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0
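
A figure like "about 20,000 times per mount request" can be checked by
tallying a saved strace capture instead of counting by eye. The sketch below
assumes plain strace output lines of the form shown above (optionally
prefixed by a pid); the function name tally_syscalls is hypothetical, and
ioctl calls are keyed by their request argument so DM_TABLE_DEPS stands out.

```python
import re
from collections import Counter

# Matches e.g. "ioctl(9, DM_TABLE_DEPS, 0x7fd62ddc6490) = 0",
# with an optional "[pid 123]" or bare pid prefix.
CALL_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>\w+)\((?P<args>.*)\)\s+=\s'
)


def tally_syscalls(lines):
    """Count system calls in strace output; ioctls are split by request."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line.strip())
        if m is None:
            continue  # signal lines, unfinished calls, blank lines, etc.
        name = m.group("name")
        args = [a.strip() for a in m.group("args").split(",")]
        if name == "ioctl" and len(args) >= 2:
            name = f"ioctl:{args[1]}"
        counts[name] += 1
    return counts
```

Running this over a capture of a single mount request and looking at
counts.most_common(5) should make any runaway call obvious.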

After some research, this seems to be covered by Debian bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=413661

and:
http://linux-nfs.org/pipermail/nfsv4/2007-July/006339.html

The discussions pointed to above seem to indicate that the bug was fixed in
e2fsprogs 1.40 and nfs-kernel-server 1.0.12.  I'm not sure where that leaves
Fedora 8.

I can understand that my situation may be extreme as the system has three Linux
software RAID devices, each producing a volume group, with a total of about 40
logical volumes defined.  Do a device manager rescan and you'll pretty much
disappear under the load.

Any thoughts on this?

Regards, Bevis.

Comment 5 Bug Zapper 2008-11-26 10:54:33 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2009-01-09 06:37:40 UTC

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.