Bug 160315 - System Hangs when trying to write data to full filesystem
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: athlon Linux
Priority: medium
Severity: medium
Assigned To: Peter Staubach
QA Contact: Brian Brock
Reported: 2005-06-14 07:53 EDT by Kieran Foley
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2005-06-20 15:19:08 EDT
Description Kieran Foley 2005-06-14 07:53:46 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Description of problem:
We have a cronjob that runs nightly and copies files/data from one filesystem to another. Both of these filesystems are configured on Hitachi storage, connected through a SAN. They are mounted on logical volumes using LVM.
On 2 occasions the filesystem that we are copying data to has filled up to 100%. The cron job keeps trying to write data to the filesystem, but because it is full the system hangs, and the only way we can get onto it and kill the jobs is to power cycle the server. We have since grown the filesystems and hopefully this will not happen again, but I'd like to know whether it is normal for the server to hang in this situation and, if not, what we can do to prevent a recurrence.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Fill up the file system to 100%
2. Continuously try to write data to the file system that is 100% full
3. System hangs
  

Actual Results: system hangs

Additional info:
Comment 1 Bill Nottingham 2005-06-14 12:10:27 EDT
Assigning to the kernel.

Kieran: please reset the arch to something that accurately reflects which OS
version you're running (it's currently set to sparc.)
Comment 2 Kieran Foley 2005-06-14 13:21:34 EDT
Hi Bill,

The OS is Linux 

[root@recordme root]# cat /etc/redhat-release 
Red Hat Enterprise Linux ES release 3 (Taroon)

[root@recordme root]# uname -a
Linux recordme 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:31:21 EDT 2003 i686 athlon i386 GNU/Linux

The Hardware is a Sun Fire V20.
Comment 4 Peter Staubach 2005-06-20 10:27:59 EDT
It doesn't seem like something as simple as filling up a file system should
cause a system hang.

I tried to reproduce this by creating a 14G file system, comfortably larger
than the 1G memory on the system, and then running a program which continually
tries to write a 64K buffer to a file.  Eventually, the file system fills up.

It was quite slow at times while the file and file system were filling up, but
I haven't been able to reproduce a hang yet.  The testcase is just getting an
ENOSPC error each time that it attempts to write the buffer.
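
For readers who want to try a similar test, here is a minimal sketch of the kind of program described above: it repeatedly writes a 64K buffer to a file and counts ENOSPC failures instead of stopping. It is not the original testcase, which is not attached to this bug. The function name fill_file and the max_writes limit are assumptions made for illustration, and it assumes a Python 3 interpreter.

```python
import errno
import os

BUF_SIZE = 64 * 1024  # 64K buffer, matching the test described above


def fill_file(path, max_writes, buf_size=BUF_SIZE):
    """Append buf_size-byte buffers to path, up to max_writes attempts.

    Returns (bytes_written, enospc_errors). A write that fails with
    ENOSPC is counted and the loop carries on, mimicking a job that
    keeps writing to a full file system. Any other error is re-raised.
    """
    buf = b"\0" * buf_size
    bytes_written = 0
    enospc_errors = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(max_writes):
            try:
                # os.write may write fewer bytes than requested; count
                # what was actually written.
                bytes_written += os.write(fd, buf)
            except OSError as exc:
                if exc.errno != errno.ENOSPC:
                    raise
                enospc_errors += 1
    finally:
        os.close(fd)
    return bytes_written, enospc_errors
```

Pointing it at a scratch file system (for example a hypothetical /mnt/test/fill.bin) with a large max_writes should eventually produce a growing ENOSPC count rather than a hang, if the behaviour matches what is reported in this comment.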

So, I need some more information.  Does this happen every time that the file
system fills up?  Are there any messages via dmesg(8)?  This cronjob, how is
it copying files?
Comment 5 Kieran Foley 2005-06-20 11:04:08 EDT
The system hung on both occasions that the file system filled up. Both 
times it happened over a weekend. I am guessing that the cron job had been 
continuously trying to write for over a day. We were not notified of the 
matter until Monday morning. The cronjob uses the mv command to move the files.
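
One way a nightly job like this could avoid writing into a full file system is to check the free space on the destination before each move. The following is only a sketch under assumptions: the helper names free_bytes and move_if_space and the reserve_bytes parameter are hypothetical, the actual cron job is not shown in this bug, and a Python 3 interpreter on a Unix-like system is assumed.

```python
import os
import shutil


def free_bytes(directory):
    """Bytes available to unprivileged users on directory's file system."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize


def move_if_space(src, dest_dir, reserve_bytes=0):
    """Move src into dest_dir only if it fits with reserve_bytes to spare.

    Returns True if the file was moved, False if it was skipped because
    the destination file system does not have enough free space.
    """
    needed = os.path.getsize(src) + reserve_bytes
    if free_bytes(dest_dir) < needed:
        return False
    shutil.move(src, os.path.join(dest_dir, os.path.basename(src)))
    return True
```

The check is not atomic: another writer can consume the space between the check and the move, so the job still has to handle a failed write. It only reduces the chance of repeatedly hammering a full file system.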

The filesystems in question are configured from a Hitachi 9970 storage unit. 
The server is connected through a SAN using QLogic HBAs.
Comment 6 Peter Staubach 2005-06-20 14:15:56 EDT
I am not having any luck in reproducing this situation.  I don't have access to
a Hitachi 9970 accessed through QLogic HBAs, but I don't know why that would
matter, unless there were messages in /var/log/messages to indicate that there
was something going on there.  There weren't any messages, were there?

I will need some information in order to proceed.  Is the system pingable?  Is
this a hard hang, where nothing appears to be doing anything?  Would it be
possible to capture some information the next time that this happens?  I am thinking
of Alt-SysRq-T, Alt-SysRq-M, Alt-SysRq-P, and Alt-SysRq-W.  (This will have to
be enabled first via something like "echo 1 > /proc/sys/kernel/sysrq", before
the hang occurs.)
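
As a small illustration of the enable step, the sketch below does the same thing as the echo command above and also lets a script check the current setting. The function names are hypothetical, writing to the real file requires root, and a Python 3 interpreter is assumed. The setting made this way does not survive a reboot; on Red Hat systems it is usually made persistent with a kernel.sysrq = 1 line in /etc/sysctl.conf.

```python
SYSRQ_PATH = "/proc/sys/kernel/sysrq"


def sysrq_enabled(path=SYSRQ_PATH):
    """Return True if the sysrq setting is anything other than 0."""
    with open(path) as f:
        return f.read().strip() != "0"


def enable_sysrq(path=SYSRQ_PATH):
    """Equivalent of: echo 1 > /proc/sys/kernel/sysrq (needs root)."""
    with open(path, "w") as f:
        f.write("1\n")
```

The path parameter exists only so the functions can be tried against an ordinary file without touching the running kernel.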

Comment 7 Kieran Foley 2005-06-20 15:02:11 EDT
There were no messages in the /var/log/messages file to say that there were 
issues with the QLogic HBAs, etc.

The system was pingable but that's about all. It was a hard hang. 

I have enabled SysRq, but I hope that this does not recur, since we have 
added a considerable amount of storage to prevent a recurrence and have 
configured monitoring to send alerts when the file systems start to fill.
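
The monitoring mentioned above is not described in this bug, so the following is only a sketch of what a threshold check could look like. The function names, the 90% default threshold, and the injectable usage parameter are assumptions for illustration, and shutil.disk_usage requires Python 3.3 or later.

```python
import shutil


def percent_used(total, used):
    """Percentage of total that is used; 0.0 for an empty file system."""
    return 100.0 * used / total if total else 0.0


def check_mounts(mount_points, threshold=90.0, usage=shutil.disk_usage):
    """Return [(mount_point, percent)] for mounts at or above threshold.

    usage is any callable returning an object with .total and .used
    attributes; it defaults to shutil.disk_usage.
    """
    alerts = []
    for mount in mount_points:
        u = usage(mount)
        pct = percent_used(u.total, u.used)
        if pct >= threshold:
            alerts.append((mount, pct))
    return alerts
```

The percentage computed here can differ slightly from what df reports, because df accounts for blocks reserved for root. A cron entry could call check_mounts on the destination mount points and mail the result whenever the returned list is not empty.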

Thanks for your help.
Comment 8 Peter Staubach 2005-06-20 15:19:08 EDT
Well, without some more information, I don't see much that I can do.

I am going to close this BZ as "WORKSFORME", but if the problem reoccurs
and more information can be gained, please reopen this BZ and I will look
at it some more.

Good luck...
