Red Hat Bugzilla – Bug 160315
System Hangs when trying to write data to full filesystem
Last modified: 2007-11-30 17:07:07 EST
Description of problem:
We have a cronjob that runs nightly which copies files/data from one filesystem to another. Both of these filesystems are configured using Hitachi storage, connected through a SAN. They are mounted on logical volumes using LVM.
It has happened on 2 occasions that the filesystem we are copying data to filled up to 100%. The cron job continues to try to write data to the filesystem, but because it is full the system hangs, and the only way we can get on it and kill the jobs is to power-cycle the server. We have since grown the filesystems and hopefully this will not happen again, but I'd like to know whether it is normal for the server to hang in this situation and, if not, what we can do to prevent a recurrence.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Fill up a file system to 100%.
2. Continuously try to write data to the file system that's at 100%.
Actual Results: System hangs.
Assigning to kernel.
Kieran: please reset the arch to something that accurately reflects which OS
version you're running (it's currently set to sparc.)
The OS is Linux
[root@recordme root]# cat /etc/redhat-release
Red Hat Enterprise Linux ES release 3 (Taroon)
[root@recordme root]# uname -a
Linux recordme 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:31:21 EDT 2003 i686 athlon
The hardware is a Sun Fire V20.
It doesn't seem like something as simple as filling up a file system should
cause a system hang.
I tried to reproduce this by creating a 14G file system, comfortably larger
than the 1G memory on the system, and then running a program which continually
tries to write a 64K buffer to a file. Eventually, the file system fills up.
It was quite slow at times while the file and file system were filling up, but
I haven't been able to reproduce a hang yet. The test case simply gets an
ENOSPC error each time it attempts to write the buffer.
So, I need some more information. Does this happen every time the file
system fills up? Are there any messages via dmesg(8)? This cronjob, how is
it copying files?
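The test program itself is not included in this report. As a rough stand-in, the POSIX shell sketch below does the same thing with dd: it appends 64 KiB blocks to a file until a write fails with ENOSPC ("No space left on device"), then keeps retrying, much as a looping cron job would. The function name `fill_until_enospc` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical reproducer sketch: fill_until_enospc TARGET MAX_FAILURES
# Appends 64 KiB blocks of zeros to TARGET. Each write that fails with
# ENOSPC counts as a failure; the loop keeps retrying until MAX_FAILURES
# failed attempts in total, then prints "<blocks written> <failures>".
fill_until_enospc() {
    target=$1
    max_failures=$2
    blocks=0
    failures=0
    while [ "$failures" -lt "$max_failures" ]; do
        # stderr goes to the capture, stdout (the data) is appended to TARGET.
        if err=$(dd if=/dev/zero bs=65536 count=1 2>&1 >>"$target"); then
            blocks=$((blocks + 1))
        else
            case $err in
                *"No space left on device"*) failures=$((failures + 1)) ;;
                *) echo "unexpected error: $err" >&2; return 1 ;;
            esac
        fi
    done
    echo "$blocks $failures"
}
```

Against a scratch file system this would be run as, e.g., `fill_until_enospc /mnt/scratch/fillfile 1000` (the path is illustrative). Pointing it at `/dev/full`, which fails every write with ENOSPC, exercises the retry path without filling a real disk.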
The system hung on both occasions that the file system filled up. Both
times it happened over a weekend. I am guessing that the cron job was
continuously trying to write for over a day. We were not notified of the
problem until Monday morning. The cron job uses the mv command to move the files.
The filesystems in question are configured from a Hitachi 9970 storage unit.
The server is connected through a SAN using QLogic HBAs.
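Since the cron job uses mv, one possible mitigation (not something proposed in this report) is to check free space on the destination before moving. A minimal sketch, assuming GNU `du` and POSIX `df -P` output; `safe_mv` and its `RESERVE_KB` argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: safe_mv SRC DESTDIR [RESERVE_KB]
# Runs mv only if DESTDIR's file system would still have at least
# RESERVE_KB kilobytes available after receiving SRC.
safe_mv() {
    src=$1
    destdir=$2
    reserve_kb=${3:-0}
    # Size of SRC in 1 KiB units ("du -sk" prints "<size> <path>").
    need_kb=$(du -sk "$src" | awk '{print $1}')
    # "Available" column (4th) of df -Pk for the destination, in 1 KiB units.
    avail_kb=$(df -Pk "$destdir" | awk 'NR==2 {print $4}')
    if [ $((avail_kb - need_kb)) -lt "$reserve_kb" ]; then
        echo "safe_mv: not moving $src: needs ${need_kb}K, ${avail_kb}K free on $destdir" >&2
        return 1
    fi
    mv "$src" "$destdir"/
}
```

The check is deliberately conservative: when source and destination are on the same file system, mv is only a rename and needs no extra space. It is also a point-in-time check, so other writers can still fill the file system between the df call and the move.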
I am not having any luck reproducing this situation. I don't have access to
a Hitachi 9970 accessed through QLogic HBAs, but I don't see why that would matter
unless there were messages in /var/log/messages indicating that there was
something going on there. There weren't any messages, were there?
I will need more information in order to proceed. Is the system pingable? Is
this a hard hang, where nothing appears to be doing anything? Would it be
possible to capture some information the next time this happens? I am thinking
of Alt-SysRq-T, Alt-SysRq-M, Alt-SysRq-P, and Alt-SysRq-W. (This will have to
be enabled first via something like "echo 1 > /proc/sys/kernel/sysrq", before
the hang occurs.)
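The `echo 1 > /proc/sys/kernel/sysrq` setting only lasts until the next reboot. A sketch that also persists it through `/etc/sysctl.conf`, which Red Hat systems apply at boot, might look like the following. The function name and its optional path arguments are assumptions added so it can be tried on scratch files. It assumes GNU sed (`-i`) and must run as root against the real paths.

```shell
#!/bin/sh
# Hypothetical helper: enable_sysrq [PROC_FILE] [CONF_FILE]
# Defaults to the real /proc and /etc/sysctl.conf paths (root required).
enable_sysrq() {
    proc_file=${1:-/proc/sys/kernel/sysrq}
    conf_file=${2:-/etc/sysctl.conf}
    # Takes effect immediately, same as: echo 1 > /proc/sys/kernel/sysrq
    echo 1 > "$proc_file" || return 1
    # Persist across reboots: rewrite an existing kernel.sysrq line
    # (Red Hat's default file typically ships "kernel.sysrq = 0"),
    # or append one if none exists.
    if grep -q '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$conf_file"; then
        sed -i 's/^[[:space:]]*kernel\.sysrq[[:space:]]*=.*/kernel.sysrq = 1/' "$conf_file"
    else
        echo 'kernel.sysrq = 1' >> "$conf_file"
    fi
}
```

After running `enable_sysrq` as root, `cat /proc/sys/kernel/sysrq` should report 1, and the setting will be reapplied on the next boot.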
There were no messages in the /var/log/messages file indicating
issues with the QLogic HBAs or anything else.
The system was pingable but that's about all. It was a hard hang.
I have enabled SysRq, but I hope this does not recur, since we have
added a considerable amount of storage to prevent a recurrence and have
configured monitoring to send alerts when the file systems start to fill.
Thanks for your help.
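The monitoring mentioned above is not described further in the report. One simple approach is a cron-driven check on `df -P` output; the function name and threshold below are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical check: check_fs_usage MOUNTPOINT THRESHOLD_PERCENT
# Prints a warning and returns 1 when the file system holding MOUNTPOINT
# is at or above THRESHOLD_PERCENT full; otherwise returns 0 silently.
check_fs_usage() {
    mnt=$1
    threshold=$2
    # df -P keeps each file system on one line; column 5 is "Capacity" (e.g. "87%").
    used=$(df -P "$mnt" | awk 'NR==2 {sub(/%/, "", $5); print $5}')
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $mnt is ${used}% full (threshold ${threshold}%)"
        return 1
    fi
    return 0
}
```

Called from a crontab entry every few minutes (for example with a 90% threshold on the destination file system), the WARNING line becomes the alert, because cron mails any job output to the job's owner or to MAILTO.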
Well, without some more information, I don't see much that I can do.
I am going to close this BZ as "WORKSFORME", but if the problem recurs
and more information can be gained, please reopen this BZ and I will look
at it some more.