From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; UiuqmHmqouVilORJ)

Description of problem:
Periodic hang of the system (~10 sec)
-----------------------------------
OS: Red Hat Linux 7.3
Kernel: 2.4.18-3custom #5 SMP i686
CPU: Dual Athlon 2000+
RAM: 2 GB registered ECC
Hard disks: 2x Maxtor 80GB, 2x 15GB
Software RAID partitions (mirrored) are in use

Description:
The system behaves normally until I start a Java-based chat server that puts a heavy load on the hard disks. From that point on, the system stops roughly every 350 seconds (this number varies a lot) and hangs for about 10-15 seconds. After that it continues without any problems until the next ~350 seconds are over, and then it pauses again.

By testing I found out that when the error occurs, all threads in the system keep running until they do I/O to the hard disk. At that point they stop, wait for more than 10 seconds, and then go on with their work. Threads that do not write to the hard disk do not stop, even if they write to the network card or the console.

My first idea was that this was caused by the journaling of the ext3 filesystem. So I changed the "ext3" entries in fstab to "ext2" and rebooted, but the error keeps occurring. The filesystem is on a software RAID (RAID level 1) with ext3 used on it.

Here is the output of a Perl script I wrote. It sleeps for 1 second and then checks how long it really slept. If the script only writes to the console and I watch the output in my ssh terminal, it does not notice any abnormally long sleeps. If it writes to the hard disk (by redirecting the output to a file), it does notice abnormally long sleeps:

[...]
Sleep for: 11 sec Time since lastoccurence: 408 sec
Sleep for: 6 sec Time since lastoccurence: 297 sec
Sleep for: 13 sec Time since lastoccurence: 325 sec
Sleep for: 12 sec Time since lastoccurence: 275 sec
Sleep for: 11 sec Time since lastoccurence: 408 sec
Sleep for: 6 sec Time since lastoccurence: 297 sec
Sleep for: 13 sec Time since lastoccurence: 325 sec
Sleep for: 12 sec Time since lastoccurence: 275 sec
Sleep for: 13 sec Time since lastoccurence: 499 sec
Sleep for: 15 sec Time since lastoccurence: 342 sec
Sleep for: 9 sec Time since lastoccurence: 260 sec
Sleep for: 14 sec Time since lastoccurence: 728 sec
[...]

At exactly the same moments the chat server (and, as far as I can see, every other thread as well) waits for the same amount of time.

So I would be very pleased if someone could give me a solution to the problem or a hint about what to look for. If you require more information about the system, please mail me at hangbug

Thanks in advance
Mathias Retzlaff

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:
1. Don't know how to reproduce it on another system.
2.
3.

Actual Results: the system stops periodically

Expected Results: the system should run without hanging

Additional info:
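The Perl probe described in the report is not included in the thread. The following is only a minimal Python sketch of the same idea, assuming the probe writes one line per iteration to its output and reports any iteration that takes much longer than the 1-second sleep. The function name `find_stalls`, the 2-second threshold, and the injectable `sleep`/`clock` parameters are illustrative assumptions, not part of the original script.

```python
import time


def find_stalls(iterations, out, interval=1.0, threshold=2.0,
                sleep=time.sleep, clock=time.monotonic):
    """Sleep `interval` seconds per iteration, write a line to `out`,
    and record every iteration that took longer than `threshold` seconds.

    Returns a list of (iteration_seconds, seconds_since_last_stall).
    `sleep` and `clock` are injectable so the logic can be tested
    without real waiting (hypothetical parameters for this sketch).
    """
    stalls = []
    last_tick = clock()
    last_stall = last_tick
    for _ in range(iterations):
        sleep(interval)
        # The write is what is expected to block when disk I/O stalls.
        # flush() hands the data to the OS; it does not force it to disk.
        out.write("tick\n")
        out.flush()
        now = clock()
        elapsed = now - last_tick
        if elapsed > threshold:
            stalls.append((elapsed, now - last_stall))
            last_stall = now
        last_tick = now
    return stalls
```

Run with `out` set to a file on the affected filesystem to mirror the redirected case from the report, and with `sys.stdout` to mirror the console case.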
Can you check if IDE DMA is turned on (hdparm -i /dev/hda)?
[root]# hdparm -i /dev/hda

/dev/hda:

 Model=MAXTOR 6L080L4, FwRev=A93.0500, SerialNo=664219358072
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=32256, SectSize=21298, ECCbytes=4
 BuffType=DualPortCache, BuffSize=1819kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156355584
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
 AdvancedPM=no WriteCache=enabled
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5
hdparm -i only shows what was negotiated at drive discovery (boot) time. If we've dropped to PIO due to I/O errors, you'll need the plain "hdparm /dev/hd*" output to show that. Are there any kernel messages showing up in /var/log/messages which might indicate problems talking to this disk? Also, do you see the same problem if you use the standard Red Hat kernels?
[root]# hdparm /dev/hda

/dev/hda:
 multcount = 16 (on)
 I/O support = 0 (default 16-bit)
 unmaskirq = 0 (off)
 using_dma = 1 (on)
 keepsettings = 0 (off)
 nowerr = 0 (off)
 readonly = 0 (off)
 readahead = 8 (on)
 geometry = 9732/255/63, sectors = 156355584, start = 0
 busstate = 1 (on)

[root]# hdparm /dev/hdc

/dev/hdc:
 multcount = 16 (on)
 I/O support = 0 (default 16-bit)
 unmaskirq = 0 (off)
 using_dma = 1 (on)
 keepsettings = 0 (off)
 nowerr = 0 (off)
 readonly = 0 (off)
 readahead = 8 (on)
 geometry = 9732/255/63, sectors = 156355584, start = 0
 busstate = 1 (on)

--------------------------------------------------------------------------
/var/log/messages:
[...]
Sep 22 00:22:56 kernel: Uniform Multi-Platform E-IDE driver Revision: 6.31
Sep 22 00:22:56 kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Sep 22 00:22:56 kernel: AMD7441: IDE controller on PCI bus 00 dev 39
Sep 22 00:22:56 kernel: AMD7441: chipset revision 4
Sep 22 00:22:56 kernel: AMD7441: not 100%% native mode: will probe irqs later
Sep 22 00:22:56 kernel: AMD7441: disabling single-word DMA support (revision < C4)
Sep 22 00:22:56 kernel: ide0: BM-DMA at 0xb800-0xb807, BIOS settings: hda:DMA, hdb:DMA
Sep 22 00:22:56 kernel: ide1: BM-DMA at 0xb808-0xb80f, BIOS settings: hdc:DMA, hdd:pio
Sep 22 00:22:56 kernel: hda: MAXTOR 6L080L4, ATA DISK drive
Sep 22 00:22:56 kernel: hdb: FX54++W, ATAPI CD/DVD-ROM drive
Sep 22 00:22:56 kernel: hdc: MAXTOR 6L080L4, ATA DISK drive
Sep 22 00:22:56 kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Sep 22 00:22:56 kernel: ide1 at 0x170-0x177,0x376 on irq 15
Sep 22 00:22:56 kernel: blk: queue c03cd2c4, I/O limit 4095Mb (mask 0xffffffff)
Sep 22 00:22:56 kernel: hda: 156355584 sectors (80054 MB) w/1819KiB Cache, CHS=9732/255/63, UDMA(100)
Sep 22 00:22:56 kernel: blk: queue c03cd628, I/O limit 4095Mb (mask 0xffffffff)
Sep 22 00:22:56 kernel: hdc: 156355584 sectors (80054 MB) w/1819KiB Cache, CHS=155114/16/63, UDMA(100)
[...]
--------------------------------------------------------------------------
I have not tried the standard kernel yet. The only change I made to the kernel was to raise the limit for open files per process from 1024 to 16384.
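The `using_dma` setting in the plain hdparm output above is the value that would reveal a fallback to PIO. As a rough sketch only, the check could be automated by parsing saved hdparm output; the helper names `parse_hdparm` and `drives_without_dma` are hypothetical, and the parser assumes the `name = value (comment)` layout seen in this comment.

```python
import re


def parse_hdparm(text):
    """Parse plain `hdparm /dev/hdX` output into {device: {setting: value}}.

    Assumes a '/dev/...:' header line followed by 'name = value (comment)'
    lines, as in the output quoted in this thread.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            current = devices.setdefault(header.group(1), {})
            continue
        setting = re.match(r"([^=]+?)\s*=\s*(\S+)", line)
        if setting and current is not None:
            current[setting.group(1)] = setting.group(2)
    return devices


def drives_without_dma(text):
    """Return the devices whose using_dma setting is not 1."""
    return [dev for dev, settings in parse_hdparm(text).items()
            if settings.get("using_dma") != "1"]
```

For the output posted here, both /dev/hda and /dev/hdc report `using_dma = 1`, so such a check would return an empty list.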
ehm that limit is a runtime tunable.. no need to recompile for that ;(
I can't find any sign that the hard limit for the maximum number of open files per process is runtime tunable. The value I currently have compiled into the kernel is 16384. When I search for "16384" in the output of

[root]# sysctl -A

I find it three times:
1.) net.ipv4.tcp_wmem = 4096 16384 131072
2.) net.ipv4.route.gc_thresh = 16384
3.) kernel.msgmnb = 16384
and none of those is the value I'm looking for.

The things I changed are:
-------------------------
include/linux/fs.h:
old:
#define INR_OPEN 1024
...
#define NR_FILE 8192
new:
#define INR_OPEN 16384
...
#define NR_FILE 32768
-------------------------
include/linux/limits.h:
old:
#define NR_OPEN 1024
new:
#define NR_OPEN 16384
-------------------------
If I am wrong, please tell me so. I'm going to test the standard kernel to see whether the hanging disappears (I will post here tomorrow morning, CET).

Thanks in advance for your help.
Mathias Retzlaff
Umm, changing NR_OPEN will break things in non-obvious ways, especially if you have any old binaries lying around. The correct way to do this is with the setrlimit syscall, or the corresponding shell command ("ulimit" in bash.) Unprivileged users cannot raise the soft limit above the hard limit, so if the hard limit is set to 1024, that's a fixed ceiling unless root changes it. Root can change the limits arbitrarily, though. /etc/security/limits.conf will let you change the default limits for users. If you want particular users to be able to use more than 1024 fds, I'd recommend increasing the hard limit but leaving the soft limit at 1024. That way, users will still get a default 1024 fds, but they will be able to raise that themselves if they want more. That will allow an application to use more fds if it really needs to, without risking breaking old apps which cannot cope with so many fds.
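To illustrate the soft/hard limit mechanism described above, here is a small sketch using Python's standard `resource` module, which wraps the getrlimit/setrlimit syscalls. The helper name `raise_nofile_soft_limit` is hypothetical; the sketch only raises the soft limit for the calling process and never touches the hard limit, which matches what an unprivileged application is allowed to do.

```python
import resource


def raise_nofile_soft_limit(wanted):
    """Raise this process's soft open-files limit toward `wanted`.

    The soft limit is capped at the current hard limit and is never
    lowered. Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        target = wanted
    else:
        # Without privileges the soft limit cannot exceed the hard limit.
        target = min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

With a hard limit of 16384 and a soft limit of 1024, a call such as `raise_nofile_soft_limit(16384)` made at startup would give that one application the larger limit while other programs keep the default 1024.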
This morning I tested both: booting the system with the standard SMP kernel and with the standard uniprocessor kernel, but the system keeps hanging periodically. The currently running system uses the standard SMP kernel (Linux 2.4.18-3smp #1 SMP Thu Apr 18 06:59:55 EDT 2002 i686 unknown), and I used /etc/security/limits.conf to raise the maximum number of open files. If you need any further information, just ask for it. I'm wondering why kjournald is still running and consuming CPU time, although I switched every partition's filesystem to ext2 ...
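kjournald is the ext3 journaling thread, so seeing it run suggests that some filesystem may still be mounted as ext3 despite the fstab change; this is an assumption, not something confirmed in the thread. fstab only states what should be mounted, while /proc/mounts lists what the kernel has actually mounted. A minimal sketch for checking this follows; the helper names are hypothetical.

```python
def mounts_by_fstype(mounts_text, fstype):
    """Return (device, mount_point) pairs whose filesystem type is `fstype`.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == fstype:
            hits.append((fields[0], fields[1]))
    return hits


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's view of the currently mounted filesystems."""
    with open(path) as f:
        return f.read()
```

Calling `mounts_by_fstype(read_proc_mounts(), "ext3")` on the affected machine would list any partitions the kernel still treats as ext3.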
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/