Description of problem:
Two different servers (with different hardware and HDD controllers) running a similar (although not identical) setup hang nearly every morning, always at the time when high disk I/O occurs due to backups.

Version-Release number of selected component (if applicable):
The problem has existed since late Fedora Core 5, through 6 and 7 (as far as I remember the problems started around April 2007). The latest updates were always installed (including kernels), without any difference. The problem got worse in the last weeks: earlier we had a deadlock situation once or twice a week, at the moment the deadlock occurs nearly daily. Since the hardware differs on both servers I suspect a software problem here, not a hardware problem. From the current investigation it looks like a kernel problem, although all stuck processes are running in userspace.

How reproducible:
That is the difficult part. So far I think the problem was narrowed down to our mail system, which is the most identical part on both servers. It is running like this:

Server1: Postfix -> Amavisd (including Spamassassin) -> Cyrus-Imapd (delivering via lmtp)
Server2: Postfix -> Amavisd -> Procmail -> Spamassassin -> Cyrus-Imapd (delivering via cyrus-deliver)

Both Postfix instances use MySQL for their mail aliases. There are different partitions (all on LVM) for the server processes, e.g. single partitions for the imap-spool, the /var/lib/imap directory and the postfix queue. Each partition has a corresponding partition on the second disk and is mirrored by rsync every 4 hours.

Steps to Reproduce:
1. Wait for high disk I/O
2.
3.

Actual results:
The system runs into a deadlock, but is still responsive (see below).

Expected results:
No deadlock should occur.

Additional info:
If an SSH session is open to the server it is still possible to look up the process table, kill some processes (eventually resulting in breaking the deadlock), trigger sysrq, run strace, etc. However, trying to log into that server in such a situation results in a timeout. I am happy to provide debugging info if pointed to what would be helpful.

I will attach two files. The first shows a process list when the problem exists. Please note that a lot of processes are stuck (e.g. crond, httpd, etc). The second one is a call stack trace for all processes (resulting from "echo t > /proc/sysrq-trigger").

As far as I know each such deadlock situation had high I/O load on /var/spool/imap (it is always this partition). At the moment it is often triggered by rsync mirroring that partition to its corresponding partition on the second disk, but not always (sometimes it is triggered by a backup to an FTP server, but again on the same partition). All partitions use ext3 as filesystem. The fstab entries for the partitions in question look like this:

/dev/hd2/varspoolimap /var/spool/imap ext3 async,auto,data=ordered,exec,noacl,noatime,nodev,nosuid,nouser,rw 1 2
/dev/hd1/varspoolimap /mnt/mirror/varspoolimap ext3 async,auto,data=ordered,exec,noacl,noatime,nodev,nosuid,nouser,rw 1 2

Looking at the call trace it seems all processes are waiting for some I/O. I am able to kill still-running processes (before I made the attached call trace I killed all cron instances). If the right process is killed the deadlock will resolve. However, it is not always the same process that needs to be killed, which is why I suspect a deadlock in the kernel, either ext3 or scheduler (CFQ) related.
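For completeness, the mirror job is essentially a cron entry like the following (a sketch using the paths from the fstab above; the real script covers all partitions, and the exact rsync flags may differ):

  # /etc/cron.d/mirror-imap -- mirror the imap spool every 4 hours
  0 */4 * * * root rsync -aH --delete /var/spool/imap/ /mnt/mirror/varspoolimap/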
I would like to add that an open ssh session is still very responsive, so it is not just high load causing the server to slow down. The server gets stuck and the deadlock never resolves on its own (at least not within several hours of waiting). It is either necessary to kill processes or to hard-reset the server (in case no ssh session was open).

Let's look at an example of a stuck process (here rsync, which is marked as uninterruptible (D)):

Call Trace:
 [<c042ffcb>] lock_timer_base+0x19/0x35
 [<c060bb55>] schedule_timeout+0x70/0x8f
 [<c042fd37>] process_timeout+0x0/0x5
 [<c060bb50>] schedule_timeout+0x6b/0x8f
 [<c060b67c>] io_schedule_timeout+0x39/0x5d
 [<c0465eea>] congestion_wait+0x50/0x64
 [<c0438539>] autoremove_wake_function+0x0/0x35
 [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
 [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3
 [<c05a5c5f>] sock_wfree+0x21/0x36
 [<c05a742f>] __kfree_skb+0xb5/0x109
 [<c042cca5>] current_fs_time+0x41/0x46
 [<c045ed89>] __generic_file_aio_write_nolock+0x480/0x4df
 [<c045ee3d>] generic_file_aio_write+0x55/0xb4
 [<f888ffb8>] ext3_file_write+0x24/0x90 [ext3]
 [<c047a31c>] do_sync_write+0xc7/0x10a
 [<c0438539>] autoremove_wake_function+0x0/0x35
 [<c047a255>] do_sync_write+0x0/0x10a
 [<c047ab6e>] vfs_write+0xa8/0x15a
 [<c047b18b>] sys_write+0x41/0x67
 [<c0404f8e>] syscall_call+0x7/0xb

Every single stuck process (look at the full attached trace) does some writing -> ext3_file_write -> ... -> balance_dirty_pages_ratelimited_nr -> ... -> congestion_wait -> ... -> schedule_timeout -> lock_timer_base.

Looking at LKML there were some problems in this area, and I guess that our problem is also running into some deadlock here. This morning I booted one server with the anticipatory scheduler to sort out whether our problems are CFQ-related.

As written above, I am able to provide more debug information; please tell me where to look. When logged in I am still able to run "lsof" on all processes, and "strace" at least on the running/waiting processes (doing strace on an uninterruptible "D" process is no good: strace hangs as well, with no chance to abort).

Both servers are currently running the recent F7 update kernel:

Linux zoidberg.canadatux.org 2.6.22.1-27.fc7 #1 SMP Tue Jul 17 17:13:26 EDT 2007 i686 athlon i386 GNU/Linux

But the problem existed with 2.6.21 and 2.6.20 (maybe 2.6.19, I am not so sure) as well.

I suspect that the sysrq call trace is the most interesting part for debugging this. Since it is big (500K) I will attach it as a separate file instead of copying and pasting it here. The kernel is not tainted and no special modules are loaded (except standard modules such as device-mapper, iptables, adm1026 and the network and disk-controller drivers).
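For reference, these are the kinds of inspection commands that still work from an open session during the deadlock (a sketch; <pid> stands for any stuck process):

  ps axwwo pid,stat,wchan:30,cmd   # shows D-state tasks and the kernel symbol they sleep in
  lsof -p <pid>                    # open files of a stuck process
  strace -p <pid>                  # only useful on R/S-state processes; hangs on D-state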
Created attachment 159939 [details]
Processlist showing the bug
Created attachment 159940 [details]
Calltrace from echo t >/proc/sysrq-trigger

This is the full call trace (after killing the crond processes shown in the process listing) resulting from "echo t > /proc/sysrq-trigger".
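The trace was captured roughly like this (a sketch; the buffer size passed to dmesg is a guess, made large enough to hold the ~500K dump):

  echo t > /proc/sysrq-trigger     # dump all task stacks into the kernel ring buffer
  dmesg -s 1048576 > sysrq-t.txt   # read the ring buffer out to a file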
Try booting with the kernel options

    nohz=off highres=off

to disable the new timer code. Also try 2.6.22.1-33.
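With GRUB legacy as shipped on F7, the options go on the kernel line, e.g. (a sketch; the root device and version strings are examples only):

  # /boot/grub/grub.conf
  title Fedora (2.6.22.1-27.fc7)
          root (hd0,0)
          kernel /vmlinuz-2.6.22.1-27.fc7 ro root=/dev/VolGroup00/LogVol00 nohz=off highres=off
          initrd /initrd-2.6.22.1-27.fc7.img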
(In reply to comment #3)
> Try booting with the kernel options
>
>     nohz=off highres=off
>
> to disable the new timer code. Also try 2.6.22.1-33.

OK, I will do that. However, I do not think it is related to the new timer code, since the problem was already there with much older kernels.

In the meantime I have mounted all partitions with default values and have had no deadlock so far (I have to watch it further). Might it be possible that "noatime" is causing the problem? The CFQ scheduler might expect more dirty pages from doing high I/O like rsync. If noatime is set there won't be any, at least not on the partition being read from.
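If useful for testing, noatime can also be toggled per mount at runtime without touching fstab (a sketch):

  mount -o remount,atime /var/spool/imap     # back to default atime updates
  mount -o remount,noatime /var/spool/imap   # re-enable the suspected option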
> OK, I will do that. However, I do not think it is related to the new timer
> code, since the problem was already there with much older kernels.

I would like to give an update on this, as I think I found the root cause of the problem. So far we have not changed anything else (still running 2.6.22.1-27 with the new timer code); the only thing that was changed were the mount options. Both systems have now run stable for over a week, so the problem is no longer present.

Old fstab:
/dev/hd2/varspoolimap /var/spool/imap ext3 async,auto,data=ordered,exec,noacl,noatime,nodev,nosuid,nouser,rw 1 2

New fstab:
/dev/hd2/varspoolimap /var/spool/imap ext3 defaults 1 2

Since "defaults" implies rw,suid,dev,exec,auto,nouser,async, the only differences here are:

data=ordered (but that is an ext3 default, so no difference either)
noacl (should not make a difference; SELinux is off and no security contexts are used here)
noatime (which I suspect is the root cause)
nodev (should not affect rsync)
nosuid (likewise should not affect rsync)

I am pretty sure that mounting our imap-spool with noatime causes deadlocks under high I/O load. So there is most likely a bug within the kernel, either in ext3 or the scheduler, present at least since 2.6.19. I provided the stack trace of such a situation with the original bug report. Is there anything more I could provide to narrow the issue down?
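To rule out that fstab and the kernel disagree, the effective mount options can be compared against the requested ones (a sketch):

  grep /var/spool/imap /etc/fstab     # what we ask for
  grep /var/spool/imap /proc/mounts   # what the kernel actually applied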
(In reply to comment #5)
> Anything more I could provide to narrow the issue down?

In the meantime I recompiled 2.6.22.1-27.fc7 with just the two-line patch from Richard Kennedy (http://lkml.org/lkml/2007/8/2/89), which seems to have fixed this issue for now. I am still watching the systems closely to be sure that the patch indeed has the intended effect.

Regards,
Matthias
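For reference, the rebuild followed the usual Fedora kernel RPM workflow, roughly like this (a sketch; the patch filename is illustrative, and the patch should be declared and applied the same way the F7 spec handles its existing patches):

  rpm -ivh kernel-2.6.22.1-27.fc7.src.rpm
  cp ~/balance-dirty-pages.patch /usr/src/redhat/SOURCES/
  cd /usr/src/redhat/SPECS
  # declare the patch in kernel-2.6.spec and apply it in %prep
  # alongside the existing patches, then:
  rpmbuild -bb --target $(uname -m) kernel-2.6.spec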
Created attachment 160963 [details]
Patch to fix the issue

Posted by Richard Kennedy on LKML: http://lkml.org/lkml/2007/8/2/89
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel:

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself on this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? Could you also test 2.6.23 from rawhide, as it should have the fix indicated in comment #7?

If the problem no longer exists then please close this bug, or I'll do so in a few days if there is no additional information lodged.

Cheers
Chris
> There hasn't been much activity on this bug for a while. Could you tell me
> if you are still having problems with the latest kernel?

Yes, the bug is still present.

> Could you also test 2.6.23 from rawhide, as it should have the fix indicated
> in comment #7?

The last activity was my mail from August 9th to LKML (see http://lkml.org/lkml/2007/8/9/346). There was no further answer to that.

The patch from comment #7 has resolved the issue (the affected system now has an uptime of nearly 50 days). However, that patch is present neither upstream (checked 2.6.23-rc7) nor in rawhide (checked kernel-2.6.23-0.189.rc6.git8.fc8). So the problem still exists.

Regards,
Matthias
Okay, will re-assign and hopefully get this included soon. Chuck?
This sounds very similar to a problem I have been seeing with FC5 and F7. I am holding one machine (which I can't afford to have hang every couple of days) at 2.6.19-1.2288.2.4.fc5 and it is working fine.
(In reply to comment #11)
> This sounds very similar to a problem I have been seeing with FC5 and F7. I
> am holding one machine (which I can't afford to have hang every couple of
> days) at 2.6.19-1.2288.2.4.fc5 and it is working fine.

If you can post some details to confirm the dupe, that would be helpful - similar output to Matthias', for example.
I submitted bug 235043, which refers to this issue and probably has some of what you want. The two systems I saw the problem on (though one I kept at FC5 and never tried F7 on) have only 512MB of memory and use ext3 file systems with data=journal and write barriers, running on top of software RAID 1 on PATA disks.
Hi, any news here? I'm running the system with the problem mentioned above together with Matthias Hensler. We are currently using 2.6.22.1-27.fc7 (last ChangeLog entry: * Tue Jul 17 2007 John W. Linville <linville>) with the patch from comment #7 and the noatime option enabled, and we haven't faced the problem again. Is this patch, or another patch addressing this problem, included in the rawhide kernel kernel-2.6.23.1-30.fc8? If yes, we would offer to try it on our system.
We just go round and round on linux-kernel and nobody seems to have a safe short-term fix. Supposedly 2.6.25 will have the real fix but it's hundreds of lines of code changes.
FWIW this patch should be much safer: http://lkml.org/lkml/diff/2007/9/20/415/1
Is anybody else using the patch from comment #16? Two days ago I compiled kernel-2.6.22.9-91.bz249563.fc7 including the fix from comment #16. Yesterday the server got slow and somebody from first-level support rebooted it without investigating the situation further. Today the server got partially stuck (some static HTML pages from the webmail frontend still worked, but logins via ssh and serial console timed out). Unfortunately I had no open session to investigate further and had to do a hard reset. For now I have reverted to kernel-2.6.22.1-27.bz249563.fc7 with the fix from comment #7.
I didn't try any of the patches, but I did collect some vmstat and meminfo data and posted a pointer to it on lkml yesterday. I don't know if the right people have looked at it, but during the thread someone mentioned they were interested in seeing that data.
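For anyone who wants to gather comparable data, a loop along these lines is enough (a sketch, not the exact script I used; interval and filenames are arbitrary):

  vmstat 10 >> /root/vmstat.log &
  while true; do
      date >> /root/meminfo.log
      cat /proc/meminfo >> /root/meminfo.log
      sleep 10
  done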
Created attachment 246931 [details]
vmstat and meminfo data from while processes were stuck

This is the mixed vmstat and meminfo data I posted a link to previously. Since it didn't get any attention on lkml, I wanted to put a copy somewhere where anyone who might want to look at it later can find it.
Any news regarding this problem? I'm in the process of updating to FC8, but I don't want to risk being stuck without a working kernel. Was this problem addressed in the FC8 kernel?
If you are already using the latest FC5 or later without problems, then you probably won't run into what I am seeing. I have seen the problem happening continuously since late FC5 through current F8. In fact, about an hour ago I rebooted the machine where this problem triggers, just to clear things up again.
I may also be having the same deadlock. I had /var/spool/squid mounted noatime (just changed it after reading this bug report).

Symptoms were:
- Login prompt at console lets you type something in, but the password prompt doesn't come up
- System is pingable
- ssh attempts time out

I wasn't able to do any sysrq logging, as the person who resets the machine doesn't have enough CS background, and I've never had an active login when it has happened. It happens every 3-7 days.

The system is an older 650MHz PIII with RAID1 and only 512MB of RAM, so it all sounds very similar to the reports above. I wouldn't call my disk I/O unusually high, though. Currently running FC8 with 2.6.23.
I'm not running a plain kernel; I'm running kernel 2.6.22.1-27.bz249563.fc7. This is a locally compiled 2.6.22.1-27 with the additional patch from comment #7. Without the patch I'm facing the problem described in this bug report. I need to know if the problem is fixed in the FC8 kernel, since in comment #17 I tried an updated FC7 kernel with the patch from comment #7 and faced the problem again.
The problem should be fixed properly in 2.6.24.
Setting to NEEDINFO then - could all reporters test with a 2.6.24-based kernel and report their results. Thanks in advance.
OK, will do so in a few days and post the result here.
I turned off noatime, rebooted, and have had the hang/lockup since then with 2.6.23 on FC8.
I have upgraded the machine I have been having problems on to rawhide and am using a 2.6.24 kernel. So far the problem hasn't reoccurred, but it is still too early for me to say the problem has been fixed.
I have had 2.6.24-2.fc9 running continuously for 3.5 days, which, considering that I have been doing significant yum updates in that time, is longer than I would have expected to go without a lockup under the old kernels. I probably need to go a few more days to be really sure, but if things still look good on Friday I will try upgrading the system I have held at FC5 because of this problem.
After a week I rebooted the machine to start using a kernel update. It hadn't made a week in a long time (about 6 months), so I think there is a very good chance the problem is fixed in the 2.6.24 kernel.
Okay, I'm closing this to reflect that, since the original reporter hasn't commented in a while and there's plenty of evidence that things are resolved. Please re-open if the problem re-occurs.

Cheers
Chris
I'm still getting deadlocks of some sort: ssh half responds (the socket connects, but no communication follows), and syslog stops logging. The system is FC8 with a 2.6.24 kernel (I can't check the exact sub-version right now since the system is locked up and at another location). The sysrq keys work for unmounting, killing, and rebooting, which is the SOP when the system locks up like this, about once a week. How do people get the state of the system when this happens?
(In reply to comment #32)
> How do people get the state of the system when this happens?

sysrq-t, but you've got to have some way to capture it.
So I've set up a silly scheme that might work. I've read that for some people an active login still mostly works. So I ssh in and, while there, every 60 seconds echo 't' to the sysrq /proc file. Meanwhile, I use klogctl to read from the kernel ring buffer to stdout. Hopefully, while in this state, existing sockets, sshd and klogctl all keep working. The output gets logged on the host performing the ssh. We'll see if it works; a sketch of the scheme follows below.

If it doesn't, I guess I'd need to boot with a serial port configured as console and take a dump with a null modem and a laptop? I don't see any other way.
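The scheme boils down to something like this (a sketch; dmesg -c stands in for my klogctl reader, since dmesg uses klogctl underneath, and the hostname is a placeholder):

  # Run on the monitoring host; the log lives there, so it survives
  # the server getting stuck:
  ssh root@server '
    while true; do
        echo t > /proc/sysrq-trigger   # dump all task stacks
        dmesg -c                       # read and clear the ring buffer via klogctl
        sleep 60
    done
  ' >> server-sysrq.log

For the serial fallback, booting with console=ttyS0,115200 on the kernel line would send the same output to the serial port.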