Daniel Stodden reported that the issue still exists in RHEL6.1. The problem was that in 6.1, after the fix, we also backported upstream commit 3588fe2 (xen/events: change to using fasteoi, 2010-08-27), which reintroduces the bug, though with much lower frequency. Upstream later reverted to edge-triggering with 7e186bd (xen: do not clear and mask evtchns in __xen_evtchn_do_upcall, 2011-05-06), which is huge but---stripped of stuff we do not have in RHEL6---is almost exactly a revert of 3588fe2.

Email thread follows. After the thread is a summary of the comment trail from cloned bug 550724. The one I'm after revolves around v2.6.32.21-2-gc578392 (on linux-2.6.32.y/master): "xen: handle events as edge-triggered".

---------------------------------------------------------
Daniel Stodden 06:53 PM 08/25/2011:

Symptom is the occasional I/O deadlock when irqbalanced is running, due to event migration under I/O load, which can drop notifications. What happens is that blockfront misses a backend kick, because it's got this supersized hardirq context, which makes it more likely to be affected than other places. The symptom would be the disk wedging.

---------------------------------------------------------
Paolo Bonzini 10:34 AM 08/26/2011:

Hi, was this RHEL6.0 or RHEL6.1? For "regular" events, after changing them to edge-triggered, upstream switched to fasteoi with commit 3588fe2 and then back to edge-triggered with 7e186bd. We have (buggy) level-triggered in RHEL6.0. In RHEL6.1 we changed it to fasteoi as part of resyncing events.c with upstream; we do not have 7e186bd in either RHEL6.1 or RHEL6.2, and likely will not backport it since we do not support pcifront. Shall we change it back to edge-triggered, i.e. revert 3588fe2?

---------------------------------------------------------
Stefano Stabellini 02:03 PM 08/26/2011:

As the author of 7e186bd, I think the best thing you could do is to backport 7e186bd plus all the other related upstream fixes to events.c (see the recent email by Ian Campbell related to Debian, http://marc.info/?l=xen-devel&m=131422114313553). Reverting 3588fe2 would give you an events.c very different from both the upstream Linux and Xenolinux kernels.

---------------------------------------------------------
Paolo Bonzini 03:03 PM 08/26/2011:

Ok, I see. I'm quite sure the full backport is not an option for 6.2. The "new" IRQ infrastructure in upstream also makes the backport pretty hard. That's why I was hoping to get by with a revert. But actually, once I strip 7e186bd of all the pirq stuff, I get exactly a revert of 3588fe2 except for the addition of mask_ack_dynirq. I attach the 2.6.32 backport.

+++ This bug was initially created as a clone of Bug #550724 +++

Created attachment 380486 [details]
A terminal window with a top running shows the progressively degrading state.

Description of problem:
Sometimes under a small load, processes start locking up in D state. Load climbs steadily. Reboot required.

Version-Release number of selected component (if applicable):
Happens with both:
kernel-2.6.31.6-166.fc12.x86_64
kernel-2.6.31.9-174.fc12.x86_64

How reproducible:
Frequency roughly about every 24 hours. Happens when several LTSP sessions are started, or sometimes during logwatch processing, or during rsync backups.

Additional info:
System is a large Xen DomU. Dom0 is able to destroy and re-create the system.
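The event-channel semantics the thread above argues about can be shown with a small userspace model. This is only an illustration, not RHEL's events.c; every name in it (struct evtchn, backend_kick, level_flow, edge_flow) is invented. It shows why a one-shot Xen notification handled with level/fasteoi-style ack-and-forget can be lost for good if it lands while the IRQ is effectively disabled (for example during an irqbalance migration), while edge-style handling latches it and replays it, which is what blockfront needs in order not to miss the backend kick.

/* Toy model of a one-shot Xen event channel notification.
 * NOT the kernel's events.c: names and structure are invented purely to
 * illustrate why level-style handling can lose an edge-like event. */
#include <stdbool.h>
#include <stdio.h>

struct evtchn {
    bool pending;       /* Xen's per-channel pending bit                 */
    bool irq_disabled;  /* guest side: e.g. mid irqbalance rebind        */
    bool sw_pending;    /* edge flow only: software "replay later" latch */
    int  handled;       /* how many times the frontend handler ran       */
};

/* Xen only kicks on the 0 -> 1 transition of pending; it never re-asserts. */
static void backend_kick(struct evtchn *e) { e->pending = true; }

static void run_handler(struct evtchn *e) { e->pending = false; e->handled++; }

/* Level-style flow: an event arriving while the line is disabled is
 * acked and forgotten; nothing will ever re-raise it. */
static void level_flow(struct evtchn *e)
{
    if (!e->pending)
        return;
    if (e->irq_disabled) {
        e->pending = false;      /* ack ... and the kick is gone */
        return;
    }
    run_handler(e);
}

/* Edge-style flow: remember the event and replay it on re-enable. */
static void edge_flow(struct evtchn *e)
{
    if (e->pending && e->irq_disabled) {
        e->pending = false;
        e->sw_pending = true;    /* latch the kick for later */
        return;
    }
    if (e->pending)
        run_handler(e);
}

static void edge_enable(struct evtchn *e)
{
    e->irq_disabled = false;
    if (e->sw_pending) {         /* replay the latched kick */
        e->sw_pending = false;
        e->handled++;
    }
}

int main(void)
{
    struct evtchn a = { .irq_disabled = true };  /* kick lands mid-migration */
    struct evtchn b = { .irq_disabled = true };

    backend_kick(&a); level_flow(&a); a.irq_disabled = false; level_flow(&a);
    backend_kick(&b); edge_flow(&b);  edge_enable(&b);

    printf("level-style: handled %d time(s) -> blockfront never sees the kick\n", a.handled);
    printf("edge-style:  handled %d time(s) -> kick replayed after re-enable\n", b.handled);
    return 0;
}

Compiled with plain cc, the level-style run reports zero handled events and the edge-style run reports one, which is the whole difference between a wedged disk and a serviced ring.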
--- Additional comment from norm@turing.une.edu.au on 2010-02-19 00:31:23 EST ---

This kernel thread seems to mention something like this problem: http://www.mail-archive.com/kvm@vger.kernel.org/msg23039.html There didn't seem to be any resolution to the problem in that thread.

--- Additional comment from norm@turing.une.edu.au on 2010-02-22 15:25:51 EST ---

I've posted this bug in the upstream kernel bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=15370 "task jbd2/dm-1-8:973 blocked for more than 120 seconds."

--- Additional comment from norm@turing.une.edu.au on 2010-02-24 21:26:16 EST ---

Updated to kernel-2.6.32.9-64.fc12.x86_64 and it crashed after about 19 hours. This problem is quite frustrating. I have not been able to reproduce the problem in a different VM on either the same hardware or different hardware. Other VMs seem to withstand heavy disk tests with no problem. It's something about the LTSP workload. The network, I'm sure, is in this mix somewhere. In the not very conclusive thread mentioned in comment #26, it is said: "I kept hitting this issue for 2.6.31 guest kernel even with a simple network test." The system can hum along with labs full of students and work fine for several hours, then suddenly one of the jbd2/dm- tasks won't come out of D state and the rest of the system follows on as more and more processes catch the D state. Other jbd2/dm- tasks catch the D as well after the first one goes. Depending on which jbd2/dm- goes D first, the growth of the D's can be very fast or start slow and then spread at an increasing pace. The problem is load related but it is not a big load. It's some combination of network and disk, I think. I'm at a loss as to what more information I can provide.

--- Additional comment from norm@turing.une.edu.au on 2010-03-01 02:48:33 EST ---

Created attachment 396977 [details]
/usr/lib64/xen/bin/xenctx -s System.map-2.6.32.9-64.fc12.x86_64 just before panic

I also ran the xenctx on each processor just before it panicked as the processes started to hang. Unfortunately I didn't get the console output of the panic other than the line:

Kernel panic - not syncing: hung_task: blocked tasks

There was no output in xm dmesg other than the messages mentioned in comment #9.

--- Additional comment from norm@turing.une.edu.au on 2010-03-08 22:59:08 EST ---

Created attachment 398674 [details]
A different domU VM hung

I guess the xenctx dumps were also inconclusive. I have some slightly new information. I have another domU that has hung. This was a test system that I was about to use to get a 2.6.33 to run, and it hung in an rpm -i kernel... command. The hang seems to be on the /boot filesystem, so I can still run commands and get logs if I don't access /boot or run commands like ps. Same dom0 hardware, so that is still a possible problem, although I've not seen any signs of trouble in dom0. This dom0 only runs two domUs: this test system and the main LTSP system that I have been logging all the above from. Previous disk tests have not been able to make the test system fail. I guess this tells us something. It's probably not an ext4 problem, as /boot is ext3. It's probably nothing to do with the network. The only network activity is my ssh logins. It's not quotas, as they are not turned on. There are not even any users on this system; it's just a small test system. This one is running 2.6.32.3-10.fc12.x86_64 and it seems to have first hung in:

INFO: task kjournald:580 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff88012f57a300 0 580 2 0x00000080
ffff88012fc2fc00 0000000000000246 0000000000000000 ffffffff817f3a80
0000000001865b1e ffffffffb47a46fe ffff88012fc2ffd8 ffff88012fc2ffd8
ffff88012f7ae158 000000000000f980 0000000000015740 ffff88012f7ae158
Call Trace:
[<ffffffff81140972>] ? sync_buffer+0x0/0x44
[<ffffffff81140972>] ? sync_buffer+0x0/0x44
[<ffffffff8145337d>] io_schedule+0x43/0x5d
[<ffffffff811409b2>] sync_buffer+0x40/0x44
[<ffffffff814538d0>] __wait_on_bit+0x48/0x7b
[<ffffffff81012e5b>] ? xen_hypervisor_callback+0x1b/0x20
[<ffffffff81453971>] out_of_line_wait_on_bit+0x6e/0x79
[<ffffffff81140972>] ? sync_buffer+0x0/0x44
[<ffffffff81074a64>] ? wake_bit_function+0x0/0x33
[<ffffffff8113f765>] ? submit_bh+0x109/0x117
[<ffffffff811408d5>] __wait_on_buffer+0x24/0x26
[<ffffffff811bfc12>] wait_on_buffer+0x3d/0x41
[<ffffffff811c019a>] journal_commit_transaction+0x4cf/0xefd
[<ffffffff8104eb5b>] ? finish_task_switch+0x8a/0xac
[<ffffffff81074a2b>] ? autoremove_wake_function+0x0/0x39
[<ffffffff8100e87f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff81454b03>] ? _spin_unlock_irqrestore+0x19/0x1b
[<ffffffff81065479>] ? try_to_del_timer_sync+0x73/0x81
[<ffffffff8100e87f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff811c384c>] kjournald+0xe8/0x225
[<ffffffff81074a2b>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811c3764>] ? kjournald+0x0/0x225
[<ffffffff8107473e>] kthread+0x7f/0x87
[<ffffffff81012d2a>] child_rip+0xa/0x20
[<ffffffff81011f11>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8101269d>] ? retint_restore_args+0x5/0x6
[<ffffffff81012d20>] ? child_rip+0x0/0x20

It's still possible to log into this system and do a restricted set of things. Is there any other information I can collect while it's in this state? For example, cat /proc/*/stat can show me the blocked processes. Anything else? The system is installed on one LV, with that divided into a number of partitions:

/dev/mapper/SYSTEM-root / ext4 defaults 1 1
UUID=3118ec38-93e1-456c-995f-c67217901296 /boot ext3 defaults 1 2
/dev/mapper/SYSTEM-tmp /tmp ext4 defaults 1 2
/dev/mapper/SYSTEM-var /var ext4 defaults 1 2
/dev/mapper/SYSTEM-mail /var/spool/mail ext4 defaults 1 2
/dev/mapper/SYSTEM-opt /opt ext4 defaults 1 2
/dev/mapper/SYSTEM-swap swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0

--- Additional comment from drjones@redhat.com on 2010-03-09 04:35:07 EST ---

Sorry, I haven't had any time to look at this. The backtraces you've captured are useful though. The one where we panic'ed immediately after detecting D-state is this:

Call Trace:
[<ffffffff8122e328>] delay_tsc+0x48 <--
[<ffffffff8122e26f>] __delay+0xf
[<ffffffff8122e2b3>] __const_udelay+0x42
[<ffffffff81453982>] panic+0x131
[<ffffffff810acbfd>] watchdog+0x18e
[<ffffffff8100e87f>] xen_restore_fl_direct_reloc+0x4
[<ffffffff810aca6f>] hung_task_panic+0x17
[<ffffffff810745b6>] kthread+0x7f
[<ffffffff81012d6a>] child_rip+0xa
[<ffffffff81011f51>] int_ret_from_sys_call+0x7
[<ffffffff810126dd>] retint_restore_args+0x5
[<ffffffff81012d60>] kernel_thread+0xe0

It shows that the watchdog tripped after we called xen_restore_fl_direct(). xen_restore_fl_direct() just checks if there are pending events and if so, then makes a simple xen-version hypercall in order to trap to Xen, which will force event channel processing. All the other stacks you've captured also contain xen_restore_fl_direct().
This indicates to me that Xen is getting messed up at some point, and then each process eventually gets hung up after trapping down to it. It further looks like the bug is in Xen, and not the guest, since the D-state problem transferred to the other domain on your system as well, once again with xen_restore_fl_direct() on the stack. All that said, I don't know how exactly, or what, is getting Xen messed up. It does seem likely to be related to your LTSP workload though, since you're currently the only reporter I'm aware of. Maybe we can get 'xm dmesg' output before you start the LTSP workload, and then on some regular interval after you start it, and then of course right after all the processes lock up in D state. Please also make sure all debug messages are turned on by adding the following to your xen.gz line in grub:

loglvl=all guest_loglvl=all debug_stack_lines=80

--- Additional comment from vevroux@msn.com on 2010-04-30 07:27:02 EDT ---

Hi, I have the exact same issue with xen 3.1.4 running on FC8 DOM0 2.6.21.7-5.fc8xen.

INFO: task kjournald:851 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff88001ce9ca80 0 851 2 0x00000000
ffff880002d1daf0 0000000000000246 ffff88001ce1d908 ffff88001e0cb008
000000000073335c 0000000000000001 ffff880002d1db58 ffff88001d5bdb00
ffff88001cde1b48 000000000000f8e0 ffff88001cde1b48 0000000000015600
Call Trace:
[<ffffffff8106e897>] ? clocksource_read+0xf/0x11
[<ffffffff8106aca7>] ? ktime_get_ts+0x4e/0x53
[<ffffffff81419393>] io_schedule+0x31/0x42
[<ffffffff811e9503>] get_request_wait+0xeb/0x179
[<ffffffff81067ae7>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811e982c>] __make_request+0x29b/0x3c5
[<ffffffff811e71cb>] generic_make_request+0x24e/0x2ac
[<ffffffff8100e7f2>] ? check_events+0x12/0x20
[<ffffffff811e730c>] submit_bio+0xe3/0x100
[<ffffffff8100e7f2>] ? check_events+0x12/0x20
[<ffffffff8111b2f3>] submit_bh+0xf4/0x117
[<ffffffff8100e7df>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff81196b39>] journal_do_submit_data+0x3c/0x4a
[<ffffffff81197055>] journal_commit_transaction+0x3be/0xe55
[<ffffffff81048fcb>] ? finish_task_switch+0x6d/0xe6
[<ffffffff8100e7df>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff8141ab29>] ? _spin_unlock_irqrestore+0x19/0x1b
[<ffffffff8105c11f>] ? try_to_del_timer_sync+0x5a/0x66
[<ffffffff8100e7df>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff8119a78b>] kjournald+0xeb/0x230
[<ffffffff81067ae7>] ? autoremove_wake_function+0x0/0x39
[<ffffffff8141ab29>] ? _spin_unlock_irqrestore+0x19/0x1b
[<ffffffff8119a6a0>] ? kjournald+0x0/0x230
[<ffffffff81067765>] kthread+0x91/0x99
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81011f97>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8101271d>] ? retint_restore_args+0x5/0x6
[<ffffffff81012da0>] ?
child_rip+0x0/0x20 vif3.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1048 errors:0 dropped:0 overruns:0 frame:0 TX packets:2082 errors:0 dropped:32 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:114497 (111.8 KiB) TX bytes:248113 (242.2 KiB) vif4.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:363 errors:0 dropped:0 overruns:0 frame:0 TX packets:414 errors:0 dropped:35 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:43364 (42.3 KiB) TX bytes:55012 (53.7 KiB) vif5.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3127 errors:0 dropped:0 overruns:0 frame:0 TX packets:2850 errors:0 dropped:16 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:443692 (433.2 KiB) TX bytes:404390 (394.9 KiB) vif6.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:436 errors:0 dropped:0 overruns:0 frame:0 TX packets:1087 errors:0 dropped:226 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:63671 (62.1 KiB) TX bytes:127551 (124.5 KiB) --- Additional comment from drjones@redhat.com on 2010-05-05 03:23:17 EDT --- Is this really linked to network use? Have you tried running the guests with the network down? --- Additional comment from norm@turing.une.edu.au on 2010-05-07 03:40:16 EDT --- Andrew, I get the impression you would be surprised if this problem was related to network use? Is there a reason for this? --- Additional comment from norm@turing.une.edu.au on 2010-05-14 12:57:50 EDT --- Created attachment 414109 [details] guest panic on disk test Well I finally got brave and ran some disk tests in production. The result was a panic in the guest within seconds. Attached is the stack trace. syslogd also wall'ed this: Message from syslogd@turing at May 15 02:24:59 ... kernel:------------[ cut here ]------------ Message from syslogd@turing at May 15 02:24:59 ... kernel:invalid opcode: 0000 [#1] SMP Message from syslogd@turing at May 15 02:24:59 ... kernel:last sysfs file: /sys/devices/vbd-51712/block/xvda/stat Message from syslogd@turing at May 15 02:24:59 ... kernel:Stack: Message from syslogd@turing at May 15 02:24:59 ... kernel:Call Trace: Message from syslogd@turing at May 15 02:24:59 ... kernel:Code: 8b 38 48 8b 70 08 48 89 77 08 48 89 3e 48 89 00 48 89 40 08 66 ff 03 ff 14 25 c8 a7 6b 81 48 8b 42 f8 48 83 e0 fc 48 39 c3 74 04 <0f> 0b eb fe f0 80 62 f8 fe 48 89 cf ff 55 98 48 8b 45 80 8b 90 Timeout, server not responding. The disk test was an fio test, run with "sudo fio four-threads-randio.fio" where four-threads-randio.fio contains: ; Four threads, two query, two writers. [global] rw=randread size=256m directory=/tmp/fio-testing/data ioengine=libaio iodepth=4 invalidate=1 direct=1 [bgwriter] rw=randwrite iodepth=32 [queryA] iodepth=1 ioengine=mmap direct=0 thinktime=3 [queryB] iodepth=1 ioengine=mmap direct=0 thinktime=5 [bgupdater] rw=randrw iodepth=16 thinktime=40 size=32m Note that Dom0 logged no disk errors, or any error, in dmesg, syslog or "xm dmesg" on this crash. 
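Earlier in the trail (the comment of 2010-03-08) Norm asks what else can usefully be collected while the guest is wedged, beyond cat /proc/*/stat. One low-impact option is a small scanner that walks /proc and prints every task currently in uninterruptible sleep. The helper below is only an illustration of that idea, not part of any fix or of the reporter's setup; it simply parses the third field of /proc/<pid>/stat.

/* List tasks stuck in uninterruptible sleep ('D'), the symptom described
 * throughout this bug.  Illustrative helper only; it is the programmatic
 * equivalent of eyeballing the third field of /proc/<pid>/stat. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int blocked = 0;

    if (!proc) { perror("/proc"); return 1; }

    while ((de = readdir(proc)) != NULL) {
        char path[288], comm[64], state;
        int pid;
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;
        snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;             /* task exited while we were scanning */
        /* /proc/<pid>/stat begins: pid (comm) state ... */
        if (fscanf(f, "%d (%63[^)]) %c", &pid, comm, &state) == 3 &&
            state == 'D') {
            printf("%6d  %s\n", pid, comm);
            blocked++;
        }
        fclose(f);
    }
    closedir(proc);
    printf("%d task(s) in D state\n", blocked);
    return 0;
}

Run repeatedly (for example under watch), it gives the same picture as the top listings later in this trail: the jbd2/flush threads go first, then crond, sendmail and dump pile up behind them.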
--- Additional comment from norm@turing.une.edu.au on 2010-05-31 23:15:26 EDT ---

The most consistent way to trigger this bug (>50%) is to run, on a remote system:

ssh turing dump 3uf - /dev/mapper/SYSTEM-root | /usr/bin/gzip -9 > backup.root.gz

turing is the DomU system that crashes. The remote system is located on totally different hardware. Seems to me that the workload here is a lot of disk reads and a lot of network writes.

--- Additional comment from norm@turing.une.edu.au on 2010-06-20 17:41:26 EDT ---

I have finally been able to reproduce this on completely different hardware while running a command similar to the one in comment #58. No stack traces, as nothing was logged to syslog and I didn't turn on hung_task_panic=1 for this kernel. This is different hardware to all the other reports in this bug. The host software is CentOS 5.5 running 2.6.18-194.3.1.el5xen. I had a top running in a screen session of the single Fedora 12 domU:

top - 07:26:30 up 2 days, 21:34, 1 user, load average: 24.99, 24.97, 24.53
Tasks: 149 total, 2 running, 146 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 0.0%id, 99.8%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 4060828k total, 4040752k used, 20076k free, 3486120k buffers
Swap: 6291448k total, 320k used, 6291128k free, 134136k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
235 root 20 0 0 0 0 D 0.0 0.0 0:00.18 jbd2/dm-0-8
887 root 20 0 0 0 0 D 0.0 0.0 0:00.24 flush-253:0
1321 ntp 20 0 60716 2776 1904 D 0.0 0.1 0:00.27 ntpd
1384 postgres 20 0 183m 4988 4192 D 0.0 0.1 0:07.96 postmaster
1409 postgres 20 0 155m 1232 444 D 0.0 0.0 0:00.08 postmaster
1411 root 20 0 62464 1436 844 D 0.0 0.0 0:00.32 amd
1441 postgres 20 0 183m 1652 716 R 0.0 0.0 0:03.75 postmaster
1442 postgres 20 0 155m 1392 516 D 0.0 0.0 0:04.80 postmaster
1470 root 20 0 255m 5852 4212 D 0.0 0.1 0:00.37 abrtd
9076 root 20 0 54076 2588 1856 R 0.0 0.1 8:09.33 top
22487 root 20 0 23256 1948 144 D 0.0 0.0 0:14.18 dump
22488 root 20 0 23256 1948 144 D 0.0 0.0 0:14.07 dump
22489 root 20 0 23256 1948 144 D 0.0 0.0 0:14.47 dump
22530 root 20 0 109m 788 236 D 0.0 0.0 0:00.00 crond
22607 root 20 0 71840 2016 420 D 0.0 0.0 0:00.00 sendmail
22608 smmsp 20 0 59156 1628 408 D 0.0 0.0 0:00.00 sendmail
22609 root 20 0 109m 788 236 D 0.0 0.0 0:00.00 crond
22610 root 20 0 71840 2016 420 D 0.0 0.0 0:00.00 sendmail
22611 smmsp 20 0 59156 1628 408 D 0.0 0.0 0:00.00 sendmail
22612 root 20 0 109m 788 236 D 0.0 0.0 0:00.00 crond
22613 root 20 0 71840 2016 420 D 0.0 0.0 0:00.00 sendmail
22614 smmsp 20 0 59156 1628 408 D 0.0 0.0 0:00.00 sendmail
22615 root 20 0 109m 788 236 D 0.0 0.0 0:00.00 crond
22616 root 20 0 71840 2016 420 D 0.0 0.0 0:00.00 sendmail
22617 smmsp 20 0 59156 1628 408 D 0.0 0.0 0:00.00 sendmail

--- Additional comment from norm@turing.une.edu.au on 2010-06-21 02:00:04 EDT ---

Created attachment 425531 [details]
A sysrq l stack of domU

Tried to do an "xm sysrq N w" to show stacks of blocked tasks on the hung domU. Nothing was displayed. An "xm sysrq N l" showed the attached stack. Looks a bit different to the other stacks, though I'm not sure if it's useful.

--- Additional comment from norm@turing.une.edu.au on 2010-07-01 23:40:32 EDT ---

Could this problem be related to timers or the clock source in any way? In a bug that I think is similar to this, #526627, the clock source was the suspected cause for a while. Also, comment #54 above reports a very similar problem with KVM in bug #563672. My domU once logged:

Jun 11 01:12:51 turing kernel: hrtimer: interrupt took 1645273 ns

out of the blue.
That's the only time I have ever seen that message. DomU currently reports:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
xen tsc
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
xen

--- Additional comment from norm@turing.une.edu.au on 2010-07-28 03:03:28 EDT ---

Just noticed that this bug is referenced here: http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00590.html No solution there either. One comment from Rick Boone was:

> I booted into 2.6.32.12 yesterday, but I'm still getting a hung task crash. (Initially, I got a soft lockup crash on one of the cpu's, after switching to 2.6.32.12, but it seems I was able to resolve that by switching the kernel to tickless)

--- Additional comment from drjones@redhat.com on 2010-08-20 01:56:50 EDT ---

I think I've reproduced this bug on a very recent rhel6 kernel. When I mount these file systems I get this log to the console

JBD: barrier-based sync failed on xvdb - disabling barriers

and the first tasks to start hanging (with ext3) are

kjournald, flush-202 - not always that order (reproduced 3 times)

and now on the ext4 reproduced case it was

jbd2/xvda2-8

My reproducer is currently to simply build a kernel on these file systems and wait.

Andrew

--- Additional comment from norm@turing.une.edu.au on 2010-08-22 06:39:14 EDT ---

(In reply to comment #74)
> Ah, nevermind. Just got it to reproduce with ext4. Only notes I have right now
> is that when I mount these file systems I get this log to the console

All my crashes have been with ext4 (with or without quotas enabled) except for the crash mentioned in comment #34, which was ext3 on a /boot partition.

> JBD: barrier-based sync failed on xvdb - disabling barriers

Hmm, this message is familiar for some reason but I can't find it in any of my current logs.

> and the first tasks to start hanging (with ext3) are
>
> kjournald, flush-202 - not always that order (reproduced 3 times)
>
> and now on the ext4 reproduced case it was
>
> jbd2/xvda2-8

jbd2/dm* for my ext4 filesystems. Any one could go first, but the most likely seems to be /var. All systems that have had this problem for me have been on one or more LVs of the host system. I have not tried image files.

> My reproducer is currently to simply build a kernel on these file systems and
> wait.

My feeling on the workload that reproduces this is that it has to be stop/start disk activity. Just straight heavy I/O rarely hits this problem in my testing. The network also figures prominently in the mix. After the jbd2 hang, the first processes to hang are usually things like httpd, nscd, smbd and imap. Also, backups with dump across the network seem to trigger the problem reasonably often. I'll put my test system (which has seen the problem only twice) into a loop of kernel building and see what happens. The production system that sees the problem often is an LTSP and nxclient server. So that's lots of desktop users using browsers, compilers, editors, etc. Lots of stop/start network and disk I/O.

--- Additional comment from norm@turing.une.edu.au on 2010-08-23 17:48:48 EDT ---

(In reply to comment #75)
> (In reply to comment #74)
> > > JBD: barrier-based sync failed on xvdb - disabling barriers
> Hmm, this message is familiar for some reason but I can't find it in any of my
> current logs.

I saw a discussion of this message on the ext4 mailing list. I don't get it in my logs as far as I can tell.
http://marc.info/?t=128017340300003&r=1&w=2

> > My reproducer is currently to simply build a kernel on these file systems and
> > wait.
> I'll put my test system (which has seen the problem only twice) into a loop of
> kernel building and see what happens.

My test system, building the kernel, went into this state after 2 days. However, it did it when doing a backup (dump) to a remote system again and/or sending an email.

top - 07:29:45 up 4 days, 14:24, 3 users, load average: 25.99, 25.96, 25.35
Tasks: 153 total, 1 running, 152 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 0.0%id, 99.9%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4060836k total, 2080144k used, 1980692k free, 128332k buffers
Swap: 6291448k total, 5444k used, 6286004k free, 374432k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20361 root 20 0 14908 1172 856 R 0.3 0.0 4:18.03 top
237 root 20 0 0 0 0 D 0.0 0.0 6:29.73 jbd2/dm-0-8
825 root 20 0 0 0 0 D 0.0 0.0 1:36.20 flush-253:0
919 root 20 0 8456 908 580 D 0.0 0.0 0:00.10 patch
920 root 20 0 170m 2644 1680 D 0.0 0.1 0:00.00 crond
923 root 20 0 170m 2644 1680 D 0.0 0.1 0:00.00 crond
926 root 20 0 111m 976 404 D 0.0 0.0 0:00.00 crond
927 smmsp 20 0 59152 1656 420 D 0.0 0.0 0:00.00 sendmail
928 root 20 0 71836 2040 432 D 0.0 0.1 0:00.00 sendmail
929 root 20 0 115m 1112 524 D 0.0 0.0 0:00.00 crond
930 smmsp 20 0 59152 1656 420 D 0.0 0.0 0:00.00 sendmail
932 root 20 0 71836 2040 432 D 0.0 0.1 0:00.00 sendmail
933 root 20 0 117m 1136 540 D 0.0 0.0 0:00.00 crond
934 smmsp 20 0 59152 1656 420 D 0.0 0.0 0:00.00 sendmail
935 root 20 0 71836 2040 432 D 0.0 0.1 0:00.00 sendmail
936 root 20 0 128m 1244 612 D 0.0 0.0 0:00.00 crond
937 smmsp 20 0 59152 1656 420 D 0.0 0.0 0:00.00 sendmail
938 root 20 0 71836 2044 432 D 0.0 0.1 0:00.00 sendmail
1218 ntp 20 0 60708 1736 868 D 0.0 0.0 0:00.33 ntpd
1281 postgres 20 0 183m 3416 2776 D 0.0 0.1 0:01.25 postmaster
1373 postgres 20 0 183m 1068 396 D 0.0 0.0 0:01.44 postmaster
1374 postgres 20 0 155m 908 268 D 0.0 0.0 0:01.61 postmaster
32393 root 20 0 23256 1920 144 D 0.0 0.0 0:00.39 dump
32394 root 20 0 23256 1920 144 D 0.0 0.0 0:00.31 dump
32395 root 20 0 23256 1920 144 D 0.0 0.0 0:00.29 dump

--- Additional comment from norm@turing.une.edu.au on 2010-08-23 17:58:51 EDT ---

(In reply to comment #76)
> > > JBD: barrier-based sync failed on xvdb - disabling barriers
> I don't get it in my logs as far as I can tell.

Er, yes I do. It has appeared in some of the ancient logs of my test system.

--- Additional comment from norm@turing.une.edu.au on 2010-08-26 21:43:23 EDT ---

In 5 days of rebuilding the kernel, my test system crashed once, as described in comment #76. So it's still pretty tough to reproduce this in test. My production system crashed 13 times during the same period. One observation: the crash in comment #34 was a pivotal moment. It was the one that got this bug moved from a fedora kernel (guest) to the enterprise kernel (the host). Do we really think this problem is in the host kernel or even Xen? KVM guests seem to also see something similar. Comment #34 was also interesting in that there was little network activity on the guest that crashed. My main crashing guest was also running on the host at the same time but it didn't crash (then).

--- Additional comment from drjones@redhat.com on 2010-09-02 12:34:20 EDT ---

Right, good question (where does this bug belong). If the problem on KVM hosts really seems to be the same (I'll try to reproduce that soon), then we can focus more on the guest.
At this time though, I still want to dig deeper on what I saw in comment 36. I'll try to get to this asap, but I still have some other issues to knock off the TODO list first, sorry. Drew --- Additional comment from eric@pkill.info on 2010-09-04 01:02:17 EDT --- We have experienced the same problem: for high CPU tasks, the PV F12 DomU works well; for hight I/O tasks, the F12 DomU crashes and the Dom0 also crashes. One oops message: Sep 2 21:27:33 node04 kernel: INFO: task ext4-dio-unwrit:990 blocked for more than 120 seconds. Sep 2 21:27:33 node04 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 2 21:27:33 node04 kernel: ext4-dio-unwr D ffff8800157890c0 0 990 2 0x00000000 Sep 2 21:27:33 node04 kernel: ffff880012e39d40 0000000000000246 0000000003f44d9c 0000000000006ad1 Sep 2 21:27:33 node04 kernel: ffff8800124e44d8 ffff8800124e44a0 ffff880012e39fd8 ffff880012e39fd8 Sep 2 21:27:33 node04 kernel: ffff8800124e4870 0000000000007d30 000000000000b0c0 ffff8800124e4870 Sep 2 21:27:33 node04 kernel: Call Trace: Sep 2 21:27:33 node04 kernel: [<ffffffff8102d986>] ? set_next_entity+0x28/0x78 Sep 2 21:27:33 node04 kernel: [<ffffffff8140ae01>] __mutex_lock_common+0x11e/0x18d Sep 2 21:27:33 node04 kernel: [<ffffffff8140ae89>] __mutex_lock_slowpath+0x19/0x1b Sep 2 21:27:33 node04 kernel: [<ffffffff8140af9a>] mutex_lock+0x31/0x4b Sep 2 21:27:33 node04 kernel: [<ffffffff81157b29>] ext4_end_aio_dio_work+0x2c/0x6d Sep 2 21:27:33 node04 kernel: [<ffffffff81053705>] worker_thread+0x1c3/0x24f Sep 2 21:27:33 node04 kernel: [<ffffffff81157afd>] ? ext4_end_aio_dio_work+0x0/0x6d Sep 2 21:27:33 node04 kernel: [<ffffffff81057b54>] ? autoremove_wake_function+0x0/0x39 Sep 2 21:27:33 node04 kernel: [<ffffffff81053542>] ? worker_thread+0x0/0x24f Sep 2 21:27:33 node04 kernel: [<ffffffff81057711>] kthread+0x7f/0x87 Sep 2 21:27:33 node04 kernel: [<ffffffff8100ab2a>] child_rip+0xa/0x20 Sep 2 21:27:33 node04 kernel: [<ffffffff81057692>] ? kthread+0x0/0x87 Sep 2 21:27:33 node04 kernel: [<ffffffff8100ab20>] ? child_rip+0x0/0x20 Sep 2 21:27:33 node04 kernel: INFO: task tapdisk:18044 blocked for more than 120 seconds. Sep 2 21:27:33 node04 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 2 21:27:33 node04 kernel: tapdisk D ffff8800157890c0 0 18044 1 0x00000080 Sep 2 21:27:33 node04 kernel: ffff88006cc29738 0000000000000286 ffff88006cc29708 ffff88007e3d4cb0 Sep 2 21:27:33 node04 kernel: ffff88007e3d4cb0 ffff880000000002 ffff88006cc29fd8 ffff88006cc29fd8 Sep 2 21:27:33 node04 kernel: ffff88004064c870 0000000000007d30 000000000000b0c0 ffff88004064c870 Sep 2 21:27:33 node04 kernel: Call Trace: Sep 2 21:27:33 node04 kernel: [<ffffffff81189bab>] do_get_write_access+0x200/0x40e Sep 2 21:27:33 node04 kernel: [<ffffffff81057b8d>] ? wake_bit_function+0x0/0x33 Sep 2 21:27:33 node04 kernel: [<ffffffff8118fc7e>] ? jbd2_journal_add_journal_head+0xbf/0x13d Sep 2 21:27:33 node04 kernel: [<ffffffff81189eff>] jbd2_journal_get_write_access+0x2b/0x44 Sep 2 21:27:33 node04 kernel: [<ffffffff811538ab>] ? ext4_read_block_bitmap+0x54/0x29c Sep 2 21:27:33 node04 kernel: [<ffffffff81176639>] __ext4_journal_get_write_access+0x56/0x60 Sep 2 21:27:33 node04 kernel: [<ffffffff811791e2>] ext4_mb_mark_diskspace_used+0x72/0x333 Sep 2 21:27:33 node04 kernel: [<ffffffff81178a48>] ? ext4_mb_initialize_context+0x7d/0x18d Sep 2 21:27:33 node04 kernel: [<ffffffff8117bfc8>] ext4_mb_new_blocks+0x1e8/0x42c Sep 2 21:27:33 node04 kernel: [<ffffffff811736c0>] ? 
ext4_ext_find_extent+0x130/0x27f Sep 2 21:27:33 node04 kernel: [<ffffffff81175945>] ext4_ext_get_blocks+0x1289/0x14e0 Sep 2 21:27:33 node04 kernel: [<ffffffff81021dd6>] ? xen_multicall_flush+0x2c/0x2e Sep 2 21:27:33 node04 kernel: [<ffffffff810be586>] ? unmap_vmas+0x663/0x7b8 Sep 2 21:27:33 node04 kernel: [<ffffffff81188f03>] ? start_this_handle+0x3dd/0x402 Sep 2 21:27:33 node04 kernel: [<ffffffff8115b011>] ext4_get_blocks+0x152/0x30f Sep 2 21:27:33 node04 kernel: [<ffffffff8115b246>] ext4_get_block_dio_write+0x78/0xb6 Sep 2 21:27:33 node04 kernel: [<ffffffff8110bc19>] __blockdev_direct_IO+0x521/0xae0 Sep 2 21:27:33 node04 kernel: [<ffffffff81159f76>] ext4_direct_IO+0x1b6/0x247 Sep 2 21:27:33 node04 kernel: [<ffffffff8115b1ce>] ? ext4_get_block_dio_write+0x0/0xb6 Sep 2 21:27:33 node04 kernel: [<ffffffff81157b6a>] ? ext4_end_io_dio+0x0/0x85 Sep 2 21:27:33 node04 kernel: [<ffffffff810a7cf6>] generic_file_direct_write+0xed/0x16d Sep 2 21:27:33 node04 kernel: [<ffffffff810a7f0c>] __generic_file_aio_write+0x196/0x286 Sep 2 21:27:33 node04 kernel: [<ffffffff810a805f>] generic_file_aio_write+0x63/0xad Sep 2 21:27:33 node04 kernel: [<ffffffff811549af>] ? ext4_file_write+0x0/0x95 Sep 2 21:27:33 node04 kernel: [<ffffffff81154a3d>] ext4_file_write+0x8e/0x95 Sep 2 21:27:33 node04 kernel: [<ffffffff810d6d97>] ? kmem_cache_alloc+0x47/0x140 Sep 2 21:27:33 node04 kernel: [<ffffffff81114e52>] aio_rw_vect_retry+0x8e/0x19d Sep 2 21:27:33 node04 kernel: [<ffffffff81114dc4>] ? aio_rw_vect_retry+0x0/0x19d Sep 2 21:27:33 node04 kernel: [<ffffffff811168e7>] aio_run_iocb+0x79/0x11f Sep 2 21:27:33 node04 kernel: [<ffffffff8111748d>] sys_io_submit+0x5db/0x6c6 Sep 2 21:27:33 node04 kernel: [<ffffffff8100a078>] system_call_fastpath+0x16/0x1b Sep 2 21:27:33 node04 kernel: [<ffffffff8100a010>] ? system_call+0x0/0x52 Another one: INFO: task tapdisk:12896 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. tapdisk D ffff8800157890c0 0 12896 1 0x00000080 ffff88007ce871a8 0000000000000282 0000000000000000 ffff880001c62dc0 0000001001080001 ffff8800586e27e0 ffff88007ce87fd8 ffff88007ce87fd8 ffff880001c63190 0000000000007d30 000000000000b0c0 ffff880001c63190 Call Trace: [<ffffffff8140a7fd>] io_schedule+0x73/0xb5 [<ffffffff811d624c>] get_request_wait+0xee/0x17c [<ffffffff81057b54>] ? autoremove_wake_function+0x0/0x39 [<ffffffff811cddf4>] ? elv_merge+0x167/0x19f [<ffffffff811d65c2>] __make_request+0x2e8/0x3fb [<ffffffff811d4c1d>] generic_make_request+0x2c8/0x321 [<ffffffff810a9192>] ? mempool_alloc+0x6c/0x11e [<ffffffff81026966>] ? __wake_up_common+0x4e/0x84 [<ffffffff811d4d59>] submit_bio+0xe3/0x100 [<ffffffff811048c3>] submit_bh+0xf4/0x117 [<ffffffff811063b4>] __block_write_full_page+0x1d8/0x2b3 [<ffffffff8115bf23>] ? noalloc_get_block_write+0x0/0x60 [<ffffffff81105d70>] ? end_buffer_async_write+0x0/0x1ab [<ffffffff81105d70>] ? end_buffer_async_write+0x0/0x1ab [<ffffffff8115bf23>] ? noalloc_get_block_write+0x0/0x60 [<ffffffff81106518>] block_write_full_page_endio+0x89/0x95 [<ffffffff81106539>] block_write_full_page+0x15/0x17 [<ffffffff81158866>] ext4_writepage+0x2ac/0x2f0 [<ffffffff810b16f8>] pageout+0x137/0x1fe [<ffffffff810b1bba>] shrink_page_list+0x265/0x483 [<ffffffff810dae9a>] ? mem_cgroup_del_lru+0x39/0x3b [<ffffffff810b0e8f>] ? isolate_pages_global+0x1ba/0x201 [<ffffffff81008ad3>] ? __switch_to+0x452/0x56f [<ffffffff810b21a9>] shrink_inactive_list+0x3d1/0x6d2 [<ffffffff81057b4f>] ? wake_up_bit+0x25/0x2a [<ffffffff81036c87>] ? 
try_to_wake_up+0x2fc/0x30e [<ffffffff810b2825>] shrink_zone+0x37b/0x44b [<ffffffff810b2b17>] do_try_to_free_pages+0x222/0x399 [<ffffffff810b2d76>] try_to_free_pages+0x6e/0x70 [<ffffffff810b0cd5>] ? isolate_pages_global+0x0/0x201 [<ffffffff810ac912>] __alloc_pages_nodemask+0x3f3/0x63c [<ffffffff810d49dc>] alloc_slab_page+0x1e/0x20 [<ffffffff810d526e>] __slab_alloc+0x16b/0x488 [<ffffffff8110b812>] ? __blockdev_direct_IO+0x11a/0xae0 [<ffffffff8110b812>] ? __blockdev_direct_IO+0x11a/0xae0 [<ffffffff810d5b7f>] kmem_cache_alloc_notrace+0x7d/0xf2 [<ffffffff8110b812>] __blockdev_direct_IO+0x11a/0xae0 [<ffffffff810aeee4>] ? pagevec_lookup+0x22/0x2b [<ffffffff810afff5>] ? invalidate_inode_pages2_range+0x27e/0x2b7 [<ffffffff81159f76>] ext4_direct_IO+0x1b6/0x247 [<ffffffff8115b1ce>] ? ext4_get_block_dio_write+0x0/0xb6 [<ffffffff81157b6a>] ? ext4_end_io_dio+0x0/0x85 [<ffffffff810a7cf6>] generic_file_direct_write+0xed/0x16d [<ffffffff810a7f0c>] __generic_file_aio_write+0x196/0x286 [<ffffffff810f0803>] ? set_fd_set+0x50/0x5b [<ffffffff810f1858>] ? core_sys_select+0x1d6/0x20a [<ffffffff810a805f>] generic_file_aio_write+0x63/0xad [<ffffffff811549af>] ? ext4_file_write+0x0/0x95 [<ffffffff81154a3d>] ext4_file_write+0x8e/0x95 [<ffffffff810d6d97>] ? kmem_cache_alloc+0x47/0x140 [<ffffffff81114e52>] aio_rw_vect_retry+0x8e/0x19d [<ffffffff81114dc4>] ? aio_rw_vect_retry+0x0/0x19d [<ffffffff811168e7>] aio_run_iocb+0x79/0x11f [<ffffffff8111748d>] sys_io_submit+0x5db/0x6c6 [<ffffffff8100a078>] system_call_fastpath+0x16/0x1b [<ffffffff8100a010>] ? system_call+0x0/0x52

A brief list of the hardware and configuration:

Dom0's hardware platform:
Motherboard: INTEL S5500BC S5500 Quad Core Xeon Server Board
CPU: 2 x Intel Quad Core Xeon E5520 2.26G (5.86GT/sec,8M,Socket 1366)
Memory: 8 x Kingston DDR-3 1333MHz 4GB ECC REG.
CL9 DIMM w/Parity & Thermal Sensor
HD: 4 x WD WD10EARS 1 TB, SATA II 3Gb/s, 64 MB Cache

Dom0: F12, with kernel 2.6.32.13 + 2.6.32 Xen patches v2 from http://gentoo-xen-kernel.googlecode.com/files/xen-patches-2.6.32-2.tar.bz2

Dom0 configuration:
Processor type and features --->
[*] Symmetric multi-processing support
[*] Support sparse irq numbering
[*] Enable Xen compatible kernel
Preemption Model (No Forced Preemption (Server)) --->
Device Drivers --->
XEN --->
[*] Privileged Guest (domain 0)
<*> Backend driver support (NEW)
<*> Block-device backend driver (NEW)
<*> Block-device tap backend driver (NEW)
< > Block-device tap backend driver 2 (NEW)
<*> Network-device backend driver (NEW)
(8) Maximum simultaneous transmit requests (as a power of 2) (NEW)
[ ] Pipelined transmitter (DANGEROUS) (NEW)
< > Network-device loopback driver (NEW)
< > PCI-device backend driver (NEW)
< > TPM-device backend driver (NEW)
<M> SCSI backend driver (NEW)
<M> USB backend driver (NEW)
<M> Block-device frontend driver
<M> Network-device frontend driver
<M> Network-device frontend driver acceleration for Solarflare NICs (NEW)
<M> SCSI frontend driver (NEW)
<M> USB frontend driver (NEW)
[*] Taking the HCD statistics (for debug) (NEW)
[ ] HCD suspend/resume support (DO NOT USE) (NEW)
<*> User-space granted page access driver (NEW)
<*> Framebuffer-device frontend driver (NEW)
<*> Keyboard-device frontend driver (NEW)
[*] Disable serial port drivers (NEW)
<*> Export Xen attributes in sysfs (NEW)
(256) Number of guest devices (NEW)
Xen version compatibility (no compatibility code) --->
[*] Place shared vCPU info in per-CPU storage (NEW)

DomU: F12 with pv_ops kernel: 2.6.32.19-163.fc12.x86_64

The details of the hardware and kernel configuration are here: http://pkill.info/b/2405/setting-up-stable-xen-dom0-with-fedora-xen-3-4-3-with-xenified-linux-kernel-2-6-32-13-in-fedora-12/

Our solution now: disable the hard disk cache with:

hdparm -W0 /dev/sdX

After disabling the hard disk cache, the VMs work. When we do a middle size (processing 20GB of data simultaneously on 8 VMs on top of 2 physical machines) intensive I/O test, the VMs and Dom0 don't crash. We will run a full size (about 800GB of data on these 8 VMs) experiment and see whether it works.

--- Additional comment from norm@turing.une.edu.au on 2010-09-17 02:34:22 EDT ---

Some recent threads that look like this problem:

http://forums.citrix.com/thread.jspa?threadID=272708 (Xen block devices stop responding in guest)

That led to this kernel mailing list thread:

http://lkml.org/lkml/2010/9/1/178 (I/O scheduler deadlocks on Xen virtual block devices)

The main points from the kernel thread:
- Problem probably in the Xen I/O stack and not the guest scheduler.
- These patches might be the real fixes:
xen: use percpu interrupts for IPIs and VIRQs
xen: handle events as edge-triggered

A work-around might be to disable irqbalance in the guest. This seems to have been the case so far in the citrix thread mentioned above. I'm now running my main crashing guest without irqbalance.

--- Additional comment from eric@pkill.info on 2010-09-17 03:07:25 EDT ---

We have done experiments on LVM backed DomUs. The DomUs have this problem as in comment 80, and the other configurations (hardware, the Dom0, DomU's kernel, etc.) are the same except the backing VBD (moved from file / tap:aio to LVM backed). The 8 LVM backed DomUs work well so far with around 5GB of writes and 5GB of reads for each DomU. Hope this piece of info can help.
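Norm's comment above points the finger at irqbalance migrating the guest's event-channel IRQs, with disabling irqbalance as the stop-gap. If you want to see that migration actually happening before pulling the plug on irqbalance, a trivial poller over /proc/irq/<n>/smp_affinity shows each rewrite. The program below is only a diagnostic sketch, not part of any fix; the IRQ number to pass is whatever /proc/interrupts shows for the blkif event channel on the guest in question.

/* Watch /proc/irq/<n>/smp_affinity and report whenever irqbalance (or
 * anything else) rewrites it.  Diagnostic sketch only; pass the IRQ number
 * of the blkif/event-channel interrupt seen in /proc/interrupts. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[64], prev[64] = "", cur[64];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <irq-number>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", argv[1]);

    for (;;) {
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return 1; }
        if (!fgets(cur, sizeof(cur), f)) { fclose(f); return 1; }
        fclose(f);
        cur[strcspn(cur, "\n")] = '\0';      /* strip trailing newline */
        if (prev[0] && strcmp(prev, cur) != 0)
            printf("irq %s affinity changed: %s -> %s\n", argv[1], prev, cur);
        strcpy(prev, cur);
        sleep(1);
    }
}

On a guest where irqbalance is actively rebalancing, the affinity mask for the block-device interrupt changes periodically; with irqbalance stopped it never does, which matches the observed behaviour of the hangs.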
--- Additional comment from drjones@redhat.com on 2010-09-17 09:09:26 EDT ---

Hi Norman,

Great work tracking these mail threads and potentially finding a solution. I did a quick test on my machine. First I confirmed I could still reproduce it with a kernel compile loop with irqbalance on. It did reproduce in something less than 30 minutes. Then I turned off irqbalance and ran the kernel compile loop for a few hours. But then I turned irqbalance back on to confirm I could again trigger it quickly, but still haven't seen it in almost an hour, so currently my testing isn't super conclusive. I'm anxious to hear your results, since I know you've been plagued with it frequently.

Drew

--- Additional comment from norm@turing.une.edu.au on 2010-09-19 22:25:00 EDT ---

Turning off irqbalance is looking very promising. I have not had a crash since doing so, and the system has been through some significant workload. Since the 23rd of March this year I've been recording (most of) the times of crashes. Rough stats are:

Shortest uptime: 6.683 minutes
Longest uptime: 8.885 days
Number of Crashes: 210
Crashes/day: 0.845

The 8 days was set during a period of very little activity. As I write this, 4 days of uptime has passed with a normal to large workload. One note: during a particularly high load during this uptime, the message mentioned in comment #67 made a return:

Sep 20 11:42:55 turing kernel: hrtimer: interrupt took 940625 ns

On the disappointing side, I've still not been able to create a test system that will reliably crash.

--- Additional comment from drjones@redhat.com on 2010-09-20 04:13:53 EDT ---

I think we can be confident that irqbalance off does the trick. I ran the kernel compile loop with irqbalance off all weekend and it was still running this morning. I turned irqbalance back on and was able to reproduce within an hour or so. Next up: backport the patches pointed to and try another round with irqbalance on.

--- Additional comment from drjones@redhat.com on 2010-09-20 08:53:36 EDT ---

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
A work-around for this issue is to turn off irqbalance, i.e.

chkconfig --level 2345 irqbalance off
service irqbalance stop

--- Additional comment from bburns@redhat.com on 2010-09-20 08:56:07 EDT ---

Set z-stream flag, as this has several BZs that may be duplicates and could have a fairly large impact in the field.

--- Additional comment from rlerch@redhat.com on 2010-09-21 01:42:12 EDT ---

Andrew, thanks for the Technical Note draft. Would I be able to get a summary of the following 2 dot points of information so I can write this known issue up for the 6.0 Technical Notes? I tried to get this information from reading the bug, but got a bit lost.

* Cause
What actions or circumstances cause the bug to present (if known).
Note: if the cause is not known or is not fully diagnosed, current hypotheses or theories should be noted.

* Consequence
What happens when the bug presents.

cheers,
ryanlerch

--- Additional comment from drjones@redhat.com on 2010-09-21 02:56:59 EDT ---

Hi Ryan,

(In reply to comment #88)
> * Cause
> What actions or circumstances cause the bug to present (if known).
> Note: if the cause is not known or is not fully diagnosed,
> current hypotheses or theories should be noted.

While we're still doing some testing, I think it's 90+% likely that it's due to the way the event channel irq was set up.
Xen events behave like edge-triggered interrupts, but the kernel was setting them up as level. Since it's a fundamental issue, basically any action using interrupts could trigger the bug, but it seems you need high activity to increase the probability of hitting it. My reproducer was just to compile a kernel on the guest, which triggered interrupts for the xen block device.

> * Consequence
> What happens when the bug presents.
>

All the tasks associated with the device that had interrupt issues (usually blkfront) will end up in D-state (i.e. hung).

--- Additional comment from drjones@redhat.com on 2010-09-22 04:53:31 EDT ---

I've been running the kernel compile loop on a RHEL6 guest patched with the two patches pointed to in comment 81 for over 30 hours (with irqbalance on, of course). I'm pretty confident that they fix the issue and will post them. The "handle events as edge-triggered" patch, which is probably the one that truly fixes the issue, is already in Linus' tree, but only as of .36-rc3, which means it won't be in Fedora until F15.

--- Additional comment from norm@turing.une.edu.au on 2010-09-22 05:57:54 EDT ---

I just saw that the 2 patches also made it into 2.6.32.22 and 2.6.35.5, so they should make it into Fedora 12 if the stable updates are pushed there. Fedora 13 is currently 2.6.34.7, so I'm not sure about that. See:
http://lwn.net/Articles/406311/
http://lwn.net/Articles/406312/

--- Additional comment from drjones@redhat.com on 2010-09-22 08:05:10 EDT ---

(In reply to comment #91)
> I just saw that the 2 patches also made it into 2.6.32.22 and 2.6.35.5

Ah, nice. Yeah, I forgot to check the stable trees to see what got pulled in by them.

--- Additional comment from norm@turing.une.edu.au on 2010-09-22 09:10:30 EDT ---

This bug started out against the Fedora 12 kernel, moved around some, and ended up against the RHEL 6 kernel. Should this bug be cloned and changed to the F12 kernel to encourage the stable 2.6.32.22 updates to be applied there?

--- Additional comment from drjones@redhat.com on 2010-09-22 09:46:33 EDT ---

Right, agreed. Instead of cloning (since this bug is huge), I've just created a new bug pointing to the patches, bug 636534.

--- Additional comment from cebbert@redhat.com on 2010-09-22 11:29:26 EDT ---

(In reply to comment #93)
> Should this bug be cloned and changed to the F12 kernel to encourage the stable
> 2.6.32.22 updates to be applied there?

They're already applied.

--- Additional comment from drjones@redhat.com on 2010-09-23 11:52:52 EDT ---

QA,

This patch needs some wide-spread testing on different machines. I saw no problems with it on my machine when testing, but there's been a report that it causes a boot crash. See Chuck's comment in bug 636534 comment 3.

Thanks,
Drew

--- Additional comment from norm@turing.une.edu.au on 2010-09-24 00:05:07 EDT ---

(In reply to comment #96)
> This patch needs some wide-spread testing on different machines. I saw no
> problems with it on my machine when testing, but there's been a report that it
> causes a boot crash. See Chuck's comment in bug 636534 comment 3.

Seems there is a typo in some instances of the patch. See bug 636534 comment 4.

--- Additional comment from drjones@redhat.com on 2010-09-24 02:38:04 EDT ---

(In reply to comment #97)
> Seems there is a typo in some instances of the patch. See bug 636534 comment 4.

Cool, I just saw the message that there was a typo on xen-devel too. The patch I've posted doesn't have it, so we're good-to-go.
Thanks,
Drew

--- Additional comment from pm-rhel@redhat.com on 2010-10-04 21:50:47 EDT ---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

--- Additional comment from drjones@redhat.com on 2010-11-16 04:07:37 EST ---

*** Bug 587265 has been marked as a duplicate of this bug. ***

--- Additional comment from arozansk@redhat.com on 2010-11-17 14:44:38 EST ---

Patch(es) available on kernel-2.6.32-83.el6

--- Additional comment from arozansk@redhat.com on 2010-11-17 14:49:20 EST ---

List of patches present on kernel-2.6.32-83.el6:
Related patch: http://patchwork.usersys.redhat.com/patch/28339
Related patch: http://patchwork.usersys.redhat.com/patch/28340
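Drew's cause/consequence summary a few comments up describes the visible half of the bug: once one lost event wedges the journal thread, everything that needs the journal queues up behind it in D state and the load average climbs while no CPU is used. The toy pthreads program below only mimics the shape of that pile-up (the growing D-state counts in the top listings earlier in the trail); it has nothing to do with the kernel code or the actual fix.

/* Toy model of the observed cascade: the "journal" thread holds a lock while
 * waiting for an I/O completion whose wakeup was lost, so every later task
 * that needs the journal piles up behind it -- load climbs, nothing runs.
 * Illustration only; build with: cc -pthread cascade.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t journal_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t kick_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  kick         = PTHREAD_COND_INITIALIZER;
static int blocked_tasks;

static void *journal_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&journal_lock);   /* commit in progress */
    pthread_mutex_lock(&kick_lock);
    /* The "backend kick" that would signal this condition was dropped,
     * so this wait never returns -- the thread sits in D state forever. */
    pthread_cond_wait(&kick, &kick_lock);
    pthread_mutex_unlock(&kick_lock);
    pthread_mutex_unlock(&journal_lock);
    return NULL;
}

static void *writer_thread(void *arg)
{
    (void)arg;
    __sync_fetch_and_add(&blocked_tasks, 1);
    pthread_mutex_lock(&journal_lock);   /* piles up behind the journal */
    pthread_mutex_unlock(&journal_lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, journal_thread, NULL);
    sleep(1);
    for (int i = 0; i < 10; i++) {       /* crond, sendmail, dump, ... */
        pthread_create(&t, NULL, writer_thread, NULL);
        sleep(1);
        printf("blocked tasks so far: %d\n", blocked_tasks);
    }
    return 0;
}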
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 520809 [details] backport/revert patch
Patch(es) available on kernel-2.6.32-195.el6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1530.html