My machine started rebooting by itself frequently with the 2.6.18-194.3.1.el5xen kernel. What seems to trigger the issue is starting a few (~6-8) virtual machines at the same time while an array is rebuilding/verifying shortly after reboot.

Array config:

md104 : active raid5 sdd3[0] sdc3[1] sdb6[2] sda9[3]
      11999808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

During the VM bootup the array is syncing very slowly (~600KB/s).

Some thoughts:
- I never saw this before upgrading to 5.5
- the machine is otherwise physically healthy. Memory test passes, cooling is OK. Recently changed HDD (SATA) cables, still occurs.

The only interesting thing in the logs is this:

May 30 20:48:50 ns1 kernel: INFO: task md104_resync:491 blocked for more than 120 seconds.
May 30 20:48:50 ns1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 30 20:48:50 ns1 kernel: md104_resync D ffff88000100a460 0 491 11 494 490 (L-TLB)
May 30 20:48:50 ns1 kernel: ffff8801da5fbd70 0000000000000246 ffff8801da5fbcb0 0000000000000000
May 30 20:48:50 ns1 kernel: 0000000000000009 ffff8801da5e77e0 ffff88000002b7a0 000000000008169c
May 30 20:48:50 ns1 kernel: ffff8801da5e79c8 0000000000000000
May 30 20:48:50 ns1 kernel: Call Trace:
May 30 20:48:50 ns1 kernel: [<ffffffff80264eaf>] __kprobes_text_start+0x317/0x438
May 30 20:48:50 ns1 kernel: [<ffffffff8029c309>] keventd_create_kthread+0x0/0xc4
May 30 20:48:50 ns1 kernel: [<ffffffff8040b4b8>] md_do_sync+0x1d8/0x833
May 30 20:48:50 ns1 kernel: [<ffffffff80288749>] dequeue_task+0x18/0x37
May 30 20:48:50 ns1 kernel: [<ffffffff80288790>] deactivate_task+0x28/0x5f
May 30 20:48:50 ns1 kernel: [<ffffffff8026ef31>] monotonic_clock+0x35/0x7b
May 30 20:48:51 ns1 kernel: [<ffffffff80262dd3>] thread_return+0x6c/0x113
May 30 20:48:51 ns1 kernel: [<ffffffff80248d8c>] try_to_wake_up+0x392/0x3a4
May 30 20:48:51 ns1 kernel: [<ffffffff8029c521>] autoremove_wake_function+0x0/0x2e
May 30 20:48:51 ns1 kernel: [<ffffffff8029c309>] keventd_create_kthread+0x0/0xc4
May 30 20:48:51 ns1 kernel: [<ffffffff8040be8c>] md_thread+0xf8/0x10e
May 30 20:48:51 ns1 kernel: [<ffffffff8029c309>] keventd_create_kthread+0x0/0xc4
May 30 20:48:51 ns1 kernel: [<ffffffff8040bd94>] md_thread+0x0/0x10e
May 30 20:48:51 ns1 kernel: [<ffffffff80233b0f>] kthread+0xfe/0x132
May 30 20:48:51 ns1 kernel: [<ffffffff80260b2c>] child_rip+0xa/0x12
May 30 20:48:51 ns1 kernel: [<ffffffff8029c309>] keventd_create_kthread+0x0/0xc4
May 30 20:48:51 ns1 kernel: [<ffffffff80233a11>] kthread+0x0/0x132
May 30 20:48:51 ns1 kernel: [<ffffffff80260b22>] child_rip+0x0/0x12
May 30 20:48:51 ns1 kernel:
(In reply to comment #0)
> Some thoughts:
> - I've never seen before upgrading to 5.5

Can you please try 5.4, or any other earlier release? I don't know of anything patch-wise in 5.5 that it could be, but it should be a quick and possibly informative exercise to try.

Thanks,
Drew
Maybe it's not a regression from 5.4. Recently one of the hard drives started showing a much higher temperature than usual. I do have a suspicion that it is failing, so perhaps HW issues (inability to read/write a block in 120s) appearing now may be triggering the condition.

I checked the logs: I installed kernel-xen-2.6.18-194.3.1.el5.x86_64 on May 15th, and that is when the tracebacks first started appearing in /var/log/messages. Up to that time the machine was running kernel-xen-2.6.18-164.15.1.el5.x86_64 from March 19th with no issues.

It is not an easy task to run the downgraded kernel, as this is a semi-production system. I've scheduled an outage for Thu night and will see what I can find.
I've tried running the old kernel for a couple of hours, but the messages didn't pop up. That likely doesn't prove anything: the MD errors are triggered pretty randomly, and this time a rebuild of a dirty array wasn't necessary.
And it's back again. Different machine, this time with the kernel on bare metal (no Xen). The message is repeated several times, but only for md101_resync.

kernel-2.6.18-194.26.1.el5

This started appearing after updating to the above kernel. There's no mention of it in the logs while running kernel-2.6.18-194.17.1.el5.

If there are any logs / command outputs I can take, now is the time. Let me know what information I can add to make the issue easier for you guys to fix.

/proc/mdstat:

Personalities : [raid1] [raid6] [raid5] [raid4]
md100 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      585937280 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md101 : active raid5 sdd2[2] sdc2[1] sdb2[0]
      585937280 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sda1[0] hdb1[1]
      513984 blocks [2/2] [UU]

md1 : active raid1 sda2[0] hdb2[1](W)
      30716160 blocks [2/2] [UU]

unused devices: <none>

/var/log/messages:

Dec 19 00:48:09 bigbang kernel: INFO: task md101_resync:30403 blocked for more than 120 seconds.
Dec 19 00:48:09 bigbang kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 19 00:48:09 bigbang kernel: md101_resync D ffff8101e9e55080 0 30403 87 30402 (L-TLB)
Dec 19 00:48:09 bigbang kernel: ffff8101f4471d70 0000000000000046 e16bc1a617ae2752 ffff81021fa0a40c
Dec 19 00:48:09 bigbang kernel: ffff81021fa0000c 000000000000000a ffff8100a6abe7e0 ffff8101e9e55080
Dec 19 00:48:09 bigbang kernel: 0000500ce5c9b391 0000000000002469 ffff8100a6abe9c8 00000000c6048baf
Dec 19 00:48:09 bigbang kernel: Call Trace:
Dec 19 00:48:09 bigbang kernel: [<ffffffff800a08fe>] keventd_create_kthread+0x0/0xc4
Dec 19 00:48:09 bigbang kernel: [<ffffffff8021af2b>] md_do_sync+0x1d8/0x833
Dec 19 00:48:09 bigbang kernel: [<ffffffff8008ca47>] enqueue_task+0x41/0x56
Dec 19 00:48:09 bigbang kernel: [<ffffffff8008cab2>] __activate_task+0x56/0x6d
Dec 19 00:48:09 bigbang kernel: [<ffffffff8008c897>] dequeue_task+0x18/0x37
Dec 19 00:48:09 bigbang kernel: [<ffffffff80062ff8>] thread_return+0x62/0xfe
Dec 19 00:48:09 bigbang kernel: [<ffffffff800a0b16>] autoremove_wake_function+0x0/0x2e
Dec 19 00:48:09 bigbang kernel: [<ffffffff800a08fe>] keventd_create_kthread+0x0/0xc4
Dec 19 00:48:09 bigbang kernel: [<ffffffff8021b8ff>] md_thread+0xf8/0x10e
Dec 19 00:48:09 bigbang kernel: [<ffffffff800a08fe>] keventd_create_kthread+0x0/0xc4
Dec 19 00:48:09 bigbang kernel: [<ffffffff8021b807>] md_thread+0x0/0x10e
Dec 19 00:48:09 bigbang kernel: [<ffffffff8003290a>] kthread+0xfe/0x132
Dec 19 00:48:09 bigbang kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Dec 19 00:48:09 bigbang kernel: [<ffffffff800a08fe>] keventd_create_kthread+0x0/0xc4
Dec 19 00:48:09 bigbang kernel: [<ffffffff8003280c>] kthread+0x0/0x132
Dec 19 00:48:09 bigbang kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Dec 19 00:48:09 bigbang kernel:
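
Since the message repeats and only some arrays seem affected, it may help to tally the hung-task lines per task across /var/log/messages. A minimal Python sketch, assuming the syslog line format shown in this report; the function name count_hung_tasks is hypothetical and exists only for illustration, and task names containing spaces are not handled:

```python
import re
from collections import Counter

# Matches the kernel's hung-task report line as it appears in this bug, e.g.
# "... kernel: INFO: task md101_resync:30403 blocked for more than 120 seconds."
# Assumption: the task name contains no whitespace.
HUNG_TASK_RE = re.compile(
    r"kernel: INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Return a Counter mapping task name to the number of hung-task reports."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts
```

The function accepts any iterable of lines, so an open file object for /var/log/messages can be passed directly, and counts.most_common() then lists the most frequently blocked resync threads first.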
Hi, any updates? I can reproduce this with every raid-check run (weekly). The disks don't have any reallocated or pending sectors.

Kernel: 2.6.18-194.32.1.el5
Not Xen related. *** This bug has been marked as a duplicate of bug 573106 ***