Bug 597369 - kernel oops with md
Summary: kernel oops with md
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-28 18:22 UTC by Maciej Żenczykowski
Modified: 2011-06-27 16:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 16:56:15 UTC
Type: ---
Embargoed:



Description Maciej Żenczykowski 2010-05-28 18:22:42 UTC
kernel 2.6.33.5-112.fc13.x86_64

Ran mdadm -S /dev/md127 while some other arrays were in the process of syncing.

md127's purpose is to interpret the partition table of a virtual machine's disk and make it available to the host OS as /dev/md127pX when the guest is not using it.  It is stored on a single LVM logical volume residing on two LVM physical volumes, both of which were md mirror raids that were syncing at the time.

(It might be worth mentioning that md127 didn't auto-assemble under fc12, but does under fc13.  This hard lockup happened within 5 minutes of booting into fc13 for the first time.)


May 28 17:30:21 nike kernel: md: md127 stopped.
May 28 17:30:21 nike kernel: md: unbind<dm-5>
May 28 17:30:21 nike kernel: md: export_rdev(dm-5)
May 28 17:30:21 nike kernel: md127: detected capacity change from 8589930496 to 0
May 28 17:30:21 nike kernel: BUG: unable to handle kernel paging request at 0000000000fffff8
May 28 17:30:21 nike kernel: IP: [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
May 28 17:30:21 nike kernel: PGD 12836a067 PUD 12836b067 PMD 0 
May 28 17:30:21 nike kernel: Oops: 0000 [#1] SMP 
May 28 17:30:21 nike kernel: last sysfs file: /sys/devices/virtual/block/md0/md/sync_action
May 28 17:30:21 nike kernel: CPU 1 
May 28 17:30:21 nike kernel: Pid: 10, comm: events/1 Not tainted 2.6.33.5-112.fc13.x86_64 #1 Mac-F42C89C8/MacBookPro4,1
May 28 17:30:21 nike kernel: RIP: 0010:[<ffffffff811566f4>]  [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
May 28 17:30:21 nike kernel: RSP: 0018:ffff88013baebde0  EFLAGS: 00010206
May 28 17:30:21 nike kernel: RAX: 0000000000000013 RBX: ffff8801398fee00 RCX: 0000000000000000
May 28 17:30:21 nike kernel: RDX: 0000000000000064 RSI: 0000000000000000 RDI: ffff880139fc3448
May 28 17:30:21 nike kernel: RBP: ffff88013baebe00 R08: ffff88013baea000 R09: ffff880100000001
May 28 17:30:21 nike kernel: R10: ffff88013999d070 R11: ffff88013a02ccc8 R12: ffff88013188b500
May 28 17:30:21 nike kernel: R13: 0000000000fffff8 R14: ffff88013188b500 R15: ffff880005918248
May 28 17:30:21 nike kernel: FS:  0000000000000000(0000) GS:ffff880005900000(0000) knlGS:0000000000000000
May 28 17:30:21 nike kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 28 17:30:21 nike kernel: CR2: 0000000000fffff8 CR3: 0000000128329000 CR4: 00000000000006e0
May 28 17:30:21 nike kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 28 17:30:21 nike kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 28 17:30:21 nike kernel: Process events/1 (pid: 10, threadinfo ffff88013baea000, task ffff88013baf0000)
May 28 17:30:21 nike kernel: Stack:
May 28 17:30:21 nike kernel: ffff880139fc3650 ffff880139fc3400 ffff880139fc3448 ffff88013baf0000
May 28 17:30:21 nike kernel: <0> ffff88013baebe30 ffffffff813413a9 ffff88013baf0000 ffff880005918240
May 28 17:30:21 nike kernel: <0> ffff88013baf0000 ffff88013baf0000 ffff88013baebee0 ffffffff81060d31
May 28 17:30:21 nike kernel: Call Trace:
May 28 17:30:21 nike kernel: [<ffffffff813413a9>] mddev_delayed_delete+0x4a/0x98
May 28 17:30:21 nike kernel: [<ffffffff81060d31>] worker_thread+0x1a4/0x232
May 28 17:30:21 nike kernel: [<ffffffff8134135f>] ? mddev_delayed_delete+0x0/0x98
May 28 17:30:21 nike kernel: [<ffffffff8106480b>] ? autoremove_wake_function+0x0/0x34
May 28 17:30:21 nike kernel: [<ffffffff81060b8d>] ? worker_thread+0x0/0x232
May 28 17:30:21 nike kernel: [<ffffffff810643bb>] kthread+0x7a/0x82
May 28 17:30:21 nike kernel: [<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
May 28 17:30:21 nike kernel: [<ffffffff81064341>] ? kthread+0x0/0x82
May 28 17:30:21 nike kernel: [<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10
May 28 17:30:21 nike kernel: Code: 00 00 00 48 c7 c7 23 dc 77 81 e8 8c 4e ef ff f0 41 ff 06 4d 89 f4 4c 8b 6b 10 eb 0f 48 8b 30 4c 89 e7 49 83 c5 08 e8 b9 d1 ff ff <49> 8b 45 00 48 85 c0 75 e8 48 83 3b 00 74 08 4c 89 e7 e8 50 ed 
May 28 17:30:21 nike kernel: RIP  [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
May 28 17:30:21 nike kernel: RSP <ffff88013baebde0>
May 28 17:30:21 nike kernel: CR2: 0000000000fffff8
May 28 17:30:21 nike kernel: ---[ end trace 4df645bc2c4c4667 ]---
May 28 17:31:30 nike abrt: Kerneloops: Reported 1 kernel oopses to Abrt
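
One quick cross-check on a trace like the one above is to see which general-purpose register holds the faulting address. The Python below is only a rough sketch for doing that: the helper name `summarize` and the trimmed `OOPS` sample are made up for illustration and are not part of abrt or any kernel tool. It assumes the syslog prefix has already been stripped from each line.

```python
import re

# Lines copied from the first oops above, syslog prefix stripped.
OOPS = """\
BUG: unable to handle kernel paging request at 0000000000fffff8
IP: [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
RAX: 0000000000000013 RBX: ffff8801398fee00 RCX: 0000000000000000
RDX: 0000000000000064 RSI: 0000000000000000 RDI: ffff880139fc3448
RBP: ffff88013baebe00 R08: ffff88013baea000 R09: ffff880100000001
R10: ffff88013999d070 R11: ffff88013a02ccc8 R12: ffff88013188b500
R13: 0000000000fffff8 R14: ffff88013188b500 R15: ffff880005918248
"""


def summarize(oops):
    """Hypothetical helper: extract the fault address, the faulting symbol,
    and the names of registers whose value equals the fault address."""
    fault = re.search(r"paging request at ([0-9a-f]{16})", oops).group(1)
    symbol, offset, size = re.search(
        r"IP: \[<[0-9a-f]{16}>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", oops
    ).groups()
    # Matches "RAX: <16 hex digits>" style pairs; "RSP: 0018:..." is skipped
    # because of its segment prefix.
    regs = dict(re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})", oops))
    holders = sorted(name for name, value in regs.items() if value == fault)
    return {
        "fault": int(fault, 16),
        "symbol": symbol,
        "offset": int(offset, 16),
        "size": int(size, 16),
        "holders": holders,
    }


if __name__ == "__main__":
    print(summarize(OOPS))
```

For this trace the only register that matches is R13, so the bad pointer 0000000000fffff8 was most likely being dereferenced through R13 at sysfs_remove_group+0x8c.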

Judging from later messages in the system log, the machine appears to have continued running for another two hours.  However, X and even the caps lock key were totally frozen (and networking wasn't up yet), so I had to hard power off.

Comment 1 Maciej Żenczykowski 2010-05-28 20:08:43 UTC
Verified to also occur on 2.6.33.4-95.fc13.x86_64.

This oops triggers every time I run mdadm -S /dev/md127.

If X is up, then the keyboard, mouse and X become unresponsive.  At least part of the kernel and userspace stays up, though: I have successfully ssh-ed in (authentication successful), although I did not get a shell prompt.
Unresponsive here means: the caps lock light does not toggle when the caps lock key is pressed, and the seconds on the X clock stop advancing.

However, if X is not up, then apparently nothing happens beyond the oops being logged (no hang).

I'm pretty sure this didn't happen with 2.6.32.13-120.fc12.x86_64.


May 28 20:29:14 nike kernel: md: md127 stopped.
May 28 20:29:14 nike kernel: md: unbind<dm-5>
May 28 20:29:14 nike kernel: md: export_rdev(dm-5)
May 28 20:29:14 nike kernel: md127: detected capacity change from 8589930496 to 0
May 28 20:29:14 nike kernel: BUG: unable to handle kernel paging request at 0000000000fffff8
May 28 20:29:14 nike kernel: IP: [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
May 28 20:29:14 nike kernel: PGD ac02d067 PUD a483f067 PMD 0 
May 28 20:29:14 nike kernel: Oops: 0000 [#1] SMP 
May 28 20:29:14 nike kernel: last sysfs file: /sys/devices/virtual/block/md127/uevent
May 28 20:29:14 nike kernel: CPU 0 
May 28 20:29:14 nike kernel: Pid: 9, comm: events/0 Not tainted 2.6.33.5-112.fc13.x86_64 #1 Mac-F42C89C8/MacBookPro4,1
May 28 20:29:14 nike kernel: RIP: 0010:[<ffffffff811566f4>]  [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
May 28 20:29:14 nike kernel: RSP: 0018:ffff88013bae9de0  EFLAGS: 00010206
May 28 20:29:14 nike kernel: RAX: 0000000000000013 RBX: ffff8801399e1a00 RCX: 0000000000000000
May 28 20:29:14 nike kernel: RDX: 0000000000000064 RSI: 0000000000000000 RDI: ffff88013a3be448
May 28 20:29:14 nike kernel: RBP: ffff88013bae9e00 R08: ffff88013bae8000 R09: ffff8800a4865db8
May 28 20:29:14 nike kernel: R10: ffff88013912bc30 R11: ffff880137f723c8 R12: ffff8801393b17d0
May 28 20:29:14 nike kernel: R13: 0000000000fffff8 R14: ffff8801393b17d0 R15: ffff880005818248
May 28 20:29:14 nike kernel: FS:  0000000000000000(0000) GS:ffff880005800000(0000) knlGS:0000000000000000
May 28 20:29:14 nike kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 28 20:29:14 nike kernel: CR2: 0000000000fffff8 CR3: 00000000a84e5000 CR4: 00000000000006f0
May 28 20:29:14 nike kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 28 20:29:14 nike kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 28 20:29:14 nike kernel: Process events/0 (pid: 9, threadinfo ffff88013bae8000, task ffff88013bad5d40)
May 28 20:29:14 nike kernel: Stack:
May 28 20:29:14 nike kernel: ffff88013a3be650 ffff88013a3be400 ffff88013a3be448 ffff88013bad5d40
May 28 20:29:14 nike kernel: <0> ffff88013bae9e30 ffffffff813413a9 ffff88013bad5d40 ffff880005818240
May 28 20:29:14 nike kernel: <0> ffff88013bad5d40 ffff88013bad5d40 ffff88013bae9ee0 ffffffff81060d31
May 28 20:29:14 nike kernel: Call Trace:
May 28 20:29:14 nike kernel: [<ffffffff813413a9>] mddev_delayed_delete+0x4a/0x98
May 28 20:29:14 nike kernel: [<ffffffff81060d31>] worker_thread+0x1a4/0x232
May 28 20:29:14 nike kernel: [<ffffffff8134135f>] ? mddev_delayed_delete+0x0/0x98
May 28 20:29:14 nike kernel: [<ffffffff8106480b>] ? autoremove_wake_function+0x0/0x34
May 28 20:29:14 nike kernel: [<ffffffff81060b8d>] ? worker_thread+0x0/0x232
May 28 20:29:14 nike kernel: [<ffffffff810643bb>] kthread+0x7a/0x82
May 28 20:29:14 nike kernel: [<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
May 28 20:29:14 nike kernel: [<ffffffff81064341>] ? kthread+0x0/0x82
May 28 20:29:14 nike kernel: [<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10
May 28 20:29:14 nike kernel: Code: 00 00 00 48 c7 c7 23 dc 77 81 e8 8c 4e ef ff f0 41 ff 06 4d 89 f4 4c 8b 6b 10 eb 0f 48 8b 30 4c 89 e7 49 83 c5 08 e8 b9 d1 ff ff <49> 8b 45 00 48 85 c0 75 e8 48 83 3b 00 74 08 4c 89 e7 e8 50 ed 
May 28 20:29:14 nike kernel: RIP  [<ffffffff811566f4>] sysfs_remove_group+0x8c/0xc5
May 28 20:29:14 nike kernel: RSP <ffff88013bae9de0>
May 28 20:29:14 nike kernel: CR2: 0000000000fffff8
May 28 20:29:14 nike kernel: ---[ end trace 2d0095a978b29fec ]---
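
The Code: line marks the byte at the faulting RIP with angle brackets. A minimal Python sketch for splitting that line at the marker follows. The function name `split_code` is hypothetical, and `CODE_LINE` is simply the Code: line from the trace above with the syslog prefix removed.

```python
# Code: line copied from the trace above (identical in both oopses).
CODE_LINE = (
    "Code: 00 00 00 48 c7 c7 23 dc 77 81 e8 8c 4e ef ff f0 41 ff 06 4d 89 f4 "
    "4c 8b 6b 10 eb 0f 48 8b 30 4c 89 e7 49 83 c5 08 e8 b9 d1 ff ff <49> 8b "
    "45 00 48 85 c0 75 e8 48 83 3b 00 74 08 4c 89 e7 e8 50 ed"
)


def split_code(line):
    """Hypothetical helper: return (bytes before the faulting instruction,
    bytes from the faulting instruction onwards)."""
    tokens = line.split(":", 1)[1].split()
    marked = [i for i, tok in enumerate(tokens) if tok.startswith("<")]
    if len(marked) != 1:
        raise ValueError("expected exactly one <..> marker in the Code: line")
    data = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return data[: marked[0]], data[marked[0]:]


if __name__ == "__main__":
    before, after = split_code(CODE_LINE)
    print(len(before), after[:4].hex())  # prints "43 498b4500"
```

If I read the x86-64 encoding correctly, the bytes 49 8b 45 00 at the marker are a load through R13 (mov 0x0(%r13),%rax), which would be consistent with R13 and CR2 both showing 0000000000fffff8 in the register dump. Treat that decoding as my interpretation, not as disassembler output.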

Comment 2 Maciej Żenczykowski 2010-05-28 20:11:47 UTC
Oh, forgot to mention this: the background sync of other raid arrays is irrelevant.

I've verified that I can stop a normal raid array just fine, so the current theory is that this is related to either:
- a (linear) md raid array with partitions,
or
- a 1-disk linear md raid array on LVM on top of raid1.

Comment 3 Bug Zapper 2011-06-02 13:02:28 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Bug Zapper 2011-06-27 16:56:15 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

