Bug 180285
Summary: | panic during lvremove of snapshot of a mirror volume | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Nate Straz <nstraz> | ||||
Component: | lvm2 | Assignee: | Alasdair Kergon <agk> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | GFS Bugs <gfs-bugs> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 4.0 | CC: | agk, dwysocha, jbrassow, mbroz, rkenna | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel 2.6.9-30.1 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2007-01-10 13:46:36 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Nate Straz
2006-02-06 21:41:46 UTC
Does this still happen with the latest rpms? What kind of mirroring are you using? (DM Mirroring support is not available until RHEL4 U4.) Do you have a script that can reproduce? It'd be nice to know the kernel version in this case as well. (Might as well get the device mapper version from dmsetup too.) Crashed in a GFS assert so one for GFS people to look at first and determine the detailed cause of the failure. (Then it might have to be transferred back.) And please retest with a kernel package > 2.6.9-30.1 (checked into kernel cvs yesterday) as there've been some snapshot fixes recently. I'm loading the RHEL4-U3-re0127 images now (that includes the 30 kernel). I'll try reproducing it with that. I was using whatever mirroring that system-config-lvm sets up. I'd wager that's LVM2 mirroring. Whether that implies DM mirroring, I don't know. I haven't written a script to reproduce it, nor did I reproduce it on my first try. I was running the 29 kernel with device-mapper-1.02.02-3.0.RHEL4. There isn't a GFS build against the -30 kernel so I tried to reproduce this with ext3. I was able to panic the system. Running: kernel: 2.6.9-30.ELsmp lvm2-2.02.01-1.3.RHEL4 device-mapper-1.02.02-3.0.RHEL4 ------------[ cut here ]------------ Unable to handle kernel paging request at virtual address f8c51004 printing eip: f882dc47 *pde = 00000000 kernel BUG at drivers/md/kcopyd.c:145! invalid operand: 0000 [#1] SMP Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac uhci_hcd e7xxx_edac edac_mc hw_random shpchp e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod CPU: 1 EIP: 0060:[<f8863c9f>] Not tainted VLI EFLAGS: 00010283 (2.6.9-30.ELsmp) EIP is at client_free_pages+0xb/0x32 [dm_mod] eax: 00000100 ebx: f63969a0 ecx: f72c66c0 edx: 00000000 esi: f8bd1080 edi: 00000000 ebp: 00000000 esp: f5fdaf30 ds: 007b es: 007b ss: 0068 Process lvremove (pid: 6882, threadinfo=f5fda000 task=f7aaa030) Stack: f63969a0 f886464f f7e26980 f882c7ca f8bd1080 f7e26780 f885fa9d f7e26780 cfb93120 f5fda000 08ff4ac8 f8861442 f88f7000 f886afe0 f8861b66 00000004 00000000 f8862e40 f88f7000 f8861ac0 f88f7000 f886b080 08ff4ac8 c134fd04 Call Trace: [<f886464f>] kcopyd_client_destroy+0x12/0x26 [dm_mod] [<f882c7ca>] snapshot_dtr+0x4f/0x58 [dm_snapshot] [<f885fa9d>] table_destroy+0x39/0x83 [dm_mod] [<f8861442>] __hash_remove+0x63/0x78 [dm_mod] [<f8861b66>] dev_remove+0xa6/0xc3 [dm_mod] [<f8862e40>] ctl_ioctl+0xd9/0x13c [dm_mod] [<f8861ac0>] dev_remove+0x0/0xc3 [dm_mod] [<c016a316>] sys_ioctl+0x227/0x269 [<c02d2703>] syscall_call+0x7/0xb [<c02d007b>] schedule+0x283/0x8d3 Code: ff ff eb 17 89 38 45 89 c7 39 dd eb da 89 f0 89 fa e8 7d ff ff ff 01 5e 14 31 c0 5b 5e 5f 5d c3 53 89 c3 8b 40 14 39 43 18 74 08 <0f> 0b 91 00 c9 52 86 f8 8b 43 10 e8 8c ff ff ff c7 43 10 00 00 <1>Oops: 0000 [#2] Fatal exception: panic in 5 seconds SMP Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac uhci_hcd e7xxx_edac edac_mc hw_random shpchp e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod CPU: 2 EIP: 0060:[<f882dc47>] Not tainted VLI EFLAGS: 00010246 (2.6.9-30.ELsmp) EIP is at persistent_commit+0x90/0xd5 [dm_snapshot] eax: 00000000 ebx: 00000000 ecx: f8c51000 edx: 00000000 esi: 00000000 edi: f72c4bc0 ebp: 00000000 esp: f69dcf08 ds: 007b es: 007b ss: 0068 Process kcopyd (pid: 6018, threadinfo=f69dc000 task=f71ea630) Stack: 00004732 00000000 0000183e 00000000 e7c84654 f7e26980 e7c84654 f882ca6a f882ca9a e7c84654 c93ae144 00000000 f8863dc8 00000000 c93ae144 f886b178 00000246 f8863d89 f8863f64 00000001 f886c8c0 f886c8c4 00000206 d85dc000 Call Trace: [<f882ca6a>] copy_callback+0x0/0x34 [dm_snapshot] [<f882ca9a>] copy_callback+0x30/0x34 [dm_snapshot] [<f8863dc8>] run_complete_job+0x3f/0x47 [dm_mod] [<f8863d89>] run_complete_job+0x0/0x47 [dm_mod] [<f8863f64>] process_jobs+0x16/0xcb [dm_mod] [<f8864028>] do_work+0xf/0x2d [dm_mod] [<c01309f3>] worker_thread+0x168/0x1d5 [<f8864019>] do_work+0x0/0x2d [dm_mod] [<c011e7d7>] default_wake_function+0x0/0xc [<c011e7d7>] default_wake_function+0x0/0xc [<c013088b>] worker_thread+0x0/0x1d5 [<c0133f7d>] kthread+0x73/0x9b [<c0133f0a>] kthread+0x0/0x9b [<c01041f5>] kernel_thread_helper+0x5/0xb Code: b9 01 00 00 00 89 f8 e8 73 fb ff ff 85 c0 89 c6 74 07 c7 47 08 00 00 00 00 31 db 3b 5f 28 73 1c 31 ed 8b 4f 2c 31 d2 01 e9 85 f6 <8b> 41 04 0f 94 c2 43 ff 11 83 c5 08 3b 5f 28 72 e6 c7 47 28 00 <0>Fatal exception: panic in 5 seconds Kernel panic - not syncing: Fatal exception I believe you have to use kernel >= 30.1, it contains a critical snapshot fix. Those panics are quite different and should be fixed in yesterday's kernel. Created attachment 124416 [details]
Test case
This test case seems to reproduce the second set of panics easily. The sleep 5
seems to be critical to get it to panic. Please try it with -30.1 since I
don't
see any builds of it on porkchop.
I tried the attached test case on -30.1 and instead of panicking, the lvremove command hung. Crash shows this stack trace: crash> bt 3656 PID: 3656 TASK: f7aba5b0 CPU: 2 COMMAND: "lvremove" #0 [e0ec3e88] schedule at c02d0569 #1 [e0ec3eec] kcopyd_client_destroy at f886471d #2 [e0ec3f3c] mirror_dtr at f8835626 #3 [e0ec3f48] table_destroy at f885fa9b #4 [e0ec3f5c] __hash_remove at f886143d #5 [e0ec3f68] dev_remove at f8861b61 #6 [e0ec3f74] ctl_ioctl at f8862e3c #7 [e0ec3f94] sys_ioctl at c016a243 #8 [e0ec3fc0] system_call at c02d2630 EAX: 00000036 EBX: 00000008 ECX: c134fd04 EDX: 092bb3f0 DS: 007b ESI: 092bb3f0 ES: 007b EDI: 002fa440 SS: 007b ESP: bfe3ece4 EBP: bfe3feb8 CS: 0073 EIP: 001a27a2 ERR: 00000036 EFLAGS: 00000282 Can you get traces for *all* processes on the system? That's a mirror being removed by lvremove. What exactly is the PV, LV, filesystem configuration? And see bug 179786 for other diagnostic commands (dmsetup) to run. Ah - just seen the attachment answering the configuration question. I've never tried a snapshot of an lvm2 mirror so the code shouldn't really have let you run it... But the important thing is that you cannot reproduce the GFS panic any more - then we'll switch this back to an lvm2 bug and assume the panic was fixed by the kcopyd patch. #!/bin/sh # # Script to reproduce Red Hat Bugzilla 180285 # VG="errata" LVSIZE="25G" SNAPSIZE="10G" lvcreate -m 1 -n mirror -L $LVSIZE $VG mkfs -t ext3 /dev/mapper/$VG-mirror mkdir -p /mnt/mirror mount /dev/mapper/$VG-mirror /mnt/mirror lvcreate -s -L $SNAPSIZE -n mirror-snap $VG/mirror dd if=/dev/zero of=/mnt/mirror/zerofile bs=1024k count=100 & sleep 5 lvremove -f $VG/mirror-snap umount /mnt/mirror lvremove -f $VG/mirror The status NEEDINFO_QA has been deprecated in favor of NEEDINFO. Moving bugs to NEEDINFO. switching to lvm2 bug as mentioned in comment #11 What information are you looking for? To close this bug, simply verify that you can't reproduce it anymore (given that snapshots of mirrors are disallowed). You could then open a RFE bug to add the ability to take snapshots of mirrors if you like, but this bug would close. Current RHEL4 lvm2 tools should not allow snapshots of mirrors, lvcreate -s will fail with "Snapshots and mirrors may not yet be mixed." (Support for snapshot of mirrors is RFE bug 220502). |