Bug 164630
| Summary: | panic while running fsstress to a filesystem on a mirror | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | kernel | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.0 | CC: | agk, jbrassow, kanderso |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHSA-2005-514 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2005-10-05 13:46:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 156323 | ||
I've seen this twice now while running just two fsstress instances against an ext3 filesystem on top of a mirror. It also happens on a linear volume while running fsstress and doing pvmoves at the same time.

This bug has been around for a long time... probably since the first time the dm-raid1.c file was included in the kernel. There is a function, __rh_alloc(), which expects to be called while a lock is held. However, it will drop the read lock and re-take it as a write lock in order to do allocation. This creates many races. I've changed the way the functions are called (and the arguments to one of them) so that allocation can happen without unlocking. This change closes two different race conditions.

Devel ACK - needs to get fixed in order for mirrors to be useful during U2 beta on SMP machines.

All of this chasing around for race conditions could have been avoided if I had realized that what was meant by 'create_workqueue' was 'create_singlethread_workqueue'... This eliminates the races and (as best I can tell) is what the original author envisioned. The new patch is a couple-line change (a sketch of it follows below).

Patch submitted to rhkernel.

It has been tested by QA and has successfully solved the problem.

I saw this once this morning with the latest kernel and GFS build while running an
I/O load to a mirror with GFS on top. I have since been attempting to reproduce
again but have not been able to. Will continue to try...
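(Regarding the create_singlethread_workqueue change mentioned above - a minimal sketch only, assuming the workqueue in question is the kmirrord one created in dm_mirror_init(); the variable name below is from the upstream 2.6.9 dm-raid1.c and may not match the RHEL patch exactly:

-	_kmirrord_wq = create_workqueue("kmirrord");
+	_kmirrord_wq = create_singlethread_workqueue("kmirrord");

With a single kmirrord thread the mirror work function can no longer run concurrently with itself on another CPU, which is what closes the region-hash races.)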
[root@link-08 mirror]# uname -ar
Linux link-08 2.6.9-15.ELsmp #1 SMP Fri Aug 5 19:00:35 EDT 2005 x86_64 x86_64
x86_64 GNU/Linux
Aug 9 04:39:31 link-08 kernel: GFS 2.6.9-38.0 (built Aug 8 2005 15:52:32)
installed
Aug 9 04:39:50 link-08 kernel: GFS: Trying to join cluster "lock_nolock", ""
Aug 9 04:39:50 link-08 kernel: Lock_Nolock 2.6.9-38.0 (built Aug 8 2005
15:52:12) installed
Aug 9 04:39:50 link-08 kernel: GFS: fsid=dm-5.0: Joined cluster. Now mounting FS...
Aug 9 04:39:50 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Trying to acquire
journal lock...
Aug 9 04:39:50 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Looking at journal...
Aug 9 04:39:50 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Done
Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP:
<ffffffffa00e3704>{:dm_mirror:mirror_end_io+84}
PML4 12bf7067 PGD 12bfa067 PMD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: lock_nolock(U) gfs(U) lock_harness(U) md5 ipv6 parport_pc lp
parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button
battery ac ohci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd
dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptbase sd_mod scsi_mod
Pid: 4083, comm: fsstress Tainted: G M 2.6.9-15.ELsmp
RIP: 0010:[<ffffffffa00e3704>] <ffffffffa00e3704>{:dm_mirror:mirror_end_io+84}
RSP: 0018:ffffffff8044a6e8 EFLAGS: 00010216
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffff0000274640
RDX: ffffff0000274640 RSI: 0000000000001795 RDI: 000001001f136c30
RBP: 000001001f136c18 R08: ffffff0000273000 R09: 00000000000007ff
R10: 000001001f52ac00 R11: ffffffffa009a0c2 R12: 000001001f136c30
R13: 000001001f136c00 R14: 0000010036bc7350 R15: 0000000000000000
FS: 0000002a95576b00(0000) GS:ffffffff804d3280(005b) knlGS:00000000f7fdd6c0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000000101000 CR4: 00000000000006e0
Process fsstress (pid: 4083, threadinfo 00000100366e4000, task 000001001f11b7f0)
Stack: ffffffffa00e36b0 000001003466ab38 0000000000000000 0000010036bc7340
000001002e1c4d80 0000000000000000 0000000000007000 ffffffffa009a122
000001001f136c00 000001002e1c4d80
Call Trace:<IRQ> <ffffffffa00e36b0>{:dm_mirror:mirror_end_io+0}
<ffffffffa009a122>{:dm_mod:clone_endio+96}
<ffffffffa00e2428>{:dm_mirror:write_callback+314}
<ffffffffa009ee83>{:dm_mod:endio+184}
<ffffffffa009a18f>{:dm_mod:clone_endio+205}
<ffffffff80249ae2>{__end_that_request_first+238}
<ffffffffa0006d7d>{:scsi_mod:scsi_end_request+40}
<ffffffffa0007092>{:scsi_mod:scsi_io_completion+497}
<ffffffffa0002d21>{:scsi_mod:scsi_softirq+213}
<ffffffff8013b774>{__do_softirq+88}
<ffffffff8013b81d>{do_softirq+49} <ffffffff80112f77>{do_IRQ+328}
<ffffffff8011061b>{ret_from_intr+0} <EOI>
<ffffffff801cbb04>{dummy_inode_permission+0}
<ffffffff80184d24>{open_namei+289} <ffffffff80184d10>{open_namei+269}
<ffffffff801761c4>{filp_open+39} <ffffffff801e9c85>{strncpy_from_user+74}
<ffffffff801762cd>{get_unused_fd+230} <ffffffff801763bc>{sys_open+57}
<ffffffff801250ff>{cstar_do_call+27}
Code: f0 ff 4b 38 0f 94 c0 84 c0 0f 84 4b 01 00 00 49 8d 7d 50 e8
RIP <ffffffffa00e3704>{:dm_mirror:mirror_end_io+84} RSP <ffffffff8044a6e8>
CR2: 0000000000000038
<0>Kernel panic - not syncing: Oops
After a reboot and another mount attempt:
Lock_Harness 2.6.9-38.0 (built Aug 8 2005 15:52:12) installed
GFS 2.6.9-38.0 (built Aug 8 2005 15:52:32) installed
GFS: Trying to join cluster "lock_nolock", ""
Lock_Nolock 2.6.9-38.0 (built Aug 8 2005 15:52:12) installed
GFS: fsid=dm-5.0: Joined cluster. Now mounting FS...
GFS: fsid=dm-5.0: jid=0: Trying to acquire journal lock...
GFS: fsid=dm-5.0: jid=0: Looking at journal...
GFS: fsid=dm-5.0: jid=0: Acquiring the transaction lock...
GFS: fsid=dm-5.0: jid=0: Replaying journal...
GFS: fsid=dm-5.0: fatal: filesystem consistency error
GFS: fsid=dm-5.0: function = foreach_descriptor
GFS: fsid=dm-5.0: file =
/usr/src/build/600098-x86_64/BUILD/gfs-kernel-2.6.9-38/smp/src/gfs/recovery.c,
line = 382
GFS: fsid=dm-5.0: time = 1123582218
GFS: fsid=dm-5.0: about to withdraw from the cluster
GFS: fsid=dm-5.0: waiting for outstanding I/O
GFS: fsid=dm-5.0: telling LM to withdraw
GFS: fsid=dm-5.0: withdrawn
GFS: fsid=dm-5.0: start = 13122448
mh_magic = 0x01161970
mh_type = 8
mh_generation = 0
mh_format = 800
mh_incarn = 0
lh_flags = 0x00000000
lh_pad = 0
lh_first = 13122384
lh_sequence = 33296
lh_tail = 13122000
lh_last_dump = 13119488
lh_reserved =
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
GFS: fsid=dm-5.0: jid=0: Failed
GFS: fsid=dm-5.0: error recovering journal 0: -5
Aug 9 05:10:17 link-08 kernel: Lock_Harness 2.6.9-38.0 (built Aug 8 2005
15:52:12) installed
Aug 9 05:10:17 link-08 kernel: GFS 2.6.9-38.0 (built Aug 8 2005 15:52:32)
installed
Aug 9 05:10:17 link-08 kernel: GFS: Trying to join cluster "lock_nolock", ""
Aug 9 05:10:17 link-08 kernel: Lock_Nolock 2.6.9-38.0 (built Aug 8 2005
15:52:12) installed
Aug 9 05:10:17 link-08 kernel: GFS: fsid=dm-5.0: Joined cluster. Now mounting FS...
Aug 9 05:10:17 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Trying to acquire
journal lock...
Aug 9 05:10:17 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Looking at journal...
Aug 9 05:10:17 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Acquiring the
transaction lock...
Aug 9 05:10:17 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Replaying journal...
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: fatal: filesystem consistency
error
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: function = foreach_descriptor
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: file =
/usr/src/build/600098-x86_64/BUILD/gfs-kernel-2.6.9-38/smp/src/gfs/recovery.c,
line = 382
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: time = 1123582218
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: about to withdraw from the cluster
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: waiting for outstanding I/O
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: telling LM to withdraw
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: withdrawn
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: start = 13122448
Aug 9 05:10:18 link-08 kernel: mh_magic = 0x01161970
Aug 9 05:10:18 link-08 kernel: mh_type = 8
Aug 9 05:10:18 link-08 kernel: mh_generation = 0
Aug 9 05:10:18 link-08 kernel: mh_format = 800
Aug 9 05:10:18 link-08 kernel: mh_incarn = 0
Aug 9 05:10:18 link-08 kernel: lh_flags = 0x00000000
Aug 9 05:10:18 link-08 kernel: lh_pad = 0
Aug 9 05:10:18 link-08 kernel: lh_first = 13122384
Aug 9 05:10:18 link-08 kernel: lh_sequence = 33296
Aug 9 05:10:18 link-08 kernel: lh_tail = 13122000
Aug 9 05:10:18 link-08 kernel: lh_last_dump = 13119488
Aug 9 05:10:18 link-08 kernel: lh_reserved =
Aug 9 05:10:18 link-08 kernel: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Aug 9 05:10:18 link-08 last message repeated 3 times
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: jid=0: Failed
Aug 9 05:10:18 link-08 kernel: GFS: fsid=dm-5.0: error recovering journal 0: -5
Aug 9 05:10:19 link-08 hald[2847]: Timed out waiting for hotplug event 308.
Rebasing to 309
Aug 9 05:10:32 link-08 hal.hotplug[3642]: timout(10000 ms) waiting for
/block/diapered_dm-5
The error handling in mirror_end_io looks rather harsh to me. How is EWOULDBLOCK handled with read-ahead? Or EOPNOTSUPP? Also, ms->lock appears to be used from both interrupt and non-interrupt context without disabling interrupts. And map_context->ptr can still get referenced after it has been freed, which would be my guess at the cause of this particular crash.

I take that back - I was confused by the not-so-simple flow through the function.

Is the rw == READ test always true? Speculation is the way to the dark side. :)
Keying in on the important Oops statements:
Code: f0 ff 4b 38 0f 94 c0 84 c0 0f 84 4b 01 00 00 49 8d 7d 50 e8
RIP <ffffffffa00e3704>{:dm_mirror:mirror_end_io+84} RSP <ffffffff8044a6e8>
CR2: 0000000000000038
Doing objdump -D on the mirror module:
00000000000026b0 <mirror_end_io>:
<snip>
26f4: e8 64 e5 ff ff callq c5d <__rh_lookup>
26f9: 4c 89 e7 mov %r12,%rdi
26fc: 48 89 c3 mov %rax,%rbx
26ff: e8 00 00 00 00 callq 2704 <mirror_end_io+0x54>
2704: f0 ff 4b 38 lock decl 0x38(%rbx)
Add +84 (+0x54) to the start of mirror_end_io to get 0x2704. Go to offset 0x2704 and
verify the machine code - yup, this is exactly the instruction we are looking for...
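(For anyone repeating the exercise, the disassembly and arithmetic above amount to roughly the following - the module path is illustrative:

	objdump -D /lib/modules/2.6.9-15.ELsmp/kernel/drivers/md/dm-mirror.ko > dm-mirror.dis
	# mirror_end_io starts at 0x26b0 in the listing; the oops reports
	# mirror_end_io+84 (0x54), so the faulting instruction is at
	# 0x26b0 + 0x54 = 0x2704.)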
Translate that to real code to get:
static void rh_dec(struct region_hash *rh, region_t region)
{
	unsigned long flags;
	struct region *reg;
	int should_wake = 0;

	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	if (atomic_dec_and_test(&reg->pending)) {   <--- right here
Could it be our 'reg' variable? One more check... if it is, 'pending' must be at offset
0x38 from reg. Indeed! 'pending' is at offset 0x38 in the region structure (assuming a
64-bit machine will pad 4 bytes between state and hash_list - which it does)!
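(For reference, this matches the layout of struct region in the 2.6.9 dm-raid1.c; the offsets below are a sketch worked out from that definition, assuming region_t is a 64-bit sector_t:

	struct region {
		struct region_hash *rh;		/* 0x00 */
		region_t key;			/* 0x08 */
		int state;			/* 0x10, then 4 bytes of padding */
		struct list_head hash_list;	/* 0x18 */
		struct list_head list;		/* 0x28 */
		atomic_t pending;		/* 0x38  <--- the faulting offset */
		struct bio_list delayed_bios;
	};

On x86 the same layout puts 'pending' at offset 0x20.)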
We can be reasonably certain that our bug is a known race that we had thought of in
theory but had been unable to reproduce...
Could the following patch fix it?
diff -purN linux-2.6.9-12.EL-01/drivers/md/dm-raid1.c linux-2.6.9-12.EL-02/drivers/md/dm-raid1.c
--- linux-2.6.9-12.EL-01/drivers/md/dm-raid1.c	2005-08-03 15:28:40.031213364 -0500
+++ linux-2.6.9-12.EL-02/drivers/md/dm-raid1.c	2005-08-03 15:30:24.220524638 -0500
@@ -378,6 +378,11 @@ static void rh_inc(struct region_hash *r
 	read_lock(&rh->hash_lock);
 	reg = __rh_find(rh, region);
+
+	spin_lock_irq(&rh->region_lock);
+	atomic_inc(&reg->pending);
+	spin_unlock_irq(&rh->region_lock);
+
 	if (reg->state == RH_CLEAN) {
 		rh->log->type->mark_region(rh->log, reg->key);
@@ -387,7 +392,6 @@ static void rh_inc(struct region_hash *r
 		spin_unlock_irq(&rh->region_lock);
 	}
-	atomic_inc(&reg->pending);
 	read_unlock(&rh->hash_lock);
 }
@@ -409,17 +413,17 @@ static void rh_dec(struct region_hash *r
 	reg = __rh_lookup(rh, region);
 	read_unlock(&rh->hash_lock);
+	spin_lock_irqsave(&rh->region_lock, flags);
 	if (atomic_dec_and_test(&reg->pending)) {
-		spin_lock_irqsave(&rh->region_lock, flags);
 		if (reg->state == RH_RECOVERING) {
 			list_add_tail(&reg->list, &rh->quiesced_regions);
 		} else {
 			reg->state = RH_CLEAN;
 			list_add(&reg->list, &rh->clean_regions);
 		}
-		spin_unlock_irqrestore(&rh->region_lock, flags);
 		should_wake = 1;
 	}
+	spin_unlock_irqrestore(&rh->region_lock, flags);
 	if (should_wake)
 		wake();
Reproduced this again while attempting to hit 165242.
This was on an x86 UP machine running a 2.6.9-15.EL kernel compiled without
linux-2.6.9-dm-mirroring.patch. I was running ext3 on a 25M linear volume with 6
iterations of fsstress while doing pvmoves at the same time to simulate mirroring.
This had run around 1.5 - 2 hours before tripping.
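(A load of that shape can be generated with something like the following - a sketch only; device names, sizes, and fsstress arguments are illustrative, not the exact commands that were run:

	lvcreate -L 25M -n lin VG
	mkfs.ext3 /dev/VG/lin
	mount /dev/VG/lin /mnt/lin
	for i in 1 2 3 4 5 6; do
		mkdir -p /mnt/lin/$i
		fsstress -d /mnt/lin/$i -n 10000 -l 0 &
	done
	pvmove -v /dev/sda1	# pvmove builds a temporary dm mirror underneath the LV

pvmove is what exercises the dm-mirror code here even though the LV itself is linear.)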
Unable to handle kernel NULL pointer dereference at virtual address 00000020
printing eip:
e01aa6b2
*pde = 1a1c7067
Oops: 0002 [#1]
Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U)
i2c_dev(U) i2c_core(U) sunrpc(U) dm_mirror(U) button(U) battery(U) ac(U)
uhci_hcd(U) ehci_hcd(U) e1000(U) floppy(U) ext3(U) jbd(U) dm_mod(U)
qla2300(U)qla2xxx(U) scsi_transport_fc(U) sd_mod(U) scsi_mod(U)
CPU: 0
EIP: 0060:[<e01aa6b2>] Not tainted VLI
EFLAGS: 00010286 (2.6.9-prep)
EIP is at rh_dec+0xc/0x124 [dm_mirror]
eax: 00000000 ebx: df50628c ecx: 00000000 edx: 00000000
esi: df506280 edi: d6a499fc ebp: d459eb80 esp: c03e4eb0
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03e4000 task=c0358be0)
Stack: 00000000 df506280 d6a499fc e01abaf2 d67d29fc 00000000 e0060187 d6a49a04
e007e080 e01abacf 00000000 d459eb80 00000400 e0060122 00000000 c016df35
d459eb80 df506280 e01aaef7 db207900 e01aaf3c 00000000 00000000 d459eb80
Call Trace:
[<e01abaf2>] mirror_end_io+0x23/0x28 [dm_mirror]
[<e0060187>] clone_endio+0x65/0xee [dm_mod]
[<e01abacf>] mirror_end_io+0x0/0x28 [dm_mirror]
[<e0060122>] clone_endio+0x0/0xee [dm_mod]
[<c016df35>] bio_endio+0x4f/0x54
[<e01aaef7>] write_callback+0x0/0x49 [dm_mirror]
[<e01aaf3c>] write_callback+0x45/0x49 [dm_mirror]
[<e00653c6>] dec_count+0x43/0x48 [dm_mod]
[<e0065499>] endio+0x39/0x46 [dm_mod]
[<e0065460>] endio+0x0/0x46 [dm_mod]
[<c016df35>] bio_endio+0x4f/0x54
[<c0250dcc>] __end_that_request_first+0xea/0x1ab
[<e0046dcc>] scsi_end_request+0x1b/0x174 [scsi_mod]
[<e0047263>] scsi_io_completion+0x20b/0x417 [scsi_mod]
[<e0042001>] scsi_finish_command+0xad/0xb1 [scsi_mod]
[<e0041f26>] scsi_softirq+0xba/0xc2 [scsi_mod]
[<c012677d>] __do_softirq+0x35/0x79
[<c0109338>] do_softirq+0x46/0x4d
=======================
[<c01088fc>] do_IRQ+0x2b3/0x2bf
[<c030fb6c>] common_interrupt+0x18/0x20
[<c010403b>] default_idle+0x23/0x26
[<c010408c>] cpu_idle+0x1f/0x34
[<c03b86b9>] start_kernel+0x214/0x216
Code: d3 ed a8 20 0f 45 fd 0f 45 ea 89 e9 89 fa 89 f0 e8 b2 fe ff ff 8b 5b 08 eb
d3 5b 5e 5f 5d c3 57 56 53 89 c3 e8 3f f9 ff ff 89 c1 <ff> 48 20 0f 94 c0 84 c0
0f 84 06 01 00 00 9c 5f fa 81 7b 20 3c
<0>Kernel panic - not syncing: Fatal exception in interrupt
Exactly the same spot (except on x86):
drivers/md/dm-raid1.c:400
16a6: 57 push %edi
16a7: 56 push %esi
16a8: 53 push %ebx
16a9: 89 c3 mov %eax,%ebx
drivers/md/dm-raid1.c:406
16ab: e8 3f f9 ff ff call fef <__rh_lookup>
16b0: 89 c1 mov %eax,%ecx
include/asm/atomic.h:135
16b2: ff 48 20 decl 0x20(%eax) <--- right here
0x16a6 + 0xc = 0x16b2
Translate to C code:
static void rh_dec(struct region_hash *rh, region_t region)
{
	unsigned long flags;
	struct region *reg;
	int should_wake = 0;

	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	if (atomic_dec_and_test(&reg->pending)) {   <--- right here
Note that in this case the offset is 0x20. This is because pointers are four bytes on
x86, so the rh pointer and the two list_heads are half the size... We can easily see
that 'pending' is at offset 0x20 from the top of the region structure on x86.
by "6 iterations of fsstress" do you mean 'fsstress -p 6' ? Taking a look at the following code:
######################################################################
static void rh_inc(struct region_hash *rh, region_t region)
{
	struct region *reg;

	read_lock(&rh->hash_lock);
	reg = __rh_find(rh, region);
	if (reg->state == RH_CLEAN) {
		rh->log->type->mark_region(rh->log, reg->key);

		spin_lock_irq(&rh->region_lock);
		reg->state = RH_DIRTY;
		list_del_init(&reg->list);	/* take off the clean list */
		spin_unlock_irq(&rh->region_lock);
	}

	atomic_inc(&reg->pending);
	read_unlock(&rh->hash_lock);
}

static void rh_inc_pending(struct region_hash *rh, struct bio_list *bios)
{
	struct bio *bio;

	for (bio = bios->head; bio; bio = bio->bi_next)
		rh_inc(rh, bio_to_region(rh, bio));
}

static void rh_dec(struct region_hash *rh, region_t region)
{
	unsigned long flags;
	struct region *reg;
	int should_wake = 0;

	read_lock(&rh->hash_lock);
	reg = __rh_lookup(rh, region);
	read_unlock(&rh->hash_lock);

	if (atomic_dec_and_test(&reg->pending)) {
		spin_lock_irqsave(&rh->region_lock, flags);
		if (reg->state == RH_RECOVERING) {
			list_add_tail(&reg->list, &rh->quiesced_regions);
		} else {
			reg->state = RH_CLEAN;
			list_add(&reg->list, &rh->clean_regions);
		}
		spin_unlock_irqrestore(&rh->region_lock, flags);
		should_wake = 1;
	}

	if (should_wake)
		wake();
}
######################################################################
It would be possible for the process to be in rh_inc, just after failing the
(reg->state == RH_CLEAN) test, and then be interrupted by rh_dec. rh_dec (if there is
only one write outstanding) would place the region on the clean list. Process control
then returns to rh_inc, which increments pending; but the region is still on the clean
list. The write (which caused the rh_inc) proceeds, but rh_update_states removes the
region because it is on the clean list. When the new write completes, the region lookup
returns NULL - causing the Oops.

This is an _extremely_ small window of opportunity. Could it be this?
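(To make that window concrete, one possible interleaving of the two functions above - an illustration only, step numbers added:

1. rh_inc() (process context): reg = __rh_find(rh, region);
2. rh_inc(): the reg->state == RH_CLEAN test fails (the region is already RH_DIRTY),
   so the mark/list handling is skipped.
3. Interrupt: the previous write completes and rh_dec() runs;
   atomic_dec_and_test(&reg->pending) drops the count to zero.
4. rh_dec(): reg->state = RH_CLEAN; list_add(&reg->list, &rh->clean_regions);
5. Back in rh_inc(): atomic_inc(&reg->pending) - the region now has I/O pending
   again, but it is still sitting on the clean list.
6. rh_update_states() later reclaims the region off the clean list; when the write
   from step 5 completes, __rh_lookup() returns NULL and rh_dec() dereferences
   pending at offset 0x38/0x20 - the Oops above.)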
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-514.html
Description of problem:
I created an ext3 filesystem on top of a mirror and then ran a fsstress/genesis/accordion
I/O load while at the same time doing pvmoves from pvs in the VG to others in the VG
which had been extended later.

Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP:
<ffffffffa00b7f71>{:dm_mirror:mirror_end_io+67}
PML4 1ab77067 PGD 3ebc9067 PMD 0
Oops: 0002 [1] SMP
CPU 1
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds
yenta_socket pcmcia_core button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot
dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Pid: 0, comm: swapper Tainted: G M 2.6.9-11.28.ELsmp
RIP: 0010:[<ffffffffa00b7f71>] <ffffffffa00b7f71>{:dm_mirror:mirror_end_io+67}
RSP: 0018:0000010037e0bd98 EFLAGS: 00010216
RAX: 0000000000000000 RBX: 0000010037e3dc30 RCX: ffffff00000545e0
RDX: ffffff00000545e0 RSI: 0000000000000a81 RDI: 0000010037e3dc30
RBP: 0000010037e3dc00 R08: ffffff0000054000 R09: 00000000000001ff
R10: 000001003f1c0800 R11: ffffffffa006f0c2 R12: 0000000000000000
R13: 0000000000000a81 R14: 0000000000000000 R15: 0000000000001000
FS: 0000002a95576b00(0000) GS:ffffffff804cf400(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 000000003ff38000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo 000001003ff34000, task 000001003ff0b030)
Stack: ffffffffa00b7f2e 0000010023a1c118 0000000000000000 000001001d7d7be0
       000001002a3b2680 ffffffffa006f122 000001002a3b2680 0000010037e3dc00
       0000010019ee5250 000001001f2d5100
Call Trace:<IRQ> <ffffffffa00b7f2e>{:dm_mirror:mirror_end_io+0}
       <ffffffffa006f122>{:dm_mod:clone_endio+96}
       <ffffffffa00b6ede>{:dm_mirror:write_callback+70}
       <ffffffffa0073dbf>{:dm_mod:endio+184}
       <ffffffffa006f18f>{:dm_mod:clone_endio+205}
       <ffffffff80247532>{__end_that_request_first+238}
       <ffffffffa0006d7d>{:scsi_mod:scsi_end_request+40}
       <ffffffffa0007092>{:scsi_mod:scsi_io_completion+497}
       <ffffffffa0002d21>{:scsi_mod:scsi_softirq+213}
       <ffffffff8013b284>{__do_softirq+88}
       <ffffffff8013b32d>{do_softirq+49} <ffffffff80112f7b>{do_IRQ+328}
       <ffffffff8011061f>{ret_from_intr+0} <EOI>
       <ffffffff8010e5c9>{default_idle+0} <ffffffff8010e5e9>{default_idle+32}
       <ffffffff8010e65c>{cpu_idle+26}
Code: f0 41 ff 4c 24 38 0f 94 c0 84 c0 74 76 48 8d 5d 50 48 89 df
RIP <ffffffffa00b7f71>{:dm_mirror:mirror_end_io+67} RSP <0000010037e0bd98>
CR2: 0000000000000038
<0>Kernel panic - not syncing: Oops

[root@link-02 ~]# pvscan
PV /dev/hda6 VG VolGroup00 lvm2 [17.53 GB / 64.00 MB free]
PV /dev/sda1 VG VG lvm2 [135.66 GB / 110.66 GB free]
PV /dev/sdb1 VG VG lvm2 [135.66 GB / 110.66 GB free]
PV /dev/sdc1 VG VG lvm2 [135.66 GB / 135.65 GB free]
PV /dev/sdd1 VG VG lvm2 [135.66 GB / 135.66 GB free]
PV /dev/sdf1 VG VG lvm2 [135.66 GB / 135.66 GB free]
PV /dev/sdg1 lvm2 [135.66 GB]

[root@link-02 ~]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
Found volume group "VG" using metadata type lvm2

[root@link-02 ~]# lvscan
ACTIVE '/dev/VolGroup00/LogVol00' [15.53 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
ACTIVE '/dev/VG/mirror' [25.00 GB] inherit
ACTIVE '/dev/VG/mirror_mlog' [4.00 MB] inherit
ACTIVE '/dev/VG/mirror_mimage_0' [25.00 GB] inherit
ACTIVE '/dev/VG/mirror_mimage_1' [25.00 GB] inherit

*Note* There were I/O errors on /dev/sde last night; however, that device was not being
used at all in this scenario, and the errors occurred well before this problem happened.

Version-Release number of selected component (if applicable):
[root@link-02 ~]# lvcreate --version
  LVM version: 2.01.13 (2005-07-13)
  Library version: 1.01.03 (2005-06-13)
  Driver version: 4.4.0