Bug 123440
| Summary: | syslog flooded with 'raid5: switching cache buffer size' messages | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Sean E. Millichamp <sean> |
| Component: | kernel | Assignee: | Heinz Mauelshagen <heinzm> |
| Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | petrides, riel, smithj4, tao |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-09-12 15:38:38 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Sean E. Millichamp
2004-05-18 16:05:23 UTC
I created a snapshot last night on the volume and a flood of the messages began appearing again, so something about the LVM snapshots definitely seems to be triggering a problem with raid5. I realized I didn't state the kernel version before, but I'm current with up2date, so: kernel-smp-2.4.21-15.EL i686.

I just got a kernel panic that caused the volume to become unusable:

```
May 19 10:31:28 xmco2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000008
May 19 10:31:28 xmco2 kernel: printing eip:
May 19 10:31:28 xmco2 kernel: f886e180
May 19 10:31:28 xmco2 kernel: *pde = 04581001
May 19 10:31:28 xmco2 kernel: *pte = 00000000
May 19 10:31:28 xmco2 kernel: Oops: 0000
May 19 10:31:28 xmco2 kernel: parport_pc lp parport autofs e1000 ip_conntrack_ftp iptable_mangle ipt_LOG ipt_state iptable_filter iptable_nat ip_conntrack ip_tables floppy sg microcode loo
May 19 10:31:28 xmco2 kernel: CPU:    1
May 19 10:31:28 xmco2 kernel: EIP:    0060:[<f886e180>]    Not tainted
May 19 10:31:28 xmco2 kernel: EFLAGS: 00010213
May 19 10:31:28 xmco2 kernel: EIP is at lvm_find_exception_table [lvm-mod] 0x50 (2.4.21-15.ELsmp/i686)
May 19 10:31:28 xmco2 kernel: eax: 00000000 ebx: f90d9038 ecx: 00000000 edx: 0001ffff
May 19 10:31:28 xmco2 kernel: esi: 1e948280 edi: 00000018 ebp: 00000902 esp: f7f33e20
May 19 10:31:28 xmco2 kernel: ds: 0068 es: 0068 ss: 0068
May 19 10:31:28 xmco2 kernel: Process kupdated (pid: 10, stackpage=f7f33000)
May 19 10:31:28 xmco2 kernel: Stack: 0001ffff c04efc70 f35cf000 00000000 00000000 f7f33ea4 f886d1fe 00000902
May 19 10:31:28 xmco2 kernel:        1e948280 f35cf000 1e948280 00000001 f35cf170 f35cf000 f7f05200 00000001
May 19 10:31:28 xmco2 kernel:        f88695f8 f7f33ea4 f7f33e9c 1e948280 f35cf000 f7fad000 f7f32000 00000000
May 19 10:31:28 xmco2 kernel: Call Trace: [<f886d1fe>] lvm_snapshot_remap_block [lvm-mod] 0x7e (0xf7f33e38)
May 19 10:31:28 xmco2 kernel: [<f88695f8>] lvm_map [lvm-mod] 0x1a8 (0xf7f33e60)
May 19 10:31:28 xmco2 kernel: [<c0133d1b>] update_process_times_statistical [kernel] 0x7b (0xf7f33e84)
May 19 10:31:28 xmco2 kernel: [<f8869a57>] lvm_make_request_fn [lvm-mod] 0x17 (0xf7f33eb8)
May 19 10:31:28 xmco2 kernel: [<c01cd22a>] generic_make_request [kernel] 0xea (0xf7f33ec4)
May 19 10:31:28 xmco2 kernel: [<c01cd2c9>] submit_bh_rsector [kernel] 0x49 (0xf7f33eec)
May 19 10:31:28 xmco2 kernel: [<c0161dfb>] write_locked_buffers [kernel] 0x3b (0xf7f33f08)
May 19 10:31:28 xmco2 kernel: [<c0161f70>] write_some_buffers [kernel] 0x160 (0xf7f33f1c)
May 19 10:31:28 xmco2 kernel: [<c016611d>] sync_old_buffers [kernel] 0x7d (0xf7f33fc0)
May 19 10:31:28 xmco2 kernel: [<c01664c6>] kupdate [kernel] 0xe6 (0xf7f33fd4)
May 19 10:31:28 xmco2 kernel: [<c01663e0>] kupdate [kernel] 0x0 (0xf7f33fe4)
May 19 10:31:28 xmco2 kernel: [<c010958d>] kernel_thread_helper [kernel] 0x5 (0xf7f33ff0)
May 19 10:31:28 xmco2 kernel: Code: 39 71 08 74 0f 8b 09 47 39 d9 75 f4 83 c4 08 5b 5e 5f 5d c3
May 19 10:31:28 xmco2 kernel: Kernel panic: Fatal exception
```

It appeared within a number of the previously stated messages.

The overhead of switching MD RAID5 cache buffers is a known issue that we can't avoid in that combination, because of the restriction to map 512-byte sectors in the lvm driver.

Okay... if the overhead is just a fact of life, I suppose I can deal with that if I have to. Any idea whether some future fix/workaround is planned that I might see in (say) RHEL4+, or is that just a permanent fact of life? It would be nice if that restriction/warning had been mentioned in either the installer or the RHEL documentation. The installer lets you build LVM on MD devices, and I saw no warning in the docs or the installer saying that bad things could happen with LVM on MD RAID-5. But any idea about the kernel oops? It seems related, and that is far more serious than a performance hit.

Fix will be in RHEL4+ with device-mapper/lvm2 replacing lvm1.
Wasn't able to reproduce the oops here yet.

Does comment #5 mean that there will be no fix for this constant cache buffer switching in RHEL3, then?

I am seeing this on a server, and it doesn't seem to have been caused by LVM snapshots; I believe it was triggered only by some heavy I/O load.

Closing, because this is not reproducible.