166544 – 2.6.9-16.ELsmp null pointer dereference in __bounce_end_io_read on x86_64

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	168430

Reported:	2005-08-23 06:52 UTC by Norm Murray
Modified:	2007-11-30 22:07 UTC
CC List:	1 user
Fixed In Version:	RHSA-2006-0132
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-03-07 19:37:10 UTC
Target Upstream Version:
Embargoed:

Attachments
patch to highmem.c (705 bytes, patch) 2006-01-05 03:32 UTC, Norm Murray	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2006:0132	0	qe-ready	SHIPPED_LIVE	Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3	2006-03-09 16:31:00 UTC

Description Norm Murray 2005-08-23 06:52:38 UTC

Description of problem:
repeatable panic trying to start gimp

Linux up 2.6.9-16.ELsmp #1 SMP Mon Aug 15 20:38:46 EDT 2005 x86_64 x86_64 x86_64
GNU/Linux

No vmcore available from the panic, got the oops via netdump:
[root@amazon-2000 172.16.45.70-2005-08-23-02:52]# cat log
Unable to handle kernel NULL pointer dereference at 0000000000000003 RIP:
<ffffffff801648cb>{__bounce_end_io_read+69}
PML4 122b39067 PGD 122b2e067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: netconsole netdump nfsd exportfs lockd md5 ipv6 parport_pc lp
parport i2c_dev i2c_core sunrpc dm_mod button battery ac ohci_hcd ehci_hcd
snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc
<ffffffff801648cb>{__bounce_end_io_read+69}
RSP: 0018:ffffffff8044a820  EFLAGS: 00010202
RAX: 00000101253a7000 RBX: 000001012285e300 RCX: 0000000000017000
RDX: 0000000000000000 RSI: 000001012fc77400 RDI: 000001012285e380
RBP: 0000000000000000 R08: 000001012603fc00 R09: 0000010004f84938
R10: 00000101253a7c00 R11: 000001012285e380 R12: 0000000000000000
R13: 000001012fc77400 R14: 0000000000000000 R15: 0000000000017000
FS:  0000000000000000(0000) GS:ffffffff804d3300(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000003 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff804d6000, task ffffffff803ca980)
Stack: 0000000000017000 000001012285e380 0000000000000000 0000010126246eb8
       ffffffff801649a2 000000000001e000 ffffffff80249aae <IRQ>
<ffffffff801649a2>{bounce_end_io_read_isa+25}
<ffffffff80249aae>{__end_that_request_first+238}
       <ffffffffa0006d7d>{:scsi_mod:scsi_end_request+40}
<ffffffffa0007092>{:scsi_mod:scsi_io_completion+497}
       <ffffffffa0002d21>{:scsi_mod:scsi_softirq+213}
<ffffffff8013b724>{__do_softirq+88}
       <ffffffff8013b7cd>{do_softirq+49} <ffffffff80112f77>{do_IRQ+328}
       <ffffffff8011061b>{ret_from_intr+0}  <EOI> <ffffffff8010e6cc>{mwait_idle+86}
       <ffffffff8010e65c>{cpu_idle+26} <ffffffff804d967b>{start_kernel+470}
       <ffffffff804d91d5>{_sinittext+469}
 
Code: 48 0f b6 42 03 49 b8 b7 6d db b6 6d db b6 6d 48 bf 00 00 00
RIP <ffffffff801648cb>{__bounce_end_io_read+69} RSP <ffffffff8044a820>
CR2: 0000000000000003


Given that this is x86_64 with no highmem - I'm not even sure why it would be in
this function
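
For reference, the first instruction in the Code: line of the oops can be decoded by hand, and it lines up with the register dump. The sketch below is only an illustration in Python, not part of any kernel tooling: the helper name decode_movzx_disp8 is made up here, and it handles just this one encoding (REX.W prefix, opcode 0F B6, a ModRM byte with an 8-bit displacement and no SIB byte). The instruction bytes and the RDX value are taken from the oops.

```python
# Hypothetical helper: decodes only "REX.W 0F B6 /r" with a [base + disp8] operand.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_movzx_disp8(code: bytes):
    """Return (dest_reg, base_reg, displacement) for a 5-byte MOVZX r64, byte [base+disp8]."""
    rex, op1, op2, modrm, disp = code[:5]
    if rex != 0x48:
        raise ValueError("expected REX.W prefix with no register-extension bits")
    if (op1, op2) != (0x0F, 0xB6):
        raise ValueError("expected MOVZX r64, r/m8 (0F B6)")
    mod = modrm >> 6          # addressing mode
    reg = (modrm >> 3) & 0x7  # destination register
    rm = modrm & 0x7          # base register
    if mod != 0b01 or rm == 0b100:
        raise ValueError("expected [base + disp8] without a SIB byte")
    if disp >= 0x80:          # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


# First five bytes of the "Code:" line in the oops.
dest, base, disp = decode_movzx_disp8(bytes.fromhex("48 0f b6 42 03"))

# RDX as shown in the register dump.
oops_regs = {"rdx": 0x0000000000000000}
fault_address = oops_regs[base] + disp

# AT&T syntax for the decoded instruction: movzbq 0x3(%rdx),%rax
print(f"movzbq {disp:#x}(%{base}),%{dest} -> fault at {fault_address:#018x}")
```

The decoded instruction is a one-byte load from offset 3 off RDX, and RDX is zero, which matches the CR2 value of 0000000000000003. The decode alone does not say which pointer in the C source was NULL.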

Comment 2 Jason Baron 2005-08-23 15:40:15 UTC

The bounce code can get called even on a 64-bit system if the device can't DMA
to the requested address.
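
As a rough illustration of that point, the decision comes down to comparing the page frame number of the buffer with the highest page frame the device can address. This is a simplified Python model, not the kernel code: needs_bounce, the mask constants, and the example addresses are all assumptions made for the sketch.

```python
PAGE_SHIFT = 12  # 4 KiB pages on x86_64

# Illustrative DMA masks (assumed values for the sketch).
ISA_DMA_MASK = (1 << 24) - 1     # device can only reach the first 16 MiB
DMA_32BIT_MASK = (1 << 32) - 1   # device can reach the first 4 GiB
DMA_64BIT_MASK = (1 << 64) - 1   # device can reach all of memory


def needs_bounce(phys_addr: int, dma_mask: int) -> bool:
    """True if the page holding phys_addr lies above the last page the device can address."""
    bounce_pfn = dma_mask >> PAGE_SHIFT   # highest page frame the device can reach
    page_pfn = phys_addr >> PAGE_SHIFT    # page frame of the buffer
    return page_pfn > bounce_pfn


GIB = 1 << 30
MIB = 1 << 20

# A buffer at 5 GiB physical is out of reach for a 32-bit device, highmem or not.
print(needs_bounce(5 * GIB, DMA_32BIT_MASK))   # True
print(needs_bounce(5 * GIB, DMA_64BIT_MASK))   # False
print(needs_bounce(1 * MIB, ISA_DMA_MASK))     # False
```

The model has no notion of highmem at all, which is the point: the check depends only on where the page is and what the device can reach.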

Comment 3 Norm Murray 2005-08-24 07:18:26 UTC

OK... so doing a bounce isn't necessarily odd... except that on x86_64 we can
only bounce through the isa_dma_pool, because without highmem we don't
initialize any other bounce pool.
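
To make that constraint concrete, here is a toy Python model of which pool a bounce could draw from. It only restates the reasoning in this comment and is not taken from the kernel source: the function bounce_pool_for and the name general_bounce_pool are hypothetical, and isa_dma_pool is used as a plain label.

```python
from typing import Optional


def bounce_pool_for(queue_is_isa_limited: bool, has_highmem: bool) -> Optional[str]:
    """Name of the pool a bounce would use, or None if no suitable pool was set up."""
    if queue_is_isa_limited:
        # Queues restricted to the ISA DMA range bounce through the ISA pool.
        return "isa_dma_pool"
    # The general-purpose pool is assumed to exist only on systems with highmem.
    return "general_bounce_pool" if has_highmem else None


# x86_64 has no highmem, so the ISA pool is the only one that can be returned.
print(bounce_pool_for(queue_is_isa_limited=True, has_highmem=False))   # isa_dma_pool
print(bounce_pool_for(queue_is_isa_limited=False, has_highmem=False))  # None
# On a 32-bit system with highmem, both pools can be returned.
print(bounce_pool_for(queue_is_isa_limited=False, has_highmem=True))   # general_bounce_pool
```

Under these assumptions, any bounce that actually happens on this machine has to come from a queue marked as ISA-limited, which fits the bounce_end_io_read_isa frame in the trace.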

Not directly related to the panic is a side issue: on this brand new hardware,
why am I limited to the ISA range for DMA? In part, we don't create a lowmem DMA
pool because there is no highmem region on the system. Another reason would seem
to be that the SCSI (SATA) device is not being detected as able to DMA to
the entire 4 GB of RAM on the system. That could be bad support for the device, a
bad BIOS, or bad detection code.

Comment 4 Norm Murray 2005-08-26 06:32:12 UTC

As I continue to dig... 

Reproduction simply takes cat'ing a file on the LVM array set up in this system,
while access to the disk which is not part of the LVM array doesn't seem to have
any issues.

Haven't tracked down how yet, but it appears that IO from the LVM array is not
being flagged as capable of DMA to all of memory, even though the underlying
device can - and with no highmem zone, the DMA must go through the ISA DMA region.

Comment 5 Norm Murray 2005-09-06 00:55:30 UTC

So... took lvm apart and used the bare drives... individually, they all work
fine. So there seems to be something in the lvm path that is requiring the
bounce IO, and having problems therein, which is not required by the bare devices.

Comment 6 Norm Murray 2006-01-05 03:32:03 UTC

Created attachment 122807
patch to highmem.c

Patch from upstream via Dell in IT 85468

Comment 13 Red Hat Bugzilla 2006-03-07 19:37:10 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
