Bug 298811

Summary: pci_alloc_consistent() for 64k on 16gig machine -> return value is not multiple of 64k
Product: Red Hat Enterprise Linux 4
Reporter: William Reich <reich>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: high
Version: 4.4
CC: agospoda, emcnabb, jbaron, jplans, nhorman, nstrug, riek, tao, wcohen
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-05-18 19:23:00 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 461297    
Attachments:
Description                                     Flags
pci_alloc_consistent.stp                        none
list of probeable functions in my kernel        none
good run of stap - memory limited to 3 gig      none
bad run - 16 gig of memory used by the machine  none
sysreport for 16gig rh4.4 machine 64bit         none
high address but no IOMMU - first pic           none
high address but no IOMMU - #2                  none
possible-rhel4-gart-fix.patch                   none
Module to reproduce issue                       none
Makefile for prarit.c                           none
new prarit.c module                             none
RHEL4 fix for this issue                        none
Upstream patch that fixes an overflow bug       none
Upstream patch that fixes alignment bug         none
Upstream patch that fixes this issue            none
RHEL4 fix for this issue                        none

Description William Reich 2007-09-20 18:33:20 UTC
Description of problem:


Version-Release number of selected component (if applicable):
This issue reproduces on RHEL 4 Update 4 (64-bit) and
RHEL 4 Update 5 (64-bit).

On a machine with 16 gig of memory, a pci_alloc_consistent() call
with a size of 64k returns an address that is a multiple of 16k
but not of 64k. We expect the returned address to be a multiple
of 64k.

When we limit the amount of memory available to the kernel
by adding the kernel switch mem=4096M to the kernel boot line, 
then the pci_alloc_consistent()
call DOES return a value which is a multiple of 64k.

By adjusting the "mem=" kernel switch, we found that any value
<= 4096M gives us a multiple of 64k, which is what we want.
With any value > 4096M on the kernel line, the pci_alloc_consistent()
call returns a value that is only a multiple of 16k, which is a
problem for our driver.

Our PCI device needs the full 64k block.

In our code, we call pci_set_dma_mask() and
pci_set_consistent_dma_mask(), each with a mask value corresponding
to "32 bit dma".

We then call pci_alloc_consistent() with a size of 64k.

We would like to understand why we are not
getting a return value from pci_alloc_consistent() that is
a multiple of 64k when using more than 4096M of memory.

Comment 2 Prarit Bhargava 2007-09-21 14:01:08 UTC
William, are you seeing a driver load failure or a driver operation failure?

In either case, could you please attach:

a) /proc/cpuinfo and /proc/meminfo
b) lspci -xxx -vv
c) a log of the failure

Thanks,

P.

Comment 3 Prarit Bhargava 2007-09-21 14:06:27 UTC
William,

I was just speaking with my colleague jbaron (cc'd on this BZ) who mentioned
that there have been some allocation related patches in this area of code.

Could you please grab the 59.EL5 (or newer) kernel from
http://people.redhat.com/~jbaron/rhel4/ and retest?

Thanks,

P.

Comment 4 William Reich 2007-09-21 14:48:14 UTC
What is this?
" 59.EL5 (or newer) kernel from
http://people.redhat.com/~jbaron/rhel4/ and retest? "

Is this an update to Red Hat 4,
or is this a version of Red Hat 5?
When we did an "up to date" on 9/19, we received a 2.6.55 kernel for RH 4.

I will retrieve the log items you requested in comment #2.
Please note that item (c) in this list comes from my own code.
It is a simple "if" statement:
if the address received is not a multiple of 64k, it complains.

I would classify the error as a "driver operation error"
because the error occurs in the initialization logic of the driver.

Comment 5 William Reich 2007-09-21 18:24:41 UTC
Here is the data you asked for.
Also included is the /boot/grub/grub.conf file.
This is the RH 4 Update 4 64-bit box.

+++++++++++++++++++++++++++++

00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0
        Capabilities: <available only to root>
00: de 10 5e 00 06 01 b0 00 a4 00 80 05 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00

00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev b1)
        Subsystem: nVidia Corporation: Unknown device cb84
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0
00: de 10 51 00 0f 00 a0 00 b1 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 de 10 84 cb
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 00 00

00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
(prog-if 10 [OHCI])
        Subsystem: Hewlett-Packard Company: Unknown device 31f8
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin A routed to IRQ 177
        Region 0: Memory at f39e0000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <available only to root>
00: de 10 5a 00 07 00 b0 00 a2 10 03 0c 00 00 80 00
10: 00 00 9e f3 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 f8 31
30: 00 00 00 00 44 00 00 00 00 00 00 00 05 01 03 01

00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a4)
(prog-if 20 [EHCI])
        Subsystem: Hewlett-Packard Company: Unknown device 31f8
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin B routed to IRQ 185
        Region 0: Memory at f39d0000 (32-bit, non-prefetchable) [size=256]
        Capabilities: <available only to root>
00: de 10 5b 00 06 00 b0 00 a4 20 03 0c 00 00 80 00
10: 00 00 9d f3 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 f8 31
30: 00 00 00 00 44 00 00 00 00 00 00 00 0a 02 03 01

00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a3) (prog-if 8a [Master
SecP PriP])
        Subsystem: Hewlett-Packard Company: Unknown device 31f8
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Region 4: I/O ports at 1000 [size=16]
        Capabilities: <available only to root>
00: de 10 53 00 05 00 b0 00 a3 8a 01 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 10 00 00 00 00 00 00 00 00 00 00 3c 10 f8 31
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01

00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2) (prog-if 01
[Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=120
        I/O behind bridge: 00002000-00003fff
        Memory behind bridge: f3a00000-f3bfffff
        Prefetchable memory behind bridge: e8000000-efffffff
        Secondary status: 66Mhz- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00: de 10 5c 00 07 01 a0 00 a2 01 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 78 20 30 80 22
20: a0 f3 b0 f3 00 e8 f0 ef 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 0b 02

00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=00, secondary=08, subordinate=0a, sec-latency=0
        I/O behind bridge: 00004000-00004fff
        Memory behind bridge: f3c00000-f3dfffff
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 08 0a 00 41 41 00 20
20: c0 f3 d0 f3 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=00, secondary=05, subordinate=07, sec-latency=0
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 05 07 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: ff ff 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=00, secondary=02, subordinate=04, sec-latency=0
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 02 04 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: ff ff 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 00 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM
Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 02 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 03 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00

00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 00 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00

00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM
Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 02 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 03 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00

00:1a.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 00 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00

00:1a.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:1a.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM
Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 02 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:1a.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 03 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00

00:1b.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 00 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00

00:1b.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:1b.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM
Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00: 22 10 02 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:1b.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Capabilities: <available only to root>
00: 22 10 03 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00

01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) (prog-if
00 [VGA])
        Subsystem: Hewlett-Packard Company: Unknown device 31fb
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping+ SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 127 (2000ns min), Cache Line Size 10
        Interrupt: pin A routed to IRQ 193
        Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M]
        Region 1: I/O ports at 3000 [size=256]
        Region 2: Memory at f3bf0000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: <available only to root>
00: 02 10 5e 51 87 01 90 02 02 00 00 03 10 7f 00 00
10: 08 00 00 e8 01 30 00 00 00 00 bf f3 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 fb 31
30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 08 00

01:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out
Controller (rev 03)
        Subsystem: Hewlett-Packard Company: Unknown device 3305
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 201
        Region 0: I/O ports at 2800 [size=256]
        Region 1: Memory at f3be0000 (32-bit, non-prefetchable) [size=512]
        Capabilities: <available only to root>
00: 11 0e 03 b2 03 01 90 02 03 00 80 08 00 00 80 00
10: 01 28 00 00 00 00 be f3 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 05 33
30: 00 00 00 00 f0 00 00 00 00 00 00 00 0b 01 00 00

01:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out 
Processor (rev 03)
        Subsystem: Hewlett-Packard Company: Unknown device 3305
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 120, Cache Line Size 10
        Interrupt: pin B routed to IRQ 209
        Region 0: I/O ports at 3400 [size=256]
        Region 1: Memory at f3bd0000 (32-bit, non-prefetchable) [size=2K]
        Region 2: Memory at f3bc0000 (32-bit, non-prefetchable) [size=8K]
        Region 3: Memory at f3b00000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: <available only to root>
00: 11 0e 04 b2 17 01 90 02 03 00 80 08 10 78 80 00
10: 01 34 00 00 00 00 bd f3 00 00 bc f3 00 00 b0 f3
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 05 33
30: 00 00 00 00 f0 00 00 00 00 00 00 00 0a 02 00 00

01:04.4 USB Controller: Hewlett-Packard Company: Unknown device 3300 (prog-if 00
[UHCI])
        Subsystem: Hewlett-Packard Company: Unknown device 3305
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 120
        Interrupt: pin B routed to IRQ 209
        Region 4: I/O ports at 3800 [size=32]
        Capabilities: <available only to root>
00: 3c 10 00 33 45 01 90 02 00 00 03 0c 00 78 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 38 00 00 00 00 00 00 00 00 00 00 3c 10 05 33
30: 00 00 00 00 f0 00 00 00 00 00 00 00 0a 02 00 00

01:04.6 Class 0c07: Hewlett-Packard Company: Unknown device 3302 (prog-if 01)
        Subsystem: Hewlett-Packard Company: Unknown device 3305
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 201
        Region 0: Memory at f3af0000 (32-bit, non-prefetchable) [size=256]
        Capabilities: <available only to root>
00: 3c 10 02 33 02 00 90 02 00 01 07 0c 00 00 80 00
10: 00 00 af f3 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 05 33
30: 00 00 00 00 f0 00 00 00 00 00 00 00 0b 01 00 00

08:00.0 RAID bus controller: Hewlett-Packard Company Hewlett-Packard Smart Array
Controller (rev 03)
        Subsystem: Hewlett-Packard Company: Unknown device 3234
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Interrupt: pin A routed to IRQ 201
        Region 0: Memory at f3d00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 4000 [size=256]
        Region 3: Memory at f3cf0000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: <available only to root>
00: 3c 10 30 32 47 04 10 00 03 00 04 01 10 00 00 00
10: 04 00 d0 f3 00 00 00 00 01 40 00 00 04 00 cf f3
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 34 32
30: 00 00 00 00 b0 00 00 00 00 00 00 00 0b 01 00 00

40:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0
        Capabilities: <available only to root>
00: de 10 5e 00 06 01 b0 00 a4 00 80 05 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 ff 00 00 00

40:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev b1)
        Subsystem: nVidia Corporation: Unknown device cb84
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0
00: de 10 d3 00 0f 00 a0 00 b1 00 80 05 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 de 10 84 cb
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 00 00

40:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=40, secondary=4f, subordinate=51, sec-latency=0
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 40 4f 51 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: ff ff 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

40:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=40, secondary=4c, subordinate=4e, sec-latency=0
        I/O behind bridge: 00005000-00005fff
        Memory behind bridge: fdf00000-fdffffff
        Prefetchable memory behind bridge: 00000000f3e00000-00000000f3e00000
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 40 4c 4e 00 51 51 00 20
20: f0 fd f0 fd e1 f3 e1 f3 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

40:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=40, secondary=49, subordinate=4b, sec-latency=0
        Memory behind bridge: f8000000-fbffffff
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 40 49 4b 00 f1 01 00 20
20: 00 f8 f0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

40:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00
[Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=40, secondary=46, subordinate=48, sec-latency=0
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: de 10 5d 00 47 01 10 00 a3 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 40 46 48 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: ff ff 00 00 40 00 00 00 00 00 00 00 ff 00 03 00

49:00.0 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI
Bridge (A-Segment Bridge) (rev 09) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=49, secondary=4a, subordinate=4a, sec-latency=120
        Memory behind bridge: f8000000-f9ffffff
        Secondary status: 66Mhz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: 86 80 40 03 47 01 10 00 09 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 49 4a 4a 78 f0 00 a0 22
20: 00 f8 f0 f9 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 00

49:00.2 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI
Bridge (B-Segment Bridge) (rev 09) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Bus: primary=49, secondary=4b, subordinate=4b, sec-latency=120
        Memory behind bridge: fa000000-fbffffff
        Secondary status: 66Mhz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: 86 80 41 03 47 01 10 00 09 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 49 4b 4b 78 f0 00 a0 22
20: 00 fa f0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 00

4a:04.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit
Ethernet (rev 02)
        Subsystem: Hewlett-Packard Company: Unknown device 3070
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 120 (16000ns min), Cache Line Size 10
        Interrupt: pin A routed to IRQ 82
        Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: <available only to root>
00: e4 14 4a 16 56 01 b0 02 02 00 00 02 10 78 00 00
10: 04 00 00 f8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 70 30
30: 00 00 00 00 40 00 00 00 00 00 00 00 05 01 40 00

4b:05.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
        Subsystem: Hewlett-Packard Company: Unknown device 3070
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 120 (16000ns min), Cache Line Size 10
        Interrupt: pin A routed to IRQ 217
        Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: <available only to root>
00: e4 14 4a 16 56 01 b0 02 02 00 00 02 10 78 00 00
10: 04 00 00 fa 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 70 30
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 40 00

4c:00.0 PCI bridge: PLX Technology, Inc. PEX 8111 PCI Express-to-PCI Bridge (rev 21) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Region 0: Memory at f3ef0000 (64-bit, prefetchable) [size=64K]
        Bus: primary=4c, secondary=4d, subordinate=4d, sec-latency=127
        I/O behind bridge: 00005000-00005fff
        Memory behind bridge: fdf00000-fdffffff
        Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: <available only to root>
00: b5 10 11 81 47 01 10 00 21 00 04 06 10 00 01 00
10: 0c 00 ef f3 00 00 00 00 4c 4d 4d 7f 50 50 00 22
20: f0 fd f0 fd f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 03 00

4d:04.0 Communication controller: Ulticom (Formerly DGM&S): Unknown device 0303
        Subsystem: Ulticom (Formerly DGM&S): Unknown device 0303
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 127 (1000ns max), Cache Line Size 10
        Interrupt: pin A routed to IRQ 217
        Region 0: Memory at fdff0000 (32-bit, non-prefetchable) [size=512]
        Region 1: I/O ports at 5000 [size=256]
        Region 2: Memory at fdfe0000 (32-bit, non-prefetchable) [size=16K]
        Region 3: Memory at fdfd0000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <available only to root>
00: d4 12 03 03 57 01 b0 02 00 00 80 07 10 7f 00 00
10: 00 00 ff fd 01 50 00 00 00 00 fe fd 00 00 fd fd
20: 00 00 00 00 00 00 00 00 00 00 00 00 d4 12 03 03
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 04


MemTotal:     14328536 kB
MemFree:      13328108 kB
Buffers:         30856 kB
Cached:         845120 kB
SwapCached:          0 kB
Active:         295816 kB
Inactive:       628284 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     14328536 kB
LowFree:      13328108 kB
SwapTotal:     2096440 kB
SwapFree:      2096440 kB
Dirty:             336 kB
Writeback:           0 kB
Mapped:          56208 kB
Slab:            42936 kB
CommitLimit:   9260708 kB
Committed_AS:   122044 kB
PageTables:       2104 kB
VmallocTotal: 536870911 kB
VmallocUsed:    263468 kB
VmallocChunk: 536607223 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB


processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 2
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 4
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 5
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 6
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 2
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 7
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 1004.727
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 2009.30
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]


# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/cciss/c0d0p3
#          initrd /initrd-version.img
#boot=/dev/cciss/c0d0
default=saved
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux AS (2.6.9-42.ELsmp)
        savedefault
        root (hd0,0)
        kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/ mem=16384M
# selinux=0 acpi=off mem=16384M
        initrd /initrd-2.6.9-42.ELsmp.img
title Red Hat Enterprise Linux AS-up (2.6.9-42.EL)
        savedefault
        root (hd0,0)
        kernel /vmlinuz-2.6.9-42.EL ro root=LABEL=/ selinux=0 acpi=off noacpi
        initrd /initrd-2.6.9-42.EL.img

Comment 6 Prarit Bhargava 2007-09-24 13:01:57 UTC
(In reply to comment #4)
> what is this ?
> " 59.EL5 (or newer) kernel from

Oops.  Sorry for the typo -- that should have read 59.EL4.  It is the latest
RHEL4 codebase which eventually will become RHEL4.6, RHEL4.7, etc..

> http://people.redhat.com/~jbaron/rhel4/ and retest? "
> 

Thanks,

P.


Comment 7 William Reich 2007-09-25 12:08:02 UTC
We used the "59" kernel as you requested.
The experiment failed.
There was no difference in behavior.
----
To clarify,
when we request a 64k block, we are expecting
that the address of that 64k block to begin at an address 
that is on a 64k boundry.
When we are on  a machine with 16gig of memory, that does not occur.
When we limit the memory of the machine to 4gig, then
we do get what we expect.

Comment 8 William Reich 2007-10-03 20:28:44 UTC
We are going to try RH5 when a machine frees up, because
we are curious.
However, this will not be a long-term fix for us, because our
customers have stated that they wish to remain with RH4.

Comment 9 William Reich 2007-10-03 20:31:29 UTC
To provide more detail:
the chip on the board that we interface with across the PCI bus
demands that the memory address be on a 64k byte boundary, because
the chip uses the lower bits of the address for its own
purposes.


Comment 10 William Reich 2007-10-03 20:42:28 UTC
The only potential workaround that we see
is to allocate a 128k block of memory, and then move
our pointer for the chip to the first 64k boundary within that block.
This should work in theory, since any 128k block of memory must contain
a 64k boundary with a full 64k after it. However, we will have
additional work to do to keep track of kernel pointers vs. chip
pointers. And, of course, we will waste half of the 128k of memory.
(We have not tried this yet since a machine is not available at the moment.)
---
Either way, we are still curious as to why the behavior of the
pci_alloc_consistent() call changes when the amount of memory in the
machine changes.
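A minimal sketch of the offset arithmetic behind the 128k workaround, written as user-space C so it can be checked outside the kernel. It assumes the driver keeps both results of pci_alloc_consistent(): the CPU pointer and the bus address written to *dma_handle. Because the block is contiguous in both address spaces, the same offset is applied to each. The struct and function names are hypothetical, not part of any kernel API; in a real driver the whole 128k block would still be released with pci_free_consistent() using the original, unadjusted addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the aligned region inside one over-allocated block. */
struct aligned_view {
    uint8_t  *cpu;    /* CPU pointer the driver uses            */
    uint64_t  bus;    /* bus address to program into the chip  */
    size_t    offset; /* bytes skipped from the start of block  */
};

/*
 * Given a block of alloc_size bytes (CPU pointer cpu, bus address bus),
 * find the first bus address inside it that is a multiple of align and
 * fill in matching CPU/bus views. Returns 0 if `want` bytes starting at
 * that aligned address do not fit inside the block.
 */
static int align_within(uint8_t *cpu, uint64_t bus, size_t alloc_size,
                        uint64_t align, size_t want,
                        struct aligned_view *out)
{
    uint64_t aligned_bus;

    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

    aligned_bus = (bus + align - 1) & ~(align - 1);   /* round up */
    out->offset = (size_t)(aligned_bus - bus);
    if (out->offset + want > alloc_size)
        return 0;

    out->cpu = cpu + out->offset;  /* same offset in CPU address space */
    out->bus = aligned_bus;
    return 1;
}
```

For example, a 128k block whose bus address is 0x104cda000 yields an aligned bus address of 0x104ce0000 at offset 0x6000, and the driver would use cpu + 0x6000 for its own accesses.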


Comment 11 Prarit Bhargava 2007-10-04 11:17:22 UTC
William, what is the HP model # of this system?

Thanks,

P.

Comment 14 Prarit Bhargava 2007-10-04 11:46:29 UTC
William, 

Just to make sure I have this right:  You want an 64k-aligned region of memory
to be returned from pci_alloc_consistent.

If that is the case, then you should know that pci_alloc_consistent does not
guarantee alignment.  It might be 16k aligned, it might be 64k, etc..

You must use the pci_pool_* functions to re-allocate memory within the memory
region returned by pci_alloc_consistent.  The pci_pool_create function allows
one to set alignment requirements.

P.

Comment 15 William Reich 2007-10-04 11:50:43 UTC
The machine we have with 16gig of memory is a
HP Proliant DL585 G2.
It is big, expensive, and physically heavy.
This is the machine that our customer has chosen for their
deployments.

Comment 16 William Reich 2007-10-04 11:58:18 UTC
"... you should know that pci_alloc_consistent does not
guarantee alignment.  It might be 16k aligned, it might be 64k, etc..."

Where can I find the documentation that supports this statement ?
We have found very little documentation on this API in our research.

It would be very odd if we had simply been "just lucky" in
having this API give us 64k boundaries over the many
machines and many deployments that we have had over the last 6 years.
Granted, we have only attempted to use 16gig machines recently...

It is still strange to us that the behavior of the API changes
with the amount of memory allowed to be used by the kernel.
( We use the same physical box, and simply add the mem=x switch
to the kernel command line in grub.conf . )


Comment 17 William Reich 2007-10-04 12:00:31 UTC
oops - I forgot to answer your question in #14 directly...

Yes, we are expecting the return of pci_alloc_consistent()
to be aligned on a 64k boundary.

Comment 18 William Reich 2007-10-04 12:02:48 UTC
what happened to comments 12 & 13 on this thread ??

Comment 19 William Reich 2007-10-04 12:13:16 UTC
more related to comment 14...

If you are suggesting that we have to create a pool of some type,
then this pool is going to have to be bigger than the 64k bytes that
we need because you are suggesting that the return will not be
64k bounded.
This seems like a lot of work to get one 64k chunk of memory.
It seems that this idea is similar to the idea I put forth in 
comment 10. However using the pool APIs implies even more work.
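A short sketch of how much larger the allocation (or pool) has to be, assuming only that memory from pci_alloc_consistent() is at least page-aligned (4k on x86_64). If the start is aligned to base_align, the distance to the next align boundary is at most align - base_align. With 4k base alignment, a 64k-aligned 64k region needs up to 124k, which the page allocator would typically round up to 128k (the figure in comment 10); with the 16k alignment observed on the 16gig machine it would be 112k. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Worst-case bytes to request so that a `size`-byte region aligned to
 * `align` fits inside the allocation, when the allocator only guarantees
 * `base_align` alignment. Both alignments must be powers of two.
 */
static size_t worst_case_bytes(size_t size, size_t align, size_t base_align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(base_align != 0 && (base_align & (base_align - 1)) == 0);

    /* From a base_align-aligned start, the next align boundary is at
     * most align - base_align bytes away (zero if base_align >= align). */
    return size + (align > base_align ? align - base_align : 0);
}
```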

Comment 20 Prarit Bhargava 2007-10-04 12:23:43 UTC
Hmmm ... DL585G2, I think we have one of these.  Is your BIOS, etc., all up-to-date?

I'm going to go off and run a few tests on the box.

I'll ping you with results.

P.


Comment 21 Prarit Bhargava 2007-10-04 12:27:06 UTC
I misunderstood what you were attempting to do.  I got a bit confused.

You are doing the following:

pci_alloc_consistent(... size=64k, GFP_KERNEL);

and are getting a return of a 16k-byte aligned value, correct?

I thought you were doing the pci_alloc_consistent and then attempting to
re-allocate 16k within that area....

P.

Comment 22 William Reich 2007-10-04 12:39:24 UTC
Yes, the size input to our call of pci_alloc_consistent() is 64k.
( By the way, GFP_KERNEL is not a valid input to this call... )

Comment 23 William Reich 2007-10-04 12:41:43 UTC
Our DL585 machines arrived in our lab within the last 45 days.
I'm told that the BIOS is the newest available - Feb 2007.
( I have not inspected this for myself yet as the machines
are being used by someone else at the moment. )

Comment 24 William Reich 2007-10-04 12:49:56 UTC
bios info...

version A07  Feb 27, 2007



Comment 25 Prarit Bhargava 2007-10-04 13:34:39 UTC
(In reply to comment #22)
> Yes, the size input to our call of pci_alloc_consistent() is 64k.
> ( By the way, GFP_KERNEL is not a valid input to this call... )

Sorry, I was mixing up coherent and consistent :)

I'll take a look at our dl585g2 when it becomes available.  Could you
cut-and-paste the exact pci_alloc_consistent call you're making (with the exact
size value)?

Thanks :)

P.



Comment 26 Andy Gospodarek 2007-10-04 14:30:25 UTC
(In reply to comment #16)
> 
> It is still strange to us that the behavior of the API changes
> with the amount of memory allowed to be used by the kernel.
> ( We use the same physical box, and simply add the mem=x switch
> to the kernel command line in grub.conf . )
> 

It's not completely strange -- at least in my opinion.  I'm going to dig through
and see what I see, but I'm guessing that memory zone mapping will change
drastically as you increase (more than double) the amount of memory you have in
your system.

Comment 27 William Reich 2007-10-04 15:06:17 UTC
( cut & paste of code as requested )

.
 .
  .

/* here we call a wrapper to get the memory */

#define ACTUAL_DMA_SIZE 0x10000 /* 64 k alignment */

        /* first do the board address list */
        if (!(boardConfig->dmaBrdAddrList.virt_memoryp =
                  stream_pci_alloc_consistent((boardConfig->pciDev),
                        ACTUAL_DMA_SIZE,
                        (dma_addr_t *)boardConfig->dmaBrdAddrList.physical_mem)))
        {
                cmn_err(CE_CONT, "%s(%d): pcimb3_pci_alloc_consistent() failed\n",
                        __FUNCTION__, __LINE__);
                return 1;
        }
.
 .
  .


+++++++++++++++++++++++++++++++++++++++++++++
/* this is the wrapper code that actually calls the kernel */

caddr_t
stream_pci_alloc_consistent(stream_pci_dev_t *dev, 
                            size_t size,
                            void *phys_addr)
{
    dma_addr_t *dma_p = (dma_addr_t *)phys_addr;


    /*caller will check for failure*/
    return (caddr_t)pci_alloc_consistent(STREAM_CONVERT_TO_PCI_DEV(dev),
                                         size, dma_p);

}/* stream_pci_alloc_consistent*/




Comment 28 Andy Gospodarek 2007-10-04 22:19:38 UTC
Created attachment 216601 [details]
pci_alloc_consistent.stp

This script can be used to monitor calls to pci_alloc_consistent (really
dma_alloc_coherent: pci_alloc_consistent is inline, so it can't be probed) and
will scream and print a backtrace whenever you hit the bug you are seeing.  You
may need to install systemtap if it's not installed already.

You can run it by doing the following:

# stap filename.stp

Feel free to capture the output if you like.  You will obviously want to run
this before loading the module that causes the problem.

Comment 30 William Reich 2007-10-04 23:09:57 UTC
I am not familiar with "systemtap".
questions:
1) how do I verify that this tool is installed on my box ?
2) if the tool is not on the box, which package needs to be installed ?

thanks

Comment 31 Andy Gospodarek 2007-10-05 00:34:30 UTC
You can probably just install it with an 

# up2date -i systemtap 

and up2date will take care of the dependencies and install all the packages
you need for the tool. 

After that you will need to install the kernel-debuginfo too.  This can be
downloaded from:

http://updates.redhat.com/enterprise/4AS/en/os/Debuginfo/x86_64/RPMS/

Just install kernel-debuginfo rpm version that matches the one you are running.


Comment 32 William Reich 2007-10-05 15:21:04 UTC
need some help...

What does this error mean ?

+++++++++++++++++


# cat pci_alloc_consistent.stp
probe kernel.function("dma_alloc_coherent").return
{
        #printf(" address = 0x%x\n",$return);
        if (($return & ~($size)) != $return) {
                printf("INCORRECT MEMORY ALIGNMENT! ");
                printf("dma_alloc_coherent: size = 0x%x address = 0x%x\n",$size,$return);
                print_backtrace();
        } else {
                printf("dma_alloc_coherent: size = 0x%x address = 0x%x\n",$size,$return);
        }
}

# 
# 
# 
# 
# 
# stap -v  ./pci_alloc_consistent.stp
Pass 1: parsed user script and 25 library script(s) in 120usr/0sys/137real ms.
semantic error: target variables not available to .return probes
semantic error: no match for probe point
while: resolving probe point kernel.function("dma_alloc_coherent").return
Pass 2: analyzed script: 0 probe(s), 1 function(s), 0 global(s) in 270usr/30sys/299real ms.
Pass 2: analysis failed.  Try again with more '-v' (verbose) options.
# 
# 
# date
Fri Oct  5 11:17:03 EDT 2007


Comment 33 William Reich 2007-10-05 15:45:21 UTC
Created attachment 217641 [details]
list of probeable functions in my kernel

Comment 34 William Reich 2007-10-05 15:47:14 UTC
(In reply to comment #33)
> Created an attachment (id=217641) [edit]
> list of probeable functions in my kernel
> 
I listed the probeable functions in my kernel, and placed the
results in the attachment id=217641


Comment 35 William Reich 2007-10-05 15:48:02 UTC
dma_alloc_coherent is in the list.

Comment 36 William Reich 2007-10-05 16:58:28 UTC
I found a workaround.

Instead of asking for the values of the variables,
I just removed them.

Since I only have a limited number of calls to pci_alloc_consistent,
it looks like this idea might pan out.

I will post a good run where the memory is limited to 3gig,
and then post a bad run where the memory is the full 16gig.

Comment 37 William Reich 2007-10-05 17:03:27 UTC
Created attachment 217701 [details]
good run of stap - memory limited to 3 gig

the stap was executed with memory on the machine limited to 3 gig.
The exact stp file is listed at the end of the logfile

Comment 38 William Reich 2007-10-05 17:16:23 UTC
Created attachment 217711 [details]
bad run - 16 gig of memory used by the machine

in this run of the stp file,
the full 16 gig of memory was allowed to be used by the kernel.
My driver reported the error that it was expected to.

Comment 39 William Reich 2007-10-05 17:17:33 UTC
I do not understand why on the good run, the
pci_alloc_consistent ( or dma_alloc_coherent )
was called twice, but on the bad run the routine was only called once.

Comment 40 Andy Gospodarek 2007-10-05 20:07:40 UTC
Great job getting it working!  I think I know what is wrong with the script. 
Can you try this as the script?

probe kernel.function("dma_alloc_coherent").return
{
        printf("dma_alloc_coherent: size = 0x10000 address = 0x%x\n",$return);
        print_backtrace();
}


That should work correctly for your instance, and we might be able to get some
more info about the return address; it might also help us understand why it was
called twice -- any chance pci_alloc_consistent is being called twice by your code?



Comment 41 William Reich 2007-10-06 00:06:54 UTC
the machine is offline. I can not do this test tonight from home.
I'll try Monday...
+++
a question - 
what are you expecting to see as the return address ?
Based on debug I inserted into my code,
I know that the "bad" address is a multiple of 16k, not 64k.
Each test reveals a different address, but it is always a multiple of
16k when the machine is configured for 16gig.
Since the new script is just giving a backtrace
( which I believe will be the same backtrace as comment 38 ),
what new information will be revealed by this test ?

Comment 42 William Reich 2007-10-08 03:42:33 UTC
--- good run with 4gig of memory allowed on the box
using stap script to return $return...

# stap ./p2.stp


dma_alloc_coherent: size = 0x10000 address =0x1001dbb0000
trace for 30764 (mount)
 0xffffffff8012233c : kretprobe_trampoline+0x1/0x2 []
 0xffffffff8012233b : kretprobe_trampoline+0x0/0x2 []
 0xffffffffa059c06c : init_module+0x18106c/0x0 [nfs]
 0xffffffffa04e813c : init_module+0xcd13c/0x0 [nfs]
 0xffffffffa04e7f15 : init_module+0xccf15/0x0 [nfs]
 0xffffffffa04e2cd1 : init_module+0xc7cd1/0x0 [nfs]
 0xffffffffa04e7583 : init_module+0xcc583/0x0 [nfs]
 0xffffffffa04e750b : init_module+0xcc50b/0x0 [nfs]
 0xffffffff8017fa5d : get_sb_single+0x50/0x93 []
 0xffffffff8017fb41 : do_kern_mount+0xa1/0x179 []
 0xffffffff801951a0 : do_mount+0x69a/0x6e2 []
 0xffffffff801ea749 : __up_read+0x10/0x8b []
 0xffffffff80123ed3 : do_page_fault+0x23f/0x628 []
 0xffffffff80169ca3 : handle_mm_fault+0x175/0x568 []
 0xffffffff8018728a : __user_walk+0x5e/0x69 []
 0xffffffff80195543 : sys_mount+0xba/0x158 []
 0xffffffff8011026a : system_call+0x7e/0x83 []

dma_alloc_coherent: end




dma_alloc_coherent: size = 0x10000 address =0x10079370000
trace for 30764 (mount)
 0xffffffff8012233c : kretprobe_trampoline+0x1/0x2 []
 0xffffffff8012233b : kretprobe_trampoline+0x0/0x2 []
 0xffffffffa059c06c : init_module+0x18106c/0x0 [nfs]
 0xffffffffa04e813c : init_module+0xcd13c/0x0 [nfs]
 0xffffffffa04e7f15 : init_module+0xccf15/0x0 [nfs]
 0xffffffffa04e2cd1 : init_module+0xc7cd1/0x0 [nfs]
 0xffffffffa04e7583 : init_module+0xcc583/0x0 [nfs]
 0xffffffffa04e750b : init_module+0xcc50b/0x0 [nfs]
 0xffffffff8017fa5d : get_sb_single+0x50/0x93 []
 0xffffffff8017fb41 : do_kern_mount+0xa1/0x179 []
 0xffffffff801951a0 : do_mount+0x69a/0x6e2 []
 0xffffffff801ea749 : __up_read+0x10/0x8b []
 0xffffffff80123ed3 : do_page_fault+0x23f/0x628 []
 0xffffffff80169ca3 : handle_mm_fault+0x175/0x568 []
 0xffffffff8018728a : __user_walk+0x5e/0x69 []
 0xffffffff80195543 : sys_mount+0xba/0x158 []
 0xffffffff8011026a : system_call+0x7e/0x83 []

dma_alloc_coherent: end


Comment 43 Andy Gospodarek 2007-10-08 12:12:27 UTC
The purpose of this script is to help us capture all calls to
pci_alloc_consistent so that we can examine everything more closely and try to
make sure this isn't something unique to your setup (which I'm guessing it's
not).  It also allows us to test this on a system (yours or ours) without
constantly recompiling the kernel or modules (since we don't have access to your
module).

Comment 44 William Reich 2007-10-08 14:21:49 UTC
while trying to get the "bad" run,
I see, so far, 1 hit instead of 2 in your debug code.
I'm guessing this is an ethernet driver.
However, my code is still complaining about an error
while your code shows no output that can be correlated to my output.

Therefore, I'm going to have to add some print statements
back into my driver...

( Translation - the stap debug code does appear to "hit"
when my driver declares a failure. )

more later

Comment 45 William Reich 2007-10-08 14:35:33 UTC
please ignore comment 44.
It appears that I am not awake yet...

---------

I finally did find 2 calls to pci_alloc_consistent in my driver.

---------

I'm still trying to get a bad run with meaningful data...

Comment 46 William Reich 2007-10-08 15:11:48 UTC
Here is the output of the "bad" run.

The trigger only hits once because my code declares failure, and the
second attempt at pci_alloc_consistent() is not performed.

This data reveals that the output of pci_alloc_consistent() is correct.

Therefore, I now have to go back to my code and look hard at the error logic.

+++++++++++++++





dma_alloc_coherent: size = 0x10000 address =0x102605f0000
trace for 14246 (mount)
 0xffffffff8012233c : kretprobe_trampoline+0x1/0x2 []
 0xffffffff8012233b : kretprobe_trampoline+0x0/0x2 []
 0xffffffffa02d143c : momOpen+0xf4/0x150 [strmfs_mod]
 0xffffffffa022513c : init_module+0x713c/0xa9000 [nfs]
 0xffffffffa0224f15 : init_module+0x6f15/0xa9000 [nfs]
 0xffffffffa021fcd1 : init_module+0x1cd1/0xa9000 [nfs]
 0xffffffffa0224583 : init_module+0x6583/0xa9000 [nfs]
 0xffffffffa022450b : init_module+0x650b/0xa9000 [nfs]
 0xffffffff8017fa5d : get_sb_single+0x50/0x93 []
 0xffffffff8017fb41 : do_kern_mount+0xa1/0x179 []
 0xffffffff801951a0 : do_mount+0x69a/0x6e2 []
 0xffffffff801ea749 : __up_read+0x10/0x8b []
 0xffffffff80123ed3 : do_page_fault+0x23f/0x628 []
 0xffffffff80169ca3 : handle_mm_fault+0x175/0x568 []
 0xffffffff8018728a : __user_walk+0x5e/0x69 []
 0xffffffff80195543 : sys_mount+0xba/0x158 []
 0xffffffff8011026a : system_call+0x7e/0x83 []

dma_alloc_coherent: end



Comment 47 Andy Gospodarek 2007-10-08 15:17:13 UTC
Sounds good, William.  Keep us posted on what you find.

Comment 49 William Reich 2007-10-09 13:54:34 UTC
My error logic appears to be fine.
Our debug data is incomplete.

Recall that the formal prototype for pci_alloc_consistent()
is...

void *pci_alloc_consistent( struct pci_dev *dev,
       size_t size,
       dma_addr_t *dma_handle) ;

There are actually two returns from this function!
The value of *dma_handle is the value that is not 64k bounded.

Another way to look at this prototype is:
void *pci_alloc_consistent( struct pci_dev *dev,
       size_t size,
       void *phys_addr) ;   <------------ !!

The value that we captured in the debug is the
formal return parameter, which is the virtual memory address.
The value we do not see from the stp debug script
is the value of the physical memory address,
which is returned as *dma_handle.

Our program is checking that the PHYSICAL address is 64k bounded.
This is not occurring when 16gig of memory is in the machine.
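A minimal sketch of the check the driver is describing: the test belongs on the bus address written to *dma_handle, not on the CPU pointer returned by the function. An address is a multiple of a power-of-two boundary when its bits under the mask (align - 1) are all zero. Note that an expression of the form (addr & ~size) != addr, as in the earlier SystemTap script, does not test alignment: with size 0x10000 it only reports whether bit 16 is set. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr is a multiple of align.
 * Intended for the bus address stored in *dma_handle, not the
 * CPU pointer that pci_alloc_consistent() returns.
 */
static bool dma_addr_is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (addr & (align - 1)) == 0;
}
```

In the wrapper from comment 27, this would be applied to *dma_p with align 0x10000 right after the allocation returns.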




Comment 50 Andy Gospodarek 2007-10-09 20:11:50 UTC
Ah, now we are getting somewhere. :-)  The systemtap script can be easily
modified to see if this happens on other systems as well.  I'll put one together
and test it on my system and see what results I get.

Comment 51 Andy Gospodarek 2007-10-12 20:26:10 UTC
Sorry I've been slow with this -- I'll get this working next week.

Comment 52 Andy Gospodarek 2007-10-19 20:28:16 UTC
I've been playing around with a stap script for this without much luck.  I can't
seem to set probe points on specific filename/line #'s so I may have to
instrument a special kernel to gather this info.  I'm going to speak with
someone on Monday and find out if this is the case, and if so, I'll spin some
test kernels that we can use to gather this info.


Comment 53 Andy Gospodarek 2007-10-31 21:43:30 UTC
So I instrumented an upstream kernel's pci_alloc_consistent like this (the same
could easily apply to rhel4)

diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h
index 25c10e9..db121a8 100644
--- a/include/asm-generic/pci-dma-compat.h
+++ b/include/asm-generic/pci-dma-compat.h
@@ -19,7 +19,9 @@ static inline void *
 pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
                     dma_addr_t *dma_handle)
 {
-       return dma_alloc_coherent(hwdev == NULL ? NULL : &hwdev->dev, size, dma_handle, GFP_ATOMIC);
+       void *retval = dma_alloc_coherent(hwdev == NULL ? NULL : &hwdev->dev, size, dma_handle, GFP_ATOMIC);
+       printk(KERN_CRIT "p_a_c: return = 0x%lx *dma_handle = 0x%lx\n",retval,*dma_handle);
+       return retval;
 }

 static inline void

When I use a device driver that is set up to operate similarly to yours, I seem to
get addresses on the correct boundaries with mem=4096M and without (this system
has 16G of memory and is also an AMD system).

p_a_c: return = 0xffff810104cda000 *dma_handle = 0x104cda000
p_a_c: return = 0xffff810104c66000 *dma_handle = 0x104c66000
p_a_c: return = 0xffff81000f8e0000 *dma_handle = 0xf8e0000
p_a_c: return = 0xffff81000f8e8000 *dma_handle = 0xf8e8000
p_a_c: return = 0xffff81000f8f0000 *dma_handle = 0xf8f0000
p_a_c: return = 0xffff81000f900000 *dma_handle = 0xf900000
p_a_c: return = 0xffff81000f8f0000 *dma_handle = 0xf8f0000
p_a_c: return = 0xffff810010108000 *dma_handle = 0x10108000
p_a_c: return = 0xffff81000f8e0000 *dma_handle = 0xf8e0000
p_a_c: return = 0xffff81000f900000 *dma_handle = 0xf900000



Comment 54 William Reich 2007-10-31 22:21:54 UTC
Great !  You recreated the problem.

( The last 4 hex digits of *dma_handle
are expected to be '0000'. Your data
shows four cases where the result does not meet this goal. )
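A small sketch that applies the same 64k test to the ten *dma_handle values from the instrumented-kernel log in comment 53, to confirm the count of four. The values are copied from that log; the array and helper names are hypothetical. As the next comment points out, those allocations were of mixed sizes, so this only counts which handles fall on a 64k boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN_64K 0x10000ULL

/* *dma_handle values from the instrumented kernel log (comment 53). */
static const uint64_t logged_handles[] = {
    0x104cda000ULL, 0x104c66000ULL, 0xf8e0000ULL, 0xf8e8000ULL,
    0xf8f0000ULL,   0xf900000ULL,   0xf8f0000ULL, 0x10108000ULL,
    0xf8e0000ULL,   0xf900000ULL,
};

/* Count addresses that are not multiples of align (a power of two). */
static size_t count_unaligned(const uint64_t *addrs, size_t n, uint64_t align)
{
    size_t i, bad = 0;

    assert(align != 0 && (align & (align - 1)) == 0);
    for (i = 0; i < n; i++)
        if (addrs[i] & (align - 1))
            bad++;
    return bad;
}
```

With ALIGN_64K the count is 4; with a 4k boundary it is 0, since every logged handle is at least page-aligned.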

Comment 55 Andy Gospodarek 2007-11-01 14:31:35 UTC
Actually, William, that was for all allocations, not a 64k allocation.  I hacked
up a driver and now get the following output.  Remember that I am doing all of
this on an upstream kernel (2.6.24-rc1), so this isn't the best indicator that a
problem won't happen on RHEL4.

tehuti: 0x10ed0000 = pci_alloc_consistent(dev, 0x10000, 0x10ed0000)
tehuti: 0x10ee0000 = pci_alloc_consistent(dev, 0x10000, 0x10ee0000)
tehuti: 0x10ef0000 = pci_alloc_consistent(dev, 0x10000, 0x10ef0000)
tehuti: 0x10f00000 = pci_alloc_consistent(dev, 0x10000, 0x10f00000)
tehuti: 0x10f10000 = pci_alloc_consistent(dev, 0x10000, 0x10f10000)
tehuti: 0x10f20000 = pci_alloc_consistent(dev, 0x10000, 0x10f20000)
tehuti: 0x10f30000 = pci_alloc_consistent(dev, 0x10000, 0x10f30000)
tehuti: 0x10f40000 = pci_alloc_consistent(dev, 0x10000, 0x10f40000)
tehuti: 0x10f50000 = pci_alloc_consistent(dev, 0x10000, 0x10f50000)
tehuti: 0x10f60000 = pci_alloc_consistent(dev, 0x10000, 0x10f60000)
tehuti: 0x10f70000 = pci_alloc_consistent(dev, 0x10000, 0x10f70000)
tehuti: 0x10f80000 = pci_alloc_consistent(dev, 0x10000, 0x10f80000)
tehuti: 0x10f90000 = pci_alloc_consistent(dev, 0x10000, 0x10f90000)
tehuti: 0x10fa0000 = pci_alloc_consistent(dev, 0x10000, 0x10fa0000)
tehuti: 0x10fb0000 = pci_alloc_consistent(dev, 0x10000, 0x10fb0000)
tehuti: 0x10fc0000 = pci_alloc_consistent(dev, 0x10000, 0x10fc0000)
tehuti: 0x10fd0000 = pci_alloc_consistent(dev, 0x10000, 0x10fd0000)
tehuti: 0x10fe0000 = pci_alloc_consistent(dev, 0x10000, 0x10fe0000)
tehuti: 0x10ff0000 = pci_alloc_consistent(dev, 0x10000, 0x10ff0000)
tehuti: 0x11000000 = pci_alloc_consistent(dev, 0x10000, 0x11000000)
tehuti: 0x11010000 = pci_alloc_consistent(dev, 0x10000, 0x11010000)
tehuti: 0x11020000 = pci_alloc_consistent(dev, 0x10000, 0x11020000)
tehuti: 0x11030000 = pci_alloc_consistent(dev, 0x10000, 0x11030000)

Everything seems to be working out well and this system currently has 8G of RAM.

MemTotal:      8266748 kB
MemFree:       5888532 kB
Buffers:         75220 kB
Cached:        1895920 kB
SwapCached:          0 kB
Active:        1684912 kB
Inactive:       527032 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:              40 kB

Can you share some of your addresses with us when you get this failure?  I'm
wondering how the physical addresses differ when booting with or without mem=4096M.


Comment 56 William Reich 2007-11-01 14:40:36 UTC
I have to schedule time on the box.
I will gather the data you request.

Comment 57 Andy Gospodarek 2007-11-01 15:15:23 UTC
I'm actually re-installing my system with RHEL4.4 so I can test some as well -- I should have some results in a little while.

Comment 58 William Reich 2007-11-01 18:51:24 UTC
interesting info...

My box has multiple disks on it so we can switch between the
various OSs that we support.
When I got the box, it contained RH5 64 bit
( 2.6.18-8.el5 ( Tikanga ) ).

My problem does not appear in this configuration.

I'm going back to RH 4 now...

Comment 59 Andy Gospodarek 2007-11-01 19:02:36 UTC
I hacked up the forcedeth driver on this box (a RHEL4.5 install with a kernel
close to what we will ship with 4.6) and still cannot reproduce the error you are
seeing.

forcedeth: ex: 0x8b50000 = pci_alloc_consistent(dev, 0x10000, 0x8b50000)
forcedeth: ex: 0x4a00000 = pci_alloc_consistent(dev, 0x10000, 0x4a00000)
forcedeth: ex: 0xa490000 = pci_alloc_consistent(dev, 0x10000, 0xa490000)
forcedeth: ex: 0x8450000 = pci_alloc_consistent(dev, 0x10000, 0x8450000)
forcedeth: ex: 0xaaf0000 = pci_alloc_consistent(dev, 0x10000, 0xaaf0000)
forcedeth: ex: 0x1ab20000 = pci_alloc_consistent(dev, 0x10000, 0x1ab20000)
forcedeth: ex: 0x17a20000 = pci_alloc_consistent(dev, 0x10000, 0x17a20000)
forcedeth: ex: 0x1ab50000 = pci_alloc_consistent(dev, 0x10000, 0x1ab50000)
forcedeth: ex: 0x16d00000 = pci_alloc_consistent(dev, 0x10000, 0x16d00000)
forcedeth: ex: 0x14360000 = pci_alloc_consistent(dev, 0x10000, 0x14360000)
forcedeth: ex: 0x14320000 = pci_alloc_consistent(dev, 0x10000, 0x14320000)
forcedeth: ex: 0x15d40000 = pci_alloc_consistent(dev, 0x10000, 0x15d40000)
forcedeth: ex: 0x17af0000 = pci_alloc_consistent(dev, 0x10000, 0x17af0000)
forcedeth: ex: 0x142a0000 = pci_alloc_consistent(dev, 0x10000, 0x142a0000)
forcedeth: ex: 0x177e0000 = pci_alloc_consistent(dev, 0x10000, 0x177e0000)
forcedeth: ex: 0x176a0000 = pci_alloc_consistent(dev, 0x10000, 0x176a0000)
forcedeth: ex: 0x1ba30000 = pci_alloc_consistent(dev, 0x10000, 0x1ba30000)
forcedeth: ex: 0x144c0000 = pci_alloc_consistent(dev, 0x10000, 0x144c0000)
forcedeth: ex: 0x3130000 = pci_alloc_consistent(dev, 0x10000, 0x3130000)
forcedeth: ex: 0x2f50000 = pci_alloc_consistent(dev, 0x10000, 0x2f50000)
forcedeth: ex: 0xfbab0000 = pci_alloc_consistent(dev, 0x10000, 0xfbab0000)
forcedeth: ex: 0x1a80000 = pci_alloc_consistent(dev, 0x10000, 0x1a80000)
forcedeth: ex: 0x2d80000 = pci_alloc_consistent(dev, 0x10000, 0x2d80000)
forcedeth: ex: 0xaf00000 = pci_alloc_consistent(dev, 0x10000, 0xaf00000)
forcedeth: ex: 0xbae0000 = pci_alloc_consistent(dev, 0x10000, 0xbae0000)
forcedeth: ex: 0x37b0000 = pci_alloc_consistent(dev, 0x10000, 0x37b0000)
forcedeth: ex: 0x1b360000 = pci_alloc_consistent(dev, 0x10000, 0x1b360000)
forcedeth: ex: 0xa6d0000 = pci_alloc_consistent(dev, 0x10000, 0xa6d0000)
forcedeth: ex: 0x1ca60000 = pci_alloc_consistent(dev, 0x10000, 0x1ca60000)
forcedeth: ex: 0xaca0000 = pci_alloc_consistent(dev, 0x10000, 0xaca0000)
forcedeth: ex: 0x12d0000 = pci_alloc_consistent(dev, 0x10000, 0x12d0000)
forcedeth: ex: 0x1c960000 = pci_alloc_consistent(dev, 0x10000, 0x1c960000)


MemTotal:      8226676 kB
MemFree:       6328120 kB
Buffers:        114332 kB
Cached:        1584944 kB
SwapCached:          0 kB
Active:        1338956 kB
Inactive:       404148 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8226676 kB

Linux dhcp231-119.rdu.redhat.com 2.6.9-65.EL.gtest.33 #1 Thu Nov 1 16:08:41 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux



Comment 60 William Reich 2007-11-01 21:23:32 UTC
on a RH 4 update 4 ( 64bit ) box with 16gig of ram, here are the values
that I am getting...



run a
Nov  1 16:56:38 boeing kernel: pcimb3_dma_init(14445): *tmp_p = 0000000004066000
Nov  1 16:56:38 boeing kernel: pcimb3_dma_init(14461): *tmp_p = 0000000004066000

run b
Nov  1 17:10:51 boeing kernel: pcimb3_dma_init(14445): *tmp_p = 0000000004076000
Nov  1 17:10:51 boeing kernel: pcimb3_dma_init(14461): *tmp_p = 0000000004076000

run c
Nov  1 17:12:18 boeing kernel: pcimb3_dma_init(14445): *tmp_p = 0000000004086000
Nov  1 17:12:18 boeing kernel: pcimb3_dma_init(14461): *tmp_p = 0000000004086000

reboot 

run d
Nov  1 17:16:03 boeing kernel: pcimb3_dma_init(14445): *tmp_p = 0000000004064000
Nov  1 17:16:03 boeing kernel: pcimb3_dma_init(14461): *tmp_p = 0000000004064000

run e
Nov  1 17:18:00 boeing kernel: pcimb3_dma_init(14445): *tmp_p = 0000000004074000
Nov  1 17:18:00 boeing kernel: pcimb3_dma_init(14461): *tmp_p = 0000000004074000


Since the last 4 hex digits are not '0000', I declare an error.

Comment 61 William Reich 2007-11-02 11:38:26 UTC
Created attachment 246651 [details]
sysreport for 16gig rh4.4 machine 64bit

this is the sysreport for the machine that is
returning the unexpected values from pci_alloc_consistent().
This is a RH4.4 64 bit machine with 16gig of memory.

Comment 62 Andy Gospodarek 2007-11-05 17:04:35 UTC
Ah-ha!  It looks like I recreated your issue with an older kernel:

forcedeth: 0x3e7b0000 = pci_alloc_consistent(dev, 0x10000, 0xc0c12000)

I was running 2.6.9-55 on that one, so I'll see if I can sift through the
changelogs to find the patch/bug that may have fixed this one.

Can you test a later kernel for me?  One of my test kernels would be fine as it's
close to what will be used for RHEL4.6.

http://people.redhat.com/agospoda/#rhel4

Comment 63 William Reich 2007-11-06 13:02:29 UTC
Time on the 16gig machine is difficult to schedule.
I'll see what I can do.
--
How does this request differ from the kernel used in comment 7 ?

So far, we have run these tests:

2.6.9-42 ( RH 4 update 4 ) failed
2.6.9.55 ( RH 4 update 5 ) failed
2.6.9.59ish ( from comment 7 ) failed
2.6.18-8.el5 ( RH 5 Tikanga ) pass


Comment 64 Andy Gospodarek 2007-11-06 14:49:00 UTC
Well there have been plenty of changes between the latest RHEL4 kernels
(2.6.9-65) and whatever you tested last, but I've yet to pinpoint one change
that may have resolved this.  My inability to use systemtap to accurately grab
the info is tough too since that means I need to recompile a driver each time I
want to test a different snapshot.  I feel confident that if tests show that -65
resolves this issue, I can narrow it down.

Of the bugs that seem like they could be related, this one sticks out.

Fixed in -60
https://bugzilla.redhat.com/show_bug.cgi?id=294981

It seems like a possibility based on the problem at hand.




Comment 65 William Reich 2007-11-08 16:09:58 UTC
regarding 294981, I received an "access denied" message when I tried to view it.

Comment 66 William Reich 2007-11-08 16:11:28 UTC
regarding testing an experimental kernel,
I've been informed that the 16 gig machines will not be available to me
until 11/19/07 at the earliest,
due to company delivery commitments.

Comment 67 Andy Gospodarek 2007-11-08 16:21:34 UTC
That's OK.  I'll continue to test some different patches and see if I can come up with 'the one' that seems to resolve this issue on the latest RHEL4 builds

Comment 68 William Reich 2008-01-11 20:52:32 UTC
Please point me to the kernel you wish me to test.
I checked one link, but only found "68" kernels

Comment 69 Andy Gospodarek 2008-01-11 20:59:37 UTC
Anything on my people page should be fine.

Comment 70 William Reich 2008-01-17 15:08:01 UTC
using the "68.2" kernel from
http://people.redhat.com/agospoda/#rhel4,
the test failed.

So now we know:

2.6.9-42 ( RH 4 update 4 ) failed
2.6.9.55 ( RH 4 update 5 ) failed
2.6.9.59ish ( from comment 7 ) failed
2.6.9.68.2  ( from comment 62 ) failed
2.6.18-8.el5 ( RH 5 Tikanga ) pass

Comment 71 William Reich 2008-01-18 13:51:32 UTC
also, 2.6.9.67 fails  ( all of these are 64 bit versions )

Comment 72 William Reich 2008-02-05 13:59:17 UTC
I had to release the machine for other projects.

Comment 73 Andy Gospodarek 2008-02-05 17:28:32 UTC
William, I'm *truly* sorry that I haven't narrowed down this problem yet.  In
the past I discussed this with a few others, but we didn't come up with anything
meaningful.  I'd like to get this resolved for the next update, so I'm going to
bring in a few other folks to help me look at this.

Comment 75 Andy Gospodarek 2008-02-26 21:41:59 UTC
William,

One thought when talking to someone is that the IOMMU could be in play here.
Can you try booting with iommu=off on the kernel command line and see if that
makes a difference?

Thanks.



Comment 76 William Reich 2008-02-26 22:28:32 UTC
on the hp proliant dl585 machine with 16 gig of memory,
the machine panic'd during bootup when
I added " iommu=off " to the kernel command line in the grub.conf file.
This was using a 2.6.9.67 64bit kernel.

Comment 77 Andy Gospodarek 2008-02-26 22:50:00 UTC
Panic'd, eh?  You didn't happen to capture the output from that did you?

Comment 78 William Reich 2008-02-27 15:07:40 UTC
"Kernel panic - not syncing: PCI-DMA: high address but no IOMMU"

Comment 79 William Reich 2008-02-27 15:12:33 UTC
Created attachment 296071 [details]
high address but no IOMMU - first pic

screen picture #1 of error message ( related to comment 78 )

Comment 80 William Reich 2008-02-27 15:13:29 UTC
Created attachment 296072 [details]
high address but no IOMMU - #2

second screen capture of 
"high address but no IOMMU"

Comment 81 William Reich 2008-02-27 15:14:20 UTC
Comment on attachment 296072 [details]
high address but no IOMMU - #2

related to comment 78

Comment 82 William Reich 2008-04-14 12:29:58 UTC
any news on this topic ?

Comment 83 Andy Gospodarek 2008-04-14 18:17:35 UTC
Created attachment 302376 [details]
possible-rhel4-gart-fix.patch

I was looking at this again today and chatting with someone else when they
suggested the attached patch that was added during 2.6.17 development.

It seems like this is a decent candidate since it expands the space to search
for valid memory.  Can you give this a try?

Comment 84 William Reich 2008-04-14 18:42:09 UTC
msg received.
I have submitted a request to get time on our 16gig-of-memory machine.
Hopefully the end of this week...

Comment 85 Andy Gospodarek 2008-04-16 19:16:05 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel4

Please test them and report back your results.

Comment 86 William Reich 2008-04-21 13:05:27 UTC
Just got the machines today ( 4/21... ) 

more status later...

Comment 87 William Reich 2008-04-21 14:53:27 UTC
using 2.6.9-69.EL.gtest.43smp,
the test failed.
...same errors...

Comment 97 Prarit Bhargava 2008-07-02 15:01:39 UTC
Created attachment 310803 [details]
Module to reproduce issue

William,

Please compile and test the following module on your system and let us know the
results.

I am going to book time on a 64G dl-585-g2 in RDU to see if I can reproduce it
using the attached module.

(Makefile will be attached next)

P.

Comment 98 Prarit Bhargava 2008-07-02 15:02:44 UTC
Created attachment 310807 [details]
Makefile for prarit.c

Comment 99 Prarit Bhargava 2008-07-02 17:42:17 UTC
I was able to reproduce Ulticom's issue by loading my module (see comments #97 &
#98) on a 64G DL585 G2.

Output from my module:

addr[0] = 00000103f4220000
dma_handle[0] = 40a1000
INCORRECT ALIGNMENT to 10000 

Interesting that this doesn't appear to happen @ 8G...

P.

Comment 100 Prarit Bhargava 2008-07-02 22:09:32 UTC
Created attachment 310860 [details]
new prarit.c module

William, please test this new module ASAP and let me know the results.  I
noticed in my output that all returned addresses were above the 4G mark, so I
bumped the dma mask.

This module explicitly obsoletes the previous module code.

P.

Comment 102 William Reich 2008-07-03 11:00:33 UTC
I am currently out of the office.
I will get to this upon my return the week of July 7.

Comment 103 Prarit Bhargava 2008-07-07 15:51:03 UTC
After looking into this over the weekend and checking with lwoodman regarding
alignment requirements it turns out that my initial thoughts regarding
pci_alloc_consistent were correct.

* pci_alloc_consistent does not guarantee alignment to the size requested *

If a user/driver requires a specific alignment the user/driver must use PCI (or
DMA) pool functions.  That, AFAICT, is the only way to guarantee DMA alignment
requests.

P.

Comment 105 William Reich 2008-07-07 16:09:27 UTC
hmmm...
This is very strange that the behavior would change
based on the amount of memory in the machine.
So, all these years, my driver has just been lucky?
...
Are you saying that my only option is to
do what is described in comment 10?
( request twice as much as I need, and find the boundary within the
allocation that I got back from the call... )
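The workaround described here (request twice the size and find the 64K boundary inside the buffer) reduces to rounding the DMA handle up to the next multiple of 0x10000 and applying the same offset to the CPU address. A user-space sketch of that arithmetic, assuming the buffer is contiguous in both CPU and bus address space; `align_up()` and `aligned_window()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Given the bus address of a buffer of alloc_size bytes, return the offset
 * of the first align-aligned window of want bytes that fits inside it, or
 * -1 if none fits. The same offset must be added to the CPU (virtual)
 * address returned by the allocator.
 */
static int64_t aligned_window(uint64_t dma_handle, uint64_t alloc_size,
                              uint64_t want, uint64_t align)
{
    uint64_t start = align_up(dma_handle, align);
    uint64_t off = start - dma_handle;

    if (off > alloc_size || alloc_size - off < want)
        return -1;
    return (int64_t)off;
}
```

For example, with the misaligned handle 0x40d3000 seen later in this report and a 128K (0x20000) allocation, the aligned 64K window starts at 0x40e0000, 0xd000 bytes into the buffer. Doubling the request is what guarantees a fit: the offset is always less than 64K, so more than 64K remains after it.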

Comment 106 William Reich 2008-07-07 17:02:10 UTC

I found the "pool" documentation.
I'll have to play with this as an experiment
( when I can get time on the 16gig machine... I'm still waiting to try
the experiment from comment 100... )

Comment 107 William Reich 2008-07-07 17:08:06 UTC

I found this statement in the DMA-mapping.txt file.
It seems to me that this information conflicts with the information presented in
comment 103...

please advise...
+++++++++++++++++++++

.
 .
  .

pci_alloc_consistent returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.
...
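The quoted invariant can be restated as arithmetic. A user-space sketch, assuming 4 KiB pages: `order_for_size()` computes the same quantity as the kernel's `get_order()` (the smallest order with `PAGE_SIZE << order >= size`) but is not the kernel implementation, and `crosses_boundary()` is a hypothetical helper. Under the documented invariant, a 64K buffer at a 64K-aligned address never crosses a 64K boundary; a buffer at the handle 0x40a1000 from comment 99 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86_64 */

/* Smallest order such that (1 << (SKETCH_PAGE_SHIFT + order)) >= size. */
static unsigned order_for_size(uint64_t size)
{
    unsigned order = 0;

    assert(size > 0);
    while (((uint64_t)1 << (SKETCH_PAGE_SHIFT + order)) < size)
        order++;
    return order;
}

/*
 * True if the first and last byte of [addr, addr + size) lie in different
 * boundary-sized blocks; boundary must be a power of two.
 */
static bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
    assert(size > 0 && boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr / boundary) != ((addr + size - 1) / boundary);
}
```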
 

Comment 108 William Reich 2008-07-07 17:20:56 UTC

just an FYI,
the date on the DMA-mapping.txt file is Oct 18, 2004.
This file is part of RH4AS.

On a RH5AS system, the file is dated Sep 19, 2006.

Both versions of the file appear to have the same language
when referring to pci_alloc_consistent().

Comment 110 Prarit Bhargava 2008-07-07 18:12:22 UTC
Hi William,

Good catch!  I think you hit the nail on the head when you noted that RHEL4
added DMA-mapping.txt sometime after the release of RHEL4.  I suspect that
no one proofread the new document before it was committed :(.  I'll probably
write up a patch to remove the conflicting comments in DMA-mapping.txt.

As to the other question, "Why does this work if I bump the dma mask?" ...

pci_alloc_consistent() calls dma_alloc_coherent()

dma_alloc_coherent() executes the following code:

     memory = dma_alloc_pages(dev, gfp, get_order(size));
                .
                .
                .
                int high, mmu;
                bus = virt_to_bus(memory);
                /* By bumping the dma_mask, you are simply increasing this
                 * threshold.  Therefore, high = 0 (always). */
                high = (bus + size) >= dma_mask;
                mmu = high;
                if (force_iommu && !(gfp & GFP_DMA))
                        mmu = 1;
                if (no_iommu || dma_mask < 0xffffffffUL) {
                        if (high) {
                                if (!(gfp & GFP_DMA)) {
                                        gfp |= GFP_DMA;
                                        free_pages((unsigned long)memory,
                                                   get_order(size));
                                        goto again;
                                }
                                goto free;
                        }
                        mmu = 0;
                }
                memset(memory, 0, size);
                if (!mmu) {
                        *dma_handle = virt_to_bus(memory);
                        return memory;  /* we always return here because mmu = 0 */
                }


In the case that high evaluates to 1, we end up at another method of
getting a DMA area:

        *dma_handle = dma_map_area(dev, bus, size, PCI_DMA_BIDIRECTIONAL, 0);

The call to dma_map_area *does not guarantee* any alignment.  It simply queries
the iommu for an empty area and returns pointers to the area (virtual and a dma
handle).
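A toy model makes the point: a first-fit search over a bitmap of IOMMU pages returns the first free run, wherever it starts. This is a sketch of the general technique, not the RHEL4 GART code; `find_free_run()` is hypothetical, 4 KiB IOMMU pages are assumed, and its `align_pages` parameter illustrates one way a search could be restricted to size-aligned slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Find npages consecutive free entries in used[0..nbits), considering only
 * start indices that are multiples of align_pages (a power of two; 1 means
 * plain first fit). Marks the run used and returns its start, or -1.
 */
static long find_free_run(bool *used, size_t nbits, size_t npages,
                          size_t align_pages)
{
    assert(npages > 0 && align_pages != 0 &&
           (align_pages & (align_pages - 1)) == 0);

    for (size_t start = 0; start + npages <= nbits; start += align_pages) {
        size_t i;

        for (i = 0; i < npages && !used[start + i]; i++)
            ;
        if (i == npages) {
            for (i = 0; i < npages; i++)
                used[start + i] = true;
            return (long)start;
        }
    }
    return -1;
}
```

With pages 0-2 already in use, a 16-page (64K) request with `align_pages = 1` lands at page 3 (offset 0x3000, not 64K-aligned), while `align_pages = 16` skips ahead to page 16 (offset 0x10000).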

(The function, dma_alloc_coherent is found in arch/x86_64/kernel/pci-gart.c)

Why does it seem like this suddenly broke in RHEL4?

As you add memory the likelihood of the value of "high" being calculated as 1
gets larger because the bus value gets larger.
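A user-space model of the `high` test from the excerpt above shows why adding memory changes the outcome. It assumes the x86_64 direct map begins at 0x10000000000, which is inferred from the prarit.ko output in this report (in the 4G run, `addr` minus `dma_handle` is exactly that value); `SKETCH_PAGE_OFFSET` and `needs_iommu()` are hypothetical names, and `virt - SKETCH_PAGE_OFFSET` stands in for `virt_to_bus()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: start of the kernel direct map, inferred from the logs. */
#define SKETCH_PAGE_OFFSET 0x10000000000ULL

/* Mirrors high = (bus + size) >= dma_mask from dma_alloc_coherent(). */
static bool needs_iommu(uint64_t virt, uint64_t size, uint64_t dma_mask)
{
    assert(virt >= SKETCH_PAGE_OFFSET);
    uint64_t bus = virt - SKETCH_PAGE_OFFSET; /* virt_to_bus() stand-in */

    return (bus + size) >= dma_mask;
}
```

With the 32-bit mask, a buffer at 0x000001005bc30000 (bus 0x5bc30000) stays on the direct path, but one at 0x000001016e500000 (bus 0x16e500000, above 4G) takes the IOMMU path; with a 64-bit mask neither does, which matches the 16G results reported with each mask.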

Why does this work in RHEL5?

RHEL5 uses multiple DMA address ranges.

What is the right thing to do in RHEL4?

I would suggest using the DMA pool functions; however, if you feel "safe" in
bumping the dma_mask, feel free to do that instead...

P.

Comment 111 William Reich 2008-07-07 18:32:44 UTC


Comment 112 William Reich 2008-07-07 18:34:11 UTC
oops - comment 111 should be deleted.
I pushed the wrong button on my browser...


Comment 113 Prarit Bhargava 2008-07-08 11:54:52 UTC
(In reply to comment #112)
> oops - comment 111 should be deleted.
> I pushed the wrong button on my browser...
> 

No problem William :)

P.

Comment 114 William Reich 2008-07-08 15:45:53 UTC

in response to comment 100, here is
the output of the prarit.ko driver
when the machine is configured to use only 4 gig of memory...


Jul  8 11:36:21 boeing kernel: addr[0] = 000001005bc30000
Jul  8 11:36:21 boeing kernel: dma_handle[0] = 5bc30000
Jul  8 11:36:21 boeing kernel: addr[1] = 000001005c680000
Jul  8 11:36:21 boeing kernel: dma_handle[1] = 5c680000
Jul  8 11:36:21 boeing kernel: addr[2] = 000001005cea0000
Jul  8 11:36:21 boeing kernel: dma_handle[2] = 5cea0000
Jul  8 11:36:21 boeing kernel: addr[3] = 000001005c500000
Jul  8 11:36:21 boeing kernel: dma_handle[3] = 5c500000
Jul  8 11:36:21 boeing kernel: addr[4] = 000001005b700000
Jul  8 11:36:21 boeing kernel: dma_handle[4] = 5b700000
Jul  8 11:36:21 boeing kernel: addr[5] = 000001005d260000
Jul  8 11:36:21 boeing kernel: dma_handle[5] = 5d260000
Jul  8 11:36:21 boeing kernel: addr[6] = 000001005cfd0000
Jul  8 11:36:21 boeing kernel: dma_handle[6] = 5cfd0000
Jul  8 11:36:21 boeing kernel: addr[7] = 000001005c230000
Jul  8 11:36:21 boeing kernel: dma_handle[7] = 5c230000
Jul  8 11:36:21 boeing kernel: addr[8] = 000001005c5c0000
Jul  8 11:36:21 boeing kernel: dma_handle[8] = 5c5c0000
Jul  8 11:36:21 boeing kernel: addr[9] = 000001005f020000
Jul  8 11:36:21 boeing kernel: dma_handle[9] = 5f020000


Comment 115 William Reich 2008-07-08 15:51:44 UTC
Now I have configured the machine for 16 gig...

Here is the output. ( I do not see any errors being reported... )

Jul  8 11:43:53 boeing kernel: addr[0] = 000001027aab0000
Jul  8 11:43:53 boeing kernel: dma_handle[0] = 27aab0000
Jul  8 11:43:53 boeing kernel: addr[1] = 000001027aaa0000
Jul  8 11:43:53 boeing kernel: dma_handle[1] = 27aaa0000
Jul  8 11:43:53 boeing kernel: addr[2] = 000001027aac0000
Jul  8 11:43:53 boeing kernel: dma_handle[2] = 27aac0000
Jul  8 11:43:53 boeing kernel: addr[3] = 000001027aad0000
Jul  8 11:43:53 boeing kernel: dma_handle[3] = 27aad0000
Jul  8 11:43:53 boeing kernel: addr[4] = 000001027aae0000
Jul  8 11:43:53 boeing kernel: dma_handle[4] = 27aae0000
Jul  8 11:43:53 boeing kernel: addr[5] = 000001027aaf0000
Jul  8 11:43:53 boeing kernel: dma_handle[5] = 27aaf0000
Jul  8 11:43:53 boeing kernel: addr[6] = 000001027ab00000
Jul  8 11:43:53 boeing kernel: dma_handle[6] = 27ab00000
Jul  8 11:43:53 boeing kernel: addr[7] = 000001027ab10000
Jul  8 11:43:53 boeing kernel: dma_handle[7] = 27ab10000
Jul  8 11:43:53 boeing kernel: addr[8] = 000001027ab20000
Jul  8 11:43:53 boeing kernel: dma_handle[8] = 27ab20000
Jul  8 11:43:53 boeing kernel: addr[9] = 000001027ab30000
Jul  8 11:43:53 boeing kernel: dma_handle[9] = 27ab30000

I did confirm that the machine is using 16 gig via
/proc/meminfo.
AND, my driver also failed ( as it has been... )



Comment 116 William Reich 2008-07-08 15:54:30 UTC

Please note that in comment 115,
the DMA mask was for 64 bit operation.


Comment 117 William Reich 2008-07-08 15:56:51 UTC

I reran the test, but this time the coherent_dma_mask
was set for 32bits.

The prarit.ko test driver reported a failure in this case...

Jul  8 11:49:15 boeing kernel: addr[0] = 000001016e500000
Jul  8 11:49:15 boeing kernel: dma_handle[0] = 40d3000
Jul  8 11:49:15 boeing kernel: INCORRECT ALIGNMENT to 10000 


So this reveals the problem that my driver is having...


Comment 118 William Reich 2008-07-08 17:09:43 UTC
I tried changing the input to
pci_set_dma_mask() and
pci_set_consistent_dma_mask()
from 32bit to 64bit.

My driver still failed.

Comment 119 William Reich 2008-07-08 18:37:24 UTC

I need some help on this "pool" stuff.

Given this MAN page info ( below ), I am having trouble
determining the correct values for the input parameters.

If I want a 64k block of memory that must start on a 64k boundary,
what are the correct values for "size","align", and "allocation" ?
64k, 64k, & 64k
or
64k, 64k, & 0
or 
something else ??

Also, do I need to pre-size this pool in any way?
For example, I know that I need about 16 of these 64k block allocations.
Do I have to tell the kernel in some way about my need for 16, or
is this "pool" of 'near infinite' size such that I can call
pci_pool_alloc( dma_pool_alloc ) as many times as I wish ??

thanks...
++++++++++++++++++++++++++++

NAME
dma_pool_create - Creates a pool of consistent memory blocks, for dma.  
SYNOPSIS

    struct dma_pool * dma_pool_create(const char * name, struct device * dev,
                                      size_t size, size_t align, size_t allocation);

 
ARGUMENTS

name
    name of pool, for diagnostics

dev
    device that will be doing the DMA

size
    size of the blocks in this pool.

align
    alignment requirement for blocks; must be a power of two

allocation
    returned blocks won't cross this boundary (or zero)

 

Comment 120 Prarit Bhargava 2008-07-09 13:12:54 UTC
Hi William,

I think you want to do the following:

reichs_pool = pci_pool_create("reich's device", &reichs_dev,
                0x10000 /* 64K in size */,
                0x10000 /* 64K byte aligned */,
                0x10000 /* don't cross a 64K boundary */);

--- aside

I suppose in this case the last argument could be zero.  AFAICT, doing a 64K
size that is 64K byte aligned will return you a pointer that doesn't cross a
64K boundary ...

--- end aside 

and then do

addr = dma_pool_alloc(reichs_pool, GFP_ATOMIC, &dma_handle);

which should return you

addr (64K in size, aligned to 0x10000, and doesn't cross a 64K boundary)
dma_handle = physical pointer usable by device that is aligned to 0x10000
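The three constraints can be checked mechanically. A user-space sketch following the man page wording quoted in comment 119 (`align` a power of two, a boundary of 0 meaning no constraint); `block_ok()` is a hypothetical helper. It also bears out the aside above: a 64K block at a 64K-aligned address never crosses a 64K boundary, so passing 0 as the last argument makes no difference here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a block at dma of the given size meets the pool constraints:
 * dma is a multiple of align, and (unless boundary is 0) the block does
 * not cross a multiple of boundary. align and boundary are powers of two.
 */
static bool block_ok(uint64_t dma, uint64_t size, uint64_t align,
                     uint64_t boundary)
{
    assert(size > 0 && align != 0 && (align & (align - 1)) == 0);
    if (dma & (align - 1))
        return false;
    if (boundary == 0)
        return true;
    assert((boundary & (boundary - 1)) == 0);
    return (dma / boundary) == ((dma + size - 1) / boundary);
}
```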

P.

Comment 122 William Reich 2008-07-11 19:36:20 UTC
the "pool" stuff did not work on my 16gig machine.

Same symptoms as before
- when the memory is set to 4gig or less, my memory allocations are
      aligned on the correct 64k boundary.
- when the memory is set to 16gig, my memory allocations are
      not on a 64k boundary.

I tried using two versions of the pool create:
reichs_pool = pci_pool_create("reich's device", &reichs_dev,
                0x10000 /* 64K in size */,
                0x10000 /* 64K byte aligned */,
                0x10000 /* don't cross a 64K boundary */);
and
reichs_pool = pci_pool_create("reich's device", &reichs_dev,
                0x10000 /* 64K in size */,
                0x10000 /* 64K byte aligned */,
                0       /* no boundary constraint */);

same results...
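One plausible explanation, offered as an assumption rather than a verified trace of the RHEL4 source: a DMA pool carves its blocks at fixed offsets inside larger chunks that it obtains from `dma_alloc_coherent()`, so each block's handle is the chunk's handle plus a multiple of the block size. If the chunk's handle is misaligned, every block inherits the misalignment. A user-space sketch; `count_misaligned()` is hypothetical, and 0x40d3000 is the misaligned handle reported in comment 117:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count blocks in one pool chunk (block_size apart, starting at chunk_dma)
 * whose handle is not a multiple of align (a power of two).
 */
static unsigned count_misaligned(uint64_t chunk_dma, uint64_t chunk_size,
                                 uint64_t block_size, uint64_t align)
{
    unsigned bad = 0;

    assert(block_size > 0 && align != 0 && (align & (align - 1)) == 0);
    for (uint64_t off = 0; off + block_size <= chunk_size; off += block_size)
        if ((chunk_dma + off) & (align - 1))
            bad++;
    return bad;
}
```

Under this model, asking the pool for 64K alignment cannot help once the underlying `dma_alloc_coherent()` handle is misaligned, which would be consistent with the pool showing the same symptoms as the direct call.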



Comment 123 William Reich 2008-07-11 19:48:32 UTC

report #454417 has been created to update documentation

Comment 124 William Reich 2008-07-15 16:48:03 UTC

it is interesting to note that as I continue to run experiments with the
pci_alloc_consistent(), the virtual address IS coming back 64k aligned, but the
physical address is not.
( I need the physical address to be aligned... )

Comment 125 Prarit Bhargava 2008-07-15 17:07:21 UTC
(In reply to comment #124)
> 
> it is interesting to note that as I continue to run experiments with the
> pci_alloc_consistent(), the virtual address IS coming back 64k aligned, but the
> physical address is not.
> ( I need the physical address to be aligned... )

Yes -- that was noted previously and is tested for in the prarit.c module.

P.

Comment 126 Prarit Bhargava 2008-07-16 12:50:33 UTC
William, can we get some simple details about your device?  Is it a 32-bit or
64-bit device?  What is the dma_mask set to for this device?

Comment 127 William Reich 2008-07-16 13:23:03 UTC
we have a 32bit device.
We use the 32 bit dma mask.

Referring to other comments ( #1 , #117 & #118 ) in this thread,


#define DMA_32BIT_MASK 0x00000000ffffffffULL

u64     stream_dma_32bit_mask            = DMA_32BIT_MASK ;

And for completeness,

#define DMA_64BIT_MASK 0xffffffffffffffffULL

u64     stream_dma_64bit_mask            = DMA_64BIT_MASK ;





Comment 128 William Reich 2008-07-16 13:25:51 UTC

and, just to complete the thought,
we call BOTH pci_set_dma_mask()
and pci_set_consistent_dma_mask() ( in that order )
with the 32 bit dma mask as the input value.

Comment 129 Prarit Bhargava 2008-07-17 23:26:49 UTC
William, if I sent you a kernel patch could you test it?  I'm writing code now
and expect to have something ready tomorrow (Friday).

P.

Comment 130 William Reich 2008-07-17 23:49:42 UTC
I'm willing.
I just have to wait my turn for the machine with 16gig of memory.
Of course, we also have to have matching kernels...


Comment 131 Prarit Bhargava 2008-07-18 17:49:30 UTC
Created attachment 312159 [details]
RHEL4 fix for this issue

William, please test with this patch.  It applies to the latest kernel source
available from here:

http://people.redhat.com/vgoyal/rhel4/

P.

Comment 132 Prarit Bhargava 2008-07-18 17:59:25 UTC
Created attachment 312162 [details]
Upstream patch that fixes an overflow bug

Sent to LKML and Jesse Barnes.

Comment 133 Prarit Bhargava 2008-07-18 18:00:33 UTC
Created attachment 312163 [details]
Upstream patch that fixes alignment bug

Sent to LKML & Jesse Barnes.

Comment 134 William Reich 2008-07-18 18:21:14 UTC

Please clarify - which test do you wish me to execute ?

Do you want me to exercise my original problem as
described in comment #1, or am I to exercise the
dma/pci pool stuff referenced in comment #122 ?

Also, what is the relationship between this patch and 
bug  report #454417  ?

Comment 135 Prarit Bhargava 2008-07-18 18:34:09 UTC
(In reply to comment #134)
> 
> Please clarify - which test do you wish me to execute ?
> 
> Do you want me to exercise my original problem as
> described in comment #1, or am I to exercise the
> dma/pci pool stuff referenced in comment #122 ?

Actually I was hoping you could test my prarit.c module ... but now that I think
of it, your independent driver test is probably better :)

> 
> Also, what is the relationship between this patch and 
> bug  report #454417  ?

454417 is for RHEL _5_.

P.


Comment 136 Prarit Bhargava 2008-07-18 18:35:55 UTC
> 
> Also, what is the relationship between this patch and 
> bug  report #454417  ?

>454417 is for RHEL _5_.

Oops.  Scratch that.

454417 is probably no longer valid after this patch, IMO.  An unintended
consequence of fixing this code is that pci_alloc_consistent/dma_alloc_coherent
will always return size-aligned values.

P.

Comment 137 Prarit Bhargava 2008-07-21 14:31:38 UTC
I *finally* got the upstream patches through to LKML.

Links to upstream submits for this issue:

http://marc.info/?l=linux-kernel&m=121664984730778&w=2

http://marc.info/?l=linux-kernel&m=121664984830791&w=2

P.

Comment 138 William Reich 2008-07-21 14:47:07 UTC
just for info, 
it looks like I will not be able to get time on the 16gig machine
until Wed 7/23...

Comment 139 Prarit Bhargava 2008-07-21 14:53:40 UTC
William, no problem -- FYI, I've tested this on a few systems within RH and it
seems to fix the issue.  

Before submitting to our internal kernel list, I would like to make sure it
fixes your problem ;) and that there isn't another issue blocking you.

P.

Comment 140 Prarit Bhargava 2008-07-23 11:22:12 UTC
Created attachment 312462 [details]
Upstream patch that fixes this issue

Submitted to LKML.

Comment 141 Prarit Bhargava 2008-07-23 11:24:20 UTC
Patch upstream here:

http://marc.info/?l=linux-kernel&m=121681201313560&w=2

P.

Comment 142 William Reich 2008-07-24 15:02:05 UTC
Executive Summary - Success !

Details...
using the "vanilla" kernel from comment #131, we
saw that the problem described by this bug report still existed.
We used our own driver to reveal the problem.
We did not use the test driver from comment #100.
So, this kernel failed, as expected.

We then applied the patch ( also from comment #131 ) to the kernel.
We executed the test using the same driver, and the test passed.

thanks

Comment 143 William Reich 2008-07-24 15:05:31 UTC
now the 64 million dollar question...

When my customers ask "how do I get this patch? ", what do I tell them ?

Something like,
"For RedHat AS 4, you need patch such&such, which is available
from Redhat via _____________ . "
and
"For RedHat AS 5, you need patch such&such, which is available
from Redhat via _____________ . "

Comment 144 Prarit Bhargava 2008-07-24 15:19:55 UTC
For RHEL4 it looks likely that this will be in 4.8.

I've opened up a separate BZ for RHEL5 -- 455813.

P.

Comment 145 Prarit Bhargava 2008-07-29 18:19:57 UTC
Created attachment 312916 [details]
RHEL4 fix for this issue

Posted.

Comment 148 RHEL Program Management 2008-09-03 13:01:32 UTC
Updating PM score.

Comment 150 Vivek Goyal 2008-11-18 14:31:10 UTC
Committed in 78.18.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 157 errata-xmlrpc 2009-05-18 19:23:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html