Red Hat Bugzilla – Bug 198498
2.6.17-1.2365.fc6 and Calgary-X watchdog hangs (DMA error) on 2-node x460.
Last modified: 2007-11-30 17:11:37 EST
Description of problem:
The 2.6.17-1.2365.fc6 kernel hangs on a two-node x460. A couple of stack traces are printed,
and finally this message appears:
calgary_watchdog: DMA error on bus 0, CSR = 0x2010000
Version-Release number of selected component (if applicable):
Install the 2.6.17-1.2365.fc6 kernel.
Created attachment 132239
Serial output showing the problem.
Looks like a Calgary bug indeed...
<Jul/11 10:28 am>Starting udev: Unable to handle kernel NULL pointer dereference
at 0000000000000030 RIP:
<Jul/11 10:28 am> [<ffffffff80283c2f>] calgary_alloc_coherent+0x3c/0xfb
Can you please attach your .config?
As a workaround, you can boot with the kernel command-line argument "iommu=off" until we
fix this one. Thanks for reporting!
Created attachment 132310
Config file used to build 2.6.17-1.2365.fc6 kernel.
If you want to test this kernel (please keep in mind that the FC/RHEL5 kernel
also has some extra patches on top of the 2.6.17 + 2.6.18-rc1-git3 patch set),
you can download the source RPM from:
There are also other mirror sites that are much faster; please take a look.
If the problem is due to some patches being screwed up, please, please report
it here with an explanation of what needs to be done to correct it.
Thanks, Konrad, I was looking for the SRPM earlier today. We'll give it a spin on
our test machine; if it's reproducible there, so much the better. We're also
trying to get some time on a dual-node x460 and will keep you updated.
Created attachment 132316
I am also attaching a sysreport (a tarball of /proc, /etc, dmidecode output, etc.) so
that you can check whether your machine mirrors mine, in case this is a configuration issue.
Please note that I have Hot-Plug Memory enabled in the BIOS.
OK, I think I know what's going on. This machine violates a rule we relied on
that relates the PCI bus number to the Calgary PHB (which was a total hack, but
worked on every machine we had tried so far). You have PCI buses 0xe and 0xf,
for which bus_to_phb() returns the same result, even though it should return
different results because they are behind different PHBs. As a consequence, one of
the buses is never initialized (in init_one()/init_one_nontranslated()), so
dev->bus->self is never set for it, which leads to the NULL pointer dereference.
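To make the failure mode concrete, here is a minimal sketch of why deriving the PHB from the bus number arithmetically is fragile. The formula `busno / 2` in `hypothetical_bus_to_phb()` is an assumption chosen purely for illustration, not the actual bus_to_phb() from the Calgary driver; it simply reproduces the reported symptom of buses 0xe and 0xf mapping to the same PHB.

```c
#include <assert.h>

/*
 * HYPOTHETICAL stand-in for bus_to_phb(): derives a PHB index purely from
 * the PCI bus number. This is not the formula used by the Calgary driver;
 * it only shows how a fixed arithmetic rule can map two different PHBs
 * to the same index.
 */
static int hypothetical_bus_to_phb(unsigned char busno)
{
	return busno / 2;
}

/*
 * Returns 1 if two distinct bus numbers collapse onto the same PHB index.
 * In a driver that keeps one slot per PHB index, only one of the two
 * buses would then get initialized.
 */
static int phb_index_collides(unsigned char bus_a, unsigned char bus_b)
{
	assert(bus_a != bus_b); /* only meaningful for two different buses */
	return hypothetical_bus_to_phb(bus_a) == hypothetical_bus_to_phb(bus_b);
}
```

Under this made-up rule, `phb_index_collides(0x0e, 0x0f)` returns 1 while `phb_index_collides(0x0f, 0x10)` returns 0; any fixed formula has some bus layout for which it collides in the same way.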
Fixing this is a good opportunity to redo this logic so that it no longer relies on any
relationship between PCI bus numbers and Calgary/PHB-per-Calgary. We're working on it.
Created attachment 132472
Mainline patch to fix this bug
I was able to get access to an x460 and reproduce this problem with a mainline
kernel. Attached is a patch against mainline that fixes the issue on my
system. I am in the process of verifying the fix against the fc6 kernel you
There were many issues that had to be resolved to fix this problem. First,
when I originally wrote the code to handle NUMA systems, I had a major
misunderstanding that was not corrected until now: I thought the
"number of nodes online" referred to the number of physical systems connected, so
that if NUMA was disabled there would be only one node and only that node's
PCI buses would be visible. In reality, if NUMA is disabled, the system still exposes all
of the connected systems; it is merely unaware of the differing latencies in accessing main
memory. Therefore, the uses of num_online_nodes() and MAX_NUMNODES are
incorrect and need to be replaced with the maximum number of nodes that can be
accessed (which is 4). I created a variable, CAL_MAX_NODES, and set it to 4
to fix this.
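The loop-bound problem can be sketched as follows, assuming a simple array of per-chassis presence flags. `count_probed_nodes()` is hypothetical, and CAL_MAX_NODES is modeled here as a macro although the patch may define it differently; passing 1 as the bound stands in for what num_online_nodes() returns in a kernel built without NUMA.

```c
#include <assert.h>

/*
 * Upper bound on the number of chassis (nodes) Calgary detection must
 * walk, regardless of whether NUMA is enabled. Modeled as a macro for
 * this sketch; the value 4 matches the patch as first posted.
 */
#define CAL_MAX_NODES 4

/*
 * HYPOTHETICAL helper: count how many physically present chassis get
 * probed when the loop is bounded by loop_bound. present[i] is nonzero
 * if chassis i is cabled in.
 */
static int count_probed_nodes(const int present[CAL_MAX_NODES], int loop_bound)
{
	int node, probed = 0;

	assert(loop_bound >= 0 && loop_bound <= CAL_MAX_NODES);
	for (node = 0; node < loop_bound; node++)
		if (present[node])
			probed++;
	return probed;
}
```

With four chassis present, a bound of 1 probes only the first chassis, while a bound of CAL_MAX_NODES probes all four.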
Second, when walking the PCI buses in detect_calgary(), the code only checked the
first "slot" to see whether a device was present. This works in
most cases, but not always: in the NUMA MXE
drawers, there are USB devices present in the third slot (with slot 1 being
empty). To work around this, all slots (up to 8) are now scanned to see whether
any devices are present.
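A minimal sketch of the slot-scanning change, assuming a hypothetical config-space accessor type `read_vendor_fn` rather than the kernel's real PCI config helpers. It relies on the standard PCI convention that reading the vendor ID of an absent device returns 0xFFFF. `fake_mxe_read_vendor()` is a hypothetical simulation of a drawer where only the third slot (index 2) is populated.

```c
#include <assert.h>
#include <stdint.h>

/* "All slots (up to 8)", per the patch description; the name is hypothetical. */
#define MAX_SLOTS_TO_SCAN 8

/* A config read of a nonexistent PCI device returns all ones. */
#define PCI_VENDOR_NONE 0xFFFFu

/* HYPOTHETICAL accessor type: returns the vendor ID at bus/slot. */
typedef uint16_t (*read_vendor_fn)(unsigned int bus, unsigned int slot);

/* Old behaviour: only slot 0 is checked. */
static int bus_has_device_slot0_only(unsigned int bus, read_vendor_fn read_vendor)
{
	assert(read_vendor);
	return read_vendor(bus, 0) != PCI_VENDOR_NONE;
}

/* New behaviour: any populated slot in 0..MAX_SLOTS_TO_SCAN-1 counts. */
static int bus_has_device(unsigned int bus, read_vendor_fn read_vendor)
{
	unsigned int slot;

	assert(read_vendor);
	for (slot = 0; slot < MAX_SLOTS_TO_SCAN; slot++)
		if (read_vendor(bus, slot) != PCI_VENDOR_NONE)
			return 1;
	return 0;
}

/*
 * Simulated MXE drawer: only the third slot (index 2) is populated.
 * 0x1234 is a placeholder vendor ID, not a real USB controller's.
 */
static uint16_t fake_mxe_read_vendor(unsigned int bus, unsigned int slot)
{
	(void)bus;
	return slot == 2 ? 0x1234 : PCI_VENDOR_NONE;
}
```

For the simulated drawer, `bus_has_device_slot0_only()` reports the bus as empty, while `bus_has_device()` finds the device in the third slot.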
Lastly, on large systems the buses are enumerated differently than we
originally thought, which throws the ugly logic we had out the window. To handle
this more elegantly, I reorganized the kva array to be sparse (which removes
the need for any bus-number-to-kva-slot logic in tce.c) and created a
secondary sparse array to hold the bus-number-to-PHB mapping. This will
make it possible to remove the translation_disabled bitmap in the future.
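The sparse bus-number-to-PHB table described above can be sketched as an array indexed directly by the 8-bit PCI bus number. The names `bus_to_phb_map`, `record_phb()`, `lookup_phb()` and `struct hypothetical_phb` are hypothetical and not taken from the patch; the point is only that a direct lookup needs no arithmetic relationship between bus numbers and PHBs.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PCI_BUSES 256 /* PCI bus numbers are 8 bits wide */

/* HYPOTHETICAL per-PHB state; stands in for whatever the driver keeps. */
struct hypothetical_phb {
	int phb_id;
};

/* Sparse table indexed by bus number; NULL means "no PHB recorded". */
static struct hypothetical_phb *bus_to_phb_map[NUM_PCI_BUSES];

/* Record the PHB found behind a bus during detection. */
static void record_phb(unsigned char busno, struct hypothetical_phb *phb)
{
	assert(bus_to_phb_map[busno] == NULL); /* each bus is recorded once */
	bus_to_phb_map[busno] = phb;
}

/* Look up the PHB for a bus; returns NULL for buses never recorded. */
static struct hypothetical_phb *lookup_phb(unsigned char busno)
{
	return bus_to_phb_map[busno];
}
```

Because each bus number gets its own entry, buses 0xe and 0xf can never collide; the cost is a fixed table of 256 pointers, most of them NULL.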
With these changes, Calgary boots on an x460 with 4 nodes, both with and without NUMA.
The patch above applies to the fc6 kernel with minimal fuzz (-2 lines). Testing on
this tree shortly.
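Also, a better workaround for this problem would be "iommu=soft", since "iommu=off"
will cause problems on systems with more than 4G of RAM ("iommu=soft" uses software
bounce buffering, so devices limited to 32-bit DMA can still reach memory above 4G).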
The x460 can be an 8-node machine. Perhaps CAL_MAX_NODES should be set to 8
instead of 4?
I verified on the IBM website that the x460 supports a maximum of 8 chassis.
I guess that is what happens when you ask someone instead of checking
the web. Thanks, Konrad, I'll fix that in the next version of the patch (to be
Created attachment 132566
Patch against the fc6 kernel
This patch should fix the issue. Please test it and verify that it works on your system.
Patch works splendidly.
Patch posted on the internal reflector for inclusion in RHEL5.
The patch was merged into mainline as of 2.6.18-rc3. Can this bug be closed now?
Not yet. Dave Jones/Don Zickus need to close this bug when they put the patch into
the FC/RHEL kernel.
Sorry, the patch was picked up and then dropped because rc3 already has it.