198498 – 2.6.17-1.2365.fc6 and Calgary-X watchdog hangs (DMA error) on 2-node x460.

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Konrad Rzeszutek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-07-11 14:40 UTC by Konrad Rzeszutek
Modified:	2007-11-30 22:11 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-07-31 16:28:39 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Serial output showing the problem. (55.97 KB, text/plain) 2006-07-11 14:40 UTC, Konrad Rzeszutek
Config file used to build 2.6.17-1.2365.fc6 kernel. (60.40 KB, text/plain) 2006-07-12 14:48 UTC, Konrad Rzeszutek
Sysreport (248.24 KB, application/octet-stream) 2006-07-12 17:02 UTC, Konrad Rzeszutek
Mainline patch to fix this bug (6.33 KB, patch) 2006-07-14 23:33 UTC, Jon Mason
Patch against the fc6 kernel (6.49 KB, patch) 2006-07-17 18:29 UTC, Jon Mason

Description Konrad Rzeszutek 2006-07-11 14:40:28 UTC

Description of problem:

2.6.17-1.2365.fc6 hangs on a two-node x460. There are a couple of stack traces,
and finally this message shows up:

calgary_watchdog: DMA error on bus 0, CSR = 0x2010000


Version-Release number of selected component (if applicable):
2.6.17-1.2365.fc6

How reproducible:
Install 2.6.17-1.2365.fc6 kernel

Actual results:

Expected results:


Additional info:

Comment 1 Konrad Rzeszutek 2006-07-11 14:40:29 UTC

Created attachment 132239
Serial output showing the problem.

Comment 2 Muli Ben-Yehuda 2006-07-12 04:22:07 UTC

Looks like a Calgary bug indeed...

<Jul/11 10:28 am>Starting udev: Unable to handle kernel NULL pointer dereference
at 0000000000000030 RIP: 
<Jul/11 10:28 am> [<ffffffff80283c2f>] calgary_alloc_coherent+0x3c/0xfb

Can you please attach your .config?

As a workaround, you can boot with command line arguments "iommu=off" until we
fix this one. Thanks for reporting!

Comment 3 Konrad Rzeszutek 2006-07-12 14:48:01 UTC

Created attachment 132310
Config file used to build 2.6.17-1.2365.fc6 kernel.

If you want to test this kernel (please keep in mind that the FC/RHEL5 kernel
has also some extra patches on top of the 2.6.17+2.6.18-rc1-git3 patch set),
you can download the source rpm from:

ftp://ftp.tu-chemnitz.de/pub/linux/fedora-core/development/source/SRPMS/

There are also other mirror sites that are much faster. Please take a look
at: http://fedora.redhat.com/Download/mirrors.html.

If the problem is due to some patches being screwed up, please report
this here with an explanation of what needs to be done to correct the problem.

Thanks!

Comment 4 Muli Ben-Yehuda 2006-07-12 16:19:09 UTC

Thanks Konrad, I was looking for the SRPM earlier today. We'll give it a spin on
our test machine, if it's reproducible there, so much the better. We're also
trying to get some time on a dual node 460, will keep you updated.

Comment 5 Konrad Rzeszutek 2006-07-12 17:02:50 UTC

Created attachment 132316
Sysreport

I am also attaching a sysreport (a tarball of /proc, /etc, dmidecode, etc) so
that you can find out whether your machine mirrors mine in case this is a config
issue.

Please note that I have HotPlug Memory on in the BIOS.

Comment 6 Muli Ben-Yehuda 2006-07-12 18:00:13 UTC

Ok, I think I know what's going on. This machine violates a rule we counted on
relating the PCI bus number to the Calgary PHB (which was a total hack, but
worked on every machine we tried it on so far). You have PCI busses 0xe and 0xf,
for which bus_to_phb() returns the same result (it should return different
results, as they're different PHBs), which leads to one of the busses not
getting initialized (in init_one()/init_one_nontranslated()), which leads to
dev->bus->self not being set, which leads to the NULL pointer deref.

Fixing this is a good chance to redo this logic to stop relying on any
relationship between PCI bus numbers and Calgary/PHB-per-Calgary. We're working
on it.
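
The failure chain described above can be illustrated with a small user-space
sketch. Everything in it is hypothetical: toy_bus_to_phb() is NOT the kernel's
bus_to_phb() formula, only a stand-in heuristic that happens to collide for
busses 0xe and 0xf, and the toy_* names do not exist in the Calgary code. It
only shows how a collision can leave one bus without a table, so that a later
lookup returns NULL.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PHBS 128

struct toy_table {
    unsigned char busno;   /* bus that claimed this PHB slot */
};

/* Hypothetical heuristic (an assumption, not the kernel's formula):
 * derives a PHB slot from the bus number alone. */
static unsigned int toy_bus_to_phb(unsigned char busno)
{
    return busno / 2;
}

static struct toy_table toy_storage[TOY_MAX_PHBS];
static int toy_phb_claimed[TOY_MAX_PHBS];
static struct toy_table *toy_tables[256];   /* per-bus pointer, NULL until set */

/* Initialize a bus unless its computed PHB slot is already claimed.
 * Returns 1 if the bus was initialized, 0 if it was skipped. */
static int toy_init_bus(unsigned char busno)
{
    unsigned int phb = toy_bus_to_phb(busno);

    assert(phb < TOY_MAX_PHBS);
    if (toy_phb_claimed[phb])
        return 0;           /* collision: this bus never gets a table */
    toy_phb_claimed[phb] = 1;
    toy_storage[phb].busno = busno;
    toy_tables[busno] = &toy_storage[phb];
    return 1;
}

/* Returns the bus's table, or NULL if the bus was never initialized. */
static const struct toy_table *toy_lookup(unsigned char busno)
{
    return toy_tables[busno];
}
```

With this toy heuristic, initializing bus 0xe succeeds, initializing bus 0xf is
skipped, and any caller that dereferences toy_lookup(0x0f) without a check
would hit a NULL pointer.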

Comment 7 Jon Mason 2006-07-14 23:33:22 UTC

Created attachment 132472
Mainline patch to fix this bug

I was able to get access to a x460 and reproduce this problem with a mainline
kernel.  Attached is a patch against mainline that fixes the issue on my
system.  I am in the process of verifying the fix against the fc6 kernel you
linked above.  

Several issues had to be resolved to fix this problem.  Firstly,
when I originally wrote the code to handle NUMA systems, I had a large
misunderstanding that was not corrected until now: I thought the
"number of nodes online" referred to the number of physical systems connected,
so that if NUMA was disabled, there would be only 1 node and it would show only
that node's PCI bus.  In reality, if NUMA is disabled, the system still displays
all of the connected systems and is merely ignorant of the delays in accessing
main memory.  Therefore, references to num_online_nodes() and MAX_NUMNODES are
incorrect and need to be set to the maximum number of nodes that can be
accessed (which is 4).  I created a variable, CAL_MAX_NODES, and set it to 4
to fix this.

Secondly, when walking the PCI bus in detect_calgary, the code only checked the
first "slot" when looking to see if a device is present.  This works in
most cases, but unfortunately not always.  In the NUMA MXE
drawers, there are USB devices present on the 3rd slot (with slot 1 being
empty).  So, to work around this, all slots (up to 8) are scanned to see if
there are any devices present.
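
As a rough illustration of the slot-scanning change, here is a minimal
user-space sketch. It assumes only the usual PCI convention that a config read
of all ones means "no device"; the toy_* names, the callback type, and the fake
read functions are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS_PER_BUS 8
#define TOY_NO_DEVICE     0xffffffffu  /* all ones: nothing answered */

/* Hypothetical config-space read: returns the first dword for (bus, slot). */
typedef uint32_t (*toy_cfg_read_fn)(unsigned int bus, unsigned int slot);

/* Old behaviour: look at the first slot only. */
static int toy_first_slot_only(toy_cfg_read_fn rd, unsigned int bus)
{
    assert(rd != NULL);
    return rd(bus, 0) != TOY_NO_DEVICE;
}

/* New behaviour: scan every slot and report a device if any one answers. */
static int toy_any_slot(toy_cfg_read_fn rd, unsigned int bus)
{
    unsigned int slot;

    assert(rd != NULL);
    for (slot = 0; slot < TOY_SLOTS_PER_BUS; slot++) {
        if (rd(bus, slot) != TOY_NO_DEVICE)
            return 1;
    }
    return 0;
}

/* Fake bus where only the 3rd slot (index 2) is populated.
 * The returned ID value is arbitrary. */
static uint32_t toy_fake_read(unsigned int bus, unsigned int slot)
{
    (void)bus;
    return slot == 2 ? 0x12345678u : TOY_NO_DEVICE;
}

/* Fake bus with no devices at all. */
static uint32_t toy_empty_read(unsigned int bus, unsigned int slot)
{
    (void)bus;
    (void)slot;
    return TOY_NO_DEVICE;
}
```

On the fake bus, the first-slot check reports no device while the full scan
finds the one sitting in the 3rd slot.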

Lastly, the bus is being enumerated on large systems in a different way than we
originally thought.  This throws the ugly logic we had out the window.  To handle
this more elegantly, I reorganized the kva array to be sparse (which removed
the need to have any bus number to kva slot logic in tce.c) and created a
secondary space array to contain the bus number to phb mapping.  This will
enable the removal of the translation_disabled bitmap in the future.
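
The idea of indexing directly by bus number can be sketched as follows. This is
not the code from the attached patch: the array names, sizes, and helper
functions are assumptions made for illustration. The point is only that when
both arrays have one entry per possible bus number, no arithmetic on the bus
number is needed and two busses can never collide.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BUS_NUM 256
#define TOY_NO_PHB      (-1)

/* Sparse per-bus arrays (hypothetical names): most entries stay empty. */
static void *toy_bus_kva[TOY_MAX_BUS_NUM];   /* table address, NULL if none */
static int toy_bus_phb[TOY_MAX_BUS_NUM];     /* PHB id, TOY_NO_PHB if none  */

/* Mark every bus as having no table and no PHB. */
static void toy_bus_reset(void)
{
    int i;

    for (i = 0; i < TOY_MAX_BUS_NUM; i++) {
        toy_bus_kva[i] = NULL;
        toy_bus_phb[i] = TOY_NO_PHB;
    }
}

/* Record what was discovered for one bus at detection time. */
static void toy_bus_record(unsigned char busno, int phbid, void *kva)
{
    assert(phbid >= 0);
    toy_bus_kva[busno] = kva;
    toy_bus_phb[busno] = phbid;
}

/* Lookups are plain array reads keyed by the bus number itself. */
static void *toy_bus_kva_of(unsigned char busno)
{
    return toy_bus_kva[busno];
}

static int toy_bus_phb_of(unsigned char busno)
{
    return toy_bus_phb[busno];
}
```

The cost is a few mostly empty entries, in exchange for lookups that do not
depend on how the firmware happens to number the busses.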

With these changes Calgary boots on an x460 with 4 nodes with and without NUMA
enabled.

Comment 8 Jon Mason 2006-07-14 23:54:35 UTC

The patch above applies with minimal fuzz (-2 lines) to the fc6.  Testing on
this tree shortly.

Also, a better workaround for this problem would be "iommu=soft", as "iommu=off"
will cause problems on systems with more than 4G RAM.

Comment 9 Konrad Rzeszutek 2006-07-17 14:41:49 UTC

Jon,

The x460 can be an 8-node machine. Perhaps the CAL_MAX_NODES should be set to 8
instead of 4?

Comment 10 Jon Mason 2006-07-17 15:52:39 UTC

I verified that there are a maximum of 8 chassis for the x460 on the IBM
website.  I guess that is what happens when you ask someone instead of checking
the web.  Thanks Konrad, I'll fix that in the next version of the patch (to be
posted shortly).

Comment 11 Jon Mason 2006-07-17 18:29:49 UTC

Created attachment 132566
Patch against the fc6 kernel

This patch should fix the issue.  Please test and verify it works on your
system.

Comment 12 Konrad Rzeszutek 2006-07-18 20:28:54 UTC

Patch works splendidly.

Comment 13 Konrad Rzeszutek 2006-07-18 20:40:00 UTC

Patch posted on the internal reflector for inclusion in RHEL5.

Comment 14 Muli Ben-Yehuda 2006-07-30 09:38:36 UTC

Patch included into mainline as of 2.6.18-rc3. Bug can probably be closed now?

Comment 15 Konrad Rzeszutek 2006-07-31 16:17:38 UTC

Muli,

Not yet. Dave Jones or Don Zickus needs to close this bug once the patch is in
the FC/RHEL kernel.

Comment 16 Don Zickus 2006-07-31 16:28:39 UTC

Sorry, patch was picked up and then dropped b/c rc3 has it.

-Don
