Bug 236922

Summary: DMA: Out of SW-IOMMU space
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Reporter: Brian Wheeler <bdwheele>
Assignee: Pete Zaitcev <zaitcev>
QA Contact: Martin Jenner <mjenner>
CC: amax, lcm, peterm, zaitcev
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 19:47:09 UTC
Attachments (description, flags):
  boot output (flags: none)
  Candidate patch 1 - free DMA mappings upon a submission error (flags: none)

Description Brian Wheeler 2007-04-18 13:54:01 UTC
Description of problem:


Version-Release number of selected component (if applicable):

2.6.18-8.1.1.el5
2.6.18-8.1.1.el5xen
2.6.18-8.el5

How reproducible:

All my IBM x3650 Machines 


Steps to Reproduce:
1. boot rhel5 kernel
2. watch the SW-IOMMU messages go by in /var/log/messages
3. wonder if they're important :)
  
Actual results:

Lots of messages

Expected results:

None of these messages, I suppose.

Additional info:

These messages are appearing on the x3650 machines from IBM we have.  The full
error message is:

Apr 18 09:34:11 calliope kernel: DMA: Out of SW-IOMMU space for 8 bytes at
device 0000:00:1d.2

Sometimes it is 64 bytes.  The device in question is:

00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB
Controller #3 (rev 09) (prog-if 00 [UHCI])
        Subsystem: IBM Unknown device 02dd
        Flags: bus master, medium devsel, latency 0, IRQ 90
        I/O ports at 2a00 [size=32]

We have the RSA-II Slimline adaptor installed on these machines, but I do not 
know which USB controller it may be attached to.  The x3755 machine we have
(which is AMD instead of Intel) does not exhibit this problem.

Comment 1 Konrad Rzeszutek 2007-04-18 14:23:23 UTC
Chris,

Have you seen this on your box?

Comment 2 Chris McDermott 2007-04-19 23:49:17 UTC
Konrad,

I personally haven't seen this. I'll check around, though.

Comment 3 Konrad Rzeszutek 2007-04-23 17:49:36 UTC
What kind of USB devices do you have attached to the machine (other than the RSA II
card, which provides a USB mouse + keyboard)?


Comment 4 Brian Wheeler 2007-04-26 12:32:12 UTC
I don't have any USB devices attached to the machine.  Here's the output of lsusb:

Bus 003 Device 001: ID 0000:0000  
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x0000 
  idProduct          0x0000 
  bcdDevice            2.06
  iManufacturer           3 Linux 2.6.18-8.el5 uhci_hcd
  iProduct                2 UHCI Host Controller
  iSerial                 1 0000:00:1d.2
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval             255
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             2
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood        1 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00
  PortPwrCtrlMask    0x60 
 Hub Port Status:
   Port 1: 0000.0103 power enable connect
   Port 2: 0000.0100 power

Bus 003 Device 003: ID 04b3:4001 IBM Corp. 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x04b3 IBM Corp.
  idProduct          0x4001 
  bcdDevice            0.01
  iManufacturer           1 
  iProduct                2 
  iSerial                 3 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           91
    bNumInterfaces          3
    bConfigurationValue     1
    iConfiguration          4 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Devices
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      1 Keyboard
      iInterface              5 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.10
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      65
         Report Descriptors: 
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Devices
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      2 Mouse
      iInterface              6 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.10
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      63
         Report Descriptors: 
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0006  1x 6 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              7 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               8
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               8

Bus 005 Device 001: ID 0000:0000  
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         1 Single TT
  bMaxPacketSize0        64
  idVendor           0x0000 
  idProduct          0x0000 
  bcdDevice            2.06
  iManufacturer           3 Linux 2.6.18-8.el5 ehci_hcd
  iProduct                2 EHCI Host Controller
  iSerial                 1 0000:00:1d.7
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval              12
Hub Descriptor:
  bLength              11
  bDescriptorType      41
  nNbrPorts             8
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood       10 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00 0x60
  PortPwrCtrlMask    0x87  0x34 
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0100 power
   Port 4: 0000.0100 power
   Port 5: 0000.0000
   Port 6: 0000.0100 power
   Port 7: 0000.0100 power
   Port 8: 0000.0100 power

Bus 001 Device 001: ID 0000:0000  
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x0000 
  idProduct          0x0000 
  bcdDevice            2.06
  iManufacturer           3 Linux 2.6.18-8.el5 uhci_hcd
  iProduct                2 UHCI Host Controller
  iSerial                 1 0000:00:1d.0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval             255
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             2
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood        1 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00
  PortPwrCtrlMask    0x60 
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power

Bus 002 Device 001: ID 0000:0000  
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x0000 
  idProduct          0x0000 
  bcdDevice            2.06
  iManufacturer           3 Linux 2.6.18-8.el5 uhci_hcd
  iProduct                2 UHCI Host Controller
  iSerial                 1 0000:00:1d.1
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval             255
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             2
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood        1 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00
  PortPwrCtrlMask    0x60 
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power

Bus 004 Device 001: ID 0000:0000  
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x0000 
  idProduct          0x0000 
  bcdDevice            2.06
  iManufacturer           3 Linux 2.6.18-8.el5 uhci_hcd
  iProduct                2 UHCI Host Controller
  iSerial                 1 0000:00:1d.3
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval             255
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             2
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood        1 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00
  PortPwrCtrlMask    0x60 
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power

Comment 5 Konrad Rzeszutek 2007-05-09 18:32:05 UTC
I can't reproduce on the box I have here.

When looking at the kernel I see that this message comes out of:

static void
swiotlb_full(struct device *dev, size_t size, int dir, int do_panic)
{
        /*
         * Ran out of IOMMU space for this operation. This is very bad.
         * Unfortunately the drivers cannot handle this operation properly.
         * unless they check for pci_dma_mapping_error (most don't)
         * When the mapping is small enough return a static buffer to limit
         * the damage, or panic when the transfer is too big.
         */
        printk(KERN_ERR "PCI-DMA: Out of SW-IOMMU space for %lu bytes at "
               "device %s\n", (unsigned long)size, dev ? dev->bus_id : "?");

        if (size > io_tlb_overflow && do_panic) {
                if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL)
                        panic("PCI-DMA: Memory would be corrupted\n");
                if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL)
                        panic("PCI-DMA: Random memory would be DMAed\n");
        }
}


So, does the box panic afterwards?

Can you also provide the full output of dmesg?

Comment 6 Brian Wheeler 2007-05-09 18:43:49 UTC
Created attachment 154418 [details]
boot output

Comment 7 Brian Wheeler 2007-05-09 18:44:44 UTC
I'm now running 2.6.18-8.1.3.el5 with the same results.  The dmesg output is
pretty useless:
-------------------------
usb 3-1: usbfs: usb_submit_urb returned -22
DMA: Out of SW-IOMMU space for 64 bytes at device 0000:00:1d.2
[bunches of these]

DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2

[then the whole thing repeats]
-------------------
I've attached the syslog messages related to boot until the first instance of
the out of SW-IOMMU.

The box doesn't panic, just keeps chugging along.

Comment 8 Konrad Rzeszutek 2007-05-09 19:06:16 UTC
Pete,

Would this be similar to "230427: cannot send bulk request to UHCI interrupt
endpoint" ?

Brian,

Thank you for the dmesg. By any chance are you running the IBM RSA2
daemon/helper application?

Comment 9 Brian Wheeler 2007-05-09 19:12:21 UTC
Yes, I am.  Version 1.09

Comment 10 Konrad Rzeszutek 2007-05-09 19:27:50 UTC
Brian,

Then you will be happy to know that "230427: cannot send bulk request to UHCI
interrupt endpoint" has the fix.

I am going to close this BZ as duplicate of that BZ (which is slated to go in
RHEL5 U1).


*** This bug has been marked as a duplicate of 230427 ***

Comment 11 Pete Zaitcev 2007-05-09 19:57:21 UTC
I don't see how rejecting bulk transfers would deplete the swiotlb pool,
as long as the HCD does not leak something in its error paths (if it does,
we have to fix that). I'll need to look further into this.


Comment 12 Konrad Rzeszutek 2007-05-09 20:05:55 UTC
I think I jumped the gun on this one.

Brian, can you uninstall the IBM RSA2 helper application and see if that solves
the issue?

Comment 13 Brian Wheeler 2007-05-10 14:58:58 UTC
I've shut the application down and the messages have stopped, so it does seem
related to that app.

Comment 14 Max Asbock 2007-05-10 17:13:15 UTC
I think Konrad was right after all in assuming that this is related to the other
RSA II USB problem (230427).
In RHEL5, due to a bug in the USB kernel subsystem, the ibmasm daemon fails to
send a bulk request to the RSA II endpoint. It therefore assumes that the RSA II
is just being slow (or being reset) and keeps retrying forever. I suspect
that the retrying uses up some resource, and after about seven hours of
retrying we get the IOMMU message:
May  7 14:07:08 feta ibmasm: SP USB device not found, will reload to search
every 10 seconds forever. 
.......
May  7 21:16:19 feta kernel: DMA: Out of SW-IOMMU space for 64 bytes at device
0000:00:1d.2
May  7 21:16:19 feta last message repeated 19 times
May  7 21:16:19 feta kernel: DMA: Out of SW-IOMMU space for 8 bytes at device
0000:00:1d.2
May  7 21:16:19 feta last message repeated 2 times

Since the problem described in 230427 will be fixed in RHEL 5.1, the retrying
will go away and therefore so will this problem.
Alternatively, a version of the ibmasm daemon with a workaround could be used. Such
a daemon has been implemented, but it may not have been officially released yet.
It would also avoid the retry issue.
While there are solutions to the problem (assuming my guess as to the cause
is correct), there is still the question of where the root cause lies:
is the ibmasm daemon failing to free a resource while retrying, or is it libusb or
the kernel? For that reason I think this could be kept as a separate bug to
allow for debugging.

Comment 15 Konrad Rzeszutek 2007-05-10 22:16:14 UTC
Brian,

I am building a test kernel with the patch. Would it be possible for you to test
it along with the IBM RSA 1.09 daemon?

Comment 16 Brian Wheeler 2007-05-10 22:30:16 UTC
Sure.  None of these machines are in production yet, so I have a lot of leeway
with them :)



Comment 17 Konrad Rzeszutek 2007-05-11 01:21:48 UTC
Brian,

Here are the kernels
for 32-bit:
http://www.darnok.org/kernels/kernel-2.6.18-18.el5_usb.i686.rpm
for 64-bit:
http://www.darnok.org/kernels/kernel-2.6.18-18.el5_usb.x86_64.rpm

Thank you for testing them.

Comment 18 Brian Wheeler 2007-05-14 18:15:17 UTC
I booted the 64-bit kernel a few minutes ago and haven't seen any messages
so far.  It looks like it took ~7 hours last time for the messages to start, so
I'll let you know tomorrow if they're back.

Comment 19 Brian Wheeler 2007-05-16 12:06:00 UTC
I was out of the office yesterday, but checking the logs this morning, that
seems to have fixed the problem.

Thanks!

Comment 20 Konrad Rzeszutek 2007-05-16 19:02:03 UTC
Brian,

Thanks for testing. Closing this BZ as DUP. I will ask Pete to post the patch
from BZ 230427 so that it will be included in RHEL5 U1.

*** This bug has been marked as a duplicate of 230427 ***

Comment 21 Pete Zaitcev 2007-06-01 20:41:30 UTC
Created attachment 155937 [details]
Candidate patch 1 - free DMA mappings upon a submission error

Comment 22 Pete Zaitcev 2007-06-02 00:00:43 UTC
Test kernel is available at:
 http://people.redhat.com/zaitcev/ftp/230427/
(it combines the fixes for bug 230427 and bug 236922)


Comment 23 RHEL Program Management 2007-06-06 19:02:18 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 24 Don Zickus 2007-07-19 21:10:52 UTC
in kernel-2.6.18-26.el5

Comment 27 errata-xmlrpc 2007-11-07 19:47:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html