Bug 236922
| Summary: | DMA: Out of SW-IOMMU space | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Brian Wheeler <bdwheele> | ||||||
| Component: | kernel | Assignee: | Pete Zaitcev <zaitcev> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 5.0 | CC: | amax, lcm, peterm, zaitcev | ||||||
| Target Milestone: | --- | Keywords: | Reopened | ||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | RHBA-2007-0959 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2007-11-07 19:47:09 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
Chris, have you seen this on your box?

Konrad, I personally haven't seen this. I'll check around, though. What kind of USB devices do you have attached to the machine (apart from the RSA II card, which provides a USB mouse and keyboard)?

I don't have any USB devices attached to the machine. Here's the output of lsusb:
Bus 003 Device 001: ID 0000:0000
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x0000
idProduct 0x0000
bcdDevice 2.06
iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd
iProduct 2 UHCI Host Controller
iSerial 1 0000:00:1d.2
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 255
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 2
wHubCharacteristic 0x000a
No power switching (usb 1.0)
Per-port overcurrent protection
bPwrOn2PwrGood 1 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0x60
Hub Port Status:
Port 1: 0000.0103 power enable connect
Port 2: 0000.0100 power
Bus 003 Device 003: ID 04b3:4001 IBM Corp.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x04b3 IBM Corp.
idProduct 0x4001
bcdDevice 0.01
iManufacturer 1
iProduct 2
iSerial 3
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 91
bNumInterfaces 3
bConfigurationValue 1
iConfiguration 4
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 3 Human Interface Devices
bInterfaceSubClass 1 Boot Interface Subclass
bInterfaceProtocol 1 Keyboard
iInterface 5
HID Device Descriptor:
bLength 9
bDescriptorType 33
bcdHID 1.10
bCountryCode 0 Not supported
bNumDescriptors 1
bDescriptorType 34 Report
wDescriptorLength 65
Report Descriptors:
** UNAVAILABLE **
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0008 1x 8 bytes
bInterval 10
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 3 Human Interface Devices
bInterfaceSubClass 0 No Subclass
bInterfaceProtocol 2 Mouse
iInterface 6
HID Device Descriptor:
bLength 9
bDescriptorType 33
bcdHID 1.10
bCountryCode 0 Not supported
bNumDescriptors 1
bDescriptorType 34 Report
wDescriptorLength 63
Report Descriptors:
** UNAVAILABLE **
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0006 1x 6 bytes
bInterval 10
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 7
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 8
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 8
Bus 005 Device 001: ID 0000:0000
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 1 Single TT
bMaxPacketSize0 64
idVendor 0x0000
idProduct 0x0000
bcdDevice 2.06
iManufacturer 3 Linux 2.6.18-8.el5 ehci_hcd
iProduct 2 EHCI Host Controller
iSerial 1 0000:00:1d.7
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 12
Hub Descriptor:
bLength 11
bDescriptorType 41
nNbrPorts 8
wHubCharacteristic 0x000a
No power switching (usb 1.0)
Per-port overcurrent protection
TT think time 8 FS bits
bPwrOn2PwrGood 10 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00 0x60
PortPwrCtrlMask 0x87 0x34
Hub Port Status:
Port 1: 0000.0100 power
Port 2: 0000.0100 power
Port 3: 0000.0100 power
Port 4: 0000.0100 power
Port 5: 0000.0000
Port 6: 0000.0100 power
Port 7: 0000.0100 power
Port 8: 0000.0100 power
Bus 001 Device 001: ID 0000:0000
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x0000
idProduct 0x0000
bcdDevice 2.06
iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd
iProduct 2 UHCI Host Controller
iSerial 1 0000:00:1d.0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 255
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 2
wHubCharacteristic 0x000a
No power switching (usb 1.0)
Per-port overcurrent protection
bPwrOn2PwrGood 1 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0x60
Hub Port Status:
Port 1: 0000.0100 power
Port 2: 0000.0100 power
Bus 002 Device 001: ID 0000:0000
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x0000
idProduct 0x0000
bcdDevice 2.06
iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd
iProduct 2 UHCI Host Controller
iSerial 1 0000:00:1d.1
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 255
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 2
wHubCharacteristic 0x000a
No power switching (usb 1.0)
Per-port overcurrent protection
bPwrOn2PwrGood 1 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0x60
Hub Port Status:
Port 1: 0000.0100 power
Port 2: 0000.0100 power
Bus 004 Device 001: ID 0000:0000
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x0000
idProduct 0x0000
bcdDevice 2.06
iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd
iProduct 2 UHCI Host Controller
iSerial 1 0000:00:1d.3
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 255
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 2
wHubCharacteristic 0x000a
No power switching (usb 1.0)
Per-port overcurrent protection
bPwrOn2PwrGood 1 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0x60
Hub Port Status:
Port 1: 0000.0100 power
Port 2: 0000.0100 power
I can't reproduce this on the box I have here.
Looking at the kernel source, I see that this message comes from:
```c
static void
swiotlb_full(struct device *dev, size_t size, int dir, int do_panic)
{
	/*
	 * Ran out of IOMMU space for this operation. This is very bad.
	 * Unfortunately the drivers cannot handle this operation properly.
	 * unless they check for pci_dma_mapping_error (most don't)
	 * When the mapping is small enough return a static buffer to limit
	 * the damage, or panic when the transfer is too big.
	 */
	printk(KERN_ERR "PCI-DMA: Out of SW-IOMMU space for %lu bytes at "
	       "device %s\n", (unsigned long)size, dev ? dev->bus_id : "?");
	if (size > io_tlb_overflow && do_panic) {
		if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL)
			panic("PCI-DMA: Memory would be corrupted\n");
		if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL)
			panic("PCI-DMA: Random memory would be DMAed\n");
	}
}
```
So, does the box panic afterwards?
Can you also provide the full output of dmesg?
Created attachment 154418: boot output
I'm now running 2.6.18-8.1.3.el5 with the same results. The dmesg output is pretty useless:

-------------------------

usb 3-1: usbfs: usb_submit_urb returned -22
DMA: Out of SW-IOMMU space for 64 bytes at device 0000:00:1d.2
[bunches of these]
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
[then the whole thing repeats]

-------------------------

I've attached the syslog messages from boot up to the first instance of the out-of-SW-IOMMU message. The box doesn't panic; it just keeps chugging along.

Pete, would this be similar to "230427: cannot send bulk request to UHCI interrupt endpoint"?

Brian, thank you for the dmesg. By any chance are you running the IBM RSA II daemon/helper application?

Yes, I am. Version 1.09.

Brian, then you will be happy to know that "230427: cannot send bulk request to UHCI interrupt endpoint" has the fix. I am going to close this BZ as a duplicate of that BZ (which is slated to go into RHEL5 U1).

*** This bug has been marked as a duplicate of 230427 ***

I don't see how rejecting bulk transfers would deplete the swiotlb pool, as long as the HCD does not leak something in its error paths (if it does, we have to fix that). I'll need to look further into this.

I think I jumped the gun on this one. Brian, can you uninstall the IBM RSA II helper application and see if that solves the issue?

I've shut the application down and the messages have stopped, so it does seem related to that app.

I think Konrad was right after all in assuming that this is related to the other RSA II USB problem (230427). In RHEL5, due to a bug in the USB kernel subsystem, the ibmasm daemon fails to send a bulk request to the RSA II endpoint. It therefore assumes that the RSA II is just being slow (or being reset) and keeps retrying forever.
I suspect that the retrying uses up some resource, and after about seven hours of trying we get the IOMMU message:

May 7 14:07:08 feta ibmasm: SP USB device not found, will reload to search every 10 seconds forever.
.......
May 7 21:16:19 feta kernel: DMA: Out of SW-IOMMU space for 64 bytes at device 0000:00:1d.2
May 7 21:16:19 feta last message repeated 19 times
May 7 21:16:19 feta kernel: DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
May 7 21:16:19 feta last message repeated 2 times

Since the problem described in 230427 will be fixed in RHEL 5.1, the retrying, and with it this problem, will go away. Alternatively, a version of the ibmasm daemon with a workaround could be used; one has been implemented, but it may not have been officially released yet. This would also avoid the retry issue.

While there are solutions to the problem (assuming my guess as to the cause is correct), the question of the root cause remains: is the ibmasm daemon, libusb, or the kernel failing to free a resource while retrying? For that reason, I think this could be kept as a separate bug to allow for debugging.

Brian, I am building a test kernel with the patch. Would it be possible for you to test it along with the IBM RSA 1.09 daemon?

Sure. None of these machines are in production yet, so I have a lot of leeway with them :)

Brian, here are the kernels:
for 32-bit: http://www.darnok.org/kernels/kernel-2.6.18-18.el5_usb.i686.rpm
for 64-bit: http://www.darnok.org/kernels/kernel-2.6.18-18.el5_usb.x86_64.rpm
Thank you for testing them.

I booted the 64-bit kernel a few minutes ago and haven't seen any messages so far. It took about 7 hours last time for the messages to start, so I'll let you know tomorrow if they're back.

I was out of the office yesterday, but checking the logs this morning, that seems to have fixed the problem. Thanks!

Brian, thanks for testing. Closing this BZ as a duplicate.
I will ask Pete to post the patch from BZ 230427 so that it will be included in RHEL5 U1.

*** This bug has been marked as a duplicate of 230427 ***

Created attachment 155937: Candidate patch 1 - free DMA mappings upon a submission error
Test kernel is available at: http://people.redhat.com/zaitcev/ftp/230427/ (it combines the fixes for bug 230427 and bug 236922).

This request was evaluated by the Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.

Included in kernel-2.6.18-26.el5.

An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
Description of problem:

Version-Release number of selected component (if applicable):
2.6.18-8.1.1.el5
2.6.18-8.1.1.el5xen
2.6.18-8.el5

How reproducible:
All my IBM x3650 machines

Steps to Reproduce:
1. Boot the RHEL5 kernel.
2. Watch the SW-IOMMU messages go by in /var/log/messages.
3. Wonder if they're important :)

Actual results:
Lots of messages

Expected results:
None of these messages, I suppose.

Additional info:
These messages are appearing on the IBM x3650 machines we have. The full error message is:

Apr 18 09:34:11 calliope kernel: DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2

Sometimes it is 64 bytes. The device in question is:

00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09) (prog-if 00 [UHCI])
Subsystem: IBM Unknown device 02dd
Flags: bus master, medium devsel, latency 0, IRQ 90
I/O ports at 2a00 [size=32]

We have the RSA-II Slimline adaptor installed on these machines, but I do not know which USB controller it is attached to. Our x3755 machine (which is AMD rather than Intel) does not exhibit this problem.