Bug 236922
Summary: | DMA: Out of SW-IOMMU space | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Brian Wheeler <bdwheele> |
Component: | kernel | Assignee: | Pete Zaitcev <zaitcev> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | medium | Priority: | medium |
Version: | 5.0 | CC: | amax, lcm, peterm, zaitcev |
Hardware: | x86_64 | OS: | Linux |
Keywords: | Reopened | Doc Type: | Bug Fix |
Fixed In Version: | RHBA-2007-0959 | Last Closed: | 2007-11-07 19:47:09 UTC |

Description
Brian Wheeler
2007-04-18 13:54:01 UTC
Chris, Have you seen this on your box?

Konrad, I personally haven't seen this. I'll check around, though.

What kind of USB devices do you have attached to the machine (except the RSA II card, which provides a USB mouse + keyboard)?

I don't have any USB devices attached to the machine. Here's the output of lsusb:

Bus 003 Device 001: ID 0000:0000 Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 9 Hub bDeviceSubClass 0 Unused bDeviceProtocol 0 Full speed (or root) hub bMaxPacketSize0 64 idVendor 0x0000 idProduct 0x0000 bcdDevice 2.06 iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd iProduct 2 UHCI Host Controller iSerial 1 0000:00:1d.2 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 25 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 Unused bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0002 1x 2 bytes bInterval 255 Hub Descriptor: bLength 9 bDescriptorType 41 nNbrPorts 2 wHubCharacteristic 0x000a No power switching (usb 1.0) Per-port overcurrent protection bPwrOn2PwrGood 1 * 2 milli seconds bHubContrCurrent 0 milli Ampere DeviceRemovable 0x00 PortPwrCtrlMask 0x60 Hub Port Status: Port 1: 0000.0103 power enable connect Port 2: 0000.0100 power Bus 003 Device 003: ID 04b3:4001 IBM Corp. Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x04b3 IBM Corp.
idProduct 0x4001 bcdDevice 0.01 iManufacturer 1 iProduct 2 iSerial 3 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 91 bNumInterfaces 3 bConfigurationValue 1 iConfiguration 4 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Devices bInterfaceSubClass 1 Boot Interface Subclass bInterfaceProtocol 1 Keyboard iInterface 5 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 65 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Devices bInterfaceSubClass 0 No Subclass bInterfaceProtocol 2 Mouse iInterface 6 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 63 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0006 1x 6 bytes bInterval 10 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 2 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 7 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x84 EP 4 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 8 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT 
bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 8 Bus 005 Device 001: ID 0000:0000 Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 9 Hub bDeviceSubClass 0 Unused bDeviceProtocol 1 Single TT bMaxPacketSize0 64 idVendor 0x0000 idProduct 0x0000 bcdDevice 2.06 iManufacturer 3 Linux 2.6.18-8.el5 ehci_hcd iProduct 2 EHCI Host Controller iSerial 1 0000:00:1d.7 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 25 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 Unused bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0002 1x 2 bytes bInterval 12 Hub Descriptor: bLength 11 bDescriptorType 41 nNbrPorts 8 wHubCharacteristic 0x000a No power switching (usb 1.0) Per-port overcurrent protection TT think time 8 FS bits bPwrOn2PwrGood 10 * 2 milli seconds bHubContrCurrent 0 milli Ampere DeviceRemovable 0x00 0x60 PortPwrCtrlMask 0x87 0x34 Hub Port Status: Port 1: 0000.0100 power Port 2: 0000.0100 power Port 3: 0000.0100 power Port 4: 0000.0100 power Port 5: 0000.0000 Port 6: 0000.0100 power Port 7: 0000.0100 power Port 8: 0000.0100 power Bus 001 Device 001: ID 0000:0000 Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 9 Hub bDeviceSubClass 0 Unused bDeviceProtocol 0 Full speed (or root) hub bMaxPacketSize0 64 idVendor 0x0000 idProduct 0x0000 bcdDevice 2.06 iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd iProduct 2 UHCI Host Controller iSerial 1 0000:00:1d.0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 25 
bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 Unused bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0002 1x 2 bytes bInterval 255 Hub Descriptor: bLength 9 bDescriptorType 41 nNbrPorts 2 wHubCharacteristic 0x000a No power switching (usb 1.0) Per-port overcurrent protection bPwrOn2PwrGood 1 * 2 milli seconds bHubContrCurrent 0 milli Ampere DeviceRemovable 0x00 PortPwrCtrlMask 0x60 Hub Port Status: Port 1: 0000.0100 power Port 2: 0000.0100 power Bus 002 Device 001: ID 0000:0000 Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 9 Hub bDeviceSubClass 0 Unused bDeviceProtocol 0 Full speed (or root) hub bMaxPacketSize0 64 idVendor 0x0000 idProduct 0x0000 bcdDevice 2.06 iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd iProduct 2 UHCI Host Controller iSerial 1 0000:00:1d.1 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 25 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 Unused bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0002 1x 2 bytes bInterval 255 Hub Descriptor: bLength 9 bDescriptorType 41 nNbrPorts 2 wHubCharacteristic 0x000a No power switching (usb 1.0) Per-port overcurrent protection bPwrOn2PwrGood 1 * 2 milli seconds 
bHubContrCurrent 0 milli Ampere DeviceRemovable 0x00 PortPwrCtrlMask 0x60 Hub Port Status: Port 1: 0000.0100 power Port 2: 0000.0100 power Bus 004 Device 001: ID 0000:0000 Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 9 Hub bDeviceSubClass 0 Unused bDeviceProtocol 0 Full speed (or root) hub bMaxPacketSize0 64 idVendor 0x0000 idProduct 0x0000 bcdDevice 2.06 iManufacturer 3 Linux 2.6.18-8.el5 uhci_hcd iProduct 2 UHCI Host Controller iSerial 1 0000:00:1d.3 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 25 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 Unused bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0002 1x 2 bytes bInterval 255 Hub Descriptor: bLength 9 bDescriptorType 41 nNbrPorts 2 wHubCharacteristic 0x000a No power switching (usb 1.0) Per-port overcurrent protection bPwrOn2PwrGood 1 * 2 milli seconds bHubContrCurrent 0 milli Ampere DeviceRemovable 0x00 PortPwrCtrlMask 0x60 Hub Port Status: Port 1: 0000.0100 power Port 2: 0000.0100 power

I can't reproduce on the box I have here. When looking at the kernel I see that this message comes out of:

    static void
    swiotlb_full(struct device *dev, size_t size, int dir, int do_panic)
    {
        /*
         * Ran out of IOMMU space for this operation. This is very bad.
         * Unfortunately the drivers cannot handle this operation properly.
         * unless they check for pci_dma_mapping_error (most don't)
         * When the mapping is small enough return a static buffer to limit
         * the damage, or panic when the transfer is too big.
         */
        printk(KERN_ERR "PCI-DMA: Out of SW-IOMMU space for %lu bytes at "
               "device %s\n", (unsigned long)size, dev ? dev->bus_id : "?");

        if (size > io_tlb_overflow && do_panic) {
            if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL)
                panic("PCI-DMA: Memory would be corrupted\n");
            if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL)
                panic("PCI-DMA: Random memory would be DMAed\n");
        }
    }

So, does the box panic afterwards? Can you also provide the full output of dmesg?

Created attachment 154418 [details]
boot output
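The panic question can be reasoned about from the quoted function alone. Below is a minimal user-space sketch of its decision logic, not kernel code: the enum names and `swiotlb_full_outcome` are hypothetical, the direction values mirror the `PCI_DMA_*` constants, and the 32 KiB overflow size used in the usage note is an assumed default that is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PCI_DMA_BIDIRECTIONAL, _TODEVICE, _FROMDEVICE, _NONE. */
enum dma_dir { DMA_BIDIR = 0, DMA_TO_DEV = 1, DMA_FROM_DEV = 2, DMA_NONE = 3 };

/* What the quoted swiotlb_full() would do after printing its message. */
enum outcome { LOG_ONLY, PANIC_CORRUPT, PANIC_RANDOM };

static enum outcome
swiotlb_full_outcome(size_t size, size_t overflow, enum dma_dir dir, int do_panic)
{
    assert(dir >= DMA_BIDIR && dir <= DMA_NONE);

    /* Only requests larger than the overflow buffer can panic. */
    if (size > overflow && do_panic) {
        /* The first matching branch panics, so BIDIR never reaches the second. */
        if (dir == DMA_FROM_DEV || dir == DMA_BIDIR)
            return PANIC_CORRUPT;   /* "Memory would be corrupted" */
        if (dir == DMA_TO_DEV)
            return PANIC_RANDOM;    /* "Random memory would be DMAed" */
    }
    /* Small requests fall back to the static overflow buffer: log and continue. */
    return LOG_ONLY;
}
```

Under that assumed 32 KiB overflow size, `swiotlb_full_outcome(64, 32 * 1024, DMA_FROM_DEV, 1)` yields `LOG_ONLY`, so 8-byte and 64-byte requests would be logged without a panic whatever the direction or `do_panic` value.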
I'm now running 2.6.18-8.1.3.el5 with the same results. The dmesg output is pretty useless:

-------------------------
usb 3-1: usbfs: usb_submit_urb returned -22
DMA: Out of SW-IOMMU space for 64 bytes at device 0000:00:1d.2
[bunches of these]
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
[then the whole thing repeats]
-------------------

I've attached the syslog messages related to boot until the first instance of the out of SW-IOMMU message. The box doesn't panic, just keeps chugging along.

Pete, Would this be similar to "230427: cannot send bulk request to UHCI interrupt endpoint"?

Brian, Thank you for the dmesg. By any chance are you running the IBM RSA2 daemon/helper application?

Yes, I am. Version 1.09

Brian, Then you will be happy to know that "230427: cannot send bulk request to UHCI interrupt endpoint" has the fix. I am going to close this BZ as duplicate of that BZ (which is slated to go in RHEL5 U1).

*** This bug has been marked as a duplicate of 230427 ***

I don't see how rejecting bulk transfers would deplete the swiotlb pool, as long as the HCD does not leak something in its error paths (if it does, we have to fix that). I'll need to look further into this.

I think I jumped the gun on this one. Brian, can you uninstall the IBM RSA2 helper application and see if that solves the issue?

I've shut the application down and the messages have stopped, so it does seem related to that app.

I think Konrad was right after all in assuming that this is related to the other RSA II USB problem (230427). In RHEL5, due to a bug in the USB kernel subsystem, the ibmasm daemon fails to send a bulk request to the RSA II endpoint. Therefore it assumes that the RSA II is just being slow (or being reset) and keeps retrying forever.
I have the suspicion that the retrying uses up some resource and after about seven hours of trying we get the IOMMU message.

May 7 14:07:08 feta ibmasm: SP USB device not found, will reload to search every 10 seconds forever.
.......
May 7 21:16:19 feta kernel: DMA: Out of SW-IOMMU space for 64 bytes at device 0000:00:1d.2
May 7 21:16:19 feta last message repeated 19 times
May 7 21:16:19 feta kernel: DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1d.2
May 7 21:16:19 feta last message repeated 2 times

Since the problem described in 230427 will be fixed in RHEL 5.1, the retrying will go away and therefore the problem will go away. Alternatively, a version of the ibmasm daemon with a workaround could be used. A daemon with a workaround has been implemented but it may not have been officially released yet. This also would avoid the retry issue.

While there are solutions to the problem (assuming my assumption as to the cause is correct), there still is the question of where the root cause is: is the ibmasm daemon failing to free a resource while retrying, or is it libusb or the kernel? I think for that reason this could be kept as a separate bug to allow for debugging.

Brian, I am building a test kernel with the patch. Would it be possible for you to test it along with the IBM RSA 1.09 daemon?

Sure. None of these machines are in production yet, so I have a lot of leeway with them :)

Brian, Here are the kernels
for 32-bit: http://www.darnok.org/kernels/kernel-2.6.18-18.el5_usb.i686.rpm
for 64-bit: http://www.darnok.org/kernels/kernel-2.6.18-18.el5_usb.x86_64.rpm
Thank you for testing them.

I've booted the 64-bit kernel a few minutes ago and I've not seen any messages so far. It looks like it took ~7 hours last time for the messages to start, so I'll let you know tomorrow if they're back.

I was out of the office yesterday, but checking the logs this morning, that seems to have fixed the problem. Thanks!

Brian, Thanks for testing. Closing this BZ as DUP.
I will ask Pete to post the patch from BZ 230427 so that it will be included in RHEL5 U1.

*** This bug has been marked as a duplicate of 230427 ***

Created attachment 155937 [details]
Candidate patch 1 - free DMA mappings upon a submission error
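
The failure mode that the patch title describes can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the attached patch and not the kernel's swiotlb: `bounce_pool`, `POOL_SLOTS`, and every function below are hypothetical names. Each retry maps a buffer from a fixed pool, the submission fails (as with the `-22` seen in dmesg), and the error path either releases the mapping or leaks it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size pool standing in for the bounce-buffer space. */
#define POOL_SLOTS 64

struct bounce_pool {
    bool used[POOL_SLOTS];
};

/* Returns a slot index, or -1 when the pool is exhausted
 * (the point where the kernel would log "Out of SW-IOMMU space"). */
static int pool_map(struct bounce_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            return i;
        }
    }
    return -1;
}

static void pool_unmap(struct bounce_pool *p, int slot)
{
    assert(slot >= 0 && slot < POOL_SLOTS && p->used[slot]);
    p->used[slot] = false;
}

/* Simulated submission that always fails, like the rejected request (-22). */
static int submit_always_fails(void)
{
    return -22;
}

/* One retry: map, try to submit, and release the mapping on failure only
 * when unmap_on_error is set. Returns false once mapping itself fails. */
static bool try_submit(struct bounce_pool *p, bool unmap_on_error)
{
    int slot = pool_map(p);

    if (slot < 0)
        return false;
    if (submit_always_fails() != 0 && unmap_on_error)
        pool_unmap(p, slot);
    return true;
}

/* Counts retries that still obtain a mapping, capped at max_retries. */
static int retries_until_exhausted(bool unmap_on_error, int max_retries)
{
    struct bounce_pool pool = {0};
    int n = 0;

    while (n < max_retries && try_submit(&pool, unmap_on_error))
        n++;
    return n;
}
```

In this model, `retries_until_exhausted(false, 1000)` stops after 64 retries because every failed submission leaks a slot, while `retries_until_exhausted(true, 1000)` reaches the cap of 1000. That matches the shape of the report, where a daemon retrying for hours eventually exhausts the pool, though the real leak rate and pool size are not given in this thread.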
Test kernel is available at: http://people.redhat.com/zaitcev/ftp/230427/ (it combines the fixes for bug 230427 and bug 236922)

This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.

in kernel-2.6.18-26.el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html