Bug 1119361 - r592 IRQ: DMA error
Summary: r592 IRQ: DMA error
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-14 15:28 UTC by Neal Becker
Modified: 2016-09-10 03:38 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-04-28 18:25:28 UTC
Type: Bug
Embargoed:


Description Neal Becker 2014-07-14 15:28:56 UTC
Description of problem:

My syslog is being filled with:
tail /var/log/messages
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: DMA error
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: card added
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: DMA error
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: card added
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: DMA error
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: card added
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: DMA error
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: card added
Jul 14 09:18:47 nbecker1 rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting

Version-Release number of selected component (if applicable):

Seems to be true of all 3 of my kernels:

rpm -q kernel
kernel-3.14.9-200.fc20.x86_64
kernel-3.15.3-200.fc20.x86_64
kernel-3.15.4-200.fc20.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Boot.

Actual results:


Expected results:


Additional info:

Comment 1 gordon 2014-08-16 13:23:36 UTC
I am seeing this bug on OpenSUSE with kernel-desktop versions 3.15.6 and 3.16.0. This bug does not appear to be present in the OpenSUSE stock 3.11.6 and 3.11.10 kernels.

The log files on my laptop, /var/log/messages and /var/log/warn, will grow HUGE with the following repeated over and over along with some extraneous related log entries:

2014-08-15T12:18:41.396981-04:00 localhost kernel: [17310.691176] r592: IRQ: card added
2014-08-15T12:18:41.396983-04:00 localhost kernel: [17310.691178] r592: IRQ: DMA error
.
<snip>
.
2014-08-15T12:18:41.397001-04:00 localhost kernel: [17311.223846] r592: IRQ: card added
2014-08-15T12:18:41.397004-04:00 localhost kernel: [17311.223847] r592: IRQ: DMA error

On any given day, there will be tens of thousands of these entries such that both /var/log/messages and /var/log/warn will exceed 3GB within a day.
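
To put a number on the volume described above, the matching lines can be counted directly. This is only a minimal sketch: the helper name count_r592_msgs is made up for illustration, and the log path is whichever file your syslog daemon writes to.

```shell
# Count r592 interrupt messages in a syslog-style file.
# Usage: count_r592_msgs FILE
count_r592_msgs() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when the count is 0, so mask that to keep callers using "set -e" alive.
    grep -c 'r592: IRQ: ' "$1" || true
}
```

For example, `count_r592_msgs /var/log/messages` run twice a few minutes apart gives a rough rate at which the log is growing.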

I believe this bug is triggered by the Ricoh R5C592 Memory Stick adapter in my laptop, which has never been properly recognized by the stock Linux kernels. Apparently the kernel developers keep trying to make it work, but have now introduced a "bigger" problem, so to speak, since it now causes all these extraneous error messages in the logs.

I have since reverted back to the earlier kernels that don't have this problem.

FYI,

Gordon

Comment 2 Maxim Levitsky 2014-08-23 10:35:12 UTC
Very sorry I haven't noticed this :-(

Both drivers (r592 and r852) landed long ago, well before 3.12.
I am running 3.12-rc1 here, so my first step will be to see if I can reproduce this.
Sorry, I was swamped with studies, but that's over now, so I will handle this properly.

Could you post your lspci?

I assume you didn't try an xD or Memstick card?

Could you post your /proc/interrupts?

Best regards,
        Maxim Levitsky

Comment 3 Neal Becker 2014-08-23 12:02:35 UTC
lspci
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: NVIDIA Corporation G86M [GeForce 8600M GS] (rev a1)
02:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)
09:09.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
09:09.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:09.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
09:09.3 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)


cat /proc/interrupts
           CPU0       CPU1       
  0:   46644053          0   IO-APIC-edge      timer
  1:     101308          0   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:      33800          0   IO-APIC-fasteoi   acpi
 12:      18030          0   IO-APIC-edge      i8042
 14:     206806          0   IO-APIC-edge      ata_piix
 15:          0          0   IO-APIC-edge      ata_piix
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 18:        784          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 20:         14          0   IO-APIC-fasteoi   firewire_ohci
 21:     815080          0   IO-APIC-fasteoi   uhci_hcd:usb4, r592, mmc0
 23:         14          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
 44:    7114760     933804   PCI-MSI-edge      ahci
 45:      64570      33985   PCI-MSI-edge      nouveau
 46:    3923781    1274141   PCI-MSI-edge      p3p1
 47:    2215894   13051590   PCI-MSI-edge      iwl4965
 48:        350          0   PCI-MSI-edge      snd_hda_intel
NMI:       3885      21611   Non-maskable interrupts
LOC:   25062896   48054361   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       3885      21611   Performance monitoring interrupts
IWI:    1483237    1342822   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:    5794165    6753835   Rescheduling interrupts
CAL:      16384      24149   Function call interrupts
TLB:    1069664     964191   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        712        709   Machine check polls
THR:          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0

I did plug an SD card in the other day; AFAICT it worked fine.
Problem is still there:

 ls -l /var/log/messages
-rw-r--r-- 1 root root 125589321 Aug 23 08:01 /var/log/messages
[nbecker@nbecker1 ~]$ tail /var/log/messages
Aug 23 08:01:47 nbecker1 kernel: r592: IRQ: card added
Aug 23 08:01:47 nbecker1 kernel: r592: IRQ: DMA error
Aug 23 08:01:47 nbecker1 kernel: [211701.788999] r592: IRQ: card added
Aug 23 08:01:47 nbecker1 kernel: [211701.789005] r592: IRQ: DMA error
Aug 23 08:01:47 nbecker1 kernel: r592: IRQ: card added
Aug 23 08:01:47 nbecker1 kernel: r592: IRQ: DMA error
Aug 23 08:01:47 nbecker1 kernel: [211701.829044] r592: IRQ: card added
Aug 23 08:01:47 nbecker1 kernel: [211701.829055] r592: IRQ: DMA error
Aug 23 08:01:47 nbecker1 kernel: r592: IRQ: card added
Aug 23 08:01:47 nbecker1 kernel: r592: IRQ: DMA error

Comment 4 Maxim Levitsky 2014-08-23 15:33:17 UTC
OK, could you now find the .config of your kernel and run


grep MMC_RICOH_MMC /boot/config-3.17.0-rc1+
# CONFIG_MMC_RICOH_MMC is not set


Does it also say that this option is not set?
(replace the /boot/config-3.17.0-rc1+ with your kernel config location)
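
The check above can be wrapped so that it distinguishes an enabled option from one that is explicitly disabled or missing from the file. This is a hedged sketch rather than an official tool: the helper name kconfig_state is invented here, and it assumes the usual kernel config layout, where enabled options appear as OPTION=value and disabled ones as "# OPTION is not set".

```shell
# Report how a kernel config option is set in a given config file.
# Usage: kconfig_state CONFIG_FILE OPTION   (e.g. CONFIG_MMC_RICOH_MMC)
kconfig_state() {
    file=$1
    opt=$2
    if grep -q "^${opt}=" "$file"; then
        # Enabled options appear as OPTION=y, OPTION=m, or OPTION=<value>.
        sed -n "s/^${opt}=//p" "$file" | head -n 1
    elif grep -q "^# ${opt} is not set" "$file"; then
        # Explicitly disabled options are recorded as a comment line.
        echo "not set"
    else
        echo "absent"
    fi
}
```

On a system that installs the config next to the kernel image, something like `kconfig_state "/boot/config-$(uname -r)" CONFIG_MMC_RICOH_MMC` checks the running kernel; the path is an assumption and differs between distributions.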

I am pretty sure that this is the problem and that it explains everything; I will explain later.

I should probably send a patch to remove this option altogether.

Best regards,
        Maxim Levitsky

Comment 5 Neal Becker 2014-08-23 17:44:13 UTC
uname -a
Linux nbecker1 3.15.10-200.fc20.x86_64 #1 SMP Thu Aug 14 15:39:24 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

/boot/config-3.15.10-200.fc20.x86_64
contains:

CONFIG_MMC_RICOH_MMC=y

Comment 6 Maxim Levitsky 2014-08-24 18:41:24 UTC
Could you also post the output of
sudo lspci -H1

After the error has happened, of course.
Best regards,
       Maxim Levitsky

Comment 7 Neal Becker 2014-08-25 11:05:28 UTC
AFAICT this happens continuously as long as the machine is on.

sudo lspci -H1
[sudo] password for nbecker: 
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: NVIDIA Corporation G86M [GeForce 8600M GS] (rev a1)
02:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)
09:09.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
09:09.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:09.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
09:09.3 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)

Comment 8 Maxim Levitsky 2014-08-30 16:03:05 UTC
I am pretty sure I know what is going on here.

The r592 IRQ is shared with USB here, so its interrupt handler is flooded with interrupts that don't belong to it.
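
The sharing is visible on the IRQ 21 line of the /proc/interrupts output in comment 3. A small filter can pull out that line for any driver. This is a sketch under stated assumptions: the function name irq_sharers is hypothetical, and the parsing assumes the column layout shown in comment 3 (one count per CPU, then a single chip-type column, then a comma-separated handler list); newer kernels format the chip-type column differently.

```shell
# Print the IRQ number and every handler sharing the line with a given driver.
# Reads /proc/interrupts-formatted text on stdin.
# Usage: irq_sharers DRIVER < /proc/interrupts
irq_sharers() {
    awk -v drv="$1" '
        # The header line names one column per CPU ("CPU0 CPU1 ...").
        NR == 1 { ncpu = NF; next }
        # Numbered lines: "<irq>: <one count per CPU> <chip type> <handlers...>"
        $1 ~ /^[0-9]+:$/ {
            handlers = ""
            for (i = ncpu + 3; i <= NF; i++)
                handlers = (handlers == "" ? $i : handlers " " $i)
            n = split(handlers, h, /, */)
            for (j = 1; j <= n; j++)
                if (h[j] == drv) {
                    irq = $1
                    sub(/:$/, "", irq)
                    print irq ": " handlers
                    break
                }
        }
    '
}
```

Running `irq_sharers r592 < /proc/interrupts` on the affected machine should list the other handlers on the same line, which per comment 3 are uhci_hcd:usb4 and mmc0.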

It asks the device for its interrupt status, but somehow the device reports that it needs an interrupt serviced.
I suspect that the device is somehow disabled by the BIOS but still visible on the PCI bus.

Another thing to note is CONFIG_MMC_RICOH_MMC, which I suspected: it does not seem to be the cause of the problem here, but it can cause it.

The problem is that this card reader is usually represented by 5 PCI functions: function 0 is FireWire, function 1 is the SD card reader, function 2 is MMC, function 3 is Memstick, and function 4 is xD. It works fine in this configuration after much effort by me and others.

This setting (CONFIG_MMC_RICOH_MMC) disables the MMC function (and lets the SD function take MMC cards).
This has the nasty effect that, as in your case, the Memstick and xD functions become numbers 2 and 3.

In some cases Linux thinks so but the hardware doesn't; that's why I asked for lspci -H1, which asks the PCI hardware directly what is there.
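
One way to compare the two views is to reduce each listing to the functions of the Ricoh slot and diff the results. The following is only an illustrative sketch: the helper name pci_functions is made up, and the slot address 09:09 is taken from the listings in this report.

```shell
# List the functions of one PCI slot from lspci-formatted text on stdin.
# Usage: pci_functions SLOT < listing    (SLOT like 09:09)
pci_functions() {
    awk -v slot="$1" '
        # Lines look like "09:09.2 System peripheral: Ricoh Co Ltd R5C592 ..."
        index($1, slot ".") == 1 {
            fn = substr($1, length(slot) + 2)   # function number after the dot
            desc = $0
            sub(/^[^ ]+ /, "", desc)            # drop the address column
            print fn ": " desc
        }
    '
}
```

For instance, `lspci | pci_functions 09:09 > kernel.txt` and `sudo lspci -H1 | pci_functions 09:09 > direct.txt`, followed by `diff kernel.txt direct.txt`. In the listings posted in comments 3 and 7 the four Ricoh functions are identical in both views.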

So in your case this 'disabler' does work, so it's not the cause of the problem, but I would still recommend trying to disable it (this needs a kernel recompile and a _cold_ boot into the new kernel).

Another thing to try is to get a real Memstick card and insert it. It would be interesting to see whether the error goes away when it is inserted, since the BIOS may re-enable the reader then.

Best regards,
       Maxim Levitsky

Comment 9 Maxim Levitsky 2014-09-06 14:40:27 UTC
Any update?

Comment 10 Neal Becker 2014-09-06 18:04:20 UTC
I haven't tried building a new kernel yet.  Trying it now.  I haven't built a kernel in a long time.

I grabbed the rpm source 
yumdownloader --source kernel

Then in kernel.spec I did
%define buildid .local

I tried editing config-local, adding
CONFIG_MMC_RICOH_MMC=n

but then rpmbuild fails.
Without this, rpmbuild -bp kernel.spec will succeed, but doesn't do much:
...
# configuration written to .config
#
+ echo '# x86_64'
+ cat .config
+ find . '(' -name '*.orig' -o -name '*~' ')' -exec rm -f '{}' ';'
+ find . -name .gitignore -exec rm -f '{}' ';'
+ cd ..
+ exit 0

Compilation finished at Sat Sep  6 13:58:33

All it seems to do is some config, but not compile anything.
What do I need to do?

Comment 11 Neal Becker 2014-09-16 17:38:57 UTC
I built/installed a kernel
Linux nbecker1 3.16.2-200.local.fc20.x86_64 #1 SMP Tue Sep 16 13:12:11 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

with
# CONFIG_MMC_RICOH_MMC is not set

This does NOT fix the problem.

The system is essentially unusable unless I kill rsyslogd.

Comment 12 Neal Becker 2014-09-16 18:46:36 UTC
I may have gotten to the bottom of this.

I cleaned out all the old journal files and rebooted.  Now I don't have rsyslogd using 100% cpu.

The issue with 

Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: DMA error
Jul 14 07:11:00 nbecker1 kernel: r592: IRQ: card added

is probably gone.  I'm guessing that the reason I kept seeing rsyslogd at 100% CPU is that it was processing the OLD journal files.

I tried rebooting into the stock 3.16.2-200.fc20.x86_64 as well as my .local version, and I don't see the problem.

Comment 13 Maxim Levitsky 2014-09-17 22:37:45 UTC
Please note that the change made by the CONFIG_MMC_RICOH_MMC setting survives reboots.
You need a full shutdown (on a laptop, preferably with the battery removed) to be sure that it was reset.

I would be glad if you tested the new and old kernels this way.
Also, the issue has a good chance of appearing after hibernate/suspend, so if you could, please test that too.

Thanks a lot!!

Best regards,
      Maxim Levitsky

Comment 14 Justin M. Forbes 2014-11-13 16:03:57 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 15 Neal Becker 2014-11-13 18:12:56 UTC
Funny you should mention, just this morning I updated to 3.17.2-200.fc20 and the problem is _still there_.

I did rebuild from srpm, using:
CONFIG_MMC_RICOH_MMC=n

in config-local, and the problem appears to be gone.

So I believe CONFIG_MMC_RICOH_MMC=n is needed.

Comment 16 Neal Becker 2015-01-06 19:55:16 UTC
After updating f20->f21, this problem seemed to go away.

Now on kernel-3.17.7-300.fc21.x86_64, it's back with a vengeance:

Jan  6 14:39:57 nbecker1 kernel: [57260.180986] r592: IRQ: card added
Jan  6 14:39:57 nbecker1 kernel: [57260.180998] r592: IRQ: DMA error
Jan  6 14:39:57 nbecker1 kernel: r592: IRQ: card added
Jan  6 14:39:57 nbecker1 kernel: r592: IRQ: DMA error
Jan  6 14:39:57 nbecker1 kernel: [57260.188919] r592: IRQ: card added
Jan  6 14:39:57 nbecker1 kernel: [57260.188927] r592: IRQ: DMA error
Jan  6 14:39:57 nbecker1 kernel: r592: IRQ: card added

(Why isn't the logging being rate-limited???)

I guess it's back to the custom kernel.

Comment 17 Fedora Kernel Team 2015-02-24 16:24:42 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.18.7-100.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 18 Fedora Kernel Team 2015-04-28 18:25:28 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 19 Ryan C. Underwood 2016-09-10 03:38:23 UTC
Still encountering this in kernel 4.4 when using the SD card.  I can do some tests if necessary.

