Bug 1607872 - possible rpi kernel mmc regression
Summary: possible rpi kernel mmc regression
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-24 13:13 UTC by Ognian Tschakalov
Modified: 2019-11-15 12:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-17 20:09:19 UTC
Type: Bug
Embargoed:


Description Ognian Tschakalov 2018-07-24 13:13:04 UTC
Hello,
I'm reading the health info of an SD card, using 4.17.6-200.fc28.aarch64 on an rpi3 and rpi3+, with the following code segment:
----------------------
/* Build a single-block CMD56 (SD_GEN_CMD) write request. */
struct mmc_ioc_cmd idata;
char data_out[SD_BLOCK_SIZE];
memset(&idata, 0, sizeof(idata));
memset(data_out, 0, sizeof(data_out));
idata.write_flag = 1;   /* host-to-card transfer */
idata.opcode = SD_GEN_CMD;
idata.arg = cmd56_arg;
/* Flags as used when the timeout occurs. */
idata.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
idata.blksz = SD_BLOCK_SIZE;
idata.blocks = 1;
mmc_ioc_cmd_set_data(idata, data_out);

/* Submit the command through the MMC ioctl interface. */
ret = ioctl(fd, MMC_IOC_CMD, &idata);
----------------------

leads to

Jul 20 18:23:53 ogi-it-rpi-p kernel: sdhost-bcm2835 3f202000.mmc: timeout waiting for hardware interrupt.
Jul 20 18:23:53 ogi-it-rpi-p kernel: sdhost-bcm2835 3f202000.mmc: __mmc_blk_ioctl_cmd: data error -110

This worked flawlessly on fc27, so it is a regression.

Actually this is the first of 2 consecutive commands sent to the card. If I ignore the above error and continue with the second command, I can successfully read the health data. The problem is that the above command takes 10 seconds (timeout) before it throws the above error…

Actually, no data is expected back from the above write command…
Maybe I have set the wrong flags, or did something change between F27 and F28?
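
On the flags question: the kernel's include/linux/mmc/mmc.h lists command 56 (MMC_GEN_CMD) as an adtc command with an R1 response, so one variant worth testing is MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC instead of the R1B/AC combination above. The following is only a minimal sketch of that variant, not a confirmed fix for the timeout. The helper name cmd56_write_prepare is hypothetical, SD_GEN_CMD = 56 and SD_BLOCK_SIZE = 512 are assumed values, and the flag values are copied from include/linux/mmc/core.h because that header is not exported to userspace:

```c
#include <linux/mmc/ioctl.h>
#include <string.h>

#define SD_GEN_CMD    56   /* assumed: CMD56 (GEN_CMD) */
#define SD_BLOCK_SIZE 512  /* assumed block size */

/* Flag values copied from the kernel's include/linux/mmc/core.h. */
#define MMC_RSP_PRESENT (1 << 0)
#define MMC_RSP_CRC     (1 << 2)
#define MMC_RSP_OPCODE  (1 << 4)
#define MMC_CMD_ADTC    (1 << 5)
#define MMC_RSP_SPI_S1  (1 << 7)
#define MMC_RSP_R1      (MMC_RSP_PRESENT | MMC_RSP_CRC | MMC_RSP_OPCODE)
#define MMC_RSP_SPI_R1  (MMC_RSP_SPI_S1)

/* Hypothetical helper: fill idata for a one-block CMD56 write.
 * data_out must point to at least SD_BLOCK_SIZE bytes. */
void cmd56_write_prepare(struct mmc_ioc_cmd *idata, unsigned int cmd56_arg,
                         unsigned char *data_out)
{
    memset(idata, 0, sizeof(*idata));
    memset(data_out, 0, SD_BLOCK_SIZE);
    idata->write_flag = 1;              /* host-to-card transfer */
    idata->opcode = SD_GEN_CMD;
    idata->arg = cmd56_arg;
    /* Assumption: treat CMD56 as a data transfer command with an R1 response. */
    idata->flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
    idata->blksz = SD_BLOCK_SIZE;
    idata->blocks = 1;
    mmc_ioc_cmd_set_data((*idata), data_out);
}
```

The filled structure would then be passed to ioctl(fd, MMC_IOC_CMD, &idata) exactly as in the snippet above.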

Tested today with the latest rawhide -> same result

Any ideas?

Thanks a lot
Ognian

Comment 1 Laura Abbott 2018-10-01 21:29:04 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
 
Fedora 28 has now been rebased to 4.18.10-300.fc28.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.
 
If you experience different issues, please open a new bug report for those.

Comment 2 Ognian Tschakalov 2018-10-02 12:18:55 UTC
just retested with
uname -a
Linux ogi-it-rpi-p 4.18.11-300.fc29.aarch64 #1 SMP Sun Sep 30 15:02:59 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

Things are getting even worse:
previously, despite the timeout, the system at least continued to work;
now everything freezes (the SD card contains all file systems on the Raspberry Pi...)

Comment 3 Hans de Goede 2018-10-02 12:23:58 UTC
Have you tried using a different sdcard? sdcards wear out pretty quickly when used as the rootfs for a raspberry pi.

Comment 4 Ognian Tschakalov 2018-10-02 12:40:01 UTC
The card I use is an industrial one which reports its remaining lifetime (the "health status"). On this particular card the health status is 90% (almost new); this is what the above code snippet is about...
In my opinion the mmc subsystem is being heavily reworked at the moment, and to me it looks like the mmc driver for the Raspberry Pi has not been updated yet...

Comment 5 Jeremy Cline 2018-12-03 17:30:27 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora 29 has now been rebased to 4.19.5-300.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 6 Ognian Tschakalov 2018-12-04 19:11:53 UTC
As per https://github.com/raspberrypi/linux/issues/2728#issuecomment-444084349 this bug may be resolved with 4.20, so I suggest leaving it open at least until then...

Comment 7 Laura Abbott 2019-04-09 20:46:06 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora XX has now been rebased to 5.0.6.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.
 
If you experience different issues, please open a new bug report for those.

Comment 8 Jan Kratochvil 2019-06-21 08:43:32 UTC
It still affects: kernel-5.1.11-300.fc30.aarch64
On a Raspberry Pi 3B+, when I boot from the MicroSD card in a USB adapter, everything works fine.
When I put the same MicroSD card into the MicroSD slot of the Raspberry Pi, it boots fine, but when it should display the text login prompt it prints:
[   70.246299] sdhost-bcm2835 3f202000.mmc: timeout waiting for hardware interrupt.
And it is mostly dead (it pings but no sshd works anymore, NumLock works but no login prompt etc.).
Raspbian boots fine there even directly from MicroSD.

Comment 9 Jan Kratochvil 2019-06-21 10:55:59 UTC
And it happens still with: kernel-5.2.0-0.rc3.git3.1.fc31.aarch64

Comment 10 Peter Robinson 2019-06-21 17:19:32 UTC
(In reply to Jan Kratochvil from comment #8)
> It still affects: kernel-5.1.11-300.fc30.aarch64
> On Raspberry Pi 3B+ when I boot from MicroSD from USB adapter all works fine.
> When I use the same MicroSD into the MicroSD slot of Raspberry it boots fine
> and when it should be display text login prompt it prints:
> [   70.246299] sdhost-bcm2835 3f202000.mmc: timeout waiting for hardware
> interrupt.
> And it is mostly dead (it pings but no sshd works anymore, NumLock works but
> no login prompt etc.).
> Raspbian boots fine there even directly from MicroSD.

Well Raspbian uses quite a different kernel to upstream.

As requested on the thread on the mailing list please provide the following:
* The make/model of the mSD card in use
* The rating on the PSU.

Comment 11 Jan Kratochvil 2019-06-21 17:55:41 UTC
(In reply to Peter Robinson from comment #10)
> Well Raspbian uses quite a different kernel to upstream.

So maybe Fedora can either upstream the required fix or at least backport it? Something like the backport:
https://src.fedoraproject.org/rpms/kernel/blob/f29/f/usb-dwc2-Fix-DMA-cache-alignment-issues.patch

There are some bug reports about this kernel error on various forums, mailing lists and bug trackers, and it looks to me like an upstream kernel regression, but I haven't tracked it down yet.


> As requested on the thread on the mailing list please provide the following:
> * The make/model of the mSD card in use

"Kingston C16G JAPAN, SDC2", bought in 2009. But that does not matter; the important part is that it works with Raspbian while it does not with Fedora. That is the same category of "Raspbian works while Fedora does not" as:
Bug 1708717
https://src.fedoraproject.org/rpms/kernel/blob/f29/f/usb-dwc2-Fix-DMA-cache-alignment-issues.patch
Bug 1692903

I was curious why everyone uses Raspbian despite its various flaws, and I see now there are real reasons.


> * The rating on the PSU.

Original Raspberry PSU 2.5A, problem happens even if everything is disconnected (except power cable + MicroSD, using just WiFi connection).

Comment 12 Peter Robinson 2019-06-21 18:18:04 UTC
> So maybe Fedora can either upstream the required fix or at least backport
> it? Something like the backport:
> https://src.fedoraproject.org/rpms/kernel/blob/f29/f/usb-dwc2-Fix-DMA-cache-
> alignment-issues.patch

That link gives me a 404, but knowing the issue and the patch, that is not a forward port from the Raspbian kernel; it pulls in a fix already accepted into the upstream Linus kernel tree. Fedora, and I as the maintainer of the Raspberry Pi in Fedora, do not have the time and resources to dig through non-upstreamed patches for possible problems that I cannot directly reproduce.

> There are some bugreports about this kernel error on various
> forums/mailinglists/bugtrackers and it looks to me as it is some upstream
> kernel regression but I haven't tracked it down yet.

So please appropriately reference them here rather than randomly hand wave about reports.

> > As requested on the thread on the mailing list please provide the following:
> > * The make/model of the mSD card in use
> 
> "Kingston C16G JAPAN, SDC2", bought in 2009. But that does not matter, the
> important part is that it works with Raspbian while it does not with Fedora.

We, like many other distributions including Raspbian, have had issues with a lot of SD cards, as can be seen in the eLinux Raspberry Pi documentation referenced from the Fedora Raspberry Pi FAQ.

https://fedoraproject.org/wiki/Architectures/ARM/Raspberry_Pi?rd=Raspberry_Pi#Prerequisites
https://elinux.org/RPi_SD_cards

The Kingston cards, as can be seen from the quite large amounts of red in the table against the brand, have caused us a lot of problems in the past, not just on the Raspberry Pi, but across a large number of Arm devices in general (feel free to search the list archives).

I'm sorry but the "it works with Raspbian and not with Fedora" isn't always a justifiable response. As I maintain the support for the Raspberry Pi in Fedora in my own time and mostly with my own resources, I need to focus on what provides most users the best return. Given that the Kingston cards have generally proven to be quite problematic, I have always suggested other cards, even on non-RPi devices in Fedora. I'm sorry if you disagree with that assessment. I would be happy to accept patches from the downstream kernel fork to improve that if you have the time to test them; I am sorry but I do not.

> That is the same category of Raspbian working while Fedora does not like:
> Bug 1708717

No it's not; those were both bugs with upstream kernels and, at least as far as I'm aware, not related to the Raspbian kernel.

> https://src.fedoraproject.org/rpms/kernel/blob/f29/f/usb-dwc2-Fix-DMA-cache-
> alignment-issues.patch

> Bug 1692903

And that is a bug in something completely unrelated to the kernel, and neither of the two referenced bugs has anything WHATSOEVER to do with the usb patch you've referenced in the middle there.

> I was curious why everyone is using Raspbian due to various its flaws and I
> see now there are real reasons.

Please don't provide insults here; if you actually wish me to engage further with you, please be pleasant.

Comment 13 Jan Kratochvil 2019-06-21 19:30:18 UTC
(In reply to Peter Robinson from comment #12)
> > There are some bugreports about this kernel error on various
> > forums/mailinglists/bugtrackers and it looks to me as it is some upstream
> > kernel regression but I haven't tracked it down yet.
> 
> So please appropriately reference them here rather than randomly hand wave
> about reports.

https://lmgtfy.com/?q=%22sdhost-bcm2835%22+%22mmc%3A+timeout+waiting+for+hardware+interrupt%22


> I'm sorry but the "it works with Raspbian and not with Fedora" isn't always
> a justifiable response.
..
> I'm sorry if you disagree with that assessment, I would be happy to accept
> patches to improve that from the downstream kernel fork if you have the time
> to test them, I am sorry but I do not.

I really disagree. I understand you do not have to fix everything, and it is great
that you are willing to accept a fix.  I have now verified that it is a regression,
as this kernel boots fine directly from my Kingston MicroSD:
  Fedora-Server-29-1.2.aarch64.raw.xz = kernel-4.18.16-300.fc29.aarch64

Yes, it is not a fix yet...


> > That is the same category of Raspbian working while Fedora does not like:
> > Bug 1708717
> 
> No it's not, those were both bugs with upstream kernels and not related to,
> at least that I'm aware of, to the Raspbian kernel.

It is the same category from my point of view as a user, as Raspbian is still using
4.x kernels and so was not affected by this 5.x regression.


> > https://src.fedoraproject.org/rpms/kernel/blob/f29/f/usb-dwc2-Fix-DMA-cache-
> > alignment-issues.patch
> 
> > Bug 1692903
> 
> And that is a bug in something completely unrelated to the kernel, and
> neither of the two referenced bugs have anything WHAT SO EVER to do with the
> usb patch you've referenced in the middle there. 

All three issues are in the same category from my point of view as a user: if I install
Raspbian I am not affected by them, while if I install Fedora I am.


> > I was curious why everyone is using Raspbian due to various its flaws and I
> > see now there are real reasons.
> 
> Please don't provide insults here, if you actually wish me to engage further
> with you please be pleasant.

I am very grateful for your work. I have been interested in Raspberry for many years
but I did not see a good enough Linux distribution for it. I just still think Fedora
needs some final detailing; I cannot say Fedora for Raspberry is perfect yet, but it
is really close. Thanks!

Comment 14 Jan Kratochvil 2019-06-24 06:49:11 UTC
That should/could get resolved by removing:
  https://src.fedoraproject.org/rpms/kernel/blob/f30/f/bcm2835-cpufreq-add-CPU-frequency-control-driver.patch

A scratch build (from e9086bdbaaa1f966291adc784f375cc3a24c5762) is:
  kernel-5.1.12-300.nocpufreq.fc30.aarch64
  https://koji.fedoraproject.org/koji/taskinfo?taskID=35768074

Comment 15 Peter Robinson 2019-06-24 07:14:52 UTC
(In reply to Jan Kratochvil from comment #14)
> That should/could get resolved by removing:

You don't get to dictate that without actually contributing.

> https://src.fedoraproject.org/rpms/kernel/blob/f30/f/bcm2835-cpufreq-add-CPU-
> frequency-control-driver.patch

There's a different driver, which has been approved for upstream and will be in 5.3, in the 5.2 rc series
from 5.2.0-0.rc4.git2.1 and later builds; you might want to try rc5:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1288824

> A scratch build (from e9086bdbaaa1f966291adc784f375cc3a24c5762) is:
>   kernel-5.1.12-300.nocpufreq.fc30.aarch64
>   https://koji.fedoraproject.org/koji/taskinfo?taskID=35768074

Comment 16 Jan Kratochvil 2019-06-24 08:17:11 UTC
(In reply to Peter Robinson from comment #15)
> (In reply to Jan Kratochvil from comment #14)
> > That should/could get resolved by removing:
> 
> You don't get to dictate that without actually contributing.

That's not true but I have now deleted several paragraphs of my reply here, let's stick only to the technical topic.


> There's a different driver in the 5.2 rc series, which has been approved for
> upstream and will be in 5.3, from
> 5.2.0-0.rc4.git2.1 and later builds, you might want to try rc5
> 
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1288824

Yes, that works. Could you backport that to the latest stable Fedora release? Thanks.

Comment 17 Peter Robinson 2019-06-24 10:47:51 UTC
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=1288824
> 
> Yes, that works. Could you backport that to the latest stable Fedora
> release? Thanks.

That is my intention, but as it is quite new I want it to bake in rawhide for a while.

This is literally the first reported regression with the previous driver, and given it sped up the device a lot it actively reduced the support requests of "the pi is so slow" so that's why we've had it. It's been in place since the 4.17 RC series and the first time there's been a reported issue has been the 5.0.x series here on a single type of SD card that's 10 years old. So it will be pushed to stable once I'm happy it doesn't have worse side effects that I have to support (I do the RPi support in my own time).

Comment 18 Justin M. Forbes 2019-08-20 17:43:39 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 5.2.9-100.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.

If you experience different issues, please open a new bug report for those.

Comment 19 Justin M. Forbes 2019-09-17 20:09:19 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 20 Ognian Tschakalov 2019-11-15 12:43:30 UTC
Tried
Linux localhost.localdomain 5.3.7-301.fc31.aarch64 #1 SMP Mon Oct 21 19:03:54 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux

sdmon_src]$ dmesg |grep bcm2835
[    4.842170] bcm2835-power bcm2835-power: Broadcom BCM2835 power domains driver
[    4.923211] bcm2835-mbox 3f00b880.mailbox: mailbox enabled
[    9.440223] sdhost-bcm2835 3f202000.mmc: loaded - DMA enabled (>1)
[   28.472965] bcm2835-rng 3f104000.rng: hwrng registered
[   28.548775] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
[   30.575757] snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned.
[   30.599164] bcm2835_audio bcm2835_audio: card created with 8 channels
[   30.708909] bcm2835_v4l2: module is from the staging directory, the quality is unknown, you have been warned.
[  224.473284] sdhost-bcm2835 3f202000.mmc: timeout waiting for hardware interrupt.
[  224.553300] sdhost-bcm2835 3f202000.mmc: __mmc_blk_ioctl_cmd: data error -110
[  234.713343] sdhost-bcm2835 3f202000.mmc: timeout waiting for hardware interrupt.

so for my use case nothing has changed...
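
As a rough check of how long each attempt stalls, the gap between the two timeout messages can be computed from their dmesg timestamps. A minimal sketch, where dmesg_timestamp is a hypothetical helper and the bracketed prefix is assumed to be seconds since boot:

```c
#include <stdio.h>

/* Hypothetical helper: read the leading "[  seconds.micros]" stamp of a
 * dmesg line into *seconds.
 * Returns 0 on success, -1 if the line does not start with a timestamp. */
int dmesg_timestamp(const char *line, double *seconds)
{
    return sscanf(line, " [ %lf]", seconds) == 1 ? 0 : -1;
}

/* Seconds elapsed between two dmesg lines; *ok is set to 0 when either
 * line lacks a timestamp, in which case the return value is meaningless. */
double dmesg_gap(const char *first, const char *second, int *ok)
{
    double a = 0.0, b = 0.0;
    *ok = dmesg_timestamp(first, &a) == 0 && dmesg_timestamp(second, &b) == 0;
    return b - a;
}
```

For the two timeout lines above (224.473284 and 234.713343) the difference is about 10.24 seconds, in line with the 10 second timeout described in the original report.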

