Bug 1878347 - f32 fails to boot from SD card on Macbook since kernel 5.8.6
Summary: f32 fails to boot from SD card on Macbook since kernel 5.8.6
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: x86_64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-09-12 06:40 UTC by Pany
Modified: 2020-11-24 01:52 UTC (History)
24 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-11-24 01:52:54 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
F32 kernel 5.8.7 fails to boot with LUKS LVM home (2.98 MB, image/jpeg)
2020-09-12 06:40 UTC, Pany
no flags Details
f32 kernel 5.8.10 with ext4 root fs fails to boot (122.59 KB, text/plain)
2020-09-26 05:58 UTC, Pany
no flags Details
f32 kernel 5.8.11 with ext4 root fs fails to boot (125.77 KB, text/plain)
2020-09-28 02:13 UTC, Pany
no flags Details
f32 kernel 5.8.11 systemd.debug-shell=1 journal (174.00 KB, text/plain)
2020-09-29 03:08 UTC, Pany
no flags Details
edit grub2 entry (replace rhgb quiet with systemd.debug-shell=1) (1.42 MB, image/jpeg)
2020-09-29 03:33 UTC, Pany
no flags Details
systemd.debug-shell=1 LUKS prompt to decrypt (1.77 MB, image/jpeg)
2020-09-29 03:35 UTC, Pany
no flags Details
systemd.debug-shell=1 kernel 5.8.11 boot failed (2.29 MB, image/jpeg)
2020-09-29 03:36 UTC, Pany
no flags Details
systemd.debug-shell=1 tty9 root FS was mounted read only (2.25 MB, image/jpeg)
2020-09-29 03:38 UTC, Pany
no flags Details
systemd.debug-shell=1 tty9 root FS was write protected (2.42 MB, image/jpeg)
2020-09-29 03:38 UTC, Pany
no flags Details
f32 kernel 5.8.5 booted successfully (240.10 KB, text/plain)
2020-09-29 04:02 UTC, Pany
no flags Details
f32 kernel 5.8.6 systemd.debug-shell=1 journal (162.85 KB, text/plain)
2020-09-29 08:26 UTC, Pany
no flags Details
lsinitrd /boot/initramfs-5.8.5-200.fc32.x86_64.img (the good one) (39.22 KB, text/plain)
2020-09-30 03:38 UTC, Pany
no flags Details
lsinitrd /boot/initramfs-5.8.6-200.fc32.x86_64.img (the failed one) (38.95 KB, text/plain)
2020-09-30 03:40 UTC, Pany
no flags Details

Description Pany 2020-09-12 06:40:16 UTC
Created attachment 1714633 [details]
F32 kernel 5.8.7 fails to boot with LUKS LVM home

Created attachment 1714633 [details]
F32 kernel 5.8.7 fails to boot with LUKS LVM home

1. Please describe the problem:
Fedora 32 Xfce worked very well on Macbook Pro 13' (Early 2015), until the kernel upgrade from 5.8.4 to 5.8.6 and 5.8.7, which cause the LUKS encrypted LVM /home volume could no longer be mounted.

2. What is the Version-Release number of the kernel:
Fedora release 32 (Thirty Two)

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
kernel-5.8.6-201.fc32

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
- Install a new Fedora 32 minimal OS
- Custom partition, encrypt LVM with LUKS
- Finish installation and reboot
- Select kernel-5.8.7-200.fc32 and enter the passphase
- OS hangs at "A start job is running for /dev/mapper/fedora_localhost_--live-home (<time> / no limit)" (please see the attachment)

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Sorry, I didn't try it.

6. Are you running any modules that not shipped with directly Fedora's kernel?:
Nope

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Pany 2020-09-21 01:49:52 UTC
Still not working with kernel-5.8.9-200.fc32.x86_64

Comment 2 Pany 2020-09-26 02:41:52 UTC
- this error persists after a system reinstall with both F32 & F33
- 5.8.6 works from the live USB but not from the installed system
- it is independent of the root FS (ext4/xfs)
- independent of whether /home gets mounted or not

Discussion on Ask Fedora https://ask.fedoraproject.org/t/f32-kernel-5-8-9-fails-to-boot-on-macbook-pro/9299

I have tried following steps:

1. I created a F33 Xfce Live USB and booted with it successfully. As a external storage, LUKS can be decrypted. Meanwhile, LVM root(xfs) and LVM home(ext4) both can be mounted (rw). So far so good.
Then I started the f33 (pre-release) installation procedure – reformatted the LVM root with xfs, kept the LVM home(ext4). It was a perfect installation, after which I tried booting with f33 (kernel 5.8.6). Unfortunately, the issue was still there.
See the journal here:
https://paste.centos.org/view/b88fed31
 f33-xfs-boot-failed

2. I repeated the f33 installation, but reformatted the LVM root with ext4, kept the LVM home(ext4). The failure seemed to be same.
See the journal here:
https://paste.centos.org/view/f28de1e3

3. At last, I reinstalled f32 again, and reformatted the LVM root with ext4, kept the LVM home(ext4). f32 (kernel 5.6.6) works very well.

4. I made an update, sudo dnf up -y , Then, rebooted to kernel 5.8.10, it hung there again.
See the journal here:
https://paste.centos.org/view/00428ad5

Comment 3 Zamir SUN 2020-09-26 03:34:46 UTC
Hi Pany,

Looks like the paste bin link you offered already expired. Will you please re-collect those log and upload to bugzilla as attachments?

Comment 4 Pany 2020-09-26 05:58:37 UTC
Created attachment 1716753 [details]
f32 kernel 5.8.10 with ext4 root fs fails to boot

Comment 5 Pany 2020-09-28 02:13:52 UTC
Created attachment 1717170 [details]
f32 kernel 5.8.11 with ext4 root fs fails to boot

Comment 6 Chris Murphy 2020-09-28 08:18:47 UTC
The screenshot shows XFS metadata and IO errors.

None of this is found in the text log files attached, they just stop after systemd journal is flushed. It sounds like the problem XFS file system was reformatted so evidence of the original problem maybe gone.

In the text logs attached, I see no errors. Root has been successfully mounted, swap is activated, but nothing else is mounted yet. Since sysroot is read-write and the journal is being flushed - later events should also have made it into the journal. This can be retrieved using '-b' flag and choosing a prior boot: -1 for the previous boot, -2 for the boot before that, etc.

It might be worth trying to boot with 'systemd.debug-shell=1' boot parameter, and seeing if you can switch to tty9. In this case you could just:
# journalctl -b > journal-fail.txt
# reboot

And then boot a working kernel, and attach the failure journal.

Comment 7 Chris Murphy 2020-09-28 08:39:26 UTC
In the failing case, there is this message:

Sep 28 09:27:08 fedora.local systemd-cryptsetup[470]: WARNING: Locking directory /run/cryptsetup is missing!

Do you see it in the successful case? The dm-crypt device is unlocked, and system root mounts.

Comment 8 Chris Murphy 2020-09-28 08:47:57 UTC
Does the problem start with kernel 5.8.5 or 5.8.6?

5.8.5 for fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1601937

You'll need these three, make sure to grab the correct arch:

	kernel-5.8.5-200.fc32.x86_64.rpm (info) (download)
	kernel-core-5.8.5-200.fc32.x86_64.rpm (info) (download)
	kernel-modules-5.8.5-200.fc32.x86_64.rpm (info) (download)

Comment 9 Chris Murphy 2020-09-28 08:55:18 UTC
I guess the problem with the kernel hypothesis is that you're able to boot Fedora 33 pre-release media, which has been using kernel 5.8.6 for a while. Can you try a recent nightly? You don't have to reinstall, just see if you can boot, and also useful if you can unlock the LUKS volume, and mount either home or root.

20200927 workstation image for sure works in qemu-kvm, and has kernel 5.8.11
https://kojipkgs.fedoraproject.org/compose/branched/

Comment 10 Pany 2020-09-29 02:06:32 UTC
(In reply to Chris Murphy from comment #7)
> In the failing case, there is this message:
> 
> Sep 28 09:27:08 fedora.local systemd-cryptsetup[470]: WARNING: Locking
> directory /run/cryptsetup is missing!
> 
> Do you see it in the successful case? The dm-crypt device is unlocked, and
> system root mounts.

Yep, in the successful case (kernel 5.6.6), journal shows following lines:

 997   │ Sep 29 09:34:31 localhost.localdomain systemd[1]: Starting Cryptography Setup for luks-65d
       │ 9ed28-ea08-4ea5-a1dd-7b2b086f5e09...
 998   │ Sep 29 09:34:31 localhost.localdomain systemd-cryptsetup[544]: WARNING: Locking directory 
       │ /run/cryptsetup is missing!
 999   │ Sep 29 09:34:31 localhost.localdomain systemd[1]: Started Forward Password Requests to Ply
       │ mouth.

Comment 11 Pany 2020-09-29 03:08:22 UTC
Created attachment 1717399 [details]
f32 kernel 5.8.11 systemd.debug-shell=1 journal

Comment 12 Pany 2020-09-29 03:23:42 UTC
(In reply to Chris Murphy from comment #6)
> The screenshot shows XFS metadata and IO errors.
> 
> None of this is found in the text log files attached, they just stop after
> systemd journal is flushed. It sounds like the problem XFS file system was
> reformatted so evidence of the original problem maybe gone.
> 
> In the text logs attached, I see no errors. Root has been successfully
> mounted, swap is activated, but nothing else is mounted yet. Since sysroot
> is read-write and the journal is being flushed - later events should also
> have made it into the journal. This can be retrieved using '-b' flag and
> choosing a prior boot: -1 for the previous boot, -2 for the boot before
> that, etc.
> 
> It might be worth trying to boot with 'systemd.debug-shell=1' boot
> parameter, and seeing if you can switch to tty9. In this case you could just:
> # journalctl -b > journal-fail.txt
> # reboot
> 
> And then boot a working kernel, and attach the failure journal.

https://bugzilla.redhat.com/show_bug.cgi?id=1878347#c11 is the journal

Here is what I did to get the journal:

1. At grub menu, press 'e' to edit kernel 5.8.11 entry
2. Replace 'rhgh quiet' with 'systemd.debug-shell=1'
3. Press F10 to boot (the Ctrl+x didn't work, but I think it's another topic)
4. Switch to tty9
5. 'mount' shows the '/' was read-only, and 'mount / -o remount,rw' says the device is write-protected (I'll attach a screenshot later)
6. Mount the boot partition to /boot, then 'journalctl -b > /boot/journal-fail.txt'

Comment 13 Pany 2020-09-29 03:33:52 UTC
Created attachment 1717401 [details]
edit grub2 entry (replace rhgb quiet with systemd.debug-shell=1)

Comment 14 Pany 2020-09-29 03:35:13 UTC
Created attachment 1717402 [details]
systemd.debug-shell=1 LUKS prompt to decrypt

Comment 15 Pany 2020-09-29 03:36:45 UTC
Created attachment 1717403 [details]
systemd.debug-shell=1 kernel 5.8.11 boot failed

Comment 16 Chris Murphy 2020-09-29 03:37:39 UTC
Sep 29 10:49:28 fedora.local kernel: blk_update_request: I/O error, dev sdb, sector 197394688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Sep 29 10:49:28 fedora.local kernel: blk_update_request: I/O error, dev sdb, sector 197394688 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
...
Sep 29 10:49:28 fedora.local kernel: EXT4-fs (dm-1): I/O error while writing superblock
...
Sep 29 10:49:28 fedora.local kernel: Buffer I/O error on dev dm-1, logical block 0, lost sync page write

This is bad... I can't tell if this is some strange block layer bug we haven't seen before or if it's transient hardware problems. But you have literally no errors at all with kernel 5.8.4?

Comment 17 Pany 2020-09-29 03:38:03 UTC
Created attachment 1717404 [details]
systemd.debug-shell=1 tty9 root FS was mounted read only

Comment 18 Pany 2020-09-29 03:38:55 UTC
Created attachment 1717405 [details]
systemd.debug-shell=1 tty9 root FS was write protected

Comment 19 Chris Murphy 2020-09-29 04:00:20 UTC
The I/O errors are on dev sdb, which I'm pretty sure is this (there are two SD Card Readers reported, sdb and sdc)

Sep 29 10:49:17 fedora.local kernel: scsi 1:0:0:0: Direct-Access     APPLE    SD Card Reader   3.00 PQ: 0 ANSI: 6

I suspect the SD Card is failing, just because they are notoriously unreliable and tend to have transient failures before outright failing. But then I found this. It might be related...

commit 1bbd2c5daed796a411f2f9bc33699013aff8726a
Author: Manish Narani <manish.narani>
Date:   Wed Jul 15 00:42:33 2020 +0530

    mmc: sdhci-of-arasan: fix timings allocation code

This is in 5.8.6 but not 5.8.5.

Comment 20 Pany 2020-09-29 04:02:19 UTC
Created attachment 1717406 [details]
f32 kernel 5.8.5 booted successfully

(In reply to Chris Murphy from comment #8)
> Does the problem start with kernel 5.8.5 or 5.8.6?
> 
> 5.8.5 for fc32
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1601937
> 
> You'll need these three, make sure to grab the correct arch:
> 
> 	kernel-5.8.5-200.fc32.x86_64.rpm (info) (download)
> 	kernel-core-5.8.5-200.fc32.x86_64.rpm (info) (download)
> 	kernel-modules-5.8.5-200.fc32.x86_64.rpm (info) (download)

kernel-5.8.5-200 can boot successfully

Comment 21 Pany 2020-09-29 04:20:53 UTC
(In reply to Chris Murphy from comment #8)
> Does the problem start with kernel 5.8.5 or 5.8.6?
> 
> 5.8.5 for fc32
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1601937
> 
> You'll need these three, make sure to grab the correct arch:
> 
> 	kernel-5.8.5-200.fc32.x86_64.rpm (info) (download)
> 	kernel-core-5.8.5-200.fc32.x86_64.rpm (info) (download)
> 	kernel-modules-5.8.5-200.fc32.x86_64.rpm (info) (download)

And I downloaded kernel 5.8.6-200 from

https://koji.fedoraproject.org/koji/buildinfo?buildID=1605304

then run

dnf in kernel-{,core-,modules-}5.8.6-200.fc32.x86_64.rpm

It failed to boot. So maybe something different between 5.8.5-200 and 5.8.6-200 cause the failure?

Comment 22 Chris Murphy 2020-09-29 04:42:15 UTC
I wonder if there's any interaction going on between brcm (wifi) and the sd card slot. I also see brcmfmac messages interlaced with these errors and the stack trace. There is a problem reading lib firmware, but that might be the result of the mmc patch.

Sep 29 10:49:28 fedora.local kernel: brcmfmac 0000:03:00.0: loading /lib/firmware/brcm/brcmfmac43602-pcie.bin failed with error -5

Also in 5.8.6...

commit f02889decdc2ebcacb1e51e00829db835e8133d1
Author: Nicolas Saenz Julienne <nsaenzjulienne>
Date:   Tue Jul 21 13:23:02 2020 +0200

    brcmfmac: Set timeout value when configuring power save


This could be a coincidence, and distraction. My old 2011 mac, I'm pretty sure the firmware for wifi was also needed for mmcblk/sd card to work correctly.

At this point I think it's worth posting an email to the upstream mmc list, summarizing the findings so far, and you've narrowed it down to a change in 5.8.6. Consider including the suspicion (the two commits I've referred to) and you can reference this bug URL in your email. They'll want to know right off the bat that you're booting from SD Card slot, and the first failure messages (c16 should suffice).
http://vger.kernel.org/vger-lists.html#linux-mmc

I suggest also attaching to this bug report the output from: lspci -vvnn

You can reference the first line (each) for SD host controller and the 802.11 wireless, include that in the email to mmc list.

If you're feeling adventurous, then you could compile the kernel and try reverting (separately) these commits and see if that fixes the problem, that'd be fairly conclusive. If it doesn't then...it's harder to narrow down what's going on.

Comment 23 Chris Murphy 2020-09-29 04:45:29 UTC
(In reply to Pany from comment #21)
> It failed to boot. So maybe something different between 5.8.5-200 and 5.8.6-200 cause the failure?

Looks like it. The commits I refer to in c22 and c19 are in the 5.8.6 changelog.
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.8.6

Comment 24 Pany 2020-09-29 05:50:40 UTC
(In reply to Chris Murphy from comment #19)
> The I/O errors are on dev sdb, which I'm pretty sure is this (there are two
> SD Card Readers reported, sdb and sdc)
> 
> Sep 29 10:49:17 fedora.local kernel: scsi 1:0:0:0: Direct-Access     APPLE  
> SD Card Reader   3.00 PQ: 0 ANSI: 6
> 
> I suspect the SD Card is failing, just because they are notoriously
> unreliable and tend to have transient failures before outright failing. But
> then I found this. It might be related...
> 
> commit 1bbd2c5daed796a411f2f9bc33699013aff8726a
> Author: Manish Narani <manish.narani>
> Date:   Wed Jul 15 00:42:33 2020 +0530
> 
>     mmc: sdhci-of-arasan: fix timings allocation code
> 
> This is in 5.8.6 but not 5.8.5.

My disk layout:

$ sudo lsblk
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                             8:0    0   113G  0 disk  
├─sda1                                          8:1    0   200M  0 part  
└─sda2                                          8:2    0 112.2G  0 part  
sdb                                             8:16   1   116G  0 disk  
├─sdb1                                          8:17   1   600M  0 part  
├─sdb2                                          8:18   1   600M  0 part  /boot/efi
├─sdb3                                          8:19   1     1G  0 part  /boot
└─sdb4                                          8:20   1 113.8G  0 part  
  └─luks-65d9ed28-ea08-4ea5-a1dd-7b2b086f5e09 253:0    0 113.8G  0 crypt 
    ├─fedora_localhost--live-root             253:1    0    70G  0 lvm   /
    ├─fedora_localhost--live-swap             253:2    0   7.8G  0 lvm   [SWAP]
    └─fedora_localhost--live-home             253:3    0    36G  0 lvm   /home

sda (SATA SSD shipped with Macbook) is for macOS and sdb (an SD card inserted into the Macbook SD Card Reader) is for fedora.

Comment 25 Pany 2020-09-29 08:26:29 UTC
Created attachment 1717469 [details]
f32 kernel 5.8.6 systemd.debug-shell=1 journal

(In reply to Chris Murphy from comment #22)
> I wonder if there's any interaction going on between brcm (wifi) and the sd
> card slot. I also see brcmfmac messages interlaced with these errors and the
> stack trace. There is a problem reading lib firmware, but that might be the
> result of the mmc patch.
> 
> Sep 29 10:49:28 fedora.local kernel: brcmfmac 0000:03:00.0: loading
> /lib/firmware/brcm/brcmfmac43602-pcie.bin failed with error -5
> 
> Also in 5.8.6...
> 
> commit f02889decdc2ebcacb1e51e00829db835e8133d1
> Author: Nicolas Saenz Julienne <nsaenzjulienne>
> Date:   Tue Jul 21 13:23:02 2020 +0200
> 
>     brcmfmac: Set timeout value when configuring power save
> 
> 
> This could be a coincidence, and distraction. My old 2011 mac, I'm pretty
> sure the firmware for wifi was also needed for mmcblk/sd card to work
> correctly.
> 

In order to avoid any misleading info, I booted kernel 5.8.6 with systemd.debug-shell=1, but there was no brcmfmac error.

> At this point I think it's worth posting an email to the upstream mmc list,
> summarizing the findings so far, and you've narrowed it down to a change in
> 5.8.6. Consider including the suspicion (the two commits I've referred to)
> and you can reference this bug URL in your email. They'll want to know right
> off the bat that you're booting from SD Card slot, and the first failure
> messages (c16 should suffice).
> http://vger.kernel.org/vger-lists.html#linux-mmc
> 
> I suggest also attaching to this bug report the output from: lspci -vvnn
> 
> You can reference the first line (each) for SD host controller and the
> 802.11 wireless, include that in the email to mmc list.
> 

'lspci -vvnn' shows:
...
00:14.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB xHCI Controller [8
086:9cb1] (rev 03) (prog-if 30 [XHCI])
        Subsystem: Intel Corporation Device [8086:7270]
...
03:00.0 Network controller [0280]: Broadcom Inc. and subsidiaries BCM43602 802.11ac Wireless LAN SoC [14e4:43ba] (rev 01)
        Subsystem: Apple Inc. Device [106b:0133]
...
03:00.0 Network controller [0280]: Broadcom Inc. and subsidiaries BCM43602 802.11ac Wireless LAN SoC [14e4:43ba] (rev 01)
        Subsystem: Apple Inc. Device [106b:0133]
...

I wonder if the SD card controller is handled by the USB controller, since 'lsusb -vv' shows:

Bus 002 Device 002: ID 05ac:8406 Apple, Inc. Internal Memory Card Reader
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         9
  idVendor           0x05ac Apple, Inc.
  idProduct          0x8406 Internal Memory Card Reader
...


> If you're feeling adventurous, then you could compile the kernel and try
> reverting (separately) these commits and see if that fixes the problem,
> that'd be fairly conclusive. If it doesn't then...it's harder to narrow down
> what's going on.

To be honest, it's kinda hard for me, but I'll try it, just need some time to search for docs and guide.

Comment 26 Pany 2020-09-29 08:41:40 UTC
(In reply to Chris Murphy from comment #19)
> The I/O errors are on dev sdb, which I'm pretty sure is this (there are two
> SD Card Readers reported, sdb and sdc)
> 
> Sep 29 10:49:17 fedora.local kernel: scsi 1:0:0:0: Direct-Access     APPLE  
> SD Card Reader   3.00 PQ: 0 ANSI: 6
> 
> I suspect the SD Card is failing, just because they are notoriously
> unreliable and tend to have transient failures before outright failing. But
> then I found this. It might be related...
> 
> commit 1bbd2c5daed796a411f2f9bc33699013aff8726a
> Author: Manish Narani <manish.narani>
> Date:   Wed Jul 15 00:42:33 2020 +0530
> 
>     mmc: sdhci-of-arasan: fix timings allocation code
> 
> This is in 5.8.6 but not 5.8.5.

Another detail:

When trying to boot kernel 5.8.6 with systemd.debug-shell=1 to save the journal, I had to mount the boot partition since the root FS was write-protected. Here comes the wired thing - I expected the SD card I boot from to be /dev/sdb and /dev/sdb3 should be the boot partition, but it turned out that only /dev/sdc3 could be mounted as ext4 FS successfully, and it's exactly the /boot partition as well as the /dev/sdb3 when I running kernel 5.8.5 (the good one).

There's only one SATA SSD (sda, macOS) and one SD Card Reader (fedora installed into the SD card) on Macbook. And the journal do show two SD Card Readers - sdb and sdc.

Comment 27 Zamir SUN 2020-09-29 11:51:51 UTC
(In reply to Pany from comment #25)
> > If you're feeling adventurous, then you could compile the kernel and try
> > reverting (separately) these commits and see if that fixes the problem,
> > that'd be fairly conclusive. If it doesn't then...it's harder to narrow down
> > what's going on.
> 
> To be honest, it's kinda hard for me, but I'll try it, just need some time
> to search for docs and guide.

I've built a scratch build kernel with the commit in comment 22 reverted. Maybe you can try this one
https://koji.fedoraproject.org/koji/taskinfo?taskID=52449450

Note, the scratch build usually expires in 3 days (or 7 days? I don't remember for sure). So you might want to download it to your local disk as soon as possible.

Comment 28 Eric Sandeen 2020-09-29 14:34:51 UTC
(In reply to Chris Murphy from comment #16)
> Sep 29 10:49:28 fedora.local kernel: blk_update_request: I/O error, dev sdb,
> sector 197394688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> Sep 29 10:49:28 fedora.local kernel: blk_update_request: I/O error, dev sdb,
> sector 197394688 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> ...
> Sep 29 10:49:28 fedora.local kernel: EXT4-fs (dm-1): I/O error while writing
> superblock
> ...
> Sep 29 10:49:28 fedora.local kernel: Buffer I/O error on dev dm-1, logical
> block 0, lost sync page write
> 
> This is bad... I can't tell if this is some strange block layer bug we
> haven't seen before or if it's transient hardware problems. But you have
> literally no errors at all with kernel 5.8.4?

Block layer errors, noting to do with the filesystems.  Dropping back off cc, sorry ;)

Comment 29 Chris Murphy 2020-09-29 20:17:22 UTC
(In reply to Pany from comment #25)
> 'lspci -vvnn' shows:
> ...
> 00:14.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB xHCI
> Controller [8
> 086:9cb1] (rev 03) (prog-if 30 [XHCI])
>         Subsystem: Intel Corporation Device [8086:7270]
> ...
> 
> I wonder if the SD card controller is handled by the USB controller, since
> 'lsusb -vv' shows:
> 
> Bus 002 Device 002: ID 05ac:8406 Apple, Inc Internal Memory Card Reader
> Device Descriptor:

Good catch. There's a bunch of usb related commits in 5.8.6 too. I have no idea which ones are likely suspects. It's probably easiest to do a stable kernel bisect between tags 5.8.5 and 5.8.6 and let git bisect manage everything.

> To be honest, it's kinda hard for me, but I'll try it, just need some time
> to search for docs and guide.

I use this as a starting point: https://kernelnewbies.org/KernelBuild

For this case you want the stable tree. And for git bisect, maybe this as a starting point
https://thoughtbot.com/blog/git-bisect

Best to ask on #fedora-qa or #fedora-devel for cmurf, and I can help you abbreviate the process. For sure you will want to use 'make localmodconfig' which will build a kernel for your system, largely not building things you're not using - so why spend the time compiling those things?


As for /dev/sda /dev/sdb /dev/sdc being inconsistent - that's expected. Every boot those could get assigned to different devices.

Comment 30 Pany 2020-09-30 02:45:35 UTC
(In reply to Zamir SUN from comment #27)
> I've built a scratch build kernel with the commit in comment 22 reverted.
> Maybe you can try this one
> https://koji.fedoraproject.org/koji/taskinfo?taskID=52449450
> 
> Note, the scratch build usually expires in 3 days (or 7 days? I don't
> remember for sure). So you might want to download it to your local disk as
> soon as possible.

Thank you, but the boot failure seems the same...

Comment 31 Zamir SUN 2020-09-30 03:20:32 UTC
Just random thought, can you check the initramfs of both the working kernel and the non-functional kernel with the following command, and share the output here?

# sudo lsinitrd /boot/initramfs-$(uname -r).img | egrep "module|firmware"

For the non-functional kernel, just replace $(uname -r) with the kernel version.

Comment 32 Pany 2020-09-30 03:38:47 UTC
Created attachment 1717728 [details]
lsinitrd /boot/initramfs-5.8.5-200.fc32.x86_64.img (the good one)

(In reply to Zamir SUN from comment #31)
> Just random thought, can you check the initramfs of both the working kernel
> and the non-functional kernel with the following command, and share the
> output here?
> 
> # sudo lsinitrd /boot/initramfs-$(uname -r).img | egrep "module|firmware"
> 
> For the non-functional kernel, just replace $(uname -r) with the kernel
> version.

I downloaded kernel 5.8.5-200 from here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1601937

then:

sudo dnf in kernel-{,core-,modules-}5.8.5-200.fc32.x86_64.rpm

This is the latest kernel I can boot successfully.

Comment 33 Pany 2020-09-30 03:40:17 UTC
Created attachment 1717729 [details]
lsinitrd /boot/initramfs-5.8.6-200.fc32.x86_64.img (the failed one)

(In reply to Zamir SUN from comment #31)
> Just random thought, can you check the initramfs of both the working kernel
> and the non-functional kernel with the following command, and share the
> output here?
> 
> # sudo lsinitrd /boot/initramfs-$(uname -r).img | egrep "module|firmware"
> 
> For the non-functional kernel, just replace $(uname -r) with the kernel
> version.

I downloaded kernel 5.8.6-200 from here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1605304

then:

sudo dnf in kernel-{,core-,modules-}5.8.6-200.fc32.x86_64.rpm

This is the first kernel I can not boot.

Comment 34 Pany 2020-09-30 16:22:06 UTC
(In reply to Chris Murphy from comment #19)
> The I/O errors are on dev sdb, which I'm pretty sure is this (there are two
> SD Card Readers reported, sdb and sdc)
> 
> Sep 29 10:49:17 fedora.local kernel: scsi 1:0:0:0: Direct-Access     APPLE  
> SD Card Reader   3.00 PQ: 0 ANSI: 6
> 
> I suspect the SD Card is failing, just because they are notoriously
> unreliable and tend to have transient failures before outright failing. But
> then I found this. It might be related...
> 
> commit 1bbd2c5daed796a411f2f9bc33699013aff8726a
> Author: Manish Narani <manish.narani>
> Date:   Wed Jul 15 00:42:33 2020 +0530
> 
>     mmc: sdhci-of-arasan: fix timings allocation code
> 
> This is in 5.8.6 but not 5.8.5.

I made a patch to revert this commit:

$ cat 0001-revert1bbd2c5.patch                     
From bde53877b039a31ac884987e0358a41179686f97 Mon Sep 17 00:00:00 2001
From: Pany <pany>
Date: Wed, 30 Sep 2020 14:23:46 +0800
Subject: [PATCH] revert 1bbd2c5

---
 drivers/mmc/host/sdhci-of-arasan.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c
index d0a80bfb9..fb26e743e 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -1025,6 +1025,7 @@ static void arasan_dt_read_clk_phase(struct device *dev,
 static void arasan_dt_parse_clk_phases(struct device *dev,
 				       struct sdhci_arasan_clk_data *clk_data)
 {
+	int *iclk_phase, *oclk_phase;
 	u32 mio_bank = 0;
 	int i;
 
@@ -1036,32 +1037,28 @@ static void arasan_dt_parse_clk_phases(struct device *dev,
 	clk_data->set_clk_delays = sdhci_arasan_set_clk_delays;
 
 	if (of_device_is_compatible(dev->of_node, "xlnx,zynqmp-8.9a")) {
-		u32 zynqmp_iclk_phase[MMC_TIMING_MMC_HS400 + 1] =
-			ZYNQMP_ICLK_PHASE;
-		u32 zynqmp_oclk_phase[MMC_TIMING_MMC_HS400 + 1] =
-			ZYNQMP_OCLK_PHASE;
+		iclk_phase = (int [MMC_TIMING_MMC_HS400 + 1]) ZYNQMP_ICLK_PHASE;
+		oclk_phase = (int [MMC_TIMING_MMC_HS400 + 1]) ZYNQMP_OCLK_PHASE;
 
 		of_property_read_u32(dev->of_node, "xlnx,mio-bank", &mio_bank);
 		if (mio_bank == 2) {
-			zynqmp_oclk_phase[MMC_TIMING_UHS_SDR104] = 90;
-			zynqmp_oclk_phase[MMC_TIMING_MMC_HS200] = 90;
+			oclk_phase[MMC_TIMING_UHS_SDR104] = 90;
+			oclk_phase[MMC_TIMING_MMC_HS200] = 90;
 		}
 
 		for (i = 0; i <= MMC_TIMING_MMC_HS400; i++) {
-			clk_data->clk_phase_in[i] = zynqmp_iclk_phase[i];
-			clk_data->clk_phase_out[i] = zynqmp_oclk_phase[i];
+			clk_data->clk_phase_in[i] = iclk_phase[i];
+			clk_data->clk_phase_out[i] = oclk_phase[i];
 		}
 	}
 
 	if (of_device_is_compatible(dev->of_node, "xlnx,versal-8.9a")) {
-		u32 versal_iclk_phase[MMC_TIMING_MMC_HS400 + 1] =
-			VERSAL_ICLK_PHASE;
-		u32 versal_oclk_phase[MMC_TIMING_MMC_HS400 + 1] =
-			VERSAL_OCLK_PHASE;
+		iclk_phase = (int [MMC_TIMING_MMC_HS400 + 1]) VERSAL_ICLK_PHASE;
+		oclk_phase = (int [MMC_TIMING_MMC_HS400 + 1]) VERSAL_OCLK_PHASE;
 
 		for (i = 0; i <= MMC_TIMING_MMC_HS400; i++) {
-			clk_data->clk_phase_in[i] = versal_iclk_phase[i];
-			clk_data->clk_phase_out[i] = versal_oclk_phase[i];
+			clk_data->clk_phase_in[i] = iclk_phase[i];
+			clk_data->clk_phase_out[i] = oclk_phase[i];
 		}
 	}
 
-- 
2.26.2

Then build the kernel with this patch, but still not able to boot kernel 5.8.6.

I'll try to git bisect other commits related to USB. Thank you Chris!

Comment 35 Pany 2020-10-16 13:48:22 UTC
(In reply to Chris Murphy from comment #29)
> Good catch. There's a bunch of usb related commits in 5.8.6 too. I have no
> idea which ones are likely suspects. It's probably easiest to do a stable
> kernel bisect between tags 5.8.5 and 5.8.6 and let git bisect manage
> everything.

Finally, I found the only commit to blame:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.8.y&id=53965c79c2dbdc898ffc7fdbab37e920594f5e14

I can successfully boot with kernel 5.8.6 as long as I reverted this commit and rebuild the kernel.rpm

Comment 36 Zamir SUN 2020-10-17 13:25:34 UTC
(In reply to Pany from comment #35)
> Finally, I found the only commit to blame:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> ?h=linux-5.8.y&id=53965c79c2dbdc898ffc7fdbab37e920594f5e14
> 
> I can successfully boot with kernel 5.8.6 as long as I reverted this commit
> and rebuild the kernel.rpm

Great catch!

Are you willing to report this to mainline kernel with your findings? That will draw attention for kernel developers and (hopefully) get this under better attention.

I think either linux-usb list could be a good place to start with.

http://vger.kernel.org/vger-lists.html#linux-usb

Comment 37 Pany 2020-10-17 16:20:17 UTC
(In reply to Zamir SUN from comment #36)
> Are you willing to report this to mainline kernel with your findings? That
> will draw attention for kernel developers and (hopefully) get this under
> better attention.
> 
> I think either linux-usb list could be a good place to start with.
> 
> http://vger.kernel.org/vger-lists.html#linux-usb

Thank you, I've sent a message to the linux-usb list.

https://marc.info/?l=linux-usb&m=160295084610971&w=2

Comment 38 Bastien Nocera 2020-10-21 13:45:18 UTC
I've commented on the upstream discussion and prepared 2 patches which I'm compiling and will testing shortly.
I'll try to make a scratch build available once I've tested it locally.

Comment 39 Bastien Nocera 2020-10-22 11:51:45 UTC
(In reply to Pany from comment #35)
> (In reply to Chris Murphy from comment #29)
> > Good catch. There's a bunch of usb related commits in 5.8.6 too. I have no
> > idea which ones are likely suspects. It's probably easiest to do a stable
> > kernel bisect between tags 5.8.5 and 5.8.6 and let git bisect manage
> > everything.
> 
> Finally, I found the only commit to blame:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> ?h=linux-5.8.y&id=53965c79c2dbdc898ffc7fdbab37e920594f5e14
> 
> I can successfully boot with kernel 5.8.6 as long as I reverted this commit
> and rebuild the kernel.rpm

Could you please test with this scratch build?

https://koji.fedoraproject.org/koji/taskinfo?taskID=53966375

It contains the 2 patches I sent to the linux-usb list already.

Comment 40 M. Vefa Bicakci 2020-10-22 14:06:22 UTC
Hello all,

I published two patches at [1]. The first patch is modified/updated, and the second patch is an unmodified version of Bastien's second patch.

I verified that the patches do not introduce regressions. I do not have an Apple MacBook to verify that the actual issue is resolved, but the issue should be resolved with these patches.

Thanks!

[1] https://lore.kernel.org/linux-usb/20201022135521.375211-1-m.v.b@runbox.com/T/#m4a37b74da5ecad69b82322847959853daea420ba

Comment 41 Pany 2020-10-23 02:30:14 UTC
(In reply to Bastien Nocera from comment #39)
> (In reply to Pany from comment #35)
> Could you please test with this scratch build?
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=53966375
> 
> It contains the 2 patches I sent to the linux-usb list already.

Bad news, this scratch build does not work as expected.

1. With "systemd.debug-shell=1" appended to cmdline, I tried to capture the journal. But the Macbook built-in keyboard stopped responding when I tried to press F9.
2. Plugging in an extra USB keyboard, I pressed F9 and tried to mount /dev/sdb3 or /dev/sdc3 to /boot in order to save the journal, but neither of them existed.
3. Appending "module_blacklist=apple_mfi_fastcharge" to cmdline, the kernel-5.10.0-0.rc0.20201021git071a0578b0ce.49.local.fc33 booted successfully.

Comment 42 Bastien Nocera 2020-10-23 09:07:52 UTC
(In reply to Pany from comment #41)
> (In reply to Bastien Nocera from comment #39)
> > (In reply to Pany from comment #35)
> > Could you please test with this scratch build?
> > 
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=53966375
> > 
> > It contains the 2 patches I sent to the linux-usb list already.
> 
> Bad news, this scratch build does not work as expected.
> 
> 1. With "systemd.debug-shell=1" appended to cmdline, I tried to capture the
> journal. But the Macbook built-in keyboard stopped responding when I tried
> to press F9.
> 2. Plugging in an extra USB keyboard, I pressed F9 and tried to mount
> /dev/sdb3 or /dev/sdc3 to /boot in order to save the journal, but neither of
> them existed.
> 3. Appending "module_blacklist=apple_mfi_fastcharge" to cmdline, the
> kernel-5.10.0-0.rc0.20201021git071a0578b0ce.49.local.fc33 booted
> successfully.

OK. I'll try to setup a reproducer on a MacBook Air I have here, that'll make it easier to test.

Thanks.

Comment 43 Bastien Nocera 2020-10-23 13:16:56 UTC
(In reply to Bastien Nocera from comment #42)
<snip>
> OK. I'll try to setup a reproducer on a MacBook Air I have here, that'll
> make it easier to test.

I've managed to reproduce the original problem, though it'll take me a little
while to reproduce it again with the patches, and get enough debug to find
what the problem could be.

Comment 44 M. Vefa Bicakci 2020-10-24 15:21:44 UTC
(In reply to Bastien Nocera from comment #43)
> (In reply to Bastien Nocera from comment #42)
> <snip>
> > OK. I'll try to setup a reproducer on a MacBook Air I have here, that'll
> > make it easier to test.
> 
> I've managed to reproduce the original problem, though it'll take me a little
> while to reproduce it again with the patches, and get enough debug to find
> what the problem could be.

Hello Bastien,

I was able to get my hands onto an Apple MacBook Air, and I confirmed that the two patches that I published at [1] indeed resolve the issue at hand.

I am including below dmesg output which shows that 5.6.6-300.fc32 does not exhibit the issue, 5.8.15-201.fc32 exhibits the issue, and a custom-built kernel with the two patches does /not/ exhibit the issue.

The patches can be seen here: https://lore.kernel.org/linux-usb/20201022135521.375211-1-m.v.b@runbox.com/T/#m4a37b74da5ecad69b82322847959853daea420ba

I hope that this helps!

Oct 23 23:31:06 kernel: Linux version 5.6.6-300.fc32.x86_64 (mockbuild.fedoraproject.org) (gcc version 10.0.1 20200328 (Red Hat 10.0.1-0.11) (GCC)) #1 SMP Tue Apr 21 13:44:19 UTC 2020
Oct 23 23:31:08 kernel: scsi 2:0:0:0: Direct-Access     APPLE    SD Card Reader   3.00 PQ: 0 ANSI: 6

[root@localhost ~]# journalctl -b-1 --no-hostname -k | grep -e 'SD Card Reader' -e 'apple-mfi-fastcharge' -e 'Linux version'
Oct 24 01:01:22 kernel: Linux version 5.8.15-201.fc32.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), GNU ld version 2.34-5.fc32) #1 SMP Thu Oct 15 15:56:44 UTC 2020
Oct 24 01:01:24 kernel: scsi 2:0:0:0: Direct-Access     APPLE    SD Card Reader   3.00 PQ: 0 ANSI: 6
Oct 24 01:01:38 kernel: usbcore: registered new device driver apple-mfi-fastcharge
Oct 24 01:01:38 kernel: scsi 2:0:0:0: Direct-Access     APPLE    SD Card Reader   3.00 PQ: 0 ANSI: 6

[root@localhost ~]# journalctl -b0 --no-hostname -k | grep -e 'SD Card Reader' -e 'apple-mfi-fastcharge' -e 'Linux version'
Oct 24 01:11:20 kernel: Linux version 5.8.16-1.local (user@builder) (gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), GNU ld version 2.26.1-1.fc25) #64 SMP Sat Oct 24 00:41:24 EDT 2020
Oct 24 01:11:23 kernel: scsi 2:0:0:0: Direct-Access     APPLE    SD Card Reader   3.00 PQ: 0 ANSI: 6
Oct 24 01:11:52 kernel: usbcore: registered new device driver apple-mfi-fastcharge

Comment 45 Bastien Nocera 2020-10-27 14:06:56 UTC
My original version of the patch was buggy (should also have changed __check_for_non_generic_match() and __usb_bus_reprobe_drivers())
but Vefa's latest version works for me. The apple-mfi-fastcharge gets autoloaded on a MacBook Air, but
nothing uses it by default, and plugging an iOS device gets it matched as expected.

A kernel scratch build with Vefa's patches is available here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=54314890

Please reply to the mailing-list with "Tested-by" tags, as I'm about to do once you've tested this successfully.

Comment 46 Pany 2020-10-28 04:04:54 UTC
(In reply to Bastien Nocera from comment #45)
> My original version of the patch was buggy (should also have changed
> __check_for_non_generic_match() and __usb_bus_reprobe_drivers())
> but Vefa's latest version works for me. The apple-mfi-fastcharge gets
> autoloaded on a MacBook Air, but
> nothing uses it by default, and plugging an iOS device gets it matched as
> expected.
> 
> A kernel scratch build with Vefa's patches is available here:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=54314890
> 
> Please reply to the mailing-list with "Tested-by" tags, as I'm about to do
> once you've tested this successfully.

Thank you, Bastien, this scratch build works very well. And I've replied to the linux-usb mailing-list.

Thank you everybody helped me with this!

Comment 47 Pany 2020-11-24 01:52:54 UTC
I upgraded to Fedora 33 (kernel 5.9.8), this issue has already been fixed. So I think this report can be closed. Thanks to everybody again.


Note You need to log in before you can comment on or make changes to this bug.