User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Build Identifier:

Following an upgrade of a Fedora 29 system from kernel 4.18.16-300 to 4.18.17-300, two BTRFS raid-10 volumes (each containing multiple subvolumes) failed to mount at the subsequent reboot, dropping the boot process into emergency mode. The same thing occurred after a subsequent upgrade to kernel 4.18.18-300. Booting on kernel 4.18.16-300 works fine. (All three kernels are installed simultaneously, so there was no dnf-install / dnf-remove activity in between boot attempts.)

All volumes/subvolumes can be mounted by hand after the failure occurs, i.e. by issuing the appropriate 'mount' commands at the command line. The volumes successfully passed subsequent 'btrfs check' (while unmounted) and 'btrfs scrub' (while mounted) operations.

Various diagnostics have been attached to this bug report (e.g. output of dmesg, journalctl -xb, systemd-analyze critical-chain, etc.). It would appear that the raid volumes may be trying to mount before all member disks have become available.

All mounting is done via /etc/fstab, i.e. there are no hand-crafted systemd mount units involved. Initially the entries in /etc/fstab took the following form (several more subvols exist on both volumes but have been excluded here for brevity):

LABEL=btrfs-highrise-raid10-01 /var/data/filesystems/btrfs-highrise-raid10-01 btrfs noatime,noauto 0 0
LABEL=btrfs-highrise-raid10-02 /var/data/filesystems/btrfs-highrise-raid10-02 btrfs noatime,noauto 0 0
LABEL=btrfs-highrise-raid10-01 /var/data/subvols/backups_bareos btrfs noatime,subvol=backups_bareos 0 0
LABEL=btrfs-highrise-raid10-01 /var/data/subvols/backups_db btrfs noatime,subvol=backups_db 0 0

Identical results occur whether or not the toplevel volumes are set to "noauto".
Identical results also occur when the /etc/fstab entries are changed to explicitly specify all member disks, such as:

/dev/sdb /var/data/filesystems/btrfs-highrise-raid10-01 btrfs device=/dev/sdb,device=/dev/sdc,device=sdd,device=/dev/sde,noatime 0 0
/dev/sdf /var/data/filesystems/btrfs-highrise-raid10-02 btrfs device=/dev/sdf,device=/dev/sdg,device=sdh,device=/dev/sdi,noatime 0 0

and also:

/dev/sdb /var/data/filesystems/btrfs-highrise-raid10-01 btrfs device=fbb6ce57-66c3-4189-a180-2242ef8aa43b,device=4bc98037-1e45-4cd8-8c21-b4789d1586b8,device=72d8133b-7ffc-463c-bfcd-73a03cc42e54,device=03b19f31-0a34-4dfc-b420-ac0568ecf170,noatime 0 0
/dev/sdf /var/data/filesystems/btrfs-highrise-raid10-02 btrfs device=34b5aa14-b79a-4565-973c-282c173d7354,device=67a155dd-6b7f-4e5b-ba39-71fc4e2f3885,device=5f2e017c-dd2c-4f3a-b8c7-d612590dbede,device=36072420-2451-4a9b-9264-e11861a2d3fc,noatime 0 0

Again, when all else is left the same and the system is simply booted with different kernel revisions selected, the problem manifests reliably under 4.18.17-300 (and later) but not on 4.18.16-300 (and prior). Prior to 4.18.17, this problem had not been experienced for 2+ years as the system hosting the raid was progressively migrated from one Fedora version to the next.
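As an aside, two of the entries above use a bare `device=sdd` / `device=sdh` without a `/dev/` prefix - possibly just a transcription slip in the report, but worth checking, since such a value would not resolve. A small hypothetical sanity checker (not part of any btrfs tooling) can flag device= values that are neither absolute paths nor UUIDs:

```python
import re

# Matches a standard 8-4-4-4-12 hex UUID, as used by device=<UUID>.
UUID_RE = re.compile(r'^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$')

def suspicious_device_opts(fstab_line):
    """Return device= values that are neither absolute paths nor UUIDs.

    Hypothetical helper for eyeballing btrfs fstab entries.
    """
    fields = fstab_line.split()
    if len(fields) < 4:
        return []
    bad = []
    for opt in fields[3].split(','):   # 4th fstab field holds mount options
        if opt.startswith('device='):
            value = opt[len('device='):]
            if not (value.startswith('/') or UUID_RE.match(value)):
                bad.append(value)
    return bad

entry = ("/dev/sdb /var/data/filesystems/btrfs-highrise-raid10-01 btrfs "
         "device=/dev/sdb,device=/dev/sdc,device=sdd,device=/dev/sde,noatime 0 0")
print(suspicious_device_opts(entry))  # → ['sdd']
```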
The following RPMs were updated via DNF immediately prior to the failure starting to occur:

dnf history info 21
Transaction ID : 21
Begin time     : Sat 10 Nov 2018 07:37:26 PM EST
Begin rpmdb    : 442:8975452ebac65dfad9eea3d24864b51cc8dddfa5
End time       : Sat 10 Nov 2018 07:38:28 PM EST (62 seconds)
End rpmdb      : 448:7c538d7e7c8cced1ed103716bda59b756903e9ec
User           : root <root>
Return-Code    : Success
Releasever     : 29
Command Line   : update
Packages Altered:
    Install  kernel-4.18.17-300.fc29.x86_64                 @updates
    Install  kernel-core-4.18.17-300.fc29.x86_64            @updates
    Install  kernel-modules-4.18.17-300.fc29.x86_64         @updates
    Install  python3-unbound-1.8.1-1.fc29.x86_64            @updates
    Install  unbound-libs-1.8.1-1.fc29.x86_64               @updates
    Install  zchunk-libs-0.9.14-1.fc29.x86_64               @updates
    Upgrade  coreutils-8.30-6.fc29.x86_64                   @updates
    Upgraded coreutils-8.30-5.fc29.x86_64                   @@System
    Upgrade  coreutils-common-8.30-6.fc29.x86_64            @updates
    Upgraded coreutils-common-8.30-5.fc29.x86_64            @@System
    Upgrade  curl-7.61.1-4.fc29.x86_64                      @updates
    Upgraded curl-7.61.1-3.fc29.x86_64                      @@System
    Upgrade  dnf-4.0.4-2.fc29.noarch                        @updates
    Upgraded dnf-4.0.4-1.fc29.noarch                        @@System
    Upgrade  dnf-data-4.0.4-2.fc29.noarch                   @updates
    Upgraded dnf-data-4.0.4-1.fc29.noarch                   @@System
    Upgrade  dnf-yum-4.0.4-2.fc29.noarch                    @updates
    Upgraded dnf-yum-4.0.4-1.fc29.noarch                    @@System
    Upgrade  findutils-1:4.6.0-21.fc29.x86_64               @updates
    Upgraded findutils-1:4.6.0-20.fc29.x86_64               @@System
    Upgrade  glibc-2.28-17.fc29.x86_64                      @updates
    Upgraded glibc-2.28-9.fc29.x86_64                       @@System
    Upgrade  glibc-all-langpacks-2.28-17.fc29.x86_64        @updates
    Upgraded glibc-all-langpacks-2.28-9.fc29.x86_64         @@System
    Upgrade  glibc-common-2.28-17.fc29.x86_64               @updates
    Upgraded glibc-common-2.28-9.fc29.x86_64                @@System
    Upgrade  libcurl-7.61.1-4.fc29.x86_64                   @updates
    Upgraded libcurl-7.61.1-3.fc29.x86_64                   @@System
    Upgrade  libdnf-0.22.0-8.fc29.x86_64                    @updates
    Upgraded libdnf-0.22.0-6.fc29.x86_64                    @@System
    Upgrade  libgcc-8.2.1-5.fc29.x86_64                     @updates
    Upgraded libgcc-8.2.1-4.fc29.x86_64                     @@System
    Upgrade  libgomp-8.2.1-5.fc29.x86_64                    @updates
    Upgraded libgomp-8.2.1-4.fc29.x86_64                    @@System
    Upgrade  libicu-62.1-3.fc29.x86_64                      @updates
    Upgraded libicu-62.1-2.fc29.x86_64                      @@System
    Upgrade  libsolv-0.7.1-1.fc29.x86_64                    @updates
    Upgraded libsolv-0.6.35-3.fc29.x86_64                   @@System
    Upgrade  libstdc++-8.2.1-5.fc29.x86_64                  @updates
    Upgraded libstdc++-8.2.1-4.fc29.x86_64                  @@System
    Upgrade  lmdb-libs-0.9.22-4.fc29.x86_64                 @updates
    Upgraded lmdb-libs-0.9.22-3.fc29.x86_64                 @@System
    Upgrade  pcre-8.42-5.fc29.x86_64                        @updates
    Upgraded pcre-8.42-4.fc29.x86_64                        @@System
    Upgrade  pcre2-10.32-4.fc29.x86_64                      @updates
    Upgraded pcre2-10.32-3.fc29.x86_64                      @@System
    Upgrade  perl-Errno-1.29-424.fc29.x86_64                @updates
    Upgraded perl-Errno-1.29-423.fc29.x86_64                @@System
    Upgrade  perl-IO-1.39-424.fc29.x86_64                   @updates
    Upgraded perl-IO-1.39-423.fc29.x86_64                   @@System
    Upgrade  perl-interpreter-4:5.28.0-424.fc29.x86_64      @updates
    Upgraded perl-interpreter-4:5.28.0-423.fc29.x86_64      @@System
    Upgrade  perl-libs-4:5.28.0-424.fc29.x86_64             @updates
    Upgraded perl-libs-4:5.28.0-423.fc29.x86_64             @@System
    Upgrade  perl-macros-4:5.28.0-424.fc29.x86_64           @updates
    Upgraded perl-macros-4:5.28.0-423.fc29.x86_64           @@System
    Upgrade  plymouth-0.9.4-1.fc29.x86_64                   @updates
    Upgraded plymouth-0.9.3-14.fc29.x86_64                  @@System
    Upgrade  plymouth-core-libs-0.9.4-1.fc29.x86_64         @updates
    Upgraded plymouth-core-libs-0.9.3-14.fc29.x86_64        @@System
    Upgrade  plymouth-scripts-0.9.4-1.fc29.x86_64           @updates
    Upgraded plymouth-scripts-0.9.3-14.fc29.x86_64          @@System
    Upgrade  python3-3.7.1-2.fc29.x86_64                    @updates
    Upgraded python3-3.7.1-1.fc29.x86_64                    @@System
    Upgrade  python3-dnf-4.0.4-2.fc29.noarch                @updates
    Upgraded python3-dnf-4.0.4-1.fc29.noarch                @@System
    Upgrade  python3-hawkey-0.22.0-8.fc29.x86_64            @updates
    Upgraded python3-hawkey-0.22.0-6.fc29.x86_64            @@System
    Upgrade  python3-libdnf-0.22.0-8.fc29.x86_64            @updates
    Upgraded python3-libdnf-0.22.0-6.fc29.x86_64            @@System
    Upgrade  python3-libs-3.7.1-2.fc29.x86_64               @updates
    Upgraded python3-libs-3.7.1-1.fc29.x86_64               @@System
    Upgrade  selinux-policy-3.14.2-42.fc29.noarch           @updates
    Upgraded selinux-policy-3.14.2-40.fc29.noarch           @@System
    Upgrade  selinux-policy-targeted-3.14.2-42.fc29.noarch  @updates
    Upgraded selinux-policy-targeted-3.14.2-40.fc29.noarch  @@System
    Upgrade  vim-common-2:8.1.511-1.fc29.x86_64             @updates
    Upgraded vim-common-2:8.1.483-1.fc29.x86_64             @@System
    Upgrade  vim-enhanced-2:8.1.511-1.fc29.x86_64           @updates
    Upgraded vim-enhanced-2:8.1.483-1.fc29.x86_64           @@System
    Upgrade  vim-filesystem-2:8.1.511-1.fc29.noarch         @updates
    Upgraded vim-filesystem-2:8.1.483-1.fc29.noarch         @@System
    Upgrade  vim-minimal-2:8.1.511-1.fc29.x86_64            @updates
    Upgraded vim-minimal-2:8.1.483-1.fc29.x86_64            @@System

Reproducible: Always

Steps to Reproduce:
1. Have a multi-disk BTRFS array (in this case an 8-disk raid-10 array) present and functioning in the system.
2. Boot into Fedora 29 kernel 4.18.17-300 or later.
3. Observe the boot process drop into emergency mode.

Actual Results:
The BTRFS multi-disk array fails to get mounted, and the boot process drops into emergency mode.

Expected Results:
The BTRFS multi-disk array gets mounted successfully, and the boot process reaches the desired runlevel (as happens with kernels 4.18.16-300 and prior).
Created attachment 1506300 [details] dmesg output
Created attachment 1506301 [details] journald output
Created attachment 1506302 [details] 'systemd-analyze blame' output
Created attachment 1506303 [details] 'systemd-analyze critical chain' output
We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Because of this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 4.19.5-300.fc29. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
(In reply to Jeremy Cline from comment #5)
> Fedora 29 has now been rebased to 4.19.5-300.fc29. Please test this kernel
> update (or newer) and let us know if your issue has been resolved or if it
> is still present with the newer kernel.

The problem recurred under kernel-4.19.5-300, which was the most recent kernel as of the time of testing. Diagnostics yielded substantially the same logging output as has already been attached to this bug report under kernel 4.18.17-300. After reverting again to 4.18.16-300, the problem did not manifest.

So behaviour remains the same as originally documented: any kernel _after_ 4.18.16-300 appears to manifest the documented issue.
I'm afraid I don't see anything obvious in the changelog for v4.18.17. Your best bet is to do a git-bisect and figure out which commit caused the regression. There are numerous guides on the Internet, but there's a Fedora-specific(ish) guide at https://docs.fedoraproject.org/en-US/quick-docs/kernel/troubleshooting/index.html#_bisecting_the_kernel
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Because of this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 4.20.5-200.fc29. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
Chances are this is solved by now but...

[ 4.036987] BTRFS error (device sdd): devid 4 uuid fbb6ce57-66c3-4189-a180-2242ef8aa43b is missing

For whatever reason, one of the devices in the Btrfs volume is delayed in becoming available, and the mount fails because Btrfs by default will not do degraded mounts. There is a udev rule that I'm pretty sure should prevent systemd from even attempting to mount any Btrfs volume until all of its member devices are reported ready:

/usr/lib/udev/rules.d/64-btrfs.rules

There are actually two questions: why is this device late to the party? And why is the btrfs udev rule somehow not preventing systemd from attempting to mount this volume?

I suggest separately using the boot parameters rd.udev.debug and systemd.log_level=debug - if used together, the debugging might slow down boot enough that the slow device shows up in time and the problem doesn't happen at all.
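The invariant that udev rule is meant to enforce boils down to: never hand the volume to systemd for mounting until every member devid has been scanned. A toy Python model of that gate (the expected devid set is hard-coded here as an assumption; the real kernel learns it from the superblocks):

```python
def volume_ready(expected_devids, seen_devids):
    """Toy model of the btrfs readiness gate: the volume may be
    mounted only once every expected member devid has appeared."""
    return expected_devids <= seen_devids   # subset test: all members seen?

expected = {1, 2, 3, 4}
seen = {1, 2, 3}                     # devid 4 is the latecomer, as in the dmesg line above
print(volume_ready(expected, seen))  # → False: mounting now would fail with 'devid 4 missing'
seen.add(4)
print(volume_ready(expected, seen))  # → True: safe to mount
```

The bug under discussion is precisely that a mount was attempted while this predicate would still have been False.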
I just upgraded from Fedora 28 to 29 on my custom NAS box and I'm having this issue with the latest kernel, 5.0.13-200.fc29.x86_64.

If I comment out my btrfs raid mounts from /etc/fstab I can boot, and I'm able to mount them manually or by uncommenting the lines in fstab and running `mount -a`.
(In reply to Devin Henderson from comment #11)
> I just upgraded from Fedora 28 to 29 on my custom NAS box and I'm having
> this issue with the latest kernel 5.0.13-200.fc29.x86_64
>
> If I comment out my btrfs raid mounts from /etc/fstab I can boot and I'm
> able to mount them manually or by uncommenting the lines in fstab and
> running `mount -a`.

Revert the /etc/fstab change, and then boot with 'rd.udev.debug systemd.debug-shell=1' and hopefully the problem will reproduce. You can then switch to tty9 and collect some information:

# script bug1650391-info.txt
# systemctl list-jobs
# btrfs fi show
# btrfs device scan
# btrfs fi show
# exit
# journalctl -b -o short-monotonic > bug1650391-journaludev.txt
# vi /etc/fstab
## you can either comment the volume out again, or I think adding 'nofail' as a mount option will cause systemd to eventually continue the boot rather than hanging indefinitely waiting for this volume to become available
# reboot

And then attach the two files, /root/*txt, to the bug report. I'm very curious why a kernel change would affect device discovery.
Thanks Chris. I'm running this in a KVM guest, and I ran those commands in the "emergency mode" shell that comes up when the boot fails - I hope that works. I'll add the attachments now.

BTW, adding nofail did allow the boot to finish. Once it was up I ran `mount -a` immediately and it worked.
Created attachment 1568698 [details] bug1650391-info.txt for Chris
Created attachment 1568699 [details] bug1650391-journaludev.txt for Chris
Shoot! I forgot to add the kernel params. I will re-attach new text files shortly. Sorry.
Created attachment 1568702 [details] REAL bug1650391-info.txt for Chris
Created attachment 1568703 [details] REAL bug1650391-journaludev.txt for Chris
New files are attached. Sorry for the confusion. I appreciate the help.
[ 9.086400] nas kernel: BTRFS error (device vdc): devid 1 uuid 404b0f4d-6b07-4094-a0ac-07634b8b2548 is missing
...
[ 9.131558] nas kernel: BTRFS error (device vdc): devid 1 uuid 404b0f4d-6b07-4094-a0ac-07634b8b2548 is missing
...
[ 9.151269] nas kernel: BTRFS error (device vdc): devid 1 uuid 404b0f4d-6b07-4094-a0ac-07634b8b2548 is missing

And yet your 'btrfs fi show' only discovers one Btrfs volume, with a totally different UUID: a36a537e-71e5-4569-a642-d926ea21455d.

Note that the filesystem/volume UUID for Btrfs applies to the entire volume across all devices, so in a 2x raid1 both devices will have the same UUID. There's a separate dev_item.uuid that is unique for each device, stored in the superblock. blkid and the kernel report the volume UUID, not the device UUID; the device is distinguished in kernel messages with "devid n".

Can you provide the output from

# blkid
# btrfs insp dump-s -f /dev/vdb
# btrfs insp dump-s -f /dev/vdc
# btrfs rescue super -v /dev/vdb
# btrfs rescue super -v /dev/vdc

You can use script again for that if you're at a console. These are read-only commands; they don't make any changes.

I see two Btrfs volume UUIDs in the journal. That's a bit troubling - maybe there's a stale super or some other confusion. Right now I'm a bit nervous about any more writes happening, so I suggest not mounting the volume rw. It's safe to mount ro. And it should be safe to make sure backups are up to date, just in case this all goes badly. But I'd make sure the backup, whether incremental or full, is separate from (doesn't overwrite) prior backups.

This conversation needs to move to the upstream Btrfs list. It's sufficiently serious that upstream needs to know of the conditions that led to this situation. It may be a pre-existing problem that's only just now getting caught by newer kernel versions; but the fact that you get a mount failure during startup, but not manually, is weird. I smell a bug. I suggest file attachments go in this bug report.
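Those "is missing" kernel messages follow a fixed pattern, so when comparing journals across boots they can be pulled out mechanically. A hypothetical helper (not part of btrfs-progs) that extracts the reporting device, devid, and per-device UUID:

```python
import re

# Pattern of the kernel's 'devid ... is missing' error line.
MISSING_RE = re.compile(
    r"BTRFS error \(device (\w+)\): devid (\d+) uuid ([0-9a-f-]+) is missing")

def missing_devices(journal_text):
    """Return sorted unique (reporting_device, devid, device_uuid)
    tuples found in 'devid ... is missing' kernel messages."""
    return sorted({(m.group(1), int(m.group(2)), m.group(3))
                   for m in MISSING_RE.finditer(journal_text)})

log = """\
[    9.086400] nas kernel: BTRFS error (device vdc): devid 1 uuid 404b0f4d-6b07-4094-a0ac-07634b8b2548 is missing
[    9.131558] nas kernel: BTRFS error (device vdc): devid 1 uuid 404b0f4d-6b07-4094-a0ac-07634b8b2548 is missing
"""
print(missing_devices(log))  # → [('vdc', 1, '404b0f4d-6b07-4094-a0ac-07634b8b2548')]
```

Note that the UUID it captures is the per-device dev_item.uuid, not the volume UUID - which is exactly the distinction drawn above.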
And the conversation goes to the Btrfs list. http://vger.kernel.org/vger-lists.html#linux-btrfs

Click on the subscribe link, send the email, reply to the confirmation. (It's not strictly required to subscribe, but a lot of people do not reply-all, so if you don't subscribe just end the email with 'please reply all, I'm not subscribed'.)

Include the following:
- a summary of the problem
- kernel versions being used
- btrfs-progs version(s) being used
- history of the file system: mkfs options used (was it always a raid1, or was it later converted into one?); was a device ever replaced, removed, or added? did the volume ever have its UUID changed using btrfstune?
- has it ever previously been mounted with the degraded option, and if yes, what were the conditions requiring that?
- has 'btrfs check --repair' been used? Don't use it now, just note if it has been used, and if you can remember what happened (error or no error, a couple dozen lines or hundreds).

I'm gonna go dig into this udev debug in a bit.
[ 9.084816] nas kernel: BTRFS info (device vdc): disk space caching is enabled
[ 9.086400] nas kernel: BTRFS error (device vdc): devid 1 uuid 404b0f4d-6b07-4094-a0ac-07634b8b2548 is missing
[ 9.086407] nas kernel: BTRFS error (device vdc): failed to read the system array: -2
[ 9.093686] nas kernel: BTRFS error (device vdc): open_ctree failed

The last two lines mean something tried to mount the file system while devid 1 was missing. It shouldn't be possible for systemd to do this, because of this udev rule:

[chris@flap ~]$ sudo dnf provides /usr/lib/udev/rules.d/64-btrfs.rules
systemd-udev-241-8.git9ef65cb.fc30.x86_64 : Rule-based device node and kernel event manager
[chris@flap ~]$ sudo cat /usr/lib/udev/rules.d/64-btrfs.rules
# do not edit this file, it will be overwritten on update
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
ENV{SYSTEMD_READY}=="0", GOTO="btrfs_end"
# let the kernel know about this btrfs filesystem, and check if it is complete
IMPORT{builtin}="btrfs ready $devnode"
# mark the device as not ready to be used by the system
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
# reconsider pending devices in case when multidevice volume awaits
ENV{ID_BTRFS_READY}=="1", RUN+="/usr/bin/udevadm trigger -s block -p ID_BTRFS_READY=0"
LABEL="btrfs_end"
---

systemd is supposed to wait indefinitely for all Btrfs devices in a volume to be reported as ready before it will even try to mount the file system. And yet... we have an error that only happens during failed mount attempts. And it failed because the device is not yet available. It doesn't appear until later...

[ 10.113322] nas kernel: BTRFS: device label WD80EFAX devid 1 transid 272924 /dev/vdb

OK, so I need something else :D I need you to boot up again with 'systemd.log_level=debug' and output the journal to a file: 'journalctl -b -o short-monotonic > journalsysdebug.txt', or something otherwise sanely named.

UUID 404b0f4d... is only ever reported by the kernel, and it's missing.
UUID a36a537e... is reported by both udev and btrfs progs, but not by the kernel.

This is confusing.
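For readers unfamiliar with udev rule syntax, the quoted rule's control flow can be sketched as a toy Python function. The 'btrfs_ready' key here stands in for the result of IMPORT{builtin}="btrfs ready $devnode" (an assumption for illustration; real udev asks the kernel whether the volume is complete):

```python
def apply_btrfs_rule(event):
    """Toy model of 64-btrfs.rules: given a uevent's environment as a
    dict, return the environment the rule would leave behind."""
    env = dict(event)
    # the four GOTO="btrfs_end" guards: rule only acts on live btrfs block devices
    if env.get('SUBSYSTEM') != 'block':
        return env
    if env.get('ACTION') == 'remove':
        return env
    if env.get('ID_FS_TYPE') != 'btrfs':
        return env
    if env.get('SYSTEMD_READY') == '0':
        return env
    # stands in for IMPORT{builtin}="btrfs ready $devnode"
    env['ID_BTRFS_READY'] = env.pop('btrfs_ready')
    if env['ID_BTRFS_READY'] == '0':
        # incomplete multi-device volume: hide the device from systemd
        env['SYSTEMD_READY'] = '0'
    # on '1', the real rule additionally re-triggers pending block devices
    # via 'udevadm trigger -s block -p ID_BTRFS_READY=0'
    return env

env = apply_btrfs_rule({'SUBSYSTEM': 'block', 'ACTION': 'add',
                        'ID_FS_TYPE': 'btrfs', 'btrfs_ready': '0'})
print(env['SYSTEMD_READY'])  # → 0 (device withheld from systemd until the volume is complete)
```

The puzzle in this bug is that a mount was attempted anyway, i.e. somewhere this SYSTEMD_READY=0 withholding did not take effect.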
This might actually be a udev or systemd bug... I know it seems instigated by a kernel change, and a kernel change might be related, but I'll need to see the systemd debug journal.

[ 4.470332] nas kernel: virtio_blk virtio4: [vdb] 15628050432 512-byte logical blocks (8.00 TB/7.28 TiB)
[ 4.539011] nas kernel: virtio_blk virtio5: [vdc] 15628050432 512-byte logical blocks (8.00 TB/7.28 TiB)
[ 4.933352] nas systemd-udevd[382]: creating link '/dev/disk/by-uuid/a36a537e-71e5-4569-a642-d926ea21455d' to '/dev/vdb'
[ 4.933411] nas systemd-udevd[382]: creating symlink '/dev/disk/by-uuid/a36a537e-71e5-4569-a642-d926ea21455d' to '../../vdb'
[ 4.953123] nas systemd-udevd[377]: creating link '/dev/disk/by-uuid/a36a537e-71e5-4569-a642-d926ea21455d' to '/dev/vdc'
[ 4.953197] nas systemd-udevd[377]: atomically replace '/dev/disk/by-uuid/a36a537e-71e5-4569-a642-d926ea21455d'

I don't grok udev much at all. But clearly the virtio devices vdb and vdc exist very early. And then udev seems to initially assign the correct UUID to /dev/vdb, but then steps on that (?) with /dev/vdc. And later vdb appears to the kernel. I'm not really sure of the interaction between udev and the kernel - why the block device can appear, yet kernel code doesn't see /dev/vdb as a btrfs device until much, much later.
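One way to reason about this race is to pull the monotonic timestamps out of such journal excerpts and line the events up in order. A hypothetical parsing helper (field layout assumed to match 'journalctl -o short-monotonic' and dmesg-style lines as quoted above):

```python
import re

# '[    4.470332] <rest of message>' - monotonic seconds in brackets.
STAMP_RE = re.compile(r"\[\s*([0-9]+\.[0-9]+)\]\s+(.*)")

def timeline(lines):
    """Return (seconds, message) pairs sorted by monotonic timestamp."""
    events = []
    for line in lines:
        m = STAMP_RE.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return sorted(events)

lines = [
    "[    4.539011] nas kernel: virtio_blk virtio5: [vdc] 15628050432 512-byte logical blocks",
    "[    4.470332] nas kernel: virtio_blk virtio4: [vdb] 15628050432 512-byte logical blocks",
    "[   10.113322] nas kernel: BTRFS: device label WD80EFAX devid 1 transid 272924 /dev/vdb",
]
for t, msg in timeline(lines):
    print(f"{t:10.6f}  {msg}")
```

Sorting makes the anomaly jump out: the vdb block device exists at ~4.5 s, but the kernel doesn't register it as a btrfs member until ~10.1 s, well after the mount attempt at ~9.1 s.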
Also, can you include a copy of /etc/fstab? I want to understand what assembly is supposed to be like and where all the UUIDs are supposed to end up.