Bug 1922485 - Outdated bcache-tools package results in R/O bcache devices under Linux Kernel >=5.10.8
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bcache-tools
Version: 33
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Rolf Fokkens
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-29 21:43 UTC by Mitja Lužar
Modified: 2021-02-17 05:08 UTC
CC List: 4 users

Fixed In Version: bcache-tools-1.1-0.fc33 bcache-tools-1.1-0.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-03 01:55:12 UTC
Type: Bug
Embargoed:



Description Mitja Lužar 2021-01-29 21:43:18 UTC
Description of problem:
The following patch was included in kernel version 5.10.8:
bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET

This patch forces bcache devices into read-only mode if the attached caching device was created with an outdated version of the bcache-tools package. The currently available version of bcache-tools (1.0.8-19.fc33) formats the caching device with an outdated layout, so the patchset that includes the above patch sets an incompatibility bit for such bcache devices.

Version-Release number of selected component (if applicable):
1.0.8-19.fc33

How reproducible:
Easily: any user attempting to create a bcache device on a fully updated Fedora 33 installation with the current version of bcache-tools will run into this issue.

Steps to Reproduce:
1. Install the bcache-tools package: 
dnf install bcache-tools

2. Prepare a set of test devices, example shown with file-backed virtual devices:
truncate -s 8G backing-dev.img
truncate -s 2G caching-dev.img
losetup --find --show backing-dev.img   # prints /dev/loop0
losetup --find --show caching-dev.img   # prints /dev/loop1

3. Using bcache-tools, prepare the backing device: 
make-bcache -B /dev/loop0

UUID:			faea7b67-e9eb-42d0-bcb0-cab597ad7e9a
Set UUID:		604327a5-fd26-48fb-8cef-e868eb51e48a
version:		1
block_size:		1
data_offset:		16

4. Prepare the caching device: 
make-bcache -C /dev/loop1

UUID:			646de313-09f1-4238-9863-a820f3080898
Set UUID:		2b21c11f-0210-4aa3-8858-444f65539b6b
version:		0
nbuckets:		4096
block_size:		1
bucket_size:		1024
nr_in_set:		1
nr_this_dev:		0
first_bucket:		1

5. Attach the caching device to the backing device, and verify that bcache returns a clean state for the device: 
echo "2b21c11f-0210-4aa3-8858-444f65539b6b" > /sys/block/bcache0/bcache/attach
cat /sys/block/bcache0/bcache/state
clean

6. Attempt to format the bcache device with a filesystem:
sudo mkfs.ext4 /dev/bcache0
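
The reproducer steps above can be partly scripted. Below is a minimal bash sketch, not part of bcache-tools: every function name is hypothetical, and SYSFS_ROOT is an assumed override (default /sys) so the helpers can be pointed at a fake directory tree. It pulls the Set UUID out of make-bcache output as formatted in step 4, performs the attach from step 5, and reads the generic block-layer `ro` attribute the kernel exposes for bcache0.

```shell
# Hypothetical helpers for the reproducer; none of these names come from bcache-tools.
# SYSFS_ROOT is an assumed override (default /sys) for testing against a fake tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the "Set UUID" value from make-bcache output read on stdin.
set_uuid_from_make_bcache() {
    awk '/^Set UUID:/ { sub(/^Set UUID:[ \t]*/, ""); print; exit }'
}

# Attach cache set $1 to bcache device $2 (e.g. bcache0), then print its state.
bcache_attach() {
    local uuid=$1 dev=$2
    echo "$uuid" > "$SYSFS_ROOT/block/$dev/bcache/attach" || return 1
    cat "$SYSFS_ROOT/block/$dev/bcache/state"
}

# Succeed if the kernel has flagged block device $1 read-only.
bcache_is_ro() {
    [ "$(cat "$SYSFS_ROOT/block/$1/ro" 2>/dev/null)" = "1" ]
}
```

Example use (as root): `SET_UUID=$(make-bcache -C /dev/loop1 | set_uuid_from_make_bcache)`, then `bcache_attach "$SET_UUID" bcache0`, then `bcache_is_ro bcache0 && echo "bcache0 is read-only"`. Checking `ro` right after the attach shows the problem before mkfs ever runs.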

Actual results:
Device is set to read-only mode from the moment a caching device is attached, preventing step 6 from completing correctly. The system log contains the following kernel messages:
Jan 29 20:41:00 Aurelian kernel: bcache: run_cache_set() Detect obsoleted large bucket layout, all attached bcache device will be read-only
Jan 29 20:41:00 Aurelian kernel: bcache: bch_cached_dev_attach() The obsoleted large bucket layout is unsupported, set the bcache device into read-only
Jan 29 20:41:00 Aurelian kernel: bcache: bch_cached_dev_attach() Please update to the latest bcache-tools to create the cache device
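
To spot this condition without reading the whole journal, a one-function bash sketch (the name is hypothetical) that matches the "obsoleted large bucket layout" wording shared by the first two messages above:

```shell
# Hypothetical helper: succeed if the kernel log text on stdin contains the
# bcache warning about the obsoleted large bucket layout quoted above.
bcache_obso_layout_reported() {
    grep -q 'obsoleted large bucket layout'
}
```

For example: `journalctl -k -b | bcache_obso_layout_reported && echo "bcache forced read-only on this boot"`. Matching on the message text rather than on a function name such as run_cache_set() keeps both the cache-set and the attach variants of the warning.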

Expected results:
Device is accessible in a R/W state without any additional configuration required. 

Additional info:
The issue can be fixed by packaging an up-to-date upstream version of bcache-tools (>=1.1). For devices that are already affected, the user must also detach the caching device, re-create it with the newer bcache-tools, and then re-attach it to the backing device.
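
A quick check of whether the installed package meets the >=1.1 requirement, as a bash sketch for an RPM-based system such as Fedora. The function names are hypothetical, and the version comparison relies on GNU `sort -V` (version ordering), which is an assumption about the coreutils in use.

```shell
# Succeed if version string $1 is at least $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: succeed if the installed bcache-tools is >= 1.1.
# Assumes an RPM-based system; fails if the package is not installed.
bcache_tools_new_enough() {
    local v
    v=$(rpm -q --qf '%{VERSION}\n' bcache-tools 2>/dev/null) || return 1
    version_ge "$v" 1.1
}
```

With the package from this report, `bcache_tools_new_enough` fails for 1.0.8 and succeeds once 1.1 is installed.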

Steps to fix this issue on an existing bcache device, assuming a newer version of bcache-tools is available on the system:
1. Unmount any R/O filesystem on the bcache device
umount /dev/bcache0

2. Detach the caching device from the backing device, and verify no cache is present:
echo "2b21c11f-0210-4aa3-8858-444f65539b6b" > /sys/block/bcache0/bcache/detach
cat /sys/block/bcache0/bcache/state
no cache

3. Stop the caching device:
echo "1" > /sys/fs/bcache/2b21c11f-0210-4aa3-8858-444f65539b6b/stop

4. Remove previous bcache filesystem signature from the caching device:
wipefs -a /dev/loop1
/dev/loop1: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81

5. Re-create the caching device (example for compiled bcache-tools-1.1 from upstream):
./make-bcache -C /dev/loop1
Name			/dev/loop1
Label			
Type			cache
UUID:			5bc7be3e-5d47-48df-b07f-c1510132b115
Set UUID:		0099ae5b-fb20-4326-88ae-fd72e9da5587
version:		0
nbuckets:		4096
block_size_in_sectors:	1
bucket_size_in_sectors:	1024
nr_in_set:		1
nr_this_dev:		0
first_bucket:		1
/dev/loop1 blkdiscard beginning...done

6. Re-attach caching device to backing device, and verify state:
echo "0099ae5b-fb20-4326-88ae-fd72e9da5587" > /sys/block/bcache0/bcache/attach
cat /sys/block/bcache0/bcache/state
clean

bcache show
Name		Type		State			Bname		AttachToDev
/dev/loop1	3 (cache)	active          	N/A             N/A
/dev/loop0	1 (data)	clean(running)  	bcache0         /dev/loop1

7. The bcache device is now in a R/W state with an attached caching device, operating as expected.
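
Because step 4 (wipefs) is destructive, the bash sketch below only prints the command sequence of steps 1-6 for review instead of running it. The function name is hypothetical; make-bcache is assumed to be the >=1.1 binary on PATH (substitute ./make-bcache for a locally compiled copy). The wait loop is an addition: detaching flushes dirty data first, so on a writeback cache the state may take a while to reach `no cache`.

```shell
# Hypothetical helper: print (not run) the recovery commands from steps 1-6.
# $1 = bcache device name (e.g. bcache0), $2 = old cache set UUID,
# $3 = caching device (e.g. /dev/loop1).
print_recreate_plan() {
    local bdev=$1 old_uuid=$2 cdev=$3
    cat <<EOF
umount /dev/$bdev
echo $old_uuid > /sys/block/$bdev/bcache/detach
until grep -qx 'no cache' /sys/block/$bdev/bcache/state; do sleep 1; done
echo 1 > /sys/fs/bcache/$old_uuid/stop
wipefs -a $cdev
new_uuid=\$(make-bcache -C $cdev | awk '/^Set UUID:/ { sub(/^Set UUID:[ \t]*/, ""); print; exit }')
echo "\$new_uuid" > /sys/block/$bdev/bcache/attach
cat /sys/block/$bdev/bcache/state
EOF
}
```

For the devices in this report: `print_recreate_plan bcache0 2b21c11f-0210-4aa3-8858-444f65539b6b /dev/loop1 > plan.sh`, read plan.sh, then run it with `sudo sh plan.sh`.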

Comment 1 Rolf Fokkens 2021-01-30 10:31:44 UTC
Thanks, will look into it.

I'm wondering about existing systems, though. I'm running 5.10.8-100 myself with bcache (I've just been upgrading for years), and have no issue.

Comment 2 Mitja Lužar 2021-01-30 12:24:49 UTC
Yep, you're right. I hit this when I booted into 5.10.10. Before that on 5.10.8 and 5.10.9 my bcache setup worked fine in R/W mode.

Comment 3 Fedora Update System 2021-01-30 15:59:54 UTC
FEDORA-2021-520b8b2830 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-520b8b2830

Comment 4 Rolf Fokkens 2021-01-30 20:20:53 UTC
Built a new bcache-tools (1.1); updating bcache-tools was about time anyway.

However, running kernel-5.10.10-200.fc33 does not reproduce the issue here:

bash-5.0$ cat /proc/version 
Linux version 5.10.10-200.fc33.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9), GNU ld version 2.35-18.fc33) #1 SMP Sun Jan 24 19:58:54 UTC 2021
bash-5.0$ dmesg -T | grep bch
[Sat Jan 30 21:07:00 2021] bcache: bch_journal_replay() journal replay done, 2316 keys in 139 entries, seq 27381981
[Sat Jan 30 21:07:00 2021] bcache: bch_cached_dev_attach() Caching sda2 as bcache0 on set ea4f255f-ef68-4ae4-ab88-f2c5441074a6
bash-5.0$

Comment 5 Rolf Fokkens 2021-01-30 20:21:57 UTC
More info: https://www.spinics.net/lists/kernel/msg3792041.html

Comment 6 Fedora Update System 2021-01-31 01:24:30 UTC
FEDORA-2021-520b8b2830 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-520b8b2830`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-520b8b2830

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 7 Mitja Lužar 2021-02-01 16:36:30 UTC
So, nearly three days after I updated bcache-tools and recreated the cache device, this happened again:
Feb 01 17:18:16 Aurelian kernel: bcache: bch_journal_replay() journal replay done, 166 keys in 17 entries, seq 96643
Feb 01 17:18:16 Aurelian kernel: bcache: run_cache_set() Detect obsoleted large bucket layout, all attached bcache device will be read-only
Feb 01 17:18:16 Aurelian kernel: bcache: register_cache() registered cache device sdb7
Feb 01 17:18:16 Aurelian kernel: bcache: register_bdev() registered backing device sda1
Feb 01 17:18:16 Aurelian kernel: bcache: bch_cached_dev_attach() The obsoleted large bucket layout is unsupported, set the bcache device into read-only
Feb 01 17:18:16 Aurelian kernel: bcache: bch_cached_dev_attach() Please update to the latest bcache-tools to create the cache device
Feb 01 17:18:16 Aurelian kernel: bcache: bch_cached_dev_attach() Caching sda1 as bcache0 on set 6a5c5432-851d-4f8c-9c73-760c494c9e27

The bcache setup was in writeback mode during this time, and it went through at least 10 mount/unmount cycles and reboots. I generated around 400GB of I/O, a 60/40 split between reads and writes. And now it's stuck in R/O mode again. The filesystem in question is BTRFS; no errors were reported during a check, and there were no I/O errors. This combination has worked solidly for me for the past year and a half, until last week when this started.

So it looks like I opened this bug report in error; something else is going haywire. I'm sorry. I have no idea what to try next at this point. Maybe recreate the backing device as well? Hell, it's going to take an entire day to sync my backups and send the data back. Might as well stop now and hope that BTRFS eventually supports tiered storage.

Comment 8 Rolf Fokkens 2021-02-02 08:30:57 UTC
Although not for you, in general I feel a sense of relief: I'm glad that not all bcache installations will run into this, since it could make systems unable to boot.

As a positive side effect, it's a good thing that bcache-tools has been upgraded.

Comment 9 Fedora Update System 2021-02-02 20:26:39 UTC
FEDORA-2021-b6dddd3cf8 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2021-b6dddd3cf8

Comment 10 Fedora Update System 2021-02-03 01:31:26 UTC
FEDORA-2021-b6dddd3cf8 has been pushed to the Fedora 32 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf install --enablerepo=updates-testing --advisory=FEDORA-2021-b6dddd3cf8 \*`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-b6dddd3cf8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 11 Fedora Update System 2021-02-03 01:55:12 UTC
FEDORA-2021-520b8b2830 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 12 Tomasz Torcz 2021-02-05 16:25:38 UTC
Erm, my server got hit by it. Do we have steps to recover when the system is unbootable (preferably ones that can be run from the dracut emergency shell)?

Comment 13 Rolf Fokkens 2021-02-06 12:03:19 UTC
I left quite an elaborate comment, but because Bugzilla did not handle this link, my comment was lost: https://marc.info/?l=linux-bcache&m=161261224608874&w=1 

So long story short: I cannot reproduce, the answer may be provided here: https://marc.info/?l=linux-bcache&m=161261224608874&w=1

Comment 14 Rolf Fokkens 2021-02-13 17:43:03 UTC
From the mail list:

-------- Forwarded Message --------
Subject: 	Re: bch_cached_dev_attach() The obsoleted large bucket layout is unsupported, set the bcache device into read-only
Date: 	Sun, 7 Feb 2021 23:29:34 +0800
From: 	Coly Li <colyli>
To: 	Rolf Fokkens <rolf>, linux-bcache.org

This is a regression and fixed in 5.11-rc6 by commit 0df28cad06eb
("bcache: only check feature sets when sb->version >=
BCACHE_SB_VERSION_CDEV_WITH_FEATURES").

Also the fix has been in stable kernels already last week. The fix
should go into distribution very soon IMHO.

Thanks.

Coly Li
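
The rule named in the commit title can be illustrated with a small bash predicate. This is a sketch of the condition only, not the kernel's C code; the value 5 for BCACHE_SB_VERSION_CDEV_WITH_FEATURES is an assumption taken from the kernel's on-disk bcache header and should be checked against the kernel source in question. Under this rule, superblocks older than the feature-set format are never reported as carrying the obsolete large-bucket bit, which would explain why the fix stops the spurious read-only switch.

```shell
# ASSUMPTION: value as defined in the kernel's bcache on-disk header.
BCACHE_SB_VERSION_CDEV_WITH_FEATURES=5

# Hypothetical predicate: succeed if a kernel with commit 0df28cad06eb would
# consult the feature sets of a superblock whose version is $1.
feature_sets_checked() {
    [ "$1" -ge "$BCACHE_SB_VERSION_CDEV_WITH_FEATURES" ]
}
```

For instance, `feature_sets_checked 0` fails, so a superblock at version 0 (the value make-bcache prints for the caching devices in this report, if that field is the on-disk superblock version) would have its feature bits ignored.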

Comment 15 Tomasz Torcz 2021-02-13 19:48:51 UTC
FWIW the patch fixing the regression got included in 5.10.13.
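
Taking the 5.10.8 lower bound from the title and 5.10.13 from this comment, a bash sketch that tells whether a running 5.10.y kernel falls in the window where the spurious check could trigger. As Comments 1, 2 and 4 show, being in the window does not mean a given setup is hit. Function names are hypothetical, only the 5.10.y series is modeled, and GNU `sort -V` is assumed.

```shell
# Succeed if version string $1 is at least $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: succeed if kernel release $1 (as printed by uname -r)
# is in 5.10.8 <= v < 5.10.13, the bounds taken from this report.
kernel_in_regression_window() {
    local v=${1%%-*}   # drop the "-200.fc33.x86_64" style suffix
    version_ge "$v" 5.10.8 && ! version_ge "$v" 5.10.13
}
```

For example, `kernel_in_regression_window "$(uname -r)" && echo "consider updating to 5.10.13 or later"` prints the hint on 5.10.10-200.fc33.x86_64 but not on 5.10.13.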

Comment 16 Fedora Update System 2021-02-17 05:08:29 UTC
FEDORA-2021-b6dddd3cf8 has been pushed to the Fedora 32 stable repository.
If problem still persists, please make note of it in this bug report.

