Bug 2005987 - btrfs device stats not showing raid1 errors
Summary: btrfs device stats not showing raid1 errors
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: fedora-kernel-btrfs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-20 15:42 UTC by Basic Six
Modified: 2022-06-07 23:41 UTC (History)
23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-07 23:41:35 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
drive details (9.30 KB, text/plain)
2021-09-21 18:41 UTC, Basic Six
log (267.26 KB, text/plain)
2021-09-21 18:45 UTC, Basic Six

Description Basic Six 2021-09-20 15:42:48 UTC
Description of problem:

This is a raid1 mirror for the root device, containing two drives. There's a monitoring job which checks "btrfs device stats", but it's not reporting the errors:

# btrfs dev stats / | grep -vwc 0
0

It prints nothing but zeroes.

# btrfs dev stats -c / | grep read && echo ALL OK
[/dev/sda3].read_io_errs     0
[/dev/nvme0n1p3].read_io_errs     0
ALL OK

However, one drive is failing, and btrfs did encounter read errors:

# dmesg | grep -wi btrfs
[2634355.709564] BTRFS info (device sda3): read error corrected: ino 27902168 off 8773632 (dev /dev/sda3 sector 52960104)
[2634355.733898] BTRFS info (device sda3): read error corrected: ino 27902168 off 8749056 (dev /dev/sda3 sector 52960056)
[2634355.777800] BTRFS info (device sda3): read error corrected: ino 27902168 off 8732672 (dev /dev/sda3 sector 52960024)
[2634355.778600] BTRFS info (device sda3): read error corrected: ino 27902168 off 8298496 (dev /dev/sda3 sector 52959176)
[2634355.802666] BTRFS info (device sda3): read error corrected: ino 27902168 off 8609792 (dev /dev/sda3 sector 52959784)
[2634355.804929] BTRFS info (device sda3): read error corrected: ino 27902168 off 8523776 (dev /dev/sda3 sector 52959616)
[2634355.805071] BTRFS info (device sda3): read error corrected: ino 27902168 off 8630272 (dev /dev/sda3 sector 52959824)
[2634355.805589] BTRFS info (device sda3): read error corrected: ino 27902168 off 8564736 (dev /dev/sda3 sector 52959696)

But it seems like "dev stats", the official tool for getting error reports, is broken. That's less than ideal, because at some point the second drive will fail as well, and then what?
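For context, the monitoring check boils down to something like the following sketch (the stats_have_errors helper and the non-zero sample value are hypothetical, for illustration; the counter format is the one shown above):

```shell
# Sketch of the monitoring check: treat any non-zero counter in
# "btrfs device stats" output as an error. The helper name and the
# non-zero sample value are made up for illustration.
stats_have_errors() {
    # Each stats line ends in a numeric counter; succeed if any is non-zero.
    awk '$NF != 0 { bad = 1 } END { exit (bad ? 0 : 1) }'
}

# Sample output in the format shown above, with an invented non-zero counter:
sample='[/dev/sda3].read_io_errs     8
[/dev/nvme0n1p3].read_io_errs     0'

if printf '%s\n' "$sample" | stats_have_errors; then
    echo "ALERT: btrfs reports device errors"
else
    echo "ALL OK"
fi
```

The problem is that on this system every counter reads zero even after the read errors above, so a check like this never fires.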

Version-Release number of selected component (if applicable):

btrfs-progs v5.11.1

Actual results:

The monitoring job doesn't detect the bad drive in the raid array. This may lead to data loss unless the admin manually checks dmesg regularly and catches the problem before the second drive fails as well.

Expected results:

If there's an official tool that prints out how many errors occurred, it should work. If the tool isn't reliable, there should be a warning that it's not reliable, and a working alternative must be provided.

Additional info:

I'm setting the severity to "high" as this bug can lead to fatal data loss.

Comment 1 Chris Murphy 2021-09-20 16:25:47 UTC
Could you provide a complete dmesg for this boot? You can use 'journalctl -b $boot -k' to get the dmesg of a previous boot, where $boot is a negative value from the list produced by --list-boots. Also, please report the kernel version you're using in the description.

There is an exception for errors detected during metadata readahead, i.e. they won't increment the dev stats counters. See commit 0cc068e6ee59 ("btrfs: don't report readahead errors and don't update statistics").

We should probably both count the error and fix it. I'll report this upstream. Since this error accounting code is in the kernel, and the counters are written to the file system's device tree, I'm changing the component to kernel.
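As a stopgap, a monitoring job could also scan the kernel log for btrfs error strings, since those show up even when the counters don't move. A rough sketch (the grep patterns are an assumption, based only on the messages quoted in this report):

```shell
# Stopgap sketch: scan kernel log lines for btrfs error/warning messages,
# including "read error corrected" lines that may not bump the counters.
# The patterns are an assumption based on the messages quoted in this report.
scan_kernel_log() {
    grep -E 'BTRFS (error|warning|info \(device [^)]*\): read error)'
}

# Illustrative input: one error line from this report plus one harmless line.
sample_log='[2634355.709564] BTRFS info (device sda3): read error corrected: ino 27902168 off 8773632 (dev /dev/sda3 sector 52960104)
kernel: BTRFS info (device sda3): has skinny extents'

printf '%s\n' "$sample_log" | scan_kernel_log
```

In practice this would read 'journalctl -k' output instead of the sample variable.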

Comment 2 Chris Murphy 2021-09-20 16:27:24 UTC
A complete dmesg is optional, but it might help narrow down whether this is readahead, and whether a device problem also shows up.

Comment 4 Chris Murphy 2021-09-20 16:43:45 UTC
And yes, actually it would be useful to see the complete dmesg, because the problems are against /dev/sda3, which is not a device-mapper device but a raw partition.

Could you also post the output from 'smartctl -i' (brief) and separately attach the output from 'smartctl -x' (long)?

Comment 5 Chris Murphy 2021-09-20 16:52:56 UTC
See also https://bugzilla.kernel.org/show_bug.cgi?id=202403

Comment 6 Basic Six 2021-09-21 18:41:00 UTC
> Could you provide a complete dmesg for this boot? You can use 'journalctl -b $boot -k' to get previous boot dmesg where $boot is a negative value from list produced by --list-boots.

Thanks for the quick response. The system is currently running kernel 5.11.22.

Do you really need the past journal output from before the last reboot ("negative value from [journalctl] --list-boots"), which wouldn't contain the errors yet?
As for btrfs/sda3, it would only contain generic messages like:

kernel: BTRFS: device label fedora00 devid 1 transid 18390651 /dev/sda3 scanned by systemd-udevd (519)
kernel: BTRFS info (device sda3): disk space caching is enabled
kernel: BTRFS info (device sda3): has skinny extents
kernel: BTRFS info (device sda3): enabling ssd optimizations
kernel: BTRFS info (device sda3): allowing degraded mounts
kernel: BTRFS warning (device sda3): excessive commit interval 600
kernel: BTRFS info (device sda3): disk space caching is enabled

> We should probably both count the error and fix the error.

Even if the error can't be fixed, it needs to be reported through "btrfs dev stats"; otherwise an admin wouldn't replace the drive in time. Either way, "dev stats" printing "0" is wrong in this case.

Back to the kernel log, which I can attach separately. I've found the elephant in the room, which I'd initially overlooked: there are clear signs that the drive is headed for a slow death.

[2634355.397578] ata3.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
[2634355.397596] ata3.00: failed command: READ FPDMA QUEUED
[2634355.397600] ata3.00: cmd 60/28:00:c8:32:68/00:00:03:00:00/40 tag 0 ncq dma 20480 in
                          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[2634355.397614] ata3.00: status: { DRDY }
[2634355.397618] ata3.00: failed command: READ FPDMA QUEUED
[2634355.397621] ata3.00: cmd 60/10:08:00:33:68/00:00:03:00:00/40 tag 1 ncq dma 8192 in
                          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[2634355.397633] ata3.00: status: { DRDY }
...
[2634355.398109] ata3: hard resetting link
[2634355.702064] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[2634355.708139] ata3.00: configured for UDMA/133
[2634355.708201] sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=30s
[2634355.708209] sd 2:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current] 
[2634355.708214] sd 2:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command
[2634355.708221] sd 2:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 03 68 32 c8 00 00 28 00
[2634355.708225] blk_update_request: I/O error, dev sda, sector 57160392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[2634355.708279] sd 2:0:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=30s
[2634355.708284] sd 2:0:0:0: [sda] tag#1 Sense Key : Illegal Request [current] 
[2634355.708288] sd 2:0:0:0: [sda] tag#1 Add. Sense: Unaligned write command
[2634355.708292] sd 2:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 03 68 33 00 00 00 10 00
[2634355.708295] blk_update_request: I/O error, dev sda, sector 57160448 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
...
[2634355.708642] blk_update_request: I/O error, dev sda, sector 57160576 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[2634355.708662] sd 2:0:0:0: [sda] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=30s
[2634355.708666] sd 2:0:0:0: [sda] tag#9 Sense Key : Illegal Request [current] 
[2634355.708670] sd 2:0:0:0: [sda] tag#9 Add. Sense: Unaligned write command
[2634355.708674] sd 2:0:0:0: [sda] tag#9 CDB: Read(10) 28 00 03 68 33 a0 00 00 28 00
[2634355.708676] blk_update_request: I/O error, dev sda, sector 57160608 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[2634355.709011] ata3: EH complete
[2634355.709564] BTRFS info (device sda3): read error corrected: ino 27902168 off 8773632 (dev /dev/sda3 sector 52960104)

I've seen similar i/o errors on rented dedicated servers.
But in this case, the SSD is apparently defective, so I'll have it replaced. It's a bit strange that a WD SSD is already having i/o errors after only two years (while some other SSDs have been working fine for six years now). It's definitely not overheating, even though the fans were running at low speed:
# hddtemp /dev/sda
/dev/sda: WDC WDS240G2G0B-00EPW0: 35°C

But I'm not here to complain about WD. The only problem is that btrfs isn't reporting that a drive is bad. It was set up as a btrfs mirror "just in case", even though the load is minimal. Had it been set up as a single drive, the failure would've led to corruption that can't be repaired, but even then btrfs should report it in the same way. Had it been set up as a single ext4 filesystem, there wouldn't be any reporting tool other than smartctl and the i/o error messages in dmesg.

Comment 7 Basic Six 2021-09-21 18:41:28 UTC
Created attachment 1825059 [details]
drive details

Comment 8 Basic Six 2021-09-21 18:45:13 UTC
Created attachment 1825060 [details]
log

Comment 9 Chris Murphy 2021-09-21 20:00:40 UTC
What is the make/model of the drive? 'smartctl -i' will have this information. We have some prior reports of the ST2000DM008 reporting similar unaligned write errors, and they seem bogus, but we haven't figured out whether it's some kind of firmware bug or what the nature of it is. But yes, it could be early failure. Speculation: the drive firmware may see internal errors (corruption or checksum failures), and its own error handling may have a bug that causes it to report an unaligned write error.

Comment 10 Basic Six 2021-09-22 15:27:13 UTC
The smartctl output can be found in comment #7:
https://bugzilla.redhat.com/show_bug.cgi?id=2005987#c7

It's an SSD: WDS240G2G0B

Let me know if there's anything missing there.

(There are regular magnetic drives in the system as well, so I'm curious which firmware errors you mean, or rather, how to distinguish between a firmware bug and a drive that's actually faulty.)

