Bug 2374194 - libblockdev 3.3.1 update can cause gnome-disks error "DISK IS LIKELY TO FAIL SOON" despite no failing pre-fail SMART attributes
Summary: libblockdev 3.3.1 update can cause gnome-disks error "DISK IS LIKELY TO FAIL SOON" despite no failing pre-fail SMART attributes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libblockdev
Version: 42
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Vojtech Trefny
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2025-06-22 06:50 UTC by Andre Robatino
Modified: 2025-06-25 01:19 UTC
CC List: 12 users

Fixed In Version: libblockdev-3.3.1-2.fc42
Clone Of:
Environment:
Last Closed: 2025-06-25 01:19:20 UTC
Type: ---
Embargoed:


Attachments
output of "smartctl -a /dev/sda" from 2020-11-11 (5.10 KB, text/plain)
2025-06-22 06:52 UTC, Andre Robatino
output of "smartctl -a /dev/sda" from 2025-06-22 (5.30 KB, text/plain)
2025-06-22 06:53 UTC, Andre Robatino
output of "smartctl -a /dev/sda" from 2025-06-22 after additional extended self-test (5.42 KB, text/plain)
2025-06-22 14:28 UTC, Andre Robatino
smartctl -x /dev/sda at 16:31 (6.82 KB, text/plain)
2025-06-22 14:57 UTC, Jan Vlug
smartctl -x /dev/sda at 16:46 (27.00 KB, text/plain)
2025-06-22 14:57 UTC, Jan Vlug
output of "smartctl -a /dev/sda" from 2025-06-21 (udisks2-2.10.90-2) (4.92 KB, text/plain)
2025-06-22 15:49 UTC, ThumpnVTwin

Description Andre Robatino 2025-06-22 06:50:12 UTC
During the latest dnf update, which includes the udisks2-2.10.90-3 packages, just before getting the prompt back I got a GNOME notification about my disk being likely to fail soon. Looking at gnome-disks, the Overall Assessment is now "DISK IS LIKELY TO FAIL SOON". However, looking at the actual SMART data, there is no significant change. To be safe I immediately backed up my data and ran short and conveyance tests, which both passed. I am also running an extended test to be sure, but don't expect any problem. I will attach the output of "smartctl -a /dev/sda" from November 2020 and from today for comparison; there is no significant difference. Note that the 6 bad sectors were there when I tested the HDD for the first time after installing it in 2014, so it apparently came that way. I know that the HDD is old, but that's just an old-age attribute.

Reproducible: Always

Comment 1 Andre Robatino 2025-06-22 06:52:22 UTC
Created attachment 2094652 [details]
output of "smartctl -a /dev/sda" from 2020-11-11

Comment 2 Andre Robatino 2025-06-22 06:53:36 UTC
Created attachment 2094653 [details]
output of "smartctl -a /dev/sda" from 2025-06-22

Comment 3 Andre Robatino 2025-06-22 06:55:29 UTC
Comment on attachment 2094653 [details]
output of "smartctl -a /dev/sda" from 2025-06-22

>smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.14.11-300.fc42.x86_64] (local build)
>Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
>
>=== START OF INFORMATION SECTION ===
>Model Family:     Western Digital Green
>Device Model:     WDC WD20EZRX-00DC0B0
>Serial Number:    WD-WMC1T1537375
>LU WWN Device Id: 5 0014ee 6030995a4
>Firmware Version: 80.00A80
>User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>Sector Sizes:     512 bytes logical, 4096 bytes physical
>Device is:        In smartctl database 7.5/5706
>ATA Version is:   ACS-2 (minor revision not indicated)
>SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
>Local Time is:    Sun Jun 22 02:44:15 2025 EDT
>SMART support is: Available - device has SMART capability.
>SMART support is: Enabled
>
>=== START OF READ SMART DATA SECTION ===
>SMART overall-health self-assessment test result: PASSED
>
>General SMART Values:
>Offline data collection status:  (0x85)	Offline data collection activity
>					was aborted by an interrupting command from host.
>					Auto Offline Data Collection: Enabled.
>Self-test execution status:      ( 249)	Self-test routine in progress...
>					90% of test remaining.
>Total time to complete Offline 
>data collection: 		(26640) seconds.
>Offline data collection
>capabilities: 			 (0x7b) SMART execute Offline immediate.
>					Auto Offline data collection on/off support.
>					Suspend Offline collection upon new
>					command.
>					Offline surface scan supported.
>					Self-test supported.
>					Conveyance Self-test supported.
>					Selective Self-test supported.
>SMART capabilities:            (0x0003)	Saves SMART data before entering
>					power-saving mode.
>					Supports SMART auto save timer.
>Error logging capability:        (0x01)	Error logging supported.
>					General Purpose Logging supported.
>Short self-test routine 
>recommended polling time: 	 (   2) minutes.
>Extended self-test routine
>recommended polling time: 	 ( 269) minutes.
>Conveyance self-test routine
>recommended polling time: 	 (   5) minutes.
>SCT capabilities: 	       (0x70b5)	SCT Status supported.
>					SCT Feature Control supported.
>					SCT Data Table supported.
>
>SMART Attributes Data Structure revision number: 16
>Vendor Specific SMART Attributes with Thresholds:
>ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       4
>  3 Spin_Up_Time            0x0027   176   176   021    Pre-fail  Always       -       6175
>  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       141
>  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       6
>  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       96432
> 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
> 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
> 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       141
>192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       84
>193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       225
>194 Temperature_Celsius     0x0022   106   101   000    Old_age   Always       -       44
>196 Reallocated_Event_Count 0x0032   194   194   000    Old_age   Always       -       6
>197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
>198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
>199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>
>SMART Error Log Version: 1
>No Errors Logged
>
>SMART Self-test log structure revision number 1
>Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
># 1  Conveyance offline  Completed without error       00%     30896         -
># 2  Short offline       Completed without error       00%     30896         -
># 3  Extended offline    Completed without error       00%     42525         -
># 4  Short offline       Completed without error       00%     42518         -
># 5  Short offline       Completed: read failure       90%     42496         3299402936
># 6  Extended offline    Completed: read failure       90%     42494         3299402936
># 7  Extended offline    Completed without error       00%        41         -
># 8  Short offline       Completed without error       00%        33         -
># 9  Conveyance offline  Completed without error       00%        33         -
>2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 3
>
>SMART Selective self-test log data structure revision number 1
> SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>    1        0        0  Not_testing
>    2        0        0  Not_testing
>    3        0        0  Not_testing
>    4        0        0  Not_testing
>    5        0        0  Not_testing
>Selective self-test flags (0x0):
>  After scanning selected spans, do NOT read-scan remainder of disk.
>If Selective self-test is pending on power-up, resume after 0 minute delay.
>
>The above only provides legacy SMART information - try 'smartctl -x' for more
>

Comment 4 bez.powell 2025-06-22 13:06:23 UTC
Just reporting that this seems to be affecting a number of users: https://www.reddit.com/r/Fedora/comments/1lhj5bj/guys_am_i_cooked/

I received the warning when I started my system (Fedora Linux 42.20250622.0 (Silverblue)) up this afternoon (2025-06-22). The 'DISK IS LIKELY TO FAIL SOON' warning is being shown for both drives in a btrfs RAID 1 array. Running `smartctl -H` for both drives returns `SMART overall-health self-assessment test result: PASSED` and, as for Andre, the full report shows both drives to be healthy.

Comment 5 Fabio Valentini 2025-06-22 13:35:40 UTC
Seeing this too on one of my machines.

Comment 6 otheos 2025-06-22 13:37:08 UTC
(In reply to Andre Robatino from comment #2)
> Created attachment 2094653 [details]
> output of "smartctl -a /dev/sda" from 2025-06-22

Run a self-test and post the output again. Your drive has had failures before and has at least 6 reallocated sectors.

Do ```sudo smartctl -t short /dev/sda``` and after 5 minutes or so, do ```sudo smartctl -a /dev/sda``` and post the new output again.

Comment 7 Andre Robatino 2025-06-22 14:25:21 UTC
(In reply to otheos from comment #6)
> (In reply to Andre Robatino from comment #2)
> > Created attachment 2094653 [details]
> > output of "smartctl -a /dev/sda" from 2025-06-22
> 
> Run a self test and post the output again. You drive has had failures before
> and has at least 6 reallocated sectors.
> 
> Do ```sudo smartctl -t short /dev/sda``` and after 5 minutes or so, do
> ```sudo smartctl -a /dev/sda``` and post the new output again.

As explained above, I had already done a short self-test and a conveyance self-test before posting the latest smartctl output, and was in the middle of an extended self-test, which took hours, so I went to bed. After waking up, I found that it had finished successfully. I am posting the new smartctl output for today, after the extended test finished.

Comment 8 Andre Robatino 2025-06-22 14:28:49 UTC
Created attachment 2094661 [details]
output of "smartctl -a /dev/sda" from 2025-06-22 after additional extended self-test

This version of smartctl_20250622.txt is after a successful short, conveyance, AND extended self-test. The previous version was only after the short and conveyance tests. (I was in the middle of running the extended self-test before.)

Comment 9 Jan Vlug 2025-06-22 14:57:03 UTC
Created attachment 2094662 [details]
smartctl -x /dev/sda at 16:31

Comment 10 Jan Vlug 2025-06-22 14:57:32 UTC
Created attachment 2094663 [details]
smartctl -x /dev/sda at 16:46

Comment 11 Jan Vlug 2025-06-22 14:59:01 UTC
I see this issue as well. I attached the outcome of two runs of:

smartctl -x /dev/sda

Comment 12 Andre Robatino 2025-06-22 15:09:59 UTC
During my extended test I had downgraded to 2.10.90-2, but there was no change even after the extended test finished, and even after rebooting, so I finally just gave up and upgraded udisks2 back to the current version. The machine is dual-boot with Windows 11 23H2, which says the disk is healthy, as usual.

Comment 13 ThumpnVTwin 2025-06-22 15:28:21 UTC
(In reply to Andre Robatino from comment #12)
> During my extended test I had downgraded to 2.10.90-2, but no change even
> after the extended test finished, and even after rebooting, so I finally
> just gave up and upgraded udisks2 back to the current version. The machine
> is dual-boot with Windows 11 23H2, which says the disk is healthy, as usual.

Not surprising: this started happening to me on version 2.10.90-2 (not saying it is or isn't related to this component; I don't know). I made a report about it on Reddit's r/Fedora yesterday. https://old.reddit.com/r/Fedora/comments/1lgz7u2/fedora_reporting_drive_going_to_fail_soon/

I just got the update to 2.10.90-3 this morning. 

I haven't gotten the warning, though, since my SSD has been kept below maybe 60 degrees C with both 2.10.90-2 and -3. Maybe just a coincidence, but something of interest.

Comment 14 Andre Robatino 2025-06-22 15:43:33 UTC
BTW, my HDD is VERY old; I originally installed it in 2014. The Power-On Hours normalized attribute dropped by one every month until it eventually got to 1, then stayed there. But that's just an old-age attribute and shouldn't trigger a pre-fail warning. Looking at other people's smartctl output, the error is happening even with much newer drives, so I don't think age is the cause. The 6 bad sectors were there shortly after I installed the HDD on 2014-06-11 and haven't changed; I'm guessing the manufacturer failed to zero out the number before shipping it.

Comment 15 ThumpnVTwin 2025-06-22 15:49:58 UTC
Created attachment 2094664 [details]
output of "smartctl -a /dev/sda" from 2025-06-21 (udisks2-2.10.90-2)

This is my output from yesterday with udisks2-2.10.90-2, which also reported "DISK IS LIKELY TO FAIL SOON". Mentioned in my previous post.

Comment 16 Andre Robatino 2025-06-22 16:12:54 UTC
Unfortunately, I don't remember whether I looked at gnome-disks in the last few days, before upgrading to -3. I was alerted by a GNOME notification while the upgrade to -3 was finishing up, but it's possible that gnome-disks was showing the error before and I missed it. In that case a different component would be responsible.

Comment 17 ThumpnVTwin 2025-06-22 16:23:43 UTC
(In reply to Andre Robatino from comment #16)
> Unfortunately, I don't remember whether I looked at gnome-disks in the last
> few days, before upgrading to -3. I was alerted by a GNOME notification
> while the upgrade to -3 was finishing up, but it's possible that maybe
> gnome-disks was showing the error before and I missed it. In that case a
> different component would be responsible.

It's possible you missed it. In my case the notification popped up and disappeared; Fedora wasn't showing it under notifications, but it was there in gnome-disks.

Some are saying it's correct, though, and my SSD is near end of life because of power-on hours (20K), TBW (50, but this SSD is rated for 180), and Percent_Lifetime_Remaining showing a raw value of 20. It's unusual that many people suddenly started getting this warning, though.

Comment 18 Andre Robatino 2025-06-22 16:54:48 UTC
AIUI, Power-On Hours is only an old-age attribute and shouldn't trigger a pre-fail warning even if it hits the threshold (and mine stopped dropping monthly as soon as it got to 1, so it will never hit the threshold). And from your smartctl output, Percent_Lifetime_Remaining is also old-age only.
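
For reference, the pre-fail/old-age distinction is encoded in bit 0 of each attribute's flags word, which smartctl shows in the FLAG column and summarizes in the TYPE column. A minimal sketch (not udisks/libblockdev code, just illustrating the standard ATA SMART flag layout):

```c
/* Sketch: classify a SMART attribute as Pre-fail vs Old_age from its flags.
 * Bit 0 (0x0001) set = pre-failure attribute; clear = old-age/advisory.
 * Matches the attribute table quoted earlier: 0x0033 (Reallocated_Sector_Ct)
 * is Pre-fail, 0x0032 (Power_On_Hours) is Old_age. */
#include <stdio.h>

static const char *attr_type(unsigned int flags)
{
    return (flags & 0x0001) ? "Pre-fail" : "Old_age";
}

int main(void)
{
    printf("0x0033 -> %s\n", attr_type(0x0033));  /* Reallocated_Sector_Ct */
    printf("0x0032 -> %s\n", attr_type(0x0032));  /* Power_On_Hours */
    return 0;
}
```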

Comment 19 ThumpnVTwin 2025-06-22 20:31:49 UTC
I've been using the laptop all day hoping it would throw the warning again ("DISK IS LIKELY TO FAIL SOON"), but it hasn't. The idea was to pull a report directly afterward, meaning not after running a test (smartctl -t short). I don't know what to make of all this. I suspect it's not udisks but another component.

Comment 20 pg_tips 2025-06-22 21:08:51 UTC
Is it possible that the recent update to libblockdev (to 3.3.1) is implicated? (Someone in the Reddit thread also suggested libblockdev as being relevant.)

The commit https://src.fedoraproject.org/rpms/libblockdev/c/2946a1a99f32ad2f63e0093e5020bdd03a3d6ed8?branch=rawhide removed an apparently SMART-related patch that was in the previous version:

- # https://issues.redhat.com/browse/RHEL-80620
- Patch1:      libatasmart-overall_drive_self-assessment.patch

Comment 21 pg_tips 2025-06-22 21:24:31 UTC
Here's the patch that was removed in libblockdev 3.3.1: https://src.fedoraproject.org/rpms/libblockdev/blob/rawhide/f/libatasmart-overall_drive_self-assessment.patch

The header describes the rationale for the patch:

---
The libatasmart attribute overall status differs slightly from
the drive SMART self-assessment and is very sensitive for particular
status values. Such status should fit more like a pre-fail warning,
no reason to fail hard the global assessment. Even a single reallocated
sector would cause a warning, while the drive could be quite healthy
otherwise.
---

So it was specifically designed to prevent the failure warnings reported by libatasmart, which (in the patch author's opinion) were overly sensitive.

This would explain how some disks (for example those with a small non-zero number of reallocated sectors) would report an OK SMART status under the patched libblockdev 3.3.0, but a failure status under libblockdev 3.3.1-1.
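
To illustrate the distinction the patch is about, here is a minimal sketch against the stock libatasmart API (not the actual libblockdev code; the build command is an assumption):

```c
/* Minimal sketch: compare the drive's own SMART self-assessment (what
 * "smartctl -H" reports) with libatasmart's stricter per-attribute verdict.
 * The dropped downstream patch made libblockdev report the latter as a
 * warning instead of a hard failure.
 * Build (assumed): gcc demo.c $(pkg-config --cflags --libs libatasmart) */
#include <stdio.h>
#include <atasmart.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/sda";
    SkDisk *d = NULL;
    SkBool good;
    SkSmartOverall overall;

    if (sk_disk_open(dev, &d) < 0 || sk_disk_smart_read_data(d) < 0) {
        perror("libatasmart");
        return 1;
    }

    /* The drive's own self-assessment: PASSED on the disks in this report. */
    if (sk_disk_smart_status(d, &good) == 0)
        printf("drive self-assessment: %s\n", good ? "PASSED" : "FAILED");

    /* libatasmart's per-attribute overall status: per the patch header, even
     * a single reallocated sector is enough to make this non-GOOD, which the
     * unpatched libblockdev 3.3.1-1 surfaced as "DISK IS LIKELY TO FAIL SOON". */
    if (sk_disk_smart_get_overall(d, &overall) == 0)
        printf("libatasmart overall:   %s\n", sk_smart_overall_to_string(overall));

    sk_disk_free(d);
    return 0;
}
```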

It looks like 3.3.1-2 (available in Rawhide now) restores the patch and would bring back the previous behaviour: https://src.fedoraproject.org/rpms/libblockdev/c/c3a88ad70a91b2ed89ebdd6c0d727c7d45ba7c8a?branch=rawhide

Comment 22 Andre Robatino 2025-06-22 21:48:25 UTC
You beat me to it. I just applied the last few days' worth of updates to my other machines to see if the same bug affects them (it doesn't, despite one of them having an almost identical HDD), and I noticed libblockdev ( https://bodhi.fedoraproject.org/updates/FEDORA-2025-af7ba2696c ) from June 18 among the updates. Just downgrading the libblockdev packages didn't make the error go away on the affected machine, but when I also downgraded udisks2 it went away. Upgrading udisks2 again (but not libblockdev), the error doesn't come back, so it does indeed appear to be the fault of libblockdev.

Comment 23 Fedora Update System 2025-06-23 06:04:20 UTC
FEDORA-2025-21e1b83a45 (libblockdev-3.3.1-2.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-21e1b83a45

Comment 24 Vojtech Trefny 2025-06-23 06:19:19 UTC
(In reply to pg_tips from comment #21)
> Here's the patch that was removed in libblockdev 3.3.1:

Yes, I was too focused on fixing the CVE as soon as possible and didn't notice the patch being removed by the packit build. Sorry about that. The fixed version is now in Bodhi.

Comment 25 Tomáš Bžatek 2025-06-23 10:35:12 UTC
Interesting: udisks2-2.10.90-1.fc42 was built on 2024-10-02 with the change you're all seeing, and it was fixed only in libblockdev-3.3.0-3.fc42 on 2025-03-20. There was a half-year window and nobody seemed to notice the bug until now.

Comment 26 Andre Robatino 2025-06-23 14:44:03 UTC
That period was when 42 was either Rawhide or Branched. Many/most people, including me, only run those in a VM. The trigger seems to be the presence of bad sectors, which might only be seen on bare metal. (Of my 3 machines, the one with bad sectors was the only one affected. For people using SSDs there might be other triggers.)

Comment 27 Tomáš Bžatek 2025-06-23 15:02:24 UTC
(In reply to Andre Robatino from comment #26)
> That period was when 42 was either Rawhide or Branched. Many/most people
> including me only run those in a VM. The trigger seems to be the presence of
> bad sectors, which might only be seen on bare metal. (Of my 3 machines, the
> one with bad sectors was the only one affected. For people using SSDs there
> might be other triggers.)

Correct, it was early rawhide back then. Still, given the severity of the issue, we were hoping to get at least some feedback and some testing before the final F42 release - and that's the primary point of all the beta phases.

Comment 28 Fabio Valentini 2025-06-23 15:12:01 UTC
Nope, this issue was caused by 3.3.1, by dropping a downstream patch (which has now been added back):
https://src.fedoraproject.org/rpms/libblockdev/c/ec3a6f16e6701eb44ce1101f26aca3c9f94cdc38?branch=f42
- which was only 5 days ago, *not* during the early F42 development cycle.

Comment 29 Tomáš Bžatek 2025-06-23 15:16:45 UTC
(In reply to Fabio Valentini from comment #28)
> Nope, this issue was caused by 3.3.1, by dropping a downstream patch (which
> has now been added back):

The downstream patch in question was added in mid-March; the bug was there between October and March and only reappeared recently.

Comment 30 bez.powell 2025-06-23 18:46:07 UTC
(In reply to pg_tips from comment #21)
> This would explain how some disks (for example those with a small non-zero
> number of reallocated sectors) would report an OK SMART status under the
> patched libblockdev 3.3.0, but a failure status under libblockdev 3.3.1-1.

That's really interesting. I've run a full report from smartctl on my two affected drives: sda1 has 1 reallocated sector, sdb1 has 6. I'll keep an eye on both of them over the next few months but, like Andre, suspect that they've been that way since I bought them. Nice to know that the likely cause of the warning isn't something I need to be immediately concerned about, however!



(In reply to Tomáš Bžatek from comment #29)
> (In reply to Fabio Valentini from comment #28)
> > Nope, this issue was caused by 3.3.1, by dropping a downstream patch (which
> > has now been added back):
> 
> The downstream patch in question was added in mid-March, the bug was there
> between October and March and only reappeared recently.

I upgraded my Silverblue installation to 42 at the end of April, so probably just missed spotting the bug when it first appeared?

Comment 31 Fedora Update System 2025-06-24 02:00:55 UTC
FEDORA-2025-21e1b83a45 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-21e1b83a45`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-21e1b83a45

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 32 Fedora Update System 2025-06-25 01:19:20 UTC
FEDORA-2025-21e1b83a45 (libblockdev-3.3.1-2.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make note of it in this bug report.

