Bug 830910

Summary:

udisks2.service prevents drive from ever spinning down due to constant polling for SMART data.

Product:

[Fedora] Fedora

Reporter:

Reilly Hall <sly.midnight>

Component:

udisks2

Assignee:

David Zeuthen <davidz>

Status:

CLOSED WORKSFORME

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

CC:

davidz, floydbarber, mads, mclasen

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2012-10-04 14:56:03 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
smartctl all output /dev/sdb	none

Description Reilly Hall 2012-06-11 16:03:24 UTC

Description of problem:
This started happening after an upgrade from F16 to F17 via preupgrade:
In /var/log/messages:
Jun 11 11:38:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 11:48:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)


Version-Release number of selected component (if applicable):
udisks2-1.94.0-5.fc17.x86_64
libudisks2-1.94.0-5.fc17.x86_64
udisks-1.0.4-6.fc17.x86_64

How reproducible:
Everytime I boot the machine the udisks2.service is re-enabled even if I disable it!

Steps to Reproduce:
1. Upgrade from F16 to F17 via preupgrade
2. Boot into F17 and update to latest via yum update
3. Watch these messages appear in log and prevent drive from going into standby as it used to when monitored solely via smartd.
  
Actual results:
My main concern apart from the constant spamming of the system log file is that the drive is set via a call to hdparm from a script at boot time to allow the drive to spin itself down after roughly 20 minutes of idle time (it is a drive I use for VM images in KVM that I don't actually use all that often and would prefer it to be in standby the bulk of the time).  This constant polling roughly every 10 minutes prevents the drive from ever going to standby.  I used to monitor the drive (and the system OS drive) via smartd with specific instructions that if the drive were to be found to be asleep, do not query the drive as that would wake it.  The drive would go to sleep because smartd's default check interval was every 30 minutes, and after 20 the drive would have spun down.

This no longer works due to this new daemon I didn't previously have and cannot just remove as other packages depend on it apparently. Abd disabling is also something that gets overridden.


Expected results:
smartd to continue monitoring the drives the exact way I used to that worked perfectly for me, and this service to either respect spun down drives or ignore them if already monitored via smartd.


Additional info:
Configuration of smartd.conf that prevents smartd from performing a query that would otherwise wake a spundown drive:
/dev/sda -d sat -H -i 198 -I 198 -m root -M daily
/dev/sdb -d sat -n standby,q -H -l error -l selftest -f -t -i 194 -I 194 -W 5,40,50 -m root -M daily

Comment 1 David Zeuthen 2012-06-11 16:42:12 UTC

(In reply to comment #0)
> Jun 11 11:38:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
> Jun 11 11:48:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)

This is because libatasmart cannot determine the self-assessment of the disk drive. Which is either because of broken firmware in the driver or one of the translations layers inbetween (or a bug in libatasmart)... it's important to log such a condition (which is why udisks does that).

> How reproducible:
> Everytime I boot the machine the udisks2.service is re-enabled even if I
> disable it!

Just FYI the udisks2.service cannot be disabled (it can however be masked).

> 3. Watch these messages appear in log and prevent drive from going into
> standby as it used to when monitored solely via smartd.

So the problem is really that the system logger is waking up the disk?

Honestly, I fail to see how this is a bug with udisks2 - I mean, we detect a problem and we log it to the system logger because we think it's important. I can see how this is annoying, sure, but the problem is either with your hardware causing this error OR the system logger not having enough smarts to avoid waking up the disk.

Yes, we could "fix" this in udisks2 by simply not logging or only logging once but that is not the right fix...

Comment 2 David Zeuthen 2012-06-11 16:44:25 UTC

Btw, changing the sleep timer in the disk to be < 10 minutes should prevent udisksd from sending any SMART commands to the disk. Might be worth a shot.

Comment 3 Reilly Hall 2012-06-11 16:50:51 UTC

Created attachment 590985 [details]
smartctl all output /dev/sdb

Here is what smartctl -a /dev/sdb sees from the drive.  I see no mention of any error and smartd has been monitoring this drive for a few years now no problem.  This must then be a bug in libata (should I change this bug to reflect that maybe?)

Also, it is not the system logger waking up the drive, as the logs do not sit on /dev/sdb1 but rather on the root volume /dev/sda which is an SSD that doesn't spin anyhow.

I believe it's the constant attempt to poll the drive that is keeping it awake.

However, I will try your suggestion to get the drive to sleep sooner than 10 minutes.  If that works, I will just close this bug...but it is worth noting that while I agree with you that an error polling the drive's health self assessment is important and should be logged, why then can smartd check the drive just fine?

Comment 4 David Zeuthen 2012-06-11 17:18:48 UTC

(In reply to comment #3)
> Also, it is not the system logger waking up the drive, as the logs do not
> sit on /dev/sdb1 but rather on the root volume /dev/sda which is an SSD that
> doesn't spin anyhow.

OK, good to know.

> I believe it's the constant attempt to poll the drive that is keeping it
> awake.

Could be... even if there was no error from reading SMART, udisksd doing this every ten minutes could reset the sleep timer so if the sleep timer is bigger than ten minutes it will never go to sleep...

(I've seen disks where reading SMART data didn't reset the timer and I've seen disks where it did... the way I read the specs is that it's vendor-dependent.)

Here's what you can try to check this

 - systemctl stop udisks2.service
 - make smartd check every ten minutes

Does this keep the disks spun up? If not, then a bug-fix in libatasmart should fix this problem...

> However, I will try your suggestion to get the drive to sleep sooner than 10
> minutes.  If that works, I will just close this bug...but it is worth noting
> that while I agree with you that an error polling the drive's health self
> assessment is important and should be logged, why then can smartd check the
> drive just fine?

smartd and libatasmart are different code-bases and have different bugs...

Comment 5 Reilly Hall 2012-06-11 17:41:09 UTC

I have an excellent update. I tried your suggestion of using a shorter timeout to see if udisksd would respect a sleeping disk and leave it that way...and it apparently does.

I forced the disk to standby using hdparm -y /dev/sdb. I however left udisks2.service running intentionally to see if it would still attempt to poll the drive. And apparently it DOES check for a sleeping drive and ignores it.

Here are the last few lines of my system log:

Jun 11 12:30:04 Kentsfield smartd[591]: Device: /dev/sdb [SAT], Temperature 43 Celsius reached limit of 40 Celsius (Min/Max 43/44)
Jun 11 12:38:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 12:48:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 13:37:02 Kentsfield nx: Mon Jun 11 13:37:02 EDT 2012

And an hdparm -C /dev/sdb still shows the drive as being in a standby state.

As you can see, the last 2 attempts to check the drive were at 12:38:22 and 12:48:22, but shortly thereafter was when I manually forced the drive asleep and then you can see that as of 13:37:02 there haven't been any more checks.

This is with BOTH udisks2.service and smartd.service running and configured to poll that drive.

So I've taken the liberty to modify my hdparm command in /etc/rc.local to a MUCH shorter timeout.

Thanks again for the suggestion!

Comment 6 David Zeuthen 2012-10-04 14:56:03 UTC

OK, I think we can close this bug now. Thanks.