Bug 830910
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | udisks2.service prevents drive from ever spinning down due to constant polling for SMART data. | | |
| Product: | [Fedora] Fedora | Reporter: | Reilly Hall <sly.midnight> |
| Component: | udisks2 | Assignee: | David Zeuthen <davidz> |
| Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Priority: | unspecified |
| Version: | 17 | CC: | davidz, floydbarber, mads, mclasen |
| Hardware: | x86_64 | OS: | Unspecified |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2012-10-04 14:56:03 UTC | | |
Description
Reilly Hall
2012-06-11 16:03:24 UTC
(In reply to comment #0)
> Jun 11 11:38:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
> Jun 11 11:48:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)

This is because libatasmart cannot determine the self-assessment of the disk drive, which is either because of broken firmware in the driver or one of the translation layers in between (or a bug in libatasmart)... it's important to log such a condition (which is why udisks does that).

> How reproducible:
> Everytime I boot the machine the udisks2.service is re-enabled even if I
> disable it!

Just FYI, the udisks2.service cannot be disabled (it can, however, be masked).

> 3. Watch these messages appear in log and prevent drive from going into
> standby as it used to when monitored solely via smartd.

So the problem is really that the system logger is waking up the disk? Honestly, I fail to see how this is a bug with udisks2. I mean, we detect a problem and we log it to the system logger because we think it's important. I can see how this is annoying, sure, but the problem is either with your hardware causing this error OR the system logger not having enough smarts to avoid waking up the disk. Yes, we could "fix" this in udisks2 by simply not logging or only logging once, but that is not the right fix...

Btw, changing the sleep timer in the disk to be < 10 minutes should prevent udisksd from sending any SMART commands to the disk. Might be worth a shot.

Created attachment 590985
smartctl all output /dev/sdb
Here is what smartctl -a /dev/sdb sees from the drive. I see no mention of any error, and smartd has been monitoring this drive for a few years now with no problems. This must then be a bug in libata (should I change this bug to reflect that, maybe?)
Also, it is not the system logger waking up the drive, as the logs do not sit on /dev/sdb1 but rather on the root volume /dev/sda which is an SSD that doesn't spin anyhow.
I believe it's the constant attempt to poll the drive that is keeping it awake.
However, I will try your suggestion to get the drive to sleep sooner than 10 minutes. If that works, I will just close this bug...but it is worth noting that while I agree with you that an error polling the drive's health self assessment is important and should be logged, why then can smartd check the drive just fine?
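The suggestion being tried here amounts to picking a standby timeout shorter than the ten-minute SMART polling interval mentioned in this thread. As a rough sketch only: assuming the `hdparm -S` encoding described in the hdparm man page (values 1 to 240 count in units of 5 seconds, values 241 to 251 in units of 30 minutes), the helper below converts a desired timeout into the value to pass to `-S`. The function name `hdparm_standby_value` and the five-minute timeout are made up for illustration, and the special values 0 and 252 to 255 are deliberately ignored.

```python
def hdparm_standby_value(timeout_seconds: int) -> int:
    """Return the hdparm -S value for a desired standby timeout.

    Assumes the encoding documented in hdparm(8): 1..240 are multiples
    of 5 seconds (5 s to 20 min) and 241..251 are multiples of
    30 minutes (30 min to 5.5 h). Anything else is rejected.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    if timeout_seconds <= 240 * 5:
        # Round up to the next 5-second step.
        return -(-timeout_seconds // 5)
    half_hours, remainder = divmod(timeout_seconds, 30 * 60)
    if remainder == 0 and 1 <= half_hours <= 11:
        return 240 + half_hours
    raise ValueError("timeout not representable in this sketch")


# Polling interval as stated in this thread (every ten minutes).
UDISKS_POLL_SECONDS = 10 * 60

# Hypothetical choice: five minutes, comfortably below the polling interval.
timeout = 5 * 60
assert timeout < UDISKS_POLL_SECONDS
print(f"hdparm -S {hdparm_standby_value(timeout)} /dev/sdb")
```

The printed command would still need to be run as root, and whether the drive honours the timer is up to its firmware.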
(In reply to comment #3)
> Also, it is not the system logger waking up the drive, as the logs do not
> sit on /dev/sdb1 but rather on the root volume /dev/sda which is an SSD that
> doesn't spin anyhow.

OK, good to know.

> I believe it's the constant attempt to poll the drive that is keeping it
> awake.

Could be... even if there was no error from reading SMART, udisksd doing this every ten minutes could reset the sleep timer, so if the sleep timer is bigger than ten minutes it will never go to sleep... (I've seen disks where reading SMART data didn't reset the timer and I've seen disks where it did... the way I read the specs is that it's vendor-dependent.)

Here's what you can try to check this:
- systemctl stop udisks2.service
- make smartd check every ten minutes

Does this keep the disks spun up? If not, then a bug-fix in libatasmart should fix this problem...

> However, I will try your suggestion to get the drive to sleep sooner than 10
> minutes. If that works, I will just close this bug...but it is worth noting
> that while I agree with you that an error polling the drive's health self
> assessment is important and should be logged, why then can smartd check the
> drive just fine?

smartd and libatasmart are different code-bases and have different bugs...

I have an excellent update. I tried your suggestion of using a shorter timeout to see if udisksd would respect a sleeping disk and leave it that way...and it apparently does. I forced the disk to standby using hdparm -y /dev/sdb. I however left udisks2.service running intentionally to see if it would still attempt to poll the drive. And apparently it DOES check for a sleeping drive and ignores it.
Here are the last few lines of my system log:

Jun 11 12:30:04 Kentsfield smartd[591]: Device: /dev/sdb [SAT], Temperature 43 Celsius reached limit of 40 Celsius (Min/Max 43/44)
Jun 11 12:38:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 12:48:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 13:37:02 Kentsfield nx: Mon Jun 11 13:37:02 EDT 2012

And an hdparm -C /dev/sdb still shows the drive as being in a standby state.

As you can see, the last 2 attempts to check the drive were at 12:38:22 and 12:48:22, but shortly thereafter was when I manually forced the drive asleep and then you can see that as of 13:37:02 there haven't been any more checks. This is with BOTH udisks2.service and smartd.service running and configured to poll that drive.

So I've taken the liberty to modify my hdparm command in /etc/rc.local to a MUCH shorter timeout.

Thanks again for the suggestion!

OK, I think we can close this bug now. Thanks.
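For anyone wanting to repeat the timing check on their own logs, the intervals can be read straight from the syslog timestamps. The snippet below is a minimal sketch, not part of udisks or smartmontools: it uses the four log lines quoted in this comment, takes the year from the "EDT 2012" stamp because syslog timestamps omit it, and assumes an English locale for month names. The helper names `parse_stamp` and `stamps_for` are hypothetical.

```python
from datetime import datetime

# Year is not part of the syslog timestamp; taken from the last log line.
LOG_YEAR = 2012

# Log lines copied from the comment in this thread.
LOG = """\
Jun 11 12:30:04 Kentsfield smartd[591]: Device: /dev/sdb [SAT], Temperature 43 Celsius reached limit of 40 Celsius (Min/Max 43/44)
Jun 11 12:38:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 12:48:22 Kentsfield udisksd[5536]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1500AHFD_00RAR1_WD_WMAP41603396: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
Jun 11 13:37:02 Kentsfield nx: Mon Jun 11 13:37:02 EDT 2012
"""


def parse_stamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        f"{LOG_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


def stamps_for(log_text: str, tag: str) -> list:
    """Timestamps of all lines logged by the process named `tag`."""
    return [
        parse_stamp(line)
        for line in log_text.splitlines()
        if f" {tag}[" in line
    ]


polls = stamps_for(LOG, "udisksd")
# Seconds between consecutive udisksd SMART errors.
gaps = [(b - a).total_seconds() for a, b in zip(polls, polls[1:])]
# Seconds from the last udisksd error to the final log line.
quiet = (parse_stamp(LOG.splitlines()[-1]) - polls[-1]).total_seconds()

print("gaps between udisksd errors (s):", gaps)
print("quiet period after last error (s):", quiet)
```

On these lines the two udisksd errors are 600 seconds apart, and the log then stays free of udisksd errors for 2920 seconds, which matches the observation that polling stopped once the drive was in standby.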