Bug 1410233

Summary: smartd trying to poll SCSI devices in ALUA stand-by state and indicates hard drive failing
Product: Red Hat Enterprise Linux 7
Reporter: shivamerla1 <shiva.krishna>
Component: smartmontools
Assignee: Michal Hlavinka <mhlavink>
Status: CLOSED DUPLICATE
QA Contact: qe-baseos-daemons
Severity: high
Priority: unspecified
Version: 7.3
CC: kzak
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2017-03-26 17:20:12 UTC
Type: Bug

Description shivamerla1 2017-01-04 20:51:18 UTC
Description of problem:
smartd appears to poll SCSI devices whose paths are in the ALUA standby state, and then reports that the hard drive is failing. Any READ, WRITE, or TEST UNIT READY command sent to a SCSI device through a target port group (TPG) in the standby state returns a CHECK CONDITION with "NOT READY, LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE" (2/4/b). smartd should recognize these errors and ignore them rather than alerting the user that the hard drive is failing.
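In the 2/4/b notation, 2 is the sense key (NOT READY), 04h the additional sense code (ASC), and 0Bh the additional sense code qualifier (ASCQ). As a minimal sketch of how a monitoring tool could recognize this condition and skip the path instead of reporting a failing drive, the hypothetical helpers below (`parse_sense`, `is_alua_standby`) extract those three fields from fixed-format (response code 70h/71h) or descriptor-format (72h/73h) sense data, following the SPC layout. They are not smartmontools code and issue no SCSI commands; obtaining the sense buffer (for example through SG_IO) is assumed to happen elsewhere.

```python
from typing import Tuple

# Hypothetical helpers, not part of smartmontools: they only illustrate
# recognizing the ALUA standby sense triple 2/4/b.
NOT_READY = 0x02
ASC_LU_NOT_ACCESSIBLE = 0x04
ASCQ_TARGET_PORT_STANDBY = 0x0B


def parse_sense(sense: bytes) -> Tuple[int, int, int]:
    """Return (sense_key, asc, ascq) from fixed- or descriptor-format sense data."""
    if len(sense) < 4:
        raise ValueError("sense buffer too short")
    response_code = sense[0] & 0x7F  # mask off the VALID bit
    if response_code in (0x72, 0x73):
        # Descriptor format: sense key in byte 1, ASC in byte 2, ASCQ in byte 3.
        return sense[1] & 0x0F, sense[2], sense[3]
    if response_code in (0x70, 0x71):
        # Fixed format: sense key in byte 2, ASC in byte 12, ASCQ in byte 13.
        if len(sense) < 14:
            raise ValueError("fixed-format sense buffer too short")
        return sense[2] & 0x0F, sense[12], sense[13]
    raise ValueError(f"unsupported response code 0x{response_code:02x}")


def is_alua_standby(sense: bytes) -> bool:
    """True if the sense data reports TARGET PORT IN STANDBY STATE (2/4/b)."""
    return parse_sense(sense) == (
        NOT_READY,
        ASC_LU_NOT_ACCESSIBLE,
        ASCQ_TARGET_PORT_STANDBY,
    )


if __name__ == "__main__":
    # Fixed-format sense data (18 bytes) as a standby path might return it.
    fixed = bytearray(18)
    fixed[0] = 0x70
    fixed[2] = NOT_READY
    fixed[7] = 10  # additional sense length: 18 - 8
    fixed[12] = ASC_LU_NOT_ACCESSIBLE
    fixed[13] = ASCQ_TARGET_PORT_STANDBY
    print(is_alua_standby(bytes(fixed)))  # True

    # The same condition in descriptor format (8-byte header only).
    print(is_alua_standby(bytes([0x72, 0x02, 0x04, 0x0B, 0, 0, 0, 0])))  # True
```

A caller could treat a `True` result as "path in standby, skip this device" rather than as a device failure; whether smartd should do exactly that is the open question of this bug.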

Version-Release number of selected component (if applicable):
smartmontools

How reproducible:
consistently

Steps to Reproduce:
1. Map SCSI devices from a storage array that exposes both active and standby ALUA paths.
2. When smartd scans the devices, it starts reporting these errors.


Actual results:
[root@rtp-fuji-ops17 ~]# WARNING: Your hard drive is failing
Device: /dev/sdah, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdai, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdaj, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdak, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdal, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdam, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdan, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdao, unable to open device
WARNING: Your hard drive is failing
Device: /dev/sdap, unable to open device

Expected results:
smartd should not scan standby paths.

Additional info:

Comment 2 Michal Hlavinka 2017-03-26 17:20:12 UTC
Closing as a duplicate of bug #1340462. Because that bug is not publicly visible, here is what is going to change in the next smartmontools update in RHEL 7 (do not treat this as a guarantee: the work to fix this bug is ongoing, many things can happen, and the change could be dropped from the next update):

The smartmontools daemon, smartd, was updated as follows:
- The "WARNING: Your hard drive is failing" message prefix was replaced with "SMART Disk monitor:".
- Less important messages (for example, being unable to communicate with a disk, which can be caused by a power-saving mode) are no longer broadcast to terminals.

Urgent messages, such as an attribute indicating disk failure, are still reported, with the less alarming prefix.

All mail messages are still sent as configured.

All messages are logged to syslog (the journal), including messages like "can't read SMART values". This message is (and was) logged with INFO syslog priority. If you want to see only important messages rather than INFO ones, use the --priority option (see man journalctl). For example:
$ journalctl --priority=crit
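Why the INFO-level "can't read SMART values" message disappears with --priority=crit can be modeled in a few lines. This is only an illustration of the threshold rule described in man journalctl (a single level shows messages at that level or a more important, numerically lower one); it does not read the journal, and the helper name `shown_with_threshold` is hypothetical.

```python
# Syslog priority levels with their standard numeric values;
# a lower number means a more important message.
PRIORITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def shown_with_threshold(message_level: str, threshold: str) -> bool:
    """Model journalctl --priority=<threshold> with a single level:
    a message is shown if it is at the threshold or more important."""
    return PRIORITIES[message_level] <= PRIORITIES[threshold]


if __name__ == "__main__":
    # An INFO message such as "can't read SMART values" is hidden at crit.
    print(shown_with_threshold("info", "crit"))  # False
    print(shown_with_threshold("crit", "crit"))  # True
    print([name for name in PRIORITIES if shown_with_threshold(name, "crit")])
    # ['emerg', 'alert', 'crit']
```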

*** This bug has been marked as a duplicate of bug 1340462 ***