
Bug 1814082

Summary: Disk failure prediction feature does not work on RHEL 8.1 with smartmontools 6.6
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Hideshi Fukumoto <hfukumot>
Component: RADOS Assignee: Neha Ojha <nojha>
Status: CLOSED ERRATA QA Contact: Manohar Murthy <mmurthy>
Severity: high Docs Contact:
Priority: high    
Version: 4.0 CC: akupczyk, bhubbard, bniver, ceph-eng-bugs, ceph-qe-bugs, dzafman, kchai, kdreyer, mmuench, nojha, pcfe, rzarzyns, sseshasa, tchandra, tserlin, ykaul
Target Milestone: rc   
Target Release: 4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: smartmontools-7.1-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-19 17:33:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1671154    
Bug Blocks: 1811582    

Description Hideshi Fukumoto 2020-03-17 00:19:47 UTC
Description of problem:

The disk failure prediction feature does not work on RHEL 8.1 with smartmontools 6.6.

The "ceph device get-health-metrics" command in Red Hat Ceph Storage 4.0 on RHEL 8.1 fails because the "smartctl" command on RHEL 8.1
(from smartmontools-6.6-3.el8.x86_64.rpm, the latest version for RHEL-8) cannot handle the "--json" option.

-----
### RHEL-8 kernel
$ uname -a
Linux <hostname> 4.18.0-147.5.1.el8_1.x86_64 #1 SMP Tue Jan 14 15:50:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

### RHEL-8 has smartmontools-6.6
$ rpm -q smartmontools
smartmontools-6.6-3.el8.x86_64

$ ceph -v
ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)

### "ceph device get-health-metrics <device>" fails with "smartctl returned invalid JSON".
$ ceph device get-health-metrics <device>
{
    "20200221-053259": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi" 
    },
    "20200221-053821": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi" 
    }
}
-----

The root cause of this issue is that RHEL-8, unlike RHEL-7, does not provide smartmontools 7.

RHEL-7 already provides smartmontools-7.0-1.el7_7.1.x86_64.rpm, whose smartctl tool can handle the "--json" option.
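
A minimal sketch of how a host could be checked for this condition before relying on device health scraping. It is not part of Ceph, and the function names are hypothetical. It assumes the first line of `smartctl --version` starts with `smartctl <major>.<minor>`, and it treats 7.0 as the first release that accepts `--json`, as this report implies.

```python
import re

# Assumption taken from this report: smartmontools 7.0 handles --json, 6.6 does not.
MIN_JSON_VERSION = (7, 0)


def smartctl_version(version_output):
    """Extract (major, minor) from the banner printed by `smartctl --version`."""
    match = re.match(r"smartctl (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized smartctl version banner")
    return int(match.group(1)), int(match.group(2))


def supports_json(version_output):
    """Return True if this smartctl is new enough to accept --json."""
    return smartctl_version(version_output) >= MIN_JSON_VERSION


# Abbreviated, illustrative banners; real output carries more fields.
print(supports_json("smartctl 6.6"))
print(supports_json("smartctl 7.0"))
```

On a real host the banner would come from running `smartctl --version`, for example through `subprocess.run(["smartctl", "--version"], capture_output=True, text=True).stdout`.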

I know that RHBZ#1671154[1] (RFE: update to smartmontools 7.0) has already been filed. However, because it is
an RFE for the RHEL-8 product, customers may be unable to use the disk failure prediction feature for a long time.

As this is not an RFE but a bug in the Ceph product,
I expect Ceph engineering to address this issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1671154

Version-Release number of selected component (if applicable):

Red Hat Ceph Storage 4.0 on RHEL 8.1

How reproducible:

Always

Steps to Reproduce:

$ ceph device get-health-metrics <device>

Actual results:

$ ceph device get-health-metrics <device>
{
    "20200221-053259": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi" 
    },
    "20200221-053821": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi" 
    }
}
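
The output above can be checked mechanically. The following is a sketch, not part of Ceph: the helper name `failed_scrapes` is hypothetical, and it assumes only what the output shows, namely a JSON object keyed by scrape timestamp whose failed entries carry an "error" field.

```python
import json


def failed_scrapes(metrics_json):
    """Map each scrape timestamp to its error message, skipping clean entries."""
    metrics = json.loads(metrics_json)
    return {
        stamp: entry["error"]
        for stamp, entry in metrics.items()
        if isinstance(entry, dict) and "error" in entry
    }


# Abbreviated from the output in this report.
sample = """
{
    "20200221-053259": {"dev": "/dev/sdf", "error": "smartctl returned invalid JSON"},
    "20200221-053821": {"dev": "/dev/sdf", "error": "smartctl returned invalid JSON"}
}
"""

for stamp, message in sorted(failed_scrapes(sample).items()):
    print(f"{stamp}: {message}")
```

An empty result would mean that no scrape in the given output reported an error.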

Expected results:

$ ceph device get-health-metrics <device>

The command outputs the device health information without any errors.

Additional info:

A related RFE for RHEL-8 was filed as RHBZ#1671154.

Comment 3 Boris Ranto 2020-03-24 11:50:03 UTC
The error is coming from rados, see here:

https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L753

The devicehealth ceph-mgr plugin just calls rados, so I am re-assigning.

Comment 17 errata-xmlrpc 2020-05-19 17:33:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:2231