Bug 1814082 - Disk failure prediction feature does not work on RHEL8.1 with smartmontools 6.6
Summary: Disk failure prediction feature does not work on RHEL8.1 with smartmontools 6.6
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 4.1
Assignee: Neha Ojha
QA Contact: Manohar Murthy
Depends On: 1671154
Blocks: 1811582
Reported: 2020-03-17 00:19 UTC by Hideshi Fukumoto
Modified: 2023-10-06 19:28 UTC
CC List: 16 users

Fixed In Version: smartmontools-7.1-1.el8
Doc Type: If docs needed, set a value
Last Closed: 2020-05-19 17:33:05 UTC




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHCEPH-7647 | 0 | None | None | None | 2023-10-06 19:28:35 UTC
Red Hat Product Errata | RHSA-2020:2231 | 0 | None | None | None | 2020-05-19 17:33:33 UTC

Description Hideshi Fukumoto 2020-03-17 00:19:47 UTC
Description of problem:

Disk failure prediction feature does not work on RHEL8.1 with smartmontools 6.6

The "ceph" command in Red Hat Ceph Storage 4.0 on RHEL 8.1 fails because the "smartctl" command on RHEL 8.1
(from smartmontools-6.6-3.el8.x86_64.rpm, the latest version available for RHEL 8) cannot handle the "--json" option.

-----
### RHEL-8 kernel
$ uname -a
Linux <hostname> 4.18.0-147.5.1.el8_1.x86_64 #1 SMP Tue Jan 14 15:50:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

### RHEL-8 has smartmontools-6.6
$ rpm -q smartmontools
smartmontools-6.6-3.el8.x86_64

$ ceph -v
ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)

### "ceph device get-health-metrics <device>" fails with "smartctl returned invalid JSON".
$ ceph device get-health-metrics <device>
{
    "20200221-053259": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi" 
    },
    "20200221-053821": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi" 
    }
}
-----
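When a device has many samples, the JSON printed by `ceph device get-health-metrics <device>` can be scanned for entries that carry an `error` key instead of metrics. The sketch below assumes only the output shape shown above (a top-level object keyed by sample timestamp, each value an object with optional `dev` and `error` fields); `find_failed_samples` is a hypothetical helper, not part of Ceph.

```python
from __future__ import annotations

import json


def find_failed_samples(metrics_text: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, device, error) for every sample that reports an error.

    Assumption: the input has the shape printed by
    `ceph device get-health-metrics <device>` in this report,
    i.e. {timestamp: {"dev": ..., "error": ..., ...}, ...}.
    """
    metrics = json.loads(metrics_text)
    failed = []
    # Timestamps such as "20200221-053259" sort chronologically as strings.
    for timestamp in sorted(metrics):
        sample = metrics[timestamp]
        if isinstance(sample, dict) and "error" in sample:
            failed.append((timestamp, sample.get("dev", "?"), sample["error"]))
    return failed


if __name__ == "__main__":
    # Trimmed copy of the output captured in this report.
    captured = """
    {
        "20200221-053259": {"dev": "/dev/sdf", "error": "smartctl returned invalid JSON"},
        "20200221-053821": {"dev": "/dev/sdf", "error": "smartctl returned invalid JSON"}
    }
    """
    for ts, dev, err in find_failed_samples(captured):
        print(f"{ts} {dev}: {err}")
```

Run against the trimmed capture, it prints one line per sample, both for /dev/sdf with the "smartctl returned invalid JSON" error.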

The root cause of this issue is that, unlike RHEL 7, RHEL 8 does not provide smartmontools 7.

RHEL 7 already provides smartmontools-7.0-1.el7_7.1.x86_64.rpm, whose smartctl tool supports the "--json" option.
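Because smartctl's JSON output first appeared in smartmontools 7.0, a host can be checked before relying on disk failure prediction. This is a minimal sketch, assuming the package string printed by `rpm -q smartmontools` (for example `smartmontools-6.6-3.el8.x86_64`, as shown above); `smartctl_supports_json` is a hypothetical helper and the 7.0 threshold is the only rule it encodes.

```python
from __future__ import annotations

import re

# Assumption: `smartctl --json` is available starting with smartmontools 7.0.
MIN_JSON_VERSION = (7, 0)

# Matches the name-version prefix of an RPM string like "smartmontools-6.6-3.el8.x86_64".
_NVR = re.compile(r"^smartmontools-(\d+)\.(\d+)")


def smartctl_supports_json(rpm_q_output: str) -> bool:
    """Decide from `rpm -q smartmontools` output whether smartctl has --json."""
    match = _NVR.match(rpm_q_output.strip())
    if match is None:
        # E.g. "package smartmontools is not installed".
        raise ValueError(f"unexpected package string: {rpm_q_output!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= MIN_JSON_VERSION


if __name__ == "__main__":
    for pkg in ("smartmontools-6.6-3.el8.x86_64",
                "smartmontools-7.0-1.el7_7.1.x86_64",
                "smartmontools-7.1-1.el8"):
        print(pkg, smartctl_supports_json(pkg))
```

For the three package strings mentioned in this report it prints False for the RHEL 8.1 build and True for the RHEL 7 build and for the fixed smartmontools-7.1-1.el8.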

I know that RHBZ#1671154[1] (RFE: update to smartmontools 7.0) has already been filed. However, because it is
an RFE against the RHEL 8 product, customers may be unable to use the disk failure prediction feature for a long time.

Since this is a bug in the Ceph product rather than an RFE,
I expect Ceph engineering to work on this issue in some way.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1671154

Version-Release number of selected component (if applicable):

Red Hat Ceph Storage 4.0 on RHEL 8.1

How reproducible:

Always

Steps to Reproduce:

$ ceph device get-health-metrics <device>

Actual results:

$ ceph device get-health-metrics <device>
(Same output as shown in the description above: every sample for /dev/sdf reports "error": "smartctl returned invalid JSON".)

Expected results:

$ ceph device get-health-metrics <device>

The command outputs the device health metrics without any errors.

Additional info:

The underlying smartmontools update is tracked as RHBZ#1671154 (an RFE against RHEL 8).

Comment 3 Boris Ranto 2020-03-24 11:50:03 UTC
The error is coming from RADOS; see:

https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L753

We are just calling RADOS from the devicehealth ceph-mgr plugin, so I am reassigning this bug.
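The code referenced in Comment 3 lives in `src/common/blkdev.cc`. Conceptually, the probe runs smartctl with a JSON option and records "smartctl returned invalid JSON" when the output cannot be parsed, which is presumably what happens when smartctl 6.6 rejects "--json" and prints plain text. The Python below is only an illustrative sketch of that control flow, not a port of the C++ code; `probe_result` and both sample smartctl replies are hypothetical.

```python
from __future__ import annotations

import json

INVALID_JSON_ERROR = "smartctl returned invalid JSON"


def probe_result(dev: str, smartctl_stdout: str) -> dict:
    """Illustrative only: turn raw smartctl output into a health-metrics entry.

    If the text is not a JSON object (e.g. an older smartctl that rejects
    --json and prints plain text), record the same error string seen in
    this bug instead of metrics.
    """
    try:
        parsed = json.loads(smartctl_stdout)
    except json.JSONDecodeError:
        return {"dev": dev, "error": INVALID_JSON_ERROR}
    if not isinstance(parsed, dict):
        # Valid JSON, but not the object shape a metrics entry needs.
        return {"dev": dev, "error": INVALID_JSON_ERROR}
    parsed["dev"] = dev
    return parsed


if __name__ == "__main__":
    # Hypothetical plain-text reply from a smartctl without --json support.
    print(probe_result("/dev/sdf", "smartctl: unrecognized option '--json'"))
    # Hypothetical minimal JSON reply from a JSON-capable smartctl.
    print(probe_result("/dev/sdf", '{"smart_status": {"passed": true}}'))
```

The first call yields the same {"dev": "/dev/sdf", "error": "smartctl returned invalid JSON"} entry seen in this report; the second yields parsed metrics tagged with the device.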

Comment 17 errata-xmlrpc 2020-05-19 17:33:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:2231

