Description of problem:
Disk failure prediction feature does not work on RHEL 8.1 with smartmontools 6.6

The "ceph" command in Red Hat Ceph Storage 4.0 on RHEL 8.1 fails because the "smartctl" command on RHEL 8.1 (from smartmontools-6.6-3.el8.x86_64.rpm, the latest version available for RHEL-8) cannot handle the "--json" option.

-----
### RHEL-8 kernel
$ uname -a
Linux <hostname> 4.18.0-147.5.1.el8_1.x86_64 #1 SMP Tue Jan 14 15:50:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

### RHEL-8 has smartmontools-6.6
$ rpm -q smartmontools
smartmontools-6.6-3.el8.x86_64

$ ceph -v
ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)

### "ceph device get-health-metrics <device>" fails with "smartctl returned invalid JSON".
$ ceph device get-health-metrics <device>
{
    "20200221-053259": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi"
    },
    "20200221-053821": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi"
    }
}
-----

The root cause of this issue is that RHEL-8, unlike RHEL-7, does not provide smartmontools-7. RHEL-7 already provides smartmontools-7.0-1.el7_7.1.x86_64.rpm, whose smartctl tool can handle the "--json" option.

I know that RHBZ#1671154[1] (RFE: update to smartmontools 7.0) has already been filed. However, because it is an RFE against the RHEL-8 product, customers may be unable to use the disk failure prediction feature for a long time. As this is not an RFE but a bug in the Ceph product, I expect Ceph engineering to work on this issue somehow.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1671154

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 4.0 on RHEL 8.1

How reproducible:
Always

Steps to Reproduce:
$ ceph device get-health-metrics <device>

Actual results:
$ ceph device get-health-metrics <device>
{
    "20200221-053259": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi"
    },
    "20200221-053821": {
        "dev": "/dev/sdf",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "lsi"
    }
}

Expected results:
$ ceph device get-health-metrics <device> outputs health metrics without any errors.

Additional info:
This issue was also filed as RHBZ#1671154 (as an RFE for RHEL-8).
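Since the failure hinges on the smartctl version (JSON output via "--json" first appeared in smartmontools 7.0), one way to tell whether a host is affected is to check the version string. The following is a minimal Python sketch, assuming `smartctl --version` prints a first line of the form `smartctl 6.6 2017-11-05 r4594 [...]`; the helper names `parse_smartctl_version` and `smartctl_supports_json` are hypothetical and are not part of Ceph or smartmontools.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Assumption: the version banner starts with "smartctl <major>.<minor> ...".
_VERSION_RE = re.compile(r"^smartctl\s+(\d+)\.(\d+)")


def parse_smartctl_version(version_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from `smartctl --version` output, or None if not found."""
    for line in version_output.splitlines():
        match = _VERSION_RE.match(line.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def smartctl_supports_json(version_output: str) -> bool:
    """JSON output was introduced in smartmontools 7.0, so require >= (7, 0)."""
    version = parse_smartctl_version(version_output)
    return version is not None and version >= (7, 0)


if __name__ == "__main__":
    # Only query the real binary if it is installed on this host.
    if shutil.which("smartctl"):
        out = subprocess.run(
            ["smartctl", "--version"], capture_output=True, text=True
        ).stdout
        print("smartctl supports --json:", smartctl_supports_json(out))
```

On the RHEL 8.1 host described here (smartmontools 6.6) this check would report that JSON output is unavailable, while the RHEL-7 package (smartmontools 7.0) would pass.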
The error comes from RADOS; see https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L753
We just call into RADOS from the devicehealth ceph-mgr plugin, so re-assigning.
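The failure mode can be illustrated with a small sketch: the caller asks smartctl for JSON, and when smartctl 6.6 prints an option error instead of JSON, parsing fails and an "error" entry is recorded for the device. This is a hypothetical Python illustration, not Ceph's actual C++ code in blkdev.cc; the function name `parse_smartctl_output` and the exact shape of the returned dict are assumptions modeled on the output in this report.

```python
import json
from typing import Any, Dict

INVALID_JSON_ERROR = "smartctl returned invalid JSON"


def parse_smartctl_output(dev: str, stdout: str) -> Dict[str, Any]:
    """Hypothetical helper: turn smartctl --json stdout into a metrics dict."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        # smartctl 6.6 does not understand --json, so stdout holds an
        # error/usage message rather than a JSON document.
        return {"dev": dev, "error": INVALID_JSON_ERROR}
    if not isinstance(data, dict):
        # Valid JSON but not an object: treat it as unusable as well.
        return {"dev": dev, "error": INVALID_JSON_ERROR}
    data.setdefault("dev", dev)
    return data
```

Under this sketch, any non-JSON stdout yields the same `"error": "smartctl returned invalid JSON"` entry seen in the get-health-metrics output, regardless of what smartctl actually printed.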
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:2231