Bug 1837645
Summary: | ceph device get-health-metrics does not work when smartctl command throws non-zero error code | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Veera Raghava Reddy <vereddy> |
Component: | RADOS | Assignee: | Neha Ojha <nojha> |
Status: | CLOSED ERRATA | QA Contact: | Manohar Murthy <mmurthy> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.1 | CC: | akupczyk, bhubbard, ceph-eng-bugs, dzafman, gjose, hyelloji, jbrier, jdurgin, kchai, nojha, rzarzyns, sseshasa, tserlin, yhatuka |
Target Milestone: | z1 | ||
Target Release: | 4.1 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | ceph-14.2.8-68.el8cp, ceph-14.2.8-68.el7cp | Doc Type: | Bug Fix |
Doc Text: |
.Health metrics are correctly reported when `smartctl` exits with a non-zero error code
Previously, the `ceph device get-health-metrics` command could fail to report metrics if `smartctl` exited with a non-zero error code even though running `smartctl` directly reported the correct information. In this case a JSON error was reported instead. In {storage-product} 4.1z1, the `ceph device get-health-metrics` command reports metrics even if `smartctl` exits with a non-zero error code as long as `smartctl` itself reports correct information.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-20 14:21:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1816167 | ||
Attachments: |
Description
Veera Raghava Reddy
2020-05-19 18:11:07 UTC
*** Bug 1840272 has been marked as a duplicate of this bug. ***

Hi Josh,
Still see json error with 4.1z1 build.

[root@extensa003 ~]# ceph -v
ceph version 14.2.8-79.el7cp (2d4542a7b3632dd9a7b09b5700f711e8016a94fd) nautilus (stable)

[root@extensa003 ~]# ceph device get-health-metrics AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
{
    "20200629-001032": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    },
    "20200630-000823": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    }
}

smartctl output -
[root@mero007 ~]# smartctl -a --json "/dev/sdg"
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-a",
      "--json",
      "/dev/sdg"
    ],
    "exit_status": 4
  },
  "device": {
    "name": "/dev/sdg",
    "info_name": "/dev/sdg",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "AVAGO",
  "product": "SMC3108",
  "model_name": "AVAGO SMC3108",
  "revision": "4.68",
  "scsi_version": "SPC-3",
  "user_capacity": {
    "blocks": 15626993664,
    "bytes": 8001020755968
  },
  "logical_block_size": 512,
  "physical_block_size": 4096,
  "serial_number": "00f11da416efd1fc2200d36c23800403",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1593500798,
    "asctime": "Tue Jun 30 07:06:38 2020 UTC"
  },
  "temperature": {
    "current": 0,
    "drive_trip": 0
  }
}

(In reply to Veera Raghava Reddy from comment #6)
> Hi Josh,
> Still see json error with 4.1z1 build.

Hi Veera, thanks for checking, fortunately the extra output gives more detail now.

> [root@extensa003 ~]# ceph -v
> ceph version 14.2.8-79.el7cp (2d4542a7b3632dd9a7b09b5700f711e8016a94fd)
> nautilus (stable)
>
>
> [root@extensa003 ~]# ceph device get-health-metrics
> AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
> {
>     "20200629-001032": {
>         "dev": "/dev/sdg",
>         "error": "smartctl returned invalid JSON",

This error message is misleading, filed https://tracker.ceph.com/issues/46285 to fix.

>         "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 231",
>         "nvme_smart_health_information_add_log_error_code": -22,

These nvme errors are due to the nvme cli command not supporting avago disks. These are non-fatal errors though, the lack of vendor-specific information is just ignored by the disk prediction module.

>         "nvme_vendor": "avago"
>     },
>     "20200630-000823": {
>         "dev": "/dev/sdg",
>         "error": "smartctl returned invalid JSON",
>         "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 231",
>         "nvme_smart_health_information_add_log_error_code": -22,
>         "nvme_vendor": "avago"
>     }
> }
>
>
> smartctl output -
> [root@mero007 ~]# smartctl -a --json "/dev/sdg"
> {
>   "json_format_version": [
>     1,
>     0
>   ],
>   "smartctl": {
>     "version": [
>       7,
>       0
>     ],
>     "svn_revision": "4883",
>     "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
>     "build_info": "(local build)",
>     "argv": [
>       "smartctl",
>       "-a",
>       "--json",
>       "/dev/sdg"
>     ],
>     "exit_status": 4

Exit status 4 for smartctl means "Bit 2: Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure"

Is there another type of disk you can try this on? It seems these disks do not support smart reporting.

Created attachment 1699430 [details]
smartctl output for AVAGO drive - Not supporting smart format
Created attachment 1699431 [details]
smartctl output for Seagate drive - Supporting smart format
Created attachment 1699432 [details]
smartctl output for Micron drive - Supporting smart format
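
The exit status discussed in this thread is a bitmask, so a value of 4 means that only bit 2 is set. As a rough illustration, the sketch below decodes such a value into per-bit messages. The function name `decode_smartctl_exit_status` is hypothetical, and the messages are paraphrased from the EXIT STATUS section of the smartctl(8) man page rather than copied from any Ceph code.

```python
from typing import List

# Paraphrased bit meanings from the smartctl(8) man page, "EXIT STATUS".
SMARTCTL_EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, no IDENTIFY DEVICE structure returned, "
       "or device is in a low-power mode",
    2: "Some SMART or other ATA command to the disk failed, or there was "
       "a checksum error in a SMART data structure",
    3: "SMART status check returned DISK FAILING",
    4: "Prefail attributes found at or below threshold",
    5: "SMART status is OK, but some attributes were at or below "
       "threshold at some time in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_smartctl_exit_status(status: int) -> List[str]:
    """Return one message per bit that is set in the exit status (hypothetical helper)."""
    return [
        "Bit {}: {}".format(bit, meaning)
        for bit, meaning in SMARTCTL_EXIT_BITS.items()
        if status & (1 << bit)
    ]


# The AVAGO drive in this report returned exit_status 4, i.e. only bit 2.
for line in decode_smartctl_exit_status(4):
    print(line)
```

A status of 0 yields an empty list, and several bits can be set at once, which is why a non-zero status alone does not say whether the JSON output is usable.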
Hi Josh,
Attached json output for the following drives:
AVAGO - not supporting smart format
Seagate - smart status "pass"
Micron - smart status "pass"

When I reran, the output was in long format with the fix.

{
    "20200630-000823": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    },
    "20200701-000912": {
        "device": {
            "info_name": "/dev/sdg",
            "name": "/dev/sdg",
            "protocol": "SCSI",
            "type": "scsi"
        },
        "device_type": {
            "name": "disk",
            "scsi_value": 0
        },
        "json_format_version": [
            1,
            0
        ],
        "local_time": {
            "asctime": "Wed Jul 1 00:07:00 2020 UTC",
            "time_t": 1593562020
        },
        "logical_block_size": 512,
        "model_name": "AVAGO SMC3108",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago",
        "physical_block_size": 4096,
        "product": "SMC3108",
        "revision": "4.68",
        "scsi_version": "SPC-3",
        "serial_number": "00f11da416efd1fc2200d36c23800403",
        "smartctl": {
            "argv": [
                "smartctl",
                "-a",
                "--json",
                "/dev/sdg"
            ],
            "build_info": "(local build)",
            "exit_status": 4,
            "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
            "svn_revision": "4883",
            "version": [
                7,
                0
            ]
        },
        "temperature": {
            "current": 0,
            "drive_trip": 0
        },
        "user_capacity": {
            "blocks": 15626993664,
            "bytes": 8001020755968
        },
        "vendor": "AVAGO"
    }
}

Verifying this BZ, as the json error is due to a specific drive [AVAGO] not supporting smart format. For other devices the smart format is showing proper json output.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:3003
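
For readers who want a concrete picture of the behavior described in the Doc Text, here is a minimal sketch of the idea: accept whatever JSON `smartctl` prints, regardless of its exit code, and report an error only when the output cannot be parsed. This is not the actual Ceph patch. The function name `parse_smartctl_result` and the `smartctl_returncode` key are assumptions made for illustration, and only the error string is taken from the output shown in this report.

```python
import json
from typing import Any, Dict


def parse_smartctl_result(stdout: str, returncode: int) -> Dict[str, Any]:
    """Hypothetical helper: keep smartctl's JSON even if returncode != 0."""
    try:
        data = json.loads(stdout)
    except ValueError:
        # Only unparseable output is treated as an error; the exit code
        # is recorded (under an assumed key name) but does not decide it.
        return {
            "error": "smartctl returned invalid JSON",
            "smartctl_returncode": returncode,
        }
    if not isinstance(data, dict):
        # smartctl --json is expected to emit a JSON object at top level.
        return {
            "error": "smartctl returned invalid JSON",
            "smartctl_returncode": returncode,
        }
    return data


# Shortened sample modeled on the AVAGO output in this report.
sample = '{"smartctl": {"exit_status": 4}, "vendor": "AVAGO"}'
metrics = parse_smartctl_result(sample, returncode=4)
print(metrics["vendor"], metrics["smartctl"]["exit_status"])
```

With this approach the non-zero exit status stays visible to the caller through the `smartctl.exit_status` field that smartctl itself writes into its JSON, as seen in the long-format output above.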