Bug 1837645
| Summary: | ceph device get-health-metrics does not work when smartctl command throws non-zero error code | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Veera Raghava Reddy <vereddy> |
| Component: | RADOS | Assignee: | Neha Ojha <nojha> |
| Status: | CLOSED ERRATA | QA Contact: | Manohar Murthy <mmurthy> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.1 | CC: | akupczyk, bhubbard, ceph-eng-bugs, dzafman, gjose, hyelloji, jbrier, jdurgin, kchai, nojha, rzarzyns, sseshasa, tserlin, yhatuka |
| Target Milestone: | z1 | ||
| Target Release: | 4.1 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | ceph-14.2.8-68.el8cp, ceph-14.2.8-68.el7cp | Doc Type: | Bug Fix |
| Doc Text: |
.Health metrics are correctly reported when `smartctl` exits with a non-zero error code
Previously, the `ceph device get-health-metrics` command could fail to report metrics if `smartctl` exited with a non-zero error code even though running `smartctl` directly reported the correct information. In this case a JSON error was reported instead. In {storage-product} 4.1z1, the `ceph device get-health-metrics` command reports metrics even if `smartctl` exits with a non-zero error code as long as `smartctl` itself reports correct information.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-07-20 14:21:03 UTC | Type: | Bug |
| Bug Blocks: | 1816167 | ||
| Attachments: | |||
|
Description
Veera Raghava Reddy
2020-05-19 18:11:07 UTC
*** Bug 1840272 has been marked as a duplicate of this bug. ***
Hi Josh,
Still see json error with 4.1z1 build.
[root@extensa003 ~]# ceph -v
ceph version 14.2.8-79.el7cp (2d4542a7b3632dd9a7b09b5700f711e8016a94fd) nautilus (stable)
[root@extensa003 ~]# ceph device get-health-metrics AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
{
"20200629-001032": {
"dev": "/dev/sdg",
"error": "smartctl returned invalid JSON",
"nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "avago"
},
"20200630-000823": {
"dev": "/dev/sdg",
"error": "smartctl returned invalid JSON",
"nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "avago"
}
}
smartctl output -
[root@mero007 ~]# smartctl -a --json "/dev/sdg"
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
0
],
"svn_revision": "4883",
"platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
"build_info": "(local build)",
"argv": [
"smartctl",
"-a",
"--json",
"/dev/sdg"
],
"exit_status": 4
},
"device": {
"name": "/dev/sdg",
"info_name": "/dev/sdg",
"type": "scsi",
"protocol": "SCSI"
},
"vendor": "AVAGO",
"product": "SMC3108",
"model_name": "AVAGO SMC3108",
"revision": "4.68",
"scsi_version": "SPC-3",
"user_capacity": {
"blocks": 15626993664,
"bytes": 8001020755968
},
"logical_block_size": 512,
"physical_block_size": 4096,
"serial_number": "00f11da416efd1fc2200d36c23800403",
"device_type": {
"scsi_value": 0,
"name": "disk"
},
"local_time": {
"time_t": 1593500798,
"asctime": "Tue Jun 30 07:06:38 2020 UTC"
},
"temperature": {
"current": 0,
"drive_trip": 0
}
}
(In reply to Veera Raghava Reddy from comment #6)
> Hi Josh,
> Still see json error with 4.1z1 build.
Hi Veera, thanks for checking, fortunately the extra output gives more detail now.
> [root@extensa003 ~]# ceph -v
> ceph version 14.2.8-79.el7cp (2d4542a7b3632dd9a7b09b5700f711e8016a94fd)
> nautilus (stable)
>
>
> [root@extensa003 ~]# ceph device get-health-metrics
> AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
> {
> "20200629-001032": {
> "dev": "/dev/sdg",
> "error": "smartctl returned invalid JSON",
This error message is misleading, filed https://tracker.ceph.com/issues/46285 to fix.
> "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 231",
> "nvme_smart_health_information_add_log_error_code": -22,
These nvme errors are due to the nvme cli command not supporting avago disks. These are non-fatal errors though, the lack of vendor-specific information is just ignored by the disk prediction module.
> "nvme_vendor": "avago"
> },
> "20200630-000823": {
> "dev": "/dev/sdg",
> "error": "smartctl returned invalid JSON",
> "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 231",
> "nvme_smart_health_information_add_log_error_code": -22,
> "nvme_vendor": "avago"
> }
> }
>
>
> smartctl output -
> [root@mero007 ~]# smartctl -a --json "/dev/sdg"
> {
> "json_format_version": [
> 1,
> 0
> ],
> "smartctl": {
> "version": [
> 7,
> 0
> ],
> "svn_revision": "4883",
> "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
> "build_info": "(local build)",
> "argv": [
> "smartctl",
> "-a",
> "--json",
> "/dev/sdg"
> ],
> "exit_status": 4
Exit status 4 for smartctl means "Bit 2: Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure"
Is there another type of disk you can try this on? It seems these disks do not support smart reporting.
Created attachment 1699430 [details]
smartctl output for AVAGO drive - Not supporting smart format
Created attachment 1699431 [details]
smartctl output for Seagate drive - Supporting smart format
Created attachment 1699432 [details]
smartctl output for Micron drive - Supporting smart format
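The smartctl exit status is a bitmask, which is why the value 4 reported for the AVAGO drive maps to "Bit 2". A minimal Python sketch of decoding such a value follows. The bit descriptions are paraphrased from the smartctl man page; the names `SMARTCTL_EXIT_BITS` and `decode_smartctl_exit_status` are hypothetical and are not part of Ceph or smartmontools.

```python
from typing import List

# Meaning of each bit in the smartctl exit status (paraphrased from the
# smartctl man page). The dictionary name is hypothetical.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "a SMART or other ATA command failed, or a SMART data checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_exit_status(status: int) -> List[str]:
    """Return the description of every bit that is set in `status`."""
    return [
        description
        for bit, description in SMARTCTL_EXIT_BITS.items()
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    # The AVAGO drive in this report returned exit_status 4 (only bit 2 set).
    for line in decode_smartctl_exit_status(4):
        print(line)
```

Because several bits can be set at once, a non-zero status does not by itself say whether the JSON output is usable; it only says which conditions smartctl detected.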
Hi Josh,
Attached json output for the following drives
AVAGO - not supporting smart format
Seagate - smart status "pass"
Micron - smart status "pass"
When I reran the command with the fix, the output was in the long format.
{
"20200630-000823": {
"dev": "/dev/sdg",
"error": "smartctl returned invalid JSON",
"nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "avago"
},
"20200701-000912": {
"device": {
"info_name": "/dev/sdg",
"name": "/dev/sdg",
"protocol": "SCSI",
"type": "scsi"
},
"device_type": {
"name": "disk",
"scsi_value": 0
},
"json_format_version": [
1,
0
],
"local_time": {
"asctime": "Wed Jul 1 00:07:00 2020 UTC",
"time_t": 1593562020
},
"logical_block_size": 512,
"model_name": "AVAGO SMC3108",
"nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "avago",
"physical_block_size": 4096,
"product": "SMC3108",
"revision": "4.68",
"scsi_version": "SPC-3",
"serial_number": "00f11da416efd1fc2200d36c23800403",
"smartctl": {
"argv": [
"smartctl",
"-a",
"--json",
"/dev/sdg"
],
"build_info": "(local build)",
"exit_status": 4,
"platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
"svn_revision": "4883",
"version": [
7,
0
]
},
"temperature": {
"current": 0,
"drive_trip": 0
},
"user_capacity": {
"blocks": 15626993664,
"bytes": 8001020755968
},
"vendor": "AVAGO"
}
}
Verifying this BZ as the json error is due to specific drive [AVAGO] not supporting smart format. For other devices the smart format is showing proper json output.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:3003
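The behavior described in this report (keep the smartctl output when it is valid JSON, even if the exit code is non-zero) can be illustrated with a small Python sketch. This is not the Ceph implementation. The function name `parse_smartctl_result` and the `smartctl_exit_code` key are hypothetical; only the error string is taken from the output shown in this bug.

```python
import json
from typing import Any, Dict


def parse_smartctl_result(returncode: int, stdout: str) -> Dict[str, Any]:
    """Decide what to report based on the output, not on the exit code.

    `returncode` and `stdout` are what a caller would have collected from
    running `smartctl -a --json <device>`.
    """
    try:
        data = json.loads(stdout)
    except ValueError:
        # Output could not be parsed at all: report an error.
        return {
            "error": "smartctl returned invalid JSON",
            "smartctl_exit_code": returncode,  # hypothetical key
        }
    if not isinstance(data, dict):
        # Valid JSON, but not the object that smartctl is expected to emit.
        return {
            "error": "smartctl returned invalid JSON",
            "smartctl_exit_code": returncode,  # hypothetical key
        }
    # A non-zero exit code is not treated as fatal here; the exit status is
    # already recorded by smartctl itself under data["smartctl"]["exit_status"].
    return data


if __name__ == "__main__":
    sample = '{"smartctl": {"exit_status": 4}, "vendor": "AVAGO"}'
    print(parse_smartctl_result(4, sample))
    print(parse_smartctl_result(4, "not json"))
```

With this approach, the AVAGO drive above would produce the long-format entry (including `"exit_status": 4`) rather than an error entry, which matches the output seen after the fix.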