Description of problem:
The ceph device get-health-metrics CLI shows a "smartctl returned invalid JSON" error, even though the smartctl command returns device metrics correctly when run independently on the OSD node.

Tracker BZ for upstream issue - https://tracker.ceph.com/issues/44210

Version-Release number of selected component (if applicable):
RHCS 4.1 [ceph version 14.2.8-47.el7cp (8d24dfe40524f948afd782e14dc63a0d0cacb28b) nautilus (stable)]

How reproducible:
Reproduced multiple times with devices on the mero002 and mero007 RHCS OSD nodes.

Steps to Reproduce:
1. Install RHCS 4.1
2. Enable device metrics monitoring [ceph device monitoring on]
3. List device health metrics [ceph device get-health-metrics <Device-ID>, ceph device query-daemon-health-metrics <OSD-ID>]

Actual results:
The health metrics record for the device contains the error "smartctl returned invalid JSON".

Expected results:
Output should show health metrics for the drive in JSON format.

Additional info:
Came across this scenario while trying multiple scenarios for smartmontools BZ 1814082.
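The reproduction steps can be scripted. Below is a minimal Python sketch that only builds the three ceph commands named in the steps; the helper name reproduction_commands is hypothetical, and passing the daemon as "osd.34" (the form shown by ceph device ls) is an assumption about what <OSD-ID> expects. Nothing is executed unless the caller passes the lists to subprocess.run on a node with an admin keyring.

```python
def reproduction_commands(device_id, daemon):
    """Return the ceph CLI invocations from the reproduction steps as argv lists.

    device_id: device ID as printed by `ceph device ls`
    daemon:    daemon name, assumed here to be of the form "osd.<N>"
    """
    return [
        ["ceph", "device", "monitoring", "on"],
        ["ceph", "device", "get-health-metrics", device_id],
        ["ceph", "device", "query-daemon-health-metrics", daemon],
    ]


# Print the commands for the device reported in this bug.
for argv in reproduction_commands(
    "AVAGO_SMC3108_00be3fb415dfd1fc2200d36c23800403", "osd.34"
):
    print(" ".join(argv))
```

To actually run them, each list could be handed to subprocess.run(argv, capture_output=True, text=True) and the stdout of the last two parsed with json.loads.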
[root@extensa003 ~]# ceph -v
ceph version 14.2.8-47.el7cp (8d24dfe40524f948afd782e14dc63a0d0cacb28b) nautilus (stable)

[root@extensa003 ~]# ceph device ls | grep -i AVAGO_SMC3108_00be3fb415dfd1fc2200d36c23800403
AVAGO_SMC3108_00be3fb415dfd1fc2200d36c23800403  mero007:sdb  osd.34

*When trying to query device health metrics from extensa003 [root/r], get error "smartctl returned invalid JSON"*

[root@extensa003 ~]# ceph device get-health-metrics AVAGO_SMC3108_00be3fb415dfd1fc2200d36c23800403
{
    "20200518-125545": {
        "dev": "/dev/sdb",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    },
    "20200519-000940": {
        "dev": "/dev/sdb",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    }
}

*smartctl command shows output on mero007 [root/r]*
*smartctl 7.0*

**********************
Hi Veera,

searching for 'ceph smartctl invalid json' turns up an upstream bug for this: https://tracker.ceph.com/issues/44210

The fix isn't in 4.1 - it was added upstream in 14.2.9. Could you add a BZ for tracking this? We can backport the fix downstream for 4.1z1.

Thanks,
Josh
**********************
*** Bug 1840272 has been marked as a duplicate of this bug. ***
Hi Josh,
Still see json error with 4.1z1 build.

[root@extensa003 ~]# ceph -v
ceph version 14.2.8-79.el7cp (2d4542a7b3632dd9a7b09b5700f711e8016a94fd) nautilus (stable)

[root@extensa003 ~]# ceph device get-health-metrics AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
{
    "20200629-001032": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    },
    "20200630-000823": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    }
}

smartctl output -
[root@mero007 ~]# smartctl -a --json "/dev/sdg"
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-a",
      "--json",
      "/dev/sdg"
    ],
    "exit_status": 4
  },
  "device": {
    "name": "/dev/sdg",
    "info_name": "/dev/sdg",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "AVAGO",
  "product": "SMC3108",
  "model_name": "AVAGO SMC3108",
  "revision": "4.68",
  "scsi_version": "SPC-3",
  "user_capacity": {
    "blocks": 15626993664,
    "bytes": 8001020755968
  },
  "logical_block_size": 512,
  "physical_block_size": 4096,
  "serial_number": "00f11da416efd1fc2200d36c23800403",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1593500798,
    "asctime": "Tue Jun 30 07:06:38 2020 UTC"
  },
  "temperature": {
    "current": 0,
    "drive_trip": 0
  }
}
(In reply to Veera Raghava Reddy from comment #6)
> Hi Josh,
> Still see json error with 4.1z1 build.

Hi Veera, thanks for checking. Fortunately, the extra output gives more detail now.

> [root@extensa003 ~]# ceph -v
> ceph version 14.2.8-79.el7cp (2d4542a7b3632dd9a7b09b5700f711e8016a94fd)
> nautilus (stable)
>
>
> [root@extensa003 ~]# ceph device get-health-metrics
> AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
> {
>     "20200629-001032": {
>         "dev": "/dev/sdg",
>         "error": "smartctl returned invalid JSON",

This error message is misleading; filed https://tracker.ceph.com/issues/46285 to fix.

>         "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 231",
>         "nvme_smart_health_information_add_log_error_code": -22,

These nvme errors are due to the nvme CLI command not supporting AVAGO disks. They are non-fatal, though: the lack of vendor-specific information is just ignored by the disk prediction module.

>         "nvme_vendor": "avago"
>     },
>     "20200630-000823": {
>         "dev": "/dev/sdg",
>         "error": "smartctl returned invalid JSON",
>         "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 231",
>         "nvme_smart_health_information_add_log_error_code": -22,
>         "nvme_vendor": "avago"
>     }
> }
>
>
> smartctl output -
> [root@mero007 ~]# smartctl -a --json "/dev/sdg"
> {
>   "json_format_version": [
>     1,
>     0
>   ],
>   "smartctl": {
>     "version": [
>       7,
>       0
>     ],
>     "svn_revision": "4883",
>     "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
>     "build_info": "(local build)",
>     "argv": [
>       "smartctl",
>       "-a",
>       "--json",
>       "/dev/sdg"
>     ],
>     "exit_status": 4

Exit status 4 for smartctl means "Bit 2: Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure".

Is there another type of disk you can try this on? It seems these disks do not support SMART reporting.
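The smartctl exit status is a bitmask, so a value of 4 means only bit 2 is set. As an illustration, here is a small Python sketch that decodes any exit status into the conditions listed in the EXIT STATUS section of the smartctl(8) man page. The function name decode_smartctl_exit_status is hypothetical and the descriptions are paraphrased from the man page; this is not code from Ceph or smartmontools.

```python
# Bit meanings paraphrased from the smartctl(8) man page, "EXIT STATUS".
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "some SMART or other ATA command to the disk failed, "
       "or checksum error in a SMART data structure",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes found at or below threshold",
    5: "SMART status is 'DISK OK' but some attributes were at or below "
       "threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_exit_status(status):
    """Return the descriptions of every bit that is set in `status`."""
    return [
        desc
        for bit, desc in sorted(SMARTCTL_EXIT_BITS.items())
        if status & (1 << bit)
    ]


# Exit status reported for /dev/sdg in this bug.
for reason in decode_smartctl_exit_status(4):
    print("bit set:", reason)
```

A non-zero status does not by itself mean the JSON is unusable: in the output quoted above, smartctl still emitted a complete JSON document alongside exit_status 4.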
Created attachment 1699430 [details] smartctl output for AVAGO drive - Not supporting smart format
Created attachment 1699431 [details] smartctl output for Seagate drive - Supporting smart format
Created attachment 1699432 [details] smartctl output for Micron drive - Supporting smart format
Hi Josh,
Attached JSON output for the following drives:
AVAGO - not supporting SMART format
Seagate - SMART status "pass"
Micron - SMART status "pass"

When I reran with the fix, the output for the AVAGO drive was in the long format (see the 20200701-000912 record).

{
    "20200630-000823": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago"
    },
    "20200701-000912": {
        "device": {
            "info_name": "/dev/sdg",
            "name": "/dev/sdg",
            "protocol": "SCSI",
            "type": "scsi"
        },
        "device_type": {
            "name": "disk",
            "scsi_value": 0
        },
        "json_format_version": [
            1,
            0
        ],
        "local_time": {
            "asctime": "Wed Jul 1 00:07:00 2020 UTC",
            "time_t": 1593562020
        },
        "logical_block_size": 512,
        "model_name": "AVAGO SMC3108",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "avago",
        "physical_block_size": 4096,
        "product": "SMC3108",
        "revision": "4.68",
        "scsi_version": "SPC-3",
        "serial_number": "00f11da416efd1fc2200d36c23800403",
        "smartctl": {
            "argv": [
                "smartctl",
                "-a",
                "--json",
                "/dev/sdg"
            ],
            "build_info": "(local build)",
            "exit_status": 4,
            "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
            "svn_revision": "4883",
            "version": [
                7,
                0
            ]
        },
        "temperature": {
            "current": 0,
            "drive_trip": 0
        },
        "user_capacity": {
            "blocks": 15626993664,
            "bytes": 8001020755968
        },
        "vendor": "AVAGO"
    }
}
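To compare records before and after the fix at a glance, the get-health-metrics output can be summarized per timestamp. The following Python sketch assumes only the record shape visible in this bug (an "error" key on failed scrapes, a "smartctl" object with "exit_status" on successful ones); the function name summarize_health_metrics is hypothetical and SAMPLE is a trimmed copy of the output above, not a complete record.

```python
import json


def summarize_health_metrics(raw):
    """Map each scrape timestamp to a one-line status string."""
    records = json.loads(raw)
    summary = {}
    for stamp, rec in sorted(records.items()):
        if "error" in rec:
            # Scrape failed: ceph stored an error string instead of metrics.
            summary[stamp] = "error: " + rec["error"]
        else:
            # Scrape succeeded: report the exit status smartctl recorded.
            status = rec.get("smartctl", {}).get("exit_status")
            summary[stamp] = "smartctl exit_status=%s" % status
    return summary


# Trimmed from the two records in this comment (most fields omitted).
SAMPLE = """
{
    "20200630-000823": {
        "dev": "/dev/sdg",
        "error": "smartctl returned invalid JSON",
        "nvme_vendor": "avago"
    },
    "20200701-000912": {
        "device": {"name": "/dev/sdg", "type": "scsi"},
        "smartctl": {"exit_status": 4},
        "nvme_vendor": "avago"
    }
}
"""

for stamp, status in summarize_health_metrics(SAMPLE).items():
    print(stamp, status)
```

On a live cluster, the raw argument would be the stdout of ceph device get-health-metrics <Device-ID>.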
Verifying this BZ: the JSON error is specific to the AVAGO drive, which does not support SMART reporting. For the other devices, the health metrics are returned as proper JSON output.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3003