Hi Adam,

as it's an encoding problem, a wacky/hacky way might be to wrap the read in a try/except and, on a decoding error, treat the string as Latin-1 encoded and decode it again with that encoding. Or maybe it would be cleaner to use the `errors` parameter of `.decode()`:

```
>>> b"\xb7".decode(errors="replace")
'�'
>>>
```

This would not put the actual character into the string, but for the WWID, would that actually matter? With this, we might be able to stop worrying about strange characters in this field.

Best regards
Raimund
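The two options above can be sketched in Python as follows. This is a minimal illustration, not cephadm code; the function names `decode_with_latin1_fallback` and `decode_with_replace` are hypothetical.

```python
def decode_with_latin1_fallback(raw: bytes) -> str:
    """Try UTF-8 first; on failure, reinterpret the bytes as Latin-1.

    Latin-1 maps every byte value 0x00-0xFF to a code point, so the
    fallback decode can never raise and keeps one character per byte.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def decode_with_replace(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for each invalid byte."""
    return raw.decode("utf-8", errors="replace")
```

For a hypothetical input `b"id.\xb7abc"`, the first function returns `'id.·abc'` and the second returns `'id.\ufffdabc'` (displayed as `id.�abc`). The Latin-1 fallback preserves every byte, but the resulting characters are only meaningful if the data really was Latin-1; `errors="replace"` loses the offending byte while keeping the rest of the string readable.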
Reading this bit of the code:

```
def read_file(path_list, file_name=''):
    # type: (List[str], str) -> str
    """Returns the content of the first file found within the `path_list`

    :param path_list: list of file paths to search
    :param file_name: optional file_name to be applied to a file path
    :returns: content of the file or 'Unknown'
    """
    for path in path_list:
        if file_name:
            file_path = os.path.join(path, file_name)
        else:
            file_path = path
        if os.path.exists(file_path):
            with open(file_path, 'r') as f:
                try:
                    content = f.read().strip()
                except OSError:
                    # sysfs may populate the file, but for devices like
                    # virtio reads can fail
                    return 'Unknown'
                else:
                    return content
    return 'Unknown'
```

I think we could either open the file with `errors='replace'`:

```
with open(file_path, 'r', errors='replace') as f:
```

or add a `UnicodeDecodeError` handler alongside the existing `OSError` one and return 'Unknown' in that case as well. I'm not sure which would be preferable. I'll ask the CU to provide the content of the offending file so you can test it.

Thank you
Raimund
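A minimal sketch of the second option, based on the `read_file` quoted above: `UnicodeDecodeError` is caught together with `OSError`, so undecodable content yields 'Unknown'. It also pins the encoding with `encoding='utf-8'`; that is an assumption made for reproducibility, since the quoted code relies on the locale's default encoding.

```python
import os
from typing import List


def read_file(path_list, file_name=''):
    # type: (List[str], str) -> str
    """Return the stripped content of the first file found in `path_list`,
    or 'Unknown' if no file exists, the read fails, or it cannot be decoded.
    """
    for path in path_list:
        file_path = os.path.join(path, file_name) if file_name else path
        if os.path.exists(file_path):
            # encoding pinned to UTF-8 for this sketch (assumption)
            with open(file_path, 'r', encoding='utf-8') as f:
                try:
                    return f.read().strip()
                except (OSError, UnicodeDecodeError):
                    # OSError: sysfs reads can fail for devices like virtio
                    # UnicodeDecodeError: content is not valid UTF-8
                    return 'Unknown'
    return 'Unknown'
```

The trade-off is that the WWID disappears entirely from the output, whereas `errors='replace'` would keep most of it visible.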
Or, instead of returning `Unknown`, we could also return `UnicodeDecodeError`, which would then be visible in the output and at least hints that a WWID exists but can't be read. So it's not clear to me which of these is better: `Unknown`, `UnicodeDecodeError`, or `id.23403867855�nd` :-) I think I would prefer the third one: we still see the WWID, even if one character is replaced.

One could also limit the scope of the change, in case replacing characters might interfere with other parts of the code that read files:

```
def read_file(path_list, file_name='', errors=None):  # errors=None is the same as 'strict'
    [...]
            with open(file_path, 'r', errors=errors) as f:
    [...]
```

and then pass `errors='replace'` when reading these vendor files:

```
def _dev_list(self, dev_list):
    # type: (List[str]) -> List[Dict[str, object]]
    """Return a 'pretty' name list for each device in the `dev_list`"""
    disk_list = list()

    for dev in dev_list:
        disk_model = read_file(['/sys/block/{}/device/model'.format(dev)], errors='replace').strip()
        disk_rev = read_file(['/sys/block/{}/device/rev'.format(dev)], errors='replace').strip()
        disk_wwid = read_file(['/sys/block/{}/device/wwid'.format(dev)], errors='replace').strip()
        vendor = read_file(['/sys/block/{}/device/vendor'.format(dev)], errors='replace').strip()
```

Just an idea. As soon as the CU has uploaded the files from the affected hosts, I'll make them available.

BR
Raimund
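A sketch of the scoped variant, assuming the new `errors` keyword is simply forwarded to `open()`. The helper `read_wwid` and its `sysfs_root` parameter are hypothetical, added only so the example can be exercised against a temporary directory instead of `/sys/block`; the encoding is again pinned to UTF-8 as an assumption.

```python
import os
from typing import List, Optional


def read_file(path_list, file_name='', errors=None):
    # type: (List[str], str, Optional[str]) -> str
    """Return the stripped content of the first file found in `path_list`.

    `errors` is forwarded to open(); None behaves like 'strict', so callers
    that do not pass it keep the current behaviour.
    """
    for path in path_list:
        file_path = os.path.join(path, file_name) if file_name else path
        if os.path.exists(file_path):
            # encoding pinned to UTF-8 for this sketch (assumption)
            with open(file_path, 'r', encoding='utf-8', errors=errors) as f:
                try:
                    return f.read().strip()
                except OSError:
                    # sysfs may populate the file, but reads can fail
                    return 'Unknown'
    return 'Unknown'


def read_wwid(dev, sysfs_root='/sys/block'):
    # type: (str, str) -> str
    """Hypothetical helper: read one device's WWID, replacing bad bytes."""
    wwid_path = os.path.join(sysfs_root, dev, 'device', 'wwid')
    return read_file([wwid_path], errors='replace')
```

With a WWID file containing the bytes `id.23403867855\xb7nd`, `read_wwid` returns `id.23403867855�nd`, while a plain `read_file` call on the same path still raises `UnicodeDecodeError`, so only the opted-in reads change behaviour.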
Hello Adam,

do we have any news about this? We have provided a tarball with the files and the file output. Did you get a chance to look it over?

Thank you
Raimund
Hi Adam,

we are only talking about these two lines, right?

```
with open(file_path, 'rb') as f:
    try:
        content = f.read().decode('utf-8', 'ignore').strip()
```

I think I can just take the cephadm binary from a cluster with the same version, modify those two lines, and have the CU execute it locally on one of the servers. Or would we need to get the rest of the source from the GitHub repository and build it?

BR
Raimund
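For context, this is how the two changed lines would sit inside the quoted `read_file`. It is a sketch of the proposed patch, not necessarily the fix that was eventually shipped. Note that `'ignore'` drops invalid bytes silently, whereas `'replace'` would leave a `�` marker in their place.

```python
import os
from typing import List


def read_file(path_list, file_name=''):
    # type: (List[str], str) -> str
    """Returns the content of the first file found within the `path_list`.

    Patched sketch: the file is read as raw bytes and decoded as UTF-8,
    silently dropping any bytes that are not valid UTF-8.
    """
    for path in path_list:
        if file_name:
            file_path = os.path.join(path, file_name)
        else:
            file_path = path
        if os.path.exists(file_path):
            with open(file_path, 'rb') as f:  # changed: binary mode
                try:
                    # changed: explicit decode that cannot raise on bad bytes
                    content = f.read().decode('utf-8', 'ignore').strip()
                except OSError:
                    # sysfs may populate the file, but for devices like
                    # virtio reads can fail
                    return 'Unknown'
                else:
                    return content
    return 'Unknown'
```

Because the decoding no longer depends on the locale and cannot raise, the only remaining failure path is the existing `OSError` handler.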
Hi Adam,

FYI, I asked the CU to reproduce with the two commands on the host. I'll keep you updated.

BR
Raimund
Just as an FYI, executing the commands manually on the node produces the same errors. I modified cephadm and sent it to the customer to test; I will paste the results when the CU comes back.

BR
Hello Adam,

the CU ran the commands with my modified binary. Unfortunately, they ran the `cephadm ceph-volume` command with an error in the parameters, so it failed; I asked them to retry with the correct parameters. However, gather-facts now completes correctly:

```
[root@ceph-dashboard-test ~]# python3 ./cephadm.testbuild gather-facts
{
  "arch": "x86_64",
  "bios_date": "12/03/2020",
  "bios_version": "Hyper-V UEFI Release v4.1",
  "cpu_cores": 2,
  "cpu_count": 1,
  "cpu_load": {
    "15min": 0.09,
    "1min": 0.19,
    "5min": 0.23
  },
  "cpu_model": "Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz",
  "cpu_threads": 4,
  "flash_capacity": "0.0",
  "flash_capacity_bytes": 0,
  "flash_count": 0,
  "flash_list": [],
  "hdd_capacity": "85.9GB",
  "hdd_capacity_bytes": 85899345920,
  "hdd_count": 2,
  "hdd_list": [
    {
      "description": "Msft Virtual Disk (21.5GB)",
      "dev_name": "sdb",
      "disk_size_bytes": 21474836480,
      "model": "Virtual Disk",
      "rev": "1.0",
      "vendor": "Msft",
      "wwid": "t10.MSFT \\032qv\\025\\215Fh\\035\\236SL"
    },
    {
      "description": "Msft Virtual Disk (64.4GB)",
      "dev_name": "sda",
      "disk_size_bytes": 64424509440,
      "model": "Virtual Disk",
      "rev": "1.0",
      "vendor": "Msft",
      "wwid": "t10.MSFT &[jP\\200GO+eE\\222"
    }
  ],
  "hostname": "ceph-dashboard-test",
  "interfaces": {
    "eth0": {
      "driver": "hv_netvsc",
      "iftype": "physical",
      "ipv4_address": "10.215.44.99/25",
      "ipv6_address": "fe80::215:5dff:fea0:d50c/64",
      "lower_devs_list": [],
      "mtu": 1500,
      "nic_type": "ethernet",
      "operstate": "up",
      "speed": 20000,
      "upper_devs_list": []
    },
    "lo": {
      "driver": "",
      "iftype": "logical",
      "ipv4_address": "127.0.0.1/8",
      "ipv6_address": "::1/128",
      "lower_devs_list": [],
      "mtu": 65536,
      "nic_type": "loopback",
      "operstate": "unknown",
      "speed": -1,
      "upper_devs_list": []
    }
  },
  "kernel": "4.18.0-477.15.1.el8_8.x86_64",
  "kernel_parameters": {
    "net.ipv4.ip_nonlocal_bind": "0"
  },
  "kernel_security": {
    "description": "SELinux: Enabled(enforcing, targeted)",
    "type": "SELinux"
  },
  "memory_available_kb": 13016520,
  "memory_free_kb": 1453236,
  "memory_total_kb": 16138760,
  "model": "Virtual Machine (Virtual Machine)",
  "nic_count": 1,
  "operating_system": "Red Hat Enterprise Linux 8.8 (Ootpa)",
  "selinux_enabled": true,
  "subscribed": "Yes",
  "system_uptime": 6150307.68,
  "tcp6_ports_used": [
    22,
    3000,
    44321,
    9093,
    9094,
    9095,
    9100
  ],
  "tcp_ports_used": [
    22,
    44321
  ],
  "timestamp": 1723141832.6658888,
  "udp6_ports_used": [
    323,
    9094
  ],
  "udp_ports_used": [
    323
  ],
  "vendor": "Microsoft Corporation"
}
```

I only changed the two lines (opening the file in binary mode and decoding it while ignoring errors). So this should fix the problem.

The only question I have now is how we can deliver this fix to the CU. I think we might get one last shot at a final 5.3 z-stream release, because I have another BZ open for scrub issues that got approved for it during yesterday's program call. So it might be good to get this in as well. If we could get confirmation that this could go into the next (and probably last) 5.3 z-stream, I'll ask the CU whether they would be fine upgrading once it's available. I'm pretty sure we would not need a hotfix for this ...

Thanks
BR
Raimund
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.3 security and bug fix updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2025:1478