Looking at the OSP15 image manually, thanks to Yatin's findings we now agree that this unique issue is likely regexp parsing "disk size: 1e+03 MiB" value: UC $ qemu-img info overcloud-full.qcow2 image: overcloud-full.qcow2 file format: qcow2 virtual size: 3.84 GiB (4118413312 bytes) disk size: 1e+03 MiB cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false UC $ ls -lha . -rw-r--r--. 1 stack stack 1000M Feb 21 23:48 overcloud-full.qcow2 For comparison, different image (OSP16) gives: UC $ qemu-img info 16/overcloud-full.qcow2 image: 16/overcloud-full.qcow2 file format: qcow2 virtual size: 3.84 GiB (4127129600 bytes) disk size: 0.977 GiB cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false UC $ ls -lha . 16/ -rw-r--r--. 1 stack stack 1001M Feb 13 17:02 overcloud-full.qcow2 qemu-img-4.1.0-23.module+el8.1.1+5467+ba2d821b.x86_64 python-oslo-utils-lang-3.40.5-0.20200113183052.566e14c.el8ost.noarch
ironic-conductor on UC reports: DEBUG ironic_lib.utils [req-741339a2-cccd-4ec6-9932-5d9e68af5d6f 6275ae0d0d904e868edeb26d56ae61d1 9c981643ec2f40578708640f060de538 - default default] Command stderr is: "" execute /usr/lib/python3.6/site-packages/ DEBUG oslo_concurrency.lockutils [req-741339a2-cccd-4ec6-9932-5d9e68af5d6f 6275ae0d0d904e868edeb26d56ae61d1 9c981643ec2f40578708640f060de538 - default default] Releasing lock "download-image" lock /usr/lib/python3 urrency/lockutils.py:281 ERROR ironic.conductor.utils [req-741339a2-cccd-4ec6-9932-5d9e68af5d6f 6275ae0d0d904e868edeb26d56ae61d1 9c981643ec2f40578708640f060de538 - default default] Node 95cdf531-ba45-444f-bc32-8d3acc35b624 failed deploy s iority': 100, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: Invalid string format: 1eB: ValueError: Invalid string format: 1eB ERROR ironic.conductor.utils Traceback (most recent call last): ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3848, in _do_next_deploy_step ERROR ironic.conductor.utils result = interface.execute_deploy_step(task, step) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 325, in execute_deploy_step ERROR ironic.conductor.utils return self._execute_step(task, step) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 248, in _execute_step ERROR ironic.conductor.utils return getattr(self, step['step'])(task) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped ERROR ironic.conductor.utils result = f(*args, **kwargs) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper ERROR ironic.conductor.utils return f(*args, **kwargs) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 422, in deploy ERROR ironic.conductor.utils deploy_utils.cache_instance_image(task.context, node) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped ERROR ironic.conductor.utils result = f(*args, **kwargs) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 1184, in cache_instance_image ERROR ironic.conductor.utils force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 521, in fetch_images ERROR ironic.conductor.utils cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 140, in fetch_image ERROR ironic.conductor.utils href, master_path, dest_path, ctx=ctx, force_raw=force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 167, in _download_image ERROR ironic.conductor.utils _fetch(ctx, href, tmp_path, force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 321, in _fetch ERROR ironic.conductor.utils required_space = images.converted_size(path_tmp) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 392, in converted_size ERROR ironic.conductor.utils data = disk_utils.qemu_img_info(path) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 337, in qemu_img_info ERROR ironic.conductor.utils return imageutils.QemuImgInfo(out) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 62, in __init__ ERROR ironic.conductor.utils details = self._parse(cmd_output or '') ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 178, in _parse ERROR ironic.conductor.utils details = self._extract_details(root, root_details, lines) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 127, in _extract_details ERROR ironic.conductor.utils real_details = self._extract_bytes(root_details) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 113, in _extract_bytes ERROR ironic.conductor.utils return_int=True) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_utils/strutils.py", line 232, in string_to_bytes ERROR ironic.conductor.utils raise ValueError(msg) ERROR ironic.conductor.utils ValueError: Invalid string format: 1eB
Perhaps we should abandon the regex parsing and do this: [root@rhel8 ~]# qemu-img info --output json overcloud-full.qcow2 { "virtual-size": 4118413312, "filename": "overcloud-full.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 1048117248, "format-specific": { "type": "qcow2", "data": { "compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false } }, "dirty-flag": false }
Hello, (In reply to Lon Hohberger from comment #4) > Perhaps we should abandon the regex parsing and do this: > > [root@rhel8 ~]# qemu-img info --output json overcloud-full.qcow2 > { > "virtual-size": 4118413312, > "filename": "overcloud-full.qcow2", > "cluster-size": 65536, > "format": "qcow2", > "actual-size": 1048117248, > "format-specific": { > "type": "qcow2", > "data": { > "compat": "1.1", > "lazy-refcounts": false, > "refcount-bits": 16, > "corrupt": false > } > }, > "dirty-flag": false > } It could be a good idea, mostly to standardize the way we handle things between versions and avoid to change our behaviors on oslo.utils on each version of qemu.
oslo.utils supports json format; why doesn't ironic-lib use it? diff --git a/ironic_lib/disk_utils.py b/ironic_lib/disk_utils.py index f4273c7..6c0b810 100644 --- a/ironic_lib/disk_utils.py +++ b/ironic_lib/disk_utils.py @@ -332,9 +332,9 @@ def qemu_img_info(path): return imageutils.QemuImgInfo() out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C', - 'qemu-img', 'info', path, + 'qemu-img', 'info', '--output=json', path, prlimit=QEMU_IMG_LIMITS) - return imageutils.QemuImgInfo(out) + return imageutils.QemuImgInfo(out, format=json) def convert_image(source, dest, out_format, run_as_root=False):
Hello, I published a patch [1] to fix oslo.utils by allowing us to handle this notation. Even if we decided to fix that via ironic by using json, I decided to spend some time on that side too. It's better to have more fixes than less. Also I reported a public bug [2] to track this issue upstream. I'll backport these changes and release them upstream on all the maintained stable branches when they will be merged. [1] https://review.opendev.org/#/c/709588/ [2] https://bugs.launchpad.net/oslo.utils/+bug/1864529
Ironic still doesn't seem to be handling nodes properly, nodes reach ERROR state during OC deployment: | fault | {'code': 500, 'created': '2020-02-26T11:30:52Z', 'message': 'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 6f1cefe5-074f-4960-bb44-179c1698e38c.', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 645, in build_instances\n raise exception.MaxRetriesExceeded(reason=msg)\nnova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 6f1cefe5-074f-4960-bb44-179c1698e38c.\n'} | | flavor | controller (e417a1e7-dc7b-4822-a805-97a78745e02 Tested on current OSP15 puddle (RHOS_TRUNK-15.0-RHEL-8-20200225.n.1), fix from https://bugzilla.redhat.com/show_bug.cgi?id=1806535#c7 doesn't seem to be in place (/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py). OC deployment fails with error: /var/log/containers/ironic/ironic-conductor.log ... DEBUG ironic_lib.utils [req-01fa91ae-344e-4803-9dd2-b7b361b2d039 ea9c9a4b899e41c193e5eac32c074f33 511b866a194141fcbf113cbfb10467c7 - default default] Command stderr is: "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 DEBUG oslo_concurrency.lockutils [req-01fa91ae-344e-4803-9dd2-b7b361b2d039 ea9c9a4b899e41c193e5eac32c074f33 511b866a194141fcbf113cbfb10467c7 - default default] Releasing lock "download-image" lock /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:281 ERROR ironic.conductor.utils [req-01fa91ae-344e-4803-9dd2-b7b361b2d039 ea9c9a4b899e41c193e5eac32c074f33 511b866a194141fcbf113cbfb10467c7 - default default] Unexpected error while preparing to deploy to node aa7c6d66-70c5-438a-9500-8e348807ba1c: NameError: name 'json' is not defined ERROR ironic.conductor.utils Traceback (most recent call last): ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3692, in do_node_deploy ERROR ironic.conductor.utils task.driver.deploy.prepare(task) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped ERROR ironic.conductor.utils result = f(*args, **kwargs) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper ERROR ironic.conductor.utils return f(*args, **kwargs) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 544, in prepare ERROR ironic.conductor.utils task.driver.boot.prepare_ramdisk(task, deploy_opts) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped ERROR ironic.conductor.utils result = f(*args, **kwargs) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/ipxe.py", line 190, in prepare_ramdisk ERROR ironic.conductor.utils pxe_utils.cache_ramdisk_kernel(task, pxe_info, ipxe_enabled=True) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/common/pxe_utils.py", line 980, in cache_ramdisk_kernel ERROR ironic.conductor.utils CONF.force_raw_images) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 521, in fetch_images ERROR ironic.conductor.utils cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 140, in fetch_image ERROR ironic.conductor.utils href, master_path, dest_path, ctx=ctx, force_raw=force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 167, in _download_image ERROR ironic.conductor.utils _fetch(ctx, href, tmp_path, force_raw) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 321, in _fetch ERROR ironic.conductor.utils required_space = images.converted_size(path_tmp) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 392, in converted_size ERROR ironic.conductor.utils data = disk_utils.qemu_img_info(path) ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 337, in qemu_img_info ERROR ironic.conductor.utils return imageutils.QemuImgInfo(out, format=json) ERROR ironic.conductor.utils NameError: name 'json' is not defined Can the cause here be related to changes made as part of this BZ? If it is not, we should likely track this separately.
Created attachment 1665924 [details] ironic-conductor.log
We're going to pull Herve's patch today and use that. My patch has a typo.
(In reply to Filip Hubík from comment #9) > Ironic still doesn't seem to be handling nodes properly, nodes reach ERROR > state during OC deployment: > > | fault | {'code': 500, 'created': > '2020-02-26T11:30:52Z', 'message': 'Exceeded maximum number of retries. > Exhausted all hosts available for retrying build failures for instance > 6f1cefe5-074f-4960-bb44-179c1698e38c.', 'details': 'Traceback (most recent > call last):\n File > "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 645, in > build_instances\n raise > exception.MaxRetriesExceeded(reason=msg)\nnova.exception.MaxRetriesExceeded: > Exceeded maximum number of retries. Exhausted all hosts available for > retrying build failures for instance > 6f1cefe5-074f-4960-bb44-179c1698e38c.\n'} | > | flavor | controller > (e417a1e7-dc7b-4822-a805-97a78745e02 > > Tested on current OSP15 puddle (RHOS_TRUNK-15.0-RHEL-8-20200225.n.1), fix > from https://bugzilla.redhat.com/show_bug.cgi?id=1806535#c7 doesn't seem to > be in place (/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py). > > OC deployment fails with error: > > /var/log/containers/ironic/ironic-conductor.log > ... > DEBUG ironic_lib.utils [req-01fa91ae-344e-4803-9dd2-b7b361b2d039 > ea9c9a4b899e41c193e5eac32c074f33 511b866a194141fcbf113cbfb10467c7 - default > default] Command stderr is: "" execute > /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 > DEBUG oslo_concurrency.lockutils [req-01fa91ae-344e-4803-9dd2-b7b361b2d039 > ea9c9a4b899e41c193e5eac32c074f33 511b866a194141fcbf113cbfb10467c7 - default > default] Releasing lock "download-image" lock > /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:281 > ERROR ironic.conductor.utils [req-01fa91ae-344e-4803-9dd2-b7b361b2d039 > ea9c9a4b899e41c193e5eac32c074f33 511b866a194141fcbf113cbfb10467c7 - default > default] Unexpected error while preparing to deploy to node > aa7c6d66-70c5-438a-9500-8e348807ba1c: NameError: name 'json' is not defined > ERROR ironic.conductor.utils Traceback (most recent call last): > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3692, > in do_node_deploy > ERROR ironic.conductor.utils task.driver.deploy.prepare(task) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped > ERROR ironic.conductor.utils result = f(*args, **kwargs) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line > 148, in wrapper > ERROR ironic.conductor.utils return f(*args, **kwargs) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", > line 544, in prepare > ERROR ironic.conductor.utils task.driver.boot.prepare_ramdisk(task, > deploy_opts) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped > ERROR ironic.conductor.utils result = f(*args, **kwargs) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/drivers/modules/ipxe.py", line 190, > in prepare_ramdisk > ERROR ironic.conductor.utils pxe_utils.cache_ramdisk_kernel(task, > pxe_info, ipxe_enabled=True) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/common/pxe_utils.py", line 980, in > cache_ramdisk_kernel > ERROR ironic.conductor.utils CONF.force_raw_images) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", > line 521, in fetch_images > ERROR ironic.conductor.utils cache.fetch_image(href, path, ctx=ctx, > force_raw=force_raw) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", > line 140, in fetch_image > ERROR ironic.conductor.utils href, master_path, dest_path, ctx=ctx, > force_raw=force_raw) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", > line 167, in _download_image > ERROR ironic.conductor.utils _fetch(ctx, href, tmp_path, force_raw) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", > line 321, in _fetch > ERROR ironic.conductor.utils required_space = > images.converted_size(path_tmp) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 392, in > converted_size > ERROR ironic.conductor.utils data = disk_utils.qemu_img_info(path) > ERROR ironic.conductor.utils File > "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 337, in > qemu_img_info > ERROR ironic.conductor.utils return imageutils.QemuImgInfo(out, > format=json) > ERROR ironic.conductor.utils NameError: name 'json' is not defined > > Can the cause here be related to changes made as part of this BZ? If it is > not, we should likely track this separately. It's not related to this issue, it's another topic where Ironic disk_utils badly handle given format. Your issue here is that quotes (' or ") are missing on `format` value. In other words json should surrounded by quotes and this is why you got this error, because with the current syntaxe the ironic disk_utils module ask for a var named `json` which doesn't exist. Please track this separately to avoid to merge different issues and components. This is an ironic issue and not an oslo.utils issue. Thanks
As explained by https://bugzilla.redhat.com/show_bug.cgi?id=1806535#c11 and also followup discussions, content that produced "NameError: name 'json' is not defined" was built just as a temporary and experimental one with custom fix (with intention to have at least workaround before the merge upstream), so we can discard this branch and only original issue remains. Clearing needinfos.
Verified , osp15 phase2 is passing RHOS_TRUNK-15.0-RHEL-8-20200227.n.0, with : var/log/rpm.list:python-oslo-utils-lang-3.40.5-0.20200113183053.566e14c.el8ost.noarch https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/phase2-15_director-rhel-8.1-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/26/
On puddle RHOS_TRUNK-15.0-RHEL-8-20200225.n.1, I can confirm also that fix https://review.opendev.org/#/c/709588 resolves the issue here, with https://bugzilla.redhat.com/show_bug.cgi?id=1806535#c6 reverted (both changes inside ironic-conductor container on controller).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0710