Description of problem:
While deploying two instances in the overcloud, qemu-img core dumped on both Ironic overcloud servers. Deleting the instances and redeploying later was successful.

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201021.n.0

ipxe-roms-qemu.noarch  20181214-5.git133f4c47.el8  @rhosp-rhel-8.2-appstream
libvirt-daemon-driver-qemu.x86_64  6.0.0-25.4.module+el8.2.1+8060+c0c58169  @rhosp-rhel-8.2-av
openstack-ironic-python-agent-builder.noarch  2.1.1-1.20200914175356.65d0f80.el8ost  @rhelosp-16.1
puppet-ironic.noarch  15.4.1-1.20200814153354.39f97cc.el8ost  @rhelosp-16.1
puppet-nova.noarch  15.6.1-1.20200814103355.51a6857.el8ost  @rhelosp-16.1
python3-ironic-inspector-client.noarch  3.7.1-0.20200522054325.3a41127.el8ost  @rhelosp-16.1
python3-ironicclient.noarch  3.1.2-0.20200522053422.1220d76.el8ost  @rhelosp-16.1
python3-novaclient.noarch  1:15.1.1-0.20200629073413.79959ab.el8ost  @rhelosp-16.1
qemu-guest-agent.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-img.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-block-curl.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-block-gluster.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-block-iscsi.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-block-rbd.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-block-ssh.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-common.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av
qemu-kvm-core.x86_64  15:4.2.0-29.module+el8.2.1+7990+27f1e480.4  @rhosp-rhel-8.2-av

Container images:
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-neutron-agent:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute-ironic:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-conductor:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-api:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-inspector:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-pxe:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:16.1_20201020.1
docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:16.1_20201020.1

How reproducible:
Consistent across multiple deploys.

Steps to Reproduce:
1. Deploy 3cont_2comp_3ceph_2ironrole_2ironic Ironic in self-contained overcloud nodes.
2. Create unique networking in the overcloud and deploy 2 instances on the Ironic nodes.

Actual results:
Instance deployment fails.

Expected results:
Instances deploy successfully.

Additional info:
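The failing command in the logs wraps qemu-img in oslo.concurrency's prlimit helper with `--as=1073741824`, i.e. a 1 GiB address-space cap on the child. The sketch below is a minimal, hypothetical illustration of that mechanism (it is not the Ironic code path): it applies the same RLIMIT_AS cap via `preexec_fn` and, instead of assuming qemu-img is installed, has the child report the limit it inherited. When qemu-img's allocations exceed such a cap, malloc fails and it aborts with "Failed to allocate memory", which is what both deploys hit.

```python
import resource
import subprocess

# The value from the logged command line: prlimit --as=1073741824 (1 GiB).
LIMIT_AS = 1073741824

def _cap_address_space():
    # Runs in the child between fork and exec, mirroring what the prlimit
    # wrapper does: cap the virtual address space before the command starts.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_AS, LIMIT_AS))

# Stand-in child process (hypothetical; the real wrapper exec's qemu-img):
# print the RLIMIT_AS soft limit actually inherited from preexec_fn.
out = subprocess.run(
    ["python3", "-c",
     "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"],
    preexec_fn=_cap_address_space, capture_output=True, text=True,
)
print(out.stdout.strip())  # 1073741824
```

Any allocation pattern that pushes the child's virtual size past this cap (for example a large qcow2 conversion working set) fails inside the child, not in the caller.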
+--------------------------------------+-----------+--------+----------+----------------+--------+
| ID                                   | Name      | Status | Networks | Image          | Flavor |
+--------------------------------------+-----------+--------+----------+----------------+--------+
| d311485a-709f-476b-8e4c-a5731147e611 | instance2 | ERROR  |          | overcloud-full |        |
| 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d | instance1 | ERROR  |          | overcloud-full |        |
+--------------------------------------+-----------+--------+----------+----------------+--------+

Scan for instance IDs on the controllers and Ironic overcloud nodes:

checking for d311485a-709f-476b-8e4c-a5731147e611 on controller at 192.168.24.47
Warning: Permanently added '192.168.24.47' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register
Last login: Thu Oct 29 16:13:37 2020 from 192.168.24.1
sudo grep -R "d311485a-709f-476b-8e4c-a5731147e611" /var/log/containers | grep ERROR

/var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:20.944 7 ERROR nova.scheduler.utils [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Error from last host: overcloud-controller-1.localdomain (node 7f0f5990-c495-4615-80be-69139da3f3fa): ['Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance\n block_device_info=block_device_info)\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn\n \'node\': node_uuid})\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise\n raise value\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn\n timer.start(interval=CONF.ironic.api_retry_interval).wait()\n', ' File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n', ' File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\n', ' File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop\n result = func(*self.args, **self.kw)\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active\n raise exception.InstanceDeployFailure(msg)\n', "nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance\n filter_properties, request_spec)\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', "nova.exception.RescheduledException: Build of instance d311485a-709f-476b-8e4c-a5731147e611 was re-scheduled: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n"]
/var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:20.948 7 WARNING nova.scheduler.utils [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Setting instance to ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance d311485a-709f-476b-8e4c-a5731147e611.

checking for d311485a-709f-476b-8e4c-a5731147e611 on controller at 192.168.24.48
Warning: Permanently added '192.168.24.48' (ECDSA) to the list of known hosts.
Last login: Thu Oct 29 16:13:39 2020 from 192.168.24.1
sudo grep -R "d311485a-709f-476b-8e4c-a5731147e611" /var/log/containers | grep ERROR

/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.016 7 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.016 7 ERROR oslo.service.loopingcall nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.virt.ironic.driver [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] Error deploying instance d311485a-709f-476b-8e4c-a5731147e611 on baremetal node 7f0f5990-c495-4615-80be-69139da3f3fa.: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Instance failed to spawn: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2663, in _build_resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     yield resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]
[heat-admin@overcloud-controller-1 ~]$ echo
[heat-admin@overcloud-controller-1 ~]$ exit

checking for 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d on controller at 192.168.24.47
Warning: Permanently added '192.168.24.47' (ECDSA) to the list of known hosts.
Last login: Thu Oct 29 16:14:18 2020 from 192.168.24.1
sudo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d" /var/log/containers | grep ERROR

/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.546 7 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.546 7 ERROR oslo.service.loopingcall nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.virt.ironic.driver [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] Error deploying instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d on baremetal node a87c1654-4577-4cc7-bae6-8c2ae2a18e2e.: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Instance failed to spawn: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2663, in _build_resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     yield resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]
[heat-admin@overcloud-controller-2 ~]$ echo
[heat-admin@overcloud-controller-2 ~]$ exit

checking for 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d on controller at 192.168.24.48
Warning: Permanently added '192.168.24.48' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/ To register this system, run: insights-client --register Last login: Thu Oct 29 16:14:20 2020 from 192.168.24.1 sudo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d" /var/log/containers |grep ERROR echo exit 45a7ced5d" /var/log/containers |grep ERRORo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b4 /var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:15.423 7 ERROR nova.scheduler.utils [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Error from last host: overcloud-controller-2.localdomain (node a87c1654-4577-4cc7-bae6-8c2ae2a18e2e): ['Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance\n block_device_info=block_device_info)\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn\n \'node\': node_uuid})\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise\n raise value\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn\n timer.start(interval=CONF.ironic.api_retry_interval).wait()\n', ' File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n', ' File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\n', ' File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop\n result = func(*self.args, **self.kw)\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 
558, in _wait_for_active\n raise exception.InstanceDeployFailure(msg)\n', "nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance\n filter_properties, request_spec)\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', "nova.exception.RescheduledException: Build of instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d was re-scheduled: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n"] /var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:15.810 7 WARNING nova.scheduler.utils [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Setting instance to ERROR state. 
[heat-admin@overcloud-controller-1 ~]$ exit

Also:

[heat-admin@overcloud-ironic-1 ~]$ sudo coredumpctl
TIME                          PID   UID   GID SIG COREFILE EXE
Wed 2020-10-28 21:34:34 UTC 30244 42422 42422   6 present  /usr/bin/qemu-img

[heat-admin@overcloud-ironic-0 ~]$ sudo coredumpctl
TIME                          PID   UID   GID SIG COREFILE EXE
Wed 2020-10-28 21:34:24 UTC 30214 42422 42422   6 present  /usr/bin/qemu-img

Unfortunately I cannot retrieve the coredumps at the moment.
Deleting the failed instances, refreshing the target "baremetal" nodes (manage, provide) and redeploying the instances (same parameters) later succeeds.
[heat-admin@overcloud-ironic-0 ~]$ sudo podman info host: BuildahVersion: 1.12.0-dev CgroupVersion: v1 Conmon: package: conmon-2.0.6-1.module+el8.2.0+7659+b700d80e.x86_64 path: /usr/bin/conmon version: 'conmon version 2.0.6, commit: 0ef43c583d5911a10ad68726fde31e63d98d773b' Distribution: distribution: '"rhel"' version: "8.2" MemFree: 8464867328 MemTotal: 16643993600 OCIRuntime: name: runc package: runc-1.0.0-64.rc10.module+el8.2.0+7659+b700d80e.x86_64 path: /usr/bin/runc version: 'runc version spec: 1.0.1-dev' SwapFree: 0 SwapTotal: 0 arch: amd64 cpus: 8 eventlogger: journald hostname: overcloud-ironic-0 kernel: 4.18.0-193.28.1.el8_2.x86_64 os: linux rootless: false uptime: 20h 14m 46.39s (Approximately 0.83 days) registries: blocked: null insecure: - undercloud-0.ctlplane.redhat.local:8787 search: - registry.access.redhat.com - registry.redhat.io - docker.io store: ConfigFile: /etc/containers/storage.conf ContainerStore: number: 11 GraphDriverName: overlay GraphOptions: {} GraphRoot: /var/lib/containers/storage GraphStatus: Backing Filesystem: xfs Native Overlay Diff: "true" Supports d_type: "true" Using metacopy: "false" ImageStore: number: 4 RunRoot: /var/run/containers/storage VolumePath: /var/lib/containers/storage/volumes [heat-admin@overcloud-ironic-1 ~]$ sudo podman info host: BuildahVersion: 1.12.0-dev CgroupVersion: v1 Conmon: package: conmon-2.0.6-1.module+el8.2.0+7659+b700d80e.x86_64 path: /usr/bin/conmon version: 'conmon version 2.0.6, commit: 0ef43c583d5911a10ad68726fde31e63d98d773b' Distribution: distribution: '"rhel"' version: "8.2" MemFree: 8350900224 MemTotal: 16643993600 OCIRuntime: name: runc package: runc-1.0.0-64.rc10.module+el8.2.0+7659+b700d80e.x86_64 path: /usr/bin/runc version: 'runc version spec: 1.0.1-dev' SwapFree: 0 SwapTotal: 0 arch: amd64 cpus: 8 eventlogger: journald hostname: overcloud-ironic-1 kernel: 4.18.0-193.28.1.el8_2.x86_64 os: linux rootless: false uptime: 20h 14m 40.9s (Approximately 0.83 days) registries: blocked: null 
insecure: - undercloud-0.ctlplane.redhat.local:8787 search: - registry.access.redhat.com - registry.redhat.io - docker.io store: ConfigFile: /etc/containers/storage.conf ContainerStore: number: 11 GraphDriverName: overlay GraphOptions: {} GraphRoot: /var/lib/containers/storage GraphStatus: Backing Filesystem: xfs Native Overlay Diff: "true" Supports d_type: "true" Using metacopy: "false" ImageStore: number: 4 RunRoot: /var/run/containers/storage VolumePath: /var/lib/containers/storage/volumes
> /var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
> /var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Exit code: -6
> /var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stdout: ''
> /var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'

Looks like qemu-img is unable to convert the image because there is not enough memory available to it. I wonder whether qemu-img attempts to fit the entire image into RAM, how big the image file is, and how much RAM is available on the host where this conversion attempt takes place.
Nova caps the amount of memory and CPU time that `qemu-img` can use, to protect against denial of service (DoS); this was added by DanPB back in 2016 as part of a CVE fix[1].

Talking to Dan on IRC, he makes a good point that either the cap is too small for this image, or the disk image is corrupt and is correctly triggering the 'ulimit' protection against DoS.

[1] https://opendev.org/openstack/nova/commit/068d851561ad — virt: set address space & CPU time limits when running qemu-img

So can the reporter get `qemu-img info` output for the image?
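The "Cannot allocate memory" here does not necessarily mean the host is out of RAM: the `--as=1073741824` flag applies an address-space rlimit to the child process, so any allocation that would push `qemu-img` past 1 GiB of virtual memory fails with ENOMEM regardless of free host memory. A minimal sketch of the mechanism (illustrative only, not taken from the bug; uses a plain Python subprocess in place of `qemu-img`, and assumes a Linux host where `RLIMIT_AS` is enforced):

```python
import subprocess
import sys

# 1 GiB, matching the --as=1073741824 cap seen on the failing command line.
LIMIT = 1 * 1024 * 1024 * 1024

# Child script: apply the same kind of address-space rlimit that
# oslo_concurrency.prlimit applies to qemu-img, then attempt an
# allocation larger than the cap allows.
child = f"""
import resource
resource.setrlimit(resource.RLIMIT_AS, ({LIMIT}, {LIMIT}))
try:
    buf = bytearray(2 * 1024 * 1024 * 1024)  # 2 GiB, well over the cap
    print("allocated")
except MemoryError:
    print("Cannot allocate memory")
"""

out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True)
print(out.stdout.strip())
```

The child fails its allocation even on a host with plenty of free RAM, which matches the `MemFree: 8464867328` seen in the `podman info` output below: the limit, not the host, is what bites.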
The issue is actually on the ironic conductor side which does the conversion. Following from the ironic conductor log: 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils [req-f387e258-8ac7-4491-90bb-5169aacc48b0 fc2a283a64264b1682c1da2508ee007d 6f1ce8efc14f431b962691dc1c39eddc - default default] Node b19c943a-295e-4ca9-a13b-04e41b509c67 failed deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: Unexpected error while running command. Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.part /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.converted Exit code: -6 Stdout: '' Stderr: 'Failed to allocate memory: Cannot allocate memory\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.part /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.converted Exit code: -6 Stdout: '' Stderr: 'Failed to allocate memory: Cannot allocate memory\n' 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Traceback (most recent call last): 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3951, in _do_next_deploy_step 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils result = interface.execute_deploy_step(task, step) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 360, in execute_deploy_step 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils return self._execute_step(task, step) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File 
"/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 283, in _execute_step 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils return getattr(self, step['step'])(task) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils result = f(*args, **kwargs) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils return f(*args, **kwargs) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 422, in deploy 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils deploy_utils.cache_instance_image(task.context, node) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils result = f(*args, **kwargs) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 1197, in cache_instance_image 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils force_raw) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 531, in fetch_images 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 140, in fetch_image 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils href, master_path, dest_path, ctx=ctx, force_raw=force_raw) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File 
"/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 167, in _download_image 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils _fetch(ctx, href, tmp_path, force_raw) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 324, in _fetch 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils images.image_to_raw(image_href, path, path_tmp) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 361, in image_to_raw 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils disk_utils.convert_image(path_tmp, staged, 'raw') 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 413, in convert_image 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils use_standard_locale=True) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/utils.py", line 99, in execute 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils result = processutils.execute(*cmd, **kwargs) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils cmd=sanitized_cmd) 2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. 
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.part /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.converted
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Exit code: -6
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Stdout: ''
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils

Ironic's maximum memory limit for image conversion is 1024 MB:

# grep image_convert_memory_limit /var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf
#image_convert_memory_limit = 1024
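If the cap really is too small for this image (as opposed to the image being corrupt), the conductor-side limit can be raised in ironic.conf. A sketch of the knob, assuming it lives in the `[DEFAULT]` group and is expressed in MiB; the value 2048 is illustrative, not a recommendation from this bug:

```ini
[DEFAULT]
# Memory ceiling (in MiB) that ironic-conductor passes to
# oslo_concurrency.prlimit when it runs `qemu-img convert`.
# The commented-out default above is 1024.
image_convert_memory_limit = 2048
```

Note that on an overcloud node this file is puppet-generated, so a hand edit would be overwritten on the next deploy; in practice the override would need to go through the deployment's config mechanism rather than directly into /var/lib/config-data.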
*** Bug 1893199 has been marked as a duplicate of this bug. ***
We are seeing what we think is the same or similar issues. I have an environment where this is reproduced now. ./nova-compute.log.1:2021-05-06 21:16:39.759 7 INFO nova.compute.claims [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Claim successful on node ceadc963-30a1-4597-88ef-84cf4713f543 ./nova-compute.log.1:2021-05-06 21:17:01.087 7 INFO nova.virt.ironic.driver [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Config drive for instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca on baremetal node ceadc963-30a1-4597-88ef-84cf4713f543 created. ./nova-compute.log.1:2021-05-06 21:17:32.015 7 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command. ./nova-compute.log.1:2021-05-06 21:17:32.015 7 ERROR oslo.service.loopingcall nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command. ./nova-compute.log.1:2021-05-06 21:17:32.017 7 ERROR nova.virt.ironic.driver [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Error deploying instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca on baremetal node ceadc963-30a1-4597-88ef-84cf4713f543.: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command. 
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Instance failed to spawn: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command. ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Traceback (most recent call last): ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2668, in _build_resources ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] yield resources ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] block_device_info=block_device_info) ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] 'node': node_uuid}) ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__ ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 
4df7207d-3d9a-4dc7-8ef3-6586787d75ca] self.force_reraise() ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] six.reraise(self.type_, self.value, self.tb) ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] raise value ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] timer.start(interval=CONF.ironic.api_retry_interval).wait() ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] result = hub.switch() ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] return self.greenlet.switch() ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File 
"/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] result = func(*self.args, **self.kw) ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] raise exception.InstanceDeployFailure(msg) ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command. ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Exit code: 1 ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stdout: '' ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stderr: 'qemu-img: error while reading sector 289536: Input/output error\n' ./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] ./nova-compute.log.1:2021-05-06 21:17:32.019 7 INFO nova.compute.manager 
[req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Terminating instance ./nova-compute.log.1:2021-05-06 21:18:18.794 7 INFO nova.virt.ironic.driver [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Successfully unprovisioned Ironic node ceadc963-30a1-4597-88ef-84cf4713f543 ./nova-compute.log.1:2021-05-06 21:18:18.795 7 INFO nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Took 46.76 seconds to destroy the instance on the hypervisor. ./nova-compute.log.1:2021-05-06 21:18:19.059 7 ERROR nova.scheduler.client.report [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [req-a0b8a683-cd25-458a-a1fd-e47f92790b2f] Failed to update inventory to [{'CUSTOM_BAREMETAL': {'total': 1, 'reserved': 1, 'min_unit': 1, 'max_unit': 1, 'step_size': 1, 'allocation_ratio': 1.0}}] for resource provider with UUID ceadc963-30a1-4597-88ef-84cf4713f543. Got 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n resource provider generation conflict ", "code": "placement.concurrent_update", "request_id": "req-a0b8a683-cd25-458a-a1fd-e47f92790b2f"}]}: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command. 
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Traceback (most recent call last):
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     block_device_info=block_device_info)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     'node': node_uuid})
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     self.force_reraise()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     six.reraise(self.type_, self.value, self.tb)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     raise value
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     result = hub.switch()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     return self.greenlet.switch()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     result = func(*self.args, **self.kw)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     raise exception.InstanceDeployFailure(msg)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Exit code: 1
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stdout: ''
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stderr: 'qemu-img: error while reading sector 289536: Input/output error\n'
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]
./nova-compute.log.1:2021-05-06 21:18:21.568 7 INFO nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Took 1.06 seconds to deallocate network for instance.
./nova-compute.log.1:2021-05-06 21:18:21.935 7 INFO nova.scheduler.client.report [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Deleted allocation for instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca
./nova-compute.log.1:2021-05-06 21:18:21.977 7 INFO nova.compute.manager [-] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] During sync_power_state the instance has a pending task (spawning). Skip.
./nova-conductor.log.1:2021-05-06 21:18:21.954 7 ERROR nova.scheduler.utils [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Error from last host: overcloud-controller-1.localdomain (node ceadc963-30a1-4597-88ef-84cf4713f543): ['Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n block_device_info=block_device_info)\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn\n \'node\': node_uuid})\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise\n raise value\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn\n timer.start(interval=CONF.ironic.api_retry_interval).wait()\n', ' File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n', ' File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\n', ' File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop\n result = func(*self.args, **self.kw)\n', ' File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active\n raise exception.InstanceDeployFailure(msg)\n', "nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted\nExit code: 1\nStdout: ''\nStderr: 'qemu-img: error while reading sector 289536: Input/output error\\n'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2168, in _do_build_and_run_instance\n filter_properties, request_spec)\n', ' File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2542, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', "nova.exception.RescheduledException: Build of instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca was re-scheduled: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted\nExit code: 1\nStdout: ''\nStderr: 'qemu-img: error while reading sector 289536: Input/output error\\n'\n"]
./nova-conductor.log.1:2021-05-06 21:18:22.294 7 WARNING nova.scheduler.client.report [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Failed to save allocation for 4df7207d-3d9a-4dc7-8ef3-6586787d75ca. Got HTTP 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'CUSTOM_BAREMETAL' on resource provider '4d94494b-4842-4723-8f32-7b7f011f6140'. The requested amount would exceed the capacity. ", "code": "placement.undefined_code", "request_id": "req-35eaa6eb-4570-4ab1-a169-c84a523e0b83"}]}
./nova-conductor.log.1:2021-05-06 21:18:22.295 7 WARNING nova.scheduler.utils [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca.
./nova-conductor.log.1:2021-05-06 21:18:22.295 7 WARNING nova.scheduler.utils [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Setting instance to ERROR state.
Please see last comment.
Here are the full conductor logs: http://shell.lab.bos.redhat.com/~pbabbar/sos-logs/qemu-img-conversion/sosreport-overcloud-controller-1-2021-05-07-gqfsnyw.tar.xz
Moving it back to ASSIGNED, as we still see this issue coming back intermittently.
I've reproduced it (once or twice) outside of IPA by running:

/usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw

Lowering the limit makes it 100% reproducible.

When running qemu-img directly I can see the address space used by the qemu-img process isn't consistent between runs. Address space usage measured by summing the size of everything in smaps:

#!/bin/bash -e
qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
#qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw &
QEMU_PID=$!
while [ -e /proc/${QEMU_PID} ]; do
    grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
done

This appears to vary wildly - hundreds of MB difference between runs. I assume this is normal due to malloc order differing with threading.

Is address space size even the right thing to limit? With mmap-ed files, shared libs, and the way malloc allocates buckets etc... this doesn't really translate to memory usage.

Is there any reason not to increase image_convert_memory_limit on the undercloud? IIRC the absolute minimum we expect (e.g. in CI/dev) is 8 GB RAM per node.
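The measurement the bash script above performs (summing the `Size:` fields of `/proc/<pid>/smaps`, which are reported in kB) can be sketched as a small Python helper. The function name `address_space_kb` is hypothetical, for illustration only:

```python
def address_space_kb(smaps_text: str) -> int:
    """Sum the Size: fields of a /proc/<pid>/smaps dump (values in kB),
    mirroring the grep/awk pipeline in the bash script above."""
    total = 0
    for line in smaps_text.splitlines():
        # smaps mapping entries look like: "Size:                  4 kB"
        if line.startswith("Size:"):
            total += int(line.split()[1])
    return total
```

Polling `/proc/<pid>/smaps` of a running qemu-img with this helper shows the same run-to-run fluctuation the script reports.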
@dtantsur WDYT? I see you've just bumped the limit on master: https://review.opendev.org/#/q/Ia1535b375d2cec467011c7456da84b5811ba48dd

Cheers, Ollie
All for it, but I no longer work on OSP, so you'll have to contact someone from the team.
(In reply to Ollie Walsh from comment #31)
> I've reproduced it (once or twice) outside of IPA by running:
> /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img
> convert -O raw overcloud-full.qcow2 overcloud-full.raw
>
> Lowering the limit makes it 100% reproducible.
>
> When running qemu-img directly I can see the address space used by the
> qemu-img process isn't consistent between runs. Address space usage measured
> by summing the size of everything in smaps:
>
> #!/bin/bash -e
> qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
> #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw &
> QEMU_PID=$!
> while [ -e /proc/${QEMU_PID} ]; do
>     grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
> done
>
> This appears to vary wildly - hundreds of MB difference between runs. I
> assume this is normal due to malloc order differing with threading.

Or maybe this is just due to ASLR.

Also FWIW the qcow2 image size doesn't appear to be significant. Address space usage ranges from 500mb to 1.4gb when converting the cirros image.
(In reply to Ollie Walsh from comment #34)
> (In reply to Ollie Walsh from comment #31)
> > I've reproduced it (once or twice) outside of IPA by running:
> > /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img
> > convert -O raw overcloud-full.qcow2 overcloud-full.raw
> >
> > Lowering the limit makes it 100% reproducible.
> >
> > When running qemu-img directly I can see the address space used by the
> > qemu-img process isn't consistent between runs. Address space usage measured
> > by summing the size of everything in smaps:
> >
> > #!/bin/bash -e
> > qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
> > #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw &
> > QEMU_PID=$!
> > while [ -e /proc/${QEMU_PID} ]; do
> >     grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
> > done
> >
> > This appears to vary wildly - hundreds of MB difference between runs. I
> > assume this is normal due to malloc order differing with threading.
>
> Or maybe this is just due to ASLR.

I hadn't thought of that, which makes me wonder, combined with any memory pressure and possibly even much larger images, if qemu-img convert would even be possible in a hypervisor case, or a case of an administrator needing to convert the disk images around. :\ Our artificial memory pressure through the ulimit just hits the malloc headache harder because we're functionally trying to keep the process from OOM'ing the machine. :\

> Also FWIW the qcow2 image size doesn't appear to be significant. Address
> space usage ranging from 500mb to 1.4gb when converting the cirros image.

That is..... wow. When I read through the qemu-img source, it didn't *seem* like it would result in anything like that, but that functionally means all memory bounding will need to be ripped out.
(In reply to Julia Kreger from comment #35)
> (In reply to Ollie Walsh from comment #34)
> > (In reply to Ollie Walsh from comment #31)
> > > I've reproduced it (once or twice) outside of IPA by running:
> > > /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img
> > > convert -O raw overcloud-full.qcow2 overcloud-full.raw
> > >
> > > Lowering the limit makes it 100% reproducible.
> > >
> > > When running qemu-img directly I can see the address space used by the
> > > qemu-img process isn't consistent between runs. Address space usage measured
> > > by summing the size of everything in smaps:
> > >
> > > #!/bin/bash -e
> > > qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
> > > #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw &
> > > QEMU_PID=$!
> > > while [ -e /proc/${QEMU_PID} ]; do
> > >     grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
> > > done
> > >
> > > This appears to vary wildly - hundreds of MB difference between runs. I
> > > assume this is normal due to malloc order differing with threading.
> >
> > Or maybe this is just due to ASLR.
>
> I hadn't thought of that, which makes me wonder, combined with any memory
> pressure and possibly even much larger images,

I also tested with a qcow2 with 10gb of random data and didn't notice a major difference.

> if qemu-img convert would
> even be possible in a hypervisor case, or a case of an administrator needing
> to convert the disk images around. :\ Our artificial memory pressure through
> the ulimit just hits the malloc headache harder because we're functionally
> trying to keep the process from OOM'ing the machine. :\
>
> > Also FWIW the qcow2 image size doesn't appear to be significant. Address
> > space usage ranging from 500mb to 1.4gb when converting the cirros image.
>
> That is..... wow. When I read through the qemu-img source, it didn't *seem*
> like it would result in anything like that, but that functionally means all
> memory bounding will need to be ripped out.

Been chatting about this on #virt. It does appear to be related to malloc and threading. Setting MALLOC_ARENA_MAX=1 results in ~250mb. Similarly, disabling threading with -m 1 results in ~250mb.

The default limit for MALLOC_ARENA_MAX is 8 * ncpus, so it's likely that the number of cpu cores is significant too.

I expect this could affect other projects setting similar limits, e.g. libguestfs, so it may ultimately need to be addressed in qemu-img. It may also need a workaround in the various places it is used in openstack.
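A minimal Python sketch of wiring up the two workarounds described above (single malloc arena via MALLOC_ARENA_MAX=1, and a single qemu-img coroutine via -m 1). The helper names are hypothetical and it assumes qemu-img is on PATH:

```python
import os
import subprocess


def arena_limited_env():
    """Copy of the current environment with glibc restricted to a single
    malloc arena, the MALLOC_ARENA_MAX=1 workaround described above."""
    env = dict(os.environ)
    env["MALLOC_ARENA_MAX"] = "1"
    return env


def convert_single_arena(src, dst):
    """Run qemu-img convert with both mitigations: one malloc arena and
    one conversion coroutine (-m 1), which reportedly keeps the address
    space near ~250mb instead of >1gb."""
    return subprocess.run(
        ["qemu-img", "convert", "-m", "1", "-O", "raw", src, dst],
        env=arena_limited_env(),
        check=True,
    )
```

Either mitigation alone reportedly brings the address space down; combining them is just belt-and-braces for this sketch.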
I'm working on patches for ironic at this time; however, more than one deployment interface is impacted by this, which will result in patches against multiple repositories and branches.
You need to maintain some limits (maybe higher than present). qemu-img is vulnerable to denial of service attacks when processing untrusted images, and these are easy to find and exploit.
That makes a lot of sense given how it sifts through the data to assemble the stream to write out. Good to know, thanks!
(In reply to Richard W.M. Jones from comment #38)
> You need to maintain some limits (maybe higher than present). qemu-img
> is vulnerable to denial of service attacks when processing untrusted
> images, and these are easy to find and exploit.

More info in https://bugs.launchpad.net/ossa/+bug/1449062
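For context on the limit being discussed: the failing command in the logs (`python3 -m oslo_concurrency.prlimit --as=N -- qemu-img ...`) boils down to capping the child's address space with RLIMIT_AS. A stdlib-only sketch of that mechanism (`run_with_as_limit` is a hypothetical name, not the oslo_concurrency API):

```python
import resource
import subprocess
import sys


def run_with_as_limit(cmd, limit_bytes):
    """Run cmd with its virtual address space capped via RLIMIT_AS,
    roughly what `python3 -m oslo_concurrency.prlimit --as=N -- cmd`
    does before exec'ing the real command."""
    def _cap():
        # Applied in the child between fork and exec; both the soft and
        # hard limits are set so the child cannot raise them again.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return subprocess.run(cmd, preexec_fn=_cap)
```

When the cap is smaller than qemu-img's (arena-inflated) address space, its allocations fail and the convert aborts, which is how the failures in the tracebacks above arise.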
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762