Bug 1892773 - Failed to allocate memory: Cannot allocate memory, qemu-img coredump creating instances in overcloud
Summary: Failed to allocate memory: Cannot allocate memory, qemu-img coredump creating instances in overcloud
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ironic-lib
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Julia Kreger
QA Contact: nlevinki
URL:
Whiteboard:
Duplicates: 1893199
Depends On:
Blocks:
 
Reported: 2020-10-29 15:59 UTC by Alistair Tonner
Modified: 2021-12-09 20:17 UTC
CC List: 13 users

Fixed In Version: python-ironic-lib-2.21.2-1.20210603223309.cc50dab.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:17:24 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 766815 0 None MERGED Fix other out of memory error for qemu-img convert 2021-02-19 19:23:05 UTC
OpenStack gerrit 793225 0 None NEW Limit qemu-img execution arenas 2021-05-27 00:27:18 UTC
OpenStack gerrit 793238 0 None NEW Limit the number of malloc arenas for qemu-img convert 2021-05-27 00:27:18 UTC
OpenStack gerrit 793351 0 None MERGED Limit the number of malloc arenas for qemu-img convert 2021-06-02 16:06:24 UTC
OpenStack gerrit 793931 0 None MERGED Increase the memory limit for qemu-img 2021-06-02 16:38:46 UTC
OpenStack gerrit 794034 0 None MERGED Limit qemu-img execution arenas 2021-06-02 16:38:46 UTC
Red Hat Issue Tracker OSP-1513 0 None None None 2021-11-18 11:29:00 UTC
Red Hat Knowledge Base (Solution) 6170232 0 None None None 2021-07-06 18:57:44 UTC
Red Hat Product Errata RHBA-2021:3762 0 None None None 2021-12-09 20:17:49 UTC

Description Alistair Tonner 2020-10-29 15:59:47 UTC
Description of problem:
  While deploying two instances in the overcloud, qemu-img coredumped on both Ironic overcloud servers.

   Deleting the instances and redeploying later was successful.


Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201021.n.0

ipxe-roms-qemu.noarch                             20181214-5.git133f4c47.el8                      @rhosp-rhel-8.2-appstream
libvirt-daemon-driver-qemu.x86_64                 6.0.0-25.4.module+el8.2.1+8060+c0c58169         @rhosp-rhel-8.2-av
openstack-ironic-python-agent-builder.noarch      2.1.1-1.20200914175356.65d0f80.el8ost           @rhelosp-16.1
puppet-ironic.noarch                              15.4.1-1.20200814153354.39f97cc.el8ost          @rhelosp-16.1
puppet-nova.noarch                                15.6.1-1.20200814103355.51a6857.el8ost          @rhelosp-16.1
python3-ironic-inspector-client.noarch            3.7.1-0.20200522054325.3a41127.el8ost           @rhelosp-16.1
python3-ironicclient.noarch                       3.1.2-0.20200522053422.1220d76.el8ost           @rhelosp-16.1
python3-novaclient.noarch                         1:15.1.1-0.20200629073413.79959ab.el8ost        @rhelosp-16.1
qemu-guest-agent.x86_64                           15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-img.x86_64                                   15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm.x86_64                                   15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-block-curl.x86_64                        15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-block-gluster.x86_64                     15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-block-iscsi.x86_64                       15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-block-rbd.x86_64                         15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-block-ssh.x86_64                         15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-common.x86_64                            15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av
qemu-kvm-core.x86_64                              15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av

| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-neutron-agent:16.1_20201020.1       |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.1_20201020.1             |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute-ironic:16.1_20201020.1        |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-conductor:16.1_20201020.1           |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20201020.1                   |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-api:16.1_20201020.1                 |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-inspector:16.1_20201020.1           |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.1_20201020.1             |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-pxe:16.1_20201020.1                 |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.1_20201020.1            |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:16.1_20201020.1               |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:16.1_20201020.1 


How reproducible:

  consistent across multiple deploys

Steps to Reproduce:
1. Deploy the 3cont_2comp_3ceph_2ironrole_2ironic topology, with Ironic in self-contained overcloud nodes.
2. Create unique networking in the overcloud and deploy 2 instances on the Ironic nodes (an example invocation is sketched after this list).
3.
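
   (For reference, the instance creation in step 2 is of the usual OSC form; this is only a sketch, with placeholder flavor/network names rather than the exact values used in this run. The image name is the one shown in the server list in comment 1.)

   # hypothetical example; flavor and network names are illustrative only
   openstack server create --flavor baremetal --image overcloud-full \
       --network tenant-net instance1
   openstack server create --flavor baremetal --image overcloud-full \
       --network tenant-net instance2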

Actual results:

instance deployment fails 

Expected results:

instances deploy successfully

Additional info:

Comment 1 Alistair Tonner 2020-10-29 16:20:18 UTC
+--------------------------------------+-----------+--------+----------+----------------+--------+
| ID                                   | Name      | Status | Networks | Image          | Flavor |
+--------------------------------------+-----------+--------+----------+----------------+--------+
| d311485a-709f-476b-8e4c-a5731147e611 | instance2 | ERROR  |          | overcloud-full |        |
| 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d | instance1 | ERROR  |          | overcloud-full |        |
+--------------------------------------+-----------+--------+----------+----------------+--------+

scan for instance ids on controllers and ironic overcloud nodes:
checking for d311485a-709f-476b-8e4c-a5731147e611 on controller at 192.168.24.47
Warning: Permanently added '192.168.24.47' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Oct 29 16:13:37 2020 from 192.168.24.1
sudo grep -R "d311485a-709f-476b-8e4c-a5731147e611" /var/log/containers |grep ERROR
echo
exit
31147e611" /var/log/containers |grep ERRORo grep -R "d311485a-709f-476b-8e4c-a573
/var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:20.944 7 ERROR nova.scheduler.utils [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Error from last host: overcloud-controller-1.localdomain (node 7f0f5990-c495-4615-80be-69139da3f3fa): ['Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance\n    block_device_info=block_device_info)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn\n    \'node\': node_uuid})\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn\n    timer.start(interval=CONF.ironic.api_retry_interval).wait()\n', '  File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n    result = hub.switch()\n', '  File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n    return self.greenlet.switch()\n', '  File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop\n    result = func(*self.args, **self.kw)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active\n    raise exception.InstanceDeployFailure(msg)\n', "nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', "nova.exception.RescheduledException: Build of instance d311485a-709f-476b-8e4c-a5731147e611 was re-scheduled: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n"]
/var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:20.948 7 WARNING nova.scheduler.utils [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Setting instance to ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance d311485a-709f-476b-8e4c-a5731147e611.
 


checking for d311485a-709f-476b-8e4c-a5731147e611 on controller at 192.168.24.48
Warning: Permanently added '192.168.24.48' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Oct 29 16:13:39 2020 from 192.168.24.1
sudo grep -R "d311485a-709f-476b-8e4c-a5731147e611" /var/log/containers |grep ERROR
echo
exit
31147e611" /var/log/containers |grep ERRORo grep -R "d311485a-709f-476b-8e4c-a573
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.016 7 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.016 7 ERROR oslo.service.loopingcall nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.virt.ironic.driver [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] Error deploying instance d311485a-709f-476b-8e4c-a5731147e611 on baremetal node 7f0f5990-c495-4615-80be-69139da3f3fa.: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Instance failed to spawn: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2663, in _build_resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     yield resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:36.018 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] 
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [req-4374ade3-be9c-4616-8d86-2dc14b4e54c7 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: d311485a-709f-476b-8e4c-a5731147e611] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] nova.exception.InstanceDeployFailure: Failed to provision instance d311485a-709f-476b-8e4c-a5731147e611: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmpttmuad6s/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:17.609 7 ERROR nova.compute.manager [instance: d311485a-709f-476b-8e4c-a5731147e611] 
[heat-admin@overcloud-controller-1 ~]$ echo
[heat-admin@overcloud-controller-1 ~]$ exit

checking for 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d on controller at 192.168.24.47
Warning: Permanently added '192.168.24.47' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Oct 29 16:14:18 2020 from 192.168.24.1
sudo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d" /var/log/containers |grep ERROR
echo
exit
45a7ced5d" /var/log/containers |grep ERRORo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b4
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.546 7 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.546 7 ERROR oslo.service.loopingcall nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.virt.ironic.driver [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] Error deploying instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d on baremetal node a87c1654-4577-4cc7-bae6-8c2ae2a18e2e.: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Instance failed to spawn: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2663, in _build_resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     yield resources
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] 
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     'node': node_uuid})
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise value
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = hub.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     return self.greenlet.switch()
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     result = func(*self.args, **self.kw)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d]     raise exception.InstanceDeployFailure(msg)
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:35:12.431 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] 
[heat-admin@overcloud-controller-2 ~]$ echo
[heat-admin@overcloud-controller-2 ~]$ exit
 
checking for 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d on controller at 192.168.24.48
Warning: Permanently added '192.168.24.48' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Oct 29 16:14:20 2020 from 192.168.24.1
sudo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d" /var/log/containers |grep ERROR
echo
exit
45a7ced5d" /var/log/containers |grep ERRORo grep -R "60fdfb6e-5fcc-4c91-bdf8-e5b4
/var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:15.423 7 ERROR nova.scheduler.utils [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Error from last host: overcloud-controller-2.localdomain (node a87c1654-4577-4cc7-bae6-8c2ae2a18e2e): ['Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance\n    block_device_info=block_device_info)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn\n    \'node\': node_uuid})\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn\n    timer.start(interval=CONF.ironic.api_retry_interval).wait()\n', '  File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n    result = hub.switch()\n', '  File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n    return self.greenlet.switch()\n', '  File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop\n    result = func(*self.args, **self.kw)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active\n    raise exception.InstanceDeployFailure(msg)\n', "nova.exception.InstanceDeployFailure: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', "nova.exception.RescheduledException: Build of instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d was re-scheduled: Failed to provision instance 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted\nExit code: -6\nStdout: ''\nStderr: 'Failed to allocate memory: Cannot allocate memory\\n'\n"]
/var/log/containers/nova/nova-conductor.log.1:2020-10-28 21:35:15.810 7 WARNING nova.scheduler.utils [req-64fb5b93-f83f-44be-ac9f-47158f7fa7f9 3b01a29d2ad6412797ee829c00981d16 dcabc5daf96e481dbc165d728d64ac7a - default default] [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Setting instance to ERROR state.
[heat-admin@overcloud-controller-1 ~]$ echo

[heat-admin@overcloud-controller-1 ~]$ exit





    Also:

[heat-admin@overcloud-ironic-1 ~]$ sudo coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2020-10-28 21:34:34 UTC   30244 42422 42422   6 present   /usr/bin/qemu-img
[heat-admin@overcloud-ironic-1 ~]$ echo

[heat-admin@overcloud-ironic-0 ~]$ sudo coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2020-10-28 21:34:24 UTC   30214 42422 42422   6 present   /usr/bin/qemu-img
[heat-admin@overcloud-ironic-0 ~]$ echo

   Unfortunately I cannot retrieve the coredumps at the moment.
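
   (A sketch of how the dumps could be pulled off the Ironic nodes later with coredumpctl; the PIDs are the ones shown in the listings above:)

   # inspect and extract the qemu-img core captured by systemd-coredump
   sudo coredumpctl info 30244
   sudo coredumpctl dump 30244 -o /tmp/qemu-img-30244.core
   # backtrace, if qemu-img debuginfo is installed
   sudo coredumpctl gdb 30244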

Comment 3 Alistair Tonner 2020-10-29 16:23:06 UTC
Deleting the failed instances, refreshing the target "baremetal" nodes (manage, provide) and redeploying the instances (same parameters) later succeeds.
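
In CLI terms the workaround amounts to roughly the following (instance names and node UUIDs are placeholders):

  # delete the failed instances
  openstack server delete instance1 instance2
  # recycle the baremetal nodes back to 'available'
  openstack baremetal node manage <node-uuid>
  openstack baremetal node provide <node-uuid>
  # then recreate the instances with the same parameters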

Comment 4 Alistair Tonner 2020-10-29 16:35:20 UTC
[heat-admin@overcloud-ironic-0 ~]$ sudo podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.2.0+7659+b700d80e.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: 0ef43c583d5911a10ad68726fde31e63d98d773b'
  Distribution:
    distribution: '"rhel"'
    version: "8.2"
  MemFree: 8464867328
  MemTotal: 16643993600
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc10.module+el8.2.0+7659+b700d80e.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: overcloud-ironic-0
  kernel: 4.18.0-193.28.1.el8_2.x86_64
  os: linux
  rootless: false
  uptime: 20h 14m 46.39s (Approximately 0.83 days)
registries:
  blocked: null
  insecure:
  - undercloud-0.ctlplane.redhat.local:8787
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 11
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 4
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes


[heat-admin@overcloud-ironic-1 ~]$ sudo podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.2.0+7659+b700d80e.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: 0ef43c583d5911a10ad68726fde31e63d98d773b'
  Distribution:
    distribution: '"rhel"'
    version: "8.2"
  MemFree: 8350900224
  MemTotal: 16643993600
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc10.module+el8.2.0+7659+b700d80e.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: overcloud-ironic-1
  kernel: 4.18.0-193.28.1.el8_2.x86_64
  os: linux
  rootless: false
  uptime: 20h 14m 40.9s (Approximately 0.83 days)
registries:
  blocked: null
  insecure:
  - undercloud-0.ctlplane.redhat.local:8787
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 11
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 4
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Comment 6 Artom Lifshitz 2020-11-04 23:15:11 UTC
> /var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.part /var/lib/ironic/master_images/tmprgc5z2k3/9a3fd303-51f2-42b2-8155-6f1baf3ff5c9.converted
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Exit code: -6
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stdout: ''
/var/log/containers/nova/nova-compute.log.1:2020-10-28 21:34:25.550 7 ERROR nova.compute.manager [instance: 60fdfb6e-5fcc-4c91-bdf8-e5b45a7ced5d] Stderr: 'Failed to allocate memory: Cannot allocate memory\n'


Looks like qemu-img is unable to convert the image because there's not enough RAM. I wonder whether qemu-img attempts to fit the entire image into RAM, how big the image file is, and how much RAM is available on the compute node where this conversion attempt takes place?
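
(To answer those questions, something like the following on the affected node would help; the master_images tmp directory from the failure is gone, so the path below is only a placeholder:)

  # image virtual size, on-disk size and format
  qemu-img info /var/lib/ironic/master_images/<image-uuid>.part
  # memory available on the node around conversion time
  free -m
  grep -i commit /proc/meminfo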

Comment 7 Kashyap Chamarthy 2020-11-09 13:12:55 UTC
Nova caps the amount of memory and CPU that 'qemu-img' can use to
protect against Denial of Service (DoS); it was fixed by DanPB back 
in 2016, as part of a CVE[1]

Talking to Dan on IRC, he makes a good point that either the cap is too
small for this image, or the disk image is corrupt and is correctly
triggering the 'ulimit' protection against DoS.


[1] https://opendev.org/openstack/nova/commit/068d851561ad
    — virt: set address space & CPU time limits when running qemu-img

        - - - 

So can the reporter get `qemu-img info` of the image?
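
For what it's worth, the failure should be reproducible in isolation by re-running the conversion under the same 1 GiB address-space cap that ironic applies (the wrapper invocation is copied from the logs above; the image paths are placeholders):

  # RLIMIT_AS capped at 1 GiB via oslo_concurrency.prlimit, as in the failing command
  /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- \
      qemu-img convert -O raw <image>.qcow2 <image>.raw
  # same conversion without the cap, for comparison
  qemu-img convert -O raw <image>.qcow2 <image>.raw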

Comment 14 Martin Schuppert 2020-12-03 15:05:22 UTC
The issue is actually on the ironic conductor side, which does the conversion. The following is from the ironic conductor log:

2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils [req-f387e258-8ac7-4491-90bb-5169aacc48b0 fc2a283a64264b1682c1da2508ee007d 6f1ce8efc14f431b962691dc1c39eddc - default default] Node b19c943a-295e-4ca9-a13b-04e41b509c67 failed deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: Unexpected error while running command.
Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.part /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.converted
Exit code: -6
Stdout: ''
Stderr: 'Failed to allocate memory: Cannot allocate memory\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.part /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.converted
Exit code: -6
Stdout: ''
Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Traceback (most recent call last):
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3951, in _do_next_deploy_step
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     result = interface.execute_deploy_step(task, step)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 360, in execute_deploy_step
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     return self._execute_step(task, step)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 283, in _execute_step
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     return getattr(self, step['step'])(task)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     result = f(*args, **kwargs)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     return f(*args, **kwargs)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 422, in deploy
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     deploy_utils.cache_instance_image(task.context, node)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     result = f(*args, **kwargs)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 1197, in cache_instance_image
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     force_raw)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 531, in fetch_images
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 140, in fetch_image
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     href, master_path, dest_path, ctx=ctx, force_raw=force_raw)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 167, in _download_image
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     _fetch(ctx, href, tmp_path, force_raw)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 324, in _fetch
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     images.image_to_raw(image_href, path, path_tmp)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 361, in image_to_raw
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     disk_utils.convert_image(path_tmp, staged, 'raw')
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 413, in convert_image
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     use_standard_locale=True)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/ironic_lib/utils.py", line 99, in execute
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     result = processutils.execute(*cmd, **kwargs)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils   File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils     cmd=sanitized_cmd)
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.part /var/lib/ironic/master_images/tmpmx49mjnx/a35feba0-9a05-426e-b591-ade31665cd8a.converted
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Exit code: -6
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Stdout: ''
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils Stderr: 'Failed to allocate memory: Cannot allocate memory\n'
2020-12-02 16:11:58.756 8 ERROR ironic.conductor.utils 

Ironic's default memory limit for image conversion is 1024 MB: 

# grep image_convert_memory_limit /var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf
#image_convert_memory_limit = 1024
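
As a stopgap the limit itself is tunable. A sketch of an override (the value is purely illustrative, and the config section is assumed to be the one holding the commented default above, believed to be [disk_utils]):

  # /var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf
  [disk_utils]
  image_convert_memory_limit = 2048

The ironic conductor container needs a restart for this to take effect. Note that the upstream fixes linked above take a different approach: they raise the qemu-img memory limit and cap the number of glibc malloc arenas used during 'qemu-img convert'.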

Comment 17 Steve Baker 2020-12-16 03:42:34 UTC
*** Bug 1893199 has been marked as a duplicate of this bug. ***

Comment 26 Jason Paroly 2021-05-07 14:34:11 UTC
We are seeing what we think is the same or a similar issue. I have an environment where this is reproduced now.

./nova-compute.log.1:2021-05-06 21:16:39.759 7 INFO nova.compute.claims [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Claim successful on node ceadc963-30a1-4597-88ef-84cf4713f543
./nova-compute.log.1:2021-05-06 21:17:01.087 7 INFO nova.virt.ironic.driver [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Config drive for instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca on baremetal node ceadc963-30a1-4597-88ef-84cf4713f543 created.
./nova-compute.log.1:2021-05-06 21:17:32.015 7 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:17:32.015 7 ERROR oslo.service.loopingcall nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:17:32.017 7 ERROR nova.virt.ironic.driver [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Error deploying instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca on baremetal node ceadc963-30a1-4597-88ef-84cf4713f543.: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Instance failed to spawn: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Traceback (most recent call last):
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2668, in _build_resources
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     yield resources
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     block_device_info=block_device_info)
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     'node': node_uuid})
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     self.force_reraise()
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     six.reraise(self.type_, self.value, self.tb)
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     raise value
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     result = hub.switch()
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     return self.greenlet.switch()
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     result = func(*self.args, **self.kw)
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     raise exception.InstanceDeployFailure(msg)
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Exit code: 1
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stdout: ''
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stderr: 'qemu-img: error while reading sector 289536: Input/output error\n'
./nova-compute.log.1:2021-05-06 21:17:32.018 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] 
./nova-compute.log.1:2021-05-06 21:17:32.019 7 INFO nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Terminating instance
./nova-compute.log.1:2021-05-06 21:18:18.794 7 INFO nova.virt.ironic.driver [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Successfully unprovisioned Ironic node ceadc963-30a1-4597-88ef-84cf4713f543
./nova-compute.log.1:2021-05-06 21:18:18.795 7 INFO nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Took 46.76 seconds to destroy the instance on the hypervisor.
./nova-compute.log.1:2021-05-06 21:18:19.059 7 ERROR nova.scheduler.client.report [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [req-a0b8a683-cd25-458a-a1fd-e47f92790b2f] Failed to update inventory to [{'CUSTOM_BAREMETAL': {'total': 1, 'reserved': 1, 'min_unit': 1, 'max_unit': 1, 'step_size': 1, 'allocation_ratio': 1.0}}] for resource provider with UUID ceadc963-30a1-4597-88ef-84cf4713f543.  Got 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n resource provider generation conflict  ", "code": "placement.concurrent_update", "request_id": "req-a0b8a683-cd25-458a-a1fd-e47f92790b2f"}]}: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Traceback (most recent call last):
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     block_device_info=block_device_info)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     'node': node_uuid})
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     self.force_reraise()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     six.reraise(self.type_, self.value, self.tb)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     raise value
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     result = hub.switch()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     return self.greenlet.switch()
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     result = func(*self.args, **self.kw)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]   File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca]     raise exception.InstanceDeployFailure(msg)
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Exit code: 1
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stdout: ''
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Stderr: 'qemu-img: error while reading sector 289536: Input/output error\n'
./nova-compute.log.1:2021-05-06 21:18:19.210 7 ERROR nova.compute.manager [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] 
./nova-compute.log.1:2021-05-06 21:18:21.568 7 INFO nova.compute.manager [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Took 1.06 seconds to deallocate network for instance.
./nova-compute.log.1:2021-05-06 21:18:21.935 7 INFO nova.scheduler.client.report [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Deleted allocation for instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca
./nova-compute.log.1:2021-05-06 21:18:21.977 7 INFO nova.compute.manager [-] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] During sync_power_state the instance has a pending task (spawning). Skip.
./nova-conductor.log.1:2021-05-06 21:18:21.954 7 ERROR nova.scheduler.utils [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Error from last host: overcloud-controller-1.localdomain (node ceadc963-30a1-4597-88ef-84cf4713f543): ['Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n    block_device_info=block_device_info)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1289, in spawn\n    \'node\': node_uuid})\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1281, in spawn\n    timer.start(interval=CONF.ironic.api_retry_interval).wait()\n', '  File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n    result = hub.switch()\n', '  File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n    return self.greenlet.switch()\n', '  File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop\n    result = func(*self.args, **self.kw)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 558, in _wait_for_active\n    raise exception.InstanceDeployFailure(msg)\n', "nova.exception.InstanceDeployFailure: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted\nExit code: 1\nStdout: ''\nStderr: 'qemu-img: error while reading sector 289536: Input/output error\\n'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2168, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2542, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', "nova.exception.RescheduledException: Build of instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca was re-scheduled: Failed to provision instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca: Failed to deploy. Exception: Unexpected error while running command.\nCommand: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.part /var/lib/ironic/master_images/tmpsahbszdq/3ba5f9cf-625d-4e0f-8285-5f64da979e33.converted\nExit code: 1\nStdout: ''\nStderr: 'qemu-img: error while reading sector 289536: Input/output error\\n'\n"]
./nova-conductor.log.1:2021-05-06 21:18:22.294 7 WARNING nova.scheduler.client.report [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Failed to save allocation for 4df7207d-3d9a-4dc7-8ef3-6586787d75ca. Got HTTP 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'CUSTOM_BAREMETAL' on resource provider '4d94494b-4842-4723-8f32-7b7f011f6140'. The requested amount would exceed the capacity.  ", "code": "placement.undefined_code", "request_id": "req-35eaa6eb-4570-4ab1-a169-c84a523e0b83"}]}
./nova-conductor.log.1:2021-05-06 21:18:22.295 7 WARNING nova.scheduler.utils [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 4df7207d-3d9a-4dc7-8ef3-6586787d75ca.
./nova-conductor.log.1:2021-05-06 21:18:22.295 7 WARNING nova.scheduler.utils [req-10551c5c-7d0c-483b-812b-3e8072db8ee9 200ecd17cc9d43bcb147d66320bbf97c 3bf9112e3ea5442ca8027f99227593e1 - default default] [instance: 4df7207d-3d9a-4dc7-8ef3-6586787d75ca] Setting instance to ERROR state.

Comment 27 Jason Paroly 2021-05-07 14:35:47 UTC
Please see the previous comment.

Comment 29 Paras Babbar 2021-05-07 17:45:05 UTC
Moving it back to ASSIGNED as we still see this issue occurring intermittently.

Comment 31 Ollie Walsh 2021-05-26 12:26:27 UTC
I've reproduced it (once or twice) outside of IPA by running:
/usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw

Lowering the limit makes it 100% reproducible.
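
For illustration, the same wrapper with a deliberately lower cap (the 512 MiB value is arbitrary, chosen only to trigger the failure reliably; the image names are reused from the command above):

    # prlimit --as takes the address-space cap in bytes
    /usr/bin/python3 -m oslo_concurrency.prlimit --as=$((512 * 1024 * 1024)) -- \
        qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw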

When running qemu-img directly I can see the address space used by the qemu-img process isn't consistent between runs. Address space usage measured by summing the size of everything in smaps:

    #!/bin/bash -e
    qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
    #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw &
    QEMU_PID=$!
    while [ -e /proc/${QEMU_PID} ]; do
      grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
    done

This appears to vary wildly, with differences of hundreds of MB between runs. I assume this is normal due to malloc ordering differing with threading.

Is address space size even the right thing to limit? With mmap-ed files, shared libs, and the way malloc allocates buckets etc., it doesn't really translate to actual memory usage.

Is there any reason not to increase image_convert_memory_limit on the undercloud? IIRC the absolute minimum we expect (e.g. in CI/dev) is 8 GB RAM per node.
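
One rough way to see how far address-space usage diverges from real memory use is to compare the smaps total above with the peak RSS reported by GNU time (assuming /usr/bin/time is installed; image names reused from the reproducer):

    # "Maximum resident set size" is typically far below the ~1 GiB address-space cap
    /usr/bin/time -v qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw \
        2>&1 | grep 'Maximum resident set size'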

Comment 32 Ollie Walsh 2021-05-26 13:34:00 UTC
@dtantsur WDYT? I see you've just bumped the limit on master https://review.opendev.org/#/q/Ia1535b375d2cec467011c7456da84b5811ba48dd

Cheers,
Ollie

Comment 33 Dmitry Tantsur 2021-05-26 13:44:14 UTC
All for it, but I no longer work on OSP, so you'll have to contact someone from the team.

Comment 34 Ollie Walsh 2021-05-26 15:11:50 UTC
(In reply to Ollie Walsh from comment #31)
> I've reproduced it (once or twice) outside of IPA by running:
> /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img
> convert -O raw overcloud-full.qcow2 overcloud-full.raw
> 
> Lowering the limit makes it 100% reproducible.
> 
> When running qemu-img directly I can see the address space used by the
> qemu-img process isn't consistent between runs. Address space usage measured
> by summing the size of everything in smaps:
> 
>     #!/bin/bash -e
>     qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
>     #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img
> cirros-0.5.2-x86_64-disk.raw &
>     QEMU_PID=$!
>     while [ -e /proc/${QEMU_PID} ]; do
>       grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
>     done
> 
> This appears to vary wildly - hundreds of MB difference between runs. I
> assume this is normal due to malloc order differing with threading.

Or maybe this is just due to ASLR.

Also, FWIW, the qcow2 image size doesn't appear to be significant: address space usage ranges from 500 MB to 1.4 GB when converting the cirros image.

Comment 35 Julia Kreger 2021-05-26 17:14:36 UTC
(In reply to Ollie Walsh from comment #34)
> (In reply to Ollie Walsh from comment #31)
> > I've reproduced it (once or twice) outside of IPA by running:
> > /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img
> > convert -O raw overcloud-full.qcow2 overcloud-full.raw
> > 
> > Lowering the limit makes it 100% reproducible.
> > 
> > When running qemu-img directly I can see the address space used by the
> > qemu-img process isn't consistent between runs. Address space usage measured
> > by summing the size of everything in smaps:
> > 
> >     #!/bin/bash -e
> >     qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
> >     #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img
> > cirros-0.5.2-x86_64-disk.raw &
> >     QEMU_PID=$!
> >     while [ -e /proc/${QEMU_PID} ]; do
> >       grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
> >     done
> > 
> > This appears to vary wildly - hundreds of MB difference between runs. I
> > assume this is normal due to malloc order differing with threading.
> 
> Or maybe this is just due to ASLR.
> 

I hadn't thought of that. Combined with any memory pressure, and possibly much larger images, it makes me wonder whether qemu-img convert would even be possible in a hypervisor case, or when an administrator needs to convert disk images around. :\ Our artificial memory pressure through the ulimit just hits the malloc headache harder, because we're functionally trying to keep the process from OOM'ing the machine. :\

> Also FWIW the qcow2 image size doesn't appear to be significant. Address
> space usage ranging from 500mb to 1.4gb when converting the cirros image.

That is..... wow. When I read through the qemu-img source, it didn't *seem* like it would result in anything like that, but that functionally means all memory bounding will need to be ripped out.

Comment 36 Ollie Walsh 2021-05-26 17:50:56 UTC
(In reply to Julia Kreger from comment #35)
> (In reply to Ollie Walsh from comment #34)
> > (In reply to Ollie Walsh from comment #31)
> > > I've reproduced it (once or twice) outside of IPA by running:
> > > /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img
> > > convert -O raw overcloud-full.qcow2 overcloud-full.raw
> > > 
> > > Lowering the limit makes it 100% reproducible.
> > > 
> > > When running qemu-img directly I can see the address space used by the
> > > qemu-img process isn't consistent between runs. Address space usage measured
> > > by summing the size of everything in smaps:
> > > 
> > >     #!/bin/bash -e
> > >     qemu-img convert -O raw overcloud-full.qcow2 overcloud-full.raw &
> > >     #qemu-img convert -O raw cirros-0.5.2-x86_64-disk.img
> > > cirros-0.5.2-x86_64-disk.raw &
> > >     QEMU_PID=$!
> > >     while [ -e /proc/${QEMU_PID} ]; do
> > >       grep ^Size: /proc/${QEMU_PID}/smaps | awk '{s+=$2} END { print s}'
> > >     done
> > > 
> > > This appears to vary wildly - hundreds of MB difference between runs. I
> > > assume this is normal due to malloc order differing with threading.
> > 
> > Or maybe this is just due to ASLR.
> > 
> 
> I hadn't thought of that, which makes me wonder, combined with any memory
> pressure and possibly even much larger images,

I also tested with a qcow2 containing 10 GB of random data and didn't notice a major difference.

> if qemu-img convert would
> even be possible in a hypervisor case, or a case of an administrator needing
> to convert the disk images around. :\ Our artificial memory pressure through
> the ulimit just hits the malloc headache harder because we're funcitonally
> trying to keep the process from OOM'ing the machine. :\
> 
> > Also FWIW the qcow2 image size doesn't appear to be significant. Address
> > space usage ranging from 500mb to 1.4gb when converting the cirros image.
> 
> That is..... wow. When I read through the qemu-img source, it didn't *seem*
> like it would result in anything like that, but that functionally means all
> memory bounding will need to be ripped out.

Been chatting about this on #virt.

It does appear to be related to malloc and threading.

Setting MALLOC_ARENA_MAX=1 results in ~250 MB. Similarly, disabling threading with -m 1 results in ~250 MB.

The default for MALLOC_ARENA_MAX is 8 * ncpus, so it's likely that the number of CPU cores is significant too.

I expect this could affect other projects setting similar limits (e.g. libguestfs), so it may ultimately need to be addressed in qemu-img. It may also need a workaround in the various places qemu-img is used in OpenStack.
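
A hedged sketch of the direction the gerrit changes above take: cap the number of malloc arenas for the qemu-img child (and optionally its coroutines via -m 1) while keeping an address-space limit as a backstop against untrusted images. The 2 GiB value here is illustrative only, not necessarily what the patches settle on:

    # A single malloc arena keeps address space close to actual usage; the
    # prlimit cap (illustrative 2 GiB) stays in place as a safety backstop
    MALLOC_ARENA_MAX=1 /usr/bin/python3 -m oslo_concurrency.prlimit \
        --as=$((2 * 1024 * 1024 * 1024)) -- \
        qemu-img convert -m 1 -O raw overcloud-full.qcow2 overcloud-full.raw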

Comment 37 Julia Kreger 2021-05-26 18:50:21 UTC
I'm working on patches for ironic at this time; however, more than one deployment interface is impacted, which will result in patches against multiple repositories and branches.

Comment 38 Richard W.M. Jones 2021-05-26 18:55:15 UTC
You need to maintain some limits (maybe higher than present).  qemu-img
is vulnerable to denial of service attacks when processing untrusted
images, and these are easy to find and exploit.

Comment 39 Julia Kreger 2021-05-26 19:00:39 UTC
That makes a lot of sense given how it sifts through the data to assemble the stream to write out. Good to know, thanks!

Comment 40 Ollie Walsh 2021-05-26 19:12:27 UTC
(In reply to Richard W.M. Jones from comment #38)
> You need to maintain some limits (maybe higher than present).  qemu-img
> is vulnerable to denial of service attacks when processing untrusted
> images, and these are easy to find and exploit.

More info in https://bugs.launchpad.net/ossa/+bug/1449062

Comment 54 errata-xmlrpc 2021-12-09 20:17:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

