This bug was initially created as a copy of Bug #1851797.

I am copying this bug because:

Description of problem:
Failed to create a VM with a watchdog by setting flavor metadata.

Version-Release number of selected component (if applicable):
openstack-nova-compute-20.2.1-0.20200528080027.1e95025.el8ost.noarch
libvirt-daemon-kvm-6.0.0-23.module+el8.2.1+6955+1e1fca42.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an image/volume in OSP 16.1, set the image/volume metadata hw_watchdog_action: pause, create flavor m2, and start the VM from the image/volume. The watchdog is created successfully; the XML is as below:

<watchdog model='i6300esb' action='pause'>
  <alias name='watchdog0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</watchdog>

2. Set the flavor m2 with hw_watchdog_action: pause and try to start the VM from the same image/volume with m2 on the same compute node. It fails with the message: "No valid host was found. There are not enough hosts available"

-----------------------------------------------------------------------
[heat-admin@overcloud-controller-0 ~]$ sudo tail -f /var/log/containers/nova/nova-scheduler.log
2020-06-29 02:24:10.513 47 INFO nova.filters [req-a8e4515d-44ef-4020-83d5-9888a5776b8b 317f1c1fe439476dbfb991ec7768d23f 63a5b71d92f74e0ba553e6acca2c747c - default default] Filter AggregateInstanceExtraSpecsFilter returned 0 hosts
2020-06-29 02:24:10.514 47 INFO nova.filters [req-a8e4515d-44ef-4020-83d5-9888a5776b8b 317f1c1fe439476dbfb991ec7768d23f 63a5b71d92f74e0ba553e6acca2c747c - default default] Filtering removed all hosts for the request with instance ID '554daf2f-88f1-4171-ad01-fc236958219d'.
Filter results: ['AvailabilityZoneFilter: (start: 4, end: 1)',
                 'ComputeFilter: (start: 1, end: 1)',
                 'ComputeCapabilitiesFilter: (start: 1, end: 1)',
                 'ImagePropertiesFilter: (start: 1, end: 1)',
                 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)',
                 'ServerGroupAffinityFilter: (start: 1, end: 1)',
                 'NUMATopologyFilter: (start: 1, end: 1)',
                 'AggregateInstanceExtraSpecsFilter: (start: 1, end: 0)']
------------------------------------------------------------------------

Actual results:
In step 2, starting the VM with the flavor metadata hw_watchdog_action: pause fails.

Expected results:
In step 2, the VM starts successfully with the flavor metadata hw_watchdog_action: pause.
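As an aside (not part of the original report): the scheduler's "Filter results:" line already pinpoints the culprit, since AggregateInstanceExtraSpecsFilter is the only filter whose host count drops to zero. A small hypothetical helper like this can pull that out of a log line automatically; the function name and log format assumptions are mine:

```python
import ast
import re


def eliminating_filters(log_line: str):
    """Return the scheduler filters whose host count dropped to zero
    in a nova-scheduler 'Filter results:' log line."""
    # Extract the Python-style list literal embedded in the log line.
    match = re.search(r"Filter results: (\[.*\])", log_line)
    results = ast.literal_eval(match.group(1))
    culprits = []
    for entry in results:
        # Each entry looks like "SomeFilter: (start: N, end: M)".
        name, counts = entry.split(": ", 1)
        m = re.search(r"start: (\d+), end: (\d+)", counts)
        start, end = int(m.group(1)), int(m.group(2))
        if start > 0 and end == 0:
            culprits.append(name)
    return culprits


line = ("Filter results: ['AvailabilityZoneFilter: (start: 4, end: 1)', "
        "'AggregateInstanceExtraSpecsFilter: (start: 1, end: 0)']")
print(eliminating_filters(line))  # ['AggregateInstanceExtraSpecsFilter']
```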
Verified on:
openstack-glance-19.0.4-1.20210713143305.5bbd356.el8ost.noarch

Filling in for Mike, cloning his verification steps from the original bz1851797.

1. Let's update an existing CirrOS image with the hw_watchdog_action parameter:

(overcloud) [stack@undercloud-0 ~]$ glance image-update b574e2c5-2625-42b6-9d91-700c71fa13fa --property hw_watchdog_action=pause
+----------------------------------+----------------------------------------------------------------------------------+
| Property                         | Value                                                                            |
+----------------------------------+----------------------------------------------------------------------------------+
| checksum                         | 443b7623e27ecf03dc9e01ee93f67afe                                                 |
| container_format                 | bare                                                                             |
| created_at                       | 2021-09-14T05:50:22Z                                                             |
| direct_url                       | swift+config://ref1/glance/b574e2c5-2625-42b6-9d91-700c71fa13fa                  |
| disk_format                      | qcow2                                                                            |
| hw_watchdog_action               | pause                                                                            |
| id                               | b574e2c5-2625-42b6-9d91-700c71fa13fa                                             |
| min_disk                         | 0                                                                                |
| min_ram                          | 0                                                                                |
| name                             | cirros                                                                           |
| os_hash_algo                     | sha512                                                                           |
| os_hash_value                    | 6513f21e44aa3da349f248188a44bc304a3653a04122d8fb4535423c8e1d14cd6a153f735bb0982e |
|                                  | 2161b5b5186106570c17a9e58b64dd39390617cd5a350f78                                 |
| os_hidden                        | False                                                                            |
| owner                            | 24dfce9076bc49aa99b0b67516db7b5f                                                 |
| owner_specified.openstack.md5    |                                                                                  |
| owner_specified.openstack.object | images/cirros                                                                    |
| owner_specified.openstack.sha256 |                                                                                  |
| protected                        | False                                                                            |
| size                             | 12716032                                                                         |
| status                           | active                                                                           |
| stores                           | default_backend                                                                  |
| tags                             | []                                                                               |
| updated_at                       | 2021-09-14T06:00:53Z                                                             |
| virtual_size                     | Not available                                                                    |
| visibility                       | public                                                                           |
+----------------------------------+----------------------------------------------------------------------------------+

2.
Boot an instance off of the updated image:

(overcloud) [stack@undercloud-0 ~]$ nova boot test_watchdog_option --nic net-id=b81f87ff-0b7f-4be0-822d-f60f1141ff23 --flavor tiny --image b574e2c5-2625-42b6-9d91-700c71fa13fa
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hostname             | test-watchdog-option                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        |                                               |
| OS-EXT-SRV-ATTR:kernel_id            |                                               |
| OS-EXT-SRV-ATTR:launch_index         | 0                                             |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                               |
| OS-EXT-SRV-ATTR:reservation_id       | r-la3w0s1s                                    |
| OS-EXT-SRV-ATTR:root_device_name     | -                                             |
| OS-EXT-SRV-ATTR:user_data            | -                                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | FTkaH9WDvuWK                                  |
| config_drive                         |                                               |
| created                              | 2021-09-14T06:05:17Z                          |
| description                          | -                                             |
| flavor:disk                          | 1                                             |
| flavor:ephemeral                     | 0                                             |
| flavor:extra_specs                   | {}                                            |
| flavor:original_name                 | tiny                                          |
| flavor:ram                           | 512                                           |
| flavor:swap                          | 0                                             |
| flavor:vcpus                         | 1                                             |
| hostId                               |                                               |
| host_status                          |                                               |
| id                                   | 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8          |
| image                                | cirros (b574e2c5-2625-42b6-9d91-700c71fa13fa) |
| key_name                             | -                                             |
| locked                               | False                                         |
| locked_reason                        | -                                             |
| metadata                             | {}                                            |
| name                                 | test_watchdog_option                          |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| server_groups                        | []                                            |
| status                               | BUILD                                         |
| tags                                 | []                                            |
| tenant_id                            | 24dfce9076bc49aa99b0b67516db7b5f              |
| trusted_image_certificates           | -                                             |
| updated                              | 2021-09-14T06:05:17Z                          |
| user_id                              | 0d06dd3833e8491a85ba787192859b00              |
+--------------------------------------+-----------------------------------------------+

3.
Great, the instance is alive:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                 | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option | ACTIVE | -          | Running     | internal=192.168.0.26             |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+

4. Let's confirm the watchdog settings are in place:

()[root@compute-0 /]# virsh dumpxml instance-00000005 | grep watch
      <nova:name>test_watchdog_option</nova:name>
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
    </watchdog>

Looks good; we could stop here and sign off. But I wonder if I can crash my instance and cause it to reach the paused state. On the instance console I ran this:

$ sudo dd if=/dev/zero of=/dev/vda
[  830.737662] init[1]: segfault at 1b2 ip 00007f03deb5febc sp 00007ffe1d02a798 error 6 in libuClibc-0.9.33.2.so[7f06fb564000+63000]
[  830.767984]  in libuClibc-0.9.33.2.so[7f03deb4e000+63000]
[  830.782178] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  830.782178]
[  830.784979] CPU: 0 PID: 1 Comm: init Not tainted 4.4.0-28-generic #47-Ubuntu
[  830.784979] Hardware name: Red Hat OpenStack Compute, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
[  830.784979]  0000000000000086 000000005ff57745 ffff88001f63bc50 ffffffff813eb1a3
[  830.784979]  ffffffff81cb10d8 ffff88001f63bce8 ffff88001f63bcd8 ffffffff8118bf57
[  830.784979]  ffff880000000010 ffff88001f63bce8 ffff88001f63bc80 000000005ff57745
[  830.784979] Call Trace:
[  830.784979]  [<ffffffff813eb1a3>] dump_stack+0x63/0x90
[  830.784979]  [<ffffffff8118bf57>] panic+0xd3/0x215
[  830.784979]  [<ffffffff81184e1e>] ? perf_event_exit_task+0xbe/0x350
[  830.784979]  [<ffffffff81084541>] do_exit+0xae1/0xaf0
[  830.784979]  [<ffffffff810845d3>] do_group_exit+0x43/0xb0
[  830.784979]  [<ffffffff810907a2>] get_signal+0x292/0x600
[  830.784979]  [<ffffffff8102e537>] do_signal+0x37/0x6f0
[  830.784979]  [<ffffffff810ca961>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[  830.784979]  [<ffffffff810d0000>] ? resume_store+0xc0/0xd0
[  830.784979]  [<ffffffff8118c27e>] ? printk+0x57/0x73
[  830.784979]  [<ffffffff81822f06>] ? __schedule+0x3b6/0xa30
[  830.784979]  [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
[  830.784979]  [<ffffffff81003c16>] prepare_exit_to_usermode+0x26/0x30
[  830.784979]  [<ffffffff818281a5>] retint_user+0x8/0x10
[  830.784979] Kernel Offset: disabled
[  830.784979] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

The console froze, yet the instance remains in the running state. So either I didn't nuke it hard enough, or Nova/libvirt has another issue?
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                 | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option | ACTIVE | -          | Running     | internal=192.168.0.26             |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh list
 Id   Name                State
-----------------------------------
 1    instance-00000005   running

Again confirming the watchdog setting is set in the virsh XML:

<watchdog model='i6300esb' action='pause'>
  <alias name='watchdog0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</watchdog>

Odd, I would expect Nova/virsh to switch to the "paused" state.
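As a side note (not part of the original comment): instead of grepping the domain XML by eye, the watchdog check can be scripted against the `virsh dumpxml` output. A minimal sketch, assuming the standard libvirt domain XML layout shown above (the embedded fragment and function name are mine):

```python
import xml.etree.ElementTree as ET

# Domain XML fragment matching what `virsh dumpxml` reported above.
DOMAIN_XML = """
<domain>
  <devices>
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </watchdog>
  </devices>
</domain>
"""


def watchdog_action(domain_xml: str):
    """Return (model, action) of the first watchdog device, or None."""
    wd = ET.fromstring(domain_xml).find("./devices/watchdog")
    if wd is None:
        return None
    return wd.get("model"), wd.get("action")


print(watchdog_action(DOMAIN_XML))  # ('i6300esb', 'pause')
```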
Let's try another instance/nuke method:

(overcloud) [stack@undercloud-0 ~]$ nova boot test_watchdog_option_nuke2 --nic net-id=b81f87ff-0b7f-4be0-822d-f60f1141ff23 --flavor tiny --image b574e2c5-2625-42b6-9d91-700c71fa13fa
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hostname             | test-watchdog-option-nuke2                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        |                                               |
| OS-EXT-SRV-ATTR:kernel_id            |                                               |
| OS-EXT-SRV-ATTR:launch_index         | 0                                             |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                               |
| OS-EXT-SRV-ATTR:reservation_id       | r-4av7qo8d                                    |
| OS-EXT-SRV-ATTR:root_device_name     | -                                             |
| OS-EXT-SRV-ATTR:user_data            | -                                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | 5yzhGKx6q2aT                                  |
| config_drive                         |                                               |
| created                              | 2021-09-14T06:30:56Z                          |
| description                          | -                                             |
| flavor:disk                          | 1                                             |
| flavor:ephemeral                     | 0                                             |
| flavor:extra_specs                   | {}                                            |
| flavor:original_name                 | tiny                                          |
| flavor:ram                           | 512                                           |
| flavor:swap                          | 0                                             |
| flavor:vcpus                         | 1                                             |
| hostId                               |                                               |
| host_status                          |                                               |
| id                                   | 7e648b4b-884e-4fbe-84a2-f1e9812ed463          |
| image                                | cirros (b574e2c5-2625-42b6-9d91-700c71fa13fa) |
| key_name                             | -                                             |
| locked                               | False                                         |
| locked_reason                        | -                                             |
| metadata                             | {}                                            |
| name                                 | test_watchdog_option_nuke2                    |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| server_groups                        | []                                            |
| status                               | BUILD                                         |
| tags                                 | []                                            |
| tenant_id                            | 24dfce9076bc49aa99b0b67516db7b5f              |
| trusted_image_certificates           | -                                             |
| updated                              | 2021-09-14T06:30:56Z                          |
| user_id                              | 0d06dd3833e8491a85ba787192859b00              |
+--------------------------------------+-----------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                       | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option       | ACTIVE | -          | Running     | internal=192.168.0.26             |
| 7e648b4b-884e-4fbe-84a2-f1e9812ed463 | test_watchdog_option_nuke2 | ACTIVE | -          | Running     | internal=192.168.0.29             |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+

This time let's try this inside the new instance's console:

$ sudo -i
# echo c > /proc/sysrq-trigger
[  345.813727] sysrq: SysRq : Trigger a crash
[  345.828764] BUG: unable to handle kernel NULL pointer dereference at (null)
[  345.832663] IP: [<ffffffff814ef6d6>] sysrq_handle_crash+0x16/0x20
[  345.832663] PGD 1dcf2067 PUD 1dcf8067 PMD 0
[  345.832663] Oops: 0002 [#1] SMP
[  345.832663] Modules linked in: nls_iso8859_1 isofs ip_tables x_tables pcnet32 8139cp mii ne2k_pci 8390 e1000 virtio_scsi
[  345.832663] CPU: 0 PID: 439 Comm: sh Not tainted 4.4.0-28-generic #47-Ubuntu
[  345.832663] Hardware name: Red Hat OpenStack Compute, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
[  345.832663] task: ffff88001d055280 ti: ffff88001d63c000 task.ti: ffff88001d63c000
[  345.832663] RIP: 0010:[<ffffffff814ef6d6>]  [<ffffffff814ef6d6>] sysrq_handle_crash+0x16/0x20
[  345.832663] RSP: 0018:ffff88001d63fe48  EFLAGS: 00010282
[  345.832663] RAX: 000000000000000f RBX: 0000000000000063 RCX: 0000000000000000
[  345.832663] RDX: 0000000000000000 RSI: ffff88001f80dc78 RDI: 0000000000000063
[  345.832663] RBP: ffff88001d63fe48 R08: 0000000000000002 R09: 00000000000001bd
[  345.832663] R10: 0000000000000001 R11: 00000000000001bd R12: 0000000000000006
[  345.832663] R13: 0000000000000000 R14: ffffffff81ebad20 R15: 0000000000000000
[  345.832663] FS:  00007f594d7cd6a0(0000) GS:ffff88001f800000(0000) knlGS:0000000000000000
[  345.832663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  345.832663] CR2: 0000000000000000 CR3: 000000000007e000 CR4: 00000000001406f0
[  345.832663] Stack:
[  345.832663]  ffff88001d63fe78 ffffffff814efeda 0000000000000002 fffffffffffffffb
[  345.832663]  ffff88001d63ff18 0000000000000002 ffff88001d63fe90 ffffffff814f035f
[  345.832663]  ffff88001ddd93c0 ffff88001d63feb0 ffffffff8127a062 ffff88001d1a1d00
[  345.832663] Call Trace:
[  345.832663]  [<ffffffff814efeda>] __handle_sysrq+0xea/0x140
[  345.832663]  [<ffffffff814f035f>] write_sysrq_trigger+0x2f/0x40
[  345.832663]  [<ffffffff8127a062>] proc_reg_write+0x42/0x70
[  345.832663]  [<ffffffff8120c918>] __vfs_write+0x18/0x40
[  345.832663]  [<ffffffff8120d2a9>] vfs_write+0xa9/0x1a0
[  345.832663]  [<ffffffff8120df65>] SyS_write+0x55/0xc0
[  345.832663]  [<ffffffff8106b807>] ? trace_do_page_fault+0x37/0xe0
[  345.832663]  [<ffffffff818276b2>] entry_SYSCALL_64_fastpath+0x16/0x71
[  345.832663] Code: ef e8 1f f8 ff ff eb db 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 c7 05 f8 15 c1 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 c7 05 a0 1b 96
[  345.832663] RIP  [<ffffffff814ef6d6>] sysrq_handle_crash+0x16/0x20
[  345.832663]  RSP <ffff88001d63fe48>
[  345.832663] CR2: 0000000000000000
[  346.586658] ---[ end trace ae30b72b651e7a03 ]---
[  346.599648] Kernel panic - not syncing: Fatal exception
[  346.603610] Kernel Offset: disabled
[  346.603610] ---[ end Kernel panic - not syncing: Fatal exception

Again no luck: the console is frozen, yet nova/libvirt still show the instance as running:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                       | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option       | ACTIVE | -          | Running     | internal=192.168.0.26             |
| 7e648b4b-884e-4fbe-84a2-f1e9812ed463 | test_watchdog_option_nuke2 | ACTIVE | -          | Running     | internal=192.168.0.29             |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+

()[root@compute-1 /]# virsh list
 Id   Name                State
-----------------------------------
 2    instance-00000008   running

Well, let's just confirm (again) that the watchdog setting has indeed trickled down to the virsh level:

<watchdog model='i6300esb' action='pause'>
  <alias name='watchdog0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</watchdog>

To sum things up, Glance is now working fine, thus we can verify this bz.
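Since this verification keeps eyeballing the same `nova list` tables, here is a small helper (my addition, not part of the original comment) that parses the standard prettytable-style CLI output so the ACTIVE/Running check could be scripted. The sample table is abbreviated from the output above:

```python
def parse_cli_table(output: str):
    """Parse an OpenStack-CLI-style ASCII table into a list of row dicts."""
    # Keep only the header and data rows; skip the +---+ border lines.
    lines = [l for l in output.strip().splitlines() if l.startswith("|")]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[1:]:
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


NOVA_LIST = """\
+--------------------------------------+----------------------+--------+-------------+
| ID                                   | Name                 | Status | Power State |
+--------------------------------------+----------------------+--------+-------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option | ACTIVE | Running     |
+--------------------------------------+----------------------+--------+-------------+
"""

rows = parse_cli_table(NOVA_LIST)
print(rows[0]["Name"], rows[0]["Status"])  # test_watchdog_option ACTIVE
```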
However, the underlying Nova/libvirt part isn't working as expected (new bug?). One possible explanation: the emulated i6300esb watchdog only fires after the guest loads the driver and arms the device (e.g. by opening /dev/watchdog) and then stops petting it within the timeout; a kernel panic alone does not trigger it. BTW, the image was CirrOS-based; maybe I should have used, say, a RHEL image, whose kernel ships the i6300esb module.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762