Bug 1980594 - Failed to create VM with watchdog by setting flavor metadata
Summary: Failed to create VM with watchdog by setting flavor metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-glance
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Cyril Roelandt
QA Contact:
Docs Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-09 01:48 UTC by Cyril Roelandt
Modified: 2021-12-09 20:20 UTC (History)

Fixed In Version: openstack-glance-19.0.4-1.20210709193304.5bbd356.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:20:15 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-6070 0 None None None 2021-11-18 11:34:09 UTC
Red Hat Product Errata RHBA-2021:3762 0 None None None 2021-12-09 20:20:45 UTC

Description Cyril Roelandt 2021-07-09 01:48:39 UTC
This bug was initially created as a copy of Bug #1851797

I am copying this bug because: 



Description of problem:
Failed to create VM with watchdog by setting flavor metadata

Version-Release number of selected component (if applicable):
openstack-nova-compute-20.2.1-0.20200528080027.1e95025.el8ost.noarch
libvirt-daemon-kvm-6.0.0-23.module+el8.2.1+6955+1e1fca42.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an image/volume in OSP 16.1 and set the image/volume metadata hw_watchdog_action: pause. Create flavor m2. Starting the VM from the image/volume succeeds with the watchdog configured; the domain XML is as follows:
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>

2. Set hw_watchdog_action: pause on flavor m2, then try to start a VM from the same image/volume with m2 on the same compute node. The boot fails with the message:
   "No valid host was found. There are not enough hosts available"
-----------------------------------------------------------------------
[heat-admin@overcloud-controller-0 ~]$ sudo tail -f /var/log/containers/nova/nova-scheduler.log
2020-06-29 02:24:10.513 47 INFO nova.filters [req-a8e4515d-44ef-4020-83d5-9888a5776b8b 317f1c1fe439476dbfb991ec7768d23f 63a5b71d92f74e0ba553e6acca2c747c - default default] Filter AggregateInstanceExtraSpecsFilter returned 0 hosts
2020-06-29 02:24:10.514 47 INFO nova.filters [req-a8e4515d-44ef-4020-83d5-9888a5776b8b 317f1c1fe439476dbfb991ec7768d23f 63a5b71d92f74e0ba553e6acca2c747c - default default] Filtering removed all hosts for the request with instance ID '554daf2f-88f1-4171-ad01-fc236958219d'. Filter results: ['AvailabilityZoneFilter: (start: 4, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'NUMATopologyFilter: (start: 1, end: 1)', 'AggregateInstanceExtraSpecsFilter: (start: 1, end: 0)']  
------------------------------------------------------------------------
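
The "Filter results" entry in the scheduler log lists the host count before and after each filter. A small shell helper, sketched here on the assumption that the log keeps the format shown above, prints the first filter that ends with zero hosts. The function name first_empty_filter is made up for this example.

```shell
# Read nova-scheduler log lines on stdin and print the name of the first
# filter whose "end" host count is 0. Prints nothing if no filter emptied
# the host list.
first_empty_filter() {
  grep -o "[A-Za-z]*: (start: [0-9]*, end: 0)" | head -n 1 | cut -d: -f1
}
```

On the controller this could be fed with something like `sudo grep 'Filtering removed all hosts' /var/log/containers/nova/nova-scheduler.log | first_empty_filter`, which for the log above would name AggregateInstanceExtraSpecsFilter.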

Actual results:
In step 2, the VM fails to start with flavor metadata hw_watchdog_action: pause.

Expected results:
In step 2, the VM starts successfully with flavor metadata hw_watchdog_action: pause.
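
The scheduler output in step 2 fits how AggregateInstanceExtraSpecsFilter appears to read flavor extra specs: a key with no namespace prefix is treated as a requirement on host aggregate metadata, while a key scoped to some other namespace is skipped. The shell function below is only a simplified model of that rule for illustration, not Nova's code, and its name is hypothetical.

```shell
# Simplified model of how AggregateInstanceExtraSpecsFilter treats a flavor
# extra-spec key. Illustration only, not Nova's implementation.
#   checked: the key must match host aggregate metadata
#   ignored: the key belongs to another namespace and this filter skips it
classify_extra_spec_key() {
  case "$1" in
    aggregate_instance_extra_specs:*) echo checked ;;
    *:*) echo ignored ;;
    *) echo checked ;;
  esac
}
```

Under this model the unscoped key hw_watchdog_action is compared against aggregate metadata, and since no aggregate carries it, every host is rejected. The scoped spelling hw:watchdog_action, which is the flavor extra spec Nova documents for the watchdog, would pass through this filter untouched.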

Comment 10 Tzach Shefi 2021-09-14 06:44:28 UTC
Verified on:
openstack-glance-19.0.4-1.20210713143305.5bbd356.el8ost.noarch

Filling in for Mike, cloning his verification steps from original bz1851797. 

1. Let's update an existing Cirros image with the hw_watchdog_action property:

(overcloud) [stack@undercloud-0 ~]$ glance image-update b574e2c5-2625-42b6-9d91-700c71fa13fa --property hw_watchdog_action=pause
+----------------------------------+----------------------------------------------------------------------------------+
| Property                         | Value                                                                            |
+----------------------------------+----------------------------------------------------------------------------------+
| checksum                         | 443b7623e27ecf03dc9e01ee93f67afe                                                 |
| container_format                 | bare                                                                             |
| created_at                       | 2021-09-14T05:50:22Z                                                             |
| direct_url                       | swift+config://ref1/glance/b574e2c5-2625-42b6-9d91-700c71fa13fa                  |
| disk_format                      | qcow2                                                                            |
| hw_watchdog_action               | pause                                                                            |
| id                               | b574e2c5-2625-42b6-9d91-700c71fa13fa                                             |
| min_disk                         | 0                                                                                |
| min_ram                          | 0                                                                                |
| name                             | cirros                                                                           |
| os_hash_algo                     | sha512                                                                           |
| os_hash_value                    | 6513f21e44aa3da349f248188a44bc304a3653a04122d8fb4535423c8e1d14cd6a153f735bb0982e |
|                                  | 2161b5b5186106570c17a9e58b64dd39390617cd5a350f78                                 |
| os_hidden                        | False                                                                            |
| owner                            | 24dfce9076bc49aa99b0b67516db7b5f                                                 |
| owner_specified.openstack.md5    |                                                                                  |
| owner_specified.openstack.object | images/cirros                                                                    |
| owner_specified.openstack.sha256 |                                                                                  |
| protected                        | False                                                                            |
| size                             | 12716032                                                                         |
| status                           | active                                                                           |
| stores                           | default_backend                                                                  |
| tags                             | []                                                                               |
| updated_at                       | 2021-09-14T06:00:53Z                                                             |
| virtual_size                     | Not available                                                                    |
| visibility                       | public                                                                           |
+----------------------------------+----------------------------------------------------------------------------------+
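
To check the property from a script rather than by eye, the Value column can be pulled out of the table the client prints. A minimal sketch, assuming the two-column "| Property | Value |" layout shown above; table_value is a name made up for this example.

```shell
# Print the Value column for one Property row of a client table read on stdin.
table_value() {
  awk -F'|' -v key="$1" '
    NF >= 3 {
      name = $2; value = $3
      gsub(/^ +| +$/, "", name)    # trim padding around the property name
      gsub(/^ +| +$/, "", value)   # trim padding around the value
      if (name == key) { print value; exit }
    }'
}
```

For the image above, `glance image-show b574e2c5-2625-42b6-9d91-700c71fa13fa | table_value hw_watchdog_action` would be expected to print pause.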

2. Boot an instance off of the updated image:
(overcloud) [stack@undercloud-0 ~]$ nova boot test_watchdog_option --nic net-id=b81f87ff-0b7f-4be0-822d-f60f1141ff23 --flavor tiny --image b574e2c5-2625-42b6-9d91-700c71fa13fa 
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hostname             | test-watchdog-option                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        |                                               |
| OS-EXT-SRV-ATTR:kernel_id            |                                               |
| OS-EXT-SRV-ATTR:launch_index         | 0                                             |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                               |
| OS-EXT-SRV-ATTR:reservation_id       | r-la3w0s1s                                    |
| OS-EXT-SRV-ATTR:root_device_name     | -                                             |
| OS-EXT-SRV-ATTR:user_data            | -                                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | FTkaH9WDvuWK                                  |
| config_drive                         |                                               |
| created                              | 2021-09-14T06:05:17Z                          |
| description                          | -                                             |
| flavor:disk                          | 1                                             |
| flavor:ephemeral                     | 0                                             |
| flavor:extra_specs                   | {}                                            |
| flavor:original_name                 | tiny                                          |
| flavor:ram                           | 512                                           |
| flavor:swap                          | 0                                             |
| flavor:vcpus                         | 1                                             |
| hostId                               |                                               |
| host_status                          |                                               |
| id                                   | 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8          |
| image                                | cirros (b574e2c5-2625-42b6-9d91-700c71fa13fa) |
| key_name                             | -                                             |
| locked                               | False                                         |
| locked_reason                        | -                                             |
| metadata                             | {}                                            |
| name                                 | test_watchdog_option                          |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| server_groups                        | []                                            |
| status                               | BUILD                                         |
| tags                                 | []                                            |
| tenant_id                            | 24dfce9076bc49aa99b0b67516db7b5f              |
| trusted_image_certificates           | -                                             |
| updated                              | 2021-09-14T06:05:17Z                          |
| user_id                              | 0d06dd3833e8491a85ba787192859b00              |
+--------------------------------------+-----------------------------------------------+


3. Great, the instance is alive:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                 | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option | ACTIVE | -          | Running     | internal=192.168.0.26             |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+

4. Let's confirm the watchdog settings are in place:

()[root@compute-0 /]# virsh dumpxml instance-00000005 | grep watch
      <nova:name>test_watchdog_option</nova:name>
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
    </watchdog>
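
The grep above also matched the <nova:name> line, simply because the instance name contains "watch". A narrower check is sketched below. It assumes the <watchdog> opening tag sits on one line with single-quoted attributes, as in the output above, and watchdog_action is a made-up helper name.

```shell
# Read libvirt domain XML on stdin and print the action attribute of the
# <watchdog> element, if one is present.
watchdog_action() {
  sed -n "s/.*<watchdog [^>]*action='\([^']*\)'.*/\1/p"
}
```

On the compute node, `virsh dumpxml instance-00000005 | watchdog_action` would be expected to print pause for this instance.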


Looks good to verify; we could stop here and sign off.
But I wonder if I can crash my instance and cause it to reach the paused state.

On the instance console I ran this:
$ sudo dd if=/dev/zero of=/dev/vda


[  830.737662] init[1]: segfault at 1b2 ip 00007f03deb5febc sp 00007ffe1d02a798 error 6
 in libuClibc-0.9.33.2.so[7f06fb564000+63000]
[  830.767984]  in libuClibc-0.9.33.2.so[7f03deb4e000+63000]
[  830.782178] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  830.782178] 
[  830.784979] CPU: 0 PID: 1 Comm: init Not tainted 4.4.0-28-generic #47-Ubuntu
[  830.784979] Hardware name: Red Hat OpenStack Compute, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
[  830.784979]  0000000000000086 000000005ff57745 ffff88001f63bc50 ffffffff813eb1a3
[  830.784979]  ffffffff81cb10d8 ffff88001f63bce8 ffff88001f63bcd8 ffffffff8118bf57
[  830.784979]  ffff880000000010 ffff88001f63bce8 ffff88001f63bc80 000000005ff57745
[  830.784979] Call Trace:
[  830.784979]  [<ffffffff813eb1a3>] dump_stack+0x63/0x90
[  830.784979]  [<ffffffff8118bf57>] panic+0xd3/0x215
[  830.784979]  [<ffffffff81184e1e>] ? perf_event_exit_task+0xbe/0x350
[  830.784979]  [<ffffffff81084541>] do_exit+0xae1/0xaf0
[  830.784979]  [<ffffffff810845d3>] do_group_exit+0x43/0xb0
[  830.784979]  [<ffffffff810907a2>] get_signal+0x292/0x600
[  830.784979]  [<ffffffff8102e537>] do_signal+0x37/0x6f0
[  830.784979]  [<ffffffff810ca961>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[  830.784979]  [<ffffffff810d0000>] ? resume_store+0xc0/0xd0
[  830.784979]  [<ffffffff8118c27e>] ? printk+0x57/0x73
[  830.784979]  [<ffffffff81822f06>] ? __schedule+0x3b6/0xa30
[  830.784979]  [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
[  830.784979]  [<ffffffff81003c16>] prepare_exit_to_usermode+0x26/0x30
[  830.784979]  [<ffffffff818281a5>] retint_user+0x8/0x10
[  830.784979] Kernel Offset: disabled
[  830.784979] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  830.784979] 


The console froze, yet the instance remains in the running state.
So either I didn't nuke it well enough, or Nova/libvirt has another issue?
  
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                 | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option | ACTIVE | -          | Running     | internal=192.168.0.26             |
+--------------------------------------+----------------------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh list
 Id   Name                State
-----------------------------------
 1    instance-00000005   running


Again confirming the watchdog setting is present in the virsh XML:
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </watchdog>


Odd, I would expect Nova/virsh to switch to the "paused" state.
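
One possible explanation, not confirmed anywhere in this report: an emulated i6300esb watchdog only fires once something inside the guest has loaded its driver and started the timer. A kernel panic by itself does not start it, so a guest that never opened /dev/watchdog would stay in the running state. The sketch below arms the timer from inside the guest before the crash is forced. It assumes the guest image ships the i6300esb kernel module and exposes the device as /dev/watchdog; arm_watchdog is a name made up here.

```shell
# Run inside the guest as root, after: modprobe i6300esb
# Opens the watchdog device and writes one byte to it. Closing the device
# without the magic 'V' character is expected to leave the timer running, so
# the watchdog should fire once its timeout passes with no further writes.
arm_watchdog() {
  dev="${1:-/dev/watchdog}"
  if [ ! -e "$dev" ]; then
    echo "no watchdog device at $dev" >&2
    return 1
  fi
  printf '1' > "$dev"
}
```

If the timer is armed this way, crashing the guest afterwards should let the timeout expire, and virsh list on the compute node would then be expected to report the domain as paused.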


Let's try another instance and another nuke method:
(overcloud) [stack@undercloud-0 ~]$ nova boot test_watchdog_option_nuke2 --nic net-id=b81f87ff-0b7f-4be0-822d-f60f1141ff23 --flavor tiny --image b574e2c5-2625-42b6-9d91-700c71fa13fa 
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hostname             | test-watchdog-option-nuke2                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        |                                               |
| OS-EXT-SRV-ATTR:kernel_id            |                                               |
| OS-EXT-SRV-ATTR:launch_index         | 0                                             |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                               |
| OS-EXT-SRV-ATTR:reservation_id       | r-4av7qo8d                                    |
| OS-EXT-SRV-ATTR:root_device_name     | -                                             |
| OS-EXT-SRV-ATTR:user_data            | -                                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | 5yzhGKx6q2aT                                  |
| config_drive                         |                                               |
| created                              | 2021-09-14T06:30:56Z                          |
| description                          | -                                             |
| flavor:disk                          | 1                                             |
| flavor:ephemeral                     | 0                                             |
| flavor:extra_specs                   | {}                                            |
| flavor:original_name                 | tiny                                          |
| flavor:ram                           | 512                                           |
| flavor:swap                          | 0                                             |
| flavor:vcpus                         | 1                                             |
| hostId                               |                                               |
| host_status                          |                                               |
| id                                   | 7e648b4b-884e-4fbe-84a2-f1e9812ed463          |
| image                                | cirros (b574e2c5-2625-42b6-9d91-700c71fa13fa) |
| key_name                             | -                                             |
| locked                               | False                                         |
| locked_reason                        | -                                             |
| metadata                             | {}                                            |
| name                                 | test_watchdog_option_nuke2                    |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| server_groups                        | []                                            |
| status                               | BUILD                                         |
| tags                                 | []                                            |
| tenant_id                            | 24dfce9076bc49aa99b0b67516db7b5f              |
| trusted_image_certificates           | -                                             |
| updated                              | 2021-09-14T06:30:56Z                          |
| user_id                              | 0d06dd3833e8491a85ba787192859b00              |
+--------------------------------------+-----------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                       | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option       | ACTIVE | -          | Running     | internal=192.168.0.26             |
| 7e648b4b-884e-4fbe-84a2-f1e9812ed463 | test_watchdog_option_nuke2 | ACTIVE | -          | Running     | internal=192.168.0.29             |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+


This time let's try this inside the new instance's console:
$ sudo -i
# echo c > /proc/sysrq-trigger
[  345.813727] sysrq: SysRq : Trigger a crash
[  345.828764] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  345.832663] IP: [<ffffffff814ef6d6>] sysrq_handle_crash+0x16/0x20
[  345.832663] PGD 1dcf2067 PUD 1dcf8067 PMD 0 
[  345.832663] Oops: 0002 [#1] SMP 
[  345.832663] Modules linked in: nls_iso8859_1 isofs ip_tables x_tables pcnet32 8139cp mii ne2k_pci 8390 e1000 virtio_scsi
[  345.832663] CPU: 0 PID: 439 Comm: sh Not tainted 4.4.0-28-generic #47-Ubuntu
[  345.832663] Hardware name: Red Hat OpenStack Compute, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
[  345.832663] task: ffff88001d055280 ti: ffff88001d63c000 task.ti: ffff88001d63c000
[  345.832663] RIP: 0010:[<ffffffff814ef6d6>]  [<ffffffff814ef6d6>] sysrq_handle_crash+0x16/0x20
[  345.832663] RSP: 0018:ffff88001d63fe48  EFLAGS: 00010282
[  345.832663] RAX: 000000000000000f RBX: 0000000000000063 RCX: 0000000000000000
[  345.832663] RDX: 0000000000000000 RSI: ffff88001f80dc78 RDI: 0000000000000063
[  345.832663] RBP: ffff88001d63fe48 R08: 0000000000000002 R09: 00000000000001bd
[  345.832663] R10: 0000000000000001 R11: 00000000000001bd R12: 0000000000000006
[  345.832663] R13: 0000000000000000 R14: ffffffff81ebad20 R15: 0000000000000000
[  345.832663] FS:  00007f594d7cd6a0(0000) GS:ffff88001f800000(0000) knlGS:0000000000000000
[  345.832663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  345.832663] CR2: 0000000000000000 CR3: 000000000007e000 CR4: 00000000001406f0
[  345.832663] Stack:
[  345.832663]  ffff88001d63fe78 ffffffff814efeda 0000000000000002 fffffffffffffffb
[  345.832663]  ffff88001d63ff18 0000000000000002 ffff88001d63fe90 ffffffff814f035f
[  345.832663]  ffff88001ddd93c0 ffff88001d63feb0 ffffffff8127a062 ffff88001d1a1d00
[  345.832663] Call Trace:
[  345.832663]  [<ffffffff814efeda>] __handle_sysrq+0xea/0x140
[  345.832663]  [<ffffffff814f035f>] write_sysrq_trigger+0x2f/0x40
[  345.832663]  [<ffffffff8127a062>] proc_reg_write+0x42/0x70
[  345.832663]  [<ffffffff8120c918>] __vfs_write+0x18/0x40
[  345.832663]  [<ffffffff8120d2a9>] vfs_write+0xa9/0x1a0
[  345.832663]  [<ffffffff8120df65>] SyS_write+0x55/0xc0
[  345.832663]  [<ffffffff8106b807>] ? trace_do_page_fault+0x37/0xe0
[  345.832663]  [<ffffffff818276b2>] entry_SYSCALL_64_fastpath+0x16/0x71
[  345.832663] Code: ef e8 1f f8 ff ff eb db 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 c7 05 f8 15 c1 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 c7 05 a0 1b 96 
[  345.832663] RIP  [<ffffffff814ef6d6>] sysrq_handle_crash+0x16/0x20
[  345.832663]  RSP <ffff88001d63fe48>
[  345.832663] CR2: 0000000000000000
[  346.586658] ---[ end trace ae30b72b651e7a03 ]---
[  346.599648] Kernel panic - not syncing: Fatal exception
[  346.603610] Kernel Offset: disabled
[  346.603610] ---[ end Kernel panic - not syncing: Fatal exception


Again no luck: the console is frozen, yet nova/libvirt still show the instance as running:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                       | Status | Task State | Power State | Networks                          |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+
| 45ae2948-6eee-4fa3-ae44-48a2cf7ef0a8 | test_watchdog_option       | ACTIVE | -          | Running     | internal=192.168.0.26             |
| 7e648b4b-884e-4fbe-84a2-f1e9812ed463 | test_watchdog_option_nuke2 | ACTIVE | -          | Running     | internal=192.168.0.29             |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------------------+

()[root@compute-1 /]# virsh list
 Id   Name                State
-----------------------------------
 2    instance-00000008   running

Well, let's just confirm (again) that the watchdog setting has indeed trickled down to the virsh level:
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </watchdog>




To sum things up, Glance is now working fine, so we can verify this bz.
However, the underlying Nova/libvirt part isn't working as expected (new bug?).
BTW the image was Cirros-based; maybe I should have used, say, a RHEL image?

Comment 22 errata-xmlrpc 2021-12-09 20:20:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

