Bug 1893199

Summary: overcloud instance cinder iscsi volume boot times out
Product: Red Hat OpenStack Reporter: Alistair Tonner <atonner>
Component: python-ironic-lib Assignee: Steve Baker <sbaker>
Status: CLOSED DUPLICATE QA Contact: Alistair Tonner <atonner>
Severity: medium Docs Contact:
Priority: medium    
Version: 16.1 (Train) CC: abishop, ltoscano, sbaker
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-16 03:42:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alistair Tonner 2020-10-30 14:24:19 UTC
Description of problem:
Overcloud BM instance on a unique tenant network fails to boot, with the console reporting "Could not open san device, Connection timed out".

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201021.n.0

iscsi-initiator-utils.x86_64                      6.2.0.878-4.gitd791ce0.el8                      @rhosp-rhel-8.2-baseos
iscsi-initiator-utils-iscsiuio.x86_64             6.2.0.878-4.gitd791ce0.el8                      @rhosp-rhel-8.2-baseos
libiscsi.x86_64                                   1.18.0-8.module+el8.2.0+4793+b09dd2fb           @rhosp-rhel-8.2-av
libvirt-daemon-driver-storage-iscsi.x86_64        6.0.0-25.4.module+el8.2.1+8060+c0c58169         @rhosp-rhel-8.2-av
libvirt-daemon-driver-storage-iscsi-direct.x86_64 6.0.0-25.4.module+el8.2.1+8060+c0c58169         @rhosp-rhel-8.2-av
openstack-ironic-python-agent-builder.noarch      2.1.1-1.20200914175356.65d0f80.el8ost           @rhelosp-16.1
puppet-cinder.noarch                              15.4.1-1.20200831153422.ff571a9.el8ost          @rhelosp-16.1
puppet-ironic.noarch                              15.4.1-1.20200814153354.39f97cc.el8ost          @rhelosp-16.1
puppet-nova.noarch                                15.6.1-1.20200814103355.51a6857.el8ost          @rhelosp-16.1
python3-cinderclient.noarch                       5.0.1-0.20200326130221.8fa0882.el8ost           @rhelosp-16.1
python3-ironic-inspector-client.noarch            3.7.1-0.20200522054325.3a41127.el8ost           @rhelosp-16.1
python3-ironicclient.noarch                       3.1.2-0.20200522053422.1220d76.el8ost           @rhelosp-16.1
python3-novaclient.noarch                         1:15.1.1-0.20200629073413.79959ab.el8ost        @rhelosp-16.1
qemu-kvm-block-iscsi.x86_64                       15:4.2.0-29.module+el8.2.1+7990+27f1e480.4      @rhosp-rhel-8.2-av

| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-neutron-agent:16.1_20201020.1       |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-scheduler:16.1_20201020.1             |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20201020.1                   |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-pxe:16.1_20201020.1                 |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:16.1_20201020.1             |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-api:16.1_20201020.1                 |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20201020.1                     |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute-ironic:16.1_20201020.1        |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-conductor:16.1_20201020.1           |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ironic-inspector:16.1_20201020.1           |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.1_20201020.1                 |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-novncproxy:16.1_20201020.1            |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.1_20201020.1           |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:16.1_20201020.1               |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:16.1_20201020.1               |
| docker://undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-volume:16.1_20201020.1              |

How reproducible:

Consistently 


Steps to Reproduce:
1. Deploy 3cont_2comp_2ironic.
2. Update the rhel8 image to include iSCSI boot utils.
3. Create a cinder volume and connect it to an ironic node; the boot fails.

Actual results:
boot fails with node console reporting "Could not open san device, boot timed out"  (this loops perpetually)

Expected results:
instance should boot successfully

Additional info:
  Jenkins testing

Comment 2 Alistair Tonner 2020-10-30 14:31:24 UTC
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| f947288b-6efc-4e03-8cb5-037162f097f8 | compute-0    | f8c722d7-f230-4561-b987-88821bc4b060 | power on    | active             | False       |
| 29a8583c-a9a8-43e1-a19b-890095b173e5 | compute-1    | 556260da-bbe7-425f-9faf-70ff7eac6b9d | power on    | active             | False       |
| 44c3233a-4f00-42af-9c61-a264514d6a4c | controller-0 | 133205fc-8dc3-40e4-b27f-97207442bb6e | power on    | active             | False       |
| c95e6dbe-f1fe-4bf3-903a-e69a2ec2197a | controller-1 | f30bdd1c-a20f-47aa-b1a0-02c8a7e0b173 | power on    | active             | False       |
| f9ca8bd3-c1f6-4d51-9893-e1757e3fcaf5 | controller-2 | 6a946da5-006c-4a6d-a7e2-acb6aa62fcb2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| f30bdd1c-a20f-47aa-b1a0-02c8a7e0b173 | controller-2 | ACTIVE | ctlplane=192.168.24.11 | overcloud-full | controller |
| 6a946da5-006c-4a6d-a7e2-acb6aa62fcb2 | controller-1 | ACTIVE | ctlplane=192.168.24.45 | overcloud-full | controller |
| 133205fc-8dc3-40e4-b27f-97207442bb6e | controller-0 | ACTIVE | ctlplane=192.168.24.35 | overcloud-full | controller |
| 556260da-bbe7-425f-9faf-70ff7eac6b9d | compute-1    | ACTIVE | ctlplane=192.168.24.16 | overcloud-full | compute    |
| f8c722d7-f230-4561-b987-88821bc4b060 | compute-0    | ACTIVE | ctlplane=192.168.24.48 | overcloud-full | compute    |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+

+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| 47b32327-8005-4805-8f25-c7875d21061b | ironic-0 | 01f8d194-84be-4d9d-ab6a-cfde6779351d | power on    | active             | False       |
| 47dfc4e4-e413-4415-9b7c-320b793dd07e | ironic-1 | b5a0bdb4-d54b-4b5f-b581-44472c0750ee | power on    | active             | False       |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
+--------------------------------------+---------------+--------+-------------------------+----------------+--------+
| ID                                   | Name          | Status | Networks                | Image          | Flavor |
+--------------------------------------+---------------+--------+-------------------------+----------------+--------+
| 01f8d194-84be-4d9d-ab6a-cfde6779351d | bfv-instance1 | ACTIVE | baremetal=192.168.24.80 |                |        |
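| b5a0bdb4-d54b-4b5f-b581-44472c0750ee | instance2     | ACTIVE | baremetal=192.168.24.89 | overcloud-full |        |
+--------------------------------------+---------------+--------+-------------------------+----------------+--------+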


bfv-instance1 is the cinder boot node:

os_dcf_diskconfig="MANUAL"
os_ext_az_availability_zone="nova"
os_ext_srv_attr_host="controller-1.redhat.local"
os_ext_srv_attr_hostname="bfv-instance1"
os_ext_srv_attr_hypervisor_hostname="47b32327-8005-4805-8f25-c7875d21061b"
os_ext_srv_attr_instance_name="instance-00000008"
os_ext_srv_attr_kernel_id=""
os_ext_srv_attr_launch_index="0"
os_ext_srv_attr_ramdisk_id=""
os_ext_srv_attr_reservation_id="r-qqrmjq97"
os_ext_srv_attr_root_device_name="/dev/sda"
os_ext_srv_attr_user_data="None"
os_ext_sts_power_state="Running"
os_ext_sts_task_state="None"
os_ext_sts_vm_state="active"
os_srv_usg_launched_at="2020-10-30T00:31:25.000000"
os_srv_usg_terminated_at="None"
accessipv4=""
accessipv6=""
addresses="baremetal=192.168.24.80"
config_drive="True"
created="2020-10-30T00:30:20Z"
description="None"
flavor="disk='20', ephemeral='0', extra_specs.baremetal='true', extra_specs.resources:CUSTOM_BAREMETAL='1', extra_specs.resources:DISK_GB='0', extra_specs.resources:MEMORY_MB='0', extra_specs.resources:VCPU='0', original_name='baremetal', ram='2048', swap='0', vcpus='1'"
hostid="c026ce08b0b35d88fe16182861e3b6e2dfed4f8adfb95280cf125cf7"
host_status="UP"
id="01f8d194-84be-4d9d-ab6a-cfde6779351d"
image=""
key_name="stack-key"
locked="False"
locked_reason="None"
name="bfv-instance1"
progress="0"
project_id="8ba6a449dac6476896f9106f2d11a398"
properties=""
security_groups="name='default'"
server_groups="[]"
status="ACTIVE"
tags="[]"
trusted_image_certificates="None"
updated="2020-10-30T00:32:27Z"
user_id="14e80c9af6bd40dfbb44ceeee3f022b7"
volumes_attached="delete_on_termination='False', id='86b1c7df-ad30-4059-811a-b3f0c298141d'"

Volume:

attachments="[{'id': '86b1c7df-ad30-4059-811a-b3f0c298141d', 'attachment_id': 'fda08c68-cc9b-4750-8887-0d4a91a15847', 'volume_id': '86b1c7df-ad30-4059-811a-b3f0c298141d', 'server_id': '01f8d194-84be-4d9d-ab6a-cfde6779351d', 'host_name': '192.168.24.80', 'device': '/dev/sda', 'attached_at': '2020-10-30T00:30:33.000000'}]"
availability_zone="nova"
bootable="true"
consistencygroup_id="None"
created_at="2020-10-30T00:28:54.000000"
description="None"
encrypted="False"
id="86b1c7df-ad30-4059-811a-b3f0c298141d"
migration_status="None"
multiattach="False"
name="rhel-test-volume"
os_vol_host_attr_host="hostgroup@tripleo_iscsi#tripleo_iscsi"
os_vol_mig_status_attr_migstat="None"
os_vol_mig_status_attr_name_id="None"
os_vol_tenant_attr_tenant_id="8ba6a449dac6476896f9106f2d11a398"
properties="{}"
replication_status="None"
size="10"
snapshot_id="None"
source_volid="None"
status="in-use"
type="tripleo"
updated_at="2020-10-30T00:30:33.000000"
user_id="14e80c9af6bd40dfbb44ceeee3f022b7"
volume_image_metadata="{'signature_verified': 'False', 'image_id': 'd91d7b7e-5fc2-42f8-92f5-b82da1d46fcf', 'image_name': 'rhel-bfv', 'checksum': '98dad0abb0894ddd27c81b983373af33', 'container_format': 'bare', 'disk_format': 'qcow2', 'min_disk': '0', 'min_ram': '0', 'size': '1259601920'}"

+--------------------------------------+-----------+--------------------------------------+
| ID                                   | Name      | Subnets                              |
+--------------------------------------+-----------+--------------------------------------+
| a51b4765-9227-41c7-8378-d8f5b49e5b25 | baremetal | 5774ae3b-0763-4991-b80b-f95bbca66658 |
+--------------------------------------+-----------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack subnet show 5774ae3b-0763-4991-b80b-f95bbca66658
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field             | Value                                                                                                                                                            |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| allocation_pools  | 192.168.24.71-192.168.24.100                                                                                                                                     |
| cidr              | 192.168.24.0/24                                                                                                                                                  |
| created_at        | 2020-10-29T23:57:21Z                                                                                                                                             |
| description       |                                                                                                                                                                  |
| dns_nameservers   | 10.0.0.1                                                                                                                                                         |
| enable_dhcp       | True                                                                                                                                                             |
| gateway_ip        | 192.168.24.250                                                                                                                                                   |
| host_routes       |                                                                                                                                                                  |
| id                | 5774ae3b-0763-4991-b80b-f95bbca66658                                                                                                                             |
| ip_version        | 4                                                                                                                                                                |
| ipv6_address_mode | None                                                                                                                                                             |
| ipv6_ra_mode      | None                                                                                                                                                             |
| location          | cloud='', project.domain_id=, project.domain_name='Default', project.id='8ba6a449dac6476896f9106f2d11a398', project.name='admin', region_name='regionOne', zone= |
| name              | baremetal-subnet                                                                                                                                                 |
| network_id        | a51b4765-9227-41c7-8378-d8f5b49e5b25                                                                                                                             |
| prefix_length     | None                                                                                                                                                             |
| project_id        | 8ba6a449dac6476896f9106f2d11a398                                                                                                                                 |
| revision_number   | 2                                                                                                                                                                |
| segment_id        | None                                                                                                                                                             |
| service_types     |                                                                                                                                                                  |
| subnetpool_id     | None                                                                                                                                                             |
| tags              |                                                                                                                                                                  |
| updated_at        | 2020-10-30T00:29:31Z                                                                                                                                             |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack port list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                            | Status |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| 056aded0-ea3e-4698-bbc0-72bd0722eaf9 |      | 52:54:00:f7:1b:7b | ip_address='192.168.24.80', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658'  | DOWN   |
| 1abbad93-f1ea-400e-b3ae-37e235b457c0 |      | 52:54:00:47:6b:66 | ip_address='192.168.24.89', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658'  | DOWN   |
| 5edead48-0e2a-4ccb-95f1-7c617e52f97a |      | fa:16:3e:58:c3:8e | ip_address='192.168.24.71', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658'  | DOWN   |
| 76411b09-c512-4b49-8acd-741786e44f92 |      | fa:16:3e:d1:d3:5f | ip_address='192.168.24.72', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658'  | ACTIVE |
| a88990d7-80e2-4f5d-9169-73c60fbb86f7 |      | fa:16:3e:50:bb:0f | ip_address='192.168.24.250', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658' | ACTIVE |
| cdd01aa9-adba-4b68-b70c-fdeff7fdf1d5 |      | fa:16:3e:76:a0:7c | ip_address='192.168.24.73', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658'  | ACTIVE |
| f53fe0a1-da4e-47ed-a51c-090b75ab8ad2 |      | fa:16:3e:b8:93:4d | ip_address='192.168.24.74', subnet_id='5774ae3b-0763-4991-b80b-f95bbca66658'  | ACTIVE |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+

Comment 3 Luigi Toscano 2020-10-30 14:35:24 UTC
Which iSCSI is that? Please provide a bit more detail about the configuration.
Do you know if it works with other drivers (ceph)?

Comment 4 Luigi Toscano 2020-11-02 13:17:22 UTC
Please ignore my last question, as this is iSCSI only.

Comment 5 Alistair Tonner 2020-11-04 16:43:40 UTC
The issue is routing: the node boots with 192.168.24.80 as its IP, while the iSCSI target is at 172.17.3.147.

controller-0 has 192.168.24.29/32 192.168.24.35/24
controller-1 has 192.168.24.45/24
controller-2 has 192.168.24.11/24

control_virtual_ip is 192.168.24.29 (on controller-0)

Router r1 is built with baremetal-subnet using 192.168.24.250 as a gateway, an allocation pool of 192.168.24.71-192.168.24.100, and a route to 172.17.3.0/24 via control_virtual_ip (192.168.24.29 on controller-0).

In this state the booting node fails with "Could not open san device, Connection timed out".
Changing the route to 172.17.3.0/24 to go via 192.168.24.45 (controller-1, where the iSCSI target lives) allows the booting node to connect to the target, load the image, and boot.

Controller-0 is not forwarding traffic from the router into vlan30 (172.17.3.0/24).
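
For reference, the route change described in this comment could be sketched as follows. This is a minimal dry-run helper, not the exact commands that were run: it assumes the route is a static route on the Neutron router r1, and the function name print_route_swap is made up for illustration. It only prints the two openstack CLI calls so they can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash
# Print (do not run) the CLI calls that move a static route on a Neutron
# router from one next hop to another.
# Arguments: router name, destination CIDR, old gateway, new gateway.
print_route_swap() {
    local router=$1 dest=$2 old_gw=$3 new_gw=$4
    # Drop the route through the old next hop.
    printf 'openstack router unset --route destination=%s,gateway=%s %s\n' \
        "$dest" "$old_gw" "$router"
    # Add the same destination through the new next hop.
    printf 'openstack router set --route destination=%s,gateway=%s %s\n' \
        "$dest" "$new_gw" "$router"
}

# Example with the values from this comment:
#   print_route_swap r1 172.17.3.0/24 192.168.24.29 192.168.24.45
```

Pointing the route at controller-1 works around the symptom only; it does not explain why controller-0 fails to forward the traffic.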

Comment 6 Alistair Tonner 2020-11-04 18:35:40 UTC
Setting net.ipv4.conf.all.rp_filter=2 on controller-0/1/2 resolves the communications issue. The node can now pull the image and boot.
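
A small sketch for inspecting the reverse-path filter setting before and after the change. Per the kernel's ip-sysctl documentation, rp_filter is 0 (no source validation), 1 (strict) or 2 (loose), and the value applied on an interface is the higher of conf/all and conf/<interface>. The function names here are illustrative, and the guess that strict filtering was dropping the routed iSCSI traffic is an assumption, not something confirmed in this bug.

```shell
#!/usr/bin/env bash
# Map an rp_filter value to the mode name used in the kernel documentation.
rp_filter_mode() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Print the rp_filter value of every entry under a conf directory.
# The directory is a parameter so the function can be run against a copy.
show_rp_filter() {
    local conf_dir=${1:-/proc/sys/net/ipv4/conf} f value
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue
        value=$(cat "$f")
        printf '%s=%s (%s)\n' "$(basename "$(dirname "$f")")" \
            "$value" "$(rp_filter_mode "$value")"
    done
}

# Usage on a controller (the sysctl call needs root):
#   show_rp_filter
#   sysctl -w net.ipv4.conf.all.rp_filter=2
#   show_rp_filter
```

A value set with sysctl -w does not survive a reboot, and a later overcloud deploy or update may reset it, so a permanent fix would have to come from the deployment configuration.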

Comment 7 Alistair Tonner 2020-11-04 21:04:11 UTC
final footnote on the job failure:

LIBGUESTFS_BACKEND=direct virt-customize -a /tmp/images/{{ dib_image }} --run-command 'echo "nameserver 10.11.5.19" |tee /etc/resolv.conf && yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm && rhos-release -P {{ ospversion.stdout|float }} -p passed_phase1 && yum install -y iscsi-initiator-utils cloud-init openssh && for i in $(awk -F"vmlinuz-" "/linux16/ && ! /rescue/ {print \$2}" /etc/grub2.cfg|awk "{print \$1}") ; do (dracut --force --add "network iscsi" /boot/initramfs-$i.img $i) ; done && sed -i "s/GRUB_CMDLINE_LINUX=\"/GRUB_CMDLINE_LINUX=\"rd.iscsi.firmware=1 /g" /etc/default/grub && /sbin/grub2-mkconfig -o /boot/grub2/grub.cfg && adduser cloud-user && mkdir /home/cloud-user/.ssh && chown -R cloud-user:cloud-user /home/cloud-user && chmod 700 /home/cloud-user/.ssh && touch /home/cloud-user/.ssh/authorized_keys && chmod 600 /home/cloud-user/.ssh/authorized_keys' --root-password password:redhat --ssh-inject cloud-user --selinux-relabel

The above command completes without failure but results in an unbootable image because of this loop:
for i in $(awk -F"vmlinuz-" "/linux16/ && ! /rescue/ {print \$2}" /etc/grub2.cfg|awk "{print \$1}") ; do (dracut --force --add "network iscsi" /boot/initramfs-$i.img $i) ; done

There are no longer any lines containing "linux16" in /etc/grub2.cfg, so the loop never runs dracut. Switched to ls /lib/modules for the list of installed kernels.
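
A minimal sketch of what the /lib/modules based loop could look like. This is not the exact change made to the job; the function names and the DRACUT override are made up for illustration, and the dracut arguments are copied from the original command.

```shell
#!/usr/bin/env bash
# List installed kernel versions by directory name under a modules tree.
list_kernels() {
    local modules_dir=${1:-/lib/modules} d
    for d in "$modules_dir"/*/; do
        [ -d "$d" ] || continue
        basename "$d"
    done
}

# Rebuild the initramfs of every listed kernel with network and iSCSI support.
# Set DRACUT=echo for a dry run that prints the arguments instead.
rebuild_initramfs() {
    local modules_dir=${1:-/lib/modules} kver
    for kver in $(list_kernels "$modules_dir"); do
        ${DRACUT:-dracut} --force --add "network iscsi" \
            "/boot/initramfs-$kver.img" "$kver"
    done
}

# Usage inside the image (for example from virt-customize --run-command):
#   rebuild_initramfs
```

Unlike the awk pipeline, this does not depend on the layout of the grub configuration, so it should keep working whether or not the kernel entries are written into grub.cfg.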

Comment 8 Luigi Toscano 2020-11-05 09:51:58 UTC
So is this an issue with the network setup?

Comment 9 Alan Bishop 2020-11-05 15:27:48 UTC
Or a baremetal provisioning issue? Either way it does not appear to be a storage issue for the Cinder squad. Can we get this reassigned?

Comment 10 Steve Baker 2020-12-16 03:42:36 UTC
I'm going to mark this as a duplicate of bug #1892773, since we're assuming it's the same problem, and that bug is also against 16.1.

*** This bug has been marked as a duplicate of bug 1892773 ***