Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1949525

Summary: [FFU 13-16.1] n-cpu in hybrid mode unable to attach volumes - iSCSI driver not found. Please make sure it is loaded, and retry the operation
Product: Red Hat OpenStack Reporter: MD Sufiyan <msufiyan>
Component: openstack-tripleo-heat-templatesAssignee: Lukas Bezdicka <lbezdick>
Status: CLOSED ERRATA QA Contact: Jason Grosso <jgrosso>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train)CC: ahyder, dasmith, dvd, eglynn, jamsmith, jesse.pretorius, jgrosso, jhakimra, jpretori, kchamart, kthakre, lbezdick, ltoscano, lyarwood, mburns, pgrist, sbauza, sgolovat, sgordon, slinaber, spower, tshefi, vromanso
Target Milestone: z6Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)Flags: tshefi: automate_bug+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210408163452.el8ost Doc Type: Known Issue
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-26 13:52:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description MD Sufiyan 2021-04-14 13:30:07 UTC
Description of problem:

Power-on or cold migration of shutdown instances residing on un-upgraded compute nodes always end up with VolumeDeviceNotFound error

>> nova-compute logs

~~~
Command: iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login
Exit code: 12
Stdout: 'Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n'
Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\
niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n' _
process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 194, in execute_root
    return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs)
  File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 143, in custom_execute
    on_completion=on_completion, *cmd, **kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
    cmd=sanitized_cmd)
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login
Exit code: 12
Stdout: 'Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n'
Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n'
2021-04-14 15:29:14.546 270 DEBUG oslo.privsep.daemon [-] privsep: reply[140005403599832]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n', 'iscsiadm: Could not login to [iface:default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n', 12, 'iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-04-14 15:29:14.547 8 WARNING os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exitcode 12).: oslo_concurrency.processutils ProcessExecutionError: Unexpected error while running command.
2021-04-14 15:29:14.547 8 WARNING os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to connect to iSCSI portal 172.x.x.x:3260.
2021-04-14 15:29:14.548 8 DEBUG os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Getting connected devices fo(ips,iqns,luns)=(('172.x.x.x:3260', 'iqn.2010-01.com.solidfire:yyj6.uuid-ae75d59d-8aab-475a-ac0b-f025f4c2e200.603', 0),) _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:816
2021-04-14 15:29:14.548 270 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
2021-04-14 15:29:14.566 270 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node" returned: 0 in 0.018s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
~~~

- Connection to win iqn-xx is not establishing while doing a cold-migrate or power on
 
~~~
grep -i aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee nova-compute.log
2021-04-14 13:36:53.633 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:36:55.989 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:00.366 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:46.615 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:49.009 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:53.421 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
~~~
 
- nova-compute is currently using the osp16.1 container as the nova_hybrid task step has been already performed.

~~~
cat sos_commands/docker/docker_ps | grep -i nova
1bedf70e14a1        hostname.domin.com:5000/registry-osp16-1-containers-nova-compute:16.1.3                           "dumb-init --singl..."   7 days ago          Up 7 days (unhealthy)                       nova_compute
e9b5ac222086        hostname.domin.com:5000/registry-docker-container-nova-compute:13.0-160.1614612488                "dumb-init --singl..."   4 weeks ago         Up 4 weeks (healthy)                        nova_migration_target
bcee5c007820        hostname.domin.com:5000/registry-docker-container-nova-libvirt:13.0-170.1614612467                "dumb-init --singl..."   4 weeks ago         Up 4 weeks                                  nova_libvirt
8da3e1ec5d05        hostname.domin.com:5000/registry-docker-container-nova-libvirt:13.0-170.1614612467                "dumb-init --singl..."   4 weeks ago         Up 4 weeks                                  nova_virtlogd
~~~

~~~
[root@bl2ol10 nova]# docker exec -it nova_compute rpm -qa |grep -i  os-brick
python3-os-brick-2.10.5-1.20200916163418.634fb4a.el8ost.noarch
~~~

Storage backend : 

~~~
enabled_backends=tripleo_iscsi,solidfire
~~~

Comment 3 Lee Yarwood 2021-04-14 15:14:47 UTC
This smells like an issue with c-vol incorrectly mapping volumes to the compute hosts after the control plane upgrade to 16.

Have we tried to shelve and unshelve an example boot from volume instance? If this fails it will confirm that the connection_info being provided by c-api is invalid and could be caused by a failure to map volumes by c-vol.

Comment 4 Lee Yarwood 2021-04-14 15:29:45 UTC
(In reply to Lee Yarwood from comment #3)
> This smells like an issue with c-vol incorrectly mapping volumes to the
> compute hosts after the control plane upgrade to 16.
> 
> Have we tried to shelve and unshelve an example boot from volume instance?
> If this fails it will confirm that the connection_info being provided by
> c-api is invalid and could be caused by a failure to map volumes by c-vol.

Apologies ignore this, buried within the trace is the underlying reason:

`initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation`

As discussed on irc this smells like the following caused by a version mismatch between the host and container:

After a minor update, cinder-volume can't create volume from image, they are stuck in downloading
https://access.redhat.com/solutions/5504681

Lets confirm the versions used within the nova_compute container and host, what modules are loaded etc and go from there.

Comment 23 Lukas Bezdicka 2021-04-23 09:14:09 UTC
Failing QA:

2021-04-23 04:18:04 | 
2021-04-23 04:18:04 | TASK [Update the iscsid paunch image in config] ********************************
2021-04-23 04:18:04 | Friday 23 April 2021  04:18:02 +0000 (0:00:00.690)       0:00:06.001 ********** 
2021-04-23 04:18:04 | changed: [controller-2] => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.010810", "end": "2021-04-23 04:18:03.097911", "rc": 0, "start": "2021-04-23 04:18:03.087101", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | changed: [controller-1] => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.009339", "end": "2021-04-23 04:18:03.101408", "rc": 0, "start": "2021-04-23 04:18:03.092069", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | fatal: [controller-0]: FAILED! => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.004323", "end": "2021-04-23 04:18:03.149369", "msg": "non-zero return code", "rc": 127, "start": "2021-04-23 04:18:03.145046", "stderr": "/bin/sh: line 1: jq: command not found", "stderr_lines": ["/bin/sh: line 1: jq: command not found"], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | 
2021-04-23 04:18:04 | NO MORE HOSTS LEFT *************************************************************
2021-04-23 04:18:04 | 
2021-04-23 04:18:04 | PLAY RECAP *********************************************************************
2021-04-23 04:18:04 | compute-0                  : ok=4    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
2021-04-23 04:18:04 | compute-1                  : ok=3    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
2021-04-23 04:18:04 | controller-0               : ok=5    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
2021-04-23 04:18:04 | controller-1               : ok=6    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
2021-04-23 04:18:04 | controller-2               : ok=6    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
2021-04-23 04:18:04 | undercloud                 : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Comment 24 spower 2021-04-26 15:08:36 UTC
This was not ready for the 16.1.6 final compose, we are blocker only now. This is being moved to 16.1.7 now and the exception flag will be removed.

Comment 42 errata-xmlrpc 2021-05-26 13:52:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097

Comment 43 Tzach Shefi 2021-06-09 07:30:03 UTC
As this passed automation/ci see -> #c34
I've bumped "automate_bug" status to +