Bug 1949525 - [FFU 13-16.1] n-cpu in hybrid mode unable to attach volumes - iSCSI driver not found. Please make sure it is loaded, and retry the operation
Summary: [FFU 13-16.1] n-cpu in hybrid mode unable to attach volumes - iSCSI driver no...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: All
OS: Linux
high
high
Target Milestone: z6
: 16.1 (Train on RHEL 8.2)
Assignee: Lukas Bezdicka
QA Contact: Jason Grosso
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-04-14 13:30 UTC by MD Sufiyan
Modified: 2024-06-14 01:14 UTC (History)
23 users (show)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210408163452.el8ost
Doc Type: Known Issue
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-26 13:52:43 UTC
Target Upstream Version:
Embargoed:
tshefi: automate_bug+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 786289 0 None MERGED [train-only] Introduce hybrid state for iscsi 2021-04-20 11:22:05 UTC
OpenStack gerrit 787698 0 None MERGED [ffwd] Rework checks for hybrid state containers 2021-04-30 08:43:47 UTC
Red Hat Issue Tracker OSP-2962 0 None None None 2022-08-02 13:55:01 UTC
Red Hat Product Errata RHBA-2021:2097 0 None None None 2021-05-26 13:53:08 UTC

Description MD Sufiyan 2021-04-14 13:30:07 UTC
Description of problem:

Power-on or cold migration of shutdown instances residing on un-upgraded compute nodes always end up with VolumeDeviceNotFound error

>> nova-compute logs

~~~
Command: iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login
Exit code: 12
Stdout: 'Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n'
Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\
niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n' _
process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 194, in execute_root
    return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs)
  File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 143, in custom_execute
    on_completion=on_completion, *cmd, **kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
    cmd=sanitized_cmd)
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login
Exit code: 12
Stdout: 'Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n'
Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n'
2021-04-14 15:29:14.546 270 DEBUG oslo.privsep.daemon [-] privsep: reply[140005403599832]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n', 'iscsiadm: Could not login to [iface:default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n', 12, 'iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-04-14 15:29:14.547 8 WARNING os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exitcode 12).: oslo_concurrency.processutils ProcessExecutionError: Unexpected error while running command.
2021-04-14 15:29:14.547 8 WARNING os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to connect to iSCSI portal 172.x.x.x:3260.
2021-04-14 15:29:14.548 8 DEBUG os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Getting connected devices fo(ips,iqns,luns)=(('172.x.x.x:3260', 'iqn.2010-01.com.solidfire:yyj6.uuid-ae75d59d-8aab-475a-ac0b-f025f4c2e200.603', 0),) _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:816
2021-04-14 15:29:14.548 270 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
2021-04-14 15:29:14.566 270 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node" returned: 0 in 0.018s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
~~~

- Connection to win iqn-xx is not establishing while doing a cold-migrate or power on
 
~~~
grep -i aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee nova-compute.log
2021-04-14 13:36:53.633 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:36:55.989 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:00.366 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:46.615 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:49.009 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:53.421 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
~~~
 
- nova-compute is currently using the osp16.1 container as the nova_hybrid task step has been already performed.

~~~
cat sos_commands/docker/docker_ps | grep -i nova
1bedf70e14a1        hostname.domin.com:5000/registry-osp16-1-containers-nova-compute:16.1.3                           "dumb-init --singl..."   7 days ago          Up 7 days (unhealthy)                       nova_compute
e9b5ac222086        hostname.domin.com:5000/registry-docker-container-nova-compute:13.0-160.1614612488                "dumb-init --singl..."   4 weeks ago         Up 4 weeks (healthy)                        nova_migration_target
bcee5c007820        hostname.domin.com:5000/registry-docker-container-nova-libvirt:13.0-170.1614612467                "dumb-init --singl..."   4 weeks ago         Up 4 weeks                                  nova_libvirt
8da3e1ec5d05        hostname.domin.com:5000/registry-docker-container-nova-libvirt:13.0-170.1614612467                "dumb-init --singl..."   4 weeks ago         Up 4 weeks                                  nova_virtlogd
~~~

~~~
[root@bl2ol10 nova]# docker exec -it nova_compute rpm -qa |grep -i  os-brick
python3-os-brick-2.10.5-1.20200916163418.634fb4a.el8ost.noarch
~~~

Storage backend : 

~~~
enabled_backends=tripleo_iscsi,solidfire
~~~

Comment 3 Lee Yarwood 2021-04-14 15:14:47 UTC
This smells like an issue with c-vol incorrectly mapping volumes to the compute hosts after the control plane upgrade to 16.

Have we tried to shelve and unshelve an example boot from volume instance? If this fails it will confirm that the connection_info being provided by c-api is invalid and could be caused by a failure to map volumes by c-vol.

Comment 4 Lee Yarwood 2021-04-14 15:29:45 UTC
(In reply to Lee Yarwood from comment #3)
> This smells like an issue with c-vol incorrectly mapping volumes to the
> compute hosts after the control plane upgrade to 16.
> 
> Have we tried to shelve and unshelve an example boot from volume instance?
> If this fails it will confirm that the connection_info being provided by
> c-api is invalid and could be caused by a failure to map volumes by c-vol.

Apologies ignore this, buried within the trace is the underlying reason:

`initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation`

As discussed on irc this smells like the following caused by a version mismatch between the host and container:

After a minor update, cinder-volume can't create volume from image, they are stuck in downloading
https://access.redhat.com/solutions/5504681

Lets confirm the versions used within the nova_compute container and host, what modules are loaded etc and go from there.

Comment 23 Lukas Bezdicka 2021-04-23 09:14:09 UTC
Failing QA:

2021-04-23 04:18:04 | 
2021-04-23 04:18:04 | TASK [Update the iscsid paunch image in config] ********************************
2021-04-23 04:18:04 | Friday 23 April 2021  04:18:02 +0000 (0:00:00.690)       0:00:06.001 ********** 
2021-04-23 04:18:04 | changed: [controller-2] => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.010810", "end": "2021-04-23 04:18:03.097911", "rc": 0, "start": "2021-04-23 04:18:03.087101", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | changed: [controller-1] => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.009339", "end": "2021-04-23 04:18:03.101408", "rc": 0, "start": "2021-04-23 04:18:03.092069", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | fatal: [controller-0]: FAILED! => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.004323", "end": "2021-04-23 04:18:03.149369", "msg": "non-zero return code", "rc": 127, "start": "2021-04-23 04:18:03.145046", "stderr": "/bin/sh: line 1: jq: command not found", "stderr_lines": ["/bin/sh: line 1: jq: command not found"], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | 
2021-04-23 04:18:04 | NO MORE HOSTS LEFT *************************************************************
2021-04-23 04:18:04 | 
2021-04-23 04:18:04 | PLAY RECAP *********************************************************************
2021-04-23 04:18:04 | compute-0                  : ok=4    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
2021-04-23 04:18:04 | compute-1                  : ok=3    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
2021-04-23 04:18:04 | controller-0               : ok=5    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
2021-04-23 04:18:04 | controller-1               : ok=6    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
2021-04-23 04:18:04 | controller-2               : ok=6    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
2021-04-23 04:18:04 | undercloud                 : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Comment 24 spower 2021-04-26 15:08:36 UTC
This was not ready for the 16.1.6 final compose, we are blocker only now. This is being moved to 16.1.7 now and the exception flag will be removed.

Comment 42 errata-xmlrpc 2021-05-26 13:52:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097

Comment 43 Tzach Shefi 2021-06-09 07:30:03 UTC
As this passed automation/ci see -> #c34
I've bumped "automate_bug" status to +


Note You need to log in before you can comment on or make changes to this bug.