Bug 1949525
| Summary: | [FFU 13-16.1] n-cpu in hybrid mode unable to attach volumes - iSCSI driver not found. Please make sure it is loaded, and retry the operation | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | MD Sufiyan <msufiyan> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Lukas Bezdicka <lbezdick> |
| Status: | CLOSED ERRATA | QA Contact: | Jason Grosso <jgrosso> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 16.1 (Train) | CC: | ahyder, dasmith, dvd, eglynn, jamsmith, jesse.pretorius, jgrosso, jhakimra, jpretori, kchamart, kthakre, lbezdick, ltoscano, lyarwood, mburns, pgrist, sbauza, sgolovat, sgordon, slinaber, spower, tshefi, vromanso |
| Target Milestone: | z6 | Keywords: | Triaged |
| Target Release: | 16.1 (Train on RHEL 8.2) | Flags: | tshefi: automate_bug+ |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-1.20210408163452.el8ost | Doc Type: | Known Issue |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-26 13:52:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
This smells like an issue with c-vol incorrectly mapping volumes to the compute hosts after the control plane upgrade to 16. Have we tried to shelve and unshelve an example boot-from-volume instance? If this fails, it will confirm that the connection_info being provided by c-api is invalid, which could be caused by a failure to map volumes by c-vol.

(In reply to Lee Yarwood from comment #3)
> This smells like an issue with c-vol incorrectly mapping volumes to the
> compute hosts after the control plane upgrade to 16.
>
> Have we tried to shelve and unshelve an example boot from volume instance?
> If this fails it will confirm that the connection_info being provided by
> c-api is invalid and could be caused by a failure to map volumes by c-vol.

Apologies, ignore this; buried within the trace is the underlying reason:

`initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)`

As discussed on IRC, this smells like the following, caused by a version mismatch between the host and container:

"After a minor update, cinder-volume can't create volume from image, they are stuck in downloading"
https://access.redhat.com/solutions/5504681

Let's confirm the versions used within the nova_compute container and host, what modules are loaded, etc., and go from there.
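As a starting point, a minimal sketch of those checks, assuming the docker-based hybrid state shown later in this bug (container names and the exact package set may differ per deployment):

~~~
# Compare the iSCSI initiator package on the host and in the container;
# the linked KCS describes a host/container version mismatch.
rpm -q iscsi-initiator-utils
docker exec nova_compute rpm -q iscsi-initiator-utils

# Error 12 ("iSCSI driver not found") from iscsiadm points at the iSCSI
# transport kernel modules, so check what is actually loaded on the host.
lsmod | grep -E 'iscsi_tcp|libiscsi|scsi_transport_iscsi'

# Confirm where iscsid is running: as a host service or in a container.
systemctl status iscsid
docker ps | grep iscsid
~~~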
Failing QA:

~~~
2021-04-23 04:18:04 |
2021-04-23 04:18:04 | TASK [Update the iscsid paunch image in config] ********************************
2021-04-23 04:18:04 | Friday 23 April 2021 04:18:02 +0000 (0:00:00.690) 0:00:06.001 **********
2021-04-23 04:18:04 | changed: [controller-2] => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.010810", "end": "2021-04-23 04:18:03.097911", "rc": 0, "start": "2021-04-23 04:18:03.087101", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | changed: [controller-1] => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.009339", "end": "2021-04-23 04:18:03.101408", "rc": 0, "start": "2021-04-23 04:18:03.092069", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 | fatal: [controller-0]: FAILED! => {"changed": true, "cmd": "set -o pipefail\njq '.iscsid.image = \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1\" | {\"iscsid\": .iscsid }' /var/lib/tripleo-config/docker-container-startup-config-step_3.json >\\\n/var/lib/tripleo-config/docker-container-hybrid_iscsid.json\n", "delta": "0:00:00.004323", "end": "2021-04-23 04:18:03.149369", "msg": "non-zero return code", "rc": 127, "start": "2021-04-23 04:18:03.145046", "stderr": "/bin/sh: line 1: jq: command not found", "stderr_lines": ["/bin/sh: line 1: jq: command not found"], "stdout": "", "stdout_lines": []}
2021-04-23 04:18:04 |
2021-04-23 04:18:04 | NO MORE HOSTS LEFT *************************************************************
2021-04-23 04:18:04 |
2021-04-23 04:18:04 | PLAY RECAP *********************************************************************
2021-04-23 04:18:04 | compute-0 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
2021-04-23 04:18:04 | compute-1 : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
2021-04-23 04:18:04 | controller-0 : ok=5 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
2021-04-23 04:18:04 | controller-1 : ok=6 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
2021-04-23 04:18:04 | controller-2 : ok=6 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
2021-04-23 04:18:04 | undercloud : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
~~~
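The task fails on controller-0 simply because jq is not installed there. For illustration only (the shipped fix lives in openstack-tripleo-heat-templates and may take a different approach), the same JSON edit can be expressed without jq, assuming python3 is available on the node:

~~~
# Hypothetical jq-free equivalent of the failed task: extract the iscsid
# entry from the step_3 startup config and point it at the new image.
python3 -c '
import json, sys

src = "/var/lib/tripleo-config/docker-container-startup-config-step_3.json"
dst = "/var/lib/tripleo-config/docker-container-hybrid_iscsid.json"

with open(src) as f:
    config = json.load(f)

iscsid = config["iscsid"]
iscsid["image"] = sys.argv[1]

with open(dst, "w") as f:
    json.dump({"iscsid": iscsid}, f)
' "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210421.1"
~~~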
This was not ready for the 16.1.6 final compose; we are blocker-only now. This is being moved to 16.1.7 and the exception flag will be removed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097

As this passed automation/CI (see #c34), I've bumped the "automate_bug" status to +.
Description of problem:

Power-on or cold migration of shut-down instances residing on un-upgraded compute nodes always ends up with a VolumeDeviceNotFound error.

>> nova-compute logs

~~~
Command: iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login
Exit code: 12
Stdout: 'Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n'
Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n' _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 194, in execute_root
    return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs)
  File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 143, in custom_execute
    on_completion=on_completion, *cmd, **kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
    cmd=sanitized_cmd)
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login
Exit code: 12
Stdout: 'Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n'
Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n'
2021-04-14 15:29:14.546 270 DEBUG oslo.privsep.daemon [-] privsep: reply[140005403599832]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('Logging in to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260]\n', 'iscsiadm: Could not login to [iface: default, target: iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603, portal: 172.x.x.x,3260].\niscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)\niscsiadm: Could not log into all portals\n', 12, 'iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 -p 172.x.x.x:3260 --login', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-04-14 15:29:14.547 8 WARNING os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 15:29:14.547 8 WARNING os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to connect to iSCSI portal 172.x.x.x:3260.
2021-04-14 15:29:14.548 8 DEBUG os_brick.initiator.connectors.iscsi [req-aaa5d3ce-9496-4bdf-ad05-1da3a3401a21 b0adfbe4065146d5ac91476ccac2f04a 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Getting connected devices for (ips,iqns,luns)=(('172.x.x.x:3260', 'iqn.2010-01.com.solidfire:yyj6.uuid-ae75d59d-8aab-475a-ac0b-f025f4c2e200.603', 0),) _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:816
2021-04-14 15:29:14.548 270 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
2021-04-14 15:29:14.566 270 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node" returned: 0 in 0.018s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
~~~

- The connection to win iqn-xx is not established while doing a cold migrate or power on:

~~~
grep -i aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee nova-compute.log
2021-04-14 13:36:53.633 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:36:55.989 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:00.366 8 WARNING os_brick.initiator.connectors.iscsi [req-faa2b034-57ab-4b83-91f7-7a9815e5ae0e bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:46.615 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:49.009 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-04-14 13:37:53.421 8 WARNING os_brick.initiator.connectors.iscsi [req-ab8ef843-a30b-4bcf-8eb8-56c60d056faa bd9d83280a17446ab408302ecd51f442 6b2dc8c2170a4728ae0aacfc845abdcf - default default] Failed to login iSCSI target iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 on portal 172.x.x.x:3260 (exit code 12).: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
~~~

- nova-compute is currently using the OSP 16.1 container, as the nova_hybrid task step has already been performed.

~~~
cat sos_commands/docker/docker_ps | grep -i nova
1bedf70e14a1  hostname.domin.com:5000/registry-osp16-1-containers-nova-compute:16.1.3              "dumb-init --singl..."  7 days ago   Up 7 days (unhealthy)   nova_compute
e9b5ac222086  hostname.domin.com:5000/registry-docker-container-nova-compute:13.0-160.1614612488   "dumb-init --singl..."  4 weeks ago  Up 4 weeks (healthy)    nova_migration_target
bcee5c007820  hostname.domin.com:5000/registry-docker-container-nova-libvirt:13.0-170.1614612467   "dumb-init --singl..."  4 weeks ago  Up 4 weeks              nova_libvirt
8da3e1ec5d05  hostname.domin.com:5000/registry-docker-container-nova-libvirt:13.0-170.1614612467   "dumb-init --singl..."  4 weeks ago  Up 4 weeks              nova_virtlogd
~~~

~~~
[root@bl2ol10 nova]# docker exec -it nova_compute rpm -qa | grep -i os-brick
python3-os-brick-2.10.5-1.20200916163418.634fb4a.el8ost.noarch
~~~

Storage backend:

~~~
enabled_backends=tripleo_iscsi,solidfire
~~~
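One way to narrow down whether the failure is specific to the 16.1 nova_compute container or also reproduces on the host is to attempt the same login from both contexts. A sketch, reusing the redacted target/portal values from the logs above; `-u root` is assumed so iscsiadm inside the container can reach iscsid:

~~~
# From the host, which still carries the 13-era tooling:
iscsiadm -m node -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 \
    -p 172.x.x.x:3260 --login

# From inside the 16.1 nova_compute container:
docker exec -u root nova_compute iscsiadm -m node \
    -T iqn.2010-01.com.solidfire:yyj6.uuid-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.603 \
    -p 172.x.x.x:3260 --login
~~~

If only the container-side login fails with error 12, that supports the host/container version-mismatch theory from the comments above.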