Bug 1720561 - [RHOSP 13] iscsi.service on host should be disabled to avoid iscsid is started at host when some stale shutdown happens
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z8
Target Release: 13.0 (Queens)
Assignee: Pablo Caruana
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On: 1723486
Blocks:
Reported: 2019-06-14 08:31 UTC by Takashi Kajinami
Modified: 2023-12-15 16:33 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-8.3.1-58.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, when a stale shutdown happened on a node, the iscsi.service detected remaining information about the previous iSCSI connection and recovered connections based on that information. This caused a conflict between the iscsi.service that runs on the host and the iscsi.service that runs in a container in Red Hat OpenStack Platform 13. This fix disables the iscsi.service on the host when deploying iscsid in a container, which avoids the conflict.
Clone Of:
: 1723486 (view as bug list)
Environment:
Last Closed: 2019-09-03 16:55:34 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1833019 | 0 | None | None | None | 2019-06-17 00:50:09 UTC
OpenStack gerrit | 665829 | 0 | None | MERGED | Disable iscsi.service to avoid iscsid on host from getting started | 2021-01-19 18:20:27 UTC
Red Hat Knowledge Base (Solution) | 4228601 | 0 | None | None | None | 2019-09-30 01:57:57 UTC
Red Hat Product Errata | RHBA-2019:2624 | 0 | None | None | None | 2019-09-03 16:55:48 UTC

Description Takashi Kajinami 2019-06-14 08:31:06 UTC
Description of problem:

In RHOSP 13, iscsid runs inside a container, and the iscsid on the host is disabled.
However, after a stale shutdown, iscsi.service is started when the affected node boots,
and it launches iscsid.service on the host.

This leaves the iscsid container stuck in "Restarting" with the following error:

~~~
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Validating config file
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying service configuration files
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Deleting /etc/iscsi/iscsid.conf
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/iscsid.conf to /etc/iscsi/iscsid.conf
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Deleting /etc/iscsi/initiatorname.iscsi
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/initiatorname.iscsi to /etc/iscsi/initiatorname.iscsi
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Writing out command to execute
Jun 14 17:11:12 compute-1 journal: ++ cat /run_command
Jun 14 17:11:12 compute-1 journal: Running command: '/usr/sbin/iscsid -f'
Jun 14 17:11:12 compute-1 journal: + CMD='/usr/sbin/iscsid -f'
Jun 14 17:11:12 compute-1 journal: + ARGS=
Jun 14 17:11:12 compute-1 journal: + [[ ! -n '' ]]
Jun 14 17:11:12 compute-1 journal: + . kolla_extend_start
Jun 14 17:11:12 compute-1 journal: ++ [[ ! -f /etc/iscsi/initiatorname.iscsi ]]
Jun 14 17:11:12 compute-1 journal: + echo 'Running command: '\''/usr/sbin/iscsid -f'\'''
Jun 14 17:11:12 compute-1 journal: + exec /usr/sbin/iscsid -f
Jun 14 17:11:12 compute-1 journal: iscsid: Can not bind IPC socket
~~~
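The last log line is consistent with two daemons competing for one IPC endpoint. As a rough illustration, and not something taken from this report: on Linux, a Unix socket name in the abstract namespace can be bound by only one process at a time within a network namespace. Assuming the iscsid container shares the host network namespace (an assumption here), whichever iscsid starts second would fail to bind. The socket name below is a made-up placeholder, not the name iscsid actually uses.

```python
import errno
import os
import socket

# Hypothetical name for illustration only. A leading NUL byte selects the
# Linux abstract socket namespace instead of a filesystem path.
DEMO_NAME = b"\0demo-iscsid-ipc-" + str(os.getpid()).encode()


def try_bind(name):
    """Bind a Unix stream socket to `name`.

    Returns (socket, None) on success or (None, errno) on failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(name)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# The first bind plays the role of the daemon that started first,
# the second bind plays the role of the daemon that started later.
first, first_err = try_bind(DEMO_NAME)
second, second_err = try_bind(DEMO_NAME)

print("first bind succeeded:", first is not None)
print("second bind error:", errno.errorcode.get(second_err))

if first is not None:
    first.close()
```

The sketch only shows the general mechanism on a Linux host; it does not reproduce iscsid's own startup code.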

Version-Release number of selected component (if applicable):
z5

How reproducible:

Always

Steps to Reproduce:
1. Create an instance with an iSCSI Cinder volume attached
2. Force-reboot the node where the instance is running

Actual results:
iscsi.service launches iscsid.service on the host, and the iscsid container gets stuck in Restarting

Expected results:
iscsid.service on the host is not started, and the iscsid container starts without any error
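
The expected host-side state can be checked from a row of `systemctl -a` output. The following is a minimal sketch, assuming the usual column order of UNIT, LOAD, ACTIVE, SUB, DESCRIPTION; the function names are hypothetical helpers, not part of any tool mentioned in this report.

```python
def parse_unit_row(row):
    """Split one `systemctl -a` row into the unit name and its state columns.

    Returns None when the row is too short to contain all four columns.
    """
    fields = row.split()
    # systemd may prefix a row with a status marker; unit names always
    # contain a dot (for example "iscsid.service"), the marker does not.
    if fields and "." not in fields[0]:
        fields = fields[1:]
    if len(fields) < 4:
        return None
    unit, load, active, sub = fields[:4]
    return {"unit": unit, "load": load, "active": active, "sub": sub}


def host_iscsid_is_running(row):
    """True when the row says iscsid.service is active on the host."""
    info = parse_unit_row(row)
    return (
        info is not None
        and info["unit"] == "iscsid.service"
        and info["active"] == "active"
    )


# Sample rows in the format printed by `systemctl -a | grep iscsid`.
print(host_iscsid_is_running("  iscsid.service  loaded  inactive dead  Open-iSCSI"))
print(host_iscsid_is_running("  iscsid.service  loaded  active running  Open-iSCSI"))
```

A truncated row is treated as "not running" here, which is a design choice for the sketch; a stricter check might report it as unknown instead.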


Additional info:

We have seen this issue since the iSCSI session became shared between the host and the container,
a change made to solve a shutdown problem on compute nodes. [1]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1655815
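
The fix tracked in this report disables iscsi.service on the host at deployment time. A manual equivalent on a single host might look like the sketch below. It is an assumption about what an operator could run, not the content of the merged change: it only uses the standard `systemctl stop` and `systemctl disable` subcommands, the helper names are hypothetical, and it defaults to a dry run so that nothing is changed unless requested.

```python
import subprocess


def disable_unit_commands(unit="iscsi.service"):
    """Return the systemctl commands that stop a unit and disable it at boot."""
    return [
        ["systemctl", "stop", unit],
        ["systemctl", "disable", unit],
    ]


def run_commands(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError if systemctl fails.
            subprocess.run(cmd, check=True)


# Dry run by default: shows the commands without touching the host.
run_commands(disable_unit_commands())
```

On a deployed overcloud, applying the errata is the supported route; the sketch is only meant to make the effect of the fix concrete.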

Comment 10 Tzach Shefi 2019-08-18 11:05:05 UTC
Verified on:
openstack-tripleo-heat-templates-8.3.1-72.el7ost.noarch


Using 3PAR iSCSI as the Cinder backend.

1. Create a volume:
(overcloud) [stack@undercloud-0 ~]$ cinder create 1 --name 3par_iscsi_vol
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2019-08-18T10:49:43.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 1f193350-ba00-400c-8501-58d698068055 |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | 3par_iscsi_vol                       |
| os-vol-host-attr:host          | controller-0@3par#SSD_r5             |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | 67844cb7ae4a4d29ad599e53cdeec3f9     |
| replication_status             | None                                 |
| size                           | 1                                    |
| snapshot_id                    | None                                 |
| source_volid                   | None                                 |
| status                         | available                            |
| updated_at                     | 2019-08-18T10:49:44.000000           |
| user_id                        | 767dfd54ba6d49aebe01b7f4edb9725c     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+



2. Boot an instance on compute-0:
(overcloud) [stack@undercloud-0 ~]$ nova show inst1
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:hostname             | inst1                                                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| description                          | inst1                                                    |

3. Current status of the host iscsid service before attaching the volume:
[root@compute-0 ~]# systemctl -a | grep iscsid
  iscsid.service                                                                                        loaded    inactive dead      Open-iSCSI

Status of the iscsid container:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 5 hours (healthy)                       iscsid



4. Attach volume to instance:
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach inst1 1f193350-ba00-400c-8501-58d698068055 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 1f193350-ba00-400c-8501-58d698068055 |
| serverId | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
| volumeId | 1f193350-ba00-400c-8501-58d698068055 |
+----------+--------------------------------------+


5. Verify that Cinder volume is attached:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name           | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| 1f193350-ba00-400c-8501-58d698068055 | in-use | 3par_iscsi_vol | 1    | tripleo     | false    | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+

6. Force a reboot of the compute node:
[root@compute-0 ~]# sudo shutdown -r now
Connection to 192.168.24.14 closed by remote host.
Connection to 192.168.24.14 closed.


7. Wait for the host to reboot, then check the status of the iscsid service and the iscsid container.

The service should remain down:
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.14
Warning: Permanently added '192.168.24.14' (ECDSA) to the list of known hosts.
Last login: Sun Aug 18 10:50:48 2019 from 192.168.24.1
[heat-admin@compute-0 ~]$ sudo -i
[root@compute-0 ~]# systemctl -a | grep iscsid
  iscsid.service                                                                                        loaded    inactive dead      Open-iSCSI

The service remains down.

The container should remain up and healthy:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up About a minute (healthy)                       iscsid

The container is up.
Wait a few minutes and recheck: the container should still be up, with an uptime of more than one minute.

[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 2 minutes (healthy)                       iscsid

Uptime is now 2 minutes, so the container is not restarting in a loop.
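
The recheck in this step compares the uptime shown in the STATUS column of `docker ps` between two samples. A minimal sketch of that comparison follows, assuming status strings of the form shown in this comment (for example "Up About a minute (healthy)"); the function names are hypothetical, and status units that the table does not list are treated as unknown.

```python
import re

# Seconds per unit for the duration words handled by this sketch.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def uptime_seconds(status):
    """Approximate uptime from a `docker ps` STATUS string.

    Returns None when the container is not "Up" or the text is not understood.
    """
    if not status.startswith("Up "):
        return None
    # Drop a trailing annotation such as "(healthy)".
    text = re.sub(r"\s*\(.*\)\s*$", "", status[3:]).strip()
    if text == "Less than a second":
        return 0
    match = re.fullmatch(r"About an? (\w+)", text)
    if match:
        count, unit = 1, match.group(1)
    else:
        match = re.fullmatch(r"(\d+) (\w+)", text)
        if match is None:
            return None
        count, unit = int(match.group(1)), match.group(2).rstrip("s")
    per_unit = _UNIT_SECONDS.get(unit)
    return None if per_unit is None else count * per_unit


def uptime_grew(earlier_status, later_status):
    """True when both samples are "Up" and the later uptime is larger."""
    earlier = uptime_seconds(earlier_status)
    later = uptime_seconds(later_status)
    return earlier is not None and later is not None and later > earlier


print(uptime_grew("Up About a minute (healthy)", "Up 2 minutes (healthy)"))
```

An uptime that keeps growing across samples suggests the container is no longer restarting; a status that starts with "Restarting" makes `uptime_seconds` return None, so the comparison reports False.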


8. Restart the instance:
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.

The iscsid container is still up (good):
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 3 minutes (healthy)                       iscsid



9. Check the volume status; it should show the volume as attached again:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name           | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| 1f193350-ba00-400c-8501-58d698068055 | in-use | 3par_iscsi_vol | 1    | tripleo     | false    | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+


All looks good after the compute host reboot: the host iscsid service remains down,
the iscsid container remains up,
and the instance booted with the volume attached.

One last check of the service and container status:
[root@compute-0 ~]# systemctl -a | grep iscsid
  iscsid.service 

[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 4 minutes (healthy)                       iscsid

Both remain as they should be: the service is down and the container is up.

Good to verify.

Comment 12 errata-xmlrpc 2019-09-03 16:55:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2624

