Bug 1720561

Summary: [RHOSP 13] iscsi.service on host should be disabled to avoid iscsid is started at host when some stale shutdown happens
Product: Red Hat OpenStack
Reporter: Takashi Kajinami <tkajinam>
Component: openstack-tripleo-heat-templates
Assignee: Pablo Caruana <pcaruana>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: aschultz, cschwede, knoha, mburns, ndeevy, pcaruana, slinaber, tenobreg, tshefi
Target Milestone: z8
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.3.1-58.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, when a stale shutdown happened on a node, iscsi.service detected leftover information about the previous iSCSI connection at boot and recovered connections based on that information. This caused a conflict between the iscsi.service that runs on the host and the iscsi.service that runs in a container in Red Hat OpenStack Platform 13. This fix disables iscsi.service on the host when iscsid is deployed in a container, which avoids the conflict.
Story Points: ---
Clone Of:
: 1723486
Environment:
Last Closed: 2019-09-03 16:55:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1723486
Bug Blocks:

Description Takashi Kajinami 2019-06-14 08:31:06 UTC
Description of problem:

In RHOSP13, we run iscsid inside a container and disable the one running on the host.
However, when a stale shutdown happens, iscsi.service is started while that node boots,
and it launches iscsid.service on the host.

This leaves the iscsid container stuck in "Restarting" with the following error.

~~~
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Validating config file
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying service configuration files
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Deleting /etc/iscsi/iscsid.conf
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/iscsid.conf to /etc/iscsi/iscsid.conf
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Deleting /etc/iscsi/initiatorname.iscsi
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/initiatorname.iscsi to /etc/iscsi/initiatorname.iscsi
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Writing out command to execute
Jun 14 17:11:12 compute-1 journal: ++ cat /run_command
Jun 14 17:11:12 compute-1 journal: Running command: '/usr/sbin/iscsid -f'
Jun 14 17:11:12 compute-1 journal: + CMD='/usr/sbin/iscsid -f'
Jun 14 17:11:12 compute-1 journal: + ARGS=
Jun 14 17:11:12 compute-1 journal: + [[ ! -n '' ]]
Jun 14 17:11:12 compute-1 journal: + . kolla_extend_start
Jun 14 17:11:12 compute-1 journal: ++ [[ ! -f /etc/iscsi/initiatorname.iscsi ]]
Jun 14 17:11:12 compute-1 journal: + echo 'Running command: '\''/usr/sbin/iscsid -f'\'''
Jun 14 17:11:12 compute-1 journal: + exec /usr/sbin/iscsid -f
Jun 14 17:11:12 compute-1 journal: iscsid: Can not bind IPC socket
~~~
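
The symptom above (host iscsid.service active while the iscsid container keeps restarting) can be checked for with a small helper. This is only a sketch: the function name iscsid_conflict is hypothetical and not part of any RHOSP tooling, and it assumes the host uses systemd and docker as shown elsewhere in this report.

```shell
# Hypothetical helper (not from the bug report).
#   $1 - output of `systemctl is-active iscsid.service` on the host
#   $2 - STATUS text of the iscsid container as printed by `docker ps -a`
# Prints "conflict" when the host daemon is active and the container is
# restarting, "ok" otherwise.
iscsid_conflict() {
    host_state=$1
    container_status=$2
    case "$host_state:$container_status" in
        active:Restarting*) echo "conflict" ;;
        *) echo "ok" ;;
    esac
}

# Possible invocation on a compute node (assumes the container is named iscsid):
#   iscsid_conflict "$(systemctl is-active iscsid.service)" \
#       "$(docker ps -a --filter name=iscsid --format '{{.Status}}')"
```

The helper only inspects the two strings it is given, so it can be tried out without touching a live node.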

Version-Release number of selected component (if applicable):
z5

How reproducible:

Always

Steps to Reproduce:
1. Create an instance with an iSCSI Cinder volume attached
2. Force a reboot of the node where the instance is running

Actual results:
iscsi.service launches iscsid.service on the host, and the iscsid container gets stuck in Restarting

Expected results:
iscsid.service on the host is not started, and the iscsid container starts without any error


Additional info:

We have seen this issue since we made the iSCSI session shared by the host and the container
to solve the shutdown problem of compute nodes.[1]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1655815
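
Since the problem is triggered by iscsi.service starting at boot, one way to judge whether a node is still exposed is to look at the enablement state of that unit on the host. The sketch below is an illustration only: host_iscsi_exposure is a hypothetical name, and treating "disabled" and "masked" as safe is an assumption based on the description in this bug, not a statement about how the shipped fix is implemented.

```shell
# Hypothetical helper (not from the bug report).
#   $1 - output of `systemctl is-enabled iscsi.service` on the host
# Prints "at-risk" if the unit can start at boot, "ok" if it is disabled
# or masked, and "unknown" for any other state.
host_iscsi_exposure() {
    case "$1" in
        enabled|enabled-runtime) echo "at-risk" ;;
        disabled|masked) echo "ok" ;;
        *) echo "unknown" ;;
    esac
}

# Possible invocation on a compute node:
#   host_iscsi_exposure "$(systemctl is-enabled iscsi.service)"
```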

Comment 10 Tzach Shefi 2019-08-18 11:05:05 UTC
Verified on:
openstack-tripleo-heat-templates-8.3.1-72.el7ost.noarch


Using 3par iscsi as Cinder's backend. 

1. Create a volume:
(overcloud) [stack@undercloud-0 ~]$ cinder create 1 --name 3par_iscsi_vol
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2019-08-18T10:49:43.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 1f193350-ba00-400c-8501-58d698068055 |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | 3par_iscsi_vol                       |
| os-vol-host-attr:host          | controller-0@3par#SSD_r5             |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | 67844cb7ae4a4d29ad599e53cdeec3f9     |
| replication_status             | None                                 |
| size                           | 1                                    |
| snapshot_id                    | None                                 |
| source_volid                   | None                                 |
| status                         | available                            |
| updated_at                     | 2019-08-18T10:49:44.000000           |
| user_id                        | 767dfd54ba6d49aebe01b7f4edb9725c     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+



2. Boot an instance on compute-0:
(overcloud) [stack@undercloud-0 ~]$ nova show inst1
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:hostname             | inst1                                                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| description                          | inst1                                                    |

3. Current status of the iscsid service before attaching the volume:
[root@compute-0 ~]# systemctl -a | grep iscsid
  iscsid.service                                                                                        loaded    inactive dead      Open-iSCSI

Status of the iscsid container:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 5 hours (healthy)                       iscsid



4. Attach volume to instance:
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach inst1 1f193350-ba00-400c-8501-58d698068055 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 1f193350-ba00-400c-8501-58d698068055 |
| serverId | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
| volumeId | 1f193350-ba00-400c-8501-58d698068055 |
+----------+--------------------------------------+


5. Verify that Cinder volume is attached:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name           | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| 1f193350-ba00-400c-8501-58d698068055 | in-use | 3par_iscsi_vol | 1    | tripleo     | false    | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+

6. Force a reboot of the compute node:
[root@compute-0 ~]# sudo shutdown -r now
Connection to 192.168.24.14 closed by remote host.
Connection to 192.168.24.14 closed.


7. Wait for host to reboot and check status of iscsid service and docker. 

Service should remain down:
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.14
Warning: Permanently added '192.168.24.14' (ECDSA) to the list of known hosts.
Last login: Sun Aug 18 10:50:48 2019 from 192.168.24.1
[heat-admin@compute-0 ~]$ sudo -i
[root@compute-0 ~]# systemctl -a | grep iscsid
  iscsid.service                                                                                        loaded    inactive dead      Open-iSCSI

Service remains down. 

The container should remain up and healthy:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up About a minute (healthy)                       iscsid

Docker is up. 
Wait a few minutes and recheck the container status; it should still be up, with an uptime of more than 1 minute:

[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 2 minutes (healthy)                       iscsid

Up 2 minutes, looking better.


8. Start the instance again:
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.

The iscsid container is still up (good):
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 3 minutes (healthy)                       iscsid



9. Check the volume status; it should be attached again:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name           | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| 1f193350-ba00-400c-8501-58d698068055 | in-use | 3par_iscsi_vol | 1    | tripleo     | false    | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+


All looks good after the compute host reboot: the iscsid service remains down,
the iscsid container remains up,
and the instance booted with the volume attached.

One last check of service and docker status:
[root@compute-0 ~]# systemctl -a | grep iscsid
  iscsid.service 

[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1                      "kolla_start"       2 days ago          Up 4 minutes (healthy)                       iscsid

Both remain as they should be: service down and container up.

Good to verify.
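
The two post-reboot checks used in this verification (host service inactive, container up and healthy) could be combined into one pass/fail helper. A minimal sketch, assuming the same systemctl and docker outputs shown above; verify_after_reboot is a hypothetical name.

```shell
# Hypothetical helper (not from the bug report).
#   $1 - output of `systemctl is-active iscsid.service` on the host
#   $2 - STATUS text of the iscsid container as printed by `docker ps`
# Prints PASS and returns 0 only if the host service is inactive and the
# container reports "Up ... (healthy)"; otherwise prints a FAIL line and
# returns 1.
verify_after_reboot() {
    host_state=$1
    container_status=$2
    case "$host_state" in
        inactive) ;;
        *) echo "FAIL: host iscsid.service is $host_state"; return 1 ;;
    esac
    case "$container_status" in
        Up*"(healthy)"*) echo "PASS"; return 0 ;;
        *) echo "FAIL: iscsid container status is '$container_status'"; return 1 ;;
    esac
}

# Possible invocation on the rebooted compute node:
#   verify_after_reboot "$(systemctl is-active iscsid.service)" \
#       "$(docker ps --filter name=iscsid --format '{{.Status}}')"
```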

Comment 12 errata-xmlrpc 2019-09-03 16:55:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2624