Description of problem:
In RHOSP13, iscsid runs inside a container and the iscsid on the host is disabled. However, when a node goes through a stale (unclean) shutdown, iscsi.service is started when that node boots, and it launches iscsid.service on the host. This leaves the iscsid container stuck in "Restarting" with the following error.

~~~
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Validating config file
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying service configuration files
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Deleting /etc/iscsi/iscsid.conf
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/iscsid.conf to /etc/iscsi/iscsid.conf
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Deleting /etc/iscsi/initiatorname.iscsi
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/initiatorname.iscsi to /etc/iscsi/initiatorname.iscsi
Jun 14 17:11:12 compute-1 journal: INFO:__main__:Writing out command to execute
Jun 14 17:11:12 compute-1 journal: ++ cat /run_command
Jun 14 17:11:12 compute-1 journal: Running command: '/usr/sbin/iscsid -f'
Jun 14 17:11:12 compute-1 journal: + CMD='/usr/sbin/iscsid -f'
Jun 14 17:11:12 compute-1 journal: + ARGS=
Jun 14 17:11:12 compute-1 journal: + [[ ! -n '' ]]
Jun 14 17:11:12 compute-1 journal: + . kolla_extend_start
Jun 14 17:11:12 compute-1 journal: ++ [[ ! -f /etc/iscsi/initiatorname.iscsi ]]
Jun 14 17:11:12 compute-1 journal: + echo 'Running command: '\''/usr/sbin/iscsid -f'\'''
Jun 14 17:11:12 compute-1 journal: + exec /usr/sbin/iscsid -f
Jun 14 17:11:12 compute-1 journal: iscsid: Can not bind IPC socket
~~~

Version-Release number of selected component (if applicable):
z5

How reproducible:
Always

Steps to Reproduce:
1. Create an instance with an iSCSI Cinder volume attached
2. Force reboot the node where the instance is running

Actual results:
iscsi.service launches iscsid.service on the host, and the iscsid container gets stuck in Restarting

Expected results:
iscsid.service on the host is not started, and the iscsid container starts without any error

Additional info:
We have seen this issue since we made the iSCSI session shared by the host and the container to solve the shutdown problem of compute nodes. [1]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1655815
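
For triage, the failure condition described above (host iscsid.service active while the iscsid container keeps restarting) can be sketched as a small shell check. This is only an illustrative sketch, not part of any fix: the function names `diagnose_iscsid` and `check_node` are hypothetical, and the container name `iscsid` is an assumption about the deployment. It relies on `systemctl is-active` and on `docker inspect --format '{{.State.Status}}'`, which reports states such as `running` or `restarting`.

```shell
#!/bin/sh
# Hypothetical helper: classify the node from two state strings.
#   $1 = output of `systemctl is-active iscsid.service` on the host
#   $2 = container state as reported by `docker inspect` (.State.Status)
diagnose_iscsid() {
  host_state=$1
  container_state=$2
  if [ "$host_state" = "active" ] && [ "$container_state" = "restarting" ]; then
    # Host daemon is up and the container cannot start: the reported problem.
    echo "conflict"
  elif [ "$host_state" != "active" ] && [ "$container_state" = "running" ]; then
    # Expected state: only the containerized iscsid is running.
    echo "ok"
  else
    echo "unknown"
  fi
}

# Hypothetical wrapper that gathers the live values on a compute node.
# Assumes the container is named "iscsid"; run as root on the node.
check_node() {
  host_state=$(systemctl is-active iscsid.service 2>/dev/null)
  container_state=$(docker inspect --format '{{.State.Status}}' iscsid 2>/dev/null)
  diagnose_iscsid "$host_state" "$container_state"
}
```

Running `check_node` on an affected compute node would be expected to print `conflict`; the classification is kept in a separate function so it can be exercised without docker or systemd present.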
Verified on:
openstack-tripleo-heat-templates-8.3.1-72.el7ost.noarch

Using 3par iSCSI as Cinder's backend.

1. Create a volume:
(overcloud) [stack@undercloud-0 ~]$ cinder create 1 --name 3par_iscsi_vol
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2019-08-18T10:49:43.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 1f193350-ba00-400c-8501-58d698068055 |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | 3par_iscsi_vol                       |
| os-vol-host-attr:host          | controller-0@3par#SSD_r5             |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | 67844cb7ae4a4d29ad599e53cdeec3f9     |
| replication_status             | None                                 |
| size                           | 1                                    |
| snapshot_id                    | None                                 |
| source_volid                   | None                                 |
| status                         | available                            |
| updated_at                     | 2019-08-18T10:49:44.000000           |
| user_id                        | 767dfd54ba6d49aebe01b7f4edb9725c     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+

2. Booted an instance on compute-0:
(overcloud) [stack@undercloud-0 ~]$ nova show inst1
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:hostname             | inst1                                                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| description                          | inst1                                                    |

3. Current status of iscsid before attaching the volume:
[root@compute-0 ~]# systemctl -a | grep iscsid
iscsid.service    loaded    inactive dead    Open-iSCSI

Status of the iscsid container:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0  192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1  "kolla_start"  2 days ago  Up 5 hours (healthy)  iscsid

4. Attach the volume to the instance:
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach inst1 1f193350-ba00-400c-8501-58d698068055 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 1f193350-ba00-400c-8501-58d698068055 |
| serverId | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
| volumeId | 1f193350-ba00-400c-8501-58d698068055 |
+----------+--------------------------------------+

5. Verify that the Cinder volume is attached:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name           | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| 1f193350-ba00-400c-8501-58d698068055 | in-use | 3par_iscsi_vol | 1    | tripleo     | false    | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+

6. Force a reboot of the compute node:
[root@compute-0 ~]# sudo shutdown -r now
Connection to 192.168.24.14 closed by remote host.
Connection to 192.168.24.14 closed.

7. Wait for the host to reboot, then check the status of the iscsid service and the container. The service should remain down:
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.14
Warning: Permanently added '192.168.24.14' (ECDSA) to the list of known hosts.
Last login: Sun Aug 18 10:50:48 2019 from 192.168.24.1
[heat-admin@compute-0 ~]$ sudo -i
[root@compute-0 ~]# systemctl -a | grep iscsid
iscsid.service    loaded    inactive dead    Open-iSCSI

The service remains down. The container should remain up and healthy:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0  192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1  "kolla_start"  2 days ago  Up About a minute (healthy)  iscsid

The container is up. Wait a few minutes and recheck; the container should still be up, with an uptime of more than 1 minute:
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0  192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1  "kolla_start"  2 days ago  Up 2 minutes (healthy)  iscsid

Up 2 minutes, looking better.

8. Restart the instance:
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.

The iscsid container is still up (good):
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0  192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1  "kolla_start"  2 days ago  Up 3 minutes (healthy)  iscsid

9. Check the volume status; it should be reattached:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name           | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+
| 1f193350-ba00-400c-8501-58d698068055 | in-use | 3par_iscsi_vol | 1    | tripleo     | false    | b3b27bbe-87f4-44ad-a846-a1f9363ef0cb |
+--------------------------------------+--------+----------------+------+-------------+----------+--------------------------------------+

All looks good after the compute host reboot: the iscsid service remains down, the iscsid container remains up, and the instance booted with the volume attached.

One last check of the service and container status:
[root@compute-0 ~]# systemctl -a | grep iscsid
iscsid.service
[root@compute-0 ~]# docker ps | grep iscsi
bdaeea3efdc0  192.168.24.1:8787/rhosp13/openstack-iscsid:2019-08-13.1  "kolla_start"  2 days ago  Up 4 minutes (healthy)  iscsid

Both remain as they should be: service down and container up. Good to verify.
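
The manual post-reboot checks above (host service inactive, container up and healthy) could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used for verification: the function names are hypothetical, the `systemctl -a` line is assumed to have the usual UNIT LOAD ACTIVE SUB DESCRIPTION column order with no leading marker, and the container status is assumed to be the STATUS text from `docker ps` (for example "Up 2 minutes (healthy)").

```shell
#!/bin/sh
# Hypothetical helper: succeed if a `docker ps` STATUS value shows a
# running container that passed its health check.
container_is_healthy() {
  case "$1" in
    Up*"(healthy)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed if a `systemctl -a` line reports the unit
# as inactive. ACTIVE is assumed to be the third whitespace-separated field.
host_unit_is_inactive() {
  active=$(printf '%s\n' "$1" | awk '{print $3}')
  [ "$active" = "inactive" ]
}

# Combine both checks: $1 = systemctl line, $2 = docker STATUS text.
verify_after_reboot() {
  if host_unit_is_inactive "$1" && container_is_healthy "$2"; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}
```

For example, `verify_after_reboot "$(systemctl -a | grep iscsid.service)" "$(docker ps --filter name=iscsid --format '{{.Status}}')"` would feed live values into the check; the filter value `iscsid` is again an assumption about the container name.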
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2624