Description of problem:

blockdev fails to see the device of a logged-in iSCSI target for a volume attached to an instance, after the instance is live migrated to a different node, and nova-rootwrap fails (see traceback below):

- iscsiadm -m session shows the target as logged in.
- It is even possible to list the device files with ls:

# ll /dev/disk/by-path/ip-192.0.2.9:3260-iscsi-iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d-lun-0
lrwxrwxrwx. 1 root root 9 Jul 17 13:30 /dev/disk/by-path/ip-192.0.2.9:3260-iscsi-iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d-lun-0 -> ../../sdb
# ll /dev/sdb
brw-rw----. 1 qemu qemu 8, 16 Jul 17 13:30 /dev/sdb

- The instance even appears to start correctly, and nova list reports ACTIVE status for the VM.
- Default cinder configuration values were used (iscsi_helper=tgtadm).
- NFS was used as shared storage for instances.

Version-Release number of selected component (if applicable):

$ rpm -qa | grep openstack
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-ceilometer-common-2015.1.0-6.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-6.el7ost.noarch
openstack-neutron-ml2-2015.1.0-11.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-neutron-2015.1.0-11.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-nova-api-2015.1.0-14.el7ost.noarch
openstack-keystone-2015.1.0-4.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
python-django-openstack-auth-1.2.0-3.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-nova-compute-2015.1.0-14.el7ost.noarch
openstack-ceilometer-central-2015.1.0-6.el7ost.noarch
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-nova-cert-2015.1.0-14.el7ost.noarch
openstack-nova-scheduler-2015.1.0-14.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-neutron-lbaas-2015.1.0-5.el7ost.noarch
openstack-selinux-0.6.35-3.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-common-2015.1.0-14.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-6.el7ost.noarch
openstack-ceilometer-compute-2015.1.0-6.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-nova-conductor-2015.1.0-14.el7ost.noarch
openstack-cinder-2015.1.0-3.el7ost.noarch
openstack-neutron-metering-agent-2015.1.0-11.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-puppet-modules-2015.1.8-3.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-neutron-common-2015.1.0-11.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-14.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-11.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-6.el7ost.noarch
openstack-ceilometer-api-2015.1.0-6.el7ost.noarch
openstack-nova-console-2015.1.0-14.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Attach a volume to an instance which uses NFS as shared storage.
2. Live migrate the instance to a different node.

Actual results:
nova-rootwrap fails - blockdev reports that there is no such device, even though the iSCSI target is logged in and ls can list the device files.
Expected results:

Additional info:

Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /dev/disk/by-path/ip-192.0.2.9:3260-iscsi-iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d-lun-0
Exit code: 1
Stdout: u''
Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-192.0.2.9:3260-iscsi-iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d-lun-0: No such device or address\n'

2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 224, in run_periodic_tasks
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     task(self, context)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6247, in update_available_resource
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 376, in update_available_resource
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     resources = self.driver.get_available_resource(self.nodename)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5006, in get_available_resource
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     disk_over_committed = self._get_disk_over_committed_size_total()
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6192, in _get_disk_over_committed_size_total
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     self._get_instance_disk_info(dom.name(), xml))
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6145, in _get_instance_disk_info
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     dk_size = lvm.get_volume_size(path)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/lvm.py", line 172, in get_volume_size
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     run_as_root=True)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 55, in execute
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     return utils.execute(*args, **kwargs)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 213, in execute
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     return processutils.execute(*cmd, **kwargs)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 233, in execute
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task     cmd=sanitized_cmd)
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task ProcessExecutionError: Unexpected error while running command.
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /dev/disk/by-path/ip-192.0.2.9:3260-iscsi-iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d-lun-0
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task Exit code: 1
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task Stdout: u''
2015-07-17 13:42:28.077 1416 TRACE nova.openstack.common.periodic_task Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-192.0.2.9:3260-iscsi-iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d-lun-0: No such device or address\n'
I am changing the component to OSP director. The main reason is that I cannot reproduce this bug on an environment created by packstack.

More details: it seems blockdev cannot see the device because authentication to the iSCSI target fails after live migration. Using the targetcli tool on the controller, I can see the ACL created for the particular target before migrating the instance; once the instance with the attached volume is migrated, there is no ACL for the iSCSI target. I have no idea which component is responsible for setting up the authentication properly, but I noticed that the iSCSI initiator name of all compute nodes is the same. I'm not sure whether this is intentional or has any impact, but on a packstack setup the iSCSI initiator names of the compute nodes are different, and the initiator name should be unique anyway. That's why I am moving this to osp-d for triage. My setup is based on virt-env.

Before migration:

o- iscsi .......................................................... [Targets: 4]
  o- iqn.2010-10.org.openstack:volume-11218701-7f0b-4431-ade7-101c7cf20c6e .... [TPGs: 1]
  | o- tpg1 ......................................... [no-gen-acls, auth per-acl]
  |   o- acls ......................................................... [ACLs: 1]
  |   | o- iqn.1994-05.com.redhat:4a52e5aa22c ......... [1-way auth, Mapped LUNs: 1]
  |   |   o- mapped_lun0 [lun0 block/iqn.2010-10.org.openstack:volume-11218701-7f0b-4431-ade7-101c7cf20c6e (rw)]
  |   o- luns ......................................................... [LUNs: 1]
  |   | o- lun0 [block/iqn.2010-10.org.openstack:volume-11218701-7f0b-4431-ade7-101c7cf20c6e (/dev/cinder-volumes/volume-11218701-7f0b-4431-ade7-101c7cf20c6e)]
  |   o- portals ................................................... [Portals: 1]
  |     o- 0.0.0.0:3260 .................................................... [OK]
  o- iqn.2010-10.org.openstack:volume-1a6cc490-1c74-41ea-95b4-2b4a106f534d .... [TPGs: 1]

After migration:

o- iscsi .......................................................... [Targets: 4]
  o- iqn.2010-10.org.openstack:volume-11218701-7f0b-4431-ade7-101c7cf20c6e .... [TPGs: 1]
  | o- tpg1 ......................................... [no-gen-acls, auth per-acl]
  |   o- acls ......................................................... [ACLs: 0]
  |   o- luns ......................................................... [LUNs: 1]
  |   | o- lun0 [block/iqn.2010-10.org.openstack:volume-11218701-7f0b-4431-ade7-101c7cf20c6e (/dev/cinder-volumes/volume-11218701-7f0b-4431-ade7-101c7cf20c6e)]
  |   o- portals ................................................... [Portals: 1]
  |     o- 0.0.0.0:3260 .................................................... [OK]

The iSCSI initiator name on both compute nodes is: iqn.1994-05.com.redhat:4a52e5aa22c
To summarize, it seems the issue here is that all compute nodes have the same iSCSI initiator name. Basil/Jarda, any opinions on how critical this is?
related/DUP? bz1288423 - iscsi initiatorname is identical for all overcloud nodes
What is an impact for the end-user here?
(In reply to Jaromir Coufal from comment #8)
> What is an impact for the end-user here?

It's been a while since I played with this, but as far as I remember (and as described in my comments above): after migration the proper iSCSI ACL is not created, and rootwrap reports a failure, which causes any subsequent live migration of the instance to fail. Most likely the volume was also not accessible in the instance after the first live migration.
*** Bug 1288423 has been marked as a duplicate of this bug. ***
Cloned this against director 8
This is what the iscsi-initiator-utils rpm does in %post:

%post
/sbin/ldconfig
%systemd_post iscsi.service iscsi-shutdown.service iscsid.service iscsid.socket
if [ $1 -eq 1 ]; then
    if [ ! -f %{_sysconfdir}/iscsi/initiatorname.iscsi ]; then
        echo "InitiatorName=`/usr/sbin/iscsi-iname`" > %{_sysconfdir}/iscsi/initiatorname.iscsi
    fi
    # enable socket activation and persistant session startup by default
    /bin/systemctl enable iscsi.service >/dev/null 2>&1 || :
    /bin/systemctl enable iscsid.socket >/dev/null 2>&1 || :
fi

So the name gets set the same for all the nodes, since they are deployed from the same image and the name is generated at rpm install time. I think we just need to add the above logic to our puppet manifests so that it gets regenerated when puppet is run.
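A minimal sketch of what that regeneration step could look like as a standalone script (this is an illustration, not the actual puppet fix; the fallback branch and the overridable CONF path are assumptions added so the sketch can run outside a real node, where the real file is /etc/iscsi/initiatorname.iscsi):

```shell
#!/bin/sh
# Regenerate a unique iSCSI initiator name on a node cloned from a common
# image. CONF defaults to a demo path for illustration; on a real node it
# would be /etc/iscsi/initiatorname.iscsi.
CONF="${CONF:-./initiatorname.iscsi.demo}"

if command -v /usr/sbin/iscsi-iname >/dev/null 2>&1; then
    # iscsi-iname prints a name like iqn.1994-05.com.redhat:<random suffix>
    NAME=$(/usr/sbin/iscsi-iname)
else
    # Fallback for illustration only: build an iqn-style name from random bytes.
    SUFFIX=$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n')
    NAME="iqn.1994-05.com.redhat:${SUFFIX}"
fi

echo "InitiatorName=${NAME}" > "${CONF}"
echo "wrote ${CONF}"
```

In puppet this would most naturally become an exec guarded by a marker file so the name is generated once per node rather than on every run.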
I wrote this quick patch. https://review.openstack.org/#/c/275890/
The patch looks good to me. I'm trying to reproduce the bug; once I do, I'll try out your patch. I'd mainly like to see what happens to VMs on an existing cloud if we change the initiator name during their life cycle.
Not having a shared storage setup, I reproduced this using a volume-backed VM: live migration failed because the host being migrated to could not connect to the iSCSI target. I then changed the initiator name on both compute nodes (while a new VM was running); the VM continued to run and live migration started working. So the suggested patch should fix the bug as reported: new deployments won't exhibit the problem, and the live migration attempted above would have worked, AIUI.

Following this I live migrated the VM back to the host where it was started, and the initiator name reported by targetcli is the original one from before the change. From the looks of it, the change in initiator name only took effect on the compute node that hadn't yet been used.

Eric, I'm still digging into this (currently redeploying with 3 compute nodes so I can try a more complex example). Do you know if anything needs to be run for a change in initiator name to take effect on compute nodes where the original initiator name had already been used?
(In reply to Derek Higgins from comment #19) I believe Nova/os-brick will read the new initiator name at attach time, but it's possible that you need to restart the iscsi service on the compute node to ensure that it reloads the config and matches what's being used by Nova. I'm having trouble finding documentation about what the expected behavior is here.
The patch I attached works for new deployments; live migration works as expected. For existing deployments it becomes a little more complicated: the following needs to happen on each compute node before live migration is attempted.

# Set the InitiatorName (if /etc/iscsi/.initiator_reset doesn't exist)
/bin/echo InitiatorName=$(/usr/sbin/iscsi-iname) > /etc/iscsi/initiatorname.iscsi
# Make sure the new InitiatorName is picked up
systemctl restart iscsid
systemctl restart openstack-nova-compute

Only after doing this have I been able to live migrate volume-backed VMs (created both before and after the InitiatorName was changed) onto compute nodes that had previously been used. Assuming any upgrade will involve live migration, depending on how it is orchestrated, the three lines above may become part of the patch to overcloud_compute.pp or part of an upgrade script.
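For an existing deployment, the per-node steps above could be driven over ssh roughly like this (a sketch only: the node list and the heat-admin user are assumptions taken from the environment in this thread, and DRY_RUN=1, the default, only prints the commands instead of running them):

```shell
#!/bin/sh
# Apply the initiator-name reset steps from the comment above to each
# compute node over ssh. DRY_RUN=1 (the default) only echoes the commands.
NODES="${NODES:-192.0.2.11 192.0.2.9}"   # example compute node IPs
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

for node in ${NODES}; do
    # Regenerate the initiator name, then restart the services that cache it.
    run ssh "heat-admin@${node}" "sudo sh -c 'echo InitiatorName=\$(/usr/sbin/iscsi-iname) > /etc/iscsi/initiatorname.iscsi'"
    run ssh "heat-admin@${node}" sudo systemctl restart iscsid
    run ssh "heat-admin@${node}" sudo systemctl restart openstack-nova-compute
done
```

Running with DRY_RUN=0 would execute the commands for real; VMs on the node should keep running, since (as observed above) the new name is only picked up at the next attach.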
[stack@instack ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| fe8c1d14-d63b-442e-91b8-a6c68f1214ce | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.7  |
| 44344478-0866-40f4-9a27-a9fa84343119 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
| 69335882-c373-4956-9823-47d95cd1ed4b | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 3c691e1a-6a7e-48f1-bb28-fc2b7e953d15 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.12 |
| abd350f8-54e2-420f-9e3e-d7f4081ed51c | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
| 58483250-538b-4bdb-861b-f7cc98f7d08d | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+

[stack@instack ~]$ for i in `nova list|grep ctlplane|cut -d"=" -f2 |cut -d' ' -f1`; do echo $i; ssh heat-admin@$i cat /etc/iscsi/initiatorname.iscsi; done
192.0.2.7
InitiatorName=iqn.1994-05.com.redhat:9d4e9e8d8fe
192.0.2.11
InitiatorName=iqn.1994-05.com.redhat:8950acdea36
192.0.2.9
InitiatorName=iqn.1994-05.com.redhat:7c3107a5d62
192.0.2.12
InitiatorName=iqn.1994-05.com.redhat:9d4e9e8d8fe
192.0.2.10
InitiatorName=iqn.1994-05.com.redhat:9d4e9e8d8fe
192.0.2.8
InitiatorName=iqn.1994-05.com.redhat:9d4e9e8d8fe

[stack@instack ~]$ rpm -qa |grep tripleo
openstack-tripleo-image-elements-0.9.6-10.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-6.git49b57eb.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-119.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-5.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch

Looks like the compute nodes have different initiator names, while all the other hosts still share the same initiator name. Setting the BZ to verified; will report the non-randomness on the other nodes in a separate BZ.
This will need reverification, as there was an addition to the initial fix: on overcloud update, the iscsid and openstack-nova-compute services need to be restarted after the initiator name has been set, so that dependency has been added to the compute puppet manifest.
@Dan - it's intended that only the compute nodes have their iSCSI initiator IDs changed, which is why four out of your six nodes have the original, identical ID while the computes are random. This is not a bug.
(In reply to Rhys Oxenham from comment #25)
> @Dan - it's intended that only the compute nodes have their iSCSI initiator
> ID's changed, hence why four out of your six nodes have the original and
> identical ID and the computes are random. This is not a bug.

I understand that; this is why I opened the other bug at low priority. In case additional disks are to be added via iSCSI on those other non-compute nodes (cinder, glance, ceph, and especially swift), we don't want to run into problems when we can have an easy fix now. The severity for now is low, but this is still not the way iSCSI initiators should be autoconfigured anywhere.
(In reply to Dan Yasny from comment #27)
> I understand that, this is why I opened the other bug on low prio. In case
> additional disks are to be added via iscsi on those other non-compute nodes
> (cinder, glance, ceph, swift especially) we don't want to run into problems
> if we can have an easy fix now. The severity for now is low, but that's
> still not the way iscsi initiators should be autoconfigured anywhere.

Got it, then I was the one that confused your intentions - apologies.
(In reply to James Slagle from comment #24)

@James, can you please elaborate on how exactly the additional verification steps should look? If I deploy 7.1, upgrade to 7.3 (last night's puddle), and then check the initiator names, will that suffice?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0264.html