Description of problem:
After a power outage, my iSCSI targets appear to have lost their ACLs. This causes instance boot to fail because the iSCSI session is not established at boot time.

Version-Release number of selected component (if applicable):
python-cinder-7.0.0-1.el7.noarch
openstack-cinder-7.0.0-1.el7.noarch
kernel-3.10.0-229.14.1.el7.x86_64

How reproducible:
Happened once

Steps to Reproduce:
1. Power off the systems without a graceful shutdown
2. Power on the system
3. Attempt to boot instances

Actual results:
nova compute returns "iscsiadm: No session found.", and running the command manually gives the same error:

# sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b -p 192.168.1.200:3260 --rescan
iscsiadm: No session found.

Attempting to log in to the iSCSI target fails with:

iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)

Expected results:
The session should exist at boot time, login should be allowed, and LIO should not lose its ACLs.

Additional info:
Checking targetcli on the host shows that every volume is missing its ACL. I don't have the exact output, but it looked something like this:

[root@bulldozer ~]# targetcli ls /iscsi/iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b
o- iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b ................................................. [TPGs: 1]
  o- tpg1 ................................................................................................... [no-gen-acls, no-auth]
    o- acls .............................................................................................................. [ACLs: 0]
    o- luns .............................................................................................................. [LUNs: 1]
    | o- lun0 [block/iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b (/dev/cinder-volumes/volume-7d644152-08d1-4e82-b7f5-e597f860ca4b)]
    o- portals ........................................................................................................ [Portals: 1]
      o- 192.168.1.200:3260 ................................................................................................... [OK]

I was able to fix this issue by simply adding the IQN of my compute node to the ACL in targetcli. This was an all-in-one environment running on CentOS.
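For anyone needing to repeat the manual workaround on many volumes, here is a minimal sketch of how the targetcli call could be assembled. It assumes the initiator name lives in /etc/iscsi/initiatorname.iscsi as an `InitiatorName=` line and that the target uses `tpg1`, as in the output above. The helper names `parse_initiator_name` and `acl_create_command` are made up for illustration. It only mirrors the manual step and does not set any CHAP credentials, so it may not be enough on targets that use per-ACL auth.

```python
import shlex


def parse_initiator_name(text):
    """Extract the IQN from the contents of an initiatorname.iscsi file."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("InitiatorName="):
            return line.split("=", 1)[1].strip()
    raise ValueError("no InitiatorName= line found")


def acl_create_command(target_iqn, initiator_iqn, tpg="tpg1"):
    """Build the targetcli argv that adds initiator_iqn to the target's ACLs."""
    return ["targetcli", f"/iscsi/{target_iqn}/{tpg}/acls", "create", initiator_iqn]


if __name__ == "__main__":
    # Target and initiator IQNs are copied from this report for illustration.
    target = "iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b"
    initiator = parse_initiator_name(
        "InitiatorName=iqn.1994-05.com.redhat:d32a1588856\n"
    )
    # Print the command for review instead of running it (dry run).
    print(shlex.join(acl_create_command(target, initiator)))
```

The sketch prints the command rather than executing it, so the result can be reviewed before anything touches the live LIO configuration.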
Additionally, I had another issue after the hard reboot that I am unsure is related. I was doing some testing for this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1261083#c13
After the reboot, LVM picked up the data from the underlying path and prevented the target service from starting. I had to manually blow away the LVM data and restart the target service.
This appears to happen after a hard reboot; I had the issue occur twice today. I lost the ACLs I had previously restored above, and then I lost the LUN assignment within LIO. This is a brand new instance that was started a few hours ago, but as you can see, no LUNs are assigned to the target.

[root@bulldozer ~(keystone_admin)]# targetcli ls /iscsi/iqn.2010-10.org.openstack:volume-7f7bc29c-dd0e-4419-90fb-eda506b25341
o- iqn.2010-10.org.openstack:volume-7f7bc29c-dd0e-4419-90fb-eda506b25341 ................................................. [TPGs: 1]
  o- tpg1 .............................................................................................. [no-gen-acls, auth per-acl]
    o- acls .............................................................................................................. [ACLs: 1]
    | o- iqn.1994-05.com.redhat:d32a1588856 ........................................................... [1-way auth, Mapped LUNs: 0]
    o- luns .............................................................................................................. [LUNs: 0]
    o- portals ........................................................................................................ [Portals: 1]
      o- 192.168.1.200:3260 ................................................................................................... [OK]

[root@bulldozer ~(keystone_admin)]# cinder show 7f7bc29c-dd0e-4419-90fb-eda506b25341
+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property                              | Value |
+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attachments                           | [{u'server_id': u'ceb63750-83ab-4902-9794-44763a70ca6a', u'attachment_id': u'a33a5847-5626-4b36-a770-01956c831adc', u'host_name': None, u'volume_id': u'7f7bc29c-dd0e-4419-90fb-eda506b25341', u'device': u'/dev/vdc', u'id': u'7f7bc29c-dd0e-4419-90fb-eda506b25341'}] |
| availability_zone                     | nova |
| bootable                              | false |
| consistencygroup_id                   | None |
| created_at                            | 2016-03-06T21:41:46.000000 |
| description                           | Large Storage Disk |
| encrypted                             | False |
| id                                    | 7f7bc29c-dd0e-4419-90fb-eda506b25341 |
| metadata                              | {u'readonly': u'False', u'attached_mode': u'rw'} |
| migration_status                      | None |
| multiattach                           | False |
| name                                  | torrents-storage-1 |
| os-vol-host-attr:host                 | bulldozer@linear#linear |
| os-vol-mig-status-attr:migstat        | None |
| os-vol-mig-status-attr:name_id        | None |
| os-vol-tenant-attr:tenant_id          | ce8f683ab7fa4cbba42a3e9bb084e6fe |
| os-volume-replication:driver_data     | None |
| os-volume-replication:extended_status | None |
| replication_status                    | disabled |
| size                                  | 1024 |
| snapshot_id                           | None |
| source_volid                          | None |
| status                                | error |
| user_id                               | d76c0705baca4e2eb3a24541f9f1fa39 |
| volume_type                           | linear |
+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
We hit this issue in the lab today on a RHOSP 7 environment installed with packstack. tpg1 lost its LUN assignment:

[root@gss-ose-4 ~(keystone_admin)]# targetcli ls /iscsi/iqn.2010-10.org.openstack:volume-3c7f8e55-e5cb-452f-a2aa-07e4ca32083b
o- iqn.2010-10.org.openstack:volume-3c7f8e55-e5cb-452f-a2aa-07e4ca32083b ................................................. [TPGs: 1]
  o- tpg1 .............................................................................................. [no-gen-acls, auth per-acl]
    o- acls .............................................................................................................. [ACLs: 1]
    | o- iqn.1994-05.com.redhat:15d18a66115e .......................................................... [1-way auth, Mapped LUNs: 0]
    o- luns .............................................................................................................. [LUNs: 0]
    o- portals ........................................................................................................ [Portals: 1]
      o- 0.0.0.0:3260 ......................................................................................................... [OK]

I think this *may* be related to a bug where RTSLib does not flush changes to disk, causing them to be lost in the event of a hard reboot. Here is an upstream commit that was submitted to fix that:
https://github.com/jmagrini/rtslib-fb/commit/b625f61a03d2127239480b45fed80028f82f8a50

Is this related?
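To make the "not flushed to disk" theory concrete, here is a rough sketch of the general pattern a config writer can use so that a saved file survives a sudden power loss: write to a temporary file, fsync it, atomically rename it over the old file, then fsync the directory. This is not taken from RTSLib or from the linked commit, which I have not checked line by line. The function name `durable_write_json` and the JSON layout in the example are assumptions for illustration only.

```python
import json
import os
import tempfile


def durable_write_json(path, data):
    """Write data as JSON so that a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory entry so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    # Hypothetical file name and contents, written into a scratch directory.
    scratch = tempfile.mkdtemp()
    durable_write_json(os.path.join(scratch, "saveconfig.json"), {"targets": []})
```

Without the fsync calls, a write can sit in the page cache for some time, which is how a hard power-off could leave an empty or stale saved configuration behind.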
To follow up on #5: the box was NOT hard rebooted. A graceful reboot did occur, and the issue was present when the box came back up.
Oh, it looks like we lost all of our backstores for the issue in #5 too:

[root@gss-ose-4 ~(keystone_admin)]# targetcli ls /backstores
o- backstores ................................................................................................................ [...]
  o- block .................................................................................................... [Storage Objects: 0]
  o- fileio ................................................................................................... [Storage Objects: 0]
  o- pscsi .................................................................................................... [Storage Objects: 0]
  o- ramdisk .................................................................................................. [Storage Objects: 0]
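Since the symptom shows up as zero counts in the `targetcli ls` tree (ACLs, LUNs, or storage objects), a small check like the following could flag it after a reboot. This is only a sketch: it parses text that looks like the output pasted in this bug, and the regular expression and the name `empty_nodes` are my own. It does not call targetcli itself.

```python
import re

# Matches tree rows such as "o- block ..... [Storage Objects: 0]".
ROW = re.compile(r"o- (\S+) \.+ \[(ACLs|LUNs|Storage Objects): (\d+)\]")


def empty_nodes(ls_output):
    """Return (node, kind) pairs whose reported count is zero."""
    return [
        (name, kind)
        for name, kind, count in ROW.findall(ls_output)
        if int(count) == 0
    ]


# Abbreviated copy of the backstores output above (dot runs shortened).
SAMPLE = """\
o- backstores .......... [...]
  o- block ............. [Storage Objects: 0]
  o- fileio ............ [Storage Objects: 0]
  o- pscsi ............. [Storage Objects: 0]
  o- ramdisk ........... [Storage Objects: 0]
"""

if __name__ == "__main__":
    for name, kind in empty_nodes(SAMPLE):
        print(f"{name}: no {kind}")
```

Feeding it the saved output of `targetcli ls /iscsi/<target>` works the same way, since the `acls` and `luns` rows use the same bracketed count format.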
This bug is against a Version which has reached End of Life. If it's still present in a supported release (http://releases.openstack.org), please update Version and reopen.