| Summary: | iscsi targets loses acl data after hard reboot | ||
|---|---|---|---|
| Product: | [Community] RDO | Reporter: | Jack Waterworth <jwaterwo> |
| Component: | openstack-cinder | Assignee: | Eric Harney <eharney> |
| Status: | CLOSED EOL | QA Contact: | nlevinki <nlevinki> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | Liberty | CC: | eharney, srevivo |
| Target Milestone: | --- | ||
| Target Release: | Kilo | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-19 15:38:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Additionally, I had another issue after the hard reboot that I am unsure is related. I was doing some testing for this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1261083#c13 After the reboot, LVM picked up the data from the underlying path and prevented the target service from starting. I had to manually blow away the LVM data and restart the target service.
This appears to happen after a hard reboot. I had the issue occur twice today. I lost the ACLs I had previously restored above, and then I lost the LUN assignment within LIO.
this is a brand new instance that was started up a few hours ago, but as you can see, no luns are assigned to the target.
[root@bulldozer ~(keystone_admin)]# targetcli ls /iscsi/iqn.2010-10.org.openstack:volume-7f7bc29c-dd0e-4419-90fb-eda506b25341
o- iqn.2010-10.org.openstack:volume-7f7bc29c-dd0e-4419-90fb-eda506b25341 ................................................. [TPGs: 1]
o- tpg1 .............................................................................................. [no-gen-acls, auth per-acl]
o- acls .............................................................................................................. [ACLs: 1]
| o- iqn.1994-05.com.redhat:d32a1588856 ........................................................... [1-way auth, Mapped LUNs: 0]
o- luns .............................................................................................................. [LUNs: 0]
o- portals ........................................................................................................ [Portals: 1]
o- 192.168.1.200:3260 ................................................................................................... [OK]
[root@bulldozer ~(keystone_admin)]# cinder show 7f7bc29c-dd0e-4419-90fb-eda506b25341
+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attachments | [{u'server_id': u'ceb63750-83ab-4902-9794-44763a70ca6a', u'attachment_id': u'a33a5847-5626-4b36-a770-01956c831adc', u'host_name': None, u'volume_id': u'7f7bc29c-dd0e-4419-90fb-eda506b25341', u'device': u'/dev/vdc', u'id': u'7f7bc29c-dd0e-4419-90fb-eda506b25341'}] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-03-06T21:41:46.000000 |
| description | Large Storage Disk |
| encrypted | False |
| id | 7f7bc29c-dd0e-4419-90fb-eda506b25341 |
| metadata | {u'readonly': u'False', u'attached_mode': u'rw'} |
| migration_status | None |
| multiattach | False |
| name | torrents-storage-1 |
| os-vol-host-attr:host | bulldozer@linear#linear |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | ce8f683ab7fa4cbba42a3e9bb084e6fe |
| os-volume-replication:driver_data | None |
| os-volume-replication:extended_status | None |
| replication_status | disabled |
| size | 1024 |
| snapshot_id | None |
| source_volid | None |
| status | error |
| user_id | d76c0705baca4e2eb3a24541f9f1fa39 |
| volume_type | linear |
+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
We hit this issue in the lab today on an RHOSP 7 environment installed with packstack. tpg1 lost its LUN assignment:
[root@gss-ose-4 ~(keystone_admin)]# targetcli ls /iscsi/iqn.2010-10.org.openstack:volume-3c7f8e55-e5cb-452f-a2aa-07e4ca32083b
o- iqn.2010-10.org.openstack:volume-3c7f8e55-e5cb-452f-a2aa-07e4ca32083b ................................................. [TPGs: 1]
o- tpg1 .............................................................................................. [no-gen-acls, auth per-acl]
o- acls .............................................................................................................. [ACLs: 1]
| o- iqn.1994-05.com.redhat:15d18a66115e .......................................................... [1-way auth, Mapped LUNs: 0]
o- luns .............................................................................................................. [LUNs: 0]
o- portals ........................................................................................................ [Portals: 1]
o- 0.0.0.0:3260 ......................................................................................................... [OK]
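A quick way to spot targets in this state is to filter the `targetcli ls /iscsi` output for `luns` nodes that report `[LUNs: 0]`. The following is only a minimal sketch: the function name `zero_lun_targets` is hypothetical, and it assumes the output layout shown in this report (a target line ending in `[TPGs: N]`, followed later by its `o- luns` line).

```shell
#!/bin/sh
# Hypothetical helper, not part of targetcli or cinder.
# Reads `targetcli ls /iscsi` output on stdin and prints the IQN of every
# target whose luns node reports "[LUNs: 0]".
zero_lun_targets() {
    awk '
        # Target lines end in "[TPGs: N]"; the IQN is the field after "o-".
        /\[TPGs: [0-9]+\]/ {
            for (i = 1; i < NF; i++) {
                if ($i == "o-") { target = $(i + 1); break }
            }
        }
        # The luns node carries the LUN count. ACL lines say "Mapped LUNs"
        # and do not start with "o- luns", so they are not matched here.
        /o- luns / && /\[LUNs: 0\]/ {
            if (target != "") print target
        }
    '
}
```

On a host it would be used as `targetcli ls /iscsi | zero_lun_targets`. An empty result means every target still has at least one LUN assigned.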
I think this *may* be related to a bug where RTSLib does not flush changes to disk, causing them to be lost in the event of a hard reboot. Here is an upstream commit that was submitted to fix that:
https://github.com/jmagrini/rtslib-fb/commit/b625f61a03d2127239480b45fed80028f82f8a50
Is this related?
to follow up on #5, the box was NOT hard rebooted, but a graceful reboot did occur and the issue was present when the box came back up.
oh, it looks like we lost all of our backstores for the issue in #5 too:
[root@gss-ose-4 ~(keystone_admin)]# targetcli ls /backstores
o- backstores ................................................................................................................ [...]
o- block .................................................................................................... [Storage Objects: 0]
o- fileio ................................................................................................... [Storage Objects: 0]
o- pscsi .................................................................................................... [Storage Objects: 0]
o- ramdisk .................................................................................................. [Storage Objects: 0]
This bug is against a Version which has reached End of Life. If it's still present in supported release (http://releases.openstack.org), please update Version and reopen.
Description of problem:
After a power outage, my iscsi targets appear to have lost their ACLs. This causes instance boot to fail as the iscsi session is not established at boot time.

Version-Release number of selected component (if applicable):
python-cinder-7.0.0-1.el7.noarch
openstack-cinder-7.0.0-1.el7.noarch
kernel-3.10.0-229.14.1.el7.x86_64

How reproducible:
happened once

Steps to Reproduce:
1. power off systems without graceful shutdown
2. power on system
3. attempt to boot instances

Actual results:
nova compute returns "iscsiadm: No session found.", and running the command manually gives the same error:
# sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b -p 192.168.1.200:3260 --rescan
iscsiadm: No session found.
Attempting to login to the iscsi target:
iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)

Expected results:
session should exist at boot time, login should be allowed, and LIO should not lose its acls.

Additional info:
checking targetcli on the host shows that every volume is missing its ACL. I don't have the exact output, but it looked something like this:
[root@bulldozer ~]# targetcli ls /iscsi/iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b
o- iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b ................................................. [TPGs: 1]
o- tpg1 ................................................................................................... [no-gen-acls, no-auth]
o- acls .............................................................................................................. [ACLs: 0]
o- luns .............................................................................................................. [LUNs: 1]
| o- lun0 [block/iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b (/dev/cinder-volumes/volume-7d644152-08d1-4e82-b7f5-e597f860ca4b)]
o- portals ........................................................................................................ [Portals: 1]
o- 192.168.1.200:3260 ................................................................................................... [OK]
I was able to fix this issue by simply adding the iqn of my compute node to the acl in targetcli. this was an all-in-one environment running on centos.
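The manual ACL fix described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands that were run: `restore_acl_cmds` is a hypothetical helper that only prints the `targetcli` invocations (a dry run), and it assumes the target uses `tpg1` as in the output above. The initiator IQN of the compute node is normally found in `/etc/iscsi/initiatorname.iscsi`.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints, but does not execute, the targetcli
# commands that would re-create an ACL for one initiator on one target.
#   $1 = target IQN (for example the volume IQN shown by `targetcli ls /iscsi`)
#   $2 = initiator IQN of the compute node
restore_acl_cmds() {
    target_iqn=$1
    initiator_iqn=$2
    # Re-create the ACL entry under the target's tpg1 (assumed TPG name).
    printf 'targetcli /iscsi/%s/tpg1/acls create %s\n' "$target_iqn" "$initiator_iqn"
    # Persist the running configuration so it is restored on the next boot.
    printf 'targetcli saveconfig\n'
}
```

For example, `restore_acl_cmds iqn.2010-10.org.openstack:volume-7d644152-08d1-4e82-b7f5-e597f860ca4b "$(sed -n 's/^InitiatorName=//p' /etc/iscsi/initiatorname.iscsi)"` prints the two commands for review before running them as root. Note that the targets in the later comments show `auth per-acl` and `1-way auth`, so a re-created ACL may also need its CHAP credentials set; this sketch does not attempt that.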