.Having more than one path from an initiator to an iSCSI gateway is not supported
In the iSCSI gateway, `tcmu-runner` might return the same inquiry and Asymmetric Logical Unit Access (ALUA) information for all iSCSI sessions to a target port group. This can cause the initiator or multipath layer to use incorrect port information when referencing its internal structures for paths and devices, which can result in failures, failed failover and failback, or incorrect multipath and SCSI log or tool output. Therefore, having more than one iSCSI session from an initiator to an iSCSI gateway is not supported.
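As an illustration of the constraint above, here is a minimal sketch that flags any initiator holding more than one session to the same target port group. It assumes the session list has already been collected by some other means as (initiator IQN, TPG tag) pairs. The function name `find_duplicate_sessions`, the IQN, and the sample data are hypothetical and only mirror the "8 sessions for 4 TPGs" situation in this report; this is not part of `tcmu-runner` or `gwcli`.

```python
from collections import Counter


def find_duplicate_sessions(sessions):
    """Return {(initiator, tpg): count} for every pair with more than one session."""
    counts = Counter(sessions)
    return {pair: n for pair, n in counts.items() if n > 1}


# Hypothetical data: one initiator, 4 TPGs, 2 sessions to each (8 in total).
initiator = "iqn.1991-05.com.microsoft:win-initiator"  # made-up IQN
sessions = [(initiator, tpg) for tpg in (1, 2, 3, 4) for _ in range(2)]

duplicates = find_duplicate_sessions(sessions)
for (iqn, tpg), n in sorted(duplicates.items()):
    print(f"{iqn} has {n} sessions to TPG {tpg}; only one is supported")
```

An empty result means every initiator has at most one session per target port group, which is the supported configuration.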
Description of problem:
1. Started IO from Windows on a fresh config (logged in and LUNs mapped).
2. Started IO from a Windows VM on ESX for additional load.
3. After a while of IOs, also a windows initiator reboot, I am seeing 8 sessions for 4 TPGs. Not sure from where 4 extra sessions got created.
Seeing exactly 2 of each path.
4. IOs continued, started OSD addition to the Ceph cluster.
5. A lot of pg remapping happening, and IOs stopped.
6. Windows initiator crashed
Minidump and MEMORY.DMP file present.
Version-Release number of selected component (if applicable):
ceph version 12.2.1-14.el7cp
3.10.0-714.el7.test.x86_64 (kernel on ceph cluster)
@Tejas: What does "pg remapping happening" mean? Are OSDs crashing or are you manually doing things in the background to slow down the OSDs' responsiveness?
I added a new OSD node with 8 OSDs, so the object redistribution is happening to the new OSDs. I did not manually do anything except add the OSDs.
> 3. After a while of IOs, also a windows initiator reboot, I am seeing 8 sessions for 4 TPGs. Not sure from where 4 extra sessions got created.
> Seeing exactly 2 of each path.
Where do you see the extra sessions? On the target side, the initiator side, or both? If on the target side, is it in gwcli or the configfs interface?
For the extra sessions, you have 4 extra sessions defined in the "Favorite Targets", so whenever you reboot or restart the iSCSI service you will get the extra sessions. Did you by any chance set up iSCSI targets and forget you had already set up some Favorite Targets?
We do not support multiple sessions to the same target port group from the same initiator, because tcmu-runner returns incorrect inquiry data. This will cause Windows failover/failback issues, but I am not sure if it would cause a crash. It could cause the wrong paths to be referenced, and it looks like during the test IO timed out and failovers were attempted.
Do you want me to fix up the Favorites? We should fix that then rerun the test.
Okay, let me try the same run tomorrow with just 4 sessions defined, and we can confirm if the crash was due to that.
Ok. Just FYI, I looked at the dmp and it looks like the multiple sessions and bad inquiry data might be the cause for the crash. Here is the trace from the dmp. Of course we do not have the source, but going by the function names, it seems like it might have been trying to update the alua tpg info and so we probably hit the bug I mentioned:
For the command timeout issue that started this, I think we might have to increase the command timers on the initiators.
@Mike: I thought the 25 second initiator timeout was chosen based upon ESX hard-coded limitations? Are you just suggesting increasing the timeout for Linux/Windows initiators?
OK -- so it sounds like we can close this as NOTABUG if it only occurs when Windows connects to the same target portal multiple times.
We can keep this open till MCS is implemented and then verify it.
Closing since it was a config issue that we have documented.