Description of problem:

The host goes non-operational for several minutes due to a "15 - session exists" error in vdsm after the host is deactivated and reactivated. In this case we do not have duplicate connections in the engine DB, but there are duplicate entries in the host ISCSI cache, which triggers the same error (15 - session exists) as seen in https://bugzilla.redhat.com/show_bug.cgi?id=2083271.

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev

How reproducible:
100%

Steps to Reproduce:
1. On the host, have one connection (for example x.x.x.x) to an iSCSI SD that has two entries:
   Create a stale file (/var/lib/iscsi/nodes/xxxxxxx/default) for node.tpgt = 1, or add it manually to the ISCSI DB.
   In our case, ALL QE hosts have two tpgt entries in the ISCSI cache for portal x.x.x.x (node.tpgt = 1, node.tpgt = 1034).
2. If additional ISCSI connections exist that are not x.x.x.x, remove them via the RHV UI.
3. Move the host to maintenance -> Activate.

Actual results:
initiator reported error (15 - session exists)
The host goes non-operational for several minutes. After ~4 minutes the host is auto-recovered.

Expected results:
We shouldn't see the "15 - session exists" error, and the host shouldn't switch to non-operational after being deactivated and reactivated.

Additional info:
This is a regression, since in bug 2083271 we got to the non-operational state of the host due to duplicate connections in the DB - now the DB has just one connection.
(In reply to sshmulev from comment #0)
> Steps to Reproduce:
> 1. On the host, have one connection (for example x.x.x.x) to an iSCSI SD
>    that has two entries:
>
>    Create a stale file (/var/lib/iscsi/nodes/xxxxxxx/default) for
>    node.tpgt = 1, or add it manually to the ISCSI DB.

This is not a valid way to reproduce. Why do you touch the iscsi DB manually?
If you touch it manually you own it, and you need to fix it yourself.

>    In our case, ALL QE hosts have two tpgt entries in the ISCSI cache for
>    portal x.x.x.x (node.tpgt = 1, node.tpgt = 1034).

It is not clear what iscsi configuration you are using.

> 2. If additional ISCSI connections exist that are not x.x.x.x, remove them
>    via the RHV UI.

It is not clear what you mean by that.

> Expected results:
> We shouldn't see the "15 - session exists" error, and the host shouldn't
> switch to non-operational after being deactivated and reactivated.

If the host has an existing session and vdsm tries to reconnect, the error is expected.
It is not clear why the host moves to non-operational.

> Additional info:
> This is a regression, since in bug 2083271 we got to the non-operational
> state of the host due to duplicate connections in the DB - now the DB has
> just one connection.

This is a regression only if you show that the same flow worked in previous versions.

The provided reproduction instructions are not helpful. We need clear instructions so a developer can follow them and reproduce the failure.
Looking at the attached logs:

1. Trying to connect to iqn.1992-08.com.netapp:vserver-rhv-qe on 10.46.16.10:3260

2022-06-15 11:30:23,559+0300 INFO (jsonrpc/6) [vdsm.api] START connectStorageServer(domType=3, spUUID='15e0fff7-39ba-4e91-be6f-161f6077b57e', conList=[{'password': '********', 'port': '3260', 'iqn': 'iqn.1992-08.com.netapp:vserver-rhv-qe', 'connection': '10.46.16.10', 'ipv6_enabled': 'false', 'id': 'cb463ed5-9d3a-45f9-92f7-31d5cfcd1684', 'user': '', 'tpgt': '1034'}]) from=::ffff:10.35.232.214,51286, flow_id=55f66251, task_id=c0798740-d934-4e64-bd09-9a88585f575e (api:48)
2022-06-15 11:30:23,560+0300 INFO (jsonrpc/6) [storage.iscsi] Adding iscsi node for target 10.46.16.10:3260,1034 iqn.1992-08.com.netapp:vserver-rhv-qe iface default (iscsi:209)
2022-06-15 11:30:23,765+0300 DEBUG (jsonrpc/6) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:605)
2022-06-15 11:30:23,765+0300 INFO (jsonrpc/6) [storage.storageServer] Log in to 1 targets using 1 workers (storageServer:614)
2022-06-15 11:30:23,768+0300 INFO (iscsi-login/0) [storage.iscsi] Logging in to iscsi target 10.46.16.10:3260,1034 iqn.1992-08.com.netapp:vserver-rhv-qe via iface default (iscsi:231)

2. iscsiadm login failed

2022-06-15 11:30:25,058+0300 ERROR (iscsi-login/0) [storage.storageServer] Could not login to target (storageServer:621)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 322, in node_login
    "-p", portal, "-l"])
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 125, in run_cmd
    out = commands.run(cmd, sudo=True)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 102, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/sbin/iscsiadm', '-m', 'node', '-T', 'iqn.1992-08.com.netapp:vserver-rhv-qe', '-I', 'default', '-p', '10.46.16.10:3260,1034', '-l'] failed with rc=15 out=b'Logging in to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260]\nLogging in to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260]\nLogin to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260] successful.\n' err=b'iscsiadm: Could not login to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260].\niscsiadm: initiator reported error (15 - session exists)\niscsiadm: Could not log into all portals\n'

3. Vdsm reports the result to the engine

2022-06-15 11:30:28,142+0300 INFO (jsonrpc/6) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'cb463ed5-9d3a-45f9-92f7-31d5cfcd1684', 'status': 465}]} from=::ffff:10.35.232.214,51286, flow_id=55f66251, task_id=c0798740-d934-4e64-bd09-9a88585f575e (api:54)

Maybe the engine moves the host to non-operational because the single connection requested failed. We can consider returning success for existing connections, since "session exists" means that we already have a connection.

We don't see removal of the existing session, so this is not related to bug 2083271.
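For what it's worth, a minimal sketch of the "return success for existing connections" idea above, assuming a plain subprocess wrapper around iscsiadm. The function name, the constant, and the error handling are illustrative only, not vdsm's actual iscsiadm.py API; the value 15 comes from the "initiator reported error (15 - session exists)" failure in the log above.

    import subprocess

    ISCSI_ERR_SESS_EXISTS = 15  # iscsiadm exit code for "session exists"

    def login_node(target, portal, iface="default"):
        # Hypothetical wrapper, not vdsm's iscsiadm.node_login().
        cmd = ["iscsiadm", "-m", "node", "-T", target,
               "-I", iface, "-p", portal, "-l"]
        p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if p.returncode == ISCSI_ERR_SESS_EXISTS:
            # The session is already logged in, so the desired state is
            # reached; treat this as success instead of failing the whole
            # connectStorageServer flow.
            return
        if p.returncode != 0:
            raise RuntimeError("iscsiadm login failed: rc=%d err=%r"
                               % (p.returncode, p.stderr))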
Removed the Regression keyword, since the bug shows no evidence of a regression. To add the Regression keyword back, please provide clear reproduction instructions and specify the previously tested version that behaved differently.
(In reply to Nir Soffer from comment #3)
> Maybe the engine moves the host to non-operational because the single
> connection requested failed. We can consider returning success for
> existing connections, since "session exists" means that we already
> have a connection.

+1

Yeah, we had this kind of philosophical question in the past - you try to remove an image and find that it doesn't exist, so did you succeed or not? :)
I think that also in that case we've said that if we've reached the desired state, the operation can treat that as a successful operation.
(In reply to Arik from comment #5)
> I think that also in that case we've said that if we've reached the desired
> state, the operation can treat that as a successful operation.

Let me rephrase this one: I think that also in the aforementioned case we've said that if we reach the desired state, we can treat that as a successful operation.
Another scenario, via RHV only, to reach this state manually:
Removing and recreating the same ISCSI connection results in an additional entry with node.tpgt = 1 (the default), while the original connection remains in the ISCSI cache.

It looks like there is no way to add a storage connection via the UI with an ISCSI tpgt != 1, which can result in a duplicate entry in the ISCSI cache for the same target IP, and from there to the seen issue if the host has only one ISCSI storage connection for the storage domain.

Additional scenario details:

1) Create a DC, cluster, one host (host1), and a storage backend connected with multiple ISCSI targets (IP1, IP2).

2) Add a first ISCSI storage domain A via the UI:
   - Discovery on IP1
   - Login to all targets (IP1, IP2)
   - Add one LUN

3) Check the host ISCSI cache on host1 via CLI (see the helper sketch after this comment for spotting such duplicates programmatically):

   host> iscsiadm -m node -o show | egrep "node.name|node.tpg|node.conn\[0\]\.address" ;iscsiadm -m session

   Output: the host ISCSI cache looks good, with node.tpgt != 1 and the expected active connection:

   node.name = X
   node.tpgt != 1
   node.conn[0].address = IP1

4) Move storage domain A to maintenance.

5) Remove the ISCSI storage connection IP1:
   Go to Storage -> Storage Domains -> storage domain A -> Connections -> Remove the IP1 connection.

6) Add the same ISCSI storage connection IP1 as seen before in this page (address, port=3260 always, target).
   * Important note: I did not see a way to set the target tpgt.

7) Check the host ISCSI cache via CLI:

   host> iscsiadm -m node -o show | egrep "node.name|node.tpg|node.conn\[0\]\.address" ;iscsiadm -m session

   Output: the host ISCSI cache does NOT look good and now has two entries for the same target IP1:

   1st entry:
   node.name = X
   node.tpgt = 1
   node.conn[0].address = IP1

   2nd entry:
   node.name = X
   node.tpgt != 1
   node.conn[0].address = IP1

* From this point the host cache is dirty, with 2 entries for the same target IP1, which causes the 'session exists' error in VDSM and the engine. BUT for the engine to think the host has no connectivity and move it to non-operational, the engine needs to think no ISCSI storage connections exist to this storage domain A.

8) Remove ISCSI storage connection IP2 so you are left with only one ISCSI storage connection.

9) Deactivate -> Activate the host.

Host1 goes non-operational for several minutes (~2-4) and then auto-recovers.
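For reference, a small helper sketch (not part of vdsm) for spotting such duplicate node DB entries programmatically. It assumes the usual one-record-per-line output of "iscsiadm -m node", i.e. "<address>:<port>,<tpgt> <target-iqn>":

    import collections
    import subprocess

    def duplicate_iscsi_nodes():
        """Return (address:port, target) pairs recorded with more than one
        tpgt in the local iscsi node DB."""
        out = subprocess.run(["iscsiadm", "-m", "node"],
                             stdout=subprocess.PIPE, check=True).stdout.decode()
        counts = collections.Counter()
        for line in out.splitlines():
            # Expected record format: "<address>:<port>,<tpgt> <target-iqn>"
            portal, target = line.split(None, 1)
            address_port, _tpgt = portal.rsplit(",", 1)
            counts[(address_port, target)] += 1
        return sorted(key for key, count in counts.items() if count > 1)

On a host in the state described in step 7, this should report the (IP1:3260, X) pair, since the same address and target appear once with node.tpgt = 1 and once with the original tpgt.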
(In reply to Avihai from comment #8)
> Another scenario, via RHV only, to reach this state manually:
> Removing and recreating the same ISCSI connection results in an additional
> entry with node.tpgt = 1 (the default), while the original connection
> remains in the ISCSI cache.
>
> It looks like there is no way to add a storage connection via the UI with
> an ISCSI tpgt != 1, which can result in a duplicate entry in the ISCSI
> cache

What is "iscsi cache"? Maybe you mean the iscsi db?
/var/lib/iscsi/... is the iscsi database - this is not a cache.

> for the same target IP, and from there to the seen issue if the host has
> only one ISCSI storage connection for the storage domain.
>
> Additional scenario details:
>
> 1) Create a DC, cluster, one host (host1), and a storage backend connected
>    with multiple ISCSI targets (IP1, IP2).
>
> 2) Add a first ISCSI storage domain A via the UI:
>    - Discovery on IP1
>    - Login to all targets (IP1, IP2)
>    - Add one LUN
>
> 3) Check the host ISCSI cache on host1 via CLI:
>
>    host> iscsiadm -m node -o show | egrep
>    "node.name|node.tpg|node.conn\[0\]\.address" ;iscsiadm -m session
>
>    Output: the host ISCSI cache looks good, with node.tpgt != 1 and the
>    expected active connection:
>
>    node.name = X
>    node.tpgt != 1
>    node.conn[0].address = IP1
>
> 4) Move storage domain A to maintenance.
>
> 5) Remove the ISCSI storage connection IP1:
>    Go to Storage -> Storage Domains -> storage domain A -> Connections ->
>    Remove the IP1 connection.
>
> 6) Add the same ISCSI storage connection IP1 as seen before in this page
>    (address, port=3260 always, target).
>    * Important note: I did not see a way to set the target tpgt.
>
> 7) Check the host ISCSI cache via CLI:
>
>    host> iscsiadm -m node -o show | egrep
>    "node.name|node.tpg|node.conn\[0\]\.address" ;iscsiadm -m session
>
>    Output: the host ISCSI cache does NOT look good and now has two entries
>    for the same target IP1:
>
>    1st entry:
>    node.name = X
>    node.tpgt = 1
>    node.conn[0].address = IP1
>
>    2nd entry:
>    node.name = X
>    node.tpgt != 1
>    node.conn[0].address = IP1

Are you sure that the host iscsi db was modified after adding a connection on the engine side while the storage domain was in maintenance?

> * From this point the host cache is dirty, with 2 entries for the same
>   target IP1, which causes the 'session exists' error in VDSM and the
>   engine. BUT for the engine to think the host has no connectivity and
>   move it to non-operational, the engine needs to think no ISCSI storage
>   connections exist to this storage domain A.
>
> 8) Remove ISCSI storage connection IP2 so you are left with only one ISCSI
>    storage connection.
>
> 9) Deactivate -> Activate the host.
>
> Host1 goes non-operational for several minutes (~2-4) and then
> auto-recovers.

OK, this looks like reproducible instructions.

The flow - removing and adding back an iscsi connection, and then removing the second connection and leaving the host with a single connection - looks like a very unlikely situation, so I'm lowering the severity to medium.
I would argue that the priority should be low/medium in that case and the severity should remain high (if it may lead to the host switching to the non-operational state), but that's less important. If it's not a regression then it's something for 4.5.2, and the approach described in comment 3 makes perfect sense to me.
(In reply to Nir Soffer from comment #9)
> What is "iscsi cache"? Maybe you mean the iscsi db?

Yes.

> Are you sure that the host iscsi db was modified after adding a connection
> on the engine side while the storage domain was in maintenance?

You are correct - the storage domain must be activated to see the changes in the host ISCSI node DB.

Corrected and shorter scenario B (thanks for that :)):

1) Create a DC, cluster, one host (host1), and a storage backend connected with multiple ISCSI targets (IP1, IP2).

2) Add a first ISCSI storage domain A via the UI:
   - Discovery on IP1
   - Login to all targets (IP1, IP2)
   - Add one LUN

3) Check the host ISCSI node DB on host1 via CLI:

   host> iscsiadm -m node -o show | egrep "node.name|node.tpg|node.conn\[0\]\.address" ;iscsiadm -m session

   Output: the host ISCSI node DB looks good, with node.tpgt != 1 and the expected active connections:

   node.name = X
   node.tpgt != 1
   node.conn[0].address = IP1

   node.name = X
   node.tpgt != 1
   node.conn[0].address = IP2

4) Move storage domain A to maintenance.

5) Remove the ISCSI storage connection IP1:
   Go to Storage -> Storage Domains -> storage domain A -> Connections -> Detach and Remove the IP1 connection.

6) Add the same ISCSI storage connection IP1 as seen before in this page (address, port=3260 always, target):
   Go to Storage -> Storage Domains -> storage domain A -> Connections -> Add the IP1 connection.
   * Important note: I did not see a way to set the target tpgt, so tpgt=1 always(!)

7) Remove ISCSI storage connection IP2 so you are left with only one ISCSI storage connection:
   Go to Storage -> Storage Domains -> storage domain A -> Connections -> Detach and Remove the IP2 connection.

8) Activate the host.

The host goes non-operational for several minutes (2-4 min) and then recovers.
Future host deactivate-activate cycles will again cause non-operational for several minutes (2-4 min) (!)

> The flow - removing and adding back an iscsi connection, and then removing
> the second connection and leaving the host with a single connection - looks
> like a very unlikely situation, so I'm lowering the severity to medium.

I disagree, as this will also occur when an admin/customer wants to change a faulty target IP or add a new ISCSI target during maintenance, which is very probable and common.
Removing a faulty ISCSI connection IP, fixing the backend, and re-adding it is a common operation, and sometimes customers will be left with one working storage connection. They will expect RHV to support that, and not to disable their hosts for minutes each time they deactivate and activate a host, which is a very severe outcome.

Moving back to high severity.
I could not reproduce the issue in my environment. This is what I tested:

1. Move the host to maintenance
2. Move the storage domain to maintenance
3. Edit the storage domain connections:
   3.1 Remove connection 192.168.122.34:3260 iqn.2003-01.org.alpine.01
   3.2 Add connection 192.168.122.34:3260 iqn.2003-01.org.alpine.01
   3.3 Remove connection 192.168.122.35:3260 iqn.2003-01.org.alpine.01
4. Activate the storage domain
5. On the host, connect manually to 192.168.122.35:3260 iqn.2003-01.org.alpine.01

# iscsiadm -m session
tcp: [10] 192.168.122.34:3260,1 iqn.2003-01.org.alpine.01 (non-flash)

6. Activate the host

I got no "session exists" errors, likely because I don't have unexpected nodes on my host.

# iscsiadm -m node -P1
Target: iqn.2003-01.org.alpine.01
        Portal: 192.168.122.34:3260,1
                Iface Name: default
        Portal: 192.168.122.35:3260,1
                Iface Name: default
Target: iqn.2003-01.org.alpine.02
        Portal: 192.168.122.34:3260,1
                Iface Name: default
        Portal: 192.168.122.35:3260,1
                Iface Name: default

# iscsiadm -m session -P1
Target: iqn.2003-01.org.alpine.01 (non-flash)
        Current Portal: 192.168.122.34:3260,1
        Persistent Portal: 192.168.122.34:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.redhat:71982f294f4
                Iface IPaddress: 192.168.122.23
                Iface HWaddress: default
                Iface Netdev: default
                SID: 13
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
Target: iqn.2003-01.org.alpine.02 (non-flash)
        Current Portal: 192.168.122.34:3260,1
        Persistent Portal: 192.168.122.34:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.redhat:71982f294f4
                Iface IPaddress: 192.168.122.23
                Iface HWaddress: default
                Iface Netdev: default
                SID: 14
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
        Current Portal: 192.168.122.35:3260,1
        Persistent Portal: 192.168.122.35:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.redhat:71982f294f4
                Iface IPaddress: 192.168.122.23
                Iface HWaddress: default
                Iface Netdev: default
                SID: 15
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE

Avihai, can you explain how you add the extra node with the unexpected tpgt on your system?
I could reproduce the error with manual iscsiadm commands.

Host initial state:

# iscsiadm -m node -P1
Target: iqn.2003-01.org.alpine.01
        Portal: 192.168.122.34:3260,1
                Iface Name: default
Target: iqn.2003-01.org.alpine.02
        Portal: 192.168.122.34:3260,1
                Iface Name: default
        Portal: 192.168.122.35:3260,1
                Iface Name: default

# iscsiadm -m session
iscsiadm: No active sessions.

Adding an unexpected portal:

# iscsiadm -m node -T iqn.2003-01.org.alpine.01 -I default -p 192.168.122.34:3260,2 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 192.168.122.34,3260,2 iqn.2003-01.org.alpine.01] added

# iscsiadm -m node -P1
Target: iqn.2003-01.org.alpine.01
        Portal: 192.168.122.34:3260,1
                Iface Name: default
        Portal: 192.168.122.34:3260,2
                Iface Name: default
Target: iqn.2003-01.org.alpine.02
        Portal: 192.168.122.34:3260,1
                Iface Name: default
        Portal: 192.168.122.35:3260,1
                Iface Name: default

Trying to log in:

# iscsiadm -m node -T iqn.2003-01.org.alpine.01 -p 192.168.122.34:3260,2 -l
Logging in to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260]
Logging in to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260]
Login to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260] successful.
iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260].
iscsiadm: initiator reported error (15 - session exists)
iscsiadm: Could not log into all portals

# iscsiadm -m session
tcp: [20] 192.168.122.34:3260,1 iqn.2003-01.org.alpine.01 (non-flash)

So the issue is an invalid configuration - 2 iscsi nodes with the same address, port, iqn, and iface. If this configuration is created by the engine, please open an engine bug.

On the vdsm side, while this configuration is invalid, we do have one connection on the right interface, so keeping it and warning about the invalid configuration can help users.
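Extending the sketch after comment 3, a possible shape for "keep the existing connection and warn": treat rc=15 as success only when an active session to the requested target really exists, and log a warning about the duplicate node records. This is only an illustration, not vdsm's actual code, and it assumes the "iscsiadm -m session" record format shown above ("tcp: [SID] <address>:<port>,<tpgt> <target-iqn> (non-flash)"):

    import logging
    import subprocess

    ISCSI_ERR_SESS_EXISTS = 15  # iscsiadm exit code for "session exists"
    log = logging.getLogger(__name__)

    def has_session(target, address_port):
        """Return True if an active session to target on address:port exists."""
        p = subprocess.run(["iscsiadm", "-m", "session"],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        for line in p.stdout.decode().splitlines():
            # Record format: "tcp: [SID] <address>:<port>,<tpgt> <target-iqn> (non-flash)"
            fields = line.split()
            if (len(fields) >= 4 and fields[3] == target
                    and fields[2].rsplit(",", 1)[0] == address_port):
                return True
        return False

    def login(target, portal, iface="default"):
        cmd = ["iscsiadm", "-m", "node", "-T", target,
               "-I", iface, "-p", portal, "-l"]
        p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if p.returncode == 0:
            return
        if (p.returncode == ISCSI_ERR_SESS_EXISTS
                and has_session(target, portal.rsplit(",", 1)[0])):
            # Desired state reached; the duplicate node record is the only
            # problem, so warn instead of failing connectStorageServer.
            log.warning("Session to %s %s already exists; check for "
                        "duplicate node db entries", target, portal)
            return
        raise RuntimeError("iscsiadm login failed: rc=%d err=%r"
                           % (p.returncode, p.stderr))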
Avihai, can you test the upstream build in the QE environment? https://github.com/oVirt/vdsm/actions/runs/2530243581
This is a user error. We never asked anyone to connect to the same target up front. The solution is simple - do not connect to anything prior to installation/adding the SD.
Fixing it to ignore existing sessions is a good failsafe, though.
I opened a new bug about the NonOperational status with a different scenario as you can see in: https://bugzilla.redhat.com/show_bug.cgi?id=2099640
(In reply to Kobi Hakimi from comment #17)
> I opened a new bug about the NonOperational status with a different scenario
> as you can see in:
> https://bugzilla.redhat.com/show_bug.cgi?id=2099640

Kobi, can you test it with https://github.com/oVirt/vdsm/actions/runs/2530243581?
If it's handled by the fix for this bug, then we need to understand whether that duplication is a new thing, or what its implications are, in order to see if it is worth looking into.
Arik, after I got the explanation from Nir on how to do this update, I did the following steps to try the new vdsm build:

1. Open the engine and reproduce the issue on the first host by moving the host to Maintenance and then Activate.
2. The host becomes NonOperational, and after ~5 minutes the host is restored and becomes UP.
3. SSH to the same host and check that we indeed have the extra tpgt=1 portal:
   iscsiadm -m node -P1
4. Download the new vdsm build rpm-el8 (https://github.com/oVirt/vdsm/actions/runs/2530243581).
5. Extract the zip file (unzip rpm-el8.zip -d rpm-el8) and cd to rpm-el8.
6. Run the update: dnf upgrade *.rpm

After this, moving the first host to Maintenance -> Activate leaves the host UP and running!!
Looks great!!

Notes:
- Tested on HE and non-HE environments.
- The extra tpgt=1 portal is still there.
(In reply to Michal Skrivanek from comment #15)
> This is a user error. We never asked anyone to connect to the same target up
> front. The solution is simple - do not connect to anything prior to
> installation/adding the SD.
> Fixing it to ignore existing sessions is a good failsafe, though.

We found two valid scenarios so far to reach this state which are not user errors but RHV issues leading to non-operational -> moving back to high severity.
Even if the user does not connect to anything prior to installation/adding the SD, they will hit the same issue.

Valid scenarios:
Scenario 1: Adding a new storage connection via webadmin - described in bug 2099997
Scenario 2: Adding a new storage domain via the ansible ovirt_storage_domain module - described in bug 2099640
Verified.

Versions:
vdsm-4.50.2-1.el8ev.x86_64
ovirt-engine-4.5.1.3-654.af4ac851f145.33.el8ev.noarch

The "session exists" error still shows up due to the extra tpgt=1 portal, but this time the host doesn't go to non-operational mode.
This bugzilla is included in oVirt 4.5.2 release, published on August 10th 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.