Bug 1259518 - vdsm fails to log into all targets [NEEDINFO]
vdsm fails to log into all targets
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
3.5.3
All Linux
high Severity high
: ovirt-3.6.3
: 3.6.0
Assigned To: Nir Soffer
Virtualization Bugs
storage
:
Depends On:
Blocks:
 
Reported: 2015-09-02 16:51 EDT by Allan Voss
Modified: 2016-03-10 01:25 EST (History)
12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-23 16:30:04 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
nsoffer: needinfo? (avoss)


Attachments
rhev-m-db (1.85 KB, text/plain)
2015-09-03 15:10 EDT, Douglas Duckworth

Description Allan Voss 2015-09-02 16:51:58 EDT
Description of problem:
On boot, vdsm logs in to only 8 of the 12 necessary targets

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 6.6 (20150603.0.el6ev)
rhevm-3.5.3.1-1.4.el6ev.noarch


How reproducible:
Unknown

Steps to Reproduce:
1. Add host to environment
2. activate host

Actual results:
Host logs into 8 of the 12 necessary targets

Expected results:
Host logs into all necessary targets

Additional info:
Comment 2 Fabian Deutsch 2015-09-03 05:05:44 EDT
Nir, can you tell by the logs why the login fails?
Comment 3 Nir Soffer 2015-09-03 06:38:36 EDT
(In reply to Fabian Deutsch from comment #2)
> Nir, can you tell by the logs why the login fails?

There are no vdsm logs in this bug.

I don't see any evidence that the host failed to log in; according to the
information in this bug, only 8 targets were discovered and the host
was connected to all of them.

Manual discovery added 4 additional targets, and the host could log in to
all of them.

Note that adding nodes in this way is not compatible with vdsm; you should
delete these nodes and manage the connections via engine. Otherwise, the host
will log in to all nodes automatically on startup, while it should log in to
them only when engine asks vdsm to connect to the storage.

It is possible that the 4 additional targets were not available when the user
configured the storage. If a target is not stored in engine database, vdsm
will not connect to it.

To continue with this I would:

1. Remove the nodes added manually on this host (or all hosts that were
configured in this way):

    iscsiadm -m node -o delete

2. Edit the storage domain
3. Perform discovery
4. Login to all targets that are part of the storage domain

After that, all hosts should connect to all targets needed by the storage
domain.

If no LUN is used on a target, vdsm will not connect to it.
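Put differently, the connection rule is driven by the engine database: the set of targets a host logs in to is the union of the targets backing its active storage domains. An illustrative sketch of that selection rule (the domain and target names are made up; this is a model, not vdsm code):

```python
# Illustrative model of the selection rule: a host connects only to
# targets that back an active storage domain. Names are hypothetical.
def targets_to_connect(domains, active):
    """Union of targets for all active storage domains."""
    return set().union(*(domains[d] for d in active)) if active else set()

domains = {
    "domain-a": {"iqn.example:t1", "iqn.example:t2"},
    "domain-b": {"iqn.example:t2", "iqn.example:t3"},
    "domain-c": {"iqn.example:t4"},   # inactive: its target is skipped
}
active = ["domain-a", "domain-b"]
print(sorted(targets_to_connect(domains, active)))
# ['iqn.example:t1', 'iqn.example:t2', 'iqn.example:t3']
```

A target that appears in no active domain (like `iqn.example:t4` above) is simply never connected, which matches the behavior described in this comment.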
Comment 4 Douglas Duckworth 2015-09-03 15:10:59 EDT
Created attachment 1070057 [details]
rhev-m-db
Comment 5 Douglas Duckworth 2015-09-03 15:11:51 EDT
On Host tulhv1p03 I did the following.

1. Put Host in Maintenance Mode

2. Logged out of sessions I added manually then deleted nodes which I added manually using:

iscsiadm -m node -u
iscsiadm -m node -o delete

3. Activated Host with the following nodes / sessions present after Host reached Up status:


10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c6861
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685f
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685c
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685a
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c6860
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685d
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685b
10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685e

[root@tulhv1p03 ~]# iscsiadm -m node | wc -l
8

[root@tulhv1p03 ~]# iscsiadm -m session
tcp: [13] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685e (non-flash)
tcp: [14] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685b (non-flash)
tcp: [15] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685d (non-flash)
tcp: [16] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c6860 (non-flash)
tcp: [17] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685a (non-flash)
tcp: [18] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685c (non-flash)
tcp: [19] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685f (non-flash)
tcp: [20] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c6861 (non-flash)

[root@tulhv1p03 ~]# iscsiadm -m session | wc -l
8

Note at this point these targets are missing from Host tulhv1p03:

iqn.2002-03.com.compellent:5000d310005c687a
iqn.2002-03.com.compellent:5000d310005c6879
iqn.2002-03.com.compellent:5000d310005c6878
iqn.2002-03.com.compellent:5000d310005c6877

They are defined in the RHEV-M database in "storage_server_connections" under "iqn:"
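The gap can be cross-checked programmatically by comparing the IQNs in the `iscsiadm -m session` output against the targets recorded in the engine database. A minimal Python sketch, using the session lines and the four missing target IQNs copied from this comment (the parsing assumes the standard session line format shown above):

```python
import re

# Output of "iscsiadm -m session" on the host (copied from this comment).
session_output = """\
tcp: [13] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685e (non-flash)
tcp: [14] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685b (non-flash)
tcp: [15] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685d (non-flash)
tcp: [16] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c6860 (non-flash)
tcp: [17] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685a (non-flash)
tcp: [18] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685c (non-flash)
tcp: [19] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c685f (non-flash)
tcp: [20] 10.10.192.110:3260,0 iqn.2002-03.com.compellent:5000d310005c6861 (non-flash)
"""

# Targets present in the engine's storage_server_connections table
# but absent from the host.
db_targets_not_on_host = {
    "iqn.2002-03.com.compellent:5000d310005c687a",
    "iqn.2002-03.com.compellent:5000d310005c6879",
    "iqn.2002-03.com.compellent:5000d310005c6878",
    "iqn.2002-03.com.compellent:5000d310005c6877",
}

def session_iqns(output):
    """Extract the target IQN from each iscsiadm session line."""
    return {m.group(1) for m in re.finditer(r"(iqn\.\S+)", output)}

logged_in = session_iqns(session_output)
missing = db_targets_not_on_host - logged_in
print(len(logged_in))   # 8 sessions, matching "iscsiadm -m session | wc -l"
print(sorted(missing))  # the four targets the host never logged in to
```
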

I see that after adding these targets manually on the Host, using "iscsiadm -m discovery -t st -o new -p 10.10.192.110", two of the four targets do have storage attached, as seen through "iscsiadm -m session -P3".

Targets without attached storage:

iqn.2002-03.com.compellent:5000d310005c687a
iqn.2002-03.com.compellent:5000d310005c6879

Targets with attached storage:

Target: iqn.2002-03.com.compellent:5000d310005c6877 (non-flash) 
Current Portal: 10.10.192.98:3260,0
Persistent Portal: 10.10.192.110:3260,0
Iface Initiatorname: iqn.1994-05.com.redhat:tulhv1p03

                ************************
                Attached SCSI devices:
                ************************
                Host Number: 32 State: running
                scsi32 Channel 00 Id 0 Lun: 103
                        Attached scsi disk sdkl         State: running
                scsi32 Channel 00 Id 0 Lun: 105
                        Attached scsi disk sdko         State: running
                scsi32 Channel 00 Id 0 Lun: 108
                        Attached scsi disk sdku         State: running
                scsi32 Channel 00 Id 0 Lun: 110
                        Attached scsi disk sdky         State: running
                scsi32 Channel 00 Id 0 Lun: 111
                        Attached scsi disk sdkz         State: running
                scsi32 Channel 00 Id 0 Lun: 114
                        Attached scsi disk sdkf         State: running
                scsi32 Channel 00 Id 0 Lun: 116
                        Attached scsi disk sdkh         State: running
                scsi32 Channel 00 Id 0 Lun: 117
                        Attached scsi disk sdkj         State: running
                scsi32 Channel 00 Id 0 Lun: 120
                        Attached scsi disk sdkn         State: running
                scsi32 Channel 00 Id 0 Lun: 121
                        Attached scsi disk sdkq         State: running
                scsi32 Channel 00 Id 0 Lun: 122
                        Attached scsi disk sdks         State: running
                scsi32 Channel 00 Id 0 Lun: 124
                        Attached scsi disk sdkw         State: running


Target: iqn.2002-03.com.compellent:5000d310005c6878 (non-flash)
        Current Portal: 10.10.192.99:3260,0
        Persistent Portal: 10.10.192.110:3260,0
Iface Initiatorname: iqn.1994-05.com.redhat:tulhv1p03

                ************************
                Attached SCSI devices:
                ************************
                Host Number: 31 State: running
                scsi31 Channel 00 Id 0 Lun: 103
                        Attached scsi disk sdkk         State: running
                scsi31 Channel 00 Id 0 Lun: 105
                        Attached scsi disk sdkp         State: running
                scsi31 Channel 00 Id 0 Lun: 108
                        Attached scsi disk sdkv         State: running
                scsi31 Channel 00 Id 0 Lun: 110
                        Attached scsi disk sdla         State: running
                scsi31 Channel 00 Id 0 Lun: 111
                        Attached scsi disk sdlb         State: running
                scsi31 Channel 00 Id 0 Lun: 114
                        Attached scsi disk sdke         State: running
                scsi31 Channel 00 Id 0 Lun: 116
                        Attached scsi disk sdkg         State: running
                scsi31 Channel 00 Id 0 Lun: 117
                        Attached scsi disk sdki         State: running
                scsi31 Channel 00 Id 0 Lun: 120
                        Attached scsi disk sdkm         State: running
                scsi31 Channel 00 Id 0 Lun: 121
                        Attached scsi disk sdkr         State: running
                scsi31 Channel 00 Id 0 Lun: 122
                        Attached scsi disk sdkt         State: running
                scsi31 Channel 00 Id 0 Lun: 124
                        Attached scsi disk sdkx         State: running

Our Cluster has 13 Storage Domains.  I would like to complete steps 2-4, as suggested in the above comment, but simply editing one Storage Domain, performing discovery, then logging in takes a very long time.  Since these targets exist in the RHEV-M database, and two out of four do have LUNs, shouldn't vdsm log in to those with storage when the Host is activated?


On the Host tulhv1p03 in /var/log/vdsm/vdsm.log I do not see vdsm doing that:


[root@tulhv1p03 ~]# date
Thu Sep  3 19:05:36 UTC 2015

2015-09-03 18:31:05,533::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c685e -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:06,692::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c685b -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:07,281::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c685d -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:07,818::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c6860 -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:08,535::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c685a -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:09,108::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c685c -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:09,830::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c685f -I default -p 10.10.192.110:3260,0 -l (cwd None)

2015-09-03 18:31:10,542::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2002-03.com.compellent:5000d310005c6861 -I default -p 10.10.192.110:3260,0 -l (cwd None)
Comment 6 Nir Soffer 2015-09-04 07:44:07 EDT
(In reply to Douglas Duckworth from comment #5)

This is expected behavior if these LUNs are not part of any storage
domain. A host will connect only to targets needed by the active storage
domains.

There is one possible problem: maybe a target was not available when you
created the storage domain (e.g. a temporary auth failure). To check for this
issue, use "multipath -ll" to list all the devices and their paths, and
check that all multipath devices have the expected number of paths.

If all multipath devices have the expected number of paths, there is no problem, and the system is behaving correctly.
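A small sketch of how such a check could be automated: the snippet below counts path lines per multipath device by matching the `H:C:T:L` path entries in `multipath -ll`-style output. The sample text here is hypothetical (the real output for these Compellent LUNs is not in the bug), so treat it as an illustration of the format, not actual data:

```python
import re

# Hypothetical excerpt of "multipath -ll" output; real map names,
# WWIDs, and device names will differ.
sample = """\
mpatha (36000d31000example0000000000000001) dm-2 COMPELNT,Compellent Vol
size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 31:0:0:103 sdkk 66:64  active ready running
  `- 32:0:0:103 sdkl 66:80  active ready running
mpathb (36000d31000example0000000000000002) dm-3 COMPELNT,Compellent Vol
size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 31:0:0:105 sdkp 66:144 active ready running
"""

def paths_per_device(output):
    """Map each multipath device name to its number of path lines."""
    counts = {}
    current = None
    for line in output.splitlines():
        # A device header starts at column 0 with the map name.
        m = re.match(r"^(\w\S*)\s+\(", line)
        if m:
            current = m.group(1)
            counts[current] = 0
        # Path lines contain an H:C:T:L tuple like 31:0:0:103.
        elif current and re.search(r"\b\d+:\d+:\d+:\d+\b", line):
            counts[current] += 1
    return counts

counts = paths_per_device(sample)
# Devices with fewer paths than expected (2 per LUN here) need attention.
suspect = {dev for dev, n in counts.items() if n < 2}
print(counts)   # {'mpatha': 2, 'mpathb': 1}
print(suspect)  # {'mpathb'}
```

In this environment, with two targets per storage controller serving each LUN, a device reporting a single path would suggest a target that was skipped at connection time.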
Comment 7 Fabian Deutsch 2015-09-17 09:04:58 EDT
Moving this to vdsm for now, as - if it is an issue - then it's rather in the vdsm domain, despite that it looks like notabug for now.
Comment 8 Allon Mureinik 2015-09-17 09:28:52 EDT
(In reply to Fabian Deutsch from comment #7)
> Moving this to vdsm for now, as - if it is an issue - then it's rather in
> the vdsm domain, despite that it looks like notabug for now.

Agreed.
Assigning to Nir in the meanwhile, as he was already involved in this BZ, but all signs do indeed point to NOTABUG.
Comment 9 Douglas Duckworth 2015-09-23 16:19:40 EDT
We confirmed that vdsm does in fact add the needed targets to a Host when a Guest VM with Direct LUNs is moved to that Host.  Upon activating the Host, it will only have targets for Storage Domains, since Storage Domains must be present across all members of the Cluster, while Direct LUNs are only mapped to Hosts that contain Guest VMs using those Direct LUNs.  So there is no bug.  Thank you for tolerating my novice understanding of RHEV.
Comment 10 Nir Soffer 2015-09-23 16:30:04 EDT
(In reply to Douglas Duckworth from comment #9)
Thanks for updating, Douglas. Closing based on comment 9.
