Bug 1312805 - Adding an iSCSI storage crashes node and sets all VM states to Unknown
Status: CLOSED NOTABUG
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.2.6
Hardware: x86_64 Linux
Priority: medium  Severity: high
Target Milestone: ovirt-3.6.6
Target Release: ---
Assigned To: Tal Nisan
Aharon Canan
Depends On:
Blocks:
 
Reported: 2016-02-29 04:47 EST by nicolas
Modified: 2016-03-08 05:41 EST
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-04 11:39:00 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ylavi: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
Engine log at crash time (138.64 KB, text/plain)
2016-02-29 04:47 EST, nicolas
no flags
VDSM log at crash time (13.14 MB, text/plain)
2016-02-29 04:48 EST, nicolas
no flags
FWIW, I found these entries in the /var/log/messages file; it seems that iSCSI is crashing badly (50.52 KB, text/plain)
2016-02-29 18:00 EST, nicolas
no flags
multipath -v10 output (14.37 KB, text/plain)
2016-02-29 18:49 EST, nicolas
no flags

Description nicolas 2016-02-29 04:47:42 EST
Created attachment 1131472 [details]
Engine log at crash time

Description of problem:

We're currently running oVirt with Gluster as the main storage backend. We're planning to migrate the storage to iSCSI, so we've been trying to add an iSCSI backend from the webadmin, which results in the chosen node (by default the SPM) crashing and all of its VMs being set to status Unknown.

VDSM version is 4.17.18.

Version-Release number of selected component (if applicable):

oVirt-engine: v. 3.6.2.6-1
VDSM: 4.17.18-0

How reproducible:

Always

Steps to Reproduce:
1. Click on the storage tab
2. Click on the New domain button
3. Change storage type to iSCSI
4. As the discovery address we set the IP (10.X.X.80 in the logs); the port is left at the default.
5. Click on Discover: the target appears correctly (IQN: iqn.2003-10.com.lefthandnetworks:p4000-kvm:67:ovirt-rojo)
6. Click on the arrow button ([->])
7. The chosen VDSM node hangs; on the Hosts tab we see it 'Connecting', then timing out and being set as non-responsive, which leads to the VMs being set to Unknown and needing to be reset manually via the DB. (See the sketch below for the equivalent discovery/login calls on the host.)
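
As an illustration (hypothetical, not something we ran), the discovery and login from steps 4-6 boil down to iscsiadm calls on the host; a minimal Python sketch, using the portal and IQN quoted above:

#!/usr/bin/env python
# Minimal sketch (illustrative only): the same discovery/login as steps 4-6,
# driven directly on the host with iscsiadm instead of the webadmin.
import subprocess

PORTAL = "10.X.X.80:3260"  # discovery address from step 4, default port
IQN = "iqn.2003-10.com.lefthandnetworks:p4000-kvm:67:ovirt-rojo"  # step 5

def run(cmd):
    print("$ " + " ".join(cmd))
    return subprocess.check_output(cmd).decode()

# SendTargets discovery against the portal (step 5)
print(run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL]))

# Log in to the discovered target (step 6); this is the point at which the
# node became unresponsive in our case.
print(run(["iscsiadm", "-m", "node", "-T", IQN, "-p", PORTAL, "--login"]))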

Actual results:

To minimize impact, we migrated all VMs off one host and used that host to add the storage, so that no VMs would be affected when it crashed (the result was the same, though).


Additional info:

I'm attaching 2 logs: engine.log and vdsm.log from the crashing node.

Legend:

* kvmr0X.domain.com: the nodes
* IPs 10.X.X.11 to 10.X.X.15 are the service IPs of hosts 1 to 5, respectively.
* IPs 10.X.X.60 and up are dedicated to storage (10.X.X.80 is the iSCSI target VIP).
* 192.168.100.X are migration IPs.
Comment 1 nicolas 2016-02-29 04:48 EST
Created attachment 1131473 [details]
VDSM log at crash time
Comment 2 Allon Mureinik 2016-02-29 07:13:41 EST
Tal, as the QE contact, can you take a look please?
Comment 3 nicolas 2016-02-29 18:00 EST
Created attachment 1131747 [details]
FWIW, I found these entries in the /var/log/messages file; it seems that iSCSI is crashing badly
Comment 4 nicolas 2016-02-29 18:49 EST
Created attachment 1131751 [details]
multipath -v10 output

I'm attaching the output of the 'multipath -v10' command. The line I see as relevant is:

36000eb3a4f1acbc20000000000000043: set ACT_CREATE (map does not exist)

I still can't figure out what is causing it.
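
As an illustration (assuming 'multipath -ll' is given the WWID as the map name), a small sketch to confirm on the host that no multipath map exists yet for that WWID:

#!/usr/bin/env python
# Illustrative check: is there already a device-mapper multipath map for the
# WWID that multipath -v10 flagged? No output matches the ACT_CREATE
# "map does not exist" message above.
import subprocess

WWID = "36000eb3a4f1acbc20000000000000043"

proc = subprocess.run(["multipath", "-ll", WWID], capture_output=True, text=True)
print(proc.stdout if proc.stdout.strip() else "no multipath map for %s yet" % WWID)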
Comment 5 nicolas 2016-03-04 11:39:00 EST
I finally found the culprit: one of the hops between the hosts and the storage servers had an MTU of 1500 where 9000 was expected. After fixing that, it works now.
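
In case it helps anyone hitting the same symptom, a small illustrative sketch for checking the path MTU towards the portal (the IP is the masked one from this report):

#!/usr/bin/env python
# Illustrative path-MTU check: send non-fragmentable 9000-byte pings to the
# iSCSI portal; a failure means some hop on the path has a smaller MTU
# (8972 = 9000 minus 28 bytes of IP + ICMP headers).
import subprocess

PORTAL_IP = "10.X.X.80"  # iSCSI target VIP from the report

try:
    subprocess.check_call(["ping", "-M", "do", "-c", "3", "-s", "8972", PORTAL_IP])
    print("path to %s carries 9000-byte frames" % PORTAL_IP)
except subprocess.CalledProcessError:
    print("a hop towards %s has an MTU below 9000" % PORTAL_IP)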
