Bug 1159637

Summary: [JSONRPC] Extend size operation of an attached iSCSI domain fails with CDA
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.0
Target Release: 3.5.0
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Keywords: Regression
Whiteboard: storage
oVirt Team: Storage
Reporter: Ori Gofen <ogofen>
Assignee: Piotr Kliczewski <pkliczew>
QA Contact: Ori Gofen <ogofen>
CC: acanan, amureini, bazulay, cmestreg, danken, ecohen, gklein, iheim, lpeer, lsurette, pkliczew, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-4.16.7.6-1.el6ev
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-02-16 13:39:11 UTC

Attachments: logs+image

Description Ori Gofen 2014-11-02 13:39:57 UTC
Created attachment 952906
logs+image

Description of problem:

The detach-attach operation on an iSCSI storage domain fails to expose all LUNs to the receiving host.
Upon attach, the receiving host reports 3 sessions out of 4, while it is logged in to each session twice:

root@RHEL7Server /rhev/data-center
 # iscsiadm -m session
tcp: [1] 10.35.160.107:3260,4 iqn.1992-04.com.emc:cx.ckm00121000438.b7 (non-flash)
tcp: [2] 10.35.160.107:3260,4 iqn.1992-04.com.emc:cx.ckm00121000438.b7 (non-flash)
tcp: [3] 10.35.160.105:3260,2 iqn.1992-04.com.emc:cx.ckm00121000438.a7 (non-flash)
tcp: [4] 10.35.160.105:3260,2 iqn.1992-04.com.emc:cx.ckm00121000438.a7 (non-flash)
tcp: [5] 10.35.160.104:3260,1 iqn.1992-04.com.emc:cx.ckm00121000438.a6 (non-flash)
tcp: [6] 10.35.160.106:3260,3 iqn.1992-04.com.emc:cx.ckm00121000438.b6 (non-flash)
tcp: [7] 10.35.160.106:3260,3 iqn.1992-04.com.emc:cx.ckm00121000438.b6 (non-flash)

In the UI discovery window, the receiving host is connected to all sessions.
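
(For triage, a quick way to confirm the duplicate logins is to count sessions per target IQN. A minimal Python sketch, assuming the iscsiadm output format shown above; this is a diagnostic aid, not part of vdsm:)

import re
import subprocess
from collections import Counter

def sessions_per_target():
    """Count active iSCSI sessions per target IQN from iscsiadm output."""
    out = subprocess.run(["iscsiadm", "-m", "session"],
                         capture_output=True, text=True, check=True).stdout
    # Each line looks like:
    # tcp: [1] 10.35.160.107:3260,4 iqn.1992-04.com.emc:... (non-flash)
    iqns = re.findall(r"^\S+: \[\d+\] \S+ (\S+)", out, flags=re.M)
    return Counter(iqns)

if __name__ == "__main__":
    for iqn, count in sessions_per_target().items():
        flag = "  <-- logged in more than once" if count > 1 else ""
        print("%dx %s%s" % (count, iqn, flag))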

The attach operation is successful, but when we attempt Edit -> Extend Size, the operation fails with a CDA (CanDoAction) error (see image).

engine throws:

2014-11-02 15:17:37,452 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] Failed in GetDevicesVisibilityVDS method
2014-11-02 15:17:37,453 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand return value

DevicesVisibilityMapReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=-32603, mMessage=Internal JSON-RPC error.]]

2014-11-02 15:17:37,458 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] HostName = 10.35.2.53
2014-11-02 15:17:37,459 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] Command GetDevicesVisibilityVDSCommand(HostName = 10.35.2.53, HostId = 46ceb82f-c1ca-4511-a840-bf1067b12e35, devicesIds=[360060160f4a03000b1f68463c259e411, 360060160f4a03000acefe985b05ee411, 360060160f4a030000c87bc4db05ee411]) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDevicesVisibilityVDS, error = Internal JSON-RPC error., code = -32603
2014-11-02 15:17:37,459 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] FINISH, GetDevicesVisibilityVDSCommand, log id: 1477bcc7
2014-11-02 15:17:37,459 ERROR [org.ovirt.engine.core.bll.storage.ConnectAllHostsToLunCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] Command org.ovirt.engine.core.bll.storage.ConnectAllHostsToLunCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDevicesVisibilityVDS, error = Internal JSON-RPC error., code = -32603 (Failed with error unexpected and code 16)
2014-11-02 15:17:37,482 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [8b1c200] Correlation ID: 8b1c200, Job ID: da163531-36ef-4530-b20a-f4e8b6e3efc4, Call Stack: null, Custom Event ID: -1, Message: Failed to connect Host 10.35.2.53 to device. (User: admin)
2014-11-02 15:17:37,488 WARN  [org.ovirt.engine.core.bll.storage.ExtendSANStorageDomainCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] CanDoAction of action ExtendSANStorageDomain failed. Reasons:VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName 10.35.2.53,$lun
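
For context, GetDevicesVisibilityVDS wraps a vdsm verb that takes a list of device GUIDs and returns a per-GUID visibility map. Below is a rough Python sketch of the expected exchange versus what the engine actually received; the method and parameter names are assumptions based on the vdsm schema and may differ:

# Engine -> vdsm (shape is illustrative, not a captured frame):
request = {
    "jsonrpc": "2.0",
    "id": "8b1c200",
    "method": "Host.getDevicesVisibility",
    "params": {"guidList": ["360060160f4a03000b1f68463c259e411",
                            "360060160f4a03000acefe985b05ee411",
                            "360060160f4a030000c87bc4db05ee411"]},
}

# Expected success reply: a map of GUID -> visible.
expected = {"jsonrpc": "2.0", "id": "8b1c200",
            "result": {"visible": {"360060160f4a03000b1f68463c259e411": True}}}

# What actually came back, per the log above:
actual = {"jsonrpc": "2.0", "id": "8b1c200",
          "error": {"code": -32603, "message": "Internal JSON-RPC error."}}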


Version-Release number of selected component (if applicable):
vt8

How reproducible:
100%

Steps to Reproduce:
* Have a storage group with 4 IQNs
1. (on first DC) Create an iSCSI domain
2. (on first DC) Create an NFS domain
3. (on second DC) Create an NFS domain
4. Detach the iSCSI domain and attach it to the second DC
5. Extend its size (choose all available LUNs if you don't know which one will fail the operation)

Actual results:
The operation fails with a CDA error.

Expected results:
The operation should be successful.

Additional info:
ogofen-vdsb (receiving host)

Comment 2 Allon Mureinik 2014-11-06 16:02:26 UTC
What sticks out for me is this error:

2014-11-02 15:17:37,453 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand] (ajp-/127.0.0.1:8702-7) [8b1c200] Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetDevicesVisibilityVDSCommand return value

DevicesVisibilityMapReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=-32603, mMessage=Internal JSON-RPC error.]]


Piotr, can you take a look please?

Comment 3 Piotr Kliczewski 2014-11-07 07:52:08 UTC
This error code is returned when something unexpected happens during the invocation of a verb.
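
For reference, -32603 is the JSON-RPC 2.0 specification's generic "Internal error" code. A minimal Python sketch of how a dispatcher ends up emitting it, assuming a simplified server loop rather than vdsm's actual bridge code:

import json

def dispatch(methods, raw_request):
    """Toy JSON-RPC 2.0 dispatcher: unexpected exceptions become -32603."""
    req = json.loads(raw_request)
    try:
        result = methods[req["method"]](**req.get("params", {}))
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
    except Exception:
        # Anything a verb raises that is not a known application error
        # surfaces as the spec's generic internal error.
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32603,
                                     "message": "Internal JSON-RPC error."}})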

Looking at the vdsm logs, I found:

Thread-235::DEBUG::2014-11-02 15:17:37,013::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.\n  Couldn\'t find device with uuid O2lLNq-ySlq-WSPd-6MsF-VPfN-ZUQ8-05vjJn.\n  Couldn\'t find device with uuid Xb49tk-bHna-TNRG-tP2Y-Lgsx-Sc5P-JrDppd.\n  Couldn\'t find device with uuid RpmtJa-6nlX-c83h-ZeRE-ATGY-9MWF-2r5gsQ.\n  Couldn\'t find device with uuid daeLdr-o4yg-BHt5-4ohE-3Tlm-pA3B-WyRwAc.\n  Can\'t initialize physical volume "/dev/mapper/360060160f4a03000e9492254c259e411" of volume group "577145c7-5949-4abf-b5af-7056948e165a" without -ff\n  Can\'t initialize physical volume "/dev/mapper/360060160f4a030007da7d711c259e411" of volume group "577145c7-5949-4abf-b5af-7056948e165a" without -ff\n  Can\'t initialize physical volume "/dev/mapper/360060160f4a03000b1f68463c259e411" of volume group "95da13b9-b971-4e74-85cc-9f42d0b0df60" without -ff\n  Can\'t initialize physical volume "/dev/mapper/360060160f4a030000c87bc4db05ee411" of volume group "6c2c285d-a930-4eb9-aca8-91bf2940ee72" without -ff\n  You have a memory leak (not released memory pool):\n   [0x7fe018e6c480] format_instance\n  Internal error: Unreleased memory pool(s) found.\n'; <rc> = 5
Thread-235::DEBUG::2014-11-02 15:17:37,014::lvm::852::Storage.LVM::(testPVCreate) rc: 5, out: ['  Physical volume "/dev/mapper/360060160f4a03000acefe985b05ee411" successfully created'], err: ['  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', '  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.', "  Couldn't find device with uuid O2lLNq-ySlq-WSPd-6MsF-VPfN-ZUQ8-05vjJn.", "  Couldn't find device with uuid Xb49tk-bHna-TNRG-tP2Y-Lgsx-Sc5P-JrDppd.", "  Couldn't find device with uuid RpmtJa-6nlX-c83h-ZeRE-ATGY-9MWF-2r5gsQ.", "  Couldn't find device with uuid daeLdr-o4yg-BHt5-4ohE-3Tlm-pA3B-WyRwAc.", '  Can\'t initialize physical volume "/dev/mapper/360060160f4a03000e9492254c259e411" of volume group "577145c7-5949-4abf-b5af-7056948e165a" without -ff', '  Can\'t initialize physical volume "/dev/mapper/360060160f4a030007da7d711c259e411" of volume group "577145c7-5949-4abf-b5af-7056948e165a" without -ff', '  Can\'t initialize physical volume "/dev/mapper/360060160f4a03000b1f68463c259e411" of volume group "95da13b9-b971-4e74-85cc-9f42d0b0df60" without -ff', '  Can\'t initialize physical volume "/dev/mapper/360060160f4a030000c87bc4db05ee411" of volume group "6c2c285d-a930-4eb9-aca8-91bf2940ee72" without -ff', '  You have a memory leak (not released memory pool):', '   [0x7fe018e6c480] format_instance', '  Internal error: Unreleased memory pool(s) found.'], unusedDevs: set(['/dev/mapper/360060160f4a03000acefe985b05ee411']), usedDevs: set(['/dev/mapper/360060160f4a03000e9492254c259e411', '/dev/mapper/360060160f4a030007da7d711c259e411', '/dev/mapper/360060160f4a030000c87bc4db05ee411', '/dev/mapper/360060160f4a03000b1f68463c259e411'])

Is this something expected?
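
(Background: testPVCreate dry-runs pvcreate with --test over the candidate devices and classifies them as used or unused, depending on whether LVM would initialize them without -ff. A simplified Python sketch of that classification, not vdsm's actual implementation:)

import subprocess

def test_pv_create(devices):
    """Dry-run pvcreate and classify devices as free or already in use."""
    proc = subprocess.run(["pvcreate", "--test"] + devices,
                          capture_output=True, text=True)
    used, unused = set(), set()
    for dev in devices:
        # LVM refuses (without -ff) a device that already belongs to a VG,
        # as in the "Can't initialize physical volume ..." lines above.
        if 'physical volume "%s" of volume group' % dev in proc.stderr:
            used.add(dev)
        else:
            unused.add(dev)
    return proc.returncode, unused, used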

Comment 4 Ori Gofen 2014-11-10 15:03:30 UTC
This bug actually consists of two bugs. The CDA on extending block domains has nothing to do with the iSCSI registration I described; it just seemed natural that that was the problem, but it isn't. The IQN duplication bug happens only on hosts with multiple NICs, and will be opened separately from this bug.

Comment 5 Ori Gofen 2014-11-10 15:12:53 UTC
Updated bug again :)
The failure is due to another JSON-RPC bug; it could not be reproduced on hosts that use XML-RPC.

Comment 7 Ori Gofen 2014-11-16 16:49:39 UTC
The IQN duplication bug is tracked as bz #1164576.

Comment 8 Dan Kenigsberg 2014-11-19 16:55:57 UTC
This is a regression versus rhev-3.4 (which used XML-RPC). It must be solved in rhev-3.5.0.

Comment 9 Ori Gofen 2014-12-02 11:31:00 UTC
Verified on vt12.