Bug 1014613 - [vdsm] Cannot import older VM
Summary: [vdsm] Cannot import older VM
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Sergey Gotliv
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-10-02 11:54 UTC by Jiri Belka
Modified: 2016-02-10 18:14 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-25 20:52:54 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments (Terms of Use)
engine.log, vdsm.log, ovf file, list of images/$uuid (31.88 KB, application/x-gzip)
2013-10-02 11:54 UTC, Jiri Belka
logs, 2nd wave (1.19 MB, application/x-tar)
2013-10-07 15:33 UTC, Jiri Belka
logs from logcollector (10.71 MB, application/x-xz)
2013-10-09 09:59 UTC, Jiri Belka

Description Jiri Belka 2013-10-02 11:54:44 UTC
Created attachment 806399 [details]
engine.log, vdsm.log, ovf file, list of images/$uuid

Description of problem:
In Admin Portal: Failed to import Vm jb-wxp to Data Center Default, Cluster Default

The VM is an older VM, exported from a 3.2 environment (2013-Jun-05, 16:58).

8adc79ee-8269-48b1-993b-407fb829f8ff is my data domain - str-03-jb33
ca414eb9-98ab-422f-8c4e-c2c513231058 is my export domain - str-01-jbelka-export

Thread-34372::ERROR::2013-10-02 13:11:21,834::hsm::1518::Storage.HSM::(deleteImage) Empty or not found image 14f8de7c-6c67-496a-8998-6ddbeb8db64d in SD 8adc79ee-8269-48b1-993b-407fb829f8ff. {'1ac141cf-6718-4110-bd81-8760da6baa64': ImgsPar(imgs=('45f88dd5-5219-495a-b971-059aaf146149', 'd542a9a9-47b2-45c0-84be-2b558cb15667'), parent='00000000-0000-0000-0000-000000000000'), '6e4ad20f-d63a-4991-a878-2453cac4c71d': ImgsPar(imgs=('3a2a9619-f79b-4fa2-ac5d-029b50c7cf92', '31170ed0-a452-4a79-9864-16f6c5471ef8'), parent='00000000-0000-0000-0000-000000000000'), '61689bae-e53e-4bcc-8ea8-93f2a871c606': ImgsPar(imgs=('cbe5c385-19d4-4e53-8f1f-86778bc526db',), parent=None), '445adfcf-894e-45df-9043-9a0c9fd472ac': ImgsPar(imgs=('4592eec4-0bbc-4458-879e-f872d950d135',), parent=None), '149f3399-97e7-464a-8005-b1ddb216fab1': ImgsPar(imgs=('8ac5910a-9b90-47c5-bfbe-f6d268c32045',), parent=None), '2faeb0f3-19de-4bcb-bcc2-42967bf3dc50': ImgsPar(imgs=('7fc7bade-4434-4da5-b114-d8be2c74ef6d',), parent=None), '59c8c7e8-9a05-4901-91b3-b18b00fbb3c2': ImgsPar(imgs=('631781c0-b748-4700-b64d-037c625068ba',), parent=None), '862457f0-6269-4ddc-95da-d8e37e8b02cb': ImgsPar(imgs=('31170ed0-a452-4a79-9864-16f6c5471ef8',), parent=None), '5f89ae09-fae6-406f-9346-10796a2c1d9e': ImgsPar(imgs=('631781c0-b748-4700-b64d-037c625068ba',), parent=None), '63c2949f-2738-4abc-bcd0-dbda38eda1b1': ImgsPar(imgs=('5210cbaa-6adf-456f-a7ca-5c7d56697d98', 'cbe5c385-19d4-4e53-8f1f-86778bc526db'), parent='00000000-0000-0000-0000-000000000000'), 'aad255ee-2fa3-4659-a5de-8e775b20d361': ImgsPar(imgs=('f69a71d5-912d-447c-8cf6-ae247371abbd',), parent=None), 'bfde185f-b470-4e94-822d-f1a256a44f13': ImgsPar(imgs=('d542a9a9-47b2-45c0-84be-2b558cb15667',), parent=None), '0c84422e-1276-42f4-af5f-d591a609958b': ImgsPar(imgs=('78441be0-c017-4f69-81f6-fa61fe38d600',), parent=None), '0c25a184-5d4b-418f-b10e-ae179020e6bc': ImgsPar(imgs=('f4e2f9db-7e74-4ccf-abcc-46800110a3b1',), parent=None), 'eb499557-89ff-43d0-ab0d-666242043f95': 
ImgsPar(imgs=('8ac5910a-9b90-47c5-bfbe-f6d268c32045',), parent=None), 'b06143aa-6b4e-4c4d-853b-7a1fbea2aa2c': ImgsPar(imgs=('e8a6f991-0cd8-42a3-a047-a0ea2c5dbce9',), parent=None), '70aa9a8d-0a3b-4760-9efe-e33608d365f1': ImgsPar(imgs=('3f0d11e6-62fa-4192-bf7b-4d3b956402c3',), parent=None)}
Thread-34372::ERROR::2013-10-02 13:11:21,834::task::850::TaskManager.Task::(_setError) Task=`655341c6-9c73-40a4-84d1-7cbfe99e7036`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1519, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
Thread-34372::ERROR::2013-10-02 13:11:21,841::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image does not exist in domain: 'image=14f8de7c-6c67-496a-8998-6ddbeb8db64d, domain=8adc79ee-8269-48b1-993b-407fb829f8ff'", 'code': 268}}
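The failing check can be illustrated with a minimal sketch (hypothetical names and structure, not the actual VDSM source): deleteImage looks the requested image UUID up in the domain's image table dumped above, and when it is absent raises ImageDoesNotExistInSD, which the dispatcher reports as error code 268.

```python
# Hypothetical sketch of the check behind the traceback above.
# Names and structure are illustrative, not the real VDSM source.

class ImageDoesNotExistInSD(Exception):
    code = 268  # matches the dispatcher status code in the log

    def __init__(self, img_uuid, sd_uuid):
        super().__init__(
            "Image does not exist in domain: 'image=%s, domain=%s'"
            % (img_uuid, sd_uuid))

def delete_image(domain_images, img_uuid, sd_uuid):
    # domain_images maps image UUID -> volume chain, like the dict
    # dumped in the "Empty or not found image" error above.
    if img_uuid not in domain_images:
        raise ImageDoesNotExistInSD(img_uuid, sd_uuid)
    # ...actual removal of the image's volumes would follow...
    del domain_images[img_uuid]
```

In the log, image 14f8de7c-6c67-496a-8998-6ddbeb8db64d is not among the keys of the dumped table for domain 8adc79ee-8269-48b1-993b-407fb829f8ff, so this branch fires.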

Version-Release number of selected component (if applicable):
is17
vdsm-4.12.0-170.el6ev.x86_64
libvirt-0.10.2-27.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.406.el6.x86_64
qemu-img-rhev-0.12.1.2-2.406.el6.x86_64

How reproducible:

Steps to Reproduce:
1. have a vm from 3.2 on export domain
2. try to import this vm into 3.3
3.

Actual results:
failure

Expected results:


Additional info:

Comment 1 Sergey Gotliv 2013-10-02 21:56:21 UTC
Jiri,

Did you try to import that VM again? 

I am asking because the initial import failed because Engine lost connectivity to the SPM for a few seconds (see the Engine log around 13:11:02). Unfortunately, I don't see when this import started and can't follow the flow because the Engine log is partial, but I can see when it failed.

By the way, can you attach a full log?

During a VM import we copy images, so in case of failure we have to remove them from the target storage domain. To do so, Engine sends a DeleteImage request to VDSM image by image, but since VDSM didn't copy some of them in the first place, it may respond with ImageDoesNotExistInSDError. This message in the log doesn't mean that something is wrong: Engine handles this error as success and proceeds to delete that image from the database. So far I don't see any problem.
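This cleanup behavior can be sketched as follows (hypothetical names, not the actual Engine code, which is Java): a DeleteImage reply carrying code 268 is treated the same as success, so the image row can be removed from the database either way.

```python
# Hypothetical sketch of the cleanup flow described above; names are
# illustrative and do not come from the real Engine source.

IMAGE_DOES_NOT_EXIST_IN_SD = 268  # code seen in the vdsm log above

def cleanup_failed_import(image_uuids, delete_image):
    """Ask VDSM to delete each image copied during a failed import.

    delete_image(img) stands in for the DeleteImage verb and returns a
    status dict such as {'code': 0}. Code 268 means the image was never
    copied, so it is handled as success, not as a cleanup failure.
    """
    removed_from_db = []
    for img in image_uuids:
        status = delete_image(img)
        if status['code'] in (0, IMAGE_DOES_NOT_EXIST_IN_SD):
            removed_from_db.append(img)  # safe to drop from the database
        else:
            raise RuntimeError('cleanup failed for %s: %r' % (img, status))
    return removed_from_db
```

With this handling, the ImageDoesNotExistInSD errors in the log are expected noise during rollback rather than a storage problem.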

Please try to import the VM again, and if you succeed, close this bug.

I am changing the summary of this bug because it's misleading.

Comment 2 Sergey Gotliv 2013-10-03 07:27:51 UTC
Yaniv,

Why do you think this is a regression?

Comment 3 Jiri Belka 2013-10-07 15:31:19 UTC
Interesting. I don't see or feel any network issue, so why does Engine think my host is non-responsive?

Full logs attached.

2013-Oct-07, 17:26 Failed to import Vm jb-w8-x86 to Data Center Default, Cluster Default
2013-Oct-07, 17:26 Storage Pool Manager runs on Host dell-r210ii-13 (Address: 10.34.62.205).
2013-Oct-07, 17:25 State was set to Up for host dell-r210ii-13.
2013-Oct-07, 17:25 Host dell-r210ii-13 is non responsive.
2013-Oct-07, 17:24 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host dell-r210ii-13, Error: Network error during communication with the Host.).
2013-Oct-07, 17:19 Starting to import Vm jb-w8-x86 to Data Center Default, Cluster Default
2013-Oct-07, 17:16 Failed to import Vm jb-w8-x86 to Data Center Default, Cluster Default
...snip...

Comment 4 Jiri Belka 2013-10-07 15:33:55 UTC
Created attachment 808907 [details]
logs, 2nd wave

Comment 5 Sergey Gotliv 2013-10-08 11:37:46 UTC
Jiri,

You see it in the log, right?


2013-10-07 17:15:25,174 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-70) IrsBroker::Failed::GetStoragePoolInfoVDS due to: ConnectException: Connection refused
2013-10-07 17:15:25,191 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-83) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:25,192 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-83) Failed to refresh VDS , vds = a53ccb2c-9993-46b5-8fb9-293b02aeb8b3 : dell-r210ii-13, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
2013-10-07 17:15:26,166 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-84) Command HSMGetAllTasksStatusesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:28,200 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-88) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:28,201 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-88) Failed to refresh VDS , vds = a53ccb2c-9993-46b5-8fb9-293b02aeb8b3 : dell-r210ii-13, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
2013-10-07 17:15:31,208 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-92) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.Co


Engine fails to communicate with VDSM, but the VDSM logs you provided end at 15:31. Why?


Thread-24::DEBUG::2013-10-07 15:31:47,424::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n374 bytes (374 B) copied, 0.000216256 s, 1.7 MB/s\n'; <rc> = 0

I need all the logs in order to understand what's going on on your host.

Comment 6 Jiri Belka 2013-10-09 09:58:11 UTC
Completely fresh install of rhevm (is17) and rhevh (20130918.0.el6).

2013-Oct-09, 11:46 Failed to import Vm jb-w7-x64 to Data Center Default, Cluster Default
2013-Oct-09, 11:46 Storage Pool Manager runs on Host dell-r210ii-13 (Address: 10.34.62.205).
2013-Oct-09, 11:45 State was set to Up for host dell-r210ii-13.
2013-Oct-09, 11:45 Host dell-r210ii-13 is non responsive.
2013-Oct-09, 11:44 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host dell-r210ii-13, Error: Network error during communication with the Host.).
2013-Oct-09, 11:43 Used Network resources of host dell-r210ii-13 [98%] exceeded defined threshold [95%].

Comment 7 Jiri Belka 2013-10-09 09:59:58 UTC
Created attachment 809834 [details]
logs from logcollector

Comment 8 Dan Kenigsberg 2013-10-25 19:59:55 UTC
Sergey, it seems as if you've moved this bug to ON_QA by mistake. If it was intentional, please explain why.

Comment 9 Sergey Gotliv 2013-10-25 20:52:54 UTC
See my comment 5. I didn't have the appropriate logs to work with, so I moved the bug to QA and asked for the missing logs.

Anyway, if I remember correctly, I spoke with Jiri and Haim and we agreed that there is no storage issue here. Jiri confirmed that the flow works perfectly on other machines, and I checked it on mine.

