Bug 1014613 - [vdsm] Cannot import older VM
Summary: [vdsm] Cannot import older VM
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Sergey Gotliv
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-10-02 11:54 UTC by Jiri Belka
Modified: 2016-02-10 18:14 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-25 20:52:54 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments (Terms of Use)
engine.log, vdsm.log, ovf file, list of images/$uuid (31.88 KB, application/x-gzip)
2013-10-02 11:54 UTC, Jiri Belka
logs, 2nd wave (1.19 MB, application/x-tar)
2013-10-07 15:33 UTC, Jiri Belka
logs from logcollector (10.71 MB, application/x-xz)
2013-10-09 09:59 UTC, Jiri Belka

Description Jiri Belka 2013-10-02 11:54:44 UTC
Created attachment 806399 [details]
engine.log, vdsm.log, ovf file, list of images/$uuid

Description of problem:
In Admin Portal: Failed to import Vm jb-wxp to Data Center Default, Cluster Default

The VM is an older VM, exported from a 3.2 environment (2013-Jun-05, 16:58).

8adc79ee-8269-48b1-993b-407fb829f8ff is my data domain - str-03-jb33
ca414eb9-98ab-422f-8c4e-c2c513231058 is my export domain - str-01-jbelka-export

Thread-34372::ERROR::2013-10-02 13:11:21,834::hsm::1518::Storage.HSM::(deleteImage) Empty or not found image 14f8de7c-6c67-496a-8998-6ddbeb8db64d in SD 8adc79ee-8269-48b1-993b-407fb829f8ff. {'1ac141cf-6718-4110-bd81-8760da6baa64': ImgsPar(imgs=('45f88dd5-5219-495a-b971-059aaf146149', 'd542a9a9-47b2-45c0-84be-2b558cb15667'), parent='00000000-0000-0000-0000-000000000000'), '6e4ad20f-d63a-4991-a878-2453cac4c71d': ImgsPar(imgs=('3a2a9619-f79b-4fa2-ac5d-029b50c7cf92', '31170ed0-a452-4a79-9864-16f6c5471ef8'), parent='00000000-0000-0000-0000-000000000000'), '61689bae-e53e-4bcc-8ea8-93f2a871c606': ImgsPar(imgs=('cbe5c385-19d4-4e53-8f1f-86778bc526db',), parent=None), '445adfcf-894e-45df-9043-9a0c9fd472ac': ImgsPar(imgs=('4592eec4-0bbc-4458-879e-f872d950d135',), parent=None), '149f3399-97e7-464a-8005-b1ddb216fab1': ImgsPar(imgs=('8ac5910a-9b90-47c5-bfbe-f6d268c32045',), parent=None), '2faeb0f3-19de-4bcb-bcc2-42967bf3dc50': ImgsPar(imgs=('7fc7bade-4434-4da5-b114-d8be2c74ef6d',), parent=None), '59c8c7e8-9a05-4901-91b3-b18b00fbb3c2': ImgsPar(imgs=('631781c0-b748-4700-b64d-037c625068ba',), parent=None), '862457f0-6269-4ddc-95da-d8e37e8b02cb': ImgsPar(imgs=('31170ed0-a452-4a79-9864-16f6c5471ef8',), parent=None), '5f89ae09-fae6-406f-9346-10796a2c1d9e': ImgsPar(imgs=('631781c0-b748-4700-b64d-037c625068ba',), parent=None), '63c2949f-2738-4abc-bcd0-dbda38eda1b1': ImgsPar(imgs=('5210cbaa-6adf-456f-a7ca-5c7d56697d98', 'cbe5c385-19d4-4e53-8f1f-86778bc526db'), parent='00000000-0000-0000-0000-000000000000'), 'aad255ee-2fa3-4659-a5de-8e775b20d361': ImgsPar(imgs=('f69a71d5-912d-447c-8cf6-ae247371abbd',), parent=None), 'bfde185f-b470-4e94-822d-f1a256a44f13': ImgsPar(imgs=('d542a9a9-47b2-45c0-84be-2b558cb15667',), parent=None), '0c84422e-1276-42f4-af5f-d591a609958b': ImgsPar(imgs=('78441be0-c017-4f69-81f6-fa61fe38d600',), parent=None), '0c25a184-5d4b-418f-b10e-ae179020e6bc': ImgsPar(imgs=('f4e2f9db-7e74-4ccf-abcc-46800110a3b1',), parent=None), 'eb499557-89ff-43d0-ab0d-666242043f95': 
ImgsPar(imgs=('8ac5910a-9b90-47c5-bfbe-f6d268c32045',), parent=None), 'b06143aa-6b4e-4c4d-853b-7a1fbea2aa2c': ImgsPar(imgs=('e8a6f991-0cd8-42a3-a047-a0ea2c5dbce9',), parent=None), '70aa9a8d-0a3b-4760-9efe-e33608d365f1': ImgsPar(imgs=('3f0d11e6-62fa-4192-bf7b-4d3b956402c3',), parent=None)}
Thread-34372::ERROR::2013-10-02 13:11:21,834::task::850::TaskManager.Task::(_setError) Task=`655341c6-9c73-40a4-84d1-7cbfe99e7036`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1519, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
Thread-34372::ERROR::2013-10-02 13:11:21,841::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image does not exist in domain: 'image=14f8de7c-6c67-496a-8998-6ddbeb8db64d, domain=8adc79ee-8269-48b1-993b-407fb829f8ff'", 'code': 268}}
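The failing check can be illustrated with a minimal sketch (hypothetical names and structure, not the actual VDSM source): deleteImage looks the requested image UUID up in the domain's image table dumped above, and when it is absent raises ImageDoesNotExistInSD, which the dispatcher reports as error code 268.

```python
# Hypothetical sketch of the check behind the traceback above.
# Names and structure are illustrative, not the real VDSM source.

class ImageDoesNotExistInSD(Exception):
    code = 268  # matches the dispatcher status code in the log

    def __init__(self, img_uuid, sd_uuid):
        super().__init__(
            "Image does not exist in domain: 'image=%s, domain=%s'"
            % (img_uuid, sd_uuid))

def delete_image(domain_images, img_uuid, sd_uuid):
    # domain_images maps image UUID -> volume chain, like the dict
    # dumped in the "Empty or not found image" error above.
    if img_uuid not in domain_images:
        raise ImageDoesNotExistInSD(img_uuid, sd_uuid)
    # ...actual removal of the image's volumes would follow...
    del domain_images[img_uuid]
```

In the log, image 14f8de7c-6c67-496a-8998-6ddbeb8db64d is not among the keys of the dumped table for domain 8adc79ee-8269-48b1-993b-407fb829f8ff, so this branch fires.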

Version-Release number of selected component (if applicable):
is17
vdsm-4.12.0-170.el6ev.x86_64
libvirt-0.10.2-27.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.406.el6.x86_64
qemu-img-rhev-0.12.1.2-2.406.el6.x86_64

How reproducible:

Steps to Reproduce:
1. have a vm from 3.2 on export domain
2. try to import this vm into 3.3
3.

Actual results:
failure

Expected results:


Additional info:

Comment 1 Sergey Gotliv 2013-10-02 21:56:21 UTC
Jiri,

Did you try to import that VM again? 

I am asking because the initial import failed because Engine lost connectivity to the SPM for a few seconds (see the Engine log around 13:11:02). Unfortunately, I don't see when this import started and can't follow the flow because the Engine log is partial, but I can see when it failed.

By the way, can you attach a full log?

During a VM import we copy images, so in case of failure we have to remove them from the target storage domain. To do so, Engine sends a DeleteImage request to VDSM image by image, but since VDSM didn't copy some of them in the first place, it may respond with ImageDoesNotExistInSDError. This message in the log doesn't mean that something is wrong: Engine handles this error as success and proceeds to delete that image from the database. So far I don't see any problem.
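This cleanup behavior can be sketched as follows (hypothetical names, not the actual Engine code, which is Java): a DeleteImage reply carrying code 268 is treated the same as success, so the image row can be removed from the database either way.

```python
# Hypothetical sketch of the cleanup flow described above; names are
# illustrative and do not come from the real Engine source.

IMAGE_DOES_NOT_EXIST_IN_SD = 268  # code seen in the vdsm log above

def cleanup_failed_import(image_uuids, delete_image):
    """Ask VDSM to delete each image copied during a failed import.

    delete_image(img) stands in for the DeleteImage verb and returns a
    status dict such as {'code': 0}. Code 268 means the image was never
    copied, so it is handled as success, not as a cleanup failure.
    """
    removed_from_db = []
    for img in image_uuids:
        status = delete_image(img)
        if status['code'] in (0, IMAGE_DOES_NOT_EXIST_IN_SD):
            removed_from_db.append(img)  # safe to drop from the database
        else:
            raise RuntimeError('cleanup failed for %s: %r' % (img, status))
    return removed_from_db
```

With this handling, the ImageDoesNotExistInSD errors in the log are expected noise during rollback rather than a storage problem.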

Please try to import the VM again, and if you succeed, close this bug.

I am changing the summary of this bug because it's misleading.

Comment 2 Sergey Gotliv 2013-10-03 07:27:51 UTC
Yaniv,

Why do you think this is a regression?

Comment 3 Jiri Belka 2013-10-07 15:31:19 UTC
Interesting. I don't see or feel any network issue, so why does Engine think my host is non-responsive?

Full logs attached.

2013-Oct-07, 17:26 Failed to import Vm jb-w8-x86 to Data Center Default, Cluster Default
2013-Oct-07, 17:26 Storage Pool Manager runs on Host dell-r210ii-13 (Address: 10.34.62.205).
2013-Oct-07, 17:25 State was set to Up for host dell-r210ii-13.
2013-Oct-07, 17:25 Host dell-r210ii-13 is non responsive.
2013-Oct-07, 17:24 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host dell-r210ii-13, Error: Network error during communication with the Host.).
2013-Oct-07, 17:19 Starting to import Vm jb-w8-x86 to Data Center Default, Cluster Default
2013-Oct-07, 17:16 Failed to import Vm jb-w8-x86 to Data Center Default, Cluster Default
...snip...

Comment 4 Jiri Belka 2013-10-07 15:33:55 UTC
Created attachment 808907 [details]
logs, 2nd wave

Comment 5 Sergey Gotliv 2013-10-08 11:37:46 UTC
Jiri,

You see it in the log, right?


2013-10-07 17:15:25,174 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-70) IrsBroker::Failed::GetStoragePoolInfoVDS due to: ConnectException: Connection refused
2013-10-07 17:15:25,191 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-83) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:25,192 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-83) Failed to refresh VDS , vds = a53ccb2c-9993-46b5-8fb9-293b02aeb8b3 : dell-r210ii-13, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
2013-10-07 17:15:26,166 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-84) Command HSMGetAllTasksStatusesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:28,200 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-88) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:28,201 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-88) Failed to refresh VDS , vds = a53ccb2c-9993-46b5-8fb9-293b02aeb8b3 : dell-r210ii-13, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
2013-10-07 17:15:31,208 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-92) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.Co


Engine fails to communicate with VDSM, but the VDSM logs you provided end at 15:31. Why?


Thread-24::DEBUG::2013-10-07 15:31:47,424::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n374 bytes (374 B) copied, 0.000216256 s, 1.7 MB/s\n'; <rc> = 0

I need all the logs in order to understand what's going on on your host.

Comment 6 Jiri Belka 2013-10-09 09:58:11 UTC
Completely fresh install of rhevm (is17) and rhevh (20130918.0.el6).

2013-Oct-09, 11:46 Failed to import Vm jb-w7-x64 to Data Center Default, Cluster Default
2013-Oct-09, 11:46 Storage Pool Manager runs on Host dell-r210ii-13 (Address: 10.34.62.205).
2013-Oct-09, 11:45 State was set to Up for host dell-r210ii-13.
2013-Oct-09, 11:45 Host dell-r210ii-13 is non responsive.
2013-Oct-09, 11:44 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host dell-r210ii-13, Error: Network error during communication with the Host.).
2013-Oct-09, 11:43 Used Network resources of host dell-r210ii-13 [98%] exceeded defined threshold [95%].

Comment 7 Jiri Belka 2013-10-09 09:59:58 UTC
Created attachment 809834 [details]
logs from logcollector

Comment 8 Dan Kenigsberg 2013-10-25 19:59:55 UTC
Sergey, it seems as if you've moved this bug to ON_QA by mistake. If it was intentional, please explain why.

Comment 9 Sergey Gotliv 2013-10-25 20:52:54 UTC
See my comment 5. I didn't have the appropriate logs to work with, so I moved the bug to QA and asked for the missing logs.

Anyway, if I remember correctly, I spoke with Jiri and Haim and we agreed that there is no storage issue here. Jiri confirmed that the flow works perfectly on other machines, and I checked it on mine.

