Bug 1014613 - [vdsm] Cannot import older VM
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Sergey Gotliv
Whiteboard: storage
Keywords: Triaged
Depends On:
Blocks:
 
Reported: 2013-10-02 07:54 EDT by Jiri Belka
Modified: 2016-02-10 13:14 EST (History)
CC: 11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-25 16:52:54 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine.log, vdsm.log, ovf file, list of images/$uuid (31.88 KB, application/x-gzip)
2013-10-02 07:54 EDT, Jiri Belka
no flags
logs, 2nd wave (1.19 MB, application/x-tar)
2013-10-07 11:33 EDT, Jiri Belka
no flags
logs from logcollector (10.71 MB, application/x-xz)
2013-10-09 05:59 EDT, Jiri Belka
no flags

Description Jiri Belka 2013-10-02 07:54:44 EDT
Created attachment 806399 [details]
engine.log, vdsm.log, ovf file, list of images/$uuid

Description of problem:
In Admin Portal: Failed to import Vm jb-wxp to Data Center Default, Cluster Default

The VM is older VM, exported on 3.2 env (2013-Jun-05, 16:58).

8adc79ee-8269-48b1-993b-407fb829f8ff is my data domain - str-03-jb33
ca414eb9-98ab-422f-8c4e-c2c513231058 is my export domain - str-01-jbelka-export

Thread-34372::ERROR::2013-10-02 13:11:21,834::hsm::1518::Storage.HSM::(deleteImage) Empty or not found image 14f8de7c-6c67-496a-8998-6ddbeb8db64d in SD 8adc79ee-8269-48b1-993b-407fb829f8ff. {'1ac141cf-6718-4110-bd81-8760da6baa64': ImgsPar(imgs=('45f88dd5-5219-495a-b971-059aaf146149', 'd542a9a9-47b2-45c0-84be-2b558cb15667'), parent='00000000-0000-0000-0000-000000000000'), '6e4ad20f-d63a-4991-a878-2453cac4c71d': ImgsPar(imgs=('3a2a9619-f79b-4fa2-ac5d-029b50c7cf92', '31170ed0-a452-4a79-9864-16f6c5471ef8'), parent='00000000-0000-0000-0000-000000000000'), '61689bae-e53e-4bcc-8ea8-93f2a871c606': ImgsPar(imgs=('cbe5c385-19d4-4e53-8f1f-86778bc526db',), parent=None), '445adfcf-894e-45df-9043-9a0c9fd472ac': ImgsPar(imgs=('4592eec4-0bbc-4458-879e-f872d950d135',), parent=None), '149f3399-97e7-464a-8005-b1ddb216fab1': ImgsPar(imgs=('8ac5910a-9b90-47c5-bfbe-f6d268c32045',), parent=None), '2faeb0f3-19de-4bcb-bcc2-42967bf3dc50': ImgsPar(imgs=('7fc7bade-4434-4da5-b114-d8be2c74ef6d',), parent=None), '59c8c7e8-9a05-4901-91b3-b18b00fbb3c2': ImgsPar(imgs=('631781c0-b748-4700-b64d-037c625068ba',), parent=None), '862457f0-6269-4ddc-95da-d8e37e8b02cb': ImgsPar(imgs=('31170ed0-a452-4a79-9864-16f6c5471ef8',), parent=None), '5f89ae09-fae6-406f-9346-10796a2c1d9e': ImgsPar(imgs=('631781c0-b748-4700-b64d-037c625068ba',), parent=None), '63c2949f-2738-4abc-bcd0-dbda38eda1b1': ImgsPar(imgs=('5210cbaa-6adf-456f-a7ca-5c7d56697d98', 'cbe5c385-19d4-4e53-8f1f-86778bc526db'), parent='00000000-0000-0000-0000-000000000000'), 'aad255ee-2fa3-4659-a5de-8e775b20d361': ImgsPar(imgs=('f69a71d5-912d-447c-8cf6-ae247371abbd',), parent=None), 'bfde185f-b470-4e94-822d-f1a256a44f13': ImgsPar(imgs=('d542a9a9-47b2-45c0-84be-2b558cb15667',), parent=None), '0c84422e-1276-42f4-af5f-d591a609958b': ImgsPar(imgs=('78441be0-c017-4f69-81f6-fa61fe38d600',), parent=None), '0c25a184-5d4b-418f-b10e-ae179020e6bc': ImgsPar(imgs=('f4e2f9db-7e74-4ccf-abcc-46800110a3b1',), parent=None), 'eb499557-89ff-43d0-ab0d-666242043f95': 
ImgsPar(imgs=('8ac5910a-9b90-47c5-bfbe-f6d268c32045',), parent=None), 'b06143aa-6b4e-4c4d-853b-7a1fbea2aa2c': ImgsPar(imgs=('e8a6f991-0cd8-42a3-a047-a0ea2c5dbce9',), parent=None), '70aa9a8d-0a3b-4760-9efe-e33608d365f1': ImgsPar(imgs=('3f0d11e6-62fa-4192-bf7b-4d3b956402c3',), parent=None)}
Thread-34372::ERROR::2013-10-02 13:11:21,834::task::850::TaskManager.Task::(_setError) Task=`655341c6-9c73-40a4-84d1-7cbfe99e7036`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1519, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
Thread-34372::ERROR::2013-10-02 13:11:21,841::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image does not exist in domain: 'image=14f8de7c-6c67-496a-8998-6ddbeb8db64d, domain=8adc79ee-8269-48b1-993b-407fb829f8ff'", 'code': 268}}
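For context, the error above comes from a simple guard in VDSM's deleteImage: it scans the storage domain for the image's volumes and raises when nothing is found. A minimal Python sketch of that behaviour (simplified and hypothetical names; not the actual hsm.py code):

```python
class ImageDoesNotExistInSD(Exception):
    """Mirrors vdsm's se.ImageDoesNotExistInSD (status code 268)."""
    code = 268

    def __init__(self, img_uuid, sd_uuid):
        super().__init__(
            "Image does not exist in domain: 'image=%s, domain=%s'"
            % (img_uuid, sd_uuid))


def delete_image(domain_images, img_uuid, sd_uuid):
    """Delete an image's volumes from a domain, failing if it is absent.

    domain_images: dict mapping image UUID -> tuple of volume UUIDs,
    analogous to the ImgsPar map dumped in the log above.
    """
    if img_uuid not in domain_images or not domain_images[img_uuid]:
        raise ImageDoesNotExistInSD(img_uuid, sd_uuid)
    return domain_images.pop(img_uuid)
```

In the log, image 14f8de7c-... is simply not a key of the dumped map for domain 8adc79ee-..., which is exactly the condition the sketch raises on.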

Version-Release number of selected component (if applicable):
is17
vdsm-4.12.0-170.el6ev.x86_64
libvirt-0.10.2-27.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.406.el6.x86_64
qemu-img-rhev-0.12.1.2-2.406.el6.x86_64

How reproducible:

Steps to Reproduce:
1. have a vm from 3.2 on export domain
2. try to import this vm into 3.3
3.

Actual results:
failure

Expected results:
VM is imported successfully.

Additional info:
Comment 1 Sergey Gotliv 2013-10-02 17:56:21 EDT
Jiri,

Did you try to import that VM again? 

I am asking because the initial import failed when the Engine lost connectivity to the SPM for a few seconds (see the Engine log around 13:11:02). Unfortunately, I can't see when this import started and can't follow the flow because the Engine log is partial, but I can see when it failed.

By the way, can you attach a full log?

During VM import we copy images, so in case of failure we have to remove them from the target storage domain. To do so, the Engine sends a DeleteImage request to VDSM image by image, but since VDSM didn't copy some of them in the first place, it may respond with ImageDoesNotExistInSD. This message in the log doesn't mean that anything is wrong: the Engine handles this error as success and proceeds to delete that image from the database. So far I don't see any problem.
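The rollback flow described here — delete image by image, treating "image does not exist" as success — can be sketched roughly like this (the real Engine code is Java; the names and signatures below are hypothetical, for illustration only):

```python
OK = 0
IMAGE_DOES_NOT_EXIST_IN_SD = 268  # the code seen in the vdsm log above


def rollback_failed_import(delete_image_on_vdsm, image_uuids, sd_uuid):
    """Remove images copied to the target domain before the import failed.

    delete_image_on_vdsm(img_uuid, sd_uuid) stands in for the Engine's
    DeleteImage request and returns a status code. An image that VDSM
    never copied answers with code 268, which counts as already removed.
    """
    removable = []
    for img in image_uuids:
        status = delete_image_on_vdsm(img, sd_uuid)
        if status in (OK, IMAGE_DOES_NOT_EXIST_IN_SD):
            removable.append(img)  # safe to delete from the Engine database
        else:
            raise RuntimeError("DeleteImage(%s) failed: %d" % (img, status))
    return removable
```

The point of the design is idempotent cleanup: a rollback that finds nothing to delete has already reached the desired state, so it must not be reported as a failure.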

Please try to import the VM again, and if you succeed, close this bug.

I am changing the summary of this bug because it's misleading.
Comment 2 Sergey Gotliv 2013-10-03 03:27:51 EDT
Yaniv,

Why do you think this is a regression?
Comment 3 Jiri Belka 2013-10-07 11:31:19 EDT
Interesting. I don't see any network issue, so why does the engine think my host is non-responsive?

Full logs attached.

2013-Oct-07, 17:26 Failed to import Vm jb-w8-x86 to Data Center Default, Cluster Default
2013-Oct-07, 17:26 Storage Pool Manager runs on Host dell-r210ii-13 (Address: 10.34.62.205).
2013-Oct-07, 17:25 State was set to Up for host dell-r210ii-13.
2013-Oct-07, 17:25 Host dell-r210ii-13 is non responsive.
2013-Oct-07, 17:24 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host dell-r210ii-13, Error: Network error during communication with the Host.).
2013-Oct-07, 17:19 Starting to import Vm jb-w8-x86 to Data Center Default, Cluster Default
2013-Oct-07, 17:16 Failed to import Vm jb-w8-x86 to Data Center Default, Cluster Default
...snip...
Comment 4 Jiri Belka 2013-10-07 11:33:55 EDT
Created attachment 808907 [details]
logs, 2nd wave
Comment 5 Sergey Gotliv 2013-10-08 07:37:46 EDT
Jiri,

You see it in the log, right?


2013-10-07 17:15:25,174 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-70) IrsBroker::Failed::GetStoragePoolInfoVDS due to: ConnectException: Connection refused
2013-10-07 17:15:25,191 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-83) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:25,192 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-83) Failed to refresh VDS , vds = a53ccb2c-9993-46b5-8fb9-293b02aeb8b3 : dell-r210ii-13, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
2013-10-07 17:15:26,166 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-84) Command HSMGetAllTasksStatusesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:28,200 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-88) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-10-07 17:15:28,201 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-88) Failed to refresh VDS , vds = a53ccb2c-9993-46b5-8fb9-293b02aeb8b3 : dell-r210ii-13, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
2013-10-07 17:15:31,208 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-92) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.Co


The Engine fails to communicate with VDSM, but the VDSM logs you provided end at 15:31. Why?


Thread-24::DEBUG::2013-10-07 15:31:47,424::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n374 bytes (374 B) copied, 0.000216256 s, 1.7 MB/s\n'; <rc> = 0

I need all the logs in order to understand what's going on on your host.
Comment 6 Jiri Belka 2013-10-09 05:58:11 EDT
Completely fresh install of rhevm (is17) and rhevh (20130918.0.el6).

2013-Oct-09, 11:46 Failed to import Vm jb-w7-x64 to Data Center Default, Cluster Default
2013-Oct-09, 11:46 Storage Pool Manager runs on Host dell-r210ii-13 (Address: 10.34.62.205).
2013-Oct-09, 11:45 State was set to Up for host dell-r210ii-13.
2013-Oct-09, 11:45 Host dell-r210ii-13 is non responsive.
2013-Oct-09, 11:44 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host dell-r210ii-13, Error: Network error during communication with the Host.).
2013-Oct-09, 11:43 Used Network resources of host dell-r210ii-13 [98%] exceeded defined threshold [95%].
Comment 7 Jiri Belka 2013-10-09 05:59:58 EDT
Created attachment 809834 [details]
logs from logcollector
Comment 8 Dan Kenigsberg 2013-10-25 15:59:55 EDT
Sergey, it seems as if you've moved this bug to ON_QA by mistake. If it was intentional, please explain why.
Comment 9 Sergey Gotliv 2013-10-25 16:52:54 EDT
See my comment 5. I didn't have the appropriate logs to work with, so I moved the bug to QA and asked for the missing logs to be provided.

Anyway, if I remember correctly, I spoke with Jiri and Haim and we agreed that there is no storage issue here. Jiri confirmed that the flow works perfectly on other machines, and I checked it on mine.
