Bug 1331335 - [engine-backend] An attempt to import an image back to the data domain while the original one has "_remove_me" in its ID fails on "java.lang.NumberFormatException"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.30
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.19.2
Assignee: Benny Zlotnik
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-28 10:25 UTC by Elad
Modified: 2017-03-16 14:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-16 14:46:11 UTC
oVirt Team: Storage
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
tnisan: devel_ack+
ratamir: testing_ack+


Attachments
logs from engine and hypervisor (1.82 MB, application/x-gzip)
2016-04-28 10:25 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1270220 0 high CLOSED SPM is not tolerant for very slow NFS file deletes 2021-02-22 00:41:40 UTC
oVirt gerrit 69803 0 master MERGED core: fix NumberFormatException when importing a removed VM 2017-01-11 14:30:06 UTC
oVirt gerrit 70014 0 ovirt-engine-4.1 MERGED core: fix NumberFormatException when importing a removed VM 2017-01-11 15:13:15 UTC

Internal Links: 1270220

Description Elad 2016-04-28 10:25:03 UTC
Created attachment 1151799
logs from engine and hypervisor

Description of problem:
While trying to verify bug 1270220, I did the following:
I created an NFS domain residing on a server that simulates slow file deletion using [1], created a VM with an attached disk residing on the slowfs domain, exported the VM to an export domain, changed the deletion delay on the storage server to 10 seconds (unlink = 10), removed the VM together with its attached disk, and immediately tried to import the VM with the disk (the same image ID).



I got the following exception in engine.log:

2016-04-28 12:44:01,823 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (default task-14) [338bff11] ERROR, GetImagesListVDSCommand( GetImagesListVDSCommandParameters:{runAsync='true', storagePoolId='7de10d80-b113-4f60-8f7f-e70f6476432b', ignoreFailoverLimit='false', sdUUID='fb97cea4-5bf2-48fa-9ceb-b8b2e109acd4'}), exception: For input string: "_remove_me_a9187951", log id: 6238e26f
2016-04-28 12:44:01,823 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (default task-14) [338bff11] Exception: java.lang.NumberFormatException: For input string: "_remove_me_a9187951"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) [rt.jar:1.8.0_71]
        at java.lang.Long.parseLong(Long.java:589) [rt.jar:1.8.0_71]
        at java.lang.Long.valueOf(Long.java:776) [rt.jar:1.8.0_71]
        at java.lang.Long.decode(Long.java:928) [rt.jar:1.8.0_71]
        at java.util.UUID.fromString(UUID.java:198) [rt.jar:1.8.0_71]
        at org.ovirt.engine.core.compat.Guid.<init>(Guid.java:73) [compat.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.GetImagesListVDSCommand.executeIrsBrokerCommand(GetImagesListVDSCommand.java:23) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:159) [vdsbroker.jar:]
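
Reading the trace: Guid hands the string to java.util.UUID.fromString, which appears to parse the ID one dash-separated group at a time as hexadecimal, so the first group "_remove_me_a9187951" is what ends up in the exception message. The Python sketch below is only an analogue of that behavior, not the engine code. The full directory name is hypothetical: only the first group comes from the log, and the remaining groups are made up for illustration.

```python
import uuid

REMOVED_PREFIX = "_remove_me_"  # prefix seen in the engine.log message

# Hypothetical name: only "a9187951" is taken from the log; the other
# groups are placeholders invented for this example.
name = REMOVED_PREFIX + "a9187951-0000-4000-8000-000000000000"


def first_group(image_id):
    """Return the text before the first '-', i.e. the first group a
    per-group parser would try to read as hexadecimal."""
    return image_id.split("-", 1)[0]


def is_hex(text):
    """True if text parses as a base-16 integer."""
    try:
        int(text, 16)
        return True
    except ValueError:
        return False


def is_valid_uuid(text):
    """True if text is accepted by Python's uuid.UUID constructor."""
    try:
        uuid.UUID(text)
        return True
    except ValueError:
        return False


print(first_group(name))                         # the group named in the exception
print(is_hex(first_group(name)))                 # prefixed group is not hex
print(is_valid_uuid(name))                       # prefixed name is not a UUID
print(is_valid_uuid(name[len(REMOVED_PREFIX):])) # the bare ID is
```

In other words, any listing that still contains a prefixed entry will fail as soon as the caller tries to turn every entry into a UUID.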



============================

Webadmin:


Operation Canceled

Error while executing action: 

slow:
General command validation failure.

============================

Version-Release number of selected component (if applicable):
ovirt-engine-4.0.0-0.0.master.20160406161747.gita4ecba2.el7.centos.noarch
vdsm-4.17.999-724.gitb8cb30a.el7.centos.noarch

How reproducible:
For the mentioned scenario (I think it depends on timing)

Steps to Reproduce:
1. Create an NFS domain residing on a server that simulates slow file deletion using [1] (this can also be achieved by modifying the vdsm code so that file deletion is slower)
2. Create a VM with an attached disk residing on the slowfs domain
3. Export the VM to an export domain
4. Change the deletion delay on the storage server to 10 seconds (unlink = 10)
5. Remove the VM with the attached disk and immediately try to import the VM with the disk (the same image ID) to the same data domain


Actual results:
Import fails with the mentioned exception and error message.

Expected results:
Import should succeed

Additional info:
[1] https://github.com/nirs/slowfs/blob/master/README.md

Comment 1 Tal Nisan 2016-05-01 14:22:30 UTC
This issue should be fixed in VDSM: getImagesList should not return images that are going to be deleted.
Setting the target to 4.1, as this issue is a corner case in a slow storage environment, and the operation fails as it should, just not gracefully.
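
A minimal sketch of the filtering idea, assuming that images pending deletion can be recognized by the "_remove_me_" prefix seen in the log. The function name, the constant, and the sample IDs are hypothetical and are not taken from the VDSM source. The patches linked on this bug are against ovirt-engine, so this is not the merged fix.

```python
# Assumed marker for images awaiting deletion (taken from the log message).
REMOVED_IMAGE_PREFIX = "_remove_me_"


def visible_images(dir_names):
    """Return only the entries that are not marked for deletion.

    dir_names is any iterable of image directory names as they would
    appear in a storage domain listing.
    """
    return [n for n in dir_names if not n.startswith(REMOVED_IMAGE_PREFIX)]


# Hypothetical listing: one live image and one still waiting for a slow delete.
listing = [
    "11111111-2222-4333-8444-555555555555",
    "_remove_me_a9187951-0000-4000-8000-000000000000",
]
print(visible_images(listing))
```

With such a filter in place the caller never sees the prefixed entry, so converting each returned ID to a UUID would no longer hit the malformed string.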

Comment 2 Elad 2017-02-19 14:12:53 UTC
Tested according to the steps in the description.
VM import succeeded. 

Used:
vdsm-4.19.6-1.el7ev.x86_64
rhevm-4.1.1.2-0.1.el7.noarch

Slowfs: 
https://github.com/nirs/slowfs/blob/master/README.md

