Bug 1414484

Summary: image is stuck with "Finalizing" status after a download of 1gb chunk (= disk size) [works with 1mb chunk]
Product: [oVirt] ovirt-engine Reporter: Natalie Gavrielov <ngavrilo>
Component: GeneralAssignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE QA Contact: Natalie Gavrielov <ngavrilo>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.1.0CC: amureini, bugs, lveyde, ngavrilo, tnisan, ylavi
Target Milestone: ovirt-4.1.2Flags: rule-engine: ovirt-4.1+
rule-engine: ovirt-4.2+
ykaul: blocker-
Target Release: 4.1.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-05-23 08:18:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1447023    
Bug Blocks:    
Attachments:
Description Flags
snapshot, logs: vdsm. daemon, proxy, engine
none
download script
none
proxy.log
none
logs, and content of image_transfers
none
vdsm logs none

Description Natalie Gavrielov 2017-01-18 16:00:15 UTC
Created attachment 1242206 [details]
snapshot, logs: vdsm. daemon, proxy, engine

Description of problem:

Downloaded a 1gb disk using python-sdk, chunk size used was 1gb.
Downloaded disk is stuck in "Finalizing" status since than (usually this stage take very short time) 


Version-Release number of selected component (if applicable):
vdsm-4.19.1-1.el7ev.x86_64
ovirt-engine-4.1.0-0.4.master.20170111000229.git9ce0636.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
Download a 1GB size image from ovirt, using python-sdk

Actual results:
After download finishes, the GUI shows finalizing status for the downloaded disk - and it's stuck like this, forever.

Expected results:
For the GUI to display status OK, once the download is done.

Additional info:
Same test using the same script but with chunk size 1mb works fine (when download finishes the status of the disk to OK).
Disk id: e2bc9560-b998-4bc3-aca5-af5ba4763eff

Comment 1 Yaniv Kaul 2017-01-19 08:51:29 UTC
I have no idea why the blocker flag was added. A simple workaround of using a fairly reasonable 1mb chunk seems to work.

Comment 2 Daniel Erez 2017-01-29 08:29:08 UTC
Hi Natalie,

A few questions:

* Can you please attach the upload script you've used?

* The proxy log seems truncated, can you please attach it again?

* Note that we don't support a huge chunk size, as it depends on the client available memory. The maximal chunk size should be ~10MB. We could add a comment on that in the sdk example [1]. Can you please try to reproduce the described scenario on another machine with sufficient available memory?

[1] https://gerrit.ovirt.org/gitweb?p=ovirt-engine-sdk.git;a=blob_plain;f=sdk/examples/upload_disk.py

Comment 3 Natalie Gavrielov 2017-01-31 15:50:35 UTC
Created attachment 1246356 [details]
download script

just change chunk size

Comment 4 Natalie Gavrielov 2017-01-31 15:52:47 UTC
Created attachment 1246357 [details]
proxy.log

Comment 5 Natalie Gavrielov 2017-01-31 16:08:51 UTC
(In reply to Daniel Erez from comment #2)
> * Note that we don't support a huge chunk size, as it depends on the client
> available memory. The maximal chunk size should be ~10MB. We could add a
> comment on that in the sdk example [1]. Can you please try to reproduce the
> described scenario on another machine with sufficient available memory?
> 
What do you mean by sufficient available memory?

The client I'm using has 24gb virtual memory:
# vmstat -s -S M
        23936 M total memory
          523 M used memory
         9934 M active memory
         6242 M inactive memory
         6688 M free memory
            1 M buffer memory
        16722 M swap cache
        12096 M total swap
            0 M used swap
        12096 M free swap
The other 2 hosts have 16gb virtual memory.

I don't see a point in trying out other machines.

Comment 6 Daniel Erez 2017-01-31 17:41:28 UTC
@Natalie - according to the logs [1], the transfer was successful, but the ticket wasn't found. Can you please attach to content of image_transfers table after the scenario is reproduced? Another thing, I couldn't reproduce it with 1GB chunck. Is it reproduced in your environment with other chunk sizes as well?

[1]
d388-4d52-b8b1-f6e59d79a486] Transfer was successful. Download disk '1gb-disk-raw' (id 'e2bc9560-b998-4bc3-aca5-af5ba4763eff')
2017-01-18 16:41:41,516+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Finalizing successful transfer for Download disk 'test-upload-16-01-23-50-compressed-v2' (id 'c149a2db-8d3a-4215-bfdd-675252a27ce9')
2017-01-18 16:41:41,517+02 WARN  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Failed to stop image transfer session. Ticket does not exist for image 'c149a2db-8d3a-4215-bfdd-675252a27ce9'

Comment 7 Natalie Gavrielov 2017-02-01 12:38:17 UTC
Created attachment 1246641 [details]
logs, and content of image_transfers

(In reply to Daniel Erez from comment #6)
> @Natalie - according to the logs [1], the transfer was successful, but the
> ticket wasn't found. Can you please attach to content of image_transfers
> table after the scenario is reproduced? Another thing, I couldn't reproduce
> it with 1GB chunck. Is it reproduced in your environment with other chunk
> sizes as well?
> 
> [1]
> d388-4d52-b8b1-f6e59d79a486] Transfer was successful. Download disk
> '1gb-disk-raw' (id 'e2bc9560-b998-4bc3-aca5-af5ba4763eff')
> 2017-01-18 16:41:41,516+02 INFO 
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Finalizing
> successful transfer for Download disk
> 'test-upload-16-01-23-50-compressed-v2' (id
> 'c149a2db-8d3a-4215-bfdd-675252a27ce9')
> 2017-01-18 16:41:41,517+02 WARN 
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Failed to
> stop image transfer session. Ticket does not exist for image
> 'c149a2db-8d3a-4215-bfdd-675252a27ce9'

I tried it again with 1, 2, 4, 8 chunks for a 1gb disk.

Attached logs and the content of the image_transfers.

Comment 8 Daniel Erez 2017-02-01 13:30:26 UTC
(In reply to Natalie Gavrielov from comment #7)
> Created attachment 1246641 [details]
> logs, and content of image_transfers
> 
> (In reply to Daniel Erez from comment #6)
> > @Natalie - according to the logs [1], the transfer was successful, but the
> > ticket wasn't found. Can you please attach to content of image_transfers
> > table after the scenario is reproduced? Another thing, I couldn't reproduce
> > it with 1GB chunck. Is it reproduced in your environment with other chunk
> > sizes as well?
> > 
> > [1]
> > d388-4d52-b8b1-f6e59d79a486] Transfer was successful. Download disk
> > '1gb-disk-raw' (id 'e2bc9560-b998-4bc3-aca5-af5ba4763eff')
> > 2017-01-18 16:41:41,516+02 INFO 
> > [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> > (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Finalizing
> > successful transfer for Download disk
> > 'test-upload-16-01-23-50-compressed-v2' (id
> > 'c149a2db-8d3a-4215-bfdd-675252a27ce9')
> > 2017-01-18 16:41:41,517+02 WARN 
> > [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> > (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Failed to
> > stop image transfer session. Ticket does not exist for image
> > 'c149a2db-8d3a-4215-bfdd-675252a27ce9'
> 
> I tried it again with 1, 2, 4, 8 chunks for a 1gb disk.
> 
> Attached logs and the content of the image_transfers.

Thanks!
So what are the results? 
With which chunk sizes it doesn't work as expected?

Comment 9 Natalie Gavrielov 2017-02-02 13:22:13 UTC
(In reply to Daniel Erez from comment #8)
> (In reply to Natalie Gavrielov from comment #7)
> > Created attachment 1246641 [details]
> > logs, and content of image_transfers
> > 
> > (In reply to Daniel Erez from comment #6)
> > > @Natalie - according to the logs [1], the transfer was successful, but the
> > > ticket wasn't found. Can you please attach to content of image_transfers
> > > table after the scenario is reproduced? Another thing, I couldn't reproduce
> > > it with 1GB chunck. Is it reproduced in your environment with other chunk
> > > sizes as well?
> > > 
> > > [1]
> > > d388-4d52-b8b1-f6e59d79a486] Transfer was successful. Download disk
> > > '1gb-disk-raw' (id 'e2bc9560-b998-4bc3-aca5-af5ba4763eff')
> > > 2017-01-18 16:41:41,516+02 INFO 
> > > [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> > > (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Finalizing
> > > successful transfer for Download disk
> > > 'test-upload-16-01-23-50-compressed-v2' (id
> > > 'c149a2db-8d3a-4215-bfdd-675252a27ce9')
> > > 2017-01-18 16:41:41,517+02 WARN 
> > > [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> > > (DefaultQuartzScheduler1) [db4e3eb5-16ca-44ba-acef-0c28f5dd99b7] Failed to
> > > stop image transfer session. Ticket does not exist for image
> > > 'c149a2db-8d3a-4215-bfdd-675252a27ce9'
> > 
> > I tried it again with 1, 2, 4, 8 chunks for a 1gb disk.
> > 
> > Attached logs and the content of the image_transfers.
> 
> Thanks!
> So what are the results? 
> With which chunk sizes it doesn't work as expected?

All failed - disk is stuck in "finalizing" state (after finishing transfer), chunk sizes: 512MB, 256MB, 128MB.

Comment 10 Daniel Erez 2017-02-05 09:59:28 UTC
@Natalie - seems that a similar issue as bug 1417903 has occurred (an exception in prepare image) [1]. Can you please attach the previous vdsm logs (from around 2017-01-31 11:53:44)?

[1]
2017-01-31 11:53:44,072+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-6) [76a5c223-b1cf-4225-985a-38b50b9b69d6] Failed to prepare image for transfer session: {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)
	at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]

Comment 11 Natalie Gavrielov 2017-02-05 14:08:46 UTC
Created attachment 1247818 [details]
vdsm logs

(In reply to Daniel Erez from comment #10)
> @Natalie - seems that a similar issue as bug 1417903 has occurred (an
> exception in prepare image) [1]. Can you please attach the previous vdsm
> logs (from around 2017-01-31 11:53:44)?

Attached

Comment 12 Natalie Gavrielov 2017-05-14 16:33:00 UTC
Verified using builds:
rhevm-4.1.2.2-0.1.el7.noarch
vdsm-4.19.14-1.el7ev.x86_64
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-imageio-proxy-1.0.0-0.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch