Bug 1945522 - [VM import from RHV to CNV] Disk lock after importer failure prevents importer retry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.6.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Matthew Arnold
QA Contact: Amos Mastbaum
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-01 08:01 UTC by Ilanit Stein
Modified: 2021-07-27 14:30 UTC (History)

Fixed In Version: virt-cdi-importer v4.8.0-16
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 14:29:42 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt containerized-data-importer pull 1760 0 None closed Unlock ImageIO disks after importer failure. 2021-04-23 00:31:25 UTC
Red Hat Product Errata RHSA-2021:2920 0 None None None 2021-07-27 14:30:51 UTC

Description Ilanit Stein 2021-04-01 08:01:26 UTC
Description of problem:
When the CDI importer fails, as in bug 1945121, the disk on the source RHV provider stays locked. When the importer then starts a retry, it fails on the disk lock, which prevents it from actually retrying the image import (copy).

Version-Release number of selected component (if applicable):
CNV-2.6.1

Comment 1 Adam Litke 2021-04-07 19:32:14 UTC
Fabien, could you help me assign this to the appropriate engineer on your team so it can be fixed in CDI for 4.8?

Comment 2 Fabien Dupont 2021-04-07 20:49:27 UTC
I've assigned it to Matthew since he already fixed another lock case.

Comment 3 Adam Litke 2021-04-08 19:14:53 UTC
Matthew, did you already fix this case or is this a unique one?

Comment 4 Matthew Arnold 2021-04-08 20:28:29 UTC
No, I did not fix this case already. I fixed a similar issue in bug 1924560, where ImageIO disks were locked after a clean importer shutdown. This variant was caused by an importer error, and I'm sure there are more potential cases just like it.
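
The two cases differ only in how the importer exits: a clean shutdown in bug 1924560, an error exit here. A minimal sketch of the general pattern follows, written in Python for brevity (CDI itself is written in Go). `FakeDisk`, `DiskLockedError`, `run_import`, and the `release_on_error` switch are hypothetical names for illustration, not CDI or ImageIO APIs, and this is not the code from the attached PR. It only shows why the release step has to run on the error path as well.

```python
class DiskLockedError(Exception):
    """Raised when a transfer is requested on a disk that is still locked."""


class FakeDisk:
    """Hypothetical stand-in for a source disk that is locked during a transfer."""

    def __init__(self):
        self.locked = False

    def start_transfer(self):
        if self.locked:
            raise DiskLockedError("disk is locked by an earlier transfer")
        self.locked = True

    def finish_transfer(self):
        self.locked = False


def run_import(disk, fail=False, release_on_error=True):
    """Simulate one importer attempt against `disk`.

    With release_on_error=False the lock is released only after a clean
    run, which reproduces the behaviour described in this bug.
    """
    disk.start_transfer()
    succeeded = False
    try:
        if fail:
            raise RuntimeError("simulated importer failure")
        succeeded = True
    finally:
        # Always release after success; after a failure, release only
        # when the error-path cleanup is enabled.
        if succeeded or release_on_error:
            disk.finish_transfer()


if __name__ == "__main__":
    disk = FakeDisk()
    try:
        run_import(disk, fail=True)
    except RuntimeError:
        pass
    run_import(disk)  # the retry is not blocked by a leftover lock
```

Passing `release_on_error=False` to the failing attempt leaves `disk.locked` set, so the retry raises `DiskLockedError` instead of importing.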

Comment 5 Adam Litke 2021-05-12 18:19:06 UTC
Matthew, can this bug be moved to MODIFIED since the attached PR is merged?

Comment 6 Matthew Arnold 2021-05-12 18:44:58 UTC
Yes, I will move it over.

Comment 8 Ilanit Stein 2021-06-15 13:50:37 UTC
@Matthew,

Can you please suggest verification steps, given that bug 1945121 is already fixed?

Comment 9 Matthew Arnold 2021-06-22 14:02:20 UTC
I have had luck triggering a failure in the same code path by removing the importer's copy of qemu-img. If you start an import from RHV, you can use oc exec to open a bash shell in the importer pod and delete /bin/qemu-img, or move it somewhere the importer program can't find it. You should get a failure like this:

    0622 13:18:42.213320       1 data-processor.go:232] , Couldn't start qemu-img: exec: "qemu-img": executable file not found in $PATH
    kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).Info
            pkg/image/qemu.go:190
    kubevirt.io/containerized-data-importer/pkg/importer.ResizeImage
            pkg/importer/data-processor.go:304
    kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).resize
            pkg/importer/data-processor.go:272
    kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
            pkg/importer/data-processor.go:224
    kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
            pkg/importer/data-processor.go:169
    main.main
            cmd/cdi-importer/importer.go:189
    runtime.main
            GOROOT/src/runtime/proc.go:203
    runtime.goexit
            GOROOT/src/runtime/asm_amd64.s:1373
    Resize of image failed


...but the image should not stay locked in RHV.
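
The step above can also be run non-interactively. The following is a rough sketch only: the pod name, the namespace, and the `/tmp/qemu-img.bak` backup path are assumptions, as is the premise that the container user is allowed to move the binary. The helper names are hypothetical. It has to be run while the import is still copying data, before the importer reaches the resize step.

```python
import subprocess


def hide_qemu_img_command(pod, namespace, backup_path="/tmp/qemu-img.bak"):
    """Return the argv that moves /bin/qemu-img out of the importer's PATH.

    `backup_path` is an assumed writable location inside the container.
    """
    return [
        "oc", "exec", "-n", namespace, pod, "--",
        "mv", "/bin/qemu-img", backup_path,
    ]


def hide_qemu_img(pod, namespace):
    """Run the command; raises CalledProcessError if `oc` exits non-zero."""
    subprocess.run(hide_qemu_img_command(pod, namespace), check=True)


if __name__ == "__main__":
    # Placeholder names; substitute the real importer pod and namespace.
    print(" ".join(hide_qemu_img_command("importer-example", "default")))
```

Building the argument list separately from running it makes it easy to check the exact command before pointing it at a live cluster.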

Comment 12 Amos Mastbaum 2021-07-06 11:38:36 UTC
Versions:
CNV 4.8.0-451 iib 86746
MTV 2.1.0-21 iib 88402
OCP 4.8.0-rc.1


After failing the CDI importer by removing /bin/qemu-img inside the pod,
the import restarted and completed successfully.

Comment 13 Maayan Hadasi 2021-07-06 13:24:15 UTC
Created attachment 1798641
describe importer-pod

Attached is the 'oc describe pod importer-pod' command output, which shows that the importer pod is restarted on failure.

$ oc describe pod importer-mguetta-bug-ver-147c518e-ce0a-44bf-bb82-8672e52906e7
...
Status:       Running
IP:           10.128.2.100
IPs:
  IP:           10.128.2.100
Controlled By:  PersistentVolumeClaim/mguetta-bug-ver-147c518e-ce0a-44bf-bb82-8672e52906e7
Containers:
  importer:
...
    State:          Running
      Started:      Tue, 06 Jul 2021 08:24:02 -0400
    Last State:     Terminated
      Reason:       Error
      Message:      Unable to process data: , Couldn't start qemu-img: exec: "qemu-img": executable file not found in $PATH
      Exit Code:    1
      Started:      Tue, 06 Jul 2021 08:22:25 -0400
      Finished:     Tue, 06 Jul 2021 08:24:01 -0400
    Ready:          True
    Restart Count:  1
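
The same check can be scripted against the JSON form of the pod instead of reading the describe output by eye. This is a sketch under the assumption that the pod is fetched with `oc get pod <name> -o json`; `importer_restart_summary` is a hypothetical helper, while the field names it reads (`containerStatuses`, `restartCount`, `state`, `lastState.terminated`) come from the Kubernetes Pod status schema.

```python
import json
import sys


def importer_restart_summary(pod, container="importer"):
    """Return restart count and last termination details for one container.

    `pod` is the parsed JSON of a Pod object. Returns None if the
    container has no status entry yet.
    """
    for status in pod.get("status", {}).get("containerStatuses", []):
        if status.get("name") != container:
            continue
        terminated = status.get("lastState", {}).get("terminated") or {}
        return {
            "restart_count": status.get("restartCount", 0),
            "running": "running" in status.get("state", {}),
            "last_exit_code": terminated.get("exitCode"),
            "last_reason": terminated.get("reason"),
            "last_message": terminated.get("message"),
        }
    return None


if __name__ == "__main__":
    # Example: oc get pod <importer-pod> -o json | python3 this_script.py
    print(importer_restart_summary(json.load(sys.stdin)))
```

For the pod shown above, the expected result would be a restart count of 1, a running current state, and a last termination with reason Error and exit code 1.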

Comment 16 errata-xmlrpc 2021-07-27 14:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920

