Bug 1918503

Summary: Controller will crash when MigMigration is deleted mid DVM reconcile
Product: Migration Toolkit for Containers
Reporter: Erik Nelson <ernelson>
Component: Controller
Assignee: Dylan Murray <dymurray>
Status: CLOSED ERRATA
QA Contact: Xin jiang <xjiang>
Severity: unspecified
Docs Contact: Avital Pinnick <apinnick>
Priority: unspecified
Version: 1.4.0
CC: ernelson, rjohnson, sregidor
Target Milestone: ---
Target Release: 1.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-11 12:55:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Erik Nelson 2021-01-20 21:38:32 UTC
Description of problem:
A panic can be caused by force-deleting a migmigration object while its underlying DVM resource is mid-reconcile.

Version-Release number of selected component (if applicable):
1.4.0

How reproducible:
Fairly consistently, although this is a race condition

Steps to Reproduce:
1. Run a migmigration. While the migmigration object is in the "WaitForDVMToComplete" task and the DVM is in a quick reconcile state (for example, "WaitForRsyncClientPodsToComplete"), delete the migmigration object (oc delete migmigrations --all). This will likely trigger a panic:

/usr/lib/golang/src/runtime/asm_amd64.s:1373                                                                       
panic: runtime error: invalid memory address or nil pointer dereference [recovered]  
        panic: runtime error: invalid memory address or nil pointer dereference            
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x19ccbdd]               
goroutine 452 [running]:                                 
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/runtime/runtime.go:58 +0x105
panic(0x1ef5100, 0x39312f0)                              
        /usr/lib/golang/src/runtime/panic.go:969 +0x166
github.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).migrate(0xc0004be2c0, 0xc0010a3400, 0x2, 0x2, 0x0)
        /opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/migrate.go:33 +0x17d
github.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).Reconcile(0xc0004be2c0, 0xc002fdc4c0, 0x13, 0xc001da7260, 0x2a, 0xc001204e00, 0x0, 0x0, 0x0)
        /opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/directvolumemigration_controller.go:141 +0x3e7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000516c80, 0x0)
        /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime.11/pkg/internal/controller/controller.go:215 +0x1d6
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
        /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime.11/pkg/internal/controller/controller.go:158 +0x36
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0004b7230)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:133 +0x5f
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0004b7230, 0x3b9aca00, 0x0, 0x1, 0xc000c58060)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:134 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0004b7230, 0x3b9aca00, 0xc000c58060)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime.11/pkg/internal/controller/controller.go:157 +0x2fd

Comment 1 Dylan Murray 2021-01-20 21:42:00 UTC
https://github.com/konveyor/mig-controller/pull/903

Comment 5 Sergio 2021-01-25 16:04:06 UTC
Verified using MTC 1.4.0 OCP3.11 AWS -> OCP4.5 AWS (AWS S3)

openshift-migration-rhel7-operator@sha256:79f524931e7188bfbfddf1e3d23f491b627d691ef7849a42432c7aec2d5f8a54
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: cdf1bd56e353f076693cb7373c0a876be8984593d664ee0d7e1aeae7a3c54c1f


When the MigMigration resource is deleted, the controller no longer crashes. Instead, it logs this error message:

{"level":"info","ts":1611590310.5926704,"logger":"direct|2tqcl","msg":"","direct":"openshift-migration/e3518010-5f25-11eb-b0ca-a524f44d2dff-8gndw","error":"did not find expected owning migmigration object for dvm","stacktrace":"\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).migrate()\n\t/remote-source/app/pkg/controller/directvolumemigration/migrate.go:22\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).Reconcile()\n\t/remote-source/app/pkg/controller/directvolumemigration/directvolumemigration_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1()\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil()\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until()\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit()\n\t/opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/runtime/asm_amd64.s:1374"}

And the controller keeps working normally.


Moved to VERIFIED status.

Comment 7 errata-xmlrpc 2021-02-11 12:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329