Bug 1918503 - Controller will crash when MigMigration is deleted mid DVM reconcile
Summary: Controller will crash when MigMigration is deleted mid DVM reconcile
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: Controller
Version: 1.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 1.4.0
Assignee: Dylan Murray
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-20 21:38 UTC by Erik Nelson
Modified: 2021-02-11 12:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-11 12:55:27 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:5329 0 None None None 2021-02-11 12:55:45 UTC

Description Erik Nelson 2021-01-20 21:38:32 UTC
Description of problem:
A panic can be caused by force-deleting a migmigration object while its underlying DVM resource is mid-reconcile.

Version-Release number of selected component (if applicable):
1.4.0

How reproducible:
Fairly consistently, although this is a race condition

Steps to Reproduce:
1. Run a migmigration. While the migmigration object is in the "WaitForDVMToComplete" task and the DVM is in a quick reconcile state (i.e. "WaitForRsyncClientPodsToComplete"), delete the migmigration object (oc delete migmigrations --all). This will likely trigger a panic:

/usr/lib/golang/src/runtime/asm_amd64.s:1373                                                                       
panic: runtime error: invalid memory address or nil pointer dereference [recovered]  
        panic: runtime error: invalid memory address or nil pointer dereference            
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x19ccbdd]               
goroutine 452 [running]:                                 
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/runtime/runtime.go:58 +0x105
panic(0x1ef5100, 0x39312f0)                              
        /usr/lib/golang/src/runtime/panic.go:969 +0x166
github.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).migrate(0xc0004be2c0, 0xc0010a3400, 0x2, 0x2, 0x0)
        /opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/migrate.go:33 +0x17d
github.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).Reconcile(0xc0004be2c0, 0xc002fdc4c0, 0x13, 0xc001da7260, 0x2a, 0xc001204e00, 0x0, 0x0, 0x0)
        /opt/app-root/src/github.com/konveyor/mig-controller/pkg/controller/directvolumemigration/directvolumemigration_controller.go:141 +0x3e7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000516c80, 0x0)
        /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime.11/pkg/internal/controller/controller.go:215 +0x1d6
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
        /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime.11/pkg/internal/controller/controller.go:158 +0x36
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0004b7230)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:133 +0x5f
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0004b7230, 0x3b9aca00, 0x0, 0x1, 0xc000c58060)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:134 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0004b7230, 0x3b9aca00, 0xc000c58060)
        /opt/app-root/src/go/pkg/mod/k8s.io/apimachinery.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime.11/pkg/internal/controller/controller.go:157 +0x2fd

Comment 1 Dylan Murray 2021-01-20 21:42:00 UTC
https://github.com/konveyor/mig-controller/pull/903

Comment 5 Sergio 2021-01-25 16:04:06 UTC
Verified using MTC 1.4.0 OCP3.11 AWS -> OCP4.5 AWS (AWS S3)

openshift-migration-rhel7-operator@sha256:79f524931e7188bfbfddf1e3d23f491b627d691ef7849a42432c7aec2d5f8a54
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: cdf1bd56e353f076693cb7373c0a876be8984593d664ee0d7e1aeae7a3c54c1f


When the MigMigration resource is deleted, the controller no longer crashes. Instead, it logs this error message:

{"level":"info","ts":1611590310.5926704,"logger":"direct|2tqcl","msg":"","direct":"openshift-migration/e3518010-5f25-11eb-b0ca-a524f44d2dff-8gndw","error":"did not find expected owning migmigration object for dvm","stacktrace":"\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).migrate()\n\t/remote-source/app/pkg/controller/directvolumemigration/migrate.go:22\ngithub.com/konveyor/mig-controller/pkg/controller/directvolumemigration.(*ReconcileDirectVolumeMigration).Reconcile()\n\t/remote-source/app/pkg/controller/directvolumemigration/directvolumemigration_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1()\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil()\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until()\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit()\n\t/opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/runtime/asm_amd64.s:1374"}

And the controller keeps working normally.


Moved to VERIFIED status.

Comment 7 errata-xmlrpc 2021-02-11 12:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329

