
Bug 1791323

Summary: Velero pod crashes if ResticRepository.Status.LastMaintenanceTime is nil
Product: OpenShift Container Platform
Reporter: Scott Seago <sseago>
Component: Migration Tooling
Assignee: Scott Seago <sseago>
Status: CLOSED ERRATA
QA Contact: Xin jiang <xjiang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.2.z
CC: ernelson, sregidor
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-06 20:21:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Scott Seago 2020-01-15 14:22:18 UTC
Description of problem:
Velero pod crashes if ResticRepository.Status.LastMaintenanceTime is nil

The migration logs a ReconcileFailed condition with 'Reconcile failed: [unable to upgrade connection: container not found ("velero")]. See controller logs for details.'
The velero pod is crashing with this log message:
E0114 22:35:56.492712       1 runtime.go:73] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)



Version-Release number of selected component (if applicable): 4.2.z


How reproducible:
It's unclear how to reproduce this in general. If the ResticRepository happens to have a missing Status.LastMaintenanceTime, the bug is completely reproducible. I'm not sure what conditions result in this initial state, though. It may be reproducible by removing this field from the Status.
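
As a rough illustration of why a missing field leads to a nil value, the sketch below decodes a status object with and without the timestamp. The status struct and decodeStatus helper are hypothetical and use only Go's standard encoding/json; the "phase" key and the JSON tag name are assumptions made for the example, not taken from the Velero API definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// status is a hypothetical, simplified status object. The pointer field
// stays nil when the key is absent from the decoded JSON.
type status struct {
	LastMaintenanceTime *time.Time `json:"lastMaintenanceTime,omitempty"`
}

// decodeStatus unmarshals a raw JSON status document.
func decodeStatus(raw string) (status, error) {
	var s status
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	missing, err := decodeStatus(`{"phase":"Ready"}`)
	if err != nil {
		panic(err)
	}
	present, err := decodeStatus(`{"phase":"Ready","lastMaintenanceTime":"2020-01-14T22:35:56Z"}`)
	if err != nil {
		panic(err)
	}

	// The absent key decodes to a nil pointer; the present key does not.
	fmt.Println(missing.LastMaintenanceTime == nil)
	fmt.Println(present.LastMaintenanceTime == nil)
}
```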

Steps to Reproduce:
1. Start with a migplan that will migrate PVs with restic and whose associated ResticRepository is missing Status.LastMaintenanceTime.
2. Start a migration

Actual results:
The migration logs a ReconcileFailed condition with 'Reconcile failed: [unable to upgrade connection: container not found ("velero")]. See controller logs for details.'
The velero pod crashes with this log message:
E0114 22:35:56.492712       1 runtime.go:73] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)


Expected results:
Migration succeeds

Additional info:

Comment 1 Scott Seago 2020-01-15 14:26:12 UTC
Upstream PR is here: https://github.com/vmware-tanzu/velero/pull/2200
The corresponding PR in our repo is forthcoming.

Comment 2 Scott Seago 2020-01-15 15:07:33 UTC
fusor PR: https://github.com/fusor/velero/pull/50

Comment 4 Sergio 2020-01-22 12:10:12 UTC
Verified in CAM 1.1 stage.

    imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-rhel8@sha256:89e56f7f08802e92a763ca3c7336209e58849b9ac9ea90ddc76d9b94d981b8b9
    imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-plugin-rhel8@sha256:9c6eceba0c422b9f375c3ab785ff392093493ce33def7c761d7cedc51cde775d
    imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-aws-rhel8@sha256:5235eeeee330165eef77ac8d823eed384c9108884f6be49c9ab47944051af91e
    imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-gcp-rhel8@sha256:789b12ff351d3edde735b9f5eebe494a8ac5a94604b419dfd84e87d073b04e9e
    imageID: registry.stage.redhat.io/rhcam-1-1/openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256:b98f1c61ba347aaa0c8dac5c34b6be4b8cce20c8ff462f476a3347d767ad0a93


To reproduce the issue, all resticrepository resources were patched to delete the "lastMaintenanceTime" field from their status. A stage migration was then run, followed by a normal migration.

# In openshift-migration namespace
oc get resticrepository -o name | xargs oc patch   --type json -p '[{"op":"remove","path":"/status/lastMaintenanceTime"}]'

Both migrations completed successfully, and the resticrepositories had their lastMaintenanceTime field added back.

Comment 6 errata-xmlrpc 2020-02-06 20:21:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0440