Bug 2233811 - SSP - pods randomly fail with segmentation violation in client-go/discovery/aggregated_discovery.go
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: SSP
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: Karel Šimon
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-23 14:20 UTC by vsibirsk
Modified: 2023-11-08 14:06 UTC (History)

Fixed In Version: kubevirt-ssp-operator-rhel9-container-v4.14.0-107
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:06:16 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                              Private  Priority  Status  Summary                     Last Updated
Github                  kubevirt ssp-operator pull 676  0        None      open    fix: bump k8s dependencies  2023-08-31 08:20:56 UTC
Red Hat Issue Tracker   CNV-32375                       0        None      None    None                        2023-09-06 17:31:28 UTC
Red Hat Product Errata  RHSA-2023:6817                  0        None      None    None                        2023-11-08 14:06:27 UTC

Description vsibirsk 2023-08-23 14:20:34 UTC
Description of problem:
ssp-operator pods sometimes end up in the CrashLoopBackOff state.
VIRT pods (virt-controller and/or virt-operator) are also affected.

Version-Release number of selected component (if applicable):
4.14

How reproducible:
Sporadic. We could not find the exact trigger, and not every deployed cluster is affected.

Steps to Reproduce:
1. Deploy a 4.14 CNV cluster
2. After some time, CNV pods start to fail

Actual results:
openshift-cnv pods are in CrashLoopBackOff state

Expected results:
All CNV pods are in Running state

Additional info:
pods -A | grep -v Running | grep -v Completed
NAMESPACE                                          NAME                                                              READY   STATUS             RESTARTS          AGE
openshift-cnv                                      cdi-deployment-844845fd6d-9pkjr                                   0/1     CrashLoopBackOff   186 (4m24s ago)   16h
openshift-cnv                                      cdi-operator-6499bcc5b7-xxtzc                                     0/1     CrashLoopBackOff   187 (4m32s ago)   16h
openshift-cnv                                      hostpath-provisioner-operator-f4dc64d86-vhvlf                     0/1     CrashLoopBackOff   187 (5m ago)      16h
openshift-cnv                                      ssp-operator-644c98fdc9-cjncw                                     0/1     CrashLoopBackOff   188 (4m49s ago)   16h
openshift-cnv                                      virt-controller-b5b88dd59-prtk7                                   0/1     CrashLoopBackOff   186 (84s ago)     16h
openshift-cnv                                      virt-controller-b5b88dd59-wkpvz                                   0/1     CrashLoopBackOff   187 (81s ago)     16h
openshift-cnv                                      virt-operator-5cb848c66c-2mzmk                                    0/1     CrashLoopBackOff   181 (2m16s ago)   16h
openshift-cnv                                      virt-operator-5cb848c66c-cnkvh                                    0/1     CrashLoopBackOff   182 (2m44s ago)   16h
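
The filter above can also be applied to a saved listing. The following is a minimal Python sketch, assuming the listing has the same whitespace-separated columns as shown here (NAMESPACE, NAME, READY, STATUS, ...). The function name non_running_pods is hypothetical and not part of any CNV or OpenShift tooling. Unlike grep -v, it compares only the STATUS column, so a pod whose name happens to contain "Running" is not dropped.

```python
def non_running_pods(listing: str) -> list[tuple[str, str, str]]:
    """Return (namespace, name, status) for pods that are neither Running nor Completed.

    Hypothetical helper: assumes each row has at least the four
    whitespace-separated columns NAMESPACE, NAME, READY, STATUS.
    """
    results = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if status not in ("Running", "Completed"):
            results.append((namespace, name, status))
    return results
```

Calling non_running_pods on the text of the listing returns one tuple per failing pod, which makes it easy to count how many openshift-cnv pods are in CrashLoopBackOff on each cluster.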

Comment 1 Dominik Holler 2023-08-30 12:05:16 UTC
https://github.com/kubevirt/managed-tenant-quota/pull/11/ might be helpful to fix this bug

Comment 2 zhe peng 2023-09-14 07:27:40 UTC
Verified with build CNV-v4.14.0.rhel9-1914.

1. Deploy three CNV 4.14 clusters
2. Check the pods after the clusters have been running for a long time

ssp-operator-68fd8b6b98-2kzwp                                     1/1     Running   1 (20h ago)    20h
virt-api-7c84cd7ffd-c6zxx                                         1/1     Running   0              20h
virt-api-7c84cd7ffd-twfmk                                         1/1     Running   0              20h
virt-controller-674586dbb8-8jrnj                                  1/1     Running   0              3h37m
virt-controller-674586dbb8-vls98                                  1/1     Running   0              20h
virt-exportproxy-856598c54b-259wd                                 1/1     Running   0              20h
virt-exportproxy-856598c54b-9bmz2                                 1/1     Running   0              20h
virt-handler-97cvb                                                1/1     Running   0              20h
virt-handler-dzspl                                                1/1     Running   1 (164m ago)   20h
virt-handler-jd7fq                                                1/1     Running   0              20h
virt-operator-86b97dd8bc-5gvfz                                    1/1     Running   0              20h
virt-operator-86b97dd8bc-gf8jx                                    1/1     Running   0              20h


No pods in the CrashLoopBackOff state were found.

3. Check all three clusters: no CNV pods are in CrashLoopBackOff

Moving to verified.

Comment 4 errata-xmlrpc 2023-11-08 14:06:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

