Bug 2050882

Summary: Crio appears to be coredumping in some scenarios
Product: OpenShift Container Platform Reporter: Paul Maidment <pmaidmen>
Component: NodeAssignee: Peter Hunt <pehunt>
Node sub component: CRI-O QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, dwalsh, tsweeney
Version: 4.10   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2052125 (view as bug list) Environment:
Last Closed: 2022-08-10 10:47:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2052125    

Description Paul Maidment 2022-02-04 20:43:13 UTC
Description of problem:

During tests crio appears to be coredumping in some scenarios, this leads to an unavailability of crio.sock on the host which leads to downtime and test failure.

Version-Release number of selected component (if applicable): This appears to be present in 4.10 tests, unsure if it exists elsewhere or not.


How reproducible:
This can be easily seen when examining various Prow jobs, I have provided four examples below with links to the relevant journal entries and core dumps.

==========
Example 1:
==========

--------------
Test run:
--------------
pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi #1484627511491956736

--------------
Prow URL:
--------------
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1287/pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi/1484627511491956736

---------------
Prow artifacts: 
---------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1287/pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi/1484627511491956736/

----------------------------
Reported JUnit error:
----------------------------
	"Managed cluster should report ready nodes the entire duration of the test run"
	fail [github.com/openshift/origin/test/extended/machines/cluster.go:83]: Unexpected error:
	Test fails in this function: https://github.com/openshift/origin/blob/master/test/extended/machines/cluster.go#L64-L85 due to unexpected host unavailability logged in Promethius

--------------
Affected host:
--------------
ip-10-0-173-140.us-west-1.compute.internal	
	
-----------------
Journal location:
-----------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1287/pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi/1484627511491956736/artifacts/e2e-aws-sdn-multi/gather-extra/artifacts/nodes/ip-10-0-173-140.us-west-1.compute.internal/

-------------------
Core dump location:
-------------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1287/pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi/1484627511491956736/artifacts/e2e-aws-sdn-multi/gather-core-dump/artifacts/
	
----------------
Journal extract:
----------------
Jan 21 21:40:31.511763 ip-10-0-173-140 crio[1340]: time="2022-01-21 21:40:31.510013084Z" level=info msg="Checking image status: quay.io/openshift/community-e2e-images:e2e-23-k8s-gcr-io-e2e-test-images-nginx-1-14-2-5IU_bdDKXt6CZCwt" id=7778427c-dcbb-4e8c-804c-9047a9e31e04 name=/runtime.v1.ImageService/ImageStatus
Jan 21 21:40:31.524720 ip-10-0-173-140 crio[1340]: time="2022-01-21 21:40:31.520915355Z" level=info msg="Checking image status: quay.io/openshift/community-e2e-images:e2e-23-k8s-gcr-io-e2e-test-images-nginx-1-14-2-5IU_bdDKXt6CZCwt" id=6dda5b13-4356-42e6-be3d-0faa669637c2 name=/runtime.v1.ImageService/ImageStatus
Jan 21 21:40:31.531953 ip-10-0-173-140 crio[1340]: time="2022-01-21 21:40:31.531807037Z" level=info msg="Created container d3c0463659cb13670d6e953d753e10fdcc65fe7640cf498e9fb6a95632e1f7fb: e2e-projected-8823/downwardapi-volume-4e0039e7-51d2-4b25-bab3-acff0e8d1053/client-container" id=72d8f758-c9ba-474b-8c51-e39795e89261 name=/runtime.v1.RuntimeService/CreateContainer
Jan 21 21:40:31.535824 ip-10-0-173-140 crio[1340]: time="2022-01-21 21:40:31.535537460Z" level=info msg="Starting container: d3c0463659cb13670d6e953d753e10fdcc65fe7640cf498e9fb6a95632e1f7fb" id=4b76768d-c614-4cef-9878-f12df5da2ce4 name=/runtime.v1.RuntimeService/StartContainer
Jan 21 21:40:31.569836 ip-10-0-173-140 hyperkube[1374]: E0121 21:40:31.568883    1374 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"simpletest.deployment-b9656bd8b-hzbxq.16cc67035068e318", GenerateName:"", Namespace:"e2e-gc-5229", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"e2e-gc-5229", Name:"simpletest.deployment-b9656bd8b-hzbxq", UID:"62e4e3db-f171-4dfb-b4fc-9a6990bbc1d7", APIVersion:"v1", ResourceVersion:"121513", FieldPath:"spec.containers{nginx}"}, Reason:"Created", Message:"Created container nginx", Source:v1.EventSource{Component:"kubelet", Host:"ip-10-0-173-140.us-west-1.compute.internal"}, FirstTimestamp:time.Date(2022, time.January, 21, 21, 40, 28, 15002392, time.Local), LastTimestamp:time.Date(2022, time.January, 21, 21, 40, 28, 15002392, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "simpletest.deployment-b9656bd8b-hzbxq.16cc67035068e318" is forbidden: unable to create new content in namespace e2e-gc-5229 because it is being terminated' (will not retry!)
Jan 21 21:40:31.598462 ip-10-0-173-140 systemd[1]: var-lib-containers-storage-overlay-c199b5109d2c7b7f5186bd05a898faaf49736a559e3aa121cbd03f6ed49c511e-merged.mount: Succeeded.
Jan 21 21:40:31.599271 ip-10-0-173-140 systemd[1]: var-lib-containers-storage-overlay-c199b5109d2c7b7f5186bd05a898faaf49736a559e3aa121cbd03f6ed49c511e-merged.mount: Consumed 0 CPU time
Jan 21 21:40:31.602911 ip-10-0-173-140 crio[1340]: time="2022-01-21 21:40:31.601478358Z" level=info msg="Started container" PID=101548 containerID=d3c0463659cb13670d6e953d753e10fdcc65fe7640cf498e9fb6a95632e1f7fb description=e2e-projected-8823/downwardapi-volume-4e0039e7-51d2-4b25-bab3-acff0e8d1053/client-container id=4b76768d-c614-4cef-9878-f12df5da2ce4 name=/runtime.v1.RuntimeService/StartContainer sandboxID=c5f082fe479d195987b6f6fb90be523aea1a54fc9b66ce4fc0653b7a0db1941d
Jan 21 21:40:31.606033 ip-10-0-173-140 systemd[1]: Started crio-conmon-5e42e7d9a13540107e184bbb923ce0f467282c86b7e0878eabfa33acefd42e2a.scope.
Jan 21 21:40:31.620056 ip-10-0-173-140 ovs-vswitchd[1183]: ovs|05444|connmgr|INFO|br0<->unix#14443: 2 flow_mods in the last 0 s (2 deletes)
Jan 21 21:40:31.650085 ip-10-0-173-140 kernel: device veth9d125ef5 entered promiscuous mode
Jan 21 21:40:31.646685 ip-10-0-173-140 ovs-vswitchd[1183]: ovs|05445|bridge|INFO|bridge br0: added interface veth9d125ef5 on port 333
Jan 21 21:40:31.651322 ip-10-0-173-140 NetworkManager[1196]: <info>  [1642801231.6512] manager: (veth9d125ef5): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/686)
Jan 21 21:40:31.653214 ip-10-0-173-140 crio[1340]: time="2022-01-21 21:40:31.652376722Z" level=info msg="Removed container 3d0a7e1ff5d30dc1d5a6d3524781902d57a12368de6bd12819822727981d0c08: e2e-provisioning-5176/pod-subpath-test-preprovisionedpv-bj4m/init-volume-preprovisionedpv-bj4m" id=b580b3cb-fb65-484e-ad2e-d5f32a98d370 name=/runtime.v1.RuntimeService/RemoveContainer
Jan 21 21:40:31.661030 ip-10-0-173-140 systemd[1]: Started libcontainer container 5e42e7d9a13540107e184bbb923ce0f467282c86b7e0878eabfa33acefd42e2a.
Jan 21 21:40:31.674772 ip-10-0-173-140 crio[1340]: panic: runtime error: index out of range [-1]
Jan 21 21:40:31.674772 ip-10-0-173-140 crio[1340]: goroutine 148518 [running]:
Jan 21 21:40:31.674772 ip-10-0-173-140 crio[1340]: panic({0x55c6442c6640, 0xc00402c588})
Jan 21 21:40:31.674772 ip-10-0-173-140 crio[1340]:         /usr/lib/golang/src/runtime/panic.go:1147 +0x3a8 fp=0xc000e25748 sp=0xc000e25688 pc=0x55c6423acfc8
Jan 21 21:40:31.674772 ip-10-0-173-140 crio[1340]: runtime.goPanicIndex(0xffffffffffffffff, 0x2094b)
Jan 21 21:40:31.674772 ip-10-0-173-140 crio[1340]:         /usr/lib/golang/src/runtime/panic.go:90 +0x75 fp=0xc000e25788 sp=0xc000e25748 pc=0x55c6423aa3d5
...
...

	
==========
Example 2:
==========

--------------
Test run:
--------------
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-crun #1485673817597546496

--------------
Prow URL:
--------------
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-crun/1485673817597546496

--------------
Prow artifacts
--------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-crun/1485673817597546496/

----------------------------
Reported JUnit error:
----------------------------
	"Managed cluster should report ready nodes the entire duration of the test run"
	fail [github.com/openshift/origin/test/extended/machines/cluster.go:83]: Unexpected error:
	Test fails in this function: https://github.com/openshift/origin/blob/master/test/extended/machines/cluster.go#L64-L85 due to unexpected host unavailability logged in Promethius

--------------
Affected host:
--------------
ip-10-0-148-41.us-west-2.compute.internal

-----------------
Journal location:
-----------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-crun/1485673817597546496/artifacts/e2e-aws-crun/gather-extra/artifacts/nodes/ip-10-0-148-41.us-west-2.compute.internal/

-------------------
Core dump location:
-------------------
There is a core dump for crio in this location. 

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-crun/1485673817597546496/artifacts/e2e-aws-crun/gather-core-dump/artifacts/

----------------
Journal extract:
----------------
Jan 24 18:54:29.126714 ip-10-0-148-41 crio[1345]: time="2022-01-24 18:54:29.126687657Z" level=info msg="Got pod network &{Name:adopt-release-qcrj9 Namespace:e2e-job-5099 ID:d3ce78f108c298e645da50f217cf7229a2e129ba96d1971f3bc1efd7e0c0bfe8 UID:28b3d4cb-4d07-4286-be0e-4dece966cece NetNS:/var/run/netns/72048597-239c-46be-bb40-de49b58cc7b9 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Jan 24 18:54:29.126842 ip-10-0-148-41 crio[1345]: time="2022-01-24 18:54:29.126825626Z" level=info msg="Deleting pod e2e-job-5099_adopt-release-qcrj9 from CNI network \"multus-cni-network\" (type=multus)"
Jan 24 18:54:29.160217 ip-10-0-148-41 ovs-vswitchd[1185]: ovs|02695|connmgr|INFO|br0<->unix#6647: 2 flow_mods in the last 0 s (2 deletes)
Jan 24 18:54:29.187464 ip-10-0-148-41 ovs-vswitchd[1185]: ovs|02696|connmgr|INFO|br0<->unix#6650: 4 flow_mods in the last 0 s (4 deletes)
Jan 24 18:54:29.214655 ip-10-0-148-41 ovs-vswitchd[1185]: ovs|02697|bridge|INFO|bridge br0: deleted interface vethdbb4b014 on port 219
Jan 24 18:54:29.218893 ip-10-0-148-41 kernel: device vethdbb4b014 left promiscuous mode
Jan 24 18:54:29.233764 ip-10-0-148-41 crio[1345]: 2022-01-24T18:54:29Z [verbose] Del: e2e-job-5099:adopt-release-7jztg:4f9bb107-9e01-4857-8adf-24417dbeecf3:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Jan 24 18:54:29.252038 ip-10-0-148-41 ovs-vswitchd[1185]: ovs|02698|connmgr|INFO|br0<->unix#6653: 2 flow_mods in the last 0 s (2 deletes)
Jan 24 18:54:29.367971 ip-10-0-148-41 crio[1345]: time="2022-01-24 18:54:29.367924488Z" level=info msg="Stopped pod sandbox: 4cec496efce42283ef52aca196d6effa736a4ddb24138a9d93ea6c80d59ecc36" id=59367ce8-d1d8-48c3-87b6-3c5d57b419bd name=/runtime.v1.RuntimeService/StopPodSandbox
Jan 24 18:54:29.369391 ip-10-0-148-41 ovs-vswitchd[1185]: ovs|02699|connmgr|INFO|br0<->unix#6656: 4 flow_mods in the last 0 s (4 deletes)
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]: panic: runtime error: index out of range [-1]
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]: goroutine 109908 [running]:
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]: panic({0x557c153df540, 0xc000f41488})
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]:         /usr/lib/golang/src/runtime/panic.go:1147 +0x3a8 fp=0xc002959748 sp=0xc002959688 pc=0x557c134c6008
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]: runtime.goPanicIndex(0xffffffffffffffff, 0x14046)
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]:         /usr/lib/golang/src/runtime/panic.go:90 +0x75 fp=0xc002959788 sp=0xc002959748 pc=0x557c134c3415
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]: github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1.encodeVarintApi(...)
Jan 24 18:54:29.385163 ip-10-0-148-41 crio[1345]:         /builddir/build/BUILD/cri-o-a97515269235404afae754c65bad272c92508731/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:15630
...
...

==========
Example 3:
==========

--------------
Test run:
--------------
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview #1485804740204826624

--------------
Prow URL:
--------------
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1485804740204826624

--------------
Prow artifacts
--------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1485804740204826624/

----------------------------
Reported JUnit error:
----------------------------
	"Managed cluster should report ready nodes the entire duration of the test run"
	fail [github.com/openshift/origin/test/extended/machines/cluster.go:83]: Unexpected error:
	Test fails in this function: https://github.com/openshift/origin/blob/master/test/extended/machines/cluster.go#L64-L85 due to unexpected host unavailability logged in Promethius

--------------
Affected host:
--------------
ip-10-0-170-132.ec2.internal

-----------------
Journal location:
-----------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1485804740204826624/artifacts/e2e-aws-techpreview/gather-extra/artifacts/nodes/ip-10-0-170-132.ec2.internal/

-------------------
Core dump location:
-------------------
There is a core dump for crio in this location. 

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1485804740204826624/artifacts/e2e-aws-techpreview/gather-core-dump/artifacts/

----------------
Journal extract:
----------------
Jan 25 03:26:34.926919 ip-10-0-170-132 hyperkube[1380]: I0125 03:26:34.925605    1380 operation_generator.go:909] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/810e25b5-c74c-4122-a45d-e26039f16512-kube-api-access-xdr7j" (OuterVolumeSpecName: "kube-api-access-xdr7j") pod "810e25b5-c74c-4122-a45d-e26039f16512" (UID: "810e25b5-c74c-4122-a45d-e26039f16512"). InnerVolumeSpecName "kube-api-access-xdr7j". PluginName "kubernetes.io/projected", VolumeGidValue ""
Jan 25 03:26:34.931126 ip-10-0-170-132 crio[1339]: time="2022-01-25 03:26:34.927897359Z" level=info msg="Stopped pod sandbox: bd8a6cea340b480d460ea7eb2a97e11756b284544334278256c1a0eb17283502" id=1fdaf14b-d381-4292-85b3-1b99a74a4ace name=/runtime.v1.RuntimeService/StopPodSandbox
Jan 25 03:26:34.994315 ip-10-0-170-132 hyperkube[1380]: I0125 03:26:34.993401    1380 reconciler.go:295] "Volume detached for volume \"kube-api-access-xdr7j\" (UniqueName: \"kubernetes.io/projected/810e25b5-c74c-4122-a45d-e26039f16512-kube-api-access-xdr7j\") on node \"ip-10-0-170-132.ec2.internal\" DevicePath \"\""
Jan 25 03:26:34.994315 ip-10-0-170-132 hyperkube[1380]: I0125 03:26:34.993443    1380 reconciler.go:295] "Volume detached for volume \"socket-dir\" (UniqueName: \"kubernetes.io/host-path/810e25b5-c74c-4122-a45d-e26039f16512-socket-dir\") on node \"ip-10-0-170-132.ec2.internal\" DevicePath \"\""
Jan 25 03:26:34.996245 ip-10-0-170-132 crio[1339]: 2022-01-25T03:26:34Z [verbose] Del: e2e-csi-mock-volumes-3292-4139:csi-mockplugin-0:5414ade0-de6d-417b-bf95-f38ee2680d46:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Jan 25 03:26:34.999849 ip-10-0-170-132 ovs-vswitchd[1178]: ovs|01202|connmgr|INFO|br0<->unix#3004: 2 flow_mods in the last 0 s (2 deletes)
Jan 25 03:26:35.001297 ip-10-0-170-132 ovs-vswitchd[1178]: ovs|01203|connmgr|INFO|br0<->unix#3005: 2 flow_mods in the last 0 s (2 deletes)
Jan 25 03:26:35.008774 ip-10-0-170-132 crio[1339]: panic: runtime error: index out of range [-1]
Jan 25 03:26:35.008774 ip-10-0-170-132 crio[1339]: goroutine 66870 [running]:
Jan 25 03:26:35.008774 ip-10-0-170-132 crio[1339]: panic({0x561e5f555540, 0xc000ebfda0})
Jan 25 03:26:35.009017 ip-10-0-170-132 crio[1339]:         /usr/lib/golang/src/runtime/panic.go:1147 +0x3a8 fp=0xc000b19748 sp=0xc000b19688 pc=0x561e5d63c008
Jan 25 03:26:35.009017 ip-10-0-170-132 crio[1339]: runtime.goPanicIndex(0xffffffffffffffff, 0x190a1)
Jan 25 03:26:35.009017 ip-10-0-170-132 crio[1339]:         /usr/lib/golang/src/runtime/panic.go:90 +0x75 fp=0xc000b19788 sp=0xc000b19748 pc=0x561e5d639415
...
...

==========
Example 4:
==========

--------------
Test run:
--------------
pull-ci-openshift-installer-master-e2e-aws-single-node #1485851997893562368

--------------
Prow URL:
--------------
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5534/pull-ci-openshift-installer-master-e2e-aws-single-node/1485851997893562368

--------------
Prow artifacts
--------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5534/pull-ci-openshift-installer-master-e2e-aws-single-node/1485851997893562368/

----------------------------
Reported JUnit error:
----------------------------
	"Managed cluster should report ready nodes the entire duration of the test run"
	fail [github.com/openshift/origin/test/extended/machines/cluster.go:83]: Unexpected error:
	Test fails in this function: https://github.com/openshift/origin/blob/master/test/extended/machines/cluster.go#L64-L85 due to unexpected host unavailability logged in Promethius

--------------
Affected host:
--------------
ip-10-0-181-131.us-east-2.compute.internal

-----------------
Journal location:
-----------------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5534/pull-ci-openshift-installer-master-e2e-aws-single-node/1485851997893562368/artifacts/e2e-aws-single-node/gather-extra/artifacts/nodes/ip-10-0-181-131.us-east-2.compute.internal/

-------------------
Core dump location:
-------------------
There is a core dump for crio in this location. 

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5534/pull-ci-openshift-installer-master-e2e-aws-single-node/1485851997893562368/artifacts/e2e-aws-single-node/gather-core-dump/artifacts/

----------------
Journal extract:
----------------
Jan 25 07:13:47.527695 ip-10-0-181-131 crio[1416]: I0125 07:13:47.454576  489466 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"e2e-security-context-2300", Name:"security-context-35cd2326-8005-4a80-b447-73cc4421d7ce", UID:"3059dab2-ff3f-4083-b4a1-035d02cab055", APIVersion:"v1", ResourceVersion:"137957", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.128.0.221/23] from openshift-sdn
Jan 25 07:13:47.528731 ip-10-0-181-131 crio[1416]: time="2022-01-25 07:13:47.528678244Z" level=info msg="Got pod network &{Name:security-context-35cd2326-8005-4a80-b447-73cc4421d7ce Namespace:e2e-security-context-2300 ID:8bca8581e23a2e83a21931a770185092230e0ecb8e7abb4ee471d0b43581b726 UID:3059dab2-ff3f-4083-b4a1-035d02cab055 NetNS:/var/run/netns/5785a643-9903-4061-921f-cd87a4b613fc Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Jan 25 07:13:47.529081 ip-10-0-181-131 crio[1416]: time="2022-01-25 07:13:47.529051361Z" level=info msg="Checking pod e2e-security-context-2300_security-context-35cd2326-8005-4a80-b447-73cc4421d7ce for CNI network multus-cni-network (type=multus)"
Jan 25 07:13:47.531193 ip-10-0-181-131 crio[1416]: time="2022-01-25 07:13:47.531153377Z" level=info msg="Stopped pod sandbox: c546c80c8d8b809513a27ab47e57ec40f80553e1c9e032713b187246e37a0959" id=7a8d6b57-47d6-4506-a697-3f5d3fa4f449 name=/runtime.v1.RuntimeService/StopPodSandbox
Jan 25 07:13:47.554594 ip-10-0-181-131 hyperkube[1449]: I0125 07:13:47.552497    1449 operation_generator.go:909] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/fe03c4a5-842d-44a5-bd2a-4cc11e549e06-kube-api-access-jgvdf" (OuterVolumeSpecName: "kube-api-access-jgvdf") pod "fe03c4a5-842d-44a5-bd2a-4cc11e549e06" (UID: "fe03c4a5-842d-44a5-bd2a-4cc11e549e06"). InnerVolumeSpecName "kube-api-access-jgvdf". PluginName "kubernetes.io/projected", VolumeGidValue ""
Jan 25 07:13:47.555347 ip-10-0-181-131 crio[1416]: panic: runtime error: index out of range [-1]
Jan 25 07:13:47.555633 ip-10-0-181-131 crio[1416]: goroutine 814366 [running]:
Jan 25 07:13:47.555633 ip-10-0-181-131 crio[1416]: panic({0x55afbcdf4540, 0xc00257f080})
Jan 25 07:13:47.555931 ip-10-0-181-131 crio[1416]:         /usr/lib/golang/src/runtime/panic.go:1147 +0x3a8 fp=0xc001295748 sp=0xc001295688 pc=0x55afbaedb008
Jan 25 07:13:47.556330 ip-10-0-181-131 crio[1416]: runtime.goPanicIndex(0xffffffffffffffff, 0x31ace)
Jan 25 07:13:47.556330 ip-10-0-181-131 crio[1416]:         /usr/lib/golang/src/runtime/panic.go:90 +0x75 fp=0xc001295788 sp=0xc001295748 pc=0x55afbaed8415
Jan 25 07:13:47.574157 ip-10-0-181-131 crio[1416]: github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1.encodeVarintApi(...)
Jan 25 07:13:47.574157 ip-10-0-181-131 crio[1416]:         /builddir/build/BUILD/cri-o-a97515269235404afae754c65bad272c92508731/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:15630
Jan 25 07:13:47.574157 ip-10-0-181-131 crio[1416]: github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1.(*ListPodSandboxResponse).MarshalToSizedBuffer(0xc0089b9780, {0xc006780000, 0x31ace, 0x31ace})
Jan 25 07:13:47.574157 ip-10-0-181-131 crio[1416]:         /builddir/build/BUILD/cri-o-a97515269235404afae754c65bad272c92508731/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:11153 +0x16e fp=0xc0012957c8 sp=0xc001295788 pc=0x55afbb9286ae
Jan 25 07:13:47.574157 ip-10-0-181-131 crio[1416]: github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1.(*ListPodSandboxResponse).Marshal(0x10)

...
...


Steps to Reproduce:
Provided in examples above

Actual results:
Provided in examples above


Expected results:
crio should not coredump


Additional info:

Comment 1 Peter Hunt 2022-02-07 18:22:34 UTC
fixed by attached PR

Comment 2 Peter Hunt 2022-02-08 18:30:57 UTC
PR merged

Comment 6 errata-xmlrpc 2022-08-10 10:47:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069