Bug 1814458 - [storage] KubePodCrashLooping in 4.5 release promotion jobs
Summary: [storage] KubePodCrashLooping in 4.5 release promotion jobs
Keywords:
Status: CLOSED DUPLICATE of bug 1814280
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: aos-storage-staff@redhat.com
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-17 23:38 UTC by W. Trevor King
Modified: 2020-03-18 12:25 UTC

Fixed In Version:
Doc Text:
Clone Of:
: 1814594 (view as bug list)
Environment:
Last Closed: 2020-03-18 12:25:44 UTC
Target Upstream Version:
Embargoed:


Description W. Trevor King 2020-03-17 23:38:57 UTC
One of the more common 4.5 failure modes in the past 24h:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.5&search=promQL+query:+count_over_time.*reported+incorrect+results&type=build-log&maxAge=24h&context=0' | jq -r '. | to_entries[].value | to_entries[].value[].context[]' | sed -n 's/.*incorrect results:\\n\(.*\)",$/\1/p' | sed 's|\\||g' | jq -r '.[].metric.alertname' | sort | uniq -c | sort -n | tail -n3
     17 TargetDown
     33 KubePodCrashLooping
    107 FailingOperator

Reasonably well distributed over our flavors:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.5&search=promQL+query:+count_over_time.*reported+incorrect+results.*KubePodCrashLooping&maxAge=24h' | jq -r '. | keys[]' | sed 's|/[^/]*$||' | sort | uniq -c
     12 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5
      6 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.5
      4 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.5
      2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.5
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-ovn-4.5
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.5
      4 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.5
      2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.5

AWS jobs to dig into:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-e2e-aws-4.5&search=promQL+query:+count_over_time.*reported+incorrect+results.*KubePodCrashLooping&maxAge=24h' | jq -r '. | keys[]'
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/435
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/436
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/444
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/447
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/449
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/451
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/455
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/457
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/471
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/473
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/478
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/479

Picking the most recent (479), it is also impacted by bug 1812261 (iptables segfaulting) and bug 1785023 (ResourceQuota life of a secret).  From the pod JSON [1], I don't see a pod in CrashLoopBackOff, and none of the restartCount values is higher than 2, so I'm not sure what is crashing or what is causing the crashes.

[1]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/479/artifacts/e2e-aws/pods.json
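
The manual check of the pod JSON described above (looking for a container in CrashLoopBackOff or with a restartCount higher than 2) could be scripted roughly as follows. This is only a sketch, assuming pods.json is a standard Kubernetes pod list with items[].status.containerStatuses. The function names, the min_restarts parameter, and the SAMPLE data are hypothetical and are not taken from the job artifacts.

```python
import json
import sys


def container_restarts(pod_list):
    """Yield (namespace/pod, container, restartCount, waiting reason) per container."""
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        pod_name = f'{meta.get("namespace", "")}/{meta.get("name", "")}'
        # Either list may be absent or null in the JSON, so fall back to [].
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = (cs.get("state") or {}).get("waiting") or {}
            yield pod_name, cs.get("name"), cs.get("restartCount", 0), waiting.get("reason")


def suspicious(pod_list, min_restarts=3):
    """Return containers that are in CrashLoopBackOff or restarted at least min_restarts times."""
    return [
        row
        for row in container_restarts(pod_list)
        if row[3] == "CrashLoopBackOff" or row[2] >= min_restarts
    ]


# Hypothetical sample data, shaped like a Kubernetes pod list.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "ns-a", "name": "pod-a"},
            "status": {
                "containerStatuses": [
                    {"name": "main", "restartCount": 1, "state": {"running": {}}}
                ]
            },
        },
        {
            "metadata": {"namespace": "ns-b", "name": "pod-b"},
            "status": {
                "initContainerStatuses": None,
                "containerStatuses": [
                    {
                        "name": "main",
                        "restartCount": 5,
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    }
                ],
            },
        },
    ]
}

if __name__ == "__main__":
    # Usage: python scan_pods.py pods.json (falls back to SAMPLE without an argument).
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            data = json.load(f)
    else:
        data = SAMPLE
    for row in suspicious(data):
        print(*row)
```

A snapshot like pods.json only captures pods that still exist when the artifacts are gathered, so an empty result from a scan like this would not rule out crashes in pods that were replaced earlier in the run.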

Comment 1 Stefan Schimanski 2020-03-18 10:41:45 UTC
curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.5&search=promQL+query:+count_over_time.*reported+incorrect+results&type=build-log&maxAge=24h&context=0' | jq -r '. | to_entries[].value | to_entries[].value[].context[]' | sed -n 's/.*incorrect results:\\n\(.*\)",$/\1/p' | sed 's|\\||g' | jq -r '.[].metric | select(.alertname == "KubePodCrashLooping") | (.namespace + "/" + .pod)' | sort | uniq -c | sort -n

   1 openshift-csi-snapshot-controller/csi-snapshot-controller-547f4bb8b5-sb2lj
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-594d844779-whtx9
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-2jf85
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-8cgqd
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-8lkfx
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-99gr4
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-bknnk
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-czf9t
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-h5bgz
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-k5xzn
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-kcg98
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-l42g5
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-l94sh
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-m57tm
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-mtvpp
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-mxx8n
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-ps846
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-qrb7n
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-rsgdg
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-tq74j
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-tvhdn
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-wrw6v
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-7cc54dbd4d-z4cl2
   1 openshift-csi-snapshot-controller/csi-snapshot-controller-7f9d66fc78-94q9p
   1 openshift-ovn-kubernetes/ovnkube-node-9vhvs
   1 openshift-ovn-kubernetes/ovnkube-node-v45nr
   1 openshift-ovn-kubernetes/ovnkube-node-vlwj6
   1 openshift-ovn-kubernetes/ovnkube-node-vmvzz
   1 openshift-sdn/sdn-controller-fp8nt
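
Since every pod name in the list above appears only once, grouping by owning workload makes the distribution easier to read. The following is a rough sketch of that grouping, assuming pod names follow the usual Deployment (name-hash-suffix) and DaemonSet (name-suffix) patterns and that the generated parts use Kubernetes' vowel-free random alphabet. The regexes are a heuristic and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: generated name parts are drawn from this alphabet (no vowels, no 0/1/3).
SAFE = "bcdfghjklmnpqrstvwxz2456789"
POD_SUFFIX = re.compile(rf"-[{SAFE}]{{5}}$")      # trailing per-pod random suffix
HASH_SUFFIX = re.compile(rf"-[{SAFE}]{{5,10}}$")  # optional ReplicaSet template hash


def workload(namespaced_pod):
    """Map 'namespace/pod-name' to 'namespace/workload-name' (heuristic)."""
    namespace, pod = namespaced_pod.split("/", 1)
    name = POD_SUFFIX.sub("", pod)
    name = HASH_SUFFIX.sub("", name)
    return f"{namespace}/{name}"


def tally(pods):
    """Count pods per owning workload."""
    return Counter(workload(p) for p in pods)


if __name__ == "__main__":
    # A few of the pod names from the list above.
    pods = [
        "openshift-csi-snapshot-controller/csi-snapshot-controller-547f4bb8b5-sb2lj",
        "openshift-csi-snapshot-controller/csi-snapshot-controller-65d54b4b4c-2jf85",
        "openshift-ovn-kubernetes/ovnkube-node-9vhvs",
        "openshift-sdn/sdn-controller-fp8nt",
    ]
    for name, count in tally(pods).most_common():
        print(f"{count:4d} {name}")
```

Counted by hand, the list above collapses to 24 csi-snapshot-controller pods, 4 ovnkube-node pods, and 1 sdn-controller pod.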

Comment 2 Jan Safranek 2020-03-18 12:25:44 UTC

*** This bug has been marked as a duplicate of bug 1814280 ***

