Bug 1723914

Summary: 4.2 CI failed with - Not all desired DNS DaemonSets available
Product: OpenShift Container Platform
Reporter: Ben Bennett <bbennett>
Component: Networking
Assignee: Dan Mace <dmace>
Networking sub component: DNS
QA Contact: Hongan Li <hongli>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: aos-bugs, cdc, wking
Version: 4.2.0
Keywords: Reopened
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-13 08:05:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ben Bennett 2019-06-25 17:55:30 UTC
Description of problem:

A 4.2 CI run failed with "Not all desired DNS DaemonSets available".

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.2/56


Version-Release number of selected component (if applicable):

release:4.2.0-0.nightly-2019-06-25-143607


How reproducible:

Only seen once.

Comment 1 Dan Mace 2019-06-25 19:35:54 UTC
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.2/56/artifacts/e2e-aws/must-gather/namespaces/openshift-dns/core/events.yaml
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.2/56/artifacts/e2e-aws/must-gather/namespaces/openshift-dns/apps/daemonsets.yaml
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.2/56/artifacts/e2e-aws/must-gather/namespaces/openshift-dns/pods/

The current line of inquiry is dns-default-cg574, which appears in the event log but not in the state dump.

    message: 'Failed create pod sandbox: rpc error: code = Unknown desc = failed to
    create pod network sandbox k8s_dns-default-cg574_openshift-dns_80dff9ab-975b-11e9-a1aa-0a349e682728_0(d64b9d47df6414e11f0ad4cbed67d6f216bc38d3aff6812670239fc22e20ec03):
    netplugin failed but error parsing its diagnostic message "": unexpected end of
    JSON input'
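
For anyone repeating this cross-check, here is a minimal sketch of one way to pull the pod name out of a sandbox name like the one in the message above and list pods that show up in events but not in the state dump. It assumes the sandbox name has the form k8s_<pod>_<namespace>_<uid>_<attempt>, which is inferred only from this one message. The helper names (parse_sandbox_name, missing_from_dump) and the pod name dns-default-aaaaa are hypothetical and used only for illustration.

```python
from typing import Iterable, List, NamedTuple


class SandboxName(NamedTuple):
    pod: str
    namespace: str
    uid: str
    attempt: int


def parse_sandbox_name(name: str) -> SandboxName:
    """Split a sandbox name of the ASSUMED form k8s_<pod>_<namespace>_<uid>_<attempt>."""
    prefix, rest = name.split("_", 1)
    if prefix != "k8s":
        raise ValueError(f"unexpected sandbox name prefix: {prefix!r}")
    # Split from the right so the last three fields are namespace, UID and attempt.
    pod, namespace, uid, attempt = rest.rsplit("_", 3)
    return SandboxName(pod, namespace, uid, int(attempt))


def missing_from_dump(event_pods: Iterable[str], dumped_pods: Iterable[str]) -> List[str]:
    """Return pod names seen in events that are absent from the state dump."""
    return sorted(set(event_pods) - set(dumped_pods))


if __name__ == "__main__":
    sandbox = parse_sandbox_name(
        "k8s_dns-default-cg574_openshift-dns_80dff9ab-975b-11e9-a1aa-0a349e682728_0"
    )
    # "dns-default-aaaaa" is a made-up stand-in for a pod that is present in the dump.
    print(missing_from_dump([sandbox.pod, "dns-default-aaaaa"], ["dns-default-aaaaa"]))
```

In practice the two input lists would come from events.yaml and the pods/ directory of the must-gather linked above; loading those files is left out here because their exact layout is not shown in this report.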

There are many other SDN errors for the pods that do exist, logged before those pods were finally created.

Still looking around; just wanted to share some notes.

Comment 3 Dan Mace 2019-06-25 19:49:07 UTC
Is our status reporting here correct? We're reporting degraded, which seems appropriate.

Comment 4 Dan Mace 2019-07-30 14:06:10 UTC
All evidence so far points to some transient SDN issue. If this is still happening, feel free to re-open against SDN.

Comment 5 W. Trevor King 2019-09-13 04:01:36 UTC
New bug filed as bug 1751246. Marking this one as a dup of the new one so they have a structured Bugzilla connection ;)

*** This bug has been marked as a duplicate of bug 1751246 ***

Comment 6 Casey Callendrello 2019-09-13 08:04:04 UTC
This is definitely not a duplicate of the other bug: loopback is coredumping.

Jun 25 15:12:12 ip-10-0-155-42 systemd-coredump[2686]: Process 2642 (loopback) of user 0 dumped core.
                                                       
                                                       Stack trace of thread 2642:
                                                       #0  0x00007f61c09960d3 _dl_relocate_object (/usr/lib64/ld-2.28.so)
                                                       #1  0x00007f61c098e1af dl_main (/usr/lib64/ld-2.28.so)
                                                       #2  0x00007f61c09a3b00 _dl_sysdep_start (/usr/lib64/ld-2.28.so)
                                                       #3  0x00007f61c098c0f8 _dl_start (/usr/lib64/ld-2.28.so)
                                                       #4  0x00007f61c098b038 _start (/usr/lib64/ld-2.28.so)


It looks like another instance of https://bugzilla.redhat.com/show_bug.cgi?id=1725832

Comment 7 Casey Callendrello 2019-09-13 08:05:15 UTC

*** This bug has been marked as a duplicate of bug 1725832 ***