Bug 1744029

Summary: e2e flake: Failed to execute container-related commands (logs, exec) with error "tls: internal error"
Product: OpenShift Container Platform
Reporter: zhou ying <yinzhou>
Component: Cloud Compute
Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA
QA Contact: Jianwei Hou <jhou>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: agarcial, amcdermo, aos-bugs, hongkliu, jokerman, mfojtik, mgugino, mpatel, rkrawitz, rphillips
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:36:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description zhou ying 2019-08-21 07:27:30 UTC
Description of problem:
Tests failed in job:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/56

Failed cases: 
failed: (52.3s) 2019-08-20T11:05:09 "[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Inline-volume (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (28s) 2019-08-20T11:04:25 "[sig-storage] PersistentVolumes-local  [Volume type: blockfswithformat] One pod requesting one prebound PVC should be able to mount volume and write from pod1 [Suite:openshift/conformance/parallel] [Suite:k8s]"


Failure output:
fail [k8s.io/kubernetes/test/e2e/storage/utils/local.go:134]: Unexpected error:
    <exec.CodeExitError>: {
        Err: {
            s: "error running &{/usr/bin/kubectl [kubectl --server=https://api.ci-op-0zphwvsz-711dc.origin-ci-int-gce.dev.openshift.com:6443 --kubeconfig=/tmp/admin.kubeconfig exec --namespace=e2e-persistent-local-volumes-test-4467 hostexec-ci-op--v5m9b-w-b-7szpf.c.openshift-gce-devel-ci.internal -- nsenter --mount=/rootfs/proc/1/ns/mnt -- sh -c mkdir -p /tmp/local-volume-test-3ba1bee8-c33a-11e9-8d5d-0a58ac10dbe9 && dd if=/dev/zero of=/tmp/local-volume-test-3ba1bee8-c33a-11e9-8d5d-0a58ac10dbe9/file bs=4096 count=5120 && sudo losetup -f /tmp/local-volume-test-3ba1bee8-c33a-11e9-8d5d-0a58ac10dbe9/file] []  <nil>  Error from server: error dialing backend: remote error: tls: internal error\n [] <nil> 0xc00434ec30 exit status 1 <nil> <nil> true [0xc0037bafb0 0xc0037bafc8 0xc0037bafe0] [0xc0037bafb0 0xc0037bafc8 0xc0037bafe0] [0xc0037bafc0 0xc0037bafd8] [0x95d7a0 0x95d7a0] 0xc001b80ea0 <nil>}:\nCommand stdout:\n\nstderr:\nError from server: error dialing backend: remote error: tls: internal error\n\nerror:\nexit status 1\n",
        },
        Code: 1,
    }
    error running &{/usr/bin/kubectl [kubectl --server=https://api.ci-op-0zphwvsz-711dc.origin-ci-int-gce.dev.openshift.com:6443 --kubeconfig=/tmp/admin.kubeconfig exec --namespace=e2e-persistent-local-volumes-test-4467 hostexec-ci-op--v5m9b-w-b-7szpf.c.openshift-gce-devel-ci.internal -- nsenter --mount=/rootfs/proc/1/ns/mnt -- sh -c mkdir -p /tmp/local-volume-test-3ba1bee8-c33a-11e9-8d5d-0a58ac10dbe9 && dd if=/dev/zero of=/tmp/local-volume-test-3ba1bee8-c33a-11e9-8d5d-0a58ac10dbe9/file bs=4096 count=5120 && sudo losetup -f /tmp/local-volume-test-3ba1bee8-c33a-11e9-8d5d-0a58ac10dbe9/file] []  <nil>  Error from server: error dialing backend: remote error: tls: internal error
     [] <nil> 0xc00434ec30 exit status 1 <nil> <nil> true [0xc0037bafb0 0xc0037bafc8 0xc0037bafe0] [0xc0037bafb0 0xc0037bafc8 0xc0037bafe0] [0xc0037bafc0 0xc0037bafd8] [0x95d7a0 0x95d7a0] 0xc001b80ea0 <nil>}:
    Command stdout:
    
    stderr:
    Error from server: error dialing backend: remote error: tls: internal error
    
    error:
    exit status 1
    
occurred

fail [k8s.io/kubernetes/test/e2e/framework/util.go:2323]: Unexpected error:
    <exec.CodeExitError>: {
        Err: {
            s: "error running &{/usr/bin/kubectl [kubectl --server=https://api.ci-op-0zphwvsz-711dc.origin-ci-int-gce.dev.openshift.com:6443 --kubeconfig=/tmp/admin.kubeconfig logs nfs-server nfs-server --namespace=e2e-volume-1778] []  <nil>  Error from server: Get https://ci-op--v5m9b-w-b-7szpf.c.openshift-gce-devel-ci.internal:10250/containerLogs/e2e-volume-1778/nfs-server/nfs-server: remote error: tls: internal error\n [] <nil> 0xc002c43d40 exit status 1 <nil> <nil> true [0xc002be4020 0xc002be4038 0xc002be4050] [0xc002be4020 0xc002be4038 0xc002be4050] [0xc002be4030 0xc002be4048] [0x95d7a0 0x95d7a0] 0xc002b0cba0 <nil>}:\nCommand stdout:\n\nstderr:\nError from server: Get https://ci-op--v5m9b-w-b-7szpf.c.openshift-gce-devel-ci.internal:10250/containerLogs/e2e-volume-1778/nfs-server/nfs-server: remote error: tls: internal error\n\nerror:\nexit status 1\n",
        },
        Code: 1,
    }
    error running &{/usr/bin/kubectl [kubectl --server=https://api.ci-op-0zphwvsz-711dc.origin-ci-int-gce.dev.openshift.com:6443 --kubeconfig=/tmp/admin.kubeconfig logs nfs-server nfs-server --namespace=e2e-volume-1778] []  <nil>  Error from server: Get https://ci-op--v5m9b-w-b-7szpf.c.openshift-gce-devel-ci.internal:10250/containerLogs/e2e-volume-1778/nfs-server/nfs-server: remote error: tls: internal error
     [] <nil> 0xc002c43d40 exit status 1 <nil> <nil> true [0xc002be4020 0xc002be4038 0xc002be4050] [0xc002be4020 0xc002be4038 0xc002be4050] [0xc002be4030 0xc002be4048] [0x95d7a0 0x95d7a0] 0xc002b0cba0 <nil>}:
    Command stdout:
    
    stderr:
    Error from server: Get https://ci-op--v5m9b-w-b-7szpf.c.openshift-gce-devel-ci.internal:10250/containerLogs/e2e-volume-1778/nfs-server/nfs-server: remote error: tls: internal error
    
    error:
    exit status 1
    
occurred

Aug 20 11:04:48.000 I persistentvolume/pvc-3e773ceb-c33a-11e9-87ad-42010a000005 googleapi: Error 400: The disk resource 'projects/openshift-gce-devel-ci/zones/us-east1-c/disks/ci-op--v5m9b-dynamic-pvc-3e773ceb-c33a-11e9-87ad-42010a000005' is already being used by 'projects/openshift-gce-devel-ci/zones/us-east1-c/instances/ci-op--v5m9b-w-c-bzr9k', resourceInUseByAnotherResource
Version-Release number of selected component (if applicable):


How reproducible:
Occasionally
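
When this flake hits, any kubectl command that dials the kubelet directly (logs, exec, port-forward) against the affected node fails the same way, while commands that only talk to the API server still work. A quick way to confirm the failure is in the node-side TLS handshake on :10250 rather than in the test itself, reusing the pod and namespace from the failure above:

  # Reproduce the symptom outside the e2e framework:
  kubectl --kubeconfig=/tmp/admin.kubeconfig logs nfs-server -n e2e-volume-1778
  # => Error from server: ... remote error: tls: internal error

  # This only talks to the API server and should still succeed, which points
  # at the kubelet on :10250 rather than the API server:
  kubectl --kubeconfig=/tmp/admin.kubeconfig get pod nfs-server -n e2e-volume-1778
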

Comment 2 zhou ying 2019-08-22 02:15:51 UTC
In this test job, about half of the [sig-storage] test cases failed with this error.

Comment 5 Seth Jennings 2019-08-28 19:29:18 UTC
*** Bug 1743741 has been marked as a duplicate of this bug. ***

Comment 6 Alberto 2019-09-04 09:32:25 UTC
The pending CSRs issue should be covered by the PRs in https://bugzilla.redhat.com/show_bug.cgi?id=1717610 and https://bugzilla.redhat.com/show_bug.cgi?id=1746881. In addition, https://github.com/openshift/cluster-machine-approver/pull/44 adds some extra logging. I'm setting this to MODIFIED as those go in.
I'm still not closing this as a duplicate, though: notice that the machine approver is also struggling to reach the API server, so this could be a symptom of an underlying issue.
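
For reference, a minimal diagnostic sketch for the pending-CSR explanation above. The kubelet returns "tls: internal error" when it has no serving certificate, which is consistent with its serving-certificate CSR sitting unapproved; the machine-approver namespace and deployment names below are the stock OCP 4.x ones (an assumption, verify in your cluster):

  # Look for CSRs stuck in Pending (requestors like system:node:<name>):
  oc get csr

  # Check whether the machine approver itself is healthy and able to reach
  # the API server, per the concern above:
  oc logs -n openshift-cluster-machine-approver deployment/machine-approver

  # Manual workaround only; the actual fix is in the PRs referenced above.
  # Approves every CSR in the cluster, so use with care:
  oc get csr -o name | xargs oc adm certificate approve
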

Comment 10 errata-xmlrpc 2019-10-16 06:36:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 11 Hongkai Liu 2020-01-31 21:34:49 UTC
Found another resourceInUseByAnotherResource occurrence in CI:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.2/357
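
For the resourceInUseByAnotherResource side of this report, GCE itself records which instance still holds a disk, so the conflict can be confirmed directly. A sketch using the disk and zone from the event in the description (gcloud access to the openshift-gce-devel-ci project is assumed):

  # Print the instance(s) the PD is currently attached to; a non-empty result
  # while the volume is being attached elsewhere matches the 400 error above:
  gcloud compute disks describe \
      ci-op--v5m9b-dynamic-pvc-3e773ceb-c33a-11e9-87ad-42010a000005 \
      --project openshift-gce-devel-ci \
      --zone us-east1-c \
      --format="value(users)"
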