Bug 1820717
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | [sig-storage] FailMount Unable to attach or mount volumes | | |
| Product | OpenShift Container Platform | Reporter | Carlos Eduardo Arango Gutierrez <carangog> |
| Component | Node | Assignee | Kir Kolyshkin <kir> |
| Status | CLOSED DUPLICATE | QA Contact | Sunil Choudhary <schoudha> |
| Severity | urgent | Docs Contact | |
| Priority | unspecified | | |
| Version | unspecified | CC | aos-bugs, hekumar, jlebon, jokerman, jsafrane, mpatel, rteague, sdodson, zyu |
| Target Milestone | --- | | |
| Target Release | 4.5.0 | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1823597 (view as bug list) | Environment | |
| Last Closed | 2020-04-22 20:08:49 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1823597 | | |
Description (Carlos Eduardo Arango Gutierrez, 2020-04-03 16:38:10 UTC)
This FailMount error is also seen in https://bugzilla.redhat.com/show_bug.cgi?id=1819495

*** Bug 1821347 has been marked as a duplicate of this bug. ***

I tried to re-run the job rehearsal myself. I can see a lot of "device or resource busy" errors there; however, some of the pod directories are removed after a (sometimes large) number of retries.

It looks like there is something wrong with mount propagation:

1. Pod A uses a volume.
2. The user / test starts a (privileged?) pod B using completely different volumes (I used "oc debug node/xxx": a privileged pod, but with no mount propagation set anywhere).
3. The user / test deletes pod A.
4. The kubelet unmounts the volume.
5. The kubelet tries to delete the directory where the volume was mounted -> error.

Pod B is still running, sees the volume of pod A mounted, and blocks removal of the directory. Is there an unmount hook present in the container runtime?

When pod B dies, it most probably allows the kubelet to finally remove the volume directory. It is hard to debug while the e2e tests are running, since pods come and go, but I saw some "busy" directories removed after a while.

Moving to the Node team for investigation.

One note: the nodes run RHEL 7:

- VERSION="7.7 (Maipo)"
- cri-o-1.14.12-19.dev.rhaos4.2.git313d784.el7.x86_64
- kernel-3.10.0-1127.el7.x86_64

Any update here? All rhel7 worker jobs are blocked by multiple test failures.
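One way to inspect the mount-propagation behavior described in the steps above is to read the propagation tags in `/proc/self/mountinfo` from inside the suspect pod (e.g. the `oc debug node/xxx` shell). The sketch below is a read-only illustration, not part of this bug's official reproduction steps; the `/var/lib/kubelet` path filter is just the usual location of kubelet volume directories.

```shell
#!/bin/sh
# Read-only sketch: print the propagation tag of the root mount and of any
# mounts under /var/lib/kubelet. In /proc/self/mountinfo the optional fields
# between the mount options and the " - " separator carry propagation info:
# "shared:N" means the mount propagates to peer namespaces, "master:N" means
# it receives propagation, and no tag at all means private propagation
# (i.e. unmounts on the host will NOT propagate into this namespace).
awk '{
    tags = ""
    for (i = 7; i <= NF && $i != "-"; i++) tags = tags " " $i
    if ($5 == "/" || index($5, "/var/lib/kubelet") == 1)
        printf "%-50s%s\n", $5, (tags == "" ? " (private)" : tags)
}' /proc/self/mountinfo
```

If a privileged pod shows pod A's volume mount with no propagation tag, the unmount done by the kubelet on the host never reaches that pod, and the pod's copy of the mount keeps the directory busy.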
Top 15 failures (tests started between 2020-04-06T15:01:06 and 2020-04-08T09:29:55 UTC):

| Failed | % of 132 | Test |
|---|---|---|
| 66 | 50 | [sig-storage] PersistentVolumes-local [Volume type: tmpfs] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 65 | 49 | [sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications with PVCs [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 64 | 48 | [sig-storage] PersistentVolumes-local [Volume type: dir-bindmounted] Two pods mounting a local volume one after the other should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 64 | 48 | [sig-storage] PersistentVolumes-local [Volume type: blockfswithformat] Two pods mounting a local volume one after the other should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 63 | 47 | [sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] |
| 60 | 45 | [sig-storage] PersistentVolumes-local [Volume type: dir-link-bindmounted] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 60 | 45 | [sig-storage] PersistentVolumes-local [Volume type: dir-bindmounted] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 59 | 44 | [sig-storage] PersistentVolumes-local [Volume type: dir-link-bindmounted] Two pods mounting a local volume one after the other should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 57 | 43 | [sig-storage] PersistentVolumes-local [Volume type: blockfswithformat] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 57 | 43 | [sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 56 | 42 | [sig-storage] PersistentVolumes-local [Volume type: tmpfs] Two pods mounting a local volume one after the other should be able to write from pod1 and read from pod2 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 56 | 42 | [sig-storage] PersistentVolumes-local [Volume type: blockfswithformat] One pod requesting one prebound PVC should be able to mount volume and read from pod1 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 55 | 41 | [sig-storage] PersistentVolumes-local [Volume type: tmpfs] One pod requesting one prebound PVC should be able to mount volume and read from pod1 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 53 | 40 | [sig-storage] PersistentVolumes-local [Volume type: dir-link-bindmounted] One pod requesting one prebound PVC should be able to mount volume and read from pod1 [Suite:openshift/conformance/parallel] [Suite:k8s] |
| 53 | 40 | [sig-storage] PersistentVolumes-local [Volume type: dir-bindmounted] One pod requesting one prebound PVC should be able to mount volume and write from pod1 [Suite:openshift/conformance/parallel] [Suite:k8s] |

(In reply to Jan Safranek from comment #6)
> It looks like there is something wrong with mount propagation:
> [...]
> Pod B is still running and it sees the volume of pod A mounted and blocks
> removal of the directory.

Jan, were you able to reproduce this from the command line? Can you provide the list of commands to run (ideally, with an environment in which this can be reproduced)? I have seen similar things happen when some containers mount /var/lib/kubernetes (or the like, i.e. directories containing other containers' mounts). Could something like this be the case here?

I've moved the severity up to urgent, as this is preventing us from getting any passing e2e tests when RHEL7 workers are involved. This should be backported to 4.4 as well once it is fixed.

It started around April 01 in the 4.2 RHEL-7 jobs:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing#release-openshift-ocp-installer-e2e-aws-rhel7-workers-4.2&sort-by-flakiness=

The last OK job: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-rhel7-workers-4.2/344

The first flaky job: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-rhel7-workers-4.2/347

Do you remember anything interesting in RHCOS / CRI-O / OCP 4.2 around that time?

Can you get the version of runc in use here? Also, if possible, can you provide the result of `cat /proc/sys/fs/may_detach_mounts` (as root, I guess)? If it's 0, try doing `echo 1 > /proc/sys/fs/may_detach_mounts` and try to reproduce again (without rebooting the kernel).
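The requested check can be wrapped in a small script. This is a sketch based only on the command above: reading the sysctl is safe anywhere, while flipping it to 1 requires root and is left as a comment. `fs.may_detach_mounts` only exists on RHEL 7-era kernels, so the script also handles the file being absent.

```shell
#!/bin/sh
# Sketch of the diagnostic requested above (read-only unless you opt in as
# root). When fs.may_detach_mounts is 0, an in-use mount cannot be lazily
# detached, which matches the "device or resource busy" failures in the
# e2e runs.
SYSCTL=/proc/sys/fs/may_detach_mounts
if [ -r "$SYSCTL" ]; then
    echo "fs.may_detach_mounts = $(cat "$SYSCTL")"
else
    echo "fs.may_detach_mounts not present on this kernel"
fi
# To test the hypothesis (as root, no reboot needed):
#   echo 1 > /proc/sys/fs/may_detach_mounts
```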
A detailed explanation, and how to check whether my hypothesis is correct, is here: https://bugzilla.redhat.com/show_bug.cgi?id=1823374#c17

So, yes, this is a runc packaging bug.

(In reply to Jan Safranek from comment #11)
> It started in around April 01 in 4.2 RHEL-7 jobs:
> https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing#release-openshift-ocp-installer-e2e-aws-rhel7-workers-4.2&sort-by-flakiness=
>
> The last OK job:
> https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-rhel7-workers-4.2/344

runc-1.0.0-66.rc8.el7_7.x86_64

> The first flaky job:
> https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-rhel7-workers-4.2/347

runc-1.0.0-67.rc10.el7_8.x86_64

The regression is the one I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1823374#c17 (http://pkgs.devel.redhat.com/cgit/rpms/runc/commit/?h=extras-rhel-7.8&id=4250b1f539bfde82ac7e33d9a1385286975d9915)

*** This bug has been marked as a duplicate of bug 1823374 ***