Description of problem:
After restarting nodes, the image-registry clusteroperator (co) is in Degraded=True state.

Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2020-11-05-140212
Server Version: 4.7.0-0.nightly-ppc64le-2020-11-05-140212
Kubernetes Version: v1.19.2+6bd0f34

Steps to Reproduce:
1. Restarted a worker node.
2. The image-registry co went into Degraded=True state.
3. Fixed the issue by deleting an image-pruner job, after which the image-registry co was in a good state again.
4. Restarted the cluster nodes again, including the bastion. The image-registry co went into Degraded=True state again.

Actual results:
After restarting the nodes, the image-registry co goes into Degraded=True state.

Expected results:
The image-registry co should not go into Degraded=True state after restarting the cluster nodes.

Additional info:
Temporary fix for this issue: when the image-registry co goes into Degraded=True state, the following is observed in the pruner jobs (a consolidated command sketch follows the pruner logs below):

# oc -n openshift-image-registry get jobs
NAME                      COMPLETIONS   DURATION   AGE
image-pruner-1604880000   1/1           3s         2d11h
image-pruner-1604966400   1/1           4s         35h
image-pruner-1605052800   0/1           11h        11h

The image-registry co comes back to Degraded=False after deleting the affected job:

# oc -n openshift-image-registry delete jobs/image-pruner-1605052800
job.batch "image-pruner-1605052800" deleted

The following is observed after restarting the cluster nodes:
- The image-registry co is again in Progressing=True state.
- The following pods have issues:

# oc get pods --all-namespaces -o wide | grep -v Running | grep -v Completed
NAMESPACE                  NAME                              READY   STATUS             RESTARTS   AGE    IP             NODE       NOMINATED NODE   READINESS GATES
openshift-image-registry   image-pruner-1605139200-qkx2x     0/1     Error              0          5h2m   10.131.0.154   worker-0   <none>           <none>
openshift-image-registry   image-registry-5d4446bdd8-8fxwp   0/1     CrashLoopBackOff   155        20h    10.128.2.20    worker-1   <none>           <none>

- The image-registry pod is in CrashLoopBackOff state:

# oc logs -f po/image-registry-5d4446bdd8-8fxwp -n openshift-image-registry
mkdir: cannot create directory '/etc/pki/ca-trust/extracted/edk2': File exists
mkdir: cannot create directory '/etc/pki/ca-trust/extracted/java': File exists
mkdir: cannot create directory '/etc/pki/ca-trust/extracted/openssl': File exists
mkdir: cannot create directory '/etc/pki/ca-trust/extracted/pem': File exists

- The pruner pod failed because it could not reach the registry service:

# oc logs -f po/image-pruner-1605139200-qkx2x -n openshift-image-registry
F1112 00:00:38.597063       1 helpers.go:115] error: failed to ping registry https://image-registry.openshift-image-registry.svc:5000: Get "https://image-registry.openshift-image-registry.svc:5000/": dial tcp 172.30.55.81:5000: i/o timeout
goroutine 1 [running]: k8s.io/klog/v2.stacks(0xc000012001, 0xc00169c580, 0xf0, 0x141) /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:996 +0xac k8s.io/klog/v2.(*loggingT).output(0x142c7740, 0xc000000003, 0x0, 0x0, 0xc000a2c620, 0x1404c036, 0xa, 0x73, 0x0) /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:945 +0x1a0 k8s.io/klog/v2.(*loggingT).printDepth(0x142c7740, 0x3, 0x0, 0x0, 0x2, 0xc0018236a0, 0x1, 0x1) /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:718 +0x128 k8s.io/klog/v2.FatalDepth(...)
/go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1449 k8s.io/kubectl/pkg/cmd/util.fatal(0xc001068000, 0xc1, 0x1) /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:93 +0x1b4 k8s.io/kubectl/pkg/cmd/util.checkErr(0x12c78920, 0xc001df16b0, 0x129c4608) /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:188 +0x80c k8s.io/kubectl/pkg/cmd/util.CheckErr(...) /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:115 github.com/openshift/oc/pkg/cli/admin/prune/images.NewCmdPruneImages.func1(0xc00163f340, 0xc0016cee80, 0x0, 0x8) /go/src/github.com/openshift/oc/pkg/cli/admin/prune/images/images.go:161 +0x1b4 github.com/spf13/cobra.(*Command).execute(0xc00163f340, 0xc0016cee00, 0x8, 0x8, 0xc00163f340, 0xc0016cee00) /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:846 +0x208 github.com/spf13/cobra.(*Command).ExecuteC(0xc000859080, 0x2, 0xc000859080, 0x2) /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:950 +0x29c github.com/spf13/cobra.(*Command).Execute(...) /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:887 main.main() /go/src/github.com/openshift/oc/cmd/oc/oc.go:110 +0x930 goroutine 6 [chan receive]: k8s.io/klog/v2.(*loggingT).flushDaemon(0x142c7740) /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1131 +0x78 created by k8s.io/klog/v2.init.0 /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:416 +0xe0 goroutine 9 [chan receive]: k8s.io/klog.(*loggingT).flushDaemon(0x142c7660) /go/src/github.com/openshift/oc/vendor/k8s.io/klog/klog.go:1010 +0x78 created by k8s.io/klog.init.0 /go/src/github.com/openshift/oc/vendor/k8s.io/klog/klog.go:411 +0xe0 goroutine 66 [IO wait]: internal/poll.runtime_pollWait(0x7fff9028ee98, 0x72, 0xfadf) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/runtime/netpoll.go:220 +0x50 internal/poll.(*pollDesc).wait(0xc0016cf418, 0x72, 0xc0018ba000, 0xfadf, 0xfadf) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/internal/poll/fd_poll_runtime.go:87 +0x40 internal/poll.(*pollDesc).waitRead(...) 
/opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc0016cf400, 0xc0018ba000, 0xfadf, 0xfadf, 0x0, 0x0, 0x0) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/internal/poll/fd_unix.go:159 +0x188 net.(*netFD).Read(0xc0016cf400, 0xc0018ba000, 0xfadf, 0xfadf, 0x0, 0x0, 0x103185ac) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/net/fd_posix.go:55 +0x48 net.(*conn).Read(0xc0003fa040, 0xc0018ba000, 0xfadf, 0xfadf, 0x0, 0x0, 0x0) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/net/net.go:182 +0x84 crypto/tls.(*atLeastReader).Read(0xc000eae640, 0xc0018ba000, 0xfadf, 0xfadf, 0x1031aa4c, 0x223, 0xc00003e000) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/crypto/tls/conn.go:779 +0x60 bytes.(*Buffer).ReadFrom(0xc000ece280, 0x12c78660, 0xc000eae640, 0xc0018ba005, 0x123da540, 0x126d03a0) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/bytes/buffer.go:204 +0xac crypto/tls.(*Conn).readFromUntil(0xc000ece000, 0x12c7dbe0, 0xc0003fa040, 0x5, 0xc0003fa040, 0x223) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/crypto/tls/conn.go:801 +0xd4 crypto/tls.(*Conn).readRecordOrCCS(0xc000ece000, 0x0, 0x0, 0xc0013cc080) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/crypto/tls/conn.go:608 +0xf4 crypto/tls.(*Conn).readRecord(...) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/crypto/tls/conn.go:576 crypto/tls.(*Conn).Read(0xc000ece000, 0xc001896000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/crypto/tls/conn.go:1252 +0x150 bufio.(*Reader).Read(0xc001892840, 0xc0004a4658, 0x9, 0x9, 0x129c7f01, 0x10515144, 0x19982220e1ce) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/bufio/bufio.go:227 +0x238 io.ReadAtLeast(0x12c78400, 0xc001892840, 0xc0004a4658, 0x9, 0x9, 0x9, 0xc000090060, 0x105128e8, 0xd6) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/io/io.go:314 +0x84 io.ReadFull(...) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/io/io.go:333 golang.org/x/net/http2.readFrameHeader(0xc0004a4658, 0x9, 0x9, 0x12c78400, 0xc001892840, 0x0, 0x0, 0xc000ead260, 0x21a) /go/src/github.com/openshift/oc/vendor/golang.org/x/net/http2/frame.go:237 +0x60 golang.org/x/net/http2.(*Framer).ReadFrame(0xc0004a4620, 0xc000ead260, 0x0, 0x0, 0x0) /go/src/github.com/openshift/oc/vendor/golang.org/x/net/http2/frame.go:492 +0x84 golang.org/x/net/http2.(*clientConnReadLoop).run(0xc001813f98, 0x0, 0x0) /go/src/github.com/openshift/oc/vendor/golang.org/x/net/http2/transport.go:1794 +0xc0 golang.org/x/net/http2.(*ClientConn).readLoop(0xc000ecc300) /go/src/github.com/openshift/oc/vendor/golang.org/x/net/http2/transport.go:1716 +0x68 created by golang.org/x/net/http2.(*Transport).newClientConn /go/src/github.com/openshift/oc/vendor/golang.org/x/net/http2/transport.go:695 +0x554 goroutine 25 [select]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x129c4530, 0x12c7c6a0, 0xc000eca090, 0x1, 0xc0000b0360) /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:167 +0x11c k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x129c4530, 0x12a05f200, 0x0, 0xc000750001, 0xc0000b0360) /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x90 k8s.io/apimachinery/pkg/util/wait.Until(...) 
/go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 k8s.io/apimachinery/pkg/util/wait.Forever(0x129c4530, 0x12a05f200) /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:81 +0x50 created by k8s.io/component-base/logs.InitLogs /go/src/github.com/openshift/oc/vendor/k8s.io/component-base/logs/logs.go:58 +0x88 goroutine 26 [select]: io.(*pipe).Read(0xc0014e1a40, 0xc0012d0000, 0x1000, 0x1000, 0x122a1c40, 0x1, 0xc0012d0000) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/io/pipe.go:57 +0xc8 io.(*PipeReader).Read(0xc0013c4d78, 0xc0012d0000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/io/pipe.go:134 +0x48 bufio.(*Scanner).Scan(0xc001593480, 0x0) /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/bufio/scan.go:214 +0xa8 github.com/openshift/oc/pkg/cli/admin/mustgather.newPrefixWriter.func1(0xc001593480, 0x12c7e0c0, 0xc000012018, 0x127dd833, 0x17) /go/src/github.com/openshift/oc/pkg/cli/admin/mustgather/mustgather.go:424 +0xfc created by github.com/openshift/oc/pkg/cli/admin/mustgather.newPrefixWriter /go/src/github.com/openshift/oc/pkg/cli/admin/mustgather/mustgather.go:423 +0x1b4

# oc -n openshift-image-registry get jobs
NAME                      COMPLETIONS   DURATION   AGE
image-pruner-1604880000   1/1           3s         3d5h
image-pruner-1604966400   1/1           4s         2d5h
image-pruner-1605139200   0/1           5h16m      5h16m
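To consolidate the temporary fix described above into one place, here is a minimal command sketch. The job name is whichever pruner job is stuck at 0/1 completions at the time, and the ImagePruner "suspend" field is an assumption based on the image registry operator API rather than something shown in this report:

# Find the pruner job that never completed (COMPLETIONS 0/1):
oc -n openshift-image-registry get jobs

# Delete the stuck job; the operator recreates it on the next scheduled run:
oc -n openshift-image-registry delete job image-pruner-1605139200

# Confirm the image-registry clusteroperator leaves the Degraded=True state:
oc get co image-registry

# Optionally pause automatic pruning while debugging (assumed field; check it
# with "oc explain imagepruner.spec" before relying on it):
oc patch imagepruner/cluster --type=merge -p '{"spec":{"suspend":true}}'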
Steps to reproduce the problem (a scripted variant is sketched after the list):
1. Get the registry pods: oc get pods -n openshift-image-registry
2. rsh into a pod: oc -n openshift-image-registry rsh image-registry-584b6c5f4f-h88rr
3. Kill the registry process (it always has pid=1 in the container): kill 1 (in the rsh session)
4. Check the pod status:
   NAME                              READY   STATUS             RESTARTS   AGE
   image-registry-584b6c5f4f-h88rr   0/1     CrashLoopBackOff   4          21m
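A scripted version of the same steps, for repeating the check without an interactive rsh session. This is only a sketch with assumptions not confirmed in this report: that the docker-registry=default label selector matches the registry pods and that /usr/bin/kill exists in the registry image (the interactive step uses the shell builtin "kill"):

# Kill PID 1 (the registry process) in every registry pod:
for pod in $(oc -n openshift-image-registry get pods -l docker-registry=default -o name); do
  oc -n openshift-image-registry exec "$pod" -- /usr/bin/kill 1
done

# Watch the pods; they should restart and settle back into Running
# instead of ending up in CrashLoopBackOff:
oc -n openshift-image-registry get pods -w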
The image registry pod just restarts and then goes back to Running:

$ oc get pods
NAME                                               READY   STATUS      RESTARTS   AGE
cluster-image-registry-operator-84bfd498b7-hwm4r   1/1     Running     0          8h
image-pruner-1606379280-gn9xb                      0/1     Completed   0          2m35s
image-pruner-1606379340-kcwkl                      0/1     Completed   0          95s
image-pruner-1606379400-btg4j                      0/1     Completed   0          35s
image-registry-74d66d97bf-dlstg                    1/1     Running     1          8h
image-registry-74d66d97bf-xffcp                    1/1     Running     1          8h
node-ca-7c4vr                                      1/1     Running     0          8h
node-ca-8cgzc                                      1/1     Running     0          8h
node-ca-8sd2w                                      1/1     Running     0          8h
node-ca-ml6xg                                      1/1     Running     0          8h
node-ca-sxg7z                                      1/1     Running     0          8h
node-ca-ztfxf                                      1/1     Running     0          8h

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-25-114114   True        False         8h      Cluster version is 4.7.0-0.nightly-2020-11-25-114114
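For a quick non-interactive check of the same state, a sketch (the jsonpath expression only assumes the standard clusteroperator condition names Available/Progressing/Degraded):

# Registry and pruner pods should be Running/Completed:
oc -n openshift-image-registry get pods

# The image-registry clusteroperator should report Degraded=False and Available=True:
oc get co image-registry -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'

# The newest pruner job should reach 1/1 completions:
oc -n openshift-image-registry get jobs --sort-by=.metadata.creationTimestamp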
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633