Bug 2192346 - Fail to clone PVC on hostpath-csi-pvc-block storage class after CDI certs renewal
Summary: Fail to clone PVC on hostpath-csi-pvc-block storage class after CDI certs renewal
Keywords:
Status: CLOSED DUPLICATE of bug 2172853
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.13.1
Assignee: Alex Kalenyuk
QA Contact: Natalie Gavrielov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-01 14:45 UTC by Jenia Peimer
Modified: 2023-05-18 11:00 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-18 11:00:17 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Issue Tracker CNV-28486 (last updated 2023-05-01 14:46:50 UTC)

Description Jenia Peimer 2023-05-01 14:45:52 UTC
Description of problem:
Once in every 3-4 test runs, a PVC clone fails on the hostpath-csi-pvc-block storage class backed by OCS. This is only seen after CDI certificate renewal; the failure could not be reproduced without that step (the test was run 10 times without it and passed).

Version-Release number of selected component (if applicable):
4.13

How reproducible:
1 in 3-4 test runs

Steps to Reproduce:
1. Create a source DV
2. Renew the CDI certificates (update an annotation on the CDI cert secrets; this triggers the renewal - see the sketch after this list)
3. Clone the source DV to a target DV
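
A rough sketch of what step 2 can look like from the CLI, assuming a CNV deployment where the CDI cert secrets live in the openshift-cnv namespace. The annotation key below is a placeholder: the real test updates an annotation that the CDI operator interprets on its cert secrets, which is not spelled out in this report; the sketch only shows the general shape of the operation.

$ # illustrative only: touch every CDI-related secret so the operator reconciles and renews the certs
$ for s in $(oc get secrets -n openshift-cnv -o name | grep cdi); do
>   oc annotate -n openshift-cnv "$s" force-cert-renewal.example/timestamp="$(date +%s)" --overwrite
> done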

Actual results:
Clone fails once in 3-4 runs
This is only seen on bare metal (BM) with hostpath-csi-pvc-block (with underlying OCS).
The test passed 7 times in a row on BM with the hostpath-csi-basic (local) storage class.
The test passed 10 times in a row on a PSI cluster with hostpath-csi-pvc-block (with underlying LSO).
I wasn't able to reproduce this behavior without CDI certificates renewal.

Expected results:
Clone succeeds

Additional info:
Test name: test_import_clone_after_certs_renewal

$ oc get dv -A
test-cdi-certificate   dv-target   CloneInProgress   N/A        4          2m2s

$ oc get pods -n test-cdi-certificate
NAME                                              READY   STATUS             RESTARTS      AGE
98744b32-3ada-4551-9511-8bd14332ac76-source-pod   0/1     CrashLoopBackOff   4 (34s ago)   2m5s
cdi-upload-dv-target                              1/1     Running            0             2m11s

$ oc logs -n test-cdi-certificate cdi-upload-dv-target
I0501 14:23:49.166069       1 uploadserver.go:74] Running server on 0.0.0.0:8443
I0501 14:23:54.709746       1 uploadserver.go:378] Content type header is "filesystem-clone"
I0501 14:23:54.709889       1 util.go:204] begin untar to /data...
I0501 14:23:54.709898       1 util.go:210] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E0501 14:23:54.782846       1 util.go:219] exit status 2
E0501 14:23:54.782883       1 uploadserver.go:391] Saving stream failed: error unarchiving to /data: exit status 2
I0501 14:23:56.795716       1 uploadserver.go:378] Content type header is "filesystem-clone"
I0501 14:23:56.795824       1 util.go:204] begin untar to /data...
I0501 14:23:56.795832       1 util.go:210] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E0501 14:23:56.879546       1 util.go:219] exit status 2
E0501 14:23:56.879578       1 uploadserver.go:391] Saving stream failed: error unarchiving to /data: exit status 2
I0501 14:24:11.303774       1 uploadserver.go:378] Content type header is "filesystem-clone"
I0501 14:24:11.303888       1 util.go:204] begin untar to /data...
I0501 14:24:11.303897       1 util.go:210] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E0501 14:24:11.459124       1 util.go:219] exit status 2
E0501 14:24:11.459161       1 uploadserver.go:391] Saving stream failed: error unarchiving to /data: exit status 2
I0501 14:24:38.322763       1 uploadserver.go:378] Content type header is "filesystem-clone"
I0501 14:24:38.322848       1 util.go:204] begin untar to /data...
I0501 14:24:38.322857       1 util.go:210] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E0501 14:24:38.471889       1 util.go:219] exit status 2
E0501 14:24:38.471915       1 uploadserver.go:391] Saving stream failed: error unarchiving to /data: exit status 2
I0501 14:25:22.163380       1 uploadserver.go:378] Content type header is "filesystem-clone"
I0501 14:25:22.163471       1 util.go:204] begin untar to /data...
I0501 14:25:22.163480       1 util.go:210] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E0501 14:25:22.273202       1 util.go:219] exit status 2
E0501 14:25:22.273227       1 uploadserver.go:391] Saving stream failed: error unarchiving to /data: exit status 2


$ oc logs -n test-cdi-certificate 98744b32-3ada-4551-9511-8bd14332ac76-source-pod
VOLUME_MODE=filesystem
MOUNT_POINT=/var/run/cdi/clone/source
/var/run/cdi/clone/source /
UPLOAD_BYTES=1073741846
I0501 14:25:21.793028      10 clone-source.go:220] content-type is "filesystem-clone"
I0501 14:25:21.793093      10 clone-source.go:221] mount is "/var/run/cdi/clone/source"
I0501 14:25:21.793097      10 clone-source.go:222] upload-bytes is 1073741846
I0501 14:25:21.793110      10 clone-source.go:239] Starting cloner target
I0501 14:25:21.793170      10 clone-source.go:177] Executing [/usr/bin/tar cv -S disk.img]
I0501 14:25:22.127411      10 clone-source.go:251] Set header to filesystem-clone
I0501 14:25:22.272567      10 clone-source.go:127] Wrote 13711360 bytes
F0501 14:25:22.273388      10 clone-source.go:260] Unexpected status code 500
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
/remote-source/app/vendor/k8s.io/klog/v2/klog.go:1038 +0x89
k8s.io/klog/v2.(*loggingT).output(0x149d920, 0x3, 0x0, 0xc000547d50, 0x0, {0x1060733?, 0x1?}, 0xc0004ec530?, 0x0)
/remote-source/app/vendor/k8s.io/klog/v2/klog.go:987 +0x5fc
k8s.io/klog/v2.(*loggingT).printf(0x12?, 0x5200b?, 0x0, {0x0, 0x0}, {0xd857eb, 0x19}, {0xc000708ad0, 0x1, 0x1})
/remote-source/app/vendor/k8s.io/klog/v2/klog.go:753 +0x1c5
k8s.io/klog/v2.Fatalf(...)
/remote-source/app/vendor/k8s.io/klog/v2/klog.go:1532
main.main()
/remote-source/app/cmd/cdi-cloner/clone-source.go:260 +0xc54

goroutine 37 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0xc0003b4840?)
/remote-source/app/vendor/k8s.io/klog/v2/klog.go:1181 +0x6a
created by k8s.io/klog/v2.init.0
/remote-source/app/vendor/k8s.io/klog/v2/klog.go:420 +0xf6

goroutine 293 [sleep]:
time.Sleep(0x3b9aca00)
/usr/lib/golang/src/runtime/time.go:195 +0x135
kubevirt.io/containerized-data-importer/pkg/util/prometheus.(*ProgressReader).timedUpdateProgress(0xc000314720?)
/remote-source/app/pkg/util/prometheus/prometheus.go:55 +0x2c
created by kubevirt.io/containerized-data-importer/pkg/util/prometheus.(*ProgressReader).StartTimedUpdate
/remote-source/app/pkg/util/prometheus/prometheus.go:48 +0x56

goroutine 295 [IO wait]:
internal/poll.runtime_pollWait(0x7fd2804e6c08, 0x72)
/usr/lib/golang/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc000700080?, 0x0?, 0x0)
/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000700080)
/usr/lib/golang/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc000700080)
/usr/lib/golang/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc000502240)
/usr/lib/golang/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc000502240)
/usr/lib/golang/src/net/tcpsock.go:288 +0x3d
crypto/tls.(*listener).Accept(0xc0005022e8)
/usr/lib/golang/src/crypto/tls/tls.go:66 +0x2d
net/http.(*Server).Serve(0xc00046a0f0, {0xee5680, 0xc0005022e8})
/usr/lib/golang/src/net/http/server.go:3070 +0x385
net/http.(*Server).ServeTLS(0xc00046a0f0, {0xee8ec0?, 0xc000502240}, {0xc000318f60, 0x1e}, {0xc000318fc0, 0x1e})
/usr/lib/golang/src/net/http/server.go:3142 +0x3fd
net/http.(*Server).ListenAndServeTLS(0xc00046a0f0, {0xc000318f60, 0x1e}, {0xc000318fc0, 0x1e})
/usr/lib/golang/src/net/http/server.go:3299 +0x12f
net/http.ListenAndServeTLS(...)
/usr/lib/golang/src/net/http/server.go:3265
kubevirt.io/containerized-data-importer/pkg/util/prometheus.StartPrometheusEndpoint.func1()
/remote-source/app/pkg/util/prometheus/prometheus.go:113 +0xa5
created by kubevirt.io/containerized-data-importer/pkg/util/prometheus.StartPrometheusEndpoint
/remote-source/app/pkg/util/prometheus/prometheus.go:111 +0x23b

goroutine 308 [IO wait]:
internal/poll.runtime_pollWait(0x7fd2804e6b18, 0x72)
/usr/lib/golang/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc000700400?, 0xc00081e900?, 0x0)
/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000700400, {0xc00081e900, 0x240, 0x240})
/usr/lib/golang/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000700400, {0xc00081e900?, 0x1441b60?, 0x3?})
/usr/lib/golang/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc000506090, {0xc00081e900?, 0x67db15?, 0xc0005c1640?})
/usr/lib/golang/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0003a2de0, {0xc00081e900?, 0x0?, 0x0?})
/usr/lib/golang/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc0003c25f8, {0xedf900, 0xc0003a2de0})
/usr/lib/golang/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc0003c2380, {0xee0960?, 0xc000506090}, 0x800?)
/usr/lib/golang/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc0003c2380, 0x0)
/usr/lib/golang/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
/usr/lib/golang/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc0003c2380, {0xc00088c000, 0x1000, 0x203000?})
/usr/lib/golang/src/crypto/tls/conn.go:1315 +0x16f
net/http.(*connReader).Read(0xc0007bb620, {0xc00088c000, 0x1000, 0x1000})
/usr/lib/golang/src/net/http/server.go:786 +0x171
bufio.(*Reader).fill(0xc000315440)
/usr/lib/golang/src/bufio/bufio.go:106 +0xff
bufio.(*Reader).ReadSlice(0xc000315440, 0x0?)
/usr/lib/golang/src/bufio/bufio.go:372 +0x2f
bufio.(*Reader).ReadLine(0xc000315440)
/usr/lib/golang/src/bufio/bufio.go:401 +0x27
net/textproto.(*Reader).readLineSlice(0xc0009261e0)
/usr/lib/golang/src/net/textproto/reader.go:58 +0x99
net/textproto.(*Reader).ReadLine(...)
/usr/lib/golang/src/net/textproto/reader.go:39
net/http.readRequest(0xc0007e9a08?)
/usr/lib/golang/src/net/http/request.go:1036 +0x79
net/http.(*conn).readRequest(0xc0003a6280, {0xee9a30, 0xc0005c1600})
/usr/lib/golang/src/net/http/server.go:994 +0x24a
net/http.(*conn).serve(0xc0003a6280, {0xee9ad8, 0xc0004ee780})
/usr/lib/golang/src/net/http/server.go:1916 +0x345
created by net/http.(*Server).Serve
/usr/lib/golang/src/net/http/server.go:3102 +0x4db

goroutine 309 [IO wait]:
internal/poll.runtime_pollWait(0x7fd2804e6de8, 0x72)
/usr/lib/golang/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc000700380?, 0xc000846000?, 0x0)
/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000700380, {0xc000846000, 0x1000, 0x1000})
/usr/lib/golang/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000700380, {0xc000846000?, 0xc0003bcfe0?, 0xc000846005?})
/usr/lib/golang/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc000506088, {0xc000846000?, 0x0?, 0x0?})
/usr/lib/golang/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0003a2df8, {0xc000846000?, 0x0?, 0x0?})
/usr/lib/golang/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc0003c2278, {0xedf900, 0xc0003a2df8})
/usr/lib/golang/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc0003c2000, {0xee0960?, 0xc000506088}, 0x100a?)
/usr/lib/golang/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc0003c2000, 0x0)
/usr/lib/golang/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
/usr/lib/golang/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc0003c2000, {0xc000847000, 0x1000, 0x1?})
/usr/lib/golang/src/crypto/tls/conn.go:1315 +0x16f
net/http.(*persistConn).Read(0xc000464fc0, {0xc000847000?, 0x4520a0?, 0xc0007cdec8?})
/usr/lib/golang/src/net/http/transport.go:1929 +0x4e
bufio.(*Reader).fill(0xc000501b00)
/usr/lib/golang/src/bufio/bufio.go:106 +0xff
bufio.(*Reader).Peek(0xc000501b00, 0x1)
/usr/lib/golang/src/bufio/bufio.go:144 +0x5d
net/http.(*persistConn).readLoop(0xc000464fc0)
/usr/lib/golang/src/net/http/transport.go:2093 +0x1ac
created by net/http.(*Transport).dialConn
/usr/lib/golang/src/net/http/transport.go:1751 +0x173e

goroutine 310 [select]:
net/http.(*persistConn).writeLoop(0xc000464fc0)
/usr/lib/golang/src/net/http/transport.go:2392 +0xf5
created by net/http.(*Transport).dialConn
/usr/lib/golang/src/net/http/transport.go:1752 +0x1791



YAMLs:

DV-SOURCE:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  labels:
    created-by-dynamic-class-creator: 'Yes'
  name: dv-source
  namespace: test-cdi-certificate
spec:
  contentType: kubevirt
  source:
    http:
      url: http://<server>/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
  storage:
    resources:
      requests:
        storage: 1Gi
    storageClassName: hostpath-csi-pvc-block


DV-TARGET:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  labels:
    created-by-dynamic-class-creator: 'Yes'
  name: dv-target
  namespace: test-cdi-certificate
spec:
  contentType: kubevirt
  source:
    pvc:
      name: dv-source
      namespace: test-cdi-certificate
  storage:
    resources:
      requests:
        storage: 1Gi
    storageClassName: hostpath-csi-pvc-block

Comment 1 Jenia Peimer 2023-05-01 15:10:22 UTC
What is common to the log of each failure (I caught this log 3 times) is that it only manages to write 13711360 bytes before failing:

$ oc logs -n test-cdi-certificate 2d47ca14-d69c-4dd0-bd83-a4caa9092ae8-source-pod
...
UPLOAD_BYTES=1073741846
...
I0430 17:52:22.457305      10 clone-source.go:127] Wrote 13711360 bytes
F0430 17:52:22.457925      10 clone-source.go:260] Unexpected status code 500

Comment 3 Jenia Peimer 2023-05-04 08:27:22 UTC
Could be related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2172853

Comment 4 Alex Kalenyuk 2023-05-18 10:05:45 UTC
Sounds similar to https://bugzilla.redhat.com/show_bug.cgi?id=2172853#c6 - does this test scenario have a first consumer with a different group that could race the cloner pod?

Comment 5 Jenia Peimer 2023-05-18 10:49:28 UTC
Yes, it had a first consumer with a different fsGroup, and since we changed it to 107 - we don't see this failure anymore (had several tier2 runs).
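
For reference, a minimal sketch (not the actual test code) of a "first consumer" pod that binds the source PVC with fsGroup 107, the qemu group used by the CDI/KubeVirt pods; the pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: first-consumer            # placeholder name
  namespace: test-cdi-certificate
spec:
  securityContext:
    fsGroup: 107                  # match the group the CDI cloner/upload pods use
  containers:
    - name: consumer
      image: registry.access.redhat.com/ubi9/ubi-minimal   # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /pvc
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: dv-source

With a different fsGroup on the first consumer, the volume's group ownership can race what the cloner pod expects (the scenario described in bug 2172853 comment 6).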

Comment 6 Alex Kalenyuk 2023-05-18 10:53:33 UTC
(In reply to Jenia Peimer from comment #5)
> Yes, it had a first consumer with a different fsGroup, and since we changed
> it to 107 - we don't see this failure anymore (had several tier2 runs).

So this is a dup of https://bugzilla.redhat.com/show_bug.cgi?id=2172853?

Comment 7 Jenia Peimer 2023-05-18 11:00:17 UTC
Right, closing this as a duplicate

*** This bug has been marked as a duplicate of bug 2172853 ***

