Description of problem:
The importer pod goes into CrashLoopBackOff when Istio is installed, the namespace has sidecar injection enabled, and the DataVolume has no sidecar.istio.io/inject: "false" annotation.

Version-Release number of selected component (if applicable):
OCP 4.5, CNV 2.4

How reproducible:
Always

Steps to Reproduce:
$ oc label namespace default istio-injection=enabled
$ oc get namespace default -L istio-injection
NAME      STATUS   AGE    ISTIO-INJECTION
default   Active   6d8h   enabled
$ cat << EOF | oc create -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-dv1
spec:
  source:
    http:
      url: "http://$url/Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
EOF
$ oc get pod
NAME                READY   STATUS             RESTARTS   AGE
importer-test-dv1   1/2     CrashLoopBackOff   25         105m
$ oc get dv
NAME       PHASE              PROGRESS   RESTARTS   AGE
test-dv1   ImportInProgress   N/A                   111m

Actual results:
The importer pod is in CrashLoopBackOff.

Expected results:
The importer pod runs and the import succeeds.

Additional info:
Yan, please attach the importer logs and the events for the default namespace.
Adam, the log is attached. The crash is caused by a connection error to the source HTTP URL in the importer pod. The import works if the annotation is set in the DV:

$ oc get dv
NAME       PHASE              PROGRESS   RESTARTS   AGE
test-dv1   ImportInProgress   N/A        1          22m
test-dv2   Succeeded          100.0%                23m

---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-dv2
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  source:
    http:
      url: "http://$url/Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
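The workaround above can be scripted. What follows is a minimal sketch, assuming a hypothetical helper named render_dv that prints a DataVolume manifest carrying the sidecar.istio.io/inject: "false" annotation. The helper name and its arguments (name, URL, optional size) are illustrative and not part of CDI; the manifest fields are copied from test-dv2.

```shell
# render_dv NAME URL [SIZE]
# Hypothetical helper (not part of CDI): prints a DataVolume manifest that
# disables Istio sidecar injection for the CDI transfer pods.
render_dv() {
  name=$1
  url=$2
  size=${3:-10Gi}   # defaults to the size used in this report
  cat <<EOF
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ${name}
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  source:
    http:
      url: "${url}"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${size}
EOF
}
```

It could be used as: render_dv test-dv2 "http://$url/Fedora-Cloud-Base-33-1.2.x86_64.qcow2" | oc create -f -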
Created attachment 1746887: log
The error I see in the logs is:

Unable to connect to http data source: Get http://mirrors.nav.ro/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.qcow2: dial tcp 5.154.224.26:80: connect: connection refused

Could you try a non-Fedora image? The Fedora download URL sometimes points to dysfunctional mirrors.
@Maya I can reproduce the issue with our testing cirros images (cirros-0.4.0-x86_64-disk.qcow2)
Yan, could this be an istio configuration issue in your cluster?
Could you use this document to ensure that your istio proxy is allowing all egress traffic? https://istio.io/latest/docs/tasks/traffic-management/egress/egress-control/ This doc also provides some hints for debugging the network access.
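One way to check the egress policy that document describes is to look at the outboundTrafficPolicy mode in the mesh configuration. What follows is a minimal sketch, assuming the mesh configuration is available as YAML text on stdin. egress_mode is a hypothetical helper, and where the configuration lives depends on how Istio was installed, so the ConfigMap name in the usage line is an assumption.

```shell
# egress_mode
# Hypothetical helper: reads mesh configuration YAML on stdin and prints the
# first outbound traffic policy mode it finds (ALLOW_ANY or REGISTRY_ONLY),
# or "unset" when neither value appears.
egress_mode() {
  mode=$(grep -oE 'mode:[[:space:]]+(ALLOW_ANY|REGISTRY_ONLY)' | head -n 1 | awk '{print $2}') || true
  echo "${mode:-unset}"
}
```

For example (ConfigMap name and namespace assumed): oc get configmap istio -n istio-system -o yaml | egress_mode
With REGISTRY_ONLY, the proxy blocks hosts that are not defined in the mesh's service registry, so an external image URL would need a ServiceEntry.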
Adam, I'm afraid we borrowed the Istio cluster from another team, so I'm not sure whether we are allowed to change its configuration. However, I created two ordinary pods, one with the annotation and one without, and network access works from inside both, so I guess the Istio configuration is probably correct.

$ oc get po
NAME             READY   STATUS    RESTARTS   AGE
hello-pod-anno   1/1     Running   0          7m46s
hello-pod2       2/2     Running   0          7m7s

$ oc rsh hello-pod-anno
/ # curl -O http://$url/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.1M  100 12.1M    0     0  30.5M      0 --:--:-- --:--:-- --:--:-- 32.9M
/ # ls cirros-0.4.0-x86_64-disk.qcow2
cirros-0.4.0-x86_64-disk.qcow2

$ oc rsh hello-pod2
Defaulting container name to hello-pod2.
Use 'oc describe pod/hello-pod2 -n default' to see all of the containers in this pod.
/ # curl -O http://$url/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
/ # ls cirros-0.4.0-x86_64-disk.qcow2
cirros-0.4.0-x86_64-disk.qcow2
Created attachment 1747280: pods yaml
We are trying to get the CDI importer pod to work correctly with Istio (without disabling sidecar injection for the pod, which was our first workaround and worked fine). When we tried to import the following image, we got a 502 Bad Gateway error that put the importer pod into CrashLoopBackOff.

kind: DataVolume
metadata:
  name: test-dv
spec:
  source:
    http:
      url: "http://mirrors.nav.ro/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi

Then we tried applying:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: fed
spec:
  hosts:
  - mirrors.nav.ro
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: NONE

which improved things, but during the import we got one or more of:

http: TLS handshake error from 127.0.0.6:35115: EOF

and after some import progress it terminated with:

qemu-img: curl: The requested URL returned error: 503 Service Unavailable
qemu-img: error while reading at byte 831913984: Input/output error

We tried adding an HTTPS port, but that went back to the initial behavior.

When we discussed it with Rob Cernich from the Istio team, he commented:

I wonder if this is caused by this issue: OSSM-357. If you look at that issue, the problem is that on a full sync, the endpoint ip for the external service gets nuked, so it works for a while right after creating the SE, then stops working. If that's the behavior you are seeing, it's most likely the same issue as the one linked.

Our test was performed on a cluster with Istio 1.5.7.
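For anyone repeating this experiment with a different image host, the ServiceEntry above can be generated from the image URL. What follows is a minimal sketch, assuming a hypothetical helper named render_service_entry; the port (80, HTTP) and resolution (NONE) are copied from the ServiceEntry in this comment rather than chosen as recommendations. As noted above, this ServiceEntry only partly helped in our test, so treat it as a reproduction aid and not a fix.

```shell
# render_service_entry NAME URL
# Hypothetical helper: derives the host from URL and prints a ServiceEntry
# with the same fields as the one applied in this comment.
render_service_entry() {
  name=$1
  host=${2#*://}     # drop the scheme, if present
  host=${host%%/*}   # drop the path
  host=${host%%:*}   # drop an explicit port, if any
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: ${name}
spec:
  hosts:
  - ${host}
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: NONE
EOF
}
```

It could be used as: render_service_entry fed "http://mirrors.nav.ro/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.qcow2" | oc apply -f -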
Possibly a dupe of https://issues.redhat.com/browse/OSSM-357
Just for clarification, this issue happens only when Istio is installed and its sidecar injection is enabled for the namespace of the CDI transfer pods. I changed the BZ title to make that clear and less dramatic.

After discussing it with Rob Cernich from the Istio team, we see no reason to have Istio sidecar injection enabled for the CDI transfer pods. A CDI importer pulling an image from an external URL is not a natural fit for an Istio environment, so the behavior is not surprising. However, we may want to handle it gracefully in the importer, @Adam?

If someone insists on enabling Istio sidecar injection in the namespace, they can set the injection-disabling annotation (see BZ#1883232) on the DataVolume to disable it for the CDI transfer pods. Alternatively, we could simply set that annotation by default on all CDI transfer pods, @Adam?

Of course, when Istio is installed and sidecar injection is enabled for the namespace, the original 'fix' has no effect unless the annotation is added to the DV. All that fix does is pass the annotation on to the transfer pods, which is what gets them working correctly in this situation.
I think it is worth considering always supplying the injection disabling annotation on CDI created pods. Are there any downsides to this?
It really depends on why you want the pod to be part of the mesh and use the proxy for its communication (e.g. visualization, security, or traffic control; maybe the target is in the mesh and you want to use mTLS, apply load-balancing rules, or do blue/green/canary deployments of the target). Obviously, if you want those features, you need the sidecar and you need to make sure things are configured correctly. In this case the ServiceEntry appeared to be working, but a bug in Istio may have been causing problems. (Using a ServiceEntry implies that the target service is not part of the mesh.)
PR #1677 merged into release-v1.28.
Tested with virt-cdi-importer v2.6.1-4; the issue has been fixed.

$ oc get pod
NAME                READY   STATUS    RESTARTS   AGE
importer-test-dv3   1/1     Running   0          3m13s
$ oc get dv
NAME       PHASE       PROGRESS   RESTARTS   AGE
test-dv3   Succeeded   100.0%                4m12s
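The READY column is what distinguishes the two cases in this report: the failing importer pod showed 1/2 (importer plus injected sidecar), and the fixed one shows 1/1. What follows is a minimal sketch of that check, assuming a hypothetical helper named importer_containers that reads `oc get pod` output on stdin. A container count of 2 is only a hint that a sidecar was injected, not proof.

```shell
# importer_containers
# Hypothetical helper: reads `oc get pod` output on stdin and prints
# "<pod name> <total container count>" for each pod whose name starts
# with "importer-". The header line is skipped.
importer_containers() {
  awk 'NR > 1 && $1 ~ /^importer-/ { split($2, ready, "/"); print $1, ready[2] }'
}
```

It could be used as: oc get pod | importer_containers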
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (CNV 2.6.1 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:1126