Bug 1914833 - Importer pod became CrashLoopBackOff when Istio is installed, namespace has sidecar injection enabled, and no DataVolume sidecar.istio.io/inject: "false" annotation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 2.6.1
Assignee: Arnon Gilboa
QA Contact: Yan Du
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-11 10:07 UTC by Yan Du
Modified: 2024-06-13 23:57 UTC (History)
CC List: 7 users

Fixed In Version: virt-cdi-importer v2.6.1-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-07 08:46:03 UTC
Target Upstream Version:
Embargoed:


Attachments
log (14.76 KB, text/plain)
2021-01-13 03:55 UTC, Yan Du
pods yaml (365 bytes, text/plain)
2021-01-14 02:51 UTC, Yan Du


Links
Github kubevirt containerized-data-importer pull 1674 (open): Allow passing default annotation value to transfer pods, last updated 2021-02-22 10:23:35 UTC
Github kubevirt containerized-data-importer pull 1677 (closed): [release-v1.28] Allow passing default annotation value to transfer pods, last updated 2021-03-14 14:31:50 UTC
Red Hat Issue Tracker CNV-9476, last updated 2024-06-13 23:57:38 UTC
Red Hat Product Errata RHEA-2021:1126, last updated 2021-04-07 08:46:36 UTC

Description Yan Du 2021-01-11 10:07:01 UTC
Description of problem:
Importer pod became CrashLoopBackOff when Istio is installed, namespace has sidecar injection enabled, and no DataVolume sidecar.istio.io/inject: "false" annotation

Version-Release number of selected component (if applicable):
OCP 4.5
CNV 2.4

How reproducible:
Always

Steps to Reproduce:
$ oc label namespace default istio-injection=enabled
$ oc get namespace default -L istio-injection
NAME      STATUS   AGE    ISTIO-INJECTION
default   Active   6d8h   enabled

$ cat << EOF | oc create -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-dv1
spec:
  source:
    http:
      url: "http://$url/Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
EOF


$ oc get pod
NAME                READY   STATUS             RESTARTS   AGE
importer-test-dv1   1/2     CrashLoopBackOff   25         105m
$ oc get dv
NAME       PHASE              PROGRESS   RESTARTS   AGE
test-dv1   ImportInProgress   N/A                   111m


Actual results:
Importer pod goes into CrashLoopBackOff

Expected results:
Importer pod runs normally and the DataVolume import completes

Additional info:

Comment 1 Adam Litke 2021-01-12 20:54:29 UTC
Yan, please attach the importer logs and the events for the default namespace.

Comment 2 Yan Du 2021-01-13 03:55:03 UTC
Adam, the log is attached. The crash is caused by a connection error to the source HTTP URL inside the importer pod.

It works well if the annotation is set on the DV:

$ oc get dv
NAME       PHASE              PROGRESS   RESTARTS   AGE
test-dv1   ImportInProgress   N/A        1          22m
test-dv2   Succeeded          100.0%                23m

---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-dv2
  annotations:
      sidecar.istio.io/inject: "false"
spec:
  source:
    http:
      url: "http://$url/Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi

Comment 3 Yan Du 2021-01-13 03:55:32 UTC
Created attachment 1746887
log

Comment 4 Maya Rashish 2021-01-13 13:27:33 UTC
The error I see in the logs is:

Unable to connect to http data source: Get http://mirrors.nav.ro/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.qcow2: dial tcp 5.154.224.26:80: connect: connection refused

Could you try a non-Fedora image? The Fedora download URL sometimes points to dysfunctional mirrors.

Comment 5 Yan Du 2021-01-13 14:41:48 UTC
@Maya I can reproduce the issue with our testing cirros images (cirros-0.4.0-x86_64-disk.qcow2)

Comment 6 Adam Litke 2021-01-13 16:38:58 UTC
Yan, could this be an istio configuration issue in your cluster?

Comment 7 Adam Litke 2021-01-13 16:42:41 UTC
Could you use this document to ensure that your istio proxy is allowing all egress traffic? https://istio.io/latest/docs/tasks/traffic-management/egress/egress-control/

This doc also provides some hints for debugging the network access.

Comment 8 Yan Du 2021-01-14 02:47:05 UTC
Adam, I'm afraid we borrowed the Istio cluster from another team, so I'm not sure whether it's OK for us to change its configuration.

However, I created two ordinary pods, one with and one without the annotation, and networking works inside both, so the Istio configuration is probably correct.

$ oc get po
NAME             READY   STATUS    RESTARTS   AGE
hello-pod-anno   1/1     Running   0          7m46s
hello-pod2      2/2     Running   0          7m7s

$ oc rsh hello-pod-anno
/ # curl -O http://$url/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.1M  100 12.1M    0     0  30.5M      0 --:--:-- --:--:-- --:--:-- 32.9M
/ # ls cirros-0.4.0-x86_64-disk.qcow2 
cirros-0.4.0-x86_64-disk.qcow2

$ oc rsh hello-pod2
Defaulting container name to hello-pod2.
Use 'oc describe pod/hello-pod2 -n default' to see all of the containers in this pod.
/ # curl -O http://$url/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
/ # ls cirros-0.4.0-x86_64-disk.qcow2 
cirros-0.4.0-x86_64-disk.qcow2

Comment 9 Yan Du 2021-01-14 02:51:58 UTC
Created attachment 1747280
pods yaml

Comment 12 Arnon Gilboa 2021-02-02 20:15:47 UTC
We are trying to get the CDI importer pod to work correctly with Istio (without disabling sidecar injection for the pod, which was our first workaround and worked fine). When we tried to import the following image, we got a 502 Bad Gateway error that put the importer pod into CrashLoopBackOff:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-dv
spec:
  source:
    http:
      url: "http://mirrors.nav.ro/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi

Then we tried applying:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: fed
spec:
  hosts:
  - mirrors.nav.ro
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: NONE

which improved things, but during the import we got one or more of:
http: TLS handshake error from 127.0.0.6:35115: EOF

and after some import progress it terminated with:
qemu-img: curl: The requested URL returned error: 503 Service Unavailable
qemu-img: error while reading at byte 831913984: Input/output error


We tried adding an HTTPS port, but it went back to the initial behavior.

When we discussed it with Rob Cernich from the Istio team, he commented: "I wonder if this is caused by this issue: OSSM-357. If you look at that issue, the problem is that on a full sync, the endpoint IP for the external service gets nuked, so it works for a while right after creating the SE, then stops working. If that's the behavior you are seeing, it's most likely the same issue as the one linked."

Our test was performed on a cluster with Istio 1.5.7.

Comment 13 Adam Litke 2021-02-02 20:58:19 UTC
Possibly a dupe of https://issues.redhat.com/browse/OSSM-357

Comment 14 Arnon Gilboa 2021-02-03 11:17:11 UTC
Just for clarification, this issue happens only when Istio is installed and its sidecar injection is enabled for the namespace of the CDI transfer pods. I updated the BZ title to make it clearer and less dramatic.
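To make that condition concrete, here is a minimal Go sketch of the injection decision as described in this report. It is a simplified model, not Istio's injector: the function name shouldInjectSidecar is hypothetical, and it assumes the namespace-label mode used here (istio-injection=enabled) while ignoring the mesh-wide injection policy and other selectors that real Istio also evaluates.

```go
package main

import "fmt"

// Label and annotation keys as they appear in this report.
const (
	injectionLabel      = "istio-injection"
	injectionAnnotation = "sidecar.istio.io/inject"
)

// shouldInjectSidecar is a hypothetical, simplified model of the injection
// decision: the namespace must carry istio-injection=enabled, and a pod
// annotation sidecar.istio.io/inject: "false" opts the pod out.
// Mesh-wide policy and other selectors used by real Istio are ignored.
func shouldInjectSidecar(nsLabels, podAnnotations map[string]string) bool {
	if nsLabels[injectionLabel] != "enabled" {
		return false
	}
	return podAnnotations[injectionAnnotation] != "false"
}

func main() {
	ns := map[string]string{injectionLabel: "enabled"}

	// Like test-dv1: no annotation, so an istio-proxy container is added.
	fmt.Println(shouldInjectSidecar(ns, map[string]string{}))
	// Like test-dv2: the annotation reaches the importer pod, so no sidecar.
	fmt.Println(shouldInjectSidecar(ns, map[string]string{injectionAnnotation: "false"}))
}
```

Under this model the importer pod for test-dv1 gets a second container, which is consistent with the 1/2 READY column in the original report.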

Discussing it with Rob Cernich from the Istio team, we see no reason to have Istio sidecar injection enabled for the CDI transfer pods. The CDI importer pulling an image from an external URL is not a natural fit for the Istio environment, so the behavior is not surprising at all. However, we may handle it gracefully in the importer, @Adam?

If someone insists on using Istio sidecar injection in the namespace for some reason, they can use the sidecar injection disabling annotation (see BZ#1883232) on the DataVolume to disable it for CDI transfer pods. However, we could simply set the sidecar injection disabling annotation by default for all CDI transfer pods, @Adam?

Of course, when Istio is installed and its sidecar injection is enabled for the namespace, the original 'fix' has no effect unless we add the annotation to the DV. All that fix does is pass the annotation through to the transfer pods, which is the way to get them working correctly in this situation.
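The option of setting the disabling annotation by default can be sketched as a small merge step in Go. This is a hypothetical illustration of the idea, not the code from the linked CDI pull requests (titled "Allow passing default annotation value to transfer pods"): the names defaultTransferPodAnnotations, passThroughKeys and transferPodAnnotations are invented here, and the sketch assumes only an allowlisted set of DataVolume annotations is copied to transfer pods.

```go
package main

import "fmt"

// Hypothetical defaults applied to every CDI transfer pod.
var defaultTransferPodAnnotations = map[string]string{
	"sidecar.istio.io/inject": "false",
}

// Hypothetical allowlist of DataVolume annotations copied to transfer pods.
var passThroughKeys = []string{"sidecar.istio.io/inject"}

// transferPodAnnotations starts from the defaults and lets allowlisted
// DataVolume annotations override them; other DV annotations are dropped.
func transferPodAnnotations(dvAnnotations map[string]string) map[string]string {
	out := make(map[string]string, len(defaultTransferPodAnnotations)+len(passThroughKeys))
	for k, v := range defaultTransferPodAnnotations {
		out[k] = v
	}
	for _, k := range passThroughKeys {
		if v, ok := dvAnnotations[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	// No annotation on the DataVolume: the default disables injection.
	fmt.Println(transferPodAnnotations(nil)["sidecar.istio.io/inject"])
	// A user who wants the importer in the mesh can still opt back in.
	dv := map[string]string{"sidecar.istio.io/inject": "true"}
	fmt.Println(transferPodAnnotations(dv)["sidecar.istio.io/inject"])
}
```

Letting the DataVolume value win over the default keeps an opt-in path for users who do want the transfer pod in the mesh, for the reasons Rob lists in comment 17.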

Comment 16 Adam Litke 2021-02-03 17:52:04 UTC
I think it is worth considering always supplying the injection disabling annotation on CDI created pods.  Are there any downsides to this?

Comment 17 Rob Cernich 2021-02-03 18:19:54 UTC
It really depends on why you would want it to be part of the mesh, i.e. so that it uses the proxy for its communication (e.g. visualization, security, traffic control; maybe the target is in the mesh and you want to use mTLS, apply load balancing rules, or do blue/green/canary deployments of the target). Obviously, if you wanted to use those features, you'd need the sidecar and you'd need to make sure things were configured correctly. In this case, it appears the ServiceEntry was working, but there might be a bug in Istio that was causing problems (note that using a ServiceEntry implies the target service is not part of the mesh).

Comment 18 Arnon Gilboa 2021-03-14 14:30:21 UTC
PR #1677 merged into release-v1.28.

Comment 19 Yan Du 2021-03-18 03:35:11 UTC
Tested with virt-cdi-importer v2.6.1-4; the issue has been fixed.

$ oc get pod
NAME                READY   STATUS    RESTARTS   AGE
importer-test-dv3   1/1     Running   0          3m13s
$ oc get dv
NAME       PHASE       PROGRESS   RESTARTS   AGE
test-dv3   Succeeded   100.0%                4m12s

Comment 25 errata-xmlrpc 2021-04-07 08:46:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (CNV 2.6.1 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1126

