Bug 1660134

Summary: [kuryr][cri-o] Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
Product: Red Hat OpenStack Reporter: Eduardo Minguez <eminguez>
Component: openstack-kuryr-kubernetes Assignee: Michał Dulko <mdulko>
Status: CLOSED CURRENTRELEASE QA Contact: GenadiC <gcheresh>
Severity: high Docs Contact:
Priority: high    
Version: 14.0 (Rocky) CC: aos-bugs, asegurap, bbennett, cdc, jliberma, jschluet, ltomasbo, mdulko, rbost
Target Milestone: zstream Keywords: TestOnly, Triaged, ZStream
Target Release: 14.0 (Rocky)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-31 08:40:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
controller logs none

Description Eduardo Minguez 2018-12-17 15:15:55 UTC
Description of problem:

Deploying an OCP 3.11.51 cluster with cri-o and kuryr on OSP fails at the webconsole deployment step.

The real issue seems to be the kuryr + cri-o combination, as pods running on nodes with the docker-based container runtime work fine.


Version-Release number of selected component (if applicable):

RHEL 7.6-4
atomic-openshift-node-3.11.51-1.git.0.1560686.el7.x86_64
cri-o-1.11.10-1.rhaos3.11.git42c86f0.el7.x86_64
registry.redhat.io/rhosp13/openstack-kuryr-controller:13.0
registry.redhat.io/rhosp13/openstack-kuryr-cni:13.0

How reproducible:

Deploy ocp 3.11.51 with cri-o on some nodes, docker on others and kuryr as cni.

Environment:

* 3 master nodes with cri-o as container runtime
* 3 infra nodes with docker as container runtime
* 3 app nodes with cri-o as container runtime


Steps to Reproduce:
1. Deploy OCP with the variables included in this BZ

Actual results:
Installation fails because the webconsole cannot be deployed.


Expected results:
Installation completes and the environment works.


Additional info:

Inventory variables:

* all.yml
---
openshift_openstack_clusterid: "shiftstack"
openshift_openstack_public_dns_domain: "automated.lan"
openshift_openstack_dns_nameservers: ["10.19.115.119"]
openshift_openstack_public_hostname_suffix: "-public"
openshift_openstack_nsupdate_zone: "{{ openshift_openstack_public_dns_domain }}"
openshift_openstack_heat_template_version: queens
openshift_openstack_keypair_name: "cicd-key"
openshift_openstack_external_network_name: "public_network"
openshift_openstack_private_network_name:  "shiftstack-net"
openshift_openstack_default_image_name: "rhel-server-7.6"
openshift_openstack_build_base_image: "rhel-server-7.6"
openshift_openstack_num_masters: 3
openshift_openstack_num_infra: 3
openshift_openstack_num_cns: 0
openshift_openstack_num_nodes: 3
openshift_openstack_default_flavor: "m1.medium"
openshift_openstack_use_lbaas_load_balancer: true
openshift_openstack_docker_master_volume_size: "20"
openshift_openstack_docker_infra_volume_size: "20"
openshift_openstack_docker_node_volume_size: "20"
openshift_openstack_docker_volume_size: "15"
openshift_openstack_subnet_cidr: "10.240.0.0/24"
openshift_openstack_pool_start: "10.240.0.3"
openshift_openstack_pool_end: "10.240.0.254"
rhsub_ak: REDACTED
rhsub_orgid: REDACTED
rhsub_pool: REDACTED
openshift_openstack_external_nsupdate_keys:
  public:
    key_secret: 'REDACTED'
    key_algorithm: 'hmac-md5'
    key_name: "update-key"
    server: '10.19.115.119'
  private:
    key_secret: 'REDACTED'
    key_algorithm: 'hmac-md5'
    key_name: "update-key"
    server: '10.19.115.119'
ansible_user: openshift
openshift_openstack_disable_root: true
openshift_openstack_user: openshift
openshift_use_kuryr: True
openshift_use_openshift_sdn: False
openshift_master_open_ports:
  - service: dns tcp
    port: 53/tcp
  - service: dns udp
    port: 53/udp
openshift_node_open_ports:
  - service: dns tcp
    port: 53/tcp
  - service: dns udp
    port: 53/udp
use_trunk_ports: True
os_sdn_network_plugin_name: cni
openshift_node_proxy_mode: userspace
openshift_openstack_kuryr_cni_image: registry.redhat.io/rhosp13/openstack-kuryr-cni:13.0
openshift_openstack_kuryr_controller_image: registry.redhat.io/rhosp13/openstack-kuryr-controller:13.0
kuryr_openstack_public_net_id: 148a8023-62a7-4672-b018-003462f8d7dc
enable_kuryr_cni_probes: False

* OSEv3.yml 
---
openshift_deployment_type: openshift-enterprise
openshift_release: v3.11
oreg_url: registry.redhat.io/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams: true
oreg_auth_user: REDACTED
oreg_auth_password: REDACTED
openshift_additional_registry_credentials: [{'host':'registry.connect.redhat.com','user':'REDACTED','password':'REDACTED','test_image':'mongodb/enterprise-operator:0.3.2'}]
openshift_master_identity_providers:
- name: 'htpasswd_auth'
  login: 'true'
  challenge: 'true'
  kind: 'HTPasswdPasswordIdentityProvider'
openshift_master_htpasswd_users:
  admin: 'REDACTED'
openshift_master_default_subdomain: "apps.{{ (openshift_openstack_clusterid|trim == '') | ternary(openshift_openstack_public_dns_domain, openshift_openstack_clusterid + '.' + openshift_openstack_public_dns_domain) }}"
openshift_master_cluster_public_hostname: "console.{{ (openshift_openstack_clusterid|trim == '') | ternary(openshift_openstack_public_dns_domain, openshift_openstack_clusterid + '.' + openshift_openstack_public_dns_domain) }}"
openshift_hosted_router_wait: True
openshift_hosted_registry_wait: True
openshift_cloudprovider_kind: openstack
openshift_cloudprovider_openstack_auth_url: "{{ lookup('env','OS_AUTH_URL') }}"
openshift_cloudprovider_openstack_username: "{{ lookup('env','OS_USERNAME') }}"
openshift_cloudprovider_openstack_password: "{{ lookup('env','OS_PASSWORD') }}"
openshift_cloudprovider_openstack_tenant_name: "{{ lookup('env','OS_PROJECT_NAME') }}"
openshift_cloudprovider_openstack_domain_name: "{{ lookup('env', 'OS_USER_DOMAIN_NAME') }}"
openshift_cloudprovider_openstack_region: "{{ lookup('env', 'OS_REGION_NAME') }}"
openshift_cloudprovider_openstack_blockstorage_version: v2
openshift_storageclass_parameters: {'fstype': 'xfs', 'availability': 'nova'}
openshift_cloudprovider_openstack_blockstorage_ignore_volume_az: true
openshift_hosted_registry_storage_kind: openstack
openshift_hosted_registry_storage_access_modes: ['ReadWriteOnce']
openshift_hosted_registry_storage_openstack_filesystem: xfs
openshift_hosted_registry_storage_volume_size: 50Gi
openshift_hosted_registry_storage_openstack_volumeID: 7b8a123e-4b13-4fa1-b86f-8cac4ca6a28b
networkPluginName: redhat/ovs-networkpolicy
openshift_hostname_check: false
openshift_disable_check: memory_availability
ansible_become: true
osm_use_cockpit: false
openshift_builddefaults_nodeselectors: {'node-role.kubernetes.io/infra':'true'}


* Nodes specific configuration:
host_vars/master-2.shiftstack.automated.lan.yml:openshift_use_crio_only: true
host_vars/master-2.shiftstack.automated.lan.yml:openshift_use_crio: true
host_vars/master-2.shiftstack.automated.lan.yml:openshift_openstack_master_group_name: node-config-master-crio
host_vars/master-2.shiftstack.automated.lan.yml:openshift_node_group_name: node-config-master-crio
host_vars/master-0.shiftstack.automated.lan.yml:openshift_use_crio_only: true
host_vars/master-0.shiftstack.automated.lan.yml:openshift_use_crio: true
host_vars/master-0.shiftstack.automated.lan.yml:openshift_openstack_master_group_name: node-config-master-crio
host_vars/master-0.shiftstack.automated.lan.yml:openshift_node_group_name: node-config-master-crio
host_vars/master-1.shiftstack.automated.lan.yml:openshift_use_crio_only: true
host_vars/master-1.shiftstack.automated.lan.yml:openshift_use_crio: true
host_vars/master-1.shiftstack.automated.lan.yml:openshift_openstack_master_group_name: node-config-master-crio
host_vars/master-1.shiftstack.automated.lan.yml:openshift_node_group_name: node-config-master-crio
host_vars/app-node-1.shiftstack.automated.lan.yml:openshift_use_crio_only: true
host_vars/app-node-1.shiftstack.automated.lan.yml:openshift_use_crio: true
host_vars/app-node-1.shiftstack.automated.lan.yml:openshift_openstack_master_group_name: node-config-compute-crio
host_vars/app-node-1.shiftstack.automated.lan.yml:openshift_node_group_name: node-config-compute-crio
host_vars/app-node-0.shiftstack.automated.lan.yml:openshift_use_crio_only: true
host_vars/app-node-0.shiftstack.automated.lan.yml:openshift_use_crio: true
host_vars/app-node-0.shiftstack.automated.lan.yml:openshift_openstack_master_group_name: node-config-compute-crio
host_vars/app-node-0.shiftstack.automated.lan.yml:openshift_node_group_name: node-config-compute-crio
host_vars/app-node-2.shiftstack.automated.lan.yml:openshift_use_crio_only: true
host_vars/app-node-2.shiftstack.automated.lan.yml:openshift_use_crio: true
host_vars/app-node-2.shiftstack.automated.lan.yml:openshift_openstack_master_group_name: node-config-compute-crio
host_vars/app-node-2.shiftstack.automated.lan.yml:openshift_node_group_name: node-config-compute-crio
host_vars/infra-node-0.shiftstack.automated.lan.yml:openshift_use_crio: false
host_vars/infra-node-2.shiftstack.automated.lan.yml:openshift_use_crio: false
host_vars/infra-node-1.shiftstack.automated.lan.yml:openshift_use_crio: false

Comment 1 Eduardo Minguez 2018-12-17 15:28:52 UTC
Some more information I forgot to add:

* Nodes

$ oc get nodes -o wide
NAME                                    STATUS    ROLES     AGE       VERSION           INTERNAL-IP   EXTERNAL-IP     OS-IMAGE                                      KERNEL-VERSION              CONTAINER-RUNTIME
app-node-0.shiftstack.automated.lan     Ready     compute   1h        v1.11.0+d4cacc0   10.240.0.6    10.19.115.122   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.10
app-node-1.shiftstack.automated.lan     Ready     compute   1h        v1.11.0+d4cacc0   10.240.0.15   10.19.115.123   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.10
app-node-2.shiftstack.automated.lan     Ready     compute   1h        v1.11.0+d4cacc0   10.240.0.22   10.19.115.117   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.10
infra-node-0.shiftstack.automated.lan   Ready     infra     1h        v1.11.0+d4cacc0   10.240.0.14   10.19.115.131   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   docker://1.13.1
infra-node-1.shiftstack.automated.lan   Ready     infra     1h        v1.11.0+d4cacc0   10.240.0.35   10.19.115.133   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   docker://1.13.1
infra-node-2.shiftstack.automated.lan   Ready     infra     1h        v1.11.0+d4cacc0   10.240.0.19   10.19.115.132   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   docker://1.13.1
master-0.shiftstack.automated.lan       Ready     master    1h        v1.11.0+d4cacc0   10.240.0.8    10.19.115.126   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.10
master-1.shiftstack.automated.lan       Ready     master    1h        v1.11.0+d4cacc0   10.240.0.7    10.19.115.125   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.10
master-2.shiftstack.automated.lan       Ready     master    1h        v1.11.0+d4cacc0   10.240.0.27   10.19.115.128   Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.10

* Pods

$ oc get pods -o wide --all-namespaces
NAMESPACE               NAME                                                   READY     STATUS              RESTARTS   AGE       IP            NODE                                    NOMINATED NODE
default                 docker-registry-1-qnpl5                                1/1       Running             0          1h        10.11.0.9     infra-node-2.shiftstack.automated.lan   <none>
default                 registry-console-4-deploy                              0/1       ContainerCreating   0          36m       <none>        master-0.shiftstack.automated.lan       <none>
default                 router-1-8fwzb                                         1/1       Running             0          1h        10.240.0.19   infra-node-2.shiftstack.automated.lan   <none>
default                 router-1-b6qvr                                         1/1       Running             0          1h        10.240.0.35   infra-node-1.shiftstack.automated.lan   <none>
default                 router-1-dj77t                                         1/1       Running             0          1h        10.240.0.14   infra-node-0.shiftstack.automated.lan   <none>
kube-system             master-api-master-0.shiftstack.automated.lan           1/1       Running             0          1h        10.240.0.8    master-0.shiftstack.automated.lan       <none>
kube-system             master-api-master-1.shiftstack.automated.lan           1/1       Running             0          1h        10.240.0.7    master-1.shiftstack.automated.lan       <none>
kube-system             master-api-master-2.shiftstack.automated.lan           1/1       Running             0          1h        10.240.0.27   master-2.shiftstack.automated.lan       <none>
kube-system             master-controllers-master-0.shiftstack.automated.lan   1/1       Running             2          1h        10.240.0.8    master-0.shiftstack.automated.lan       <none>
kube-system             master-controllers-master-1.shiftstack.automated.lan   1/1       Running             1          1h        10.240.0.7    master-1.shiftstack.automated.lan       <none>
kube-system             master-controllers-master-2.shiftstack.automated.lan   1/1       Running             1          1h        10.240.0.27   master-2.shiftstack.automated.lan       <none>
kube-system             master-etcd-master-0.shiftstack.automated.lan          1/1       Running             2          1h        10.240.0.8    master-0.shiftstack.automated.lan       <none>
kube-system             master-etcd-master-1.shiftstack.automated.lan          1/1       Running             2          1h        10.240.0.7    master-1.shiftstack.automated.lan       <none>
kube-system             master-etcd-master-2.shiftstack.automated.lan          1/1       Running             1          1h        10.240.0.27   master-2.shiftstack.automated.lan       <none>
openshift-infra         kuryr-cni-ds-446l9                                     2/2       Running             0          51m       10.240.0.27   master-2.shiftstack.automated.lan       <none>
openshift-infra         kuryr-cni-ds-5xbzd                                     2/2       Running             0          51m       10.240.0.19   infra-node-2.shiftstack.automated.lan   <none>
openshift-infra         kuryr-cni-ds-hfbvk                                     2/2       Running             0          51m       10.240.0.35   infra-node-1.shiftstack.automated.lan   <none>
openshift-infra         kuryr-cni-ds-k9wqd                                     2/2       Running             0          51m       10.240.0.15   app-node-1.shiftstack.automated.lan     <none>
openshift-infra         kuryr-cni-ds-lgmjr                                     2/2       Running             0          51m       10.240.0.22   app-node-2.shiftstack.automated.lan     <none>
openshift-infra         kuryr-cni-ds-mmpr4                                     2/2       Running             0          51m       10.240.0.14   infra-node-0.shiftstack.automated.lan   <none>
openshift-infra         kuryr-cni-ds-qphqm                                     2/2       Running             0          51m       10.240.0.8    master-0.shiftstack.automated.lan       <none>
openshift-infra         kuryr-cni-ds-r5bb9                                     2/2       Running             0          51m       10.240.0.7    master-1.shiftstack.automated.lan       <none>
openshift-infra         kuryr-cni-ds-vpvlz                                     2/2       Running             0          51m       10.240.0.6    app-node-0.shiftstack.automated.lan     <none>
openshift-infra         kuryr-controller-5945bd8bf4-tsxvg                      1/1       Running             0          38m       10.240.0.35   infra-node-1.shiftstack.automated.lan   <none>
openshift-monitoring    alertmanager-main-0                                    3/3       Running             0          1h        10.11.0.13    infra-node-2.shiftstack.automated.lan   <none>
openshift-monitoring    alertmanager-main-1                                    3/3       Running             0          1h        10.11.0.28    infra-node-1.shiftstack.automated.lan   <none>
openshift-monitoring    alertmanager-main-2                                    3/3       Running             0          1h        10.11.0.3     infra-node-0.shiftstack.automated.lan   <none>
openshift-monitoring    cluster-monitoring-operator-6f5fbd6f8b-g2gwf           1/1       Running             0          1h        10.11.0.26    infra-node-0.shiftstack.automated.lan   <none>
openshift-monitoring    grafana-857fc848bf-t4gnk                               2/2       Running             0          1h        10.11.0.16    infra-node-2.shiftstack.automated.lan   <none>
openshift-monitoring    kube-state-metrics-79b579544-9p4d2                     3/3       Running             0          1h        10.11.0.7     infra-node-2.shiftstack.automated.lan   <none>
openshift-monitoring    node-exporter-28pk6                                    2/2       Running             0          1h        10.240.0.22   app-node-2.shiftstack.automated.lan     <none>
openshift-monitoring    node-exporter-2frm2                                    2/2       Running             0          1h        10.240.0.27   master-2.shiftstack.automated.lan       <none>
openshift-monitoring    node-exporter-49v2p                                    2/2       Running             0          1h        10.240.0.6    app-node-0.shiftstack.automated.lan     <none>
openshift-monitoring    node-exporter-5m7rf                                    2/2       Running             0          1h        10.240.0.19   infra-node-2.shiftstack.automated.lan   <none>
openshift-monitoring    node-exporter-7k4v7                                    2/2       Running             0          1h        10.240.0.35   infra-node-1.shiftstack.automated.lan   <none>
openshift-monitoring    node-exporter-7nbbn                                    2/2       Running             0          1h        10.240.0.14   infra-node-0.shiftstack.automated.lan   <none>
openshift-monitoring    node-exporter-dzp4n                                    2/2       Running             0          1h        10.240.0.15   app-node-1.shiftstack.automated.lan     <none>
openshift-monitoring    node-exporter-gws25                                    2/2       Running             0          1h        10.240.0.7    master-1.shiftstack.automated.lan       <none>
openshift-monitoring    node-exporter-q9zmg                                    2/2       Running             0          1h        10.240.0.8    master-0.shiftstack.automated.lan       <none>
openshift-monitoring    prometheus-k8s-0                                       4/4       Running             1          1h        10.11.0.23    infra-node-0.shiftstack.automated.lan   <none>
openshift-monitoring    prometheus-k8s-1                                       4/4       Running             1          1h        10.11.0.59    infra-node-1.shiftstack.automated.lan   <none>
openshift-monitoring    prometheus-operator-7855c8646b-n8zvr                   1/1       Running             0          1h        10.11.0.11    infra-node-0.shiftstack.automated.lan   <none>
openshift-node          sync-8nbzn                                             1/1       Running             0          1h        10.240.0.14   infra-node-0.shiftstack.automated.lan   <none>
openshift-node          sync-bxd4k                                             1/1       Running             0          1h        10.240.0.35   infra-node-1.shiftstack.automated.lan   <none>
openshift-node          sync-bxvg2                                             1/1       Running             0          1h        10.240.0.19   infra-node-2.shiftstack.automated.lan   <none>
openshift-node          sync-cpfdp                                             1/1       Running             0          1h        10.240.0.27   master-2.shiftstack.automated.lan       <none>
openshift-node          sync-fh76x                                             1/1       Running             0          1h        10.240.0.22   app-node-2.shiftstack.automated.lan     <none>
openshift-node          sync-gb2sw                                             1/1       Running             0          1h        10.240.0.15   app-node-1.shiftstack.automated.lan     <none>
openshift-node          sync-hvt7l                                             1/1       Running             0          1h        10.240.0.8    master-0.shiftstack.automated.lan       <none>
openshift-node          sync-zclxb                                             1/1       Running             0          1h        10.240.0.7    master-1.shiftstack.automated.lan       <none>
openshift-node          sync-zdtg5                                             1/1       Running             0          1h        10.240.0.6    app-node-0.shiftstack.automated.lan     <none>
openshift-web-console   webconsole-7f7f679596-cht8c                            0/1       ContainerCreating   0          1h        <none>        master-0.shiftstack.automated.lan       <none>
openshift-web-console   webconsole-7f7f679596-cnwkq                            0/1       ContainerCreating   0          1h        <none>        master-1.shiftstack.automated.lan       <none>
openshift-web-console   webconsole-7f7f679596-sbgbs                            0/1       ContainerCreating   0          1h        <none>        master-2.shiftstack.automated.lan       <none>
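
The two listings above can be cross-referenced to check whether every stuck pod sits on a cri-o node. The following is a minimal sketch using a handful of rows copied from that output; the names `NODE_RUNTIME`, `PODS` and `stuck_pods_by_runtime` are illustrative and not part of any OpenShift tooling.

```python
# Node name -> container runtime, copied from `oc get nodes -o wide` above.
NODE_RUNTIME = {
    "master-0.shiftstack.automated.lan": "cri-o://1.11.10",
    "master-1.shiftstack.automated.lan": "cri-o://1.11.10",
    "master-2.shiftstack.automated.lan": "cri-o://1.11.10",
    "infra-node-2.shiftstack.automated.lan": "docker://1.13.1",
}

# (pod, status, node) rows copied from `oc get pods -o wide --all-namespaces` above.
PODS = [
    ("docker-registry-1-qnpl5", "Running", "infra-node-2.shiftstack.automated.lan"),
    ("grafana-857fc848bf-t4gnk", "Running", "infra-node-2.shiftstack.automated.lan"),
    ("registry-console-4-deploy", "ContainerCreating", "master-0.shiftstack.automated.lan"),
    ("webconsole-7f7f679596-cht8c", "ContainerCreating", "master-0.shiftstack.automated.lan"),
    ("webconsole-7f7f679596-cnwkq", "ContainerCreating", "master-1.shiftstack.automated.lan"),
    ("webconsole-7f7f679596-sbgbs", "ContainerCreating", "master-2.shiftstack.automated.lan"),
]


def stuck_pods_by_runtime(pods, node_runtime):
    """Count pods that are not Running, grouped by the runtime of their node."""
    counts = {}
    for _name, status, node in pods:
        if status != "Running":
            # "cri-o://1.11.10" -> "cri-o"
            runtime = node_runtime[node].split("://")[0]
            counts[runtime] = counts.get(runtime, 0) + 1
    return counts


print(stuck_pods_by_runtime(PODS, NODE_RUNTIME))
```

With these rows, all four pods stuck in ContainerCreating land under cri-o and none under docker. The sketch ignores pods with hostNetwork: true, which run fine on cri-o nodes because they do not go through the CNI plugin.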


* Docker based nodes are running containers ok

openshift-monitoring    grafana-857fc848bf-t4gnk                               2/2       Running             0          1h        10.11.0.16    infra-node-2.shiftstack.automated.lan   <none>
openshift-monitoring    kube-state-metrics-79b579544-9p4d2                     3/3       Running             0          59m       10.11.0.7     infra-node-2.shiftstack.automated.lan   <none>


* Pods with network host capabilities (hostNetwork: true) are running ok

openshift-node          sync-cpfdp                                             1/1       Running             0          1h        10.240.0.27   master-2.shiftstack.automated.lan       <none>



Events:

* openshift-web-console namespace:
...
1h          1h           1         webconsole-7f7f679596-cht8c.15712404d53ba758   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-0.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cht8c_openshift-web-console_1675d0ac-0205-11e9-aac7-fa163ea5fa78_0(197c663c7f344f7250a42cf01ef0f67f48aa996a4f0e75e02ba476960e7b5c04): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
1h          1h           1         webconsole-7f7f679596-sbgbs.157124075ffd9840   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-2.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-sbgbs_openshift-web-console_167e1195-0205-11e9-aac7-fa163ea5fa78_0(5bf89f3bf42e5fe02f35c4c381e4c8ec2d46d11495314131d5356751fe62bee4): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
1h          1h           1         webconsole-7f7f679596-cnwkq.15712408882f19f9   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-1.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cnwkq_openshift-web-console_15b390fd-0205-11e9-aac7-fa163ea5fa78_0(2d7dcb3e96b9430f4bc9c83949a6e032a3b55a0c241401f66683e36b64cbc0ef): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
1h          1h           1         webconsole-7f7f679596-cht8c.157124092d814c29   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-0.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cht8c_openshift-web-console_1675d0ac-0205-11e9-aac7-fa163ea5fa78_0(210e6e2eb640a6155d5e16ddbf3adf9ae78606f24aabe71b0cde90cc4b05dd0d): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
1h          1h           1         webconsole-7f7f679596-cnwkq.1571240bf0674904   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-1.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cnwkq_openshift-web-console_15b390fd-0205-11e9-aac7-fa163ea5fa78_0(895e313509e5662287dbdcc3020f3edce1b9353282eb5c7e953c20fcafccc8cc): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
1h          1h           1         webconsole-7f7f679596-cht8c.157124113c8dd39d   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-0.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cht8c_openshift-web-console_1675d0ac-0205-11e9-aac7-fa163ea5fa78_0(4a7cc2a9b4c3a0369635842ba285531b6559ffbcf6f88aeed91cc973a551c12f): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
12s         1h           256       webconsole-7f7f679596-sbgbs.1571240a6aeff174   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-2.shiftstack.automated.lan   (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-sbgbs_openshift-web-console_167e1195-0205-11e9-aac7-fa163ea5fa78_0(982573aab6256f40d5c880c839bf23800b54de3c4bc64f4cb639b44f648bd6f9): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
12s         1h           276       webconsole-7f7f679596-cht8c.157124141386a91f   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-0.shiftstack.automated.lan   (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cht8c_openshift-web-console_1675d0ac-0205-11e9-aac7-fa163ea5fa78_0(5ce1658826ffe2ad6992dffdbd785a3e4caef47cedf82576fc8db967ca895c97): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
10s         1h           253       webconsole-7f7f679596-cnwkq.157124100a65c271   Pod                      Warning   FailedCreatePodSandBox   kubelet, master-1.shiftstack.automated.lan   (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webconsole-7f7f679596-cnwkq_openshift-web-console_15b390fd-0205-11e9-aac7-fa163ea5fa78_0(86fcadf26d981bd2bcffb607296492b39469361f0c8afe108055a512cc75f355): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
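
The empty string in `diagnostic message ""` suggests that the CNI plugin exited with a failure without writing anything to stdout, so the runtime had no JSON error object to decode. Below is a minimal Python sketch of that decoding step. The helper name `parse_plugin_error` is hypothetical; the real logic lives in the Go CNI library used by the runtime and differs in detail.

```python
import json


def parse_plugin_error(stdout: str) -> str:
    """Turn a failed plugin's stdout into an error message.

    Hypothetical helper for illustration only, not the actual CNI library code.
    """
    try:
        err = json.loads(stdout)
    except json.JSONDecodeError as exc:
        # Empty stdout cannot be decoded, so only the parse failure is reported
        # and the plugin's real reason for failing is lost.
        return (
            "netplugin failed but error parsing its diagnostic message "
            f"{stdout!r}: {exc}"
        )
    # CNI error objects carry a numeric "code" and a human-readable "msg".
    return f"{err.get('msg', 'unknown error')} (code {err.get('code')})"


# Plugin exited without printing anything: the caller cannot say why.
print(parse_plugin_error(""))
# Plugin printed a CNI-style error object: the message is surfaced.
print(parse_plugin_error('{"cniVersion": "0.3.1", "code": 100, "msg": "timed out"}'))
```

In other words, the event text describes a symptom rather than the cause; the actual failure has to be looked for in the kuryr-cni plugin and daemon logs on the affected node.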


Things I've tried in order to fix it:

* Restarted the kuryr-related pods just in case
* Removed the health checks from the kuryr-controller deployment and rolled out a new deployment
* Tried a new rollout of a failed deployment (the registry-console)

Logs:

* kuryr-cni-ds-qphqm (master-0 pod)

+ cleanup
+ rm -f /etc/cni/net.d/10-kuryr.conf
+ rm -f /opt/cni/bin/kuryr-cni
+ deploy
+ POD_NAMESPACE=openshift-infra
+ cat
+ cp /kuryr-cni /opt/cni/bin/kuryr-cni
+ chmod +x /opt/cni/bin/kuryr-cni
+ cp /opt/kuryr-kubernetes/etc/cni/net.d/10-kuryr.conf /etc/cni/net.d
+ '[' True == True ']'
+ kuryr-daemon --config-file /etc/kuryr/kuryr.conf
2018-12-17 14:29:27.548 14 INFO kuryr_kubernetes.config [-] Logging enabled!
2018-12-17 14:29:27.549 14 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-daemon version 0.4.5
2018-12-17 14:29:27.816 14 INFO os_vif [-] Loaded VIF plugins: noop, ovs, linux_bridge
2018-12-17 14:29:27.866 30 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan'
2018-12-17 14:29:27.872 31 INFO kuryr_kubernetes.cni.daemon.service [-] Starting server on 127.0.0.1:50036.
2018-12-17 14:29:27.889 31 INFO werkzeug [-]  * Running on http://127.0.0.1:50036/
2018-12-17 14:30:17.930 30 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:30:20.936 30 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan'
...
2018-12-17 15:01:05.582 30 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan'
2018-12-17 15:01:48.648 30 WARNING kuryr_kubernetes.watcher [-] Restarting(1) watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan': {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
: K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
...
2018-12-17 15:24:04.923 30 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 15:24:07.929 30 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan'
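
The excerpt above shows two different reasons for the watcher restarting: a dropped connection and a 401 from the API server. A quick way to tally them over a longer log is sketched below. The sample lines are copied from the excerpt (with wrapped lines rejoined), and the function name and cause labels are made up for this example.

```python
from collections import Counter

# Lines copied from the kuryr-cni-ds-qphqm log above.
LOG_LINES = [
    "2018-12-17 14:30:17.930 30 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))",
    "2018-12-17 14:30:20.936 30 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan'",
    "2018-12-17 15:01:48.648 30 WARNING kuryr_kubernetes.watcher [-] Restarting(1) watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan': {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Unauthorized\",\"reason\":\"Unauthorized\",\"code\":401}",
    "2018-12-17 15:24:04.923 30 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=master-0.shiftstack.automated.lan': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))",
]


def restart_cause(line):
    """Return a coarse label for a watcher restart WARNING, or None for other lines."""
    if "WARNING" not in line or "Restarting(" not in line:
        return None
    if '"code":401' in line:
        return "401 Unauthorized"
    if "IncompleteRead" in line:
        return "connection broken"
    return "other"


causes = Counter(c for c in map(restart_cause, LOG_LINES) if c)
print(causes)
```

On these sample lines the tally is two dropped connections and one 401. Whether either of them is related to the sandbox creation failure is not established by the log excerpt alone.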


* kuryr-controller-5945bd8bf4-tsxvg  (kuryr-controller)

2018-12-17 14:42:42.021 1 INFO kuryr_kubernetes.config [-] Logging enabled!
2018-12-17 14:42:42.022 1 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-k8s-controller version 0.4.5
2018-12-17 14:42:44.433 1 INFO os_vif [-] Loaded VIF plugins: noop, ovs, linux_bridge
2018-12-17 14:42:44.694 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopped
2018-12-17 14:42:44.719 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' starting
2018-12-17 14:42:44.721 1 INFO kuryr_kubernetes.controller.managers.health [-] Starting health check server.
2018-12-17 14:42:44.725 1 INFO werkzeug [-]  * Running on http://localhost:8082/
2018-12-17 14:42:44.742 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2018-12-17 14:42:44.750 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2018-12-17 14:42:44.756 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2018-12-17 14:42:49.634 1 INFO kuryr_kubernetes.controller.drivers.vif_pool [-] PORTS POOL: pools updated with pre-created ports
2018-12-17 14:43:34.833 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/endpoints': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:43:35.495 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:43:37.765 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/services': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:43:37.842 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2018-12-17 14:43:38.496 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2018-12-17 14:43:40.771 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2018-12-17 14:44:28.607 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/endpoints': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:44:30.812 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/services': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:44:31.613 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2018-12-17 14:44:33.814 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2018-12-17 14:44:56.826 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:44:59.833 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2018-12-17 14:45:21.739 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/endpoints': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 14:45:23.860 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/services': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
...
2018-12-17 15:01:48.575 1 WARNING kuryr_kubernetes.watcher [-] Restarting(1) watching '/api/v1/pods': {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
: K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
2018-12-17 15:01:48.586 1 WARNING kuryr_kubernetes.watcher [-] Restarting(1) watching '/api/v1/services': {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
: K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
2018-12-17 15:01:48.590 1 WARNING kuryr_kubernetes.watcher [-] Restarting(1) watching '/api/v1/endpoints': {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
: K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
2018-12-17 15:01:54.591 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2018-12-17 15:01:57.579 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2018-12-17 15:01:57.589 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
...
2018-12-17 15:24:35.146 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/endpoints'
2018-12-17 15:24:46.761 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/services': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 15:24:49.766 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
2018-12-17 15:25:03.318 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 15:25:06.321 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods'
2018-12-17 15:25:26.280 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/endpoints': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Comment 2 Eduardo Minguez 2018-12-18 09:39:35 UTC
After discussing this BZ with Luis Tomas, he suggested that I try the kuryr/{cni/controller}:latest images instead and enable the health checks.
So I tore the environment down and recreated it with the following changes to the variables:

all.yml
---
enable_kuryr_controller_probes: True
enable_kuryr_cni_probes: True
# Commented out the images to use latest.
#openshift_openstack_kuryr_cni_image: registry.redhat.io/rhosp13/openstack-kuryr-cni:13.0
#openshift_openstack_kuryr_controller_image: registry.redhat.io/rhosp13/openstack-kuryr-controller:13.0
---

After a while, the installation failed. The kuryr-cni pods are running fine, but the controller never becomes ready (0/1):

---
$ oc get pods
NAME                               READY     STATUS    RESTARTS   AGE
kuryr-cni-ds-7mfss                 2/2       Running   0          14h
kuryr-cni-ds-cn7lv                 2/2       Running   0          14h
kuryr-cni-ds-cslnx                 2/2       Running   0          14h
kuryr-cni-ds-dd6fw                 2/2       Running   0          14h
kuryr-cni-ds-fpdcc                 2/2       Running   0          14h
kuryr-cni-ds-k46mg                 2/2       Running   0          14h
kuryr-cni-ds-nttkw                 2/2       Running   0          14h
kuryr-cni-ds-svfkz                 2/2       Running   0          14h
kuryr-cni-ds-vkpkl                 2/2       Running   0          14h
kuryr-controller-897b579df-txsqp   0/1       Running   1          14h
---
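
For anyone triaging a similar listing, here is a minimal sketch of picking out the pods whose READY column is not n/n. It assumes the default `oc get pods` column layout shown above (NAME first, READY second); the function name `not_ready_pods` is hypothetical and not part of any OpenShift tooling.

```python
def not_ready_pods(listing):
    """Return (name, ready, total) for pods whose READY column is below n/n."""
    result = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Ignore blank or malformed rows that lack a "ready/total" column.
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        ready, total = (int(part) for part in fields[1].split("/"))
        if ready < total:
            result.append((fields[0], ready, total))
    return result


# Two rows copied from the listing above.
listing = """\
NAME                               READY     STATUS    RESTARTS   AGE
kuryr-cni-ds-7mfss                 2/2       Running   0          14h
kuryr-controller-897b579df-txsqp   0/1       Running   1          14h
"""

for name, ready, total in not_ready_pods(listing):
    print("%s is not ready (%d/%d)" % (name, ready, total))
```

A pod in this state is Running but failing its readiness probe, which is consistent with the controller probes having been enabled in all.yml.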

I then tried to create a new project and pod as follows:
---
[openshift@master-0 ~]$ oc new-app kuryr/demo
warning: Cannot find git. Ensure that it is installed and in your path. Git is required to work with git repositories.
--> Found Docker image f4d576a (8 weeks old) from Docker Hub for "kuryr/demo"

    * An image stream tag will be created as "demo:latest" that will track this image
    * This image will be deployed in deployment config "demo"
    * Port 8080/tcp will be load balanced by service "demo"
      * Other containers can access this service through the hostname "demo"

--> Creating resources ...
    imagestream.image.openshift.io "demo" created
    deploymentconfig.apps.openshift.io "demo" created
    service "demo" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/demo' 
    Run 'oc status' to view your app.
[openshift@master-0 ~]$ oc get pods
NAME            READY     STATUS              RESTARTS   AGE
demo-1-deploy   0/1       ContainerCreating   0          2s
[openshift@master-0 ~]$ oc get events
LAST SEEN   FIRST SEEN   COUNT     NAME                             KIND               SUBOBJECT   TYPE      REASON                   SOURCE                                         MESSAGE
11s         11s          1         demo.1571637797197fc2            DeploymentConfig               Normal    DeploymentCreated        deploymentconfig-controller                    Created new replication controller "demo-1" for version 1
9s          9s           1         demo-1-deploy.1571637816cff910   Pod                            Normal    Scheduled                default-scheduler                              Successfully assigned asdf/demo-1-deploy to app-node-1.shiftstack.automated.lan
7s          7s           1         demo-1-deploy.157163789f2bac2c   Pod                            Warning   FailedCreatePodSandBox   kubelet, app-node-1.shiftstack.automated.lan   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_demo-1-deploy_asdf_c17f620a-02a7-11e9-9814-fa163ea73c28_0(90f8d7da47b74ef7845568661b634a4b3b27fe179bcf4266a9952b1fc9d6432c): netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
---


kuryr-cni container logs from app-node-1:

---
2018-12-18 09:33:13.917 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:13] "GET /alive HTTP/1.1" 200 -
2018-12-18 09:33:17.347 34 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2018-12-18 09:33:17.349 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:17] "GET /ready HTTP/1.1" 200 -
2018-12-18 09:33:23.920 34 WARNING pyroute2.ipdb.main [-] shutdown in progress
2018-12-18 09:33:23.923 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:23] "GET /alive HTTP/1.1" 200 -
2018-12-18 09:33:27.347 34 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2018-12-18 09:33:27.349 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:27] "GET /ready HTTP/1.1" 200 -
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher [-] Caught exception while watching.: ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher Traceback (most recent call last):
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/watcher.py", line 186, in _watch
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher     for event in self._client.watch(path):
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/k8s_client.py", line 199, in watch
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher     for line in response.iter_lines():
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/requests/models.py", line 794, in iter_lines
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher     for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/requests/models.py", line 753, in generate
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher     raise ChunkedEncodingError(e)
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-18 09:33:29.853 27 ERROR kuryr_kubernetes.watcher 
2018-12-18 09:33:29.857 27 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=app-node-1.shiftstack.automated.lan'.: ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-18 09:33:32.861 27 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=app-node-1.shiftstack.automated.lan'
2018-12-18 09:33:33.919 34 WARNING pyroute2.ipdb.main [-] shutdown in progress
2018-12-18 09:33:33.922 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:33] "GET /alive HTTP/1.1" 200 -
2018-12-18 09:33:37.338 34 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2018-12-18 09:33:37.340 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:37] "GET /ready HTTP/1.1" 200 -
2018-12-18 09:33:43.917 34 WARNING pyroute2.ipdb.main [-] shutdown in progress
2018-12-18 09:33:43.919 34 INFO werkzeug [-] 10.240.0.6 - - [18/Dec/2018 09:33:43] "GET /alive HTTP/1.1" 200 -
---

I've attached the 'oc logs kuryr-controller-897b579df-txsqp' output.
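
To separate the two kinds of watcher restarts in these logs (the 401 Unauthorized responses and the broken chunked connections), a small tally script can help. This is only a sketch: it assumes each log record is on a single line in the format shown above, and the names `RESTART_RE`, `classify` and `tally_restarts` are hypothetical helpers, not part of kuryr-kubernetes.

```python
import re
from collections import Counter

# Matches the watcher restart warnings, capturing the restart counter
# and the watched API path between the single quotes.
RESTART_RE = re.compile(
    r"WARNING kuryr_kubernetes\.watcher \[-\] Restarting\((\d+)\) watching '([^']+)'"
)


def classify(line):
    """Return a coarse reason label for one restart warning line."""
    if "Unauthorized" in line:
        return "unauthorized"
    if "ChunkedEncodingError" in line or "IncompleteRead" in line:
        return "connection_broken"
    return "other"


def tally_restarts(lines):
    """Count watcher restarts per (watched path, reason)."""
    counts = Counter()
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            counts[(match.group(2), classify(line))] += 1
    return counts


# Three records copied from the controller log in this report.
sample = """\
2018-12-17 15:01:48.575 1 WARNING kuryr_kubernetes.watcher [-] Restarting(1) watching '/api/v1/pods': {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
2018-12-17 15:24:46.761 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/services': ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)): ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2018-12-17 15:24:49.766 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/services'
"""

for (path, reason), count in sorted(tally_restarts(sample.splitlines()).items()):
    print(path, reason, count)
```

Running it over the full `oc logs` output for the cni and controller pods would show whether the restarts are dominated by dropped connections or by authorization failures.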

Comment 3 Eduardo Minguez 2018-12-18 09:40:26 UTC
Created attachment 1515271 [details]
controller logs

Comment 4 Michał Dulko 2019-08-01 16:15:05 UTC
This is related to OpenStack-based Kuryr images that we don't really use anymore in 4.2, so moving this to 3.11z.

Comment 5 Michał Dulko 2019-08-01 17:02:33 UTC
And moving this to OpenStack.

Comment 8 Michał Dulko 2019-08-01 17:14:34 UTC
This is now fixed in the repo that the containers are built from, and the fix should be available in the next build.