Created attachment 1507144
The logs of destroying the bootstrap

Description of problem:
The master IP and route are deleted after destroying the bootstrap.

Version-Release number of the following components:
[jzhang@dhcp-140-18 installer]$ openshift-install version
openshift-install v0.3.0-243-g3d0ba6a0b0ef539970b5d8ae5542411b0bcb34b8
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

How reproducible:
always

Steps to Reproduce:
1. Create the OCP 4.0 cluster by following this doc: https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md
2. Set the environment variables:
[jzhang@dhcp-140-18 installer]$ cat env.sh
export OPENSHIFT_INSTALL_PLATFORM=libvirt
export OPENSHIFT_INSTALL_BASE_DOMAIN=tt.testing
export OPENSHIFT_INSTALL_CLUSTER_NAME=jian
export OPENSHIFT_INSTALL_PULL_SECRET_PATH=`pwd`/config.json
export OPENSHIFT_INSTALL_LIBVIRT_URI=qemu+tcp://192.168.122.1/system
export OPENSHIFT_INSTALL_EMAIL_ADDRESS=jiazha
export OPENSHIFT_INSTALL_PASSWORD=redhat
export OPENSHIFT_INSTALL_SSH_PUB_KEY_PATH=/home/jzhang/.ssh/id_rsa.pub
3. Create the cluster:
[jzhang@dhcp-140-18 installer]$ openshift-install create cluster --dir=jian --log-level=debug

Actual results:
[jzhang@dhcp-140-18 installer]$ sudo virsh list --all
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------
 19    jian-master-0                  running
 20    jian-worker-0-fqmrd            running

[jzhang@dhcp-140-18 installer]$ oc get ns
Unable to connect to the server: dial tcp 192.168.126.11:6443: connect: no route to host

[jzhang@dhcp-140-18 installer]$ virsh -c "${OPENSHIFT_INSTALL_LIBVIRT_URI}" domifaddr "${OPENSHIFT_INSTALL_CLUSTER_NAME}-master-0"
setlocale: No such file or directory
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------

Expected results:
The cluster can be accessed successfully.

Additional info:
[jzhang@dhcp-140-18 installer]$ sudo cat /var/lib/libvirt/dnsmasq/*
##WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
##OVERWRITTEN AND LOST. Changes to this configuration should be made using:
## virsh net-edit default
## or other application using the libvirt API.
##
## dnsmasq conf file created by libvirt
strict-order
pid-file=/var/run/libvirt/network/default.pid
except-interface=lo
bind-dynamic
interface=virbr0
dhcp-range=192.168.122.2,192.168.122.254
dhcp-no-override
dhcp-authoritative
dhcp-lease-max=253
dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile
addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts
192.168.126.11 jian-api jian-etcd-0
192.168.126.50 jian
##WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
##OVERWRITTEN AND LOST. Changes to this configuration should be made using:
## virsh net-edit jian
## or other application using the libvirt API.
##
## dnsmasq conf file created by libvirt
strict-order
local=/tt.testing/
domain=tt.testing
expand-hosts
pid-file=/var/run/libvirt/network/jian.pid
except-interface=lo
bind-dynamic
interface=tt0
srv-host=_etcd-server-ssl._tcp.jian.tt.testing,jian-etcd-0.tt.testing,2380,0,10
addn-hosts=/var/lib/libvirt/dnsmasq/jian.addnhosts

You need to update your libvirt provider to the latest version - specifically, you need https://github.com/dmacvicar/terraform-provider-libvirt/pull/469.
Alex,
Thanks, I updated it.

The old version:
[jzhang@dhcp-140-18 plugins]$ ./terraform-provider-libvirt -version
./terraform-provider-libvirt was not built correctly
Compiled against library: libvirt 3.7.0
Using library: libvirt 4.1.0
Running hypervisor: QEMU 2.11.2
Running against daemon: 4.1.0

The new version:
[jzhang@dhcp-140-18 plugins]$ ./terraform-provider-libvirt -version
./terraform-provider-libvirt was not built correctly
Compiled against library: libvirt 4.1.0
Using library: libvirt 4.1.0
Running hypervisor: QEMU 2.11.2
Running against daemon: 4.1.0

I also updated openshift-installer, but now I get a different failure: the worker did not come up.
[jzhang@dhcp-140-18 installer]$ openshift-install version
openshift-install v0.3.0-250-g30bb25ac57d7c7d3dae519186cbfca9af8aeaca2
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

[jzhang@dhcp-140-18 installer]$ sudo virsh list --all
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------
 21    demo2-bootstrap                running
 22    demo2-master-0                 running

[jzhang@dhcp-140-18 installer]$ oc get pods --all-namespaces
NAMESPACE                   NAME                                       READY   STATUS    RESTARTS   AGE
openshift-cluster-version   cluster-version-operator-8bb6cff75-492km   0/1     Pending   0          45m

[jzhang@dhcp-140-18 installer]$ oc get nodes
No resources found.

[jzhang@dhcp-140-18 installer]$ oc get all -n openshift-cluster-api
No resources found.

I can get the master's IP address successfully, but still cannot resolve its hostname.
[jzhang@dhcp-140-18 installer]$ virsh -c "${OPENSHIFT_INSTALL_LIBVIRT_URI}" domifaddr "${OPENSHIFT_INSTALL_CLUSTER_NAME}-master-0"
setlocale: No such file or directory
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet1      32:b0:b7:03:99:af    ipv4         192.168.126.11/24

[jzhang@dhcp-140-18 installer]$ ssh core
ssh: Could not resolve hostname demo2-master-0-tt.testing: Name or service not known

I accessed the master by IP and checked the kubelet log, which shows the errors below. What do you suggest?
[jzhang@dhcp-140-18 installer]$ ssh core.126.11
[core@demo2-master-0 ~]$ journalctl -f -u kubelet
...
Nov 20 04:27:43 demo2-master-0 hyperkube[5132]: I1120 04:27:43.395217 5132 kubelet_node_status.go:79] Attempting to register node demo2-master-0
Nov 20 04:27:43 demo2-master-0 hyperkube[5132]: E1120 04:27:43.396444 5132 kubelet_node_status.go:103] Unable to register node "demo2-master-0" with API server: nodes is forbidden: User "system:anonymous" cannot create nodes at the cluster scope: no RBAC policy matched
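
While the hostname does not resolve, the address reported by `virsh domifaddr` can be used directly. The following is only a minimal sketch that assumes the column layout shown in the output above; the helper name `first_ipv4` is hypothetical and is not part of virsh or the installer.

```shell
# Hypothetical helper: read `virsh domifaddr` output on stdin and print the
# first IPv4 address it lists, with the /prefix length stripped.
first_ipv4() {
    awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Usage (assumes the variables from env.sh are exported and that the
# login user on the node is "core", as in the shell prompt above):
# ip=$(virsh -c "${OPENSHIFT_INSTALL_LIBVIRT_URI}" domifaddr \
#         "${OPENSHIFT_INSTALL_CLUSTER_NAME}-master-0" | first_ipv4)
# ssh "core@${ip}"
```

If the domain has no lease yet, as in the original report, the helper prints nothing, so the result should be checked for emptiness before it is passed to ssh.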
The DNS issue can be resolved by following the libvirt setup guide in the installer. Can you also try following the troubleshooting guide? That should help highlight which component is failing.
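
For context, the DNS step in that guide amounts to forwarding queries for the cluster's base domain to the DNS server on the cluster's libvirt network. The sketch below only builds the dnsmasq `server=` rule. The address 192.168.126.1 and the NetworkManager drop-in path in the commented usage are assumptions inferred from the 192.168.126.0/24 network in this report, and the helper name is hypothetical, so check the values against the guide before using them.

```shell
# Hypothetical helper: print a dnsmasq rule that forwards all queries for
# domain $1 to the DNS server at address $2.
dnsmasq_forward_rule() {
    printf 'server=/%s/%s\n' "$1" "$2"
}

# Usage (assumed values; requires NetworkManager configured with dns=dnsmasq):
# dnsmasq_forward_rule tt.testing 192.168.126.1 |
#     sudo tee /etc/NetworkManager/dnsmasq.d/openshift.conf
# sudo systemctl restart NetworkManager
```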
Sorry, I didn't find the solution in the troubleshooting section. The issue I hit is that the worker node did NOT come up. Is it a DNS issue? Can you tell me how to debug it?

I checked the log of the kubelet running on the master node and got these errors:
Nov 20 04:27:43 demo2-master-0 hyperkube[5132]: I1120 04:27:43.395217 5132 kubelet_node_status.go:79] Attempting to register node demo2-master-0
Nov 20 04:27:43 demo2-master-0 hyperkube[5132]: E1120 04:27:43.396444 5132 kubelet_node_status.go:103] Unable to register node "demo2-master-0" with API server: nodes is forbidden: User "system:anonymous" cannot create nodes at the cluster scope: no RBAC policy matched
> Sorry, I didn't find the solution in the troubleshooting section. The issue I met is the worker node did NOT up.

The section for that is [1]. It suggests looking at the logs for the clusterapi-manager-controllers pod. What did you see when you looked at those logs?

[1]: https://github.com/openshift/installer/blob/v0.4.0/docs/user/troubleshooting.md#no-worker-nodes-created
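
One way to get at those logs is sketched below. It assumes the pod runs in the openshift-cluster-api namespace queried earlier in this report and that its name contains "clusterapi-manager-controllers"; the helper name is hypothetical, and the exact commands in [1] may differ.

```shell
# Hypothetical helper: from a list of pod names on stdin, print the first
# one whose name contains "clusterapi-manager-controllers".
find_controllers_pod() {
    awk '/clusterapi-manager-controllers/ { print; exit }'
}

# Usage (assumes oc is already pointed at the cluster's kubeconfig):
# pod=$(oc get pods -n openshift-cluster-api -o name | find_controllers_pod)
# oc logs -n openshift-cluster-api "$pod"
# If the pod has several containers, oc asks for one; pass it with -c.
```

An empty result means the pod was never created, which is consistent with the earlier "No resources found." output and would point at the cluster-version-operator pod that is still Pending.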