Created attachment 1525535 [details]
oc describe nodes

Description of problem:
Both KubeVirt API servers are located on node2. As a result, if node2 goes down, the API does not work (for example, new VMs cannot be created).

Version-Release number of selected component (if applicable):
1.4

How reproducible:

Steps to Reproduce:
1. Install the environment
2. Check the output of "oc describe nodes"
3.

Actual results:
Node2 has both kubevirt api servers:
----------------------------------------------------
Name: cnv-executor-dshchedr-node2.example.com
.
.
  kubevirt    virt-api-fd86fd5fc-llx5g    0 (0%)    0 (0%)    0 (0%)    0 (0%)
  kubevirt    virt-api-fd86fd5fc-mdqmp    0 (0%)    0 (0%)    0 (0%)    0 (0%)
-----------------------------------------------------

Expected results:
One API server on each node

Additional info:
Attached: "oc describe nodes" output
Update for steps to reproduce:

1) Install a new env. Each node has its own API server:

Name: cnv-executor-dshchedr2-node1.example.com
.
  kubevirt    virt-api-fd86fd5fc-x7zcw    0 (0%)    0 (0%)    0 (0%)    0 (0%)

Name: cnv-executor-dshchedr2-node2.example.com
.
  kubevirt    virt-api-fd86fd5fc-z6xbk    0 (0%)    0 (0%)    0 (0%)    0 (0%)

2) Disable the network on node2 and wait several minutes. Node1 now has two virt-api and two virt-controller pods:

Name: cnv-executor-dshchedr2-node1.example.com
.
  kubevirt    virt-api-fd86fd5fc-95kp8           0 (0%)    0 (0%)    0 (0%)    0 (0%)
  kubevirt    virt-api-fd86fd5fc-jw7nk           0 (0%)    0 (0%)    0 (0%)    0 (0%)
  kubevirt    virt-controller-6fccbf85fd-4nkf7   0 (0%)    0 (0%)    0 (0%)    0 (0%)
  kubevirt    virt-controller-6fccbf85fd-xlbjz   0 (0%)    0 (0%)    0 (0%)    0 (0%)

3) Re-enable the network on node2. Node1 still has both API servers and both controllers.
I think we can do something here with pod anti-affinity. https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#never-co-located-in-the-same-node
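For reference, a minimal sketch of what such a rule could look like in the virt-api pod template, along the lines of the linked docs (the kubevirt.io: virt-api label selector is an assumption for illustration, not necessarily the exact labels the deployment uses):

```yaml
# Hypothetical "soft" anti-affinity for the virt-api Deployment's pod template.
# The scheduler will try to spread virt-api replicas across nodes, but will
# still co-locate them if no other node is available.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              kubevirt.io: virt-api   # assumed example label
          topologyKey: kubernetes.io/hostname
```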
(In reply to Denys Shchedrivyi from comment #1)
> Update for steps to reproduce:
>
> 1) install new env. Each node has its own apiserver:
> [...]
> 2) disable network on the node2 and wait several minutes. Node1 has 2
> virt-api and virt-controller pods
> [...]
> 3) Enable network on node2. Node1 still has both api servers and controllers.

We should definitely use pod anti-affinity to avoid this problem as much as possible; however, there is one high-level design question here: should we use "preferred" or "required" scheduling?

If we use requiredDuringSchedulingIgnoredDuringExecution on a single-node cluster (even one artificially induced, as in this example), then the second virt-api or virt-controller pod would remain Pending forever, until a new node became available.

If we use preferredDuringSchedulingIgnoredDuringExecution, then all pods would be able to run on a single-node cluster (node1, as in this example). But the catch is that once a pod is running, it doesn't migrate on its own. So using "preferred" would behave exactly the same as the scenario laid out here.

So what's more important: allowing two pods of virt-api/virt-controller to run on one node, or keeping them off the same node at all costs, even if that means one never starts?
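To make the trade-off concrete, the "required" variant discussed here would look roughly like this in the pod template (the label selector is an assumed example, not necessarily the exact labels used):

```yaml
# Hypothetical "hard" anti-affinity: the scheduler refuses to place a second
# virt-api pod on a node that already runs one, so on a single-node cluster
# the second replica stays Pending until another node joins.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            kubevirt.io: virt-api   # assumed example label
        topologyKey: kubernetes.io/hostname
```

Note the structural difference: the "required" terms carry no weight field, while each "preferred" term is weighted, which is what lets the scheduler fall back to co-location.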
With respect to the question in comment #3: we will be targeting a 3-node cluster, so it makes sense to proceed with requiredDuringSchedulingIgnoredDuringExecution.
To be more clear: we'll optimize for clusters with at least 3 nodes.
https://github.com/kubevirt/kubevirt/pull/2089
Stu: Can we move this to MODIFIED? I see that the PR is merged.
I can reproduce this with CNV 1.4; verifying with CNV 2.0.

Steps:

1. Deploy OCP with 3 worker nodes:

# oc get nodes
NAME                           STATUS    ROLES    AGE   VERSION
working-vdpf7-master-0         Ready     master   10h   v1.12.4+509916ce1
working-vdpf7-worker-0-2s9cp   Ready     worker   10h   v1.12.4+509916ce1
working-vdpf7-worker-0-4zxr4   Ready     worker   10h   v1.12.4+509916ce1
working-vdpf7-worker-0-wvxjm   Ready     worker   10h   v1.12.4+509916ce1

2. Check virt-api on each node:
worker1: kubevirt virt-api-7d49b88fd5-gq489
worker2: kubevirt virt-api-7d49b88fd5-w4zjm
worker3: no virt-api

3. Disable the network on worker2:

# oc get nodes
NAME                           STATUS     ROLES    AGE   VERSION
working-vdpf7-master-0         Ready      master   11h   v1.12.4+509916ce1
working-vdpf7-worker-0-2s9cp   Ready      worker   10h   v1.12.4+509916ce1
working-vdpf7-worker-0-4zxr4   NotReady   worker   10h   v1.12.4+509916ce1
working-vdpf7-worker-0-wvxjm   Ready      worker   10h   v1.12.4+509916ce1

4. Check virt-api on each node:
worker1: kubevirt virt-api-7d49b88fd5-gq489
worker3: no virt-api
Zhe, I'm confused by your verification steps. It appears in step 4 that only one virt-api instance is listed. However, you didn't list all the nodes (there's a master node too). The anti-affinity rules imposed on virt-api and virt-controller simply indicate that they prefer not to be scheduled on the same node as other virt-api and virt-controller pods. It's still perfectly legal for virt-api to be scheduled on working-vdpf7-master-0. Is that where it was scheduled?
I re-tested this with build virt-api:v2.0.0-21.

Steps:

# oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
working-xz8wf-master-0         Ready    master   2d22h   v1.12.4+509916ce1
working-xz8wf-worker-0-cdggc   Ready    worker   2d22h   v1.12.4+509916ce1
working-xz8wf-worker-0-ffjz7   Ready    worker   2d22h   v1.12.4+509916ce1
working-xz8wf-worker-0-q92ch   Ready    worker   2d22h   v1.12.4+509916ce1

# oc get pods -n kubevirt-hyperconverged -o wide
virt-api-7d8df76549-2c4w4   1/1   Running   0   56m     10.129.0.30   working-xz8wf-worker-0-q92ch   <none>
virt-api-7d8df76549-l8jl2   1/1   Running   0   2d21h   10.130.0.23   working-xz8wf-worker-0-cdggc   <none>

Disable the network for node working-xz8wf-worker-0-q92ch:

# oc get nodes
NAME                           STATUS     ROLES    AGE     VERSION
working-xz8wf-master-0         Ready      master   2d22h   v1.12.4+509916ce1
working-xz8wf-worker-0-cdggc   Ready      worker   2d22h   v1.12.4+509916ce1
working-xz8wf-worker-0-ffjz7   Ready      worker   2d22h   v1.12.4+509916ce1
working-xz8wf-worker-0-q92ch   NotReady   worker   2d22h   v1.12.4+509916ce1

# oc get pods -n kubevirt-hyperconverged -o wide
virt-api-7d8df76549-2c4w4   1/1   Unknown   0   65m     10.129.0.30   working-xz8wf-worker-0-q92ch   <none>
virt-api-7d8df76549-l8jl2   1/1   Running   0   2d21h   10.130.0.23   working-xz8wf-worker-0-cdggc   <none>
virt-api-7d8df76549-ms826   1/1   Running   0   38s     10.131.0.49   working-xz8wf-worker-0-ffjz7   <none>

Re-enable the network on the node and check the pods:

# oc get pods -n kubevirt-hyperconverged -o wide
virt-api-7d8df76549-l8jl2   1/1   Running   0   2d22h   10.130.0.23   working-xz8wf-worker-0-cdggc   <none>
virt-api-7d8df76549-ms826   1/1   Running   0   4m19s   10.131.0.49   working-xz8wf-worker-0-ffjz7   <none>
Hi Stuart, I updated my steps in comment 11. Please help check, thanks.
Zhe, this looks like exactly what we should expect. Given this output, I'd say this BZ should be marked as VERIFIED.
Many thanks for Stuart's help. Per comment 13, moving this to VERIFIED.