Bug 1671511 - Both kubevirt apiservers located on one node
Summary: Both kubevirt apiservers located on one node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 1.4
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 2.0
Assignee: sgott
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-31 19:00 UTC by Denys Shchedrivyi
Modified: 2019-10-22 12:33 UTC
CC List: 7 users

Fixed In Version: virt-api-container-v2.0.0-21
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-22 12:33:48 UTC
Target Upstream Version:
Embargoed:


Attachments
oc describe nodes (24.47 KB, text/plain)
2019-01-31 19:00 UTC, Denys Shchedrivyi

Description Denys Shchedrivyi 2019-01-31 19:00:36 UTC
Created attachment 1525535 [details]
oc describe nodes

Description of problem:
 Both kubevirt API server pods are located on node2. As a result, if node2 goes down the API does not work (for example, new VMs cannot be created).

Version-Release number of selected component (if applicable):
1.4

How reproducible:


Steps to Reproduce:
1. Install the environment.
2. Check the output of oc describe nodes.

Actual results:
Node2 hosts both kubevirt API server pods:

----------------------------------------------------
Name:               cnv-executor-dshchedr-node2.example.com
.
.
  kubevirt                   virt-api-fd86fd5fc-llx5g                 0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kubevirt                   virt-api-fd86fd5fc-mdqmp                 0 (0%)        0 (0%)      0 (0%)           0 (0%).
-----------------------------------------------------


Expected results:
One API server pod on each node


Additional info:

oc describe nodes output is attached.

Comment 1 Denys Shchedrivyi 2019-01-31 21:08:57 UTC
Update for steps to reproduce:

1) Install a new environment. Each node has its own API server pod:

Name:               cnv-executor-dshchedr2-node1.example.com
.
  kubevirt                   virt-api-fd86fd5fc-x7zcw                     0 (0%)        0 (0%)      0 (0%)           0 (0%)


Name:               cnv-executor-dshchedr2-node2.example.com
.
  kubevirt                   virt-api-fd86fd5fc-z6xbk                       0 (0%)        0 (0%)      0 (0%)           0 (0%)


2) Disable the network on node2 and wait several minutes. Node1 now has two virt-api and two virt-controller pods:
Name:               cnv-executor-dshchedr2-node1.example.com
.
  kubevirt                   virt-api-fd86fd5fc-95kp8                       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kubevirt                   virt-api-fd86fd5fc-jw7nk                       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kubevirt                   virt-controller-6fccbf85fd-4nkf7               0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kubevirt                   virt-controller-6fccbf85fd-xlbjz               0 (0%)        0 (0%)      0 (0%)           0 (0%)


3) Re-enable the network on node2. Node1 still has both API server pods and both controller pods.

Comment 2 Fabian Deutsch 2019-02-04 20:46:03 UTC
I think we can do something here with pod anti-affinity.

https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#never-co-located-in-the-same-node
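
For illustration, the "never co-located" pattern from that document would look roughly like this on the virt-api pod template (a minimal sketch; the label key/value below are assumptions, not taken from the actual KubeVirt manifest):

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            kubevirt.io: virt-api            # assumed pod label; check the real labels on the virt-api pods
        topologyKey: kubernetes.io/hostname  # hard rule: never two matching pods on the same node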

Comment 3 sgott 2019-02-07 14:39:14 UTC
(In reply to Denys Shchedrivyi from comment #1)
> Update for steps to reproduce:
> [...]
> 3) Enable network on node2. Node1 still has both api servers and controllers.

We should definitely use pod anti-affinity to avoid this problem as much as possible; however, there is one high-level design question here: should we use "preferred" or "required" scheduling? If we use requiredDuringSchedulingIgnoredDuringExecution on a single-node cluster (even one artificially induced as in this example), then the second virt-api or virt-controller pod would remain Pending forever until a new node became available. If we use preferredDuringSchedulingIgnoredDuringExecution, then all pods would be able to run on a single-node cluster (node1 as in this example).

The catch is that once a pod is running it doesn't migrate on its own, so using "preferred" would behave exactly as in the scenario laid out here.

So what's more important: allowing two virt-api/virt-controller pods to run on the same node, or keeping them apart at all costs, even if that means one never starts?
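
For reference, the "preferred" variant discussed above is expressed as a weighted soft rule rather than a hard constraint; a minimal sketch (label key/value assumed, not taken from the actual manifest):

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              kubevirt.io: virt-api          # assumed pod label
          topologyKey: kubernetes.io/hostname

With this form the scheduler spreads the pods when it can, but will still place both on the last remaining node, which is exactly the trade-off described above.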

Comment 4 sgott 2019-02-26 20:41:57 UTC
With respect to the previous question in #3: we will be targeting a 3-node cluster, so it makes sense to proceed with requiredDuringSchedulingIgnoredDuringExecution.

Comment 5 Fabian Deutsch 2019-02-27 11:00:36 UTC
To be clear: we'll optimize for clusters with at least 3 nodes.

Comment 6 Fabian Deutsch 2019-04-02 12:03:45 UTC
https://github.com/kubevirt/kubevirt/pull/2089

Comment 7 Israel Pinto 2019-04-22 08:13:56 UTC
Stu:
Can we move this to MODIFIED? I see that the PR is merged.

Comment 8 zhe peng 2019-04-26 07:26:41 UTC
I can reproduce this with CNV 1.4.
Verifying with CNV 2.0.
Steps:
1. Deploy OCP with 3 nodes:
#oc get nodes
NAME                           STATUS    ROLES     AGE       VERSION
working-vdpf7-master-0         Ready     master    10h       v1.12.4+509916ce1
working-vdpf7-worker-0-2s9cp   Ready     worker    10h       v1.12.4+509916ce1
working-vdpf7-worker-0-4zxr4   Ready     worker    10h       v1.12.4+509916ce1
working-vdpf7-worker-0-wvxjm   Ready     worker    10h       v1.12.4+509916ce1

2. Check virt-api on each node:
worker1:
kubevirt                                virt-api-7d49b88fd5-gq489
worker2:
kubevirt                                virt-api-7d49b88fd5-w4zjm
worker3:
no virt-api
3. Disable the network on worker2:
#oc get nodes
NAME                           STATUS     ROLES     AGE       VERSION
working-vdpf7-master-0         Ready      master    11h       v1.12.4+509916ce1
working-vdpf7-worker-0-2s9cp   Ready      worker    10h       v1.12.4+509916ce1
working-vdpf7-worker-0-4zxr4   NotReady   worker    10h       v1.12.4+509916ce1
working-vdpf7-worker-0-wvxjm   Ready      worker    10h       v1.12.4+509916ce1

4. Check virt-api on each node:
worker1:
kubevirt                                virt-api-7d49b88fd5-gq489
worker3:
no virt-api
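
One way to double-check that the anti-affinity rule actually landed in the deployed spec (deployment name and namespace are assumptions; adjust to the actual install, e.g. kubevirt-hyperconverged):

# oc get deployment virt-api -n kubevirt -o jsonpath='{.spec.template.spec.affinity.podAntiAffinity}'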

Comment 10 sgott 2019-04-30 16:25:30 UTC
Zhe,

I'm confused by your verification steps. It appears in step 4 that only one virt-api instance is listed. However, you didn't list all the nodes (there's a master node too).

The anti-affinity rules imposed on virt-api and virt-controller simply indicate that they prefer not to be scheduled on the same node as other virt-api and virt-controller pods. It's still perfectly legal for virt-api to be scheduled on working-vdpf7-master-0. Is that where it was scheduled?
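
A quick way to answer that is to list the pods together with their node assignment (namespace and label selector here are assumptions for this sketch):

# oc get pods -n kubevirt -o wide -l kubevirt.io=virt-api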

Comment 11 zhe peng 2019-05-09 10:21:19 UTC
I re-tested this with build:
virt-api:v2.0.0-21

Steps:
# oc get nodes
NAME                           STATUS    ROLES     AGE       VERSION
working-xz8wf-master-0         Ready     master    2d22h     v1.12.4+509916ce1
working-xz8wf-worker-0-cdggc   Ready     worker    2d22h     v1.12.4+509916ce1
working-xz8wf-worker-0-ffjz7   Ready     worker    2d22h     v1.12.4+509916ce1
working-xz8wf-worker-0-q92ch   Ready     worker    2d22h     v1.12.4+509916ce1

# oc get pods -n kubevirt-hyperconverged -o wide
virt-api-7d8df76549-2c4w4                          1/1       Running   0          56m       10.129.0.30   working-xz8wf-worker-0-q92ch   <none>
virt-api-7d8df76549-l8jl2                          1/1       Running   0          2d21h     10.130.0.23   working-xz8wf-worker-0-cdggc   <none>

Disable the network on node working-xz8wf-worker-0-q92ch:

# oc get nodes
NAME                           STATUS     ROLES     AGE       VERSION
working-xz8wf-master-0         Ready      master    2d22h     v1.12.4+509916ce1
working-xz8wf-worker-0-cdggc   Ready      worker    2d22h     v1.12.4+509916ce1
working-xz8wf-worker-0-ffjz7   Ready      worker    2d22h     v1.12.4+509916ce1
working-xz8wf-worker-0-q92ch   NotReady   worker    2d22h     v1.12.4+509916ce1

# oc get pods -n kubevirt-hyperconverged -o wide
virt-api-7d8df76549-2c4w4                          1/1       Unknown             0          65m       10.129.0.30   working-xz8wf-worker-0-q92ch   <none>
virt-api-7d8df76549-l8jl2                          1/1       Running             0          2d21h     10.130.0.23   working-xz8wf-worker-0-cdggc   <none>
virt-api-7d8df76549-ms826                          1/1       Running             0          38s       10.131.0.49   working-xz8wf-worker-0-ffjz7   <none>

Re-enable the network on the node and check the pods:
# oc get pods -n kubevirt-hyperconverged -o wide
virt-api-7d8df76549-l8jl2                          1/1       Running       0          2d22h     10.130.0.23   working-xz8wf-worker-0-cdggc   <none>
virt-api-7d8df76549-ms826                          1/1       Running       0          4m19s     10.131.0.49   working-xz8wf-worker-0-ffjz7   <none>

Comment 12 zhe peng 2019-05-09 10:25:04 UTC
Hi Stuart,
I updated my steps in comment 11, please help check. Thanks.

Comment 13 sgott 2019-05-09 11:50:28 UTC
Zhe,

This looks like exactly what we should expect. Given this output, I'd say this BZ should be marked as VERIFIED.

Comment 14 zhe peng 2019-05-10 03:11:14 UTC
Many thanks for Stuart's help. Per comment 13, moving this to VERIFIED.

