Health checks are periodic connection attempts used to determine whether load balancer members are healthy. Unhealthy members should be marked as out of service, and incoming traffic should not be forwarded to them. Presently, OVN does not support health checks for load balancer members, so traffic can be forwarded to unhealthy members. This BZ requests an enhancement to OVN to support health checks for members. TCP probably has a higher priority than UDP, as it is more commonly used by applications.
Take
I wrongly set the Assignee to myself; restoring it to the OVN component default.
This feature has been backported to OVN2.12-2.12.0-19 FDN.
Hi Numan, could you please give me some suggestions on how to test this feature? Thanks very much!
Verified on the version below:
[root@dell-per730-57 basic]# rpm -qa|grep ovn
ovn2.11-host-2.11.1-24.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-6.noarch
ovn2.11-central-2.11.1-24.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-16.noarch
kernel-kernel-networking-openvswitch-ovn-qos-1.0-1.noarch
ovn2.11-2.11.1-24.el7fdp.x86_64

# add load balancer on logical router
uuid=`ovn-nbctl create load_balancer vips:30.0.0.1="172.16.103.11,172.16.103.12"`
ovn-nbctl set load_balancer $uuid vips:'"30.0.0.1:8000"'='"172.16.103.11:80,172.16.103.12:80"'

# create load balancer health check
uuid3=`ovn-nbctl --id=@hc create Load_Balancer_Health_Check vip="30.0.0.1\:8000" -- add Load_Balancer $uuid health_check @hc`
ovn-nbctl set Load_Balancer_Health_Check $uuid3 options:interval=5 options:timeout=20 options:success_count=3 options:failure_count=3
ovn-nbctl set logical_router r1 load_balancer=$uuid
ovn-nbctl --wait=sb set load_balancer $uuid ip_port_mappings:172.16.103.12=hv0_vm01_vnet1:172.16.103.1
ovn-sbctl list service_monitor|grep "status.*\[\]" || ((result += 1))
rlRun "ovn-sbctl list service_monitor"

[root@dell-per730-57 basic]# ovn-sbctl list service_monitor
_uuid : af85a476-b867-4a64-b8a9-99fba2c2a8d0
external_ids : {}
ip : "172.16.103.12"
logical_port : "hv0_vm01_vnet1"
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "de:94:9a:e5:bf:c4"
status : []

for ((i=0; i<10; i++))
do
    vmsh run_cmd $(vm_name $hv 0) 'curl 30.0.0.1:8000 >> log.txt'
done
vmsh run_cmd $(vm_name $hv 0) 'cat log.txt | grep vm1' || ((result += 1))
vmsh run_cmd $(vm_name $hv 0) 'cat log.txt | grep vm2' || ((result += 1))
echo "result=$result"
vmsh run_cmd $(vm_name $hv 0) 'cat log.txt'
echo "result=$result"
vmsh run_cmd $(vm_name 1 1) 'ip link set down dev eth1'
sleep 30
rlRun "ovn-sbctl list service_monitor"

[root@dell-per730-57 basic]# ovn-sbctl list service_monitor
_uuid : af85a476-b867-4a64-b8a9-99fba2c2a8d0
external_ids : {}
ip : "172.16.103.12"
logical_port : "hv0_vm01_vnet1"
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "de:94:9a:e5:bf:c4"
status : offline

rlRun "ovn-sbctl list service_monitor|grep \"status.*offline\""
ovn-sbctl list service_monitor|grep "status.*offline" || ((result += 1))
vmsh run_cmd $(vm_name 1 1) 'ip link set up dev eth1'
sleep 30
rlRun "ovn-sbctl list service_monitor"
rlRun "ovn-sbctl list service_monitor|grep \"status.*online\""
ovn-sbctl list service_monitor|grep "status.*online"

[root@dell-per730-57 basic]# ovn-sbctl list service_monitor
_uuid : af85a476-b867-4a64-b8a9-99fba2c2a8d0
external_ids : {}
ip : "172.16.103.12"
logical_port : "hv0_vm01_vnet1"
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "de:94:9a:e5:bf:c4"
status : online

# change the src_mac
[root@dell-per730-57 basic]# ovn-nbctl set NB_Global . options:svc_monitor_mac="fe:a0:65:a2:01:03"
[root@dell-per730-57 basic]# ovn-sbctl list service_monitor
_uuid : 89eec067-6f39-4c29-8d7b-3633de6d0bda
external_ids : {}
ip : "172.16.103.11"
logical_port : "hv0_vm00_vnet1"
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "fe:a0:65:a2:01:03"
status : online

[root@localhost ~]# tcpdump -i eth1 -e -nn -v
[60764.285786] device eth1 entered promiscuous mode
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
20:58:44.509627 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.16.103.1.44719 > 172.16.103.11.80: Flags [S], cksum 0x37bf (correct), seq 1969041168, win 65160, length 0
20:58:44.509661 00:de:ad:00:00:01 > fe:a0:65:a2:01:03, ethertype IPv4 (0x0800), length 58: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
    172.16.103.11.80 > 172.16.103.1.44719: Flags [S.], cksum 0x264c (incorrect -> 0x4ef7), seq 1130240533, ack 1969041169, win 29200, options [mss 1460], length 0
20:58:44.510345 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.16.103.1.44719 > 172.16.103.11.80: Flags [R.], cksum 0xda36 (correct), seq 2, ack 1, win 65160, length 0
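To reason about the status transitions in the log above ([] before any decision, offline after eth1 goes down, online after it comes back), here is a rough shell model of how success_count and failure_count could turn individual probe results into a status. hc_status is a hypothetical helper, not part of OVN, and treating the counts as consecutive results is an assumption, not a description of ovn-controller's actual code.

```shell
#!/usr/bin/env bash
# hc_status SUCCESS_COUNT FAILURE_COUNT RESULT...
# Hypothetical model: replays probe results (1 = success, 0 = failure) and
# prints the member status after each probe. Assumes the counters track
# *consecutive* results; OVN's real implementation may differ.
hc_status() {
    local success_count=$1 failure_count=$2
    shift 2
    local status="" ok=0 fail=0 r
    for r in "$@"; do
        if [ "$r" -eq 1 ]; then
            ok=$((ok + 1)); fail=0
            # enough successes in a row: declare the member online
            if [ "$ok" -ge "$success_count" ]; then status=online; fi
        else
            fail=$((fail + 1)); ok=0
            # enough failures in a row: declare the member offline
            if [ "$fail" -ge "$failure_count" ]; then status=offline; fi
        fi
        # print "[]" while no decision has been made, like the empty
        # status column in the first service_monitor listing
        echo "${status:-[]}"
    done
}
```

For example, `hc_status 3 3 1 1 1 0 0 0` prints [], [], online, online, online, offline on successive lines: with success_count=3 and failure_count=3, a single probe result does not flip the status.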
Sorry for the mess. Here is the log for ovn2.12:
[root@dell-per730-19 basic]# rpm -qa|grep ovn
ovn2.12-central-2.12.0-19.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-6.noarch
kernel-kernel-networking-openvswitch-ovn-basic-1.0-16.noarch
ovn2.12-2.12.0-19.el7fdp.x86_64
ovn2.12-host-2.12.0-19.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-qos-1.0-1.noarch

:: [ 21:54:48 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor'
_uuid : df3e1fa8-87fa-4b7f-b361-3510a867ea4f
external_ids : {}
ip : "172.16.103.12"
logical_port : hv0_vm01_vnet1
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "62:d5:ba:75:8b:9f"
status : offline
:: [ 21:54:48 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor' (Expected 0, got 0)
:: [ 21:54:48 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor|grep "status.*offline"'
status : offline
:: [ 21:54:48 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor|grep "status.*offline"' (Expected 0, got 0)
SYNC_NC: sync_set client test_load_balance_health_check
SYNC_NC: sent "test_load_balance_health_check" to dell-per730-19.rhts.eng.pek2.redhat.com
SYNC_NC: sync_wait client test_load_balance_health_check
SYNC_NC: waiting "dell-per730-19.rhts.eng.pek2.redhat.com"
SYNC_NC: got "test_load_balance_health_check" from dell-per730-19.rhts.eng.pek2.redhat.com
:: [ 21:55:24 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor'
_uuid : df3e1fa8-87fa-4b7f-b361-3510a867ea4f
external_ids : {}
ip : "172.16.103.12"
logical_port : hv0_vm01_vnet1
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "62:d5:ba:75:8b:9f"
status : online
:: [ 21:55:24 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor' (Expected 0, got 0)
:: [ 21:55:24 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor|grep "status.*online"'
status : online
:: [ 21:55:24 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor|grep "status.*online"' (Expected 0, got 0)
:: [ 21:58:06 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor'
_uuid : caf48d03-c312-4d54-9316-f75c6afc76a1
external_ids : {}
ip : "172.16.103.11"
logical_port : hv0_vm00_vnet1
options : {failure_count="3", interval="5", success_count="3", timeout="20"}
port : 80
protocol : tcp
src_ip : "172.16.103.1"
src_mac : "62:d5:ba:75:8b:9f"
status : online
:: [ 21:58:06 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor' (Expected 0, got 0)
:: [ 21:58:06 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor|grep "status.*online"'
status : online
:: [ 21:58:06 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor|grep "status.*online"' (Expected 0, got 0)
:: [ 21:58:06 ] :: [ BEGIN ] :: Running 'ovn-sbctl list service_monitor|grep "fe:a0:65:a2:01:03"'
src_mac : "fe:a0:65:a2:01:03"
:: [ 21:58:06 ] :: [ PASS ] :: Command 'ovn-sbctl list service_monitor|grep "fe:a0:65:a2:01:03"' (Expected 0, got 0)

[root@localhost ~]# tcpdump -r a.pcap -e -nn -v|grep fe:a0:65:a2:01:03
reading from file a.pcap, link-type EN10MB (Ethernet)
21:58:14.388988 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 172.16.103.1 is-at fe:a0:65:a2:01:03, length 28
21:58:15.390830 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 172.16.103.1 is-at fe:a0:65:a2:01:03, length 28
21:58:16.780564 00:de:ad:00:00:01 > fe:a0:65:a2:01:03, ethertype IPv4 (0x0800), length 58: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
21:58:16.781260 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
21:58:21.786355 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
21:58:21.786385 00:de:ad:00:00:01 > fe:a0:65:a2:01:03, ethertype IPv4 (0x0800), length 58: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
21:58:21.786752 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
21:58:26.791913 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
21:58:26.791941 00:de:ad:00:00:01 > fe:a0:65:a2:01:03, ethertype IPv4 (0x0800), length 58: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
21:58:26.792370 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
21:58:31.797498 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
21:58:31.797524 00:de:ad:00:00:01 > fe:a0:65:a2:01:03, ethertype IPv4 (0x0800), length 58: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
21:58:31.797948 fe:a0:65:a2:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
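The capture above shows a probe exchange with the svc_monitor_mac about every 5 seconds, which is what options:interval=5 should produce. A quick way to check that from tcpdump text is to measure the gap between successive TCP replies sent by the backend. probe_intervals is a hypothetical helper, not part of OVN or tcpdump; it assumes `tcpdump -e -nn` output, where the timestamp is the first field and the source MAC the second, and it ignores captures that cross midnight.

```shell
#!/usr/bin/env bash
# probe_intervals BACKEND_MAC < tcpdump_text
# Hypothetical helper: prints the gap, in seconds, between consecutive TCP
# packets whose source MAC is BACKEND_MAC (one reply per health-check probe).
probe_intervals() {
    local backend_mac=$1
    awk -v mac="$backend_mac" '
        # field 1 = HH:MM:SS.ffffff timestamp, field 2 = source MAC;
        # "proto TCP" skips ARP and other non-TCP frames
        $2 == mac && /proto TCP/ {
            split($1, t, ":")
            s = t[1] * 3600 + t[2] * 60 + t[3]
            if (have_prev) printf "%.3f\n", s - prev
            prev = s
            have_prev = 1
        }'
}
```

For the ovn2.12 capture, `tcpdump -r a.pcap -e -nn -v | probe_intervals 00:de:ad:00:00:01` should print gaps of roughly 5 seconds, one per probe.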
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0167
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days