Description of problem:

After upgrading from 4.8.13 to 4.8.33, the upgrade itself succeeded, but the haproxy pods are in CrashLoopBackOff with the error "[ALERT] 082/151218 (9) : Starting proxy health_check_http_url: cannot bind socket [:::30936]". Later, one container in the pod started but the other did not:

$ omg get pods -A | grep haproxy
openshift-vsphere-infra   haproxy-esp01-66qgh-master-0   1/2   Running   367   22h
openshift-vsphere-infra   haproxy-esp01-66qgh-master-1   1/2   Running   355   22h
openshift-vsphere-infra   haproxy-esp01-66qgh-master-2   2/2   Running   365   22h

Logs from the haproxy pods:

+ declare -r haproxy_sock=/var/run/haproxy/haproxy-master.sock
+ declare -r haproxy_log_sock=/var/run/haproxy/haproxy-log.sock
+ export -f msg_handler
+ export -f reload_haproxy
+ export -f verify_old_haproxy_ps_being_deleted
+ rm -f /var/run/haproxy/haproxy-master.sock /var/run/haproxy/haproxy-log.sock
+ '[' -s /etc/haproxy/haproxy.cfg ']'
+ socat UNIX-RECV:/var/run/haproxy/haproxy-log.sock STDOUT
+ socat UNIX-LISTEN:/var/run/haproxy/haproxy-master.sock,fork 'system:bash -c msg_handler'
+ /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid
<133>Mar 24 15:12:18 haproxy[9]: Proxy main started.
[NOTICE] 082/151218 (9) : haproxy version is 2.2.13-5f3eb59
[NOTICE] 082/151218 (9) : path to executable is /usr/sbin/haproxy
[ALERT] 082/151218 (9) : Starting proxy health_check_http_url: cannot bind socket [:::30936]
<133>Mar 24 15:12:18 haproxy[9]: Proxy stats started.
<133>Mar 24 15:12:18 haproxy[9]: Proxy masters started.

Liveness probe events on the affected pods:

haproxy-esp01-66qgh-master-2   Warning   ProbeError   10m (x1027 over 21h)   kubelet   (combined from similar events): Liveness probe error: Get http://10.10.30.82:30936/haproxy_ready: read tcp 10.10.30.82:40358->10.10.30.82:30936: read: connection reset by peer
haproxy-esp01-66qgh-master-0   Warning   ProbeError   120m (x895 over 20h)   kubelet   (combined from similar events): Liveness probe error: Get http://10.10.30.81:30936/haproxy_ready: read tcp 10.10.30.81:34222->10.10.30.81:30936: read: connection reset by peer

---

Checking on the master nodes, I can see the port is already in use by the NodePort service ("Handle NodePort service elksaas-gd-ls-logstash port 30936"):

[core@esp01-66qgh-master-0 ~]$ sudo netstat -atlpo | grep 30936
tcp6       0      0 [::]:30936    [::]:*    LISTEN    547545/ovnkube     off (0.00/0/0)
[core@esp01-66qgh-master-1 ~]$ sudo netstat -atlpo | grep 30936
tcp6       0      0 [::]:30936    [::]:*    LISTEN    2855901/ovnkube    off (0.00/0/0)
[core@esp01-66qgh-master-2 ~]$ sudo netstat -atlpo | grep 30936
tcp6       0      0 [::]:30936    [::]:*    LISTEN    3975186/ovnkube    off (0.00/0/0)
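For reference, a quick way to confirm which Service was allocated the conflicting NodePort (a sketch assuming jq is available on the workstation; the port number is the one from this report):

$ oc get svc -A -o json \
    | jq -r '.items[] | select(.spec.ports[]?.nodePort == 30936) | "\(.metadata.namespace)/\(.metadata.name)"'

On this cluster it should print elksaas/elksaas-gd-ls-logstash.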
---

The ovnkube-node logs confirm OVN-Kubernetes claiming the port for that service:

[scripts]$ oc logs ovnkube-node-gzbqr -n openshift-ovn-kubernetes -c ovnkube-node | grep 30936
I0325 11:35:11.874496 3975186 gateway_iptables.go:45] Adding rule in table: nat, chain: OVN-KUBE-NODEPORT with args: "-p TCP -m addrtype --dst-type LOCAL --dport 30936 -j DNAT --to-destination 172.30.180.193:5035" for protocol: 0
I0325 11:35:12.225603 3975186 gateway_iptables.go:45] Adding rule in table: nat, chain: OVN-KUBE-NODEPORT with args: "-p TCP -m addrtype --dst-type LOCAL --dport 30936 -j DNAT --to-destination 172.30.180.193:5035" for protocol: 0
I0325 11:35:12.525202 3975186 port_claim.go:182] Handle NodePort service elksaas-gd-ls-logstash port 30936
I0325 11:35:12.525209 3975186 port_claim.go:40] Opening socket for service: elksaas/elksaas-gd-ls-logstash, port: 30936 and protocol TCP
I0325 11:35:12.525212 3975186 port_claim.go:63] Opening socket for LocalPort "nodePort for elksaas/elksaas-gd-ls-logstash:rpa" (:30936/tcp)
I0325 11:35:12.534890 3975186 gateway_iptables.go:45] Adding rule in table: nat, chain: OVN-KUBE-NODEPORT with args: "-p TCP -m addrtype --dst-type LOCAL --dport 30936 -j DNAT --to-destination 172.30.180.193:5035" for protocol: 0

Version-Release number of selected component (if applicable):
4.8.33

How reproducible:
Consistently after upgrading to 4.8.33.

Steps to Reproduce:
1. Upgrade from 4.8.13 to 4.8.33.
2. The haproxy pods then try to bind port 30936, which is inside the NodePort service range.

Actual results:
The haproxy pods use port 30936 from the NodePort service range and fail to bind it because a NodePort Service already holds it.

Expected results:
The haproxy pods should not use a port from the NodePort service range.

Additional info:

Haproxy config:

$ oc debug node/esp01-66qgh-master-0
Starting pod/esp01-66qgh-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.10.30.81
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls -al /etc/haproxy/
total 16
drwxr-xr-x.   2 root root   25 Mar 23 16:52 .
drwxr-xr-x. 102 root root 8192 Mar 26 17:42 ..
-rw-r--r--.   1 root root 1158 Mar 23 17:42 haproxy.cfg
sh-4.4# cat /etc/haproxy/haproxy.cfg
global
  stats socket /var/lib/haproxy/run/haproxy.sock mode 600 level admin expose-fd listeners

defaults
  maxconn 20000
  mode    tcp
  log     /var/run/haproxy/haproxy-log.sock local0
  option  dontlognull
  retries 3
  timeout http-request 30s
  timeout queue        1m
  timeout connect      10s
  timeout client       86400s
  timeout server       86400s
  timeout tunnel       86400s

frontend main
  bind :::9445 v4v6
  default_backend masters

listen health_check_http_url
  bind :::30936 v4v6
  mode http
  monitor-uri /haproxy_ready
  option dontlognull

listen stats
  bind localhost:50000
  mode http
  stats enable
  stats hide-version
  stats uri /haproxy_stats
  stats refresh 30s
  stats auth Username:Password

backend masters
  option httpchk GET /readyz HTTP/1.0
  option log-health-checks
  balance roundrobin
  server esp01-66qgh-master-0 10.10.30.81:6443 weight 1 verify none check check-ssl inter 1s fall 2 rise 3
  server esp01-66qgh-master-2 10.10.30.82:6443 weight 1 verify none check check-ssl inter 1s fall 2 rise 3
  server esp01-66qgh-master-1 10.10.30.83:6443 weight 1 verify none check check-ssl inter 1s fall 2 rise 3
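Note that the health_check_http_url listener binds :::30936, which falls inside the default Kubernetes NodePort range (30000-32767), so a NodePort Service can legitimately be allocated the very same port. The failing liveness probe can also be reproduced by hand; this is a minimal check using the node IP from this report, and while ovnkube holds the socket the request is reset rather than answered by haproxy's monitor-uri:

$ curl -s -o /dev/null -w '%{http_code}\n' http://10.10.30.81:30936/haproxy_ready
# expected: 200 when haproxy owns the port; "connection reset by peer"
# (curl exit code 56) while ovnkube holds it, matching the ProbeError events above

---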
Yeah, I just noticed this conflict a couple of weeks ago. Patches are up for 4.11 to fix it and will need to be backported to all supported releases.

*** This bug has been marked as a duplicate of bug 2069740 ***