** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority, Engineers are asked to stop whatever they are doing, putting everything else on hold. Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

** INFORMATION REQUIRED **

Please answer these questions before escalation to engineering:

1. Has a link to must-gather output been provided in this BZ? We cannot work without it. If must-gather fails to run, attach all relevant logs and provide the error message of must-gather.
2. Give the output of "oc get clusteroperators -o yaml".
3. In case of degraded/unavailable operators, have all their logs and the logs of the operands been analyzed? [yes/no]
4. List the top 5 relevant errors from the logs of the operators and operands in (3).
5. Order the list of degraded/unavailable operators according to which is likely the cause of the failure of the others, root cause at the top.
6. Explain why (5) is likely the right order, and list the information used for that assessment.
7. Explain why Engineering is necessary to make progress.
Another instance: https://bugzilla.redhat.com/show_bug.cgi?id=2006548. Is the customer using the hard-stop-after annotation and, if so, with what value? I am currently investigating 2006548 and may mark this BZ as a duplicate later in the day if the behaviour of OCP 4.6 and 4.8 is similar, though I note that the description here explicitly talks about a behavioural change from 4.8.3 to 4.8.10.
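For reference, one way to check whether that annotation is in play (assuming the documented ingress.operator.openshift.io/hard-stop-after annotation, which can be set on either the default ingresscontroller or the cluster-scoped ingress config):

$ oc -n openshift-ingress-operator get ingresscontroller/default -o yaml | grep hard-stop-after
$ oc get ingresses.config/cluster -o yaml | grep hard-stop-after

If either command returns a value (e.g. "30m"), that is the hard-stop-after period currently applied.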
I am currently running:

$ oc version
Client Version: 4.8.0
Server Version: 4.8.11
Kubernetes Version: v1.21.1+9807387

I have a version of the router that is not currently bound to privileged port numbers - this so I can run lsof in the router pod and identify the established connections. My cluster is quiet.

NAME                              READY   STATUS    RESTARTS   AGE
router-copy-c99657fc9-8r84z       2/2     Running   0          23m
router-default-6d77dbc8b7-n5ljl   2/2     Running   0          42m

$ oc rsh router-copy-c99657fc9-8r84z
Defaulted container "router" out of: router, logs
sh-4.4$ pgrep -a haproxy
57 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf 20

A single haproxy instance with the following established TCP connections:

sh-4.4$ lsof -L -n -p 57 | grep TCP
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/systemd
      Output information may be incomplete.
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/rdma
      Output information may be incomplete.
[snipped repeated WARNINGs]
haproxy 57 1000620000   5u IPv4 2631687 0t0 TCP *:webcache (LISTEN)
haproxy 57 1000620000   6u IPv4 2631688 0t0 TCP *:pcsync-https (LISTEN)
haproxy 57 1000620000  27u IPv4 2994165 0t0 TCP 10.128.2.1:40678->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  29u IPv4 2922580 0t0 TCP 10.128.2.1:47318->10.130.0.4:pcsync-https (ESTABLISHED)

If I now connect to the console via Chrome, and wait ~30s, I see:

sh-4.4$ lsof -L -n -p 57 | grep TCP
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/systemd
      Output information may be incomplete.
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/rdma
      Output information may be incomplete.
[snipped repeated WARNINGs]
haproxy 57 1000620000   5u IPv4 2631687 0t0 TCP *:webcache (LISTEN)
haproxy 57 1000620000   6u IPv4 2631688 0t0 TCP *:pcsync-https (LISTEN)
haproxy 57 1000620000  23u IPv4 3025322 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40434 (ESTABLISHED)
haproxy 57 1000620000  27u IPv4 3024525 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40866 (ESTABLISHED)
haproxy 57 1000620000  28u IPv4 3029018 0t0 TCP 10.128.2.1:42546->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  29u IPv4 3024343 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40446 (ESTABLISHED)
haproxy 57 1000620000  30u IPv4 3024345 0t0 TCP 10.128.2.1:51296->10.130.0.23:sun-sr-https (ESTABLISHED)
haproxy 57 1000620000  31u IPv4 3025342 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40518 (ESTABLISHED)
haproxy 57 1000620000  34u IPv4 3024358 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40522 (ESTABLISHED)
haproxy 57 1000620000  35u IPv4 3024359 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40524 (ESTABLISHED)
haproxy 57 1000620000  37u IPv4 3024362 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40526 (ESTABLISHED)
haproxy 57 1000620000  43u IPv4 3026348 0t0 TCP 10.128.2.1:42556->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  44u IPv4 3024385 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40536 (ESTABLISHED)
haproxy 57 1000620000  45u IPv4 3024443 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40610 (ESTABLISHED)
haproxy 57 1000620000  46u IPv4 3024472 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40654 (ESTABLISHED)
haproxy 57 1000620000  49u IPv4 3024390 0t0 TCP 10.128.2.1:42404->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  50u IPv4 3022806 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40598 (ESTABLISHED)
haproxy 57 1000620000  53u IPv4 3029021 0t0 TCP 10.128.2.1:42558->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  54u IPv4 3024432 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40602 (ESTABLISHED)
haproxy 57 1000620000  57u IPv4 3024437 0t0 TCP 10.128.2.1:42412->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  60u IPv4 3025415 0t0 TCP 10.128.2.1:42414->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  61u IPv4 3024450 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40638 (ESTABLISHED)
haproxy 57 1000620000  64u IPv4 3024454 0t0 TCP 10.128.2.1:42420->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  65u IPv4 3024459 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40648 (ESTABLISHED)
haproxy 57 1000620000  68u IPv4 3024462 0t0 TCP 10.128.2.1:42422->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  71u IPv4 3024477 0t0 TCP 10.128.2.1:42424->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  72u IPv4 3024478 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40656 (ESTABLISHED)
haproxy 57 1000620000  75u IPv4 3025418 0t0 TCP 10.128.2.1:42426->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  76u IPv4 3029608 0t0 TCP 10.128.2.1:42698->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  79u IPv4 3025449 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40786 (ESTABLISHED)
haproxy 57 1000620000  81u IPv4 3024501 0t0 TCP 10.128.2.1:42446->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  82u IPv4 3025458 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40796 (ESTABLISHED)
haproxy 57 1000620000  85u IPv4 3024505 0t0 TCP 10.128.2.1:42450->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  86u IPv4 3025465 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40820 (ESTABLISHED)
haproxy 57 1000620000  89u IPv4 3024514 0t0 TCP 10.128.2.1:42454->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  92u IPv4 3024530 0t0 TCP 10.128.2.1:42466->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  93u IPv4 3024541 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40898 (ESTABLISHED)
haproxy 57 1000620000  96u IPv4 3024546 0t0 TCP 10.128.2.1:42468->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  97u IPv4 3024555 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40906 (ESTABLISHED)
haproxy 57 1000620000 100u IPv4 3024560 0t0 TCP 10.128.2.1:42470->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000 101u IPv4 3024561 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40908 (ESTABLISHED)
haproxy 57 1000620000 104u IPv4 3024566 0t0 TCP 10.128.2.1:42472->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000 105u IPv4 3024577 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40944 (ESTABLISHED)
haproxy 57 1000620000 108u IPv4 3024582 0t0 TCP 10.128.2.1:42476->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000 109u IPv4 3024583 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40960 (ESTABLISHED)
haproxy 57 1000620000 112u IPv4 3024588 0t0 TCP 10.128.2.1:42478->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000 113u IPv4 3024594 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:41000 (ESTABLISHED)
haproxy 57 1000620000 116u IPv4 3024599 0t0 TCP 10.128.2.1:42480->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000 117u IPv4 3024606 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:41024 (ESTABLISHED)
haproxy 57 1000620000 120u IPv4 3024611 0t0 TCP 10.128.2.1:42486->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000 124u IPv4 3024686 0t0 TCP 10.128.2.1:53512->10.130.0.4:pcsync-https (ESTABLISHED)

Is this number of connections expected for a single connection via my browser?

If I delete my browser tab and repeatedly run the following, we see the number of established connections diminish (to almost zero):

sh-4.4$ lsof -L -n -p 57 | grep TCP
haproxy 57 1000620000   5u IPv4 2631687 0t0 TCP *:webcache (LISTEN)
haproxy 57 1000620000   6u IPv4 2631688 0t0 TCP *:pcsync-https (LISTEN)
haproxy 57 1000620000  23u IPv4 3033253 0t0 TCP 10.128.2.1:42912->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  25u IPv4 3047195 0t0 TCP 10.128.2.1:43660->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  26u IPv4 3046032 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:56424 (ESTABLISHED)
haproxy 57 1000620000  28u IPv4 3029018 0t0 TCP 10.128.2.1:42546->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  31u IPv4 3025342 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40518 (ESTABLISHED)
haproxy 57 1000620000  34u IPv4 3024358 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40522 (ESTABLISHED)
haproxy 57 1000620000  35u IPv4 3024359 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40524 (ESTABLISHED)
haproxy 57 1000620000  37u IPv4 3024362 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:40526 (ESTABLISHED)
haproxy 57 1000620000  43u IPv4 3048643 0t0 TCP 10.128.2.1:43744->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  50u IPv4 3049603 0t0 TCP 10.128.2.1:43742->10.129.0.22:pcsync-https (ESTABLISHED)
haproxy 57 1000620000  53u IPv4 3048639 0t0 TCP 192.168.7.181:pcsync-https->192.168.7.164:57400 (ESTABLISHED)
haproxy 57 1000620000  76u IPv4 3029608 0t0 TCP 10.128.2.1:42698->10.129.0.22:pcsync-https (ESTABLISHED)

Approximately 1 minute later (maybe less), zero established connections:

sh-4.4$ lsof -L -n -p 57 | grep TCP
haproxy 57 1000620000   5u IPv4 2631687 0t0 TCP *:webcache (LISTEN)
haproxy 57 1000620000   6u IPv4 2631688 0t0 TCP *:pcsync-https (LISTEN)

So it would seem that all (or most) of those connections are associated with the console. Multiplying this up by many users, together with the known issues we have with websocket connections and haproxy reloads, we will have lots of outstanding processes that cannot be terminated.

Next steps: I will stand up OCP v4.8.3 and repeat the experiment. We have not bumped the version of haproxy in any .z release of 4.8.
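As an aside, for anyone repeating this experiment: rather than eyeballing repeated lsof runs, a small loop along these lines (assuming the same haproxy PID of 57) makes the connection drain easy to watch:

sh-4.4$ while true; do
          # print a timestamp and the current count of ESTABLISHED connections
          echo "$(date +%T) $(lsof -L -n -p 57 2>/dev/null | grep -c ESTABLISHED)"
          sleep 5
        done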
If adjusting the router's reload interval helps operationally, then the next steps are to revert that change and apply a new change to understand why we are reloading so often. Let's change the router's default logging level from 2 to 5; the goal is to capture more debug output from the openshift-router and, with some post-processing, understand what changes to routes, endpoints, or services necessitate a reload.

Steps:

- Set a CVO override so the CVO stops managing the ingress operator:

  $ oc patch clusterversions/version --type=json --patch='[{"op":"add","path":"/spec/overrides","value":[{"kind":"Deployment","group":"apps/v1","name":"ingress-operator","namespace":"openshift-ingress-operator","unmanaged":true}]}]'

- Scale down the ingress-operator so that we can alter the router-default deployment:

  $ oc scale --replicas 0 -n openshift-ingress-operator deployments ingress-operator

- Patch the logging level from 2 to 5 for the router container:

  $ oc patch deployment/router-default -n openshift-ingress --patch='{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"router"},{"name":"logs"}],"containers":[{"command":["/usr/bin/openshift-router","--v=5"],"name":"router"}]}}}}'

This will now log a significant amount. If we can collect the output from all the router pods (i.e., the "router" container), we can post-process the information to try to understand what is driving frequent reloads; a collection sketch follows below. It would also be helpful if access logging is enabled. Please attach both the 'router' and 'logs' container output from all router pods.

If we could let this run at this log level for, say, 15 minutes (or more), that would be helpful. If you've since altered the hard-stop-after period, then let's collect at log level 5 for the hard-stop-after duration plus an additional 5 minutes. Equally, we should not leave this patch enabled indefinitely, as it may affect overall performance.
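A minimal way to collect that output (assuming the standard label on the router-default pods; adjust the selector if your pods are labelled differently):

$ for pod in $(oc -n openshift-ingress get pods -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o name); do
    oc -n openshift-ingress logs "$pod" -c router > "${pod##*/}-router.log"
    oc -n openshift-ingress logs "$pod" -c logs > "${pod##*/}-logs.log"
  done

That writes one file per container per pod, which can then be attached to this BZ.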
Hi Andrew,

In another case (https://access.redhat.com/support/cases/internal/#/case/03058396) of mine, one of my customers is also facing the same issue, and the OCP version is likewise 4.8.
(In reply to Jitendra Pradhan from comment #35)
> In another case
> (https://access.redhat.com/support/cases/internal/#/case/03058396) of mine,
> one of my customers is also facing the same issue, and the OCP version is
> likewise 4.8.

Please try the workaround:

oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge --patch='{"spec":{"unsupportedConfigOverrides":{"loadBalancingAlgorithm":"leastconn"}}}'

Let us know whether that works for this customer.
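Once the patch is applied and the router pods have rolled out, the override can be confirmed by checking the router deployment's environment, e.g.:

$ oc -n openshift-ingress get deployment router-default -o yaml | grep -i ROUTER_LOAD_BALANCE_ALGORITHM -A2

The value should read "leastconn", and the backends in haproxy.config should then show "balance leastconn" rather than "balance random".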
If possible, could you also provide details about the number of HAProxy processes, the memory per HAProxy process, and the haproxy.config file from one of the router pods for this new case?
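A sketch of how those could be gathered from one router pod (the pod name is a placeholder, and ps may not be present in the router image - reading VmRSS from /proc/<pid>/status is a fallback):

$ oc -n openshift-ingress rsh <router-pod-name>
sh-4.4$ pgrep haproxy | wc -l                         # number of haproxy processes
sh-4.4$ ps -o pid,rss,args -p "$(pgrep -d, haproxy)"  # RSS (memory) per haproxy process
sh-4.4$ cat /var/lib/haproxy/conf/haproxy.config      # the rendered config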
I built haproxy-2.2 with https://pagure.io/glibc-malloc-trace-utils.

In my sample config I have 4004 backends:

$ grep -c -e '^backend ' haproxy.cfg
4004

Run with the configuration but don't fork:

$ ../../haproxy-2.2/haproxy -f ./haproxy.cfg -d -V

Inspect the allocations made:

$ ~/glibc-malloc-trace-utils/trace_allocs /tmp/mtrace.mtr.27404 | sort -n | uniq -c > /tmp/random

I then swapped "balance random" for "balance leastconn":

$ ../../haproxy-2.2/haproxy -f ./haproxy.cfg -d -V
$ ~/glibc-malloc-trace-utils/trace_allocs /tmp/mtrace.mtr.27727 | sort -n | uniq -c > /tmp/leastconn

Looking at the diff (fully expanded later):

$ diff -y -W132 /tmp/leastconn /tmp/random
1 160112                        1 160112
               >        4004 196608
2 640448                        2 640448

All other allocations appear to be identical apart from the 4004 allocations of 196608 bytes. Those 4004 allocations come from chash_init_server_tree(), which appears to be called only when "balance random" is chosen:

void chash_init_server_tree(struct proxy *p)
{
	struct server *srv;
	struct eb_root init_head = EB_ROOT;
	int node;

	p->lbprm.set_server_status_up = chash_set_server_status_up;
	p->lbprm.set_server_status_down = chash_set_server_status_down;
	p->lbprm.update_server_eweight = chash_update_server_weight;
	p->lbprm.server_take_conn = NULL;
	p->lbprm.server_drop_conn = NULL;

	p->lbprm.wdiv = BE_WEIGHT_SCALE;
	for (srv = p->srv; srv; srv = srv->next) {
		srv->next_eweight = (srv->uweight * p->lbprm.wdiv + p->lbprm.wmult - 1) / p->lbprm.wmult;
		srv_lb_commit_status(srv);
	}

	recount_servers(p);
	update_backend_weight(p);

	p->lbprm.chash.act = init_head;
	p->lbprm.chash.bck = init_head;
	p->lbprm.chash.last = NULL;

	/* queue active and backup servers in two distinct groups */
	for (srv = p->srv; srv; srv = srv->next) {
		srv->lb_tree = (srv->flags & SRV_F_BACKUP) ? &p->lbprm.chash.bck : &p->lbprm.chash.act;
		srv->lb_nodes_tot = srv->uweight * BE_WEIGHT_SCALE;
		                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		The weight in our backends is 256, multiplied by BE_WEIGHT_SCALE
		(i.e., 16). The huge memory growth with "random" versus "leastconn"
		comes from this single allocation point, as it is called for each
		backend.

		                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		srv->lb_nodes_now = 0;
		srv->lb_nodes = calloc(srv->lb_nodes_tot, sizeof(struct tree_occ));
		for (node = 0; node < srv->lb_nodes_tot; node++) {
			srv->lb_nodes[node].server = srv;
			srv->lb_nodes[node].node.key = full_hash(srv->puid * SRV_EWGHT_RANGE + node);
		}

		if (srv_currently_usable(srv))
			chash_queue_dequeue_srv(srv);
	}
}

-diff-------------------------------
10055 1        10055 1
62 2        62 2
32 3        32 3
4018 4        4018 4
12 5        12 5
16 6        16 6
18 7        18 7
52 8        52 8
15 9        15 9
7 10        7 10
13 11        13 11
60 12        60 12
23 13        23 13
12070 14        12070 14
15 15        15 15
26198 16        26198 16
13 17        13 17
4011 18        4011 18
5 19        5 19
29 20        29 20
17 21        17 21
3 22        3 22
32 23        32 23
1257 24        1257 24
105 25        105 25
900 26        900 26
2007 27        2007 27
10 28        10 28
9 29        9 29
90 30        90 30
901 31        901 31
2825 32        2825 32
8026 33        8026 33
122 34        122 34
1170 35        1170 35
2713 36        2713 36
21 37        21 37
189 38        189 38
1923 39        1923 39
1465 40        1465 40
2701 41        2701 41
6 42        6 42
7 44        7 44
3 46        3 46
20 47        20 47
10271 48        10271 48
1804 49        1804 49
11 50        11 50
1 51        1 51
6 52        6 52
3 54        3 54
10153 56        10153 56
5 60        5 60
2 63        2 63
30 64        30 64
1 66        1 66
2 68        2 68
3 70        3 70
4053 72        4053 72
18 73        18 73
182 74        182 74
1804 75        1804 75
4 76        4 76
20 78        20 78
184 79        184 79
11830 80        11830 80
10 81        10 81
1 84        1 84
1 86        1 86
1 87        1 87
44 88        44 88
4 91        4 91
9 93        9 93
94 94        94 94
900 95        900 95
12057 96        12057 96
1 97        1 97
11 98        11 98
91 99        91 99
900 100        900 100
2002 101        2002 101
48 104        48 104
1 106        1 106
3 107        3 107
10024 108        10024 108
24 112        24 112
4 115        4 115
4 119        4 119
70 120        70 120
1 122        1 122
1 124        1 124
1 125        1 125
1 126        1 126
24 128        24 128
1 132        1 132
1 138        1 138
1 142        1 142
11 151        11 151
10059 152        10059 152
4013 160        4013 160
1 166        1 166
11 168        11 168
1 170        1 170
54 176        54 176
1 180        1 180
4 199        4 199
4 200        4 200
1 204        1 204
1 205        1 205
1 213        1 213
20 224        20 224
18 225        18 225
181 227        181 227
1804 229        1804 229
2 231        2 231
4011 232        4011 232
18 235        18 235
180 237        180 237
1800 239        1800 239
1 240        1 240
6 241        6 241
2 245        2 245
1 248        1 248
68 256        68 256
4 261        4 261
34 264        34 264
9 265        9 265
94 267        94 267
900 269        900 269
4 270        4 270
2001 271        2001 271
9 275        9 275
90 277        90 277
900 279        900 279
2001 281        2001 281
1 304        1 304
20 336        20 336
11 352        11 352
3 376        3 376
24 436        24 436
23 472        23 472
1 480        1 480
1 488        1 488
1 496        1 496
60 504        60 504
6 512        6 512
2 526        2 526
14 536        14 536
29 608        29 608
18 640        18 640
180 648        180 648
1804 656        1804 656
2 664        2 664
18 680        18 680
1 684        1 684
180 688        180 688
1800 696        1800 696
6 704        6 704
2 770        2 770
4 784        4 784
9 800        9 800
94 808        94 808
900 816        900 816
2001 824        2001 824
9 840        9 840
90 848        90 848
900 856        900 856
2001 864        2001 864
24 868        24 868
1 1016        1 1016
12 1024        12 1024
8 1025        8 1025
20 1028        20 1028
1 1053        1 1053
1 1055        1 1055
2 1100        2 1100
1 1192        1 1192
1 1194        1 1194
24 1216        24 1216
3 1217        3 1217
3 1219        3 1219
8 1306        8 1306
4 1333        4 1333
4 1335        4 1335
8 1380        8 1380
24 1648        24 1648
1 2048        1 2048
20 2104        20 2104
3 2204        3 2204
3 2208        3 2208
18 2256        18 2256
1 2276        1 2276
1 2400        1 2400
1 3056        1 3056
4007 3448        4007 3448
29887 4096        29887 4096
20 5952        20 5952
4009 6296        4009 6296
3 16384        3 16384
29 32768        29 32768
1 160112        1 160112
               >        4004 196608
2 640448        2 640448
1 1048576        1 1048576
1 2561792        1 2561792
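Sanity-checking the arithmetic on that call site: each server has weight 256 and BE_WEIGHT_SCALE is 16, so lb_nodes_tot = 256 * 16 = 4096 tree nodes per server; the observed 196608-byte allocation then implies sizeof(struct tree_occ) is 48 bytes in this build (196608 / 4096 = 48). With one such calloc per backend, the 4004 backends account for 4004 * 196608 bytes ≈ 787 MB from this single allocation site, which lines up with the memory growth seen with "balance random" but not with "balance leastconn".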
Verified in the "4.10.0-0.nightly-2021-10-23-225921" release. With this payload, it is observed that the LB algorithm now defaults to "leastconn":

-------
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-23-225921   True        False         52m     Cluster version is 4.10.0-0.nightly-2021-10-23-225921

oc -n openshift-ingress get deployment router-default -o yaml | grep -i ROUTER_LOAD_BALANCE_ALGORITHM -A2
        - name: ROUTER_LOAD_BALANCE_ALGORITHM
          value: leastconn

Inside router pods:

sh-4.4$ env | grep -i ROUTER_LOAD_BALANCE_ALGORITHM
ROUTER_LOAD_BALANCE_ALGORITHM=leastconn

Backend route algorithm post the change:

backend be_http:test1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn    <----
  timeout check 5000ms
  http-request add-header X-Forwarded-Host %[req.hdr(host)]
  http-request add-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie e96c07fa08f2609cadf847f019750244 insert indirect nocache httponly
  server pod:web-server-rc-fpdlb:service-unsecure:http:10.128.2.28:8080 10.128.2.28:8080 cookie 6eb972d07c3f4b2d51696ce52cfa2115 weight 256 check inter 5000ms
  server pod:web-server-rc-7wd5w:service-unsecure:http:10.129.2.31:8080 10.129.2.31:8080 cookie 0e8a34e402536207cbae3af56924532e weight 256 check inter 5000ms
-------
Hello Arvind,

Currently this bug is verified for version 4.10; is it possible to backport the fix to 4.7?

Regards,
Eswar.
Hi, if there is anything that customers should know about this bug, or if there are any important workarounds that should be outlined in the bug fixes section of the OpenShift Container Platform 4.10 release notes, please update the Doc Type and Doc Text fields. If not, can you please mark it as "no doc update"? Thanks!
No doc update is required for the 4.10.0 BZ because the change was already backported to 4.9.z. The 4.9.z BZ carries a release note that clearly describes the issue, and since the fix is already present in 4.9.z, there is effectively no change from 4.9.z to 4.10.0 to warrant a separate release note.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056