+++ This bug was initially created as a clone of Bug #1959563 +++ Description of problem: Can the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1902091 be ported back to 4.5 and 4.6 ? 4.6 is our (IBM Cloud) latest release at this time. We have an IBM Cloud customer that appears to be hitting that bug. The image-registry operator goes to `upgrading` status with $ oc describe co image-registry Message: Progressing: Unable to apply resources: unable to sync storage configuration: RequestError: send request failed Progressing: caused by: Head "https://s3.xxx/xxx": dial tcp x.x.x.x:443: connect: cannot assign requested address If they delete the cluster-image-registry-operator pod, the status immediately changes to available. Eventually the operator goes back to `upgrading`. Version-Release number of selected component (if applicable): 4.6 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: --- Additional comment from brian_mckeown.com on 2021-05-24 14:00:53 UTC --- hello team - has anyone had a chance to review this request - please let us know your thoughts - thanks a lot. --- Additional comment from obulatov on 2021-05-31 14:40:52 UTC --- 4.5 is going to reach EOL soon, so it's unlikely that we'll be able to backport it to 4.5. --- Additional comment from lisowski.com on 2021-06-09 19:01:22 UTC --- Created attachment 1789644 [details] Shows active connections to COS after 13 days Shows active connections to COS over time (this is after 13 days of image operator running) --- Additional comment from lisowski.com on 2021-06-09 19:07:11 UTC --- Hi Oleg: Just to confirm as well: It looks like the leak to COS happens independent of errors as well (happens during normal operation). I do think we need to backport this to 4.5 or else the leak can cause other downstream issues due to the ports these connections are consuming (port exhaustion, etc). We want to confirm as well you have validated that even during normal operations the leak is fixed. Here is an example showing during normal operations the connections growing over time (goes up 130 connections after 20 minutes. Continues to grow from there. [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l 24 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:45:17 CDT 2021 27 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:45:29 CDT 2021 28 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:00 CDT 2021 32 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:09 CDT 2021 32 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:15 CDT 2021 33 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:31 CDT 2021 35 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 14:04:58 CDT 2021 154 ``` --- Additional comment from lisowski.com on 2021-06-09 19:08:58 UTC --- Created attachment 1789645 [details] logs of cluster image registry operator --- Additional comment from lisowski.com on 2021-06-09 19:09:36 UTC --- The logs of the image registry operator during this time show no errors reported as well for documentation purposes. --- Additional comment from lisowski.com on 2021-06-09 19:11:11 UTC --- Openshift verison these logs are from is `4.6.28` --- Additional comment from lisowski.com on 2021-06-09 19:26:00 UTC --- Checking in a 4.5 cluster the logs line up with after 7 hours 174 connections opened up: ``` 172.17.56.112 :33092 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33094 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33636 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33646 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33652 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :34294 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :34298 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :34932 ----> 172.21.0.1 :443 /proc/19888/net/tcp: 172.17.56.112 :35208 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35214 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35416 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35420 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35578 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35582 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35612 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35618 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35680 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35692 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35696 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35810 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35818 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :36418 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37392 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37400 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37774 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37780 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :38268 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :38278 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39576 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39582 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39588 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39596 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40164 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40170 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40298 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40302 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40416 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40418 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40464 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40472 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40488 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40492 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40514 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40782 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40786 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40790 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41116 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41122 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41128 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41344 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41522 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41524 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41754 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42164 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42168 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42612 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42620 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42630 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42670 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43534 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43540 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43548 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43556 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43896 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43900 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43910 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44192 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44312 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44320 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44324 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45244 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45252 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45430 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45434 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :46290 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :46296 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :46302 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47018 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47218 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47222 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47268 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47274 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47328 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47332 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47406 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47412 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47414 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47808 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47820 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49212 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49216 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49220 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49280 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49290 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49294 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49354 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50462 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50470 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50504 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50572 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50576 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51014 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51022 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51216 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51230 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51234 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51936 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51940 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52150 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52152 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52162 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52298 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52302 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52310 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52360 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52370 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52374 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53252 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53258 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53266 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53274 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53282 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53484 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53490 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53556 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53564 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53572 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53964 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54130 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54136 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54142 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54494 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54500 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54506 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54568 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54572 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54866 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55182 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55192 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55344 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55348 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56180 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56190 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56194 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56232 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56238 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56454 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56456 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56457 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56466 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56470 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56472 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56742 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56758 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57086 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57090 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57218 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57222 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57230 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57786 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57788 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58052 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58104 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58692 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58698 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58700 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :59822 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :59826 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60578 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60586 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60846 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60858 ----> 161.26.5.14 :443 /proc/19888/net/tcp: ``` --- Additional comment from lisowski.com on 2021-06-09 19:26:23 UTC --- ``` kubectl get pods -n openshift-image-registry -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES cluster-image-registry-operator-8564975cc6-m9tqx 2/2 Running 0 7h14m 172.17.56.112 10.240.128.50 <none> <none> ``` --- Additional comment from lisowski.com on 2021-06-09 19:35:48 UTC --- Openshift version for 4.5 is ``` 4.5.40 ``` --- Additional comment from lisowski.com on 2021-06-09 19:50:39 UTC --- Correction for Openshift 4.5 version it is ``` 4.5.39 ``` I suspect this bug was introduced between: 4.5.38 - 4.5.39 and for 4.6 4.6.24 - 4.6.28 --- Additional comment from lisowski.com on 2021-06-09 22:07:16 UTC --- These timelines are wrong: this has appeared to be in older openshift releases than that range One key note though: ``` storage: managementState: Unmanaged s3: bucket: XXXXX region: us-standard regionEndpoint: https://s3.direct.us.cloud-object-storage.appdomain.cloud virtualHostedStyle: false unsupportedConfigOverrides: null ``` If `managementState: Unmanaged` is set than the problem goes away (might need to restart the operator pod to fix the existing connections. --- Additional comment from lisowski.com on 2021-06-09 22:10:49 UTC --- I also see no problems on a cluster with…. storage: s3: bucket: roks-bsr0toow08i9rajtg1a0-cdbf region: us-standard regionEndpoint: https://s3.direct.us.cloud-object-storage.appdomain.cloud virtualHostedStyle: false storageManaged: false Specifically `storageManaged: false` --- Additional comment from lisowski.com on 2021-06-10 03:55:53 UTC --- Just to note: After 12 days we see the connection count go up to 28231 active connections. Looking into mitigation options --- Additional comment from lisowski.com on 2021-06-10 04:11:29 UTC --- Created attachment 1789710 [details] This is a conntrack table output showing the leaking connections This is a conntrack table output showing the leaking connections. This is after 12 days of usage --- Additional comment from lisowski.com on 2021-06-10 04:22:05 UTC --- Mitigation for now is: 1) Set the management state of the registry to Unmanaged to prevent the operator from leaking S3 Connections ``` oc patch configs.imageregistry.operator.openshift.io/cluster --type='json' -p='[{"op": "replace", "path": "/spec/managementState", "value":"Unmanaged"}]' ``` 2) To flush out existing connections restart the operator with ``` oc delete pod -n openshift-image-registry -l name=cluster-image-registry-operator pod "cluster-image-registry-operator-7cbb7997cd-vxkp5" deleted ``` Then connections will not leak anymore with the existing code base. However, patches/updates will not roll for the downstream registry pods until the management state is set to managed with: ``` oc patch configs.imageregistry.operator.openshift.io/cluster --type='json' -p='[{"op": "replace", "path": "/spec/managementState", "value":"Managed"}]' ``` --- Additional comment from lisowski.com on 2021-06-10 04:28:27 UTC --- Correction: The leak is bounded at 28231 connections which corresponds to the local port range cri-o sets for a given container --- Additional comment from rmarasch on 2021-06-10 13:54:41 UTC --- Keeping this as ASSIGNED until I investigate the new data. --- Additional comment from lisowski.com on 2021-06-10 22:04:29 UTC --- Due to the scale of this leak I think this is a higher severity than a medium fix. --- Additional comment from lisowski.com on 2021-06-10 22:09:21 UTC --- I also verified for those that are wondering that this leak is present in the latest openshift 4.3 and 4.4 versions (they are unsupported. But I think it's good information to have for those that might be sitting back on upgrading). I believe it is originally introduced on versions after July 3rd 2019 with this PR: https://github.com/openshift/cluster-image-registry-operator/commit/7e102747e43afd87b165491f4a08480d3375ea56#diff-15ce517731313[…]f349dee84826febR72 --- Additional comment from lisowski.com on 2021-06-11 05:21:50 UTC --- Update: I have built the patch for 4.6 and the connections to the S3 backend do appear to be stable. It is hovering around 10-12 connections in use at a time over multiple minutes (will check back in a few hours to confirm it's stable). I believe this proves the patch is valid (this is the patch for 4.6) [root@kube-c1rg7j4o0vnsob07caag-vpcocosa1-default-00000e20 /]# conntrack -L | grep 161.26.85.12 tcp 6 86386 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48106 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48106 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86396 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48196 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48196 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86387 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48376 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48376 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86376 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48012 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48012 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86396 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=49090 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=49090 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86396 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48786 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48786 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86386 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48996 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48996 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86377 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48278 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48278 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86376 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48898 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48898 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 6 CLOSE src=172.17.31.165 dst=161.26.85.12 sport=44176 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=44176 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 conntrack v1.4.4 (conntrack-tools): 156 flow entries have been shown. [root@kube-c1rg7j4o0vnsob07caag-vpcocosa1-default-00000e20 /]# conntrack -L | grep 161.26.85.12 | wc -l conntrack v1.4.4 (conntrack-tools): 151 flow entries have been shown. 10
Verified on 4.5.0-0.nightly-2021-06-21-181416.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.41 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2430