Description of problem: Can the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1902091 be ported back to 4.5 and 4.6 ? 4.6 is our (IBM Cloud) latest release at this time. We have an IBM Cloud customer that appears to be hitting that bug. The image-registry operator goes to `upgrading` status with $ oc describe co image-registry Message: Progressing: Unable to apply resources: unable to sync storage configuration: RequestError: send request failed Progressing: caused by: Head "https://s3.xxx/xxx": dial tcp x.x.x.x:443: connect: cannot assign requested address If they delete the cluster-image-registry-operator pod, the status immediately changes to available. Eventually the operator goes back to `upgrading`. Version-Release number of selected component (if applicable): 4.6 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
hello team - has anyone had a chance to review this request - please let us know your thoughts - thanks a lot.
4.5 is going to reach EOL soon, so it's unlikely that we'll be able to backport it to 4.5.
Created attachment 1789644 [details] Shows active connections to COS after 13 days Shows active connections to COS over time (this is after 13 days of image operator running)
Hi Oleg: Just to confirm as well: It looks like the leak to COS happens independent of errors as well (happens during normal operation). I do think we need to backport this to 4.5 or else the leak can cause other downstream issues due to the ports these connections are consuming (port exhaustion, etc). We want to confirm as well you have validated that even during normal operations the leak is fixed. Here is an example showing during normal operations the connections growing over time (goes up 130 connections after 20 minutes. Continues to grow from there. [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l 24 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:45:17 CDT 2021 27 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:45:29 CDT 2021 28 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:00 CDT 2021 32 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:09 CDT 2021 32 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:15 CDT 2021 33 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 13:46:31 CDT 2021 35 [root@kube-c2a0njgd06pi318lnrl0-cp4appdso16-default-00000164 /]# date; find /proc/ 2>/dev/null | grep tcp | grep -v task | grep -v sys/net | xargs grep -v rem_address 2>/dev/null | awk '{x=strtonum("0x"substr($3,index($3,":")-2,2)); y=strtonum("0x"substr($4,index($4,":")-2,2)); for (i=5; i>0; i-=2) x = x"."strtonum("0x"substr($3,i,2)); for (i=5; i>0; i-=2) y = y"."strtonum("0x"substr($4,i,2))}{printf ("%s\t:%s\t ----> \t %s\t:%s\t%s\n",x,strtonum("0x"substr($3,index($3,":")+1,4)),y,strtonum("0x"substr($4,index($4,":")+1,4)),$1)}' | sort | uniq --check-chars=25 | grep "161.26.5.14" | grep "172.17.49.204" | wc -l Wed Jun 9 14:04:58 CDT 2021 154 ```
Created attachment 1789645 [details] logs of cluster image registry operator
The logs of the image registry operator during this time show no errors reported as well for documentation purposes.
Openshift verison these logs are from is `4.6.28`
Checking in a 4.5 cluster the logs line up with after 7 hours 174 connections opened up: ``` 172.17.56.112 :33092 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33094 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33636 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33646 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :33652 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :34294 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :34298 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :34932 ----> 172.21.0.1 :443 /proc/19888/net/tcp: 172.17.56.112 :35208 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35214 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35416 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35420 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35578 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35582 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35612 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35618 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35680 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35692 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35696 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35810 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :35818 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :36418 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37392 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37400 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37774 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :37780 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :38268 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :38278 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39576 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39582 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39588 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :39596 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40164 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40170 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40298 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40302 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40416 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40418 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40464 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40472 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40488 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40492 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40514 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40782 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40786 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :40790 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41116 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41122 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41128 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41344 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41522 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41524 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :41754 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42164 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42168 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42612 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42620 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42630 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :42670 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43534 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43540 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43548 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43556 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43896 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43900 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :43910 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44192 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44312 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44320 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :44324 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45244 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45252 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45430 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :45434 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :46290 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :46296 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :46302 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47018 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47218 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47222 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47268 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47274 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47328 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47332 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47406 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47412 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47414 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47808 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :47820 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49212 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49216 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49220 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49280 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49290 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49294 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :49354 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50462 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50470 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50504 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50572 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :50576 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51014 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51022 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51216 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51230 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51234 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51936 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :51940 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52150 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52152 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52162 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52298 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52302 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52310 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52360 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52370 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :52374 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53252 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53258 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53266 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53274 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53282 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53484 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53490 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53556 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53564 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53572 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :53964 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54130 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54136 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54142 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54494 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54500 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54506 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54568 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54572 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :54866 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55182 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55192 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55344 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :55348 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56180 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56190 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56194 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56232 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56238 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56454 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56456 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56457 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56466 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56470 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56472 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56742 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :56758 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57086 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57090 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57218 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57222 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57230 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57786 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :57788 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58052 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58104 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58692 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58698 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :58700 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :59822 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :59826 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60578 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60586 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60846 ----> 161.26.5.14 :443 /proc/19888/net/tcp: 172.17.56.112 :60858 ----> 161.26.5.14 :443 /proc/19888/net/tcp: ```
``` kubectl get pods -n openshift-image-registry -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES cluster-image-registry-operator-8564975cc6-m9tqx 2/2 Running 0 7h14m 172.17.56.112 10.240.128.50 <none> <none> ```
Openshift version for 4.5 is ``` 4.5.40 ```
Correction for Openshift 4.5 version it is ``` 4.5.39 ``` I suspect this bug was introduced between: 4.5.38 - 4.5.39 and for 4.6 4.6.24 - 4.6.28
These timelines are wrong: this has appeared to be in older openshift releases than that range One key note though: ``` storage: managementState: Unmanaged s3: bucket: XXXXX region: us-standard regionEndpoint: https://s3.direct.us.cloud-object-storage.appdomain.cloud virtualHostedStyle: false unsupportedConfigOverrides: null ``` If `managementState: Unmanaged` is set than the problem goes away (might need to restart the operator pod to fix the existing connections.
I also see no problems on a cluster with…. storage: s3: bucket: roks-bsr0toow08i9rajtg1a0-cdbf region: us-standard regionEndpoint: https://s3.direct.us.cloud-object-storage.appdomain.cloud virtualHostedStyle: false storageManaged: false Specifically `storageManaged: false`
Just to note: After 12 days we see the connection count go up to 28231 active connections. Looking into mitigation options
Created attachment 1789710 [details] This is a conntrack table output showing the leaking connections This is a conntrack table output showing the leaking connections. This is after 12 days of usage
Mitigation for now is: 1) Set the management state of the registry to Unmanaged to prevent the operator from leaking S3 Connections ``` oc patch configs.imageregistry.operator.openshift.io/cluster --type='json' -p='[{"op": "replace", "path": "/spec/managementState", "value":"Unmanaged"}]' ``` 2) To flush out existing connections restart the operator with ``` oc delete pod -n openshift-image-registry -l name=cluster-image-registry-operator pod "cluster-image-registry-operator-7cbb7997cd-vxkp5" deleted ``` Then connections will not leak anymore with the existing code base. However, patches/updates will not roll for the downstream registry pods until the management state is set to managed with: ``` oc patch configs.imageregistry.operator.openshift.io/cluster --type='json' -p='[{"op": "replace", "path": "/spec/managementState", "value":"Managed"}]' ```
Correction: The leak is bounded at 28231 connections which corresponds to the local port range cri-o sets for a given container
Keeping this as ASSIGNED until I investigate the new data.
Due to the scale of this leak I think this is a higher severity than a medium fix.
I also verified for those that are wondering that this leak is present in the latest openshift 4.3 and 4.4 versions (they are unsupported. But I think it's good information to have for those that might be sitting back on upgrading). I believe it is originally introduced on versions after July 3rd 2019 with this PR: https://github.com/openshift/cluster-image-registry-operator/commit/7e102747e43afd87b165491f4a08480d3375ea56#diff-15ce517731313[…]f349dee84826febR72
Update: I have built the patch for 4.6 and the connections to the S3 backend do appear to be stable. It is hovering around 10-12 connections in use at a time over multiple minutes (will check back in a few hours to confirm it's stable). I believe this proves the patch is valid (this is the patch for 4.6) [root@kube-c1rg7j4o0vnsob07caag-vpcocosa1-default-00000e20 /]# conntrack -L | grep 161.26.85.12 tcp 6 86386 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48106 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48106 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86396 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48196 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48196 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86387 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48376 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48376 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86376 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48012 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48012 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86396 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=49090 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=49090 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86396 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48786 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48786 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86386 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48996 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48996 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86377 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48278 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48278 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 86376 ESTABLISHED src=172.17.31.165 dst=161.26.85.12 sport=48898 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=48898 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 6 CLOSE src=172.17.31.165 dst=161.26.85.12 sport=44176 dport=443 src=161.26.85.12 dst=10.180.0.18 sport=443 dport=44176 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1 conntrack v1.4.4 (conntrack-tools): 156 flow entries have been shown. [root@kube-c1rg7j4o0vnsob07caag-vpcocosa1-default-00000e20 /]# conntrack -L | grep 161.26.85.12 | wc -l conntrack v1.4.4 (conntrack-tools): 151 flow entries have been shown. 10
Confirmed it's still holding steady over 9 hours: The Openshift team has kindly accepted back-porting this to 4.5 even though it's near EOL (thank you Openshift team).
trying to get bug in place
update to verified
*** Bug 1973281 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.35 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2410