Bug 1949799

Summary: ingresscontroller should deny the setting when spec.tuningOptions.threadCount exceeds 64
Product: OpenShift Container Platform Reporter: Hongan Li <hongli>
Component: Networking Assignee: Ryan Fredette <rfredette>
Networking sub component: router QA Contact: jechen <jechen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: amcdermo, aos-bugs, hongli
Version: 4.8   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-07-27 23:00:57 UTC Type: Bug

Description Hongan Li 2021-04-15 06:55:58 UTC
Description of problem:
The ingresscontroller should reject any spec.tuningOptions.threadCount value greater than 64. Currently such a value is accepted, and the router pods then fail to start.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-13-171608

How reproducible:
100%

Steps to Reproduce:
1. Set spec.tuningOptions.threadCount of the default ingresscontroller to 65:

$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  httpErrorCodePages:
    name: ""
  replicas: 2
  tuningOptions:
    threadCount: 65
  unsupportedConfigOverrides: null
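
The same change can be applied without an interactive editor. The following is a minimal sketch, assuming a JSON merge patch against the default ingresscontroller is equivalent to the edit above. The helper names thread_count_patch and set_thread_count are illustrative only and are not part of oc.

```shell
#!/bin/sh
# Build a JSON merge patch that sets spec.tuningOptions.threadCount.
# thread_count_patch and set_thread_count are hypothetical helper names.
thread_count_patch() {
    printf '{"spec":{"tuningOptions":{"threadCount":%d}}}' "$1"
}

# Apply the patch to the default ingresscontroller (requires cluster access).
set_thread_count() {
    oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge -p "$(thread_count_patch "$1")"
}
```

On an affected build, running `set_thread_count 65` should lead to the same state as the manual edit in step 1.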


Actual results:
$ oc -n openshift-ingress get pod
NAME                              READY   STATUS             RESTARTS   AGE
router-default-64b6d8d58c-h6hzj   1/1     Running            0          73m
router-default-d548b768-6grbk     0/1     CrashLoopBackOff   7          25m
router-default-d548b768-g982c     0/1     CrashLoopBackOff   7          25m

$ oc -n openshift-ingress logs router-default-d548b768-6grbk -p
[ALERT] 104/032223 (18) : parsing [/var/lib/haproxy/conf/haproxy.config:3] : 'nbthread' value must be between 1 and 64 (was 65).
[ALERT] 104/032223 (18) : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config
[ALERT] 104/032223 (18) : Fatal errors found in configuration.
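
The range quoted in the alert can be checked against a copy of a rendered config before HAProxy parses it. This is only a sketch under the assumption that the directive appears as `nbthread <value>` on its own line. check_nbthread is a hypothetical helper, not something shipped with the router.

```shell
#!/bin/sh
# Check every 'nbthread' directive in an HAProxy config against the 1..64
# range quoted in the alert above. check_nbthread is a hypothetical helper.
check_nbthread() {
    awk '
        $1 == "nbthread" {
            # Flag non-numeric values and values outside 1..64.
            if ($2 !~ /^[0-9]+$/ || $2 < 1 || $2 > 64) {
                printf "%s:%d: nbthread out of range (was %s)\n", FILENAME, FNR, $2
                bad = 1
            }
        }
        END { exit bad }   # exit status 1 if any directive was flagged
    ' "$1"
}
```

The function exits non-zero when any nbthread value falls outside the range, so it can be used as a guard in a script.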


Expected results:
The API should enforce a maximum of 64 for spec.tuningOptions.threadCount and reject larger values.

Additional info:
https://github.com/openshift/api/blob/f71e361ed3f4c2ef6e80c87af29dd35e34aeb1fb/operator/v1/0000_50_ingress-operator_00-ingresscontroller.crd.yaml#L993

Comment 2 jechen 2021-05-19 02:11:19 UTC
Verified in 4.8.0-0.nightly-2021-05-18-205323.

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-18-205323   True        False         10m     Cluster version is 4.8.0-0.nightly-2021-05-18-205323

$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-68f676f76b-2wvz5   1/1     Running   0          21m
router-default-68f676f76b-rk84h   1/1     Running   0          21m


# Attempted to set spec.tuningOptions.threadCount of the default ingresscontroller to 65; the edit was rejected.
$ oc -n openshift-ingress-operator edit ingresscontroller/default
error: ingresscontrollers.operator.openshift.io "default" is invalid
A copy of your changes has been stored to "/tmp/oc-edit-8x2j1.yaml"
error: Edit cancelled, no valid changes were saved.
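
The boundary can also be probed without opening an editor and without persisting anything. This is a sketch that assumes the installed oc client supports `--dry-run=server` for `patch`. probe_thread_count is a hypothetical helper name.

```shell
#!/bin/sh
# Ask the API server whether it would accept a given threadCount, without
# saving the change. probe_thread_count is a hypothetical helper name.
probe_thread_count() {
    if oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge --dry-run=server \
        -p "{\"spec\":{\"tuningOptions\":{\"threadCount\":$1}}}" >/dev/null 2>&1
    then
        echo "threadCount=$1 accepted"
    else
        echo "threadCount=$1 rejected"
    fi
}
```

Running `for n in 63 64 65; do probe_thread_count "$n"; done` against a fixed build should match the results reported in this comment: 63 and 64 accepted, 65 rejected. The function treats any failure of oc as a rejection, including connection or permission errors.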


# Set spec.tuningOptions.threadCount of the default ingresscontroller to 64. The edit was accepted, new router pods were created, and no 'nbthread' error appeared in the new router pods' logs.
$ oc -n openshift-ingress-operator edit ingresscontroller/default
<--snip-->
spec:
  httpErrorCodePages:
    name: ""
  replicas: 2
  tuningOptions:
    threadCount: 64
  unsupportedConfigOverrides: null
<--snip-->

ingresscontroller.operator.openshift.io/default edited


$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-5b747cfd96-89v2r   1/1     Running   0          2m14s
router-default-5b747cfd96-dpm2w   1/1     Running   0          2m13s

$ oc -n openshift-ingress logs pod/router-default-5b747cfd96-89v2r 
I0519 01:57:31.595999       1 template.go:433] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 26025a318cc32693474086f21580c3accebb35a6\nversionFromGit: 4.0.0-297-g26025a31\ngitTreeState: clean\nbuildDate: 2021-05-14T06:59:48Z\n"
I0519 01:57:31.600107       1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I0519 01:57:31.607204       1 router.go:191] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I0519 01:57:31.607319       1 router.go:270] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I0519 01:57:31.607696       1 router.go:332] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I0519 01:57:31.607761       1 router.go:262] router "msg"="router is including routes in all namespaces"  
E0519 01:57:31.720364       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0519 01:57:31.796329       1 router.go:579] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0519 01:57:36.799866       1 router.go:579] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

$ oc -n openshift-ingress logs pod/router-default-5b747cfd96-dpm2w
I0519 01:57:35.875970       1 template.go:433] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 26025a318cc32693474086f21580c3accebb35a6\nversionFromGit: 4.0.0-297-g26025a31\ngitTreeState: clean\nbuildDate: 2021-05-14T06:59:48Z\n"
I0519 01:57:35.878739       1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I0519 01:57:35.885998       1 router.go:191] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I0519 01:57:35.886079       1 router.go:270] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I0519 01:57:35.886555       1 router.go:332] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I0519 01:57:35.886620       1 router.go:262] router "msg"="router is including routes in all namespaces"  
E0519 01:57:35.993476       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0519 01:57:36.063275       1 router.go:579] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"


# Repeated the test with spec.tuningOptions.threadCount set to 63. The edit was accepted, new router pods were created, and no 'nbthread' error appeared in the pod's log.
$ oc -n openshift-ingress-operator edit ingresscontroller/default
ingresscontroller.operator.openshift.io/default edited

$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-5b8b6d8c99-4qnrr   1/1     Running   0          3m20s
router-default-5b8b6d8c99-n78r4   1/1     Running   0          3m20s

$ oc -n openshift-ingress logs pod/router-default-5b8b6d8c99-4qnrr
I0519 02:05:18.963145       1 template.go:433] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 26025a318cc32693474086f21580c3accebb35a6\nversionFromGit: 4.0.0-297-g26025a31\ngitTreeState: clean\nbuildDate: 2021-05-14T06:59:48Z\n"
I0519 02:05:18.965386       1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I0519 02:05:18.971389       1 router.go:191] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I0519 02:05:18.971478       1 router.go:270] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I0519 02:05:18.971985       1 router.go:332] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I0519 02:05:18.972069       1 router.go:262] router "msg"="router is including routes in all namespaces"  
E0519 02:05:19.081093       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0519 02:05:19.153485       1 router.go:579] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
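
The pod status checks above can be scripted. This is a minimal sketch that assumes the default column layout of `oc get pod` (NAME, READY, STATUS, ...). unhealthy_pods is a hypothetical helper name.

```shell
#!/bin/sh
# Read `oc get pod` output on stdin and print the name and status of every pod
# whose STATUS column is not "Running". Exits non-zero if any such pod exists.
# unhealthy_pods is a hypothetical helper name.
unhealthy_pods() {
    awk 'NR > 1 && $3 != "Running" { print $1, $3; bad = 1 } END { exit bad }'
}
```

It could be used as `oc -n openshift-ingress get pod | unhealthy_pods`. With the listing from the original description, it would report the two pods in CrashLoopBackOff.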

Comment 5 errata-xmlrpc 2021-07-27 23:00:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438