Bug 1822417

Summary: "bind: address already in use" masked by usage spew in pod logs
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: kube-apiserverAssignee: David Eads <deads>
Status: CLOSED DUPLICATE QA Contact: Xingxing Xia <xxia>
Severity: low Docs Contact:
Priority: medium    
Version: 4.5CC: aos-bugs, mfojtik, sttts
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-20 14:54:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description W. Trevor King 2020-04-08 23:54:15 UTC
For example [1]:

...
I0408 16:44:23.374217       1 server.go:194] Version: v1.18.0-rc.1
Error: failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use
Usage:
  kube-apiserver [flags]
...

But the usage information is so verbose that the bind error doesn't show up in lastState:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1630/pull-ci-openshift-machine-config-operator-master-e2e-gcp-upgrade/1627/artifacts/e2e-gcp-upgrade/pods.json | jq '.items[] | select(.metadata.name == "kube-apiserver-ci-op-kf7v5-m-0.c.openshift-gce-devel-ci.internal").status.containerStatuses[].lastState.terminated | select(. != null)'
{
  "containerID": "cri-o://fb487ce3249e53080a3f1eecebe269747ca859f3f90bbd066943c574215a87a7",
  "exitCode": 1,
  "finishedAt": "2020-04-08T16:44:23Z",
  "message": "t-signing-key-file string   Path to the file that contains the current private key of the service account token issuer. The issuer will sign issued ID tokens with this private key. (Requires the 'TokenRequest' feature gate.)\n      --service-cluster-ip-range string           A CIDR notation IP range from which to assign service cluster IPs. This must not overlap with any IP ranges assigned to nodes for pods.\n      --service-node-port-range portRange         A port range to reserve for services with NodePort visibility. Example: '30000-32767'. Inclusive at both ends of the range. (default 30000-32767)\n\nGlobal flags:\n\n      --add-dir-header                   If true, adds the file directory to the header\n      --alsologtostderr                  log to standard error as well as files\n  -h, --help                             help for kube-apiserver\n      --log-backtrace-at traceLocation   when logging hits line file:N, emit a stack trace (default :0)\n      --log-dir string                   If non-empty, write log files in this directory\n      --log-file string                  If non-empty, use this log file\n      --log-file-max-size uint           Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)\n      --log-flush-frequency duration     Maximum number of seconds between log flushes (default 5s)\n      --logtostderr                      log to standard error instead of files (default true)\n      --skip-headers                     If true, avoid header prefixes in the log messages\n      --skip-log-headers                 If true, avoid headers when opening log files\n      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)\n  -v, --v Level                          number for the log level verbosity (default 0)\n      --version version[=true]           Print version information and quit\n      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging\n\n",
  "reason": "Error",
  "startedAt": "2020-04-08T16:44:23Z"
}

We should reserve the usage dump for invocations that use an unrecognized option, not for invocations where we understand the request but are unable to deliver.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1630/pull-ci-openshift-machine-config-operator-master-e2e-gcp-upgrade/1627/artifacts/e2e-gcp-upgrade/pods/openshift-kube-apiserver_kube-apiserver-ci-op-kf7v5-m-0.c.openshift-gce-devel-ci.internal_kube-apiserver_previous.log

Comment 1 W. Trevor King 2020-04-27 21:05:14 UTC
Seen again on [1], where it's now fatal for 4.5 update CI (via bug 1828427).  Should I split off a separate bug for the "bind: address already in use" crashlooping, so this one stays focused on log spew?

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/27314

Comment 2 W. Trevor King 2020-04-27 23:13:38 UTC
Seems like this got a 'SilenceUsage' upstream in [1].  Not sure when that will land in OpenShift, but that should address the usage-spew portion of this.

[1]: https://github.com/kubernetes/kubernetes/pull/90297

Comment 4 David Eads 2020-05-20 14:54:04 UTC
fixed by https://github.com/openshift/origin/pull/24945

*** This bug has been marked as a duplicate of bug 1831533 ***

Comment 5 W. Trevor King 2020-05-21 04:24:35 UTC
Looks like bug 1831533 addressed the usage spew.  Did we have a separate ticket open for the bind contention that was being masked?