Bug 1746881 - Requests failures and TLS issues during tests
Summary: Requests failures and TLS issues during tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Jan Chaloupka
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-29 12:33 UTC by Ricardo Maraschini
Modified: 2019-10-16 06:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:38:33 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:38:41 UTC

Comment 1 Ben Bennett 2019-08-29 13:39:36 UTC
If the information is being gathered from the kubelet, then the SDN is not involved.

Comment 2 Seth Jennings 2019-08-29 17:09:39 UTC
$ cat csr.json | jq '.items[] | {name:.metadata.name, message:.status.conditions[0].message, type:.status.conditions[0].type}'
{
  "name": "csr-4cgnb",
  "message": "This CSR was approved by the Node CSR Approver",
  "type": "Approved"
}
{
  "name": "csr-9qcsb",
  "message": "This CSR was approved by kubectl certificate approve.",
  "type": "Approved"
}
{
  "name": "csr-bhxg4",
  "message": "This CSR was approved by kubectl certificate approve.",
  "type": "Approved"
}
{
  "name": "csr-bm8bh",
  "message": "This CSR was approved by the Node CSR Approver",
  "type": "Approved"
}
{
  "name": "csr-dh6lw",
  "message": null,
  "type": null
}
{
  "name": "csr-gkdbg",
  "message": "This CSR was approved by kubectl certificate approve.",
  "type": "Approved"
}
{
  "name": "csr-gtg65",
  "message": "This CSR was approved by the Node CSR Approver",
  "type": "Approved"
}
{
  "name": "csr-hnbss",
  "message": "This CSR was approved by kubectl certificate approve.",
  "type": "Approved"
}
{
  "name": "csr-jq8jq",
  "message": "This CSR was approved by the Node CSR Approver",
  "type": "Approved"
}
{
  "name": "csr-k54ds",
  "message": "This CSR was approved by kubectl certificate approve.",
  "type": "Approved"
}
{
  "name": "csr-lw5d6",
  "message": "This CSR was approved by the Node CSR Approver",
  "type": "Approved"
}
{
  "name": "csr-rcpl4",
  "message": "This CSR was approved by the Node CSR Approver",
  "type": "Approved"
}
{
  "name": "csr-wzpgc",
  "message": "This CSR was approved by kubectl certificate approve.",
  "type": "Approved"
}

The one with no status (csr-dh6lw):

{
            "apiVersion": "certificates.k8s.io/v1beta1",
            "kind": "CertificateSigningRequest",
            "metadata": {
                "creationTimestamp": "2019-08-29T09:34:07Z",
                "generateName": "csr-",
                "name": "csr-dh6lw",
                "resourceVersion": "8313",
                "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-dh6lw",
                "uid": "25d56045-ca40-11e9-b754-12e8f2535c10"
            },
            "spec": {
                "groups": [
                    "system:nodes",
                    "system:authenticated"
                ],
                "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQlJUQ0I3QUlCQURCS01SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14TVRBdkJnTlZCQU1US0hONQpjM1JsYlRwdWIyUmxPbWx3TFRFd0xUQXRNVE15TFRFMk5pNWxZekl1YVc1MFpYSnVZV3d3V1RBVEJnY3Foa2pPClBRSUJCZ2dxaGtqT1BRTUJCd05DQUFRL2xhVlJBR3V4V2d6N3htV3pxZnBOa0JEdDhuM3I5OEVlQ1V5Q0d4SGgKSExURHJDS3hMZm5HallWZTZtWk05Y2RDS1VCSnR3OTgxQ0tRL0dKaUswd09vRUF3UGdZSktvWklodmNOQVFrTwpNVEV3THpBdEJnTlZIUkVFSmpBa2doeHBjQzB4TUMwd0xURXpNaTB4TmpZdVpXTXlMbWx1ZEdWeWJtRnNod1FLCkFJU21NQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJUUQ4eEdiU3hKeFlwNW1maTZ3djdHOU50U3FUWGF5K0VySjgKTmxPbkRVbEl1Z0lnUURTVU9XV1g4RTMwUjY5REhOZk4yYXJsdFVWWVZWL3doa1NjQlNnVzAzRT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUgUkVRVUVTVC0tLS0tCg==",
                "usages": [
                    "digital signature",
                    "key encipherment",
                    "server auth"
                ],
                "username": "system:node:ip-10-0-132-166.ec2.internal"
            },
            "status": {}
        },
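
A CSR like this one can be picked out of a larger dump programmatically. The following is a minimal Python sketch, assuming the input has the same shape as the csr.json file queried above; the function name `pending_csrs` and the trimmed sample data are hypothetical and only for illustration.

```python
from typing import Any, Dict, List


def pending_csrs(csr_list: Dict[str, Any]) -> List[str]:
    """Return the names of CSRs that have no status conditions at all."""
    pending = []
    for item in csr_list.get("items", []):
        # An unhandled CSR has "status": {} so "conditions" is missing or empty.
        conditions = item.get("status", {}).get("conditions") or []
        if not conditions:
            pending.append(item["metadata"]["name"])
    return pending


# Trimmed sample with only the fields used here (illustrative, not the full dump).
sample = {
    "items": [
        {"metadata": {"name": "csr-4cgnb"},
         "status": {"conditions": [{"type": "Approved"}]}},
        {"metadata": {"name": "csr-dh6lw"}, "status": {}},
    ]
}

print(pending_csrs(sample))
```

Checking for an empty conditions list, rather than for a specific type, also catches CSRs that were neither approved nor denied.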

ip-10-0-132-166 is the node where any tests that need to read logs, exec, etc. are failing.

machine-approver is dropping CSRs.  Ryan was saying that it is watching, but not listing, which might cause it to miss seeing new CSRs.

Comment 3 Jan Chaloupka 2019-09-04 07:58:45 UTC
From the logs:

```
I0829 09:34:07.302930       1 main.go:107] CSR csr-dh6lw added
I0829 09:34:07.310565       1 main.go:132] CSR csr-dh6lw not authorized: No target machine
I0829 09:34:07.310588       1 main.go:164] Error syncing csr csr-dh6lw: No target machine
I0829 09:34:07.315773       1 main.go:107] CSR csr-dh6lw added
I0829 09:34:07.326631       1 main.go:132] CSR csr-dh6lw not authorized: No target machine
I0829 09:34:07.326747       1 main.go:164] Error syncing csr csr-dh6lw: No target machine
I0829 09:34:07.336960       1 main.go:107] CSR csr-dh6lw added
I0829 09:34:07.343642       1 main.go:132] CSR csr-dh6lw not authorized: No target machine
I0829 09:34:07.343663       1 main.go:164] Error syncing csr csr-dh6lw: No target machine
I0829 09:34:07.363872       1 main.go:107] CSR csr-dh6lw added
I0829 09:34:07.369072       1 main.go:132] CSR csr-dh6lw not authorized: No target machine
I0829 09:34:07.369095       1 main.go:164] Error syncing csr csr-dh6lw: No target machine
I0829 09:34:07.409320       1 main.go:107] CSR csr-dh6lw added
I0829 09:34:07.415788       1 main.go:132] CSR csr-dh6lw not authorized: No target machine
I0829 09:34:07.415815       1 main.go:164] Error syncing csr csr-dh6lw: No target machine
I0829 09:34:07.496074       1 main.go:107] CSR csr-dh6lw added
I0829 09:34:07.501979       1 main.go:132] CSR csr-dh6lw not authorized: No target machine
E0829 09:34:07.502004       1 main.go:174] No target machine
I0829 09:34:07.502013       1 main.go:175] Dropping CSR "csr-dh6lw" out of the queue: No target machine
```

The CSR was dropped about 0.2s after it was first added, following six sync attempts.
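
The elapsed time can be checked from the first and last log lines. This is a small Python sketch, assuming the usual klog header layout (severity letter, MMDD, then a time with microseconds); the helper name `parse_ts` is hypothetical, and the year 2019 is taken from the comment dates because the log lines omit it.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, space, HH:MM:SS.microseconds
KLOG_TS = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def parse_ts(line: str, year: str = "2019") -> datetime:
    """Parse the timestamp at the start of a klog-formatted line."""
    match = KLOG_TS.match(line)
    if match is None:
        raise ValueError("not a klog line: " + line)
    stamp = year + match.group(1) + " " + match.group(2)
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


first = "I0829 09:34:07.302930       1 main.go:107] CSR csr-dh6lw added"
last = ('I0829 09:34:07.502013       1 main.go:175] Dropping CSR "csr-dh6lw" '
        "out of the queue: No target machine")

# Time between the first "added" event and the final drop, in seconds.
elapsed = (parse_ts(last) - parse_ts(first)).total_seconds()
print(elapsed)
```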

Comment 4 Andrew McDermott 2019-09-04 08:01:02 UTC
This is likely to be fixed by: https://bugzilla.redhat.com/show_bug.cgi?id=1746521

Comment 5 Jan Chaloupka 2019-09-04 08:06:16 UTC
The following PR increases the number of retries before a CSR is dropped. That should give enough time to avoid dropping any CSRs in less than 1s.

https://github.com/openshift/cluster-machine-approver/pull/41
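
To get a rough feel for how the retry count maps to time in the queue: in the log from comment 3, the gaps between each "Error syncing" line and the next "added" line roughly double (about 5, 10, 20, 40, 80 ms), which looks like exponential backoff. The Python sketch below models that. The 5 ms base and the doubling factor are assumptions inferred from those gaps, not taken from the approver's source, and the function names are hypothetical. The retry count the PR actually uses is not stated here.

```python
def cumulative_backoff(retries: int, base: float = 0.005, factor: float = 2.0) -> float:
    """Total seconds spent waiting across `retries` requeues with exponential backoff."""
    return sum(base * factor ** i for i in range(retries))


def min_retries_for(window: float, base: float = 0.005, factor: float = 2.0) -> int:
    """Smallest retry count whose cumulative backoff is at least `window` seconds."""
    retries = 0
    while cumulative_backoff(retries, base, factor) < window:
        retries += 1
    return retries


# Five requeues (as in the log above) versus the count needed to cover 1 second.
print(cumulative_backoff(5))
print(min_retries_for(1.0))
```

This counts backoff delay only; each sync attempt in the log also takes a few milliseconds, so the real time before a drop is slightly longer than the model suggests.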

Comment 6 Andrew McDermott 2019-09-04 08:29:05 UTC
(In reply to Jan Chaloupka from comment #5)
> The following PR increases the number of retries before a CSR is dropped.
> That should give enough time to avoid dropping any CSRs in less than 1s.
> 
> https://github.com/openshift/cluster-machine-approver/pull/41

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1746521

Comment 8 Jianwei Hou 2019-09-09 04:11:18 UTC
Verified in 4.2.0-0.nightly-2019-09-08-180038

This has not been reproduced in the large-scale test.

Comment 9 errata-xmlrpc 2019-10-16 06:38:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

