Bug 1953999 - NNCP fails to Configure - Internal Error
Summary: NNCP fails to Configure - Internal Error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Petr Horáček
QA Contact: Ofir Nash
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-27 12:02 UTC by Ofir Nash
Modified: 2021-07-27 14:31 UTC
CC List: 4 users

Fixed In Version: kubernetes-nmstate-handler-container-v4.8.0-15
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 14:30:44 UTC
Target Upstream Version:
Embargoed:


Attachments
NNCP Example (448 bytes, text/plain) - 2021-04-27 12:02 UTC, Ofir Nash
nmstate-handler pod logs (8.75 KB, text/plain) - 2021-04-27 12:51 UTC, Ofir Nash


Links
Red Hat Product Errata RHSA-2021:2920 (last updated 2021-07-27 14:31:34 UTC)

Description Ofir Nash 2021-04-27 12:02:14 UTC
Created attachment 1775915 [details]
NNCP Example


Description of problem: When trying to apply any basic NNCP, it fails to configure with an Internal Error (see the error in the Actual results section).


Version-Release number of selected component (if applicable):
CNV/OCP 4.8
nmstate-handler: v4.8.0-13

How reproducible:
Always - When trying to create any NNCP

Steps to Reproduce:
1. Connect to a 4.8 cluster.
2. Create an NNCP YAML (attached example; a minimal sketch follows these steps).
3. Apply the NNCP:
[cnv-qe-jenkins@onash-48-2-svkh5-executor ofir]$ oc apply -f nncp_test.yaml 
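
The attached nncp_test.yaml is not reproduced inline; the following is a minimal sketch consistent with the desired state visible in the nmstate-handler logs below (the policy name, apiVersion, and node selector value are assumptions, not the exact attachment):

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dns-test                                 # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/hostname: <worker-node-name>   # placeholder, set to an actual node
  desiredState:
    dns-resolver:
      config:
        search:
        - example.com
        server:
        - 8.8.8.8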

Actual results: 
[cnv-qe-jenkins@onash-48-2-svkh5-executor ofir]$ oc apply -f nncp_test.yaml 
Error from server (InternalError): error when creating "nncp_test.yaml": Internal error occurred: failed calling webhook "nodenetworkconfigurationpolicies-mutate.nmstate.io": Post "https://nmstate-webhook.openshift-cnv.svc:443/nodenetworkconfigurationpolicies-mutate?timeout=10s": dial tcp 10.130.0.97:8443: connect: connection refused


Expected results:
Apply NNCP successfully.

Additional info:
Happens with any NNCP configuration.
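
The "connection refused" against the nmstate-webhook service points at the webhook pods themselves; a quick diagnostic sketch, assuming the openshift-cnv namespace and the nmstate-webhook service name taken from the error above (the pod name in the last command is a placeholder):

> oc get pods -n openshift-cnv -o wide | grep nmstate-webhook
> oc get endpoints -n openshift-cnv nmstate-webhook
> oc logs -n openshift-cnv <nmstate-webhook-pod> --previous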

nmstate-handler pod logs:
> oc logs -n openshift-cnv nmstate-handler-d4c4g
{"level":"info","ts":1619517400.1112473,"logger":"controllers.NodeNetworkConfigurationPolicy.initializeEnactment","msg":"creating enactment","policy":"dns-onash-48-2-svkh5-worker-0-qlp8k","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517401.1171918,"logger":"enactmentstatus","msg":"status: {DesiredState:dns-resolver:\n  config:\n    search:\n    - example.com\n    server:\n    - 8.8.8.8\n PolicyGeneration:1 Conditions:[]}","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517401.1249983,"logger":"enactmentstatus","msg":"enactment updated at the node: true","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517401.125237,"logger":"controllers.NodeNetworkConfigurationPolicy","msg":"Policy node selectors does not match node","nodenetworkconfigurationpolicy":"/dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517401.1252692,"logger":"enactmentconditions","msg":"NotifyNodeSelectorNotMatching","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517401.1254523,"logger":"enactmentstatus","msg":"status: {DesiredState:dns-resolver:\n  config:\n    search:\n    - example.com\n    server:\n    - 8.8.8.8\n PolicyGeneration:1 Conditions:[{Type:Failing Status:False Reason:NodeSelectorNotMatching Message: LastHeartbeatTime:2021-04-27 09:56:41.125312718 +0000 UTC m=+57368.359485849 LastTransitionTime:2021-04-27 09:56:41.125312718 +0000 UTC m=+57368.359485849} {Type:Available Status:False Reason:NodeSelectorNotMatching Message: LastHeartbeatTime:2021-04-27 09:56:41.125313846 +0000 UTC m=+57368.359486839 LastTransitionTime:2021-04-27 09:56:41.125313846 +0000 UTC m=+57368.359486839} {Type:Progressing Status:False Reason:NodeSelectorNotMatching Message: LastHeartbeatTime:2021-04-27 09:56:41.125314526 +0000 UTC m=+57368.359487508 LastTransitionTime:2021-04-27 09:56:41.125314526 +0000 UTC m=+57368.359487508} {Type:Matching Status:False Reason:NodeSelectorNotMatching Message:Unmatching labels: map[kubernetes.io/hostname:onash-48-2-svkh5-worker-0-qlp8k] LastHeartbeatTime:2021-04-27 09:56:41.125315142 +0000 UTC m=+57368.359488121 LastTransitionTime:2021-04-27 09:56:41.125315142 +0000 UTC m=+57368.359488121} {Type:Aborted Status:False Reason:NodeSelectorNotMatching Message: LastHeartbeatTime:2021-04-27 09:56:41.125315511 +0000 UTC m=+57368.359488495 LastTransitionTime:2021-04-27 09:56:41.125315511 +0000 UTC m=+57368.359488495}]}","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517401.135619,"logger":"enactmentstatus","msg":"enactment updated at the node: false","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517402.135976,"logger":"enactmentstatus","msg":"enactment updated at the node: true","enactment":"onash-48-2-svkh5-worker-0-zj6rn.dns-onash-48-2-svkh5-worker-0-qlp8k"}
{"level":"info","ts":1619517402.137277,"logger":"policyconditions","msg":"enactments count: {failed: {true: 0, false: 5, unknown: 1}, progressing: {true: 1, false: 5, unknown: 0}, available: {true: 0, false: 5, unknown: 1}, matching: {true: 1, false: 5, unknown: 0}, aborted: {true: 0, false: 6, unknown: 0}}","policy":"dns-onash-48-2-svkh5-worker-0-qlp8k"}

Comment 1 Ofir Nash 2021-04-27 12:51:13 UTC
Created attachment 1775921 [details]
nmstate-handler pod logs

Comment 2 Quique Llorente 2021-04-27 14:18:26 UTC
Looks like the nmstate-webhook is in CrashLoopBackOff:

nmstate-webhook-5cbd5f7445-fd79b                      0/1     CrashLoopBackOff   71         20h
nmstate-webhook-5cbd5f7445-mwxcl                      0/1     CrashLoopBackOff   38         20h

After removing the secrets and those pods, the system goes back to normal.

Now NNCP should be fine

I think something is fishy with cert rotations
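
For reference, a sketch of that workaround, assuming the openshift-cnv namespace; the exact name of the webhook TLS secret is an assumption and is left as a placeholder:

> oc get secrets -n openshift-cnv | grep nmstate
> oc delete secret -n openshift-cnv <nmstate-webhook-tls-secret>
> oc delete pod -n openshift-cnv nmstate-webhook-5cbd5f7445-fd79b nmstate-webhook-5cbd5f7445-mwxcl

Deleting the pods forces them to restart and pick up the regenerated certificate.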

Comment 3 Quique Llorente 2021-04-27 14:52:24 UTC
I have forced fast rotation by editing the HCO CR with:

spec:
  certConfig:
    ca:
      duration: 48h0m0s
      renewBefore: 24h0m0s
    server:
      duration: 5m0s
      renewBefore: 2m30s


I observe a pair of issues: one is that editing those parameters creates some inconsistencies, with secrets created with different values; and after rotation we go back to the "tls: private key does not match public key" error.

I will try to reproduce this u/s
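
For reference, a sketch of applying that certConfig change non-interactively, assuming the default HyperConverged CR name and namespace used by CNV (kubevirt-hyperconverged in openshift-cnv):

> oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type merge \
    -p '{"spec":{"certConfig":{"ca":{"duration":"48h0m0s","renewBefore":"24h0m0s"},"server":{"duration":"5m0s","renewBefore":"2m30s"}}}}'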

Comment 4 Quique Llorente 2021-04-28 05:16:37 UTC
This should fix it u/s https://github.com/qinqon/kube-admission-webhook/pull/48

Comment 5 Ofir Nash 2021-05-13 12:04:38 UTC
Verified on version: "kubernetes-nmstate-handler-container: v4.8.0-15"

Scenario checked:
1. Created and applied NNCP from the attachments - SuccessfullyConfigured.

[cnv-qe-jenkins@net-48-xlarge-5f997-executor ofir]$ oc get nncp -A
NAME       STATUS
br3-nncp   SuccessfullyConfigured

2. nmstate-webhook pods are Running successfully:

openshift-cnv                                      nmstate-webhook-6fcdcd9cd4-bxrzh                                  1/1     Running     3          42h
openshift-cnv                                      nmstate-webhook-6fcdcd9cd4-jfr9q                                  1/1     Running     5          42h

Comment 8 errata-xmlrpc 2021-07-27 14:30:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920

