Bug 1905730

Summary: ovn-ipsec pods reporting misleading logs before getting to stable state
Product: OpenShift Container Platform Reporter: Anurag saxena <anusaxen>
Component: Networking Assignee: Mark Gray <mark.d.gray>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: medium CC: aconstan, anbhat, aos-bugs, bbennett, juzhao, kewang, mark.d.gray, mfojtik, mtleilia, rbrattai, vpickard, xxia, zzhao
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1914402 (view as bug list) Environment:
Last Closed: 2021-02-08 17:28:52 UTC Type: Bug

Description Anurag saxena 2020-12-08 22:38:58 UTC
Description of problem: systemd is not available in the ovn-ipsec pod, but the startup sequence still attempts a `systemctl restart ipsec.service`, which writes misleading errors to the log. This was already discussed with Mark on Slack; opening this bug to track the small issue.

# oc logs ovn-ipsec-8vt9w -n openshift-ovn-kubernetes
+ trap cleanup SIGTERM
+ ulimit -n 1024
+ rm -rf /etc/ipsec.conf /etc/ipsec.d /etc/ipsec.secrets
+ touch /etc/openvswitch/ipsec.conf
+ touch /etc/openvswitch/ipsec.secrets
+ mkdir -p /etc/openvswitch/ipsec.d
+ ln -s /etc/openvswitch/ipsec.conf /etc/ipsec.conf
+ ln -s /etc/openvswitch/ipsec.d /etc/ipsec.d
+ ln -s /etc/openvswitch/ipsec.secrets /etc/ipsec.secrets
+ /usr/libexec/ipsec/addconn --config /etc/openvswitch/ipsec.conf --checkconfig
+ /usr/libexec/ipsec/_stackmanager start
+ /usr/sbin/ipsec --checknss
Initializing NSS database

+ /usr/sbin/ipsec --checknflog
chroot: cannot change root directory to '/host': No such file or directory
nflog ipsec capture disabled
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/openvswitch/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
+ OVS_LOGDIR=/var/log/openvswitch
+ OVS_RUNDIR=/var/run/openvswitch
+ OVS_PKGDATADIR=/usr/share/openvswitch
+ /usr/share/openvswitch/scripts/ovs-ctl --ike-daemon=libreswan start-ovs-ipsec
2020-12-08T15:41:54Z |  0  | ovs-monitor-ipsec | ERR | Failed to clear NSS database.
startswith first arg must be bytes or a tuple of bytes, not str
2020-12-08T15:41:54Z |  1  | ovs-monitor-ipsec | INFO | Restarting LibreSwan
Redirecting to: systemctl restart ipsec.service
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
+ true
+ sleep 60
+ true
+ sleep 60
+ true
+ sleep 60


Version-Release number of selected component (if applicable): 4.7.0-0.ci-2020-12-08-180515


How reproducible: Always


Steps to Reproduce:
1. Install an OVN cluster with IPsec enabled.

Actual results: Misleading error messages appear in the ovn-ipsec pod logs.


Expected results: The pod logs should not contain failure messages during a normal startup.


Additional info:
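The "startswith first arg must be bytes or a tuple of bytes, not str" line in the log above is the message Python 3 raises when `bytes.startswith()` is called with a `str` argument. That usually happens when command output is read as bytes and compared against a text prefix without decoding. The snippet below is only a minimal reproduction of that class of error, not the actual ovs-monitor-ipsec code; the function name, the sample output, and the "ovs_" prefix are hypothetical.

```python
def count_with_prefix(lines, prefix):
    """Count entries that start with the given prefix."""
    return sum(1 for line in lines if line.startswith(prefix))


# Hypothetical tool output. In Python 3, subprocess pipes return bytes
# unless the caller decodes them or asks for text mode.
raw_output = b"ovs_certkey_example\nunrelated_entry\n"

try:
    # bytes lines compared against a str prefix: raises TypeError
    count_with_prefix(raw_output.splitlines(), "ovs_")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding first (or passing a bytes prefix such as b"ovs_") avoids the error.
decoded_count = count_with_prefix(raw_output.decode().splitlines(), "ovs_")
print(decoded_count)
```

If the monitor script hits this pattern, the "Failed to clear NSS database" error would be a side effect of the type mismatch rather than a problem with the NSS database itself, but that is an assumption until the script is checked.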

Comment 2 Ke Wang 2020-12-30 07:52:37 UTC
We set up two clusters with the OVN network on AWS, one with IPsec enabled and the other without. After observing them for more than ten hours, we found that openshift-apiserver was intermittently unavailable on the IPsec-enabled cluster:

$ oc get networks.config.openshift.io/cluster -o yaml
...
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
...

$ STA_LAST=""; while true; do OUT=`oc get co openshift-apiserver --no-headers`; STA=`echo "$OUT" | awk '{print $3 " " $4 " " $5}'`; if [ "$STA" != "$STA_LAST" ]; then echo "`date` status is: $OUT"; STA_LAST="$STA"; fi; sleep 2; done
Mon Dec 28 22:44:53 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   26m
Tue Dec 29 03:38:41 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 03:57:42 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 03:58:16 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 04:10:56 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   0s
Tue Dec 29 04:11:21 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   0s
Tue Dec 29 04:15:48 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   11s
Tue Dec 29 04:15:57 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   2s
Tue Dec 29 04:19:34 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   2s
Tue Dec 29 04:20:14 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 04:20:44 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   2s
Tue Dec 29 04:21:04 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   0s
Tue Dec 29 04:22:35 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   0s
Tue Dec 29 04:22:57 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 04:30:03 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   3s
Tue Dec 29 04:30:32 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 04:31:16 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   0s
Tue Dec 29 04:31:38 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   0s
Tue Dec 29 04:33:59 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 06:42:41 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 07:05:02 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 07:05:27 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   2s
Tue Dec 29 07:08:53 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 08:04:46 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 08:11:22 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 08:11:36 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 08:16:02 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   0s
Tue Dec 29 08:16:35 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 08:18:02 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   2s
Tue Dec 29 08:18:18 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 08:36:44 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 08:37:17 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   7s
Tue Dec 29 08:39:27 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 08:39:49 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   2s
Tue Dec 29 08:46:11 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s
Tue Dec 29 08:46:17 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 08:50:55 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   2s
Tue Dec 29 08:51:26 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 09:01:50 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   2s
Tue Dec 29 09:02:06 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s
Tue Dec 29 09:04:34 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   2s
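
To put numbers on the flapping shown above, the transition lines can be turned into outage durations. This is a rough sketch that assumes the exact line layout printed by the loop (timestamp, "status is:", then the `oc get co` columns with AVAILABLE third); the function names are hypothetical, and the three sample lines are copied from the output above. Because the loop polls every 2 seconds, each duration is only accurate to within a couple of seconds.

```python
from datetime import datetime


def parse_line(line):
    """Return (timestamp, available) for one 'status is:' line."""
    stamp, rest = line.split(" status is: ")
    parts = stamp.split()  # e.g. ['Tue', 'Dec', '29', '03:38:41', 'EST', '2020']
    # Drop the weekday and the timezone token; all samples share one zone.
    when = datetime.strptime(" ".join(parts[1:4] + parts[5:]), "%b %d %H:%M:%S %Y")
    available = rest.split()[2] == "True"  # third column is AVAILABLE
    return when, available


def outage_seconds(lines):
    """Length in seconds of each closed Available=False window."""
    outages = []
    down_since = None
    for line in lines:
        when, available = parse_line(line)
        if not available and down_since is None:
            down_since = when
        elif available and down_since is not None:
            outages.append((when - down_since).total_seconds())
            down_since = None
    return outages


sample = [
    "Tue Dec 29 03:38:41 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s",
    "Tue Dec 29 03:57:42 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   True   False   False   1s",
    "Tue Dec 29 03:58:16 EST 2020 status is: openshift-apiserver   4.7.0-0.nightly-2020-12-21-131655   False   False   False   1s",
]
print(outage_seconds(sample))
```

For the first window in the sample (03:38:41 to 03:57:42) this gives roughly 19 minutes of unavailability; the last line opens a new window that is not counted until a matching True line is seen.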

$ oc get co | grep -v '.True.*False.*False'
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2020-12-21-131655   False       True          True       79s
console                                    4.7.0-0.nightly-2020-12-21-131655   True        False         True       6h12m
openshift-apiserver                        4.7.0-0.nightly-2020-12-21-131655   False       False         False      9m9s

Checked the openshift-apiserver-operator logs; there are many 503 errors like the following:

1229 08:41:30.687751       1 status_controller.go:213] clusteroperator/openshift-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2020-12-29T03:41:24Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-12-29T03:19:34Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2020-12-29T08:39:04Z","message":"APIServicesAvailable: \"authorization.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"build.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"quota.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"security.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)","reason":"APIServices_Error","status":"False","type":"Available"},{"lastTransitionTime":"2020-12-29T03:12:46Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
E1229 08:41:30.688505       1 base_controller.go:250] "APIServiceController_openshift-apiserver" controller failed to sync "key", err: "authorization.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
"build.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
"quota.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
"security.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
I1229 08:41:30.700698       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"f63d3df0-18c9-4730-9ae4-34161e75f114", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available message changed from "APIServicesAvailable: \"authorization.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"security.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)" to "APIServicesAvailable: \"authorization.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"build.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"quota.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"security.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)"


Then I checked the IPsec service on the cluster and found that it doesn't work at all.

$ cat /var/log/openvswitch/ovs-monitor-ipsec.log

startswith first arg must be bytes or a tuple of bytes, not str
2020-12-29T03:12:51.392Z |  1  | ovs-monitor-ipsec | INFO | Restarting LibreSwan
2020-12-29T03:12:51.472Z |  3  | reconnect | INFO | unix:/var/run/openvswitch/db.sock: connecting...
2020-12-29T03:12:51.473Z |  6  | reconnect | INFO | unix:/var/run/openvswitch/db.sock: connected
2020-12-29T03:12:51.510Z |  11 | ovs-monitor-ipsec | INFO | Tunnel ovn-fbcad4-0 appeared in OVSDB
2020-12-29T03:12:51.511Z |  13 | ovs-monitor-ipsec | INFO | Tunnel ovn-a95af2-0 appeared in OVSDB
2020-12-29T03:12:51.722Z |  15 | ovs-monitor-ipsec | INFO | Refreshing LibreSwan configuration
2020-12-29T03:17:59.219Z |  76 | ovs-monitor-ipsec | INFO | Tunnel ovn-59c046-0 appeared in OVSDB
2020-12-29T03:17:59.220Z |  78 | ovs-monitor-ipsec | INFO | Refreshing LibreSwan configuration
2020-12-29T03:17:59.255Z |  79 | ovs-monitor-ipsec | INFO | ovn-a95af2-0-in-1 is outdated 1
2020-12-29T03:17:59.310Z |  80 | ovs-monitor-ipsec | INFO | ovn-a95af2-0-out-1 is outdated 1
2020-12-29T03:17:59.349Z |  81 | ovs-monitor-ipsec | INFO | ovn-fbcad4-0-in-1 is outdated 1
2020-12-29T03:17:59.398Z |  82 | ovs-monitor-ipsec | INFO | ovn-fbcad4-0-out-1 is outdated 1
2020-12-29T03:17:59.695Z |  84 | ovs-monitor-ipsec | INFO | Tunnel ovn-9ae0bf-0 appeared in OVSDB
2020-12-29T03:17:59.696Z |  86 | ovs-monitor-ipsec | INFO | Refreshing LibreSwan configuration
2020-12-29T03:17:59.717Z |  87 | ovs-monitor-ipsec | INFO | ovn-59c046-0-in-1 is outdated 1
2020-12-29T03:17:59.771Z |  88 | ovs-monitor-ipsec | INFO | ovn-a95af2-0-in-1 is outdated 1
2020-12-29T03:17:59.844Z |  89 | ovs-monitor-ipsec | INFO | ovn-fbcad4-0-in-1 is outdated 1
2020-12-29T03:17:59.860Z |  90 | ovs-monitor-ipsec | INFO | ovn-fbcad4-0-out-1 is outdated 1
2020-12-29T03:18:14.133Z |  93 | ovs-monitor-ipsec | INFO | Tunnel ovn-fd4721-0 appeared in OVSDB
2020-12-29T03:18:14.135Z |  95 | ovs-monitor-ipsec | INFO | Refreshing LibreSwan configuration
2020-12-29T03:18:14.155Z |  96 | ovs-monitor-ipsec | INFO | ovn-59c046-0-in-1 is outdated 1
2020-12-29T03:18:14.167Z |  97 | ovs-monitor-ipsec | INFO | ovn-9ae0bf-0-in-1 is outdated 1
2020-12-29T03:18:14.178Z |  98 | ovs-monitor-ipsec | INFO | ovn-a95af2-0-in-1 is outdated 1
2020-12-29T03:18:14.221Z |  99 | ovs-monitor-ipsec | INFO | ovn-a95af2-0-out-1 is outdated 1
2020-12-29T03:18:14.253Z | 100 | ovs-monitor-ipsec | INFO | ovn-fbcad4-0-in-1 is outdated 1
2020-12-29T03:18:14.281Z | 101 | ovs-monitor-ipsec | INFO | ovn-fbcad4-0-out-1 is outdated 1


$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS   AGE
ovn-ipsec-6c8sh        1/1     Running   0          28h
ovn-ipsec-7xrjm        1/1     Running   0          28h
ovn-ipsec-8ngg5        1/1     Running   0          28h
ovn-ipsec-fn275        1/1     Running   0          28h
ovn-ipsec-qdmnr        1/1     Running   0          28h
ovn-ipsec-z7hmd        1/1     Running   0          28h
ovnkube-master-5rckl   6/6     Running   3          28h
ovnkube-master-9j2hf   6/6     Running   3          28h
ovnkube-master-dvkgc   6/6     Running   0          28h
ovnkube-node-8wml8     3/3     Running   0          28h
ovnkube-node-ghlqx     3/3     Running   0          28h
ovnkube-node-jvdg4     3/3     Running   0          28h
ovnkube-node-r2xss     3/3     Running   0          28h
ovnkube-node-vxj97     3/3     Running   0          28h
ovnkube-node-zq8vf     3/3     Running   0          28h
ovs-node-2jwjm         1/1     Running   0          28h
ovs-node-4xmvr         1/1     Running   0          28h
ovs-node-7f25g         1/1     Running   0          28h
ovs-node-942gw         1/1     Running   0          28h
ovs-node-j4h5j         1/1     Running   0          28h
ovs-node-wm267         1/1     Running   0          28h

$ oc logs ovn-ipsec-6c8sh -n openshift-ovn-kubernetes
+ trap cleanup SIGTERM
+ ulimit -n 1024
+ rm -rf /etc/ipsec.conf /etc/ipsec.d /etc/ipsec.secrets
+ touch /etc/openvswitch/ipsec.conf
+ touch /etc/openvswitch/ipsec.secrets
+ mkdir -p /etc/openvswitch/ipsec.d
+ ln -s /etc/openvswitch/ipsec.conf /etc/ipsec.conf
+ ln -s /etc/openvswitch/ipsec.d /etc/ipsec.d
+ ln -s /etc/openvswitch/ipsec.secrets /etc/ipsec.secrets
+ /usr/libexec/ipsec/addconn --config /etc/openvswitch/ipsec.conf --checkconfig
+ /usr/libexec/ipsec/_stackmanager start
+ /usr/sbin/ipsec --checknss
Initializing NSS database

+ /usr/sbin/ipsec --checknflog
chroot: cannot change root directory to '/host': No such file or directory
nflog ipsec capture disabled
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/openvswitch/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
+ OVS_LOGDIR=/var/log/openvswitch
+ OVS_RUNDIR=/var/run/openvswitch
+ OVS_PKGDATADIR=/usr/share/openvswitch
+ /usr/share/openvswitch/scripts/ovs-ctl --ike-daemon=libreswan start-ovs-ipsec
2020-12-29T03:12:51Z |  0  | ovs-monitor-ipsec | ERR | Failed to clear NSS database.
startswith first arg must be bytes or a tuple of bytes, not str
2020-12-29T03:12:51Z |  1  | ovs-monitor-ipsec | INFO | Restarting LibreSwan
Redirecting to: systemctl restart ipsec.service
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
+ true
+ sleep 60
...

On the other cluster, without IPsec, all is well, so I think the incorrect IPsec configuration made the cluster unstable.

Comment 4 Junqi Zhao 2020-12-31 05:41:44 UTC
4.7.0-0.nightly-2020-12-21-131655, OVN (Encrypted with IPSec) cluster
# oc -n openshift-ovn-kubernetes logs ovn-ipsec-9fzzn
+ trap cleanup SIGTERM
+ ulimit -n 1024
+ rm -rf /etc/ipsec.conf /etc/ipsec.d /etc/ipsec.secrets
+ touch /etc/openvswitch/ipsec.conf
+ touch /etc/openvswitch/ipsec.secrets
+ mkdir -p /etc/openvswitch/ipsec.d
+ ln -s /etc/openvswitch/ipsec.conf /etc/ipsec.conf
+ ln -s /etc/openvswitch/ipsec.d /etc/ipsec.d
+ ln -s /etc/openvswitch/ipsec.secrets /etc/ipsec.secrets
+ /usr/libexec/ipsec/addconn --config /etc/openvswitch/ipsec.conf --checkconfig
+ /usr/libexec/ipsec/_stackmanager start
+ /usr/sbin/ipsec --checknss
Initializing NSS database

+ /usr/sbin/ipsec --checknflog
chroot: cannot change root directory to '/host': No such file or directory
nflog ipsec capture disabled
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/openvswitch/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
+ OVS_LOGDIR=/var/log/openvswitch
+ OVS_RUNDIR=/var/run/openvswitch
+ OVS_PKGDATADIR=/usr/share/openvswitch
+ /usr/share/openvswitch/scripts/ovs-ctl --ike-daemon=libreswan start-ovs-ipsec
2020-12-30T03:14:25Z |  0  | ovs-monitor-ipsec | ERR | Failed to clear NSS database.
startswith first arg must be bytes or a tuple of bytes, not str
2020-12-30T03:14:25Z |  1  | ovs-monitor-ipsec | INFO | Restarting LibreSwan
Redirecting to: systemctl restart ipsec.service
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
+ true
+ sleep 60
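
The "Redirecting to: systemctl restart ipsec.service" step in the log above fails because PID 1 in the container is not systemd. One way a script could avoid the misleading output is to check for systemd before calling systemctl. The sketch below uses the same test that sd_booted(3) documents (whether the /run/systemd/system directory exists). The function names are hypothetical and this is not the actual ovs-monitor-ipsec logic; what to do when systemd is absent is left to the caller, since in this pod pluto is started directly by the entrypoint script.

```python
import os


def systemd_is_running(marker="/run/systemd/system"):
    """True if the system was booted with systemd (the sd_booted(3) check)."""
    return os.path.isdir(marker)


def restart_command(systemd_available):
    """Pick a restart command, or None when systemctl cannot work.

    None signals that the caller has to manage the IKE daemon itself,
    for example when it runs as a plain process inside a container.
    """
    if systemd_available:
        return ["systemctl", "restart", "ipsec.service"]
    return None


command = restart_command(systemd_is_running())
if command is None:
    print("systemd not detected; skipping systemctl restart")
else:
    print("would run:", " ".join(command))
```

Skipping the systemctl call in this case would keep the "System has not been booted with systemd" and "Failed to connect to bus" messages out of the pod log.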

Comment 5 Mark Gray 2021-01-04 16:41:46 UTC
(In reply to Ke Wang from comment #2)
> We setup two clusters with ovn network on aws, one enabled ipsec and the
> other without ipsec. observed them for more than ten hours, found
> openshift-apiserver was unavailable now and then on the cluster enabled
> ipsec,
> 

The comment above seems to suggest that the openshift-apiserver was unavailable without IPsec configured?

Comment 6 Mark Gray 2021-01-05 07:53:36 UTC
Can you test with the latest CI build? There has been an update to the installer that opens some ports required for IPsec. It looks like it was committed after the build you are using.

Comment 7 Junqi Zhao 2021-01-05 08:17:42 UTC
(In reply to Mark Gray from comment #6)
> Can you test with latest CI build? There has been an update to the installer
> which opens some ports required for IPsec. It looks like it was committed
> after the build you are using.

You can change the status to ON_QA after the fix has merged into the nightly payload; until then, we can't set this bug to VERIFIED. This bug should also be verified by the network QE.

Comment 11 zhaozhanqi 2021-01-06 01:55:33 UTC
According to comment 9, this is at least not a 'testblocker'. Removing the keyword.

Comment 12 Anurag saxena 2021-01-06 04:08:35 UTC
For now, moving this to openshift-apiserver team for further analysis. Thanks

Comment 13 Ke Wang 2021-01-08 07:02:20 UTC
Hi mark.d.gray@redhat.com, the issue was reproduced in our test environment, which was IPI-installed with OVN + IPsec.

$ oc describe co/openshift-apiserver
...
Status:
  Conditions:
    Last Transition Time:  2021-01-08T04:56:20Z
    Message:               APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
    Reason:                APIServerDeployment_UnavailablePod
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-01-08T04:33:22Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-01-08T05:55:18Z
    Message:               APIServicesAvailable: "apps.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
APIServicesAvailable: "build.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
APIServicesAvailable: "quota.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
    Reason:                APIServices_Error
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-01-08T03:28:09Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
...

$ oc describe co/network
...
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-01-08T03:24:46Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2021-01-08T05:09:40Z
    Message:               DaemonSet "openshift-ovn-kubernetes/ovn-ipsec" rollout is not making progress - last change 2021-01-08T04:58:35Z
    Reason:                RolloutHung
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-01-08T03:24:46Z
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2021-01-08T04:34:03Z
    Message:               DaemonSet "openshift-ovn-kubernetes/ovn-ipsec" is not available (awaiting 2 nodes)
    Reason:                Deploying
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-01-08T03:30:14Z
    Status:                True
    Type:                  Available
  Extension:               <nil>


$ oc get pods -n openshift-ovn-kubernetes | grep ipsec
ovn-ipsec-6pqzl        1/1     Running   0          3h19m
ovn-ipsec-b8q8t        1/1     Running   0          3h23m
ovn-ipsec-gmm8s        1/1     Running   0          121m
ovn-ipsec-snb4g        1/1     Running   0          3h30m
ovn-ipsec-tkfmk        0/1     Running   0          3h30m
ovn-ipsec-v4xj4        1/1     Running   0          3h22m
ovn-ipsec-vb6tc        1/1     Running   0          120m
ovn-ipsec-w6tc8        0/1     Running   3          3h30m
ovn-ipsec-zqtkx        1/1     Running   0          120m

$ oc logs ovn-ipsec-6pqzl -n openshift-ovn-kubernetes
ovnkube-node has configured node.
...
+ /usr/sbin/ipsec --checknflog
chroot: cannot change root directory to '/host': No such file or directory
nflog ipsec capture disabled
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
+ OVS_LOGDIR=/var/log/openvswitch
+ OVS_RUNDIR=/var/run/openvswitch
+ OVS_PKGDATADIR=/usr/share/openvswitch
+ /usr/share/openvswitch/scripts/ovs-ctl --ike-daemon=libreswan start-ovs-ipsec
2021-01-08T03:37:42Z |  0  | ovs-monitor-ipsec | ERR | Failed to clear NSS database.
startswith first arg must be bytes or a tuple of bytes, not str
2021-01-08T03:37:42Z |  1  | ovs-monitor-ipsec | INFO | Restarting LibreSwan
Redirecting to: systemctl restart ipsec.service
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
...