Description of problem:

I have a customer who raised this issue to me, and I can reproduce it in my lab. They are trying to monitor any SSH access to nodes, and as the results from my lab below show, this fails to work.

Version-Release number of selected component (if applicable):
OCP 4.7.24

How reproducible:
100%

Steps to Reproduce:

1. Check for any nodes accessed:

[kni@prov-0 ~]$ oc get nodes -o 'custom-columns=Node Name:.metadata.name,Machine Name:.metadata.annotations.machine\.openshift\.io/machine,SSHAccessed:.metadata.annotations.machineconfiguration\.openshift\.io/ssh'
Node Name                         Machine Name                                           SSHAccessed
master-0.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-master-0         <none>
master-1.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-master-1         <none>
master-2.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-master-2         <none>
worker-1.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-worker-0-ktcng   <none>
worker-2.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-worker-0-fhhwn   <none>

2. Access a node, and run ls just to make sure it's working OK:

[kni@prov-0 ~]$ ssh core.andytest.lab
Warning: the ECDSA host key for 'master-0.ocp4-bare.andytest.lab' differs from the key for the IP address '192.168.2.50'
Offending key for IP in /home/kni/.ssh/known_hosts:6
Matching host key in /home/kni/.ssh/known_hosts:8
Are you sure you want to continue connecting (yes/no)? yes
Red Hat Enterprise Linux CoreOS 47.84.202108052031-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html

---
Last login: Fri Jul 30 10:00:16 2021 from 192.168.2.250
[core@master-0 ~]$ ls -al
total 16
drwx------. 4 core core 109 Jul 30 10:01 .
drwxr-xr-x. 3 root root  18 Jul 24 10:15 ..
-rw-------. 1 core core  50 Jul 30 10:01 .bash_history
-rw-r--r--. 1 core core  18 Mar 25 16:45 .bash_logout
-rw-r--r--. 1 core core 141 Mar 25 16:45 .bash_profile
-rw-r--r--. 1 core core 376 Mar 25 16:45 .bashrc
drwxr-xr-x. 3 core core  19 Jul 30 10:00 .local
drwx------. 2 core core  29 Aug 22 09:39 .ssh
[core@master-0 ~]$ exit
logout
Connection to master-0.ocp4-bare.andytest.lab closed.

3. Check the node after SSH access:

[kni@prov-0 ~]$ oc get nodes -o 'custom-columns=Node Name:.metadata.name,Machine Name:.metadata.annotations.machine\.openshift\.io/machine,SSHAccessed:.metadata.annotations.machineconfiguration\.openshift\.io/ssh'
Node Name                         Machine Name                                           SSHAccessed
master-0.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-master-0         <none>
master-1.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-master-1         <none>
master-2.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-master-2         <none>
worker-1.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-worker-0-ktcng   <none>
worker-2.ocp4-bare.andytest.lab   openshift-machine-api/ocp4-bare-dbnww-worker-0-fhhwn   <none>

Actual results:
The SSHAccessed annotation (machineconfiguration.openshift.io/ssh) is not set.

Expected results:
I expect the SSHAccessed annotation to be set, reflecting my access to the nodes.

Additional info:
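For tracking this across many nodes, the check in steps 1 and 3 can also be scripted against saved `oc get nodes -o json` output. A minimal sketch, assuming the JSON has been dumped to a file first; the inline sample below is fabricated to mimic a node without the SSH annotation, and `nodes.json` is just an illustrative filename:

```shell
#!/bin/sh
# Fabricated stand-in for: oc get nodes -o json > nodes.json
cat > nodes.json <<'EOF'
{"items":[{"metadata":{"name":"master-0.ocp4-bare.andytest.lab",
"annotations":{"machine.openshift.io/machine":"openshift-machine-api/ocp4-bare-dbnww-master-0"}}}]}
EOF

# The MCD records SSH logins under this annotation key; if the key never
# appears, the access was not recorded on the node object.
if grep -q 'machineconfiguration\.openshift\.io/ssh' nodes.json; then
  echo "ssh annotation present"
else
  echo "ssh annotation missing"
fi
```

On a live cluster the same grep could run directly over `oc get nodes -o json`, which is what the custom-columns query above is reading.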
From the must-gather, in pod machine-config-daemon-8h8ck on master-0.ocp4-bare.andytest.lab, it looks like that node might be having some connectivity issues:

2021-08-22T09:45:58.388270376Z E0822 09:45:58.384638    6405 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
2021-08-22T09:46:00.523698766Z I0822 09:46:00.523244    6405 daemon.go:381] Node master-0.ocp4-bare.andytest.lab is part of the control plane
2021-08-22T09:46:01.170642137Z I0822 09:46:01.167861    6405 daemon.go:802] Current config: rendered-master-012acc289869be3f8becc00e86aec428
2021-08-22T09:46:01.170642137Z I0822 09:46:01.167889    6405 daemon.go:803] Desired config: rendered-master-754da41cc91abf2c7a0f19bc7e8745cf
2021-08-22T09:46:01.202362117Z I0822 09:46:01.201866    6405 update.go:1943] Disk currentConfig rendered-master-754da41cc91abf2c7a0f19bc7e8745cf overrides node's currentConfig annotation rendered-master-012acc289869be3f8becc00e86aec428
2021-08-22T09:46:01.216451220Z I0822 09:46:01.215039    6405 daemon.go:1085] Validating against pending config rendered-master-754da41cc91abf2c7a0f19bc7e8745cf
2021-08-22T09:46:01.298839675Z I0822 09:46:01.298727    6405 daemon.go:1096] Validated on-disk state
2021-08-22T09:46:01.512403291Z I0822 09:46:01.511434    6405 daemon.go:1151] Completing pending config rendered-master-754da41cc91abf2c7a0f19bc7e8745cf
2021-08-22T09:46:01.621500050Z I0822 09:46:01.621402    6405 update.go:1943] completed update for config rendered-master-754da41cc91abf2c7a0f19bc7e8745cf
2021-08-22T09:46:01.644233131Z I0822 09:46:01.642890    6405 daemon.go:1167] In desired config rendered-master-754da41cc91abf2c7a0f19bc7e8745cf
2021-08-22T09:57:47.016779861Z W0822 09:57:47.016703    6405 reflector.go:436] k8s.io/client-go/informers/factory.go:134: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2021-08-22T09:57:47.016971691Z W0822 09:57:47.016720    6405 reflector.go:436] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.MachineConfig ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding

If the MCD doesn't have a client connection to reach the node object, that would prevent the SSHAccessed annotation from being set on it. Looking at the MCD pods on the other nodes, they appear to be reporting connectivity errors there too.

In the host service logs, I see some OVS activity. Stop times are:

Aug 22 09:42:00.108073 master-0.ocp4-bare.andytest.lab systemd[1]: Stopped Open vSwitch Forwarding Unit.  (this is before the error)
Aug 22 09:50:53.934653 master-1.ocp4-bare.andytest.lab systemd[1]: Stopped Open vSwitch Forwarding Unit.
Aug 22 09:57:14.952266 master-2.ocp4-bare.andytest.lab systemd[1]: Stopped Open vSwitch Forwarding Unit.

I also see:

Aug 22 09:44:33.312315 master-0.ocp4-bare.andytest.lab systemd[1]: ovs-configuration.service: Succeeded.
Aug 22 09:44:33.313290 master-0.ocp4-bare.andytest.lab systemd[1]: Started Configures OVS with proper host networking configuration.
Aug 22 09:44:33.313877 master-0.ocp4-bare.andytest.lab systemd[1]: ovs-configuration.service: Consumed 262ms CPU time

Was this just a clean cluster build, or were other things done to it? Were you by chance testing or doing anything that would have affected connectivity before this occurred?
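The correlation above (an OVS stop shortly before the daemon's i/o timeout) is easier to see by filtering the two error signatures side by side. A rough sketch with grep; the inline excerpt is a fabricated stand-in for the merged must-gather logs, and `combined.log` is just an illustrative filename:

```shell
#!/bin/sh
# Fabricated excerpt standing in for the merged host-service and MCD logs
cat > combined.log <<'EOF'
Aug 22 09:42:00 master-0 systemd[1]: Stopped Open vSwitch Forwarding Unit.
Aug 22 09:44:33 master-0 systemd[1]: Started Configures OVS with proper host networking configuration.
Aug 22 09:45:58 master-0 machine-config-daemon: dial tcp 172.30.0.1:443: i/o timeout
Aug 22 09:57:47 master-0 machine-config-daemon: http2: client connection lost
EOF

# Keep only the OVS lifecycle events and the daemon's connection errors,
# so the ordering (OVS stop precedes the API timeouts) is easy to eyeball
grep -E 'Open vSwitch|OVS|i/o timeout|connection lost' combined.log
```

Against a real must-gather, the same filter would run over the node journal and the machine-config-daemon pod logs.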
*** Bug 1842603 has been marked as a duplicate of this bug. ***
Starting deprecation notice: https://github.com/openshift/openshift-docs/pull/54465
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-8958