Bug 2072058 - Worker file integrity remains in initializing state.
Summary: Worker file integrity remains in initializing state.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: File Integrity Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Matt Rogers
QA Contact: xiyuan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-05 13:57 UTC by German Parente
Modified: 2022-07-18 01:35 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-23 09:57:12 UTC
Target Upstream Version:
Embargoed:
xiyuan: needinfo-




Links:
Github openshift/file-integrity-operator pull 224 (open): Bug 2072058: Use correct data keys for script configMaps, last updated 2022-04-07 21:22:15 UTC
Red Hat Knowledge Base (Solution) 6898781, last updated 2022-04-07 21:28:38 UTC
Red Hat Product Errata RHBA-2022:1331, last updated 2022-05-23 09:57:15 UTC

Description German Parente 2022-04-05 13:57:55 UTC
Description of problem:

The worker FileIntegrity is working and showing results. However, its status still reports:

 oc get fileintegrity worker-fileintegrity -o=jsonpath='{.status}'
{"phase":"Initializing"}

aide worker pods are running:

aide-worker-fileintegrity-l4nx6                                  1/1     Running   2          25h
aide-worker-fileintegrity-nmc6d                                  1/1     Running   2          25h
aide-worker-fileintegrity-pjpbg                                  1/1     Running   2          25h
aide-worker-fileintegrity-twt25                                  1/1     Running   2          25h
aide-worker-fileintegrity-zq86j                                  1/1     Running   2          25h
file-integrity-operator-6d877d8c59-vmvt6                         1/1     Running   0          22h

More details to follow internally.
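
A minimal triage sketch for this symptom (assuming the operator runs in the default openshift-file-integrity namespace; adjust -n if it is installed elsewhere):

$ oc -n openshift-file-integrity logs deployment/file-integrity-operator
$ oc -n openshift-file-integrity get fileintegritynodestatuses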

Comment 23 xiyuan 2022-05-11 08:21:24 UTC
Hi Matt,

The bug was reproduced with a v0.1.21 > v0.1.22 FIO upgrade, so I verified the fix with a v0.1.21 > v0.1.24 FIO upgrade.
Generally it is fine: the cm aide-reinit was updated after the FIO upgrade, and the db reinit succeeded when the user triggered a manual reinit after the FIO upgrade completed.
The only problem is that /hostroot/run/aide.reinit is missing on the node. Is that expected? Thanks.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.39    True        False         72m     Cluster version is 4.8.39

1. install FIO v0.1.21, create FileIntegrity, and trigger failure:
$ oc get ip
NAME            CSV                               APPROVAL    APPROVED
install-pzl7d   file-integrity-operator.v0.1.21   Automatic   true
$ oc get csv -w
NAME                               DISPLAY                            VERSION     REPLACES   PHASE
elasticsearch-operator.5.2.10-20   OpenShift Elasticsearch Operator   5.2.10-20              
file-integrity-operator.v0.1.21    File Integrity Operator            0.1.21                 Installing
file-integrity-operator.v0.1.21    File Integrity Operator            0.1.21                 Succeeded
^C$ oc get pod
NAME                                       READY   STATUS    RESTARTS   AGE
file-integrity-operator-748cf55bbd-s4v59   1/1     Running   0          30s
$ oc create -f - << EOF
> apiVersion: fileintegrity.openshift.io/v1alpha1
> kind: FileIntegrity
> metadata:
>   name: example-fileintegrity
>   namespace: openshift-file-integrity
> spec:
>   # Change to debug: true to enable more verbose logging from the logcollector
>   # container in the aide pods
>   debug: false
>   config: 
>     gracePeriod: 90
> EOF

fileintegrity.fileintegrity.openshift.io/example-fileintegrity created
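
The applied spec can be read back to confirm the grace period took effect (a quick check, assuming the default openshift-file-integrity namespace context):

$ oc get fileintegrity example-fileintegrity -o jsonpath='{.spec.config.gracePeriod}'   # expected output: 90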

$ oc extract cm/aide-reinit --confirm 
aide.sh
$ cat aide.sh 
#!/bin/sh
    touch /hostroot/etc/kubernetes/aide.reinit
$ oc extract cm/aide-pause --confirm
pause.sh
$ cat pause.sh 
#!/bin/sh
	sleep infinity & PID=$!
	trap "kill $PID" INT TERM
	wait $PID || true
$ oc get fileintegritynodestatuses
NAME                                                     NODE                               STATUS
example-fileintegrity-xiyuan11-48-bpggz-master-0         xiyuan11-48-bpggz-master-0         Failed
example-fileintegrity-xiyuan11-48-bpggz-master-1         xiyuan11-48-bpggz-master-1         Succeeded
example-fileintegrity-xiyuan11-48-bpggz-master-2         xiyuan11-48-bpggz-master-2         Succeeded
example-fileintegrity-xiyuan11-48-bpggz-worker-0-6rc8j   xiyuan11-48-bpggz-worker-0-6rc8j   Failed
example-fileintegrity-xiyuan11-48-bpggz-worker-0-mnh67   xiyuan11-48-bpggz-worker-0-mnh67   Succeeded
example-fileintegrity-xiyuan11-48-bpggz-worker-0-t4spn   xiyuan11-48-bpggz-worker-0-t4spn   Succeeded
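
The transcript does not show how the failures above were induced; one common way (a sketch, with a hypothetical file name) is to create a file under a path AIDE monitors, e.g. on one of the failed nodes:

$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host touch /etc/fio-test-file   # hypothetical file under the monitored /etc tree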

2. upgrade to v0.1.24:

$ oc get ip
NAME            CSV                               APPROVAL    APPROVED
install-7rvcj   file-integrity-operator.v0.1.24   Automatic   true
install-pzl7d   file-integrity-operator.v0.1.21   Automatic   true
[xiyuan@MiWiFi-RA69-srv func]$ oc get csv
NAME                               DISPLAY                            VERSION     REPLACES   PHASE
elasticsearch-operator.5.2.10-20   OpenShift Elasticsearch Operator   5.2.10-20              Succeeded
file-integrity-operator.v0.1.21    File Integrity Operator            0.1.21                 Succeeded
$ oc get csv
NAME                               DISPLAY                            VERSION     REPLACES                          PHASE
elasticsearch-operator.5.2.10-20   OpenShift Elasticsearch Operator   5.2.10-20                                     Succeeded
file-integrity-operator.v0.1.24    File Integrity Operator            0.1.24      file-integrity-operator.v0.1.21   Succeeded


$ oc extract cm/aide-reinit --confirm
aide.sh
$ cat aide.sh 
#!/bin/sh
    touch /hostroot/run/aide.reinit
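
Note the post-upgrade script touches /hostroot/run/aide.reinit where the v0.1.21 script touched /hostroot/etc/kubernetes/aide.reinit. The same content can be checked without extracting, assuming the script is stored under the aide.sh data key as the extract suggests:

$ oc get cm aide-reinit -o jsonpath='{.data.aide\.sh}'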

3. trigger reinit manually:
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /etc/kubernetes
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
total 3860
-rw-r--r--.  1 root root    9179 May 11 06:16 kubeconfig
drwxr-xr-x.  3 root root      19 May 11 06:17 cni
drwxr-xr-x.  3 root root      20 May 11 06:17 kubelet-plugins
drwxr-xr-x. 19 root root    4096 May 11 06:44 static-pod-resources
-rw-r--r--.  1 root root     101 May 11 06:50 apiserver-url.env
drwxr-xr-x.  2 root root     192 May 11 06:50 manifests
-rw-r--r--.  1 root root    5875 May 11 06:50 kubelet-ca.crt
-rw-r--r--.  1 root root    1123 May 11 06:50 ca.crt
-rw-r--r--.  1 root root      94 May 11 06:50 cloud.conf
-rw-r--r--.  1 root root    1076 May 11 06:50 kubelet.conf
-rw-------.  1 root root      67 May 11 07:23 aide.log.backup-20220511T07_23_30
-rw-------.  1 root root 1946990 May 11 07:24 aide.db.gz.new
-rw-------.  1 root root 1946990 May 11 07:24 aide.db.gz
-rw-------.  1 root root     877 May 11 07:45 aide.log.new
-rw-------.  1 root root     877 May 11 07:45 aide.log

Removing debug pod ...
$ oc annotate fileintegrities/example-fileintegrity  file-integrity.openshift.io/re-init=
fileintegrity.fileintegrity.openshift.io/example-fileintegrity annotated

$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
{"phase":"Initializing"}[xiyuan@MiWiFi-RA69-srv func]$ 
$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
{"phase":"Active"}

$ oc debug node/xiyuan11-48-bpggz-master-0 -- ls -ltr /hostroot/run/aide.reinit
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/hostroot/run/aide.reinit': No such file or directory

Removing debug pod ...
error: non-zero exit code from debug container


$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /etc/kubernetes
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
total 5764
-rw-r--r--.  1 root root    9179 May 11 06:16 kubeconfig
drwxr-xr-x.  3 root root      19 May 11 06:17 cni
drwxr-xr-x.  3 root root      20 May 11 06:17 kubelet-plugins
drwxr-xr-x. 19 root root    4096 May 11 06:44 static-pod-resources
-rw-r--r--.  1 root root     101 May 11 06:50 apiserver-url.env
drwxr-xr-x.  2 root root     192 May 11 06:50 manifests
-rw-r--r--.  1 root root    5875 May 11 06:50 kubelet-ca.crt
-rw-r--r--.  1 root root    1123 May 11 06:50 ca.crt
-rw-r--r--.  1 root root      94 May 11 06:50 cloud.conf
-rw-r--r--.  1 root root    1076 May 11 06:50 kubelet.conf
-rw-------.  1 root root      67 May 11 07:23 aide.log.backup-20220511T07_23_30
-rw-------.  1 root root 1946990 May 11 07:47 aide.db.gz.backup-20220511T07_47_50
-rw-------.  1 root root     877 May 11 07:47 aide.log.backup-20220511T07_47_50
-rw-------.  1 root root 1947002 May 11 07:48 aide.db.gz.new
-rw-------.  1 root root 1947002 May 11 07:48 aide.db.gz
-rw-------.  1 root root     651 May 11 07:52 aide.log
-rw-------.  1 root root       0 May 11 07:53 aide.log.new

Removing debug pod ...

Comment 24 xiyuan 2022-05-11 13:11:33 UTC
Correcting the command used to check /hostroot/run/aide.reinit in https://bugzilla.redhat.com/show_bug.cgi?id=2072058#c23; the result is the same.
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host  ls -ltr /run/aide.reinit
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/run/aide.reinit': No such file or directory

Removing debug pod ...
error: non-zero exit code from debug container
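
Whether the daemon consumed and then removed the reinit trigger should show up in the AIDE daemon logs; a sketch, assuming the daemonset follows the aide-<FileIntegrity name> pattern visible in the pod names earlier in this report:

$ oc -n openshift-file-integrity logs ds/aide-example-fileintegrity | grep -i reinit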

Comment 27 errata-xmlrpc 2022-05-23 09:57:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift File Integrity Operator bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1331

