Bug 2072058 - Worker file integrity remains in initializing state.
Summary: Worker file integrity remains in initializing state.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: File Integrity Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Matt Rogers
QA Contact: xiyuan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-05 13:57 UTC by German Parente
Modified: 2022-07-18 01:35 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-23 09:57:12 UTC
Target Upstream Version:
Embargoed:
xiyuan: needinfo-




Links:
Github openshift/file-integrity-operator pull 224 (open): Bug 2072058: Use correct data keys for script configMaps, last updated 2022-04-07 21:22:15 UTC
Red Hat Knowledge Base (Solution) 6898781, last updated 2022-04-07 21:28:38 UTC
Red Hat Product Errata RHBA-2022:1331, last updated 2022-05-23 09:57:15 UTC

Description German Parente 2022-04-05 13:57:55 UTC
Description of problem:

The worker FileIntegrity is working and showing results. However, its status still reports:

 oc get fileintegrity worker-fileintegrity -o=jsonpath='{.status}'
{"phase":"Initializing"}

aide worker pods are running:

aide-worker-fileintegrity-l4nx6                                  1/1     Running   2          25h
aide-worker-fileintegrity-nmc6d                                  1/1     Running   2          25h
aide-worker-fileintegrity-pjpbg                                  1/1     Running   2          25h
aide-worker-fileintegrity-twt25                                  1/1     Running   2          25h
aide-worker-fileintegrity-zq86j                                  1/1     Running   2          25h
file-integrity-operator-6d877d8c59-vmvt6                         1/1     Running   0          22h

More details to follow internally.
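
A minimal triage sketch for this symptom (assuming the operator runs in the default openshift-file-integrity namespace; adjust -n if it is installed elsewhere):

$ oc -n openshift-file-integrity logs deployment/file-integrity-operator
$ oc -n openshift-file-integrity get fileintegritynodestatuses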

Comment 23 xiyuan 2022-05-11 08:21:24 UTC
Hi Matt,

The bug was reproduced with a v0.1.21 > v0.1.22 FIO upgrade, so I verified the fix with a v0.1.21 > v0.1.24 FIO upgrade.
Generally it is fine: the cm aide-reinit was updated after the FIO upgrade, and the db reinit succeeded when the user triggered a manual reinit after the FIO upgrade completed.
The only problem is that /hostroot/run/aide.reinit is missing on the node. Is that expected? Thanks.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.39    True        False         72m     Cluster version is 4.8.39

1. install FIO v0.1.21, create FileIntegrity, and trigger failure:
$ oc get ip
NAME            CSV                               APPROVAL    APPROVED
install-pzl7d   file-integrity-operator.v0.1.21   Automatic   true
$ oc get csv -w
NAME                               DISPLAY                            VERSION     REPLACES   PHASE
elasticsearch-operator.5.2.10-20   OpenShift Elasticsearch Operator   5.2.10-20              
file-integrity-operator.v0.1.21    File Integrity Operator            0.1.21                 Installing
file-integrity-operator.v0.1.21    File Integrity Operator            0.1.21                 Succeeded
^C$ oc get pod
NAME                                       READY   STATUS    RESTARTS   AGE
file-integrity-operator-748cf55bbd-s4v59   1/1     Running   0          30s
$ oc create -f - << EOF
> apiVersion: fileintegrity.openshift.io/v1alpha1
> kind: FileIntegrity
> metadata:
>   name: example-fileintegrity
>   namespace: openshift-file-integrity
> spec:
>   # Change to debug: true to enable more verbose logging from the logcollector
>   # container in the aide pods
>   debug: false
>   config: 
>     gracePeriod: 90
> EOF

fileintegrity.fileintegrity.openshift.io/example-fileintegrity created
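
The applied spec can be read back to confirm the grace period took effect (a quick check, assuming the default openshift-file-integrity namespace context):

$ oc get fileintegrity example-fileintegrity -o jsonpath='{.spec.config.gracePeriod}'   # expected output: 90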

$ oc extract cm/aide-reinit --confirm 
aide.sh
$ cat aide.sh 
#!/bin/sh
    touch /hostroot/etc/kubernetes/aide.reinit
$ oc extract cm/aide-pause --confirm
pause.sh
$ cat pause.sh 
#!/bin/sh
	sleep infinity & PID=$!
	trap "kill $PID" INT TERM
	wait $PID || true
$ oc get fileintegritynodestatuses
NAME                                                     NODE                               STATUS
example-fileintegrity-xiyuan11-48-bpggz-master-0         xiyuan11-48-bpggz-master-0         Failed
example-fileintegrity-xiyuan11-48-bpggz-master-1         xiyuan11-48-bpggz-master-1         Succeeded
example-fileintegrity-xiyuan11-48-bpggz-master-2         xiyuan11-48-bpggz-master-2         Succeeded
example-fileintegrity-xiyuan11-48-bpggz-worker-0-6rc8j   xiyuan11-48-bpggz-worker-0-6rc8j   Failed
example-fileintegrity-xiyuan11-48-bpggz-worker-0-mnh67   xiyuan11-48-bpggz-worker-0-mnh67   Succeeded
example-fileintegrity-xiyuan11-48-bpggz-worker-0-t4spn   xiyuan11-48-bpggz-worker-0-t4spn   Succeeded
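
The transcript does not show how the failures above were induced; one common way (a sketch, with a hypothetical file name) is to create a file under a path AIDE monitors, e.g. on one of the failed nodes:

$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host touch /etc/fio-test-file   # hypothetical file under the monitored /etc tree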

2. upgrade to v0.1.24:

$ oc get ip
NAME            CSV                               APPROVAL    APPROVED
install-7rvcj   file-integrity-operator.v0.1.24   Automatic   true
install-pzl7d   file-integrity-operator.v0.1.21   Automatic   true
[xiyuan@MiWiFi-RA69-srv func]$ oc get csv
NAME                               DISPLAY                            VERSION     REPLACES   PHASE
elasticsearch-operator.5.2.10-20   OpenShift Elasticsearch Operator   5.2.10-20              Succeeded
file-integrity-operator.v0.1.21    File Integrity Operator            0.1.21                 Succeeded
$ oc get csv
NAME                               DISPLAY                            VERSION     REPLACES                          PHASE
elasticsearch-operator.5.2.10-20   OpenShift Elasticsearch Operator   5.2.10-20                                     Succeeded
file-integrity-operator.v0.1.24    File Integrity Operator            0.1.24      file-integrity-operator.v0.1.21   Succeeded


$ oc extract cm/aide-reinit --confirm
aide.sh
$ cat aide.sh 
#!/bin/sh
    touch /hostroot/run/aide.reinit
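
Note the post-upgrade script touches /hostroot/run/aide.reinit where the v0.1.21 script touched /hostroot/etc/kubernetes/aide.reinit. The same content can be checked without extracting, assuming the script is stored under the aide.sh data key as the extract suggests:

$ oc get cm aide-reinit -o jsonpath='{.data.aide\.sh}'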

3. trigger reinit manually:
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /etc/kubernetes
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
total 3860
-rw-r--r--.  1 root root    9179 May 11 06:16 kubeconfig
drwxr-xr-x.  3 root root      19 May 11 06:17 cni
drwxr-xr-x.  3 root root      20 May 11 06:17 kubelet-plugins
drwxr-xr-x. 19 root root    4096 May 11 06:44 static-pod-resources
-rw-r--r--.  1 root root     101 May 11 06:50 apiserver-url.env
drwxr-xr-x.  2 root root     192 May 11 06:50 manifests
-rw-r--r--.  1 root root    5875 May 11 06:50 kubelet-ca.crt
-rw-r--r--.  1 root root    1123 May 11 06:50 ca.crt
-rw-r--r--.  1 root root      94 May 11 06:50 cloud.conf
-rw-r--r--.  1 root root    1076 May 11 06:50 kubelet.conf
-rw-------.  1 root root      67 May 11 07:23 aide.log.backup-20220511T07_23_30
-rw-------.  1 root root 1946990 May 11 07:24 aide.db.gz.new
-rw-------.  1 root root 1946990 May 11 07:24 aide.db.gz
-rw-------.  1 root root     877 May 11 07:45 aide.log.new
-rw-------.  1 root root     877 May 11 07:45 aide.log

Removing debug pod ...
$ oc annotate fileintegrities/example-fileintegrity  file-integrity.openshift.io/re-init=
fileintegrity.fileintegrity.openshift.io/example-fileintegrity annotated

$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
{"phase":"Initializing"}[xiyuan@MiWiFi-RA69-srv func]$ 
$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
{"phase":"Active"}

$ oc debug node/xiyuan11-48-bpggz-master-0 -- ls -ltr /hostroot/run/aide.reinit
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/hostroot/run/aide.reinit': No such file or directory

Removing debug pod ...
error: non-zero exit code from debug container


$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /etc/kubernetes
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
total 5764
-rw-r--r--.  1 root root    9179 May 11 06:16 kubeconfig
drwxr-xr-x.  3 root root      19 May 11 06:17 cni
drwxr-xr-x.  3 root root      20 May 11 06:17 kubelet-plugins
drwxr-xr-x. 19 root root    4096 May 11 06:44 static-pod-resources
-rw-r--r--.  1 root root     101 May 11 06:50 apiserver-url.env
drwxr-xr-x.  2 root root     192 May 11 06:50 manifests
-rw-r--r--.  1 root root    5875 May 11 06:50 kubelet-ca.crt
-rw-r--r--.  1 root root    1123 May 11 06:50 ca.crt
-rw-r--r--.  1 root root      94 May 11 06:50 cloud.conf
-rw-r--r--.  1 root root    1076 May 11 06:50 kubelet.conf
-rw-------.  1 root root      67 May 11 07:23 aide.log.backup-20220511T07_23_30
-rw-------.  1 root root 1946990 May 11 07:47 aide.db.gz.backup-20220511T07_47_50
-rw-------.  1 root root     877 May 11 07:47 aide.log.backup-20220511T07_47_50
-rw-------.  1 root root 1947002 May 11 07:48 aide.db.gz.new
-rw-------.  1 root root 1947002 May 11 07:48 aide.db.gz
-rw-------.  1 root root     651 May 11 07:52 aide.log
-rw-------.  1 root root       0 May 11 07:53 aide.log.new

Removing debug pod ...

Comment 24 xiyuan 2022-05-11 13:11:33 UTC
Correcting the command used to check /hostroot/run/aide.reinit in https://bugzilla.redhat.com/show_bug.cgi?id=2072058#c23; the result is the same.
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host  ls -ltr /run/aide.reinit
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/run/aide.reinit': No such file or directory

Removing debug pod ...
error: non-zero exit code from debug container
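
Whether the daemon consumed and then removed the reinit trigger should show up in the AIDE daemon logs; a sketch, assuming the daemonset follows the aide-<FileIntegrity name> pattern visible in the pod names earlier in this report:

$ oc -n openshift-file-integrity logs ds/aide-example-fileintegrity | grep -i reinit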

Comment 27 errata-xmlrpc 2022-05-23 09:57:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift File Integrity Operator bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1331

