Bug 2072058
| Summary: | Worker file integrity remains in initializing state. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | German Parente <gparente> |
| Component: | File Integrity Operator | Assignee: | Matt Rogers <mrogers> |
| Status: | CLOSED ERRATA | QA Contact: | xiyuan |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.8 | CC: | achernet, alosingh, antaylor, ddelcian, dseals, eglottma, jhrozek, jmittapa, lbragsta, mrogers, suprs, wenshen |
| Target Milestone: | --- | Flags: | xiyuan: needinfo-, xiyuan: needinfo- |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-23 09:57:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (German Parente, 2022-04-05 13:57:55 UTC)
Hi Matt,
The bug was originally reproduced with a v0.1.21 -> v0.1.22 FIO upgrade, so it was verified here with a v0.1.21 -> v0.1.24 FIO upgrade.
Generally it is fine: the cm aide-reinit was updated after the FIO upgrade, and the db reinit succeeded when a manual reinit was triggered after the FIO upgrade completed.
The only problem is that /hostroot/run/aide.reinit is missing on the node. Is that expected? Thanks.
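(For reference, the flag path the current cm ships can be checked without extracting files locally; oc extract's --to=- prints the configmap keys to stdout. This is a convenience sketch, not part of the original verification:)
$ oc extract cm/aide-reinit --to=- | grep aide.reinit
# before the upgrade this should print the /hostroot/etc/kubernetes path,
# after it the /hostroot/run path shown later in this comment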
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.39 True False 72m Cluster version is 4.8.39
1. Install FIO v0.1.21, create a FileIntegrity, and trigger a failure:
$ oc get ip
NAME CSV APPROVAL APPROVED
install-pzl7d file-integrity-operator.v0.1.21 Automatic true
$ oc get csv -w
NAME DISPLAY VERSION REPLACES PHASE
elasticsearch-operator.5.2.10-20 OpenShift Elasticsearch Operator 5.2.10-20
file-integrity-operator.v0.1.21 File Integrity Operator 0.1.21 Installing
file-integrity-operator.v0.1.21 File Integrity Operator 0.1.21 Succeeded
^C
$ oc get pod
NAME READY STATUS RESTARTS AGE
file-integrity-operator-748cf55bbd-s4v59 1/1 Running 0 30s
$ oc create -f - << EOF
> apiVersion: fileintegrity.openshift.io/v1alpha1
> kind: FileIntegrity
> metadata:
> name: example-fileintegrity
> namespace: openshift-file-integrity
> spec:
> # Change to debug: true to enable more verbose logging from the logcollector
> # container in the aide pods
> debug: false
> config:
> gracePeriod: 90
> EOF
fileintegrity.fileintegrity.openshift.io/example-fileintegrity created
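(Not shown in the transcript: once the CR is created, the operator deploys an AIDE daemon pod per node. A quick sanity check, assuming the operator's usual behavior of creating a daemonset in the same namespace, typically named aide-<FileIntegrity name>:)
$ oc get ds,pods -n openshift-file-integrity
# expect one aide pod per node, all Running, before the statuses below mean much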
$ oc extract cm/aide-reinit --confirm
aide.sh
$ cat aide.sh
#!/bin/sh
touch /hostroot/etc/kubernetes/aide.reinit
$ oc extract cm/aide-pause --confirm
pause.sh
$ cat pause.sh
#!/bin/sh
sleep infinity & PID=$!
trap "kill $PID" INT TERM
wait $PID || true
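(The transcript omits the step that produces the Failed statuses below. A typical way to trigger one, sketched here with a made-up file name, is to change something under a path AIDE monitors, e.g. /etc; with gracePeriod: 90 the next scan should pick it up within roughly 90 seconds:)
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host touch /etc/fio-test-file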
$ oc get fileintegritynodestatuses
NAME NODE STATUS
example-fileintegrity-xiyuan11-48-bpggz-master-0 xiyuan11-48-bpggz-master-0 Failed
example-fileintegrity-xiyuan11-48-bpggz-master-1 xiyuan11-48-bpggz-master-1 Succeeded
example-fileintegrity-xiyuan11-48-bpggz-master-2 xiyuan11-48-bpggz-master-2 Succeeded
example-fileintegrity-xiyuan11-48-bpggz-worker-0-6rc8j xiyuan11-48-bpggz-worker-0-6rc8j Failed
example-fileintegrity-xiyuan11-48-bpggz-worker-0-mnh67 xiyuan11-48-bpggz-worker-0-mnh67 Succeeded
example-fileintegrity-xiyuan11-48-bpggz-worker-0-t4spn xiyuan11-48-bpggz-worker-0-t4spn Succeeded
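(When a node reports Failed, its nodestatus result should point at a configmap holding the AIDE log; the resultConfigMapName field is per the FileIntegrityNodeStatus results API, but treat the exact jsonpath as a sketch:)
$ oc get fileintegritynodestatuses example-fileintegrity-xiyuan11-48-bpggz-master-0 \
    -o jsonpath='{.results[*].resultConfigMapName}'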
2. Upgrade to v0.1.24:
$ oc get ip
NAME CSV APPROVAL APPROVED
install-7rvcj file-integrity-operator.v0.1.24 Automatic true
install-pzl7d file-integrity-operator.v0.1.21 Automatic true
[xiyuan@MiWiFi-RA69-srv func]$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
elasticsearch-operator.5.2.10-20 OpenShift Elasticsearch Operator 5.2.10-20 Succeeded
file-integrity-operator.v0.1.21 File Integrity Operator 0.1.21 Succeeded
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
elasticsearch-operator.5.2.10-20 OpenShift Elasticsearch Operator 5.2.10-20 Succeeded
file-integrity-operator.v0.1.24 File Integrity Operator 0.1.24 file-integrity-operator.v0.1.21 Succeeded
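(Both install plans are Automatic, so the upgrade happened on its own once v0.1.24 reached the catalog. One way to confirm which CSV the subscription settled on, installedCSV being a standard OLM Subscription status field:)
$ oc get sub -n openshift-file-integrity -o jsonpath='{.items[0].status.installedCSV}'
# expected: file-integrity-operator.v0.1.24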
$ oc extract cm/aide-reinit --confirm
aide.sh
$ cat aide.sh
#!/bin/sh
touch /hostroot/run/aide.reinit
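(The only change from the v0.1.21 script is the flag path: /hostroot/etc/kubernetes/aide.reinit became /hostroot/run/aide.reinit. On RHCOS /run is tmpfs, so a leftover reinit flag can no longer survive a node reboot, which appears to be the point of the move. A quick check, sketched against master-0:)
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host findmnt -n -o FSTYPE /run
# should print: tmpfs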
3. Trigger a reinit manually:
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /etc/kubernetes
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
total 3860
-rw-r--r--. 1 root root 9179 May 11 06:16 kubeconfig
drwxr-xr-x. 3 root root 19 May 11 06:17 cni
drwxr-xr-x. 3 root root 20 May 11 06:17 kubelet-plugins
drwxr-xr-x. 19 root root 4096 May 11 06:44 static-pod-resources
-rw-r--r--. 1 root root 101 May 11 06:50 apiserver-url.env
drwxr-xr-x. 2 root root 192 May 11 06:50 manifests
-rw-r--r--. 1 root root 5875 May 11 06:50 kubelet-ca.crt
-rw-r--r--. 1 root root 1123 May 11 06:50 ca.crt
-rw-r--r--. 1 root root 94 May 11 06:50 cloud.conf
-rw-r--r--. 1 root root 1076 May 11 06:50 kubelet.conf
-rw-------. 1 root root 67 May 11 07:23 aide.log.backup-20220511T07_23_30
-rw-------. 1 root root 1946990 May 11 07:24 aide.db.gz.new
-rw-------. 1 root root 1946990 May 11 07:24 aide.db.gz
-rw-------. 1 root root 877 May 11 07:45 aide.log.new
-rw-------. 1 root root 877 May 11 07:45 aide.log
Removing debug pod ...
$ oc annotate fileintegrities/example-fileintegrity file-integrity.openshift.io/re-init=
fileintegrity.fileintegrity.openshift.io/example-fileintegrity annotated
$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
{"phase":"Initializing"}[xiyuan@MiWiFi-RA69-srv func]$
$ oc get fileintegrity example-fileintegrity -o=jsonpath={.status}
{"phase":"Active"}
$ oc debug node/xiyuan11-48-bpggz-master-0 -- ls -ltr /hostroot/run/aide.reinit
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/hostroot/run/aide.reinit': No such file or directory
Removing debug pod ...
error: non-zero exit code from debug container
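(The missing file looks consistent with the flag being transient: the cm script only touches it, and the daemon side presumably removes it once the database has been re-initialized, so by the time the phase is back to Active there is nothing left to list. A minimal sketch of that consume-and-delete pattern, illustrative only and not the operator's actual code, with /tmp/aide.conf as a stand-in config path:)
#!/bin/sh
if [ -f /hostroot/run/aide.reinit ]; then
    aide -c /tmp/aide.conf --init    # rebuild the AIDE database
    rm -f /hostroot/run/aide.reinit  # consume the flag so it cannot re-trigger
fi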
$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /etc/kubernetes
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
total 5764
-rw-r--r--. 1 root root 9179 May 11 06:16 kubeconfig
drwxr-xr-x. 3 root root 19 May 11 06:17 cni
drwxr-xr-x. 3 root root 20 May 11 06:17 kubelet-plugins
drwxr-xr-x. 19 root root 4096 May 11 06:44 static-pod-resources
-rw-r--r--. 1 root root 101 May 11 06:50 apiserver-url.env
drwxr-xr-x. 2 root root 192 May 11 06:50 manifests
-rw-r--r--. 1 root root 5875 May 11 06:50 kubelet-ca.crt
-rw-r--r--. 1 root root 1123 May 11 06:50 ca.crt
-rw-r--r--. 1 root root 94 May 11 06:50 cloud.conf
-rw-r--r--. 1 root root 1076 May 11 06:50 kubelet.conf
-rw-------. 1 root root 67 May 11 07:23 aide.log.backup-20220511T07_23_30
-rw-------. 1 root root 1946990 May 11 07:47 aide.db.gz.backup-20220511T07_47_50
-rw-------. 1 root root 877 May 11 07:47 aide.log.backup-20220511T07_47_50
-rw-------. 1 root root 1947002 May 11 07:48 aide.db.gz.new
-rw-------. 1 root root 1947002 May 11 07:48 aide.db.gz
-rw-------. 1 root root 651 May 11 07:52 aide.log
-rw-------. 1 root root 0 May 11 07:53 aide.log.new
Removing debug pod ...
Corrected the command for checking /hostroot/run/aide.reinit from https://bugzilla.redhat.com/show_bug.cgi?id=2072058#c23; the result is the same:

$ oc debug node/xiyuan11-48-bpggz-master-0 -- chroot /host ls -ltr /run/aide.reinit
Starting pod/xiyuan11-48-bpggz-master-0-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/run/aide.reinit': No such file or directory
Removing debug pod ...
error: non-zero exit code from debug container

Per https://bugzilla.redhat.com/show_bug.cgi?id=2072058#c23 and https://bugzilla.redhat.com/show_bug.cgi?id=2049206#c14, moving this to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift File Integrity Operator bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1331