Bug 2011171
| Summary: | diskmaker-manager constantly redeployed by LSO when creating LV's | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | ryoji noma <rnoma> |
| Component: | Storage | Assignee: | Jonathan Dobson <jdobson> |
| Storage sub component: | Local Storage Operator | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aos-bugs, mori |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2016556 2016557 | Environment: | |
| Last Closed: | 2022-03-10 16:17:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2016556 | | |
Description
ryoji noma
2021-10-06 08:00:15 UTC
Thank you Manabu and Ryoji, this info was very helpful. I'm able to reproduce the issue; still working on the RCA.
0) Initial setup
Installed LSO and created a series of loopback devices for testing on each worker node as follows:
[jon@dagobah 2011171]$ oc debug node/ip-10-0-138-245.us-east-2.compute.internal
...
sh-4.4# for i in `seq 1 4`; do dd if=/dev/zero of=/tmp/loopback${i}.img bs=1 count=0 seek=1G; losetup -fP /tmp/loopback${i}.img; done
...
sh-4.4# losetup -a
/dev/loop1: [0042]:4466232 (/tmp/loopback2.img)
/dev/loop2: [0042]:4466840 (/tmp/loopback3.img)
/dev/loop0: [0042]:4466837 (/tmp/loopback1.img)
/dev/loop3: [0042]:4466842 (/tmp/loopback4.img)
1) Create single LV on single worker node
[jon@dagobah 2011171]$ cat lv-loop0.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "lv-loop0"
  namespace: "openshift-local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - ip-10-0-138-245
  storageClassDevices:
  - storageClassName: "lv-loop0"
    volumeMode: Filesystem
    fsType: xfs
    devicePaths:
    - /dev/loop0
[jon@dagobah 2011171]$ oc apply -f lv-loop0.yaml
localvolume.local.storage.openshift.io/lv-loop0 created
This triggered multiple updates to the daemonset (see "daemonset changed" entries):
[jon@dagobah 2011171]$ oc logs -n openshift-local-storage pod/local-storage-operator-77d6b45fc9-t8879
...
I1014 22:28:05.087390 1 reconcile.go:34] Reconciling LocalVolume
I1014 22:28:05.087454 1 api_updater.go:74] Updating localvolume openshift-local-storage/lv-loop0
I1014 22:28:05.097273 1 reconcile.go:34] Reconciling LocalVolume
{"level":"info","ts":1634250485.0975707,"logger":"localvolumesetdaemon-controller","msg":"provisioner configmap changed","Request.Namespace":"openshift-local-storage","Request.Name":""}
E1014 22:28:05.108356 1 reconcile.go:96] failed to fetch diskmaker daemonset DaemonSet.apps "diskmaker-manager" not found
E1014 22:28:05.114731 1 reconcile.go:134] error syncing condition: LocalVolume.local.storage.openshift.io "lv-loop0" is invalid: status.generations: Required value
{"level":"info","ts":1634250485.1165073,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"openshift-local-storage","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"created"}
I1014 22:28:05.117012 1 reconcile.go:34] Reconciling LocalVolume
{"level":"info","ts":1634250485.126506,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"openshift-local-storage","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"updated"}
{"level":"info","ts":1634250485.1347363,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"openshift-local-storage","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"updated"}
I1014 22:28:05.135488 1 reconcile.go:34] Reconciling LocalVolume
{"level":"info","ts":1634250485.1427386,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"openshift-local-storage","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"updated"}
I1014 22:28:05.165650 1 reconcile.go:34] Reconciling LocalVolume
I1014 22:28:05.176266 1 reconcile.go:34] Reconciling LocalVolume
{"level":"error","ts":1634250485.17673,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"localvolumesetdaemon-controller","request":"openshift-local-storage/","error":"Operation cannot be fulfilled on daemonsets.apps \"diskmaker-manager\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/local-storage-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/local-storage-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/local-storage-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1634250486.185989,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"openshift-local-storage","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"updated"}
I1014 22:28:16.900963 1 reconcile.go:34] Reconciling LocalVolume
{"level":"info","ts":1634250496.9091973,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"openshift-local-storage","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"updated"}
I1014 22:28:19.483278 1 reconcile.go:34] Reconciling LocalVolume
I1014 22:28:19.493026 1 reconcile.go:34] Reconciling LocalVolume
I1014 22:28:19.510662 1 reconcile.go:34] Reconciling LocalVolume
I1014 22:28:19.518589 1 reconcile.go:34] Reconciling LocalVolume
However, the generation number was still 1 after those updates:
[jon@dagobah 2011171]$ oc get daemonset.apps/diskmaker-manager -n openshift-local-storage -o yaml | head
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
    local.storage.openshift.io/configMapDataHash: 17032e345ceabe091d608b58ed093b2c5d1c2f0107dcfe34e7888d8ffca1c53c
  creationTimestamp: "2021-10-14T22:28:05Z"
  generation: 1
  labels:
    app: diskmaker-manager
This seems like a potential issue... if the generation number is not changing, then why isn't CreateOrUpdate() returning OperationResultNone?
https://github.com/openshift/local-storage-operator/blob/81d182a4cabe04d2dc7f42d7b6167bd0d86900e2/vendor/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil/controllerutil.go#L211-L224
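For reference, here is a minimal sketch of the decision CreateOrUpdate makes, modeled on the vendored controller-runtime code linked above (an assumption on my part: the create path and error handling are omitted, and a plain map stands in for the DaemonSet object). The point is that an "updated" result should only be possible when the mutate callback produces an object that differs from what is already stored:

```go
package main

import (
	"fmt"
	"reflect"
)

// operationResult mirrors the idea behind controllerutil.OperationResult.
type operationResult string

const (
	resultNone    operationResult = "unchanged"
	resultUpdated operationResult = "updated"
)

// createOrUpdate applies mutate to a copy of the existing object and only
// reports an update when the mutated copy differs from the original.
func createOrUpdate(existing map[string]string, mutate func(map[string]string)) operationResult {
	desired := make(map[string]string, len(existing))
	for k, v := range existing {
		desired[k] = v
	}
	mutate(desired)
	if reflect.DeepEqual(existing, desired) {
		return resultNone // no API call would be issued
	}
	return resultUpdated // an Update call would be issued here
}

func main() {
	spec := map[string]string{"image": "diskmaker:v4.8"}
	// Mutate writes the same value: no update should be reported.
	fmt.Println(createOrUpdate(spec, func(m map[string]string) { m["image"] = "diskmaker:v4.8" }))
	// Mutate changes the value: an update is reported.
	fmt.Println(createOrUpdate(spec, func(m map[string]string) { m["image"] = "diskmaker:v4.9" }))
}
```

So if CreateOrUpdate keeps reporting "updated" while the stored object looks unchanged, the mutate function itself must be producing something that differs from the stored object on every call.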
2) Create 4 LV's on each of 2 worker nodes (8 LV's total)
[jon@dagobah 2011171]$ cat reproduce-loopback.sh
#!/bin/bash
devprefix=/dev/loop
volmode=fs
declare -a workers=(
  "ip-10-0-138-245"
  "ip-10-0-163-198"
)
case ${volmode} in
  fs)
    volumeMode=Filesystem
    ;;
  block)
    volumeMode=Block
    ;;
esac
echo "* volmode: ${volmode}"
for (( w=0; w<${#workers[@]}; w++ )); do
  worker=${workers[$w]}
  echo "$w" "$worker"
  for (( d=0; d<4; d++ )); do
    devpath="${devprefix}$d"
    echo "$d" "${devpath}"
    cat << END | oc create -f -
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-${volmode}-w${w}-${d}"
  namespace: "openshift-local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - ${worker}
  storageClassDevices:
  - storageClassName: "local-${volmode}-w${w}-${d}"
    volumeMode: ${volumeMode}
    devicePaths:
    - ${devpath}
END
    sleep 1
  done
done
[jon@dagobah 2011171]$ ./reproduce-loopback.sh
* volmode: fs
0 ip-10-0-138-245
0 /dev/loop0
localvolume.local.storage.openshift.io/local-fs-w0-0 created
1 /dev/loop1
localvolume.local.storage.openshift.io/local-fs-w0-1 created
2 /dev/loop2
localvolume.local.storage.openshift.io/local-fs-w0-2 created
3 /dev/loop3
localvolume.local.storage.openshift.io/local-fs-w0-3 created
1 ip-10-0-163-198
0 /dev/loop0
localvolume.local.storage.openshift.io/local-fs-w1-0 created
1 /dev/loop1
localvolume.local.storage.openshift.io/local-fs-w1-1 created
2 /dev/loop2
localvolume.local.storage.openshift.io/local-fs-w1-2 created
3 /dev/loop3
localvolume.local.storage.openshift.io/local-fs-w1-3 created
Right away we can see the behavior described in the bug report.
[jon@dagobah 2011171]$ oc get pods -n openshift-local-storage
NAME READY STATUS RESTARTS AGE
diskmaker-manager-qqhmz 0/1 Terminating 0 10s
diskmaker-manager-xhs4h 0/1 Terminating 0 6s
local-storage-operator-77d6b45fc9-t8879 1/1 Running 0 87m
[jon@dagobah 2011171]$ oc get pods -n openshift-local-storage
NAME READY STATUS RESTARTS AGE
diskmaker-manager-qqhmz 0/1 Terminating 0 15s
diskmaker-manager-xhs4h 0/1 Terminating 0 11s
local-storage-operator-77d6b45fc9-t8879 1/1 Running 0 87m
[jon@dagobah 2011171]$ oc get pods -n openshift-local-storage
NAME READY STATUS RESTARTS AGE
diskmaker-manager-5l52x 0/1 ContainerCreating 0 1s
diskmaker-manager-plzvk 0/1 Pending 0 0s
local-storage-operator-77d6b45fc9-t8879 1/1 Running 0 88m
[jon@dagobah 2011171]$ oc get pods -n openshift-local-storage
NAME READY STATUS RESTARTS AGE
diskmaker-manager-5l52x 0/1 Terminating 0 3s
diskmaker-manager-xw8ql 0/1 ContainerCreating 0 0s
local-storage-operator-77d6b45fc9-t8879 1/1 Running 0 88m
While that was happening, I captured the yaml for diskmaker-manager and ran a diff.
[jon@dagobah 2011171]$ oc get daemonset.apps/diskmaker-manager -n openshift-local-storage -o yaml > diskmaker1.yaml
[jon@dagobah 2011171]$ oc get daemonset.apps/diskmaker-manager -n openshift-local-storage -o yaml > diskmaker2.yaml
[jon@dagobah 2011171]$ oc get daemonset.apps/diskmaker-manager -n openshift-local-storage -o yaml > diskmaker3.yaml
[jon@dagobah 2011171]$ oc get daemonset.apps/diskmaker-manager -n openshift-local-storage -o yaml > diskmaker4.yaml
[jon@dagobah 2011171]$ diff diskmaker1.yaml diskmaker4.yaml
5c5
< deprecated.daemonset.template.generation: "310"
---
> deprecated.daemonset.template.generation: "339"
8c8
< generation: 310
---
> generation: 339
62c62
< resourceVersion: "240948"
---
> resourceVersion: "241199"
198c198
< observedGeneration: 310
---
> observedGeneration: 339
At first glance it looks like only the generation number and resourceVersion are being updated, but multiple updates happened between those two captures.
What's really happening is that the ownerReferences and nodeAffinity lists are being reordered:
[jon@dagobah 2011171]$ diff diskmaker2.yaml diskmaker3.yaml
5c5
< deprecated.daemonset.template.generation: "324"
---
> deprecated.daemonset.template.generation: "329"
8c8
< generation: 324
---
> generation: 329
18,23d17
< name: local-fs-w0-3
< uid: 4178c3bf-33d9-4122-aac4-053be382362c
< - apiVersion: local.storage.openshift.io/v1
< blockOwnerDeletion: false
< controller: false
< kind: LocalVolume
62c56,62
< resourceVersion: "241069"
---
> - apiVersion: local.storage.openshift.io/v1
> blockOwnerDeletion: false
> controller: false
> kind: LocalVolume
> name: local-fs-w0-3
> uid: 4178c3bf-33d9-4122-aac4-053be382362c
> resourceVersion: "241122"
85c85
< - ip-10-0-138-245
---
> - ip-10-0-163-198
105c105
< - ip-10-0-163-198
---
> - ip-10-0-138-245
198,199c198
< observedGeneration: 321
< updatedNumberScheduled: 2
---
> observedGeneration: 329
[jon@dagobah 2011171]$ diff diskmaker3.yaml diskmaker4.yaml
5c5
< deprecated.daemonset.template.generation: "329"
---
> deprecated.daemonset.template.generation: "339"
8c8
< generation: 329
---
> generation: 339
17a18,23
> name: local-fs-w0-3
> uid: 4178c3bf-33d9-4122-aac4-053be382362c
> - apiVersion: local.storage.openshift.io/v1
> blockOwnerDeletion: false
> controller: false
> kind: LocalVolume
56,62c62
< - apiVersion: local.storage.openshift.io/v1
< blockOwnerDeletion: false
< controller: false
< kind: LocalVolume
< name: local-fs-w0-3
< uid: 4178c3bf-33d9-4122-aac4-053be382362c
< resourceVersion: "241122"
---
> resourceVersion: "241199"
85c85
< - ip-10-0-163-198
---
> - ip-10-0-138-245
105c105
< - ip-10-0-138-245
---
> - ip-10-0-163-198
198c198
< observedGeneration: 329
---
> observedGeneration: 339
configMapDataHash is not getting updated like I had theorized earlier, but I captured the configmap during this experiment too:
[jon@dagobah 2011171]$ oc get configmap -n openshift-local-storage local-provisioner -o yaml > configmap1.yaml
[jon@dagobah 2011171]$ oc get configmap -n openshift-local-storage local-provisioner -o yaml > configmap2.yaml
[jon@dagobah 2011171]$ oc get configmap -n openshift-local-storage local-provisioner -o yaml > configmap3.yaml
[jon@dagobah 2011171]$ oc get configmap -n openshift-local-storage local-provisioner -o yaml > configmap4.yaml
And here, again, it's the ownerReferences being reordered and updated.
[jon@dagobah 2011171]$ diff configmap2.yaml configmap3.yaml
68,69c68,69
< name: local-fs-w0-2
< uid: 7e645f08-b7c1-4e1c-a3ce-8d9e089dbdc2
---
> name: local-fs-w1-1
> uid: 5dce825e-3eeb-4402-835a-03e37a01ecba
74,75c74,75
< name: local-fs-w0-3
< uid: 4178c3bf-33d9-4122-aac4-053be382362c
---
> name: local-fs-w1-2
> uid: 9bb2cfae-3f78-46ee-bfc3-87c6a01d3173
80,81c80,81
< name: local-fs-w1-0
< uid: 2c2ee787-1938-43b2-8845-a02244819e4e
---
> name: local-fs-w1-3
> uid: 03ab4601-dda4-4b32-ac81-fdc79dc8885b
86,87c86,87
< name: local-fs-w1-1
< uid: 5dce825e-3eeb-4402-835a-03e37a01ecba
---
> name: local-fs-w0-0
> uid: d22d47e8-c03a-4b0e-8790-decbf979966b
92,93c92,93
< name: local-fs-w1-2
< uid: 9bb2cfae-3f78-46ee-bfc3-87c6a01d3173
---
> name: local-fs-w0-1
> uid: 8dd04e56-caa4-4374-9101-b7b890b368a8
98,99c98,99
< name: local-fs-w1-3
< uid: 03ab4601-dda4-4b32-ac81-fdc79dc8885b
---
> name: local-fs-w0-2
> uid: 7e645f08-b7c1-4e1c-a3ce-8d9e089dbdc2
104,105c104,105
< name: local-fs-w0-0
< uid: d22d47e8-c03a-4b0e-8790-decbf979966b
---
> name: local-fs-w0-3
> uid: 4178c3bf-33d9-4122-aac4-053be382362c
110,112c110,112
< name: local-fs-w0-1
< uid: 8dd04e56-caa4-4374-9101-b7b890b368a8
< resourceVersion: "249667"
---
> name: local-fs-w1-0
> uid: 2c2ee787-1938-43b2-8845-a02244819e4e
> resourceVersion: "249740"
[jon@dagobah 2011171]$ diff configmap3.yaml configmap4.yaml
68,69c68,69
< name: local-fs-w1-1
< uid: 5dce825e-3eeb-4402-835a-03e37a01ecba
---
> name: local-fs-w0-2
> uid: 7e645f08-b7c1-4e1c-a3ce-8d9e089dbdc2
74,75c74,75
< name: local-fs-w1-2
< uid: 9bb2cfae-3f78-46ee-bfc3-87c6a01d3173
---
> name: local-fs-w0-3
> uid: 4178c3bf-33d9-4122-aac4-053be382362c
80,81c80,81
< name: local-fs-w1-3
< uid: 03ab4601-dda4-4b32-ac81-fdc79dc8885b
---
> name: local-fs-w1-0
> uid: 2c2ee787-1938-43b2-8845-a02244819e4e
86,87c86,87
< name: local-fs-w0-0
< uid: d22d47e8-c03a-4b0e-8790-decbf979966b
---
> name: local-fs-w1-1
> uid: 5dce825e-3eeb-4402-835a-03e37a01ecba
92,93c92,93
< name: local-fs-w0-1
< uid: 8dd04e56-caa4-4374-9101-b7b890b368a8
---
> name: local-fs-w1-2
> uid: 9bb2cfae-3f78-46ee-bfc3-87c6a01d3173
98,99c98,99
< name: local-fs-w0-2
< uid: 7e645f08-b7c1-4e1c-a3ce-8d9e089dbdc2
---
> name: local-fs-w1-3
> uid: 03ab4601-dda4-4b32-ac81-fdc79dc8885b
104,105c104,105
< name: local-fs-w0-3
< uid: 4178c3bf-33d9-4122-aac4-053be382362c
---
> name: local-fs-w0-0
> uid: d22d47e8-c03a-4b0e-8790-decbf979966b
110,112c110,112
< name: local-fs-w1-0
< uid: 2c2ee787-1938-43b2-8845-a02244819e4e
< resourceVersion: "249740"
---
> name: local-fs-w0-1
> uid: 8dd04e56-caa4-4374-9101-b7b890b368a8
> resourceVersion: "249836"
So right now the things I want to understand are:
A) When creating a single LV, why do we get so many messages about the daemonset being updated, even though the generation number is only 1? I would expect CreateOrUpdate() to return OperationResultNone in that case and not print a log message.
B) Why is the list ordering for ownerReferences and nodeAffinity constantly changing, causing updates to both the daemonset and the configmap? This seems to happen only with 2+ worker nodes, and it is what's causing the diskmaker-manager pods to be recreated over and over.
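For context on question B: before deciding to update, the vendored code linked earlier compares the old and mutated objects with a deep-equality check, and slices compare element by element, so the same ownerReferences in a different order count as a change. An illustrative snippet (stand-in types, not the operator's code):

```go
package main

import (
	"fmt"
	"reflect"
)

// ownerRef is a stand-in for metav1.OwnerReference, enough to show the point.
type ownerRef struct {
	Kind string
	Name string
}

func main() {
	a := []ownerRef{{"LocalVolume", "local-fs-w0-0"}, {"LocalVolume", "local-fs-w1-0"}}
	b := []ownerRef{{"LocalVolume", "local-fs-w1-0"}, {"LocalVolume", "local-fs-w0-0"}}
	// Same entries, different order: a deep-equality check treats the slices as
	// different, so an update would be issued even though nothing meaningful changed.
	fmt.Println(reflect.DeepEqual(a, b)) // false
}
```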
Ok, RCA is understood now. I spent quite a bit of time tracing this with delve, and there were a few issues to fix.

1) We create the list of ownerRefs and node selector terms here in extractLVInfo:
https://github.com/openshift/local-storage-operator/blob/81d182a4cabe04d2dc7f42d7b6167bd0d86900e2/pkg/controller/nodedaemon/aggregate.go#L88
But the order of the lists generated by that function is different every time. This explains my previous observations where the list ordering for the owner refs and node selectors was constantly changing. The fix for this is to sort the slice of lvs before looping through them, similar to what we already do in extractLVSetInfo, so that the lists will be created in a predictable order (a minimal sketch of this sorting approach follows after item 4 below):
https://github.com/openshift/local-storage-operator/blob/81d182a4cabe04d2dc7f42d7b6167bd0d86900e2/pkg/controller/nodedaemon/aggregate.go#L56-L61

2) Every time LSO updated the daemonset, it was attempting to clear TerminationMessagePath and TerminationMessagePolicy:
(dlv) print existing.Spec.Template.Spec.Containers[0]
k8s.io/api/core/v1.Container {
...
TerminationMessagePath: "/dev/termination-log",
TerminationMessagePolicy: "File",
...
(dlv) print obj.Spec.Template.Spec.Containers[0]
k8s.io/api/core/v1.Container {
...
TerminationMessagePath: "",
TerminationMessagePolicy: "",
...
Looks like those fields were set by prometheus-operator. Maybe LSO is even "fighting" prometheus-operator to set those fields? Every time LSO updates the daemonset, I see the same thing. Those 2 fields are set on the existing object, but since they are not in MutateAggregatedSpec(), LSO tries to clear them on every update. Adding them to the definition in MutateAggregatedSpec() solves this particular issue.
https://github.com/openshift/local-storage-operator/blob/81d182a4cabe04d2dc7f42d7b6167bd0d86900e2/pkg/controller/nodedaemon/daemonsets.go#L159

3) Every time LSO updated the daemonset, it was also attempting to clear the APIVersion field from the field selector for each environment variable, i.e.:
Name: "MY_NODE_NAME",
Value: "",
ValueFrom: &v1.EnvVarSource{
    FieldRef: &v1.ObjectFieldSelector{
-       APIVersion: "v1",
+       APIVersion: "",
        FieldPath: "spec.nodeName",
    },
This again is because that field is not defined in MutateAggregatedSpec(), and it can be fixed by setting it to v1 for each entry here:
https://github.com/openshift/local-storage-operator/blob/81d182a4cabe04d2dc7f42d7b6167bd0d86900e2/pkg/controller/nodedaemon/daemonsets.go#L233-L256

4) There is a Watch on DaemonSets that I don't think is necessary:
https://github.com/openshift/local-storage-operator/blob/81d182a4cabe04d2dc7f42d7b6167bd0d86900e2/pkg/controller/nodedaemon/add.go#L70-L73
We already watch for changes on the daemonset at line 54, and LV's on line 65. I think this one can be removed; it's triggering an extra reconcile when one should not be needed. It was already removed from the master branch at some point:
https://github.com/openshift/local-storage-operator/blob/a38697051f4567c7f630c897d11b202206113832/controllers/nodedaemon/reconcile.go#L207-L212
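As promised above, a minimal sketch of the sorting idea from issue 1. The types and helper names here are made up for illustration and this is not the actual extractLVInfo code; the point is simply that sorting the LocalVolume slice by name before building the derived lists makes repeated reconciles produce identical output:

```go
package main

import (
	"fmt"
	"sort"
)

// localVolume is a stand-in for the LocalVolume CR; only the fields needed
// for this sketch are included.
type localVolume struct {
	Name     string
	Hostname string
}

// extractInfo builds owner-reference names and node hostnames in a
// deterministic order by sorting the input slice by name first.
func extractInfo(lvs []localVolume) (ownerNames, hostnames []string) {
	// Sorting up front is the fix: without it, the generated lists follow
	// whatever order the list call happened to return, which can change
	// from one reconcile to the next and makes the object look "changed".
	sort.Slice(lvs, func(i, j int) bool { return lvs[i].Name < lvs[j].Name })
	seen := map[string]bool{}
	for _, lv := range lvs {
		ownerNames = append(ownerNames, lv.Name)
		if !seen[lv.Hostname] {
			seen[lv.Hostname] = true
			hostnames = append(hostnames, lv.Hostname)
		}
	}
	return ownerNames, hostnames
}

func main() {
	lvs := []localVolume{
		{Name: "local-fs-w1-0", Hostname: "ip-10-0-163-198"},
		{Name: "local-fs-w0-0", Hostname: "ip-10-0-138-245"},
	}
	owners, hosts := extractInfo(lvs)
	fmt.Println(owners) // [local-fs-w0-0 local-fs-w1-0] on every run
	fmt.Println(hosts)  // [ip-10-0-138-245 ip-10-0-163-198]
}
```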
After fixing those 4 issues in my local test VM, I see the configmap and daemonset only getting updated once when creating a new LV:
[vagrant@fedora33 deploy]$ k get localvolumes
NAME       AGE
lv-loop0   19h
lv-loop1   19h
lv-loop2   19h
[vagrant@fedora33 deploy]$ k apply -f lv-loop3.yaml
localvolume.local.storage.openshift.io/lv-loop3 created
[vagrant@fedora33 deploy]$ k logs local-storage-operator-5d6fc74cdd-5xml8 -f
...
I1021 19:16:59.423579 1 reconcile.go:34] Reconciling LocalVolume
I1021 19:16:59.423628 1 api_updater.go:74] Updating localvolume default/lv-loop3
I1021 19:16:59.450197 1 reconcile.go:34] Reconciling LocalVolume
{"level":"info","ts":1634843819.4503026,"logger":"localvolumesetdaemon-controller","msg":"provisioner configmap changed","Request.Namespace":"default","Request.Name":""}
{"level":"info","ts":1634843819.4561226,"logger":"localvolumesetdaemon-controller","msg":"daemonset changed","Request.Namespace":"default","Request.Name":"","daemonset.Name":"diskmaker-manager","op.Result":"updated"}
I1021 19:16:59.468948 1 reconcile.go:34] Reconciling LocalVolume
I1021 19:16:59.476097 1 reconcile.go:34] Reconciling LocalVolume
I1021 19:16:59.481306 1 reconcile.go:34] Reconciling LocalVolume
I1021 19:16:59.484802 1 reconcile.go:34] Reconciling LocalVolume

My testing so far was based on the release-4.8 branch, so next steps:
- port fix over to 4.10 and 4.9 branches
- test fix on 4.10, 4.9, 4.8 OCP clusters
- file PR's for 4.10, 4.9, 4.8

RE https://bugzilla.redhat.com/show_bug.cgi?id=2011171#c11
Issues #1 and #2 apply to all 3 versions (4.8, 4.9, 4.10). Issues #3 and #4 only apply to 4.8 (already fixed on newer versions). 4.9 and 4.10 also have a kube-rbac-proxy sidecar that needs to be updated to avoid extra daemonset updates. The following fields were undefined and getting cleared on every update attempt:
containers:
- name: kube-rbac-proxy
  imagePullPolicy: IfNotPresent
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
volumes:
- name: metrics-serving-cert
  secret:
    defaultMode: 420
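As an illustration of the kind of change this implies (a hedged sketch, not the actual MutateAggregatedSpec code; the container is abbreviated and combines the fields from issues #2/#3 with the kube-rbac-proxy excerpt above), the desired object has to spell out the fields the API server defaults, otherwise every reconcile looks like an attempt to clear them and triggers another rollout:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func int32Ptr(i int32) *int32 { return &i }

// desiredSidecar pins fields that the API server would otherwise default:
// if they are left zero-valued in the desired spec, every update tries to
// remove them from the live object.
func desiredSidecar() corev1.Container {
	return corev1.Container{
		Name:                     "kube-rbac-proxy",
		ImagePullPolicy:          corev1.PullIfNotPresent,
		TerminationMessagePath:   "/dev/termination-log",
		TerminationMessagePolicy: corev1.TerminationMessageReadFile,
		Env: []corev1.EnvVar{{
			Name: "MY_NODE_NAME",
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{
					APIVersion: "v1", // defaulted by the API server if unset
					FieldPath:  "spec.nodeName",
				},
			},
		}},
	}
}

// desiredMetricsCertVolume pins the secret volume's file mode; 420 decimal
// (0644 octal) is the API server default.
func desiredMetricsCertVolume() corev1.Volume {
	return corev1.Volume{
		Name: "metrics-serving-cert",
		VolumeSource: corev1.VolumeSource{
			Secret: &corev1.SecretVolumeSource{
				// secretName is not shown in the excerpt above, so it is left out here too
				DefaultMode: int32Ptr(420),
			},
		},
	}
}

func main() {
	c, v := desiredSidecar(), desiredMetricsCertVolume()
	fmt.Println(c.Name, *v.VolumeSource.Secret.DefaultMode)
}
```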
./reproduce-loopback.sh
* volmode: fs
0 ip-10-0-187-149.us-east-2.compute.internal
0 /dev/loop0
localvolume.local.storage.openshift.io/local-fs-w0-0 created
1 /dev/loop1
localvolume.local.storage.openshift.io/local-fs-w0-1 created
2 /dev/loop2
localvolume.local.storage.openshift.io/local-fs-w0-2 created
3 /dev/loop3
localvolume.local.storage.openshift.io/local-fs-w0-3 created
1 ip-10-0-142-92.us-east-2.compute.internal
0 /dev/loop0
localvolume.local.storage.openshift.io/local-fs-w1-0 created
1 /dev/loop1
localvolume.local.storage.openshift.io/local-fs-w1-1 created
2 /dev/loop2
localvolume.local.storage.openshift.io/local-fs-w1-2 created
3 /dev/loop3
localvolume.local.storage.openshift.io/local-fs-w1-3 created

oc get pods
NAME READY STATUS RESTARTS AGE
diskmaker-discovery-jwh9m 2/2 Running 0 76m
diskmaker-discovery-mzh29 2/2 Running 0 76m
diskmaker-discovery-vhgs2 2/2 Running 0 76m
diskmaker-manager-4c5zm 2/2 Running 0 15s
diskmaker-manager-jphkp 2/2 Running 0 16s
diskmaker-manager-zkpcv 2/2 Running 0 15s
local-storage-operator-9855f445-pts2w 1/1 Running 0 130m

Passed with local-storage-operator.4.10.0-202110270816

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056