Bug 2058626
Summary: | Multiple Azure upstream kube fsgroupchangepolicy tests are permafailing, expecting gid "1000" but getting "root" | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Storage sub component: | Storage | QA Contact: | Wei Duan <wduan> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | | |
Priority: | unspecified | CC: | jsafrane, sippy, stbenjam, wking |
Version: | 4.11 | | |
Target Milestone: | --- | | |
Target Release: | 4.11.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | see failing tests below |
Last Closed: | 2022-08-10 10:51:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |

Environment (failing tests):

External Storage [Driver: disk.csi.azure.com] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with pvc data source in parallel
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Dynamic PV (default fs)] fsgroupchangepolicy (OnRootMismatch)[LinuxOnly], pod created with an initial fsgroup, new pod fsgroup applied to volume contents
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Dynamic PV (default fs)] fsgroupchangepolicy (Always)[LinuxOnly], pod created with an initial fsgroup, volume contents ownership changed via chgrp in first pod, new pod with different fsgroup applied to the volume contents
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Dynamic PV (default fs)] fsgroupchangepolicy (Always)[LinuxOnly], pod created with an initial fsgroup, volume contents ownership changed via chgrp in first pod, new pod with same fsgroup applied to the volume contents
Description
Devan Goodwin
2022-02-25 12:46:09 UTC
In addition to the failing tests above, OCP does not install on Azure since yesterday: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure/1496906555121995776

Prometheus pods report permission denied:

ts=2022-02-24T19:08:33.801Z caller=query_logger.go:86 level=error component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-azure/1496904660361940992/artifacts/e2e-azure/gather-extra/artifacts/pods/openshift-monitoring_prometheus-k8s-0_prometheus.log

The CSI migration PR in MCO landed around the same time: https://github.com/openshift/machine-config-operator/pull/2949, i.e. all Azure Disk setup and mounting is now done by the CSI driver. In the Prometheus PV I can see:

fsType: ""

Together with the missing fsGroupPolicy in the CSIDriver instance, this means that fsGroup was not applied and Prometheus can't access its volume. The Azure Disk in-tree volume plugin used "fsType: ext4" here.

We decided that while this PR can fix installation, we would need a backport to 4.10 to fix upgrade, and it is quite late for such changes. In addition, we can't fix the Cinder CSIDriver in the same way, as we have shipped its CSIDriver since 4.9. It is therefore better to revert the MCO changes and fix the CSIDriver instances in our operators properly: ship "CSIDriver.fsGroupPolicy: File" everywhere it makes sense, together with code that deletes and re-creates the CSIDriver during update (CSIDriver is read-only after creation). In addition, we should fix the translation library upstream so that an in-tree PV with an empty fsType is translated to a CSI PV with "fsType: ext4" in all volume plugins that default to ext4, so that other Kubernetes distributions don't hit the same issue.

The failure has not happened in these three days. Updated status to "Verified".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
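
For reference, a minimal sketch of the operator-side fix described above: ensure the CSIDriver object carries fsGroupPolicy: File and, because that field is immutable after creation, delete and re-create the object when an existing instance differs. The package, function name, and structure are illustrative assumptions, not the actual operator code.

```go
// Sketch only: ensure a CSIDriver ships fsGroupPolicy: File, assuming a
// standard client-go clientset. Not the actual OpenShift operator code.
package csidriverfix

import (
	"context"
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureCSIDriver creates the desired CSIDriver, deleting any existing object
// first when its fsGroupPolicy differs, because the field cannot be updated
// in place.
func ensureCSIDriver(ctx context.Context, client kubernetes.Interface) error {
	filePolicy := storagev1.FileFSGroupPolicy
	desired := &storagev1.CSIDriver{
		ObjectMeta: metav1.ObjectMeta{Name: "disk.csi.azure.com"},
		Spec: storagev1.CSIDriverSpec{
			FSGroupPolicy: &filePolicy,
		},
	}

	existing, err := client.StorageV1().CSIDrivers().Get(ctx, desired.Name, metav1.GetOptions{})
	switch {
	case errors.IsNotFound(err):
		// Nothing to clean up; fall through to Create below.
	case err != nil:
		return err
	default:
		if existing.Spec.FSGroupPolicy != nil && *existing.Spec.FSGroupPolicy == filePolicy {
			return nil // already correct
		}
		// fsGroupPolicy is read-only after creation, so delete and re-create.
		if err := client.StorageV1().CSIDrivers().Delete(ctx, desired.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}

	_, err = client.StorageV1().CSIDrivers().Create(ctx, desired, metav1.CreateOptions{})
	if err == nil {
		fmt.Printf("CSIDriver %s ensured with fsGroupPolicy=File\n", desired.Name)
	}
	return err
}
```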
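
And a sketch of the proposed translation-library defaulting: when an in-tree Azure Disk PV leaves fsType empty, the translated CSI PV should carry the plugin's implicit ext4 default. The function name and shape below are illustrative assumptions, not the actual csi-translation-lib API.

```go
// Sketch only: default an empty in-tree fsType to ext4 during in-tree -> CSI
// translation for Azure Disk. Not the real csi-translation-lib implementation.
package translationsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// defaultAzureDiskFSType is the filesystem the in-tree plugin formatted with
// when no fsType was set.
const defaultAzureDiskFSType = "ext4"

// translateAzureDiskToCSI converts an in-tree AzureDisk volume source into a
// CSI persistent volume source, filling in fsType when the in-tree PV left it
// empty so the CSI driver applies the same default.
func translateAzureDiskToCSI(inTree *corev1.AzureDiskVolumeSource) *corev1.CSIPersistentVolumeSource {
	fsType := defaultAzureDiskFSType
	if inTree.FSType != nil && *inTree.FSType != "" {
		fsType = *inTree.FSType
	}
	return &corev1.CSIPersistentVolumeSource{
		Driver:       "disk.csi.azure.com",
		VolumeHandle: inTree.DataDiskURI,
		FSType:       fsType,
	}
}
```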