Description of problem:

Used the local-storage operator to expose one NVMe disk per master node as a PV:

apiVersion: v1
kind: Namespace
metadata:
  name: local-storage
---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: local-storage
spec:
  targetNamespaces:
  - local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: local-storage
spec:
  channel: "4.4"
  installPlanApproval: Automatic
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace

=========================================================

[kni@e19-h24-b01-fc640 local-storage]$ cat volume.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "local-storage"
spec:
  tolerations:
    - key: storage
      operator: Equal
      value: "true"
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/nvme0n1

All master nodes are tainted:

oc adm taint node master-<node> storage=true:NoSchedule

Then tried to deploy cluster-logging to use those PVs:

[kni@e19-h24-b01-fc640 local-storage]$ cat ~/logging/instance.yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      tolerations:
        - key: storage
          operator: Equal
          value: "true"
      nodeCount: 3
      storage:
        storageClassName: local-sc
        size: 100G
      redundancyPolicy: "SingleRedundancy"
  visualization:
    tolerations:
      - key: storage
        operator: Equal
        value: "true"
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}

The ES pods never go into Running and are stuck in ContainerCreating. Looking at oc describe pod:

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    19m                  default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-w4a280ij-1-6bd64d7578-x8nb6 to master-2
  Warning  FailedMount  17m                  kubelet, master-2  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[elasticsearch-token-vnrv8 elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates]: timed out waiting for the condition
  Warning  FailedMount  9m18s (x2 over 11m)  kubelet, master-2  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[elasticsearch-config certificates elasticsearch-token-vnrv8 elasticsearch-metrics elasticsearch-storage]: timed out waiting for the condition
  Warning  FailedMount  7m15s                kubelet, master-2  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates elasticsearch-token-vnrv8]: timed out waiting for the condition
  Warning  FailedMount  3m8s (x3 over 15m)   kubelet, master-2  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[certificates elasticsearch-token-vnrv8 elasticsearch-metrics elasticsearch-storage elasticsearch-config]: timed out waiting for the condition
  Warning  FailedMount  68s (x17 over 19m)   kubelet, master-2  MountVolume.MountDevice failed for volume "local-pv-a2502689" : local: failed to mount device /mnt/local-storage/local-sc/nvme0n1 at /var/lib/kubelet/plugins/kubernetes.io/local-volume/mounts/local-pv-a2502689 (fstype: xfs), error 'xfs_repair' found errors on device /mnt/local-storage/local-sc/nvme0n1 but could not correct them:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair. Note that destroying
the log may cause corruption -- please attempt a mount of the filesystem
before doing this.
  Warning  FailedMount  65s (x2 over 13m)    kubelet, master-2  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[elasticsearch-storage elasticsearch-config certificates elasticsearch-token-vnrv8 elasticsearch-metrics]: timed out waiting for the condition

(A sketch of the manual recovery this xfs_repair error suggests is included under Additional info below.)

Please find the journal from master-2 attached.

Version-Release number of selected component (if applicable):
4.5

How reproducible:

Steps to Reproduce:
1. Deploy cluster
2. Deploy local-storage operator
3. Try to use a PVC from local-storage

Actual results:
Pod stuck in ContainerCreating

Expected results:
Pods should spin up successfully

Master Log:
http://rdu-storage01.scalelab.redhat.com/sai/journal.tar.gz

Node Log (of failed PODs):

PV Dump:
[kni@e19-h24-b01-fc640 logging]$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                          STORAGECLASS   REASON   AGE
local-pv-2f659adb   1490Gi     RWO            Delete           Released   openshift-logging/elasticsearch-elasticsearch-cdm-5b7zrrqe-1   local-sc                19m
local-pv-5e03c2b0   1490Gi     RWO            Delete           Released   openshift-logging/elasticsearch-elasticsearch-cdm-5b7zrrqe-2   local-sc                20m
local-pv-a2502689   1490Gi     RWO            Delete           Released   openshift-logging/elasticsearch-elasticsearch-cdm-5b7zrrqe-3   local-sc                19m

(Please note: to try to get it working, I deleted the PVs and recreated them after hitting the issue, but it did not help.)

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
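For reference, the manual recovery path that the xfs_repair error message itself suggests (mount to replay the XFS log, unmount, then re-run the repair) would look roughly like the following from a shell on the affected node. This is only a sketch: /mnt/recover is an arbitrary scratch mount point chosen for illustration, and it assumes the device is not simultaneously being mounted by the kubelet (e.g. the ES pod is scaled down first).

# Get a shell on the node, e.g.:
#   oc debug node/master-2
#   chroot /host
mkdir -p /mnt/recover
mount -t xfs /dev/nvme0n1 /mnt/recover   # a successful mount replays the XFS log
umount /mnt/recover
xfs_repair /dev/nvme0n1                  # should now run against a clean log

This is a workaround only; the underlying cause is tracked in the comment below.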
We seem to have a buggy version of the mount library bundled in OCP 4.5. Opened a fix for master: https://github.com/openshift/origin/pull/25006; it will be backported once merged.
Verified with: 4.5.0-0.nightly-2020-05-30-025738
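(For reference, a minimal re-check on the fixed build, assuming the same object names as in the report above:

oc get pv                          # local-sc PVs should now show STATUS Bound
oc -n openshift-logging get pods   # elasticsearch-cdm-* pods should reach Running
)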
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409