Description of problem:

When using the Local Storage Operator[1] with "volumeMode: Block" to set up the monitoring stack with local persistent storage, as documented here[2], the pods are unable to mount the volumes and report the following message:

~~~
21s   Warning   FailedMount   pod/alertmanager-main-0   Unable to mount volumes for pod "alertmanager-main-0_openshift-monitoring(2b8392c7-28c1-11ea-9a78-060eee984c30)": timeout expired waiting for volumes to attach or mount for pod "openshift-monitoring"/"alertmanager-main-0". list of unmounted volumes=[localblock-pvc]. list of unattached volumes=[localblock-pvc config-volume secret-alertmanager-main-tls secret-alertmanager-main-proxy alertmanager-trusted-ca-bundle alertmanager-main-token-p4qzr]
~~~

The PVCs are correctly bound and use the Block volumeMode:

~~~
NAME                                 STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
localblock-pvc-alertmanager-main-0   Bound    local-pv-941ca86b   100Gi      RWO            localblock-sc   6h43m
localblock-pvc-alertmanager-main-1   Bound    local-pv-2f0d4757   100Gi      RWO            localblock-sc   6h43m
localblock-pvc-alertmanager-main-2   Bound    local-pv-127a69fe   100Gi      RWO            localblock-sc   6h43m
~~~

The "cluster-monitoring-config" configmap exists and also includes "volumeMode: Block":

~~~
$ oc get cm cluster-monitoring-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: localblock-pvc
        spec:
          accessModes:
          - ReadWriteOnce
          volumeMode: Block
          storageClassName: localblock-sc
          resources:
            requests:
              storage: 100Gi
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-27T10:50:00Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "56816549"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: a14c2d7c-2896-11ea-bb18-0acbfff3d3f0
~~~

However, after the pods are respawned with the new config, they still have "volumeMounts" instead of "volumeDevices" as documented here[3]:

~~~
volumeMounts:
- mountPath: /etc/tls/private
  name: secret-alertmanager-main-tls
- mountPath: /etc/proxy/secrets
  name: secret-alertmanager-main-proxy
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: alertmanager-main-token-p4qzr
  readOnly: true
~~~

NOTE: This doc[4] also states that "Pods using raw block volumes must be configured to allow privileged containers." Does this also affect block volumes provided by the Local Storage Operator?

[1] - https://docs.openshift.com/container-platform/4.2/storage/persistent-storage/persistent-storage-local.html#create-local-pvc_persistent-storage-local
[2] - https://docs.openshift.com/container-platform/4.2/monitoring/cluster-monitoring/configuring-the-monitoring-stack.html#configuring-a-local-persistent-volume-claim_configuring-monitoring
[3] - https://docs.openshift.com/container-platform/4.2/storage/understanding-persistent-storage.html#block-volume-examples_understanding-persistent-storage
[4] - https://docs.openshift.com/container-platform/4.2/storage/understanding-persistent-storage.html#block-volume-support_understanding-persistent-storage

Version-Release number of selected component (if applicable):

~~~
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE    STATUS
version   4.2.12    True        False         4d14h    Error while reconciling 4.2.12: the cluster operator monitoring is degraded

$ oc get clusterserviceversions.operators.coreos.com -A | grep local-storage-operator
local-storage   local-storage-operator.4.2.11-201912100122   Local Storage   4.2.11-201912100122   Succeeded
~~~

How reproducible:

Always

Steps to Reproduce:
1. Attach new block devices to the nodes.
2. Install the Local Storage Operator.
3. Provision the local volumes with Block mode.
4. Configure the monitoring stack with a local persistent volume claim.
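For comparison, the raw block volume examples in [3] attach the claim to the container via "volumeDevices"/"devicePath" rather than "volumeMounts"/"mountPath". A minimal sketch of what that looks like (pod name, container image, and device path are illustrative; the claim name is taken from this report):

~~~
apiVersion: v1
kind: Pod
metadata:
  name: block-pod-example            # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/app  # placeholder image
    volumeDevices:                   # raw block: devicePath instead of mountPath
    - name: data
      devicePath: /dev/xvda          # device node exposed inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: localblock-pvc-alertmanager-main-0
~~~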
Actual results:

~~~
$ oc get pods | grep "NAME\|alert"
NAME                  READY   STATUS              RESTARTS   AGE
alertmanager-main-0   0/3     ContainerCreating   0          6h37m
alertmanager-main-1   0/3     ContainerCreating   0          6h37m
alertmanager-main-2   0/3     ContainerCreating   0          6h37m
~~~

Expected results:

~~~
$ oc get pods | grep "NAME\|alert"
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   3/3     Running   0          6h37m
alertmanager-main-1   3/3     Running   0          6h37m
alertmanager-main-2   3/3     Running   0          6h37m
~~~

Master Log: *Please specify which logs from the RHCOS master nodes are needed here.

Node Log (of failed PODs): *Please specify which logs from the RHCOS worker nodes are needed here.

PV Dump:

~~~
$ oc get pv local-pv-2f0d4757 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-ip-10-0-163-177.eu-west-3.compute.internal-f7d1c8d4-d9f8-11e9-adec-0acbfff3d3f0
  creationTimestamp: "2019-12-27T11:25:49Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: local-disks
    storage.openshift.com/local-volume-owner-namespace: local-storage
  name: local-pv-2f0d4757
  resourceVersion: "56726421"
  selfLink: /api/v1/persistentvolumes/local-pv-2f0d4757
  uid: a2218c51-289b-11ea-948a-0e7f869d2710
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: localblock-pvc-alertmanager-main-0
    namespace: openshift-monitoring
    resourceVersion: "56726411"
    uid: d8f997b6-289d-11ea-9a78-060eee984c30
  local:
    path: /mnt/local-storage/localblock-sc/nvme1n1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-0-163-177
  persistentVolumeReclaimPolicy: Delete
  storageClassName: localblock-sc
  volumeMode: Block
status:
  phase: Bound
~~~

PVC Dump:

~~~
$ oc get pvc localblock-pvc-alertmanager-main-0 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2019-12-27T11:41:40Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alertmanager: main
    app: alertmanager
  name: localblock-pvc-alertmanager-main-0
  namespace: openshift-monitoring
  resourceVersion: "56726423"
  selfLink: /api/v1/namespaces/openshift-monitoring/persistentvolumeclaims/localblock-pvc-alertmanager-main-0
  uid: d8f997b6-289d-11ea-9a78-060eee984c30
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi
  storageClassName: localblock-sc
  volumeMode: Block
  volumeName: local-pv-2f0d4757
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
~~~

StorageClass Dump (if StorageClass used by PV/PVC):

~~~
$ oc get sc localblock-sc -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2019-12-24T16:17:49Z"
  labels:
    local.storage.openshift.io/owner-name: local-disks
    local.storage.openshift.io/owner-namespace: local-storage
  name: localblock-sc
  ownerReferences:
  - apiVersion: local.storage.openshift.io/v1
    controller: true
    kind: LocalVolume
    name: local-disks
    uid: edbff4a6-2668-11ea-948a-0e7f869d2710
  resourceVersion: "56172723"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/localblock-sc
  uid: edcf8608-2668-11ea-948a-0e7f869d2710
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
~~~

Additional info:

~~~
$ oc get pod alertmanager-main-0 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: "2019-12-27T11:41:40Z"
  generateName: alertmanager-main-
  labels:
    alertmanager: main
    app: alertmanager
    controller-revision-hash: alertmanager-main-585f84bd59
    statefulset.kubernetes.io/pod-name: alertmanager-main-0
  name: alertmanager-main-0
  namespace: openshift-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: alertmanager-main
    uid: d8ed9523-289d-11ea-948a-0e7f869d2710
  resourceVersion: "56726445"
  selfLink: /api/v1/namespaces/openshift-monitoring/pods/alertmanager-main-0
  uid: d8fc8cca-289d-11ea-9a78-060eee984c30
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: alertmanager
              operator: In
              values:
              - main
          namespaces:
          - openshift-monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - --config.file=/etc/alertmanager/config/alertmanager.yaml
    - --cluster.listen-address=[$(POD_IP)]:9094
    - --storage.path=/alertmanager
    - --data.retention=120h
    - --web.listen-address=127.0.0.1:9093
    - --web.external-url=https://alertmanager-main-openshift-monitoring.apps.pamoedom.ocp4.xyz/
    - --web.route-prefix=/
    - --cluster.peer=alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc:9094
    - --cluster.peer=alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc:9094
    - --cluster.peer=alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc:9094
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a970d3eb87e42d7ae72b225f9607eb8a671a058d5bd66b61301264e48c698b4
    imagePullPolicy: IfNotPresent
    name: alertmanager
    ports:
    - containerPort: 9094
      name: mesh-tcp
      protocol: TCP
    - containerPort: 9094
      name: mesh-udp
      protocol: UDP
    resources:
      requests:
        memory: 200Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000370000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/alertmanager/config
      name: config-volume
    - mountPath: /alertmanager
      name: localblock-pvc
      subPath: alertmanager-db
    - mountPath: /etc/alertmanager/secrets/alertmanager-main-tls
      name: secret-alertmanager-main-tls
      readOnly: true
    - mountPath: /etc/alertmanager/secrets/alertmanager-main-proxy
      name: secret-alertmanager-main-proxy
      readOnly: true
    - mountPath: /etc/pki/alertmanager-ca-bundle/
      name: alertmanager-trusted-ca-bundle
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
  - args:
    - -webhook-url=http://localhost:9093/-/reload
    - -volume-dir=/etc/alertmanager/config
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28978e532ef46c9d4edeabfb73a1e8561493d6681b940793cd2a385f9f70f27f
    imagePullPolicy: IfNotPresent
    name: config-reloader
    resources:
      limits:
        cpu: 100m
        memory: 25Mi
      requests:
        cpu: 100m
        memory: 25Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000370000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/alertmanager/config
      name: config-volume
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
  - args:
    - -provider=openshift
    - -https-address=:9095
    - -http-address=
    - -email-domain=*
    - -upstream=http://localhost:9093
    - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
    - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
    - -tls-cert=/etc/tls/private/tls.crt
    - -tls-key=/etc/tls/private/tls.key
    - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
    - -cookie-secret-file=/etc/proxy/secrets/session_secret
    - -openshift-service-account=alertmanager-main
    - -openshift-ca=/etc/pki/tls/cert.pem
    - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - -skip-auth-regex=^/metrics
    env:
    - name: HTTP_PROXY
    - name: HTTPS_PROXY
    - name: NO_PROXY
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86e282e319f97bab460e8afd3dc3033448942f3418f7d6259d4c2ddae0a65923
    imagePullPolicy: IfNotPresent
    name: alertmanager-proxy
    ports:
    - containerPort: 9095
      name: web
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000370000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-alertmanager-main-tls
    - mountPath: /etc/proxy/secrets
      name: secret-alertmanager-main-proxy
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: alertmanager-main-0
  imagePullSecrets:
  - name: alertmanager-main-dockercfg-m955v
  nodeName: ip-10-0-163-177.eu-west-3.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000370000
    seLinuxOptions:
      level: s0:c19,c14
  serviceAccount: alertmanager-main
  serviceAccountName: alertmanager-main
  subdomain: alertmanager-operated
  terminationGracePeriodSeconds: 120
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: localblock-pvc
    persistentVolumeClaim:
      claimName: localblock-pvc-alertmanager-main-0
  - name: config-volume
    secret:
      defaultMode: 420
      secretName: alertmanager-main
  - name: secret-alertmanager-main-tls
    secret:
      defaultMode: 420
      secretName: alertmanager-main-tls
  - name: secret-alertmanager-main-proxy
    secret:
      defaultMode: 420
      secretName: alertmanager-main-proxy
  - configMap:
      defaultMode: 420
      name: alertmanager-trusted-ca-bundle-9tkc0fdgcg4s6
      optional: true
    name: alertmanager-trusted-ca-bundle
  - name: alertmanager-main-token-p4qzr
    secret:
      defaultMode: 420
      secretName: alertmanager-main-token-p4qzr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a970d3eb87e42d7ae72b225f9607eb8a671a058d5bd66b61301264e48c698b4
    imageID: ""
    lastState: {}
    name: alertmanager
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86e282e319f97bab460e8afd3dc3033448942f3418f7d6259d4c2ddae0a65923
    imageID: ""
    lastState: {}
    name: alertmanager-proxy
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28978e532ef46c9d4edeabfb73a1e8561493d6681b940793cd2a385f9f70f27f
    imageID: ""
    lastState: {}
    name: config-reloader
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.163.177
  phase: Pending
  qosClass: Burstable
  startTime: "2019-12-27T11:41:41Z"
~~~
Alertmanager and Prometheus expect a POSIX filesystem with correct permissions for storage; block devices are not supported.

This needs a documentation fix.
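For anyone hitting this: in practice the local volumes would need to be provisioned in Filesystem mode instead. A sketch of a LocalVolume resource doing that, based on this report's environment (storage class name, fsType, and device path are illustrative and may differ):

~~~
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: local-storage
spec:
  storageClassDevices:
  - storageClassName: localfs-sc   # illustrative name
    volumeMode: Filesystem         # filesystem, as Alertmanager/Prometheus require
    fsType: xfs                    # example filesystem type
    devicePaths:
    - /dev/nvme1n1                 # device path from this report's PV
~~~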
(In reply to Pawel Krupa from comment #4)
> Alertmanager and prometheus expect a POSIX filesystem with correct
> permissions for storage, block devices are not supported.
>
> This needs a documentation fix.

Thanks for the confirmation, @Pawel; that explains the unexpected behavior. I'll raise a DOC BZ to double-check the documentation part ASAP.

Best Regards.
For the record, DOC BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1788502
To avoid hinting at any particular type of StorageClass, and to enforce use of a filesystem volume (as this is the only mode we support), I suggest changing:

~~~
spec:
  ...
  storageClassName: gluster-block
  ...
~~~

to:

~~~
spec:
  ...
  storageClassName: fast
  volumeMode: Filesystem
  ...
~~~

This is also how the Kubernetes docs [1] reference storage classes without favoring any one.

[1]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes
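Applied to the monitoring config from this report, the suggested volumeClaimTemplate would then read something like (storage class name is illustrative, matching the "fast" example above):

~~~
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: alertmanager-pvc
        spec:
          storageClassName: fast      # example name, not favoring any provisioner
          volumeMode: Filesystem      # the only mode the monitoring stack supports
          resources:
            requests:
              storage: 100Gi
~~~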
LGTM. Close it