Alertmanager and Prometheus expect a POSIX filesystem with correct permissions for storage; block devices are not supported.
This needs a documentation fix.
(In reply to Pawel Krupa from comment #4)
> Alertmanager and Prometheus expect a POSIX filesystem with correct
> permissions for storage; block devices are not supported.
>
> This needs a documentation fix.
Thanks for the confirmation, @Pawel; that explains the unexpected behavior. I'll raise a DOC BZ ASAP to double-check the documentation part.
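In the meantime, for anyone else hitting this: the same disks can instead be exposed in Filesystem mode through the Local Storage Operator, and the monitoring volumeClaimTemplate then simply references the resulting storage class without any "volumeMode: Block". A rough sketch only (the storage class name, fsType and device path below are illustrative; see the Local Storage Operator docs for the exact fields):

~~~
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: local-storage
spec:
  storageClassDevices:
  - storageClassName: localfs-sc   # example name
    volumeMode: Filesystem         # instead of Block
    fsType: xfs                    # example filesystem
    devicePaths:
    - /dev/nvme1n1                 # example device
~~~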
Best Regards.
To avoid hinting at any particular type of StorageClass and to force the use of a filesystem volume (as this is the only mode we support), I suggest changing:
spec:
...
storageClassName: gluster-block
...
to:
spec:
...
storageClassName: fast
volumeMode: Filesystem
...
This is also how the Kubernetes docs [1] reference storage classes without favoring any particular one.
[1]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes
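For the monitoring documentation specifically, a minimal sketch of how the example cluster-monitoring-config could then look (the storage class name "fast" is only a placeholder, as in the Kubernetes docs, and the claim name and size are illustrative):

~~~
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: alertmanager-pvc   # illustrative claim name
        spec:
          storageClassName: fast   # placeholder; any filesystem-backed class
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi
~~~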
Description of problem:

When using the Local Storage Operator[1] with "volumeMode: Block" in order to set up the monitoring stack with local persistent storage, as documented here[2], the pods are unable to mount the volumes with the following message:

~~~
21s   Warning   FailedMount   pod/alertmanager-main-0   Unable to mount volumes for pod "alertmanager-main-0_openshift-monitoring(2b8392c7-28c1-11ea-9a78-060eee984c30)": timeout expired waiting for volumes to attach or mount for pod "openshift-monitoring"/"alertmanager-main-0". list of unmounted volumes=[localblock-pvc]. list of unattached volumes=[localblock-pvc config-volume secret-alertmanager-main-tls secret-alertmanager-main-proxy alertmanager-trusted-ca-bundle alertmanager-main-token-p4qzr]
~~~

The PVCs are correctly bound and also use the Block volumeMode:

~~~
NAME                                 STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
localblock-pvc-alertmanager-main-0   Bound    local-pv-941ca86b   100Gi      RWO            localblock-sc   6h43m
localblock-pvc-alertmanager-main-1   Bound    local-pv-2f0d4757   100Gi      RWO            localblock-sc   6h43m
localblock-pvc-alertmanager-main-2   Bound    local-pv-127a69fe   100Gi      RWO            localblock-sc   6h43m
~~~

The "cluster-monitoring-config" configmap exists and also includes "volumeMode: Block":

~~~
$ oc get cm cluster-monitoring-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: localblock-pvc
        spec:
          accessModes:
          - ReadWriteOnce
          volumeMode: Block
          storageClassName: localblock-sc
          resources:
            requests:
              storage: 100Gi
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-27T10:50:00Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "56816549"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: a14c2d7c-2896-11ea-bb18-0acbfff3d3f0
~~~

However, after the pods are respawned with the new config, they still have "volumeMounts" instead of "volumeDevices" as documented here[3]:

~~~
volumeMounts:
- mountPath: /etc/tls/private
  name: secret-alertmanager-main-tls
- mountPath: /etc/proxy/secrets
  name: secret-alertmanager-main-proxy
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: alertmanager-main-token-p4qzr
  readOnly: true
~~~

NOTE: This doc[4] also states that "Pods using raw block volumes must be configured to allow privileged containers." Does this also apply to the block volumes provided by the Local Storage Operator?
[1] - https://docs.openshift.com/container-platform/4.2/storage/persistent-storage/persistent-storage-local.html#create-local-pvc_persistent-storage-local
[2] - https://docs.openshift.com/container-platform/4.2/monitoring/cluster-monitoring/configuring-the-monitoring-stack.html#configuring-a-local-persistent-volume-claim_configuring-monitoring
[3] - https://docs.openshift.com/container-platform/4.2/storage/understanding-persistent-storage.html#block-volume-examples_understanding-persistent-storage
[4] - https://docs.openshift.com/container-platform/4.2/storage/understanding-persistent-storage.html#block-volume-support_understanding-persistent-storage

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.12    True        False         4d14h   Error while reconciling 4.2.12: the cluster operator monitoring is degraded

$ oc get clusterserviceversions.operators.coreos.com -A | grep local-storage-operator
local-storage   local-storage-operator.4.2.11-201912100122   Local Storage   4.2.11-201912100122   Succeeded

How reproducible:

Always

Steps to Reproduce:
1. Attach new block devices to the nodes.
2. Install the Local Storage Operator.
3. Provision the local volumes with Block mode.
4. Configure the monitoring stack with a local persistent volume claim.

Actual results:
~~~
$ oc get pods | grep "NAME\|alert"
NAME                  READY   STATUS              RESTARTS   AGE
alertmanager-main-0   0/3     ContainerCreating   0          6h37m
alertmanager-main-1   0/3     ContainerCreating   0          6h37m
alertmanager-main-2   0/3     ContainerCreating   0          6h37m
~~~

Expected results:
~~~
$ oc get pods | grep "NAME\|alert"
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   3/3     Running   0          6h37m
alertmanager-main-1   3/3     Running   0          6h37m
alertmanager-main-2   3/3     Running   0          6h37m
~~~

Master Log:
*Please specify which logs from the RHCOS master nodes are needed here.

Node Log (of failed PODs):
*Please specify which logs from the RHCOS worker nodes are needed here.
PV Dump:
~~~
$ oc get pv local-pv-2f0d4757 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-ip-10-0-163-177.eu-west-3.compute.internal-f7d1c8d4-d9f8-11e9-adec-0acbfff3d3f0
  creationTimestamp: "2019-12-27T11:25:49Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: local-disks
    storage.openshift.com/local-volume-owner-namespace: local-storage
  name: local-pv-2f0d4757
  resourceVersion: "56726421"
  selfLink: /api/v1/persistentvolumes/local-pv-2f0d4757
  uid: a2218c51-289b-11ea-948a-0e7f869d2710
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: localblock-pvc-alertmanager-main-0
    namespace: openshift-monitoring
    resourceVersion: "56726411"
    uid: d8f997b6-289d-11ea-9a78-060eee984c30
  local:
    path: /mnt/local-storage/localblock-sc/nvme1n1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-0-163-177
  persistentVolumeReclaimPolicy: Delete
  storageClassName: localblock-sc
  volumeMode: Block
status:
  phase: Bound
~~~

PVC Dump:
~~~
$ oc get pvc localblock-pvc-alertmanager-main-0 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2019-12-27T11:41:40Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alertmanager: main
    app: alertmanager
  name: localblock-pvc-alertmanager-main-0
  namespace: openshift-monitoring
  resourceVersion: "56726423"
  selfLink: /api/v1/namespaces/openshift-monitoring/persistentvolumeclaims/localblock-pvc-alertmanager-main-0
  uid: d8f997b6-289d-11ea-9a78-060eee984c30
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi
  storageClassName: localblock-sc
  volumeMode: Block
  volumeName: local-pv-2f0d4757
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
~~~

StorageClass Dump (if StorageClass used by PV/PVC):
~~~
$ oc get sc localblock-sc -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2019-12-24T16:17:49Z"
  labels:
    local.storage.openshift.io/owner-name: local-disks
    local.storage.openshift.io/owner-namespace: local-storage
  name: localblock-sc
  ownerReferences:
  - apiVersion: local.storage.openshift.io/v1
    controller: true
    kind: LocalVolume
    name: local-disks
    uid: edbff4a6-2668-11ea-948a-0e7f869d2710
  resourceVersion: "56172723"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/localblock-sc
  uid: edcf8608-2668-11ea-948a-0e7f869d2710
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
~~~

Additional info:
~~~
$ oc get pod alertmanager-main-0 -o yaml apiVersion: v1 kind: Pod metadata: annotations: openshift.io/scc: restricted creationTimestamp: "2019-12-27T11:41:40Z" generateName: alertmanager-main- labels: alertmanager: main app: alertmanager controller-revision-hash: alertmanager-main-585f84bd59 statefulset.kubernetes.io/pod-name: alertmanager-main-0 name: alertmanager-main-0 namespace: openshift-monitoring ownerReferences: - apiVersion: apps/v1 blockOwnerDeletion: true controller: true kind: StatefulSet name: alertmanager-main uid: d8ed9523-289d-11ea-948a-0e7f869d2710 resourceVersion: "56726445" selfLink: /api/v1/namespaces/openshift-monitoring/pods/alertmanager-main-0 uid:
d8fc8cca-289d-11ea-9a78-060eee984c30 spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchExpressions: - key: alertmanager operator: In values: - main namespaces: - openshift-monitoring topologyKey: kubernetes.io/hostname weight: 100 containers: - args: - --config.file=/etc/alertmanager/config/alertmanager.yaml - --cluster.listen-address=[$(POD_IP)]:9094 - --storage.path=/alertmanager - --data.retention=120h - --web.listen-address=127.0.0.1:9093 - --web.external-url=https://alertmanager-main-openshift-monitoring.apps.pamoedom.ocp4.xyz/ - --web.route-prefix=/ - --cluster.peer=alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc:9094 - --cluster.peer=alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc:9094 - --cluster.peer=alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc:9094 env: - name: POD_IP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.podIP image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a970d3eb87e42d7ae72b225f9607eb8a671a058d5bd66b61301264e48c698b4 imagePullPolicy: IfNotPresent name: alertmanager ports: - containerPort: 9094 name: mesh-tcp protocol: TCP - containerPort: 9094 name: mesh-udp protocol: UDP resources: requests: memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - SETGID - SETUID runAsUser: 1000370000 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /etc/alertmanager/config name: config-volume - mountPath: /alertmanager name: localblock-pvc subPath: alertmanager-db - mountPath: /etc/alertmanager/secrets/alertmanager-main-tls name: secret-alertmanager-main-tls readOnly: true - mountPath: /etc/alertmanager/secrets/alertmanager-main-proxy name: secret-alertmanager-main-proxy readOnly: true - mountPath: /etc/pki/alertmanager-ca-bundle/ name: alertmanager-trusted-ca-bundle readOnly: true - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: alertmanager-main-token-p4qzr readOnly: true - args: - -webhook-url=http://localhost:9093/-/reload - -volume-dir=/etc/alertmanager/config image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28978e532ef46c9d4edeabfb73a1e8561493d6681b940793cd2a385f9f70f27f imagePullPolicy: IfNotPresent name: config-reloader resources: limits: cpu: 100m memory: 25Mi requests: cpu: 100m memory: 25Mi securityContext: capabilities: drop: - KILL - MKNOD - SETGID - SETUID runAsUser: 1000370000 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /etc/alertmanager/config name: config-volume readOnly: true - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: alertmanager-main-token-p4qzr readOnly: true - args: - -provider=openshift - -https-address=:9095 - -http-address= - -email-domain=* - -upstream=http://localhost:9093 - '-openshift-sar={"resource": "namespaces", "verb": "get"}' - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}' - -tls-cert=/etc/tls/private/tls.crt - -tls-key=/etc/tls/private/tls.key - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token - -cookie-secret-file=/etc/proxy/secrets/session_secret - -openshift-service-account=alertmanager-main - -openshift-ca=/etc/pki/tls/cert.pem - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt - -skip-auth-regex=^/metrics env: - name: HTTP_PROXY - name: HTTPS_PROXY - name: NO_PROXY image: 
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86e282e319f97bab460e8afd3dc3033448942f3418f7d6259d4c2ddae0a65923 imagePullPolicy: IfNotPresent name: alertmanager-proxy ports: - containerPort: 9095 name: web protocol: TCP resources: {} securityContext: capabilities: drop: - KILL - MKNOD - SETGID - SETUID runAsUser: 1000370000 terminationMessagePath: /dev/termination-log terminationMessagePolicy: FallbackToLogsOnError volumeMounts: - mountPath: /etc/tls/private name: secret-alertmanager-main-tls - mountPath: /etc/proxy/secrets name: secret-alertmanager-main-proxy - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: alertmanager-main-token-p4qzr readOnly: true dnsPolicy: ClusterFirst enableServiceLinks: true hostname: alertmanager-main-0 imagePullSecrets: - name: alertmanager-main-dockercfg-m955v nodeName: ip-10-0-163-177.eu-west-3.compute.internal nodeSelector: kubernetes.io/os: linux priority: 2000000000 priorityClassName: system-cluster-critical restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1000370000 seLinuxOptions: level: s0:c19,c14 serviceAccount: alertmanager-main serviceAccountName: alertmanager-main subdomain: alertmanager-operated terminationGracePeriodSeconds: 120 tolerations: - effect: NoSchedule key: node.kubernetes.io/memory-pressure operator: Exists - effect: NoExecute key: node.kubernetes.io/unreachable operator: Exists tolerationSeconds: 300 - effect: NoExecute key: node.kubernetes.io/not-ready operator: Exists tolerationSeconds: 300 volumes: - name: localblock-pvc persistentVolumeClaim: claimName: localblock-pvc-alertmanager-main-0 - name: config-volume secret: defaultMode: 420 secretName: alertmanager-main - name: secret-alertmanager-main-tls secret: defaultMode: 420 secretName: alertmanager-main-tls - name: secret-alertmanager-main-proxy secret: defaultMode: 420 secretName: alertmanager-main-proxy - configMap: defaultMode: 420 name: alertmanager-trusted-ca-bundle-9tkc0fdgcg4s6 optional: true name: alertmanager-trusted-ca-bundle - name: alertmanager-main-token-p4qzr secret: defaultMode: 420 secretName: alertmanager-main-token-p4qzr status: conditions: - lastProbeTime: null lastTransitionTime: "2019-12-27T11:41:41Z" status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: "2019-12-27T11:41:41Z" message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy]' reason: ContainersNotReady status: "False" type: Ready - lastProbeTime: null lastTransitionTime: "2019-12-27T11:41:41Z" message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy]' reason: ContainersNotReady status: "False" type: ContainersReady - lastProbeTime: null lastTransitionTime: "2019-12-27T11:41:41Z" status: "True" type: PodScheduled containerStatuses: - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a970d3eb87e42d7ae72b225f9607eb8a671a058d5bd66b61301264e48c698b4 imageID: "" lastState: {} name: alertmanager ready: false restartCount: 0 state: waiting: reason: ContainerCreating - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86e282e319f97bab460e8afd3dc3033448942f3418f7d6259d4c2ddae0a65923 imageID: "" lastState: {} name: alertmanager-proxy ready: false restartCount: 0 state: waiting: reason: ContainerCreating - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28978e532ef46c9d4edeabfb73a1e8561493d6681b940793cd2a385f9f70f27f imageID: "" lastState: {} name: config-reloader ready: false restartCount: 0 state: waiting: reason: ContainerCreating 
hostIP: 10.0.163.177 phase: Pending qosClass: Burstable startTime: "2019-12-27T11:41:41Z" ~~~
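For completeness, a quick way to confirm that the Alertmanager pods never received any raw block devices is to check for "volumeDevices" in the generated spec; the jsonpath expressions below are only an illustration, and an empty result matches the volumeMounts-only spec shown above:

~~~
$ oc -n openshift-monitoring get pod alertmanager-main-0 \
    -o jsonpath='{.spec.containers[?(@.name=="alertmanager")].volumeDevices}'
$ oc -n openshift-monitoring get statefulset alertmanager-main \
    -o jsonpath='{.spec.template.spec.containers[0].volumeDevices}'
~~~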