Bug 1786779 - Local persistent storage of type Block is not working as documented for the monitoring stack
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.4
Hardware: x86_64
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.4.0
Assignee: Maxim Svistunov
QA Contact: Junqi Zhao
Depends On: 1788502
 
Reported: 2019-12-27 23:14 UTC by Pedro Amoedo
Modified: 2023-03-24 16:35 UTC
CC List: 12 users

Doc Type: No Doc Update
Last Closed: 2020-03-04 10:50:56 UTC




Links
GitHub: openshift/openshift-docs pull 19354 (closed): Bug 1786779: refer to filesystem storage classes, fix namespace (last updated 2021-01-19 05:23:47 UTC)

Description Pedro Amoedo 2019-12-27 23:14:02 UTC
Description of problem:

When using the Local Storage Operator[1] with "volumeMode: Block" to set up the monitoring stack with local persistent storage, as documented here[2], the pods are unable to mount the volumes and emit the following message:

~~~
21s         Warning   FailedMount            pod/alertmanager-main-0                                    Unable to mount volumes for pod "alertmanager-main-0_openshift-monitoring(2b8392c7-28c1-11ea-9a78-060eee984c30)": timeout expired waiting for volumes to attach or mount for pod "openshift-monitoring"/"alertmanager-main-0". list of unmounted volumes=[localblock-pvc]. list of unattached volumes=[localblock-pvc config-volume secret-alertmanager-main-tls secret-alertmanager-main-proxy alertmanager-trusted-ca-bundle alertmanager-main-token-p4qzr]
~~~
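
These mount failures can be surfaced directly from the namespace events, for example:

~~~
$ oc get events -n openshift-monitoring | grep FailedMount
~~~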

The PVCs are correctly bound and are also using the Block volumeMode:

~~~
NAME                                 STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
localblock-pvc-alertmanager-main-0   Bound    local-pv-941ca86b   100Gi      RWO            localblock-sc   6h43m
localblock-pvc-alertmanager-main-1   Bound    local-pv-2f0d4757   100Gi      RWO            localblock-sc   6h43m
localblock-pvc-alertmanager-main-2   Bound    local-pv-127a69fe   100Gi      RWO            localblock-sc   6h43m
~~~

The "cluster-monitoring-config" configmap exists and includes also the "volumeMode: Block":

~~~
$ oc get cm cluster-monitoring-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: localblock-pvc
        spec:
          accessModes:
          - ReadWriteOnce
          volumeMode: Block
          storageClassName: localblock-sc
          resources:
            requests:
              storage: 100Gi
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-27T10:50:00Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "56816549"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: a14c2d7c-2896-11ea-bb18-0acbfff3d3f0
~~~

However, after the pods are properly respawned with the new config, they still use "volumeMounts" instead of the "volumeDevices" documented here[3]:

~~~
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-alertmanager-main-tls
    - mountPath: /etc/proxy/secrets
      name: secret-alertmanager-main-proxy
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
~~~
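
For reference, the block volume examples in [3] declare the claim under "volumeDevices" rather than "volumeMounts", roughly like this (a minimal sketch; the device path is illustrative):

~~~
    volumeDevices:
    - devicePath: /dev/xvda    # illustrative path where the raw device appears in the container
      name: localblock-pvc
~~~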

NOTE: This doc[4] also states that "Pods using raw block volumes must be configured to allow privileged containers." Does this also affect the block volumes provided by the Local Storage Operator?
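
For reference, that requirement maps to a container-level securityContext along these lines (a minimal sketch of what [4] describes):

~~~
    securityContext:
      privileged: true    # [4]: pods using raw block volumes must allow privileged containers
~~~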


[1] - https://docs.openshift.com/container-platform/4.2/storage/persistent-storage/persistent-storage-local.html#create-local-pvc_persistent-storage-local
[2] - https://docs.openshift.com/container-platform/4.2/monitoring/cluster-monitoring/configuring-the-monitoring-stack.html#configuring-a-local-persistent-volume-claim_configuring-monitoring
[3] - https://docs.openshift.com/container-platform/4.2/storage/understanding-persistent-storage.html#block-volume-examples_understanding-persistent-storage
[4] - https://docs.openshift.com/container-platform/4.2/storage/understanding-persistent-storage.html#block-volume-support_understanding-persistent-storage

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.12    True        False         4d14h   Error while reconciling 4.2.12: the cluster operator monitoring is degraded

$ oc get clusterserviceversions.operators.coreos.com -A | grep local-storage-operator
local-storage                                           local-storage-operator.4.2.11-201912100122   Local Storage            4.2.11-201912100122                                                Succeeded

How reproducible:
Always

Steps to Reproduce:
1. Attach new block devices to the nodes.
2. Install the Local Storage Operator.
3. Provision the local volumes with Block mode (see the sketch below).
4. Configure the monitoring stack with a local persistent volume claim.
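
For step 3, the LocalVolume resource would look roughly like this (a sketch inferred from the PV and StorageClass dumps below; the device path is taken from the PV's local.path):

~~~
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: local-storage
spec:
  storageClassDevices:
  - storageClassName: localblock-sc
    volumeMode: Block                 # raw block devices, no filesystem
    devicePaths:
    - /dev/nvme1n1                    # inferred from /mnt/local-storage/localblock-sc/nvme1n1
~~~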

Actual results:

~~~
$ oc get pods | grep "NAME\|alert"
NAME                                           READY   STATUS              RESTARTS   AGE
alertmanager-main-0                            0/3     ContainerCreating   0          6h37m
alertmanager-main-1                            0/3     ContainerCreating   0          6h37m
alertmanager-main-2                            0/3     ContainerCreating   0          6h37m
~~~

Expected results:

~~~
$ oc get pods | grep "NAME\|alert"
NAME                                           READY   STATUS              RESTARTS   AGE
alertmanager-main-0                            3/3     Running             0          6h37m
alertmanager-main-1                            3/3     Running             0          6h37m
alertmanager-main-2                            3/3     Running             0          6h37m
~~~

Master Log:

*Please specify which logs from the RHCOS master nodes are needed here.

Node Log (of failed PODs):

*Please specify which logs from the RHCOS worker nodes are needed here.

PV Dump:

~~~
$ oc get pv local-pv-2f0d4757 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-ip-10-0-163-177.eu-west-3.compute.internal-f7d1c8d4-d9f8-11e9-adec-0acbfff3d3f0
  creationTimestamp: "2019-12-27T11:25:49Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: local-disks
    storage.openshift.com/local-volume-owner-namespace: local-storage
  name: local-pv-2f0d4757
  resourceVersion: "56726421"
  selfLink: /api/v1/persistentvolumes/local-pv-2f0d4757
  uid: a2218c51-289b-11ea-948a-0e7f869d2710
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: localblock-pvc-alertmanager-main-0
    namespace: openshift-monitoring
    resourceVersion: "56726411"
    uid: d8f997b6-289d-11ea-9a78-060eee984c30
  local:
    path: /mnt/local-storage/localblock-sc/nvme1n1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-0-163-177
  persistentVolumeReclaimPolicy: Delete
  storageClassName: localblock-sc
  volumeMode: Block
status:
  phase: Bound
~~~

PVC Dump:

~~~
$ oc get pvc localblock-pvc-alertmanager-main-0 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2019-12-27T11:41:40Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alertmanager: main
    app: alertmanager
  name: localblock-pvc-alertmanager-main-0
  namespace: openshift-monitoring
  resourceVersion: "56726423"
  selfLink: /api/v1/namespaces/openshift-monitoring/persistentvolumeclaims/localblock-pvc-alertmanager-main-0
  uid: d8f997b6-289d-11ea-9a78-060eee984c30
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi
  storageClassName: localblock-sc
  volumeMode: Block
  volumeName: local-pv-2f0d4757
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
~~~

StorageClass Dump (if StorageClass used by PV/PVC):

~~~
$ oc get sc localblock-sc -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2019-12-24T16:17:49Z"
  labels:
    local.storage.openshift.io/owner-name: local-disks
    local.storage.openshift.io/owner-namespace: local-storage
  name: localblock-sc
  ownerReferences:
  - apiVersion: local.storage.openshift.io/v1
    controller: true
    kind: LocalVolume
    name: local-disks
    uid: edbff4a6-2668-11ea-948a-0e7f869d2710
  resourceVersion: "56172723"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/localblock-sc
  uid: edcf8608-2668-11ea-948a-0e7f869d2710
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
~~~

Additional info:

~~~
$ oc get pod alertmanager-main-0 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: "2019-12-27T11:41:40Z"
  generateName: alertmanager-main-
  labels:
    alertmanager: main
    app: alertmanager
    controller-revision-hash: alertmanager-main-585f84bd59
    statefulset.kubernetes.io/pod-name: alertmanager-main-0
  name: alertmanager-main-0
  namespace: openshift-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: alertmanager-main
    uid: d8ed9523-289d-11ea-948a-0e7f869d2710
  resourceVersion: "56726445"
  selfLink: /api/v1/namespaces/openshift-monitoring/pods/alertmanager-main-0
  uid: d8fc8cca-289d-11ea-9a78-060eee984c30
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: alertmanager
              operator: In
              values:
              - main
          namespaces:
          - openshift-monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - --config.file=/etc/alertmanager/config/alertmanager.yaml
    - --cluster.listen-address=[$(POD_IP)]:9094
    - --storage.path=/alertmanager
    - --data.retention=120h
    - --web.listen-address=127.0.0.1:9093
    - --web.external-url=https://alertmanager-main-openshift-monitoring.apps.pamoedom.ocp4.xyz/
    - --web.route-prefix=/
    - --cluster.peer=alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc:9094
    - --cluster.peer=alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc:9094
    - --cluster.peer=alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc:9094
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a970d3eb87e42d7ae72b225f9607eb8a671a058d5bd66b61301264e48c698b4
    imagePullPolicy: IfNotPresent
    name: alertmanager
    ports:
    - containerPort: 9094
      name: mesh-tcp
      protocol: TCP
    - containerPort: 9094
      name: mesh-udp
      protocol: UDP
    resources:
      requests:
        memory: 200Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000370000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/alertmanager/config
      name: config-volume
    - mountPath: /alertmanager
      name: localblock-pvc
      subPath: alertmanager-db
    - mountPath: /etc/alertmanager/secrets/alertmanager-main-tls
      name: secret-alertmanager-main-tls
      readOnly: true
    - mountPath: /etc/alertmanager/secrets/alertmanager-main-proxy
      name: secret-alertmanager-main-proxy
      readOnly: true
    - mountPath: /etc/pki/alertmanager-ca-bundle/
      name: alertmanager-trusted-ca-bundle
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
  - args:
    - -webhook-url=http://localhost:9093/-/reload
    - -volume-dir=/etc/alertmanager/config
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28978e532ef46c9d4edeabfb73a1e8561493d6681b940793cd2a385f9f70f27f
    imagePullPolicy: IfNotPresent
    name: config-reloader
    resources:
      limits:
        cpu: 100m
        memory: 25Mi
      requests:
        cpu: 100m
        memory: 25Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000370000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/alertmanager/config
      name: config-volume
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
  - args:
    - -provider=openshift
    - -https-address=:9095
    - -http-address=
    - -email-domain=*
    - -upstream=http://localhost:9093
    - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
    - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
    - -tls-cert=/etc/tls/private/tls.crt
    - -tls-key=/etc/tls/private/tls.key
    - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
    - -cookie-secret-file=/etc/proxy/secrets/session_secret
    - -openshift-service-account=alertmanager-main
    - -openshift-ca=/etc/pki/tls/cert.pem
    - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - -skip-auth-regex=^/metrics
    env:
    - name: HTTP_PROXY
    - name: HTTPS_PROXY
    - name: NO_PROXY
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86e282e319f97bab460e8afd3dc3033448942f3418f7d6259d4c2ddae0a65923
    imagePullPolicy: IfNotPresent
    name: alertmanager-proxy
    ports:
    - containerPort: 9095
      name: web
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000370000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-alertmanager-main-tls
    - mountPath: /etc/proxy/secrets
      name: secret-alertmanager-main-proxy
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: alertmanager-main-token-p4qzr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: alertmanager-main-0
  imagePullSecrets:
  - name: alertmanager-main-dockercfg-m955v
  nodeName: ip-10-0-163-177.eu-west-3.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000370000
    seLinuxOptions:
      level: s0:c19,c14
  serviceAccount: alertmanager-main
  serviceAccountName: alertmanager-main
  subdomain: alertmanager-operated
  terminationGracePeriodSeconds: 120
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: localblock-pvc
    persistentVolumeClaim:
      claimName: localblock-pvc-alertmanager-main-0
  - name: config-volume
    secret:
      defaultMode: 420
      secretName: alertmanager-main
  - name: secret-alertmanager-main-tls
    secret:
      defaultMode: 420
      secretName: alertmanager-main-tls
  - name: secret-alertmanager-main-proxy
    secret:
      defaultMode: 420
      secretName: alertmanager-main-proxy
  - configMap:
      defaultMode: 420
      name: alertmanager-trusted-ca-bundle-9tkc0fdgcg4s6
      optional: true
    name: alertmanager-trusted-ca-bundle
  - name: alertmanager-main-token-p4qzr
    secret:
      defaultMode: 420
      secretName: alertmanager-main-token-p4qzr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-12-27T11:41:41Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a970d3eb87e42d7ae72b225f9607eb8a671a058d5bd66b61301264e48c698b4
    imageID: ""
    lastState: {}
    name: alertmanager
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86e282e319f97bab460e8afd3dc3033448942f3418f7d6259d4c2ddae0a65923
    imageID: ""
    lastState: {}
    name: alertmanager-proxy
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:28978e532ef46c9d4edeabfb73a1e8561493d6681b940793cd2a385f9f70f27f
    imageID: ""
    lastState: {}
    name: config-reloader
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.0.163.177
  phase: Pending
  qosClass: Burstable
  startTime: "2019-12-27T11:41:41Z"
~~~

Comment 4 Pawel Krupa 2020-01-06 09:02:06 UTC
Alertmanager and Prometheus expect a POSIX filesystem with correct permissions for storage; block devices are not supported.

This needs a documentation fix.
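
In other words, the monitoring config should request a filesystem volume, along these lines (a sketch; "localfs-sc" is an illustrative name for a Filesystem-mode storage class):

~~~
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: localfs-pvc           # illustrative claim name
        spec:
          accessModes:
          - ReadWriteOnce
          volumeMode: Filesystem      # Alertmanager needs a POSIX filesystem
          storageClassName: localfs-sc
          resources:
            requests:
              storage: 100Gi
~~~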

Comment 5 Pedro Amoedo 2020-01-07 09:29:37 UTC
(In reply to Pawel Krupa from comment #4)
> Alertmanager and Prometheus expect a POSIX filesystem with correct
> permissions for storage; block devices are not supported.
> 
> This needs a documentation fix.

Thanks for the confirmation @Pawel, that explains the unexpected behavior. I'll raise a DOC BZ to double-check the documentation part ASAP.

Best Regards.

Comment 6 Pedro Amoedo 2020-01-07 11:26:47 UTC
For the record, DOC BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1788502

Comment 8 Pawel Krupa 2020-01-27 10:43:48 UTC
To avoid hinting at any particular type of StorageClass, and to enforce the use of a filesystem volume (as this is the only one we support), I suggest changing:


~~~
spec:
  ...
  storageClassName: gluster-block
  ...
~~~


to:


~~~
spec:
  ...
  storageClassName: fast
  volumeMode: Filesystem
  ...
~~~


This is also how the Kubernetes docs [1] reference storage classes without favoring any particular one.

[1]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes
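
Put together, the documented volumeClaimTemplate would then read something like this (a sketch; the requested size is illustrative):

~~~
volumeClaimTemplate:
  spec:
    storageClassName: fast            # generic placeholder name, as in the Kubernetes docs
    volumeMode: Filesystem
    resources:
      requests:
        storage: 40Gi
~~~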

Comment 12 Junqi Zhao 2020-02-07 01:51:13 UTC
LGTM. Closing it.

