Bug 1652546 - Heketi pod crashed with error "Transport endpoint is not connected" and "All subvolumes are down" on FIPS enabled systems
Summary: Heketi pod crashed with error "Transport endpoint is not connected" and "All subvolumes are down" on FIPS enabled systems
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Niels de Vos
QA Contact: Qixuan Wang
URL: https://www.redhat.com/en/about/press...
Whiteboard:
Depends On: 1459709 1650512
Blocks:
 
Reported: 2018-11-22 10:51 UTC by Qixuan Wang
Modified: 2019-09-23 17:57 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-23 17:57:53 UTC
Embargoed:




Links
System            ID       Private  Priority  Status  Summary  Last Updated
Red Hat Bugzilla  1615752  1        None      None    None     2023-06-16 10:40:49 UTC

Internal Links: 1615752

Comment 2 Niels de Vos 2018-11-22 15:12:44 UTC
From /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_...-brick.log in the glusterfs-storage-* pods:

[2018-11-22 09:01:32.193888] I [MSGID: 115029] [server-handshake.c:564:server_setvolume] 0-heketidbstorage-server: accepted client from cnv-executor-qwang-node2.example.com-4328-2018/11/22-09:01:32:217547-vol_c41a1f522a82424fcbd7df73b20c8369-client-2-0-0 (version: 3.12.2) with subvol /var/lib/heketi/mounts/vg_12699ad6ca15895d52324387c77b83ad/brick_cede07f142076e244057a871bd17be8d/brick
pending frames:
frame : type(0) op(37)
frame : type(0) op(29)
frame : type(0) op(16)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2018-11-22 09:01:32
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f6503715dfd]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f650371fec4]
/lib64/libc.so.6(+0x36280)[0x7f6501d75280]
/lib64/libc.so.6(gsignal+0x37)[0x7f6501d75207]
/lib64/libc.so.6(abort+0x148)[0x7f6501d768f8]
/lib64/libcrypto.so.10(+0x6da8f)[0x7f6502179a8f]
/lib64/libcrypto.so.10(MD5_Init+0x49)[0x7f6502180309]
/lib64/libcrypto.so.10(MD5+0x39)[0x7f6502180349]
/usr/lib64/glusterfs/3.12.2/xlator/storage/posix.so(+0x1d35b)[0x7f64fc56c35b]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/usr/lib64/glusterfs/3.12.2/xlator/features/locks.so(+0xd249)[0x7f64f6a6c249]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum+0xd5)[0x7f6503796405]
/lib64/libglusterfs.so.0(default_rchecksum_resume+0x1e3)[0x7f65037b37e3]
/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f650373a865]
/usr/lib64/glusterfs/3.12.2/xlator/performance/io-threads.so(+0x4f98)[0x7f64f6010f98]
/lib64/libpthread.so.0(+0x7dd5)[0x7f6502574dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f6501e3cead]
---------

This is a crash (abort, signal 6) of the brick process inside libcrypto.so while calling MD5-related functions. OCS is not FIPS-tolerant (yet); only recently did RHGS 3.4 replace non-FIPS-approved hashing algorithms to prevent this kind of crash.

And indeed, fips=1 is set on the kernel commandline:

sh-4.2# cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.el7.x86_64 root=UUID=1243690d-6b5d-4068-8296-f50864f430d7 ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 fips=1
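
As a quick cross-check (a hedged sketch, not part of the original report), the FIPS state and the digest policy can be confirmed from a node shell. The openssl commands run into the same restriction that the brick process hits through the low-level MD5() call, except the command-line tool reports an error instead of aborting:

cat /proc/sys/crypto/fips_enabled    # prints 1 when FIPS mode is enabled, matching fips=1 above
echo test | openssl dgst -md5        # expected to be rejected on a FIPS-enabled host (md5 is not an approved digest)
echo test | openssl dgst -sha256     # expected to succeed (sha256 is FIPS-approved)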

Comment 3 Niels de Vos 2018-11-22 15:37:55 UTC
Also note that the "Heketi pod" crashes because the Gluster volume for the heketi database cannot be mounted. Mounting fails because the bricks on the server side are unavailable (due to the crash described above).
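
A hedged way to confirm that chain from the Gluster side (the pod name below is a placeholder; the glusterfs namespace matches the deployment shown later in this bug):

oc -n glusterfs exec <glusterfs-storage-pod> -- gluster volume status heketidbstorage   # bricks reported offline point at the crashed brick processes
oc -n glusterfs exec <glusterfs-storage-pod> -- ls /var/log/glusterfs/bricks/           # the brick log for the heketidbstorage brick contains the backtrace from comment 2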

Comment 15 Qixuan Wang 2019-06-17 11:38:19 UTC
Hi there,

Any update on this? We hit this issue again when testing CNV 1.4.

Comment 16 Niels de Vos 2019-06-17 12:23:13 UTC
(In reply to Qixuan Wang from comment #15)
> Any update on this? We hit this issue again when testing CNV 1.4.

The last time, this problem was fixed by an update to the container runtime (runc, bug 1650512). Could you verify that you can 'oc exec' into some pods that are unrelated to Heketi/Gluster?

Also, please provide details of the environment (versions of the OCP and OCS components, plus logs).

Thanks!

Comment 17 Qixuan Wang 2019-06-18 03:52:01 UTC
Yes, I can 'oc exec' into some pods that are unrelated to Heketi/Gluster.
[root@cnv-executor-cdn-stage-master-b83726-1 ~]# oc exec -it cdi-apiserver-7596c74489-pfbsw bash
bash-4.2$


OCS images
registry.access.stage.redhat.com/rhgs3/rhgs-server-rhel7                          v3.11.1                        d075bf120994   4 months ago    304 MB
registry.access.stage.redhat.com/rhgs3/rhgs-volmanager-rhel7                      v3.11.1                        f4a8b6113476   4 months ago    287 MB
registry.access.stage.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7              v3.11.1                        e74761279746   4 months ago    971 MB


OCP images
docker.io/openshift/origin-node                                                   v3.11                          304c69ee04c3   3 days ago      1.19 GB
registry.access.redhat.com/openshift3/node                                        v3.11                          be8a09b5514c   3 weeks ago     1.98 GB
registry.access.stage.redhat.com/openshift3/ose-node                              v3.11                          be8a09b5514c   3 weeks ago     1.98 GB
registry.access.stage.redhat.com/openshift3/ose-deployer                          v3.11                          1500740029de   3 weeks ago     1.17 GB
registry.access.stage.redhat.com/openshift3/ose-pod                               v3.11                          6759d8752074   3 weeks ago     1.04 GB
registry.access.stage.redhat.com/openshift3/ose-kube-rbac-proxy                   v3.11                          cdfa9d0da060   3 weeks ago     1.07 GB
registry.access.stage.redhat.com/openshift3/local-storage-provisioner             v3.11                          d6b3fc9da546   3 weeks ago     1.04 GB
registry.access.stage.redhat.com/openshift3/metrics-hawkular-metrics              v3.11                          e10d429e8b10   3 weeks ago     1.68 GB
registry.access.stage.redhat.com/openshift3/metrics-schema-installer              v3.11                          3c29bd72cc69   3 weeks ago     503 MB
registry.access.stage.redhat.com/openshift3/prometheus-node-exporter              v3.11                          0f508556d522   3 weeks ago     1.03 GB


CNV images
registry.access.stage.redhat.com/cnv-tech-preview/kubevirt-web-ui-operator        v1.4.0                         ec1d7c948d17   12 days ago     1.29 GB
registry.access.stage.redhat.com/cnv-tech-preview/kubevirt-web-ui                 v1.4.0                         7fec13c8b83a   2 weeks ago     1.06 GB
registry.access.stage.redhat.com/cnv-tech-preview/virt-handler                    v1.4.0                         9a86bcaca215   3 weeks ago     272 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-controller                 v1.4.0                         2bd18da270ea   3 weeks ago     255 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-api                        v1.4.0                         fdd076773175   3 weeks ago     255 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-launcher                   v1.4.0                         6bb754142817   3 weeks ago     493 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-operator                   v1.4.0                         7e1e32ab9f2f   3 weeks ago     253 MB
registry.access.stage.redhat.com/cnv-tech-preview/cnv-libvirt                     latest                         6ce7e5abc16a   3 weeks ago     426 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-cdi-importer               v1.4.0                         84afc9a1ed07   4 weeks ago     313 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-cdi-controller             v1.4.0                         eaa565b518f8   6 weeks ago     258 MB
registry.access.stage.redhat.com/cnv-tech-preview/multus-cni                      v1.4.0                         dc0ec22bb21a   6 weeks ago     250 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-cdi-apiserver              v1.4.0                         eba41663b815   6 weeks ago     258 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-cdi-uploadproxy            v1.4.0                         bf55d1556bcc   6 weeks ago     258 MB
registry.access.stage.redhat.com/cnv-tech-preview/kubevirt-cpu-model-nfd-plugin   v1.4.0                         824888646787   6 weeks ago     216 MB
registry.access.stage.redhat.com/cnv-tech-preview/kubevirt-cpu-node-labeller      v1.4.0                         4b6f3cae56bc   6 weeks ago     247 MB
registry.access.stage.redhat.com/cnv-tech-preview/virt-cdi-operator               v1.4.0                         4428190b92b5   6 weeks ago     259 MB
registry.access.stage.redhat.com/cnv-tech-preview/ovs-cni-plugin                  v1.4.0                         b9ab2f8bbe05   6 weeks ago     218 MB


[root@cnv-executor-cdn-stage-master-b83726-1 ~]# oc get pod
NAME                                          READY     STATUS              RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-zk755   1/1       Running             1          3d
glusterfs-storage-7c744                       1/1       Running             1          3d
glusterfs-storage-jznzc                       1/1       Running             1          3d
glusterfs-storage-z4w5n                       1/1       Running             1          3d
heketi-storage-1-deploy                       1/1       Running             0          2m
heketi-storage-1-xxk24                        0/1       ContainerCreating   0          2m


[root@cnv-executor-cdn-stage-master-b83726-1 ~]# oc describe pod heketi-storage-1-xxk24
Name:               heketi-storage-1-xxk24
Namespace:          glusterfs
Priority:           0
PriorityClassName:  <none>
Node:               cnv-executor-cdn-stage-master-b83726-1.example.com/172.16.0.26
Start Time:         Mon, 17 Jun 2019 23:46:23 -0400
Labels:             deployment=heketi-storage-1
                    deploymentconfig=heketi-storage
                    glusterfs=heketi-storage-pod
                    heketi=storage-pod
Annotations:        openshift.io/deployment-config.latest-version=1
                    openshift.io/deployment-config.name=heketi-storage
                    openshift.io/deployment.name=heketi-storage-1
                    openshift.io/scc=privileged
Status:             Pending
IP:
Controlled By:      ReplicationController/heketi-storage-1
Containers:
  heketi:
    Container ID:
    Image:          registry.access.stage.redhat.com/rhgs3/rhgs-volmanager-rhel7:v3.11.1
    Image ID:
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      HEKETI_USER_KEY:                 5zziUMvTNCdV42jfNL6u7LEfnKSiejXo1MzUCNxfBhc=
      HEKETI_ADMIN_KEY:                ld/mwMgEiBaonPiEFscaUPb9pTSE62WLYbRdaCdIvDI=
      HEKETI_CLI_USER:                 admin
      HEKETI_CLI_KEY:                  ld/mwMgEiBaonPiEFscaUPb9pTSE62WLYbRdaCdIvDI=
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      HEKETI_KUBE_GLUSTER_DAEMONSET:   1
      HEKETI_IGNORE_STALE_OPERATIONS:  true
      HEKETI_DEBUG_UMOUNT_FAILURES:    true
    Mounts:
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-storage-service-account-token-6mqj9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  db:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-db-storage-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-storage-config-secret
    Optional:    false
  heketi-storage-service-account-token-6mqj9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-storage-service-account-token-6mqj9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason       Age   From                                                         Message
  ----     ------       ----  ----                                                         -------
  Normal   Scheduled    4m    default-scheduler                                            Successfully assigned glusterfs/heketi-storage-1-xxk24 to cnv-executor-cdn-stage-master-b83726-1.example.com
  Warning  FailedMount  4m    kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-109421.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:23.738413] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:23.725457] and [2019-06-18 03:46:23.728806]
  Warning  FailedMount  4m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-109596.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:24.671280] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:24.657693] and [2019-06-18 03:46:24.664253]
  Warning  FailedMount  4m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-109777.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:26.374291] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:26.364894] and [2019-06-18 03:46:26.368786]
  Warning  FailedMount  4m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-109904.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:28.787553] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:28.777526] and [2019-06-18 03:46:28.781571]
  Warning  FailedMount  3m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-110112.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:32.963394] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:32.954223] and [2019-06-18 03:46:32.958289]
  Warning  FailedMount  3m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-110449.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:41.173776] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:41.163604] and [2019-06-18 03:46:41.168343]
  Warning  FailedMount  3m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-110878.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:46:57.571960] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:46:57.562753] and [2019-06-18 03:46:57.566529]
  Warning  FailedMount  3m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o auto_unmount,log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log,backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-111895.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:47:29.787311] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:47:29.775438] and [2019-06-18 03:47:29.781864]
  Warning  FailedMount  2m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  Unable to mount volumes for pod "heketi-storage-1-xxk24_glusterfs(a400fc0f-917b-11e9-a065-fa163e3f7da1)": timeout expired waiting for volumes to attach or mount for pod "glusterfs"/"heketi-storage-1-xxk24". list of unmounted volumes=[db]. list of unattached volumes=[db config heketi-storage-service-account-token-6mqj9]
  Warning  FailedMount  1m  kubelet, cnv-executor-cdn-stage-master-b83726-1.example.com  (combined from similar events): MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o backup-volfile-servers=172.16.0.15:172.16.0.17:172.16.0.26,auto_unmount,log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/db/heketi-storage-1-xxk24-glusterfs.log 172.16.0.15:heketidbstorage /var/lib/origin/openshift.local.volumes/pods/a400fc0f-917b-11e9-a065-fa163e3f7da1/volumes/kubernetes.io~glusterfs/db
Output: Running scope as unit run-113468.scope.
Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue:
[2019-06-18 03:48:33.990581] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-heketidbstorage-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up." repeated 2 times between [2019-06-18 03:48:33.972037] and [2019-06-18 03:48:33.979536]

