Bug 1709959 - Gluster causing Kubernetes containers to enter crash loop with 'mkdir ... file exists' error message
Summary: Gluster causing Kubernetes containers to enter crash loop with 'mkdir ... file exists' error message
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-14 16:13 UTC by Jeff Bischoff
Modified: 2019-05-29 17:47 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-29 17:47:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Gluster logs (188.86 KB, application/gzip)
2019-05-14 17:36 UTC, Jeff Bischoff

Description Jeff Bischoff 2019-05-14 16:13:37 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jeff Bischoff 2019-05-14 16:20:54 UTC
Description of problem:
Various pods (for example Heketi) in our Kubernetes cluster enter an infinite crash loop. It seems to be an issue with the gluster mounts; the error message always contains "file exists".

Version-Release number of selected component (if applicable):
4.1.7


How reproducible:
No known steps to reproduce, but it happens every few days in multiple environments that we are running.

Steps to Reproduce:
1. Kubernetes environment is healthy, with working gluster mounts
2. Various containers enter a crash loop, with "file exists" error message
   The bricks appear to be offline at this point.

Actual results:
Gluster mounts seem to fail and never recover.

Expected results:
Gluster mounts are stable, or at least automatically remount after a failure.

Additional info:
See comments below.

Comment 2 Jeff Bischoff 2019-05-14 16:24:22 UTC
This is the Kubernetes version from our latest failing environments:

	$  kubectl version
	Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
	Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Here's how the heketi pod looks:


$ kubectl describe pod heketi-7495cdc5fd-xqmxr
Error from server (NotFound): pods "heketi-7495cdc5fd-xqmxr" not found
[turbo@node1 ~]$ kubectl describe pod heketi-7495cdc5fd-xqmxr -n default
Name:               heketi-7495cdc5fd-xqmxr
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               node1/10.10.168.25
Start Time:         Mon, 06 May 2019 02:11:42 +0000
Labels:             glusterfs=heketi-pod
                    heketi=pod
                    pod-template-hash=7495cdc5fd
Annotations:        <none>
Status:             Running
IP:                 10.233.90.85
Controlled By:      ReplicaSet/heketi-7495cdc5fd
Containers:
  heketi:
    Container ID:   docker://fed61190bf01d149027f187e49a8428e0654fc347de9a9164665f40247c543b3
    Image:          heketi/heketi:dev
    Image ID:       docker-pullable://heketi/heketi@sha256:bcbf709fd084793e4ff0379f08ca44f71154c270d3a74df2bd146472e2d28402
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists
      Exit Code:    128
      Started:      Tue, 14 May 2019 14:34:55 +0000
      Finished:     Tue, 14 May 2019 14:34:55 +0000
    Ready:          False
    Restart Count:  1735
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      HEKETI_USER_KEY:                 
      HEKETI_ADMIN_KEY:                
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      HEKETI_KUBE_GLUSTER_DAEMONSET:   y
      HEKETI_IGNORE_STALE_OPERATIONS:  true
    Mounts:
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-service-account-token-ntfx2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  db:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-storage-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-config-secret
    Optional:    false
  heketi-service-account-token-ntfx2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-service-account-token-ntfx2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                     From            Message
  ----     ------   ----                    ----            -------
  Warning  BackOff  3m36s (x40124 over 6d)  kubelet, node1  Back-off restarting failed container
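
The "mkdir ... file exists" failure above points at the kubelet's gluster volume directory on the node. A minimal, hedged node-side check of whether that directory is backed by a stale FUSE mount (the <POD_UID> placeholder is hypothetical; the real path appears in the container's Message above):

```
# list the gluster mounts kubelet has set up for pods on this node
mount | grep 'kubernetes.io~glusterfs'

# a dangling FUSE mount usually fails stat with "Transport endpoint is not connected"
stat /var/lib/kubelet/pods/<POD_UID>/volumes/kubernetes.io~glusterfs/db
```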

I'm not at all familiar with gluster brick logs, but looking at them it appears that some health checks failed and the bricks were shut down?

```
[2019-05-08 13:48:33.642896] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_034310c050aa134b254316068472b4cc/brick/.glusterfs/health_check returned [Resource temporarily unavailable]
[2019-05-08 13:48:33.748515] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: health-check failed, going down
[2019-05-08 13:48:33.999892] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: still alive! -> SIGTERM
[2019-05-08 13:49:04.598861] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f2a27df4dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x562568920d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x562568920b8b] ) 0-: received signum (15), shutting down
```
...and...
```
[2019-05-06 03:34:39.698647] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: disconnecting connection from node1-21644-2019/05/06-02:17:50:364351-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-06 03:34:39.698956] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: Shutting down connection node1-21644-2019/05/06-02:17:50:364351-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-06 03:34:54.929155] I [addr.c:55:compare_addr_and_update] 0-/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_1adb1f9dad96381614efc30fe22943a7/brick: allowed = "*", received addr = "10.10.168.25"
[2019-05-06 03:34:54.929223] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 57cda2e6-f071-4ec4-b1a5-04f43f91a204
[2019-05-06 03:34:54.929253] I [MSGID: 115029] [server-handshake.c:495:server_setvolume] 0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: accepted client from node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0 (version: 3.12.2)
[2019-05-07 11:50:30.502074] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: disconnecting connection from node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-07 11:50:30.524408] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: Shutting down connection node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-07 11:54:45.456189] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f684d7ccdd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556c12dead65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556c12deab8b] ) 0-: received signum (15), shutting down
```


Full gluster logs here: 
[glusterfs.zip](https://github.com/heketi/heketi/files/3178441/glusterfs.zip)

I tried to get the heketi container logs, but it appears they don't exist:


$ kubectl logs -n default heketi-7495cdc5fd-xqmxr -p
failed to open log file "/var/log/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/heketi/1741.log": open /var/log/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/heketi/1741.log: no such file or directory
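
If the kubelet's pod log file is already gone, the container runtime on the node may still hold the previous container's output. A hedged sketch using the container ID reported by `kubectl describe pod` above (those logs may also have been garbage-collected already):

```
# run on the node hosting the pod; container ID comes from the describe output
docker logs fed61190bf01d149027f187e49a8428e0654fc347de9a9164665f40247c543b3
```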


Gluster seems to indicate that all of my bricks are offline:

[root@node1 /]# gluster
gluster> volume list
heketidbstorage
vol_050f6767658bceaed3e4c58693f3220e
vol_0f8c60645f1014a72b9999036d6244e2
vol_27ef5ea360e90f459d56082bc2b7be9f
vol_59090c2fd20479d553a5baa153d3fcbd
vol_673aef3de9147eaede26b7169ebf5f6e
vol_6848649bb5d29d60985d4d59380caafe
vol_744c23296132470b8639599b837ae671
vol_76c6f946e64d2150f99503953127c647
vol_84337b6825c0eb3d7a0e6008b65dd757
vol_9e6cad52d8a8e2e7f8febe2709ef253a
vol_a720850474f6ce7ae6c57dcc60284b1f
vol_c98a28cd587883dc2882c00695b02d52
vol_ced2ac693a19d4ae53af897eaf13bd86
vol_dcece16823bead8503333ef11c022775
vol_e6cbcf7bcb912d6c9725f3390f96b4b3
vol_eaa27e3100f78bff42ff337f163fee0f
vol_ff040fc48f8bd16727423b59ac7244c6
gluster> volume status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a1
6f9f0374fe5db948a60a017a3f5e60/brick        N/A       N/A        N       N/A  
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_050f6767658bceaed3e4c58693f3220e
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a6
d6af28e7525bbe3563948f4f9455bd/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_050f6767658bceaed3e4c58693f3220e
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_0f8c60645f1014a72b9999036d6244e2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_ed
81730bd5d36a151cf5163f379474b4/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_0f8c60645f1014a72b9999036d6244e2
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_27ef5ea360e90f459d56082bc2b7be9f
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_76
14e5014a0e402630a0e1fd776acf0a/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_27ef5ea360e90f459d56082bc2b7be9f
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_59090c2fd20479d553a5baa153d3fcbd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_93
e6fdd290a8e963d927de4a1115d17e/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_59090c2fd20479d553a5baa153d3fcbd
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_673aef3de9147eaede26b7169ebf5f6e
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_0e
d4f7f941de388cda678fe273e9ceb4/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_673aef3de9147eaede26b7169ebf5f6e
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_6848649bb5d29d60985d4d59380caafe
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_5f
8b153d183d154b789425f5f5c8f912/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_6848649bb5d29d60985d4d59380caafe
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_744c23296132470b8639599b837ae671
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_c1
1ac2780871f7d759a3da1c27e01941/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_744c23296132470b8639599b837ae671
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_76c6f946e64d2150f99503953127c647
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_03
ef27d7e6834e4c7519a8db19369742/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_76c6f946e64d2150f99503953127c647
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_84337b6825c0eb3d7a0e6008b65dd757
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a3
cef78a5914a2808da0b5736e3daec7/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_84337b6825c0eb3d7a0e6008b65dd757
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_9e6cad52d8a8e2e7f8febe2709ef253a
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_29
88103500386566a0ef4dd3fa69e429/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_9e6cad52d8a8e2e7f8febe2709ef253a
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_a720850474f6ce7ae6c57dcc60284b1f
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_03
4310c050aa134b254316068472b4cc/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_a720850474f6ce7ae6c57dcc60284b1f
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_c98a28cd587883dc2882c00695b02d52
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_38
41cba307728c0bd2a66a1429160112/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_c98a28cd587883dc2882c00695b02d52
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_ced2ac693a19d4ae53af897eaf13bd86
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_c6
4cb733906d43c5101044898eac8a35/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_ced2ac693a19d4ae53af897eaf13bd86
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_dcece16823bead8503333ef11c022775
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_76
bd80272c57164663bec3b1c9750366/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_dcece16823bead8503333ef11c022775
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_e6cbcf7bcb912d6c9725f3390f96b4b3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_63
ec19a814ece3152021772f71ddbd92/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_e6cbcf7bcb912d6c9725f3390f96b4b3
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_eaa27e3100f78bff42ff337f163fee0f
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_38
7ecde606556b9d25487167b02e1e6b/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_eaa27e3100f78bff42ff337f163fee0f
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_ff040fc48f8bd16727423b59ac7244c6
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_21
2c67914837cef8a927922ee63c7ee7/brick        N/A       N/A        N       N/A  
 
Task Status of Volume vol_ff040fc48f8bd16727423b59ac7244c6
------------------------------------------------------------------------------
There are no active volume tasks
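
For bricks reported offline like this, a force start of the affected volume is one way to respawn the brick processes, assuming the underlying storage is healthy again. A minimal sketch with a placeholder volume name:

```
# respawn any brick processes that are not running for this volume
gluster volume start <VOLNAME> force

# verify the brick now shows Online "Y"
gluster volume status <VOLNAME>
```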


My heketi volume info:

gluster> volume info heketidbstorage
 
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick
Options Reconfigured:
user.heketi.id: 1d2400626dac780fce12e45a07494853
transport.address-family: inet
nfs.disable: on


Our gluster settings/volume options:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-heketi
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi
parameters:
  gidMax: "50000"
  gidMin: "2000"
  resturl: http://10.233.35.158:8080
  restuser: "null"
  restuserkey: "null"
  volumetype: "none"
  volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads off, performance.open-behind off, performance.readdir-ahead off, performance.read-ahead off, performance.stat-prefetch off, performance.write-behind off, performance.io-cache off, cluster.consistent-metadata on, performance.quick-read off, performance.strict-o-direct on
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete

Comment 3 Jeff Bischoff 2019-05-14 17:36:11 UTC
Created attachment 1568579 [details]
Gluster logs

Comment 4 Jeff Bischoff 2019-05-14 19:08:56 UTC
Looking at the glusterd.log, it seems like everything was running for over a day with no log messages, when suddenly we hit this:

    got disconnect from stale rpc on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick

Here's the context for that snippet. The lines from 05/06 were during brick startup, while the lines from 05/07 are when the problem started.

====
    [2019-05-06 02:18:00.292652] I [glusterd-utils.c:6090:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick
    The message "W [MSGID: 101095] [xlator.c:181:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/4.1.7/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 12 times between [2019-05-06 02:17:49.214270] and [2019-05-06 02:17:59.537241]
    [2019-05-06 02:18:00.474120] I [MSGID: 106142] [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick on port 49169
    [2019-05-06 02:18:00.477708] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
    [2019-05-06 02:18:00.507596] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
    [2019-05-06 02:18:00.507662] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped
    [2019-05-06 02:18:00.507682] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
    [2019-05-06 02:18:00.511313] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
    [2019-05-06 02:18:00.511386] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped
    [2019-05-06 02:18:00.513396] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
    [2019-05-06 02:18:00.513503] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped
    [2019-05-06 02:18:00.534304] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a) [0x7f795f17fc9a] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765) [0x7f795f17f765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
    [2019-05-06 02:18:00.582971] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a) [0x7f795f17fc9a] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3) [0x7f795f17f6c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
    The message "W [MSGID: 101095] [xlator.c:452:xlator_dynload] 0-xlator: /usr/lib64/glusterfs/4.1.7/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 76 times between [2019-05-06 02:16:52.212662] and [2019-05-06 02:17:58.606533]
    [2019-05-07 11:53:38.663362] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0x3a7a5) [0x7f795f0d77a5] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765) [0x7f795f17f765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --last=no
    [2019-05-07 11:53:38.905338] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0x3a7a5) [0x7f795f0d77a5] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3) [0x7f795f17f6c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --last=no
    [2019-05-07 11:53:38.982785] I [MSGID: 106542] [glusterd-utils.c:8253:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 8951
    [2019-05-07 11:53:39.983244] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick on port 49169
    [2019-05-07 11:53:39.984656] W [glusterd-handler.c:6124:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick
    [2019-05-07 11:53:40.316466] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
    [2019-05-07 11:53:40.316601] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped
    [2019-05-07 11:53:40.316644] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
    [2019-05-07 11:53:40.319650] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
    [2019-05-07 11:53:40.319708] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped
    [2019-05-07 11:53:40.321091] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
    [2019-05-07 11:53:40.321132] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped
====

What would cause it to go stale? What is actually going stale here? Where should I look next? I am using whatever is built into the gluster-centos:latest image from Docker Hub.

Comment 6 Jeff Bischoff 2019-05-14 19:40:08 UTC
This is the version of Gluster that I am using on the gluster pod:

# rpm -qa | grep gluster
glusterfs-rdma-4.1.7-1.el7.x86_64
gluster-block-0.3-2.el7.x86_64
python2-gluster-4.1.7-1.el7.x86_64
centos-release-gluster41-1.0-3.el7.centos.noarch
glusterfs-4.1.7-1.el7.x86_64
glusterfs-api-4.1.7-1.el7.x86_64
glusterfs-cli-4.1.7-1.el7.x86_64
glusterfs-geo-replication-4.1.7-1.el7.x86_64
glusterfs-libs-4.1.7-1.el7.x86_64
glusterfs-client-xlators-4.1.7-1.el7.x86_64
glusterfs-fuse-4.1.7-1.el7.x86_64
glusterfs-server-4.1.7-1.el7.x86_64



This is the version of gluster running on the Kubernetes node:

$ rpm -qa | grep gluster
glusterfs-libs-3.12.2-18.el7.x86_64
glusterfs-3.12.2-18.el7.x86_64
glusterfs-fuse-3.12.2-18.el7.x86_64
glusterfs-client-xlators-3.12.2-18.el7.x86_64

Comment 7 Jeff Bischoff 2019-05-16 19:21:02 UTC
Update: Looking at the logs chronologically, I first see a failure in the brick and then a few seconds later the volume shuts down (we have only one brick per volume):

From the Brick log
------------------
[2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable]
[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down
[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM
[2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down


From the GlusterD log
---------------------
[2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152
[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)


This same pattern repeats for all the bricks/volumes. Most of them go offline within a second of the first one. The stragglers go offline within the next 30 minutes.

My interpretation of these logs is that the socket gluster is using times out. Do I need to increase 'network.ping-timeout' or 'client.grace-timeout' to address this?

What really boggles me is why the brick stays offline after the timeout. After all, it is only a "temporarily" unavailable resource. Shouldn't Gluster be able to recover from this error without user intervention?
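
For reference, a hedged sketch of how those timeout options could be inspected and changed with the gluster CLI (placeholder volume name; whether raising them actually helps here is exactly the open question):

```
# show the current settings, including timeout-related options, for a volume
gluster volume get <VOLNAME> all | grep -i timeout

# raise the client ping timeout (the default is 42 seconds)
gluster volume set <VOLNAME> network.ping-timeout 60
```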

Comment 8 Jeff Bischoff 2019-05-16 20:30:17 UTC
In my last comment, I asked: "...why the brick stays offline after the timeout. After all, it is only a "temporarily" unavailable resource. Shouldn't Gluster be able to recover from this error without user intervention?"

To answer my own question: this appears to be a feature, not a bug according to https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/.

"When a brick process detects that the underlaying storage is not responding anymore, the process will exit. There is no automated way that the brick process gets restarted, the sysadmin will need to fix the problem with the storage first."

It's good to at least understand why it isn't coming back up. However, it seems strange to me that Gluster would choose to stop and stay off like this in the face of an apparently transient issue. What is the best approach to remedy this? Should I increase the timeouts... or even disable the health checker?

Comment 9 Ravishankar N 2019-05-24 15:26:49 UTC
Adding folks who have worked on posix health-checker and containers to see if there is something that needs fixing in gluster.

Comment 10 Mohit Agrawal 2019-05-24 16:05:48 UTC
Hi,

As per the logs, aio_write() failed on the underlying file system, so the brick process assumes the underlying file system is not healthy and kills itself.
A brick process kills itself only when every brick runs as an independent process,
which means brick multiplexing is not enabled; otherwise the brick process sends a detach event to the respective brick so that only that specific brick is detached. For container environments, we do recommend enabling brick multiplexing to attach multiple bricks to a single brick process.

There are two ways to avoid it (a sketch of both follows the list):
1) Enable brick multiplexing,
    or
2) Disable the health-check thread (to disable it, configure health-check-interval to 0).
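
A minimal sketch of both options with the gluster CLI (placeholder volume name; brick multiplexing is a cluster-wide setting):

```
# Option 1: enable brick multiplexing for all volumes
gluster volume set all cluster.brick-multiplex on

# Option 2: disable the posix health-check thread on a volume
# (an interval of 0 turns off the periodic aio_write() probe)
gluster volume set <VOLNAME> storage.health-check-interval 0
```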


Thanks,
Mohit Agrawal

Comment 11 Jeff Bischoff 2019-05-24 16:56:13 UTC
Mohit, thanks for your feedback!

We have been testing solution #2, and so far so good. We haven't rolled that change out to all our environments yet, but we haven't had an incident yet on the environments that do have the health checks disabled.

Regarding solution #1, we appreciate the suggestion and will try out multiplexing.

Just to confirm, what would the expected behavior be if we had multiplexing turned on and then ran into the same aio_write failure? You say the process will send a detach event to the specific brick. What happens after that? Will the brick eventually be automatically reattached? Or will it still require admin intervention, even with multiplexing?

I'm assuming it's not wise to choose both solutions #1 and #2? Better to just pick one?

Comment 12 Mohit Agrawal 2019-05-25 02:00:47 UTC
After enabling multiplexing, the health-check thread will send a detach request for the specific brick, but the brick will not be reattached automatically.

Yes, you are right: admin intervention is required to attach the brick again; glusterd does not attach the brick automatically unless the user starts the volume.

To avoid this you don't need to choose both options, but there are several other benefits to enabling brick multiplexing. To know more, please follow this link:

https://gluster.home.blog/2019/05/06/why-brick-multiplexing/

If you just want to avoid this situation, you can use the 2nd option.

Thanks,
Mohit Agrawal

Comment 13 Jeff Bischoff 2019-05-29 14:53:30 UTC
Thanks for the feedback, Mohit. We will certainly try out those options.

So, in your opinion is this functionality working as designed? Should I close this bug? Or is there something in what I described that still bears investigation?

Comment 14 Mohit Agrawal 2019-05-29 15:26:51 UTC
Yes, please close this bug.

Comment 15 Jeff Bischoff 2019-05-29 17:47:38 UTC
Closed with workaround provided (disable health checks)

