Description of problem:
Create a glusterfs PV with invalid endpoints. When the Pod fails to mount, there are no useful messages describing how/why it failed.

Version-Release number of selected component (if applicable):
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Create an invalid endpoints object, then create a glusterfs PV that references it (see the reproduction sketch below)
2. Create a PVC and a Pod
3. The Pod fails to mount; run 'oc describe pod'

Actual results:
No useful message except:
```
2m 2m 1 default-scheduler Normal Scheduled Successfully assigned gluster to ip-172-18-7-39.ec2.internal
8s 8s 1 kubelet, ip-172-18-7-39.ec2.internal Warning FailedMount Unable to mount volumes for pod "gluster_bndck(11cab1ac-5cb8-11e7-af12-0e8aebc0d9ea)": timeout expired waiting for volumes to attach/mount for pod "bndck"/"gluster". list of unattached/unmounted volumes=[gluster]
8s 8s 1 kubelet, ip-172-18-7-39.ec2.internal Warning FailedSync Error syncing pod
```

Expected results:
In earlier versions 'oc describe pod' reported "glusterfs: mount failed", from which we could find out how/why the mount failed.

Master Log:

Node Log (of failed PODs):

Endpoints Dump:
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: null
    name: glusterfs-cluster
  subsets:
  - addresses:
    - ip: 11.11.11.114
    - ip: 11.11.11.118
    ports:
    - port: 1
      protocol: TCP
kind: List
metadata: {}

PV Dump:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: null
  name: gluster-bndck
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: glusterc
    namespace: bndck
    resourceVersion: "14597"
    uid: 0f51dfe7-5cb8-11e7-af12-0e8aebc0d9ea
  glusterfs:
    endpoints: glusterfs-cluster
    path: testvol
  persistentVolumeReclaimPolicy: Retain
status: {}

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
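For reference, a minimal reproduction sketch based on the dumps above. The commands and the PVC/Pod manifests are illustrative assumptions: the file names, the busybox image and the sleep command are mine, only the object names and the 5Gi size come from the dumps, and the claimRef shown in the PV dump (added by the controller) would normally be omitted when creating the PV by hand.
```
# assumes the Endpoints and PV manifests above are saved as endpoints.yaml and pv.yaml
oc new-project bndck
oc create -f endpoints.yaml
oc create -f pv.yaml

# hypothetical PVC matching the PV's claimRef (name "glusterc", 5Gi, RWO)
cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF

# hypothetical pod using the claim; the image and command are placeholders
cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gluster
spec:
  containers:
  - name: gluster
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: gluster
      mountPath: /mnt/gluster
  volumes:
  - name: gluster
    persistentVolumeClaim:
      claimName: glusterc
EOF

# the pod stays in ContainerCreating; inspect the (missing) mount error
oc describe pod gluster
```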
Jianwei, how exactly is the endpoint wrong? I created an endpoint with an invalid IP and an otherwise correct pv + pvc + pod, and "oc describe pod" is very verbose about the bad IP address:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
8s 8s 1 default-scheduler Normal Scheduled Successfully assigned testpod to host-8-174-8.host.centralci.eng.rdu2.redhat.com
2s 2s 1 kubelet, host-8-174-8.host.centralci.eng.rdu2.redhat.com Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/glusterfs/f81b5ab0-60be-11e7-842b-fa163ef34b22-gluster-pv" (spec.Name: "gluster-pv") pod "f81b5ab0-60be-11e7-842b-fa163ef34b22" (UID: "f81b5ab0-60be-11e7-842b-fa163ef34b22") with: glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 172.17.0.3:test_vol /var/lib/origin/openshift.local.volumes/pods/f81b5ab0-60be-11e7-842b-fa163ef34b22/volumes/kubernetes.io~glusterfs/gluster-pv glusterfs [log-level=ERROR log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/gluster-pv/testpod-glusterfs.log]
Output: Mount failed. Please check the log file for more details.
the following error information was pulled from the glusterfs log to help diagnose this issue:
[2017-07-04 13:45:04.424365] E [socket.c:2395:socket_connect_finish] 0-glusterfs: connection to 172.17.0.3:24007 failed (No route to host)
[2017-07-04 13:45:04.424468] E [glusterfsd-mgmt.c:1907:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 172.17.0.3 (Transport endpoint is not connected)

It's probably too much information, but at least there is "connection to 172.17.0.3:24007 failed (No route to host)" somewhere.

oc v3.6.133
kubernetes v1.6.1+5115d708d7
@jsafrane I've tried it again, and when an invalid IP in the endpoint is given, I do get the logs:

Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
8m 8m 1 default-scheduler Normal Scheduled Successfully assigned gluster to node1.vsphere.local
4m 4m 1 kubelet, node1.vsphere.local Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/glusterfs/ff912ad5-6159-11e7-99b2-0050569f4dcf-gluster" (spec.Name: "gluster") pod "ff912ad5-6159-11e7-99b2-0050569f4dcf" (UID: "ff912ad5-6159-11e7-99b2-0050569f4dcf") with: glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 11.66.147.137:testvol /var/lib/origin/openshift.local.volumes/pods/ff912ad5-6159-11e7-99b2-0050569f4dcf/volumes/kubernetes.io~glusterfs/gluster glusterfs [log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/gluster/gluster-glusterfs.log log-level=ERROR]
Output: Mount failed. Please check the log file for more details.
the following error information was pulled from the glusterfs log to help diagnose this issue:
[2017-07-05 08:18:57.460446] E [socket.c:2395:socket_connect_finish] 0-glusterfs: connection to 11.66.147.137:24007 failed (Connection timed out)
[2017-07-05 08:18:57.460625] E [glusterfsd-mgmt.c:1907:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 11.66.147.137 (Transport endpoint is not connected)
6m 1m 3 kubelet, node1.vsphere.local Warning FailedMount Unable to mount volumes for pod "gluster_jhou(ff912ad5-6159-11e7-99b2-0050569f4dcf)": timeout expired waiting for volumes to attach/mount for pod "jhou"/"gluster". list of unattached/unmounted volumes=[gluster]
6m 1m 3 kubelet, node1.vsphere.local Warning FailedSync Error syncing pod

But with this endpoints object (two invalid IPs), the issue was reproduced:

oc export endpoints
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: null
    name: glusterfs-cluster
  subsets:
  - addresses:
    - ip: 11.11.11.114
    - ip: 11.11.11.118
    ports:
    - port: 1
      protocol: TCP
kind: List
metadata: {}

Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 default-scheduler Normal Scheduled Successfully assigned gluster to node1.vsphere.local
45s 45s 1 kubelet, node1.vsphere.local Warning FailedMount Unable to mount volumes for pod "gluster_jhou(ec7f7a87-615b-11e7-99b2-0050569f4dcf)": timeout expired waiting for volumes to attach/mount for pod "jhou"/"gluster". list of unattached/unmounted volumes=[gluster]
45s 45s 1 kubelet, node1.vsphere.local Warning FailedSync Error syncing pod
I see, I can reproduce the bug with two IP addresses that drop IP packets and don't send "connection refused". The reason is that kubelet tries to mount the filesystem several times (a sketch of the resulting mount commands follows this comment):

- 11.11.11.114:/volume
- 11.11.11.114:/volume again, now without auto_unmount
- 11.11.11.118:/volume
- 11.11.11.118:/volume again, now without auto_unmount

Each of these attempts takes 2 minutes to time out, so together it takes 8 minutes for kubelet to realize that something is wrong. Indeed, after 8 minutes it sends the usual event:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
9m 9m 1 default-scheduler Normal Scheduled Successfully assigned glusterfs to localhost.localdomain
7m 1m 4 kubelet, localhost.localdomain Warning FailedMount Unable to mount volumes for pod "glusterfs_default(4221407c-6568-11e7-862d-5254000ac840)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs". list of unattached/unmounted volumes=[glusterfsvol]
7m 1m 4 kubelet, localhost.localdomain Warning FailedSync Error syncing pod
1m 1m 1 kubelet, localhost.localdomain Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/glusterfs/4221407c-6568-11e7-862d-5254000ac840-glusterfsvol" (spec.Name: "glusterfsvol") pod "4221407c-6568-11e7-862d-5254000ac840" (UID: "4221407c-6568-11e7-862d-5254000ac840") with: glusterfs: mount failed: mount failed: exit status 1 ... glusterfs: connection to 11.11.11.118:24007 failed (Connection timed out); disconnecting socket

As an interim fix, blindly retrying the same mount without auto_unmount is useless if we know that auto_unmount was not the culprit. Still, for two IP addresses it will take 2x2 minutes to time out, and for 5 IP addresses it will take 10 minutes... It does not scale well.

@Humble, is there a possibility to reduce the initial timeout to, say, 30 seconds? We would get feedback in one minute for two servers and 2.5 minutes for five servers - long, but acceptable.
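For illustration only, the four attempts above correspond roughly to the following mount commands. This is a sketch, not the exact command kubelet builds: $TARGET stands for the pod's volume directory under /var/lib/origin/openshift.local.volumes/ and the log-file option is omitted; the real arguments are visible in the FailedMount events.
```
# Each command blocks until the TCP connection to <ip>:24007 times out
# (~2 minutes), so the four attempts add up to roughly 8 minutes.
mount -t glusterfs -o auto_unmount,log-level=ERROR 11.11.11.114:/volume "$TARGET"
mount -t glusterfs -o log-level=ERROR              11.11.11.114:/volume "$TARGET"   # retried without auto_unmount
mount -t glusterfs -o auto_unmount,log-level=ERROR 11.11.11.118:/volume "$TARGET"
mount -t glusterfs -o log-level=ERROR              11.11.11.118:/volume "$TARGET"   # retried without auto_unmount
```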
(In reply to Jan Safranek from comment #3)
> 
> @Humble, is there a possibility to reduce the initial timeout to say 30
> seconds? We would get feedback in one minute for two servers and 2.5 minutes
> for five servers - long, but acceptable.

Jan, Glusterfs rpc engineers are taking a look at this to confirm whether any extra delay is introduced from the gluster layer in this case. I am setting needinfo accordingly.
(In reply to Humble Chirammal from comment #8)
> (In reply to Jan Safranek from comment #3)
> > 
> > @Humble, is there a possibility to reduce the initial timeout to say 30
> > seconds? We would get feedback in one minute for two servers and 2.5 minutes
> > for five servers - long, but acceptable.
> 
> Jan, Glusterfs rpc engineers are taking a look at this to confirm whether
> any extra delay is introduced from gluster layer in this case. I am setting
> needinfo accordingly.

With a test program connecting to an invalid address, the connect() system call times out after 2m 10s on my F25 laptop. This seems to be the scenario encountered with gluster as well, i.e. that is the default timeout when a client connects to gluster bricks.

To reduce the timeout to the desired 30s, gluster would need to set the O_NONBLOCK flag on the socket fd and then wait in a select() syscall for at most 30s. This needs work at the gluster end.

The timeout is also affected by the transport.socket.ignore-enoent gluster volume option. If it is set to "on", a connect failure is presumed to possibly mean that the bricks are not yet fully functional, and at most 5 connection attempts are made successively before a connection failure is eventually declared.

Other than these factors, you could try tweaking the /proc/sys/net/ipv4/* system-wide TCP tunables for the desired effect (an example follows this comment). Documentation/networking/ip-sysctl.txt in the kernel source tree describes these tunables.
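As one concrete example of the TCP tunables mentioned above (a suggestion on my part, not something gluster or OpenShift configures): the observed ~2m10s connect() timeout matches the kernel default of net.ipv4.tcp_syn_retries=6, and lowering it bounds how long an unanswered SYN is retried. It only helps against hosts that silently drop packets, and it is node-wide.
```
# Default is 6 SYN retransmissions, i.e. connect() to a silently dropping
# host gives up after ~127 seconds. With 4 retransmissions it gives up
# after roughly 30 seconds. Note this affects every outgoing TCP
# connection on the node, not just glusterfs mounts.
sysctl -w net.ipv4.tcp_syn_retries=4
```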
I posted https://github.com/openshift/origin/pull/15199, which removes the retry without auto_unmount when it was not necessary. The event "glusterfsd-mgmt: failed to connect with remote-host: 11.66.147." will appear in half the time. That removes "Regression" from this bug; the timeout is now the same as in previous OpenShift releases.

Faster gluster timeouts must be fixed on the Gluster side; I filed bug #1470986 there.
(In reply to Jan Safranek from comment #10)
> I posted https://github.com/openshift/origin/pull/15199 that removes re-try
> without auto_unmount if it was not necessary.

Thanks Jan! This will be pulled into the OCP 3.6 tree, won't it?

> Event "glusterfsd-mgmt: failed to connect with remote-host: 11.66.147." will
> appear in half of the time. That would remove "Regression" from this bug,
> the timeout would be the same as in previous OpenShift releases.
> 
> Faster gluster timeouts must be fixed on Gluster side, I filled #1470986
> there.

Thanks. I will follow up with the Gluster RPC engineers.
It would also be very useful to pull https://github.com/kubernetes/kubernetes/pull/49127, as it reduces the timeout from 2 minutes * number of endpoints to 2 minutes. And yes, it would be nice to have this in 3.6.x. PRs will follow shortly.
Bumping priority - it's a regression after all.
Cloned into 3.6.x as bug #1473587. From now on, this bug tracks the fix in 3.7.
Tested on v3.7.0-0.127.0.

When the mount failed with two invalid IPs, the following messages were displayed. The FailedMount event appeared 6 minutes after the previous event, which is not like the description in comment 12. @jsafrane, could you take a look?

Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
9m 9m 1 default-scheduler Normal Scheduled Successfully assigned gluster to ocp37.lb.node1.vsphere.local
9m 9m 1 kubelet, ocp37.lb.node1.vsphere.local Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-x5xwm"
3m 3m 1 kubelet, ocp37.lb.node1.vsphere.local Warning FailedMount MountVolume.SetUp failed for volume "gluster" : glusterfs: mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/6e623079-a1b7-11e7-a970-0050569f68e7/volumes/kubernetes.io~glusterfs/gluster --scope -- mount -t glusterfs -o backup-volfile-servers=11.11.11.114:11.11.11.118,log-level=ERROR,log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/gluster/gluster-glusterfs.log 11.11.11.114:testvol /var/lib/origin/openshift.local.volumes/pods/6e623079-a1b7-11e7-a970-0050569f68e7/volumes/kubernetes.io~glusterfs/gluster
Output: Running scope as unit run-14258.scope.
Mount failed. Please check the log file for more details.
the following error information was pulled from the glusterfs log to help diagnose this issue:
[2017-09-25 06:09:04.071411] E [glusterfsd-mgmt.c:1907:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 11.11.11.114 (Transport endpoint is not connected)
[2017-09-25 06:11:13.095419] E [glusterfsd-mgmt.c:1907:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 11.11.11.118 (Transport endpoint is not connected)
7m 51s 4 kubelet, ocp37.lb.node1.vsphere.local Warning FailedMount Unable to mount volumes for pod "gluster_jhou(6e623079-a1b7-11e7-a970-0050569f68e7)": timeout expired waiting for volumes to attach/mount for pod "jhou"/"gluster". list of unattached/unmounted volumes=[gluster]
7m 51s 4 kubelet, ocp37.lb.node1.vsphere.local Warning FailedSync Error syncing pod
The main message here is "Transport endpoint is not connected", which is indeed different from "Connection timed out", but that's a change on the Gluster side. Kubernetes/OpenShift just prints whatever Gluster provides, so on the OpenShift side it's fixed. If you want, you may open a new bug against Gluster to log something better than "Transport endpoint is not connected".
Thank you! This bug is verified according to comment 16
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188