Bug 1211220 - quota: ENOTCONN periodically seen in logs when setting hard/soft timeout during I/O.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: quota
Version: mainline
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Vijaikumar Mallikarjuna
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-13 10:50 UTC by Vijaikumar Mallikarjuna
Modified: 2016-06-16 12:49 UTC

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1039674
: 1226789
Environment:
Last Closed: 2016-06-16 12:49:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Vijaikumar Mallikarjuna 2015-04-13 10:50:47 UTC
+++ This bug was initially created as a clone of Bug #1039674 +++

Description of problem:

When running quota automation, I occasionally (roughly 1 in 10 runs?) see the following test case fail:

1. Create a 6x2 volume and start it.
2. gluster volume quota <vol-name> enable
3. gluster volume quota <vol-name> limit-usage / 5GB
4. gluster volume quota <vol-name> list
5. Mount the volume over NFS, glusterfs (FUSE), or SMB: mount -t nfs/glusterfs <server-ip>:<vol-name> <mount-point>
6. Start creating 2MB files inside the mount point until the limit is reached. Meanwhile:
7. gluster volume quota <vol-name> soft-timeout 30s
8. gluster volume quota <vol-name> hard-timeout 60s
9. After data creation has completed, run: gluster volume quota <vol-name> list
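For step 6, the data can be generated by a small program instead of dd, so that ENOTCONN failures are counted instead of aborting the run. This is a hypothetical C sketch: fill_until_quota, write_all, and the test.file.<n> naming are assumptions, not part of the original automation. It also assumes the quota limit surfaces to the application as EDQUOT (or ENOSPC).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE (2 * 1024 * 1024) /* 2MB per file, as in step 6 */

/* Write len bytes, retrying short writes and EINTR.
 * Returns 0 on success, -1 with errno set on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: create dir/test.file.<i> (2MB each) until the quota
 * limit is hit (EDQUOT or ENOSPC), an unexpected error occurs, or max_files
 * files have been attempted. Returns the number of complete files written
 * and stores the number of ENOTCONN failures in *enotconn. */
static int fill_until_quota(const char *dir, int max_files, int *enotconn)
{
    static char buf[FILE_SIZE];
    char path[4096];
    int written = 0;

    memset(buf, 'x', sizeof(buf));
    *enotconn = 0;

    for (int i = 0; i < max_files; i++) {
        snprintf(path, sizeof(path), "%s/test.file.%d", dir, i);

        int err = 0;
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) {
            err = errno;
        } else {
            if (write_all(fd, buf, sizeof(buf)) != 0)
                err = errno;
            if (close(fd) != 0 && err == 0)
                err = errno;   /* NFS may report write errors only at close() */
        }

        if (err == 0) {
            written++;
        } else if (err == ENOTCONN) {
            (*enotconn)++;     /* the failure mode reported in this bug; keep going */
        } else if (err == EDQUOT || err == ENOSPC) {
            break;             /* quota limit reached: stop */
        } else {
            fprintf(stderr, "%s: %s\n", path, strerror(err));
            break;
        }
    }
    return written;
}
```

Call it from a small main() pointed at a directory on the mount, with max_files large enough to exceed the 5GB limit, while steps 7 and 8 are run from another shell. A non-zero ENOTCONN count is the symptom described in this bug.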

On the client side I see:

dd: opening `/quota-mount/tcms_285026/test.file': Transport endpoint is not connected

And in the brick logs I see:

/var/log/glusterfs/bricks/bricks-quota-test-setup_brick2.log:[2013-12-06 17:59:02.743336] W [quota-enforcer-client.c:187:quota_enforcer_lookup_cbk] 0-quota-test-setup-quota: remote operation failed: Transport endpoint is not connected. Path: /tcms_285026 (d892ce24-7e59-4eeb-b86f-7c7d34c71317)
/var/log/glusterfs/bricks/bricks-quota-test-setup_brick2.log:[2013-12-06 17:59:02.743377] I [server-rpc-fops.c:1618:server_create_cbk] 0-quota-test-setup-server: 26: CREATE /tcms_285026/test.file (d892ce24-7e59-4eeb-b86f-7c7d34c71317/test.file) ==> (Transport endpoint is not connected)

Version-Release number of selected component (if applicable):

glusterfs-server-3.4.0.44.1u2rhs-1.el6rhs.x86_64

How reproducible:

So far this appears to happen in about 1 in 10 runs.

Actual results:

I/O occasionally fails with ENOTCONN (Transport endpoint is not connected) when the hard/soft timeout is modified while data is in flight.

Expected results:

All I/O completes successfully when the timeouts are modified.

Additional info:

I'll try to provide a more concrete reproducer.

--- Additional comment from Vijaikumar Mallikarjuna on 2015-03-03 03:59:22 EST ---

Hi Ben,

I am not able to re-create this issue with the 3.6 release.

--- Additional comment from Vijaikumar Mallikarjuna on 2015-04-13 06:41:06 EDT ---

Whenever a new volume is created, quotad gets restarted. This can cause ENOTCONN in the I/O path of the other volumes.

Comment 1 Anand Avati 2015-04-14 08:42:13 UTC
REVIEW: http://review.gluster.org/10230 (quota: retry connecting to quotad on ENOTCONN error) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)
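Review 10230 is titled "quota: retry connecting to quotad on ENOTCONN error". The C sketch below illustrates only that general idea; it is not the code from the patch, and every name in it (call_with_enotconn_retry, quotad_op_fn, reconnect_fn, struct toy_conn, toy_lookup, toy_reconnect) is hypothetical. For brevity it assumes operations return a negative errno, whereas GlusterFS callbacks such as quota_enforcer_lookup_cbk carry op_ret and op_errno separately.

```c
#include <errno.h>

/* Hypothetical callback types, not GlusterFS APIs: an operation that talks
 * to quotad and returns 0 or a negative errno, and a reconnect hook that
 * returns 0 once the connection is re-established. */
typedef int (*quotad_op_fn)(void *ctx);
typedef int (*reconnect_fn)(void *ctx);

/* Run op; while it fails with -ENOTCONN, reconnect and try again, making at
 * most max_retries extra attempts. Other errors are returned unchanged. */
static int call_with_enotconn_retry(quotad_op_fn op, reconnect_fn reconnect,
                                    void *ctx, int max_retries)
{
    int ret = op(ctx);

    for (int attempt = 0; ret == -ENOTCONN && attempt < max_retries; attempt++) {
        if (reconnect(ctx) != 0)
            break;          /* quotad is still unreachable: give up */
        ret = op(ctx);
    }
    return ret;
}

/* Toy stand-in for a client connection to quotad, for illustration only. */
struct toy_conn {
    int connected;    /* 0 right after quotad was restarted */
    int quotad_down;  /* nonzero: reconnect attempts fail */
    int calls;        /* how many times the operation ran */
    int reconnects;   /* successful reconnects */
};

static int toy_lookup(void *ctx)
{
    struct toy_conn *c = ctx;
    c->calls++;
    return c->connected ? 0 : -ENOTCONN;
}

static int toy_reconnect(void *ctx)
{
    struct toy_conn *c = ctx;
    if (c->quotad_down)
        return -1;
    c->connected = 1;
    c->reconnects++;
    return 0;
}
```

For example, with struct toy_conn c = {0} (a connection dropped by a quotad restart), call_with_enotconn_retry(toy_lookup, toy_reconnect, &c, 3) returns 0 after one reconnect and two calls, whereas with max_retries set to 0 it returns -ENOTCONN, which corresponds to the failure in the brick log above. Bounding the retries keeps a quotad that never comes back from stalling the fop indefinitely.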

Comment 2 Anand Avati 2015-04-24 06:20:08 UTC
REVIEW: http://review.gluster.org/10230 (quota: retry connecting to quotad on ENOTCONN error) posted (#2) for review on master by Vijaikumar Mallikarjuna (vmallika)

Comment 3 Anand Avati 2015-05-28 04:53:18 UTC
REVIEW: http://review.gluster.org/10230 (quota: retry connecting to quotad on ENOTCONN error) posted (#3) for review on master by Atin Mukherjee (amukherj)

Comment 4 Anand Avati 2015-05-29 07:28:48 UTC
REVIEW: http://review.gluster.org/10230 (quota: retry connecting to quotad on ENOTCONN error) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

Comment 5 Niels de Vos 2015-06-02 08:20:15 UTC
The required changes to fix this bug have not made it into glusterfs-3.7.1. This bug is now getting tracked for glusterfs-3.7.2.

Comment 6 Niels de Vos 2015-06-20 10:07:59 UTC
Unfortunately glusterfs-3.7.2 did not contain a code change that was associated with this bug report. This bug is now proposed to be a blocker for glusterfs-3.7.3.

Comment 7 Vijaikumar Mallikarjuna 2015-06-22 06:48:40 UTC
Upstream patch: http://review.gluster.org/#/c/10230/
Release-3.7 patch: http://review.gluster.org/#/c/11024/

Comment 8 Niels de Vos 2016-06-16 12:49:31 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

