Bug 1379228 - smoke test fails with read/write failed (ENOTCONN)
Summary: smoke test fails with read/write failed (ENOTCONN)
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
Duplicates: 1381016
Depends On: 1388292 1399015 1399018 1399023 1399024
Reported: 2016-09-26 06:05 UTC by Ravishankar N
Modified: 2018-08-29 03:36 UTC
CC: 12 users

Fixed In Version: glusterfs-4.1.3 (or later)
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-08-29 03:36:15 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:

Attachments (Terms of Use)
statedump from the hung glusterfs process (745.97 KB, text/plain)
2016-11-16 02:05 UTC, Vijay Bellur

Description Ravishankar N 2016-09-26 06:05:32 UTC
Description of problem:

Smoke tests seem to be failing for different patches for reasons unrelated to the patches themselves:


Comment 1 Nigel Babu 2016-09-26 06:12:17 UTC
Checking now.

Comment 2 Nigel Babu 2016-09-26 06:15:12 UTC
These are the dbench failures.

Comment 3 Nigel Babu 2016-09-26 06:28:12 UTC
Seemingly all these are on slave34. I'm going to quickly sample all the smoke failures to see if they're all happening on slave34.

Comment 5 Nigel Babu 2016-09-26 09:25:15 UTC
Michael and Rajesh, do you have any idea what's going on in these failures? Any pointers on where to start debugging would also be useful.

Comment 6 Nigel Babu 2016-09-27 17:45:10 UTC
I'm inclined to think that this is a Gluster bug rather than an infra issue. What could cause "Transport endpoint is not connected" errors from Gluster?

Comment 7 Nigel Babu 2016-09-27 17:57:52 UTC
From what I can see from rudimentary searches:

"Transport endpoint is not connected" most likely means the FUSE mount has crashed. That would mean an intermittent bug has slipped past us.

Comment 8 Nigel Babu 2016-09-27 18:45:47 UTC
I'm guessing Pranith was looking for /var/log/glusterfs files. I've added the logic to grab them for smoke. We never did get them for smoke: https://github.com/gluster/glusterfs-patch-acceptance-tests/pull/62/files

Comment 9 Shyamsundar 2016-09-27 19:24:09 UTC
My notes:

The script: https://github.com/gluster/glusterfs-patch-acceptance-tests/blob/master/smoke.sh

Success case: https://build.gluster.org/job/smoke/30870/console
10:47:56 + wait %3
  ---> This happens when %2 wait is complete, so dbench was done by this time and the script started waiting on %3 (IOW the line printed is going to be executed)
10:48:51 All tests successful.
10:48:51 Files=191, Tests=1960, 129 wallclock secs ( 1.28 usr  0.36 sys +  9.57 cusr  7.43 csys = 18.64 CPU)
10:48:51 Result: PASS
  ---> %3 (compliance) completed (took about 129 seconds, dbench would take about 71-72 seconds including the warmup), so the wait above was over and we proceed
  ---> cleanup starts
10:48:51 + rm -rf clients
10:48:53 + cd -
10:48:53 /home/jenkins/root/workspace/smoke
10:48:53 + finish
10:48:53 + RET=0
  ---> NOTE: RET here takes the exit status of rm -rf clients; not sure if this is intended
10:48:53 + '[' 0 -ne 0 ']'
10:48:53 + cleanup
  ---> cleanup invoked by the finish, and this possibly has the set -x enabled by the script (but watchdog does not see the failed case)
10:48:53 + killall -15 glusterfs glusterfsd glusterd
  ---> All well!

Failure case: https://build.gluster.org/job/smoke/30852/console
00:03:16 All tests successful.
00:03:16 Files=191, Tests=1960, 93 wallclock secs ( 0.89 usr  0.26 sys +  5.46 cusr  3.30 csys =  9.91 CPU)
00:03:16 Result: PASS
00:11:36 Kicking in watchdog after 600 secs
  ---> Where are the watchdog cleanup calls noted? It appears that watchdog is called before set -x and hence cleanup is not logged here
  ---> Assuming cleanup was called, it killed all gluster processes, and dbench finally errored out in the read (no connection), and hence %2 completed
00:11:36 + wait %3
  ---> wait for %3 starts, and gets over ASAP as compliance has finished running about 8 minutes back (00:03:16)
00:11:36 + rm -rf clients
00:11:36 rm: cannot remove `clients': Transport endpoint is not connected
  ---> We cannot remove it, as the watchdog has already killed the processes, so this rm -rf fails (we failed cleanup; is this an issue for the next run?)
00:11:36 + finish
00:11:36 + RET=1
  ---> rm -rf failed, so we caught that, is this what is intended?
00:11:36 + '[' 1 -ne 0 ']'
00:11:36 + cat /build/dbench-logs
00:11:36   10  cleanup 581 sec
  ---> dbench has been attempting cleanup for 580-odd seconds
00:11:36 [643] read failed on handle 10007 (Transport endpoint is not connected)
  ---> Finally the dbench clients get an error as the watchdog shut the processes (and hence the volume) down; we get connection errors and dbench exits
00:11:36 + cleanup
  ---> Called by finish, and everything fails as watchdog has cleaned up already
00:11:36 + killall -15 glusterfs glusterfsd glusterd
00:11:36 glusterfs: no process killed
00:11:36 glusterfsd: no process killed
00:11:36 glusterd: no process killed

Root cause:
Looks like dbench got stuck at https://github.com/sahlberg/dbench/blob/master/fileio.c#L400 (or pread) and was never able to break out of it. This caused dbench never to complete until the volume and the mount were taken down, at which point it errored out.

Why it got stuck here would be the next question, I guess.
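The RET capture noted in the traces above can be reproduced in a few lines of shell. This is a hypothetical simplification, not the actual smoke.sh: finish() records the exit status of the last foreground command (the rm -rf in smoke.sh), not the status of the background test jobs, so a failed job can still take the success path.

```shell
#!/bin/sh
# Hypothetical sketch of the smoke.sh finish/cleanup flow discussed above.
# finish() receives the exit status of whatever ran immediately before it
# (the rm -rf in smoke.sh), not the status of the test jobs themselves.

finish() {
    RET=$1
    if [ "$RET" -ne 0 ]; then
        echo "finish: failure path (RET=$RET)"
    else
        echo "finish: success path (RET=$RET)"
    fi
}

false &              # a background "test job" that fails
wait "$!"            # wait reports the failure, but the status is not captured
rm -rf ./scratch     # this succeeds (rm -rf on a missing dir exits 0)...
finish $?            # ...so finish sees RET=0 despite the failed test job
```

Running this prints "finish: success path (RET=0)", which matches the observation that RET reflects the rm -rf rather than dbench.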

Comment 10 Nigel Babu 2016-09-30 05:29:40 UTC
Moving to core as this is quite certainly not an infra issue.

Comment 11 Nigel Babu 2016-10-02 17:32:12 UTC
*** Bug 1381016 has been marked as a duplicate of this bug. ***

Comment 12 Nigel Babu 2016-10-14 09:40:44 UTC
Assigning this to Pranith as he's debugging this now.

Comment 13 Pranith Kumar K 2016-10-19 03:29:03 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1381433#c7 has the modified script which we need to deploy to debug this further. The modified script will collect statedumps of the client and server and move them to /var/log/glusterfs so that they will be available in the tarball. Nigel, could you deploy it whenever you have time?

Comment 14 Nigel Babu 2016-10-19 03:34:59 UTC
Why don't you submit a pull request to https://github.com/gluster/glusterfs-patch-acceptance-tests with your changes so I can review and merge them? If we get it merged today, I can trigger a deploy across all machines.

Comment 16 Nigel Babu 2016-11-11 14:36:25 UTC
Perhaps the 5 second wait isn't enough?

06:22:41 + mv '/var/run/gluster/*dump*' /var/log/glusterfs/
06:22:41 mv: cannot stat `/var/run/gluster/*dump*': No such file or directory
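Side note on that mv failure: when a shell glob matches nothing, the pattern is passed to the command literally, which is exactly the "cannot stat `/var/run/gluster/*dump*'" shape in the trace — i.e. no statedump files existed yet when the mv ran. A minimal demo of this behavior (using a throwaway temp directory, unrelated to the real paths):

```shell
#!/bin/sh
# Demo: an unmatched glob is handed to mv as a literal string, and mv
# fails with "cannot stat" -- the same failure shape as the smoke trace
# when the statedumps had not been written yet.
dir=$(mktemp -d)
cd "$dir" || exit 1
mv ./*dump* ./dest 2>/dev/null   # no *dump* files exist, so mv gets the literal pattern
status=$?
echo "mv exit status: $status"   # non-zero: nothing matched
```

This is consistent with the "5 second wait isn't enough" theory: if the dumps appear later, the same mv would succeed.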


Comment 17 Nithya Balachandran 2016-11-15 08:24:32 UTC
From https://build.gluster.org/job/smoke/31600/console:

From mnt.log:
[2016-11-14 14:46:53.043548] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 81138: LOOKUP() /_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_12345 => -1 (File name too long)
[2016-11-14 14:46:53.043483] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 0-patchy-client-1: remote operation failed. Path: /_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_12345 (00000000-0000-0000-0000-000000000000) [File name too long]
[2016-11-14 14:46:53.043517] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 0-patchy-client-0: remote operation failed. Path: /_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_12345 (00000000-0000-0000-0000-000000000000) [File name too long]
[2016-11-14 14:54:53.930498] W [glusterfsd.c:1329:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7aa1) [0x7f620a9f1aa1] -->glusterfs(glusterfs_sigwaiter+0xe4) [0x409c58] -->glusterfs(cleanup_and_exit+0x87) [0x408092] ) 0-: received signum (15), shutting down
[2016-11-14 14:54:53.930539] I [fuse-bridge.c:5797:fini] 0-fuse: Unmounting '/mnt'.
[2016-11-14 14:54:53.932418] W [socket.c:588:__socket_rwv] 0-patchy-client-1: readv on failed (No data available)

From glusterdump.11579.dump.1479135284:



From glusterdump.18347.dump.1479132791:

wind_to=FIRST_CHILD (frame->this)->fops->readv


I don't know why there are 2 statedumps for the mnt process.

Comment 18 Nigel Babu 2016-11-15 16:15:49 UTC
I've taken slave34 offline and Vijay's debugging smoke test failures on there now.

Comment 19 Vijay Bellur 2016-11-16 02:05:21 UTC
Created attachment 1221016 [details]
statedump from the hung glusterfs process

Comment 20 Vijay Bellur 2016-11-16 02:13:00 UTC
The problem seems related to READ fops not going beyond write-behind. This can be seen in the attached statedump (comment 19). Still need to determine why READs are not percolating down from write-behind.

Comment 21 Poornima G 2016-11-22 11:45:12 UTC
Possible fix: http://review.gluster.org/15901

Please refer to the commit message for more explanation.

Comment 22 Poornima G 2016-11-24 06:17:54 UTC
    In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead
    of STACK_WIND(). One such case is when the inode_ctx for that file
    is not present (this can happen if readdirp was called, populated
    md-cache, and served all the lookups from cache).
    Consider the following graph:
    io-cache (parent)
      -> readdir-ahead
         -> read-ahead
    Below is the code snippet of ioc_readv calling STACK_WIND_TAIL:
     if (!inode_ctx)
       STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this),
                        FIRST_CHILD (frame->this)->fops->readv, fd,
                        size, offset, flags, xdata);
       /* Ideally, this stack_wind should wind to readdir-ahead:readv(),
          but it winds to read-ahead:readv(). See below. */
    STACK_WIND_TAIL (frame, obj, fn, ...)
      frame->this = obj;
      /* For the above graph, frame->this will be readdir-ahead:
       * frame->this = FIRST_CHILD (frame->this), i.e. readdir-ahead,
       * which is as expected. */
      THIS = obj;
      /* THIS will be read-ahead instead of readdir-ahead!, as obj expands
       * to "FIRST_CHILD (frame->this)" and frame->this was set to
       * readdir-ahead by the previous statement. */
      fn (frame, obj, params);
      /* fn will call read-ahead:readv() instead of readdir-ahead:readv()!,
       * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and
       * frame->this was set to readdir-ahead by the first statement. */
    Thus, readdir-ahead's readv() implementation is skipped, and
    ra_readv() is called with frame->this = "readdir-ahead" and
    this = "read-ahead". This can lead to corruption / hang / other problems.
    In this particular case, when the 'frame->this' and 'this' passed
    to ra_readv() don't match, ra_readv() ends up calling ra_readv()
    again! Thus the logic of read-ahead readv() falls apart and leads to

Comment 23 Amar Tumballi 2018-08-29 03:36:15 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.
