Bug 811552
Summary: | [glusterfs-3.3.0qa33]: fd migration failing in fuse_resolve | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Raghavendra Bhat <rabhat> |
Component: | fuse | Assignee: | Raghavendra Bhat <rabhat> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | mainline | CC: | amarts, gluster-bugs, vbellur |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-07-24 17:41:32 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | glusterfs-3.3.0qa40 | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 817967 |
Description
Raghavendra Bhat
2012-04-11 12:00:30 UTC
Just running ping_pong while the graph changes is sufficient to hit the bug. The total fd count climbed to the VFS file-max limit, which caused errors and made some applications segfault. When run under valgrind, ping_pong started getting EBADF errors, and the logs indicated that fd migration was failing:

[2012-04-24 16:05:38.852707] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.853246] D [socket.c:193:__socket_rwv] 3-mirror-client-2: EOF from peer 127.0.0.1:24013
[2012-04-24 16:05:38.854810] D [socket.c:1521:__socket_proto_state_machine] 3-mirror-client-2: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:24013)
[2012-04-24 16:05:38.855269] D [socket.c:1807:socket_event_handler] 3-transport: disconnecting now
[2012-04-24 16:05:38.856020] I [client.c:2099:client_rpc_notify] 3-mirror-client-2: disconnected
[2012-04-24 16:05:38.856468] E [afr-common.c:3666:afr_notify] 3-mirror-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-04-24 16:05:38.856901] D [fuse-bridge.c:4208:notify] 0-fuse: got event 6 on graph 3
[2012-04-24 16:05:38.858771] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.875087] W [fuse-resolve.c:152:fuse_resolve_gfid_cbk] 0-fuse: f8a743cd-5f91-4132-a870-09b2f9a57627: failed to resolve (Transport endpoint is not connected)
[2012-04-24 16:05:38.877005] W [fuse-bridge.c:2845:fuse_getxattr_resume] 0-glusterfs-fuse: 81528: GETXATTR f8a743cd-5f91-4132-a870-09b2f9a57627/191002180 (security.capability) resolution failed
[2012-04-24 16:05:38.880052] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.881607] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.883238] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.886202] W [fuse-resolve.c:152:fuse_resolve_gfid_cbk] 0-fuse: f8a743cd-5f91-4132-a870-09b2f9a57627: failed to resolve (Transport endpoint is not connected)
[2012-04-24 16:05:38.886846] W [fuse-bridge.c:2845:fuse_getxattr_resume] 0-glusterfs-fuse: 81532: GETXATTR f8a743cd-5f91-4132-a870-09b2f9a57627/191002180 (security.capability) resolution failed
[2012-04-24 16:05:38.888388] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.889948] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.891540] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.894579] W [fuse-resolve.c:152:fuse_resolve_gfid_cbk] 0-fuse: f8a743cd-5f91-4132-a870-09b2f9a57627: failed to resolve (Transport endpoint is not connected)
[2012-04-24 16:05:38.895180] W [fuse-bridge.c:2845:fuse_getxattr_resume] 0-glusterfs-fuse: 81536: GETXATTR f8a743cd-5f91-4132-a870-09b2f9a57627/191002180 (security.capability) resolution failed
[2012-04-24 16:05:38.896698] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.898243] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.899826] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF
[2012-04-24 16:05:38.901319] W [fuse-resolve.c:360:fuse_resolve_fd] 0-fuse-resolve: migration of fd (0xb577af0) did not complete, failing fop with EBADF

This is the dmesg output indicating the fd count hitting the VFS max limit.

Apr 24 12:29:09 hyperspace kernel: [ 3671.564994] VFS: file-max limit 385710 reached
Apr 24 12:29:09 hyperspace kernel: [ 3673.048036] VFS: file-max limit 385710 reached
Apr 24 12:29:09 hyperspace kernel: [ 3673.151796] VFS: file-max limit 385710 reached
Apr 24 12:29:09 hyperspace kernel: [ 3673.162949] VFS: file-max limit 385710 reached
Apr 24 12:29:10 hyperspace kernel: [ 3673.720373] VFS: file-max limit 385710 reached
Apr 24 12:29:10 hyperspace kernel: [ 3673.772767] VFS: file-max limit 385710 reached
Apr 24 12:29:10 hyperspace kernel: [ 3673.798308] VFS: file-max limit 385710 reached
Apr 24 12:29:10 hyperspace kernel: [ 3673.798956] VFS: file-max limit 385710 reached
Apr 24 12:29:10 hyperspace kernel: [ 3673.814634] VFS: file-max limit 385710 reached
Apr 24 12:29:11 hyperspace kernel: [ 3675.410128] show_signal_msg: 24 callbacks suppressed
Apr 24 12:29:11 hyperspace kernel: [ 3675.410134] compiz[1836]: segfault at 20 ip 00007f1139ecd58b sp 00007fffd2822970 error 4 in libdrm_intel.so.1.0.0[7f1139ec7000+a000]
Apr 24 12:29:11 hyperspace kernel: [ 3675.412261] VFS: file-max limit 385710 reached
Apr 24 12:29:13 hyperspace kernel: [ 3677.149905] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace kernel: [ 3680.831535] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace postfix/master[1426]: warning: master_wakeup_timer_event: service qmgr(public/qmgr): Too many open files in system
Apr 24 12:29:17 hyperspace postfix/master[1426]: warning: master_wakeup_timer_event: service pickup(public/pickup): Too many open files in system
Apr 24 12:29:17 hyperspace kernel: [ 3680.949343] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace kernel: [ 3680.953927] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace kernel: [ 3680.963900] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace postfix/qmgr[2082]: fatal: scan_dir_push: open directory deferred: Too many open files in system
Apr 24 12:29:17 hyperspace kernel: [ 3681.342443] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace kernel: [ 3681.370371] VFS: file-max limit 385710 reached
Apr 24 12:29:17 hyperspace kernel: [ 3681.373358] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace kernel: [ 3681.550102] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace kernel: [ 3681.666582] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace kernel: [ 3681.672847] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace kernel: [ 3681.781079] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace kernel: [ 3681.782683] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace postfix/master[1426]: warning: process /usr/lib/postfix/qmgr pid 2082 exit status 1
Apr 24 12:29:18 hyperspace kernel: [ 3682.341382] VFS: file-max limit 385710 reached
Apr 24 12:29:18 hyperspace kernel: [ 3682.521635] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.560183] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.584952] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.635927] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.675731] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.681351] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.691349] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.701340] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.711316] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3682.834319] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3683.000432] VFS: file-max limit 385710 reached
Apr 24 12:29:19 hyperspace kernel: [ 3683.350397] VFS: file-max limit 385710 reached

CHANGE: http://review.gluster.com/3222 (mount/fuse: unref the fds after they have been migrated to the new graph) merged in master by Anand Avati (avati)

Checked with glusterfs-3.3.0qa40. Ran ping_pong with graph changes happening in parallel. No such log messages are seen, and the VFS file-max limit is not reached, since the fd is now properly unref'd after being migrated to the new graph.
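
The fix description (unref the fds after they have been migrated to the new graph) suggests that each graph switch left behind an fd that still held a reference. The following is only a toy Python model of that pattern, not GlusterFS code: the `Fd` class and the `migrate` and `live_fds_after` functions are hypothetical names invented for illustration, and the real fd lifecycle in the fuse translator is more involved.

```python
class Fd:
    """Toy reference-counted fd (hypothetical; not GlusterFS's fd_t)."""

    def __init__(self, graph_id):
        self.graph_id = graph_id
        self.refcount = 1  # reference held by the mount for the open file

    def unref(self):
        assert self.refcount > 0, "unref on an fd with no references"
        self.refcount -= 1


def migrate(old_fd, new_graph_id, unref_old):
    """Create a replacement fd on the new graph; optionally release the old one."""
    new_fd = Fd(new_graph_id)
    if unref_old:
        # Assumed shape of the fix: drop the reference on the old graph's fd
        # once its replacement exists on the new graph.
        old_fd.unref()
    return new_fd


def live_fds_after(graph_switches, unref_old):
    """Count fds that still hold a reference after the given number of switches."""
    current = Fd(graph_id=0)
    all_fds = [current]
    for graph_id in range(1, graph_switches + 1):
        current = migrate(current, graph_id, unref_old)
        all_fds.append(current)
    return sum(1 for fd in all_fds if fd.refcount > 0)


if __name__ == "__main__":
    print("without unref:", live_fds_after(5, unref_old=False))
    print("with unref:   ", live_fds_after(5, unref_old=True))
```

In this model the count of referenced fds grows by one per graph switch when the old fd is never released, and stays constant when it is. That is consistent with the reported climb toward the file-max limit, though the model makes no claim about the actual code path.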