Bug 1626313 - fix glfs_fini related problems
Summary: fix glfs_fini related problems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: eventsapi
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-07 02:41 UTC by Kinglong Mee
Modified: 2019-03-25 16:30 UTC
1 user

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-25 16:30:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21217 0 None Abandoned rpc-clnt: get reference of rpc/transport when they are used 2019-02-06 19:34:42 UTC
Gluster.org Gerrit 21218 0 None Abandoned inode: wait all non-root inodes unreferenced when inode_table_destroy 2019-02-06 19:34:18 UTC

Description Kinglong Mee 2018-09-07 02:41:49 UTC
Description of problem:
glfs_fini returns -1 when active threads still exist in the event_pool

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
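
For context, here is a minimal, hypothetical libgfapi client that would observe the symptom; the volume name, server host, and log path are placeholders, and the header path may differ by packaging:

/* Minimal reproducer sketch; "testvol", "server1" and the log path are
 * placeholders. Build with: gcc demo.c -lgfapi */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
        glfs_t *fs = glfs_new("testvol");
        if (!fs)
                return 1;

        glfs_set_volfile_server(fs, "tcp", "server1", 24007);
        glfs_set_logging(fs, "/tmp/gfapi-demo.log", 7);

        if (glfs_init(fs) != 0) {
                glfs_fini(fs);
                return 1;
        }

        /* ... perform I/O ... */

        /* The reported problem: this can return -1 while event threads
         * are still counted as active in the event_pool. */
        int ret = glfs_fini(fs);
        fprintf(stderr, "glfs_fini returned %d\n", ret);
        return ret ? 1 : 0;
}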

Comment 1 Worker Ant 2018-09-07 02:45:31 UTC
REVIEW: https://review.gluster.org/21111 (event: get time by gettimeofday as pthread_cond_timedwait using) posted (#1) for review on master by Kinglong Mee

Comment 2 Worker Ant 2018-09-17 13:43:19 UTC
COMMIT: https://review.gluster.org/21111 committed in master by "Shyamsundar Ranganathan" <srangana> with a commit message- event: get time by clock_gettime as pthread_cond_timedwait using

Debug information shows that time() lags behind the actual second boundary;
get the correct time with clock_gettime(), which matches the clock that
pthread_cond_timedwait() uses.

                         ret = pthread_cond_timedwait (&event_pool->cond,
                                                       &event_pool->mutex,
                                                       &sleep_till);
+                         gf_msg ("epoll", GF_LOG_INFO, 0,
+                                 LG_MSG_EXITED_EPOLL_THREAD,
+                                 "pthread_cond_timedwait %lu %p return %d (active %d:%d)",
+                                 sleep_till.tv_sec, event_pool, ret,
+                                 event_pool->activethreadcount, threadcount);
                 }
         }
         pthread_mutex_unlock (&event_pool->mutex);

[2018-09-06 18:33:57.000879] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000916] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000931] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000945] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000957] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000970] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000983] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000997] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001010] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001022] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001034] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001060] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.003085] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230038 0x1f48e60 return 0 (active 4:4)
[2018-09-06 18:33:57.014142] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230038 0x1f48e60 return 0 (active 3:4)

Change-Id: I735249eee9a6f8284392b69e501479ac163b8409
fixes: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
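
For reference, the general pattern the committed change moves toward is sketched below; this is a generic, standalone example rather than the GlusterFS code. pthread_cond_timedwait() compares its absolute deadline against CLOCK_REALTIME for a default-initialized condition variable, so the deadline should be built from clock_gettime() rather than time(), whose whole-second granularity can lag the clock the wait actually uses.

#include <pthread.h>
#include <time.h>

/* Generic sketch: wait on `cond` for up to `secs` seconds, deriving the
 * absolute deadline from clock_gettime() instead of time(). Caller must
 * hold `mutex`; returns 0 when signalled, ETIMEDOUT (110 on Linux, the
 * value seen in the log above) on timeout. */
static int wait_with_timeout(pthread_cond_t *cond, pthread_mutex_t *mutex,
                             time_t secs)
{
        struct timespec sleep_till = {0, };

        clock_gettime(CLOCK_REALTIME, &sleep_till);
        sleep_till.tv_sec += secs;

        return pthread_cond_timedwait(cond, mutex, &sleep_till);
}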

Comment 3 Worker Ant 2018-09-19 08:11:33 UTC
REVIEW: https://review.gluster.org/21218 (inode: wait all non-root inodes unreferenced when inode_table_destroy) posted (#1) for review on master by Kinglong Mee

Comment 4 Worker Ant 2018-09-19 08:12:44 UTC
REVIEW: https://review.gluster.org/21215 (libgfapi: fix use-after-free of clnt when dispatching events) posted (#1) for review on master by Kinglong Mee

Comment 5 Worker Ant 2018-09-19 08:13:59 UTC
REVIEW: https://review.gluster.org/21217 (rpc-clnt: get reference of rpc/transport when they are used) posted (#1) for review on master by Kinglong Mee

Comment 6 Worker Ant 2018-09-19 08:15:19 UTC
REVIEW: https://review.gluster.org/21216 (rpc: fail requests immediately if the rpc connection is down) posted (#1) for review on master by Kinglong Mee

Comment 7 Worker Ant 2018-09-19 08:16:33 UTC
REVIEW: https://review.gluster.org/21219 (gfapi: fix crash of using uninitialized fs->ctx) posted (#1) for review on master by Kinglong Mee

Comment 8 Worker Ant 2018-09-25 08:09:04 UTC
REVIEW: https://review.gluster.org/21270 (socket: clear return value if error is going to be handled in event thread) posted (#1) for review on master by Kinglong Mee

Comment 9 Worker Ant 2018-09-25 08:10:19 UTC
REVIEW: https://review.gluster.org/21271 (logging: fix file handle leak when glfs_set_logging is called multiple times) posted (#1) for review on master by Kinglong Mee

Comment 10 Worker Ant 2018-09-26 09:56:06 UTC
REVIEW: https://review.gluster.org/21282 (syncop: check syncenv status before pthread_cond_timedwait() to avoid 600s timeout) posted (#1) for review on master by Kinglong Mee

Comment 11 Worker Ant 2018-09-26 10:13:05 UTC
COMMIT: https://review.gluster.org/21215 committed in master by "Kinglong Mee" <kinglongmee> with a commit message- libgfapi: fix use-after-free of clnt when dispatching events

Avoid dispatching events to mgmt after it has been freed; unreference
mgmt only after event_dispatch_destroy.

Change-Id: I5b762b37901de70a955661df0aff95bf055ba4ea
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
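
The ordering this enforces can be illustrated with a hedged sketch; the types and helpers below are hypothetical stand-ins for the gfapi/rpc internals, not the actual symbols:

/* Illustrative only: opaque types and helpers standing in for the
 * gfapi/rpc internals whose teardown order the patch fixes. */
typedef struct event_pool  event_pool;   /* opaque */
typedef struct mgmt_client mgmt_client;  /* opaque */

void stop_event_dispatch(event_pool *pool);  /* joins all event threads */
void mgmt_client_unref(mgmt_client *clnt);   /* drops one reference */

struct teardown_ctx {
        event_pool  *pool;
        mgmt_client *mgmt;
};

static void safe_teardown(struct teardown_ctx *ctx)
{
        /* Stop event dispatching first, so no thread can still deliver
         * a notification to the management connection ... */
        stop_event_dispatch(ctx->pool);

        /* ... and only then drop the last reference to it. */
        mgmt_client_unref(ctx->mgmt);
        ctx->mgmt = NULL;
}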

Comment 12 Worker Ant 2018-09-27 03:04:43 UTC
COMMIT: https://review.gluster.org/21216 committed in master by "Amar Tumballi" <amarts> with a commit message- rpc: fail requests immediately if the rpc connection is down

While glfs_fini is in progress, some cache xlators, such as readdir-ahead,
continue to submit requests. The current rpc submit code ignores the
connection status and queues these internally generated requests. The
requests are then cleaned up only after the inode table has been
destroyed, causing a crash.

Change-Id: Ife6b17d8592a054f7a7f310c79d07af005087017
updates: bz#1626313
Signed-off-by: Zhang Huan <zhanghuan>
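
The idea can be summarised with a hedged sketch (hypothetical names, not the actual rpc-clnt code): check the connection state under the lock and fail the request with ENOTCONN up front, instead of queueing something that would only be reaped during teardown.

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical connection/request types, for illustration only. */
struct request;                                             /* opaque */
struct conn {
        pthread_mutex_t lock;
        bool            connected;
};

int enqueue_request(struct conn *c, struct request *req);   /* assumed */

static int submit_request(struct conn *c, struct request *req)
{
        int ret;

        pthread_mutex_lock(&c->lock);
        if (!c->connected) {
                /* Fail immediately rather than queueing a request that
                 * could outlive the structures it refers to. */
                pthread_mutex_unlock(&c->lock);
                return -ENOTCONN;
        }
        ret = enqueue_request(c, req);
        pthread_mutex_unlock(&c->lock);

        return ret;
}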

Comment 13 Worker Ant 2018-10-09 05:48:31 UTC
COMMIT: https://review.gluster.org/21282 committed in master by "Amar Tumballi" <amarts> with a commit message- syncop: check syncenv status before pthread_cond_timedwait() to avoid 600s timeout

If a syncenv_task starts after syncenv_destroy, it enters a cond timedwait
with a 600s timeout, and syncenv_destroy has to wait for that timeout to expire.

Change-Id: I972a2b231e50cbebd3c71707800e58033e40c29d
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
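
A hedged sketch of the essence of the fix (hypothetical structure and names, not the real syncop.c code): re-check the destroy flag under the mutex before entering the timed wait, so a task that starts after destroy does not sit out the full 600-second timeout.

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical environment standing in for GlusterFS's syncenv. */
struct env {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        bool            destroying;
};

static int wait_for_work(struct env *e, time_t timeout_secs)
{
        struct timespec till = {0, };
        int ret = 0;

        clock_gettime(CLOCK_REALTIME, &till);
        till.tv_sec += timeout_secs;

        pthread_mutex_lock(&e->mutex);
        /* Check the status first: if destroy has already started, bail
         * out instead of blocking for the full timeout. */
        if (!e->destroying)
                ret = pthread_cond_timedwait(&e->cond, &e->mutex, &till);
        pthread_mutex_unlock(&e->mutex);

        return ret;
}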

Comment 14 Worker Ant 2018-10-10 05:49:52 UTC
COMMIT: https://review.gluster.org/21271 committed in master by "Amar Tumballi" <amarts> with a commit message- logging: fix file handle leak when glfs_set_logging is called multiple times

Closes the log file and reopens it to prevent leakage of file handles.

Change-Id: Idfaa479961bb0088004d0d5558bdb0eb32cff632
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
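
A hedged sketch of the pattern (hypothetical logger, not the actual logging code): when the log target is set again, close the handle from the previous call before installing the new one, so repeated calls do not leak file handles.

#include <stdio.h>

/* Hypothetical logger state, for illustration only. */
struct logger {
        FILE *logfile;
};

static int set_log_file(struct logger *lg, const char *path)
{
        FILE *newfp = fopen(path, "a");
        if (!newfp)
                return -1;

        /* Close the handle opened by any earlier call before replacing
         * it; otherwise each call leaks one file handle. */
        if (lg->logfile)
                fclose(lg->logfile);

        lg->logfile = newfp;
        return 0;
}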

Comment 15 Worker Ant 2018-10-10 05:52:34 UTC
COMMIT: https://review.gluster.org/21270 committed in master by "Amar Tumballi" <amarts> with a commit message- socket: clear return value if error is going to be handled in event thread

Change-Id: Ibce94f282b0aafaa1ca60ab927a469b70595e81f
updates: bz#1626313
Signed-off-by: Zhang Huan <zhanghuan>
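
Roughly what the change amounts to, as a hypothetical sketch (names do not match socket.c): once an error has been handed off to the event thread for handling, reset the local return code so the caller does not act on the same failure a second time.

/* Hypothetical sketch; the types and helpers are illustrative only. */
struct transport;                                         /* opaque */
int perform_socket_io(struct transport *t);               /* assumed */
int error_handled_by_event_thread(struct transport *t);   /* assumed */

static int do_io(struct transport *t)
{
        int ret = perform_socket_io(t);

        if (ret < 0 && error_handled_by_event_thread(t)) {
                /* The event thread owns error handling from here on;
                 * clear ret so the caller does not handle it again. */
                ret = 0;
        }

        return ret;
}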

Comment 16 Worker Ant 2018-10-10 17:08:27 UTC
COMMIT: https://review.gluster.org/21219 committed in master by "Poornima G" <pgurusid> with a commit message- gfapi: fix crash of using uninitialized fs->ctx

0  0x00007fb3db3a2ee4 in pub_glfs_fini (fs=0x7f8977d63f00) at glfs.c:1236
1  0x00007fb3db3a2065 in pub_glfs_new (volname=0x7f80de4d4d40 "openfs1")
    at glfs.c:784
2  0x00007fb3db5cf089 in glusterfs_get_fs (params=...,
    up_ops=up_ops@entry=0x7fb3ca643130)
    at /usr/src/debug/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:889
3  0x00007fb3db5cf99a in glusterfs_create_export (
    fsal_hdl=0x7fb3db7e2490 <GlusterFS+112>, parse_node=0x7fb3ca6387d0,
    err_type=<optimized out>, up_ops=0x7fb3ca643130)
    at /usr/src/debug/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:1011
4  0x00007fb3e11c485f in mdcache_fsal_create_export (
    sub_fsal=0x7fb3db7e2490 <GlusterFS+112>,
    parse_node=parse_node@entry=0x7fb3ca6387d0,
    err_type=err_type@entry=0x7fb3c0bef140, super_up_ops=<optimized out>)
    at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_main.c:281

(gdb) p errno
$1 = 12

Change-Id: I3dd5b84b52962ceb0b5d4f9b4f475bf4aa724292
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
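
The backtrace shows pub_glfs_new() taking its error path into pub_glfs_fini() (errno 12, i.e. ENOMEM, in the gdb session suggests an allocation failure), at which point fs->ctx has not been set up yet. Below is a hedged sketch of the defensive pattern, with hypothetical types rather than the real glfs_t/glusterfs_ctx_t:

#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for glfs_t and its context, for illustration. */
struct ctx;                     /* opaque */
struct fs {
        struct ctx *ctx;
        char       *volname;
};

static void fs_cleanup(struct fs *fs)
{
        if (!fs)
                return;
        /* Guard against the partially constructed object seen in the
         * backtrace: ctx may never have been initialized. */
        if (fs->ctx) {
                /* ... tear down ctx-owned resources here ... */
        }
        free(fs->volname);
        free(fs);
}

static struct fs *fs_new(const char *volname)
{
        struct fs *fs = calloc(1, sizeof(*fs));
        if (!fs)
                return NULL;

        fs->volname = strdup(volname);
        if (!fs->volname) {
                errno = ENOMEM;  /* the errno printed in the gdb session */
                fs_cleanup(fs);  /* safe: fs->ctx is still NULL */
                return NULL;
        }

        /* ... allocate and initialize fs->ctx ... */
        return fs;
}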

Comment 17 Shyamsundar 2019-03-25 16:30:38 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

