Bug 1626313 - fix glfs_fini related problems
Summary: fix glfs_fini related problems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: eventsapi
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-07 02:41 UTC by Kinglong Mee
Modified: 2019-03-25 16:30 UTC
1 user

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-25 16:30:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21217 0 None Abandoned rpc-clnt: get reference of rpc/transport when they are used 2019-02-06 19:34:42 UTC
Gluster.org Gerrit 21218 0 None Abandoned inode: wait all non-root inodes unreferenced when inode_table_destroy 2019-02-06 19:34:18 UTC

Description Kinglong Mee 2018-09-07 02:41:49 UTC
Description of problem:
glfs_fini returns -1 when active threads still exist in the event_pool

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
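
For context, here is a minimal, hypothetical libgfapi client that would observe the symptom; the volume name, server host, and log path are placeholders, and the header path may differ by packaging:

/* Minimal reproducer sketch; "testvol", "server1" and the log path are
 * placeholders. Build with: gcc demo.c -lgfapi */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
        glfs_t *fs = glfs_new("testvol");
        if (!fs)
                return 1;

        glfs_set_volfile_server(fs, "tcp", "server1", 24007);
        glfs_set_logging(fs, "/tmp/gfapi-demo.log", 7);

        if (glfs_init(fs) != 0) {
                glfs_fini(fs);
                return 1;
        }

        /* ... perform I/O ... */

        /* The reported problem: this can return -1 while event threads
         * are still counted as active in the event_pool. */
        int ret = glfs_fini(fs);
        fprintf(stderr, "glfs_fini returned %d\n", ret);
        return ret ? 1 : 0;
}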

Comment 1 Worker Ant 2018-09-07 02:45:31 UTC
REVIEW: https://review.gluster.org/21111 (event: get time by gettimeofday as pthread_cond_timedwait using) posted (#1) for review on master by Kinglong Mee

Comment 2 Worker Ant 2018-09-17 13:43:19 UTC
COMMIT: https://review.gluster.org/21111 committed in master by "Shyamsundar Ranganathan" <srangana> with a commit message- event: get time by clock_gettime as pthread_cond_timedwait using

Debug information shows that time() lags behind the actual second boundary;
get the correct time with clock_gettime(), which matches the clock that
pthread_cond_timedwait() uses.

                         ret = pthread_cond_timedwait (&event_pool->cond,
                                                       &event_pool->mutex,
                                                       &sleep_till);
+                         gf_msg ("epoll", GF_LOG_INFO, 0,
+                                 LG_MSG_EXITED_EPOLL_THREAD,
+                                 "pthread_cond_timedwait %lu %p return %d (active %d:%d)",
+                                 sleep_till.tv_sec, event_pool, ret,
+                                 event_pool->activethreadcount, threadcount);
                 }
         }
         pthread_mutex_unlock (&event_pool->mutex);

[2018-09-06 18:33:57.000879] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000916] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000931] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000945] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000957] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000970] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000983] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.000997] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001010] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001022] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001034] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.001060] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230037 0x1f48e60 return 110 (active 5:4)
[2018-09-06 18:33:57.003085] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230038 0x1f48e60 return 0 (active 4:4)
[2018-09-06 18:33:57.014142] I [event.c:284:event_dispatch_destroy] 0-epoll: pthread_cond_timedwait 1536230038 0x1f48e60 return 0 (active 3:4)

Change-Id: I735249eee9a6f8284392b69e501479ac163b8409
fixes: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
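
For reference, the general pattern the committed change moves toward is sketched below; this is a generic, standalone example rather than the GlusterFS code. pthread_cond_timedwait() compares its absolute deadline against CLOCK_REALTIME for a default-initialized condition variable, so the deadline should be built from clock_gettime() rather than time(), whose whole-second granularity can lag the clock the wait actually uses.

#include <pthread.h>
#include <time.h>

/* Generic sketch: wait on `cond` for up to `secs` seconds, deriving the
 * absolute deadline from clock_gettime() instead of time(). Caller must
 * hold `mutex`; returns 0 when signalled, ETIMEDOUT (110 on Linux, the
 * value seen in the log above) on timeout. */
static int wait_with_timeout(pthread_cond_t *cond, pthread_mutex_t *mutex,
                             time_t secs)
{
        struct timespec sleep_till = {0, };

        clock_gettime(CLOCK_REALTIME, &sleep_till);
        sleep_till.tv_sec += secs;

        return pthread_cond_timedwait(cond, mutex, &sleep_till);
}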

Comment 3 Worker Ant 2018-09-19 08:11:33 UTC
REVIEW: https://review.gluster.org/21218 (inode: wait all non-root inodes unreferenced when inode_table_destroy) posted (#1) for review on master by Kinglong Mee

Comment 4 Worker Ant 2018-09-19 08:12:44 UTC
REVIEW: https://review.gluster.org/21215 (libgfapi: fix use-after-free of clnt when dispatching events) posted (#1) for review on master by Kinglong Mee

Comment 5 Worker Ant 2018-09-19 08:13:59 UTC
REVIEW: https://review.gluster.org/21217 (rpc-clnt: get reference of rpc/transport when they are used) posted (#1) for review on master by Kinglong Mee

Comment 6 Worker Ant 2018-09-19 08:15:19 UTC
REVIEW: https://review.gluster.org/21216 (rpc: fail requests immediately if the rpc connection is down) posted (#1) for review on master by Kinglong Mee

Comment 7 Worker Ant 2018-09-19 08:16:33 UTC
REVIEW: https://review.gluster.org/21219 (gfapi: fix crash of using uninitialized fs->ctx) posted (#1) for review on master by Kinglong Mee

Comment 8 Worker Ant 2018-09-25 08:09:04 UTC
REVIEW: https://review.gluster.org/21270 (socket: clear return value if error is going to be handled in event thread) posted (#1) for review on master by Kinglong Mee

Comment 9 Worker Ant 2018-09-25 08:10:19 UTC
REVIEW: https://review.gluster.org/21271 (logging: fix file handle leak when glfs_set_logging is called multiple times) posted (#1) for review on master by Kinglong Mee

Comment 10 Worker Ant 2018-09-26 09:56:06 UTC
REVIEW: https://review.gluster.org/21282 (syncop: check syncenv status before pthread_cond_timedwait() to avoid 600s timeout) posted (#1) for review on master by Kinglong Mee

Comment 11 Worker Ant 2018-09-26 10:13:05 UTC
COMMIT: https://review.gluster.org/21215 committed in master by "Kinglong Mee" <kinglongmee> with a commit message- libgfapi: fix use-after-free of clnt when dispatching events

Avoid dispatching events to mgmt after it has been freed; unreference
mgmt only after event_dispatch_destroy.

Change-Id: I5b762b37901de70a955661df0aff95bf055ba4ea
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
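
The ordering this enforces can be illustrated with a hedged sketch; the types and helpers below are hypothetical stand-ins for the gfapi/rpc internals, not the actual symbols:

/* Illustrative only: opaque types and helpers standing in for the
 * gfapi/rpc internals whose teardown order the patch fixes. */
typedef struct event_pool  event_pool;   /* opaque */
typedef struct mgmt_client mgmt_client;  /* opaque */

void stop_event_dispatch(event_pool *pool);  /* joins all event threads */
void mgmt_client_unref(mgmt_client *clnt);   /* drops one reference */

struct teardown_ctx {
        event_pool  *pool;
        mgmt_client *mgmt;
};

static void safe_teardown(struct teardown_ctx *ctx)
{
        /* Stop event dispatching first, so no thread can still deliver
         * a notification to the management connection ... */
        stop_event_dispatch(ctx->pool);

        /* ... and only then drop the last reference to it. */
        mgmt_client_unref(ctx->mgmt);
        ctx->mgmt = NULL;
}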

Comment 12 Worker Ant 2018-09-27 03:04:43 UTC
COMMIT: https://review.gluster.org/21216 committed in master by "Amar Tumballi" <amarts> with a commit message- rpc: fail requests immediately if the rpc connection is down

While glfs_fini is in progress, some cache xlators, such as readdir-ahead,
continue to submit requests. The current rpc submit code ignores the
connection status and queues these internally generated requests. The
requests are then cleaned up only after the inode table has been
destroyed, causing a crash.

Change-Id: Ife6b17d8592a054f7a7f310c79d07af005087017
updates: bz#1626313
Signed-off-by: Zhang Huan <zhanghuan>
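
The idea can be summarised with a hedged sketch (hypothetical names, not the actual rpc-clnt code): check the connection state under the lock and fail the request with ENOTCONN up front, instead of queueing something that would only be reaped during teardown.

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical connection/request types, for illustration only. */
struct request;                                             /* opaque */
struct conn {
        pthread_mutex_t lock;
        bool            connected;
};

int enqueue_request(struct conn *c, struct request *req);   /* assumed */

static int submit_request(struct conn *c, struct request *req)
{
        int ret;

        pthread_mutex_lock(&c->lock);
        if (!c->connected) {
                /* Fail immediately rather than queueing a request that
                 * could outlive the structures it refers to. */
                pthread_mutex_unlock(&c->lock);
                return -ENOTCONN;
        }
        ret = enqueue_request(c, req);
        pthread_mutex_unlock(&c->lock);

        return ret;
}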

Comment 13 Worker Ant 2018-10-09 05:48:31 UTC
COMMIT: https://review.gluster.org/21282 committed in master by "Amar Tumballi" <amarts> with a commit message- syncop: check syncenv status before pthread_cond_timedwait() to avoid 600s timeout

If a syncenv_task starts after syncenv_destroy, it enters a cond timedwait
with a 600s timeout, and syncenv_destroy has to wait for that timeout to expire.

Change-Id: I972a2b231e50cbebd3c71707800e58033e40c29d
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
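
A hedged sketch of the essence of the fix (hypothetical structure and names, not the real syncop.c code): re-check the destroy flag under the mutex before entering the timed wait, so a task that starts after destroy does not sit out the full 600-second timeout.

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical environment standing in for GlusterFS's syncenv. */
struct env {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        bool            destroying;
};

static int wait_for_work(struct env *e, time_t timeout_secs)
{
        struct timespec till = {0, };
        int ret = 0;

        clock_gettime(CLOCK_REALTIME, &till);
        till.tv_sec += timeout_secs;

        pthread_mutex_lock(&e->mutex);
        /* Check the status first: if destroy has already started, bail
         * out instead of blocking for the full timeout. */
        if (!e->destroying)
                ret = pthread_cond_timedwait(&e->cond, &e->mutex, &till);
        pthread_mutex_unlock(&e->mutex);

        return ret;
}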

Comment 14 Worker Ant 2018-10-10 05:49:52 UTC
COMMIT: https://review.gluster.org/21271 committed in master by "Amar Tumballi" <amarts> with a commit message- logging: fix file handle leak when glfs_set_logging is called multiple times

Closes the log file and reopens it to prevent leakage of file handles.

Change-Id: Idfaa479961bb0088004d0d5558bdb0eb32cff632
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
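
A hedged sketch of the pattern (hypothetical logger, not the actual logging code): when the log target is set again, close the handle from the previous call before installing the new one, so repeated calls do not leak file handles.

#include <stdio.h>

/* Hypothetical logger state, for illustration only. */
struct logger {
        FILE *logfile;
};

static int set_log_file(struct logger *lg, const char *path)
{
        FILE *newfp = fopen(path, "a");
        if (!newfp)
                return -1;

        /* Close the handle opened by any earlier call before replacing
         * it; otherwise each call leaks one file handle. */
        if (lg->logfile)
                fclose(lg->logfile);

        lg->logfile = newfp;
        return 0;
}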

Comment 15 Worker Ant 2018-10-10 05:52:34 UTC
COMMIT: https://review.gluster.org/21270 committed in master by "Amar Tumballi" <amarts> with a commit message- socket: clear return value if error is going to be handled in event thread

Change-Id: Ibce94f282b0aafaa1ca60ab927a469b70595e81f
updates: bz#1626313
Signed-off-by: Zhang Huan <zhanghuan>
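
Roughly what the change amounts to, as a hypothetical sketch (names do not match socket.c): once an error has been handed off to the event thread for handling, reset the local return code so the caller does not act on the same failure a second time.

/* Hypothetical sketch; the types and helpers are illustrative only. */
struct transport;                                         /* opaque */
int perform_socket_io(struct transport *t);               /* assumed */
int error_handled_by_event_thread(struct transport *t);   /* assumed */

static int do_io(struct transport *t)
{
        int ret = perform_socket_io(t);

        if (ret < 0 && error_handled_by_event_thread(t)) {
                /* The event thread owns error handling from here on;
                 * clear ret so the caller does not handle it again. */
                ret = 0;
        }

        return ret;
}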

Comment 16 Worker Ant 2018-10-10 17:08:27 UTC
COMMIT: https://review.gluster.org/21219 committed in master by "Poornima G" <pgurusid> with a commit message- gfapi: fix crash of using uninitialized fs->ctx

0  0x00007fb3db3a2ee4 in pub_glfs_fini (fs=0x7f8977d63f00) at glfs.c:1236
1  0x00007fb3db3a2065 in pub_glfs_new (volname=0x7f80de4d4d40 "openfs1")
    at glfs.c:784
2  0x00007fb3db5cf089 in glusterfs_get_fs (params=...,
    up_ops=up_ops@entry=0x7fb3ca643130)
    at /usr/src/debug/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:889
3  0x00007fb3db5cf99a in glusterfs_create_export (
    fsal_hdl=0x7fb3db7e2490 <GlusterFS+112>, parse_node=0x7fb3ca6387d0,
    err_type=<optimized out>, up_ops=0x7fb3ca643130)
    at /usr/src/debug/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:1011
4  0x00007fb3e11c485f in mdcache_fsal_create_export (
    sub_fsal=0x7fb3db7e2490 <GlusterFS+112>,
    parse_node=parse_node@entry=0x7fb3ca6387d0,
    err_type=err_type@entry=0x7fb3c0bef140, super_up_ops=<optimized out>)
    at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_main.c:281

(gdb) p errno
$1 = 12

Change-Id: I3dd5b84b52962ceb0b5d4f9b4f475bf4aa724292
updates: bz#1626313
Signed-off-by: Kinglong Mee <mijinlong>
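
The backtrace shows pub_glfs_new() taking its error path into pub_glfs_fini() (errno 12, i.e. ENOMEM, in the gdb session suggests an allocation failure), at which point fs->ctx has not been set up yet. Below is a hedged sketch of the defensive pattern, with hypothetical types rather than the real glfs_t/glusterfs_ctx_t:

#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for glfs_t and its context, for illustration. */
struct ctx;                     /* opaque */
struct fs {
        struct ctx *ctx;
        char       *volname;
};

static void fs_cleanup(struct fs *fs)
{
        if (!fs)
                return;
        /* Guard against the partially constructed object seen in the
         * backtrace: ctx may never have been initialized. */
        if (fs->ctx) {
                /* ... tear down ctx-owned resources here ... */
        }
        free(fs->volname);
        free(fs);
}

static struct fs *fs_new(const char *volname)
{
        struct fs *fs = calloc(1, sizeof(*fs));
        if (!fs)
                return NULL;

        fs->volname = strdup(volname);
        if (!fs->volname) {
                errno = ENOMEM;  /* the errno printed in the gdb session */
                fs_cleanup(fs);  /* safe: fs->ctx is still NULL */
                return NULL;
        }

        /* ... allocate and initialize fs->ctx ... */
        return fs;
}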

Comment 17 Shyamsundar 2019-03-25 16:30:38 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

