Bug 1636570 - Cores due to SIGILL during multiplex regression tests
Summary: Cores due to SIGILL during multiplex regression tests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-05 17:49 UTC by Shyamsundar
Modified: 2019-03-25 16:31 UTC (History)
CC List: 3 users

Fixed In Version: glusterfs-6.0
Clone Of:
Environment:
Last Closed: 2019-03-25 16:31:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
Gluster.org Gerrit 21717 (Private: 0, Priority: None, Status: Merged): posix: posix_health_check_thread_proc crash due to priv is NULL (Last Updated: 2018-12-11 11:09:21 UTC)

Description Shyamsundar 2018-10-05 17:49:56 UTC
Description of problem:

Brick-mux enabled regressions fail with a core pattern as below,

Job: https://build.gluster.org/job/regression-test-with-multiplex/886/consoleText

Core was generated by `/build/install/sbin/glusterfsd -s builder102.cloud.gluster.org --volfile-id pat'.
Program terminated with signal 4, Illegal instruction.
#0  0x00007f228e098a44 in _xbegin () at ../nptl/sysdeps/unix/sysv/linux/x86/hle.h:53
53	  asm volatile (".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory");
(gdb) bt
#0  0x00007f228e098a44 in _xbegin () at ../nptl/sysdeps/unix/sysv/linux/x86/hle.h:53
#1  __lll_lock_elision (futex=0x7f22505adb50, adapt_count=0x7f22505adb66, private=128) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-lock.c:62
#2  0x00007f22818f54e4 in posix_health_check_thread_proc (data=0x7f2250001cc0) at /home/jenkins/root/workspace/regression-test-with-multiplex/xlators/storage/posix/src/posix-helpers.c:1940
#3  0x00007f228e08fe25 in start_thread (arg=0x7f222e7fc700) at pthread_create.c:308
#4  0x00007f228d958bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


Job: https://build.gluster.org/job/regression-test-with-multiplex/889/consoleText

Core was generated by `/build/install/sbin/glusterfsd -s builder108.cloud.gluster.org --volfile-id pat'.
Program terminated with signal 4, Illegal instruction.
#0  0x00007fae922b7a44 in _xbegin () at ../nptl/sysdeps/unix/sysv/linux/x86/hle.h:53
53	  asm volatile (".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory");
(gdb) bt
#0  0x00007fae922b7a44 in _xbegin () at ../nptl/sysdeps/unix/sysv/linux/x86/hle.h:53
#1  __lll_lock_elision (futex=0x7fae4c1ad950, adapt_count=0x7fae4c1ad966, private=128) at ../nptl/sysdeps/unix/sysv/linux/x86/elision-lock.c:62
#2  0x00007fae85b144e4 in posix_health_check_thread_proc (data=0x7fae4c001cc0) at /home/jenkins/root/workspace/regression-test-with-multiplex/xlators/storage/posix/src/posix-helpers.c:1940
#3  0x00007fae922aee25 in start_thread (arg=0x7fae337fe700) at pthread_create.c:308
#4  0x00007fae91b77bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

In both of the above cases this happens with the test ./tests/basic/ec/ec-5-2.t (I have observed a few other cores before, but I am unsure whether they came from the same test).

On investigation it appears that posix_fini has already been called for this brick instance in the multiplexed scenario and that the posix priv structure has been freed, so the attempt to take the lock in priv (to update it) crashes with SIGILL.

Here is a short analysis of priv:
(gdb) f 2
#2  0x00007fae85b144e4 in posix_health_check_thread_proc (data=0x7fae4c001cc0) at /home/jenkins/root/workspace/regression-test-with-multiplex/xlators/storage/posix/src/posix-helpers.c:1940
1940	/home/jenkins/root/workspace/regression-test-with-multiplex/xlators/storage/posix/src/posix-helpers.c: No such file or directory.
(gdb) p priv
$5 = (struct posix_private *) 0x7fae4c1ad940
(gdb) p *priv
$6 = {base_path = 0x7fae4c062300 "/d/backends/patchy2", base_path_length = 1275470720, path_max = 32686, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 1275465376, 
        __nusers = 32686, __kind = 1276832128, __spins = 32686, __elision = 0, __list = {__prev = 0x7fae4c047420, __next = 0x7fae4c1b68a0}}, 
      __size = "\000\000\000\000\000\000\000\000\240\016\006L\256\177\000\000\200\351\032L\256\177\000\000 t\004L\256\177\000\000\240h\033L\256\177\000", __align = 0}}, hostname = 0x7fae4c01a0a8 "\004", 
  prev_fetch_time = {tv_sec = 140387448712816, tv_usec = 140387426333216}, init_time = {tv_sec = 140387426334176, tv_usec = 0}, last_landfill_check = 140386576507056, janitor_sleep_duration = 1275470000, 
  janitor_fds = {next = 0x7fae7e8d4be1 <fini>, prev = 0x7fae7e8d4abe <init>}, janitor_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 140387424225926, __wakeup_seq = 0, __woken_seq = 140387772779144, 
      __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = "\000\000\000\000\000\000\000\000\206J\215~\256\177\000\000\000\000\000\000\000\000\000\000\210\312S\223\256\177", '\000' <repeats 17 times>, 
    __align = 0}, janitor_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\000' <repeats 39 times>, __align = 0}, read_value = 0, write_value = 0, nr_files = 0, export_statfs = false, o_direct = false, background_unlink = false, janitor = 0, janitor_present = false, 
  trash_path = 0x0, mount_lock = 0x0, handledir = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {
      tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}}, glusterd_uuid = '\000' <repeats 15 times>, aio_configured = false, 
  aio_init_done = false, aio_capable = false, ctxp = 0x0, aiothread = 0, node_uuid_pathinfo = false, fsyncer = 0, fsyncs = {next = 0x0, prev = 0x0}, fsync_mutex = {__data = {__lock = 0, __count = 0, 
      __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, fsync_cond = {__data = {__lock = 0, 
      __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, fsync_queue_count = 0, 
  batch_fsync_mode = BATCH_NONE, batch_fsync_delay_usec = 0, update_pgfid_nlinks = false, gfid2path = false, gfid2path_sep = "\000\000\000\000\000\000\000", health_check_interval = 0, health_check_timeout = 0, 
  health_check = 0, health_check_active = false, disk_reserve = 0, disk_space_full = 0, disk_space_check = 0, disk_space_check_active = false, shared_brick_count = 0, disable_landfill_purge = false, 
  force_create_mode = 0, force_directory_mode = 0, create_mask = 0, create_directory_mask = 0, max_hardlinks = 0, fips_mode_rchecksum = false, ctime = false}
(gdb) p priv->janitor
$7 = 0
(gdb) p priv->fsyncer 
$8 = 0

On further checking the code, this is entirely possible: the posix_health_check_thread_proc thread is sent a cancel, and the fini work continues. If the thread acts on the cancel late, the private structures may already have been freed by the time it touches them.

The way the thread-active state is tracked and updated inside this thread routine also looks prone to races even without brick multiplex (based on code inspection).

The above needs correction to ensure that this does not crash the brick process.
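
To make the window concrete, here is a stripped-down, standalone sketch of the racy sequence (hypothetical names such as fake_priv and health_check_proc, not the actual GlusterFS sources; build with gcc -pthread): a detached worker locks a mutex inside a heap-allocated priv while the fini path cancels it without waiting and frees priv.

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_priv {
    pthread_mutex_t lock;
};

static void *
health_check_proc(void *data)
{
    struct fake_priv *priv = data;
    unsigned int ret;

    while (1) {
        ret = sleep(1);            /* a cancel is normally acted on here */
        if (ret > 0)
            break;
        /* the real code disables cancellation around the check, so a
         * cancel request that arrives now is only queued, not acted on */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
        pthread_mutex_lock(&priv->lock);   /* if the fini path has already
                                            * freed priv, this locks freed
                                            * memory: the crash seen above
                                            * under lock elision */
        pthread_mutex_unlock(&priv->lock);
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }
    return NULL;
}

int
main(void)
{
    struct fake_priv *priv = calloc(1, sizeof(*priv));
    pthread_t thr;

    pthread_mutex_init(&priv->lock, NULL);
    pthread_create(&thr, NULL, health_check_proc, priv);
    pthread_detach(thr);           /* detached, as in the current code */

    sleep(2);                      /* let the worker run for a while */

    /* the fini path: the cancel is merely queued, nothing waits for the
     * worker to exit, and priv is freed right away */
    pthread_cancel(thr);
    pthread_mutex_destroy(&priv->lock);
    free(priv);

    sleep(5);                      /* the worker can still wake up here and
                                    * dereference the freed priv */
    return 0;
}

Whether this sketch actually faults depends on timing, the allocator, and the libpthread in use; the point is only that nothing in the fini path guarantees the worker has stopped touching priv before it is freed.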

Comment 1 Nithya Balachandran 2018-11-26 08:21:53 UTC
Correct. 


in posix_fini:

    if (priv->health_check_active) {
        priv->health_check_active = _gf_false;
        pthread_cancel(priv->health_check);   <--- this is not synchronous, so posix_fini
                                                   continues after calling this
        priv->health_check = 0;
    }

From the pthread_cancel man page: "the return status of pthread_cancel() merely informs the caller whether the cancellation request was successfully queued."

In posix_health_check_thread_proc ()

    while (1) {
        /* aborting sleep() is a request to exit this thread, sleep()
         * will normally not return when cancelled */
        ret = sleep(interval);
        if (ret > 0)
            break;
        /* prevent thread errors while doing the health-check(s) */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
            <---- if the pthread_cancel is sent here, the thread will continue running
                  until the cancel state is enabled again or it hits a cancellation point.
        /* Do the health-check.*/
        ret = posix_fs_health_check(this);
        if (ret < 0 && priv->health_check_active)
            goto abort;
        if (!priv->health_check_active)
            goto out;
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }



posix_fini needs to wait for the thread to exit before it cleans up priv. The thread is currently detached, so one easy way to solve this is to make it a joinable thread and wait for it to return before proceeding with the fini, along the lines of the sketch below.
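
A rough sketch of that approach (simplified fragments using the same field names as above, not the actual patch; the real change is the Gerrit review linked in this bug):

    /* when spawning the health-check thread: create it joinable, i.e.
     * do not detach it */
    ret = pthread_create(&priv->health_check, NULL,
                         posix_health_check_thread_proc, this);
    if (ret == 0)
        priv->health_check_active = _gf_true;

    /* in posix_fini: cancel, then join, so the thread is guaranteed to
     * be gone before priv is torn down */
    if (priv->health_check_active) {
        priv->health_check_active = _gf_false;
        pthread_cancel(priv->health_check);
        pthread_join(priv->health_check, NULL);   /* blocks until the thread
                                                   * has actually exited */
        priv->health_check = 0;
    }
    /* only after the join is it safe to free priv */

Because pthread_join() does not return until posix_health_check_thread_proc has exited (or been cancelled at a cancellation point), the thread can no longer be holding, or about to take, priv->lock when priv is freed.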

Comment 2 Worker Ant 2018-11-26 09:20:05 UTC
REVIEW: https://review.gluster.org/21717 (posix: posix_health_check_thread_proc crash due to priv is NULL) posted (#1) for review on master by MOHIT AGRAWAL

Comment 3 Worker Ant 2018-12-11 11:09:21 UTC
REVIEW: https://review.gluster.org/21717 (posix: posix_health_check_thread_proc crash due to priv is NULL) posted (#9) for review on master by Amar Tumballi

Comment 4 Shyamsundar 2019-03-25 16:31:17 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

