+++ This bug was initially created as a clone of Bug #2005442 +++

Environment: 3 node AWS cluster, 3 m5.xlarge VMs, Ceph 5.0 installation

This mount command worked on the admin (bootstrap) host of the cluster:

[root@ip-172-31-32-53 ceph]# ceph-fuse /mnt/cephfs/ -n client.admin --client-fs=cephfs01
2021-09-16T17:57:37.325+0000 7ffbb1858200 -1 init, newargv = 0x55d0f420e9e0 newargc=15
ceph-fuse[238730]: starting ceph client
ceph-fuse[238730]: starting fuse
[root@ip-172-31-32-53 ceph]#
[root@ip-172-31-32-53 ceph]# df /mnt/cephfs
Filesystem  1K-blocks  Used  Available  Use%  Mounted on
ceph-fuse   946319360     0  946319360    0%  /mnt/cephfs

However, on the other two nodes within the Ceph cluster, both of the following attempts to mount caused this core dump. The ceph.conf and the keyrings were copied to the /etc/ceph directory.

[root@ip-172-31-42-149 share]# ceph-fuse /mnt/cephfs/ -n client.2 --client-fs=cephfs01
ceph-fuse: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)
[root@ip-172-31-42-149 share]#
[root@ip-172-31-42-149 share]# ceph-fuse /mnt/cephfs/ -n client.admin --client-fs=cephfs01
ceph-fuse: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)

Kernel mounts were attempted on these two client nodes, such as:

# mount -t ceph ip-172-31-35-184:6789:/ /mnt/cephfs-kernel -o name=admin,fs=cphfs01

--- Additional comment from Patrick Donnelly on 2021-09-17 17:02:30 UTC ---

(In reply to mcurrier from comment #0)
> However, on the other two nodes within the Ceph cluster, both of the
> following attempts to mount caused this core dump. The ceph.conf and the
> keyrings were copied to the /etc/ceph directory.
>
> ceph-fuse: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion
> `mutex->__data.__owner == 0' failed.
> Aborted (core dumped)

Can you verify the ceph-fuse versions were the same on both hosts? Do you have a coredump you can share?

> Kernel mounts were attempted on these two client nodes, such as:
>
> # mount -t ceph ip-172-31-35-184:6789:/ /mnt/cephfs-kernel -o
> name=admin,fs=cphfs01

Do the kernel mounts succeed?

--- Additional comment from on 2021-09-20 18:29:30 UTC ---

Hello,

Yes, the kernel mounts succeed.
--- Additional comment from Patrick Donnelly on 2021-09-21 17:21:16 UTC ---

> fs=cphfs01

Is that a typo? In any case, please turn up debugging:

> ceph config set client debug_client 20
> ceph config set client debug_ms 1
> ceph config set client debug_monc 10

and retry the ceph-fuse mounts. Please upload the logs.

--- Additional comment from on 2021-09-22 12:55:34 UTC ---

The version of ceph-fuse is:

[root@ip-172-31-32-53 ~]# ceph-fuse --version
ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)

I installed it the same way on each of the three nodes. However, checking the version on the second (non-bootstrap) node shows the same core dump issue:

[root@ip-172-31-42-149 ~]# ceph-fuse --verison
ceph-fuse: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

On the 3rd node:

[ec2-user@ip-172-31-40-206 ~]$ ceph-fuse --version
ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)

The fs=cphfs01 is part of the command given. It is not a typo; it is the filesystem name. I will attach the core file and logs.

--- Additional comment from on 2021-09-22 13:01:37 UTC ---

--- Additional comment from on 2021-09-22 13:03:41 UTC ---

--- Additional comment from on 2021-09-22 13:05:28 UTC ---

--- Additional comment from on 2021-09-22 13:06:37 UTC ---

--- Additional comment from on 2021-09-22 13:07:24 UTC ---

--- Additional comment from on 2021-09-22 13:08:03 UTC ---

--- Additional comment from on 2021-09-22 13:08:54 UTC ---

--- Additional comment from Ben England on 2021-09-23 17:36:43 UTC ---

Matt, next time create a tarball with all the logs and post one attachment; that is much easier. You can also create an SOS report using the command "sos report", which collects all the RHEL config in one big file automatically.
-ben

--- Additional comment from Venky Shankar on 2021-11-23 04:34:38 UTC ---

(In reply to mcurrier from comment #5)
> Created attachment 1825305 [details]
> Bug 2005442 attachments

Hey Matt,

Did you miss updating with the client log (with `debug client = 20`)? In the meantime, I'll take a look at the core dump.

Cheers,
Venky

--- Additional comment from on 2021-12-01 18:13:02 UTC ---

Hi Venky,

Sorry for the late reply. I missed this earlier.

I looked through my notes and I see I applied this:
ceph config set client debug_client 20

I hope this helps.
Matt

--- Additional comment from Venky Shankar on 2021-12-03 05:04:53 UTC ---

(In reply to mcurrier from comment #14)
> I looked through my notes and I see I applied this:
> ceph config set client debug_client 20

I cannot find the client logs in the attachment, just the core, mgr, audit, and volume logs. I couldn't get a clean backtrace from the core. Could you please check?

--- Additional comment from Venky Shankar on 2021-12-03 09:20:37 UTC ---

Looks like an uninitialized mutex is being locked:

```
#0 0x00007f2bbcd7537f in raise () from /lib64/libc.so.6
#1 0x00007f2bbcd5fdb5 in abort () from /lib64/libc.so.6
#2 0x00007f2bbcd5fc89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007f2bbcd6da76 in __assert_fail () from /lib64/libc.so.6
#4 0x00007f2bbe323b61 in pthread_mutex_lock () from /lib64/libpthread.so.0
#5 0x000055ef2578bdb7 in ?? ()
#6 0x000055ef2652a880 in ?? ()
```

I couldn't get the other stack frames for some reason (I have the required packages installed, though). Matt, client logs would really help (and/or if you can provide the backtrace through gdb too, that would be great).

--- Additional comment from on 2021-12-03 17:08:26 UTC ---

Hi Venky,

This appears to no longer be an issue in this RHCS v5 cluster. I can now mount the ceph-fuse mount points on the other two hosts. I think we should close this bugzilla.
[root@ip-172-31-42-149 testruns]# ll /etc/ceph
total 24
-rw-------. 1 root root  63 Sep 16 15:47 ceph.client.admin.keyring
-rw-r--r--. 1 root root 175 Sep 16 15:47 ceph.conf
-rw-r--r--. 1 root root 184 Sep 16 14:30 ceph.client.2.keyring
-rw-r--r--. 1 root root  41 Sep 16 15:59 ceph.client.2.keyring.tmp
-rw-------. 1 root root 110 Dec  2 19:03 podman-auth.json
-rw-r--r--. 1 root root  92 Sep 16 14:52 rbdmap

2021-12-03T17:02:49.004+0000 7fb9671f3200 -1 init, newargv = 0x5633521c6740 newargc=15
ceph-fuse[935873]: starting ceph client
ceph-fuse[935873]: starting fuse

[root@ip-172-31-42-149 testruns]# df
Filesystem           Type      Size  Used Avail Use% Mounted on
devtmpfs             devtmpfs   16G     0   16G   0% /dev
tmpfs                tmpfs      16G   84K   16G   1% /dev/shm
tmpfs                tmpfs      16G  1.6G   14G  11% /run
tmpfs                tmpfs      16G     0   16G   0% /sys/fs/cgroup
/dev/nvme0n1p2       xfs        10G  6.5G  3.6G  65% /
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/9d34db80512c92b9998999f25843420af062d13baa4958c1444283d5b53ae378/merged
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/d224cd3efeb6a738da7d5bc96b70a9fb98aaf6b6e0ef77b467b4e3b07a6b840a/merged
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/c653abe212907b403d124d56d2a1eb6420916cce645b6d1bdf5336bfab7b976d/merged
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/a04b90fec89ba4190749dd06517ce7ff284c2934982c52ec4dd29d68c60da754/merged
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/589f8ca22f4c66b15586afa03ee4989bbd65fa124f650b81c931759b6567319c/merged
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/203bb3a0b0ba30347593a7ccb825461cda3a16d4f42c6065a6866ed7e124e3a9/merged
tmpfs                tmpfs     3.1G     0  3.1G   0% /run/user/0
172.31.32.53:6789:/  ceph      1.9T  142G  1.7T   8% /mnt/kernel-cephfs
tmpfs                tmpfs     3.1G     0  3.1G   0% /run/user/1000
overlay              overlay    10G  6.5G  3.6G  65% /var/lib/containers/storage/overlay/ddcc4a2c56cb6050fc9252e7c7d2841c9ccd2a6b52df94e3a83f750e1608238c/merged
ceph-fuse            fuse.ceph-fuse 1.9T 142G 1.7T 8% /mnt/cephfs

2021-12-03T17:04:16.871+0000 7f6489dcf200 -1 init, newargv = 0x55f9f78c9930 newargc=15
ceph-fuse[1115149]: starting ceph client
ceph-fuse[1115149]: starting fuse
[root@ip-172-31-40-206 ~]#
[root@ip-172-31-40-206 ~]#
[root@ip-172-31-40-206 ~]# df
Filesystem           Type      Size  Used Avail Use% Mounted on
devtmpfs             devtmpfs   16G     0   16G   0% /dev
tmpfs                tmpfs      16G   84K   16G   1% /dev/shm
tmpfs                tmpfs      16G  1.6G   14G  11% /run
tmpfs                tmpfs      16G     0   16G   0% /sys/fs/cgroup
/dev/nvme0n1p2       xfs        10G  7.1G  3.0G  71% /
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/f19012f83cc96aee042f5e535ebb6acfea5abe5e7aac7e6b38af2d22d32aa283/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/32a0fc30124cc92c19e8b8b8991849cd8faf373219d305def02f7d5f22f39bae/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/e6cc3f32f914f06115d5129f78f1fc1f4804504da0c4886a1d37baa945102a3b/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/480cdbc4eac1d9db38e5a697c21a1d6f8fdc9946c2b575896d586c48f7ecb42b/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/e3d11731bde9033eaf00b7933cebfe223639345feb4365ab18e524757686a9d1/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/c112930d4c94ff9a8156172dea2b0b43497856aeadbd0f55dfb70c5abf2bd539/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/aa435dfbefef1fa0bb9a2deb1b126dc8b569c14dc9bc4c4df8aea7e388a3f365/merged
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/550e0dab60d22208def86704ef7ceb01eedf241c43897cf02ff0dba7235be9e1/merged
tmpfs                tmpfs     3.1G     0  3.1G   0% /run/user/0
172.31.32.53:6789:/  ceph      1.9T  141G  1.7T   8% /mnt/kernel-cephfs
tmpfs                tmpfs     3.1G     0  3.1G   0% /run/user/1000
overlay              overlay    10G  7.1G  3.0G  71% /var/lib/containers/storage/overlay/00b0dc5a7a3b9c316bd63b4733c8f9dc0bd2bd71601ebf294e0c126bbadadc8a/merged
ceph-fuse            fuse.ceph-fuse 1.9T 141G 1.7T 8% /mnt/cephfs
[root@ip-172-31-40-206 ~]#

--- Additional comment from Venky Shankar on 2021-12-03 17:36:31 UTC ---

(In reply to mcurrier from comment #17)
> This appears to no longer be an issue in this RHCS v5 cluster. I can now
> mount the ceph-fuse mount points on the other two hosts. I think we should
> close this bugzilla.

ACK - please reopen if you hit it again.

--- Additional comment from on 2023-08-07 08:25:56 UTC ---

I'm observing the same issue described in this bug in my lab.

* Ceph version:

ceph versions
{
    "mon": {
        "ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)": 5
    },
    "mgr": {
        "ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)": 14
    }
}

* The CephFS file system can be mounted correctly using the kernel client:

df | grep cephfs
10.0.88.144:6789,10.0.88.193:6789,10.0.90.124:6789,10.0.91.14:6789,10.0.94.189:6789:/  9547776  0  9547776  0%  /mnt/cephfs

* However, the ceph-fuse mount fails:

# ceph-fuse -n client.cephfs --client_fs my-filesystem /mnt/ceph-fuse
ceph-fuse: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)

# ceph-fuse version
ceph-fuse: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)

* Backtrace of the core file:

# coredumpctl info 546820
           PID: 546820 (ceph-fuse)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Mon 2023-08-07 04:16:05 EDT (2min 13s ago)
  Command Line: ceph-fuse versions
    Executable: /usr/bin/ceph-fuse
 Control Group: /user.slice/user-1000.slice/session-7.scope
          Unit: session-7.scope
         Slice: user-1000.slice
       Session: 7
     Owner UID: 1000 (quickcluster)
       Boot ID: 44877874079142e981164132ac16023a
    Machine ID: 29565e14434c4afb8b0afd9f014d71a5
      Hostname: rgws-1.nravinargw1.lab.upshift.rdu2.redhat.com
       Storage: /var/lib/systemd/coredump/core.ceph-fuse.0.44877874079142e981164132ac16023a.546820.1691396165000000.lz4
       Message: Process 546820 (ceph-fuse) of user 0 dumped core.

                Stack trace of thread 546820:
                #0  0x00007f66b42a2a4f raise (libc.so.6)
                #1  0x00007f66b4275db5 abort (libc.so.6)
                #2  0x00007f66b4275c89 __assert_fail_base.cold.0 (libc.so.6)
                #3  0x00007f66b429b3a6 __assert_fail (libc.so.6)
                #4  0x00007f66b583ccb1 __pthread_mutex_lock (libpthread.so.0)
                #5  0x0000563393237387 _Z15global_pre_initPKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_St4lessIS5_ESaISt4pairIKS5_S5_EEERSt6vectorIPKcSaISH_EEj18code_environment_ti (ceph-fuse)
                #6  0x0000563393239576 _Z11global_initPKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_St4lessIS5_ESaISt4pairIKS5_S5_EEERSt6vectorIPKcSaISH_EEj18code_environment_tib (ceph-fuse)
                #7  0x000056339314a01e main (ceph-fuse)
                #8  0x00007f66b428eca3 __libc_start_main (libc.so.6)
                #9  0x000056339315173e _start (ceph-fuse)

(gdb) t a a bt
Thread 1 (Thread 0x7f66bfaf6380 (LWP 546820)):
#0  0x00007f66b42a2a4f in raise () from /lib64/libc.so.6
#1  0x00007f66b4275db5 in abort () from /lib64/libc.so.6
#2  0x00007f66b4275c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f66b429b3a6 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f66b583ccb1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#5  0x0000563393237387 in global_pre_init(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int) ()
#6  0x0000563393239576 in global_init(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int, bool) ()
#7  0x000056339314a01e in main ()

* Please note that this issue is not attached to any customer-facing problem, as it was observed while I was working in my lab. Hence it's a low-priority request. Please let me know any other information you might need.

Thank you,
Natalia

--- Additional comment from Venky Shankar on 2023-08-14 04:44:03 UTC ---

This looks like it's locking an uninitialized mutex - checking.

--- Additional comment from Venky Shankar on 2023-11-09 12:14:18 UTC ---

The change is under review: https://github.com/ceph/ceph/pull/54433

It will be backported to the RHCS 6 and RHCS 7 releases.

--- Additional comment from Venky Shankar on 2023-11-10 10:09:28 UTC ---

(In reply to Venky Shankar from comment #21)
> The change is under review: https://github.com/ceph/ceph/pull/54433
>
> It will be backported to the RHCS 6 and RHCS 7 releases.

It will be ported to RHCS 5 too.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 security, bug fix, and enhancement updates) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5960