Description of problem:
Cluster creation task in USM fails due to a timeout. The root cause is a ceph-installer task stuck for nearly ten hours.

# date
Tue May 10 10:21:27 CEST 2016

# ceph-installer task 797f6111-3f77-4906-bc0c-61227d9cf514
--> endpoint: /api/osd/configure
--> succeeded: False
--> stdout: None
--> started: 2016-05-10 01:10:02.380725
--> exit_code: None
--> ended: None
--> command: /bin/ansible-playbook -v -u ceph-installer /usr/share/ceph-ansible/osd-configure.yml -i /tmp/797f6111-3f77-4906-bc0c-61227d9cf514_NN4aJM --extra-vars {"raw_journal_devices": ["/dev/vdb"], "devices": ["/dev/vde"], "cluster": "ceph", "ceph_stable_rh_storage_cdn_install": true, "public_network": "10.70.44.0/22", "fetch_directory": "/var/lib/ceph-installer/fetch", "cluster_network": "10.70.44.0/22", "journal_size": 5120, "raw_multi_journal": true, "fsid": "cce29303-f65d-4124-a7da-fdc286e1cebe", "ceph_stable_rh_storage": true} --skip-tags package-install
--> stderr: None
--> identifier: 797f6111-3f77-4906-bc0c-61227d9cf514

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-5.el7scon.noarch
ceph-installer-1.0.6-1.el7scon.noarch
ceph-base-10.2.0-1.el7cp.x86_64
ceph-common-10.2.0-1.el7cp.x86_64
ceph-osd-10.2.0-1.el7cp.x86_64
ceph-selinux-10.2.0-1.el7cp.x86_64

How reproducible:
Not sure.

Steps to Reproduce:
1. Run cluster creation from USM.

Actual results:
The ceph-installer task is stuck for hours.

Expected results:
The ceph-installer task finishes in a reasonable time, or times out with a proper explanation of the problem.
Daniel to try to reproduce.
Closing this bug, because I wasn't able to reproduce it again. I'll reopen it if I find a similar issue.
Reopening this bug, as I saw the same issue on two clusters today. The symptoms are similar to those mentioned in the description. I still have the cluster running, so I'll try to debug it, gather as much data as I can, and compare it with the logs from comment 2. I'll try to post an update here shortly.
Created attachment 1161165 [details] /var/log/messages
To avoid races, ceph-disk acquires a global lock; this includes ceph-disk list. If one ceph-disk invocation hangs for some reason, the others will just wait for it to complete. You just need to find which ceph-disk process is the oldest and what it is blocked on (strace or something similar), and you'll find the cause of the problem.
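For context on what that global lock is: a plain exclusive flock on a shared lock file (/var/lock/ceph-disk in the process listings later in this bug). A minimal sketch of the pattern in Python, assuming that path; this is an illustration, not the actual ceph-disk code:

import fcntl
import subprocess

LOCK_PATH = '/var/lock/ceph-disk'  # path as seen in the process listings below

def run_locked(cmd):
    # Every invocation takes an exclusive lock on the same file, so if one
    # holder hangs, every later invocation blocks right here, indefinitely.
    with open(LOCK_PATH, 'w') as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # blocks until the current holder lets go
        try:
            return subprocess.call(cmd)       # e.g. ['/usr/sbin/ceph-disk', 'list']
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)  # also released when the fd is closed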
It seems to be related only to OSD nodes with more data disks (OSDs) per node. (It passed twice on the same cluster, on the same nodes, where it failed twice yesterday. The only change was using only 1 Ceph data disk per OSD node.)
As can be seen in comment 8, the problem began with this flock process (pid 19721):

> root 19721 0.0 0.0 107892 596 ? Ss 11:30 0:00 flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vdc1
> root 19724 0.0 0.6 211460 11556 ? S 11:30 0:00 /usr/bin/python /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vdc1

This is the subprocess tree:

flock,19721 /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vdc1
  └─ceph-disk,19724 /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vdc1
      └─ceph-disk,19752 /usr/sbin/ceph-disk --verbose activate /dev/vdc1
          └─ceph-osd,19877 --cluster ceph --mkfs --mkkey -i 14 --monmap /var/lib/ceph/tmp/mnt.lsSCAp/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.lsSCAp --osd-journal /var/lib/ceph/tmp/mnt.lsSCAp/journal --osd-uuid fb9716ea-19c6-40d3-9380-3cc14824fbbd --keyring /var/lib/ceph/tmp/mnt.lsSCAp/keyring --setuser ceph --setgroup ceph
              └─{ceph-osd},19881

It seems the 'flock ...' command was launched by the related ceph-disk@....service unit (ceph-disk in this example). I am not sure when or how, but the command is defined in /usr/lib/systemd/system/ceph-disk@.service. When is the ceph-disk@...service instance created for a particular disk?
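For reference, the flock command above is exactly what one would expect in the ExecStart of that template unit. The sketch below is only a reconstruction from the process listing, not the shipped file (check /usr/lib/systemd/system/ceph-disk@.service on the node for the real contents); the section headers and Type are assumptions, only the ExecStart line is taken from the observed process. The template itself is static; a per-disk instance such as ceph-disk@<escaped-device>.service is started on demand (as far as I can tell, from the udev rules that fire when the OSD partition shows up), and that is the moment the flock / 'ceph-disk trigger --sync' process appears.

# Reconstructed approximation of /usr/lib/systemd/system/ceph-disk@.service
[Unit]
Description=Ceph disk activation: %f

[Service]
Type=oneshot
ExecStart=/usr/bin/flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f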
You need to strace -p 19881, which is apparently blocked on something. You could also gdb -p 19881 to find out where it is, and lsof -p 19881 to show which files are open, in case that helps. It's blocked on something, but on what...
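In case it helps with finding the culprit quickly: below is a small, hypothetical helper (not part of ceph) that lists ceph-disk/ceph-osd/flock processes oldest first, so the right PID to attach strace/gdb/lsof to is easy to spot. It only assumes a Linux /proc filesystem.

#!/usr/bin/env python
import os

INTERESTING = ('ceph-disk', 'ceph-osd', 'flock')

def candidates():
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/cmdline' % pid) as f:
                cmdline = f.read().replace('\0', ' ').strip()
            with open('/proc/%s/stat' % pid) as f:
                stat = f.read()
            # Field 22 (1-based) of /proc/<pid>/stat is the start time in clock
            # ticks; split after ')' because the command name may contain spaces.
            starttime = int(stat.rsplit(')', 1)[1].split()[19])
        except (IOError, OSError, ValueError, IndexError):
            continue  # process exited or is unreadable
        if any(word in cmdline for word in INTERESTING):
            yield starttime, pid, cmdline

for starttime, pid, cmdline in sorted(candidates()):
    print('%8s  %s' % (pid, cmdline))  # oldest first: start with the one at the top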
# strace -p 19877
Process 19877 attached
restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out)
futex(0x7fa8a7438820, FUTEX_WAIT_PRIVATE, 2, {0, 149352576}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7fa8a7438820, FUTEX_WAIT_PRIVATE, 2, {0, 244311408}) = -1 ETIMEDOUT (Connection timed out)
futex(0x7fa8a7438820, FUTEX_WAIT_PRIVATE, 2, {0, 164106848}) = -1 ETIMEDOUT (Connection timed out)

# lsof -p 19877
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ceph-osd 19877 ceph cwd DIR 253,2 4096 128 /
ceph-osd 19877 ceph rtd DIR 253,2 4096 128 /
ceph-osd 19877 ceph txt REG 253,2 14520192 1896129 /usr/bin/ceph-osd
ceph-osd 19877 ceph mem REG 253,2 753232 8417152 /usr/lib64/libsqlite3.so.0.8.6
ceph-osd 19877 ceph mem REG 253,2 261352 8663466 /usr/lib64/libsoftokn3.so
ceph-osd 19877 ceph mem REG 253,2 61928 9441099 /usr/lib64/libnss_files-2.17.so
ceph-osd 19877 ceph mem REG 253,2 11112 8428079 /usr/lib64/liburcu-common.so.1.0.0
ceph-osd 19877 ceph mem REG 253,2 28216 8428077 /usr/lib64/liburcu-cds.so.1.0.0
ceph-osd 19877 ceph mem REG 253,2 28448 8428075 /usr/lib64/liburcu-bp.so.1.0.0
ceph-osd 19877 ceph mem REG 253,2 40552 9735207 /usr/lib64/liblttng-ust-tracepoint.so.0.0.0
ceph-osd 19877 ceph mem REG 253,2 68192 8416251 /usr/lib64/libbz2.so.1.0.6
ceph-osd 19877 ceph mem REG 253,2 20032 8416169 /usr/lib64/libuuid.so.1.3.0
ceph-osd 19877 ceph mem REG 253,2 15768 8415975 /usr/lib64/libplds4.so
ceph-osd 19877 ceph mem REG 253,2 20016 8415974 /usr/lib64/libplc4.so
ceph-osd 19877 ceph mem REG 253,2 182056 8415976 /usr/lib64/libnssutil3.so
ceph-osd 19877 ceph mem REG 253,2 48672 8624411 /usr/lib64/libunwind.so.8.0.1
ceph-osd 19877 ceph mem REG 253,2 2112384 8389556 /usr/lib64/libc-2.17.so
ceph-osd 19877 ceph mem REG 253,2 88720 8388745 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
ceph-osd 19877 ceph mem REG 253,2 1141560 8389562 /usr/lib64/libm-2.17.so
ceph-osd 19877 ceph mem REG 253,2 999936 8415978 /usr/lib64/libstdc++.so.6.0.19
ceph-osd 19877 ceph mem REG 253,2 106448 8624396 /usr/lib64/libboost_iostreams-mt.so.1.53.0
ceph-osd 19877 ceph mem REG 253,2 252704 8390765 /usr/lib64/libblkid.so.1.1.0
ceph-osd 19877 ceph mem REG 253,2 44096 9461559 /usr/lib64/librt-2.17.so
ceph-osd 19877 ceph mem REG 253,2 11384 8624398 /usr/lib64/libboost_random-mt.so.1.53.0
ceph-osd 19877 ceph mem REG 253,2 15592 8513788 /usr/lib64/libboost_system-mt.so.1.53.0
ceph-osd 19877 ceph mem REG 253,2 94304 8584522 /usr/lib64/libboost_thread-mt.so.1.53.0
ceph-osd 19877 ceph mem REG 253,2 19520 8389559 /usr/lib64/libdl-2.17.so
ceph-osd 19877 ceph mem REG 253,2 142304 9461549 /usr/lib64/libpthread-2.17.so
ceph-osd 19877 ceph mem REG 253,2 251784 8415972 /usr/lib64/libnspr4.so
ceph-osd 19877 ceph mem REG 253,2 1220152 9008119 /usr/lib64/libnss3.so
ceph-osd 19877 ceph mem REG 253,2 255104 8428049 /usr/lib64/libfuse.so.2.9.2
ceph-osd 19877 ceph mem REG 253,2 23800 8790630 /usr/lib64/libsnappy.so.1.1.4
ceph-osd 19877 ceph mem REG 253,2 348928 9270607 /usr/lib64/libleveldb.so.1.0.7
ceph-osd 19877 ceph mem REG 253,2 90632 8416074 /usr/lib64/libz.so.1.2.7
ceph-osd 19877 ceph mem REG 253,2 6264 8584488 /usr/lib64/libaio.so.1.0.1
ceph-osd 19877 ceph mem REG 253,2 305136 8624415 /usr/lib64/libtcmalloc.so.4.2.6
ceph-osd 19877 ceph mem REG 253,2 164440 8663471 /usr/lib64/ld-2.17.so
ceph-osd 19877 ceph mem REG 253,2 34742 997567 /etc/ld.so.cache
ceph-osd 19877 ceph 0r CHR 1,3 0t0 1028 /dev/null
ceph-osd 19877 ceph 1w FIFO 0,8 0t0 634603 pipe
ceph-osd 19877 ceph 2w FIFO 0,8 0t0 634604 pipe
ceph-osd 19877 ceph 3r REG 0,18 0 544896 /run/lock/ceph-disk
ceph-osd 19877 ceph 4w REG 253,2 0 117481929 /var/lib/ceph/tmp/ceph-disk.activate.lock
ceph-osd 19877 ceph 5w REG 253,2 0 41974958 /var/log/ceph/ceph-osd.14.log
ceph-osd 19877 ceph 7r CHR 1,9 0t0 1033 /dev/urandom
ceph-osd 19877 ceph 8r CHR 1,9 0t0 1033 /dev/urandom
ceph-osd 19877 ceph 9r REG 253,2 2246 25176371 /usr/share/zoneinfo/Europe/Prague

gdb
(gdb) thread apply all bt

Thread 2 (Thread 0x7fa8a30ab700 (LWP 19881)):
#0 0x00007fa8a5facf4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fa8a5fa8d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00007fa8a5fa8c3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fa8a7442029 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#4 0x00007fa8a7201849 in GetStackTrace_libunwind(void**, int, int) () from /lib64/libtcmalloc.so.4
#5 0x00007fa8a72020be in GetStackTrace(void**, int, int) () from /lib64/libtcmalloc.so.4
#6 0x00007fa8a71f3314 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /lib64/libtcmalloc.so.4
#7 0x00007fa8a71f3633 in tcmalloc::PageHeap::New(unsigned long) () from /lib64/libtcmalloc.so.4
#8 0x00007fa8a71f1f64 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#9 0x00007fa8a71f2148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#10 0x00007fa8a71f21dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#11 0x00007fa8a71f5235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#12 0x00007fa8a7203d5b in tc_malloc () from /lib64/libtcmalloc.so.4
#13 0x00007fa8a45eb3d4 in __tzfile_read () from /lib64/libc.so.6
#14 0x00007fa8a45ea3c4 in tzset_internal () from /lib64/libc.so.6
#15 0x00007fa8a45ead73 in __tz_convert () from /lib64/libc.so.6
#16 0x00007fa8a7fbb93b in ceph::log::Log::_flush(ceph::log::EntryQueue*, ceph::log::EntryQueue*, bool) ()
#17 0x00007fa8a7fbbf49 in ceph::log::Log::flush() ()
#18 0x00007fa8a7fbc1be in ceph::log::Log::entry() ()
#19 0x00007fa8a5fa6dc5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fa8a4631ced in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fa8a764a800 (LWP 19877)):
#0 0x00007fa8a7201573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007fa8a7201447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007fa8a71f2078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4
#3 0x00007fa8a71f2148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#4 0x00007fa8a71f21dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#5 0x00007fa8a71f5235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#6 0x00007fa8a7203d5b in tc_malloc () from /lib64/libtcmalloc.so.4
#7 0x00007fa8a3cedb0c in PL_HashTableRawAdd () from /lib64/libplds4.so
#8 0x00007fa8a410a8ec in SECOID_Init () from /lib64/libnssutil3.so
#9 0x00007fa8a245e2ed in sftk_startup_tests () from /lib64/libsoftokn3.so
#10 0x00007fa8a74503a3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#11 0x00007fa8a7454ab6 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#12 0x00007fa8a74501b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007fa8a74541ab in _dl_open () from /lib64/ld-linux-x86-64.so.2
#14 0x00007fa8a5d9c02b in dlopen_doit () from /lib64/libdl.so.2
#15 0x00007fa8a74501b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#16 0x00007fa8a5d9c62d in _dlerror_run () from /lib64/libdl.so.2
#17 0x00007fa8a5d9c0c1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#18 0x00007fa8a61cfb8b in PR_LoadLibraryWithFlags () from /lib64/libnspr4.so
#19 0x00007fa8a410a41f in loader_LoadLibInReferenceDir () from /lib64/libnssutil3.so
#20 0x00007fa8a410a47d in PORT_LoadLibraryFromOrigin () from /lib64/libnssutil3.so
#21 0x00007fa8a642fc2e in softoken_LoadDSO () from /lib64/libnss3.so
#22 0x00007fa8a61d5b9a in PR_CallOnce () from /lib64/libnspr4.so
#23 0x00007fa8a6437817 in secmod_LoadPKCS11Module () from /lib64/libnss3.so
#24 0x00007fa8a64432ef in SECMOD_LoadModule () from /lib64/libnss3.so
#25 0x00007fa8a64124cb in nss_Init () from /lib64/libnss3.so
#26 0x00007fa8a6412ea1 in NSS_InitContext () from /lib64/libnss3.so
#27 0x00007fa8a8091cb8 in ceph::crypto::init(CephContext*) ()
#28 0x00007fa8a808b229 in CephContext::init_crypto() ()
#29 0x00007fa8a809ae39 in common_init_finish(CephContext*) ()
#30 0x00007fa8a796c001 in main ()
It looks like this could be another instance of http://tracker.ceph.com/issues/13522
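Reading the backtraces above, it does look like that class of problem: thread 1 (LWP 19877) is inside dlopen() of the NSS softoken, so it holds the glibc dynamic-loader lock while waiting for tcmalloc's central free-list spinlock, and thread 2 (LWP 19881, the log flush thread) holds the tcmalloc lock inside GrowHeap() and, through libunwind's TLS access, waits for the loader lock. A purely illustrative sketch of that lock-order inversion in Python (nothing ceph-specific, just the pattern):

import threading
import time

loader_lock = threading.Lock()   # stands in for the glibc dynamic-loader lock
malloc_lock = threading.Lock()   # stands in for tcmalloc's central free-list lock

def dlopen_like_thread():
    with loader_lock:            # dlopen() holds the loader lock...
        time.sleep(0.1)
        with malloc_lock:        # ...and then allocates memory
            pass

def log_flush_like_thread():
    with malloc_lock:            # the allocator holds its own lock...
        time.sleep(0.1)
        with loader_lock:        # ...and then needs the loader lock (TLS access)
            pass

a = threading.Thread(target=dlopen_like_thread)
b = threading.Thread(target=log_flush_like_thread)
a.daemon = b.daemon = True       # let the interpreter exit despite the deadlock
a.start(); b.start()
a.join(2); b.join(2)
print('deadlocked: %s' % (a.is_alive() and b.is_alive()))  # True: neither thread can progress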
Daniel, how frequently do you run into this problem? Was it just once (that's what I would expect) or more than once?
Daniel, I see you may have run into this problem twice. I write "may" because it is unclear whether the problem that prompted the creation of this bug entry was indeed the same problem, since the root cause was not found at the time.
Loic, I checked the logs from the first occurrence of this issue (comment 2, attachment 1155623 [details] - usm5-node3/machine_information) and it looks like the same issue:

|-flock,30745 /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vde1
|   `-ceph-disk,30746 /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vde1
|       `-ceph-disk,30762 /usr/sbin/ceph-disk --verbose activate /dev/vde1
|           `-ceph-osd,30921 --cluster ceph --mkfs --mkkey -i 10 --monmap /var/lib/ceph/tmp/mnt.lHmAb6/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.lHmAb6 --osd-journal /var/lib/ceph/tmp/mnt.lHmAb6/journal --osd-uuid 7732940b-4c2b-482e-a32f-becf6a9b6633 --keyring /var/lib/ceph/tmp/mnt.lHmAb6/keyring --setuser ceph --setgroup ceph
|               `-{ceph-osd},30923
|-flock,30768 /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vde1
|-flock,30895 /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vde1

So I saw it once a few weeks ago, and then a few times this week (at least 4 or 5 times on at least 2 different sets of VMs in different environments). Yesterday and today I unsuccessfully tried 3 times to reproduce it on VMs in an OpenStack environment with 1 data disk + 1 journal disk per OSD VM. Today I was able to reproduce it again on the same VMs, but with 2 Ceph data disks (+ 1 journal disk) per OSD VM. I launched the installation again on the configuration where it was reproducible; feel free to ping me on IRC if it helps you to watch it directly there.
Daniel, thanks for the quick feedback. That's a lot more frequent than I expected. I'm inclined to suggest this should be marked as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1339710 because I can't think of a viable workaround.
This is a blocker; we need more data (frequency and number of tests) to decide whether it remains one.
Daniel,

Would you please give an estimate of how many runs you have done without seeing this issue, and how many times you have seen it?

cheers,
G
Gregory,

On one specific configuration in OpenStack (2 OSDs per node), it happened about 5-6 times out of about 7-8 attempts in the last week. On the same cluster but with only 1 OSD per node, it did not happen in any of about 5-7 attempts.

On other clusters (different environments) I saw it only once or twice, if I remember correctly, but I don't know out of how many attempts. It is also true that most of the clusters my colleagues are testing on have only 1 OSD per node (because there were other issues with more OSDs per node in USM, and they needed a slightly more stable environment for testing other things).
@Daniel it would be great if you could try the following workaround: https://github.com/ceph/ceph/pull/9343/files
Loic, I'll try to test it, but it will take some time.
Thanks Daniel. I've tested that it works, so it's not holding up the merge or anything. Whenever you can confirm that it improves things on your side, that would be most welcome.
Please see Comment 32. This is a blocker for Beta 1 and needs to be resolved ASAP.
I'm working on it, but it takes quite a lot of time and resources, both of which I have in rather limited supply.
It seems the occurrence of this issue also depends on external conditions of the environment where it is tested: while I was previously able to reproduce it 5 times out of 6 attempts, now the score is the opposite (1 occurrence out of 4 attempts), all on the same environment with the same configuration as initially... In this state it's hard to confirm that the workaround works as expected, because it is hard to tell whether the issue didn't happen because of the workaround or simply because it would not have occurred even without the patch.
Daniel, thanks for trying :-) It would be good enough if you can at least confirm that you ran X times with the patch included and never ran into the issue. It does not mean the workaround actually works, but it is a step forward in confirming that it does.
https://github.com/ceph/ceph/pull/9343/commits has a revised commit that includes a test. The code itself is slightly different from 21caf964c05c41324f04dcc06e4ad7350e5cdf38, but only to the extent that it helps with testing. The logic is unchanged.
I'd still like to get some confidence that this code actually fixes the problem or, at the very least, that it doesn't introduce any regressions.
It has been unit tested and is now in master. It will now be used in all integration tests and as time passes we'll be more certain that it does not introduce a regression.
I ran our cluster creation tests on a patched version of ceph_disk, and later on the ceph_disk from ceph-base-10.2.1-13.el7cp.x86_64 (where it is already patched), and I can confirm that I did not notice this issue again.
Merged into jewel. https://github.com/ceph/ceph/pull/9427
I've run similar scenarios multiple times in the last few weeks and this issue didn't occur, so I consider this tested and verified. The latest run was on ceph 10.2.2-16.el7cp.x86_64. >> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html