Description of problem:
When a flatten is done on the clone during bench-write, rbd crashes with a segmentation fault.

Version-Release number of selected component (if applicable):
Ceph 10.1.1

How reproducible:
Always

Steps to Reproduce:
1. Create an rbd image, write some data, create a snap, and protect it.
2. Create a clone, and create a snap on it.
3. Start bench-write on the clone, and then start flatten. RBD crashes.

Actual results:
rbd crash is seen.

Expected results:
Even if this is not a supported scenario, it should be handled gracefully.

Additional info:

[root@magna080 ~]# rbd create Tejas/b1 --size 20G --image-feature layering,deep-flatten,object-map,fast-diff,exclusive-lock
[root@magna080 ~]# rbd ls -l Tejas
NAME       SIZE   PARENT          FMT PROT LOCK
b1         20480M                   2
cln        51200M Tejas/imgio@s1    2
imgio      51200M                   2
imgio@s1   51200M                   2 yes
[root@magna080 ~]# rbd snap create Tejas/b1@s1
[root@magna080 ~]# rbd snap protect Tejas/b1@s1
[root@magna080 ~]# rbd clone Tejas/b1@s1 Tejas/cln1 --image-feature layering,deep-flatten,fast-diff,object-map,exclusive-lock
[root@magna080 ~]# rbd snap create Tejas/c1@s2
[root@magna080 ~]# rbd ls -l Tejas
NAME       SIZE   PARENT          FMT PROT LOCK
b1         20480M                   2
b1@s1      20480M                   2 yes
c1         20480M Tejas/b1@s1       2
c1@s2      20480M Tejas/b1@s1       2
cln        51200M Tejas/imgio@s1    2
imgio      51200M                   2
imgio@s1   51200M                   2 yes
[root@magna080 ~]# rbd bench-write Tejas/c1
bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC     BYTES/SEC
    1     17760  15950.51   65333292.38
    2     31429  15629.65   64019054.23
    3     42086  13979.88   57261593.23
    4     50589  12467.14   51065405.08
    5     59024  11738.08   48079179.17
    6     69654  10583.35   43349403.61
    7     85448  10764.23   44090296.24
    8     98077  11037.82   45210896.94
    9    109663  11858.98   48574396.06
   10    114938  10180.47   41699187.63
   11    121117  10261.95   42032936.55
   12    129627   8791.30   36009168.47
   13    137002   7882.85   32288140.16
   14    152782   8686.71   35580762.05
   15    167413  11514.71   47164251.05
   16    176866  11118.85   45542827.88
   17    184264  11041.19   45224716.02
   18    187511   9959.37   40793562.29
   19    190675   7153.20   29299490.49
   20    192821   5050.29   20685992.49
   21    194953   3622.35   14837153.76
   22    201266   3248.50   13305841.29
   23    205507   3650.07   14950689.25
   24    211847   4318.73   17689500.71
   25    216092   4496.31   18416876.09
   26    219299   4685.22   19190663.60
   27    223551   4584.67   18778805.12
   28    229907   4844.05   19841223.47
   29    237270   5132.72   21023608.74
   30    240447   5161.99   21143492.33
   31    243645   4935.39   20215367.20
   32    247864   4609.86   18882000.58
   33    251023   4215.27   17265759.52
   34    254227   3459.52   14170186.42
   35    259476   3657.96   14982999.34
elapsed:    36  ops:   262144  ops/sec:  7136.75  bytes/sec: 29232112.21
*** Caught signal (Segmentation fault) **
 in thread 7f5ee2bca700 thread_name:fn_anonymous

 ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae)
 1: (()+0x1d8dea) [0x7f5f02302dea]
 2: (()+0xf100) [0x7f5eee6ff100]
 3: (()+0x221a04) [0x7f5ef880aa04]
 4: (()+0x91b07) [0x7f5ef867ab07]
 5: (()+0x82554) [0x7f5ef866b554]
 6: (()+0x82a85) [0x7f5ef866ba85]
 7: (()+0x134049) [0x7f5ef871d049]
 8: (()+0x87516) [0x7f5ef8670516]
 9: (()+0x8765b) [0x7f5ef867065b]
 10: (()+0x71f29) [0x7f5ef865af29]
 11: (()+0x815d7) [0x7f5ef866a5d7]
 12: (()+0x89979) [0x7f5ef8672979]
 13: (()+0x8bf0a) [0x7f5ef8674f0a]
 14: (()+0x9d00d) [0x7f5eeef0d00d]
 15: (()+0x85529) [0x7f5eeeef5529]
 16: (()+0x16eb46) [0x7f5eeefdeb46]
 17: (()+0x7dc5) [0x7f5eee6f7dc5]
 18: (clone()+0x6d) [0x7f5eec80d28d]
2016-04-12 10:35:20.101837 7f5ee2bca700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5ee2bca700 thread_name:fn_anonymous

 ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae)
 1: (()+0x1d8dea) [0x7f5f02302dea]
 2: (()+0xf100) [0x7f5eee6ff100]
 3: (()+0x221a04) [0x7f5ef880aa04]
 4: (()+0x91b07) [0x7f5ef867ab07]
 5: (()+0x82554) [0x7f5ef866b554]
 6: (()+0x82a85) [0x7f5ef866ba85]
 7: (()+0x134049) [0x7f5ef871d049]
 8: (()+0x87516) [0x7f5ef8670516]
 9: (()+0x8765b) [0x7f5ef867065b]
 10: (()+0x71f29) [0x7f5ef865af29]
 11: (()+0x815d7) [0x7f5ef866a5d7]
 12: (()+0x89979) [0x7f5ef8672979]
 13: (()+0x8bf0a) [0x7f5ef8674f0a]
 14: (()+0x9d00d) [0x7f5eeef0d00d]
 15: (()+0x85529) [0x7f5eeeef5529]
 16: (()+0x16eb46) [0x7f5eeefdeb46]
 17: (()+0x7dc5) [0x7f5eee6f7dc5]
 18: (clone()+0x6d) [0x7f5eec80d28d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -14> 2016-04-12 10:34:43.260938 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command perfcounters_dump hook 0x7f5f0bdcbb90
   -13> 2016-04-12 10:34:43.260949 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command 1 hook 0x7f5f0bdcbb90
   -12> 2016-04-12 10:34:43.260954 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command perf dump hook 0x7f5f0bdcbb90
   -11> 2016-04-12 10:34:43.260956 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command perfcounters_schema hook 0x7f5f0bdcbb90
   -10> 2016-04-12 10:34:43.260959 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command 2 hook 0x7f5f0bdcbb90
    -9> 2016-04-12 10:34:43.260961 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command perf schema hook 0x7f5f0bdcbb90
    -8> 2016-04-12 10:34:43.260965 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command perf reset hook 0x7f5f0bdcbb90
    -7> 2016-04-12 10:34:43.260981 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command config show hook 0x7f5f0bdcbb90
    -6> 2016-04-12 10:34:43.260984 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command config set hook 0x7f5f0bdcbb90
    -5> 2016-04-12 10:34:43.260990 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command config get hook 0x7f5f0bdcbb90
    -4> 2016-04-12 10:34:43.260994 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command config diff hook 0x7f5f0bdcbb90
    -3> 2016-04-12 10:34:43.260996 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command log flush hook 0x7f5f0bdcbb90
    -2> 2016-04-12 10:34:43.261000 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command log dump hook 0x7f5f0bdcbb90
    -1> 2016-04-12 10:34:43.261005 7f5f020f0d80  5 asok(0x7f5f0bdc6e00) register_command log reopen hook 0x7f5f0bdcbb90
     0> 2016-04-12 10:35:20.101837 7f5ee2bca700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5ee2bca700 thread_name:fn_anonymous

 ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae)
 1: (()+0x1d8dea) [0x7f5f02302dea]
 2: (()+0xf100) [0x7f5eee6ff100]
 3: (()+0x221a04) [0x7f5ef880aa04]
 4: (()+0x91b07) [0x7f5ef867ab07]
 5: (()+0x82554) [0x7f5ef866b554]
 6: (()+0x82a85) [0x7f5ef866ba85]
 7: (()+0x134049) [0x7f5ef871d049]
 8: (()+0x87516) [0x7f5ef8670516]
 9: (()+0x8765b) [0x7f5ef867065b]
 10: (()+0x71f29) [0x7f5ef865af29]
 11: (()+0x815d7) [0x7f5ef866a5d7]
 12: (()+0x89979) [0x7f5ef8672979]
 13: (()+0x8bf0a) [0x7f5ef8674f0a]
 14: (()+0x9d00d) [0x7f5eeef0d00d]
 15: (()+0x85529) [0x7f5eeeef5529]
 16: (()+0x16eb46) [0x7f5eeefdeb46]
 17: (()+0x7dc5) [0x7f5eee6f7dc5]
 18: (clone()+0x6d) [0x7f5eec80d28d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 5 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   0/ 0 ms
   0/ 0 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file /var/log/rbd-clients/qemu-guest-16083.log
--- end dump of recent events ---
Segmentation fault (core dumped)
[root@magna080 ~
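For reference, the reproduction steps can be sketched as a short script. This is not the exact session from the report: the clone is named c1 throughout for consistency, and a DRY_RUN guard (my addition) makes it default to printing the rbd commands so the sketch can be read without a live cluster; set DRY_RUN=0 against a test pool to actually run it.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps from this report (Ceph 10.1.1).
# Defaults to a dry run that only prints the rbd commands;
# set DRY_RUN=0 to execute against a real cluster.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
POOL=Tejas                      # pool name taken from the report
FEATURES=layering,deep-flatten,object-map,fast-diff,exclusive-lock

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# 1. Create the parent image, snapshot it, and protect the snapshot.
run rbd create "$POOL/b1" --size 20G --image-feature "$FEATURES"
run rbd snap create "$POOL/b1@s1"
run rbd snap protect "$POOL/b1@s1"

# 2. Clone the protected snapshot and create a snap on the clone.
run rbd clone "$POOL/b1@s1" "$POOL/c1" --image-feature "$FEATURES"
run rbd snap create "$POOL/c1@s2"

# 3. Start bench-write on the clone, then start flatten while it runs.
run rbd bench-write "$POOL/c1" &
run rbd flatten "$POOL/c1"
wait
```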
Please attach the core dump or provide backtraces with symbols. I haven't been able to recreate the segfault, but I was able to create a different issue when running against a starved OSD.
Cancel that ... I believe I recreated the crash you witnessed. It occurs when "rbd bench-write" completes while the flatten is still in progress.
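That timing window can be hit deliberately by giving bench-write a small --io-total so it exits while the flatten is still copying up parent objects. A minimal sketch (the image name and io-total here are illustrative, not a verified reproducer; dry-run by default since it needs a live cluster):

```shell
#!/usr/bin/env bash
# Sketch: make "rbd bench-write" finish while a flatten of the same
# clone is still in progress -- the timing described above.
# DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
IMG=${IMG:-Tejas/c1}            # an existing clone that still has a parent

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# Small io-total => bench-write completes early, mid-flatten.
run rbd bench-write "$IMG" --io-total 64M &
bench=$!
run rbd flatten "$IMG"          # long-running while parent objects are copied up
wait "$bench"
```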
Upstream PR: https://github.com/ceph/ceph/pull/8565
*** Bug 1326650 has been marked as a duplicate of this bug. ***
The above PR is present in v10.2.0.
Unable to reproduce the crash. Moving to Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html