Bug 1381463

Summary: | RADOS bench crashes while doing sequential read operations | | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vasishta <vashastr>
Component: | RADOS | Assignee: | Loic Dachary <ldachary>
Status: | CLOSED ERRATA | QA Contact: | Vidushi Mishra <vimishra>
Severity: | medium | Priority: | unspecified
Version: | 1.3.3 | CC: | ceph-eng-bugs, dzafman, hnallurv, jdurgin, kchai, kdreyer
Target Milestone: | rc | Target Release: | 2.3
Hardware: | Unspecified | OS: | Unspecified
Fixed In Version: | RHEL: ceph-10.2.7-2.el7cp; Ubuntu: ceph_10.2.7-3redhat1xenial | Type: | Bug
Last Closed: | 2017-06-19 13:27:09 UTC | | |

```
(gdb) bt
#0  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:69
#1  0x0000000000872829 in ObjBencher::seq_read_bench (this=0x7fffffffe730, seconds_to_run=20, num_objects=36, concurrentios=16, pid=1587, no_verify=false) at common/obj_bencher.cc:592
#2  0x00000000008703d5 in ObjBencher::aio_bench (this=0x7fffffffe730, operation=2, secondsToRun=20, maxObjectsToCreate=0, concurrentios=16, op_size=4194304, cleanup=true, run_name=0x0, no_verify=false) at common/obj_bencher.cc:208
Python Exception <class 'IndexError'> list index out of range: 
#3  0x000000000084b340 in rados_tool_common (opts=std::map with 1 elements, nargs=std::vector of length 3, capacity 8 = {...}) at tools/rados/rados.cc:2271
#4  0x0000000000850e61 in main (argc=6, argv=0x7fffffffecd8) at tools/rados/rados.cc:2732
(gdb) frame 1
#1  0x0000000000872829 in ObjBencher::seq_read_bench (this=0x7fffffffe730, seconds_to_run=20, num_objects=36, concurrentios=16, pid=1587, no_verify=false) at common/obj_bencher.cc:592
592	      if (memcmp(data.object_contents, cur_contents->c_str(), data.object_size) != 0) {
(gdb) list
587	    // invalidate internal crc cache
588	    cur_contents->invalidate_crc();
589	
590	    if (!no_verify) {
591	      snprintf(data.object_contents, data.object_size, "I'm the %16dth object!", current_index);
592	      if (memcmp(data.object_contents, cur_contents->c_str(), data.object_size) != 0) {
593	        cerr << name[slot] << " is not correct!" << std::endl;
594	        ++errors;
595	      }
596	    }
(gdb) print cur_contents->c_str()
$3 = 0x0
```

http://tracker.ceph.com/issues/17526 was incorrectly set to Resolved and not backported. This is now fixed and should be ready for 2.3.

The rados man page update is included in 10.2.6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497
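In the gdb session, `memcmp` at obj_bencher.cc:592 receives `cur_contents->c_str() == 0x0`, which is consistent with the OSDs answering the reads with `-2` (ENOENT) as shown in the crash log. The C++ sketch below illustrates one way a verification step could refuse to compare an empty or short buffer. It is an illustration under assumptions, not the patch that closed this bug: `verify_bench_object` is a hypothetical helper, and its `data`, `len`, and `object_size` parameters stand in for `cur_contents->c_str()`, `cur_contents->length()`, and `data.object_size`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not Ceph's code): checks a read-back benchmark object
// against the expected pattern without handing a null or short buffer to memcmp.
//   data        -> stand-in for cur_contents->c_str()
//   len         -> stand-in for cur_contents->length()
//   object_size -> stand-in for data.object_size
bool verify_bench_object(const char* data, std::size_t len,
                         std::size_t object_size, int index) {
  // A failed read (e.g. ENOENT) can leave an empty buffer whose c_str() is
  // null, as in the gdb session; report a mismatch instead of dereferencing it.
  if (data == nullptr || len < object_size) {
    return false;
  }
  // Rebuild the expected contents using the format string from line 591.
  // Bytes after the formatted text are zero in this sketch.
  std::string expected(object_size, '\0');
  std::snprintf(&expected[0], object_size, "I'm the %16dth object!", index);
  return std::memcmp(expected.data(), data, object_size) == 0;
}
```

With a length check in front of `memcmp`, a missing object would take the ordinary `++errors` path from the listing rather than faulting on a null pointer.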

Created attachment 1207098 (out.txt)

Description of problem:
rados bench read crashes when it is run against a pool on which the rados bench write command was executed from a different node.

Version-Release number of selected component (if applicable):
ceph version 0.94.9-3.el7cp

How reproducible:
Always

Steps to Reproduce:
1. Create a pool from node 1.
2. Run a rados bench write test on the newly created pool from node 1:
   `rados bench -p <pool_name> 10 write --no-cleanup`
3. From a different node (node 2), run a sequential read test on the same pool:
   `rados bench -p <pool_name> 10 seq`

Actual results:
(See the attachment out.txt for the entire log.)

```
+known_if_redirected e249) v5 -- ?+0 0x35f7090 con 0x35ca0d0
    -5> 2016-10-03 12:13:13.112353 7fe13c2cd700  1 -- 10.8.128.111:0/3815556558 <== osd.5 10.8.128.110:6808/4307 2 ==== osd_op_reply(2 benchmark_data_magna111_19058_object0 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 204+0+0 (3295514241 0 0) 0x7fe12c000ca0 con 0x35b0000
    -4> 2016-10-03 12:13:13.112429 7fe13c2cd700  1 -- 10.8.128.111:0/3815556558 <== osd.5 10.8.128.110:6808/4307 3 ==== osd_op_reply(4 benchmark_data_magna111_19058_object2 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 204+0+0 (1055835100 0 0) 0x7fe12c000bc0 con 0x35b0000
    -3> 2016-10-03 12:13:13.112571 7fe13c2cd700  1 -- 10.8.128.111:0/3815556558 <== osd.5 10.8.128.110:6808/4307 4 ==== osd_op_reply(9 benchmark_data_magna111_19058_object7 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 204+0+0 (2382207108 0 0) 0x7fe12c000bc0 con 0x35b0000
    -2> 2016-10-03 12:13:13.112676 7fe13c2cd700  1 -- 10.8.128.111:0/3815556558 <== osd.5 10.8.128.110:6808/4307 5 ==== osd_op_reply(11 benchmark_data_magna111_19058_object9 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 204+0+0 (614169484 0 0) 0x7fe12c000bc0 con 0x35b0000
    -1> 2016-10-03 12:13:13.112833 7fe126ef8700  1 -- 10.8.128.111:0/3815556558 <== osd.4 10.8.128.110:6804/4088 1 ==== osd_op_reply(3 benchmark_data_magna111_19058_object1 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 204+0+0 (1600347836 0 0) 0x7fe100000b00 con 0x35ca0d0
     0> 2016-10-03 12:13:13.112853 7fe14766f7c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7fe14766f7c0

 ceph version 0.94.9-3.el7cp (7358f71bebe44c463df4d91c2770149e812bbeaa)
 1: rados() [0x4efb72]
 2: (()+0xf100) [0x7fe144330100]
 3: (()+0x15dd20) [0x7fe14348ed20]
 4: (ObjBencher::seq_read_bench(int, int, int, int, bool)+0xbeb) [0x4e274b]
 5: (ObjBencher::aio_bench(int, int, int, int, int, bool, char const*, bool)+0x307) [0x4e7ad7]
 6: (main()+0x9195) [0x4c5f85]
 7: (__libc_start_main()+0xf5) [0x7fe143352b15]
 8: rados() [0x4ca549]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file 
--- end dump of recent events ---
```

Expected results:
rados bench read should complete without errors.

Observation:
The sequential read (rados bench seq) looks for objects whose names contain the hostname of the node running the read, but the pool does not contain any objects named after that node.
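The observation can be illustrated with a small C++ sketch. The naming pattern `benchmark_data_<hostname>_<pid>_object<index>` is inferred from the object names in the log (for example `benchmark_data_magna111_19058_object0`) and is an assumption, not a quote of Ceph's source. `bench_object_name` and `count_found` are hypothetical helpers, and the `std::set` is a stand-in for the pool's object listing.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical reconstruction of the benchmark object naming, inferred from the
// log entry "benchmark_data_magna111_19058_object0":
//   benchmark_data_<hostname>_<pid>_object<index>
std::string bench_object_name(const std::string& hostname, int pid, int index) {
  return "benchmark_data_" + hostname + "_" + std::to_string(pid) +
         "_object" + std::to_string(index);
}

// Counts how many of the names a reader would request actually exist.
// `pool` stands in for the objects the writer left in the pool; the reader
// builds names from its own hostname plus the writer's pid.
int count_found(const std::set<std::string>& pool,
                const std::string& reader_host, int writer_pid,
                int num_objects) {
  int found = 0;
  for (int i = 0; i < num_objects; ++i) {
    if (pool.count(bench_object_name(reader_host, writer_pid, i)) != 0) {
      ++found;
    }
  }
  return found;
}
```

If the pool is filled with names built from a writer host (a hypothetical `node1`), `count_found` returns 0 for a reader whose hostname is `magna111`, which matches the ENOENT replies in the log.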