Bug 1262982 - RFE: mon verify that the store is writeable before participating in election
Summary: RFE: mon verify that the store is writeable before participating in election
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.2
Assignee: Samuel Just
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-14 19:45 UTC by Samuel Just
Modified: 2022-02-21 18:36 UTC (History)

Fixed In Version: RHEL: ceph-0.94.5-1.el7cp Ubuntu: ceph_0.94.5-2redhat1
Doc Type: Enhancement
Doc Text:
Feature: The Ceph monitor process verifies that the mon store is writable before participating in an election. Reason: Prior to this change, a monitor could suffer a disk error that would make its store unwritable, causing commit failures. Result: The Ceph monitor will now stop with an assert() if the leveldb store is unwritable.
Clone Of:
Environment:
Last Closed: 2016-02-29 14:43:34 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 13200 | 0 | None | None | None | Never
Red Hat Issue Tracker | RHCEPH-3489 | 0 | None | None | None | 2022-02-21 18:36:12 UTC
Red Hat Product Errata | RHBA-2016:0313 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 1.3.2 bug fix and enhancement update | 2016-02-29 19:37:43 UTC

Description Samuel Just 2015-09-14 19:45:19 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ken Dreyer (Red Hat) 2015-12-11 22:15:38 UTC
https://github.com/ceph/ceph/pull/6144 was merged upstream, and will ship in v0.94.6. We will cherry-pick that change downstream to ceph-1.3-rhel-patches for RHCS 1.3.2.

Comment 3 Kefu Chai 2015-12-16 05:19:59 UTC
Harish, 

I am afraid that we don't have a knob or trigger to set the read-only/read-write mode for a monstore.

As we know, the monstore is backed by leveldb by default, and the path to the underlying leveldb is something like /var/lib/ceph/mon.foobar/store.db. leveldb tends to create more tables (the .sst files) as more kv pairs are stored.

AFAIK, we can hardly revoke a running process's write access to a file once the file is opened, so probably we can:

1. chmod -w /var/lib/ceph/mon.foobar/store.db
2. for i in `seq 100`; do ceph osd pool create test-pool-$i 1 1; done
3. watch -n1 -d ceph -s # check the quorum status; we should see "1 mons down" as more and more pools are created



And by checking the log of the dead mon, we will see:

 FAILED assert(0 == "failed to write to db")

Simply truncating store.db/*.log to zero will crash leveldb, so it's not an option.
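
The log check mentioned above could be scripted along these lines. This is only a sketch: the function name is made up for illustration, and the log path to pass in is an assumption that depends on the cluster's logging setup. The demonstration runs against a throwaway file rather than a real mon log.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given log file contains the
# write-failure assert quoted in this comment.
mon_hit_write_assert() {
    grep -qF 'FAILED assert(0 == "failed to write to db")' "$1"
}

# Self-contained demonstration on a scratch file (not a real mon log).
demo_log=$(mktemp)
printf '%s\n' ' FAILED assert(0 == "failed to write to db")' > "$demo_log"
if mon_hit_write_assert "$demo_log"; then
    echo "assert found"
else
    echo "assert not found"
fi
rm -f "$demo_log"
```

Using grep -F keeps the parentheses and quotes in the assert text from being interpreted as a regular expression.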

Comment 6 shylesh 2016-02-04 15:00:41 UTC
I was not able to bring down the mon by following the above steps. Kefu wanted to try reproducing again on his setup. I will try verifying this bug once Kefu confirms the exact steps.

@kefu,

Could you please update this bug with the exact steps to verify?

Comment 7 Kefu Chai 2016-02-05 09:47:55 UTC
@shylesh,


- The sstables (underlying table files) in store.db are opened and mmap'ed by leveldb, so we cannot take the read/write access to them away from the monitor at runtime.
- The background compaction does fail, and the "failed to write to db" assertion is triggered, if we "chmod -w store.db". leveldb complains in this case, but it does not return an error on Write() unless "leveldb_paranoid" is true, and this option is "false" by default.
- If we set a file size limit for the monitor process using ulimit, it will receive SIGXFSZ when it exceeds the limit, and the process is terminated, as we don't handle that signal.
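
The ulimit behaviour in the last point can be observed on a scratch file without touching a monitor. The following is a minimal sketch, not part of the verification procedure: the function name is hypothetical, and the unit of "ulimit -f" depends on the shell (512-byte blocks in POSIX sh, 1024-byte blocks in bash), so the exact cutoff is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: write 1 MiB to file $1 in a subshell whose file size
# limit is set to $2 (units are shell-dependent). The writer is expected to
# be stopped before it finishes, by SIGXFSZ unless that signal is ignored.
write_under_fsize_limit() {
    (
        ulimit -c 0 2>/dev/null || true   # avoid leaving a core file behind
        ulimit -f "$2"
        dd if=/dev/zero of="$1" bs=1024 count=1024 2>/dev/null
    )
}

scratch=$(mktemp)
if write_under_fsize_limit "$scratch" 8; then
    echo "write completed (limit not reached)"
else
    echo "writer stopped, exit status $?"
fi
rm -f "$scratch"
```

The limit is set inside a subshell so that it applies only to the writer and not to the calling shell.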

Maybe we should do it the hard way: create a loopback fs image from a local file and mount it on store.db. The image should be small enough, say 1MB, that we don't need to wait too long before the assertion failure pops up.
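
A rough sketch of the loopback idea follows. It only prints the commands instead of running them, since they need root and a stopped monitor. The function name, image path, mount point and choice of ext2 are all assumptions made for illustration, not steps confirmed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that would back a
# store.db directory with a small loopback filesystem.
#   $1: image file, $2: store.db path, $3: image size in MiB
# Note: mounting over store.db hides its current contents, so the existing
# store would have to be copied into the new filesystem while the monitor
# is stopped (not shown here).
print_loopback_plan() {
    echo "dd if=/dev/zero of=$1 bs=1M count=$3"
    echo "mkfs.ext2 -q -F $1"
    echo "mount -o loop $1 $2"
}

# Placeholder paths and the 1MB size suggested above.
print_loopback_plan /tmp/monstore.img /var/lib/ceph/mon.foobar/store.db 1
```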

Comment 8 Kefu Chai 2016-02-09 15:46:27 UTC
MonitorDBStore.h:285,
set var r=-1

MonitorDBStore.h:570,
set var r=-1

Comment 9 shylesh 2016-02-09 16:53:54 UTC
With Kefu's help, I followed these steps to verify the behaviour.

1. Attach gdb to the mon process.
2. Put breakpoints in the mon at the locations mentioned in comment 8.
3. When a breakpoint is hit, set the value of r to -1, then continue.
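
The three steps above could be captured in a gdb command file, sketched below. The file locations and "set var r=-1" come from comment 8 and are specific to the build under test. The helper name is hypothetical, and the gdb invocation is printed rather than run because attaching needs a live ceph-mon process.

```shell
#!/bin/sh
# Hypothetical helper: write a gdb command file that sets r to -1 at the two
# locations from comment 8 and then lets the monitor continue.
write_gdb_commands() {
    cat > "$1" <<'EOF'
break MonitorDBStore.h:285
commands
  set var r=-1
  continue
end
break MonitorDBStore.h:570
commands
  set var r=-1
  continue
end
continue
EOF
}

cmdfile=$(mktemp)
write_gdb_commands "$cmdfile"
# Assumes a single ceph-mon process on the host; adjust the PID lookup otherwise.
echo "gdb -p \$(pidof ceph-mon) -x $cmdfile"
```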


Result:
====
Mon dies with SIGABRT.

  -1> 2016-02-09 15:50:31.640532 7fef59325700  2 -- 10.8.128.105:6789/0 >> :/0 pipe(0x4347000 sd=28 :6789 s=4 pgs=0 cs=0 l=0 c=0x45747e0).fault (0) Success
     0> 2016-02-09 15:50:31.646012 7fef5cd34700 -1 mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fef5cd34700 time 2016-02-09 15:50:31.632626
mon/MonitorDBStore.h: 295: FAILED assert(0 == "failed to write to db")

 ceph version 0.94.5-8.el7cp (deef183a81111fa5e128ec88c90a32c9587c615d)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7b8ba5]
 2: (MonitorDBStore::apply_transaction(std::tr1::shared_ptr<MonitorDBStore::Transaction>)+0x6c9) [0x564c79]
 3: (Elector::bump_epoch(unsigned int)+0x168) [0x6917d8]
 4: (Elector::handle_propose(MMonElection*)+0x200) [0x691c40]
 5: (Elector::dispatch(Message*)+0xc63) [0x6956f3]
 6: (Monitor::dispatch(MonSession*, Message*, bool)+0x6bb) [0x596eeb]
 7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5975d6]
 8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b6ad3]
 9: (DispatchQueue::entry()+0x62a) [0x8a8afa]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a0ded]
 11: (()+0x7dc5) [0x7fef63d4fdc5]
 12: (clone()+0x6d) [0x7fef6283021d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Kefu had concerns regarding the verification approach using GDB. If somebody can confirm that verifying the bug this way with GDB is OK, and that the results are as expected, then I will mark this bug as verified.

@Ken,

Could you please draw a conclusion on this?

Comment 10 Ken Dreyer (Red Hat) 2016-02-09 17:37:03 UTC
That approach sounds fine to me, Shylesh. We expect the monitor to die with that assert (see the doc text field).

Comment 11 shylesh 2016-02-09 17:44:08 UTC
Verified on ceph-0.94.5-8.el7cp.x86_64

Comment 13 errata-xmlrpc 2016-02-29 14:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313

