Bug 1262982 - RFE: mon verify that the store is writeable before participating in election
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Hardware/OS: Unspecified
Priority/Severity: unspecified
Target Milestone: rc
Target Release: 1.3.2
Assigned To: Samuel Just
Keywords: FutureFeature
Depends On:
Reported: 2015-09-14 15:45 EDT by Samuel Just
Modified: 2017-07-30 11:20 EDT
CC: 9 users

See Also:
Fixed In Version: RHEL: ceph-0.94.5-1.el7cp Ubuntu: ceph_0.94.5-2redhat1
Doc Type: Enhancement
Doc Text:
Feature: The Ceph monitor verifies that the mon store is writable before participating in an election.
Reason: Prior to this change, a monitor could suffer a disk error that made its store unwritable, causing commit failures.
Result: The Ceph monitor now stops with an assert() if the leveldb store is unwritable.
Story Points: ---
Clone Of:
Last Closed: 2016-02-29 09:43:34 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Ceph Project Bug Tracker 13200

Description Samuel Just 2015-09-14 15:45:19 EDT
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:
Comment 2 Ken Dreyer (Red Hat) 2015-12-11 17:15:38 EST
https://github.com/ceph/ceph/pull/6144 was merged upstream, and will ship in v0.94.6. We will cherry-pick that change downstream to ceph-1.3-rhel-patches for RHCS 1.3.2.
Comment 3 Kefu Chai 2015-12-16 00:19:59 EST

I am afraid we don't have a knob or trigger to switch a monstore between read-only and read-write mode.

As we know, the monstore is backed by leveldb by default, and the underlying leveldb lives at a path like /var/lib/ceph/mon.foobar/store.db. leveldb tends to create more table files (the .sst files) as more key/value pairs are stored.

AFAIK, we can hardly revoke a running process's write access to a file once that file is opened, so we could instead:

1. chmod -w /var/lib/ceph/mon.foobar/store.db
2. for i in `seq 100`; do ceph osd pool create test-pool-$i 1 1; done
3. watch -n1 -d ceph -s # check the quorum status; we will see "1 mons down" as more and more pools are created

By checking the log of the dead mon, we will see:

 FAILED assert(0 == "failed to write to db")

Simply truncating store.db/*.log to zero will crash leveldb, so that is not an option.
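The failure mode behind step 1 can be demonstrated in isolation: once the directory loses write permission, new table files can no longer be created there, even though already-open files stay writable. A minimal sketch, using a temp directory to stand in for /var/lib/ceph/mon.foobar/store.db (note that root bypasses permission checks, so the chmod approach only bites a mon running as a non-root user):

```shell
# Stand-in for the mon's store.db directory: once it is read-only,
# leveldb can no longer create new .sst table files in it, which is
# what eventually trips the "failed to write to db" assert.
d=$(mktemp -d)
touch "$d/000001.sst"                     # an existing, already-open table
chmod -w "$d"
if touch "$d/000002.sst" 2>/dev/null; then
    out=created                           # happens when running as root
else
    out=denied
fi
echo "new sst creation: $out"
chmod +w "$d"
rm -rf "$d"
```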
Comment 6 shylesh 2016-02-04 10:00:41 EST
I was not able to bring down the mon by following the above steps. Kefu wanted to try reproducing again on his setup. I will verify this bug once Kefu confirms the exact steps.

Could you please update this bug with the exact steps to verify?
Comment 7 Kefu Chai 2016-02-05 04:47:55 EST

- The sstables (underlying table files) in store.db are opened and mmap'ed by leveldb, so we cannot take read/write access to them away from the monitor at runtime.
- With "chmod -w store.db", the background compaction does fail and the "failed to write to db" assertion is triggered. leveldb complains in this case, but it does not return an error from Write() unless "leveldb_paranoid" is true, and that option is "false" by default.
- If we set a file-size limit for the monitor process using ulimit, it receives SIGXFSZ when it breaches the quota, and the process is terminated, since we don't handle that signal.

Maybe we should do it the hard way: create a loopback fs image backed by a local file and mount it over store.db. The image should be small enough, say 1 MB, that we don't have to wait too long for the assertion failure to pop up.
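The ulimit behaviour mentioned above can be seen with any unsuspecting writer, not just the monitor. A minimal sketch, with dd standing in for the mon's store writes:

```shell
# A file-size rlimit (ulimit -f, in 512-byte blocks) delivers SIGXFSZ once
# a write would extend the file past the limit; a process that does not
# handle the signal is terminated, so the shell reports 128+SIGXFSZ.
f="${TMPDIR:-/tmp}/xfsz_demo.$$"
( ulimit -f 1; dd if=/dev/zero of="$f" bs=4096 count=10 ) 2>/dev/null
status=$?
rm -f "$f"
echo "writer exit status: $status"        # >128 means killed by a signal
```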
Comment 8 Kefu Chai 2016-02-09 10:46:27 EST
set var r=-1
Comment 9 shylesh 2016-02-09 11:53:54 EST
With Kefu's help I followed these steps to verify the behaviour:

1. Attach gdb to the mon process.
2. Set the breakpoints in the mon as mentioned in comment 8.
3. When a breakpoint is hit, set r to -1, then continue.

The mon dies with SIGABRT.

  -1> 2016-02-09 15:50:31.640532 7fef59325700  2 -- >> :/0 pipe(0x4347000 sd=28 :6789 s=4 pgs=0 cs=0 l=0 c=0x45747e0).fault (0) Success
     0> 2016-02-09 15:50:31.646012 7fef5cd34700 -1 mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fef5cd34700 time 2016-02-09 15:50:31.632626
mon/MonitorDBStore.h: 295: FAILED assert(0 == "failed to write to db")

 ceph version 0.94.5-8.el7cp (deef183a81111fa5e128ec88c90a32c9587c615d)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7b8ba5]
 2: (MonitorDBStore::apply_transaction(std::tr1::shared_ptr<MonitorDBStore::Transaction>)+0x6c9) [0x564c79]
 3: (Elector::bump_epoch(unsigned int)+0x168) [0x6917d8]
 4: (Elector::handle_propose(MMonElection*)+0x200) [0x691c40]
 5: (Elector::dispatch(Message*)+0xc63) [0x6956f3]
 6: (Monitor::dispatch(MonSession*, Message*, bool)+0x6bb) [0x596eeb]
 7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5975d6]
 8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b6ad3]
 9: (DispatchQueue::entry()+0x62a) [0x8a8afa]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a0ded]
 11: (()+0x7dc5) [0x7fef63d4fdc5]
 12: (clone()+0x6d) [0x7fef6283021d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Kefu had concerns about this verification approach using gdb. If somebody can confirm that verifying the bug this way, and the results above, are acceptable, I will mark this bug as verified.

Could you please draw a conclusion on this?
Comment 10 Ken Dreyer (Red Hat) 2016-02-09 12:37:03 EST
That approach sounds fine to me, Shylesh. We expect the monitor to die with that assert (see the Doc Text field).
Comment 11 shylesh 2016-02-09 12:44:08 EST
verified on ceph-0.94.5-8.el7cp.x86_64
Comment 13 errata-xmlrpc 2016-02-29 09:43:34 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

