1262982 – RFE: mon verify that the store is writeable before participating in election


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RADOS
Sub Component:
Version:	1.2.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	1.3.2
Assignee:	Samuel Just
QA Contact:	ceph-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-09-14 19:45 UTC by Samuel Just
Modified:	2022-02-21 18:36 UTC
CC List:	9 users
Fixed In Version:	RHEL: ceph-0.94.5-1.el7cp Ubuntu: ceph_0.94.5-2redhat1
Doc Type:	Enhancement
Doc Text:	Feature: The Ceph monitor process verifies that the mon store is writable before participating in an election. Reason: Prior to this change, a monitor could suffer a disk error that would make its store unwritable, causing commit failures. Result: The Ceph monitor will now stop with an assert() if the leveldb store is unwritable.
Clone Of:
Environment:
Last Closed:	2016-02-29 14:43:34 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	13200	None	None	None	Never
Red Hat Issue Tracker	RHCEPH-3489	None	None	None	2022-02-21 18:36:12 UTC
Red Hat Product Errata	RHBA-2016:0313	normal	SHIPPED_LIVE	Red Hat Ceph Storage 1.3.2 bug fix and enhancement update	2016-02-29 19:37:43 UTC

Description Samuel Just 2015-09-14 19:45:19 UTC


Comment 2 Ken Dreyer (Red Hat) 2015-12-11 22:15:38 UTC

https://github.com/ceph/ceph/pull/6144 was merged upstream, and will ship in v0.94.6. We will cherry-pick that change downstream to ceph-1.3-rhel-patches for RHCS 1.3.2.

Comment 3 Kefu Chai 2015-12-16 05:19:59 UTC

Harish, 

I'm afraid we don't have a knob or trigger to switch a monstore between read-only and read-write mode.

The monstore is backed by leveldb by default, and the path to the underlying leveldb is something like /var/lib/ceph/mon.foobar/store.db. leveldb tends to create more tables (the .sst files) as more key/value pairs are stored.

AFAIK, we can hardly revoke a running process's write access to a file once the file has been opened. So we could probably try the following:

1. chmod -w /var/lib/ceph/mon.foobar/store.db
2. for i in `seq 100`; do ceph osd pool create test-pool-$i 1 1; done
3. watch -n1 -d ceph -s # check the quorum status; we should see "1 mons down" as more and more pools are created

Checking the log of the dead mon, we should then see:

 FAILED assert(0 == "failed to write to db")

Simply truncating store.db/*.log to zero length will crash leveldb, so that is not an option.
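
A caveat on step 1: mode bits do not restrict a process running as root, so `chmod -w` only helps if the monitor runs as an unprivileged user. One way to check is to run a small write probe as the same user as the monitor. The following is a minimal Python sketch, not part of Ceph; `can_create_files` and the `.write-probe-` prefix are hypothetical names chosen for illustration.

```python
import os
import tempfile


def can_create_files(directory):
    """Return True if the current user can create and sync a new file in directory."""
    try:
        # mkstemp fails with OSError if the directory is missing or not writable.
        fd, path = tempfile.mkstemp(prefix=".write-probe-", dir=directory)
    except OSError:
        return False
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the data to disk so I/O errors surface here
        return True
    except OSError:
        return False
    finally:
        # Always clean up the probe file.
        os.close(fd)
        os.unlink(path)
```

If `can_create_files("/var/lib/ceph/mon.foobar/store.db")` still returns True after the chmod, the permission change does not stop that user from creating new files there.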

Comment 6 shylesh 2016-02-04 15:00:41 UTC

I was not able to bring down the mon by following the steps above. Kefu wanted to try reproducing it again on his setup. I will try verifying this bug once Kefu confirms the exact steps.

@kefu,

Could you please update this bug with the exact steps to verify?

Comment 7 Kefu Chai 2016-02-05 09:47:55 UTC

@shylesh,


- The sstables (the underlying table files) in store.db are opened and mmap'ed by leveldb, so we cannot take read/write permission on them away from the monitor at runtime.
- If we "chmod -w store.db", the background compaction does fail and the "failed to write to db" assertion is triggered. leveldb complains in this case, but it does not return an error from Write() unless "leveldb_paranoid" is true, and this option is "false" by default.
- If we set a file size limit for the monitor process using ulimit, it receives SIGXFSZ when it exceeds the limit, and the process is terminated because we don't handle that signal.
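
The SIGXFSZ behaviour in the last point can be reproduced outside Ceph. The sketch below (Python, Linux assumed; `CHILD` and `run_over_quota` are illustrative names, and the 4096-byte limit is arbitrary) starts a child process with a small RLIMIT_FSIZE, lets it write past the limit, and reports how it exited. CPython ignores SIGXFSZ by default, so the child restores the default action first to mimic a process that does not handle the signal.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Program run in the child process; it is a stand-in, not Ceph code.
CHILD = textwrap.dedent("""
    import os, resource, signal, sys

    # CPython ignores SIGXFSZ at startup; restore the default action.
    signal.signal(signal.SIGXFSZ, signal.SIG_DFL)
    limit = 4096  # bytes, arbitrary
    resource.setrlimit(resource.RLIMIT_FSIZE, (limit, limit))
    fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # Bounded loop: the child should be killed once a write starts at the limit.
    for _ in range(16):
        os.write(fd, b"x" * 1024)
""")


def run_over_quota():
    """Run the child in a scratch directory and return its exit status."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "store.bin")
        proc = subprocess.run([sys.executable, "-c", CHILD, target])
        return proc.returncode
```

A negative return code from `subprocess` is the number of the signal that terminated the child, so `run_over_quota()` is expected to return `-signal.SIGXFSZ` here.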

Maybe we should do it the hard way: create a loopback filesystem image backed by a local file and mount it at store.db. The image should be small, say 1MB, so that we don't need to wait too long before the assertion failure pops up.
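
Once such a small filesystem fills up, writes fail with ENOSPC. To see what that error looks like to a writer without root or a loopback mount, Linux provides /dev/full, which always reports that the device is full. This is a minimal Python sketch, assuming Linux; `write_errno` is a hypothetical helper, and it does not exercise leveldb or the monitor.

```python
import errno
import os


def write_errno(path, payload=b"x" * 4096):
    """Write payload to an existing path; return 0 on success or the errno on failure."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, payload)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
```

Under these assumptions, `write_errno("/dev/full")` returns `errno.ENOSPC`, while `write_errno("/dev/null")` returns 0.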

Comment 8 Kefu Chai 2016-02-09 15:46:27 UTC

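In gdb, break at each of the following locations and force the return value when the breakpoint is hit:

MonitorDBStore.h:285
set var r=-1

MonitorDBStore.h:570
set var r=-1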
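
The effect of forcing `r` to -1 can be modelled with a toy program. This is only a sketch in Python, not the MonitorDBStore code: `submit_transaction`, `apply_transaction`, and the `inject` argument are stand-ins invented for the example. A child process treats a negative return code from its write as fatal and aborts, which the parent observes as SIGABRT.

```python
import signal
import subprocess
import sys
import textwrap

# Toy program run in a child process; it is a stand-in, not Ceph code.
CHILD = textwrap.dedent("""
    import os, sys

    def submit_transaction():
        return 0  # pretend the backend write succeeded

    def apply_transaction(inject_error):
        r = submit_transaction()
        if inject_error:
            r = -1  # models overriding r from a debugger
        if r < 0:
            print("failed to write to db", file=sys.stderr)
            os.abort()  # terminate with SIGABRT, as a failed assert() would
        return r

    apply_transaction(sys.argv[1] == "inject")
""")


def run_child(mode):
    """Run the toy program and return its exit status."""
    proc = subprocess.run([sys.executable, "-c", CHILD, mode],
                          stderr=subprocess.DEVNULL)
    return proc.returncode
```

With the error injected, `run_child("inject")` is expected to return `-signal.SIGABRT`; any other argument lets the child exit normally with status 0.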

Comment 9 shylesh 2016-02-09 16:53:54 UTC

With Kefu's help, I used the following steps to verify the behaviour.

1. Attach gdb to the mon process.
2. Set the breakpoints in the mon as mentioned in comment 8.
3. When a breakpoint is hit, set r to -1 and continue.


Result:
====
Mon dies with SIGABRT.

  -1> 2016-02-09 15:50:31.640532 7fef59325700  2 -- 10.8.128.105:6789/0 >> :/0 pipe(0x4347000 sd=28 :6789 s=4 pgs=0 cs=0 l=0 c=0x45747e0).fault (0) Success
     0> 2016-02-09 15:50:31.646012 7fef5cd34700 -1 mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fef5cd34700 time 2016-02-09 15:50:31.632626
mon/MonitorDBStore.h: 295: FAILED assert(0 == "failed to write to db")

 ceph version 0.94.5-8.el7cp (deef183a81111fa5e128ec88c90a32c9587c615d)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7b8ba5]
 2: (MonitorDBStore::apply_transaction(std::tr1::shared_ptr<MonitorDBStore::Transaction>)+0x6c9) [0x564c79]
 3: (Elector::bump_epoch(unsigned int)+0x168) [0x6917d8]
 4: (Elector::handle_propose(MMonElection*)+0x200) [0x691c40]
 5: (Elector::dispatch(Message*)+0xc63) [0x6956f3]
 6: (Monitor::dispatch(MonSession*, Message*, bool)+0x6bb) [0x596eeb]
 7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5975d6]
 8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b6ad3]
 9: (DispatchQueue::entry()+0x62a) [0x8a8afa]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a0ded]
 11: (()+0x7dc5) [0x7fef63d4fdc5]
 12: (clone()+0x6d) [0x7fef6283021d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Kefu had concerns about verifying this with GDB. If somebody can confirm that this way of verifying the bug is OK, and that the results are as expected, I will mark this bug as verified.

@Ken,

Could you please give a conclusion on this?

Comment 10 Ken Dreyer (Red Hat) 2016-02-09 17:37:03 UTC

That approach sounds fine to me, Shylesh. We expect the monitor to die with that assert (see the doc text field).

Comment 11 shylesh 2016-02-09 17:44:08 UTC

Verified on ceph-0.94.5-8.el7cp.x86_64

Comment 13 errata-xmlrpc 2016-02-29 14:43:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313
