Bug 1224622 - [RFE] Improve logging information when OSD dies due to timeout so that user can track cause
Summary: [RFE] Improve logging information when OSD dies due to timeout so that user c...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 3.*
Assignee: Josh Durgin
QA Contact: ceph-qe-bugs
Docs Contact: John Wilkins
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-05-25 07:25 UTC by Tanay Ganguly
Modified: 2018-06-15 19:47 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-15 19:47:08 UTC
Embargoed:


Attachments
OSD log and Mon logs (712.25 KB, application/x-gzip)
2015-05-25 07:25 UTC, Tanay Ganguly


Links
Ceph Project Bug Tracker 11823 (http://tracker.ceph.com/issues/11823)

Description Tanay Ganguly 2015-05-25 07:25:55 UTC
Created attachment 1029363 [details]
OSD log and Mon logs

Description of problem:
OSDs crashed while I was running heavy write I/O against the cluster from RBD clients.

Version-Release number of selected component (if applicable):
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

How reproducible:
Not sure; I will try to reproduce it again.

Steps to Reproduce:
1. Create 4 RBD images with the object-map feature enabled and attach them to VMs.
2. Boot all the VMs and start the OS installation process in all of them at once.
3. Create 10 more RBD images.
4. Once the installations were under way, start rbd bench-write on all 10 OSD hosts against the images created in step 3:

rbd bench-write --io-pattern rand Tanay-RBD/WBClone_new5
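
For illustration only, step 4 could be driven from a single node roughly as follows; the host and image names are placeholders, and the exact invocation used in this report may have differed:

for i in $(seq 1 10); do
    # start a random-pattern write benchmark against one image per OSD host
    ssh "osd-host-$i" rbd bench-write --io-pattern rand "Tanay-RBD/WBClone_new$i" &
done
wait   # block until all the concurrent benchmarks finish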

Actual results:
OSDs crashed; all five OSDs on host OSD1 were marked down:

 -2  2.00000     host OSD1                                    
  0  0.39999         osd.0     down        0          1.00000 
  3  0.39999         osd.3     down        0          1.00000 
  6  0.39999         osd.6     down        0          1.00000 
  9  0.39999         osd.9     down        0          1.00000 
 12  0.39999         osd.12    down        0          1.00000 

bt:
     0> 2015-05-23 13:28:21.836189 7f906020e700 -1 *** Caught signal (Aborted) **
 in thread 7f906020e700

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: /usr/bin/ceph-osd() [0x9f6562]
 2: (()+0xf130) [0x7f906d6c0130]
 3: (gsignal()+0x37) [0x7f906c0da5d7]
 4: (abort()+0x148) [0x7f906c0dbcc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f906c9de9b5]
 6: (()+0x5e926) [0x7f906c9dc926]
 7: (()+0x5e953) [0x7f906c9dc953]
 8: (()+0x5eb73) [0x7f906c9dcb73]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xaf640a]
 10: (SyncEntryTimeout::finish(int)+0xbd) [0x8d83bd]
 11: (Context::complete(int)+0x9) [0x67c759]
 12: (SafeTimer::timer_thread()+0x104) [0xadf804]
 13: (SafeTimerThread::entry()+0xd) [0xae07bd]
 14: (()+0x7df5) [0x7f906d6b8df5]
 15: (clone()+0x6d) [0x7f906c19b1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Expected results:
There should not be any crash.

Additional info:
I started all the bench-write threads and the installation process; this morning I found the OSDs crashed.

[root@OSD1 logs]# ceph osd tree
ID  WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 15.59984 root default
 -2  2.00000     host OSD1                                    
  0  0.39999         osd.0     down        0          1.00000 
  3  0.39999         osd.3     down        0          1.00000 
  6  0.39999         osd.6     down        0          1.00000 
  9  0.39999         osd.9     down        0          1.00000 
 12  0.39999         osd.12    down        0          1.00000 
 -3  1.99997     host OSD2                                    
  1  0.39999         osd.1       up  1.00000          1.00000 
  4  0.39999         osd.4       up  1.00000          1.00000 
  7  0.39999         osd.7       up  1.00000          1.00000 
 10  0.39999         osd.10      up  1.00000          1.00000 
 13  0.39999         osd.13      up  1.00000          1.00000 
 -4  2.00000     host OSD3                                    
  2  0.39999         osd.2       up  1.00000          1.00000 
  5  0.39999         osd.5       up  1.00000          1.00000 
  8  0.39999         osd.8       up  1.00000          1.00000 
 11  0.39999         osd.11      up  1.00000          1.00000 
 14  0.39999         osd.14      up  1.00000          1.00000 
 -5  1.20000     host ADMIN                                   
 15  0.39999         osd.15      up  1.00000          1.00000 
 16  0.39999         osd.16      up  1.00000          1.00000 
 17  0.39999         osd.17      up  1.00000          1.00000 
 -6  1.19998     host OSD5                                    
 33  0.39999         osd.33      up  1.00000          1.00000 
 34  0.39999         osd.34      up  1.00000          1.00000 
 35  0.39999         osd.35      up  1.00000          1.00000 
 -7  1.19998     host OSD4                                    
 30  0.39999         osd.30      up  1.00000          1.00000 
 31  0.39999         osd.31      up  1.00000          1.00000 
 32  0.39999         osd.32      up  1.00000          1.00000 
 -8  1.19998     host OSD6                                    
 36  0.39999         osd.36      up  1.00000          1.00000 
 37  0.39999         osd.37      up  1.00000          1.00000 
 38  0.39999         osd.38      up  1.00000          1.00000 
 -9  1.19998     host OSD7                                    
 39  0.39999         osd.39      up  1.00000          1.00000
 40  0.39999         osd.40      up  1.00000          1.00000
 41  0.39999         osd.41      up  1.00000          1.00000
-10  1.19998     host OSD8
 42  0.39999         osd.42      up  1.00000          1.00000
 43  0.39999         osd.43      up  1.00000          1.00000
 44  0.39999         osd.44      up  1.00000          1.00000
-11  1.19998     host OSD9
 18  0.39999         osd.18      up  1.00000          1.00000
 19  0.39999         osd.19      up  1.00000          1.00000
 20  0.39999         osd.20      up  1.00000          1.00000
-12  1.19998     host OSD10
 21  0.39999         osd.21      up  1.00000          1.00000
 22  0.39999         osd.22      up  1.00000          1.00000
 23  0.39999         osd.23      up  1.00000          1.00000

ceph -s
    cluster 1ceee8cc-75a3-4b10-a24f-6be6bbaa6a99
     health HEALTH_WARN
            25 requests are blocked > 32 sec
     monmap e1: 1 mons at {MON1=10.12.27.26:6789/0}
            election epoch 1, quorum 0 MON1
     osdmap e10915: 39 osds: 34 up, 34 in
      pgmap v745916: 4608 pgs, 2 pools, 115 GB data, 31971 objects
            2391 GB used, 11450 GB / 13842 GB avail
                4608 active+clean
  client io 191 kB/s rd, 121 kB/s wr, 107 op/s
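
For anyone triaging a similar state, a hammer-era sketch of how to see where the blocked requests are coming from; osd.0 is a placeholder id, and the admin-socket commands must be run on the host serving that OSD:

# list which OSDs the blocked/slow requests are attributed to
ceph health detail

# inspect current and recent slow operations via the OSD admin socket
ceph daemon osd.0 dump_ops_in_flight
ceph daemon osd.0 dump_historic_ops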


Attaching the OSD logs, the MON log, and the cluster log for further investigation.

Comment 2 Jason Dillaman 2015-05-26 14:09:35 UTC
The sub-component should be RADOS.  It looks like you hit the filestore sync timeout (defaults to 10 minutes), which indicates severely slow I/O on the OSD.

Comment 3 Samuel Just 2015-05-26 17:01:47 UTC
Apparently, you are exceeding the filestore_commit_timeout threshold, which defaults to 600 s -- meaning that a syncfs(2) took longer than 10 minutes.  I think the disk on which this is happening is probably just bad, or the disk/node is hilariously overloaded.  I guess this is still the Atom systems?
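
To check whether the disk is actually failing or just saturated, something along these lines on the affected OSD host should tell (the device name /dev/sdX is a placeholder):

# per-device latency and utilization while the workload is running
iostat -x 5

# kernel-level I/O errors on the data disks
dmesg | grep -iE 'error|fail'

# SMART health summary for the suspect disk
smartctl -H /dev/sdX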

Comment 4 Harish NV Rao 2015-05-27 15:24:33 UTC
Sam, is an OSD crash expected behavior in this case? Shouldn't it be handled gracefully?

Comment 5 Samuel Just 2015-05-27 16:40:30 UTC
It crashes because a syncfs(2) taking 10 minutes usually indicates that the storage has died, and the OSD should commit suicide.  The timeout is a config option (filestore_commit_timeout).
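
For reference, a sketch of how the timeout can be inspected and temporarily raised on a hammer cluster; the 1200 s value is purely illustrative, and raising it only hides slow storage rather than fixing it:

# current value, via the admin socket on the OSD's host
ceph daemon osd.0 config get filestore_commit_timeout

# raise it at runtime on all OSDs (illustrative value)
ceph tell osd.* injectargs '--filestore_commit_timeout 1200'

# to persist it, set the option under [osd] in ceph.conf on the OSD hosts:
#   filestore commit timeout = 1200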

Comment 6 Ken Dreyer (Red Hat) 2015-05-29 21:35:45 UTC
Crashing with ceph_abort() seems extreme to me, because it's hard for a non-developer user to know what happened. All that users will see is an obfuscated stack trace with __ceph_assert_fail() somewhere in it. It would be nice if the OSDs logged what happened (the log message could even point at the filestore_commit_timeout setting) and then shut down gracefully.

John, in terms of documentation, I guess we need a knowledge-base article with this information? Something like: "if users see a crash in SyncEntryTimeout::finish, it means that the storage hardware has died and the OSD refused to continue running. The timeout is configurable with the filestore_commit_timeout setting." Does that sound right to you?
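
As a hedged sketch of what such an article could tell users to run, assuming the default log location /var/log/ceph/ceph-osd.<id>.log:

# find the abort and the surrounding backtrace in the OSD log
grep -B2 -A20 'Caught signal (Aborted)' /var/log/ceph/ceph-osd.0.log

# the tell-tale frame for this failure mode is SyncEntryTimeout::finish
grep -n 'SyncEntryTimeout' /var/log/ceph/ceph-osd.0.log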

Comment 7 Samuel Just 2015-05-29 21:48:38 UTC
We should probably add some logging to indicate what is going on, but a graceful shutdown is impossible since the disk is not able to commit.

Comment 8 Samuel Just 2015-05-29 21:54:51 UTC
http://tracker.ceph.com/issues/11823

Comment 10 Samuel Just 2015-07-14 18:15:47 UTC
This hasn't been done upstream yet.  You probably want to push it out to a later release for now.

Comment 11 Ken Dreyer (Red Hat) 2015-07-15 22:54:16 UTC
Thanks Sam, re-targeting to 1.3.2.

Comment 12 Samuel Just 2015-12-11 21:16:44 UTC
Clearing the needinfo flag.

