Bug 1305812 - osd down during upgrade to 1.3.2 latest build
Summary: osd down during upgrade to 1.3.2 latest build
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Unclassified
Version: 1.3.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.3
Assignee: ceph-eng-bugs
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-09 10:07 UTC by rakesh
Modified: 2022-02-21 18:03 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-15 14:31:44 UTC
Target Upstream Version:


Attachments
mon and osd logs (108.66 KB, application/zip)
2016-02-09 10:07 UTC, rakesh

Description rakesh 2016-02-09 10:07:56 UTC
Created attachment 1122366
mon and osd logs

My cluster setup:

3 mons 
3 osds 
1 admin 

I initially installed the Jan 16th build of Ceph 1.3.2 and then updated today to the Feb 5th build of Ceph 1.3.2.

After the update, one of the OSDs (osd.4, on one node) is down.
According to the logs, it was not pointing to the right journal, and hence the service was not starting.

Output of ls -l in /var/lib/ceph/osd/ceph-4/:

-rw-r--r--   1 root root  490 Feb  5 14:56 activate.monmap
-rw-r--r--   1 root root    3 Feb  5 14:56 active
-rw-r--r--   1 root root   37 Feb  5 14:56 ceph_fsid
drwxr-xr-x 190 root root 8192 Feb  8 09:29 current
-rw-r--r--   1 root root   37 Feb  5 14:56 fsid
lrwxrwxrwx   1 root root   58 Feb  5 14:56 journal -> /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18
-rw-r--r--   1 root root   37 Feb  5 14:56 journal_uuid
-rw-------   1 root root   56 Feb  5 14:56 keyring
-rw-r--r--   1 root root   21 Feb  5 14:56 magic
-rw-r--r--   1 root root    6 Feb  5 14:56 ready
-rw-r--r--   1 root root    4 Feb  5 14:56 store_version
-rw-r--r--   1 root root   53 Feb  5 14:56 superblock
-rw-r--r--   1 root root    0 Feb  5 14:56 upstart

However, /dev/disk/by-partuuid/ has no entry for that journal partition:

 ls -l /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 10 Feb  5 15:00 7f223239-d39b-4401-8794-78a00ee95c68 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Feb  8 11:03 c4110af4-b22a-43c1-869d-f4e01dc334aa -> ../../sdb2
lrwxrwxrwx 1 root root 10 Feb  8 11:05 e60c60ea-9076-48e2-8bf9-2683f067051d -> ../../sdd2
lrwxrwxrwx 1 root root 10 Feb  5 15:00 f588e1f6-ec3f-4dd7-a2df-1f5a7cb6b44f -> ../../sdd1
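
The manual check above (a journal symlink whose target no longer exists) could be scripted for every OSD on a node. The following is a minimal sketch, not part of the original report: the function name find_dangling_journals is hypothetical, and the default path is simply the OSD data directory layout shown in this report.

```python
import os


def find_dangling_journals(osd_root="/var/lib/ceph/osd"):
    """Return (journal_path, target) pairs whose symlink target is missing.

    Assumption: each OSD data directory under osd_root may contain a
    'journal' symlink, as in the ceph-4 listing in this report.
    """
    dangling = []
    if not os.path.isdir(osd_root):
        return dangling
    for name in sorted(os.listdir(osd_root)):
        journal = os.path.join(osd_root, name, "journal")
        # Skip OSDs whose journal is a regular file or absent.
        if not os.path.islink(journal):
            continue
        target = os.readlink(journal)
        # os.path.exists follows the symlink; False means the link is broken.
        if not os.path.exists(journal):
            dangling.append((journal, target))
    return dangling
```

Calling find_dangling_journals() on the affected node and printing the result would be expected to list the ceph-4 journal, assuming the state shown in the two listings above.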

This is on Ubuntu 14.04.
I had kernel 3.16 and then updated to kernel 3.19.
The bug was hit while the kernel was at 3.16 and still existed after the update to 3.19.

The osd.4 logs are attached.

Comment 3 Loic Dachary 2016-02-09 15:07:50 UTC
It seems to indicate that during the upgrade the /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18 symlink was broken. This symlink is not created by ceph itself but by the underlying operating system. I don't see how a ceph upgrade could damage it. Are you able to repeat the problem?
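
Because the by-partuuid links come from the operating system rather than from ceph, it may help to record which partition UUIDs the OS currently exposes and whether the one the OSD journal expects is among them. The following is a rough sketch under that assumption; partuuid_map and journal_partuuid_present are hypothetical helper names, and the default directory is the one quoted in this report.

```python
import os


def partuuid_map(by_partuuid="/dev/disk/by-partuuid"):
    """Map each partition UUID in the directory to the path it resolves to."""
    mapping = {}
    if not os.path.isdir(by_partuuid):
        return mapping
    for uuid in sorted(os.listdir(by_partuuid)):
        mapping[uuid] = os.path.realpath(os.path.join(by_partuuid, uuid))
    return mapping


def journal_partuuid_present(osd_dir, by_partuuid="/dev/disk/by-partuuid"):
    """Return (uuid, present) for the journal symlink of one OSD directory.

    Assumption: the journal symlink points at <by_partuuid>/<uuid>, as in
    the ceph-4 listing in this report.
    """
    target = os.readlink(os.path.join(osd_dir, "journal"))
    uuid = os.path.basename(target)
    return uuid, uuid in partuuid_map(by_partuuid)
```

For example, journal_partuuid_present("/var/lib/ceph/osd/ceph-4") returns the UUID taken from the journal link together with a flag saying whether the OS currently lists it. Running it before and after an upgrade attempt would show whether the entry disappears at that point.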

Comment 4 rakesh 2016-02-15 14:31:44 UTC
Hi Loic, 

I have tested this upgrade path again. I did not hit this issue and hence could not reproduce it. Closing this as not a bug for now.

