Bug 1305812 - osd down during upgrade to 1.3.2 latest build
Status: CLOSED NOTABUG
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Unclassified
Version: 1.3.2
Hardware: All Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.3.3
Assigned To: ceph-eng-bugs
QA Contact: ceph-qe-bugs
Depends On:
Blocks:
Reported: 2016-02-09 05:07 EST by rakesh
Modified: 2016-02-15 09:31 EST

Doc Type: Bug Fix
Last Closed: 2016-02-15 09:31:44 EST
Type: Bug

Attachments
mon and osd logs (108.66 KB, application/zip)
2016-02-09 05:07 EST, rakesh

Description rakesh 2016-02-09 05:07:56 EST
Created attachment 1122366
mon and osd logs

My cluster setup:

3 mons 
3 osds 
1 admin 

I initially installed Ceph 1.3.2 from the Jan 16th build,
and then updated today to the Ceph 1.3.2 build of Feb 5th.

After the update, one of the OSDs (osd.4, on one node) is down.
According to the logs, it was not pointing to the right journal, and hence the service was not starting.

This is the output of ls -l in /var/lib/ceph/osd/ceph-4/:

-rw-r--r--   1 root root  490 Feb  5 14:56 activate.monmap
-rw-r--r--   1 root root    3 Feb  5 14:56 active
-rw-r--r--   1 root root   37 Feb  5 14:56 ceph_fsid
drwxr-xr-x 190 root root 8192 Feb  8 09:29 current
-rw-r--r--   1 root root   37 Feb  5 14:56 fsid
lrwxrwxrwx   1 root root   58 Feb  5 14:56 journal -> /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18
-rw-r--r--   1 root root   37 Feb  5 14:56 journal_uuid
-rw-------   1 root root   56 Feb  5 14:56 keyring
-rw-r--r--   1 root root   21 Feb  5 14:56 magic
-rw-r--r--   1 root root    6 Feb  5 14:56 ready
-rw-r--r--   1 root root    4 Feb  5 14:56 store_version
-rw-r--r--   1 root root   53 Feb  5 14:56 superblock
-rw-r--r--   1 root root    0 Feb  5 14:56 upstart

But /dev/disk/by-partuuid/ has no entry for that journal partition:

 ls -l /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 10 Feb  5 15:00 7f223239-d39b-4401-8794-78a00ee95c68 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Feb  8 11:03 c4110af4-b22a-43c1-869d-f4e01dc334aa -> ../../sdb2
lrwxrwxrwx 1 root root 10 Feb  8 11:05 e60c60ea-9076-48e2-8bf9-2683f067051d -> ../../sdd2
lrwxrwxrwx 1 root root 10 Feb  5 15:00 f588e1f6-ec3f-4dd7-a2df-1f5a7cb6b44f -> ../../sdd1
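The manual check above (the journal symlink in the OSD directory points at a /dev/disk/by-partuuid/ entry that no longer exists) could be automated along these lines. This is only a minimal sketch: the function names check_journal and broken_journals are made up for illustration, and it assumes every OSD data directory sits directly under /var/lib/ceph/osd and has a "journal" symlink, as in the listing for ceph-4.

```python
import os
from pathlib import Path


def check_journal(osd_dir):
    """Return (target, resolves) for the 'journal' entry in one OSD directory.

    target is the raw symlink target, or None if 'journal' is not a symlink.
    resolves is True when the journal path exists after following the link.
    """
    journal = Path(osd_dir) / "journal"
    if not journal.is_symlink():
        return None, journal.exists()
    target = os.readlink(journal)
    # Path.exists() follows symlinks, so it is False for a dangling link.
    return target, journal.exists()


def broken_journals(osd_root="/var/lib/ceph/osd"):
    """Map OSD directory name -> dangling journal target under osd_root.

    The default path is an assumption taken from the listing in this report.
    """
    broken = {}
    root = Path(osd_root)
    if not root.is_dir():
        return broken
    for osd_dir in sorted(root.iterdir()):
        if not osd_dir.is_dir():
            continue
        target, resolves = check_journal(osd_dir)
        if target is not None and not resolves:
            broken[osd_dir.name] = target
    return broken


if __name__ == "__main__":
    for name, target in broken_journals().items():
        print(f"{name}: journal -> {target} (target missing)")
```

Run on the affected node, a script like this would be expected to list ceph-4 with the by-partuuid target shown above, and to print nothing on a node where every journal link resolves.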

This is on Ubuntu 14.04.
I had kernel 3.16 and then updated to kernel 3.19.
The bug was hit on kernel 3.16 and still existed after updating to 3.19.

The osd.4 logs are attached.
Comment 3 Loic Dachary 2016-02-09 10:07:50 EST
This seems to indicate that the /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18 symlink was broken during the upgrade. This symlink is not created by Ceph itself but by the underlying operating system, so I don't see how a Ceph upgrade could damage it. Are you able to reproduce the problem?
Comment 4 rakesh 2016-02-15 09:31:44 EST
Hi Loic, 

I have tested this upgrade path again. I did not hit this issue and hence could not reproduce it. Closing this as not a bug for now.
