Created attachment 1122366
mon and osd logs

My cluster setup: 3 mons, 3 osds, 1 admin node.

I initially installed the Jan 16th build of ceph 1.3.2, and today updated to the Feb 5th build of ceph 1.3.2. After the update, one of the osds (osd.4, on one node) is down. According to the logs, it was not pointing to the right journal, and hence the service was not starting.

ls -l in /var/lib/ceph/osd/ceph-4/ shows:

-rw-r--r--   1 root root  490 Feb  5 14:56 activate.monmap
-rw-r--r--   1 root root    3 Feb  5 14:56 active
-rw-r--r--   1 root root   37 Feb  5 14:56 ceph_fsid
drwxr-xr-x 190 root root 8192 Feb  8 09:29 current
-rw-r--r--   1 root root   37 Feb  5 14:56 fsid
lrwxrwxrwx   1 root root   58 Feb  5 14:56 journal -> /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18
-rw-r--r--   1 root root   37 Feb  5 14:56 journal_uuid
-rw-------   1 root root   56 Feb  5 14:56 keyring
-rw-r--r--   1 root root   21 Feb  5 14:56 magic
-rw-r--r--   1 root root    6 Feb  5 14:56 ready
-rw-r--r--   1 root root    4 Feb  5 14:56 store_version
-rw-r--r--   1 root root   53 Feb  5 14:56 superblock
-rw-r--r--   1 root root    0 Feb  5 14:56 upstart

But /dev/disk/by-partuuid/ has no entry for that journal:

ls -l /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 10 Feb  5 15:00 7f223239-d39b-4401-8794-78a00ee95c68 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Feb  8 11:03 c4110af4-b22a-43c1-869d-f4e01dc334aa -> ../../sdb2
lrwxrwxrwx 1 root root 10 Feb  8 11:05 e60c60ea-9076-48e2-8bf9-2683f067051d -> ../../sdd2
lrwxrwxrwx 1 root root 10 Feb  5 15:00 f588e1f6-ec3f-4dd7-a2df-1f5a7cb6b44f -> ../../sdd1

This is on Ubuntu 14.04. I had kernel 3.16 and then updated to kernel 3.19. The bug was hit on kernel 3.16 and still exists after the update to 3.19.

The osd.4 logs are attached.
This seems to indicate that the /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18 symlink was broken during the upgrade. This symlink is not created by ceph itself but by the underlying operating system, so I don't see how a ceph upgrade could damage it. Are you able to reproduce the problem?
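For anyone who hits the same symptom, the check described above (does the OSD's journal symlink still resolve?) can be scripted. The following is only a minimal sketch in Python: the helper name check_journal_link is hypothetical and not part of ceph, and it assumes the OSD data directory layout shown in the report.

```python
import os


def check_journal_link(osd_dir):
    """Return (target, resolves) for the 'journal' entry in an OSD data dir.

    Hypothetical helper, not part of ceph. 'target' is the symlink target,
    or None if 'journal' is not a symlink; 'resolves' tells whether the
    target (or the plain file) exists.
    """
    journal = os.path.join(osd_dir, "journal")
    if not os.path.islink(journal):
        # Not a symlink: the journal may be a plain file, or missing.
        return None, os.path.exists(journal)
    target = os.readlink(journal)
    # A relative target is resolved against the directory holding the link.
    if not os.path.isabs(target):
        target = os.path.join(osd_dir, target)
    # os.path.exists follows symlinks, so a dangling by-partuuid entry
    # is reported as not resolving.
    return target, os.path.exists(target)


if __name__ == "__main__":
    # Path taken from the report; adjust for the OSD being checked.
    target, resolves = check_journal_link("/var/lib/ceph/osd/ceph-4")
    print(target, "OK" if resolves else "MISSING")
```

A result where the target is a /dev/disk/by-partuuid/ path that does not resolve would match the situation in this report, where the partition's by-partuuid entry was absent.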
Hi Loic, I have tested this upgrade path again. I did not hit this issue, and hence could not reproduce it. Closing this as not a bug for now.