| Summary: | [RH Ceph 2.0 - 10.1.1] One osd fails to start after upgrade ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fe2b857f285] | ||
|---|---|---|---|
| Product: | Red Hat Ceph Storage | Reporter: | Vasu Kulkarni <vakulkar> |
| Component: | RADOS | Assignee: | Kefu Chai <kchai> |
| Status: | CLOSED NOTABUG | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 2.0 | CC: | ceph-eng-bugs, dzafman, kchai |
| Target Milestone: | rc | ||
| Target Release: | 2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-04-20 04:02:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Feel, free to use magna094.ceph.redhat.com
[ubuntu@magna094 ~]$ sudo ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.49997 root default
-2 1.79999 host magna094
0 0.89999 osd.0 up 1.00000 1.00000
1 0.89999 osd.1 down 0 1.00000
-3 2.69998 host magna095
2 0.89999 osd.2 up 1.00000 1.00000
3 0.89999 osd.3 up 1.00000 1.00000
4 0.89999 osd.4 up 1.00000 1.00000
[ubuntu@magna094 ~]$ sudo ceph -s
cluster 3cc10402-b0cd-456e-9119-b7ae9ceccff6
health HEALTH_WARN
clock skew detected on mon.magna095
23 pgs degraded
23 pgs stuck degraded
24 pgs stuck unclean
23 pgs stuck undersized
23 pgs undersized
Monitor clock skew detected
monmap e1: 2 mons at {magna093=10.8.128.93:6789/0,magna095=10.8.128.95:6789/0}
election epoch 12, quorum 0,1 magna093,magna095
osdmap e73: 5 osds: 4 up, 4 in; 1 remapped pgs
pgmap v172: 64 pgs, 1 pools, 0 bytes data, 0 objects
136 MB used, 3704 GB / 3704 GB avail
40 active+clean
23 active+undersized+degraded
1 active+remapped
Vasu,
as /var/log/ceph/ceph-osd.1.log does not show anything interesting, you can launch the osd daemon manually:
# /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph --debug-osd=10 --debug-filestore=10 --log-to-stderr=1
- debug-osd and debug-filestore to get more verbose logging, since we are debugging osd
- log-to-stderr so it prints the log to stderr.
so i have following log messages:
-7> 2016-04-20 03:58:23.982224 7f4f95866800 10 register_cxx_method refcount.get flags 3 0x7f4f824d6f30
-6> 2016-04-20 03:58:23.982226 7f4f95866800 10 register_cxx_method refcount.put flags 3 0x7f4f824d6270
-5> 2016-04-20 03:58:23.982227 7f4f95866800 10 register_cxx_method refcount.set flags 3 0x7f4f824d5250
-4> 2016-04-20 03:58:23.982228 7f4f95866800 10 register_cxx_method refcount.read flags 1 0x7f4f824d68a0
-3> 2016-04-20 03:58:23.982229 7f4f95866800 10 _load_class refcount success
-2> 2016-04-20 03:58:23.982266 7f4f95866800 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/meta/osdmap.57__0_64172C0C__none with flags=2: (13) Permission denied
-1> 2016-04-20 03:58:23.982273 7f4f95866800 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read(meta/#-1:3034e826:::osdmap.57:0#) open error: (13) Permission denied
0> 2016-04-20 03:58:23.983951 7f4f95866800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f4f95866800 time 2016-04-20 03:58:23.982283
osd/OSD.h: 885: FAILED assert(ret)
does it sound familiar to you?
> -2> 2016-04-20 03:58:23.982266 7f4f95866800 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/meta/osdmap.57__0_64172C0C__none with flags=2: (13) Permission denied
it does ring a bell with me. so i checked the permissions:
# ls -l /var/lib/ceph/osd/ceph-1/current/meta/osdmap.57__0_64172C0C__none
-rw-r--r--. 1 root root 4150 Apr 18 18:02 /var/lib/ceph/osd/ceph-1/current/meta/osdmap.57__0_64172C0C__none
and there are more of them:
[root@magna094 ceph]# ls -l /var/lib/ceph/osd/ceph-1/current/meta/ | grep root
-rw-r--r--. 1 root root 179 Apr 18 18:02 inc\uosdmap.56__0_F4E9E663__none
-rw-r--r--. 1 root root 198 Apr 18 18:02 inc\uosdmap.57__0_F4E9E733__none
-rw-r--r--. 1 root root 4150 Apr 18 18:02 osdmap.56__0_64172F7C__none
-rw-r--r--. 1 root root 4150 Apr 18 18:02 osdmap.57__0_64172C0C__none
you might want to enable the log and double check the permissions next time =)
Thanks Kefu, I did use -R this time on /var/lib/ceph, I am still not sure how that folder wasn't changed, it did work for 1 OSD, but thanks I am going to keep an eye on all subfolders now :) |
Description of problem: 1. 3 node cluser node1 - [mon.a, osd.0, osd.1, osd.2] node2 - [mon.b, client.0] node3 - [osd.3, osd.4, client.1] node1 sucessfully upgraded to 2.0(0.94.5 to 10.1.1), had some permission issues with ceph:ceph on /var/lib/ceph which kefu resolved and all the osd/mon active on node1. upgrade node3 with 2 osd's, 1 osd fails to start and i see the below crash, for some reason the time is out of sync by 2 or 3 seconds, I did sync them but it still show the warning of monitor clock out of sync between node1 and node 2. There is no activity from any client side so no IO operations are in progress. Version-Release number of selected component (if applicable): 10.1.1 [ubuntu@magna094 ~]$ sudo systemctl status ceph-osd@1 ● ceph-osd - Ceph object storage daemon Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled) Active: failed (Result: signal) since Mon 2016-04-18 18:11:20 UTC; 1h 27min ago Process: 3188 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT) Process: 3121 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS) Main PID: 3188 (code=killed, signal=ABRT) Apr 18 18:11:20 magna094 ceph-osd[3188]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7fed2cc68467] Apr 18 18:11:20 magna094 ceph-osd[3188]: 6: (OSDService::get_map(unsigned int)+0x3d) [0x7fed2c649e3d] Apr 18 18:11:20 magna094 ceph-osd[3188]: 7: (OSD::init()+0x17c2) [0x7fed2c5fcec2] Apr 18 18:11:20 magna094 ceph-osd[3188]: 8: (main()+0x2c45) [0x7fed2c561a05] Apr 18 18:11:20 magna094 ceph-osd[3188]: 9: (__libc_start_main()+0xf5) [0x7fed29153b15] Apr 18 18:11:20 magna094 ceph-osd[3188]: 10: (()+0x352e29) [0x7fed2c5abe29] Apr 18 18:11:20 magna094 ceph-osd[3188]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Apr 18 18:11:20 magna094 systemd[1]: ceph-osd: main process exited, code=killed, status=6/ABRT Apr 18 18:11:20 magna094 systemd[1]: Unit ceph-osd entered failed state. Apr 18 18:11:20 magna094 systemd[1]: ceph-osd failed. [ubuntu@magna094 ~]$ sudo ls -lt /var/lib/ceph/ total 32 drwxr-x---. 2 ceph ceph 4096 Apr 18 18:11 tmp drwxr-x---. 2 ceph ceph 4096 Apr 8 09:59 bootstrap-mds drwxr-x---. 2 ceph ceph 4096 Apr 8 09:59 bootstrap-osd drwxr-x---. 2 ceph ceph 4096 Apr 8 09:59 bootstrap-rgw drwxr-x---. 2 ceph ceph 4096 Apr 8 09:59 mds drwxr-x---. 2 ceph ceph 4096 Apr 8 09:59 mon drwxr-x---. 4 ceph ceph 4096 Apr 8 09:59 osd drwxr-xr-x. 2 ceph ceph 4096 Apr 8 09:59 radosgw [ubuntu@magna094 ~]$ sudo ls -lt /var/lib/ceph/osd total 0 drwxr-xr-x. 3 ceph ceph 217 Apr 18 18:11 ceph-1 drwxr-xr-x. 3 ceph ceph 217 Apr 18 18:11 ceph-0 [ubuntu@magna094 ~]$ sudo ls -lt /var/lib/ceph/osd/ceph-1 total 44 -rw-r--r--. 1 root root 0 Apr 18 18:11 systemd drwxr-xr-x. 68 ceph ceph 1075 Apr 18 18:04 current -rw-r--r--. 1 ceph ceph 3 Apr 15 22:37 active -rw-------. 1 ceph ceph 56 Apr 15 22:37 keyring -rw-r--r--. 1 ceph ceph 6 Apr 15 22:37 ready -rw-r--r--. 1 ceph ceph 53 Apr 15 22:37 superblock -rw-r--r--. 1 ceph ceph 4 Apr 15 22:37 store_version -rw-r--r--. 1 ceph ceph 342 Apr 15 22:37 activate.monmap -rw-r--r--. 1 ceph ceph 2 Apr 15 22:37 whoami -rw-r--r--. 1 ceph ceph 21 Apr 15 22:37 magic -rw-r--r--. 1 ceph ceph 37 Apr 15 22:37 journal_uuid -rw-r--r--. 1 ceph ceph 37 Apr 15 22:37 fsid -rw-r--r--. 1 ceph ceph 37 Apr 15 22:37 ceph_fsid lrwxrwxrwx. 1 ceph ceph 58 Apr 15 22:37 journal -> /dev/disk/by-partuuid/b296d58b-882e-449a-8508-8aa62492661e 2624 Apr 18 18:04:58 magna094 ceph-osd: ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae) 2625 Apr 18 18:04:58 magna094 ceph-osd: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fe2b857f285] 2626 Apr 18 18:04:58 magna094 ceph-osd: 2: (OSDService::get_map(unsigned int)+0x3d) [0x7fe2b7f60e3d] 2627 Apr 18 18:04:58 magna094 ceph-osd: 3: (OSD::init()+0x17c2) [0x7fe2b7f13ec2] 2628 Apr 18 18:04:58 magna094 ceph-osd: 4: (main()+0x2c45) [0x7fe2b7e78a05] 2629 Apr 18 18:04:58 magna094 ceph-osd: 5: (__libc_start_main()+0xf5) [0x7fe2b4a6ab15] 2630 Apr 18 18:04:58 magna094 ceph-osd: 6: (()+0x352e29) [0x7fe2b7ec2e29] 2631 Apr 18 18:04:58 magna094 ceph-osd: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2632 Apr 18 18:04:58 magna094 ceph-osd: 2016-04-18 18:04:58.102523 7fe2b7b51800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fe2b7b51800 tim e 2016-04-18 18:04:58.100909 2633 Apr 18 18:04:58 magna094 ceph-osd: osd/OSD.h: 885: FAILED assert(ret) 2634 Apr 18 18:04:58 magna094 ceph-osd: ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae) 2635 Apr 18 18:04:58 magna094 ceph-osd: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fe2b857f285] 2636 Apr 18 18:04:58 magna094 ceph-osd: 2: (OSDService::get_map(unsigned int)+0x3d) [0x7fe2b7f60e3d] 2637 Apr 18 18:04:58 magna094 ceph-osd: 3: (OSD::init()+0x17c2) [0x7fe2b7f13ec2] 2638 Apr 18 18:04:58 magna094 ceph-osd: 4: (main()+0x2c45) [0x7fe2b7e78a05] 2639 Apr 18 18:04:58 magna094 ceph-osd: 5: (__libc_start_main()+0xf5) [0x7fe2b4a6ab15] 2640 Apr 18 18:04:58 magna094 ceph-osd: 6: (()+0x352e29) [0x7fe2b7ec2e29] 2641 Apr 18 18:04:58 magna094 ceph-osd: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2642 Apr 18 18:04:58 magna094 ceph-osd: 0> 2016-04-18 18:04:58.102523 7fe2b7b51800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fe2b7b51800 time 2016-04-18 18:04:58.100909 2643 Apr 18 18:04:58 magna094 ceph-osd: osd/OSD.h: 885: FAILED assert(ret) 2644 Apr 18 18:04:58 magna094 ceph-osd: ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae) 2645 Apr 18 18:04:58 magna094 ceph-osd: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fe2b857f285] 2646 Apr 18 18:04:58 magna094 ceph-osd: 2: (OSDService::get_map(unsigned int)+0x3d) [0x7fe2b7f60e3d] 2647 Apr 18 18:04:58 magna094 ceph-osd: 3: (OSD::init()+0x17c2) [0x7fe2b7f13ec2] 2648 Apr 18 18:04:58 magna094 ceph-osd: 4: (main()+0x2c45) [0x7fe2b7e78a05] 2649 Apr 18 18:04:58 magna094 ceph-osd: 5: (__libc_start_main()+0xf5) [0x7fe2b4a6ab15] 2650 Apr 18 18:04:58 magna094 ceph-osd: 6: (()+0x352e29) [0x7fe2b7ec2e29] 2651 Apr 18 18:04:58 magna094 ceph-osd: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2652 Apr 18 18:04:58 magna094 ceph-osd: *** Caught signal (Aborted) ** 2653 Apr 18 18:04:58 magna094 ceph-osd: in thread 7fe2b7b51800 thread_name:ceph-osd 2654 Apr 18 18:04:58 magna094 ceph-osd: ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae) 2655 Apr 18 18:04:58 magna094 ceph-osd: 1: (()+0x90f0ba) [0x7fe2b847f0ba] 2656 Apr 18 18:04:58 magna094 ceph-osd: 2: (()+0xf100) [0x7fe2b64bb100] 2657 Apr 18 18:04:58 magna094 ceph-osd: 3: (gsignal()+0x37) [0x7fe2b4a7e5f7] 2658 Apr 18 18:04:58 magna094 ceph-osd: 4: (abort()+0x148) [0x7fe2b4a7fce8] 2659 Apr 18 18:04:58 magna094 ceph-osd: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7fe2b857f467] 2660 Apr 18 18:04:58 magna094 ceph-osd: 6: (OSDService::get_map(unsigned int)+0x3d) [0x7fe2b7f60e3d] 2661 Apr 18 18:04:58 magna094 ceph-osd: 7: (OSD::init()+0x17c2) [0x7fe2b7f13ec2] 2662 Apr 18 18:04:58 magna094 ceph-osd: 8: (main()+0x2c45) [0x7fe2b7e78a05] 2663 Apr 18 18:04:58 magna094 ceph-osd: 9: (__libc_start_main()+0xf5) [0x7fe2b4a6ab15] 2664 Apr 18 18:04:58 magna094 ceph-osd: 10: (()+0x352e29) [0x7fe2b7ec2e29] 2665 Apr 18 18:04:58 magna094 ceph-osd: 2016-04-18 18:04:58.104345 7fe2b7b51800 -1 *** Caught signal (Aborted) ** 2666 Apr 18 18:04:58 magna094 ceph-osd: in thread 7fe2b7b51800 thread_name:ceph-osd 2667 Apr 18 18:04:58 magna094 ceph-osd: ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae) 2668 Apr 18 18:04:58 magna094 ceph-osd: 1: (()+0x90f0ba) [0x7fe2b847f0ba] 2669 Apr 18 18:04:58 magna094 ceph-osd: 2: (()+0xf100) [0x7fe2b64bb100] 2670 Apr 18 18:04:58 magna094 ceph-osd: 3: (gsignal()+0x37) [0x7fe2b4a7e5f7] 2671 Apr 18 18:04:58 magna094 ceph-osd: 4: (abort()+0x148) [0x7fe2b4a7fce8] 2672 Apr 18 18:04:58 magna094 ceph-osd: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7fe2b857f467] 2673 Apr 18 18:04:58 magna094 ceph-osd: 6: (OSDService::get_map(unsigned int)+0x3d) [0x7fe2b7f60e3d] 2674 Apr 18 18:04:58 magna094 ceph-osd: 7: (OSD::init()+0x17c2) [0x7fe2b7f13ec2] 2675 Apr 18 18:04:58 magna094 ceph-osd: 8: (main()+0x2c45) [0x7fe2b7e78a05] 2676 Apr 18 18:04:58 magna094 ceph-osd: 9: (__libc_start_main()+0xf5) [0x7fe2b4a6ab15] 2677 Apr 18 18:04:58 magna094 ceph-osd: 10: (()+0x352e29) [0x7fe2b7ec2e29] 2678 Apr 18 18:04:58 magna094 ceph-osd: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2679 Apr 18 18:04:58 magna094 ceph-osd: 0> 2016-04-18 18:04:58.104345 7fe2b7b51800 -1 *** Caught signal (Aborted) ** 2680 Apr 18 18:04:58 magna094 ceph-osd: in thread 7fe2b7b51800 thread_name:ceph-osd 2681 Apr 18 18:04:58 magna094 ceph-osd: ceph version 10.1.1-1.el7cp (61adb020219fbad4508050b5f0a792246ba74dae) 2682 Apr 18 18:04:58 magna094 ceph-osd: 1: (()+0x90f0ba) [0x7fe2b847f0ba] 2683 Apr 18 18:04:58 magna094 ceph-osd: 2: (()+0xf100) [0x7fe2b64bb100] 2684 Apr 18 18:04:58 magna094 ceph-osd: 3: (gsignal()+0x37) [0x7fe2b4a7e5f7] 2685 Apr 18 18:04:58 magna094 ceph-osd: 4: (abort()+0x148) [0x7fe2b4a7fce8] 2686 Apr 18 18:04:58 magna094 ceph-osd: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7fe2b857f467] 2687 Apr 18 18:04:58 magna094 ceph-osd: 6: (OSDService::get_map(unsigned int)+0x3d) [0x7fe2b7f60e3d] 2688 Apr 18 18:04:58 magna094 ceph-osd: 7: (OSD::init()+0x17c2) [0x7fe2b7f13ec2] 2689 Apr 18 18:04:58 magna094 ceph-osd: 8: (main()+0x2c45) [0x7fe2b7e78a05] 2690 Apr 18 18:04:58 magna094 ceph-osd: 9: (__libc_start_main()+0xf5) [0x7fe2b4a6ab15] 2691 Apr 18 18:04:58 magna094 ceph-osd: 10: (()+0x352e29) [0x7fe2b7ec2e29] 2692 Apr 18 18:04:58 magna094 ceph-osd: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2693 Apr 18 18:04:58 magna094 systemd: ceph-osd: main process exited, code=killed, status=6/ABRT 2694 Apr 18 18:04:58 magna094 systemd: Unit ceph-osd entered failed state. 2695 Apr 18 18:04:58 magna094 systemd: ceph-osd failed.