
Bug 1755956

Summary: Containerized OSD failed to start after upgrade: failed to read label for an LV
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: RADOS
Assignee: Neha Ojha <nojha>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Manohar Murthy <mmurthy>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3
CC: ceph-eng-bugs, dzafman, jbrier, jdurgin, kchai, nojha, oneumyvakin, sweil, tchandra, vumrao
Target Milestone: z3
Keywords: Regression
Target Release: 3.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-12-19 18:42:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Embargoed:
Attachments:
  File contains journald log of respective OSD (Flags: none)

Description Vasishta 2019-09-26 14:00:10 UTC
Created attachment 1619586 [details]
File contains journald log of respective OSD

Description of problem:
Tried to upgrade a containerized cluster from [1] to [2] using ceph-ansible [3] (ceph-ansible-3.2.27-1.el7cp.noarch). The rolling update failed because the cluster did not reach clean PGs within the timeout. (noout and norebalance were set by ceph-ansible during the upgrade.)

It was observed that only 1 out of 16 OSDs was down, with stderr "failed to read label" for an LV (the LV was present).

Version-Release number of selected component (if applicable):
[1] ceph-3.3-rhel-7-containers-candidate-37269-20190917163216, 12.2.12-70.el7cp
[2] ceph-3.3-rhel-7-containers-candidate-99414-20190918165737, 12.2.12-71.el7cp

How reproducible:
1

Steps to Reproduce:
1. Configure a containerized cluster at the (n-1)th version
2. Upgrade it to the nth version

Actual results:
 
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-411886c1-5eb6-445b-bfd0-badb1c7a412a/osd-data-2232e149-96e3-4969-8079-cbbfab969b0b --path /var
ceph-osd-run.sh[65833]: stderr: failed to read label for /dev/ceph-411886c1-5eb6-445b-bfd0-badb1c7a412a/osd-data-2232e149-96e3-4969-8079-cbbfab969b0b: (2) No such file or directory


$ sudo lvdisplay /dev/ceph-411886c1-5eb6-445b-bfd0-badb1c7a412a/osd-data-2232e149-96e3-4969-8079-cbbfab969b0b
  --- Logical volume ---
  LV Path                /dev/ceph-411886c1-5eb6-445b-bfd0-badb1c7a412a/osd-data-2232e149-96e3-4969-8079-cbbfab969b0b
  LV Name                osd-data-2232e149-96e3-4969-8079-cbbfab969b0b
  VG Name                ceph-411886c1-5eb6-445b-bfd0-badb1c7a412a
  LV UUID                7UBCWl-RJSw-maqd-1ezX-N243-7C2I-UQeRmd
  LV Write Access        read/write
  LV Creation host, time magna033, 2019-09-18 05:33:21 +0000
  LV Status              available
  # open                 0
  LV Size                232.12 GiB
  Current LE             59424
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:10
   


Expected results:
OSDs should get updated successfully

Additional info:

Particular OSD was created using lvm batch 
nodexx devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="lvm" osds_per_device=4
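For reference, the inventory line above can be expressed as a host_vars fragment in ceph-ansible. This is a sketch only: the variable names and values are taken from the line above, while the file path and layout are illustrative.

```yaml
# host_vars/nodexx.yml -- hypothetical path; values copied from the inventory line above
devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
osd_scenario: lvm
osds_per_device: 4
```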


Based on an initial review of the whole environment and the logs, we assume the issue is in RADOS, since ceph-bluestore-tool failed to read the label of an LV. Please feel free to reassign to the relevant component based on your findings.

Comment 3 Josh Durgin 2019-09-30 13:48:24 UTC
Could you retry with --debug-bluestore=30 and attach the osd log? Also please add dmesg from that node to see if there's anything going on with the disks at the kernel level.

Comment 6 Sage Weil 2019-10-01 19:26:43 UTC
Can you attach the output from

dd if=/dev/ceph-411886c1-5eb6-445b-bfd0-badb1c7a412a/osd-data-2232e149-96e3-4969-8079-cbbfab969b0b of=/tmp/foo bs=4K count=2
hexdump -C /tmp/foo

Thanks!

Comment 13 John Brier 2019-10-18 19:33:00 UTC
Note there was a request to remove this from the release notes but it was never attached to the 3.3 Release Notes tracker nor does it have a Doc Text/Doc Type so it was never actually in the Release Notes.

Comment 16 Oleg Neumyvakin 2021-08-08 05:51:46 UTC
I've hit what looks like a very similar issue on ceph version 15.2.13.


[2021-08-08 04:08:22,902][ceph_volume.process][INFO ] stdout TAGS=:systemd:
[2021-08-08 04:08:22,903][ceph_volume.process][INFO ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S lv_path=/dev/sdb -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2021-08-08 04:08:22,963][ceph_volume.process][INFO ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sdb
[2021-08-08 04:08:22,972][ceph_volume.process][INFO ] stdout NAME="sdb" KNAME="sdb" MAJ:MIN="8:16" FSTYPE="LVM2_member" MOUNTPOINT="" LABEL="" UUID="BVsYZe-nnPG-UCo3-D0xy-H5vr-mESK-IrW4IH" RO="0" RM="0" MODEL="QEMU HARDDISK " SIZE="80G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw---" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="4K" DISC-MAX="1G" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2021-08-08 04:08:22,973][ceph_volume.process][INFO ] Running command: /usr/sbin/blkid -p /dev/sdb
[2021-08-08 04:08:22,977][ceph_volume.process][INFO ] stdout /dev/sdb: UUID="BVsYZe-nnPG-UCo3-D0xy-H5vr-mESK-IrW4IH" VERSION="LVM2 001" TYPE="LVM2_member" USAGE="raid"
[2021-08-08 04:08:22,978][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /dev/sdb
[2021-08-08 04:08:23,034][ceph_volume.process][INFO ] stdout ceph-0142ad4b-5029-457d-ae0a-9ed927249e32";"1";"1";"wz--n";"20479";"0";"4194304
[2021-08-08 04:08:23,034][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size /dev/sdb
[2021-08-08 04:08:23,094][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff,ceph.block_uuid=rL6nxR-ESXS-Xd5B-TwP1-GGNE-MLpa-M4mcHY,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=c6f330e2-f775-11eb-a326-85f44cce260a,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=bf95f45b-0849-46cd-9715-7c27db32b9ff,ceph.osd_id=0,ceph.osdspec_affinity=None,ceph.type=block,ceph.vdo=0";"/dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff";"osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff";"ceph-0142ad4b-5029-457d-ae0a-9ed927249e32";"rL6nxR-ESXS-Xd5B-TwP1-GGNE-MLpa-M4mcHY";"85895151616
[2021-08-08 04:08:23,095][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sdb
[2021-08-08 04:08:23,125][ceph_volume.process][INFO ] stderr unable to read label for /dev/sdb: (2) No such file or directory

Run /usr/bin/ceph-bluestore-tool --log-level=30:

/usr/bin/ceph-bluestore-tool show-label --log-level=30 --dev /dev/sdb -l /var/log/ceph/ceph-volume.log
unable to read label for /dev/sdb: (2) No such file or directory

2021-08-08T05:17:45.303+0000 7ff7fc9c0240 10 bluestore(/dev/sdb) _read_bdev_label
2021-08-08T05:17:45.303+0000 7ff7fc9c0240 2 bluestore(/dev/sdb) _read_bdev_label unable to decode label at offset 102: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding


I've submitted a separate bug report at https://tracker.ceph.com/issues/52095 where you can find a hexdump of /dev/sdb.
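A hexdump like the one referenced above can be sanity-checked programmatically. The sketch below assumes the BlueStore bdev label block begins with the literal magic string "bluestore block device" (the text ceph-bluestore-tool writes at the start of the label); the function and offsets are illustrative, not part of any Ceph API.

```python
# Minimal sketch: check whether the first 4 KiB of a device looks like a
# BlueStore bdev label. Assumption: the label block starts with the magic
# text "bluestore block device"; a zeroed or clobbered block will not.
MAGIC = b"bluestore block device"

def looks_like_bluestore_label(block: bytes) -> bool:
    """Return True if the block begins with the BlueStore label magic."""
    return block[: len(MAGIC)] == MAGIC

# usage (reading a raw device requires root):
# with open("/dev/sdb", "rb") as f:
#     print(looks_like_bluestore_label(f.read(4096)))
```

If the magic is present but ceph-bluestore-tool still reports "unable to decode label", the failure is past the magic, in the encoded struct itself, which matches the "decode past end of struct" error at offset 102 in the log above.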