Bug 1494543 - ceph-disk prepare cannot find partition that does exist when run inside containers
Summary: ceph-disk prepare cannot find partition that does exist when run inside containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Disk
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 3.0
Assignee: Kefu Chai
QA Contact: John Fulton
URL:
Whiteboard:
Duplicates: 1490283
Depends On:
Blocks: 1496509
 
Reported: 2017-09-22 13:08 UTC by John Fulton
Modified: 2018-06-26 23:46 UTC (History)
CC List: 11 users

Fixed In Version: RHEL: ceph-12.2.1-13.el7cp Ubuntu: ceph_12.2.1-15redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1496509
Environment:
Last Closed: 2017-12-05 23:45:31 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 21493 0 None None None 2017-09-22 13:08:57 UTC
Red Hat Product Errata RHBA-2017:3387 0 normal SHIPPED_LIVE Red Hat Ceph Storage 3.0 bug fix and enhancement update 2017-12-06 03:03:45 UTC

Description John Fulton 2017-09-22 13:08:57 UTC
When using ceph-disk prepare, as provided by ceph-osd-10.2.9-0.el7.x86_64, on /dev/vdb, which exists on my system, I get a stack trace [1] ending in "[Errno 2] No such file or directory: '/dev/vdb1'". Unfortunately it seems intermittent. For example, if I zap the device and re-run prepare, I don't see this error, but it does recur and will fail a ceph-ansible deployment. I am using ceph-docker [2].

[1]

Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 9, in <module>
load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5095, in run
main(sys.argv[1:])
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5046, in main
args.func(args)
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1855, in main
Prepare.factory(args).prepare()
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1844, in prepare
self.prepare_locked()
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1875, in prepare_locked
self.data.prepare(self.journal)
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2542, in prepare
self.prepare_device(*to_prepare_list)
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2705, in prepare_device
self.set_data_partition()
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2632, in set_data_partition
self.partition = self.create_data_partition()
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2616, in create_data_partition
return device.get_partition(partition_number)
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1622, in get_partition
path=self.path, dev=dev, args=self.args)
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1685, in factory
(dev is not None and is_mpath(dev))):
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 535, in is_mpath
uuid = get_dm_uuid(dev)
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 522, in get_dm_uuid
uuid_path = os.path.join(block_path(dev), 'dm', 'uuid')
File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 516, in block_path
rdev = os.stat(path).st_rdev

[2] Workaround: zap and re-run:

docker run -ti --privileged=true -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb docker.io/ceph/daemon:tag-build-master-jewel-centos-7 zap_device
docker start ceph-osd-prepare-overcloud-cephstorage-1-devdevvdb; docker logs -f ceph-osd-prepare-overcloud-cephstorage-1-devdevvdb
[root@overcloud-cephstorage-1 ~]# docker run -ti --entrypoint=bash docker.io/ceph/daemon:tag-build-master-jewel-centos-7
[root@e902236a00de /]#
[root@e902236a00de /]# rpm -qa | grep ceph
libcephfs1-10.2.9-0.el7.x86_64
python-cephfs-10.2.9-0.el7.x86_64
ceph-base-10.2.9-0.el7.x86_64
ceph-osd-10.2.9-0.el7.x86_64
ceph-radosgw-10.2.9-0.el7.x86_64
ceph-release-1-1.el7.noarch
ceph-common-10.2.9-0.el7.x86_64
ceph-selinux-10.2.9-0.el7.x86_64
ceph-mds-10.2.9-0.el7.x86_64
ceph-mon-10.2.9-0.el7.x86_64
[root@e902236a00de /]#

Comment 2 John Fulton 2017-09-22 13:13:24 UTC
In the workaround section I provided an example with the upstream docker.io/ceph/daemon:tag-build-master-jewel-centos-7. However, I have also seen this and used the same workaround with docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest

Comment 5 Kefu Chai 2017-09-29 05:11:06 UTC
John, 

> [Errno 2] No such file or directory: '/dev/vdb1'

the exception looks weird if "/dev/vdb1" exists. can you reproduce this by rerunning "ceph-disk prepare /dev/vdb1" after this command fails? and what is the output if you run

$ stat /dev/sda

right after "ceph-disk prepare /dev/vdb1" fails?

Comment 6 Jon Schlueter 2017-09-29 15:06:30 UTC
I see that the release flag has this targeted at ceph-3.0, but the package/container it was reported against is jewel/ceph 2.

Comment 7 John Fulton 2017-09-29 18:52:29 UTC
I was able to reproduce this again: 

[root@overcloud-cephstorage-2 ~]# docker start ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb ; docker logs -f ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb

...

command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
get_dm_uuid: get_dm_uuid /dev/vdd uuid path is /sys/dev/block/252:48/dm/uuid
prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
get_dm_uuid: get_dm_uuid /dev/vdd uuid path is /sys/dev/block/252:48/dm/uuid
ptype_tobe_for_name: name = journal
get_dm_uuid: get_dm_uuid /dev/vdd uuid path is /sys/dev/block/252:48/dm/uuid
command: Running command: /usr/sbin/parted --machine -- /dev/vdd print
get_free_partition_index: get_free_partition_index: analyzing BYT;
/dev/vdd:1074MB:virtblk:512:512:gpt:Virtio Block Device:;
2:269MB:538MB:268MB::ceph journal:;
create_partition: Creating journal partition num 3 size 256 on /dev/vdd
command_check_call: Running command: /usr/sbin/sgdisk --new=3:0:+256M --change-name=3:ceph journal --partition-guid=3:d7dcab81-6d89-48ba-bd9e-081dceba85b3 --typecode=3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/vdd
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
update_partition: Calling partprobe on created device /dev/vdd
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/vdd /usr/sbin/partprobe /dev/vdd
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/vdd uuid path is /sys/dev/block/252:48/dm/uuid
get_dm_uuid: get_dm_uuid /dev/vdd uuid path is /sys/dev/block/252:48/dm/uuid
get_dm_uuid: get_dm_uuid /dev/vdd3 uuid path is /sys/dev/block/252:51/dm/uuid
prepare_device: Journal is GPT partition /dev/disk/by-partuuid/d7dcab81-6d89-48ba-bd9e-081dceba85b3
prepare_device: Journal is GPT partition /dev/disk/by-partuuid/d7dcab81-6d89-48ba-bd9e-081dceba85b3
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
set_data_partition: Creating osd partition on /dev/vdb
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
ptype_tobe_for_name: name = data
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
create_partition: Creating data partition num 1 size 0 on /dev/vdb
command_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:ad14c7b6-c23e-4357-a594-009234f7069e --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/vdb
The operation has completed successfully.
update_partition: Calling partprobe on created device /dev/vdb
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/vdb /usr/sbin/partprobe /dev/vdb
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5095, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5046, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1855, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1844, in prepare
    self.prepare_locked()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1875, in prepare_locked
    self.data.prepare(self.journal)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2542, in prepare
    self.prepare_device(*to_prepare_list)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2705, in prepare_device
    self.set_data_partition()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2632, in set_data_partition
    self.partition = self.create_data_partition()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2616, in create_data_partition
    return device.get_partition(partition_number)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1622, in get_partition
    path=self.path, dev=dev, args=self.args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1685, in factory
    (dev is not None and is_mpath(dev))):
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 535, in is_mpath
    uuid = get_dm_uuid(dev)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 522, in get_dm_uuid
    uuid_path = os.path.join(block_path(dev), 'dm', 'uuid')
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 516, in block_path
    rdev = os.stat(path).st_rdev
OSError: [Errno 2] No such file or directory: '/dev/vdb1'
[root@overcloud-cephstorage-2 ~]# stat /dev/vdb1
  File: ‘/dev/vdb1’
  Size: 0         	Blocks: 0          IO Block: 4096   block special file
Device: 5h/5d	Inode: 3594514     Links: 1     Device type: fc,11
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Context: system_u:object_r:fixed_disk_device_t:s0
Access: 2017-09-29 19:43:35.502364450 +0000
Modify: 2017-09-29 19:43:35.502364450 +0000
Change: 2017-09-29 19:43:35.502364450 +0000
 Birth: -
[root@overcloud-cephstorage-2 ~]# stat /dev/sda
stat: cannot stat ‘/dev/sda’: No such file or directory
[root@overcloud-cephstorage-2 ~]# stat /dev/vda
  File: ‘/dev/vda’
  Size: 0         	Blocks: 0          IO Block: 4096   block special file
Device: 5h/5d	Inode: 9480        Links: 1     Device type: fc,0
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Context: system_u:object_r:fixed_disk_device_t:s0
Access: 2017-09-29 19:24:19.883364450 +0000
Modify: 2017-09-25 01:58:34.239364450 +0000
Change: 2017-09-25 01:58:34.239364450 +0000
 Birth: -
[root@overcloud-cephstorage-2 ~]#

Comment 8 John Fulton 2017-09-29 18:53:21 UTC
(In reply to Jon Schlueter from comment #6)
> I see that release flag has this targeted at ceph-3.0 but package/container
> reported against is jewel/ceph 2

re-assigning needinfo after accidentally closing all need-infos

Comment 9 John Fulton 2017-09-29 18:55:31 UTC
(In reply to Kefu Chai from comment #5)
> John, 
> 
> > [Errno 2] No such file or directory: '/dev/vdb1'
> 
> the exception looks weird if "/dev/vdb1" exists. can you reproduce this by
> rerunning "ceph-disk prepare /dev/vdb1" after this command fails? 

Rather than directly running "ceph-disk prepare /dev/vdb1", I had the ceph container run it by running: 

 docker start ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb

> and what is the output if you run
> 
> $ stat /dev/sda
> 
> right after "ceph-disk prepare /dev/vdb1" fails?

I ran this as requested, after reproducing the error, and pasted it in comment #7

Comment 10 John Fulton 2017-09-29 18:58:53 UTC
If you want to reproduce this error, here are the steps. 

1. Use ceph-ansible to do a containerized deployment using site-docker.yaml.
2. If the deployment fails at this point, you've reproduced it; if it doesn't, you can do the following...

Given a working containerized Ceph deployment:

[root@overcloud-cephstorage-2 ~]# docker ps
CONTAINER ID        IMAGE                                                   COMMAND             CREATED             STATUS              PORTS               NAMES
ba88ee78641b        docker.io/ceph/daemon:tag-build-master-jewel-centos-7   "/entrypoint.sh"    10 minutes ago      Up 10 minutes                           ceph-osd-overcloud-cephstorage-2-devvdb
983ef4bc33b7        tripleoupstream/centos-binary-cron:latest               "kolla_start"       4 days ago          Up 4 days                               logrotate_crond
b0cebb90f70f        docker.io/ceph/daemon:tag-build-master-jewel-centos-7   "/entrypoint.sh"    4 days ago          Up 4 days                               ceph-osd-overcloud-cephstorage-2-devvdc
[root@overcloud-cephstorage-2 ~]# 

Determine the OSD number of the container you want to stop by asking that container to show its output of df:

[root@overcloud-cephstorage-2 ~]# docker exec -ti ba88ee78641b df 
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         52416492 4906864  47509628  10% /
tmpfs            4005268       0   4005268   0% /sys/fs/cgroup
devtmpfs         3979576       0   3979576   0% /dev
shm                65536       0     65536   0% /dev/shm
/dev/vda2       52416492 4906864  47509628  10% /etc/ceph
/dev/vdb1        1044076   33576   1010500   4% /var/lib/ceph/osd/ceph-0
[root@overcloud-cephstorage-2 ~]# 

To tell systemd not to restart the container after you take it down, change its unit file as follows:

-Restart=always
+Restart=never

Find the line to change (shown below) and don't forget to reload systemd.

[root@overcloud-cephstorage-2 ~]# vi /etc/systemd/system/ceph-osd\@.service
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[Unit]
Description=Ceph OSD
After=docker.service

[Service]
EnvironmentFile=-/etc/environment
ExecStartPre=-/usr/bin/docker stop ceph-osd-overcloud-cephstorage-2-dev%i
ExecStartPre=-/usr/bin/docker rm -f ceph-osd-overcloud-cephstorage-2-dev%i
ExecStart=/usr/share/ceph-osd-run.sh %i
ExecStop=-/usr/bin/docker stop ceph-osd-overcloud-cephstorage-2-dev%i
Restart=always   # <---------------------------------------------------------- HERE!  
RestartSec=10s
TimeoutStartSec=120
TimeoutStopSec=15

[Install]
WantedBy=multi-user.target
[root@overcloud-cephstorage-2 ~]# systemctl daemon-reload
[root@overcloud-cephstorage-2 ~]# 
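
For example, a non-interactive equivalent of that edit would be something like this sketch (the unit path is the one shown above; note that systemd's documented value for disabling restarts is "no" rather than "never"):

# sketch only: flip Restart= and reload systemd in one go
sed -i 's/^Restart=always/Restart=no/' /etc/systemd/system/ceph-osd@.service
systemctl daemon-reload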

Stop the container via systemd (the command below is what systemd runs under the hood):

/usr/bin/docker stop ceph-osd-overcloud-cephstorage-2-devvdb

Verify the OSD is down

[root@overcloud-cephstorage-2 ~]# docker ps
CONTAINER ID        IMAGE                                                   COMMAND             CREATED             STATUS              PORTS               NAMES
983ef4bc33b7        tripleoupstream/centos-binary-cron:latest               "kolla_start"       4 days ago          Up 4 days                               logrotate_crond
b0cebb90f70f        docker.io/ceph/daemon:tag-build-master-jewel-centos-7   "/entrypoint.sh"    4 days ago          Up 4 days                               ceph-osd-overcloud-cephstorage-2-devvdc
[root@overcloud-cephstorage-2 ~]# ceph -s
    cluster 7c5b1446-a16b-11e7-aacd-00e165f39225
     health HEALTH_WARN
            1/6 in osds are down
     monmap e1: 1 mons at {overcloud-controller-0=192.168.24.9:6789/0}
            election epoch 3, quorum 0 overcloud-controller-0
      fsmap e5: 1/1/1 up {0=overcloud-controller-0=up:active}
     osdmap e22: 6 osds: 5 up, 6 in
            flags sortbitwise,require_jewel_osds
      pgmap v1600: 240 pgs, 8 pools, 40164 kB data, 30 objects
            240 MB used, 5877 MB / 6117 MB avail
                 240 active+clean
[root@overcloud-cephstorage-2 ~]# 

Have a container zap the disk:

[root@overcloud-cephstorage-2 ~]# docker run -ti --privileged=true -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb docker.io/ceph/daemon:tag-build-master-jewel-centos-7 zap_device
ownership of '/var/run/ceph/' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-mds' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-osd' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-rgw' retained as ceph:ceph
ownership of '/var/lib/ceph/mds' retained as ceph:ceph
changed ownership of '/var/lib/ceph/mds/ceph-mds-1f2980da8c8d' from root:root to ceph:ceph
ownership of '/var/lib/ceph/mon' retained as ceph:ceph
changed ownership of '/var/lib/ceph/mon/ceph-1f2980da8c8d' from root:root to ceph:ceph
ownership of '/var/lib/ceph/osd' retained as ceph:ceph
changed ownership of '/var/lib/ceph/radosgw' from root:root to ceph:ceph
changed ownership of '/var/lib/ceph/radosgw/1f2980da8c8d' from root:root to ceph:ceph
ownership of '/var/lib/ceph/tmp' retained as ceph:ceph
2017-09-29 19:13:37  /entrypoint.sh: Zapping the entire device /dev/vdb
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
The operation has completed successfully.
2017-09-29 19:13:39  /entrypoint.sh: Executing partprobe on /dev/vdb
[root@overcloud-cephstorage-2 ~]# 

If you're using a separate disk for a journal, then you need to zap the journal partition too.

parted /dev/vdd --script rm 1

Note that I had verified that partition 1 was the journal in use by vdb.

[root@overcloud-cephstorage-2 ~]# ceph-disk list
/dev/vda :
 /dev/vda1 other, iso9660
 /dev/vda2 other, xfs, mounted on /
/dev/vdb other, unknown
/dev/vdc :
 /dev/vdc1 ceph data, prepared, cluster ceph, osd.5, journal /dev/vdd2
/dev/vdd :
 /dev/vdd2 ceph journal, for /dev/vdc1
[root@overcloud-cephstorage-2 ~]# 

In my case I want to remove all partitions on the journal disk except the second one.

for p in $(parted /dev/vdd --script print | awk {'print $1'} | egrep [0-9] | grep -v 2); do parted /dev/vdd --script rm $p; done

Have a container prepare the disk:

[root@overcloud-cephstorage-2 ~]#   docker start ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb
ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb
[root@overcloud-cephstorage-2 ~]# 

See if the preparation worked.

docker logs -f ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb

If it did, then re-zap and re-prepare until the prepare fails.

In Summary

Erase:

docker run -ti --privileged=true -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb docker.io/ceph/daemon:tag-build-master-jewel-centos-7 zap_device ; for p in $(parted /dev/vdd --script print | awk {'print $1'} | egrep [0-9] | grep -v 2); do parted /dev/vdd --script rm $p; done

Try:

docker start ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb ; docker logs -f ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb

I had to run the above erase-and-try commands five times in a row, but I did hit the error on the last attempt. 

Note that I am using virtual hardware.

Comment 11 John Fulton 2017-09-29 20:19:41 UTC
(In reply to Kefu Chai from comment #5)
> John, 
> 
> > [Errno 2] No such file or directory: '/dev/vdb1'
> 
> the exception looks weird if "/dev/vdb1" exists. can you reproduce this by
> rerunning "ceph-disk prepare /dev/vdb1" after this command fails? and what
> is the output if you run
> 
> $ stat /dev/sda
> 
> right after "ceph-disk prepare /dev/vdb1" fails?

I want to be sure I'm providing the info you need, so I reproduced the error again using the following loop, pressing Ctrl-C when I saw it.


while [ 1 ] ; do docker run -ti --privileged=true -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb docker.io/ceph/daemon:tag-build-master-jewel-centos-7 zap_device ; for p in $(parted /dev/vdd --script print | awk {'print $1'} | egrep [0-9] | grep -v 2); do parted /dev/vdd --script rm $p; done ; docker start ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb ; docker logs -f  ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb ; sleep 1; done ;


Here is a paste from that...


get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
create_partition: Creating data partition num 1 size 0 on /dev/vdb
command_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:4a323d18-e61e-4f47-ae98-291d44625203 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/vdb
The operation has completed successfully.
update_partition: Calling partprobe on created device /dev/vdb
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/vdb /usr/sbin/partprobe /dev/vdb
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in 
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5095, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5046, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1855, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1844, in prepare
    self.prepare_locked()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1875, in prepare_locked
    self.data.prepare(self.journal)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2542, in prepare
    self.prepare_device(*to_prepare_list)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2705, in prepare_device
    self.set_data_partition()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2632, in set_data_partition
    self.partition = self.create_data_partition()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2616, in create_data_partition
    return device.get_partition(partition_number)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1622, in get_partition
    path=self.path, dev=dev, args=self.args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1685, in factory
    (dev is not None and is_mpath(dev))):
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 535, in is_mpath
    uuid = get_dm_uuid(dev)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 522, in get_dm_uuid
    uuid_path = os.path.join(block_path(dev), 'dm', 'uuid')
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 516, in block_path
    rdev = os.stat(path).st_rdev
OSError: [Errno 2] No such file or directory: '/dev/vdb1'
ownership of '/var/run/ceph/' retained as ceph:ceph
ownership of '/var/lib/ceph/osd' retained as ceph:ceph
ownership of '/var/lib/ceph/osd/ceph-0' retained as ceph:ceph
ownership of '/var/lib/ceph/osd/ceph-5' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-mds' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-mds/ceph.keyring' retained as ceph:ceph
ownership of '/var/lib/ceph/radosgw' retained as ceph:ceph
ownership of '/var/lib/ceph/radosgw/overcloud-cephstorage-2' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-osd' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-osd/ceph.keyring' retained as ceph:ceph
ownership of '/var/lib/ceph/mds' retained as ceph:ceph
ownership of '/var/lib/ceph/mds/ceph-mds-overcloud-cephstorage-2' retained as ceph:ceph
ownership of '/var/lib/ceph/mon' retained as ceph:ceph
ownership of '/var/lib/ceph/mon/ceph-overcloud-cephstorage-2' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-rgw' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-rgw/ceph.keyring' retained as ceph:ceph
ownership of '/var/lib/ceph/tmp' retained as ceph:ceph
ownership of '/var/lib/ceph/tmp/ceph-disk.prepare.lock' retained as ceph:ceph
ownership of '/var/lib/ceph/tmp/ceph-disk.activate.lock' retained as ceph:ceph
ownership of '/var/lib/ceph/bootstrap-rbd' retained as ceph:ceph
2017-09-29 20:47:39  /entrypoint.sh: static: does not generate config
HEALTH_OK
2017-09-29 20:47:39  /entrypoint.sh: Regarding parted, device /dev/vdb is inconsistent/broken/weird.
2017-09-29 20:47:39  /entrypoint.sh: It would be too dangerous to destroy it without any notification.
2017-09-29 20:47:39  /entrypoint.sh: Please set OSD_FORCE_ZAP to '1' if you really want to zap this disk.
^C
[root@overcloud-cephstorage-2 ~]# ^C
[root@overcloud-cephstorage-2 ~]# ceph-disk prepare /dev/vdb1
Traceback (most recent call last):
  File "/sbin/ceph-disk", line 9, in 
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5047, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5000, in main
    main_catch(args.func, args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5025, in main_catch
    func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1812, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1808, in factory
    return PrepareFilestore(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1820, in __init__
    self.data = PrepareFilestoreData(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2420, in __init__
    self.set_type()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2428, in set_type
    dmode = os.stat(self.args.data).st_mode
OSError: [Errno 2] No such file or directory: '/dev/vdb1'
[root@overcloud-cephstorage-2 ~]#


However, I had originally asked it to prepare /dev/vdb, not /dev/vdb1. It is prepare itself that tries to create vdb1, since the output above contains the following:
 create_partition: Creating data partition num 1 size 0 on /dev/vdb
but the partition is not there:


[root@overcloud-cephstorage-2 ~]# parted /dev/vdb --script print 
Error: /dev/vdb: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 1074MB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 
[root@overcloud-cephstorage-2 ~]# 

If you were asking me to run that, not in a container but directly, then here's that output:

[root@overcloud-cephstorage-2 ~]# ceph-disk prepare /dev/vdb 
Creating new GPT entries.
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/vdb1              isize=2048   agcount=4, agsize=49087 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=196347, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=864, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.
[root@overcloud-cephstorage-2 ~]# parted /dev/vdb --script print 
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 1074MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size   File system  Name          Flags
 2      1049kB  269MB   268MB  xfs          ceph journal
 1      269MB   1074MB  804MB  xfs          ceph data

[root@overcloud-cephstorage-2 ~]#

We only seem to hit this when using containers. I have received no reports about it outside of containers, and I tried to reproduce it without containers using the following loop.

while [ 1 ] ; do sgdisk -Z /dev/vdb ; ceph-disk prepare /dev/vdb ; done

However, I didn't run into it. Until there's evidence otherwise, I think this is a containers-only bug for ceph-disk.

Comment 12 John Fulton 2017-10-02 16:41:58 UTC
*** Bug 1490283 has been marked as a duplicate of this bug. ***

Comment 13 John Fulton 2017-10-04 13:17:42 UTC
I changed the subject to say "in containers" as this does not happen outside of containers. 

This appears to be a timing issue when running inside the container: partitions need more time to show up, and ceph-disk doesn't wait for them to be fully available.

Similar symptoms might have been fixed by this: 

 https://github.com/ceph/ceph/commit/702edb5519e67bc5f8c5b65c6f63c9635cd758cf

in the past, but that fix is already present in my container and the issue still occurs. 

I can provide an environment to anyone working on ceph-disk where the problem exists.
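
For anyone who wants to check the timing theory, here is a minimal sketch (not output from an actual run; the device name and flags are examples mirroring what ceph-disk does above) that times how long the new partition's device node takes to appear inside the container:

# sketch only: create a data partition, partprobe, then measure how long
# the device node takes to appear under /dev (example device /dev/vdb)
start=$(date +%s)
sgdisk --largest-new=1 --mbrtogpt -- /dev/vdb
flock -s /dev/vdb partprobe /dev/vdb
udevadm settle --timeout=600
until [ -b /dev/vdb1 ]; do sleep 0.1; done
echo "device node appeared after $(( $(date +%s) - start ))s"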

Comment 16 John Fulton 2017-10-04 19:26:58 UTC
(In reply to Jon Schlueter from comment #6)
> I see that release flag has this targeted at ceph-3.0 but package/container
> reported against is jewel/ceph 2

The same fix could be made against ceph-3.0 and then be backported to the following clone of this bug to get it in jewel/ceph 2:

 https://bugzilla.redhat.com/show_bug.cgi?id=1496509

Comment 18 John Fulton 2017-10-05 14:02:15 UTC
Kefu,

I had a brief conversation with Seb. He thinks that this pattern may need to be applied elsewhere in ceph-disk for Jewel (not just where it is currently applied). Do you think that's relevant? 

 https://github.com/ceph/ceph/commit/93e7b95ed8b4c78daebf7866bb1f0826d7199075

Thanks,
  John

Comment 20 Kefu Chai 2017-10-07 11:27:34 UTC
john,

to the best of my knowledge, https://github.com/ceph/ceph/commit/93e7b95ed8b4c78daebf7866bb1f0826d7199075 keeps polling /sys/block until the expected partition sysfs file shows up. but a sysfs attribute file is created by the linux kernel when a new partition is found: partprobe sends ioctl(BLKPG, ..) calls to instruct the kernel to add/del/resize partitions, and the ioctl() syscall does not return until the sysfs file is created. so it is entirely the linux kernel's job to create sysfs files, and it does so in a blocking way.

the files under /dev/ representing partitions, on the other hand, are in general created by udev. as Loïc pointed out, udev and (partprobe + linux kernel) are asynchronous: partprobe instructs the kernel to add the new partitions, the kernel sends uevents to udev in userspace, and udev then creates the device files under /dev/ while handling those uevents.

anyway, applying the same fix -- retrying if a /dev/ file does not exist -- might help. as per the analysis of http://tracker.ceph.com/issues/19428, the sysfs file takes up to 1 sec to show up under /sys/block; probably the same applies to /dev/ in a docker container. i will prepare a patch addressing it.
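
in shell terms, the retry boils down to something like this sketch (device name and timeout are examples only; the actual change will go into ceph-disk's python code):

# sketch only: poll for the partition's device node for up to ~10 seconds
# instead of stat()ing it right away
dev=/dev/vdb1
for i in $(seq 1 10); do
    [ -b "$dev" ] && break
    sleep 1
done
stat "$dev"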


also, "udevadm settle" waits until the event queue is empty by watching /run/udev using inotify(7). and the funny thing about udevadm is that it returns 0 if the /run/udev directory does not exist at all. see adm_settle() and udev_queue_get_fd().

quote from adm_settle() in src/udev/udevadm-settle.c of systemd-234
===========>8============
        pfd[0].fd = udev_queue_get_fd(queue);
        if (pfd[0].fd < 0) {
                log_debug("queue is empty, nothing to watch");
                rc = EXIT_SUCCESS;
                goto out;
        }
===========8<============

i verified my theory in the testbed:

[root@overcloud-cephstorage-2 ~]# docker run -ti --entrypoint=bash docker.io/ceph/daemon:tag-build-master-jewel-centos-7
[root@df045229ec6c /]# udevadm --version
219
[root@a036184074e4 /]# ls /run/udev/
ls: cannot access /run/udev/: No such file or directory
[root@a036184074e4 /]# udevadm settle
[root@a036184074e4 /]# echo $?
0
[root@a036184074e4 /]# exit
[root@overcloud-cephstorage-2 ~]# ls /run/udev/
control  data  links  static_node-tags  tags  watch

so, to ensure that "udevadm" is able to check the status of udev's event queue, we need to pass "-v /run/udev:/run/udev:ro" to docker. take the reproducer as an example:


while true ; do
    docker run -ti --privileged=true -v /dev/:/dev/ -v /run/udev:/run/udev:ro -e OSD_DEVICE=/dev/vdb docker.io/ceph/daemon:tag-build-master-jewel-centos-7 zap_device ;
    for p in $(parted /dev/vdd --script print | awk {'print $1'} | egrep [0-9] | grep -v 2); do
        parted /dev/vdd --script rm $p;
    done ;
    docker start ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb ;
    docker logs -f  ceph-osd-prepare-overcloud-cephstorage-2-devdevvdb ;
    sleep 1;
done ;

Comment 21 Kefu Chai 2017-10-07 11:36:42 UTC
https://github.com/ceph/ceph-ansible/pull/1531 could also address this issue. @john, have you had a chance to try it?

Comment 22 Kefu Chai 2017-10-07 16:42:50 UTC
upstream PR posted at https://github.com/ceph/ceph/pull/18162. will backport it once it gets merged.

Comment 23 Kefu Chai 2017-10-07 18:10:18 UTC
forgot to mention, ceph-disk fails even with "-v /run/udev:/run/udev:ro".

Comment 24 Kefu Chai 2017-10-08 12:50:47 UTC
John, i prepared a test build with the backport of PR 18162 at https://2.chacra.ceph.com/r/ceph/wip-bz1494543-jewel-kefu/d36009b00ddbaf2f34e106be7c0e824b327d124a/centos/7/flavors/default/, could you give it a try?

Comment 26 Sébastien Han 2017-10-08 14:36:10 UTC
John, I built a new image with Kefu's fix here: https://hub.docker.com/r/ceph/daemon/builds/bafedtemdtvo6ofheper28a/


Use the following tag: tag-build-race-jewel-devel-wip-bz1494543-jewel-kefu-centos-7

Comment 27 Sébastien Han 2017-10-08 14:47:06 UTC
new image building here, https://hub.docker.com/r/ceph/daemon/builds/b6h8yysq4mrjmmitdjybear/

same tag

Comment 31 John Fulton 2017-10-09 20:12:03 UTC
Thanks Kefu! 

I conducted 5 tests in a row with a docker image [1] that contains the fix and didn't hit the issue. I am going to pursue getting this promoted into the puddle so OpenStack QE can try it too. 

[1] https://hub.docker.com/r/ceph/daemon/builds/bvqtdrttvmbajeepyjbnvm3

Comment 48 errata-xmlrpc 2017-12-05 23:45:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

