Description of problem:
Attempted splitstack deployment tests and see the following error:

fatal: [ceph-0]: FAILED! => changed=false
    - --privileged
    - --ipc=host
    - --ulimit
    - nofile=1024:4096
    - /run/lock/lvm:/run/lock/lvm:z
    - /var/run/udev/:/var/run/udev/:z
    - /dev:/dev
    - /run/lvm/:/run/lvm/
    - --entrypoint=ceph-volume
    - lvm
    - batch
    - --bluestore
    - --yes
    - --prepare
    - --report
    - --format=json
  stderr: 'Error: error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or directory'
fatal: [ceph-2]: FAILED! => changed=false
fatal: [ceph-1]: FAILED! => changed=false

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
ceph-0       : ok=89  changed=4  unreachable=0 failed=1 skipped=198 rescued=0 ignored=0
ceph-1       : ok=87  changed=4  unreachable=0 failed=1 skipped=197 rescued=0 ignored=0
ceph-2       : ok=87  changed=4  unreachable=0 failed=1 skipped=197 rescued=0 ignored=0
compute-0    : ok=45  changed=3  unreachable=0 failed=0 skipped=145 rescued=0 ignored=0
compute-1    : ok=34  changed=2  unreachable=0 failed=0 skipped=115 rescued=0 ignored=0
controller-0 : ok=166 changed=21 unreachable=0 failed=0 skipped=287 rescued=0 ignored=0
controller-1 : ok=153 changed=19 unreachable=0 failed=0 skipped=277 rescued=0 ignored=0
controller-2 : ok=153 changed=19 unreachable=0 failed=0 skipped=279 rescued=0 ignored=0

Version-Release number of selected component (if applicable):
RHOS_TRUNK-16.0-RHEL-8-20191122.n.2

How reproducible:
Every time a split stack Jenkins job is executed

Steps to Reproduce:
1. Execute any split stack job in Jenkins, e.g. DFG-df-splitstack-16-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup

Actual results:
Job aborts with the error in the description

Expected results:
Job completes successfully

Additional info:
This is happening because the lvm2 package is missing from your Ceph storage nodes [1] and is required by Ceph in order to create bluestore OSDs with ceph-volume [2].

I'd file this as an overcloud image bug (missing needed package), but because you're using split-stack, the person doing the deployment is responsible for installing the needed packages [3]. That said, the documentation doesn't explicitly say you need to install the lvm2 package. For that reason I think you need to fix your job by installing that package, and we need a docbug to tell the user to install the package.

[1]
[fultonj@runcible bz1777020]$ ls ceph-0/etc/lvm
ls: cannot access 'ceph-0/etc/lvm': No such file or directory
[fultonj@runcible bz1777020]$ grep -i lvm ceph-0/var/log/rpm.list
[fultonj@runcible bz1777020]$

[2] https://github.com/ceph/ceph-ansible/blob/v4.0.5/roles/ceph-config/tasks/main.yml#L18
ceph-ansible-4.0.5-1.el8cp.noarch
ceph_docker_image: ceph/rhceph-4.0-rhel8
ceph_docker_image_tag: latest

[3] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/html-single/director_installation_and_usage/index#registering-the-operating-system-for-pre-provisioned-nodes
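Note that the podman error fires before the ceph-volume container even starts: /run/lock/lvm is one of the bind-mount source paths on the host, and it only exists once lvm2 is installed. A minimal pre-flight sketch that flags missing bind-mount sources before ansible gets that far (this helper is hypothetical, not part of ceph-ansible):

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of ceph-ansible): report any
# host paths that podman would fail to stat when setting up bind mounts
# for the ceph-volume container. On a node without lvm2 installed,
# /run/lock/lvm and /run/lvm are among the missing paths.
check_mount_sources() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "MISSING: $p"
            rc=1
        fi
    done
    return $rc
}

# On a Ceph node you would check the paths from the failing command, e.g.:
#   check_mount_sources /run/lock/lvm /run/lvm /var/run/udev /dev
check_mount_sources /dev && echo "all bind-mount sources present"
```

Running this against the full path list on an affected node would print MISSING for /run/lock/lvm, matching the stderr in the description.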
Please update step 7.3 "Registering the operating system for pre-provisioned nodes" in the director_installation_and_usage document [1] to have an additional step which requires the user to install the lvm2 package. Here's an example where I've moved step 6 to step 7 and inserted a new step 6.

"""
5. Enable the required Red Hat Enterprise Linux repositories. For x86_64 systems, run:

[root@controller-0 ~]# sudo subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms --enable=rhel-8-for-x86_64-appstream-rpms --enable=rhel-8-for-x86_64-highavailability-rpms --enable=ansible-2.8-for-rhel-8-x86_64-rpms --enable=openstack-15-for-rhel-8-x86_64-rpms --enable=rhceph-4-osd-for-rhel-8-x86_64-rpms --enable=rhceph-4-mon-for-rhel-8-x86_64-rpms --enable=rhceph-4-tools-for-rhel-8-x86_64-rpms --enable=advanced-virt-for-rhel-8-x86_64-rpms --enable=fast-datapath-for-rhel-8-x86_64-rpms

6. Install packages required by Ceph (optional). If you're going to use Ceph in the overcloud, run the following command to install the necessary packages:

[root@controller-0 ~]# sudo yum install -y lvm2

7. Update your system to ensure you have the latest base system packages:

[root@controller-0 ~]# sudo yum update -y
[root@controller-0 ~]# sudo reboot
"""

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/html-single/director_installation_and_usage/index#registering-the-operating-system-for-pre-provisioned-nodes
We will also add a validation to make this issue easier to diagnose in the field. Tracked in bug 1777336.
So there is a tripleo-bootstrap role that should handle this pre-req install.
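For context, a role like that would carry the install as an ordinary package task. A hedged sketch of what such a task could look like (the task name and package list here are illustrative assumptions, not the actual tripleo-bootstrap source):

```yaml
# Illustrative Ansible task in the style of a bootstrap role
# (names and package list are assumptions, not the role's real source):
- name: Ensure packages needed by ceph-volume are installed
  package:
    name: lvm2
    state: present
  become: true
```

If tripleo-bootstrap already runs on the pre-provisioned nodes, a task of this shape would make the manual doc step in comment 3 unnecessary.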
(In reply to John Fulton from comment #2)
> I'd file this as an overcloud image bug (missing needed package) but because
> you're using split-stack the person doing the deployment is responsible for
> installing the needed packages [3].

I was wrong about ^. I thought the person doing the deployment was responsible for installing the packages. They're only responsible for enabling the repositories [3]; tripleo-bootstrap is what installs them.

[3] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/html-single/director_installation_and_usage/index#registering-the-operating-system-for-pre-provisioned-nodes
Since this is a DF bz and the patch has merged, I'm changing the DFG label here to DF.
job passed: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/df/view/splitstack/job/DFG-df-splitstack-16-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283