Bug 1748638
| Summary: | openshift-install 4.2 GCloud does not install /lib/udev/rules.d/65-gce-disk-naming.rules | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Craig Rodrigues <rodrigc> |
| Component: | RHCOS | Assignee: | Micah Abbott <miabbott> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.0 | CC: | bbreard, dustymabe, imcleod, jligon, nstielau |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:40:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Craig Rodrigues
2019-09-03 23:27:39 UTC
This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1747575

If I create a normal GKE cluster in GCloud using the GCloud web UI, I can log into one of the nodes and see that the following files are present:

```
/lib/udev/rules.d/64-gce-disk-removal.rules
/lib/udev/rules.d/99-gce.rules
/lib/udev/rules.d/65-gce-disk-naming.rules
```

Our storage product (Portworx) works fine in that case when installed on plain GKE. However, when we install our product on OpenShift 4 in GCloud, we hit problems when dynamically provisioning disks. This problem will hit other storage providers using OpenShift 4 in GCloud as well.

On GKE, these files come from the gce-compute-image-packages package:

```
$ for f in $(find /lib/udev/rules.d -name "*gc*")
> do
>   dpkg -S $f
> done
gce-compute-image-packages: /lib/udev/rules.d/64-gce-disk-removal.rules
gce-compute-image-packages: /lib/udev/rules.d/99-gce.rules
gce-compute-image-packages: /lib/udev/rules.d/65-gce-disk-naming.rules
```

Micah Abbott:

Looking around, I found these rules on GitHub:
https://github.com/GoogleCloudPlatform/compute-image-packages/tree/master/packages/google-compute-engine/src/lib/udev/rules.d/

Craig, does Portworx require `64-gce-disk-removal.rules`? It looks like it just forces a lazy unmount of the device and logs a message to the journal. I would imagine that unmounting the device would be handled at a higher layer and wouldn't be handled by the udev rules.
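For context on what the missing naming rule provides, the sketch below is a simplified approximation of the kind of rule `65-gce-disk-naming.rules` contains; it is not the verbatim upstream file (see the GoogleCloudPlatform repository linked above for the authoritative contents). GCE persistent disks report their user-assigned device name in the SCSI identify data, and the rule turns that into stable `/dev/disk/by-id/google-<name>` symlinks, which is what provisioners that look disks up by their GCE device name rely on.

```
# Simplified approximation of 65-gce-disk-naming.rules (illustration only).
# The authoritative file lives in GoogleCloudPlatform/compute-image-packages.
ACTION!="add|change", GOTO="gce_disk_naming_end"
SUBSYSTEM!="block", GOTO="gce_disk_naming_end"

# GCE persistent disks expose the user-assigned disk name in their SCSI
# identify data; scsi_id exports it as ID_SERIAL_SHORT.
KERNEL=="sd*|vd*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode"

# Create stable /dev/disk/by-id/google-<disk-name> symlinks for disks and
# partitions, e.g. /dev/disk/by-id/google-my-data-disk -> ../../sdb.
ENV{DEVTYPE}=="disk", ENV{ID_SERIAL_SHORT}=="?*", SYMLINK+="disk/by-id/google-$env{ID_SERIAL_SHORT}"
ENV{DEVTYPE}=="partition", ENV{ID_SERIAL_SHORT}=="?*", SYMLINK+="disk/by-id/google-$env{ID_SERIAL_SHORT}-part%n"

LABEL="gce_disk_naming_end"
```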
Craig Rodrigues:

Micah, at Portworx we have extensive tests for cloud storage. As you fix this bug on GCloud, could you run the tests we have for mounting storage? Our tests are open source. You can do the following:

1. Provision an OpenShift cluster on GCloud.
2. Get direct access to one of the nodes and log into it.
3. Read: https://github.com/libopenstorage/cloudops/blob/master/gce/README.md
4. Use the following container to check out and run the tests on GCloud, replacing the environment variables with your GCloud setup:

```
docker run \
  --rm \
  -t \
  -i \
  -e GOOGLE_APPLICATION_CREDENTIALS=<path-to-service-account-json-file> \
  -e GCE_INSTANCE_NAME=<gce-instance-name> \
  -e GCE_INSTANCE_ZONE=<gce-instance-zone> \
  -e GCE_INSTANCE_PROJECT=<gce-project-name> \
  -v $PWD:/go/src/github.com/libopenstorage \
  -w /go/src/github.com/libopenstorage \
  hatsunemiku/golang-dev-docker \
  bash -c 'git clone https://github.com/libopenstorage/cloudops && cd cloudops && make && make test'
```

And for Azure:

1. Provision an OpenShift cluster on Azure.
2. Get direct access to one of the nodes and log into it.
3. Read: https://github.com/libopenstorage/cloudops/blob/master/azure/README.md
4. Use the following container to check out and run the tests on Azure, replacing the environment variables with your Azure setup:

```
docker run \
  --rm \
  -t \
  -i \
  -e AZURE_INSTANCE_ID=<instance-id> \
  -e AZURE_INSTANCE_REGION=<instance-region> \
  -e AZURE_SCALE_SET_NAME=<scale-set-name> \
  -e AZURE_SUBSCRIPTION_ID=<subscription-id> \
  -e AZURE_RESOURCE_GROUP_NAME=<resource-group-name-of-instance> \
  -e AZURE_ENVIRONMENT=<azure-cloud-environment> \
  -e AZURE_TENANT_ID=<tenant-id> \
  -e AZURE_CLIENT_ID=<client-id> \
  -e AZURE_CLIENT_SECRET=<client-secret> \
  -v $PWD:/go/src/github.com/libopenstorage \
  -w /go/src/github.com/libopenstorage \
  hatsunemiku/golang-dev-docker \
  bash -c 'git clone https://github.com/libopenstorage/cloudops && cd cloudops && make && make test'
```

You can run additional tests on GCE by doing:

```
docker run \
  --rm \
  -t \
  -i \
  -e GOOGLE_APPLICATION_CREDENTIALS=<path-to-service-account-json-file> \
  -e GCE_INSTANCE_NAME=<gce-instance-name> \
  -e GCE_INSTANCE_ZONE=<gce-instance-zone> \
  -e GCE_INSTANCE_PROJECT=<gce-project-name> \
  -v $PWD:/go/src/github.com/libopenstorage \
  -w /go/src/github.com/libopenstorage \
  hatsunemiku/golang-dev-docker \
  bash -c 'git clone https://github.com/libopenstorage/cloudops && cd cloudops/gce && go test -v'
```

Micah Abbott:

Rules have been added to the RHCOS config and will be present in RHCOS 42.80.20190911.0 and later.

Craig Rodrigues:

Regarding your question, I looked at:
https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/packages/google-compute-engine/src/lib/udev/rules.d/64-gce-disk-removal.rules
and don't understand why Google decided to do a `umount -f` on device removal. That's weird. Portworx doesn't depend on this.

Micah Abbott:

Thanks for the reply, Craig. We included the removal rule for completeness; we don't think it should have any adverse effects.

I monkeyed around with the test you provided in comment #7 and got it working with `podman` (there is no `docker` on RHCOS), and got it to pass after also mounting the host's `/dev` into the container. I confirmed that the `/dev/disk/by-id` entries for attached disks showed up while the test was running:

```
$ ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx. 1 root root  9 Sep 13 02:29 google-openstorage-test-6e5443e9-1ef2-42da-9ae1-5ec6928897d8 -> ../../sdb
lrwxrwxrwx. 1 root root  9 Sep 12 19:34 google-persistent-disk-0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Sep 12 19:34 google-persistent-disk-0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Sep 12 19:35 google-persistent-disk-0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Sep 12 19:34 google-persistent-disk-0-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  9 Sep 13 02:29 scsi-0Google_PersistentDisk_openstorage-test-6e5443e9-1ef2-42da-9ae1-5ec6928897d8 -> ../../sdb
lrwxrwxrwx. 1 root root  9 Sep 12 19:34 scsi-0Google_PersistentDisk_persistent-disk-0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Sep 12 19:34 scsi-0Google_PersistentDisk_persistent-disk-0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Sep 12 19:35 scsi-0Google_PersistentDisk_persistent-disk-0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Sep 12 19:34 scsi-0Google_PersistentDisk_persistent-disk-0-part3 -> ../../sda3
```
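For anyone repeating that verification on an RHCOS node, the following is a rough sketch of what the podman adaptation described above might look like. The exact command Micah used is not recorded in this bug; the image name, environment variables, and placeholders are taken from the docker command earlier in the thread, and the host `/dev` bind mount reflects his note about mounting the host's `/dev` into the container.

```
# Rough sketch, assumptions noted above: run the cloudops GCE tests with
# podman instead of docker, bind-mounting the host's /dev so the test can
# observe the /dev/disk/by-id symlinks created by the GCE udev rules.
# Root (sudo) is assumed here because of the /dev bind mount.
sudo podman run \
  --rm -t -i \
  -v /dev:/dev \
  -e GOOGLE_APPLICATION_CREDENTIALS=<path-to-service-account-json-file> \
  -e GCE_INSTANCE_NAME=<gce-instance-name> \
  -e GCE_INSTANCE_ZONE=<gce-instance-zone> \
  -e GCE_INSTANCE_PROJECT=<gce-project-name> \
  -v $PWD:/go/src/github.com/libopenstorage \
  -w /go/src/github.com/libopenstorage \
  hatsunemiku/golang-dev-docker \
  bash -c 'git clone https://github.com/libopenstorage/cloudops && cd cloudops && make && make test'
```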
Craig Rodrigues:

Micah, thanks for working on this, and for running the libopenstorage/cloudops tests to verify. Just out of curiosity, to solve this problem did you bake the udev files into the RHCOS image, or did you add them to the code in afterburn, which does cloud-specific provisioning?

Micah Abbott:

In this case, we've baked the rules into the RHCOS image. In the future, we want to break the rules into a separate package (see BZ#1751310) so that we can include them that way.

Craig Rodrigues:

Micah, I verified this fix on the latest RHCOS image with openshift-install on GCP. Specifically, I used the latest openshift-install to provision an OpenShift 4 cluster in GCP, then provisioned Portworx, created a StorageCluster, and observed that the disks were created properly and mounted. Thanks a lot for working on this fix, and for running the libopenstorage/cloudops tests.

Craig Rodrigues:

Micah, thanks for fixing this problem on GCP and Azure. With this fix, I was able to set up two OpenShift 4.2 clusters (one on GCP, one on Azure). Ryan Wallner, on my team, was able to test Portworx Disaster Recovery (DR): he migrated applications and data from one OpenShift cluster to the other cluster running in a different cloud. Ryan made a video: "Cross-Cloud Application Migration with Google Cloud and Microsoft Azure on OpenShift 4.2" https://youtu.be/ZhdpE6sl_jM Thanks again, Micah, for making this possible.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922