Created attachment 1765478 [details]
running cryptsetup from the container

Description of problem (please be detailed as possible and provide log snippets):

Creating an encrypted ceph-cluster on OpenShift 4.7 with disks backed by SAN/multipath gets stuck in the init container "expand-encrypted-bluefs". The error message is "Underlying device for crypt device ocs-deviceset-block-dev-from-san-1-data-0-5qkmm-block-dmcrypt disappeared."

Version of all relevant components (if applicable):

OpenShift 4.7
OpenShift Container Storage 4.6.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Yes, it means that the storage cluster won't come up.

Is there any workaround available to the best of your knowledge?

The only workaround I have at the moment is to edit the deployment and replace the command "cryptsetup --verbose resize ocs-deviceset-block-dev-from-san-1-data-0-5qkmm-block-dmcrypt" with "/bin/true". This seems to work, but I don't know what impact it has. Also, restarting a pod requires me to edit the deployment again, so it's a really nasty hack that I wouldn't recommend anyone try.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Yes

Can this issue be reproduced from the UI?

Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:

1. Set up OpenShift 4.6 with 3 dedicated infra nodes that will be used for the ceph-cluster. These servers should have disks backed by SAN/multipath. Install the operators:
   - OpenShift Container Storage
   - Local Storage

2. Create a default /etc/multipath.conf

   defaults {
       user_friendly_names yes
       find_multipaths yes
       enable_foreign "^$"
   }
   blacklist_exceptions {
       property "(SCSI_IDENT_|ID_WWN)"
   }
   blacklist {
   }

3. Start the multipathd.service and the iscsid.service

   $ > sudo systemctl start multipathd.service iscsid.service

4. Create a LocalVolume with the "local storage operator":

   $ > oc apply -f <snippet below>

   local-storage-block.yaml

   apiVersion: local.storage.openshift.io/v1
   kind: LocalVolume
   metadata:
     name: block-dev-from-san
     namespace: openshift-local-storage
     labels:
       app: ocs-storagecluster
   spec:
     tolerations:
     - key: "node.ocs.openshift.io/storage"
       value: "true"
       effect: NoSchedule
     nodeSelector:
       nodeSelectorTerms:
       - matchExpressions:
         - key: cluster.ocs.openshift.io/openshift-storage
           operator: In
           values:
           - ""
     storageClassDevices:
     - storageClassName: block-dev-from-san
       volumeMode: Block
       devicePaths:
       - /dev/dm-0 (or /dev/mapper/mpatha)

5. Now go ahead and create an instance of "Storage cluster", select your servers and disks, and make sure to select "enable encryption".

Actual results:

Everything gets created as expected, but when the containers
 - rook-ceph-osd-1-xxxx
 - rook-ceph-osd-2-xxxx
 - rook-ceph-osd-3-xxxx
are supposed to start, they get stuck in the init container expand-encrypted-bluefs with the error message
"Underlying device for crypt device ocs-deviceset-block-dev-from-san-1-data-0-5qkmm-block-dmcrypt disappeared. Command failed with code -1 (wrong or missing parameters)."

Expected results:

A running ceph cluster.

Additional info:

I've added additional log files. One is from the command running inside the failing container (I modified the command to be "sleep 900000", then entered the container and ran the command manually). The second log is from running the command on the host itself.

I'm not really too familiar with this area, but it seems that /dev/mapper/mpatha is not available in the container, only on the host, and this seems to cause the issue.

* * * Snippet from container * * *
# Releasing device-mapper backend.
# Allocating context for crypt device (none).
# Initialising device-mapper backend library.
Underlying device for crypt device ocs-deviceset-block-dev-from-san-1-data-0-5qkmm-block-dmcrypt disappeared.
* * * *

* * * Snippet from host * * *
# Releasing device-mapper backend.
# Trying to open and read device /dev/mapper/mpatha with direct-io.
# Allocating context for crypt device /dev/mapper/mpatha.
# Trying to open and read device /dev/mapper/mpatha with direct-io.
# Initialising device-mapper backend library.
* * * *
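The hypothesis in the description, that the multipath node is visible on the host but not inside the init container, could be checked by listing which expected names are absent from /dev/mapper in each environment. The following is only a minimal Python sketch: the helper name `missing_mapper_nodes` is hypothetical, it only tests that a path exists (not that it is a usable block device), and it assumes a Python interpreter is available wherever it is run, which may not be true for the container image.

```python
import os


def missing_mapper_nodes(names, mapper_dir="/dev/mapper"):
    """Return the names that have no entry under mapper_dir.

    Hypothetical helper for illustration only: it checks that a path
    exists, not that the node is a working block device.
    """
    return [
        name
        for name in names
        if not os.path.lexists(os.path.join(mapper_dir, name))
    ]


if __name__ == "__main__":
    # Names taken from the report: the multipath device and the dm-crypt mapping.
    expected = [
        "mpatha",
        "ocs-deviceset-block-dev-from-san-1-data-0-5qkmm-block-dmcrypt",
    ]
    for name in missing_mapper_nodes(expected):
        print(f"missing: /dev/mapper/{name}")
```

Running the same check on the host and inside the failing container would show whether only the container lacks the mpatha node.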
Created attachment 1765479 [details] running cryptsetup from the host
Thanks for the in-depth troubleshooting, it seems that adding /dev/mapper in this init container will make it work. I'm working on a patch.
One more thing: do you have the output of the init container "encrypted-block-status"? Thanks.
Hi Sébastien,

Sure thing. I added the '--debug' flag to the container as well; here's the output.

$ > oc logs rook-ceph-osd-2-7d7bbb9986-x4dw4 encrypted-block-status
# cryptsetup 2.3.3 processing "cryptsetup --verbose --debug status ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt"
# Running command status.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.
# Initialising device-mapper backend library.
# dm version [ opencount flush ] [16384] (*1)
# dm versions [ opencount flush ] [16384] (*1)
# Detected dm-ioctl version 4.42.0.
# Detected dm-crypt version 1.20.0.
# Udev is not running. Not using udev synchronisation code.
# Device-mapper backend running with UDEV support disabled.
# dm status ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount noflush ] [16384] (*1)
# Releasing device-mapper backend.
/dev/mapper/ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt is active.
# Allocating crypt device context by device ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt.
# Initialising device-mapper backend library.
# dm versions [ opencount flush ] [16384] (*1)
# dm status ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount noflush ] [16384] (*1)
# Releasing device-mapper backend.
# Allocating context for crypt device (none).
# Initialising device-mapper backend library.
Underlying device for crypt device ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt disappeared.
# dm versions [ opencount flush ] [16384] (*1)
# dm table ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush securedata ] [16384] (*1)
# dm status (253:0) [ opencount noflush ] [16384] (*1)
# dm versions [ opencount flush ] [16384] (*1)
# dm deps ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush ] [16384] (*1)
# dm table mpatha [ opencount flush securedata ] [16384] (*1)
# LUKS device header not available.
  type: n/a
# dm versions [ opencount flush ] [16384] (*1)
# dm table ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush securedata ] [16384] (*1)
# dm versions [ opencount flush ] [16384] (*1)
# dm table ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush securedata ] [16384] (*1)
# dm status (253:0) [ opencount noflush ] [16384] (*1)
# dm versions [ opencount flush ] [16384] (*1)
# dm table ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush securedata ] [16384] (*1)
# dm versions [ opencount flush ] [16384] (*1)
# dm table ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush securedata ] [16384] (*1)
  cipher: aes-xts-plain64
# dm versions [ opencount flush ] [16384] (*1)
# dm table ocs-deviceset-block-dev-from-san-2-data-0-5gsqr-block-dmcrypt [ opencount flush securedata ] [16384] (*1)
  keysize: 512 bits
  key location: dm-crypt
  device: (null)
  sector size: 512
  offset: 32768 sectors
  size: 8589905152 sectors
  mode: read/write
  flags: discards
# Releasing crypt device (null) context.
# Releasing device-mapper backend.
Command successful.

Funny thing is that the status command seems to succeed even with the underlying device missing, so I'm not really sure how that is supposed to work.

>> Thanks for the in-depth troubleshooting, it seems that adding /dev/mapper in this init container will make it work.

No problem at all. Yes, I believe so as well. I'll be happy to test it when you have something ready.
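Because `cryptsetup status` reports "Command successful." here even though it prints "device: (null)", the exit status alone does not reveal the missing backing device. One way to catch this case would be to inspect the printed fields. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes the "key: value" layout seen in the log above, which other cryptsetup versions may not share.

```python
def parse_status_fields(output):
    """Collect the 'key: value' lines of `cryptsetup status` output into a dict.

    Debug lines (starting with '#') and lines without a colon are skipped.
    The layout is assumed from the log in this report.
    """
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def backing_device_missing(output):
    """Return True when the status output names no usable backing device."""
    device = parse_status_fields(output).get("device")
    return device in (None, "", "(null)")


# Shortened sample modelled on the output above (mapping name abbreviated).
SAMPLE = """\
/dev/mapper/example-block-dmcrypt is active.
# LUKS device header not available.
  type: n/a
  cipher: aes-xts-plain64
  device: (null)
Command successful.
"""

if __name__ == "__main__":
    print("backing device missing:", backing_device_missing(SAMPLE))
```

A wrapper around the status init container could use such a check to fail early instead of letting the later resize step hit the error.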
(In reply to Patrik Martinsson from comment #5)
> Funny thing is that the status command seems to succeed even with the
> underlying device missing, so I'm not really sure how that is supposed to work.

Indeed, that's interesting. Even with "device: (null)", I believe that "status" just prints the status without throwing any error.

> >> Thanks for the in-depth troubleshooting, it seems that adding /dev/mapper in this init container will make it work.
> No problem at all. Yes, I believe so as well. I'll be happy to test it when
> you have something ready.

I have a PR here: https://github.com/rook/rook/pull/7466

In the meantime, you can edit the deployment again and add a new volume mount that maps /dev/mapper. The deployment already has the volume if you look in the spec, and the last container, named "osd", has the mount if you need an example.
Hi again,

Thanks for the quick fix. It does indeed work after adding the mount.

Here are the container definitions (snippets):

  encrypted-block-status:
    Command:
      cryptsetup
    Args:
      --verbose
      status
      ocs-deviceset-block-dev-from-san-0-data-0-zs5bw-block-dmcrypt
    State: Terminated
      Reason: Completed
      Exit Code: 0
      Started: Wed, 24 Mar 2021 08:05:41 -0400
      Finished: Wed, 24 Mar 2021 08:05:41 -0400
    Ready: True
    Mounts:
      /dev/mapper from dev-mapper (rw)
      /var/lib/ceph/osd/ceph-0 from ocs-deviceset-block-dev-from-san-0-data-0-zs5bw-bridge (rw,path="ceph-0")
      /var/run/secrets/kubernetes.io/serviceaccount from rook-ceph-osd-token-l9ndh (ro)

  expand-encrypted-bluefs:
    Command:
      cryptsetup
    Args:
      --verbose
      resize
      ocs-deviceset-block-dev-from-san-0-data-0-zs5bw-block-dmcrypt
    State: Terminated
      Reason: Completed
      Exit Code: 0
      Started: Wed, 24 Mar 2021 08:05:42 -0400
      Finished: Wed, 24 Mar 2021 08:05:42 -0400
    Ready: True
    Mounts:
      /dev/mapper from dev-mapper (rw)
      /var/lib/ceph/osd/ceph-0 from ocs-deviceset-block-dev-from-san-0-data-0-zs5bw-bridge (rw,path="ceph-0")
      /var/run/secrets/kubernetes.io/serviceaccount from rook-ceph-osd-token-l9ndh (ro)

And the output from the expand-encrypted-bluefs container:

$ > oc logs rook-ceph-osd-0-565ffd5bc5-rxjhd -c expand-encrypted-bluefs

WARNING: Locking directory /run/cryptsetup is missing!
Command successful.

I guess the warning is okay, even though it would be nice to only have "Command successful" in the output ;)

Thanks again for this!

// Patrik
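The manual change confirmed above, a /dev/mapper mount on the two init containers, can be expressed as a small transformation of the Deployment object. The Python sketch below is an illustration under stated assumptions and is not the Rook patch itself: the function name is hypothetical, the input is assumed to be the Deployment as a plain dict (for example parsed from `oc get deployment <name> -o json`), and the volume name "dev-mapper" is taken from the Mounts output above and should be verified against the actual pod spec.

```python
import copy

# Init container names as they appear in this report.
INIT_CONTAINERS = ("encrypted-block-status", "expand-encrypted-bluefs")


def add_dev_mapper_mount(deployment, volume_name="dev-mapper"):
    """Return a copy of a Deployment dict with /dev/mapper mounted in the init containers.

    Hypothetical helper: volume_name must match a volume that already
    exists in the pod spec; this function does not create the volume.
    """
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    for container in pod_spec.get("initContainers", []):
        if container.get("name") not in INIT_CONTAINERS:
            continue
        mounts = container.setdefault("volumeMounts", [])
        # Skip containers that already mount /dev/mapper, so reruns are harmless.
        if not any(m.get("mountPath") == "/dev/mapper" for m in mounts):
            mounts.append({"name": volume_name, "mountPath": "/dev/mapper"})
    return patched
```

The input dict is left untouched and the result can be serialised back to JSON for review before it is applied. As noted earlier in this report for the "/bin/true" workaround, manual edits to the deployment may have to be repeated after a pod restart, so this is only a stopgap until the fixed version is installed.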
(In reply to Patrik Martinsson from comment #7)
> Thanks again for this!
>
> // Patrik

Thank YOU again for the great bug report. I wish BZ would go this way more often :)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041