Bug 1660280 - Block device creation fails "Create Block Volume Failed:failed to configure on xxx" in OCS 3.11.1 OCP 3.11.51
Summary: Block device creation fails "Create Block Volume Failed:failed to configure on xxx" in OCS 3.11.1 OCP 3.11.51
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhgs-server-container
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: OCS 3.11.1
Assignee: Niels de Vos
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On: 1662312
Blocks: 1644160
 
Reported: 2018-12-18 03:30 UTC by Neha Berry
Modified: 2022-03-13 16:32 UTC
CC: 29 users

Fixed In Version: ocs/rhgs-server-rhel7:3.11.1-5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 04:12:47 UTC
Embargoed:




Links:
- GitHub: gluster/gluster-containers pull 115 (closed), "centos: if a 'host dev' exists bind mount it over /dev" (last updated 2021-02-03 16:26:29 UTC)
- Red Hat Knowledge Base (Solution) 3785221 (last updated 2019-01-24 15:32:11 UTC)
- Red Hat Product Errata RHEA-2019:0287 (last updated 2019-02-07 04:13:08 UTC)

Description Neha Berry 2018-12-18 03:30:37 UTC
Block device creation fails "Create Block Volume Failed:failed to configure on xxx" in OCS 3.11.1 OCP 3.11.51



Description of problem:
=====================

Created a fresh OCP 3.11.51 + OCS 3.11.1 (gluster-block-0.2.1-30.el7rhgs.x86_64) 3-node setup with docker as the container runtime.

With no existing block hosting volume or block devices, a block PVC request was created.

"

# date && ./pvc-create.sh  jerry 2 
Mon Dec 17 17:26:14 IST 2018
persistentvolumeclaim/jerry created
[root@dhcp47-135 scripts]# oc get pvc
NAME      STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
jerry     Pending                                       block-sc       30m

"

The PVC creation failed with the following error:

gluster.org/glusterblock 117b14fa-01e6-11e9-be30-0a580a820204  Failed to provision volume with StorageClass "block-sc": failed to create volume: heketi block volume creation failed: [heketi] failed to create volume: { "RESULT": "FAIL", "errCode": 255, "errMsg": "failed to configure on 10.70.42.35 configure failed\nfailed to configure on 10.70.46.149 configure failed\nfailed to configure on 10.70.46.146 configure failed" }

On checking the heketi logs, the following error message was seen:
--------------------------------------------------------

[kubeexec] ERROR 2018/12/17 11:56:24 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster-block create vol_79d21bc5ec8ffed0aec6272a835bdf72/blk_glusterfs_jerry_c1099fa5-01f2-11e9-be31-0a580a820204  ha 3 auth enable prealloc full 10.70.46.149,10.70.42.35,10.70.46.146 2GiB --json] on glusterfs-storage-76xdh: Err[command terminated with exit code 255]: Stdout []: Stderr [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "failed to configure on 10.70.46.149 configure failed\nfailed to configure on 10.70.42.35 configure failed\nfailed to configure on 10.70.46.146 configure failed" }
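
For reference, the create that heketi issues can also be run by hand inside a glusterfs pod. This uses the pod name, block hosting volume and peer IPs from the log excerpt above; the block device name test-blk is a placeholder:

oc rsh glusterfs-storage-76xdh \
    gluster-block create vol_79d21bc5ec8ffed0aec6272a835bdf72/test-blk \
    ha 3 auth enable prealloc full 10.70.46.149,10.70.42.35,10.70.46.146 2GiB --json

In the broken state this returns the same errCode 255 / "configure failed" JSON quoted above.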


On checking the gluster-blockd logs, the following error message was seen:
-------------------------------------------------------

[2018-12-17 11:56:24.104669] ERROR: backend creation failed for: vol_79d21bc5ec8ffed0aec6272a835bdf72/blk_glusterfs_jerry_c1099fa5-01f2-11e9-be31-0a580a820204 [at block_svc_routines.c+4033 :<blockValidateCommandOutput>]
[2018-12-17 11:56:24.104736] DEBUG: raw output, targetcli shell version 2.1.fb46


On checking the tcmu-runner logs, the following error message was seen:
------------------------------------------------------------

2018-12-17 11:56:24.062 436 [ERROR] add_device:516: could not open /dev/uio0
2018-12-17 11:56:44.146 436 [ERROR] add_device:516: could not open /dev/uio0
2018-12-17 11:57:03.629 436 [ERROR] add_device:516: could not open /dev/uio0


2018-12-17 11:56:24.094 430 [ERROR] add_device:516: could not open /dev/uio0
2018-12-17 11:56:44.165 430 [ERROR] add_device:516: could not open /dev/uio0
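
These tcmu-runner errors point at the root cause: when a target is configured, the kernel creates a new /dev/uioN node in the host's /dev, but the container's /dev is populated only once at pod start, so the node never appears inside the container. A quick check of that state (pod and node names are placeholders):

oc rsh <glusterfs-pod> ls -l /dev/uio0    # No such file or directory inside the container
ssh <node> ls -l /dev/uio0                # the device node exists on the host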



Some more details:
-----------------------

1. We tried creating multiple PVCs over a span of time; each creation failed with a similar error message.
2. File volume creations (via PVC) and manual block hosting volume creations (via heketi) succeed; see the heketi-cli sketch after this list.
3. Since block device creations fail, the underlying block hosting volume (BHV) is also deleted (expected behavior).
4. The same behavior is seen on two deployments: greenfield (OCP 3.11.51 + OCS 3.11.1 installed together) and brownfield (OCP 3.11.51 first, then OCS 3.11.1).
5. A very similar issue is seen on CRI-O setups as well: BZ#1653571.
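
A sketch of the manual block hosting volume creation mentioned in point 2, assuming heketi-cli is available and pointed at the heketi service (server URL and admin key are placeholders):

export HEKETI_CLI_SERVER=http://<heketi-route>
export HEKETI_CLI_USER=admin
export HEKETI_CLI_KEY=<admin-key>
# --block marks the new volume as a block hosting volume
heketi-cli volume create --size=100 --block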




How reproducible:
===================
2/2 - reproduced on two different fresh setups of OCP 3.11.51 and OCS 3.11.1.

Steps to Reproduce:
1. Create a 3-node OCP 3.11.51 and OCS 3.11.1 setup.
2. Send a PVC request for a block device.
3. Check for success/failure. In case of failure, check the heketi logs, the gluster-blockd logs, and the PVC describe output (expanded below).
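
An expansion of step 3; pod names are placeholders, and running journalctl this way assumes the rhgs-server container manages gluster-blockd under systemd:

oc describe pvc jerry                  # provisioning failure events
oc logs dc/heketi-storage              # heketi-side error, as quoted above
oc rsh <glusterfs-pod> journalctl --no-pager -u gluster-blockd | tail -n 20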

Actual results:
=================
Unable to create a single block device on an OCP 3.11.51 and OCS 3.11.1 setup with docker as the container runtime.

Expected results:
=================
Block device creations should succeed.

Comment 9 Nicholas Schuetz 2018-12-18 14:07:47 UTC
Doing a PoC with a customer right now and we are hitting exactly this issue. Please advise...

Comment 25 Niels de Vos 2018-12-24 10:35:27 UTC
The rhgs-server container image needs to get the updated version of update-params.sh that configures the /dev rbind-mount. The change has been posted upstream as PR#115. The current version of the script is at https://github.com/gluster/gluster-containers/blob/45497f475a9ff008e35dc7da8bbd43e77ecdbcc2/CentOS/update-params.sh
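
For context, a hedged sketch of the shape of that change; the path where the host's /dev is exposed inside the container is an assumption here, see PR#115 and the linked update-params.sh for the real script:

# If the host's /dev was handed into the container (path is hypothetical),
# rbind-mount it over the container's /dev so device nodes created at
# runtime, such as /dev/uioN, become visible to tcmu-runner.
HOST_DEV="/host-dev"
if [ -d "${HOST_DEV}" ]; then
    mount --rbind "${HOST_DEV}" /dev
fi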

Comment 30 Niels de Vos 2018-12-27 15:51:04 UTC
The changes to the daemonset explained in comment #15 and comment #16 will be included in cns-deploy through bug 1653571 and pushed into openshift-ansible (bug 1662312).
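
Comments #15 and #16 are not quoted in this excerpt. In outline, the daemonset change hands the host's /dev to the glusterfs containers via a hostPath volume; a hypothetical illustration with assumed resource names and mount path:

oc patch ds/glusterfs-storage -n glusterfs --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "host-dev", "hostPath": {"path": "/dev"}}},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
   "value": {"name": "host-dev", "mountPath": "/host-dev"}}
]'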

Comment 35 Nicholas Schuetz 2019-01-10 20:21:12 UTC
Still getting this error on the recently released v3.11.59.


[kubeexec] ERROR 2019/01/10 20:15:48 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [bash -c "set -o pipefail && gluster-block delete vol_f022f1e90cb33b0e76fb4faa4295ed69/blockvol_fd50002df6d075d5c290b2bff0bd5e4e --json |tee /dev/stderr"] on glusterfs-storage-wxhlv: Err[command terminated with exit code 2]: Stdout [{ "RESULT": "FAIL", "errCode": 2, "errMsg": "block vol_f022f1e90cb33b0e76fb4faa4295ed69\/blockvol_fd50002df6d075d5c290b2bff0bd5e4e doesn't exist" }
]: Stderr [{ "RESULT": "FAIL", "errCode": 2, "errMsg": "block vol_f022f1e90cb33b0e76fb4faa4295ed69\/blockvol_fd50002df6d075d5c290b2bff0bd5e4e doesn't exist" }

Comment 36 John Mulligan 2019-01-10 21:42:51 UTC
(In reply to Nicholas Schuetz from comment #35)
> still getting this error on the recently released v3.11.59.
> 
> 
> [kubeexec] ERROR 2019/01/10 20:15:48
> /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to
> run command [bash -c "set -o pipefail && gluster-block delete
> vol_f022f1e90cb33b0e76fb4faa4295ed69/
> blockvol_fd50002df6d075d5c290b2bff0bd5e4e --json |tee /dev/stderr"] on
> glusterfs-storage-wxhlv: Err[command terminated with exit code 2]: Stdout [{
> "RESULT": "FAIL", "errCode": 2, "errMsg": "block
> vol_f022f1e90cb33b0e76fb4faa4295ed69\/
> blockvol_fd50002df6d075d5c290b2bff0bd5e4e doesn't exist" }
> ]: Stderr [{ "RESULT": "FAIL", "errCode": 2, "errMsg": "block
> vol_f022f1e90cb33b0e76fb4faa4295ed69\/
> blockvol_fd50002df6d075d5c290b2bff0bd5e4e doesn't exist" }

That may be a non-fatal error from when it tries to clean up after a create volume error. Was there an earlier error in the logs for a create command?

Comment 37 Ashmitha Ambastha 2019-01-11 10:40:26 UTC
Hi, 

Block volume creation is successful on OCP 3.11.67-1 with OCS 3.11.1 (the latest available builds).

Comment 39 Mark Szczewski 2019-01-18 20:31:33 UTC
This looks like it might be present as an issue in 3.11.43 as well so it may have been introduced earlier than originally thought.

Comment 40 Mark Szczewski 2019-01-21 19:52:44 UTC
(In reply to Mark Szczewski from comment #39)
> This looks like it might be present as an issue in 3.11.43 as well so it may
> have been introduced earlier than originally thought.

The customer made a mistake when reporting the issue against 3.11.43: they did not run a complete teardown and used the 3.11.59 playbooks to do the install, so the issue would still have been present. 3.11.43 does not show this issue!

Comment 43 errata-xmlrpc 2019-02-07 04:12:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0287

