Bug 1359265

Summary: [RFE] Ability to set different mount options for hosted_engine storage than the default
Product: [oVirt] ovirt-engine
Reporter: Simone Tiraboschi <stirabos>
Component: BLL.HostedEngine
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: high
Version: 4.0.2
CC: bugs, dfediuck, hsahmed, knarra, mavital, mgoldboi, nsednev, rs, stirabos, trichard, ylavi
Target Milestone: ovirt-4.2.1
Keywords: FutureFeature, Triaged
Target Release: ---
Flags: ylavi: ovirt-4.2?
gklein: blocker?
nsednev: testing_plan_complete+
rule-engine: planning_ack?
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-12 11:57:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1455169    
Bug Blocks: 1263602, 1411323    
Attachments:
sosreport from alma03 (flags: none)
vdsm log from host (flags: none)
sosreport from alma03 (flags: none)
screenshot from UI (flags: none)

Description Simone Tiraboschi 2016-07-22 15:43:30 UTC
Description of problem:
In hosted-engine-setup we let the user deploy over NFSv3 or NFSv4; the choice is saved in hosted-engine.conf, so the parameter is honored on HE hosts.

Since 3.6, the hosted-engine storage domain is imported into the engine, which then tries to mount it on all the hosts of the selected datacenter.

The auto-import procedure ignores the nfsVersion parameter, so on non-hosted-engine hosts the engine tries to mount the hosted-engine SD only as NFSv3.

If, for any reason, the NFS export can be accessed only with NFSv4 and not with NFSv3, all the non-hosted-engine hosts of the datacenter will be declared non-operational since they cannot mount the hosted-engine SD.
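The failure mode can be sketched as follows. This is a hypothetical model for illustration only, not actual ovirt-engine code; the function name and the fallback default are assumptions:

```python
# Hypothetical sketch of the bug: when the auto-import path drops the
# nfsVersion parameter, the mount options fall back to NFSv3.
def build_mount_opts(nfs_version=None):
    # None models the auto-import procedure, which ignored the value
    # stored in hosted-engine.conf.
    vers = nfs_version or "3"  # assumed engine-side default before the fix
    return "soft,nosharecache,timeo=600,retrans=6,nfsvers=" + vers

print(build_mount_opts("4"))  # what hosted-engine.conf asked for
print(build_mount_opts())     # what the auto-import actually produced
```

With an NFSv4-only export, the second form fails to mount, and the host goes non-operational.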


Version-Release number of selected component (if applicable):
3.6.z

How reproducible:
100%

Steps to Reproduce:
1. deploy hosted-engine on NFSv4
2. add another storage domain to trigger the autoimport procedure
3. add a non-hosted-engine host
4. check how the hosted-engine storage domain got mounted on the non-HE host

Actual results:
the auto-import procedure ignores the nfsVersion parameter

Expected results:
the auto-import procedure honors the nfsVersion parameter

Additional info:

Comment 1 Robert Story 2016-07-22 17:51:29 UTC
My quick immediate hack was to do the mount manually:

# mkdir /rhev/data-center/mnt/nfs.localdomain:_ovirt_hosted-engine

# /usr/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=4 \
  nfs.localdomain:/ovirt/hosted-engine \
  /rhev/data-center/mnt/nfs.localdomain:_ovirt_hosted-engine


A more permanent fix, from Simone's email on the users list:

If you need a quick fix you can:
- fix the configuration of your storage server to allow it to be accessed
also over nfsv3

- edit the configuration of the storage connection in the engine DB on the
engine VM to add the missing parameter. Something like:
 # sudo -u postgres psql
 \c engine;
 select * from storage_server_connections;
 UPDATE storage_server_connections SET nfs_version = '4'
   WHERE connection = 'nfs.localdomain:/ovirt/hosted-engine';
 commit;
 select * from storage_server_connections;

Comment 3 Doron Fediuck 2017-06-08 16:11:11 UTC
*** Bug 1435570 has been marked as a duplicate of this bug. ***

Comment 4 Doron Fediuck 2017-06-08 16:12:08 UTC
This should support Gluster and possibly other storage types as well.

Comment 6 Yaniv Lavi 2017-07-17 09:21:14 UTC
*** Bug 1471026 has been marked as a duplicate of this bug. ***

Comment 7 Yaniv Kaul 2017-12-03 13:01:56 UTC
Part of node zero I assume.

Comment 8 Simone Tiraboschi 2017-12-19 17:10:25 UTC
Addressed in node-zero for NFS and gluster

Comment 9 RHV bug bot 2018-01-05 16:58:13 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No external trackers attached]

For more info please contact: infra

Comment 10 Martin Sivák 2018-01-10 12:18:30 UTC
Simone, are there patches to attach to this bug to make the bot happy?

Comment 11 Nikolai Sednev 2018-01-10 14:36:03 UTC
Can you please provide relevant documentation regarding usage of mounting options during deployment of Node Zero over NFSv3 and NFSv4 storage domains?

Comment 12 Simone Tiraboschi 2018-01-10 15:11:23 UTC
(In reply to Martin Sivák from comment #10)
> Simone, are there patches to attach to this bug to make the bot happy?

It's implicitly addressed by the new node zero flow; there are no specific patches for the old one (vdsm still does not report the in-use mount options to the engine at auto-import time).

Comment 13 Simone Tiraboschi 2018-01-10 15:35:03 UTC
(In reply to Nikolai Sednev from comment #11)
> Can you please provide relevant documentation regarding usage of mounting
> options during deployment of Node Zero over NFSv3 and NFSv4 storage domains?

Please try with "rsize=32768,wsize=32768" on this question:
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: 

The additional mount options should be visible in the engine and in hosted-engine.conf on all the involved hosts.

Comment 14 Nikolai Sednev 2018-01-10 16:31:16 UTC
I've tried with additional NFSv3 mount options from https://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-nfs-client-config-options.html, such as rsize=32768,wsize=32768:

[ ERROR ]  [WARNING]: Failure using method (v2_runner_on_failed) in callback plugin
         
[ ERROR ] (<ansible.plugins.callback.1_otopi_json.CallbackModule object at 0x1f2d950>):
         
[ ERROR ] 'ascii' codec can't encode character u'\u2018' in position 520: ordinal not in
         
[ ERROR ] range(128)
         
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook

Components used during deployment:
ovirt-engine-appliance-4.2-20180108.1.el7.centos.noarch.rpm

http://pastebin.test.redhat.com/545655

Comment 15 Nikolai Sednev 2018-01-10 16:33:20 UTC
Created attachment 1379602 [details]
sosreport from alma03

Comment 16 Nikolai Sednev 2018-01-10 18:07:50 UTC
(In reply to Simone Tiraboschi from comment #13)
> (In reply to Nikolai Sednev from comment #11)
> > Can you please provide relevant documentation regarding usage of mounting
> > options during deployment of Node Zero over NFSv3 and NFSv4 storage domains?
> 
> Please try with "rsize=32768,wsize=32768" on this question:
>           If needed, specify additional mount options for the connection to
> the hosted-engine storagedomain []: 
> 
> The additional mount options should be visible in the engine and in
> hosted-engine.conf on all the involved hosts.

I've tried with additional NFSv3 mount options from https://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-nfs-client-config-options.html, such as rsize=32768,wsize=32768:

[ INFO  ] changed: [localhost]
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v3
          Please specify the full shared storage connection path to use (example: host:/path): yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: "rsize=32768,wsize=32768"
[ INFO  ] Creating Storage Domain

[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Problem while trying to mount target]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Problem while trying to mount target]\". HTTP response code is 400."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

Components used during deployment:
ovirt-engine-appliance-4.2-20180108.1.el7.centos.noarch.rpm
ovirt-hosted-engine-setup-2.2.4-0.0.master.20180108132354.git2005b97.el7.centos.noarch
ovirt-hosted-engine-ha-2.2.3-0.0.master.20171218181916.20171218181911.git4c22b93.el7.centos.noarch
Red Hat Enterprise Linux Server release 7.4 (Maipo)
Linux 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Dec 28 14:23:39 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

Comment 17 Simone Tiraboschi 2018-01-10 21:22:37 UTC
(In reply to Nikolai Sednev from comment #14)     
> [ ERROR ] 'ascii' codec can't encode character u'\u2018' in position 520:
> ordinal not in

The real issue is here:
2018-01-10 17:58:30,479+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:73 {u'_ansible_parsed': True, u'stderr_lines': [u'dd: failed to open \u2018/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__2/000ccade-f52e-4bf6-9a11-d09f4c74a3a9/images/6ab7311c-3107-4afb-8d77-c290efdaabf6/71c22342-0ec9-43df-b756-1bc6ab8ef741\u2019: Permission denied'], u'cmd': [u'dd', u'bs=20480', u'count=1', u'oflag=direct', u'if=/var/tmp/localvmYRB8qf/71c22342-0ec9-43df-b756-1bc6ab8ef741', u'of=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__2/000ccade-f52e-4bf6-9a11-d09f4c74a3a9/images/6ab7311c-3107-4afb-8d77-c290efdaabf6/71c22342-0ec9-43df-b756-1bc6ab8ef741'], u'end': u'2018-01-10 17:58:30.337743', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'start': u'2018-01-10 17:58:30.332305', u'delta': u'0:00:00.005438', u'stderr': u'dd: failed to open \u2018/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__2/000ccade-f52e-4bf6-9a11-d09f4c74a3a9/images/6ab7311c-3107-4afb-8d77-c290efdaabf6/71c22342-0ec9-43df-b756-1bc6ab8ef741\u2019: Permission denied', u'rc': 1, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'dd bs=20480 count=1 oflag=direct if="/var/tmp/localvmYRB8qf/71c22342-0ec9-43df-b756-1bc6ab8ef741" of="/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__2/000ccade-f52e-4bf6-9a11-d09f4c74a3a9/images/6ab7311c-3107-4afb-8d77-c290efdaabf6/71c22342-0ec9-43df-b756-1bc6ab8ef741"', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}


but we had an issue parsing the error message on:
[u'dd: failed to open \u2018/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__2/000ccade-f52e-4bf6-9a11-d09f4c74a3a9/images/6ab7311c-3107-4afb-8d77-c290efdaabf6/71c22342-0ec9-43df-b756-1bc6ab8ef741\u2019: Permission denied']

And so:
2018-01-10 17:58:30,783+0200 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:150 (<ansible.plugins.callback.1_otopi_json.CallbackModule object at 0x1f2d950>):
2018-01-10 17:58:30,783+0200 DEBUG otopi.plugins.otopi.dialog.human 
2018-01-10 17:58:30,783+0200 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:150 'ascii' codec can't encode character u'\u2018' in position 520: ordinal not in

(\u2018 is LEFT SINGLE QUOTATION MARK and \u2019 is RIGHT SINGLE QUOTATION MARK).

Could you please open a new bug on the node zero flow about correctly handling unicode chars in error messages?
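The encoding failure can be reproduced in isolation. This is a minimal sketch of the root cause only; the original otopi/ansible stack ran on Python 2, where implicitly encoding a unicode error message to ASCII raised the same UnicodeEncodeError:

```python
# Minimal reproduction of the 'ascii' codec failure: dd's error message
# contains U+2018/U+2019 (curly quotes), which ASCII cannot represent.
msg = "dd: failed to open \u2018/rhev/...\u2019: Permission denied"
try:
    msg.encode("ascii")
except UnicodeEncodeError as exc:
    # e.g. 'ascii' codec can't encode character '\u2018' ...
    print(exc)

# Encoding as UTF-8 handles the curly quotes without error.
print(msg.encode("utf-8").decode("utf-8") == msg)
```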

Comment 18 Simone Tiraboschi 2018-01-10 21:24:03 UTC
(In reply to Nikolai Sednev from comment #16)
> [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is
> "[Problem while trying to mount target]". HTTP response code is 400.
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault
> reason is \"Operation Failed\". Fault detail is \"[Problem while trying to
> mount target]\". HTTP response code is 400."}

Sorry, please try instead with v4 and 'rsize=8192,wsize=8192'

[ INFO  ] changed: [localhost]
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v4
          Please specify the full shared storage connection path to use (example: host:/path): 192.168.1.115:/storage/nfs/he1
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: rsize=8192,wsize=8192

Comment 19 Nikolai Sednev 2018-01-11 07:09:29 UTC
Why did it not work for NFSv3?
According to the documentation, NFSv3 should support changing these values.
"rsize=num and wsize=num — These settings speed up NFS communication for reads (rsize) and writes (wsize) by setting a larger data block size, in bytes, to be transferred at one time. Be careful when changing these values; some older Linux kernels and network cards do not work well with larger block sizes. For NFSv2 or NFSv3, the default values for both parameters is set to 8192. For NFSv4, the default values for both parameters is set to 32768."

Comment 21 Nikolai Sednev 2018-01-11 11:15:08 UTC
Created attachment 1379958 [details]
vdsm log from host

Comment 22 Simone Tiraboschi 2018-01-11 11:31:15 UTC
OK, the issue is just here:

2018-01-11 12:09:36,852+0200 INFO  (jsonrpc/2) [vdsm.api] START connectStorageServer(domType=1, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'mnt_options': u"'rsize=8192,wsize=8192'", u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'protocol_version': u'4', u'password': '********', u'port': u''}], options=None) from=::ffff:192.168.122.167,47676, flow_id=f91f0ff7-069d-4d21-9cf3-bc104a63ece7, task_id=d9fb3a50-e7aa-44f5-98b6-b9bcad56b51c (api:46)
2018-01-11 12:09:36,856+0200 INFO  (jsonrpc/2) [storage.Mount] mounting yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1 at /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__1 (mount:204)
2018-01-11 12:09:36,995+0200 ERROR (jsonrpc/2) [storage.HSM] Could not connect to storageServer (hsm:2407)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2404, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 406, in connect
    return self._mountCon.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 179, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 171, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (32, ';mount.nfs: an incorrect mount option was specified\n')
2018-01-11 12:09:36,997+0200 INFO  (jsonrpc/2) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 477, 'id': u'00000000-0000-0000-0000-000000000000'}]} from=::ffff:192.168.122.167,47676, flow_id=f91f0ff7-069d-4d21-9cf3-bc104a63ece7, task_id=d9fb3a50-e7aa-44f5-98b6-b9bcad56b51c (api:52)

and indeed you entered "'rsize=8192,wsize=8192'" while you should simply enter rsize=8192,wsize=8192 (skipping the double quote characters too; they were there just to keep the example a bit more readable!)

>           If needed, specify additional mount options for the connection to
> the hosted-engine storagedomain []: 'rsize=8192,wsize=8192'


The issue was exactly the same also on the nfs v3 test:

>           Please specify the nfs version you would like to use (auto, v3,
> v4, v4_1)[auto]: v3
>           Please specify the full shared storage connection path to use
> (example: host:/path):
> yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1
>           If needed, specify additional mount options for the connection to
> the hosted-engine storagedomain []: "rsize=32768,wsize=32768"
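The effect of the stray quotes can be shown outside of vdsm. This is an illustrative snippet, not vdsm code: the answer typed at the prompt is not shell-processed, so the quote characters travel verbatim into the -o option string that mount.nfs then parses option by option:

```python
# The quotes typed at the prompt become part of the option tokens,
# so mount.nfs sees options it does not recognize and fails with
# "an incorrect mount option was specified".
typed = "'rsize=8192,wsize=8192'"      # what was entered at the prompt
tokens = typed.split(",")
print(tokens)   # ["'rsize=8192", "wsize=8192'"] -> unknown options

correct = "rsize=8192,wsize=8192".split(",")
print(correct)  # ['rsize=8192', 'wsize=8192'] -> valid options
```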

Comment 23 Nikolai Sednev 2018-01-11 14:31:36 UTC
I did around 16:23 as follows:
  Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v4
          Please specify the full shared storage connection path to use (example: host:/path): yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: rsize=32768,wsize=32768


Received at 16:31:
[ ERROR ] ConnectionError: Error while sending HTTP request: (7, 'Failed connect to nsednev-he-1.qa.lab.tlv.redhat.com:443; No route to host')
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false, "msg": "Error while sending HTTP request: (7, 'Failed connect to nsednev-he-1.qa.lab.tlv.redhat.com:443; No route to host')"}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

See the sosreport from host alma03 as appears within the attachment.

Comment 24 Nikolai Sednev 2018-01-11 14:36:57 UTC
Created attachment 1380061 [details]
sosreport from alma03

Comment 25 Nikolai Sednev 2018-01-11 16:17:35 UTC
OK, the deployment worked for me when simply entering the parameters without quoting:
[ INFO  ] changed: [localhost]
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v4
          Please specify the full shared storage connection path to use (example: host:/path): yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: rsize=32768,wsize=32768

[ INFO  ] Hosted Engine successfully deployed



alma04 ~]# cat  /etc/ovirt-hosted-engine/hosted-engine.conf
fqdn=nsednev-he-1.qa.lab.tlv.redhat.com
vm_disk_id=a3251750-f78e-4d19-986d-5d2d224bce3f
vm_disk_vol_id=96951442-d779-4761-a05c-5484eefa5457
vmid=ecf2ed4a-1b2b-46e8-bd6a-45aa3973d4a5
storage=yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1
nfs_version=v4
mnt_options=rsize=32768,wsize=32768
conf=/var/run/ovirt-hosted-engine-ha/vm.conf
host_id=1
console=vnc
domainType=nfs
spUUID=00000000-0000-0000-0000-000000000000
sdUUID=b550d9b5-05e6-47ef-b72d-2aa7d9305218
connectionUUID=e29cf818-5ee5-46e1-85c1-8aeefa33e95d
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
ca_subject="C=EN, L=Test, O=Test, CN=Test"
vdsm_use_ssl=true
gateway=10.35.72.254
bridge=ovirtmgmt
metadata_volume_UUID=a8672425-5b56-450b-aa8f-85126b51017d
metadata_image_UUID=6614c002-abe1-4bd1-a8d3-6cbd98a5357a
lockspace_volume_UUID=600aa1fc-d0e4-436e-a348-8ffd2d69a054
lockspace_image_UUID=da5ecc31-ba4d-4958-b7d3-4ccda9cfa21c
conf_volume_UUID=08fef88e-d263-4ac6-bd44-b9baa3c1f5fb
conf_image_UUID=f7718321-1b3a-4963-bdad-bd198b645084

# The following are used only for iSCSI storage
iqn=
portal=
user=
password=
port=
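hosted-engine.conf is plain key=value text, so the verification can be scripted. A quick sketch for checking that mnt_options landed correctly; the parser below is illustrative, not the one used by ovirt-hosted-engine-ha:

```python
# Parse key=value lines from a hosted-engine.conf-style text and
# return them as a dict; comments and blank lines are skipped.
def parse_he_conf(text):
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the FIRST '=' only: mnt_options values contain '='.
        key, _, value = line.partition("=")
        conf[key] = value
    return conf

sample = "nfs_version=v4\nmnt_options=rsize=32768,wsize=32768\n"
conf = parse_he_conf(sample)
print(conf["mnt_options"])  # rsize=32768,wsize=32768
```

Splitting only on the first '=' matters here, since the mount options themselves contain '=' characters.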

Also values shown in UI appeared correctly, please see the attached screenshot.

Moving to verified.

Worked for me on these components:
ovirt-hosted-engine-ha-2.2.3-0.0.master.20171218181916.20171218181911.git4c22b93.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.4-0.0.master.20180109170856.gitb593776.el7.centos.noarch
ovirt-engine-appliance-4.2-20180110.1.el7.centos.noarch

Comment 26 Nikolai Sednev 2018-01-11 16:18:29 UTC
Created attachment 1380101 [details]
screenshot from UI

Comment 27 Sandro Bonazzola 2018-02-12 11:57:10 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.