Bug 1251827 - NFS storage cannot be mounted when the network is bonded during Hosted Engine setup
Summary: NFS storage cannot be mounted when the network is bonded during Hosted Engine setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node-plugin-hosted-engine
Version: 3.5.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.3
Target Release: 3.6.0
Assignee: Douglas Schilling Landgraf
QA Contact: wanghui
URL:
Whiteboard:
Depends On:
Blocks: 1250199 1257980 1273072
 
Reported: 2015-08-10 05:47 UTC by wanghui
Modified: 2016-03-09 14:34 UTC
CC: 21 users

Fixed In Version: ovirt-node-plugin-hosted-engine-0.3.0-1.el7ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1257980 1273072
Environment:
Last Closed: 2016-03-09 14:34:50 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
log files in rhevh part (189.78 KB, application/x-gzip)
2015-08-10 05:47 UTC, wanghui
nfs_error (12.12 KB, application/x-gzip)
2016-02-07 14:11 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0378 0 normal SHIPPED_LIVE ovirt-node bug fix and enhancement update for RHEV 3.6 2016-03-09 19:06:36 UTC
oVirt gerrit 44758 0 master MERGED adding restart of rpc-statd on network config Never
oVirt gerrit 44773 0 ovirt-3.5 MERGED adding restart of rpc-statd on network config Never
oVirt gerrit 47161 0 ovirt-3.5 MERGED adding restart of rpc-statd on network config Never

Description wanghui 2015-08-10 05:47:43 UTC
Created attachment 1060944 [details]
log files in rhevh part

Description of problem:
NFS storage cannot be mounted when the network is bonded during Hosted Engine setup. There is no such issue when using a non-bonded interface such as em1.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20150805.0.el7ev
ovirt-node-3.2.3-16.el7.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el7ev.noarch
ovirt-hosted-engine-setup-1.2.5.2-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Clean install rhev-hypervisor7-7.1-20150805.0.el7ev
2. Create bond with mode=1 (an illustrative ifcfg sketch follows after these steps)

# cat /proc/net/bonding/bond1 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: p3p1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: p3p1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:36:79:f0
Slave queue ID: 0

Slave Interface: p3p2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:36:79:f1
Slave queue ID: 0

3. Download the ova file to start the first host setup.
4. Select nfs3 as the storage:
Please specify the storage you would like to use (iscsi, nfs3, nfs4)[nfs3]: <enter>
5. Please specify the full shared storage connection path to use (example: host:/path): 10.66.65.196:/home/huiwa/nfs1
[ ERROR ] Error while mounting specified storage path: mount.nfs: rpc.statd is not running but is required for remote locking. mount.nfs: Either use '-o nolock' to keep locks local, or start statd. mount.nfs: an incorrect mount option was specified
[WARNING] Cannot unmount /tmp/tmpgsQcrd 
[ ERROR ] Cannot access storage connection 10.66.65.196:/home/huiwa/nfs1: mount.nfs: rpc.statd is not running but is required for remote locking. mount.nfs: Either use '-o nolock' to keep locks local, or start statd. mount.nfs: an incorrect mount option was specified
          Please specify the full shared storage connection path to use (example: host:/path):
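
For reference, a minimal sketch of ifcfg files that would produce a mode=1 (active-backup) bond like the one in step 2. These file contents are an assumption for illustration only; the interface names are taken from the bonding status above, and the actual bond in this report was created through the RHEV-H setup flow:

/etc/sysconfig/network-scripts/ifcfg-bond1 (hypothetical):
DEVICE=bond1
TYPE=Bond
BONDING_OPTS="mode=1 miimon=100"
BOOTPROTO=dhcp
ONBOOT=yes

/etc/sysconfig/network-scripts/ifcfg-p3p1 (hypothetical; p3p2 follows the same pattern):
DEVICE=p3p1
MASTER=bond1
SLAVE=yes
ONBOOT=yes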

Actual results:
1. After step 5, setup fails with the rpc.statd errors shown in the steps above (mount.nfs: rpc.statd is not running but is required for remote locking) and returns to the storage path prompt.

Expected results:
1. The NFS storage can be added successfully.

Additional info:
1. There is no such issue when the network interface is em1 (not bonded).

Comment 2 Fabian Deutsch 2015-08-10 08:27:49 UTC
Sandro, does he-setup take care of bringing up the right daemons, or is it expected that the OS is configured properly?

Or do you have any other comments regarding the error above?

Comment 3 Fabian Deutsch 2015-08-10 12:41:14 UTC
Could this also be related to bug 1159183?

Comment 4 Sandro Bonazzola 2015-08-10 13:18:38 UTC
(In reply to Fabian Deutsch from comment #2)
> Sandro, does he-setup take care of bringing up the right daemons, or is it
> expected that the OS is configured properly?

Well, hosted-engine expects that the system is configured properly to some extent.
Specifically, it expects that if you're trying to use NFS storage, you can actually mount it.

> 
> Or do you have any other comments regarding the error above?

In this specific case, setup was run at 05:05:28 (according to the setup log),
and rpc.statd was running according to the messages log:

Aug  7 04:37:12 localhost rpc.statd[16855]: Version 1.3.0 starting

So something else is preventing the mount from working properly.
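
A quick illustrative cross-check for such a discrepancy (these commands are not taken from the original logs):

# systemctl status rpc-statd.service
# rpcinfo -p localhost | grep status
# ps aux | grep '[r]pc.statd'

The first shows what systemd believes, the second whether statd is actually registered with rpcbind, and the third whether the rpc.statd process is alive.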

Comment 5 Fabian Deutsch 2015-08-10 13:28:42 UTC
Thanks Sandro.

Hui, can you manually mount that path from the description on RHEV-H?

Comment 6 wanghui 2015-08-12 03:10:33 UTC
(In reply to Fabian Deutsch from comment #5)
> Thanks Sandro.
> 
> Hui, can you manually mount that path from the description on RHEV-H?

Yes, a manual mount succeeds. And the same NFS path can be used when the network is not bonded.

Comment 7 wanghui 2015-08-12 03:20:58 UTC
(In reply to wanghui from comment #6)
> (In reply to Fabian Deutsch from comment #5)
> > Thanks Sandro.
> > 
> > Hui, can you manually mount that path from the description on RHEV-H?
> 
> Yes, a manual mount succeeds. And the same NFS path can be used when the
> network is not bonded.

I have tried a manual mount with the default options, and confirmed that it uses NFSv4 by default.
# mount -t nfs 10.66.65.196:/home/huiwa/nfs1 /tmp  -- succeeds

But it failed when using NFSv3.
# mount -t nfs -o vers=3,retry=1 10.66.65.196:/home/huiwa/nfs1 /tmp

It failed with the following errors:
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified

And the NFS path can be mounted over NFSv3 when "nolock" is added:
# mount -t nfs -o vers=3,retry=1,nolock 10.66.65.196:/home/huiwa/nfs1 /tmp

# mount |grep nfs1
10.66.65.196:/home/huiwa/nfs1 on /tmp type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.65.196,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=10.66.65.196)
10.66.65.196:/home/huiwa/nfs1 on /var/lib/stateless/writable/tmp type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.65.196,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=10.66.65.196)

Comment 8 Fabian Deutsch 2015-08-12 11:01:29 UTC
We need to understand whether statd is installed and enabled, or if something else is missing.

Comment 9 Anatoly Litovsky 2015-08-12 12:53:31 UTC
It looks like rpc-statd needs to be restarted after bond creation.

Comment 14 Ivan Makfinsky 2015-09-11 13:37:20 UTC
Found the same issue: the rpc-statd service needs to be restarted on the 7.1 hypervisor in order to mount NFS storage domains.

systemctl reports that rpc-statd is running, but the logs indicate that the mount fails and that rpc-statd is not running.

Restarting the rpc-statd service resolves the issue, and NFSv3 storage domains then mount automatically afterwards.
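
Based on these observations, a minimal manual workaround sketch (the export path and mount point are illustrative, reusing values from the description):

# systemctl restart rpc-statd.service
# mount -t nfs -o vers=3,retry=1 10.66.65.196:/home/huiwa/nfs1 /mnt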

Comment 18 wanghui 2015-11-12 09:10:22 UTC
Test version:
rhev-hypervisor7-7.2-20151104.0.iso
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.3.0-2.el7ev.noarch

Test steps:
1. Clean install rhev-hypervisor7-7.2-20151104.0.iso
2. Create bond with mode=1
3. Download the ova file to start the first host setup.
4. Select nfs3 as the storage.
Please specify the full shared storage connection path to use (example: host:/path): 10.66.9.243:/home/test
[ INFO  ] Installing on first host

Test result:
1. The NFS storage can be mounted successfully.

So this issue is fixed in ovirt-node-plugin-hosted-engine-0.3.0-2.el7ev.noarch.

Comment 19 Michael Burman 2016-02-07 14:05:29 UTC
Hi

This bug should be reopened because it still happens on ovirt-node-plugin-hosted-engine-0.3.0-6.el7ev.noarch when trying to run an HE deploy on RHEV-H over a VLAN-tagged bond using a rhevm-appliance.

[ ERROR ] Cannot access storage connection mount.nfs: rpc.statd is not running but is required for remote locking. mount.nfs: Either use '-o nolock' to keep locks local, or start statd. mount.nfs: an incorrect mount option was specified

Restarting the rpc-statd service does not help.

Tested with rhevm-appliance-20160128.1-1 on rhev-h 7.2 (20160126.0.el7ev),
ovirt-node-3.6.1-5.0.el7ev.noarch.

[root@orchid-vds2-vlan162 ~]# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
9: bond0.162@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    inet 10.35.129.15/24 brd 10.35.129.255 scope global dynamic bond0.162
       valid_lft 37847sec preferred_lft 37847sec
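
For reference, a VLAN-tagged bond interface such as bond0.162 is typically layered on the bond with one more ifcfg file; a hypothetical sketch, with values assumed from the ip output above:

/etc/sysconfig/network-scripts/ifcfg-bond0.162 (hypothetical):
DEVICE=bond0.162
VLAN=yes
BOOTPROTO=dhcp
ONBOOT=yes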

Comment 20 Michael Burman 2016-02-07 14:11:00 UTC
Created attachment 1121911 [details]
nfs_error

Comment 21 Allon Mureinik 2016-02-08 10:08:29 UTC
Didn't we add a restart of rpc-statd already? Or was that only in VDSM and not HE?

Comment 22 Douglas Schilling Landgraf 2016-02-08 21:54:04 UTC
(In reply to Allon Mureinik from comment #21)
> Didn't we add a restart of rpc-statd already? Or was that only in VDSM and
> not HE?

I believe the issue at this stage is not the restart; as the reporter shared in comment #19, it doesn't help. I tried restarting rpcbind and rpc-statd myself on Michael's machine, and it doesn't work either (I could not reproduce the issue locally).

I noticed that there is an open report against nfs-utils involving VDSM [1], on nfs-utils-1.3.0-0.21.el7, which is the same version as on this host. The last comment in that BZ from Steve suggests updating to the latest nfs-utils; I did so on Michael's host, and after a restart of rpc-statd the mount worked nicely.

# mount -o remount,rw /
# wget http://<brew>/brewroot/packages/nfs-utils/1.3.0/0.22.el7/x86_64/nfs-utils-1.3.0-0.22.el7.x86_64.rpm
# /bin/systemctl restart  rpc-statd.service
# mount -t nfs -o vers=3,retry=1 IP_ADDR:/vol/RHEV/Network/mburman/HE_Over_BOND /mnt
#

Data from Michael's host before the upgrade:

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20160126.0.el7ev)

# rpm -qa | grep -i nfs-utils
nfs-utils-1.3.0-0.21.el7.x86_64

# rpm -qa | grep -i vdsm
vdsm-xmlrpc-4.17.18-0.el7ev.noarch
ovirt-node-plugin-vdsm-0.6.1-7.el7ev.noarch
vdsm-yajsonrpc-4.17.18-0.el7ev.noarch
vdsm-python-4.17.18-0.el7ev.noarch
vdsm-cli-4.17.18-0.el7ev.noarch
vdsm-infra-4.17.18-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7ev.noarch
vdsm-jsonrpc-4.17.18-0.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
vdsm-hook-ethtool-options-4.17.18-0.el7ev.noarch


@Michael, could you please try a deploy of HE now with the updated nfs-utils on your machine?


[1] 
[vdsm] NFS mount fails sometimes with "rpc.statd is not running but is required for remote locking"
https://bugzilla.redhat.com/show_bug.cgi?id=1275082
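
Before retrying the deploy, the upgrade can be sanity-checked with a couple of illustrative commands (not taken from the original session):

# rpm -q nfs-utils
# rpcinfo -p | grep -w status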

Comment 25 Michael Burman 2016-02-09 08:04:39 UTC
(In reply to Douglas Schilling Landgraf from comment #22)
> @Michael, could you please try a deploy of HE now with the updated nfs-utils
> on your machine?

Douglas, thanks, that helped.

Comment 26 Fabian Deutsch 2016-02-09 09:20:37 UTC
Michael, can you please check if nfs-utils-1.3.0-0.21.el7_2 also fixes the issue?

This build is the one which should get released today.

Comment 30 Steve Dickson 2016-02-10 15:47:20 UTC
Why is there a needinfo for me?

Comment 31 Michael Burman 2016-02-10 16:03:30 UTC
There was a needinfo on you (see comment 23) from Douglas that I removed by mistake.

Comment 33 wanghui 2016-02-24 08:33:36 UTC
Test version:
rhevh-7.2-20160222.0.el7ev.iso
ovirt-node-plugin-hosted-engine-0.3.0-7.el7ev.noarch

Test steps:
Scenario 1:
1. Install rhevh
2. Enable bond with mode 1
3. Select nfs3 as the storage.

Scenario 2:
1. Install rhevh
2. Enable bond+vlan with mode 1
3. Select nfs3 as the storage.

Test result:
1. In both scenarios 1 and 2, the nfs3 storage can be mounted successfully during HE setup.

So this issue is fixed in ovirt-node-plugin-hosted-engine-0.3.0-7.el7ev.noarch. Changing status to VERIFIED.

Comment 35 errata-xmlrpc 2016-03-09 14:34:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html

