Bug 1030855

Summary: Failure to add new host to oVirt 3.3 Server on Fedora 19
Product: [Retired] oVirt
Reporter: Boris Derzhavets <bderzhavets>
Component: vdsm
Assignee: lpeer <lpeer>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: yeylon <yeylon>
Severity: high
Priority: unspecified
Version: 3.3
CC: acathrow, alonbl, amureini, bazulay, bderzhavets, bugs, danken, dougsland, iheim, lpeer, mgoldboi, nsoffer, srevivo, ybronhei, yeylon
Target Milestone: ---
Target Release: 3.4.3
Hardware: x86_64
OS: Linux
Whiteboard: gluster
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-07-01 09:40:24 UTC

Description Boris Derzhavets 2013-11-15 09:54:37 UTC

Comment 1 Boris Derzhavets 2013-11-15 10:20:19 UTC
Description of problem:

The original box was set up per http://community.redhat.com/up-and-running-with-ovirt-3-3/, plus the workaround for the NFS server start-up bug and manual creation of the ovirtmgmt bridge. A local glusterfs volume is configured as the default storage. Locally everything works fine.

On the target box (recent F19):
1. yum localinstall http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm -y
2. ovirtmgmt bridge installed manually
3. NFS server bug https://bugzilla.redhat.com/show_bug.cgi?id=970595 fixed

When adding the new host via the web admin console on the first box, the system freezes in the initializing phase.

Status on the second box when it goes down:

# service vdsmd status

vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Fri 2013-11-15 12:36:44 FET; 29min ago
  Process: 2661 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
 Main PID: 2990 (respawn)
   CGroup: name=systemd:/system/vdsmd.service
           ├─2990 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm
           ├─2992 /usr/bin/python /usr/share/vdsm/vdsm
           └─3334 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 26 15

Nov 15 12:36:45 ovirt02.localdomain python[2992]: DIGEST-MD5 ask_user_info()
Nov 15 12:36:45 ovirt02.localdomain python[2992]: DIGEST-MD5 make_client_response()
Nov 15 12:36:45 ovirt02.localdomain python[2992]: DIGEST-MD5 client step 3
Nov 15 12:36:47 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (1, 'Mount failed. Please check the log file for more details.\n;')
Nov 15 12:38:53 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')
Nov 15 12:41:00 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')
Nov 15 12:41:03 ovirt02.localdomain vdsm[2992]: vdsm TaskManager.Task ERROR Task=`dbfd039f-a950-4ae6-ae45-966ff161c65a`::Unexpected error
Nov 15 12:45:01 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (1, 'Mount failed. Please check the log file for more details.\n;')
Nov 15 12:47:07 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')
Nov 15 12:49:16 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')

# gluster peer status
Number of Peers: 1

Hostname: 192.168.1.149
Uuid: 0772f512-d0ae-4cc3-9634-a7ec5d295b0e
State: Peer in Cluster (Disconnected)

192.168.1.149 is original oVirt 3.3 F19 box

Version-Release number of selected component (if applicable):

oVirt 3.3 & F19 with most recent "yum -y update"

How reproducible:

Install oVirt 3.3 on F19  and try to create a new host.

Steps to Reproduce:

1. Make the most recent ovirt repo available on the new box (F19)
2. Create ovirtmgmt bridge
3. Fix bug with NFS Server
4. Observe that the host cannot be brought into the Glusterfs 3.4.1 cluster via the oVirt 3.3 graphical user interface.
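
The steps above can be sketched as a dry-run shell script (the bridge ifcfg approach and the nfs-server service fix are assumptions; adjust to the actual NIC and the workaround from bug 970595):

```shell
# Dry-run sketch of the reproduction steps on a fresh F19 box.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# 1. Enable the oVirt repository.
run yum localinstall -y http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm

# 2. Create the ovirtmgmt bridge manually (assumption: ifcfg-ovirtmgmt and the
#    NIC's ifcfg file are written by hand first), then restart networking.
run systemctl restart network.service

# 3. Apply the NFS server workaround from bug 970595 by enabling and starting
#    nfs-server, then retry adding the host from the engine's web admin console.
run systemctl enable nfs-server.service
run systemctl start nfs-server.service
```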

Actual results:

Failure to create Glusterfs 3.4.1 Cluster (2 nodes) via oVirt 3.3 UI

Expected results:

Successful creation of a Glusterfs 3.4.1 cluster (2 nodes) via the oVirt 3.3 UI

Additional info:

Comment 2 Boris Derzhavets 2013-11-21 11:43:22 UTC
I was able to find a workaround by selecting iptables as the firewall manager during initial engine setup on host-1. Then on host-2, which was to be added:

1. $ sudo yum localinstall http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm -y
2. set up the ovirtmgmt bridge, disabled firewalld, and enabled the iptables firewall manager

While adding the new host, I shut down iptables on host-1 only for the period of initializing host-2. Finally host-2 was successfully installed, and I brought the iptables service back up. A two-node glusterfs 3.4.1 cluster has been built. However, I was able to manage gluster volumes only via the CLI. All actions were properly detected by the Web Admin Console.

Independently of oVirt, I always have to select iptables as the firewall when working with glusterfs 3.4.1 clusters on Fedora 19. This seems to me to be a firewalld service issue, also affecting Neutron configuration support in RDO Havana 2013.2.
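
The workaround described in this comment can be sketched as a dry-run script (host-1/host-2 labels follow the comment; the `iptables-services` package name is an assumption for F19):

```shell
# Dry-run sketch of the firewalld-to-iptables workaround from this comment.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# On host-2, before it is added: replace firewalld with classic iptables.
run yum install -y iptables-services
run systemctl stop firewalld.service
run systemctl disable firewalld.service
run systemctl enable iptables.service
run systemctl start iptables.service

# On host-1, only while host-2 is being deployed:
run systemctl stop iptables.service
# ... add host-2 from the web admin console and wait for the install ...
run systemctl start iptables.service
```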

Comment 3 Nir Soffer 2013-12-11 08:46:51 UTC
Looks like a networking issue - Livnat, can you check this?

Comment 4 Dan Kenigsberg 2014-01-02 09:20:53 UTC
Boris, would you elaborate on why you were able to "manage gluster volumes only via CLI"?

Alon, isn't it the job of ovirt-host-deploy to punch the relevant holes in firewalld?

In any case, setting firewall rules has traditionally fallen into Doron's domain.

Comment 5 Boris Derzhavets 2014-01-02 09:52:49 UTC
(In reply to Dan Kenigsberg from comment #4)
> Boris, would you elaborate on why you were able to "manage gluster volumes
> only via CLI"?

In version 3.3.0 the Web Admin gave an error when attempting to create a replicated volume, but the CLI worked fine. Starting with 3.3.1, in my experience there are no problems with the Web Admin Console.

Comment 6 Alon Bar-Lev 2014-01-02 09:55:52 UTC
(In reply to Dan Kenigsberg from comment #4)
> Alon, isn't it the job of ovirt-host-deploy to punch the relevant holes in
> firewalld?
> 
> In any case, setting firewall rules has traditionally fallen into Doron's
> domain.

Host deploy currently supports only iptables; it disables firewalld.

Iptables rules are taken from engine at vdc_options:
- IPTablesConfig
- IPTablesConfigForGluster
- IPTablesConfigForVirt

Any change in these will affect the next host-deploy (which can be the same host). If this is related to gluster, I guess IPTablesConfigForGluster should be modified; it is up to the gluster maintainer to decide what.
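
A dry-run sketch of inspecting and updating these vdc_options keys with the engine's engine-config tool ("<edited template>" is a placeholder, not a real value; run on the engine host):

```shell
# Dry-run sketch: read and update the iptables templates on the engine.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# Show the current templates that host-deploy turns into iptables rules.
run engine-config -g IPTablesConfig
run engine-config -g IPTablesConfigForGluster
run engine-config -g IPTablesConfigForVirt

# After editing the template offline, store it back; the change takes effect
# on the next host-deploy run ("<edited template>" is a placeholder).
run engine-config -s "IPTablesConfigForGluster=<edited template>"
```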

Comment 7 Itamar Heim 2014-01-02 11:01:38 UTC
(In reply to Dan Kenigsberg from comment #4)
...
> In any case, setting firewall rules has traditionally fallen into Doron's
> domain.

host deploy (and firewall rules) are 'infra'. in this case, seems to be gluster specific, hence 'gluster'.
though according to comment 5, 3.3.1 is ok. 
Boris - is there anything that still needs fixing here?

Comment 8 Boris Derzhavets 2014-01-03 08:47:27 UTC
(In reply to Itamar Heim from comment #7)
> (In reply to Dan Kenigsberg from comment #4)
> ...
> > In any case, setting firewall rules has traditionally fallen into Doron's
> > domain.
> 
> host deploy (and firewall rules) are 'infra'. in this case, seems to be
> gluster specific, hence 'gluster'.
> though according to comment 5, 3.3.1 is ok. 
> Boris - is there anything that still needs fixing here?

Regarding version 3.3.2 host deployment:

Personally I experienced one issue during the second host's deployment, which required a vdsmd service restart on the second host to allow the system to bring it up at the end of installation. Both installs behaved exactly the same.

[root@hv02 ~]# service vdsmd status
Redirecting to /bin/systemctl status  vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Tue 2013-12-24 15:40:40 MSK; 50s ago
  Process: 2896 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 3166 (vdsm)
   CGroup: name=systemd:/system/vdsmd.service
           └─3166 /usr/bin/python /usr/share/vdsm/vdsm

Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'

Dec 24 15:40:41 hv02.localdomain vdsm[3166]: [427B blob data]
Dec 24 15:40:41 hv02.localdomain vdsm[3166]: vdsm vds WARNING Unable to load the json rpc server module. Ple...led.
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 client step 2
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 parse_server_challenge()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 ask_user_info()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 client step 2
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 ask_user_info()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 make_client_response()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 client step 3

[root@hv02 ~]# service vdsmd restart
Redirecting to /bin/systemctl restart  vdsmd.service

[root@hv02 ~]# service vdsmd status
Redirecting to /bin/systemctl status  vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Tue 2013-12-24 15:41:42 MSK; 2s ago
  Process: 3355 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 3358 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 3418 (vdsm)
   CGroup: name=systemd:/system/vdsmd.service
           └─3418 /usr/bin/python /usr/share/vdsm/vdsm

Dec 24 15:41:42 hv02.localdomain vdsmd_init_common.sh[3358]: vdsm: Running test_conflicting_conf

Dec 24 15:41:42 hv02.localdomain vdsmd_init_common.sh[3358]: SUCCESS: ssl configured to true. No conflicts

Dec 24 15:41:42 hv02.localdomain systemd[1]: Started Virtual Desktop Server Manager.
Dec 24 15:41:43 hv02.localdomain vdsm[3418]: vdsm vds WARNING Unable to load the json rpc server module. Ple...led.
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 client step 2
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 parse_server_challenge()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 ask_user_info()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 client step 2
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 ask_user_info()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 make_client_response()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 client step 3

Comment 9 Itamar Heim 2014-01-12 08:42:25 UTC
setting target release to current version for consideration and review. please do not push non-RFE bugs to an undefined target release to make sure bugs are reviewed for relevancy, fix, closure, etc.

Comment 11 Sandro Bonazzola 2014-03-04 09:19:44 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 12 Sandro Bonazzola 2014-05-08 13:52:16 UTC
This is an automated message.

oVirt 3.4.1 has been released.
This issue has been retargeted to 3.4.2 as it has severity high, please retarget if needed.
If this is a blocker please add it to the tracker Bug #1095370

Comment 13 Sandro Bonazzola 2014-06-11 07:04:52 UTC
This is an automated message:
oVirt 3.4.2 has been released.
This bug has been re-targeted from 3.4.2 to 3.4.3 since priority or severity were high or urgent.

Comment 14 Sandro Bonazzola 2014-06-11 07:05:28 UTC
This is an automated message:
oVirt 3.4.2 has been released.
This bug has been re-targeted from 3.4.2 to 3.4.3 since priority or severity were high or urgent.

Comment 15 Dan Kenigsberg 2014-06-11 08:23:34 UTC
Yaniv, this issue is becoming quite old, but what do you make of

> Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'

?

Boris, was this exception reproducible in any way? Did you get hold of its traceback?

Comment 16 Yaniv Bronhaim 2014-06-15 11:57:01 UTC
In ovirt-3.3 we don't handle exceptions in vdsm-tool at all, so I guess a RuntimeError was raised here and that is the reason for the message.
Dan, I don't understand your question - I can't make anything of that :)

It could be anything. Please try to start vdsmd manually and see if you get any errors in syslog. If you still can't see the errors, switch to the vdsm user, run /usr/share/vdsm/vdsm, and see if you get an exception.

Also try running "vdsm-tool configure" and see if you hit any issues during that.

Please reply with the outputs you get.
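
The diagnostic steps suggested here can be sketched as a dry-run script (assumption: run as root on the affected F19 host):

```shell
# Dry-run sketch of the diagnostics from comment 16.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# 1. Start vdsmd manually and look for errors in the journal/syslog.
run systemctl start vdsmd.service
run journalctl -u vdsmd.service --no-pager

# 2. If the errors are not visible, run the daemon in the foreground as the
#    vdsm user to surface the raw Python traceback.
run sudo -u vdsm /usr/share/vdsm/vdsm

# 3. Check whether vdsm-tool itself fails while configuring.
run vdsm-tool configure
```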

Comment 17 Boris Derzhavets 2014-06-30 17:07:30 UTC
(In reply to Dan Kenigsberg from comment #15)
> Yaniv, this issue is becoming quite old, but what do you make of
> 
> > Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'
> 
> ?
> 
> Boris, was this exception reproducible in any way? Did you get hold of its
> traceback?

In March 2014 it was reproducible (v 3.3.2). I didn't get hold of its traceback. Too much time has passed; the system is no longer under my supervision.

Comment 18 Boris Derzhavets 2014-06-30 17:10:54 UTC
(In reply to Dan Kenigsberg from comment #15)
> Yaniv, this issue is becoming quite old, but what do you make of
> 
> > Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'
> 
> ?
> 
> Boris, was this exception reproducible in any way? Did you get hold of its
> traceback?

In January 2014 it was reproducible (v 3.3.2). I didn't get hold of its traceback. Too much time has passed; the system is no longer under my supervision.

Comment 19 Dan Kenigsberg 2014-07-01 09:40:24 UTC
Unfortunately we did not understand the problem while ovirt-3.3.z was supported; if this issue reproduces on a supported version, please open a clear new bug about it.