| Summary: | Failure to add new host to oVirt 3.3 Server on Fedora 19 | | |
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Boris Derzhavets <bderzhavets> |
| Component: | vdsm | Assignee: | lpeer <lpeer> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | yeylon <yeylon> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3 | CC: | acathrow, alonbl, amureini, bazulay, bderzhavets, bugs, danken, dougsland, iheim, lpeer, mgoldboi, nsoffer, srevivo, ybronhei, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.4.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | gluster | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-07-01 09:40:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Boris Derzhavets
2013-11-15 09:54:37 UTC
Description of problem:

The original box was set up per http://community.redhat.com/up-and-running-with-ovirt-3-3/, plus the workaround for the NFS server start-up bug and manual creation of the ovirtmgmt bridge. A local glusterfs volume is configured as the default storage domain. Locally everything works fine.

On the target box (recent F19):

1. `yum localinstall http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm -y`
2. ovirtmgmt bridge installed manually
3. NFS server bug https://bugzilla.redhat.com/show_bug.cgi?id=970595 fixed

When adding the new host via the web admin console on the first box, the system freezes in the initializing phase. Status on the second box when it goes down:

```
# service vdsmd status
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Fri 2013-11-15 12:36:44 FET; 29min ago
  Process: 2661 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
 Main PID: 2990 (respawn)
   CGroup: name=systemd:/system/vdsmd.service
           ├─2990 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm
           ├─2992 /usr/bin/python /usr/share/vdsm/vdsm
           └─3334 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 26 15

Nov 15 12:36:45 ovirt02.localdomain python[2992]: DIGEST-MD5 ask_user_info()
Nov 15 12:36:45 ovirt02.localdomain python[2992]: DIGEST-MD5 make_client_response()
Nov 15 12:36:45 ovirt02.localdomain python[2992]: DIGEST-MD5 client step 3
Nov 15 12:36:47 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (1, 'Mount failed. Please check the log file for more details.\n;')
Nov 15 12:38:53 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')
Nov 15 12:41:00 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')
Nov 15 12:41:03 ovirt02.localdomain vdsm[2992]: vdsm TaskManager.Task ERROR Task=`dbfd039f-a950-4ae6-ae45-966ff161c65a`::Unexpected error
Nov 15 12:45:01 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (1, 'Mount failed. Please check the log file for more details.\n;')
Nov 15 12:47:07 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')
Nov 15 12:49:16 ovirt02.localdomain vdsm[2992]: vdsm StorageServer.MountConnection ERROR Mount failed: (32, ';mount.nfs: Connection timed out\n')

# gluster peer status
Number of Peers: 1

Hostname: 192.168.1.149
Uuid: 0772f512-d0ae-4cc3-9634-a7ec5d295b0e
State: Peer in Cluster (Disconnected)
```

192.168.1.149 is the original oVirt 3.3 F19 box.

Version-Release number of selected component (if applicable):
oVirt 3.3 and F19 with the most recent "yum -y update"

How reproducible:
Install oVirt 3.3 on F19 and try to create a new host.

Steps to Reproduce:
1. Make the most recent ovirt repo available on the new box (F19)
2. Create the ovirtmgmt bridge
3. Fix the bug with the NFS server
4. Observe that the host cannot be brought into the Glusterfs 3.4.1 cluster via the oVirt 3.3 graphical user interface

Actual results:
Failure to create a Glusterfs 3.4.1 cluster (2 nodes) via the oVirt 3.3 UI

Expected results:
Success creating a Glusterfs 3.4.1 cluster (2 nodes) via the oVirt 3.3 UI

Additional info:
I was able to find a workaround by selecting iptables as the firewall manager during initial engine setup on host-1. Then on host-2, which was to be added:

1. `sudo yum localinstall http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm -y`
2. Set up the ovirtmgmt bridge, disable firewalld and enable the iptables firewall manager

While adding the new host I shut down iptables on host-1 only for the period of initializing host-2. Finally I got host-2 successfully installed and brought the iptables service back up. The two-node glusterfs 3.4.1 cluster has been built. However, I was able to manage gluster volumes only via the CLI; all actions were properly detected by the Web Admin Console. Even without any connection to oVirt, I always have to select iptables as the firewall when working with glusterfs 3.4.1 clusters on Fedora 19. This seems to me to be a firewalld service issue that also affects Neutron configuration support in RDO Havana 2013.2.

---

Looks like a networking issue - Livnat, can you check this?

---

Boris, would you elaborate on why you were able to "manage gluster volumes only via CLI"?

Alon, isn't it the job of ovirt-host-deploy to punch the relevant holes in firewalld? In any case, setting firewall rules has traditionally fallen into Doron's domain.

---

(In reply to Dan Kenigsberg from comment #4)
> Boris, would you elaborate on why you were able to "manage gluster volumes
> only via CLI"?

In version 3.3.0 the Web Admin gave an error when attempting to create a replicated volume, but the CLI worked fine. Starting with 3.3.1 I have seen no problems with the Web Admin Console.

---

(In reply to Dan Kenigsberg from comment #4)
> Alon, isn't it the job of ovirt-host-deploy to punch the relevant holes in
> firewalld?
>
> In any case, setting firewall rules has traditionally fallen into Doron's
> domain.

Host deploy currently supports only iptables; it disables firewalld. Iptables rules are taken from the engine at vdc_options:
- IPTablesConfig
- IPTablesConfigForGluster
- IPTablesConfigForVirt

Any change in these will affect the next host-deploy (which can be the same host). If this is related to gluster, I guess IPTablesConfigForGluster should be modified; it is up to the gluster maintainer to decide what.

---

(In reply to Dan Kenigsberg from comment #4)
...
> In any case, setting firewall rules has traditionally fallen into Doron's
> domain.

Host deploy (and firewall rules) are 'infra'. In this case it seems to be gluster specific, hence 'gluster'. Though according to comment 5, 3.3.1 is OK. Boris - is there anything that still needs fixing here?

---

(In reply to Itamar Heim from comment #7)
> host deploy (and firewall rules) are 'infra'. in this case, seems to be
> gluster specific, hence 'gluster'.
> though according to comment 5, 3.3.1 is ok.
> Boris - is there anything that still needs fixing here?

Regarding version 3.3.2 host deployment: personally I experienced one issue during the second host's deployment, which required a vdsmd service restart on the second host to allow the system to bring it up at the end of installation. Both installs behaved exactly the same.

```
[root@hv02 ~]# service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Tue 2013-12-24 15:40:40 MSK; 50s ago
  Process: 2896 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 3166 (vdsm)
   CGroup: name=systemd:/system/vdsmd.service
           └─3166 /usr/bin/python /usr/share/vdsm/vdsm

Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'
Dec 24 15:40:41 hv02.localdomain vdsm[3166]: [427B blob data]
Dec 24 15:40:41 hv02.localdomain vdsm[3166]: vdsm vds WARNING Unable to load the json rpc server module. Ple...led.
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 client step 2
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 parse_server_challenge()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 ask_user_info()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 client step 2
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 ask_user_info()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 make_client_response()
Dec 24 15:40:41 hv02.localdomain python[3166]: DIGEST-MD5 client step 3

[root@hv02 ~]# service vdsmd restart
Redirecting to /bin/systemctl restart vdsmd.service

[root@hv02 ~]# service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Tue 2013-12-24 15:41:42 MSK; 2s ago
  Process: 3355 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 3358 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 3418 (vdsm)
   CGroup: name=systemd:/system/vdsmd.service
           └─3418 /usr/bin/python /usr/share/vdsm/vdsm

Dec 24 15:41:42 hv02.localdomain vdsmd_init_common.sh[3358]: vdsm: Running test_conflicting_conf
Dec 24 15:41:42 hv02.localdomain vdsmd_init_common.sh[3358]: SUCCESS: ssl configured to true. No conflicts
Dec 24 15:41:42 hv02.localdomain systemd[1]: Started Virtual Desktop Server Manager.
Dec 24 15:41:43 hv02.localdomain vdsm[3418]: vdsm vds WARNING Unable to load the json rpc server module. Ple...led.
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 client step 2
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 parse_server_challenge()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 ask_user_info()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 client step 2
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 ask_user_info()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 make_client_response()
Dec 24 15:41:43 hv02.localdomain python[3418]: DIGEST-MD5 client step 3
```

---

Setting target release to current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, to make sure bugs are reviewed for relevancy, fix, closure, etc.

This is an automated message. Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

This is an automated message. oVirt 3.4.1 has been released. This issue has been retargeted to 3.4.2 as it has severity high; please retarget if needed. If this is a blocker please add it to the tracker Bug #1095370.

This is an automated message: oVirt 3.4.2 has been released. This bug has been re-targeted from 3.4.2 to 3.4.3 since priority or severity were high or urgent.

---

Yaniv, this issue is becoming quite old, but what do you make of
> Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'
?
Boris, was this exception reproducible in any way? Did you get hold of its traceback?
In ovirt-3.3 we don't handle exceptions in vdsm-tool at all, so I guess a RuntimeError was raised here and that is the reason for that message.

---

Dan, I don't understand your question - I don't make anything of that :) It could be anything. Please try to start vdsmd manually and see if you get any errors in syslog. If you still have problems seeing errors, switch to user vdsm, run /usr/share/vdsm/vdsm and see if you get an exception. Try also to run "vdsm-tool configure" and see if you have any issues during that. Please reply with the outputs you get.

---

(In reply to Dan Kenigsberg from comment #15)
> Yaniv, this issue is becoming quite old, but what do you make of
>
> > Dec 24 15:40:41 hv02.localdomain python[3192]: detected unhandled Python exception in '/usr/bin/vdsm-tool'
> ?
>
> Boris, was this exception reproducible in any way? Did you get hold of its
> traceback?

In March 2014 it was reproducible (v3.3.2). I didn't get hold of its traceback. Too much time has passed; the system is no longer under my supervision.

---

(In reply to Dan Kenigsberg from comment #15)
> Boris, was this exception reproducible in any way? Did you get hold of its
> traceback?

In January 2014 it was reproducible (v3.3.2). I didn't get hold of its traceback. Too much time has passed; the system is no longer under my supervision.

---

Unfortunately we did not understand the problem while ovirt-3.3.z was supported; if this issue reproduces on a supported version, please open a clear new bug about it.
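For readers reproducing this setup, the manual ovirtmgmt bridge creation mentioned in the description ("ovirtmgmt bridge manually install") can be sketched as below. This is a minimal sketch assuming the classic initscripts/ifcfg layout used on Fedora 19; the NIC name `em1` is illustrative and not taken from this report.

```shell
# Define the ovirtmgmt bridge itself (assumption: DHCP addressing).
cat > /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt <<'EOF'
DEVICE=ovirtmgmt
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=dhcp
DELAY=0
NM_CONTROLLED=no
EOF

# Enslave the physical NIC to the bridge (em1 is a placeholder name).
cat > /etc/sysconfig/network-scripts/ifcfg-em1 <<'EOF'
DEVICE=em1
ONBOOT=yes
BRIDGE=ovirtmgmt
NM_CONTROLLED=no
EOF

# Apply the new configuration via the legacy network service.
systemctl restart network
```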
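The firewalld-to-iptables switch used as the workaround on host-2 can be sketched as follows, assuming the `iptables-services` package provides the classic iptables systemd unit on Fedora 19:

```shell
# Install the legacy iptables service unit (assumption: package name
# iptables-services, as shipped on Fedora 19 era systems).
sudo yum install -y iptables-services

# Replace firewalld with the iptables service as the firewall manager.
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl enable iptables
sudo systemctl start iptables
```

Per the report, iptables on host-1 also had to be stopped temporarily while host-2 was being initialized, then started again once the host came up.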
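The CLI-based gluster volume management described in the thread (e.g. creating the replicated volume that the 3.3.0 Web Admin failed to create) can be sketched with standard gluster commands; the hostnames and brick paths here are illustrative, not taken from this report.

```shell
# From one node, add the second node to the trusted storage pool.
gluster peer probe hv02.localdomain
gluster peer status   # both peers should report "Peer in Cluster (Connected)"

# Create and start a 2-way replicated volume across the two nodes
# (assumption: /gluster/brick1 exists on a suitable filesystem on each host).
gluster volume create repl-vol replica 2 \
    hv01.localdomain:/gluster/brick1 \
    hv02.localdomain:/gluster/brick1
gluster volume start repl-vol
gluster volume info repl-vol
```

Note that a "Peer in Cluster (Disconnected)" state, as seen in the description, typically points at blocked glusterd traffic, which is consistent with the firewalld hypothesis discussed in the thread.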
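The diagnostic steps suggested in the thread (restart vdsmd, run "vdsm-tool configure", run vdsm in the foreground as the vdsm user) can be collected into a rough runbook. This is a sketch only; the `--force` flag's availability depends on the vdsm version installed.

```shell
# Reconfigure vdsm modules and watch for errors on the console.
sudo systemctl stop vdsmd
sudo vdsm-tool configure --force

# Restart and inspect the most recent journal entries for tracebacks.
sudo systemctl start vdsmd
journalctl -u vdsmd -n 50

# If errors remain hidden, run vdsm in the foreground as the vdsm user
# so any unhandled Python exception is printed directly.
sudo -u vdsm /usr/share/vdsm/vdsm
```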