Description of problem:
-----------------------
After adding the RHSS Node to the gluster-enabled cluster in RHEVM, a crash report was seen on the RHSS Node.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHSS - RHSS 2.1 Update 2 RC1 ISO - RHSS-2.1-20140116.2
RHEVM IS32.2 [3.3.0-0.45.el6ev]

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add a RHSS Node to the gluster-enabled cluster

Actual results:
---------------
The following was observed on the RHSS Node:

[Fri Jan 17 15:40:43 UTC 2014 root@:~/scripts ]You have mail in /var/spool/mail/root
[Fri Jan 17 15:40:43 UTC 2014 root@:~/scripts ] # mail
Heirloom Mail version 12.4 7/29/08.  Type ? for help.
"/var/spool/mail/root": 1 message 1 new
>N  1 user.e  Fri Jan 17 10:33  41/1487  "[abrt] full crash report"

Expected results:
-----------------
There should not be any crash.

Additional info:
----------------
The message contained the following info:

Message 2:
From user.eng.blr.redhat.com  Fri Jan 17 12:35:31 2014
Return-Path: <user.eng.blr.redhat.com>
X-Original-To: root@localhost
Delivered-To: root.eng.blr.redhat.com
Date: Fri, 17 Jan 2014 12:35:31 -0500
From: user.eng.blr.redhat.com
To: root.eng.blr.redhat.com
Subject: [abrt] full crash report
User-Agent: Heirloom mailx 12.4 7/29/08
Content-Type: text/plain; charset=us-ascii
Status: RO

abrt_version:   2.0.8
cmdline:        /usr/bin/python /usr/bin/vdsm-tool vdsm-id
executable:     /usr/bin/vdsm-tool
kernel:         2.6.32-358.28.1.el6.x86_64
time:           Fri 17 Jan 2014 07:07:41 AM EST
uid:            0
username:       root

sosreport.tar.xz: Binary file, 389832 bytes

backtrace:
:vdsm-id.py:32:getUUID:RuntimeError: Cannot retrieve host UUID
:
:Traceback (most recent call last):
:  File "/usr/bin/vdsm-tool", line 143, in <module>
:    sys.exit(main())
:  File "/usr/bin/vdsm-tool", line 140, in main
:    return tool_command[cmd]["command"](*args[1:])
:  File "/usr/lib64/python2.6/site-packages/vdsm/tool/vdsm-id.py", line 32, in getUUID
:    raise RuntimeError('Cannot retrieve host UUID')
:RuntimeError: Cannot retrieve host UUID
:
:Local variables in innermost frame:
:hostUUID: None
Created attachment 851572 [details]
sosreport on RHSS Node
This crash report is seen only once, right after adding the RHSS Node to the gluster-enabled cluster, and never repeats.

I removed the RHSS Node that was earlier added to the gluster-enabled cluster in RHEVM and added it again to the same cluster; no crash report was seen.
Adding the "REGRESSION" keyword, as this bug was not seen with previous builds of RHSS.
(In reply to SATHEESARAN from comment #3)
> Adding "REGRESSION" keyword as this bug was not seen with previous builds of
> RHSS

Could it be that the issue is specific to this host?

This error can show up on hosts with a faulty BIOS that is missing the system UUID. What is the output of `dmidecode -s system-uuid`?

If this is indeed the case, `genuuid > /etc/vdsm/vdsm.id` would solve the issue for this host.
(In reply to Dan Kenigsberg from comment #5)
> Could it be that the issue is specific to this host?
> 
> This error can show up on hosts with faulty bios that misses system-uuid.
> What is the output of `dmidecode -s system-uuid`?
> 
> If this is indeed the case, `genuuid > /etc/vdsm/vdsm.id` would solve the
> issue for this host.

No. This happens with all the hosts. In my case, these hosts are RHSS VMs installed with the RHSS 2.1 Update 2 RC1 ISO, RHSS-2.1-20140116.2. I don't see that this problem is specific to the host.

Output of `dmidecode -s system-uuid`:

[Mon Jan 20 15:30:53 UTC 2014 root.37.86:~ ] # dmidecode -s system-uuid
288B454F-5550-47D7-8C91-43BF66549647
Are these VMs?

Anyway, could you please snip the host-deploy log at the time of the issue?
(In reply to Humble Chirammal from comment #7)
> Are these Vms ?

Yes, all these hosts are VMs.

> Any way, can u please snip the host-deploy log at time of the issue ?

There was an error while bootstrapping in the host-deploy logs:

<snip>
2014-01-19 04:18:28 DEBUG otopi.context context._executeMethod:123 Stage customization METHOD otopi.plugins.ovirt_host_deploy.vdsm.vdsmid.Plugin._detect_id
2014-01-19 04:18:28 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.vdsmid plugin.executeRaw:366 execute: ('/usr/sbin/dmidecode', '-s', 'system-uuid'), executable='None', cwd='None', env=None
2014-01-19 04:18:28 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.vdsmid plugin.executeRaw:383 execute-result: ('/usr/sbin/dmidecode', '-s', 'system-uuid'), rc=0
2014-01-19 04:18:28 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.vdsmid plugin.execute:441 execute-output: ('/usr/sbin/dmidecode', '-s', 'system-uuid') stdout:
288B454F-5550-47D7-8C91-43BF66549647

2014-01-19 04:18:28 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.vdsmid plugin.execute:446 execute-output: ('/usr/sbin/dmidecode', '-s', 'system-uuid') stderr:
</snip>

I see from the above log that the system UUID is not retrievable. But executing `dmidecode -s system-uuid` now gives the system UUID:

[Mon Jan 20 15:30:53 UTC 2014 root.37.86:~ ] # dmidecode -s system-uuid
288B454F-5550-47D7-8C91-43BF66549647
Created attachment 852706 [details]
ovirt host deploy log from RHEVM

Attached the ovirt host deploy log from RHEVM.
AFAICT, the vdsm-tool does not exist downstream... not sure why it still contributes to the error path.
Please attach the sos report and let us know the VDSM version.
(In reply to Timothy Asir from comment #11)
> Please attach the sos report and let us know the VDSM version.

VDSM version is 4.13.0.

[root@rhss1 vdsm]# rpm -qi vdsm
Name        : vdsm                        Relocations: (not relocatable)
Version     : 4.13.0                           Vendor: Red Hat, Inc.
Release     : 24.el6rhs                    Build Date: Wed 15 Jan 2014 05:52:00 AM EST
Install Date: Tue 21 Jan 2014 10:03:52 AM EST   Build Host: x86-028.build.eng.bos.redhat.com
Group       : Applications/System          Source RPM: vdsm-4.13.0-24.el6rhs.src.rpm
Size        : 2944289                         License: GPLv2+
Signature   : RSA/8, Thu 16 Jan 2014 03:18:29 AM EST, Key ID 199e2f91fd431d51
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://www.ovirt.org/wiki/Vdsm
Summary     : Virtual Desktop Server Manager
Description :
The VDSM service is required by a Virtualization Manager to manage the
Linux hosts. VDSM manages and monitors the host's storage, memory and
networks as well as virtual machine creation, other host administration
tasks, statistics gathering, and log collection.

The sosreport from the RHSS Node is already attached to this bug; see comment 1.
(In reply to Dan Kenigsberg from comment #5)
> 
> If this is indeed the case, `genuuid > /etc/vdsm/vdsm.id` would solve the
> issue for this host.

1. I installed a new RHSS Node using the RHSS RC1 ISO.
2. Created /etc/vdsm/vdsm-id as follows:
   uuidgen > /etc/vdsm/vdsm-id
3. Added the RHSS Node to RHEVM; no crash reports were observed on the RHSS Node.

I repeated the test, and this time simply touched the file /etc/vdsm/vdsm-id; even then the crash report was not observed.

From the ovirt host deploy logs it is very evident that the engine looks for this file, /etc/vdsm/vdsm-id, and if it is not available, it tries to generate it. That is what causes the crash report.
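For reference, a minimal sketch of the lookup order described above (this is an illustration only, not the actual vdsm code; the helper name is hypothetical and the file path is the one discussed in this bug): read the cached ID file first, and only fall back to dmidecode when it is missing, which is why pre-creating or touching the file avoids the failing code path.

    import os
    import subprocess

    VDSM_ID_FILE = '/etc/vdsm/vdsm.id'  # path as discussed in this bug

    def get_host_uuid_sketch():
        # If the cached ID file exists, use it and never touch dmidecode.
        if os.path.exists(VDSM_ID_FILE):
            with open(VDSM_ID_FILE) as f:
                return f.read().strip()
        # Otherwise fall back to querying the hardware UUID from DMI
        # (requires dmidecode to be installed and readable by the caller).
        p = subprocess.Popen(['dmidecode', '-s', 'system-uuid'],
                             stdout=subprocess.PIPE)
        out, _ = p.communicate()
        return out.strip() or None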
Hello,

Can you please try to recreate that manually?

# rm /etc/vdsm/vdsm-id
# vdsm-tool vdsm-id

I would like to know the output of this command; hopefully it will cause the same abort response.

Thanks!
(In reply to Alon Bar-Lev from comment #14)
> Can you please try out to recreate that manually?
> 
> # rm /etc/vdsm/vdsm-id
> # vdsm-tool vdsm-id
> 
> I would like to know what the output of this command, hopefully it will
> cause the same abort response.

Alon,

I tried as you mentioned, and there were no problems at all.

[root@rhss4 ~]# ls /etc/vdsm/
logger.conf  mom.conf  mom.d/  svdsm.logger.conf

[root@rhss4 ~]# vdsm-tool vdsm-id
33ED04E7-1DCA-4297-8C1B-229C48C456E6

There were no abrt responses even after this.
(In reply to SATHEESARAN from comment #15)
> Alon,
> 
> I tried as you mentioned, and there were no problems at all
> 
> [root@rhss4 ~]# ls /etc/vdsm/
> logger.conf  mom.conf  mom.d/  svdsm.logger.conf
> 
> [root@rhss4 ~]# vdsm-tool vdsm-id
> 33ED04E7-1DCA-4297-8C1B-229C48C456E6
> 
> There were no abrt responses even after this

Thank you!

And just to confirm: if you try to add this host in this state via the engine, do you get the abrt response?
(In reply to Alon Bar-Lev from comment #16)
> Thank you!
> 
> And just to confirm, if you try to add this host in this state via engine
> you do get abrt response?

Yes, after adding the host to RHEVM I see the crash report from abrt.

Steps I did:

1. Removed /etc/vdsm/vdsm.id
   [root@rhss1 ~]# rm -rf /etc/vdsm/vdsm.id

2. Executed 'vdsm-tool vdsm-id'
   [root@rhss1 ~]# vdsm-tool vdsm-id
   D22D99D9-B1F0-4E5E-B2A4-6F4AC6D43F24[root@rhss1 ~]#

   Note: the /etc/vdsm/vdsm.id file was not generated after this.

3. Added this RHSS Node to a gluster-enabled cluster in a glusterfs DC (3.3 compatible).

I see the crash report getting generated by abrt as below:

Message 2:
From user.eng.blr.redhat.com  Wed Jan 22 09:35:09 2014
Return-Path: <user.eng.blr.redhat.com>
X-Original-To: root@localhost
Delivered-To: root.eng.blr.redhat.com
Date: Wed, 22 Jan 2014 09:35:09 -0500
From: user.eng.blr.redhat.com
To: root.eng.blr.redhat.com
Subject: [abrt] full crash report
User-Agent: Heirloom mailx 12.4 7/29/08
Content-Type: text/plain; charset=us-ascii
Status: R

abrt_version:   2.0.8
cmdline:        /usr/bin/python /usr/bin/vdsm-tool vdsm-id
executable:     /usr/bin/vdsm-tool
kernel:         2.6.32-358.28.1.el6.x86_64
time:           Wed 22 Jan 2014 09:35:02 AM EST
uid:            0
username:       root

sosreport.tar.xz: Binary file, 374904 bytes

backtrace:
:vdsm-id.py:32:getUUID:RuntimeError: Cannot retrieve host UUID
:
:Traceback (most recent call last):
:  File "/usr/bin/vdsm-tool", line 143, in <module>
:    sys.exit(main())
:  File "/usr/bin/vdsm-tool", line 140, in main
:    return tool_command[cmd]["command"](*args[1:])
:  File "/usr/lib64/python2.6/site-packages/vdsm/tool/vdsm-id.py", line 32, in getUUID
:    raise RuntimeError('Cannot retrieve host UUID')
:RuntimeError: Cannot retrieve host UUID
:
:Local variables in innermost frame:
:hostUUID: None
For the purpose of double-checking, I ran the above test with an earlier build of RHSS from just before RC, RHSS-2.1-20140106.n.0, located at:
http://download.eng.bos.redhat.com/composes/nightly/RHSS-2.1-20140106.n.0/2.1/RHS/x86_64/iso/RHSS-2.1-20140106.n.0-RHS-x86_64-DVD1.iso

This issue was not seen there.
(In reply to SATHEESARAN from comment #18)
> For the purpose of double-checking, I did the above test with earlier build
> of RHSS just before RC, RHSS-2.1-20140106.n.0
> 
> And this issue was not seen

Right; detecting the UUID before host-deploy is a recent change.
(In reply to SATHEESARAN from comment #17)
> Yes, after adding to RHEVM I see crash report from abrt
> 
> Steps I did was :
> 1. Removed /etc/vdsm/vdsm-id
> [root@rhss1 ~]# rm -rf /etc/vdsm/vdsm.id
> 
> 2. Executed 'vdsm-tool vdsm-id'
> [root@rhss1 ~]# vdsm-tool vdsm-id
> D22D99D9-B1F0-4E5E-B2A4-6F4AC6D43F24[root@rhss1 ~]#
> 
> Note: /etc/vdsm/vdsm-id file was not generated after this
> 
> 3. Added this RHSS Node to gluster enabled cluster in glusterfs DC (3.3
> compatible)
> 
> I see the crash report getting generated by abrt as below :

This is strange!

Let's try one more thing: make sure you do not have /etc/vdsm/vdsm.id. Then try to execute this command remotely using ssh:

$ ssh root@host vdsm-tool vdsm-id

This is actually what the engine does. I do not expect any change, but I am trying to understand in which case we get this abrt.
> This is strange!
> 
> Let's try one more thing.. make sure you do not have /etc/vdsm/vdsm.id.
> 
> Then try to execute this command remotely using ssh:
> 
> $ ssh root@host vdsm-tool vdsm-id
> 
> This is what the engine is doing actually, I do not expect any change... but
> I try to understand in which case we have this abrt.

Tried as you mentioned, and this time there is a crash report generated by ABRT.

The command was executed from a remote machine:

[Wed Jan 22 10:11:27 UTC 2014 satheesaran@unused:~ ] # ssh root.37.86 vdsm-tool vdsm-id
root.37.86's password:
WARNING:root:Could not find host UUID.
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 143, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 140, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/vdsm-id.py", line 32, in getUUID
    raise RuntimeError('Cannot retrieve host UUID')
RuntimeError: Cannot retrieve host UUID

The same crash report as in comment 17 is seen.
(In reply to SATHEESARAN from comment #21)
> Tried as you mentioned and this time there is a crash report generated by
> ABRT
> 
> This command executed from remote machine
> [Wed Jan 22 10:11:27 UTC 2014 satheesaran@unused:~ ] # ssh root.37.86
> vdsm-tool vdsm-id
> root.37.86's password:
> WARNING:root:Could not find host UUID.
> RuntimeError: Cannot retrieve host UUID
> 
> The same crash report as in comment17 is seen

Great!

I guess it is selinux-related(?). Anything in the other logs?

Also, I would appreciate it if you could attach the output of:

$ ssh root.37.86 strace /usr/bin/python /usr/bin/vdsm-tool vdsm-id
patch sent to upstream: http://gerrit.ovirt.org/#/c/23551/
(In reply to Timothy Asir from comment #23)
> patch sent to upstream: http://gerrit.ovirt.org/#/c/23551/

First we need to understand why we cannot get the UUID when running remotely.
Created attachment 853831 [details]
output of strace command

Attached the output of strace for comment 22.
(In reply to Alon Bar-Lev from comment #22)
> Great!!!!!
> 
> I guess it is selinux related(?)? anything at other logs?
> 
> Also, I appreciate if you can attach the output of:
> 
> $ ssh root.37.86 strace /usr/bin/python /usr/bin/vdsm-tool vdsm-id

Attached the output of the above command as in comment 22.
""" sudo: sorry, you must have a tty """ yaniv, vdsm should not attempt to run anything using sudo if already at root, check out utils.py::getHostUUID. but if we require that, the sudoers file should be modified to support working without tty to root as well. thanks,
(In reply to Alon Bar-Lev from comment #24)
> (In reply to Timothy Asir from comment #23)
> > patch sent to upstream: http://gerrit.ovirt.org/#/c/23551/
> 
> first we need to understand why at remote we cannot get the uuid.

Since the vdsm.id file is unavailable, vdsm-tool tries to retrieve the UUID by executing dmidecode with "sudo", which requires a tty to be allocated.
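For illustration, a minimal sketch of the guard suggested in comment 27 (this is not the actual utils.py::getHostUUID implementation; the function name is hypothetical): call dmidecode directly when already running as root, so sudo, and therefore its requiretty check, is never involved.

    import os
    import subprocess

    def read_dmi_uuid_sketch():
        # When already root, run dmidecode directly: no sudo means no
        # dependency on a controlling terminal (sudoers 'requiretty').
        cmd = ['/usr/sbin/dmidecode', '-s', 'system-uuid']
        if os.geteuid() != 0:
            # Only unprivileged callers need to go through sudo.
            cmd = ['sudo'] + cmd
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        out, _ = p.communicate()
        return out.strip() or None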
Running the command with a tty allocated provides the UUID:

ssh -t root.37.86 /usr/bin/python /usr/bin/vdsm-tool vdsm-id
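A quick way to see why the -t flag makes the difference: a plain "ssh host command" runs the remote command without a pseudo-tty, so sudo's requiretty check fails, while "ssh -t" allocates one. A minimal check (the script name is hypothetical):

    import sys

    # Prints False when run as "ssh host python check_tty.py" (stdin is a
    # pipe, no pty allocated); prints True when run via "ssh -t host ..."
    # or from an interactive shell.
    print(sys.stdin.isatty())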
Moving the bug out of Corbett. Marking it for documentation.
The update is unnecessary; bug 1056554 handles the vdsm part. The attached patch fixes the vdsm-id verb to allow running it remotely as root.

Reverting the bug changes and omitting the assignment; please update with the relevant fields.
Please review the edited Doc Text and sign off.
The doc text is fine.
Latest update from testing with RHEV 3.4 and RHS.

I have observed the following:

1. While adding a RHS 3.0 Node (glusterfs-3.6.0.25-1.el6rhs) to a gluster-enabled cluster in a data center (of 3.4 compatibility) in RHEVM 3.4, this issue was not seen.

2. While adding a RHS 2.1 U2 Node (after upgrading from the channel) to a gluster-enabled cluster in a data center (of 3.3 compatibility) in RHEVM 3.4, this issue is still seen (100% reproducible).

Below is the evidence:

1. Observation while adding the RHS 3.0 Node to a gluster-enabled cluster in a data center (of compatibility 3.4) in RHEVM 3.4:

[Thu Jul 31 18:20:25 UTC 2014 root@:~ ] # cat /etc/redhat-storage-release
Red Hat Storage Server 3.0

[Thu Jul 31 18:20:31 UTC 2014 root@:~ ] # rpm -qa | grep vdsm
vdsm-xmlrpc-4.14.7.2-1.el6rhs.noarch
vdsm-python-4.14.7.2-1.el6rhs.x86_64
vdsm-4.14.7.2-1.el6rhs.x86_64
vdsm-reg-4.14.7.2-1.el6rhs.noarch
vdsm-gluster-4.14.7.2-1.el6rhs.noarch
vdsm-python-zombiereaper-4.14.7.2-1.el6rhs.noarch
vdsm-cli-4.14.7.2-1.el6rhs.noarch

[Thu Jul 31 18:20:37 UTC 2014 root@:~ ] # rpm -qa | grep gluster
glusterfs-3.6.0.25-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.25-1.el6rhs.x86_64
gluster-nagios-common-0.1.3-2.el6rhs.noarch
gluster-nagios-addons-0.1.6-1.el6rhs.x86_64
samba-glusterfs-3.6.9-168.3.el6rhs.x86_64
glusterfs-api-3.6.0.25-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.25-1.el6rhs.x86_64
glusterfs-server-3.6.0.25-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.25-1.el6rhs.x86_64
vdsm-gluster-4.14.7.2-1.el6rhs.noarch
glusterfs-libs-3.6.0.25-1.el6rhs.x86_64
glusterfs-cli-3.6.0.25-1.el6rhs.x86_64

2. Observation while adding the RHS 2.1 U2 Node to a gluster-enabled cluster in a data center (of compatibility 3.3) in RHEVM 3.4:

[root@test-corbett ~]# mailx
Heirloom Mail version 12.4 7/29/08.  Type ? for help.
"/var/spool/mail/root": 3 messages 3 new
>N  1 user  Thu Jul 31 14:10   41/1428  "[abrt] full crash report"
 N  2 user  Thu Jul 31 14:10  171/8653  "[abrt] full crash report"
 N  3 user  Thu Jul 31 14:10  171/8653  "[abrt] full crash report"
& q
Held 3 messages in /var/spool/mail/root

[root@test-corbett ~]# rpm -qa | grep vdsm
vdsm-gluster-4.13.0-24.el6rhs.noarch
vdsm-cli-4.13.0-24.el6rhs.noarch
vdsm-python-4.13.0-24.el6rhs.x86_64
vdsm-xmlrpc-4.13.0-24.el6rhs.noarch
vdsm-4.13.0-24.el6rhs.x86_64
vdsm-reg-4.13.0-24.el6rhs.noarch
vdsm-python-cpopen-4.13.0-24.el6rhs.x86_64

[root@test-corbett ~]# rpm -qa | grep gluster
vdsm-gluster-4.13.0-24.el6rhs.noarch
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64
gluster-swift-account-1.10.0-2.el6rhs.noarch
glusterfs-api-3.4.0.59rhs-1.el6rhs.x86_64
gluster-swift-plugin-1.10.0-5.el6rhs.noarch
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.59rhs-1.el6rhs.x86_64
gluster-swift-object-1.10.0-2.el6rhs.noarch
gluster-swift-container-1.10.0-2.el6rhs.noarch
samba-glusterfs-3.6.9-167.10.el6rhs.x86_64
gluster-swift-1.10.0-2.el6rhs.noarch
gluster-swift-proxy-1.10.0-2.el6rhs.noarch
glusterfs-rdma-3.4.0.59rhs-1.el6rhs.x86_64

[root@test-corbett ~]# cat /etc/redhat-storage-release
Red Hat Storage Server 2.1 Update 2
This issue has been fixed, post rebase to upstream oVirt 3.5, in RHGS release 3.1. Closing this bug. Please reopen a bug against version 3.1 if you encounter the issue again.