Description of problem:

pcsd fails to start on fedora 19 (minimal install);

====
Jun 15 12:39:44 pcmk1 systemd: Starting PCS GUI...
Jun 15 12:39:44 pcmk1 systemd: Started PCS GUI.
Jun 15 12:39:45 pcmk1 pcsd: Starting pcsd: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require': cannot load such file -- rpam_ext (LoadError)
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/gemhome/gems/rpam-ruby19-1.2.1/lib/rpam.rb:1:in `<top (required)>'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:110:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:110:in `rescue in require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:35:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/auth.rb:4:in `<top (required)>'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/pcsd.rb:11:in `<top (required)>'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/ssl.rb:47:in `<main>'
Jun 15 12:39:45 pcmk1 pcsd: [FAILED]
====

Version-Release number of selected component (if applicable):

pcs-0.9.44-4.fc19.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Install 'pacemaker corosync pcs'
2. Try to start 'pcsd', examine logs
3.

Actual results:

Fails to start.

Expected results:

Starts.

Additional info:

Updated the OS prior to testing.
Tried to manually install rpam gem, failed with:

====
[root@pcmk1 ~]# gem install rpam
Fetching: rpam-1.0.1.gem (100%)
Building native extensions. This could take a while...
ERROR: Error installing rpam:
        ERROR: Failed to build gem native extension.

    /usr/bin/ruby extconf.rb
mkmf.rb can't find header files for ruby at /usr/share/include/ruby.h

Gem files will remain installed in /usr/local/share/gems/gems/rpam-1.0.1 for inspection.
Results logged to /usr/local/share/gems/gems/rpam-1.0.1/ext/Rpam/gem_make.out
====

Installing 'gcc ruby-devel pam-devel' and 'gem install rpam' worked. Tried starting 'pcsd' and it appears to now work.

So, the pcsd package is missing the 'rpam' gem.
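For anyone wanting to confirm the same condition without starting pcsd, a minimal sketch of a load check follows. It only tests whether Ruby can require a given feature; `loadable?` is a hypothetical helper, not anything shipped with pcs, and the feature names are taken from the traceback above.

```ruby
# Sketch only: report whether a Ruby feature (library or native extension)
# can be loaded. `loadable?` is a hypothetical helper, not part of pcsd.
def loadable?(feature)
  require feature
  true
rescue LoadError
  # Raised when the file (e.g. a compiled extension) cannot be found.
  false
end

if __FILE__ == $PROGRAM_NAME
  # 'rpam_ext' is the name from the LoadError in the pcsd log;
  # 'rpam' is the wrapper library that requires it.
  %w[rpam_ext rpam].each do |feature|
    state = loadable?(feature) ? 'loadable' : 'cannot load'
    puts "#{feature}: #{state}"
  end
end
```

To approximate what pcsd sees, this could be run with GEM_HOME pointing at /usr/lib/pcsd/gemhome (the directory shown in the traceback). That pcsd sets its gem path this way is an assumption on my part, not something verified here.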
This does not appear to actually make pcsd work though...

====
[root@pcmk1 ~]# systemctl status pcsd.service
pcsd.service - PCS GUI
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled)
   Active: active (running) since Sat 2013-06-15 13:27:24 NDT; 25s ago
 Main PID: 226 (pcsd)
   CGroup: name=systemd:/system/pcsd.service
           ├─226 /bin/sh /usr/lib/pcsd/pcsd start
           ├─259 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr...
           └─260 /usr/bin/ruby-mri -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

[root@pcmk1 ~]# pcs cluster auth pcmk1.alteeve.ca pcmk2.alteeve.ca
Username: hacluster
Password:
Error: unable to connect to pcsd on pcmk1.alteeve.ca
Error connecting to pcmk1.alteeve.ca - (HTTP error: 500)

[root@pcmk1 ~]# pcs cluster auth pcmk1 pcmk2
Username: hacluster
Password:
Error: unable to connect to pcsd on pcmk1
Error connecting to pcmk1 - (HTTP error: 500)

[root@pcmk1 ~]# uname -n
pcmk1.alteeve.ca
[root@pcmk1 ~]# hostname
pcmk1.alteeve.ca
====

I set and reset the 'hacluster' user's password and tried both short and long names from both nodes. Neither would authenticate.
It looks like the daemon is not listening at all; I did a --local setup and manually pushed out corosync.conf. Trying to start it via pcs failed;

====
[root@pcmk1 ~]# pcs cluster start --all
Unable to authenticate to pcmk1.alteeve.ca - (HTTP error: 401)
Unable to authenticate to pcmk2.alteeve.ca - (HTTP error: 401)
====

I was able to start it locally on each node though.

====
[root@pcmk1 ~]# pcs cluster start
Starting Cluster...
[root@pcmk1 ~]# pcs status
Cluster name: an-cluster-03
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sat Jun 15 13:39:02 2013
Last change: Sat Jun 15 13:38:45 2013 via crmd on pcmk1.alteeve.ca
Current DC: NONE
2 Nodes configured, unknown expected votes
0 Resources configured.

Node pcmk1.alteeve.ca (1): UNCLEAN (offline)
Node pcmk2.alteeve.ca (2): UNCLEAN (offline)

Full list of resources:

[root@pcmk1 ~]# corosync-c
corosync-cfgtool corosync-cmapctl corosync-cpgtool
[root@pcmk1 ~]# corosync-c
corosync-cfgtool corosync-cmapctl corosync-cpgtool
[root@pcmk1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.122.11
        status  = ring 0 active with no faults
====
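The guess that the daemon is not listening could be checked directly with a plain TCP connect test. The sketch below is one way to do that; `port_open?` is a hypothetical helper, and port 2224 is assumed to be pcsd's listening port (it is not stated anywhere in this report).

```ruby
require 'socket'

# Sketch only: true if a TCP connection to host:port succeeds within
# `timeout` seconds. Any connection error (refused, timed out, name
# resolution failure) is treated as "not listening".
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

if __FILE__ == $PROGRAM_NAME
  # ASSUMPTION: 2224 is the pcsd port; adjust if configured differently.
  %w[pcmk1.alteeve.ca pcmk2.alteeve.ca].each do |node|
    state = port_open?(node, 2224) ? 'listening' : 'not reachable'
    puts "#{node}:2224 #{state}"
  end
end
```

A successful connect only shows that something accepts connections on that port; it says nothing about whether authentication would succeed.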
Following up on my conversation with feist on #linux-cluster: I wiped my test nodes and reinstalled fresh, then installed 'pcs-0.9.44-5.fc19.x86_64.rpm' from koji and tested. The issue seems to be resolved now. I will leave it to feist to close this bug in case he wants to append any data before doing so.
*** Bug 975015 has been marked as a duplicate of this bug. ***
*** Bug 973786 has been marked as a duplicate of this bug. ***