Bug 1209859 - [GSS] "Cluster Updates Are Stale. The Cluster isn't updating Calamari. Please contact Administrator".
Summary: [GSS] "Cluster Updates Are Stale. The Cluster isn't updating Calamari. Please...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 1.2.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 1.2.3
Assignee: Christina Meno
QA Contact: Warren
URL:
Whiteboard:
Depends On:
Blocks: 1214399
Reported: 2015-04-08 11:12 UTC by Vikhyat Umrao
Modified: 2019-06-13 08:22 UTC

Fixed In Version: calamari-server-1.2.3-13.el6cp, calamari-server-1.2.3-13.el7cp
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1214399 (view as bug list)
Environment:
Last Closed: 2015-04-16 14:36:30 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 1407943 | 0 | None | None | None | Never
Red Hat Product Errata | RHBA-2015:0842 | 0 | normal | SHIPPED_LIVE | Ceph Storage: calamari-server package bug-fix update | 2015-04-16 18:27:52 UTC

Description Vikhyat Umrao 2015-04-08 11:12:36 UTC
Description of problem:

One of our customers is getting the error "Cluster Updates Are Stale. The Cluster isn't updating Calamari. Please contact Administrator" after updating to Red Hat Ceph Storage 1.2.3.

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.2.3
calamari-server-1.2.3-11.el7cp.x86_64
calamari-clients-1.2.3-3.el7cp.x86_64

ceph-0.80.8-5.el7cp.x86_64                                
ceph-common-0.80.8-5.el7cp.x86_64                           
ceph-mon-0.80.8-5.el7cp.x86_64                           
ceph-osd-0.80.8-5.el7cp.x86_64      

How reproducible:
Always, for the customer

Comment 6 Christina Meno 2015-04-09 14:50:43 UTC
Vikhyat,

This is likely a failure of the agent to report in, caused by the upgrade.

For further diagnosis I would like to see the results of:
sudo salt-key -L

and:
sudo salt '*' ceph.get_heartbeats

as issued from a shell on the node where Calamari is running.

This will establish which nodes should be reporting and then verify that they are reporting.

If there are no results from the get_heartbeats command, ssh into any of the nodes listed under "Accepted Keys:" in the salt-key -L output

and run:
sudo tail -f /var/log/salt/minion

If you see a message like:
2015-04-09 07:07:14,565 [salt.crypt       ][CRITICAL] The Salt Master server's public key did not authenticate!
The master may need to be updated if it is a version of Salt lower than 2014.1.11, or
If you are confident that you are connecting to a valid Salt Master, then remove the master public key and restart the Salt Minion.
The master public key can be found at:
/etc/salt/pki/minion/minion_master.pub

It means that the salt-master has rotated its keys and that we need to remove the stale ones.

The process would be, for each node reporting to Calamari:
sudo rm /etc/salt/pki/minion/minion_master.pub; sudo service salt-minion restart
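
The log check and key reset above can be sketched as a small script to run on each minion node. This is only an illustration: the helper name has_stale_master_key and the --apply guard are assumptions made for this sketch, not part of Salt or Calamari. The log path, key path, and service command are the ones quoted above.

```shell
#!/bin/sh
# Sketch only. has_stale_master_key is a hypothetical helper name.
# Succeeds if the given minion log contains the "did not authenticate"
# message quoted above. Run as root so the log is readable.
has_stale_master_key() {
    # $1: path to the salt-minion log (defaults to the path quoted above)
    log="${1:-/var/log/salt/minion}"
    grep -q "The Salt Master server's public key did not authenticate" "$log"
}

# Act only when explicitly asked (--apply is an assumption of this sketch),
# so that sourcing this file has no side effects.
if [ "${1:-}" = "--apply" ]; then
    if has_stale_master_key; then
        # Same remediation as quoted above.
        sudo rm /etc/salt/pki/minion/minion_master.pub
        sudo service salt-minion restart
    else
        echo "stale master key message not found in the minion log"
    fi
fi
```

Without --apply the script only defines the helper, which makes it safe to try against a copy of a minion log first.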

Comment 12 Christina Meno 2015-04-10 15:44:56 UTC
OK, I have identified the fix.

I failed to back-port an upstream fix to harden the socket matching code in the 1.2.3 release.

Fix is upstream here:
https://github.com/ceph/calamari/pull/268

Comment 15 Christina Meno 2015-04-10 21:08:11 UTC
After updating the calamari-server package,
sudo salt '*' saltutil.sync_modules
must be run from the Calamari node.
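
Before running the sync, it may help to confirm that the updated package is actually installed. The sketch below checks a package string of the form printed by rpm -q against the "Fixed In Version" of this bug (release 13 of 1.2.3). The helper name has_fixed_calamari is an assumption made for this sketch, and the pattern deliberately handles only the 1.2.3 series.

```shell
# Sketch only. has_fixed_calamari is a hypothetical helper name.
# Succeeds if a calamari-server 1.2.3 package string carries
# release 13 or later, e.g. calamari-server-1.2.3-13.el7cp.x86_64.
has_fixed_calamari() {
    # Extract the numeric release that follows "1.2.3-".
    rel=$(printf '%s\n' "$1" |
        sed -n 's/^calamari-server-1\.2\.3-\([0-9][0-9]*\)\..*$/\1/p')
    # Fail if the string did not match or the release is older than 13.
    [ -n "$rel" ] && [ "$rel" -ge 13 ]
}
```

On the Calamari node this could be used as: has_fixed_calamari "$(rpm -q calamari-server)" && sudo salt '*' saltutil.sync_modules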

Comment 16 Tamil 2015-04-11 20:17:14 UTC
Tested on RHEL 7.1 and it looks good.

Comment 22 Christina Meno 2015-04-13 17:09:27 UTC
Vikhyat,

Please know that we discovered this issue while testing the fix.

https://bugzilla.redhat.com/show_bug.cgi?id=1211347

It has the potential to cause the fix to appear to not work.

If, having applied the fix for 1209859, the customer continues to see the error, check the date and time on the client machine.
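
One way to check the date and time is to compare Unix timestamps taken on the client machine and on the Calamari node. The linked bug is not described in this report, so the sketch below only shows how to measure the difference and does not claim what difference is acceptable. The helper name clock_skew_seconds is an assumption made for this sketch.

```shell
# Sketch only. clock_skew_seconds is a hypothetical helper name.
# Prints the absolute difference, in seconds, between two Unix
# timestamps such as those printed by `date +%s`.
clock_skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}
```

For example, run date +%s on both machines at about the same moment and pass the two values to clock_skew_seconds.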

Comment 23 Tamil 2015-04-13 22:56:55 UTC
Warren tested the fix on RHEL 6.6 as well.

Comment 24 Vikhyat Umrao 2015-04-14 06:27:07 UTC
(In reply to Gregory Meno from comment #22)
> Vikhyat,
> 
> Please know that we discovered this issue while testing the fix.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1211347
> 
> It has the potential to cause the fix to appear to not work.
> 
> If having applied the fix for 1209859 customer continues to see the error
> check date time on the client machine.

Thanks Greg !

Comment 29 errata-xmlrpc 2015-04-16 14:36:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:0842

