Bug 1313770 - "Initialize" task is not started for few hosts sometimes
"Initialize" task is not started for few hosts sometimes
Status: CLOSED WONTFIX
Product: Red Hat Storage Console
Classification: Red Hat
Component: core (Show other bugs)
2
Unspecified Unspecified
unspecified Severity high
: ---
: 3
Assigned To: Nishanth Thomas
sds-qe-bugs
: Reopened
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-03-02 05:23 EST by Martin Bukatovic
Modified: 2016-08-03 09:51 EDT (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-17 06:18:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
screenshot of Hosts page (initializing circle is shown there) (34.88 KB, image/png)
2016-03-02 05:23 EST, Martin Bukatovic
no flags Details
screenshot of Taks List page (142.12 KB, image/png)
2016-05-10 11:47 EDT, Martin Bukatovic
no flags Details
screenshot of Hosts page (156.94 KB, image/png)
2016-05-10 11:49 EDT, Martin Bukatovic
no flags Details

  None (edit)
Description Martin Bukatovic 2016-03-02 05:23:41 EST
Created attachment 1132231 [details]
screenshot of Hosts page (initializing circle is shown there)

Description of problem
======================

When I run "accept all" task, the "Accept Node" tasks all finished, but
the "Initialize node" tasks were not started adn when I look at Hosts page,
I see all nodes stuck in "Initializing .." state.

Version-Release number of selected component
============================================

rhscon-ceph-0.0.6-8.el7.x86_64
rhscon-core-0.0.8-7.el7.x86_64
rhscon-ui-0.0.16-1.el7.noarch

rhscon-agent-0.0.3-2.el7.noarch

How reproducible
================

5 % (I hit this only once)

Steps to Reproduce
==================

1. Install USM according to the documentation
2. Accept all nodes
3. Check Tasks and Hosts pages

Actual results
==============

Tasks page shows all "Accept Node" tasks as finished, but doesn't show any
"Initialize" task.

Hosts page shows all nodes in "Initializing ..." phase (the state icon is
spinning in a circle).

Expected results
================

Initialize tasks are actually started and should finish with a success.

In case of a problem, admin can kill stuck task. Moreover some timeout should
be introduced.

Additional info
===============

I let it run in this state for few days, but nothing changed. All nodes
were still "Initializing ...".

Based on a discussing with developers, some solution (timeouts, ability to
reinitialize node) has been already implemented or is coming soon. This BZ
is here for QE team to test this use case properly.
Comment 2 Nishanth Thomas 2016-03-10 05:08:32 EST
Is it reproducible? We are not seeing this issue.
Comment 9 Martin Bukatovic 2016-05-10 11:33:53 EDT
I have just reproduced this issue again with the current build:

ceph storage machines:

 * rhscon-agent-0.0.6-1.el7scon.noarch

RHSC 2.0 machine:

 * ceph-ansible-1.0.5-5.el7scon.noarch
 * ceph-installer-1.0.6-1.el7scon.noarch
 * rhscon-ceph-0.0.13-1.el7scon.x86_64
 * rhscon-core-0.0.16-1.el7scon.x86_64
 * rhscon-ui-0.0.29-1.el7scon.noarch

This time, *the outcome is not as bad* as it was when I hit this issue for
the first time, but it's bad enough to reopen this bug for further
investigation and tracking.

Here is updated *Actual results* section:

Tasks page shows all "Accept Node" tasks as finished, but doesn't show
"Initialize" task for 3 hosts (out of 7 hosts in total).
First, the Hosts page showed these 3 hosts in "Initializing ..." phase 
(the state icon is spinning in a circle), even though there was no "Initialize"
task for them. After some time, the state changed from "Iinitializing ..." into
"Failed" and the icon changed to a red cross in a circle. So the timeout
feature (as Nishanth points out in comment 8) works and the node no longer
remains in the "initializing ..." state forewer.

So the main difference and a good news are:

1) the timeout feature actually works and moved the Host into "Failed" state
2) there is a "Reinitialize" action available in the left popup menu

The bad news are:

1) The root cause of this issue remains: sometimes the Initialize task is not
   properly started for a Host so that it's not visible in the Tasks list,
   which is bad as the user is can miss the issue since there is no error
   neither in the Tasks or Events list.
2) While there is a "Reinitialize" action available, when I click on it,
   nothing happens with the node, so it's not possible to actually reinitialize
   the node from the UI.

Additional problems:

When I try to reinitialize failed Host via "Reinitialize" button, I can find
an entry about this in the Evens list, but it has:

 * is classified as "Info" event, even though the message states:
   Failed to reinitialize node: dhcp-126-79.test.example.com
   (I would expect to see something like a "Fail" or "Error" as a type of this
   Event)
 * no details are available,
Comment 10 Martin Bukatovic 2016-05-10 11:38:07 EDT
Digging deeper, I see the following error in salt minion log on one machine
which failed to initialize:

```
2016-05-10 14:43:20,229 [salt.crypt       ][INFO    ][1108] Waiting 10 seconds before retry.
2016-05-10 14:43:30,251 [salt.crypt       ][DEBUG   ][1108] Decrypting the current master AES key
2016-05-10 14:43:30,251 [salt.crypt       ][DEBUG   ][1108] Loaded minion key: /etc/salt/pki/minion/minion.pem
2016-05-10 14:43:30,261 [salt.crypt       ][ERROR   ][1108] The master failed to decrypt the random minion token
2016-05-10 14:43:30,261 [salt.crypt       ][CRITICAL][1108] The Salt Master server's public key did not authenticate!
The master may need to be updated if it is a version of Salt lower than 2015.5.5, or
If you are confident that you are connecting to a valid Salt Master, then remove the master public key and restart the Salt Minion.
The master public key can be found at:                                          
/etc/salt/pki/minion/minion_master.pub                                          
2016-05-10 14:43:30,262 [salt.cli.daemons ][INFO    ][1108] The salt minion is shut down
```
Comment 11 Martin Bukatovic 2016-05-10 11:47 EDT
Created attachment 1155784 [details]
screenshot of Taks List page

Attaching new screenshots: rhsc.bz-1313770.01.png shows:

There are only 4 "Initiazlie Note" tasks in the Taks list, while there are
7 machines in total.
Comment 12 Martin Bukatovic 2016-05-10 11:49 EDT
Created attachment 1155785 [details]
screenshot of Hosts page

Attaching new screenshots: rhsc.bz-1313770.04.png shows some nodes in
a failed state
Comment 15 Martin Bukatovic 2016-05-11 12:04:55 EDT
This seems to be related to the work flow I use when maintaining my
virtual machines:

1) deploy the qe machines via jenkins (creates machines, installs usm packages)
2) shutdown all the machines
3) create snapshot
4) start the machines again
5) start the testing: accept hosts, ...
Comment 16 Martin Bukatovic 2016-05-13 10:02:59 EDT
Additional information extending comment 10:

The error message in /var/log/salt/minion on affected machine looks like
I need to delete a salt master public key there to retry, but the key is
the same on all hosts (no matter if it's Initialize task was created or not):

~~~
$ ansible -i mbukatov-usm3.hosts usm_nodes -m shell -a 'md5sum /etc/salt/pki/minion/minion_master.pub'
mbukatov-usm3-mon1.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub

mbukatov-usm3-node4.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub

mbukatov-usm3-node2.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub

mbukatov-usm3-mon2.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub

mbukatov-usm3-node1.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub

mbukatov-usm3-node3.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub

mbukatov-usm3-mon3.usersys.redhat.com | SUCCESS | rc=0 >>
0c86e018c85368edcdd69faaa8995621  /etc/salt/pki/minion/minion_master.pub
~~~

That said, I'm able to fix the problem on affected host by:

 * restarting salt-minion there
 * trying to "Reinitialize" from the usm web ui

This time, the "initialize Node" task for the machine is actually created
and succeeds.
Comment 17 Nishanth Thomas 2016-05-24 01:26:43 EDT
This looks more of an issue with your infrastructure. If the minion or master has an old key cached, the the salt-communication will not be up(even though the node gets accepted) for communication and the event will not be emitted by salt-master on which the initialization process starts.

As per 'Comment 15' I am not sure the impact of step 2,3, and 4. Are you starting machine from the snapshot?

In any case its a pre-requisite from the USM standpoint that the minions reached USM cleanly without any errors(as in comment 10) , before starting the acceptance process.
Comment 18 Martin Bukatovic 2016-05-27 04:11:33 EDT
(In reply to Nishanth Thomas from comment #17)
> This looks more of an issue with your infrastructure. If the minion or
> master has an old key cached, the the salt-communication will not be up(even
> though the node gets accepted) for communication and the event will not be
> emitted by salt-master on which the initialization process starts.

This looks like a corner case, that's true. But this is not a issue with
my infrastructure (see details below). Moreover when it happens, RHSC 2.0 is
unable to even notice and properly inform the user about problem, which leads
to many inconsistencies, which I still consider a valid issue.

> As per 'Comment 15' I am not sure the impact of step 2,3, and 4. Are you
> starting machine from the snapshot?

All I did (from salt or RHSC 2.0 point of view) was a restart of all machines
of the cluster (including usm server and ceph storage servers). The fact
that I did a snapshot of a turned off machine has no effect on the machine
itself, because this was done while the machines were powered down.

I did a little experiment here to check this:

Run "Accept All" on freshly provisioned cluster 3 times:

 * 1 node failed (initialize task was shown, so ok), the rest was ok
 * all node accepted with success (both accept and initialize tasks)
 * all node accepted with success (both accept and initialize tasks)

Run "Accept All" on freshly provisioned cluster which I have just rebooted
(I restarted all machines, including RHSC 2.0 server itself) 3 times:

 * 1 node stuck in "accepting" (reproducing this BZ), the rest was ok,
   this means that for the one stuck node:
    - the accept task succeeded
    - but there was no initialize task for this node
    - there was no event for this
    - salt-minion on that node shut itself off
 * 1 node stuck in "accepting" (reproducing this BZ), the rest was ok
 * 4 node stuck in "accepting" (reproducing this BZ), the rest was ok

Run "Accept All" on freshly provisioned cluster which I have just rebooted
(this time I restarted all machines of the cluster with the exception of
RHSC 2.0 server) 3 times:

 * 1 node stuck in "accepting" (reproducing this BZ), the rest was ok
 * 3 node stuck in "accepting" (reproducing this BZ), the rest was ok
 * 3 node stuck in "accepting" (reproducing this BZ), the rest was ok

This is clear evidence that rebooting nodes before running "Accept All"
increases the likelihood that this issues happens.

> In any case its a pre-requisite from the USM standpoint that the minions
> reached USM cleanly without any errors(as in comment 10) , before starting
> the acceptance process.

Ok, that's true and I don't object it here. 

On the other hand, I would expect that RHSC 2.0 would be able to *notice* that
the prerequisites are not fulfilled and to provide a report about this problem 
to the user in clear way.

Right now, I see that the node is stuck in "Accepting" state even though
the "Accept" task finished with success, there is no error/event reported
and "Initialize" task is missing without any explanation ...
Comment 19 Martin Bukatovic 2016-06-08 14:11:21 EDT
For the record: version of salt packages

On RHSC 2.0 machine:

salt-master-2015.5.5-1.el7.noarch
salt-2015.5.5-1.el7.noarch

On storage server machines:

salt-minion-2015.5.5-1.el7.noarch
salt-2015.5.5-1.el7.noarch
Comment 20 Martin Bukatovic 2016-06-08 14:25:20 EDT
Based on the request for the dev team, I rechecked the issue with the following
results.

01: initial state
=================

I have rebooted all machines which were just provisioned by our USM QE jenkins.

On USM server I see: 

~~~
# salt-key --finger-all
Local Keys:
master.pem:  96:da:8c:88:71:81:fd:af:66:ea:08:ac:f7:2b:41:78
master.pub:  58:da:a6:dd:ec:65:5a:67:80:00:00:f9:bb:94:b4:6e
Unaccepted Keys:
dhcp-126-79.lab.eng.brq.redhat.com:  54:d5:be:1a:e2:85:14:32:bb:f7:c9:ab:e0:03:68:3d
dhcp-126-80.lab.eng.brq.redhat.com:  b2:f8:03:36:22:24:04:49:5c:90:95:bb:77:98:ad:d8
dhcp-126-83.lab.eng.brq.redhat.com:  e3:46:f4:29:81:ac:1e:29:f4:4a:29:be:5a:c5:40:55
dhcp-126-81.lab.eng.brq.redhat.com:  b4:c0:31:84:74:81:e7:21:28:b7:dd:f2:1a:d4:1b:03
dhcp-126-82.lab.eng.brq.redhat.com:  3e:6f:85:a9:a0:ea:0c:c2:5c:0a:67:a8:44:1e:57:e4
dhcp-126-85.lab.eng.brq.redhat.com:  b0:88:da:20:6a:a1:3b:d3:df:6e:74:98:69:cd:f2:59
dhcp-126-84.lab.eng.brq.redhat.com:  ef:ec:48:47:c5:45:72:73:7b:a4:73:db:89:3d:ff:a8
~~~

And I have verified that master has cached the keys right. Every key salt
master stores in /etc/salt/pki/master/minions_pre/${host} matches particular
/etc/salt/pki/minion/minion.pub on the ${host}.


02: Running "Accept All"
========================

Then I start "Accept All" task via USM UI:

~~~
# salt-key --finger-all
Local Keys:
master.pem:  96:da:8c:88:71:81:fd:af:66:ea:08:ac:f7:2b:41:78
master.pub:  58:da:a6:dd:ec:65:5a:67:80:00:00:f9:bb:94:b4:6e
Accepted Keys:
dhcp-126-79.lab.eng.brq.redhat.com:  54:d5:be:1a:e2:85:14:32:bb:f7:c9:ab:e0:03:68:3d
dhcp-126-80.lab.eng.brq.redhat.com:  b2:f8:03:36:22:24:04:49:5c:90:95:bb:77:98:ad:d8
dhcp-126-83.lab.eng.brq.redhat.com:  e3:46:f4:29:81:ac:1e:29:f4:4a:29:be:5a:c5:40:55
dhcp-126-81.lab.eng.brq.redhat.com:  b4:c0:31:84:74:81:e7:21:28:b7:dd:f2:1a:d4:1b:03
dhcp-126-82.lab.eng.brq.redhat.com:  3e:6f:85:a9:a0:ea:0c:c2:5c:0a:67:a8:44:1e:57:e4
dhcp-126-85.lab.eng.brq.redhat.com:  b0:88:da:20:6a:a1:3b:d3:df:6e:74:98:69:cd:f2:59
dhcp-126-84.lab.eng.brq.redhat.com:  ef:ec:48:47:c5:45:72:73:7b:a4:73:db:89:3d:ff:a8
~~~

And everything looks good.

But in the meantime, I hit the issue and on 3 nodes, salt minion shutdown
itself:

~~~
# salt '*' test.ping
dhcp-126-82.lab.eng.brq.redhat.com:
    True
dhcp-126-79.lab.eng.brq.redhat.com:
    True
dhcp-126-83.lab.eng.brq.redhat.com:
    True
dhcp-126-85.lab.eng.brq.redhat.com:
    True
dhcp-126-84.lab.eng.brq.redhat.com:
    Minion did not return. [Not connected]
dhcp-126-81.lab.eng.brq.redhat.com:
    Minion did not return. [Not connected]
dhcp-126-80.lab.eng.brq.redhat.com:
    Minion did not return. [Not connected]
~~~

On one affected machine:

~~~
mbukatov-usm3-mon2.usersys.redhat.com | FAILED | rc=3 >>
● salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2016-06-08 19:21:17 CEST; 5min ago
  Process: 893 ExecStart=/usr/bin/salt-minion (code=exited, status=0/SUCCESS)
 Main PID: 893 (code=exited, status=0/SUCCESS)

Jun 08 19:21:07 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: [INFO    ] Waiting 10 seconds before retry.
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: [DEBUG   ] Decrypting the current master AES key
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: [DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: [ERROR   ] The master failed to decrypt the random minion token
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: [CRITICAL] The Salt Master server's public key did not authenticate!
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: The master may need to be updated if it is a version of Salt lower than 2015.5.5, or
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: If you are confident that you are connecting to a valid Salt Master, then remove the mast
er public key and restart the Salt Minion.
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: The master public key can be found at:
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: /etc/salt/pki/minion/minion_master.pub
Jun 08 19:21:17 dhcp-126-80.lab.eng.brq.redhat.com salt-minion[893]: [INFO    ] The salt minion is shut down
~~~

If I recheck the keys now, all keys are ok:

Every key salt master stores in /etc/salt/pki/master/minions_pre/${host}
matches particular /etc/salt/pki/minion/minion.pub on the ${host}.

Moreover /etc/salt/pki/minion/minion_master.pub file on each salt minion host
matches /etc/salt/pki/master/master.pub on salt master machine.

So it seems that it's not a problem with the keys.

03: Restarting salt-minions
===========================

So now I restart salt-minion on all storage nodes. And I see that:

1) salt-minion is up and running everywhere:

~~~
# salt '*' test.ping
dhcp-126-79.lab.eng.brq.redhat.com:
    True
dhcp-126-85.lab.eng.brq.redhat.com:
    True
dhcp-126-84.lab.eng.brq.redhat.com:
    True
dhcp-126-82.lab.eng.brq.redhat.com:
    True
dhcp-126-83.lab.eng.brq.redhat.com:
    True
dhcp-126-81.lab.eng.brq.redhat.com:
    True
dhcp-126-80.lab.eng.brq.redhat.com:
    True
~~~

2) All the keys are still ok (here I did the same checks as in previous step)

So it all works fine after all and no key was changed during this process.
Again, based on this it seems to me that it's not a problem with the keys.

04: Reinitialize
================

If I click on "reinitialize" from the Hosts page now, new Initialize task is
created and finished with success for all affected hosts.

So it all works after all. On affected storage servers (salt minion machines),
I just needed to:

 * restart salt-minion services
 * run "reinitialize" via USM web
Comment 22 Martin Bukatovic 2016-06-08 14:44:03 EDT
After closer examination, I noticed that while keys hasn't been changed,
configuration of affected nodes differs a bit compared to the nodes which
initialized successfully:

On salt minion machines which initialized with success, the following lines
were added into /etc/salt/minion.d/_schedule.conf file:

~~~
schedule:
  __mine_interval: {function: mine.update, jid_include: true, maxrunning: 2, minutes: 60}
~~~

While on the nodes which failed to initialize, file _schedule.conf hasn't been
changed at all.

Then after I restarted affected salt minion services, I noticed that
_schedule.conf file was changed in the same way as was previously done
on successful machines only.

So it looks like that for some reason, configuration randomly fails on some
salt minion machines when I reboot them before running "Accept All" via RHSC 2.0
Comment 24 Martin Bukatovic 2016-06-08 14:59:14 EDT
So based on my investigation, I see 2 problems here:

1) The root cause is missing reconfiguration of salt minion service
(changes of /etc/salt/minion.d/_schedule.conf).

Question: which component is responsible for this step?

2) RHSC 2.0 fails to notice the issue. And instead of presenting that Initialize
task failed to start immediately, it reports *nothing at all* as if the start
of Initialize task were delayed while having task spinner icon running until
it timeouts. This is not acceptable. Console should be able to detect the problem
immediately (which *should be possible* - the initialize task failed to start
after all).

I consider this 2nd point more severe. I just wasted lot of time debugging this
problem just because console *didn't properly report the problem*.

=> increasing severity to HIGH
Comment 25 Martin Bukatovic 2016-06-08 15:02:28 EDT
Question for the problem #1: which component is responsible for configuration
changes done in etc/salt/minion.d/_schedule.conf file?
Comment 26 Nishanth Thomas 2016-06-09 03:03:36 EDT
(In reply to Martin Bukatovic from comment #24)
> So based on my investigation, I see 2 problems here:
> 
> 1) The root cause is missing reconfiguration of salt minion service
> (changes of /etc/salt/minion.d/_schedule.conf).
> 
> Question: which component is responsible for this step?

RHSC is not managing this conf file. This is updated by salt itself after successful acceptance

> 
> 2) RHSC 2.0 fails to notice the issue. And instead of presenting that
> Initialize
> task failed to start immediately, it reports *nothing at all* as if the start
> of Initialize task were delayed while having task spinner icon running until
> it timeouts. This is not acceptable. Console should be able to detect the
> problem
> immediately (which *should be possible* - the initialize task failed to start
> after all).
> 

One thing to understand here is that RHSC start managing the node once it is accepted and initialized. There are two steps involved here

1. Accepting the node(just the keys) - local operation, happens locally at the USM server. acceptance will succeed even if the salt minion is down 

2. Once acceptance is done master tries to establish the communication to the minion. once the channel is up, master sends an event which is consumed by the USM to start the initialization.

> I consider this 2nd point more severe. I just wasted lot of time debugging
> this
> problem just because console *didn't properly report the problem*.

In this case the channel will never come up(as the salt minion is shutdown), USM keep waiting for the event and finally times out. USM or salt master can never detect this until and unless the the minion is accepted and initialized. 

What we could do is that we can generate an audit log(Info event) stating the the initialize failed once the node acceptance marked as failed

> 
> => increasing severity to HIGH
Comment 27 gowtham 2016-06-10 01:02:15 EDT
I had tried to reproduce the problem, But everything is works fine.

Steps followed:
 1.Installed RHS in fresh RHEL machine.
 2.Configured everything in fresh storage nodes.
 3.After curl command i have checked server whether it shows all unaccepted nodes with the help of salt-key -L command.
 4.Restarted all the storage nodes and sever.
 5.Run RHS as service.
 6.Accepted all the unmanaged nodes from UI.
 7.After acceptance nodes initialized successfully.
Comment 28 Nishanth Thomas 2016-06-17 06:18:19 EDT
Closing this bug as per the discussion and comment 26
Comment 29 Lubos Trilety 2016-08-03 09:49:55 EDT
The problem is still valid, reproduced on 
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-ui-0.0.51-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
salt-minion-2015.5.5-1.el7.noarch

# cat /var/log/salt/minion
...
2016-08-03 15:04:07,772 [salt.crypt       ][DEBUG   ][1001] Loaded minion key: /etc/salt/pki/minion/minion.pem
2016-08-03 15:04:07,793 [salt.crypt       ][ERROR   ][1001] The master failed to decrypt the random minion token
2016-08-03 15:04:07,794 [salt.crypt       ][CRITICAL][1001] The Salt Master server's public key did not authenticate!
The master may need to be updated if it is a version of Salt lower than 2015.5.5, or
If you are confident that you are connecting to a valid Salt Master, then remove the master public key and restart the Salt Minion.
The master public key can be found at:
/etc/salt/pki/minion/minion_master.pub
2016-08-03 15:04:07,794 [salt.cli.daemons ][INFO    ][1001] The salt minion is shut down
...

Time out on usm side without any knowledge where the issue is. At least some indication that salt master lost contact with salt minion could be helpful.

Note You need to log in before you can comment on or make changes to this bug.