Description of problem: No version #0 changeset is created for drift configurations created during a network outage, even after the outage is repaired (scenario #7)
How reproducible:
100%. Both jsanda and mfoley have reproduced this.
Steps to Reproduce:
1. There is a network partition. My configuration was server on Linux, and
agent on my PC ...connected by wifi.
2. I simulated network failure (I turned wifi off)
3. I created a Drift configuration while there was a network outage
4. I turned wifi back on...resolving the network failure
5. I do not observe a version #0 changeset for the new drift configuration
Actual results:
no version #0 changeset
Expected results:
a version #0 changeset
Created attachment 523604
This issue should be resolved now with changes introduced around error handling.
commit hash: 3f3397557aedabbd11420c022344776d08f76e2e
From the commit log...
This commit introduces several changes and a changed work flow to
address some boundary conditions that can arise when the server fails
to receive a change set report. The issue stems from the way we stream
the change set report to the server. Because the request is processed
asynchronously, we cannot know with certainty if/when errors arise in the
When DriftDetector runs, a new snapshot file is generated, and now a
copy of the previous version snapshot is maintained as well. After the
server processes the change set, it now sends an ack to the agent. This
lets the agent know that the change set was successfully persisted on
the server. The agent then cleans up, deleting the previous version
snapshot, and the change set zip file.
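
To illustrate the ack-driven cleanup described above, here is a minimal Python sketch. It is not the agent's actual code: the file names and the `on_ack` helper are hypothetical, and the real on-disk layout may differ. It only shows the idea that the previous snapshot copy and the change set zip are deleted once the server has acknowledged the change set, while the current snapshot stays in place.

```python
import tempfile
from pathlib import Path

# Hypothetical file names, chosen only for this sketch.
SNAPSHOT = "snapshot.current"
PREVIOUS = "snapshot.previous"
CHANGESET_ZIP = "changeset.zip"


def on_ack(drift_dir):
    """Clean up after the server acknowledges that it persisted the change set.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in (PREVIOUS, CHANGESET_ZIP):
        path = drift_dir / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed


# Usage: simulate the state on disk after a change set was sent.
work_dir = Path(tempfile.mkdtemp())
for name in (SNAPSHOT, PREVIOUS, CHANGESET_ZIP):
    (work_dir / name).write_text("placeholder")

removed_files = on_ack(work_dir)
```

Until `on_ack` runs, the previous snapshot copy stays on disk, which is what lets the agent notice later that the server may never have received the change set.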
If drift detection runs again before the agent receives the
acknowledgement, drift detection is skipped. The most likely scenario for
not receiving an acknowledgement would be a network error or a down
If any errors occur during drift detection, which includes sending the
change set to the server, the agent will attempt to revert back to the
previous version snapshot. This is to ensure we have a consistent
snapshot on disk with which to work.
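
The skip and revert rules above can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the agent's implementation: `run_drift_detection`, the `scan` and `send` callables, and the file names are all hypothetical, and the sketch assumes a current snapshot already exists (that is, a version greater than zero).

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical file names, chosen only for this sketch.
SNAPSHOT = "snapshot.current"
PREVIOUS = "snapshot.previous"


def run_drift_detection(drift_dir, scan, send):
    """Run one detection pass and return 'skipped', 'sent', or 'reverted'."""
    current = drift_dir / SNAPSHOT
    previous = drift_dir / PREVIOUS

    if previous.exists():
        # The last change set has not been acknowledged yet, so skip this run.
        return "skipped"

    # Keep a copy of the previous version snapshot until the server acks.
    shutil.copy2(current, previous)
    try:
        new_snapshot = scan()
        current.write_text(new_snapshot)
        send(new_snapshot)  # may fail, e.g. during a network outage
    except Exception:
        # Revert so that a consistent snapshot is left on disk.
        previous.replace(current)
        return "reverted"
    return "sent"


# Usage: a send that fails, as it would while the server is unreachable.
def failing_send(_snapshot):
    raise ConnectionError("server unreachable")


work_dir = Path(tempfile.mkdtemp())
(work_dir / SNAPSHOT).write_text("v1")
outcome = run_drift_detection(work_dir, lambda: "v2", failing_send)
```

In this sketch a failed send leaves the directory exactly as it was before the run, so the next detection pass starts from the last snapshot the server is known to have.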
This commit also fixes a bug in the drift inventory sync code. In
situations where there are existing change sets on the server, and the
agent has to fetch a snapshot from the server, the snapshot version was
getting set incorrectly. This is because the snapshot was not being
built correctly. Change sets were being applied out of order. This is
documenting the behavior here:
1) I did *not* get a version #0 changeset after repairing the outage
2) I did *not* get a version #0 changeset after repairing the outage and clicking the "detect now" button
3) It was only after a subsequent change was drift detected ... and this change was picked up as version #0
This is different from what I expected. I expected a version #0 changeset after the network outage was repaired and I clicked the "detect now" button.
jsanda ... can you clarify whether the behavior I am seeing is correct or not?
You definitely should get that initial change set at some point after the agent reconnects with the server. I think I see the problem. I forgot to handle the base case. For any version greater than zero, the agent checks to see if there is a copy of the previous snapshot before doing a drift scan. That previous snapshot only gets removed when the server acknowledges that it processed it successfully; so, its presence lets the agent know that something may have gone wrong. This would likely be the case during a network outage. When the agent and server reconnect, an inventory sync runs, and the agent will revert to the previous snapshot, which will trigger the agent to resend the change set to the server (assuming that drift is still present). I need to put similar logic in place for the initial change set.
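
As a rough illustration of the missing base case, here is a small Python sketch. It is only one possible shape for the check, not the fix that was committed: the function name and the `initial_ack_received` flag are hypothetical. The point is that version 0 has no previous snapshot whose presence could signal a lost change set, so the initial change set needs its own "not yet acknowledged" signal.

```python
def should_resend_on_sync(snapshot_version, previous_copy_exists,
                          initial_ack_received):
    """Decide whether the agent should resend a change set after reconnecting.

    All parameters are hypothetical stand-ins for state the agent would track.
    """
    if snapshot_version == 0:
        # Base case: no previous snapshot can exist for version 0, so the
        # ack for the initial change set has to be tracked explicitly.
        return not initial_ack_received
    # For later versions, a leftover previous snapshot copy means the server
    # never acknowledged the last change set.
    return previous_copy_exists


# Usage: a drift configuration created during the outage, never acked.
resend_initial = should_resend_on_sync(0, False, False)
```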
I have retested this and got the expected results. I think this was fixed some time ago and I just forgot to update the BZ.
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
marking VERIFIED BZs to CLOSED/CURRENTRELEASE