Bug 1368914

Summary: [WALA] WALA process restarts after auto-update when the Azure server returns an internal error
Product: Red Hat Enterprise Linux 7
Reporter: Yuxin Sun <yuxisun>
Component: WALinuxAgent
Assignee: Yuxin Sun <yuxisun>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.3
CC: bihan, leiwang, wshi, yuxisun
Target Milestone: rc
Keywords: Extras, Tracking
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 10:32:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1387783

Description Yuxin Sun 2016-08-22 05:49:48 UTC
Github issue: https://github.com/Azure/WALinuxAgent/issues/360

Description of problem:
The process "python -u bin/WALinuxAgent-2.1.6-py2.7.egg -run-exthandlers" occasionally restarts unexpectedly after auto-update when it receives an HTTP response with code 500 from the Azure server.

Version-Release number of selected component (if applicable):
WALinuxAgent-2.1.5 (scratch build)

RHEL Version:
RHEL-7.3-20160811.0

How reproducible:
<10%

Steps to Reproduce:
1. Prepare a RHEL 7.3 VM on Azure. Enable WALA auto-update in /etc/waagent.conf:
# AutoUpdate.Enabled=y
# AutoUpdate.GAFamily=Prod
2. Restart the waagent service:
# systemctl restart waagent
3. Wait, then check /var/log/waagent.log.
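
Before waiting on a failure that reproduces less than 10% of the time, it can help to confirm that the two auto-update settings from step 1 are actually active in the config file. The following is only a minimal sketch: read_waagent_conf is a hypothetical helper (not part of WALinuxAgent), and it assumes the file uses plain key=value lines with "#" starting a comment, so a setting that is commented out in the file itself is reported as missing.

```python
def read_waagent_conf(text):
    """Parse key=value settings, ignoring blank lines and '#' comments.

    Hypothetical helper for illustration; not part of WALinuxAgent.
    """
    settings = {}
    for raw in text.splitlines():
        # Drop anything after '#', then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Sample content standing in for /etc/waagent.conf (assumed layout).
sample = """\
# Enable automatic agent updates
AutoUpdate.Enabled=y
AutoUpdate.GAFamily=Prod
"""

conf = read_waagent_conf(sample)
print(conf.get("AutoUpdate.Enabled"), conf.get("AutoUpdate.GAFamily"))
```

On a real VM the text would come from reading /etc/waagent.conf instead of the inline sample.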


Actual results:
An error is logged saying that WALinuxAgent unexpectedly restarted.

/var/log/waagent.log:

2016/08/17 17:47:32.408668 WARNING Initial upload failed [(000009)Failed to upload block blob: 500]
2016/08/17 17:47:32.423295 INFO getting API versions at [http://10.90.212.9:32526/versions]
2016/08/17 17:47:32.437673 WARNING Agent WALinuxAgent-2.1.6 failed with exception: must be convertible to a buffer, not VMStatus
2016/08/17 17:47:32.462253 WARNING Agent WALinuxAgent-2.1.6 launched with command 'python -u bin/WALinuxAgent-2.1.6-py2.7.egg -run-exthandlers' returned code: 1
2016/08/17 17:47:32.479908 INFO Determined Agent WALinuxAgent-2.1.6 to be the latest agent
2016/08/17 17:47:32.501114 INFO Agent WALinuxAgent-2.1.6 launched with command 'python -u bin/WALinuxAgent-2.1.6-py2.7.egg -run-exthandlers'
2016/08/17 17:47:32.608483 INFO Agent WALinuxAgent-2.1.6 is running as the goal state agent
2016/08/17 17:47:32.624739 INFO Wire server endpoint:10.90.212.9
2016/08/17 17:47:32.633603 INFO Event: name=WALinuxAgent-2.1.6, op=HeartBeat, message=
2016/08/17 17:47:32.643336 INFO Start env monitor service.
2016/08/17 17:47:32.649152 INFO Configure routes
2016/08/17 17:47:32.654530 INFO Gateway:None
2016/08/17 17:47:32.659004 INFO Routes:None
2016/08/17 17:47:32.684194 INFO WALinuxAgent-2.1.6 running as process 4294
2016/08/17 17:47:32.691101 INFO WALinuxAgent-2.1.6 unexpectedly restarted
2016/08/17 17:47:32.698225 ERROR Event: name=WALinuxAgent, op=Restart, message=WALinuxAgent-2.1.6 unexpectedly restarted
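
Because the restart is rare, scanning the log by hand is tedious. The sketch below shows one way to pull out each "op=Restart" ERROR event together with the WARNING lines logged shortly before it, so a preceding code 500 can be spotted. It assumes only the line layout visible in the excerpt above (timestamp, level, message); find_restarts and the window size are hypothetical choices for illustration, not part of WALinuxAgent.

```python
import re

# Layout assumed from the excerpt: "<date> <time.micros> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\w+) (.*)$")


def find_restarts(log_text, window=20):
    """Return (timestamp, preceding_warnings) for each ERROR restart event.

    'window' is how many parsed entries before the event are inspected.
    """
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groups())

    results = []
    for i, (ts, level, msg) in enumerate(entries):
        if level == "ERROR" and "op=Restart" in msg:
            recent = entries[max(0, i - window):i]
            warnings = [text for (_, lvl, text) in recent if lvl == "WARNING"]
            results.append((ts, warnings))
    return results


# Three lines copied from the log excerpt in this report.
sample = """\
2016/08/17 17:47:32.408668 WARNING Initial upload failed [(000009)Failed to upload block blob: 500]
2016/08/17 17:47:32.437673 WARNING Agent WALinuxAgent-2.1.6 failed with exception: must be convertible to a buffer, not VMStatus
2016/08/17 17:47:32.698225 ERROR Event: name=WALinuxAgent, op=Restart, message=WALinuxAgent-2.1.6 unexpectedly restarted
"""

restarts = find_restarts(sample)
for ts, warnings in restarts:
    print(ts)
    for text in warnings:
        print("  preceded by:", text)
```

Run against the full /var/log/waagent.log, this would list every unexpected restart and make it easy to see whether a failed upload with code 500 came first each time.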


Expected results:
The WALinuxAgent process does not restart even if the Azure server responds with code 500.


Additional info:
1. It seems that WALinuxAgent restarts every time there is a code 500, but the code 500 error is very hard to trigger.

Comment 2 Yuxin Sun 2016-11-04 10:30:10 UTC
The issue has not been seen again in WALA-2.2.0. Sanity testing was performed and passed.