Bug 1368914 - [WALA] WALA process restart after auto-update when the Azure Server has internal error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: WALinuxAgent
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Yuxin Sun
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1387783
Reported: 2016-08-22 05:49 UTC by Yuxin Sun
Modified: 2019-02-26 20:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 10:32:57 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | Azure WALinuxAgent issues 360 | 0 | None | None | None | 2016-08-22 05:49:47 UTC

Description Yuxin Sun 2016-08-22 05:49:48 UTC
Github issue: https://github.com/Azure/WALinuxAgent/issues/360

Description of problem:
The process "python -u bin/WALinuxAgent-2.1.6-py2.7.egg -run-exthandlers" occasionally restarts unexpectedly after auto-update when it receives an HTTP response code 500 from the Azure server.

Version-Release number of selected component (if applicable):
WALinuxAgent-2.1.5 (scratch build)

RHEL Version:
RHEL-7.3-20160811.0

How reproducible:
<10%

Steps to Reproduce:
1. Prepare a RHEL 7.3 VM on Azure. Enable WALA auto-update in /etc/waagent.conf:
# AutoUpdate.Enabled=y
# AutoUpdate.GAFamily=Prod
2. Restart the waagent service:
# systemctl restart waagent
3. Wait, then check /var/log/waagent.log.
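
Step 3 can be partly automated. The following is a minimal sketch, not part of the original report, that scans log lines for the "unexpectedly restarted" message. The function name find_unexpected_restarts and the regular expression are illustrative assumptions based only on the log line layout shown in this report.

```python
import re

# Assumed layout of a waagent.log line, taken from the excerpt in this report:
# "YYYY/MM/DD HH:MM:SS.ffffff LEVEL message"
LOG_LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\w+) (.*)$")

# Marker text copied from the log excerpt in this report.
RESTART_MARKER = "unexpectedly restarted"


def find_unexpected_restarts(lines):
    """Return a (timestamp, level, message) tuple for each restart line."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and RESTART_MARKER in match.group(3):
            hits.append(match.groups())
    return hits
```

To use it on a VM, pass an open file handle for /var/log/waagent.log as the lines argument. An empty result means no restart was logged in that file.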


Actual results:
The log contains an error showing that WALinuxAgent restarted unexpectedly.

/var/log/waagent.log:

2016/08/17 17:47:32.408668 WARNING Initial upload failed [(000009)Failed to upload block blob: 500]
2016/08/17 17:47:32.423295 INFO getting API versions at [http://10.90.212.9:32526/versions]
2016/08/17 17:47:32.437673 WARNING Agent WALinuxAgent-2.1.6 failed with exception: must be convertible to a buffer, not VMStatus
2016/08/17 17:47:32.462253 WARNING Agent WALinuxAgent-2.1.6 launched with command 'python -u bin/WALinuxAgent-2.1.6-py2.7.egg -run-exthandlers' returned code: 1
2016/08/17 17:47:32.479908 INFO Determined Agent WALinuxAgent-2.1.6 to be the latest agent
2016/08/17 17:47:32.501114 INFO Agent WALinuxAgent-2.1.6 launched with command 'python -u bin/WALinuxAgent-2.1.6-py2.7.egg -run-exthandlers'
2016/08/17 17:47:32.608483 INFO Agent WALinuxAgent-2.1.6 is running as the goal state agent
2016/08/17 17:47:32.624739 INFO Wire server endpoint:10.90.212.9
2016/08/17 17:47:32.633603 INFO Event: name=WALinuxAgent-2.1.6, op=HeartBeat, message=
2016/08/17 17:47:32.643336 INFO Start env monitor service.
2016/08/17 17:47:32.649152 INFO Configure routes
2016/08/17 17:47:32.654530 INFO Gateway:None
2016/08/17 17:47:32.659004 INFO Routes:None
2016/08/17 17:47:32.684194 INFO WALinuxAgent-2.1.6 running as process 4294
2016/08/17 17:47:32.691101 INFO WALinuxAgent-2.1.6 unexpectedly restarted
2016/08/17 17:47:32.698225 ERROR Event: name=WALinuxAgent, op=Restart, message=WALinuxAgent-2.1.6 unexpectedly restarted


Expected results:
The WALinuxAgent process does not restart even if the Azure server responds with code 500.


Additional info:
1. It seems that WALinuxAgent restarts every time a code 500 is returned, but the code 500 error is very hard to trigger.
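
One way to check this observation against a collected log is to count how many code 500 upload failures are followed shortly by a restart message. The sketch below is an illustration only: the function names, the 5-second window, and the choice of marker strings (copied from the log excerpt above) are assumptions, not part of WALinuxAgent.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed from the log excerpt in this report.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"

# Marker strings copied from the log excerpt in this report.
FAILURE_MARKER = "Failed to upload block blob: 500"
RESTART_MARKER = "unexpectedly restarted"


def parse_time(line):
    """Parse the leading timestamp of a log line, or return None."""
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(parts[0] + " " + parts[1], TIME_FORMAT)
    except ValueError:
        return None


def count_500s_followed_by_restart(lines, window_seconds=5.0):
    """Return (total 500 failures, failures followed by a restart in the window)."""
    failures = []
    restarts = []
    for line in lines:
        when = parse_time(line)
        if when is None:
            continue  # skip lines without a recognizable timestamp
        if FAILURE_MARKER in line:
            failures.append(when)
        elif RESTART_MARKER in line:
            restarts.append(when)
    window = timedelta(seconds=window_seconds)  # hypothetical window size
    followed = sum(
        1
        for failed_at in failures
        if any(failed_at <= r <= failed_at + window for r in restarts)
    )
    return len(failures), followed
```

If the two returned numbers are equal for a log with several code 500 failures, that supports the observation. A lower second number would suggest that some failures do not lead to a restart.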

Comment 2 Yuxin Sun 2016-11-04 10:30:10 UTC
The issue has not been seen again in WALA-2.2.0. Sanity tests were run and passed.

