Bug 636080 - Reset network setup after test if necessary
Summary: Reset network setup after test if necessary
Keywords:
Status: CLOSED EOL
Alias: None
Product: Beaker
Classification: Retired
Component: beah
Version: 0.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact:
URL:
Whiteboard: Misc
Depends On: 872421
Blocks:
Reported: 2010-09-21 13:52 UTC by Marian Csontos
Modified: 2020-02-11 12:18 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-11 12:17:52 UTC
Embargoed:


Attachments:
draft script for review (2.00 KB, text/plain)
2012-11-02 03:02 UTC, Qixiang Wan

Description Marian Csontos 2010-09-21 13:52:40 UTC
Description of problem:
A task may leave the network in an unclean state, especially when it is ended by the local watchdog. In that case the recipe will be killed by the external watchdog.

This may be the result of a broken test or a demonstration of a bug. Without a working network, it won't be reported until it's too late and the recipe has already been killed.

To get some logs off the machine, do the following:

1. Network resuscitation:

If connectivity does not resume within some time, try to reset the network setup to the last known working configuration.

NOTE: Wait for the task to end, as the network may be misconfigured intentionally.

2. Write condition report to console

Include a configuration dump and a grep|tail of important log files.
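
A rough sketch of what such a condition report could look like, in Python. The function names, the keyword filter, and the fallback to stderr are all illustrative assumptions and not part of beah; the caller decides which configuration and log files to pass in.

```python
import sys
from collections import deque
from pathlib import Path


def tail_matching(path, keywords, max_lines=20):
    """Return the last `max_lines` lines of `path` that contain any keyword.

    This is the "grep|tail" part: filter first, then keep only the tail.
    """
    matches = deque(maxlen=max_lines)
    try:
        with open(path, errors="replace") as handle:
            for line in handle:
                if any(word in line for word in keywords):
                    matches.append(line.rstrip("\n"))
    except OSError as exc:
        return ["<could not read %s: %s>" % (path, exc)]
    return list(matches)


def build_report(config_files, log_files, keywords, max_lines=20):
    """Assemble a plain-text report: full config dumps, then filtered log tails."""
    out = []
    for path in config_files:
        out.append("==== config: %s ====" % path)
        try:
            out.append(Path(path).read_text(errors="replace").rstrip("\n"))
        except OSError as exc:
            out.append("<could not read %s: %s>" % (path, exc))
    for path in log_files:
        out.append("==== log (filtered tail): %s ====" % path)
        out.extend(tail_matching(path, keywords, max_lines))
    return "\n".join(out) + "\n"


def write_report(report, console="/dev/console"):
    """Write the report to the console device, falling back to stderr."""
    try:
        with open(console, "w") as handle:
            handle.write(report)
    except OSError:
        sys.stderr.write(report)
```

Unreadable files are reported inline rather than raising, since a half-broken system is exactly the situation in which the report is needed.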

Comment 1 Nick Coghlan 2012-10-17 04:35:09 UTC
Bulk reassignment of issues as Bill has moved to another team.

Comment 2 Nick Coghlan 2012-11-01 08:20:14 UTC
I'll be helping with #862518 tomorrow, so reassigning this one.

Current thoughts are that we can probably improve our handling of this situation by writing a (shell or Python 2.2 compatible) script (e.g. "10_restore_network") that runs when the local watchdog fires (see the section on custom scripts in http://git.beaker-project.org/cgit/rhts/tree/doc/README).

This script should check whether it can ping the lab controller. If that works, there's nothing to be done, since the test didn't break the network; it just died for some reason.
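
A minimal sketch of that reachability check, in Python. It assumes the Linux iputils `ping` (with its `-c` count and `-W` per-reply timeout options) and assumes the lab controller's hostname is available in a `LAB_CONTROLLER` environment variable; both should be verified on the target system. The function names and the injectable `run` parameter are illustrative only.

```python
import os
import subprocess


def can_reach(host, count=1, timeout_s=5, run=subprocess.call):
    """Return True if `ping` exits with status 0 for `host`."""
    if not host:
        return False
    cmd = ["ping", "-c", str(count), "-W", str(timeout_s), host]
    try:
        # Discard ping's own output; only the exit status matters here.
        with open(os.devnull, "w") as devnull:
            return run(cmd, stdout=devnull, stderr=devnull) == 0
    except OSError:
        # ping is missing or could not be executed.
        return False


def lab_controller_host(environ=os.environ):
    """Look up the lab controller hostname (assumed variable name)."""
    return environ.get("LAB_CONTROLLER", "")
```

Typical use would be `can_reach(lab_controller_host())`; a True result means the script can exit without touching the network configuration.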

If it *doesn't* work, then it should make a very basic attempt at getting a working network setup back by doing something rough-and-ready like:

1. Force "BOOTPROTO=dhcp" in all /etc/sysconfig/network-scripts/ifcfg-* files (other than ifcfg-lo)
2. Try restarting the NetworkManager service
3. If the restart succeeds, try pinging the lab controller again
4. If the ping succeeds, jump to step 9
5. If the ping fails, stop the NetworkManager service again
6. Restart the network service
7. If the restart succeeds, try pinging the lab controller again
8. If the ping fails, bail out (we have nothing left to try)
9. Upload some logs (such as the ifcfg-* files)
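
Step 1 above could be sketched as follows, in Python. The function names are hypothetical. The sketch assumes each ifcfg file uses plain `KEY=value` lines, and it appends `BOOTPROTO=dhcp` when no BOOTPROTO line exists, which is an interpretation of "forcing" rather than something the comment specifies.

```python
import glob
import os
import re

# Matches a whole BOOTPROTO assignment line (commented-out lines do not match).
BOOTPROTO_RE = re.compile(r"^[ \t]*BOOTPROTO=.*$", re.MULTILINE)


def force_dhcp(text):
    """Return ifcfg file contents with BOOTPROTO set to dhcp."""
    if BOOTPROTO_RE.search(text):
        return BOOTPROTO_RE.sub("BOOTPROTO=dhcp", text)
    # No BOOTPROTO line present: append one (assumed behaviour).
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "BOOTPROTO=dhcp\n"


def force_dhcp_everywhere(directory="/etc/sysconfig/network-scripts"):
    """Rewrite every ifcfg-* file except ifcfg-lo; return the paths changed."""
    changed = []
    for path in sorted(glob.glob(os.path.join(directory, "ifcfg-*"))):
        if os.path.basename(path) == "ifcfg-lo":
            continue
        with open(path) as handle:
            original = handle.read()
        updated = force_dhcp(original)
        if updated != original:
            with open(path, "w") as handle:
                handle.write(updated)
            changed.append(path)
    return changed
```

Note that the `ifcfg-*` glob also matches editor backups such as `ifcfg-eth0.bak`; a real script might copy the originals aside first so they can be included in the uploaded logs.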

Steps 2-5 could be skipped on versions of RHEL prior to 6, but it may be easier to just let the restart command fail at step 2 rather than trying to detect those versions.
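
The overall control flow of steps 1-9 might look like the Python sketch below. Every name here is hypothetical, and each action is passed in as a callable so the ordering can be read (and exercised) without touching a real system. It takes the "let the restart fail" approach: if restarting NetworkManager fails, the flow simply falls through to the network service. Treating a failed network service restart as "bail out" is an interpretation, since the steps only spell out the failed-ping case.

```python
import subprocess


def service_action(name, action, run=subprocess.call):
    """Run `service <name> <action>`; True if it exits with status 0."""
    try:
        return run(["service", name, action]) == 0
    except OSError:
        return False


def restore_network(ping, force_dhcp, restart_nm, stop_nm,
                    restart_network, upload_logs):
    """Walk the recovery steps and return a short status string.

    All arguments are zero-argument callables; `ping`, `restart_nm` and
    `restart_network` must return True on success.
    """
    if ping():
        return "network-ok"            # the test did not break the network
    force_dhcp()                       # step 1
    if restart_nm():                   # step 2
        if ping():                     # steps 3-4
            upload_logs()              # step 9
            return "recovered-networkmanager"
        stop_nm()                      # step 5
    if restart_network() and ping():   # steps 6-7
        upload_logs()                  # step 9
        return "recovered-network-service"
    return "gave-up"                   # step 8
```

In a real script, `restart_nm` could be something like `lambda: service_action("NetworkManager", "restart")`, with `ping`, `force_dhcp` and `upload_logs` supplied by whatever helpers the script defines.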

A potentially attractive alternative to trying to create a "one-size-fits-all" approach is to turn this into a docs bug, and simply better advertise the ability for users to install their *own* network recovery scripts for execution when the local watchdog fires, if they're writing tests that are particularly prone to breaking the test system's network connectivity.

Comment 3 Qixiang Wan 2012-11-02 03:02:07 UTC
Created attachment 636928
draft script for review

Comment 4 Nick Coghlan 2012-11-02 03:27:12 UTC
This command will run in the same environment as test tasks do, but the associated documentation is rather poor. I've created #872421 to note this and am marking it as a blocker for this bug, as those docs are needed not just by people writing tests, but also by developers writing scripts and other components that need to execute locally on the system under test.

Comment 7 Martin Styk 2020-02-11 12:17:52 UTC
Beah is no longer supported by the Beaker development team.
Instead, we are working on the Restraint test harness. You can find all the features of Restraint here.

https://restraint.readthedocs.io/en/latest/

If you think your RFE should still be implemented as part of Restraint, feel free to create a new BZ ticket.

https://bugzilla.redhat.com/enter_bug.cgi?product=Restraint

In case you have any questions, feel free to reach out to me.
Thank you,
Martin Styk <martin.styk>

