There is too much hype, mythology, and wishful thinking surrounding GUI-based regression automation. These tools can create an illusion of testing coverage where no significant coverage exists, they can cause serious staff turnover, and they can tie up your most skilled staff in designing and maintaining test cases that yield relatively few bugs.
These tools can be genuinely useful, but they require a significant investment, careful planning, trained staff, and great caution.
Appendix
Some of the Conclusions Reached by LAWST Participants [10]
During the last third of each day, we copied several statements made during the discussion onto whiteboards and voted on them. We didn’t attempt to reach consensus. The goal was to gauge the degree to which each statement matched the experience of several experienced testers. In some cases, some of us chose not to vote, either because we lacked the specific experience relevant to this vote, or because we considered the statement ill-framed. (I’ve skipped most of those statements.)
If you’re trying to educate an executive about the costs and risks of automation, these vote tallies might be useful data for your discussions.
General principles
These statements are not ultimate truths. In automation planning, as in so many other endeavors, you must keep in mind what problem you are trying to solve, and in what context you are trying to solve it. (Consensus)
GUI test automation is a significant software development effort that requires architecture, standards, and discipline. The general principles that apply to software design and implementation apply to automation design and implementation. (Consensus)
For efficiency and maintainability, we need first to develop an automation structure that is invariant across feature changes; we should develop GUI-based automation content only as features stabilize. (Consensus)
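One way to picture that separation, as a rough and entirely hypothetical sketch (the class, method, and driver names below are invented for illustration, not taken from any participant's framework): the invariant layer is the only code that knows how the GUI is physically driven, and feature-level test content is written against that layer only once the feature has stabilized.

    # Hypothetical sketch: an invariant wrapper layer hides the GUI tool,
    # so feature-level tests never mention window titles, control IDs,
    # or screen coordinates.

    class LoginScreen:
        """The only place that knows how the login screen is driven."""

        def __init__(self, gui_driver):
            self.gui = gui_driver                   # whatever GUI tool is in use

        def log_in(self, user, password):
            self.gui.set_text("username", user)
            self.gui.set_text("password", password)
            self.gui.press("ok")
            return self.gui.read_text("status")     # a logical result, not pixels


    # Feature-level content, added only after the feature stabilizes,
    # and expressed purely in terms of the stable wrapper above.
    def test_rejects_bad_password(gui_driver):
        status = LoginScreen(gui_driver).log_in("demo_user", "wrong_password")
        assert "invalid" in status.lower()

If the login dialog is redesigned, only LoginScreen changes; the test content survives.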
Several of us had a sense of how a company’s automation efforts typically evolve over time:
First generalization (7 yes, 1 no): In the absence of previous automation experience, most automation efforts evolve through:
Failure in capture/playback. It doesn’t matter whether we’re capturing bits or widgets (object-oriented capture/replay);
Failure in using individually programmed test cases. (Individuals code test cases on their own, without following common standards and without building shared libraries.)
Development of libraries that are maintained on an ongoing basis. The libraries might contain scripted test cases or data-driven tests.
Second generalization (10 yes, 1 no): Failures of automation initiatives are commonly due to:
Using capture/playback as the principal means of creating test cases;
Using individually scripted test cases (i.e. test cases that individuals code on their own, without following common standards and without building shared libraries);
Using poorly designed frameworks. This is a common problem.
Straight replay of test cases yields a low percentage of defects. (Consensus)
Once the program passes a test, it is unlikely to fail that test again in the future. This led to several statements (none cleanly voted on) that automated testing can be dangerous because it can give us a falsely warm and fuzzy feeling that the program is not broken. Even if the program isn’t broken today in the ways that it wasn’t broken yesterday, there are probably many ways in which the program is broken. But you won’t find them if you keep looking where the bugs aren’t.
Of the bugs found during an automated testing effort, 60%-80% are found during development of the tests. That is, unless you create and run new test cases under the automation tool right from the start, most bugs are found during manual testing. (Consensus)
(Most of us do not usually use the automation tool to run test cases the first time. In the traditional paradigm, you run the test case manually first, then add it to the automation suite after the program passes the test. However, you can use the tool more efficiently if you have a way of determining whether the program passed or failed the test that doesn’t depend on previously captured output. For example:
Run the same series of tests on the program across different operating system versions or configurations. You may have never tested the program under this particular environment, but you know how it should work.
Run a function equivalence test. [11] In this case, you run two programs in parallel and feed the same inputs to both. The program that you are testing passes the test if its results always match those of the comparison program.
Instrument the code under test so that it will generate a log entry any time that the program reaches an unexpected state, makes an unexpected state transition, manages memory, stack space, or other resources in an unexpected way, or does anything else that is an indicator of one of the types of errors under investigation. Use the test tool to randomly drive the program through a huge number of state transitions, logging the commands that it executes as it goes. The next day, the tester and the programmer trace through the log looking for bugs and the circumstances that triggered them. This is a simple example of a simulation. If you are working in collaboration with the application programming team, you can create tests like this that might use your tool more extensively and more effectively (in terms of finding new bugs per week) than you can achieve on your own, scripting new test cases by hand.)
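As a concrete illustration of the second example above (function equivalence testing), here is a minimal Python sketch. It assumes only that both implementations can be called from the script; the names program_under_test and reference_program are placeholders, not features of any real tool.

    import random

    def function_equivalence_test(program_under_test, reference_program,
                                  cases=10000, seed=1234):
        """Feed the same randomly generated inputs to both implementations
        and report every input on which their results disagree."""
        rng = random.Random(seed)          # seeded, so failures can be replayed
        failures = []
        for _ in range(cases):
            x = rng.uniform(-1e6, 1e6)     # whatever input domain applies
            expected = reference_program(x)
            actual = program_under_test(x)
            if actual != expected:
                failures.append((x, expected, actual))
        return failures

    # Example use: checking a reimplemented rounding routine against the
    # built-in (my_round is a hypothetical function under test).
    # mismatches = function_equivalence_test(my_round, round)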
Automation can be much more successful when we collaborate with the programmers to develop hooks, interfaces, and debug output. (Consensus)
Many of these collaborative approaches don’t rely on GUI-based automation tools, or they use these tools simply as convenient test drivers, without regard to what I’ve been calling the basic GUI regression paradigm. It was fascinating going around the table on the first day of LAWST, hearing automation success stories. In most cases, the most dramatic successes involved collaboration with the programming team, and didn’t involve traditional uses (if any use) of the GUI-based regression tools.
We will probably explore collaborative test design and development in a later meeting of LAWST.
Most code that is generated by a capture utility is unmaintainable and of no long-term value. However, the capture utility can be useful when writing a test because it shows how the tool interprets a series of recent events. The script created by the capture tool can give you useful ideas for writing your own code. (Consensus)
We don't use screen shots "at all" because they are a waste of time. (Actually, we mean that we hate using screen shots and use them only when necessary. We do find value in comparing small sections of the screen. And sometimes we have to compare screen shots, perhaps because we’re testing an owner-draw control. But to the extent possible, we should be comparing logical results, not bitmaps.) (Consensus)
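A small, hypothetical sketch of what comparing logical results (rather than bitmaps) can look like in practice. The get_displayed_record hook is an invented name; the point is only that the assertion runs against structured data that the application can report, not against a stored screen image.

    # Hypothetical example: after an operation, ask the application (through
    # an API, log file, or test hook) for the logical state of the record it
    # is displaying, and compare fields rather than pixels.

    expected = {"name": "Smith, Pat", "balance": "125.00", "status": "active"}

    def check_customer_record(app, customer_id, expected):
        actual = app.get_displayed_record(customer_id)   # assumed test hook
        mismatches = {field: (expected[field], actual.get(field))
                      for field in expected
                      if actual.get(field) != expected[field]}
        assert not mismatches, "Logical comparison failed: %r" % mismatches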
Don't lose sight of the testing in test automation. It is too easy to get trapped in writing scripts instead of looking for bugs. (Consensus)
Test Design
Automating the easy stuff is probably not the right strategy. (Consensus)
If you start by creating a bunch of simple test cases, you will probably run out of time before you create the powerful test cases. A large collection of simple, easy-to-pass test cases might look more rigorous than ad hoc manual testing, but a competent manual tester is probably running increasingly complex tests as the program stabilizes.
Combining tests can find new bugs (the sum is greater than the parts). (Consensus)
There is value in using automated tests that are indeterminate (i.e. random), though we need methods to make a test case determinate. (Consensus)
We aren’t advocating blind testing. You need to know what test you’ve run. And sometimes you need to be able to specify exact inputs or sequences of inputs. But if you can determine whether or not the program is passing the tests that you’re running, there is a lot to be said for constantly giving it new test cases instead of reruns of old tests that it has passed.
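One common way to get both benefits (fresh inputs on every run, yet the ability to make any given run determinate) is to drive the randomness from a logged seed. A minimal sketch, with run_one_random_test standing in for whatever your tool actually drives:

    import random
    import time

    def random_test_session(run_one_random_test, iterations=1000, seed=None):
        """Drive many randomized test cases, but record the seed so any
        failing session can be replayed exactly."""
        if seed is None:
            seed = int(time.time())              # new inputs on every run...
        print("test session seed = %d" % seed)   # ...but logged for exact replay
        rng = random.Random(seed)
        for _ in range(iterations):
            run_one_random_test(rng)             # test draws all choices from rng

    # To reproduce a failure seen overnight, rerun with the logged seed:
    # random_test_session(run_one_random_test, seed=1085367401)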
We need to plan for the ability to log what testing was done. (Consensus)
Some tools make it easier to log the progress of testing, some make it harder. For debugging purposes and for tracing the progress of testing, you want to know at a glance what test cases have been run and what the results were.
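As a minimal, hypothetical example of the kind of at-a-glance record meant here (the file layout and field names are invented), the suite might append one structured line per executed test case:

    import csv
    import datetime

    def log_result(logfile, test_id, build, config, result, notes=""):
        """Append one row per executed test case, so anyone can later see
        what was run, against which build and configuration, and the result."""
        with open(logfile, "a", newline="") as f:
            csv.writer(f).writerow([
                datetime.datetime.now().isoformat(timespec="seconds"),
                test_id, build, config, result, notes,
            ])

    # log_result("results.csv", "print_dialog_017", "build 312",
    #            "Win2000/800x600", "FAIL", "page range field truncated")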
Staffing and Management
Most of the benefit from automation work that is done during Release N (such as Release 3.0) is realized in Release N+1. There are exceptions to this truism, situations in which you can achieve near-term payback for the automation effort. Examples include smoke tests, some stress tests (some stress tests are impossible unless you automate), and configuration/compatibility tests. (Consensus)
If Release N is the first release of a program that you are automating, then your primary goal in Release N may be to provide scaffolding for automation to be written in Release N+1. Your secondary goal would be light but targeted testing in N. (Consensus)
People need to focus on automation, not do it as an on-the-side task. If no one is dedicated to the task, then the automation effort is probably going to be a waste of time. (Consensus)
Many testers are junior programmers who don't know how to architect or create well-designed frameworks. (Consensus)
Data-driven approach
The data-driven approach was described in the main paper. I think that it’s safe to say that we all like data-driven approaches, but that none of us would use a data-driven approach in every conceivable situation. Here are a few additional, specific notes from the meeting.
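For readers who do not have the main paper at hand, here is a generic, minimal sketch of the idea (the file format and names are invented for illustration, not the specific approach of any participant): the test logic lives in one small driver, and the test cases live in an external table that non-programmers can extend.

    import csv

    def run_data_driven_tests(table_path, execute_case):
        """Read test cases from an external table and run each row through a
        single driver function; the table, not the code, carries the cases."""
        failures = []
        with open(table_path, newline="") as f:
            for row in csv.DictReader(f):   # e.g. columns: case_id, input, expected
                actual = execute_case(row["input"])
                if str(actual) != row["expected"]:
                    failures.append((row["case_id"], row["expected"], actual))
        return failures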