Performance bug patterns and bug-hunting

skinapi · posted on 2006-3-17 22:20:23

http://www.testingreflections.com/node/view/3398

Submitted by Ainars Galvans on Wed, 15/03/2006 - 09:39. heuristics | performance testing | performance testing patterns
One way to test performance is to write scenarios, load the system with them, and then spend hours examining why performance is not as good as you want, or even why the system crashes at a certain load.
Another way is to search directly for the issues that load tests would eventually expose. I’m going to list patterns for those issues, with hints for finding them without blindly loading the system.

First of all, some background/context.
I don’t call myself a performance tester, although I practiced this for some years and now typically manage performance tests done by either testers or developers. So I’m not going to compete with those who know Java garbage collection mechanisms, web application server architecture, etc. I’m going to take a somewhat black-box approach while describing common performance issue patterns.
Secondly, one of my hobbies is math. To be exact, I give lessons in combinatorics to undergraduates, preparing them for the International Mathematical Olympiad. Guess what: I prefer using my brain (analysis, modeling, calculation and recognizing performance issue patterns) instead of frantically scripting use cases, running them again and again, tuning both scripts and applications, and finally showing numerous graphs to management to convince them our tools are so great at collecting data.
And last but not least, I have always had support from developers during performance testing and have never seen them blame the tools (probably because LoadRunner is so well recognized). They have supported me with code, information about DB structure, architecture, etc. I was lucky to have built-in logging (either as a debug or a normal feature) in most of the applications I have tested. Only sometimes did they not believe my calculations, which I then backed up by emulation afterwards... :).

Performance issue patterns
· Issue number one is what I call “data scalability”. Simply populate each table that is expected to grow dynamically with at least some 10 000 rows, and the main tables (the ones expected to become huge) with some million rows. You will need a script for this, or a copy of the production system. Developers typically provide me with such scripts. You will then see a lot of issues even in functional testing: some requests will run for half a minute instead of a second or two.
· A client request returning a dynamic set of data (e.g. a table or tree). Although self-evident, it is unfortunately still a common issue that data are retrieved from the database row by row or item by item instead of with a single SQL statement. I’m not going into details, as I believe this case is clear and it is clear how to test it: just increase the data set to be retrieved and see whether the time increases linearly (wrong) or logarithmically or similar (right).
· Lack of indexes in the database. Simply monitor DB CPU usage: if a single client call uses a significant amount of CPU resources (see the CPU usage monitoring hints under the last item, “Lack of CPU resources”, below), the DB will become slow under load.
· Simultaneous client requests trying to update the same data: a database row/cell, a file in the file system, or anything else. Just analyze which business data are shared among different users and which activities update the same value. A typical case is taking the next item out of a common queue of tasks. Get 3 or 4 computers, try to execute this function simultaneously (press the submit button at the same time) and look for issues (e.g. two users get the same item :-). Note: an MS SQL database may lock whole pages rather than single rows, which means that two requests updating two different rows that are next to each other can still block each other... There are probably more such technical issues in other tools that I’m not aware of.
· Simultaneous client requests causing the server to use the same temporary file in the file system. I believe this case is clear; you need to either do a code review or use specific tools that monitor the file system to detect those issues.
· Simultaneous client requests causing the server to execute a thread-unsafe 3rd-party library call (e.g. MS Word converting documents). Either the developers forget to add a semaphore, or they add one but it causes all threads except one to return an error to the client, or causes a long wait. Here I suggest involving developers, or again using tools that examine DLL (or similar) usage, or reading the development documentation to find out which 3rd-party tools are used, and then reading those tools’ documentation for thread-safety support.
· No or weak support for load balancing when processing user requests. Well, I don’t have much experience with this one... Still, it is a wrong assumption that this can only be tested under load. Use only a few computers as clients and closely monitor how the load is balanced across your hardware (both with resource usage tools and by reading debug logs). Try to review the architecture: is the server (e.g. EJB) stateless, or does it store some dynamic info per user?
· No or weak load balancing for background operations. This is a tricky one. Example: two systems need to communicate with each other in the background. The simplest way is to pass all data in chronological order. This makes the data processing logic simpler and faster. However, it means you can’t scale your system by adding more computers to process the data in parallel. Another example: a process that goes through a table and processes each record somehow. If the process is slow you may want to run multiple instances, but then you need to add a record locking mechanism, which will slow down each separate process. To detect those issues, simply ask whether it is possible to run two or more instances, or try it.
· I’m not going to discuss batch processing or server applications working in synchronous mode, which search for the next data item to process and process it synchronously. I believe it is clear how to test this. I suggest also looking at the CPU usage monitoring hints under the last item, “Lack of CPU resources”, below.
· I have little experience with high network traffic issues, but I believe traffic can be monitored in the same way as I suggest monitoring CPU below; you just need to know how to use the data :). However, I have typically observed that heavy data traffic occurs together with one of the issues described above, such as “Client request returning dynamic set of data”.
· Lack of CPU resources once the number of users becomes high enough. This tends to happen because specific (localized) functions overuse the CPU (they are not optimized). The result is that the CPU is utilized up to 100% for a short period of time (e.g. a second). This does not cause a bad response time for a single user. However, once multiple users start doing it, the CPU becomes overloaded. There is a simple way to locate those issues within single-user manual testing; see the hints below.
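The first two patterns in the list above (“data scalability” and row-by-row retrieval) can be tried out on a small scale. The following is only a minimal sketch, not a test tool: it assumes an in-memory SQLite database and a hypothetical table called orders, neither of which comes from any real system. It fills the table with rows and then fetches the same data once with one SQL statement per row and once with a single statement. SQLite has no network round trip, so against a real client-server database the gap is likely to be much larger.

```python
import sqlite3
import time


def populate(conn, rows):
    """Create the hypothetical 'orders' table and fill it with `rows` rows."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        ((i, f"customer-{i % 100}") for i in range(rows)),
    )
    conn.commit()


def fetch_row_by_row(conn, ids):
    """Anti-pattern: issue one SQL statement per requested row."""
    result = []
    statements = 0
    for row_id in ids:
        cursor = conn.execute(
            "SELECT id, customer FROM orders WHERE id = ?", (row_id,)
        )
        result.append(cursor.fetchone())
        statements += 1
    return result, statements


def fetch_in_one_statement(conn, first_id, last_id):
    """Fetch the same id range with a single SQL statement."""
    cursor = conn.execute(
        "SELECT id, customer FROM orders WHERE id BETWEEN ? AND ? ORDER BY id",
        (first_id, last_id),
    )
    return cursor.fetchall(), 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    populate(conn, 10_000)
    # Grow the requested data set and watch how each variant's time changes.
    for size in (100, 1_000, 10_000):
        start = time.perf_counter()
        _, statements = fetch_row_by_row(conn, range(size))
        slow = time.perf_counter() - start
        start = time.perf_counter()
        fetch_in_one_statement(conn, 0, size - 1)
        fast = time.perf_counter() - start
        print(f"{size:>6} rows: {statements} statements {slow:.4f}s, "
              f"1 statement {fast:.4f}s")
```

The printed timings depend on the machine, so only the statement counts are meaningful as a fixed result; the point is to see how each variant reacts when the requested data set grows.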
Hints on monitoring resource utilization in manual testing:
Monitor the CPU time delta instead of average utilization. In Windows Task Manager it is possible to add the column “CPU Time” in addition to CPU (usage). It shows, for each application, how much CPU time it has used since it was started. Now if you see the value 23 (seconds) and, after submitting a single user action, see that it has increased to 25, that is an issue: 2 CPU seconds per request is not acceptable for a server application. You can do the simple math yourself and see that a single CPU will be saturated once 150 users each submit this request every 5 minutes on average (150 × 2 s = 300 s).
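The arithmetic behind this hint can be written down in a few lines. This is a rough sketch under simplifying assumptions that are mine, not measured facts: requests are spread evenly over time, nothing else uses the CPU, and the function names are made up for illustration. With 2 CPU seconds per request, one CPU is fully busy at 150 users when each of them submits once every 300 seconds; with one request every 5 seconds per user, the same arithmetic gives only 2.5 users.

```python
def cpu_seconds_per_request(cpu_time_before, cpu_time_after):
    """CPU time delta observed around one single-user action."""
    return cpu_time_after - cpu_time_before


def users_to_saturate(cpu_per_request, seconds_between_requests, cpus=1):
    """Number of users at which the demanded CPU time equals the available CPU time.

    Each user demands cpu_per_request / seconds_between_requests of one CPU.
    """
    return cpus * seconds_between_requests / cpu_per_request


cost = cpu_seconds_per_request(23, 25)   # the 23 -> 25 second example from the text
print(cost)                              # 2
print(users_to_saturate(cost, 300))      # 150.0
print(users_to_saturate(cost, 5))        # 2.5
```

Real systems start lagging before this theoretical limit, because requests arrive in bursts rather than evenly.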
You could also repeat the same request several times in a row and take the average. I suggest using slow hardware to see those issues more clearly.
One more hint. Most of the client-server apps I’ve seen don’t perform asynchronous operations while processing a single client request (while waiting for a DB request to complete, the server does not perform any business logic or file operations in parallel). This means that if you take the actual response time and subtract the sum of all CPU times, you get the time spent on network and file operations. This is still tricky, as Oracle, for example, is able to use several CPUs for a single SQL execution.
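The subtraction described above can be illustrated with a small sketch. It makes two assumptions that are not from the original text: the work runs inside the measuring process itself (for a separate server process you would read its CPU time from Task Manager as described), and handle_request is a hypothetical stand-in whose sleep imitates waiting for a database, network or file. The action is repeated several times and averaged.

```python
import time


def handle_request():
    """Hypothetical stand-in for one client request."""
    sum(i * i for i in range(200_000))  # CPU-bound part
    time.sleep(0.1)                     # imitates waiting for DB, network or file


def measure(action, repeats=5):
    """Return average (response time, CPU time, waiting time) per call in seconds."""
    wall_start = time.perf_counter()    # elapsed real time
    cpu_start = time.process_time()     # CPU time used by this process only
    for _ in range(repeats):
        action()
    cpu = (time.process_time() - cpu_start) / repeats
    wall = (time.perf_counter() - wall_start) / repeats
    return wall, cpu, wall - cpu


if __name__ == "__main__":
    wall, cpu, waiting = measure(handle_request)
    print(f"response {wall:.3f}s, CPU {cpu:.3f}s, waiting {waiting:.3f}s")
```

If the work is spread over several CPUs or processes, the CPU time can exceed the response time and the subtraction stops being meaningful, which is the caveat mentioned for Oracle.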

Exceptions and extensions to those patterns
First of all, resource leaks. You can monitor memory usage while doing functional testing, and I encourage you to do so. Monitor not only memory but also critical resources, such as the non-paged pool on Windows. Still, I would never be comfortable saying that there are no stability issues without having automated tests run for at least several hours, and preferably at least 48.
The second is that I typically try to encourage developers to do some performance tests themselves for client-server applications. Reusing client code, they can quite simply write a trivial application that performs a few typical client steps in a loop. It should not be hard to add threading to this code. You will not get any reusable results for capacity planning, but you will get great benchmarks for regression testing and nice, simple stability tests. You will also get your developers to think about performance at least a little bit.
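Such a trivial looping application could look roughly like the sketch below. It is an illustration only: client_steps is a hypothetical placeholder for the reused client code, and the thread and iteration counts are arbitrary. Each thread repeats the steps in a loop and records how long every iteration took, plus any errors.

```python
import threading
import time


def client_steps():
    """Hypothetical placeholder for a few typical client calls."""
    time.sleep(0.01)


def run_load(action, threads=4, iterations=10):
    """Run `action` in a loop on several threads; return (durations, errors)."""
    durations = []
    errors = []
    lock = threading.Lock()  # protects the two shared lists

    def worker():
        for _ in range(iterations):
            start = time.perf_counter()
            try:
                action()
            except Exception as exc:  # keep looping so a stability run survives errors
                with lock:
                    errors.append(exc)
                continue
            elapsed = time.perf_counter() - start
            with lock:
                durations.append(elapsed)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return durations, errors


if __name__ == "__main__":
    durations, errors = run_load(client_steps)
    print(f"{len(durations)} calls, {len(errors)} errors, "
          f"slowest {max(durations):.3f}s")
```

Comparing the slowest and average durations between builds gives the regression benchmark; raising iterations turns the same loop into a simple stability run.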

P.S. I will probably extend this list as I remember more stories, but I believe the most significant items are listed.
