The Risks Of Testing in Scaled Performance Environments

Half Size, Quarter Capacity

This article explains the risks associated with using a scaled (i.e. downsized) environment for performance testing.  Performance testing is frequently executed against a smaller environment than live production. Testing against the production environment is the ideal solution, but it is often not possible for reasons ranging from cost and practicality to risk. This article specifically details the risks associated with performance testing regular release drops in a scaled environment prior to go-live.

One of the most complicated questions to answer and quantify succinctly when performance testing is: “If we halve the size of the performance/load testing environment, can’t we just extrapolate the results to a full-sized figure?” It is a straightforward question and the answer is “No”.  However, explaining why in simple terms is more difficult, particularly to PMs and people not directly attached to the technology.  So I’m going to attempt to explain in simple terms why scaled load testing environments tend not to work, and highlight the risks to be considered when using them.

First let’s take an object: a square. If we halve the square, do we get half of everything? Well, yes and no – its sides are half the size, but its capacity (its area) is 1/4 of the original square.

Similarly, if you halve the size of a cube (or an engine), you get 1/8 of the volume or capacity. This is a very simplistic view, but it illustrates that if something is halved, its capacity may not be. Performance environments can be thought of as irregular blobs, not simple squares, and if you halve their size you do not get a proportional decrease in capacity.  I’m setting the scene, so please bear with me.
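For anyone who wants the arithmetic spelled out, here is a quick back-of-the-envelope sketch (the numbers are arbitrary and purely illustrative):

```python
# Back-of-the-envelope: halving the linear size of a shape does not halve its capacity.
side = 2.0          # original square side (arbitrary units)
edge = 2.0          # original cube edge (arbitrary units)

area_full = side ** 2
area_half = (side / 2) ** 2
print(area_half / area_full)       # 0.25 -> a "half-size" square has 1/4 of the area

volume_full = edge ** 3
volume_half = (edge / 2) ** 3
print(volume_half / volume_full)   # 0.125 -> a "half-size" cube has 1/8 of the volume
```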

IT projects are complicated engineering projects: each piece of the system is built separately and then put together before delivery.  Let’s take the analogy of a motor vehicle that has a top speed in production of 100 mph.  Imagine the engine is the IT hardware and the rest of the vehicle is the software components. If we halve the IT environment (e.g. half the number of CPUs, and therefore the processing power), we are effectively halving the size of the engine and giving it a quarter of the capacity.  We are now also driving the full weight of the software with a much smaller engine.  This may leave the engine with a quarter of the capacity and performance (L1 and L2 caches are reduced, queuing increases, caches are hit harder).  So now the maximum speed of the car with the resized engine is 25 mph. It’s probably slower, as the smaller engine has to work harder to carry the weight of the vehicle. Remember, this is an analogy, so it isn’t a perfect fit, but it illustrates some important points, e.g. software is never scaled down, only hardware.
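To put a rough number on the “smaller engine working harder” effect, here is an illustrative sketch using a textbook M/M/1 queueing formula. This is my own simplified model with made-up rates, not a measurement of any real system, but it shows why halving capacity under the same load costs far more than a factor of two:

```python
# Illustrative M/M/1 queueing sketch (assumed model, made-up numbers):
# halving service capacity while the software "weight" (arrival rate) stays
# the same degrades response time by far more than a factor of two.

def mm1_response_time(service_rate, arrival_rate):
    """Mean response time for an M/M/1 queue; infinite once the queue saturates."""
    if arrival_rate >= service_rate:
        return float("inf")   # the scaled-down 'engine' simply cannot carry the load
    return 1.0 / (service_rate - arrival_rate)

arrival_rate = 40.0            # requests/second of software load (unchanged)
full_capacity = 100.0          # requests/second the full-size environment can serve
half_capacity = full_capacity / 2

print(mm1_response_time(full_capacity, arrival_rate))   # ~0.017 s
print(mm1_response_time(half_capacity, arrival_rate))    # 0.100 s -- roughly 6x slower, not 2x
```

The exact numbers are invented; the shape of the result is the point.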

Now let’s imagine other parts of the car are software subsystems that bolt, directly or indirectly, onto the engine: wheels, nuts, bolts, steering and the axle.  The smaller engine drives these but can never load them past 25 mph. So everything looks OK in the scaled environment when we are going full speed, i.e. 25 mph.  Now take the axle, an essential part of the car attached to the engine, and say it has been unwittingly updated and a fault introduced that means it will break at 30 mph.  This performance fault will never be seen in the scaled environment.  It will only be seen in production.  So the key lesson here is:  by using a scaled environment you are unlikely to find performance issues past your environment’s capacity.  However, if the axle breaks at 24 mph you will find it, so scaled testing does still reduce risk.
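Here is a small, entirely hypothetical sketch of that blind spot: a ramp test that can only push as far as the scaled environment’s ceiling will never exercise a fault sitting just above it (all names and numbers are made up for illustration):

```python
# Hypothetical ramp test: the scaled environment tops out at 25 'mph', but the
# defective axle only fails at 30 'mph', so the ramp can never reach the fault.

SCALED_ENV_MAX = 25      # maximum load the scaled environment can drive (made-up units)
AXLE_FAILURE_POINT = 30  # hidden defect threshold, only reachable in production

def axle_holds(load):
    return load < AXLE_FAILURE_POINT

for load in range(5, SCALED_ENV_MAX + 1, 5):   # ramp up to the scaled ceiling
    status = "OK" if axle_holds(load) else "BROKEN"
    print(f"load={load}: axle {status}")

# Every step reports OK, so the test passes -- the 30 mph fault stays invisible
# until production pushes past the scaled environment's ceiling.
```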


Now for a more technical example: let’s say the wheel nuts have been redesigned too.  They were initially designed to be over-tolerance, meaning they were designed to break at 200 mph, way above the 100 mph limit of the car.  Let’s say a redesign fault means the tolerance has actually been lowered to 102 mph.  What does this mean? It means the system won’t break in test or live, but the actual capacity of the associated components has been drastically reduced without visibility.  So the key lesson here is: by using a scaled environment it becomes more difficult to see whether the overall capacity and tolerance of the system has been reduced, making the chances of a live performance failure more likely.
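A tiny, hypothetical calculation makes the same point about headroom. Both the original and the faulty wheel-nut design pass the scaled test, even though the real safety margin has collapsed (figures are invented for illustration):

```python
# Hypothetical headroom check: a scaled test can only confirm the wheel nuts
# survive the scaled peak, so lost tolerance above that point is invisible.

production_top_speed = 100   # mph the live system must sustain
scaled_test_peak = 25        # mph the scaled environment can reach
designed_tolerance = 200     # mph the nuts were originally designed to withstand
actual_tolerance = 102       # mph after the faulty redesign

for tolerance in (designed_tolerance, actual_tolerance):
    passes_scaled_test = tolerance > scaled_test_peak
    headroom = tolerance / production_top_speed
    print(f"tolerance={tolerance} mph: scaled test passes={passes_scaled_test}, "
          f"headroom over production={headroom:.2f}x")

# Both designs pass the scaled test, yet the real headroom has dropped from 2.0x to 1.02x.
```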

So what can Performance Testers do? 

Now, in reality the engine is made of many components that are essential to its speed, e.g. spark plugs, pistons, camshaft.  When we scale a hardware environment down we do not reduce the size of these components in proportion.  Our essential IT components consist of memory, L1 cache, CPU speed, network, disk I/O and the database; these interact in complex ways, and resizing one will affect the overall performance. If you have to prioritize any of these, attempt to keep memory the same. So the key takeaway here is: if you have to work in a scaled environment, do not scale everything down.  Identify, prioritize and attempt to keep as many key attributes as close to live as possible.
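One way to make that prioritization explicit is simply to write the production and scaled specs down side by side and flag what has actually been shrunk. A hypothetical sketch (all figures invented):

```python
# Hypothetical spec comparison: flag which attributes of the scaled environment
# differ from production, so the scaling decisions are explicit and prioritized.

production = {"cpu_cores": 64, "memory_gb": 512, "disk_iops": 50_000,
              "network_gbps": 10, "db_nodes": 4, "app_instances": 200}
scaled     = {"cpu_cores": 32, "memory_gb": 512, "disk_iops": 25_000,
              "network_gbps": 10, "db_nodes": 2, "app_instances": 20}

for attribute, prod_value in production.items():
    test_value = scaled[attribute]
    factor = test_value / prod_value
    note = "same as live" if factor == 1 else f"scaled to {factor:.0%} of live"
    print(f"{attribute:13s}: prod={prod_value:>7} test={test_value:>7}  ({note})")

# Memory and network are kept at live size (the priority); CPU, disk, DB and
# instance count are not, and those deltas must be called out when interpreting results.
```

A table like this also doubles as the written record of the environment’s limitations to hand to management.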

So that’s a simple analogy.  In reality you also have to consider the risks of deadlocking (less likely to show up in a scaled environment), configuration differences and the actual scalability characteristics of the application.  I consulted at a company that had 200+ identical instances of a middle-tier server; replicating this in a load testing environment wasn’t practical or cost-effective. So I studied the architecture and identified and communicated the limitations of the performance test environment.  This also enabled me to identify some follies before I even began performance testing. Software developers had taken a common component out of the instances and centralized it, without considering that they had now introduced a single point of failure that would be subjected to 200 times more traffic.  So another takeaway is: study the environment and architecture, and attempt to scale the performance testing environment sensibly.
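As a rough, hypothetical illustration of why that centralization mattered, the arithmetic for the newly shared component is brutal:

```python
# Hypothetical back-of-the-envelope for the centralized component: traffic that was
# previously absorbed by 200 identical instances now funnels through a single one.

instances_in_production = 200
calls_per_instance_per_sec = 50   # made-up per-instance rate

load_on_shared_component = instances_in_production * calls_per_instance_per_sec
print(f"Centralized component must handle {load_on_shared_component} calls/sec "
      f"(a {instances_in_production}x increase on what any one instance ever saw).")
```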

Of course every architecture is different – people rarely program in languages such as CUDA, so it’s rare to come across truly scalable software architectures that will downsize proportionally.

So here are the key takeaways:

  1. Scaled environments do not downsize proportionally.
  2. By using a scaled environment you are unlikely to find performance issues past your scaled environment’s capacity.
  3. By using a scaled environment you can reduce the capacity and tolerance of the overall system without visibility – this increases the risk of live performance issues.
  4. If you have to work in a scaled environment, do not scale everything down.  Identify, prioritize and attempt to keep as many key attributes as close to live as possible.
  5. Identify and communicate the limitations of the scaled performance test environment to management.
  6. Study the environment and architecture – attempt to scale the performance testing environment sensibly.
  7. If you use a scaled environment for performance testing, make sure adequate (and fast) rollback procedures are in place in the live environment.

I hope you find this useful in explaining the risks associated with scaled performance environments.

Just a quick note – it’s worth mentioning here that I have never seen capacity planning tools work. They are expensive and ineffective.  There are much better ways of forecasting capacity or increasing the performance of the existing application. (Future link to a reference to capacity planning tools – I’m yet to be convinced.)

See Also

How To Increase Live Performance 

One thought on “The Risks Of Testing in Scaled Performance Environments”

  1. Often, when performance testing in a scaled environment, the software will break before the hardware does. What I mean is that a software setting somewhere (e.g. thread pool, bad code) will be the limiting factor way before CPU or memory runs out. This testing still adds tremendous value, because bottlenecks are being identified. Often it is possible to tune a scaled-down environment to meet the demands of the expected production load, meaning that in production, where the hardware is usually over-specced, performance should be better (or at least the same) as in the scaled-down environment. Of course, though, it is better to test in a production-sized environment, not least of all because of the data size.
