Wednesday, January 28, 2015

Performance Testing Framework




In this post I’ll introduce a set of performance testing classes and practices that I use when creating automated performance tests, with one main aim: to simplify the task of measuring and verifying the performance of key areas of my .NET applications.

Source Code:
 

Often it is not practical, or even advisable, to measure all parts of an application – why waste time profiling sections of code that are called infrequently?  It’s also rather difficult to know exactly where to start profiling an application, whether it’s server or UI, when that app either hasn’t had performance proactively “baked in” or when you’re trying to find the root cause of an occasional freeze.

If I’m trying to isolate performance issues (or maybe retrofitting performance tests), then I tend to start off with one of the popular .NET profiling apps, such as Red Gate’s ANTS.  These give me an idea of where to look, e.g. methods that are called very frequently or take a while to run.  At least this gives me a fighting chance of knowing where to concentrate my efforts, rather than wading through thousands of lines of code… needle in a haystack springs to mind.

There are many resources describing common coding pitfalls that can result in unnecessary memory allocations, work being recomputed when the result could be cached, multiple enumerations of the same sequence, and so on.  Examples include:
  • string concatenation in a loop rather than using a StringBuilder,
  • long chains of if statements where a switch would do,
  • failing to cache values for ToString/GetHashCode on immutable types (see the sketch below)

…the list goes on.
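As a quick illustration of that last bullet, here’s a minimal sketch of caching the hash code on an immutable type (ImmutablePoint is a hypothetical example type, not from any project in this post):

// Hypothetical immutable type: its fields never change, so the hash
// code can be computed once rather than recalculated on every
// dictionary lookup.
public sealed class ImmutablePoint
{
    private readonly int _x;
    private readonly int _y;
    private readonly int _hashCode;

    public ImmutablePoint(int x, int y)
    {
        _x = x;
        _y = y;
        _hashCode = (_x * 397) ^ _y; // computed once, up front
    }

    public override int GetHashCode()
    {
        return _hashCode; // now just a field read
    }
}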

I’ll not delve further into the specifics here – I’ll leave that for another post, another day.

I treat performance tests as being just as important as my unit tests (which, to me, are of equal importance to my actual project code).  The performance tests must be easy to run, allowing me to tweak the code under test, re-run the performance test, and be told as soon as a timing metric has not been met… a red alert followed by a fail.

If you’re familiar with the red/green/refactor mantra from the Test Driven Development world, then you’re probably used to seeing red/green alerts whilst you work through a project.

Initial Benchmarks

When I’m looking at improving performance for a block of code and I have no timing metrics (aka key performance indicators) to work with, I follow these steps:

  1. Create a performance test that calls the specific method/block – initially with an expected duration of zero milliseconds, so it will fail when run (see the sketch after this list)
  2. Run the performance test (which will repeatedly run my method/block under test) to get an idea of current average time (later I’ll discuss why average is not always the best statistic to use)
  3. Bake that first result into my test
  4. Start the process of incremental refactoring/re-running test to quantifiably improve performance
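As a sketch of step 1, an initial benchmark test might look like the following, using the PerformanceTestBase class introduced later in this post (SomeService and DoWork are hypothetical placeholders for the code under test):

[TestFixture]
internal class DoWorkInitialBenchmark : PerformanceTestBase
{
    // step 1: start at zero so the test is guaranteed to fail,
    // forcing a first timing run whose results we can then bake in
    protected override TimeSpan AcceptableDuration
    {
        get { return TimeSpan.Zero; }
    }

    protected override void TestCase()
    {
        new SomeService().DoWork(); // hypothetical method under test
    }
}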

By following these steps, I have an initial benchmark to work from and will know pretty quickly if I’m actually making things worse, or if the incremental changes are starting to make less of an impact on the results (when this happens it’s generally time to stop trying to squeeze the last clock tick out of a method). 

The benchmark can then serve as a future indicator of performance.  If, sometime later, a developer makes a change to an area of code that negatively affects its performance, the performance test will fail – much like a failing unit test.

Sampling

Unfortunately, there are many factors that can affect how long a section of code takes to run: JIT overhead, other applications running, IO activity, debug versus release builds, physical/virtual memory, the type of hard drive, and the number and speed of processors.

All of these come into play, and unfortunately it isn’t possible to eliminate them all from the timing stats.  To alleviate this, the performance test should run the code under test a number of times and calculate a meaningful statistic from the collected samples.

Gotcha: this problem is made even worse if you run the performance tests as part of a Continuous Integration pipeline on a build machine.  Your build machine may be shared with other dev teams, or may be of a lower spec than your development or end-user machines.  Performance tests that pass on your development machine may well fail on a build machine.
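One possible mitigation – a sketch, not part of the framework below – is to scale the acceptable duration by an environment variable that is only set on the build machine (the PERF_TEST_SCALE name is hypothetical):

private static TimeSpan ScaleForEnvironment(TimeSpan acceptable)
{
    // e.g. set PERF_TEST_SCALE=2 on a slower build server to double
    // every acceptable duration; developer machines leave it unset
    var raw = Environment.GetEnvironmentVariable("PERF_TEST_SCALE");

    double scale;
    if (!double.TryParse(raw, out scale) || scale <= 0D)
    {
        scale = 1D; // default: no scaling
    }

    return TimeSpan.FromMilliseconds(acceptable.TotalMilliseconds * scale);
}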

Running Tests
When I’m working through a new feature or refactoring existing code, it’s important that I can tweak some code and easily run my tests (whether they’re unit or performance tests).  To help, I use ReSharper from JetBrains.  This Visual Studio productivity tool makes it so easy to run/debug a test (unfortunately, not all editions of Visual Studio let you create unit/stress/performance tests), simply by clicking on the test method, using the test explorer, or re-running it as part of a test session:



Getting Started
The key tasks of the performance test base class are:
• Specify the maximum acceptable duration for the method under test:
protected abstract TimeSpan AcceptableDuration { get; }

• Optionally run some untimed fixture setup code before the test cycles start:
protected virtual void FixtureSetup()

• Optionally run some untimed code before each test case method, perhaps to reduce the impact of first-call JIT compilation:
protected virtual void PreTestCaseSetup()

• Run a specific timed test method a repeated number of times, collecting timing stats from each call:
protected abstract void TestCase()

• Optionally run some untimed code after the test cycles complete, perhaps to clean up the environment:
protected virtual void FixtureTearDown()

• Calculate some stats from the timings collected

• Throw an exception if the timing statistic (the 95th percentile, as we’ll see later) exceeds the AcceptableDuration property

I’m going to use the NUnit framework to throw an exception, via Assert.Fail, if the timespan is too long, so you’ll need to run the following in the NuGet Package Manager Console:
Install-Package NUnit

As it happens, I could have just thrown a plain exception to indicate the error state, but NUnit has so many useful features that I’ll stick with NUnit’s Assert class.

Starting to look at the code, I’ve created a new Class Library project called PerformanceTests (which has the NUnit package referenced above) and added a new class called PerformanceTestBase.cs

This is an abstract class, which I’ll base each of my specific performance tests on – so one test class for each method/block under test.  As an example, say I’ve established that the GetHashCode function on my Customer class needs to be profiled, then I’d create a class called CustomerGetHashCodeTest – which will contain all of the code to test just GetHashCode.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using NUnit.Framework;
 
internal abstract class PerformanceTestBase
{
    /// <summary>
    /// The default number of test cycles to run in order to calculate the stats
    /// </summary>
    private const int DefaultTestCycles = 100;
 
    /// <summary>
    /// How long each test cycle took to complete
    /// </summary>
    private readonly List<TimeSpan> _testCycleDurations = new List<TimeSpan>();
 
    /// <summary>
    /// Override if you have specific number of times you want
    /// a test to be executed
    /// </summary>
    protected virtual int TestCyclesToRun
    {
        get { return DefaultTestCycles; }
    }
 
    /// <summary>
    /// The maximum duration that we'll allow before failing a test
    /// </summary>
    protected abstract TimeSpan AcceptableDuration { get; }
 
    /// <summary>
    /// TIMED: The actual test that you want to run per cycle.
    /// </summary>
    protected abstract void TestCase();
 
    /// <summary>
    /// If you have any code that needs to run prior to each test case, 
    /// e.g. to force JIT compilation 
    /// </summary>
    protected virtual void PreTestCaseSetup()
    {}
 
    /// <summary>
    /// Override if you need an initial setup operation before 
    /// the cycle test starts
    /// </summary>
    protected virtual void FixtureSetup()
    {}
 
    /// <summary>
    /// Override if you need to run a clean-up operation after 
    /// all of the cycles have completed
    /// </summary>
    protected virtual void FixtureTearDown()
    {}
}

As you can see, there’s a mixture of abstract and virtual methods.  When you implement a performance test you’ll need to implement the following:
protected abstract TimeSpan AcceptableDuration { get; }
Returns the maximum acceptable duration – if the measured timing statistic exceeds this, the test fails.
protected abstract void TestCase();
Contains the actual test code to run – each call to this method is timed, and the timings are used to calculate the overall statistics.
Now we need to add the controlling method, ensuring that it’s accessible to any NUnit test runner.  TestCaseRunner is a public void method, annotated with NUnit’s Test attribute:
[Test]
public void TestCaseRunner()
{
    FixtureSetup();
 
    var testCycles = TestCyclesToRun;
 
    var timer = new Stopwatch();
 
    Log("STARTING TEST. Cycles={0}, AcceptableDuration={1}", 
        testCycles, FormatTimeSpan(AcceptableDuration));
 
    Log("CYCLE \tTime");
    for (var cycle = 1; cycle <= testCycles; cycle++)
    {
        PreTestCaseSetup();
 
        timer.Start();
        TestCase();  // run the actual test
        timer.Stop();
 
        var elapsed = timer.Elapsed;
        _testCycleDurations.Add(elapsed);
 
        Log("{0:000}\t{1}", cycle, FormatTimeSpan(elapsed));
 
        timer.Reset();
    }
        
    FixtureTearDown();
    AssertStatistics();
}

We use a Stopwatch instance to provide the actual timing, simply starting, stopping and resetting the timer between cycles – this provides high enough resolution for our purposes.
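If you’re curious about the timer resolution on a particular machine, Stopwatch exposes a couple of static members you can inspect:

// Stopwatch falls back to DateTime ticks when no high-resolution
// performance counter is available on the hardware
Debug.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution);
Debug.WriteLine("Ticks per second: {0}", Stopwatch.Frequency);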

The remaining Log and AssertStatistics methods need to be defined.  Log is simply a helper that writes timestamped messages to the debug output.

private void AssertStatistics()
{
    var average = _testCycleDurations.AverageTotalMilliseconds();
    var min = _testCycleDurations.MinTotalMilliseconds();
    var max = _testCycleDurations.MaxTotalMilliseconds();
    var deviation = _testCycleDurations.Deviation();
 
    const double Percentile = 95D;
    var percentile = _testCycleDurations.Percentile(Percentile);
 
    var acceptableDuration = AcceptableDuration;
 
    Log("TEST FINISHED. Timings(ms): Acceptable={0}, Percentile={1}, Avg={2}, Min={3}, Max={4}, Deviation={5}, ActualTimes={6}",
        FormatTimeSpan(acceptableDuration), 
        FormatTimeSpan(percentile), FormatTimeSpan(average),
        FormatTimeSpan(min), FormatTimeSpan(max), 
        FormatTimeSpan(deviation), 
        Summarise(_testCycleDurations));
 
    var testPassed = percentile <= acceptableDuration;
    if (!testPassed)
    {
        Assert.Fail("DURATION EXCEEDED: Acceptable={0}, Percentile={1}",
            FormatTimeSpan(acceptableDuration), FormatTimeSpan(percentile));
    }
}
AssertStatistics() uses a few extension methods on ICollection<TimeSpan> from a class called TimeSpanCollectionExtensions.cs (below).  These methods determine the minimum, maximum, average, deviation and, most importantly, the percentile value from the timings collected across the cycles.

To determine whether the acceptable duration has been exceeded, we compare it against the 95th percentile rather than the average – the percentile ignores the worst 5% of outliers (a stray GC pause or OS scheduling blip), while still requiring 95% of the cycles to come in under the limit.
 
The remaining methods help with formatting the results: 
 
private static string FormatTimeSpan(TimeSpan span)
{
    return span.TotalMilliseconds.ToString("N0");
}
 
private static string Summarise(IEnumerable<TimeSpan> durations)
{
    return durations
        .Aggregate(new StringBuilder(), 
            (bld, ts) => bld.Append(string.Format("{0}|", FormatTimeSpan(ts))))
        .ToString();
}
 
protected static void Log(string format, params object[] args)
{
    // the ": " separator makes the output read e.g. "20:19:54.583: STARTING TEST"
    var msg = string.Format("{0:HH:mm:ss.fff}: {1}", 
        DateTime.Now, string.Format(format, args));
    Debug.WriteLine(msg);
}
 

Finally, TimeSpanCollectionExtensions.cs (note that these helper methods haven’t themselves been profiled):

using System;
using System.Collections.Generic;
using System.Linq;
 
internal static class TimeSpanCollectionExtensions
{
    public static TimeSpan Percentile(this ICollection<TimeSpan> testDurations, double percentile)
    {
        var sortedDurations = new List<TimeSpan>(testDurations);
        sortedDurations.Sort();
 
        // Convert.ToInt32 rounds, so the raw index can land one past the end
        // of the list (e.g. 95% of 10 samples gives index 10) - clamp it
        var percentileIndex = Math.Min(
            Convert.ToInt32((percentile / 100D) * sortedDurations.Count),
            sortedDurations.Count - 1);
        return sortedDurations.ElementAt(percentileIndex);
    }
 
    /// <summary>
    /// Note: despite the name, this is a trimmed mean - the fastest and
    /// slowest samples are discarded (when they differ) and the remainder
    /// averaged - rather than a statistical standard deviation.
    /// </summary>
    public static TimeSpan Deviation(this ICollection<TimeSpan> testCycleDurations)
    {
        var sortedDurations = new List<TimeSpan>(testCycleDurations);
        sortedDurations.Sort();
 
        var minTime = sortedDurations.First();
        var maxTime = sortedDurations.Last();
 
        if (minTime != maxTime)
        {
            sortedDurations.Remove(minTime);
            sortedDurations.Remove(maxTime);
        }
 
        return TimeSpan.FromMilliseconds(sortedDurations.Average(ts => ts.TotalMilliseconds));
    }
 
    public static TimeSpan AverageTotalMilliseconds(this ICollection<TimeSpan> testDurations)
    {
        return TimeSpan.FromMilliseconds(
            testDurations.Average(ts => ts.TotalMilliseconds));
    }
 
    public static TimeSpan MinTotalMilliseconds(this ICollection<TimeSpan> testDurations)
    {
        return TimeSpan.FromMilliseconds(
            testDurations.Min(ts => ts.TotalMilliseconds));
    }
 
    public static TimeSpan MaxTotalMilliseconds(this ICollection<TimeSpan> testDurations)
    {
        return TimeSpan.FromMilliseconds(
            testDurations.Max(ts => ts.TotalMilliseconds));
    }
}
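As a quick standalone illustration of the extension methods (the sample values here are made up):

var durations = new List<TimeSpan>
{
    TimeSpan.FromMilliseconds(70),
    TimeSpan.FromMilliseconds(85),
    TimeSpan.FromMilliseconds(120)
};

// the same calls that AssertStatistics makes against the cycle timings
var average = durations.AverageTotalMilliseconds(); // ~92ms
var p95 = durations.Percentile(95D);                // 120ms - the slowest sample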
Creating a Performance Test
Before we start, I need to create something to test.  So, I’ve added a new Class Library project (SampleApp).  To that, I’ll add a NumberWords class:
using System;
 
public static class NumberWords
{
    public static string FromSwitch(int number)
    {
        switch (number)
        {
            case 0: return "zero-" + Environment.TickCount;
            case 1: return "one-" + Environment.TickCount;
......
            case 49: return "fortynine-" + Environment.TickCount;
            case 50: return "fifty-" + Environment.TickCount;
            default: return "??-" + Environment.TickCount;
        }
    }
 
    public static string FromIfs(int number)
    {
        if (number == 0) return "zero-" + Environment.TickCount;
        if (number == 1) return "one-" + Environment.TickCount;
......
        if (number == 49) return "fortynine-" + Environment.TickCount;
        if (number == 50) return "fifty-" + Environment.TickCount;
        return "??-" + Environment.TickCount;
    }
 
}
FromSwitch() and FromIfs() return exactly the same thing; they exist to highlight the performance difference you might see between the two implementations.

I’ll start by creating a performance test for the FromIfs method (I’ll need another test class for FromSwitch too).  Add a new class to the PerformanceTests project, NumberWordsFromIfsTest.cs:

using System;
using System.Linq;
using NUnit.Framework;
 
public static class NumberWordTestCriteria
{
    public const int Start = 0;
    public const int Count = 50000;
    public const int AcceptableMilliseconds = 90;
}
 
[TestFixture]
class NumberWordsFromIfsTest : PerformanceTestBase
{
    protected override TimeSpan AcceptableDuration
    {
        get { return TimeSpan.FromMilliseconds(NumberWordTestCriteria.AcceptableMilliseconds); }
    }
 
    protected override void PreTestCaseSetup()
    {
        var num = GetWord(0);
    }
 
    protected override void TestCase()
    {
        var words = Enumerable.Range(NumberWordTestCriteria.Start, NumberWordTestCriteria.Count)
            .Select(GetWord)
            .ToList();
    }
 
    private static string GetWord(int number)
    {
        return NumberWords.FromIfs(number);
    }
}
 
NumberWordTestCriteria is a static class that defines some parameters for the tests – I’ve pulled this out into its own class because I want to use the same criteria in the upcoming NumberWordsFromSwitchTest.cs.

Looking at NumberWordsFromIfsTest, the class derives from our base performance class PerformanceTestBase and is decorated with the TestFixture attribute so that NUnit runners can find the test.

The abstract AcceptableDuration property is implemented – I want the test to pass if the 95th-percentile time is no more than 90 milliseconds.

I’ve also implemented PreTestCaseSetup(); this runs before each call to the TestCase method – the method under test.  You can use this approach to negate the effect of CLR JIT compilation the first time a method is called.

TestCase is the method that gets called repeatedly; its execution times are collected and summarised so I can see whether I’m improving or degrading performance.

Using ReSharper I can quickly run the performance test:



Once the test run is over, I’ll see a red or green alert and the following results:

20:19:54.583: STARTING TEST. Cycles=100, AcceptableDuration=90
20:19:54.620: CYCLE         Time
20:19:54.854: 001              114
20:19:54.935: 002              78
20:19:55.019: 003              80
20:19:55.095: 004              73
20:19:55.176: 005              77
…..
20:20:03.308: 095              84
20:20:03.387: 096              76
20:20:03.479: 097              89
20:20:03.575: 098              93
20:20:03.661: 099              83
20:20:03.768: 100              104
20:20:03.824: TEST FINISHED. Timings(ms): Acceptable=90, Percentile=113, Avg=87, Min=70, Max=123, Deviation=87, ActualTimes=114|78|80.

Unfortunately, my performance test failed.  The TestCase method was called 100 times; the 95th-percentile time was 113 milliseconds, but my acceptable time was only 90 milliseconds:

DURATION EXCEEDED: Acceptable=90, Percentile=113

So now it’s up to me to find a faster way to concatenate the words.
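Here’s a sketch of one possible approach – my own suggestion for illustration, not the original fix: keep the constant word strings in a static lookup array so that each call only pays for the single TickCount concatenation (the middle entries are elided, as in the earlier listing):

public static class NumberWordsFaster
{
    private static readonly string[] Words =
    {
        "zero-", "one-", /* ... */ "fortynine-", "fifty-"
    };

    public static string FromLookup(int number)
    {
        // an array index instead of walking a chain of comparisons
        var word = (number >= 0 && number < Words.Length)
            ? Words[number]
            : "??-";
        return word + Environment.TickCount;
    }
}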

Going Further
I’ve added the following JoinWords class to my SampleApp project:

using System;
using System.Linq;
using System.Text;
 
public static class JoinWords
{
    public static string FromStringBuilder(int count)
    {
        var builder = new StringBuilder();
        foreach (var item in Enumerable.Range(0, count))
        {
            builder.AppendLine(item.ToString());
        }
 
        return builder.ToString();
    }
 
    public static string FromConcatenation(int count)
    {
        var line = string.Empty;
        foreach (var item in Enumerable.Range(0, count))
        {
            line += item + Environment.NewLine;
        }
 
        return line;
    }
 
    public static string FromAggregate(int count)
    {
        return Enumerable.Range(0, count)
            .Aggregate(new StringBuilder(), 
                (builder, item) => builder.AppendLine(item.ToString()))
            .ToString();
    }
}

Again, each of the three methods does the same thing – it returns a string of numbers joined together – using a StringBuilder, plain string concatenation, and a StringBuilder in conjunction with the LINQ Aggregate method respectively.

I’ve chosen these implementations to highlight how slow string concatenation can be.  I’ve added three test classes using the pattern outlined above (although I do have an extra abstract class, JoinWordsTestBase, that wraps up some commonality – see the sketch below).
It’s interesting to see just how slow string concatenation runs compared to using the StringBuilder – around ten times slower.
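I won’t list JoinWordsTestBase in full, but a minimal sketch might look like the following – the workload size and 90 ms threshold here are illustrative assumptions, not the values used for the results above:

// hypothetical sketch of the shared base class mentioned above
internal abstract class JoinWordsTestBase : PerformanceTestBase
{
    protected const int WordCount = 5000; // assumed workload size

    protected override TimeSpan AcceptableDuration
    {
        get { return TimeSpan.FromMilliseconds(90); } // assumed threshold
    }

    // each concrete test supplies the JoinWords method to exercise
    protected abstract string JoinNumbers(int count);

    protected override void TestCase()
    {
        JoinNumbers(WordCount);
    }
}

[TestFixture]
internal class JoinWordsFromConcatenationTest : JoinWordsTestBase
{
    protected override string JoinNumbers(int count)
    {
        return JoinWords.FromConcatenation(count);
    }
}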


 
