Sunday, February 08, 2015

Improving Struct Performance By Overriding Equals()

I’ve spent a few days analysing the performance of a C# WPF application.  The client’s users were concerned that performance across certain areas just didn’t feel as fast as it ought to be – particularly during busy times of a trading day. 

That might sound pretty vague, but often as developers that’s all we get.

There are plenty of resources available that go into a great level of detail on the best practices to follow when designing apps.  It helps to understand what .NET does under the hood, in many aspects, but it’s important to remember that there are many layers in which subtle coding designs can negatively affect performance.

I’m going to describe some interesting implementation details which I discovered were the cause of performance issues that the users were complaining about.  This particular blog covers structs and how slow Equals calls can be – please see my other blog entry for similar results found by examining the default GetHashCode implementation.


Summary of Results (Bottom Line)


If you don’t want to read how these numbers were created, then the bottom line is, if you are using structs to store data and these structs regularly have their Equals or GetHashCode default methods called - either directly or indirectly (by storing in a Dictionary), then in terms of performance you should override the methods and implement your own Equals and GetHashCode methods.

If you’re using a struct and you do not override the Equals or GetHashCode methods, then you’re deferring down to the default implementations.  These are shockingly slow, as they use Reflection to look at the fields defined in your struct.

The chart below shows (times in milliseconds) how slow it can be by repeatedly calling Equals on a struct that does/does not have the Equals method overridden.





Use of Class Versus Struct To Store Data


Commonly known facts:
·         Class instances get allocate on the heap
·         Class instances get via a copied pointer reference (size is 4/8 bytes for 32/64 bit process)
·         Class instances have additional overhead of  8/16 bytes for 32/64 bit process (in fact .NET aligns objects to the minimum of 12/24 bytes object size for 32/64 bit processes)
·         Structs have no additional overhead, the amount of memory is the sum of all fields in the struct
·         Structs declared in a class exist on the heap (not the stack!)
·         Structs, by default, when passed to a method are copies of the original struct (a byte-by-byte copy), so any changes made to the struct from inside the method are local to that method

With this in mind, it appears that the original application developer used a struct to store minute-by-minute stock market pricing data, rather than a class.  I can see their thinking: in a 32 bit process the amount of memory required to store an array of 10’s of thousands structs would have been about half that required if a class were used (there is even more of difference in a 64 bit process)

In addition to using structs to store the market data info, various supporting calculations required value equality comparison and were stored in Dictionary based collections, hence a large of calls to Equals and GetHashCode:

public override int GetHashCode()
public override bool Equals(object obj)
 
 
Unfortunately, these structs did not to override the GetHashCode and Equals methods, therefore using .NET’s default values, which resulted in some rather surprising performance issues.

Blittable vs Non-Blittable Types

Performance can be further decreased by the type of fields that the default implementation has to reflect over.  Non-blittable types, are those which do not have the same structure in memory in managed and unmanaged code, and as such cannot be shared directly – reducing performance of the default method implementations.

Commonly used blittable type (there are many more):
  • System.Byte
  • System.SByte
  • System.Int16
  • System.UInt16
  • System.Int32
  • System.UInt32
  • System.Int64
  • System.IntPtr
  • System.UIntPtr

Commonly used non-blittable types:
  • System.Boolean
  • System.Char
  • System.Object
  • System.String



Test Code Generate Results


I’m going to walk through a selection of code snippets to show how much of a performance difference you can see by implementing Equals and GetHashCode.

As part of my profiling samples, I’ve created a base class (TimedActionBase) that, when used in conjunction with NUnit and Resharper, makes it really easy to run snippets of test code with a single mouse click and provide a statistical average for the area under test.  You’ll find a link to the code at the bottom of this article. 

You won’t need Resharper to compile the test code, but you’ll need to run Install-Package NUnit  using the Package Manager Console to download and install NUnit from NuGet.

I’ll just summarise the base test class here, as it’s the actual tests that are more important.   
 
 
Each test class derives from TimedActionBase and implements the TimedAction() method.  This method gets called once initially (unloading any timed JIT overhead) and then is called a set number of times, with a percentile calculation used to provide a meaningful idea of how long the method takes to execute.  I can then run SinglePassRunner() if a want a single timed pass (this calls the test method 100 times), or, in order to build up the data for the graphs MultiPassRunner(): 
 
 
using System;
using System.Collections.Generic;
using System.Diagnostics;
using NUnit.Framework;

[TestFixture]
public abstract class TimedActionBase
{
    private readonly List<TimeSpan> _durations = new List<TimeSpan>();
    private readonly Stopwatch _timer = new Stopwatch();
    private bool _headerLogged;

    protected abstract void TimedAction();

    [Test]
    public void SinglePassRunner()
    {
        TimedActionRunner();
    }

    [Test]
    public void MultiPassRunner()
    {
        const int NoOfPasses = 180;

        for (var i = 0; i < NoOfPasses; i ++)
        {
            TimedActionRunner();
        }
    }

    private void TimedActionRunner()
    {
        const int DefaultTestCycles = 100;

        // Prerun TimedAction method in case there's any JIT overhead
        TimedAction();

        _durations.Clear();

        for (var cycle = 1; cycle <= DefaultTestCycles; cycle++)
        {
            _timer.Start();
            TimedAction();
            _timer.Stop();

            var duration = _timer.Elapsed;
            _durations.Add(duration);

            _timer.Reset();
        }

        CalculateStatistics();
    }

    private void CalculateStatistics()
    {
        var average = _durations.AverageTotalMilliseconds();
        var min = _durations.MinTotalMilliseconds();
        var max = _durations.MaxTotalMilliseconds();
        var deviation = _durations.Deviation();

        const double Percentile = 95D;
        var percentile = _durations.Percentile(Percentile);

        var header = string.Empty;
        if (!_headerLogged)
        {
            _headerLogged = true;
            header = "Timings(ms)     \tPercentile\tAverage\tMinimum\tMaxmimum\tDeviation\n";
        }

        Log(string.Concat(header, string.Join("\t"
            new[]
            {
                GetType().Name,
                FormatTimeSpan(percentile), FormatTimeSpan(average),
                FormatTimeSpan(min), FormatTimeSpan(max),
                FormatTimeSpan(deviation)
            })));
    }

    
    private static string FormatTimeSpan(TimeSpan span)
    {
        return span.TotalMilliseconds.ToString("N0");
    }

    protected static void Log(string format, params object[] args)
    {
        Console.WriteLine(format, args);
    }

}


To compare the performance of different implementations, all of the sample tests below follow the same basic steps:

1.       Define a struct with slightly different implementation under test (eg, with or without Equals)
2.       Create a List containing  50,000 structs
3.       Create a single comparison struct
4.       Walk through the list using the .Equals method and calculate how long that timed operation took

1. Not Overriding Equals() With Non Blittable Fields (Slowest Speed)

Equals Overridden
Non-Blittable Fields
Blittable Fields
Typical Milliseconds
No
3
0
117

class SlowNonBlittableWithoutEqualsTest : TimedActionBase
{
    struct MyStruct
    {
        public Decimal Field1;
        public Decimal Field2;
        public Decimal Field3;
    }

    private readonly List<MyStruct> _items = Enumerable
        .Range(1, EqualsTestParams.NumberOfItems)
        .Select(n => new MyStruct
        {
            Field1 = n + 1.1M,
            Field2 = n + 2.2M,
            Field3 = n + 3.3M
        })
        .ToList();

    private readonly MyStruct _comparison = new MyStruct
    {
        Field1 = 1.1M, Field2 = 2.2M, Field3 = 3.3M
    };

    protected override void TimedAction()
    {
        var allEqual = _items
            .Select(item => item.Equals(_comparison))
            .ToList();
    }
}


2. Not Overriding Equals() With Mixture of Blittable/Non Blittable Fields (Slow)

Equals Overridden
Non-Blittable Fields
Blittable Fields
Typical Milliseconds
No
2
1
90

class SlowBlittableMixtureWithoutEqualsTest : TimedActionBase
{
    struct MyStruct
    {
        public Decimal Field1;
        public int Field2;
        public Decimal Field3;
    }

    private readonly List<MyStruct> _items = Enumerable
        .Range(1, EqualsTestParams.NumberOfItems)
        .Select(n => new MyStruct
        {
            Field1 = n + 1.1M,
            Field2 = n + 2,
            Field3 = n + 3.3M
        })
        .ToList();

    private readonly MyStruct _comparison = new MyStruct
    {
        Field1 = 1.1M, Field2 = 2, Field3 = 3.3M
    };

    protected override void TimedAction()
    {
        var allEqual = _items
            .Select(item => item.Equals(_comparison))
            .ToList();
    }
}


3. Not Overriding Equals() With All Blittable Fields (Medium)

Equals Overridden
Non-Blittable Fields
Blittable Fields
Typical Milliseconds
No
0
3
13

class MediumBlittableWithoutEqualsTest : TimedActionBase
{
    struct MyStruct
    {
        public int Field1;
        public int Field2;
        public int Field3;
    }

    private readonly List<MyStruct> _items = Enumerable
        .Range(1, EqualsTestParams.NumberOfItems)
        .Select(n => new MyStruct
        {
            Field1 = n + 1,
            Field2 = n + 2,
            Field3 = n + 3
        })
        .ToList();

    private readonly MyStruct _comparison = new MyStruct
    {
        Field1 = 1, 
        Field2 = 2, 
        Field3 = 3
    };

    protected override void TimedAction()
    {
        var allEqual = _items
            .Select(item => item.Equals(_comparison))
            .ToList();
    }
}


4. Overriding Equals() With Non-Blittable Fields (Medium)

Equals Overridden
Non-Blittable Fields
Blittable Fields
Typical Milliseconds
Yes
3
0
12

class MediumNonBlittableWithEqualsTest : TimedActionBase
{
    struct MyStruct  : IEquatable<MyStruct>
    {
        public Decimal Field1;
        public Decimal Field2;
        public Decimal Field3;

        public override bool Equals(object obj)
        {
            if (obj == null || (!(obj is MyStruct)))
                return false;

            return Equals((MyStruct)obj);
        }

        public bool Equals(MyStruct other)
        {
            return Field1 == other.Field1 &&
                   Field2 == other.Field2 &&
                   Field3 == other.Field3;
        }
    }

    private readonly List<MyStruct> _items = Enumerable
        .Range(1, EqualsTestParams.NumberOfItems)
        .Select(n => new MyStruct
        {
            Field1 = n + 1.1M, Field2 = n + 2.2M, Field3 = n + 3.3M
        })
        .ToList();

    private readonly MyStruct _comparison = new MyStruct
    {
        Field1 = 1.1M, 
        Field2 = 2.2M, 
        Field3 = 3.3M
    };

    protected override void TimedAction()
    {
        var allEqual = _items
            .Select(item => item.Equals(_comparison))
            .ToList();
    }

}

5. Overriding Equals() With Blittable Fields (Fastest)
Equals Overridden
Non-Blittable Fields
Blittable Fields
Typical Milliseconds
Yes
0
3
6

class FastBlittableWithEqualsTest : TimedActionBase
{
    struct MyStruct : IEquatable<MyStruct>
    {
        public int Field1;
        public int Field2;
        public int Field3;

        public override bool Equals(object obj)
        {
            if (obj == null || (!(obj is MyStruct)))
                return false;

            return Equals((MyStruct)obj);
        }

        public bool Equals(MyStruct other)
        {
            return Field1 == other.Field1 &&
                    Field2 == other.Field2 &&
                    Field3 == other.Field3;
        }
    }

    private readonly List<MyStruct> _items = Enumerable
        .Range(1, EqualsTestParams.NumberOfItems)
        .Select(n => new MyStruct 
        { 
            Field1 = n+1,
            Field2 = n+2, 
            Field3 = n+3 
        })
        .ToList();

    private readonly MyStruct _comparison = new MyStruct
    {
        Field1 = 1, 
        Field2 = 2, 
        Field3 = 3
    };

    protected override void TimedAction()
    {
        var allEqual = _items
            .Select(item => item.Equals(_comparison))
            .ToList();
    }

}


The numbers are very surprising.

You can view similar results for the GetHashCode performance test post.


No comments: