Vaughan Reid's blog

Counting sheep with BenchmarkDotNet

When trying to improve the performance of your code, the first thing you need to do is measure. If you don’t measure, you won’t know whether your changes actually improve performance. In most cases there is a trade-off between readability and performance: better performing code is often more complex and harder to maintain. It’s for this reason that you have to be very careful about making changes without understanding what the real impact will be.

To measure small parts of your code, there is a great .NET package called BenchmarkDotNet. It’s used quite heavily in open-source .NET projects that need to squeeze out every ounce of performance. Two good examples are the ASP.NET Core and gRPC-dotnet projects, where saving a few allocations can make a drastic difference to end users. They even have builds that run benchmarks to make sure new changes don’t negatively affect performance.

This tool is really easy to get started with and gives detailed, accurate results. I would really recommend it if you are looking to micro-benchmark parts of your code.
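
It ships as a NuGet package, so pulling it into a console project is a single command:

	dotnet add package BenchmarkDotNet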

One really important thing to note is that for the majority of use cases and applications you most probably shouldn’t be doing micro-benchmarks at all. Generally your time is better spent making your application maintainable and readable. You can then focus on more high-level benchmarks, which can provide easy wins for your performance problems. As an example, saving a few nanoseconds in a rarely used method on a service that takes a few seconds to respond is a waste of your time. No one will care or notice. Maybe the only person who will notice is the poor person who has to fix your bugs.

There are times, though, when you have a class that is executed often in a time-critical service. In some of those cases there is a real argument for micro-benchmarks, since the gains could improve the business proposition of the application. So please make sure that the performance gains you are looking for will add business value. Please leave your ego out of it.

Talking about business value, I have created a sheep counting service. A string of animals is passed in and it returns the number of sheep. A common way to count the number of occurrences in a string is the Regex.Matches method. This can be achieved in one line and I’m sure it is bug-free. For most applications this is enough.

using System.Text.RegularExpressions;

public class SheepCount
{
	private readonly string _sheep;

	public SheepCount()
	{
		_sheep = "sheep";
	}

	public int TotalSheepRegex(string dream)
	{
		// Count every case-insensitive occurrence of "sheep" in the input.
		return Regex.Matches(dream, _sheep, RegexOptions.IgnoreCase).Count;
	}
}
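
As a quick sanity check, here is how it behaves (a hypothetical usage snippet, not part of the service itself):

	var counter = new SheepCount();
	var total = counter.TotalSheepRegex("Dog,Sheep,cat,sheep"); // total == 2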

But now imagine that this calculation gets called 1 million times a day and that count is growing. How would you improve the memory allocation and speed of this method, and save your company a lot of money in the process?

Well, first you would create some unit tests to confirm the expected behaviour. These will be useful for confirming that the business logic still works once we write a faster method.

using NUnit.Framework;

[TestFixture]
class SheepCountTests
{
	private const string inputDreamLong = "Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger," +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,sheep,Sheep,cat,lion,sheep,tiger";

	private const string inputDreamNone = "Dog";
	private const string inputDreamOne = "Dog,Sheep";

	public SheepCount GetSUT()
	{
		return new SheepCount();
	}

	[TestCase(inputDreamLong, 13)]
	[TestCase(inputDreamOne, 1)]
	[TestCase(inputDreamNone, 0)]
	public void Regex_works(string dream, int expectedCount)
	{
		var sut = GetSUT();

		Assert.AreEqual(expectedCount, sut.TotalSheepRegex(dream));
	}
}

Then let’s write an updated method which, in theory, allocates no additional memory and runs through the string only once. I called the new method TotalSheepStr to compare.

using System.Text.RegularExpressions;

public class SheepCount
{
	private readonly char[] _sheepChar;
	private readonly string _sheep;

	public SheepCount()
	{
		_sheep = "sheep";
		_sheepChar = _sheep.ToCharArray();
	}

	public int TotalSheepRegex(string dream)
	{
		return Regex.Matches(dream, _sheep, RegexOptions.IgnoreCase).Count;
	}

	public int TotalSheepStr(string dream)
	{
		int total = 0;
		int offset = 0;
		int sheepCharMaxIndex = _sheepChar.Length - 1;

		for (int i = 0; i < dream.Length; i++)
		{
			if (ToLower(dream[i]) == _sheepChar[offset])
			{
				if (offset < sheepCharMaxIndex)
				{
					offset++;
				}
				else
				{
					// Matched the last character of "sheep": count it and
					// start scanning for the next occurrence.
					total++;
					offset = 0;
				}
			}
			else
			{
				// Mismatch: restart, but re-test the current character against
				// the first letter so inputs like "ssheep" are still counted.
				offset = ToLower(dream[i]) == _sheepChar[0] ? 1 : 0;
			}
		}

		// Cheap ASCII-only lower-casing: only shift 'A'..'Z'.
		char ToLower(char c)
		{
			if (c >= 'A' && c <= 'Z')
			{
				c = (char)(c + 32);
			}
			return c;
		}

		return total;
	}
}
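
The reason this helps is that the scan never allocates: it walks the characters once and keeps a couple of integers of state, whereas Regex.Matches has to build a MatchCollection with a Match object for every occurrence it finds.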

I’d just like to point out that this is much more than one line, and someone else is most probably going to have to support it at some stage.

To confirm that all this new logic works, let’s update the tests so that the same test cases run against the new method.

using NUnit.Framework;

[TestFixture]
class SheepCountTests
{
	private const string inputDreamLong = "Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger," +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,sheep,Sheep,cat,lion,sheep,tiger";

	private const string inputDreamNone = "Dog";
	private const string inputDreamOne = "Dog,Sheep";

	public SheepCount GetSUT()
	{
		return new SheepCount();
	}

	[TestCase(inputDreamLong, 13)]
	[TestCase(inputDreamOne, 1)]
	[TestCase(inputDreamNone, 0)]
	public void Regex_works(string dream, int expectedCount)
	{
		var sut = GetSUT();

		Assert.AreEqual(expectedCount, sut.TotalSheepRegex(dream));
	}

	[TestCase(inputDreamLong, 13)]
	[TestCase(inputDreamOne, 1)]
	[TestCase(inputDreamNone, 0)]
	public void Str_works(string dream, int expectedCount)
	{
		var sut = GetSUT();

		Assert.AreEqual(expectedCount, sut.TotalSheepStr(dream));
	}
}

Now it’s time to see if all this complex code actually made a difference. I’m creating a benchmark class decorated with the MemoryDiagnoser attribute, which configures BenchmarkDotNet to also measure memory allocations when running each method.

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class SheepBenchmarks
{
	private const string inputDreamLong = "Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger," +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,Dog,Sheep,cat,lion,sheep,tiger" +
		"Dog,Sheep,cat,lion,sheep,tiger,sheep,Sheep,cat,lion,sheep,tiger";

	private const string inputDreamNone = "Dog";

	private SheepCount sheepCount = new SheepCount();

	[ParamsSource(nameof(ValuesForDream))]
	public string Dream { get; set; }

	public IEnumerable<string> ValuesForDream => new[] { inputDreamLong, inputDreamNone };

	// Returning the count stops the JIT from optimising the call away.
	[Benchmark]
	public int StringSheep()
	{
		return sheepCount.TotalSheepStr(Dream);
	}

	[Benchmark]
	public int RegexSheep()
	{
		return sheepCount.TotalSheepRegex(Dream);
	}
}
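A nice touch is to mark one of the methods as the baseline. BenchmarkDotNet then adds a ratio column to the summary, which makes the comparison easier to read at a glance:

	[Benchmark(Baseline = true)]
	public int RegexSheep()
	{
		return sheepCount.TotalSheepRegex(Dream);
	}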

To run it, you just point BenchmarkRunner at the assembly containing all of your benchmarks.

using BenchmarkDotNet.Running;

class Program
{
	static void Main(string[] args)
	{
		var summary = BenchmarkRunner.Run(typeof(Program).Assembly);
	}
}
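If you only want to run a single benchmark class, the generic overload does the same job. Either way, remember to run against a Release build; BenchmarkDotNet will complain if the assembly was compiled in Debug:

	var summary = BenchmarkRunner.Run<SheepBenchmarks>();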

The results are as follows.

[Image: benchmark_sheep, the BenchmarkDotNet summary table of timings and allocations for both methods]

For a few requests this really doesn’t matter, but for my 1 million requests it makes quite a difference. The string version is 13 times faster on the bigger string, which sounds great, but it really only gives you about 7 seconds over the million requests, roughly 7 microseconds per call. For most systems, that isn’t worth the more complex code.

The big difference comes when looking at the allocations and the subsequent collections. Over 1 million requests, the new method saves 6.8 GB of allocations (around 6.8 KB per call), 1,434 Gen 0 collections and 305 Gen 1 collections. That is a lot less work for the garbage collector and more time spent running your code. The application will also run with a smaller memory footprint, which could help minimize what you pay when scaling out the application.