TPL Dataflow vs. Go Channels

Asynchrony with C# and Go

Issue: Volume 10

Rainer Stropek

How can we exchange messages asynchronously between processing steps? C# and Go take different approaches to this. That is a good reason to think outside the box and take a closer look at the Go programming language. But before we start, let's first take a look at the TPL Dataflow Library.

These days, concurrent programming is the rule, not the exception. Data is read or received, processed asynchronously within the process, and the results are output or sent to a recipient. Of course, communication between the threads of a process can be implemented with shared in-memory buffers and synchronization objects such as locks or semaphores. However, this programming model is error-prone and can often lead to inefficient algorithms.

Most programming platforms I know offer alternative mechanisms, at the language or framework level, for exchanging messages asynchronously between processing steps. In this article, I compare the approaches of C# and Go, focusing on the specifics of Go. Exchanging messages between goroutines via channels is characteristic of the Go language. This article intends to give C# developers a look at the bigger picture, and perhaps it will even make them want to try out Go in a project.
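For readers who have never seen Go, the following minimal sketch shows what exchanging messages between goroutines via a channel looks like. The function name exchange is made up for this illustration. A producer goroutine sends messages and then closes the channel; the receiving side ranges over the channel until it has been closed and drained.

```go
package main

import "fmt"

// exchange (hypothetical helper) starts a producer goroutine that sends
// n messages over a channel and collects them on the calling goroutine.
func exchange(n int) []string {
	// Unbuffered channel: each send blocks until the receiver is ready
	messages := make(chan string)

	// Producer goroutine: sends n messages, then closes the channel
	go func() {
		for i := 1; i <= n; i++ {
			messages <- fmt.Sprintf("message %d", i)
		}
		close(messages) // tells the receiver that no more messages will come
	}()

	// Consumer: range keeps receiving until the channel is closed
	var received []string
	for msg := range messages {
		received = append(received, msg)
	}
	return received
}

func main() {
	for _, msg := range exchange(3) {
		fmt.Println("Received", msg)
	}
}
```

No locks or shared buffers appear in the code; the channel itself does the synchronization between the two goroutines.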

TPL Dataflow Library

Before we get to Go and its channels, I'd like to briefly remind you of what C# and .NET have built in for asynchronous message exchange within a process: the TPL (Task Parallel Library) Dataflow Library [1]. It is based on the following principles:

Source blocks (ISourceBlock<TOutput>) are data sources. You can read messages from them.
Target blocks (ITargetBlock<TInput>) receive messages. You can write messages to them.
Propagator blocks (IPropagatorBlock<TIn, TOut>) are source and target blocks. They process data in some form (projecting, filtering, grouping, etc.).

The blocks are flexible and can be combined into pipelines. .NET comes with many predefined blocks (namespace System.Threading.Tasks.Dataflow) that you can combine as your application requires.
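Go has no dataflow blocks, but its directional channel types play loosely comparable roles. The following sketch assumes that mapping: a receive-only channel (<-chan T) acts roughly like a source block, a send-only channel (chan<- T) like a target block, and a function that reads from one channel and returns another like a propagator. The helpers produce, square and collect are hypothetical names chosen for this example.

```go
package main

import "fmt"

// Assumed analogy, not a TPL API:
//   <-chan T  (receive-only) ~ source block: you can only read from it
//   chan<- T  (send-only)    ~ target block: you can only write to it

// produce writes values into a send-only channel ("target") and closes it.
func produce(out chan<- int, values ...int) {
	for _, v := range values {
		out <- v
	}
	close(out)
}

// square acts like a propagator: it reads from a source and returns
// a new source that carries the transformed values.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out) // propagate "completion" downstream
		for v := range in {
			out <- v * v
		}
	}()
	return out
}

// collect drains a source channel into a slice.
func collect(in <-chan int) []int {
	var result []int
	for v := range in {
		result = append(result, v)
	}
	return result
}

func main() {
	numbers := make(chan int)
	go produce(numbers, 1, 2, 3)
	fmt.Println(collect(square(numbers))) // [1 4 9]
}
```

Chaining square(numbers) into collect resembles linking blocks with LinkTo, except that the wiring is plain function composition and closing a channel takes the place of completing a block.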

Listing 1 shows an example of data processing with the TPL Dataflow Library. A producer generates data. In practice, this data would be loaded from files or databases or received over the network; in the example, random bytes are written to a BufferBlock. Depending on the application, a BufferBlock can serve as a source, propagator, or target block. The TransformBlock processes the data and passes the result on to the next step of the pipeline. In the example, it calculates the mean value of the bytes in an array; in practice, arbitrarily complex processing logic can run here. A special feature of the TPL Dataflow Library is that it takes care of parallelizing message processing for you if the algorithm allows it. In the example, MaxDegreeOfParallelism is set to 5, so up to five transformations can run in parallel, which can speed up processing accordingly. The consumer is the last step of the pipeline: it awaits incoming messages and processes them.

Listing 1: TPL Dataflow Pipeline

using System;
using static System.Console;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
 
// Create a buffer (=target block) into which we can write messages
var buffer = new BufferBlock<byte[]>();
 
// Create a transformer (=source and target block) that processes data.
// For demo purposes, we specify a degree of parallelism. This allows
// .NET to run multiple transformers concurrently.
var transform = new TransformBlock<byte[], double>(Transform, new() { MaxDegreeOfParallelism = 5 });
 
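// Link buffer with transformer. PropagateCompletion forwards the buffer's
// completion to the transformer once all buffered messages have been passed on.
buffer.LinkTo(transform, new DataflowLinkOptions { PropagateCompletion = true });

// Start asynchronous consumer
var consumerTask = ConsumeAsync(transform);

// Start producer. When it is done, Produce marks the buffer as completed.
// Thanks to PropagateCompletion, the transformer then completes after it
// has processed all buffered messages, which in turn stops the consumer.
Produce(buffer);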
 
// Wait for consumer to finish and print number of processed messages
var messagesProcessed = await consumerTask;
WriteLine($"Processed {messagesProcessed} messages.");
 
/// <summary>
/// Produces values and writes them into <c>target</c>
/// </summary>
static void Produce(ITargetBlock<byte[]> target)
{
  // Here we generate random bytes. In practice, the producer
  // would e.g. read data from disk, receive data over the network,
  // get data from a database, etc.
  var rand = new Random();
  for (int i = 0; i < 100; ++i)
  {
  var message = new byte[1024];
  rand.NextBytes(message);

  // Send message into target block
  WriteLine("Sending message");
  target.Post(message);
  }
 
  // Mark as completed
  target.Complete();
}
 
/// <summary>
/// Transforms incoming message (byte array -> average value)
/// </summary>
static double Transform(byte[] bytes)
{
  // For debug purposes, we print the thread id. If you run the program,
  // you will see that transformers run in parallel on multiple threads.
  WriteLine($"Transforming message on thread {Thread.CurrentThread.ManagedThreadId}");
  return bytes.Average(val => (double)val);
}
 
/// <summary>
/// Consumes message
/// </summary>
static async Task<int> ConsumeAsync(ISourceBlock<double> source)
{
  var messagesProcessed = 0;
 
  // Await incoming message
  while (await source.OutputAvailableAsync())
  {
    // Receive message
    var average = await source.ReceiveAsync();
 
    // Process message. In this demo we are just printing its content.
    WriteLine($"Consumed message, average value is {average}");
    messagesProcessed++;
  }
 
  return messagesProcessed;
}
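For comparison, the pipeline from Listing 1 might look roughly like this in Go. This is a sketch under a few assumptions: a buffered channel stands in for the BufferBlock (its capacity of 10 is arbitrary), a fixed pool of five goroutines plays the role of MaxDegreeOfParallelism = 5, and closing channels replaces calling Complete(). The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// produce sends count random 1024-byte messages, then closes out.
// Closing the channel plays the role of Complete() in Listing 1.
func produce(out chan<- []byte, count int) {
	for i := 0; i < count; i++ {
		msg := make([]byte, 1024)
		for j := range msg {
			msg[j] = byte(rand.Intn(256))
		}
		out <- msg
	}
	close(out)
}

// average computes the mean value of the bytes in a message.
func average(msg []byte) float64 {
	sum := 0
	for _, b := range msg {
		sum += int(b)
	}
	return float64(sum) / float64(len(msg))
}

// transform starts `workers` goroutines (the hand-written counterpart of
// MaxDegreeOfParallelism) that turn messages into average values.
func transform(in <-chan []byte, workers int) <-chan float64 {
	out := make(chan float64)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for msg := range in {
				out <- average(msg)
			}
		}()
	}
	// Close out only after every worker has finished, so the consumer's
	// range loop ends once all results have been delivered.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// consume prints each result and returns the number of processed messages.
func consume(in <-chan float64) int {
	processed := 0
	for avg := range in {
		fmt.Printf("Consumed message, average value is %.2f\n", avg)
		processed++
	}
	return processed
}

// runPipeline wires producer, transformer pool, and consumer together.
func runPipeline(count, workers int) int {
	buffer := make(chan []byte, 10) // capacity 10 is an arbitrary choice
	go produce(buffer, count)
	return consume(transform(buffer, workers))
}

func main() {
	fmt.Printf("Processed %d messages.\n", runPipeline(100, 5))
}
```

Two differences from Listing 1 stand out: the worker pool and the sync.WaitGroup that closes the output channel are written by hand instead of being provided by a library, and results may arrive in a different order than the input, whereas a TransformBlock preserves message order by default.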

The TPL Dataflow Library is well-suited for CPU- or I/O-intensive algorithms where a large amount of data needs to be processed, ideally in multiple threads running in parallel. As a developer, you don’t need to bother with shared memory, thread synchronization, and manual task scheduling. This is automatically done in the background by the library.

Performance: Go

Before we get into Go channels, let me say a few introductory words about the language for those who haven't tried it yet. Go was created at Google. The basic idea behind it is a programming language that compiles quickly, is easy to use, and generates performant code. In contrast to C#, Go is compiled directly to machine code; there is no intermediate language. Like C#, however, it is still a managed language with a garbage collector.

When you switch from C# to Go, the most striking difference you will notice is the language's simplicity. Go has only a fraction of the keywords that C# has. The increment operator (++) is one example of this simplicity: in Go, it forms a statement, not an expression, so something like if (++x > 5) does not exist in Go. The Go development team is very reluctant to add language features that only exist to make code shorter. If developers have to write a few extra lines of code, that isn't seen as a disadvantage; in return, the language stays simple and lean. Another example that illustrates the Go philosophy is generics....
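A small sketch of what this means in practice: the C# condition if (++x > 5) has to be split into an increment statement followed by the comparison. The function name incrementAndCheck is hypothetical. Go also has no prefix form, so ++x does not compile at all.

```go
package main

import "fmt"

// incrementAndCheck mirrors the C# expression `++x > 5`.
// In Go, x++ is a statement (and there is no prefix ++x), so
// `if x++ > 5` or `y := x++` would not compile. The increment
// happens first; the comparison follows as a separate step.
func incrementAndCheck(x int) (int, bool) {
	x++
	return x, x > 5
}

func main() {
	x := 5
	// The increment statement can also go into the if statement's init clause.
	if x++; x > 5 {
		fmt.Println("x exceeds 5:", x)
	}

	y, exceeded := incrementAndCheck(1)
	fmt.Println(y, exceeded)
}
```

The result is one extra line compared to C#, which is exactly the kind of trade-off the Go team accepts in exchange for a smaller language.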
