Sunday, May 28, 2017

A Practical Guide to pthread And Pretty Fractals

Threads. Many programs need them, and they're important on multiprocessor systems (which, recently, is basically every system). Sometimes, they can be confusing, so I've taken the most most important bits and I've distilled them down into the basics of how to do threads in a basic C program using POSIX Threads (pthreads, for short). Let's get started!

Before we dive into writing threaded programs, let’s first decide if a program would benefit from threading. This is an important decision to make early on in the design process, as you want to be able to design your program around the benefits threading can provide. The question you have to ask is simple: does your program have to do two things at once? Say you need a responsive interface while at the same time you need to also do some long running operations, respond to network I/O, or whatever will detract from the operation of the front end. That’s a pretty clear indication that you’ll want to use threads. You also might want to use threads if you can easily do tasks in parallel (like in the example below).

Then you have to ask yourself if you need threads or a full-blown forked process. Processes are far more independent and can survive outside the termination of the main process. They have their own memory footprint and have their own process-related resources. They are a bit heavy and are, in my opinion, a bit harder to work with in terms of memory sharing (shared memory pages are not hard, but their a touch more tedious to deal with). I always try to work with threads before I’m forced to work with multiple processes.

So once you’ve decided that threads are the right thing to do, be sure to divide your work up into logical tasks. In this example, I’ll write a program to draw fractals in the Julia Set. I really don’t want to get into the math of it here, but it isn’t super hard to figure out how it works from the code. I’ve adapted this code from a homework assignment I had in an Intro to Parallel Computing class. That implementation used OpenMPI and I was required to parse a bunch of input parameters to determine what kind of fractal to make. I’m stripping it down so far to the point that it’s completely different. I’ll be using the bitmap generation code provided to me in that class. I’ll also be using the complex number code I wrote for that assignment to make my life easier. I won’t go through how that stuff works right now because the important part is learning how threads work, so the only file we’ll look at is main.c. And because I’m not really happy with how Blogger does formatting, I’ll be pasting pictures of syntax highlighted code from Visual Studio Code. If you want to copy and paste code, stick around to the end for the GitHub link.

To make things a lot easier (and especially since there’s not really “communication”), I’ll be using a static task assignment strategy. This means that I’ll know what each thread is doing before the thread is even created. In this case, I’ll break the picture into even blocks of rows. Each thread will process that specific block of rows and then exit. Static allocation’s counterpart – dynamic allocation, who would have guessed? – involves creating a sort of pool of rows such that when a thread is done with a row it can pull the next row needing to be done from the queue. This way, if one thread is slower than the others, it is inherently given less to do.

First we define what we pass to our thread. It’s important to note that we can only pass one thing to our thread, but that one thing can be anything. So we can pass multiple things by making a struct. I always typedef my structs because putting the word “struct” in front of your custom data types is tedious and annoying. I give each thread an ID so it can identify itself and calculate which part of the image it should do, the size of the line chunk (how much it’s supposed to do), the pointers to the shared color memory (more on that later), the number of samples (how large the image is), and numbers involved with the math.

After parsing the arguments and doing some extra setup math, we then create the shared memory for the image. The bitmap code that was given to me takes arrays of doubles (ranging from min to max, in my case 0 to 1) that define what color is at which pixel. Next, we create memory for the stuff we’ll be sending to each thread. The most memory efficient way to do this is to have a pointer to a global struct, but I’m not too worried about memory consumption at this point (especially given how much memory we’ve just allocated for the image). 

Now we get into the actual thread creation. The simplest way to this is to allocate space for pthread’s thread info object (one for each thread). Next, we start each of the threads using pthread_create. This function takes a pointer to a pthread_t struct, an attribute struct (which I’ve never used), the function pointer to the function that will be run in the thread (more on this later), and the void pointer to the argument you want to pass. We’ll look at how to receive all of this in a function later. Next, we make the main thread wait for the completion of the threads by calling pthread_join. This function takes a pthread_t struct (by value, not by reference), and a void pointer to the thread’s return value. I’ve passed NULL because I don’t return anything from my thread. Joining a thread means the caller of pthread_join will wait until the given thread has terminated (that is to say, the function you passed pthread_create has returned). If it’s already finished by the time you call pthread_join, then pthread_join will return immediately.

This is my thread code. The only thing that’s relevant to learning about pthread is the signature of the thread function. It takes one void pointer (to the argument you passed earlier) and returns a void pointer (which is given to pthread_join if you don’t pass it NULL like I have). To make the argument useful, it’s cast to the relevant type. Remember, threads can access the same memory as the main thread (main), so if you pass it a pointer to something you malloc’d in the main thread, then you can read and write to the same memory region from both threads. This is one of the things I like about threads, because you don’t need to do so much with IPC or shared memory pages. It just works.
Now you’re probably wondering how fast threading is. Well, we can run timing tests and find out. Below is the result of creating a 2048x2048 Julia set (-0.4+0.6i). I used the same Ubuntu VM, increasing the number of cores by one each time. This is the benchmark command and the results it produced:

./julia 2048 -0.4 0.6 -2 2 -2 2 output.bmp <threads> 500 2


These results make a little bit of sense. I’m not sure why there’s a bump at three threads, but it happened with any number of processors every time I ran it. So that’s interesting I would say.

Now you can make pretty pictures. That’s awesome! You can play around with how this program works as well as feeding it different values to get different images. There is a limit to how far you can zoom in – a double’s precision is only so high. I’m not sure what happens after you zoom in passed its limit, but it’s worth a shot.

So this is just a very practical example of how pthreads can be used to work wonders on a multicore system. It’s not too hard to do as long as you think about how you want to design your program ahead of time. Depending on the application, you can make your serial program go much faster with parallel processing, made easy by the POSIX Thread Library! Here is the GitHub Repo!


No comments:

Post a Comment