Before we dive into writing threaded programs, let’s first
decide if a program would benefit from threading. This is an important decision
to make early on in the design process, as you want to be able to design your
program around the benefits threading can provide. The question you have to ask
is simple: does your program have to do two things at once? Say you need a
responsive interface while at the same time you need to also do some long
running operations, respond to network I/O, or whatever will detract from the
operation of the front end. That’s a pretty clear indication that you’ll want
to use threads. You also might want to use threads if you can easily do tasks
in parallel (like in the example below).
Then you have to ask yourself if you need threads or a
full-blown forked process. Processes are far more independent and can survive
outside the termination of the main process. They have their own memory
footprint and have their own process-related resources. They are a bit heavy
and are, in my opinion, a bit harder to work with in terms of memory sharing
(shared memory pages are not hard, but their a touch more tedious to deal
with). I always try to work with threads before I’m forced to work with
multiple processes.
So once you’ve decided that threads are the right thing to
do, be sure to divide your work up into logical tasks. In this example, I’ll
write a program to draw fractals in the Julia Set. I really don’t want to get
into the math of it here, but it isn’t super hard to figure out how it works
from the code. I’ve adapted this code from a homework assignment I had in an
Intro to Parallel Computing class. That implementation used OpenMPI and I was
required to parse a bunch of input parameters to determine what kind of fractal
to make. I’m stripping it down so far to the point that it’s completely
different. I’ll be using the bitmap generation code provided to me in that
class. I’ll also be using the complex number code I wrote for that assignment
to make my life easier. I won’t go through how that stuff works right now
because the important part is learning how threads work, so the only file we’ll
look at is main.c. And because I’m not really happy with how Blogger does
formatting, I’ll be pasting pictures of syntax highlighted code from Visual
Studio Code. If you want to copy and paste code, stick around to the end for
the GitHub link.
To make things a lot easier (and especially since there’s
not really “communication”), I’ll be using a static task assignment strategy. This
means that I’ll know what each thread is doing before the thread is even
created. In this case, I’ll break the picture into even blocks of rows. Each
thread will process that specific block of rows and then exit. Static
allocation’s counterpart – dynamic allocation, who would have guessed? –
involves creating a sort of pool of rows such that when a thread is done with a
row it can pull the next row needing to be done from the queue. This way, if one
thread is slower than the others, it is inherently given less to do.
First we define what we pass to our thread. It’s important
to note that we can only pass one thing to our thread, but that one thing can
be anything. So we can pass multiple things by making a struct. I always
typedef my structs because putting the word “struct” in front of your custom
data types is tedious and annoying. I give each thread an ID so it can identify
itself and calculate which part of the image it should do, the size of the line
chunk (how much it’s supposed to do), the pointers to the shared color memory
(more on that later), the number of samples (how large the image is), and
numbers involved with the math.
After parsing the arguments and doing some extra setup math,
we then create the shared memory for the image. The bitmap code that was given
to me takes arrays of doubles (ranging from min to max, in my case 0 to 1) that
define what color is at which pixel. Next, we create memory for the stuff we’ll
be sending to each thread. The most memory efficient way to do this is to have
a pointer to a global struct, but I’m not too worried about memory consumption
at this point (especially given how much memory we’ve just allocated for the
image).
Now we get into the actual thread creation. The simplest way
to this is to allocate space for pthread’s thread info object (one for each
thread). Next, we start each of the threads using pthread_create. This function
takes a pointer to a pthread_t struct, an attribute struct (which I’ve never
used), the function pointer to the function that will be run in the thread
(more on this later), and the void pointer to the argument you want to pass.
We’ll look at how to receive all of this in a function later. Next, we make the
main thread wait for the completion of the threads by calling pthread_join. This
function takes a pthread_t struct (by value, not by reference), and a void
pointer to the thread’s return value. I’ve passed NULL because I don’t return
anything from my thread. Joining a thread means the caller of pthread_join will
wait until the given thread has terminated (that is to say, the function you
passed pthread_create has returned). If it’s already finished by the time you
call pthread_join, then pthread_join will return immediately.
This is my thread code. The only thing that’s relevant to
learning about pthread is the signature of the thread function. It takes one
void pointer (to the argument you passed earlier) and returns a void pointer
(which is given to pthread_join if you don’t pass it NULL like I have). To make
the argument useful, it’s cast to the relevant type. Remember, threads can
access the same memory as the main thread (main), so if you pass it a pointer
to something you malloc’d in the main thread, then you can read and write to
the same memory region from both threads. This is one of the things I like
about threads, because you don’t need to do so much with IPC or shared memory
pages. It just works.
Now you’re probably wondering how fast threading is. Well,
we can run timing tests and find out. Below is the result of creating a
2048x2048 Julia set (-0.4+0.6i). I used the same Ubuntu VM, increasing the
number of cores by one each time. This is the benchmark command and the results
it produced:
./julia 2048 -0.4 0.6 -2 2 -2 2 output.bmp <threads>
500 2
These results make a little bit of sense. I’m not sure why
there’s a bump at three threads, but it happened with any number of processors
every time I ran it. So that’s interesting I would say.
Now you can make pretty pictures. That’s awesome! You can play
around with how this program works as well as feeding it different values to
get different images. There is a limit to how far you can zoom in – a double’s
precision is only so high. I’m not sure what happens after you zoom in passed
its limit, but it’s worth a shot.
So this is just a very practical example of how pthreads can
be used to work wonders on a multicore system. It’s not too hard to do as long
as you think about how you want to design your program ahead of time. Depending
on the application, you can make your serial program go much faster with
parallel processing, made easy by the POSIX Thread Library! Here is the GitHub Repo!
No comments:
Post a Comment