26.8 Condition Variables
A mutex is fine to prevent simultaneous access
to a shared variable, but we need something else to let us go to
sleep waiting for some condition to occur. Let's demonstrate this
with an example. We return to our Web client in Section 26.6
and replace the Solaris thr_join with
pthread_join. But, we cannot call the Pthread function
until we know that a thread has terminated. We first declare a
global variable that counts the number of terminated threads and
protect it with a mutex.
int ndone; /* number of terminated threads */
pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;
We then require that each thread increment this
counter when it terminates, being careful to use the associated
mutex.
void *
do_get_read(void *vptr)
{
    ...
    Pthread_mutex_lock(&ndone_mutex);
    ndone++;
    Pthread_mutex_unlock(&ndone_mutex);
    return(fptr);    /* terminate thread */
}
This is fine, but how do we code the main loop?
It needs to lock the mutex continually and check if any threads
have terminated.
while (nlefttoread > 0) {
    while (nconn < maxnconn && nlefttoconn > 0) {
        /* find a file to read */
        ...
    }
    /* See if one of the threads is done */
    Pthread_mutex_lock(&ndone_mutex);
    if (ndone > 0) {
        for (i = 0; i < nfiles; i++) {
            if (file[i].f_flags & F_DONE) {
                Pthread_join(file[i].f_tid, (void **) &fptr);
                /* update file[i] for terminated thread */
                ...
            }
        }
    }
    Pthread_mutex_unlock(&ndone_mutex);
}
While this is okay, it means the main loop
never goes to sleep; it just
loops, checking ndone every time around the loop. This is
called polling and is considered a
waste of CPU time.
We want a method for the main loop to go to
sleep until one of its threads notifies it that something is ready.
A condition variable, in
conjunction with a mutex, provides this facility. The mutex
provides mutual exclusion and the condition variable provides a
signaling mechanism.
In terms of Pthreads, a condition variable is a
variable of type pthread_cond_t. They are used with the
following two functions:
#include <pthread.h>
int pthread_cond_wait(pthread_cond_t *cptr, pthread_mutex_t *mptr);
int pthread_cond_signal(pthread_cond_t *cptr);
Both return: 0 if OK, positive Exxx value on error
The term "signal" in the second function's name
does not refer to a Unix SIGxxx signal.
An example is the easiest way to explain these
functions. Returning to our Web client example, the counter
ndone is now associated with both a condition variable and
a mutex.
int ndone;
pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t ndone_cond = PTHREAD_COND_INITIALIZER;
A thread notifies the main loop that it is
terminating by incrementing the counter while its mutex lock is
held and by signaling the condition variable.
Pthread_mutex_lock(&ndone_mutex);
ndone++;
Pthread_cond_signal(&ndone_cond);
Pthread_mutex_unlock(&ndone_mutex);
The main loop then blocks in a call to
pthread_cond_wait, waiting to be signaled by a terminating
thread.
while (nlefttoread > 0) {
    while (nconn < maxnconn && nlefttoconn > 0) {
        /* find file to read */
        ...
    }
    /* Wait for thread to terminate */
    Pthread_mutex_lock(&ndone_mutex);
    while (ndone == 0)
        Pthread_cond_wait(&ndone_cond, &ndone_mutex);
    for (i = 0; i < nfiles; i++) {
        if (file[i].f_flags & F_DONE) {
            Pthread_join(file[i].f_tid, (void **) &fptr);
            /* update file[i] for terminated thread */
            ...
        }
    }
    Pthread_mutex_unlock(&ndone_mutex);
}
Notice that the variable ndone is still
checked only while the mutex is held. Then, if there is nothing to
do, pthread_cond_wait is called. This puts the calling
thread to sleep and releases the
mutex lock it holds. Furthermore, when the thread returns from
pthread_cond_wait (after some other thread has signaled
it), the thread again holds the mutex.
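The pattern just described can be tried in isolation. The following is a minimal sketch, not the Web client itself: the names worker, run_workers, and NWORKERS are hypothetical, the plain pthread_ functions are called directly instead of our wrapper functions, and the main thread simply consumes the counter instead of scanning a file array.

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4    /* hypothetical thread count for this sketch */

static int ndone;     /* terminated workers not yet noticed by main */
static pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ndone_cond  = PTHREAD_COND_INITIALIZER;

/* Worker: announce termination while holding the mutex, then signal. */
static void *
worker(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&ndone_mutex);
    ndone++;
    pthread_cond_signal(&ndone_cond);
    pthread_mutex_unlock(&ndone_mutex);
    return NULL;
}

/* Start NWORKERS threads, sleep until every one has announced itself,
 * and return the number of terminations observed (-1 on error). */
static int
run_workers(void)
{
    pthread_t tid[NWORKERS];
    int i, nleft = NWORKERS, nseen = 0;

    for (i = 0; i < NWORKERS; i++)
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            return -1;

    while (nleft > 0) {
        pthread_mutex_lock(&ndone_mutex);
        while (ndone == 0)      /* re-test the condition after every wakeup */
            pthread_cond_wait(&ndone_cond, &ndone_mutex);
        nseen += ndone;         /* the mutex is held again at this point */
        nleft -= ndone;
        ndone = 0;
        pthread_mutex_unlock(&ndone_mutex);
    }

    for (i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return nseen;
}
```

Several workers may terminate between two wakeups of the main thread, which is why the sketch consumes the whole counter each time instead of assuming one signal per wakeup.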
Why is a mutex always associated with a
condition variable? The "condition" is normally the value of some
variable that is shared between the threads. The mutex is required
to allow this variable to be set and tested by the different
threads. For example, if we did not have the mutex in the example
code just shown, the main loop would test it as follows:
/* Wait for thread to terminate */
while (ndone == 0)
    Pthread_cond_wait(&ndone_cond, &ndone_mutex);
But, there is a possibility that the last of the
threads increments ndone after the test of ndone ==
0, but before the call to pthread_cond_wait. If this
happens, this last "signal" is lost and the main loop would block
forever, waiting for something that will never occur again.
This is the same reason that
pthread_cond_wait must be called with the associated mutex
locked, and why this function unlocks the mutex and puts the
calling thread to sleep as a single, atomic operation. If this
function did not unlock the mutex and then lock it again when it
returns, the thread would have to unlock and lock the mutex and the
code would look like the following:
/* Wait for thread to terminate */
Pthread_mutex_lock(&ndone_mutex);
while (ndone == 0) {
    Pthread_mutex_unlock(&ndone_mutex);
    Pthread_cond_wait(&ndone_cond, &ndone_mutex);
    Pthread_mutex_lock(&ndone_mutex);
}
But again, there is a possibility that the final
thread could terminate and increment the value of ndone
between the call to pthread_mutex_unlock and
pthread_cond_wait.
Normally, pthread_cond_signal awakens
one thread that is waiting on the condition variable. There are
instances when a thread knows that multiple threads should be
awakened, in which case, pthread_cond_broadcast will wake
up all threads that are blocked on
the condition variable.
#include <pthread.h>
int pthread_cond_broadcast(pthread_cond_t *cptr);
int pthread_cond_timedwait(pthread_cond_t *cptr, pthread_mutex_t *mptr, const struct timespec *abstime);
Both return: 0 if OK, positive Exxx value on error
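As an illustration of pthread_cond_broadcast, the sketch below blocks several threads on one condition and releases them together. It is only an assumed scenario for demonstration: the flag go, the counter nawake, and the functions waiter and release_all are hypothetical names that do not appear in the Web client.

```c
#include <assert.h>
#include <pthread.h>

#define NWAITERS 3    /* hypothetical number of waiting threads */

static int go;        /* the shared condition: nonzero means "proceed" */
static int nawake;    /* waiters that have seen go != 0 */
static pthread_mutex_t go_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  go_cond  = PTHREAD_COND_INITIALIZER;

/* Waiter: sleep until go becomes nonzero, then count itself as awake. */
static void *
waiter(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&go_mutex);
    while (go == 0)
        pthread_cond_wait(&go_cond, &go_mutex);
    nawake++;
    pthread_mutex_unlock(&go_mutex);
    return NULL;
}

/* Start NWAITERS threads, release all of them with one broadcast,
 * and return how many woke up (-1 on error). */
static int
release_all(void)
{
    pthread_t tid[NWAITERS];
    int i;

    go = 0;           /* no other thread is running yet */
    nawake = 0;
    for (i = 0; i < NWAITERS; i++)
        if (pthread_create(&tid[i], NULL, waiter, NULL) != 0)
            return -1;

    pthread_mutex_lock(&go_mutex);
    go = 1;
    pthread_cond_broadcast(&go_cond);   /* wake every blocked waiter */
    pthread_mutex_unlock(&go_mutex);

    for (i = 0; i < NWAITERS; i++)
        pthread_join(tid[i], NULL);
    return nawake;
}
```

A waiter that has not yet reached pthread_cond_wait when the broadcast happens still terminates, because it tests go under the mutex before deciding to sleep.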
pthread_cond_timedwait lets a thread
place a limit on how long it will block. abstime is a timespec structure (as
we defined with the pselect function, Section 6.9)
that specifies the system time when the function must return, even
if the condition variable has not been signaled yet. If this
timeout occurs, ETIMEDOUT is returned.
This time value is an absolute time; it is not a time delta. That is, abstime is the system time (the number of
seconds and nanoseconds past January 1, 1970, UTC) when the function
should return. This differs from both select and
pselect, which specify the number of seconds and
microseconds (nanoseconds for pselect) until some time in
the future when the function should return. The normal procedure is
to call gettimeofday to obtain the current time (as a
timeval structure!), and copy this into a
timespec structure, adding in the desired time limit. For
example,
struct timeval tv;
struct timespec ts;
if (gettimeofday(&tv, NULL) < 0)
    err_sys("gettimeofday error");
ts.tv_sec = tv.tv_sec + 5;         /* 5 seconds in future */
ts.tv_nsec = tv.tv_usec * 1000;    /* microsec to nanosec */
pthread_cond_timedwait( ..., &ts);
The advantage of using an absolute time instead
of a delta time is that if the function returns prematurely (perhaps
because of a caught signal), it can be called again
without having to change the contents of the timespec
structure. The disadvantage, however, is having to call
gettimeofday before the function can be called the first
time.
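The following sketch shows how one absolute deadline can be reused across repeated calls. It is an assumed example: the flag ready and the function wait_ready are hypothetical, and the error value tested for the timeout is ETIMEDOUT, which is what POSIX specifies for pthread_cond_timedwait.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/time.h>
#include <time.h>

static int ready;     /* hypothetical shared condition */
static pthread_mutex_t ready_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond  = PTHREAD_COND_INITIALIZER;

/* Wait until ready is nonzero or until secs seconds have elapsed.
 * Returns 0 if the condition became true, otherwise an Exxx value
 * (ETIMEDOUT when the deadline passed first). */
static int
wait_ready(int secs)
{
    struct timeval tv;
    struct timespec ts;
    int rc = 0;

    if (gettimeofday(&tv, NULL) < 0)
        return errno;
    ts.tv_sec = tv.tv_sec + secs;      /* deadline, computed only once */
    ts.tv_nsec = tv.tv_usec * 1000;    /* microsec to nanosec */

    pthread_mutex_lock(&ready_mutex);
    while (ready == 0 && rc == 0)      /* same ts on every iteration */
        rc = pthread_cond_timedwait(&ready_cond, &ready_mutex, &ts);
    if (ready != 0)
        rc = 0;                        /* condition true: report success */
    pthread_mutex_unlock(&ready_mutex);
    return rc;
}
```

Because ts holds an absolute time, an early wakeup simply re-enters the call with the same structure, and the total wait never exceeds the original limit.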
The POSIX specification defines a
clock_gettime function that returns the current time as a
timespec structure.
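Where clock_gettime is available, the conversion from a timeval can be skipped. The helper below is a small sketch under that assumption: the name deadline_after is hypothetical, and CLOCK_REALTIME is chosen because it is the default clock against which pthread_cond_timedwait measures abstime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Fill *ts with the absolute time secs seconds from now.
 * Returns 0 if OK, -1 on error (errno is set by clock_gettime). */
static int
deadline_after(struct timespec *ts, time_t secs)
{
    if (clock_gettime(CLOCK_REALTIME, ts) < 0)
        return -1;
    ts->tv_sec += secs;    /* tv_nsec is already in nanoseconds */
    return 0;
}
```

The resulting structure can be passed directly as the abstime argument of pthread_cond_timedwait.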