30.11 TCP Prethreaded Server, per-Thread accept
We found earlier in this chapter that it is
faster to prefork a pool of children than to create one child for
every client. On a system that supports threads, it is reasonable
to expect a similar speedup by creating a pool of threads when the
server starts, instead of creating a new thread for every client.
The basic design of this server is to create a pool of threads and
then let each thread call accept. Instead of having each
thread block in the call to accept, we will use a mutex
lock (similar to Section 30.8) that
allows only one thread at a time to call accept. There is
no reason to use file locking to protect the call to
accept from all the threads, because with multiple threads
in a single process, we know that a mutex lock can be used.
Figure
30.27 shows the pthread07.h header that defines a
Thread structure that maintains some information about
each thread.
Figure 30.27
pthread07.h header.
server/pthread07.h
1 typedef struct {
2     pthread_t   thread_tid;     /* thread ID */
3     long        thread_count;   /* # connections handled */
4 } Thread;

5 Thread          *tptr;          /* array of Thread structures; calloc'ed */
6 int             listenfd, nthreads;
7 socklen_t       addrlen;
8 pthread_mutex_t mlock;
We also declare a few globals, such as the
listening socket descriptor and a mutex variable that all the
threads need to share.
Figure
30.28 shows the main function.
Figure 30.28
main function for prethreaded TCP server.
server/serv07.c
 1 #include    "unpthread.h"
 2 #include    "pthread07.h"

 3 pthread_mutex_t mlock = PTHREAD_MUTEX_INITIALIZER;

 4 int
 5 main(int argc, char **argv)
 6 {
 7     int     i;
 8     void    sig_int(int), thread_make(int);

 9     if (argc == 3)
10         listenfd = Tcp_listen(NULL, argv[1], &addrlen);
11     else if (argc == 4)
12         listenfd = Tcp_listen(argv[1], argv[2], &addrlen);
13     else
14         err_quit("usage: serv07 [ <host> ] <port#> <#threads>");

15     nthreads = atoi(argv[argc - 1]);
16     tptr = Calloc(nthreads, sizeof(Thread));

17     for (i = 0; i < nthreads; i++)
18         thread_make(i);         /* only main thread returns */

19     Signal(SIGINT, sig_int);

20     for ( ; ; )
21         pause();                /* everything done by threads */
22 }
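The command-line handling in main follows the usage string on line 14: an optional host, a port, and the number of threads to create. As an illustrative invocation (the port and thread count here are arbitrary, not values from the text),

    % serv07 8888 15

starts the server with a pool of 15 threads accepting connections on port 8888.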
The thread_make and
thread_main functions are shown in Figure 30.29.
Figure 30.29
thread_make and thread_main functions.
server/pthread07.c
 1 #include    "unpthread.h"
 2 #include    "pthread07.h"

 3 void
 4 thread_make(int i)
 5 {
 6     void   *thread_main(void *);

 7     Pthread_create(&tptr[i].thread_tid, NULL, &thread_main, (void *) i);
 8     return;                     /* main thread returns */
 9 }

10 void *
11 thread_main(void *arg)
12 {
13     int     connfd;
14     void    web_child(int);
15     socklen_t clilen;
16     struct sockaddr *cliaddr;

17     cliaddr = Malloc(addrlen);
18     printf("thread %d starting\n", (int) arg);

19     for ( ; ; ) {
20         clilen = addrlen;
21         Pthread_mutex_lock(&mlock);
22         connfd = Accept(listenfd, cliaddr, &clilen);
23         Pthread_mutex_unlock(&mlock);

24         tptr[(int) arg].thread_count++;
25         web_child(connfd);      /* process request */
26         Close(connfd);
27     }
28 }
Create thread
7 Each
thread is created and executes the thread_main function.
The only argument is the index number of the thread.
21–23
The thread_main function calls the functions
pthread_mutex_lock and pthread_mutex_unlock
around the call to accept.
Comparing rows 6 and 7 in Figure 30.1, we
see that this latest version of our server is faster than the
create-one-thread-per-client version. We expect this, since we
create the pool of threads only once, when the server starts,
instead of creating one thread per client. Indeed, this version of
our server is the fastest on these two hosts.
Figure 30.2 shows the
distribution of the thread_count counters in the
Thread structure, which we print in the SIGINT
handler when the server is terminated. The uniformity of this
distribution is caused by the thread scheduling algorithm that
appears to cycle through all the threads in order when choosing
which thread receives the mutex lock.
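The sig_int handler that prints these counters is not shown in this section. Here is a minimal sketch of what such a handler might look like (not the book's listing), assuming the tptr and nthreads globals from Figure 30.27 and the pr_cpu_time function used by the other servers in this chapter:

    #include    "unpthread.h"
    #include    "pthread07.h"

    void
    sig_int(int signo)
    {
        int     i;
        void    pr_cpu_time(void);

        pr_cpu_time();          /* print user and system CPU times */

        /* print the number of connections handled by each thread */
        for (i = 0; i < nthreads; i++)
            printf("thread %d, %ld connections\n", i, tptr[i].thread_count);

        exit(0);
    }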
On a Berkeley-derived kernel, we do not need any
locking around the call to accept and can make a version
of Figure 30.29 without
any mutex locking and unlocking. Doing so, however, increases the
process control CPU time. If we look at the two components of the
CPU time, the user time and the system time, we see that without any
locking, the user time decreases (because the locking is done in the
threads library, which executes in user space), but the system time
increases (the kernel's thundering herd problem: all the threads
blocked in accept are awakened each time a connection arrives). Since some
form of mutual exclusion is required to return each connection to a
single thread, it is faster for the threads to do this themselves
than for the kernel.
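For reference, here is a sketch (not a listing from the book) of how the accept loop in thread_main would look with the mutex removed, the Berkeley-derived-kernel variant just described:

    for ( ; ; ) {
        clilen = addrlen;
        connfd = Accept(listenfd, cliaddr, &clilen);    /* every thread blocks here; the
                                                           kernel wakes them all for each
                                                           new connection (thundering herd)
                                                           but returns it to only one */
        tptr[(int) arg].thread_count++;
        web_child(connfd);      /* process request */
        Close(connfd);
    }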