16.5 Nonblocking
connect: Web Client
A real-world example of nonblocking
connects started with the Netscape Web client (Section
13.4 of TCPv3). The client establishes an HTTP connection with a
Web server and fetches a home page. Often, that page will have
numerous references to other Web pages. Instead of fetching these
other pages serially, one at a time, the client can fetch more than
one at the same time using nonblocking connects. Figure 16.12 shows an example of
establishing multiple connections in parallel. The leftmost
scenario shows all three connections performed serially. We assume
that the first connection takes 10 units of time, the second 15,
and the third 4, for a total of 29 units of time.
In the middle scenario, we perform two
connections in parallel. At time 0, the first two connections are
started, and when the first of these finishes, we start the third.
The total time is almost halved, from 29 to 15, but realize that
this is the ideal case. If the parallel connections are sharing a
common link (say the client is behind a dialup modem link to the
Internet), each can compete against each other for the limited
resources and all the individual connection times might get longer.
For example, the time of 10 might be 15, the time of 15 might be
20, and the time of 4 might be 6. Nevertheless, the total time
would be 21, still shorter than the serial scenario.
In the third scenario, we perform three
connections in parallel, and we again assume there is no
interference between the three connections (the ideal case). But,
the total time is the same (15 units) as the second scenario given
the example times that we choose.
When dealing with Web clients, the first
connection is done by itself, followed by multiple connections for
the references found in the data from that first connection. We
show this in Figure
16.13.
To further optimize this sequence, the client
can start parsing the data that is returned for the first
connection before the first connection completes and initiate
additional connections as soon as it knows that additional
connections are needed.
Since we are doing multiple nonblocking
connects at the same time, we cannot use our
connect_nonb function from Figure 16.11 because
it does not return until the connection is established. Instead, we
must keep track of multiple connections ourself.
Our program will read up to 20 files from a Web
server. We specify as command-line arguments the maximum number of
parallel connections, the server's hostname, and each of the
filenames to fetch from the server. A typical execution of our
program is
solaris % web 3 www.foobar.com / image1.gif image2.gif \
image3.gif image4.gif image5.gif \
image6.gif image7.gif
The command-line arguments specify three
simultaneous connections: the server's hostname, the filename for
the home page (/, the server's root page), and seven files
to then read (which in this example are all GIF images). These
seven files would normally be referenced on the home page, and a
Web client would read the home page and parse the HTML to obtain
these filenames. We do not want to complicate this example with
HTML parsing, so we just specify the filenames on the command
line.
This is a larger example, so we will show it in
pieces. Figure 16.14 is
our web.h header that each file includes.
Figure 16.14
web.h header.
nonblock/web.h
1 #include "unp.h"
2 #define MAXFILES 20
3 #define SERV "80" /* port number or service name */
4 struct file {
5 char *f_name; /* filename */
6 char *f_host; /* hostname or IPv4/IPv6 address */
7 int f_fd; /* descriptor */
8 int f_flags; /* F_xxx below */
9 } file[MAXFILES];
10 #define F_CONNECTING 1 /* connect() in progress */
11 #define F_READING 2 /* connect() complete; now reading */
12 #define F_DONE 4 /* all done */
13 #define GET_CMD "GET %s HTTP/1.0\r\n\r\n"
14 /* globals */
15 int nconn, nfiles, nlefttoconn, nlefttoread, maxfd;
16 fd_set rset, wset;
17 /* function prototypes */
18 void home_page(const char *, const char *);
19 void start_connect(struct file *);
20 void write_get_cmd(struct file *);
Define file structure
2鈥?3
The program reads up to MAXFILES files from the Web
server. We maintain a file structure with information
about each file: its name (copied from the command-line argument),
the hostname or IP address of the server to read the file from, the
socket descriptor being used for the file, and a set of flags to
specify what we are doing with this file (connecting, reading, or
done).
Define globals and function
prototypes
14鈥?0
We define the global variables and function prototypes for the
functions that we will describe shortly.
Figure
16.15 shows the first part of the main program.
Figure 16.15
First part of simultaneous connect: globals and start of
main.
nonblock/web.c
1 #include "web.h"
2 int
3 main(int argc, char **argv)
4 {
5 int i, fd, n, maxnconn, flags, error;
6 char buf[MAXLINE];
7 fd_set rs, ws;
8 if (argc < 5)
9 err_quit("usage: web <#conns> <hostname> <homepage> <file1> ...");
10 maxnconn = atoi(argv[1]);
11 nfiles = min(argc - 4, MAXFILES);
12 for (i = 0; i < nfiles; i++) {
13 file[i].f_name = argv[i + 4];
14 file[i].f_host = argv[2];
15 file[i].f_flags = 0;
16 }
17 printf("nfiles = %d\n", nfiles);
18 home_page(argv[2], argv[3]);
19 FD_ZERO(&rset);
20 FD_ZERO(&wset);
21 maxfd = -1;
22 nlefttoread = nlefttoconn = nfiles;
23 nconn = 0;
Process command-line arguments
11鈥?7
The file structures are filled in with the relevant
information from the command-line arguments.
Read home page
18 The
function home_page, which we will show next, creates a TCP
connection, sends a command to the server, and then reads the home
page. This is the first connection, which is done by itself, before
we start establishing multiple connections in parallel.
Initialize globals
19鈥?3
Two descriptor sets, one for reading and one for writing, are
initialized. maxfd is the maximum descriptor for
select (which we initialize to 鈥? since descriptors are
non-negative), nlefttoread is the number of files
remaining to be read (when this reaches 0, we are finished),
nlefttoconn is the number of files that still need a TCP
connection, and nconn is the number of connections
currently open (which can never exceed the first command-line
argument).
Figure
16.16 shows the home_page function that is called once
when the main function begins.
Figure 16.16
home_page function.
nonblock/home_page.c
1 #include "web.h"
2 void
3 home_page(const char *host, const char *fname)
4 {
5 int fd, n;
6 char line[MAXLINE];
7 fd = Tcp_connect(host, SERV); /* blocking connect() */
8 n = snprintf(line, sizeof(line), GET_CMD, fname);
9 Writen(fd, line, n);
10 for ( ; ; ) {
11 if ( (n = Read(fd, line, MAXLINE)) == 0)
12 break; /* server closed connection */
13 printf("read %d bytes of home page\n", n);
14 /* do whatever with data */
15 }
16 printf("end-of-file on home page\n");
17 Close(fd);
18 }
Establish connection with server
7 Our
tcp_connect establishes a connection with the server.
Send HTTP command to server, read
reply
8鈥?7
An HTTP GET command is issued for the home page (often
named /). The reply is read (we do not do anything with
the reply) and the connection is closed.
The next function, start_connect, shown
in Figure 16.17, initiates
a nonblocking connect.
Figure 16.17
Initiate nonblocking connect.
nonblock/start_connect.c
1 #include "web.h"
2 void
3 start_connect(struct file *fptr)
4 {
5 int fd, flags, n;
6 struct addrinfo *ai;
7 ai = Host_serv(fptr->f_host, SERV, 0, SOCK_STREAM);
8 fd = Socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
9 fptr->f_fd = fd;
10 printf("start_connect for %s, fd %d\n", fptr->f_name, fd);
11 /* Set socket nonblocking */
12 flags = Fcntl(fd, F_GETFL, 0);
13 Fcntl(fd, F_SETFL, flags | O_NONBLOCK);
14 /* Initiate nonblocking connect to the server. */
15 if ( (n = connect(fd, ai->ai_addr, ai->ai_addrlen)) < 0) {
16 if (errno != EINPROGRESS)
17 err_sys("nonblocking connect error");
18 fptr->f_flags = F_CONNECTING;
19 FD_SET(fd, &rset); /* select for reading and writing */
20 FD_SET(fd, &wset);
21 if (fd > maxfd)
22 maxfd = fd;
23 } else if (n >= 0) /* connect is already done */
24 write_get_cmd(fptr); /* write() the GET command */
25 }
Create socket, set to nonblocking
7鈥?3
We call our host_serv function (Figure 11.9) to look
up and convert the hostname and service name, returning a pointer
to an array of addrinfo structures. We use only the first
structure. A TCP socket is created and the socket is set to
nonblocking.
Initiate nonblocking
connect
14鈥?2
The nonblocking connect is initiated and the file's flag
is set to F_CONNECTING. The socket descriptor is turned on
in both the read set and the write set since select will
wait for either condition as an indication that the connection has
finished. We also update maxfd, if necessary.
Handle connection complete
23鈥?4
If connect returns successfully, the connection is already
complete and the function write_get_cmd (shown next) sends
a command to the server.
We set the socket to nonblocking for the
connect, but never reset it to its default blocking mode.
This is fine because we write only a small amount of data to the
socket (the GET command in the next function) and we
assume that this command is much smaller than the socket send
buffer. Even if write returns a short count because of the
nonblocking flag, our writen function handles this.
Leaving the socket as nonblocking has no effect on the subsequent
reads that are performed because we always call
select to wait for the socket to become readable.
Figure
16.18 shows the function write_get_cmd, which sends an
HTTP GET command to the server.
Figure 16.18
Send an HTTP GET command to the server.
nonblock/write_get_cmd.c
1 #include "web.h"
2 void
3 write_get_cmd(struct file *fptr)
4 {
5 int n;
6 char line[MAXLINE];
7 n = snprintf(line, sizeof(line), GET_CMD, fptr->f_name);
8 Writen(fptr->f_fd, line, n);
9 printf("wrote %d bytes for %s\n", n, fptr->f_name);
10 fptr->f_flags = F_READING; /* clears F_CONNECTING */
11 FD_SET(fptr->f_fd, &rset); /* will read server's reply */
12 if (fptr->f_fd > maxfd)
13 maxfd = fptr->f_fd;
14 }
Build command and send it
7鈥?
The command is built and written to the socket.
Set flags
10鈥?3
The file's F_READING flag is set, which also clears the
F_CONNECTING flag (if set). This indicates to the main
loop that this descriptor is ready for input. The descriptor is
also turned on in the read set and maxfd is updated, if
necessary.
We now return to the main function in
Figure 16.19, picking up
where we left off in Figure
16.15. This is the main loop of the program: As long as there
are more files to process (nlefttoread is greater than 0),
start another connection if possible and then use select
on all active descriptors, handling both nonblocking connection
completions and the arrival of data.
Initiate another connection, if
possible
24鈥?5
If we are not at the specified limit of simultaneous connections,
and there are additional connections to establish, find a file that
we have not yet processed (indicated by a f_flags of 0)
and call start_connect to initiate the connection. The
number of active connections is incremented (nconn) and
the number of connections remaining to be established is
decremented (nlefttoconn).
select: wait for something to
happen
36鈥?7
select waits for either readability or writability.
Descriptors that have a nonblocking connect in progress
will be enabled in both sets, while descriptors with a completed
connection that are waiting for data from the server will be
enabled in just the read set.
Handle all ready descriptors
39鈥?5
We now process each element in the array of file
structures to determine which descriptors need processing. If the
F_CONNECTING flag is set and the descriptor is on in
either the read set or the write set, the nonblocking
connect is finished. As we described with Figure
16.11, we call getsockopt to fetch the pending error
for the socket. If this value is 0, the connection completed
successfully. In that case, we turn off the descriptor in the write
set and call write_get_cmd to send the HTTP request to the
server.
See if descriptor has data
56鈥?7
If the F_READING flag is set and the descriptor is ready
for reading, we call read. If the connection was closed by
the other end, we close the socket, set the F_DONE flag,
turn off the descriptor in the read set, and decrement the number
of active connections and the total number of connections to be
processed.
There are two optimizations that we do not
perform in this example (to avoid complicating it even more).
First, we could terminate the for loop in Figure 16.19 when we finish
processing the number of descriptors that select said were
ready. Next, we could decrease the value of maxfd when
possible, to save select from examining descriptor bits
that are no longer set. Since the number of descriptors this code
deals with at any one time is probably less than 10, and not in the
thousands, it is doubtful that either of these optimizations is
worth the additional complications.
Figure 16.19
Main loop of main function.
nonblock/web.c
24 while (nlefttoread > 0) {
25 while (nconn < maxnconn && nlefttoconn > 0) {
26 /* find a file to read */
27 for (i = 0; i < nfiles; i++)
28 if (file[i].f_flags == 0)
29 break;
30 if (i == nfiles)
31 err_quit("nlefttoconn = %d but nothing found", nlefttoconn);
32 start_connect(&file[i]);
33 nconn++;
34 nlefttoconn--;
35 }
36 rs = rset;
37 ws = wset;
38 n = Select(maxfd + 1, &rs, &ws, NULL, NULL);
39 for (i = 0; i < nfiles; i++) {
40 flags = file[i].f_flags;
41 if (flags == 0 || flags & F_DONE)
42 continue;
43 fd = file[i].f_fd;
44 if (flags & F_CONNECTING &&
45 (FD_ISSET(fd, &rs) || FD_ISSET(fd, &ws))) {
46 n = sizeof(error);
47 if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &n) < 0 ||
48 error != 0) {
49 err_ret("nonblocking connect failed for %s",
50 file[i].f_name);
51 }
52 /* connection established */
53 printf("connection established for %s\n", file[i].f_name);
54 FD_CLR(fd, &wset); /* no more writeability test */
55 write_get_cmd(&file[i]); /* write() the GET command */
56 } else if (flags & F_READING && FD_ISSET(fd, &rs)) {
57 if ( (n = Read(fd, buf, sizeof(buf))) == 0) {
58 printf("end-of-file on %s\n", file[i].f_name);
59 Close(fd);
60 file[i].f_flags = F_DONE; /* clears F_READING */
61 FD_CLR(fd, &rset);
62 nconn--;
63 nlefttoread--;
64 } else {
65 printf("read %d bytes from %s\n", n, file[i].f_name);
66 }
67 }
68 }
69 }
70 exit(0);
71 }
Performance of Simultaneous
Connections
What is the performance gain in establishing
multiple connections at the same time? Figure 16.20 shows the clock time required to
fetch a Web server's home page, followed by nine image files from
that server. The RTT to the server is about 150 ms. The home page
size was 4,017 bytes and the average size of the 9 image files was
1,621 bytes. TCP's segment size was 512 bytes. We also include in
this figure, for comparison, values for a version of this program
that we will develop in Section 26.9 using
threads.
Most of the improvement is obtained with three
simultaneous connections (the clock time is halved), and the
performance increase is much less with four or more simultaneous
connections.
We provide this example using simultaneous
connects because it is a nice example using nonblocking
I/O and one whose performance impact can be measured. It is also a
feature used by a popular Web application, the Netscape browser.
There are pitfalls in this technique if there is any congestion in
the network. Chapter 21 of TCPv1 describes TCP's slow-start and
congestion avoidance algorithms in detail. When multiple
connections are established from a client to a server, there is no
communication between the connections at the TCP layer. That is, if
one connection encounters a packet loss, the other connections to
the same server are not notified, and it is highly probable that
the other connections will soon encounter packet loss unless they
slow down. These additional connections are sending more packets
into an already congested network. This technique also increases
the load at any given time on the server.
|