← Home

Highlights from High Performance Browser Networking

Source:

The two critical components that dictate the performance of all network traffic: latency and bandwidth.

  • Latency: The time from the source sending a packet to the destination receiving it.
  • Bandwidth: Maximum throughput of a logical or physical communication path.

Latency is the time it takes for a message, or a packet, to travel from its point of origin to the point of destination.

To connect your home or office to the Internet, your local ISP needs to route the cables throughout the neighborhood, aggregate the signal, and forward it to a local routing node. […] This translates into 18–44 ms of latency just to the closest measuring node within the ISP’s core network, before the packet is even routed to its destination!

Latency, not bandwidth, is the performance bottleneck for most websites!

Content delivery network (CDN) services provide many benefits, but chief among them is the simple observation that distributing the content around the globe, and serving that content from a nearby location to the client, will allow us to significantly reduce the propagation time of all the data packets. We may not be able to make the packets travel faster, but we can reduce the distance by strategically positioning our servers closer to the users! Leveraging a CDN to serve your data can offer significant performance benefits.

An optical fiber acts as a simple “light pipe,” slightly thicker than a human hair, designed to transmit light between the two ends of the cable. Metal wires are also used but are subject to higher signal loss, electromagnetic interference, and higher lifetime maintenance costs. Chances are, your packets will travel over both types of cable, but for any long-distance hops, they will be transmitted over a fiber-optic link. Improving latency, on the other hand, is a very different story. The quality of the fiber links could be improved to get us a little closer to the speed of light: better materials with lower refractive index and faster routers along the way. However, given that our current speeds are already within a factor of ~1.5 of the speed of light (light in fiber travels at roughly two-thirds of its speed in a vacuum), the most we can expect from this strategy is a modest 30% improvement. Unfortunately, there is simply no way around the laws of physics: the speed of light places a hard limit on the minimum latency.
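
To make the “modest 30%” ceiling concrete, here is a back-of-the-envelope sketch; the distance and the ~1.5 refractive index are rough assumptions, not measurements:

```typescript
// Rough propagation-delay estimate for a New York -> London fiber path.
// Distance and refractive index are approximate assumptions for illustration.
const SPEED_OF_LIGHT_KM_S = 299_792; // km/s in a vacuum
const REFRACTIVE_INDEX = 1.5;        // typical for optical fiber (~1.4-1.6)
const DISTANCE_KM = 5_570;           // approximate NYC -> London great-circle distance

const vacuumMs = (DISTANCE_KM / SPEED_OF_LIGHT_KM_S) * 1000;
const fiberMs = vacuumMs * REFRACTIVE_INDEX;

console.log(`vacuum: ~${vacuumMs.toFixed(1)} ms one-way`); // ~18.6 ms
console.log(`fiber:  ~${fiberMs.toFixed(1)} ms one-way`);  // ~27.9 ms
// Even a "perfect" fiber could only recover the difference: roughly 30% at best.
```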

As a result, to improve performance of our applications, we need to architect and optimize our protocols and networking code with explicit awareness of the limitations of available bandwidth and the speed of light: we need to reduce round trips, move the data closer to the client, and build applications that can hide the latency through caching, pre-fetching, and a variety of similar techniques, as explained in subsequent chapters.

At the heart of the Internet are two protocols, IP and TCP. The IP, or Internet Protocol, is what provides the host-to-host routing and addressing, and TCP, or Transmission Control Protocol, is what provides the abstraction of a reliable network running over an unreliable channel.

TCP provides an effective abstraction of a reliable network running over an unreliable channel, hiding most of the complexity of network communication from our applications: retransmission of lost data, in-order delivery, congestion control and avoidance, data integrity, and more. When you work with a TCP stream, you are guaranteed that all bytes sent will be identical with bytes received and that they will arrive in the same order to the client. As such, TCP is optimized for accurate delivery, rather than a timely one.

The HTTP standard does not specify TCP as the only transport protocol. If we wanted, we could deliver HTTP via a datagram socket (User Datagram Protocol or UDP), or any other transport protocol of our choice, but in practice all HTTP traffic on the Internet today is delivered via TCP due to the many great features it provides out of the box.

All TCP connections begin with a three-way handshake (Figure 2-1). Before the client or the server can exchange any application data, they must agree on starting packet sequence numbers, as well as a number of other connection-specific variables, from both sides.

  • SYN: Client picks a random sequence number x and sends a SYN packet, which may also include additional TCP flags and options.
  • SYN ACK: Server increments x by one, picks its own random sequence number y, appends its own set of flags and options, and dispatches the response.
  • ACK: Client increments both x and y by one and completes the handshake by dispatching the last ACK packet in the handshake.

Once the three-way handshake is complete, the application data can begin to flow between the client and the server. The client can send a data packet immediately after the ACK packet, and the server must wait for the ACK before it can dispatch any data. This startup process applies to every TCP connection and carries an important implication for performance of all network applications using TCP: each new connection will have a full roundtrip of latency before any application data can be transferred.
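
As a rough illustration, here is a minimal sketch using Node’s net module (the host and port are placeholders) that times just the connect phase, i.e., the handshake roundtrip:

```typescript
import net from "node:net";

// Time the TCP connect phase: no application data can flow until the
// three-way handshake completes, so this is (roughly) one full roundtrip.
const start = process.hrtime.bigint();
const socket = net.connect({ host: "example.com", port: 80 }, () => {
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`TCP connect (handshake) took ~${elapsedMs.toFixed(1)} ms`);
  socket.end();
});
socket.on("error", (err) => console.error("connect failed:", err.message));
```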

The delay imposed by the three-way handshake makes new TCP connections expensive to create, and is one of the big reasons why connection reuse is a critical optimization for any application running over TCP.

Flow control is a mechanism to prevent the sender from overwhelming the receiver with data it may not be able to process—the receiver may be busy, under heavy load, or may only be willing to allocate a fixed amount of buffer space. To address this, each side of the TCP connection advertises (Figure 2-2) its own receive window (rwnd), which communicates the size of the available buffer space to hold the incoming data.

Despite the presence of flow control in TCP, network congestion collapse became a real issue in the mid to late 1980s. The problem was that flow control prevented the sender from overwhelming the receiver, but there was no mechanism to prevent either side from overwhelming the underlying network: neither the sender nor the receiver knows the available bandwidth at the beginning of a new connection, and hence both need a mechanism to estimate it and to adapt their speeds to the continuously changing conditions within the network.

The only way to estimate the available capacity between the client and the server is to measure it by exchanging data, and this is precisely what slow-start is designed to do. To start, the server initializes a new congestion window (cwnd) variable per TCP connection and sets its initial value to a conservative, system-specified value (initcwnd on Linux).

The solution is to start slow and to grow the window size as the packets are acknowledged: slow-start!
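
A small sketch of the slow-start arithmetic; the initial congestion window of 10 segments and the 1,460-byte segment size are assumed defaults for the example, not values quoted from the book:

```typescript
// How many roundtrips does slow-start need before the congestion window
// reaches a target size? cwnd roughly doubles per RTT until loss or ssthresh.
// initcwnd = 10 segments and MSS = 1460 bytes are assumed defaults here.
function slowStartRoundtrips(targetBytes: number, initcwndSegments = 10, mss = 1460): number {
  const targetSegments = Math.ceil(targetBytes / mss);
  if (targetSegments <= initcwndSegments) return 0;
  return Math.ceil(Math.log2(targetSegments / initcwndSegments));
}

// E.g. to grow the window to 64 KB starting from 10 segments:
console.log(slowStartRoundtrips(64 * 1024)); // 3 roundtrips (10 -> 20 -> 40 -> 80 segments)
```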

The same request made on the same connection, but without the cost of the three-way handshake and the penalty of the slow-start phase, now took 96 milliseconds, which translates into a 275% improvement in performance!

In both cases, the fact that both the server and the client have access to 5 Mbps of upstream bandwidth had no impact during the startup phase of the TCP connection. Instead, the latency and the congestion window sizes were the limiting factors.

It is important to recognize that TCP is specifically designed to use packet loss as a feedback mechanism to help regulate its performance. In other words, it is not a question of if, but rather of when the packet loss will occur. Slow-start initializes the connection with a conservative window and, for every roundtrip, doubles the amount of data in flight until it exceeds the receiver’s flow-control window, a system-configured congestion threshold (ssthresh) window, or until a packet is lost, at which point the congestion avoidance algorithm (Figure 2-3) takes over.

The implicit assumption in congestion avoidance is that packet loss is indicative of network congestion: somewhere along the path we have encountered a congested link or a router, which was forced to drop the packet, and hence we need to adjust our window to avoid inducing more packet loss to avoid overwhelming the network.

TCP provides the abstraction of a reliable network running over an unreliable channel, which includes basic packet error checking and correction, in-order delivery, retransmission of lost packets, as well as flow control, congestion control, and congestion avoidance designed to operate the network at the point of greatest efficiency. Combined, these features make TCP the preferred transport for most applications. However, while TCP is a popular choice, it is not the only, nor necessarily the best choice for every occasion. Specifically, some of the features, such as in-order and reliable packet delivery, are not always necessary and can introduce unnecessary delays and negative performance implications. To understand why that is the case, recall that every TCP packet carries a unique sequence number when put on the wire, and the data must be passed to the receiver in-order (Figure 2-8). If one of the packets is lost en route to the receiver, then all subsequent packets must be held in the receiver’s TCP buffer until the lost packet is retransmitted and arrives at the receiver. Because this work is done within the TCP layer, our application has no visibility into the TCP retransmissions or the queued packet buffers, and must wait for the full sequence before it is able to access the data. Instead, it simply sees a delivery delay when it tries to read the data from the socket. This effect is known as TCP head-of-line (HOL) blocking.

The delay imposed by head-of-line blocking allows our applications to avoid having to deal with packet reordering and reassembly, which makes our application code much simpler. However, this is done at the cost of introducing unpredictable latency variation in the packet arrival times, commonly referred to as jitter, which can negatively impact the performance of the application.

Further, some applications may not even need either reliable delivery or in-order delivery: if every packet is a standalone message, then in-order delivery is strictly unnecessary, and if every message overrides all previous messages, then the requirement for reliable delivery can be removed entirely. Unfortunately, TCP does not provide such configuration—all packets are sequenced and delivered in order. Applications that can deal with out-of-order delivery or packet loss and that are latency or jitter sensitive are likely better served with an alternate transport, such as UDP.

Having said that, while the specific details of each algorithm and feedback mechanism will continue to evolve, the core principles and their implications remain unchanged:

  • TCP three-way handshake introduces a full roundtrip of latency.
  • TCP slow-start is applied to every new connection.
  • TCP flow and congestion control regulate throughput of all connections.
  • TCP throughput is regulated by current congestion window size.

As a result, the rate with which a TCP connection can transfer data in modern high-speed networks is often limited by the roundtrip time between the receiver and sender. Further, while bandwidth continues to increase, latency is bounded by the speed of light and is already within a small constant factor of its maximum value. In most cases, latency, not bandwidth, is the bottleneck for TCP.

Datagram: A self-contained, independent entity of data carrying sufficient information to be routed from the source to the destination nodes without reliance on earlier exchanges between the nodes and the transporting network.

The words datagram and packet are often used interchangeably, but there are some nuances. While the term “packet” applies to any formatted block of data, the term “datagram” is often reserved for packets delivered via an unreliable service—no delivery guarantees, no failure notifications.

Perhaps the most well-known use of UDP, and one that every browser and Internet application depends on, is the Domain Name System (DNS): given a human-friendly computer hostname, we need to discover its IP address before any data exchange can occur.
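
A minimal sketch of that lookup using Node’s promise-based resolver (the hostname is a placeholder); by default the query goes out over UDP to the configured name server:

```typescript
import { resolve4 } from "node:dns/promises";

// Resolve a hostname to IPv4 addresses before any data exchange can occur.
// The query is typically carried over UDP to the configured resolver.
const addresses = await resolve4("example.com");
console.log(addresses); // e.g. [ '93.184.216.34' ]
```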

However, even though the browser itself is dependent on UDP, historically the protocol has never been exposed as a first-class transport for pages and applications running within it. That is, until WebRTC entered into the picture.

The new Web Real-Time Communication (WebRTC) standards, jointly developed by the IETF and W3C working groups, are enabling real-time communication, such as voice and video calling and other forms of peer-to-peer (P2P) communication, natively within the browser via UDP. With WebRTC, UDP is now a first-class browser transport with a client-side API!

To understand UDP and why it is commonly referred to as a “null protocol,” we first need to look at the Internet Protocol (IP), which is located one layer below both TCP and UDP protocols.

Once again, the word “datagram” is an important distinction: the IP layer provides no guarantees about message delivery or notifications of failure and hence directly exposes the unreliability of the underlying network to the layers above it. If a routing node along the way drops the IP packet due to congestion, high load, or for other reasons, then it is the responsibility of a protocol above IP to detect it, recover, and retransmit the data—that is, if that is the desired behavior!

The UDP protocol encapsulates user messages into its own packet structure (Figure 3-2), which adds only four additional fields: source port, destination port, length of packet, and checksum. Thus, when IP delivers the packet to the destination host, the host is able to unwrap the UDP packet, identify the target application by the destination port, and deliver the message. Nothing more, nothing less.
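
A minimal sketch of that message-oriented model using Node’s dgram module; the port number is an arbitrary choice for the example:

```typescript
import dgram from "node:dgram";

// Minimal UDP exchange: one socket listens, the other fires a datagram at it.
// There are no handshakes, acknowledgments, or delivery guarantees.
const receiver = dgram.createSocket("udp4");
receiver.on("message", (msg, rinfo) => {
  console.log(`got "${msg}" from ${rinfo.address}:${rinfo.port}`);
  receiver.close();
});
receiver.bind(41234, () => {
  const sender = dgram.createSocket("udp4");
  sender.send(Buffer.from("hello"), 41234, "127.0.0.1", () => sender.close());
});
```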

With that in mind, we can now summarize all the UDP non-services:

  • No guarantee of message delivery: no acknowledgments, retransmissions, or timeouts.
  • No guarantee of order of delivery: no packet sequence numbers, no reordering, no head-of-line blocking.
  • No connection state tracking: no connection establishment or teardown state machines.
  • No congestion control: no built-in client or network feedback mechanisms.

Unfortunately, IPv4 addresses are only 32 bits long, which provides a maximum of 4.29 billion unique IP addresses. The IP Network Address Translator (NAT) specification was introduced in mid-1994 (RFC 1631) as an interim solution to resolve the looming IPv4 address depletion problem—as the number of hosts on the Internet began to grow exponentially in the early ’90s, we could not expect to allocate a unique IP to every host.

The proposed IP reuse solution was to introduce NAT devices at the edge of the network, each of which would be responsible for maintaining a table mapping of local IP and port tuples to one or more globally unique (public) IP and port tuples (Figure 3-3). The local IP address space behind the translator could then be reused among many different networks, thus solving the address depletion problem.

Unfortunately, as it often happens, there is nothing more permanent than a temporary solution. Not only did the NAT devices resolve the immediate problem, but they also quickly became a ubiquitous component of many corporate and home proxies and routers, security appliances, firewalls, and dozens of other hardware and software devices. NAT middleboxes are no longer a temporary solution; rather, they have become an integral part of the Internet infrastructure.

Chances are, your local router has assigned your computer an IP address from one of the reserved private ranges (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16). That’s your private IP address on the internal network, which is then translated by the NAT device when communicating with an outside network. To avoid routing errors and confusion, no public computer is allowed to be assigned an IP address from any of these reserved private network ranges.

UDP is a simple and commonly used protocol for bootstrapping new transport protocols. In fact, the primary feature of UDP is all the features it omits: no connection state, handshakes, retransmissions, reassembly, reordering, congestion control, congestion avoidance, flow control, or even optional error checking. However, the flexibility that this minimal message-oriented transport layer affords is also a liability for the implementer. Your application will likely have to reimplement some, or many, of these features from scratch, and each must be designed to play well with other peers and protocols on the network.

When the SSL protocol was standardized by the IETF, it was renamed to Transport Layer Security (TLS). Because the SSL protocol was proprietary to Netscape, the IETF formed an effort to standardize the protocol, resulting in RFC 2246, which became known as TLS 1.0 and is effectively an upgrade to SSL 3.0.

TLS was designed to operate on top of a reliable transport protocol such as TCP. However, it has also been adapted to run over datagram protocols such as UDP.

The TLS protocol is designed to provide three essential services to all applications running above it: encryption, authentication, and data integrity. Technically, you are not required to use all three in every situation. You may decide to accept a certificate without validating its authenticity, but you should be well aware of the security risks and implications of doing so. In practice, a secure web application will leverage all three services.

  • Encryption: A mechanism to obfuscate what is sent from one computer to another.
  • Authentication: A mechanism to verify the validity of provided identification material.
  • Integrity: A mechanism to detect message tampering and forgery.

In order to establish a cryptographically secure data channel, the connection peers must agree on which ciphersuites will be used and the keys used to encrypt the data. The TLS protocol specifies a well-defined handshake sequence to perform this exchange, which we will examine in detail in TLS Handshake. The ingenious part of this handshake, and the reason TLS works in practice, is its use of public key cryptography (also known as asymmetric key cryptography), which allows the peers to negotiate a shared secret key without having to establish any prior knowledge of each other, and to do so over an unencrypted channel. Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary.

Unfortunately, each of these steps requires new packet roundtrips (Figure 4-2) between the client and the server, which adds startup latency to all TLS connections.

New TLS connections require two roundtrips for a “full handshake.”
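
A minimal sketch using Node’s tls module to inspect what a completed handshake negotiated; the host is a placeholder:

```typescript
import tls from "node:tls";

// Inspect what the TLS handshake negotiated: protocol version and ciphersuite.
// The handshake costs extra roundtrips on top of the TCP handshake.
const socket = tls.connect({ host: "example.com", port: 443, servername: "example.com" }, () => {
  console.log("protocol:", socket.getProtocol());   // e.g. 'TLSv1.3'
  console.log("cipher:  ", socket.getCipher().name);
  socket.end();
});
socket.on("error", (err) => console.error("TLS handshake failed:", err.message));
```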

In practice, most web applications attempt to establish multiple connections to the same host to fetch resources in parallel, which makes session resumption a must-have optimization to reduce latency and computational costs for both sides. Most modern browsers intentionally wait for the first TLS connection to complete before opening new connections to the same server: subsequent TLS connections can reuse the SSL session parameters to avoid the costly handshake.

Chain of Trust and Certificate Authorities

Authentication is an integral part of establishing every TLS connection. After all, it is possible to carry out a conversation over an encrypted tunnel with any peer, including an attacker, and unless we can be sure that the computer we are speaking to is the one we trust, then all the encryption work could be for nothing. To understand how we can verify the peer’s identity, let’s examine a simple authentication workflow between Alice and Bob:

  • Both Alice and Bob generate their own public and private keys.
  • Both Alice and Bob hide their respective private keys.
  • Alice shares her public key with Bob, and Bob shares his with Alice.
  • Alice generates a new message for Bob and signs it with her private key.
  • Bob uses Alice’s public key to verify the provided message signature.

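The same workflow, sketched with Node’s crypto module; the choice of Ed25519 keys is an assumption made for brevity:

```typescript
import crypto from "node:crypto";

// Alice signs a message with her private key; Bob verifies it with her public key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

const message = Buffer.from("Hello Bob, this is Alice.");
const signature = crypto.sign(null, message, privateKey);         // Alice's side
const valid = crypto.verify(null, message, publicKey, signature); // Bob's side

console.log("signature valid:", valid); // true
```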

Because all TLS sessions run over TCP, all the advice for optimizing for TCP applies here as well. If TCP connection reuse was an important consideration for unencrypted traffic, then it is a critical optimization for all applications running over TLS—if you can avoid doing the handshake, do so.

As we discussed in Chapter 1, we cannot expect any dramatic improvements in latency in the future, as our packets are already traveling within a small constant factor of the speed of light. However, while we may not be able to make our packets travel faster, we can make them travel a shorter distance!

The simplest way to accomplish this is to replicate or cache your data and services on servers around the world instead of forcing every user to traverse across oceans and continental links to the origin servers. Of course, this is precisely the service that many content delivery networks (CDNs) are set up to offer.

In a nutshell, move the server closer to the client to accelerate TCP and TLS handshakes!

Unlike the tethered world, where a dedicated wire can be run between each network peer, radio communication by its very nature uses a shared medium: radio waves, or if you prefer, electromagnetic radiation. Both the sender and receiver must agree up-front on the specific frequency range over which the communication will occur; a well-defined range allows seamless interoperability between devices. For example, the 802.11b and 802.11g standards both use the 2.4–2.5 GHz band across all WiFi devices.

Finally, it is also worth noting that not all frequency ranges offer the same performance. Low-frequency signals travel farther and cover large areas (macrocells), but at the cost of requiring larger antennas and having more clients competing for access. On the other hand, high-frequency signals can transfer more data but won’t travel as far, resulting in smaller coverage areas (microcells).
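
The capacity side of this trade-off is captured by Shannon’s channel-capacity formula, C = B · log2(1 + S/N); a small sketch of the arithmetic with made-up channel values:

```typescript
// Shannon channel capacity: C = B * log2(1 + S/N)
// B is the allocated bandwidth in Hz, S/N the (linear) signal-to-noise ratio.
// The numbers below are made-up examples, not measurements.
function capacityMbps(bandwidthHz: number, snrDb: number): number {
  const snrLinear = Math.pow(10, snrDb / 10);
  return (bandwidthHz * Math.log2(1 + snrLinear)) / 1e6;
}

console.log(capacityMbps(20e6, 25).toFixed(1)); // 20 MHz channel, 25 dB SNR -> ~166 Mbps
console.log(capacityMbps(20e6, 10).toFixed(1)); // same channel, noisier    ->  ~69 Mbps
```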

Our brief crash course on signal theory can be summed up as follows: the performance of any wireless network, regardless of the name, acronym, or the revision number, is fundamentally limited by a small number of well-known parameters, specifically the amount of allocated bandwidth and the signal-to-noise ratio between receiver and sender. Further, all radio-powered communication is:

  • Done over a shared communication medium (radio waves)
  • Regulated to use specific bandwidth frequency ranges
  • Regulated to use specific transmit power rates
  • Subject to continuously changing background noise and interference
  • Subject to technical constraints of the chosen wireless technology
  • Subject to constraints of the device: form factor, power, etc.

The Hypertext Transfer Protocol (HTTP) is one of the most ubiquitous and widely adopted application protocols on the Internet: it is the common language between clients and servers, enabling the modern web.

Both the request and response headers were kept as ASCII encoded, but the response object itself could be of any type: an HTML file, a plain text file, an image, or any other content type. Hence, the “hypertext transfer” part of HTTP became a misnomer not long after its introduction. In reality, HTTP has quickly evolved to become a hypermedia transport, but the original name stuck.

Requiring a new TCP connection per request imposes a significant performance penalty on HTTP 1.0; see Three-Way Handshake, followed by Slow-Start.

The first and most obvious difference is that we have two object requests, one for an HTML page and one for an image, both delivered over a single connection. This is connection keepalive in action, which allows us to reuse the existing TCP connection for multiple requests to the same host and deliver a much faster end-user experience.

For a good reference on all the inner workings of the HTTP protocol, check out O’Reilly’s HTTP: The Definitive Guide by David Gourley and Brian Totty.

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol that can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

The performance of your application, especially the first load and the “time to render” depends directly on how this dependency graph between markup, stylesheets, and JavaScript is resolved. Incidentally, recall the popular “styles at the top, scripts at the bottom” best practice? Now you know why! Rendering and script execution are blocked on stylesheets; get the CSS down to the user as quickly as you can.

Now, add up the network latency of a DNS lookup, followed by a TCP handshake, and another few roundtrips for a typical web page request, and much, if not all, of our 100–1,000 millisecond latency budget can be easily spent on just the networking overhead.
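
A quick sketch of that arithmetic, assuming an illustrative 50 ms roundtrip time (all values are assumptions):

```typescript
// Back-of-the-envelope networking overhead for a first page load,
// assuming a 50 ms roundtrip time (illustrative values only).
const rttMs = 50;
const overhead = {
  dnsLookup: 1 * rttMs,    // resolve the hostname
  tcpHandshake: 1 * rttMs, // three-way handshake
  tlsHandshake: 2 * rttMs, // full TLS negotiation
  httpRequest: 1 * rttMs,  // request/response roundtrip
};
const totalMs = Object.values(overhead).reduce((a, b) => a + b, 0);
console.log(`~${totalMs} ms spent before the first byte of content`); // ~250 ms
```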

Jakob Nielsen’s Usability Engineering and Steven Seow’s Designing and Engineering Time are both excellent resources that every developer and designer should read! Time is measured objectively but perceived subjectively, and experiences can be engineered to improve perceived performance.

No discussion on web performance is complete without a mention of the resource waterfall. In fact, the resource waterfall is likely the single most insightful network performance and diagnostics tool at our disposal.

To start, it is important to recognize that every HTTP request is composed of a number of separate stages (Figure 10-3): DNS resolution, TCP connection handshake, TLS negotiation (if required), dispatch of the HTTP request, followed by content download. The visual display of these individual stages may differ slightly within each browser, but to keep things simple, we will use the WebPageTest version in this chapter.

Unlike the resource waterfall, where each record represents an individual HTTP request, the connection view shows the life of each TCP connection—all 30 of them in this case—used to fetch the resources for the Yahoo! homepage.

Ergo, we are led to conclude that an average consumer in the United States would not benefit much from upgrading the available bandwidth of her connection if she is interested in improving her web browsing speeds. She may be able to stream or upload larger media files more quickly, but the pages containing those files will not load noticeably faster: bandwidth doesn’t matter, much.

When analyzing performance data, always look at the underlying distribution of the data: throw away the averages and focus on the histograms, medians, and quantiles. Averages lead to meaningless metrics when analyzing skewed and multimodal distributions.
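
A small sketch of what that looks like in practice; the nearest-rank quantile method and the sample values are arbitrary choices for illustration:

```typescript
// Summarize a latency sample with quantiles instead of an average.
// Uses the simple "nearest rank" method; sample values are made up.
function quantile(samples: number[], q: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil(q * sorted.length) - 1);
  return sorted[Math.max(0, index)];
}

const loadTimesMs = [120, 135, 140, 150, 160, 180, 220, 260, 900, 1400];
console.log("median:", quantile(loadTimesMs, 0.5));  // 160
console.log("p95:   ", quantile(loadTimesMs, 0.95)); // 1400
const mean = loadTimesMs.reduce((a, b) => a + b, 0) / loadTimesMs.length;
console.log("mean:  ", mean); // 366.5 -- skewed upward by two slow outliers
```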

Steve Souders’ High Performance Web Sites offers great advice in the form of 14 rules, half of which are networking optimizations.

Each TCP connection begins with a TCP three-way handshake, which takes a full roundtrip of latency between the client and the server. Following that, we will incur a minimum of another roundtrip of latency due to the two-way propagation delay of the HTTP request and response. Finally, we have to add the server processing time to get the total time for every request.

Hence, a simple optimization is to reuse the underlying connection! Adding support for HTTP keepalive (Figure 11-2) allows us to eliminate the second TCP three-way handshake, avoid another round of TCP slow-start, and save a full roundtrip of network latency.
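
A minimal sketch using Node’s http module with a keep-alive agent (the host and paths are placeholders); the second request rides the already-warm connection:

```typescript
import http from "node:http";

// Reuse the same TCP connection for sequential requests to one host,
// avoiding a fresh handshake and slow-start for each request.
const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

function get(path: string): Promise<void> {
  return new Promise((resolve, reject) => {
    http.get({ host: "example.com", path, agent }, (res) => {
      res.resume(); // drain the body so the socket can be reused
      res.on("end", resolve);
    }).on("error", reject);
  });
}

await get("/");      // pays the handshake + slow-start cost
await get("/other"); // reuses the warm connection
agent.destroy();
```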

The ability to break down an HTTP message into independent frames, interleave them, and then reassemble them on the other end is the single most important enhancement of HTTP 2.0.

Most HTTP transfers are short and bursty, whereas TCP is optimized for long-lived, bulk data transfers. By reusing the same connection between all streams, HTTP 2.0 is able to make more efficient use of the TCP connection.

A typical web application consists of dozens of resources, all of which are discovered by the client by examining the document provided by the server. As a result, why not eliminate the extra latency and let the server push the associated resources to the client ahead of time? The server already knows which resources the client will require; that’s server push. In fact, if you have ever inlined a CSS, JavaScript, or any other asset via a data URI (see Resource Inlining), then you already have hands-on experience with server push! By manually inlining the resource into the document, we are, in effect, pushing that resource to the client, without waiting for the client to request it.

At the core of all HTTP 2.0 improvements is the new binary, length-prefixed framing layer. Compared with the newline delimited plaintext HTTP 1.x, binary framing offers more compact representation and is both easier and more efficient to process in code.

Once an HTTP 2.0 connection is established, the client and server communicate by exchanging frames, which serve as the smallest unit of communication within the protocol.
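
A minimal sketch using Node’s http2 client, issuing several requests as concurrent streams over a single connection; the URL and paths are placeholders:

```typescript
import http2 from "node:http2";

// Multiple requests multiplexed as independent streams over one connection.
// The URL is a placeholder for an HTTP/2-capable server.
const client = http2.connect("https://example.com");
client.on("error", (err) => console.error(err));

for (const path of ["/", "/style.css", "/app.js"]) {
  const stream = client.request({ ":path": path }); // each request is its own stream
  stream.on("response", (headers) => {
    console.log(path, "->", headers[":status"]);
  });
  stream.resume(); // discard the body for this sketch
  stream.on("end", () => stream.close());
}

setTimeout(() => client.close(), 2000); // crude shutdown for the example
```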

  • Reduce DNS lookups: Every hostname resolution requires a network roundtrip, imposing latency on the request and blocking the request while the lookup is in progress.
  • Reuse TCP connections: Leverage connection keepalive whenever possible to eliminate the TCP handshake and slow-start latency overhead; see Slow-Start.
  • Minimize number of HTTP redirects: HTTP redirects can be extremely costly, especially when they redirect the client to a different hostname, which results in additional DNS lookup, TCP handshake latency, and so on. The optimal number of redirects is zero.
  • Use a Content Delivery Network (CDN): Locating the data geographically closer to the client can significantly reduce the network latency of every TCP connection and improve throughput. This advice applies both to static and dynamic content; see Uncached Origin Fetch.
  • Eliminate unnecessary resources: No request is faster than a request not made.
  • Cache resources on the client: Application resources should be cached to avoid re-requesting the same bytes each time the resources are required.
  • Compress assets during transfer: Application resources should be transferred with the minimum number of bytes: always apply the best compression method for each transferred asset (see the sketch after this list).
  • Eliminate unnecessary request bytes: Reducing the transferred HTTP header data (i.e., HTTP cookies) can save entire roundtrips of network latency.
  • Parallelize request and response processing: Request and response queuing latency, both on the client and server, often goes unnoticed, but contributes significant and unnecessary latency delays.
  • Apply protocol-specific optimizations: HTTP 1.x offers limited parallelism, which requires that we bundle resources, split delivery across domains, and more. By contrast, HTTP 2.0 performs best when a single connection is used and HTTP 1.x-specific optimizations are removed.
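
As a small illustration of the caching and compression rules above, a toy Node server sketch; the header values and port are illustrative, not a recommended production configuration:

```typescript
import http from "node:http";
import zlib from "node:zlib";

// Toy server applying two of the rules above: cache resources on the client
// (Cache-Control) and compress assets during transfer (gzip).
const body = "<html><body>Hello, performance!</body></html>";

http.createServer((req, res) => {
  const acceptsGzip = /\bgzip\b/.test(String(req.headers["accept-encoding"] ?? ""));
  res.setHeader("Cache-Control", "public, max-age=86400"); // let the client cache for a day
  if (acceptsGzip) {
    res.setHeader("Content-Encoding", "gzip");
    res.end(zlib.gzipSync(body));
  } else {
    res.end(body);
  }
}).listen(8080);
```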

A full discussion on browser security requires its own separate book. If you are curious, Michal Zalewski’s The Tangled Web: A Guide to Securing Modern Web Applications is a fantastic resource.

Prior to XHR, the web page had to be refreshed to send or fetch any state updates between the client and server. With XHR, this workflow could be done asynchronously and under full control of the application JavaScript code. XHR is what enabled us to make the leap from building pages to building interactive web applications in the browser.

The XHR interface enforces strict HTTP semantics on each request: the application supplies the data and URL, and the browser formats the request and handles the full lifecycle of each connection. Similarly, while the XHR API allows the application to add custom HTTP headers (via the setRequestHeader() method), there are a number of protected headers that are off-limits to application code: Accept-Charset, Accept-Encoding, Access-Control-*, Host, Upgrade, Connection, Referer, Origin, Cookie, Sec-*, Proxy-*, and a dozen others… The browser will refuse to override any of the unsafe headers, which guarantees that the application cannot impersonate a fake user-agent, user, or the origin from where the request is being made. In fact, protecting the origin header is especially important, as it is the key piece of the “same-origin policy” applied to all XHR requests.
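
A minimal browser-side sketch; the URL and header names are placeholders:

```typescript
// The application supplies the URL and any custom headers; the browser owns
// the request lifecycle and the protected headers (Origin, Cookie, Host, ...).
const xhr = new XMLHttpRequest();
xhr.open("GET", "/api/account");                               // placeholder URL
xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest");    // custom header: allowed
// xhr.setRequestHeader("Origin", "https://attacker.example"); // protected header: the browser refuses it
xhr.onload = () => console.log(xhr.status, xhr.responseText);
xhr.onerror = () => console.error("request failed");
xhr.send();
```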

However, while necessary, the same-origin policy also places severe restrictions on the usefulness of XHR: what if the server wants to offer a resource to a script running in a different origin? That’s where “Cross-Origin Resource Sharing” (CORS) comes in! CORS provides a secure opt-in mechanism for client-side cross-origin requests:

CORS requests use the same XHR API, with the only difference that the URL to the requested resource is associated with a different origin from where the script is being executed: in the previous example, the script is executed from (http, example.com, 80), and the second XHR request is accessing resource.js from (http, thirdparty.com, 80). The opt-in authentication mechanism for the CORS request is handled at a lower layer: when the request is made, the browser automatically appends the protected Origin HTTP header, which advertises the origin from where the request is being made. In turn, the remote server is then able to examine the Origin header and decide if it should allow the request by returning an Access-Control-Allow-Origin header in its response:

In the preceding example, thirdparty.com decided to opt into cross-origin resource sharing with example.com by returning an appropriate access control header in its response. Alternatively, if it wanted to disallow access, it could simply omit the Access-Control-Allow-Origin header, and the client’s browser would automatically fail the sent request.

If the third-party server is not CORS aware, then the client request will fail, as the client always verifies the presence of the opt-in header. As a special case, CORS also allows the server to return a wildcard (Access-Control-Allow-Origin: *) to indicate that it allows access from any origin. However, think twice before enabling this policy! With that, we are all done, right? Turns out, not quite, as CORS takes a number of additional security precautions to ensure that the server is CORS aware:

  • CORS requests omit user credentials such as cookies and HTTP authentication.
  • The client is limited to issuing “simple cross-origin requests,” which restricts both the allowed methods (GET, POST, HEAD) and access to HTTP headers that can be sent and read by the XHR.

To enable cookies and HTTP authentication, the client must set an extra property (withCredentials) on the XHR object when making the request, and the server must also respond with an appropriate header (Access-Control-Allow-Credentials) to indicate that it is knowingly allowing the application to include private user data. Similarly, if the client needs to write or read custom HTTP headers or wants to use a “non-simple method” for the request, then it must first ask for permission from the third-party server by issuing a preflight request:
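
A minimal sketch of how such a CORS-aware server might respond, including the preflight, using Node’s http module; the origins, header names, and port are illustrative:

```typescript
import http from "node:http";

// Sketch of a CORS-aware server on thirdparty.com opting in to credentialed
// requests from example.com, including the preflight (OPTIONS) exchange.
http.createServer((req, res) => {
  res.setHeader("Access-Control-Allow-Origin", "https://example.com");
  res.setHeader("Access-Control-Allow-Credentials", "true");

  if (req.method === "OPTIONS") {
    // Preflight: declare which non-simple methods and headers are permitted.
    res.setHeader("Access-Control-Allow-Methods", "GET, POST, DELETE");
    res.setHeader("Access-Control-Allow-Headers", "X-Custom-Header");
    res.writeHead(204);
    res.end();
    return;
  }
  res.end('{"ok": true}');
}).listen(8080);

// Client side (browser): opt in to sending cookies with the cross-origin request.
//   const xhr = new XMLHttpRequest();
//   xhr.open("GET", "https://thirdparty.com/resource.js");
//   xhr.withCredentials = true;
//   xhr.send();
```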

XMLHttpRequest is what enabled us to make the leap from building pages to building interactive web applications in the browser. First, it enabled asynchronous communication within the browser, but just as importantly, it also made the process simple. Dispatching and controlling a scripted HTTP request takes just a few lines of JavaScript code, and the browser handles all the rest:

  • Browser formats the HTTP request and parses the response.
  • Browser enforces relevant security (same-origin) policies.
  • Browser handles content negotiation (e.g., gzip).
  • Browser handles request and response caching.
  • Browser handles authentication, redirects, and more…

Similarly, there is no one best strategy for delivering real-time updates with XHR. Periodic polling incurs high overhead and message latency delays. Long-polling delivers low latency but still has the same per-message overhead; each message is its own HTTP request. To have both low latency and low overhead, we need XHR streaming! As a result, while XHR is a popular mechanism for “real-time” delivery, it may not be the best-performing transport for the job. Modern browsers support both simpler and more efficient options, such as Server-Sent Events and WebSocket. Hence, unless you have a specific reason why XHR polling is required, use them.

WebSocket is the only transport that allows bidirectional communication over the same TCP connection (Figure 17-2): the client and server can exchange messages at will. As a result, WebSocket provides low latency delivery of text and binary application data in both directions.
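
A minimal browser-side sketch; the endpoint URL is a placeholder:

```typescript
// One connection, messages flowing in both directions at will.
const ws = new WebSocket("wss://example.com/updates"); // placeholder endpoint

ws.onopen = () => ws.send("hello from the client");                // client -> server
ws.onmessage = (event) => console.log("server says:", event.data); // server -> client
ws.onerror = () => console.error("WebSocket error");
ws.onclose = (event) => console.log("closed:", event.code);
```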

XHR is optimized for “transactional” request-response communication: the client sends the full, well-formed HTTP request to the server, and the server responds with a full response. There is no support for request streaming, and until the Streams API is available, no reliable cross-browser response streaming API. SSE enables efficient, low-latency server-to-client streaming of text-based data: the client initiates the SSE connection, and the server uses the event source protocol to stream updates to the client. The client can’t send any data to the server after the initial handshake.
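
For comparison, a minimal browser-side SSE sketch; the endpoint and the named event type are placeholders:

```typescript
// Server-to-client streaming over a long-lived HTTP response.
// The endpoint is a placeholder that would emit "text/event-stream" data.
const source = new EventSource("/updates");

source.onmessage = (event) => console.log("update:", event.data); // unnamed events
source.addEventListener("price", (event) => {                     // named event type
  console.log("price update:", (event as MessageEvent).data);
});
source.onerror = () => console.log("connection lost; the browser will retry");
```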