Wednesday, October 12, 2011

Data Throughput with HTTP, SMB and NRPC - It Isn't Just Network Latency

Customers are often puzzled by the "speed difference" seen when transferring data via Lotus Notes, as compared to HTTP or SMB file transfers; these differences are most obvious in conditions of high network latency, but they are always present to some degree.  What most folks DON'T realize is that "network latency" is only an exacerbating factor; the real root causes are inherent differences among the protocols.  Let's go deeper...

HTTP is, for all intents and purposes, a streaming protocol.  A single request, such as an HTTP GET, results in a (sometimes) lengthy, but uninterrupted, stream of data in response.  It's just "GET", "200 OK" and a blast of data; the protocol performs no metering or interruption of the inbound data flow.  Basically, it looks like this:

GET /booga.zip HTTP/1.0

HTTP/1.0 200 OK <followed by all of booga.zip in one stream>

The limiting factors are, in this case, at the TCP/IP layer itself, in the form of small TCP windows, TCP's "slow start" behavior, and/or congestion avoidance mechanisms.
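To put a rough number on the TCP window limit: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput tops out at roughly the window size divided by the round-trip time.  Here's a minimal Python sketch of that arithmetic.  The 65,535-byte window (the largest possible without TCP window scaling) and the sample round-trip times are illustrative assumptions, not measurements, and the function name is my own.

```python
def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Rough ceiling, in bytes per second, when at most one TCP window
    of data can be in flight per network round trip."""
    return window_bytes / rtt_seconds


# Assumed values: a 65,535-byte window and three sample round-trip times.
for rtt_ms in (1, 20, 100):
    rate = window_limited_throughput(65535, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> at most {rate / 1_000_000:.2f} MB/s")
```

The ceiling falls in direct proportion to the round-trip time, no matter how much raw bandwidth the link has.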

SMB has its own limiting factor - one that most users of "disk shares" don't know about.  SMB/CIFS requests are limited to 64 KB in size; thus, any file transfer is a series of requests, each of which moves up to 64 KB.  So, no matter how "fast" one's network might be, SMB still operates in 64 KB "chunks" and must issue a new request for each "chunk."  Transferring our booga.zip now looks like this:

Create AndX booga.zip

Create AndX

Read AndX offset 0x0 data 0xf000

Read AndX (64K chunk of data)

Read AndX offset 0xf000 data 0xf000

Read AndX (next 64K chunk of data)

(Repeat until all data is transferred...for a 2 MB file, about 32 iterations at a full 64 KB per read, or about 35 at the 0xf000 bytes per read shown here)

Each of these is a network transaction, so we now have a full network round-trip added for each Read AndX request, plus the 64 KB blocking overhead on both ends; you can see why large data transfers with SMB are more sensitive to network latency than are HTTP transfers. (The astute among you also see why I NEVER recommend the use of SMB shares for frequently accessed data, such as personal Notes mailfiles...)
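For a feel of how much those round trips cost, here's a deliberately simple Python model comparing a single streamed response with a strictly sequential series of chunked reads.  It assumes one round trip per request, no pipelining, and a fixed link speed, and it ignores everything TCP does underneath; the 10 Mbit/s link and the function names are my own assumptions for illustration.

```python
import math


def read_requests(size_bytes: int, chunk_bytes: int) -> int:
    """Number of read requests needed to fetch the whole file."""
    return math.ceil(size_bytes / chunk_bytes)


def streamed_seconds(size_bytes, rtt_s, bytes_per_second):
    """One request, then the whole payload in a single stream."""
    return rtt_s + size_bytes / bytes_per_second


def chunked_seconds(size_bytes, chunk_bytes, rtt_s, bytes_per_second):
    """One full round trip per chunk, issued strictly one after another."""
    trips = read_requests(size_bytes, chunk_bytes)
    return trips * rtt_s + size_bytes / bytes_per_second


SIZE = 2 * 1024 * 1024   # the 2 MB file from the example
CHUNK = 64 * 1024        # 64 KB per read request
LINK = 10_000_000 / 8    # assumed 10 Mbit/s link, in bytes per second

for rtt_ms in (1, 50, 200):
    rtt = rtt_ms / 1000
    print(f"RTT {rtt_ms:>3} ms: "
          f"streamed {streamed_seconds(SIZE, rtt, LINK):.2f} s, "
          f"chunked {chunked_seconds(SIZE, CHUNK, rtt, LINK):.2f} s")
```

In this model the only difference between the two is the 31 extra round trips, so the gap grows linearly with latency while the time spent actually moving data stays the same.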

Turning to Notes NRPC (aka "Lotus Notes"), we find ourselves somewhere between these two extremes.  Behind the scenes, Lotus Notes/Domino is a transactional database system; even your "mailfile" is a database, and what you see as "an email message" is actually dozens of entries in that database.  It's all fields and attributes, which are handled individually.  They're all individual transactions, which means that, while individual transactions may involve large volumes of data, there's a constant back-and-forth between client and server.  Even opening (or replicating) a single email message triggers "Give me this part"..."OK, now this part"..."I'm ready for this part"...and, while not subjected to any particular metering or "chunking" (as SMB is), each of those transactions still requires an extra round-trip between server and client - which is where network latency can hit and hit hard - just as it does with SMB.
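The same back-of-the-envelope arithmetic applies here.  The sketch below (Python again) counts only the time spent waiting on round trips, independent of how much data each transaction carries.  The figure of 36 transactions per message is a made-up stand-in for "dozens", not a number taken from NRPC itself.

```python
def latency_cost_seconds(transactions: int, rtt_ms: float) -> float:
    """Time spent purely waiting on round trips, one per transaction."""
    return transactions * rtt_ms / 1000


TRANSACTIONS_PER_MESSAGE = 36  # hypothetical; stands in for "dozens"

for rtt_ms in (1, 50, 200):
    wait = latency_cost_seconds(TRANSACTIONS_PER_MESSAGE, rtt_ms)
    print(f"RTT {rtt_ms:>3} ms -> {wait:.2f} s of waiting per message")
```

On a LAN that wait is lost in the noise; on a high-latency WAN link it can dominate the time it takes to open a message.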

Unfortunately, there are no easy solutions for such environments.  You can adjust TCP windows and, in the case of Lotus Notes, increase the size of the TCP port buffers for some relief, but you'll eventually come up against the laws of physics - the electrons can only move so fast...and now you know how the choice of protocol can make a BIG difference in performance when the network is slow...

1 comment:

Sam Sawatzky said...

Thanks, Wes!