C-Kermit 7.0 Case Study #10

Article: 10971 of comp.protocols.kermit.misc
From: fdc@watsun.cc.columbia.edu (Frank da Cruz)
Newsgroups: comp.protocols.kermit.misc
Subject: Case Study #10: Atomic File Movement
Date: 18 Jan 2000 00:06:57 GMT
Organization: Columbia University

Today let's look at the common situation in which files must be moved from one computer to another for processing on a regular basis. For example, daily business receipts are sent from a branch office or franchise to company headquarters, or medical or pharmaceutical insurance claims from a doctor's office, hospital, or pharmacy to a claims clearinghouse. Each file contains a series of financial transactions, so we need to ensure that each transaction occurs once and only once, and when it occurs, it occurs completely and correctly. Of course other applications can be imagined too.

Let's call the two parties "Branch" and "Headquarters" (HQ). In a typical scenario, Branch collects files (e.g. from each operator station) into a directory and then transmits them every evening to HQ. The connection can be made by traditional (non-PPP) dialup or by network. Of course Kermit is equally suited to both. (That's a strong point of Kermit, remember? For example, if you normally use a network connection but the net is broken, you can fall back on old-fashioned dialup using the same script if it is well-designed.)

The procedures for making the connection are well documented in the Kermit manuals. Let's assume we have a connection already, we have already authenticated or logged in, and there is a Kermit server on the far end. Let's also assume that our current directory on the local computer contains the files we need to send, and there are many of them. Of course we can just tell the local Kermit to "SEND *.*" or whatever, but:

  1. If there is a separate process on the receiving computer that monitors the incoming-files directory for new files, how does it know when a file is completely uploaded? We don't want it to process the file while it is still being received, or if it was only partially received.
  2. What happens if the connection breaks and we have to start again? We don't want HQ to receive multiple copies of the same transaction. (Obviously there should be other safeguards but we won't discuss them here.)

There are several approaches to the second problem, but the best one is Kermit's new "atomic file movement" feature, which also happens to solve the first. In this case "atomic" is used in the computer-science sense, not the physics one :-) The command is simple:

  SEND /DELETE *.*

This means, send all the files whose names match "*.*" (or any other pattern or filename) and delete each one as soon as, and only if, it was sent successfully (MOVE is a synonym for SEND /DELETE). Alternatively, you can use:

  SEND /MOVE-TO:xxxx *.*

which, instead of deleting each successfully sent file, moves it to the directory named xxxx. (A third choice, SEND /RENAME-TO:, is described in the update notes.)

Now if the connection is lost, you can make a new connection and give the same SEND /DELETE or SEND /MOVE-TO command again, and it sends only the files that were not already sent successfully, because the ones that were are gone.
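
The restart behavior can be illustrated outside Kermit with a minimal Python sketch. It is not taken from C-Kermit: `send_and_delete` and the `transmit` callback are hypothetical stand-ins for SEND /DELETE and the transfer itself, and the simulated ConnectionError stands in for a dropped connection.

```python
import os
import tempfile
from pathlib import Path


def send_and_delete(directory, transmit):
    """Send every file in `directory`; delete each one only after it was sent.

    `transmit` is a hypothetical callback that returns True on success and
    raises ConnectionError when the link drops.
    """
    sent = []
    # sorted() takes a snapshot, so deleting while looping is safe
    for path in sorted(Path(directory).iterdir()):
        if transmit(path):
            os.remove(path)  # delete only after confirmed success
            sent.append(path.name)
    return sent


def demo():
    workdir = tempfile.mkdtemp()
    for name in ("a.dat", "b.dat", "c.dat"):
        Path(workdir, name).write_text("receipts")

    def flaky(path):
        # Simulate the connection breaking while b.dat is being sent
        if path.name == "b.dat":
            raise ConnectionError("link dropped")
        return True

    def reliable(path):
        return True

    try:
        send_and_delete(workdir, flaky)
    except ConnectionError:
        pass
    left_after_first_try = sorted(os.listdir(workdir))
    sent_on_second_try = send_and_delete(workdir, reliable)
    return left_after_first_try, sent_on_second_try, os.listdir(workdir)
```

Calling `demo()` shows that the second attempt handles only the files that were still present after the first one broke off; nothing that was already delivered is offered again.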

Meanwhile, back at Headquarters we encounter the classic conundrum: how to know when a file has been completely uploaded? Let's suppose some process at HQ (besides Kermit) waits for new files to appear in the upload directory. Well, each file "appears" as soon as it is opened, but it might be open for some time while the Kermit receiver is writing new material to it (the same is true, of course, for FTP). We don't want to start processing it until it has arrived completely, but we also don't want to wait forever.

Here again, atomic file movement is the answer. If the Kermit server at HQ is given the command:

  SET RECEIVE MOVE-TO xxxx

(where xxxx is the name of a directory), this tells it to move each received file to the specified directory after, and only if, it is received successfully. So the script to start up the server at HQ might look like this:

  cd /incoming/tmp/
  set receive move-to /incoming/ready/
  server
  exit

The underlying API is chosen to be atomic; for example, the UNIX rename() system call is used (or link() when rename() is not available). The instant the file appears in the /incoming/ready/ directory, it's ready to use and not in the middle of being copied. And it won't come back to haunt you again after processing, because the Branch won't upload it again. (Note, however, that in some operating systems, including UNIX, files cannot be renamed across disk boundaries.)
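
The same receive-then-rename pattern can be sketched in Python. This is an illustration under stated assumptions, not C-Kermit's code: the function name and directory layout are hypothetical, and `os.replace()` is used because it maps onto rename() on POSIX systems.

```python
import os
import tempfile


def receive_then_publish(data, tmp_dir, ready_dir, name):
    """Write incoming data under tmp_dir, then rename it into ready_dir."""
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing
    final_path = os.path.join(ready_dir, name)
    # A rename within one file system is a single step: watchers of ready_dir
    # see either no file or the whole file. Across file systems it raises
    # OSError instead of silently copying.
    os.replace(tmp_path, final_path)
    return final_path


def demo():
    base = tempfile.mkdtemp()
    tmp_dir = os.path.join(base, "tmp")
    ready_dir = os.path.join(base, "ready")
    os.mkdir(tmp_dir)
    os.mkdir(ready_dir)
    receive_then_publish(b"claim 0001\n", tmp_dir, ready_dir, "claims.dat")
    return os.listdir(tmp_dir), os.listdir(ready_dir)
```

Keeping the temporary and ready directories on the same file system, as in the /incoming/tmp/ and /incoming/ready/ example, is what lets the rename succeed.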

As for making sure the files get through despite repeated disconnections, see the deliver script in the C-Kermit script library or on page 453 of Using C-Kermit.

For details about atomic file movement, see Sections 4.0.8, 4.1.3, and 4.7 of the C-Kermit 7.0 Update Notes.

- Frank

From: Mark Sapiro <msapiro@value.net>
Subject: Re: Case Study #10: Atomic File Movement
Date: 26 Jan 2000 02:18:18 GMT

Frank posted a tutorial on the features in C-Kermit for "atomic" file movement.

It seems to me, however, that there must still be a window, albeit a small one, during which a connection can be lost and the sender will believe the file has been successfully sent while the receiver will not, or vice versa.

I don't know the details of the protocol well enough to know exactly what scenario can occur, but I assume the sender sends a "file complete" packet of some kind. Perhaps this packet gets lost when the connection goes down. The sender may assume the file is successfully sent, but the receiver doesn't know it.

Or perhaps the sender needs an ACK to this packet which the receiver sends and this is the packet that is lost. Then the receiver knows it has received the whole file, but the sender doesn't.

Am I missing something here, or is this a problem?

--
Mark Sapiro <msapiro@value.net>
San Francisco Bay Area, California

From: jrd@cc.usu.edu (Joe Doupnik)
Subject: Re: Case Study #10: Atomic File Movement
Date: 25 Jan 00 20:30:49 MDT
Organization: Utah State University

May I relay a short story on this subject? It pervades networking and most other two party exchanges. Good, it makes a nifty one to tell the person next to you on the plane.

Once upon a time there were two armies fighting a war. One army was in a valley, the other was split on two hilltops. The hill army can win if, and only if, the two components can attack in unison. Naturally they send messages back and forth, "Let's attack at noon" etc. They may be intercepted, faked, changed, misunderstood, and anything else we can imagine to keep the guy next to us occupied thinking.

The question is, can the hill army be guaranteed to win? If so, what are the vital components of the messages passing back and forth?

This problem is stated and discussed famously by Andrew Tanenbaum in his book "Computer Networks." The answer is that there is no guarantee: no finite sequence of messages clinches the mutual decision. It is a tail recursion of "how do I know that they know that I know that they know," etc. If there were a last required message to do the deed, then it could be lost, garbled, or faked, and confirmation would be needed. Thus there isn't a last required message.

With that there is no "closed form" solution in any protocol at all that guarantees that what was sent is what was received and both sides know it firmly. Only approximations exist, even if many or all messages are delivered and understood correctly.

The Kermit protocol has an end of file packet, signifying what it says. It requires an ACK before the protocol will proceed to the next thing to do. The ACK is well protected, but cannot be perfectly protected. It can be lost and the EOF can be repeated, etc. Successful reception of the ACK tells the file sender the receiver is pleased, but of course the file receiver won't know that until new work arrives. Here progress is implied by a rigid set of rules concerning what can be done next, and reception of the next thing to do implies the preceding was completed by both ends. Or it could result from an error implementing the protocol or even a fortuitous garble on the wire which creates just the right message by mistake.

Thus sending the ACK to EOF tells the receiver to keep the file and await new things to do. The file transmitter may miss it and keep trying until tired. The two perspectives may differ even though the file has been moved. The two perspectives will agree if the ACK is not sent; it will be "not done yet."

That's my story for tonight.
Joe D.

From: fdc@watsun.cc.columbia.edu (Frank da Cruz)
Subject: Re: Case Study #10: Atomic File Movement
Date: 26 Jan 2000 14:47:39 GMT
Organization: Columbia University

As Joe explained, the file sender sends an End-of-File (Z) packet after the end of the file. So the sender knows the whole file was sent.

The file receiver might or might not get the Z packet. If the Z packet does not arrive, the protocol times out and recovery action is taken; ultimately the Z packet is retransmitted until an affirmative response is received, or the retransmission limit is exceeded, or the connection is observed to be broken. In any of these failure cases, the sender knows the transfer was not successful, and therefore does not delete, move, or rename the source file.

Once the file receiver gets the Z packet, it acknowledges it. The file sender might or might not get the acknowledgement. If it doesn't, the protocol times out and recovery action is taken. If the recovery action fails, the sender does not know if the transfer was successful, and therefore does not delete, move, or rename the source file.

If the sender receives the acknowledgement, it knows that the receiver got the whole file, and so it can delete, move, or rename it.

Therefore, any error condition -- including loss of connectivity -- triggers the conservative response: keep the source file. It is better to send it more than once than less than once. By design, the protocol might seem to fail when it succeeds, but it should never seem to succeed when it fails.
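
The sender's decision rule can be written down as a small Python sketch. It is a simplification for illustration, not the Kermit protocol engine: `send_z`, `lossy_channel`, and the retry limit of 5 are all hypothetical.

```python
def finish_transfer(send_z, max_retries=5):
    """Return True only if an ACK for the Z (end-of-file) packet arrives.

    `send_z` is a hypothetical callable that transmits the Z packet and
    returns "ACK", or None when the reply timed out or was lost.
    """
    for _ in range(max_retries):
        if send_z() == "ACK":
            return True   # receiver confirmed: safe to delete, move, or rename
    return False          # outcome unknown: treat as failure


def lossy_channel(replies):
    """Build a fake send_z that hands back the given replies in order."""
    remaining = iter(replies)
    return lambda: next(remaining, None)


def disposition(acked):
    """The conservative rule: only a confirmed transfer releases the source."""
    return "delete source" if acked else "keep source"
```

For example, `disposition(finish_transfer(lossy_channel([None, None, "ACK"])))` gives "delete source", while a channel that never returns an ACK gives "keep source", even if the receiver in fact stored the whole file. That second case is the one where the file may be sent twice, which is the safer of the two possible mistakes.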

By the way, what makes Kermit somewhat immune to the two-armies problem, also known as the three-way-handshake problem, is that the Z packet and its ACK are not the final stage of Kermit protocol. The file receiver does not exit the protocol or close the connection after acknowledging the Z packet. In fact, the whole protocol is protected by an "outer layer" that has no consequences at the file level. If this outer layer is disturbed at the end (in the typical case, by premature disconnection) there might be an annoying delay, but no harm is done.

- Frank

From: jrd@cc.usu.edu (Joe Doupnik)
Subject: Re: Case Study #10: Atomic File Movement
Date: 26 Jan 00 10:51:36 MDT
Organization: Utah State University

A "friendly amendment." While the Kermit protocol, and TCP, do an acceptable job of confirming stages of work are completed, those techniques do not remove ambiguity. Frank correctly states "somewhat immune." Old packets whose sequence numbers have wrapped to the proper current value, badly garbled ones with apparently legit contents (CRC checks are hardly perfect), and packets delivered by mistake to the wrong session, are three serious concerns for protocol designers because they confuse the normal stage by stage confirmations. TCP uses three way handshakes, extra steps to extend sequence numbers in some circumstances, and pseudo headers, to help reduce false indications. Kermit does a pretty good job too, but not to the extent that TCP goes.

The two hill army problem remains when one gets serious about comms. As stated, there is no certainty in the exchange, only approximation to it.

Joe D.

From: fdc@watsun.cc.columbia.edu (Frank da Cruz)
Newsgroups: comp.protocols.kermit.misc
Subject: Re: Case Study #10: Atomic File Movement
Date: 28 Jan 2000 02:01:39 GMT
Organization: Columbia University

I meant to get back to this earlier (so as not to leave an unsettling impression with readers who don't study these topics), but better late than never.

I believe most of Joe's observations pertain more to TCP and IP than to Kermit:

  1. Old packets whose sequence numbers have wrapped. This can happen in TCP/IP because it's a worldwide packet-switched network. A TCP packet (encapsulated within an IP packet) can be stuck in the network for minutes, hours, days, or weeks, and then show up after the sequence number space has recycled one or more times, and then it can cause trouble unless there is a higher (than TCP) level of checking. But Kermit connections are either point-to-point in fact, or in effect, so packets don't lurk in odd crannies of the world and reappear at a later time -- at least not late enough to cause confusion about packet numbers. Why? Because (in the non-streaming case) every Kermit packet must be acknowledged. The window can't be larger than half the sequence number space, and it can't advance until the oldest packet in the window is acknowledged. This technique, called "sliding windows with selective retransmission", is more conservative and robust than the technique TCP uses in preventing packet sequence number ambiguity.

  2. Packets are delivered by mistake to the wrong session. Can't happen in Kermit because there is only one session.

  3. Packets can be garbled to look like other packets. Yes, this can happen in any communications protocol with some calculable probability. But let's look at the consequence in the context of atomic file movement. First, it is possible (but highly unlikely) that a data packet can be corrupted in such a way that its CRC will still be correct, thus allowing bad data into a file (but only if the packet sequence number, length, and other controls remain valid). Of course this can happen in any communications protocol; there is a whole literature on the subject. But what about the progress of the protocol itself? Each possible happenstance and its consequences can be examined in turn. For example, an ACK can turn into a NAK with the same sequence number. No harm is done. A NAK can turn into an ACK for the same packet. Again, no harm is done (because the seemingly ACK'd packet will be missing at the receiver, and this will cause the transfer to fail eventually.) An ACK is turned into something besides an ACK or NAK: then we have an illegal packet type and the transfer fails. An ACK is turned into an ACK with a different sequence number; if it's an "old" sequence number it is ignored and no harm is done; if it's a "new" one, the sender will catch the error ("You ACK'd a packet I didn't send"). And so on.
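
The half-the-sequence-space rule for windows can be checked with a short Python model. It is a simplified sketch, not the Kermit implementation; the sequence space of 64 used in the example is an assumption (Kermit packet numbers are taken modulo 64), since the figure is not given above.

```python
def ambiguous_numbers(seq_space, window):
    """Return sequence numbers that could be either a new packet or an old duplicate.

    Model: the receiver expects packet 0 next. New packets may carry the
    numbers 0 .. window-1; late duplicates of packets already acknowledged
    may carry the `window` numbers just before 0, modulo seq_space.
    """
    new = {n % seq_space for n in range(window)}
    old = {(-k) % seq_space for k in range(1, window + 1)}
    return sorted(new & old)
```

With `ambiguous_numbers(64, 32)` the two sets do not overlap, so every arriving number has exactly one meaning. With `ambiguous_numbers(64, 33)` the numbers 31 and 32 appear in both sets, which is the ambiguity the window limit exists to prevent.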

In other words, I think it is safe to say that the chances are practically negligible that a Kermit transfer will appear to succeed when it failed. Except perhaps for possible data corruption, which all protocols are subject to; as noted in the literature, the number of errors that a CRC will not catch is very small, and the probability that exactly such an error will occur, out of all the kinds of errors that can occur, is much smaller still.

And in fact, in 20 years of experience with Kermit transfers, I can't recall a single confirmed report of the protocol reporting success when the transfer failed.

- Frank

From: jrd@cc.usu.edu (Joe Doupnik)
Subject: Re: Case Study #10: Atomic File Movement
Date: 27 Jan 00 20:26:23 MDT
Organization: Utah State University

Faithful readers can see an academic discussion right away. What Frank and I are doing here is exploring the outer limits of protocols to see just how close one can come to meeting a requirement that both file transmitter and receiver agree that the file has made it across intact. It's an interesting puzzle, actually, because we learn neat things about particular protocols.

Take the stray ACK arriving out of the blue at the worst possible moment (masquerading as that very ACK from another session). The stray can readily happen because the comms pathway is long, such as the Internet with its many routers and paths. Ok, so it could happen, yet it would be rare indeed to line up just so. What if it did, however? An ACK tells the transmitter to dismiss the held packet and worry about the next. The file isn't changed at this point. But the receiver has its own ideas of propriety and needs the packet which is missing but is covered by that stray ACK. So, the receiver, being a good protocol engine, declines to let that go unnoticed and insists upon the missing packet being filled in soon; it won't leave home without it. Whew!

A more serious stray is a data packet arriving just before the real one and the real one is rejected as a duplicate (same sequence number etc). Here we get corruption and don't know it. Bad guys like to play tricks like this with security related traffic. Such spoofing can be taken to extremes of masquerading as the real file sender or receiver, and so on.

On getting sessions muddled. The way this can happen is to run two Kermits over the same underlying transport protocol stack, say TCP/IP but it could be others. A "small mistake, fixed in the next release" could deliver a packet to the wrong Kermit. Nothing bad happens unless sequence numbers and packet kind line up just right, but in principle they might when the damage caused is greatest (someone or other's principle). Things of this kind cause rotten comms and we do something about it well before file transfers could make mistakes.

By the way, if this happens with TCP/IP then a nasty message is returned from that wrong stack and the affected session can be abruptly terminated. Kermit is forgiving and keeps on running.

Garbling packets to look like other pristine packets is much more difficult indeed, an art form almost. But it could happen.

Of these three situations the least likely to cause trouble is the third, garbling to look nice. It's just too hard to get a good match. But there is that opening where bytes get swapped wrong in a buffer deep down in the hardware and by chance the CRC check is ok. The first kind, stray packet, is more likely, but not frequent enough to draw attention of diligent system managers. We would have to have delays arranged just right for sequence numbers to have gone round one whole cycle. That's a lot of storage in the net, and long term storage there is very unlikely (it's tough getting short term storage, hence packet loss under congestion). The most easily arranged boo-boo is that mis-delivery with parallel Kermits, because it's a programming error (the error would happen a lot) or a hardware glitch (much less likely). Even then things would have to line up just so to make a difference.

What I am saying is, guarantees can't be absolute. They are normally very very good indeed, but not absolute. Making a typing error denoting the file of interest is vastly more likely to occur, not to mention all the errors going into making the file and reading it later (given today's standards in programming, sigh).

If you are wondering whether the Kermit protocol is more immune to these things than, say, TCP/IP, the answer is, I do believe, that Kermit is more robust than TCP/IP; it has slightly fewer windows of opportunity.

Frank is saying the same things, but he is trying to be more reassuring. Fine. What I am saying is absolute certainty is unobtainable, no matter where one chooses to draw the line over which data flows. Please do check your SCSI bus and RAID controller cache memory, and type more carefully next time.

Now wasn't that educational? I thought so; rather fun too.

Joe D.

In article <nJLJ3X13aW$a@cc.usu.edu>, Joe Doupnik <jrd@cc.usu.edu> wrote:
:
: ... But there is that opening where bytes get swapped wrong in a
: buffer deep down in the hardware and by chance the CRC check is ok.
:
Swapped bytes are not that uncommon. Some years ago, we discovered that our own terminal servers had this problem under heavy load. Checksums do not catch swapped bytes; CRCs do (except perhaps in a few pathological cases). C-Kermit 7.0 and K95 use CRCs by default.
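
The difference between a checksum and a CRC on swapped bytes is easy to demonstrate in Python. This is an illustrative sketch rather than Kermit's own code: `checksum16` is a plain byte sum, and `crc16_kermit` is a bit-by-bit CRC-16 using the reflected CCITT polynomial with initial value 0. Treat those CRC parameters as an assumption, since they are not stated here.

```python
def checksum16(data):
    """Plain arithmetic checksum: sum of all bytes, modulo 65536."""
    return sum(data) & 0xFFFF


def crc16_kermit(data):
    """Bitwise CRC-16, reflected CCITT polynomial (0x8408), initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 falls off
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc


original = b"PAY 1234 TO ACCOUNT 5678"
swapped = b"PAY 1243 TO ACCOUNT 5678"   # two adjacent bytes exchanged

same_checksum = checksum16(original) == checksum16(swapped)
same_crc = crc16_kermit(original) == crc16_kermit(swapped)
```

Addition does not care about order, so `same_checksum` is true. The CRC depends on the position of every bit, and exchanging two different adjacent bytes changes a run of at most 16 bits, which a 16-bit CRC of this kind always detects, so `same_crc` is false.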

As Joe says, if you are transferring files with Kermit over a TCP/IP connection that has bugs in it and delivers TCP packets to the wrong session, then confusion over Kermit packets, while astronomically improbable, is indeed possible, but (as Joe also says) it's the least of your problems. If TCP is delivering packets to the wrong program, think what must be happening to your passwords and credit card numbers as you surf the web! In any case, it would be unlikely that you could even log in to a host and begin a Kermit transfer under such conditions.

At least Kermit provides several layers of protection on top of TCP/IP: CRCs, sequence numbers, packet framing, and the finite state automaton of the protocol engine, which requires legal sequences of events that tend not to occur by chance. Other TCP/IP applications, such as FTP and Telnet, do no error checking at all; they simply assume an error-free connection. If the underlying TCP connection is faulty, FTP and Telnet will not detect it, and certainly will not recover from it.

: What I am saying is, guarantees can't be absolute.
:
That's true. The fact that some event has one chance in a trillion of taking place does not guarantee it won't happen to you. But we base almost everything we do on probabilities -- the chances of a car wreck, a plane crash, of getting botulism from a jar of peanut butter, or that the PC we sit in front of every day will explode in our faces.

: Please do check your SCSI bus and RAID controller cache memory, and
: type more carefully next time.
:
This is another important point. The best protocol in the world won't protect your data from interrupt conflicts and similar problems on its way from your computer's memory (after the protocol is finished with it) to the disk.

In many cases we have easy ad-hoc ex-post-facto integrity checks at our disposal. If we have transferred an executable program, can we execute it? If it's a ZIP or GZIP file, can we unzip it? If it's a graphics image, can we view it? If it's a Microsoft Word document, can we load it into Word and does it look about right (i.e. not like Klingon)? If it's C source code, does it still compile? These are not guarantees; they can give false positives, but you'll rarely get a false negative :-)
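
The "can we unzip it?" test is easy to automate. The Python sketch below uses the standard `zipfile` module; the helper names and the sample archive are hypothetical, made up for the illustration.

```python
import io
import zipfile


def zip_is_intact(data):
    """Return True if `data` opens as a ZIP archive and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first bad member, or None
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


def make_sample_zip():
    """Build a small in-memory archive to stand in for a transferred file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("receipts.txt", "transaction 1\n" * 100)
    return buf.getvalue()


good = make_sample_zip()
truncated = good[: len(good) // 2]   # simulate a partial transfer
```

Here `zip_is_intact(good)` is true and `zip_is_intact(truncated)` is false. Like the other checks in this list, it says the archive structure and member CRCs are consistent, not that the contents are what the sender intended.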

In general, the chances that a structured computer file has been damaged in such a way that its associated application won't notice are relatively small -- not zero, but small. Plain text is the most vulnerable, since it has no associated application. It is for humans to read. If the word "not" was dropped out of a sentence, how would we know?

When you are transferring a file between like platforms, they often have a utility in common that generates CRCs, checksums, or message digests over a whole file; in such cases, that utility can be run on the file before and after transfer and the results checked for equality. Examples include the Unix 'sum' and 'md5sum' commands.
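
Where no common utility is installed, the same before-and-after comparison can be done with a few lines of Python on each side. This is a sketch using the standard `hashlib` module; the function name and the default algorithm are arbitrary choices for the example.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def demo():
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.bin")
    good_copy = os.path.join(workdir, "good.bin")
    bad_copy = os.path.join(workdir, "bad.bin")
    payload = bytes(range(256)) * 1000
    for path, data in ((source, payload),
                       (good_copy, payload),
                       (bad_copy, payload[:-1] + b"\x00")):
        with open(path, "wb") as f:
            f.write(data)
    return (file_digest(source) == file_digest(good_copy),
            file_digest(source) == file_digest(bad_copy))
```

Run `file_digest()` on the original before the transfer and on the copy afterwards, then compare the two strings. Passing "md5" as the algorithm should give the same hex string that md5sum prints for the file.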

Kermit itself offers you a platform-independent check: a 16-bit CRC over the entire file. It is not computed unless you ask for it, since it impacts performance a bit. But in C-Kermit, MS-DOS Kermit, and Kermit 95 you can:

  SET TRANSFER CRC-CALCULATION ON

Then after a file has been transferred, each Kermit's \v(crc16) variable contains the file's CRC (as a decimal numeric string). This check can be used only for binary-mode transfers since text-mode transfers, by definition, change the file and so the before-and-after CRCs can not be expected to agree. Here's an example using a client/server connection that works no matter what platforms are involved (adapted from Using C-Kermit, 2nd Ed, page 361):

  set transfer crc on           ; Required in C-Kermit 7.0 and K95 1.1.17
  set file type binary
  send mission-critical-data.bin
  if fail exit 1 Transfer Failed
  query kermit crc16
  if not = \v(query) \v(crc16) exit 1 CRC mismatch

(The server must also be told to "set transfer crc on".) Again, this is not a guarantee but it's a further check. The probability that Kermit's per-packet checking fails to catch an error and the independent per-file CRCs will match in spite of an error is the product of the separate probabilities, provided the two failures are independent. So if the first is 0.0000002 and the second is 0.0000003, the probability of both is 0.00000000000006 (6 in 100 trillion).
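
The arithmetic is easy to get wrong with so many zeros, so here it is as a short Python check using exact fractions. The two input probabilities are the illustrative figures from the paragraph above, not measured values, and multiplying them assumes the two checks fail independently.

```python
from fractions import Fraction

p_packet = Fraction(2, 10**7)   # 0.0000002: per-packet checking misses an error
p_file = Fraction(3, 10**7)     # 0.0000003: whole-file CRCs match despite an error

# Probability that both checks fail together, assuming independence
p_both = p_packet * p_file
```

The product is exactly 6/10^14, that is, 6 × 10^-14.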

You can add further and further checks if you wish. Are the lengths the same?

  query kermit size(mission-critical-data.bin)
  if not = \v(query) \fsize(mission-critical-data.bin) exit 1 Size mismatch

The ultimate test might be to transfer the file back to the source platform and compare the result byte for byte with the original. But even this is no guarantee since it won't necessarily catch systematic errors, and there is always the minute possibility that an undetectable random error will occur at precisely the same spot in both transfers, in such a way that the second one "corrects" the first. But what is the chance this will happen and that all the other checks also fail to catch an error?

The point of all this is that you can safely place more trust in Kermit than you place in other well-known data transfer applications whose integrity you wouldn't think to question, such as FTP. And if you don't trust Kermit, a wide range of tools and tests are available to boost your confidence level. But as Joe says, there can never be an absolute guarantee.

- Frank

C-Kermit 7.0 / Columbia University / kermit@columbia.edu / 16-28 Jan 2000