[Python-talk] A few more socket related questions

Ben Scott dragonhawk at gmail.com
Fri Aug 28 18:33:42 EDT 2009


On Fri, Aug 28, 2009 at 2:53 PM, <bruce.labitt at autoliv.com> wrote:
> Curious that tftp uses UDP.

  TFTP uses UDP because TFTP is designed to be as simple as possible.
(Hence "Trivial".)  UDP is very simple to implement; all you really
need to do is copy data between buffers, and maybe calculate a
checksum.  TCP requires that you keep state and employs a number of
self-adjusting algorithms for performance.

  You want TFTP simple because that makes a working implementation
smaller, which keeps memory footprint down, which is usually the whole
reason you're using a network boot in the first place.  :-)

> Obviously there is a protocol superimposed over UDP that makes it reliable
> enough to send an OS.

  TFTP adds its own acknowledgment and retransmission scheme on top of
UDP (for error detection it relies on the UDP checksum).  The receiver
transmits an ACK after each datagram "chunk"; the sender does not
transmit the next
chunk until it gets the ACK.  If the sender doesn't eventually get the
ACK, it times out and retransmits.  This approach again makes it very
simple to implement.  There's no buffering or windowing; there is no
more than one chunk "in flight" at any given time.  It also makes TFTP
rather slow, especially over high-latency links (like the Internet).
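  To make the one-chunk-in-flight idea concrete, here's a rough Python
sketch of the data-transfer part of a TFTP-style exchange.  It borrows
the RFC 1350 DATA/ACK layout (2-byte opcode, 2-byte block number, up
to 512 bytes of data, with a short final block marking the end) but
leaves out the read/write request, error packets, and the port
switching real TFTP does.  The function names are made up for
illustration; this is not a usable TFTP implementation.

```python
import socket
import struct
import threading

OP_DATA = 3        # TFTP DATA opcode (RFC 1350)
OP_ACK = 4         # TFTP ACK opcode
BLOCK_SIZE = 512   # TFTP's fixed data block size

def send_stop_and_wait(sock, peer, data, timeout=1.0, max_retries=5):
    """Send `data` one block at a time, waiting for each ACK before the next."""
    sock.settimeout(timeout)
    block, offset = 1, 0
    while True:
        chunk = data[offset:offset + BLOCK_SIZE]
        packet = struct.pack("!HH", OP_DATA, block & 0xFFFF) + chunk
        expected_ack = struct.pack("!HH", OP_ACK, block & 0xFFFF)
        for _ in range(max_retries):
            sock.sendto(packet, peer)
            try:
                reply, _addr = sock.recvfrom(4)
            except socket.timeout:
                continue                # no ACK in time: retransmit this block
            if reply == expected_ack:
                break                   # acknowledged; only now move to the next block
            # anything else (e.g. a stale ACK) falls through and the block is resent
        else:
            raise TimeoutError(f"gave up on block {block}")
        if len(chunk) < BLOCK_SIZE:
            return block                # a short (possibly empty) block ends the transfer
        block, offset = block + 1, offset + BLOCK_SIZE

def receive_stop_and_wait(sock):
    """Toy receiver: ACK every DATA block, keep each new block exactly once."""
    parts, expected = [], 1
    while True:
        packet, addr = sock.recvfrom(4 + BLOCK_SIZE)
        opcode, block = struct.unpack("!HH", packet[:4])
        if opcode != OP_DATA:
            continue
        sock.sendto(struct.pack("!HH", OP_ACK, block), addr)  # re-ACK duplicates too
        if block == expected & 0xFFFF:
            parts.append(packet[4:])
            expected += 1
            if len(packet) - 4 < BLOCK_SIZE:
                return b"".join(parts)

def loopback_demo(payload):
    """Run both ends over UDP on 127.0.0.1; return (received_bytes, last_block)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))
        result = {}
        t = threading.Thread(target=lambda: result.update(data=receive_stop_and_wait(rx)))
        t.start()
        last = send_stop_and_wait(tx, rx.getsockname(), payload)
        t.join()
        return result["data"], last
```

On a lossless loopback the retransmit path never fires, but the
structure is the point: the sender never has more than one block
outstanding, which is exactly why this style crawls over high-latency
links.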

> So UDP is not all BAD.

  UDP and TCP are neither good nor bad.  They're tools.  But it's
important to understand one's tools.  A wrench makes a lousy
screwdriver, even if it's a really good wrench.  :)

>>   As I recall, for TCP, the send(2) call will block if you try to send
>> more data than will fit in the transmit buffer, and won't return until
>> TCP has consumed all the data you gave it.  I think UDP just returns
>> an error.
>
> So using python, having an array that is 160MB, does one need to break it
> into pieces or not?

  With UDP, yes -- UDP datagrams are limited to 64 KiB (minus header
overhead).  With TCP, no -- TCP does all the work for you.
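  If you did go the UDP route, the splitting step might look something
like this sketch.  65,507 bytes is the most data one UDP datagram can
carry over IPv4 (65,535 minus 20 bytes of IP header and 8 of UDP
header).  The 1,400-byte default is my own assumption, picked to fit
a typical 1,500-byte Ethernet MTU without fragmenting;
`split_for_udp` is a hypothetical helper.

```python
MAX_UDP_PAYLOAD_IPV4 = 65_535 - 20 - 8  # 65,507 bytes: IPv4 max minus IP and UDP headers

def split_for_udp(data, chunk_size=1400):
    """Yield zero-copy slices of `data`, each small enough for one datagram."""
    if not 0 < chunk_size <= MAX_UDP_PAYLOAD_IPV4:
        raise ValueError("chunk_size must be between 1 and 65,507 bytes")
    view = memoryview(data)
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]
```

At 1,400 bytes per datagram a 160 MB array is over 100,000
datagrams, any of which can be lost, duplicated, or reordered, and
numbering and recovery would still be your job.  That bookkeeping is
what TCP saves you.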

> What you just wrote seems to be inconsistent with
> "just write the data to the socket".

  With TCP, you can just write data to the socket.  However, if you
try to write a lot of data at once, the operation may block.  That
means your program will stop until TCP catches up.  That may be
exactly what you want.
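  In Python the "just write it" approach is socket.sendall(), which
keeps handing data to the network stack until every byte has been
accepted, blocking whenever the send buffer is full.  (Plain
socket.send() may take only part of the buffer and returns how many
bytes it took.)  Here's a sketch over a loopback TCP connection, with
a small buffer standing in for the 160 MB array; `loopback_transfer`
is a made-up name, and socket.create_server() needs Python 3.8 or
later.

```python
import socket
import threading

def loopback_transfer(data):
    """Ship `data` over a local TCP connection with one sendall(); return what arrived."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        received = bytearray()

        def receiver():
            conn, _addr = listener.accept()
            with conn:
                while True:
                    chunk = conn.recv(65536)
                    if not chunk:          # b"" = sender shut down its side
                        break
                    received.extend(chunk)

        t = threading.Thread(target=receiver)
        t.start()
        with socket.create_connection(listener.getsockname()) as sock:
            sock.sendall(data)             # blocks whenever the send buffer is full
            sock.shutdown(socket.SHUT_WR)  # no more data is coming
        t.join()
    return bytes(received)
```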

>> > As I understand it, in tcp the client has to create and destroy a
>> > socket(s) for each transaction.
>>
>>   TCP and UDP do not recognize a concept of "transaction".
>
> Bad choice of words.  Did not know how to describe the concept...

  Yah, I figured.  :)  But I didn't want to confuse things further by
assuming *I* knew what *you* meant by the word "transaction".  :-)

> I would be sending arbitrary data, however, the first n bytes contain a
> command, the next m bytes would be either optional data for the command or
> data.  Uggh, as I write this it is clear I need to rethink this if I can
> just send the "array"

  Either one of the above is possible with TCP.  If your only goal is
to just ship the array from one computer to the other, I wouldn't
bother with command and arguments; I'd just dump the data into the
socket.  Maybe you need to differentiate arrays?  If so, you can
probably just prefix the array data with some kind of sequence number
or other header.  But certainly, if the application warrants it, some
kind of command architecture may be appropriate.
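  A header matters because TCP delivers a byte stream with no message
boundaries of its own; recv() can hand you half an array, or the end
of one and the start of the next.  A sketch, assuming a made-up
12-byte header (a 4-byte sequence number and an 8-byte length, in
network byte order); `send_array`, `recv_array`, and `recv_exactly`
are hypothetical names.

```python
import socket
import struct

HEADER = struct.Struct("!IQ")  # hypothetical header: uint32 sequence number, uint64 length

def send_array(sock, seq, payload):
    """Send one header-prefixed array over a connected stream socket."""
    sock.sendall(HEADER.pack(seq, len(payload)))
    sock.sendall(payload)

def recv_exactly(sock, n):
    """recv() may return fewer bytes than requested, so loop until we have n."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(n - len(buf), 1 << 20))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return bytes(buf)

def recv_array(sock):
    """Read one header, then exactly as many payload bytes as it announces."""
    seq, length = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return seq, recv_exactly(sock, length)
```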

  I'm not completely clear on your application, but I know about some
of it, so let's consider this example: You have computer ALPHA that
acquires, stores, and manages large piles of data, but processes
slowly.  You have another computer, BRAVO, that processes quickly, but
has limited storage.  So...

  On BRAVO: You have it run a program, ProgB, on startup.  ProgB opens
a socket, binds it to a port, and listens on it, waiting for an
incoming TCP connection.  It just sits there until it has work to do.

  On ALPHA, you have a program, ProgA, that you run when appropriate:
maybe when there's enough data, maybe when you tell it to, maybe every
night -- whatever.  ProgA assembles a dataset.  ProgA then creates a
socket and opens a TCP connection to BRAVO.  Once the connection is
up, ProgA starts sending the data.  As soon as the local send buffer
fills, the send operation will block until BRAVO acknowledges some of
the data and frees up room.

  On BRAVO: ProgB accepts the connection and attempts to receive data.
The receive operation will block until there is data to read.  But
since ProgA started sending right away, there should be data to read.

  The network stacks on both ALPHA and BRAVO will handle handshaking
(TCP connection setup), flow control, acknowledgment, error checking,
retransmission, etc.

  ProgA just writes data until it is done.  When it reaches the end of
data, ProgA calls shutdown(2) on the socket, with the SHUT_WR
argument.  That tells the network stack that ProgA is done writing, so
please finish sending data to BRAVO and let it know.  However, ProgA
is reserving the right to *read* data from the socket.  Indeed, it
starts doing just that.  Since there's no data to read yet, ProgA
blocks.
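  ProgA's side of that might look roughly like this in Python.  It's a
sketch under assumptions: `run_prog_a`, the host, and the port are
placeholders, and the dataset is assumed to already be a bytes-like
object.

```python
import socket

def run_prog_a(host, port, dataset):
    """Send `dataset` to ProgB, half-close, then collect the processed result."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(dataset)            # blocks while BRAVO's side catches up
        sock.shutdown(socket.SHUT_WR)    # "done writing", but we can still read
        parts = []
        while True:
            chunk = sock.recv(65536)     # blocks until ProgB starts replying
            if not chunk:                # b"" = ProgB has finished sending
                break
            parts.append(chunk)
    # leaving the with-block closed the socket
    return b"".join(parts)
```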

  ProgB just reads data until recv() returns zero, which indicates
end-of-file.  Now ProgB can start doing the processing.  It works and
works and works and works, and finally finishes up with transformed
data.  ProgB then starts writing the transformed data back to the same
socket.  ProgB will block whenever the send buffer is full.

  But back on ALPHA, ProgA suddenly starts getting data from the
socket.  It reads it back and does whatever it needs to -- store the
result in a file, perhaps.

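  So ProgB can keep writing data until *it* finishes.  It then calls
shutdown(2) with the SHUT_RDWR argument, indicating it's done in both
directions, and closes the connection's socket.  The listening socket
is separate (accept() handed ProgB a new socket for this connection),
and it is still listening, so ProgB just calls accept() on it again to
wait for the next job.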
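  And a matching sketch of ProgB.  `process` is a stand-in for
whatever BRAVO really does (here it just reverses the bytes), and
`serve_jobs` and `max_jobs` are hypothetical; the listening socket is
passed in so it stays open across jobs.

```python
import socket

def process(data):
    """Placeholder for BRAVO's real number-crunching (here: reverse the bytes)."""
    return data[::-1]

def serve_jobs(listener, max_jobs=None):
    """One job per connection: read to EOF, process, reply, close, repeat."""
    jobs = 0
    while max_jobs is None or jobs < max_jobs:
        conn, _addr = listener.accept()      # block until ProgA connects
        with conn:                           # conn is separate from the listener
            parts = []
            while True:
                chunk = conn.recv(65536)
                if not chunk:                # ProgA called shutdown(SHUT_WR)
                    break
                parts.append(chunk)
            conn.sendall(process(b"".join(parts)))  # may block when the buffer fills
            conn.shutdown(socket.SHUT_RDWR)  # done in both directions
        jobs += 1                            # listener is still open; loop to accept()
```

On BRAVO you might start it with
`serve_jobs(socket.create_server(("", 5000)))`, where port 5000 is an
arbitrary choice and create_server() needs Python 3.8 or later.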

  ProgA will get end-of-file in the same way, call shutdown in the
same way, and close the socket.  It will then finish up and exit.

  That's one possible scenario.  You may need to do something
completely different.  But hopefully it gives you an idea about how
TCP can work.

  One thing of note: Both ends can send and receive data at the same
time -- full duplex.  The above example only does one direction at a
time, because that was all the example needed to do.  But full duplex
is possible if needed.
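  A small illustration of full duplex: below, both ends of one
connection write 1 MB while simultaneously reading what the other end
sends, using a helper thread for the writing.  If each side instead
sent everything before reading, both would likely stall with full
buffers.  socket.socketpair() stands in for a TCP connection here (on
Unix it returns a connected pair of Unix-domain stream sockets, which
are also full duplex); `exchange` is a made-up helper.

```python
import socket
import threading

def exchange(sock, outgoing, expected_len):
    """Write `outgoing` and read `expected_len` bytes on the same socket at once."""
    writer = threading.Thread(target=sock.sendall, args=(outgoing,))
    writer.start()                       # writing happens in the background...
    received = bytearray()
    while len(received) < expected_len:  # ...while this thread reads
        chunk = sock.recv(65536)
        if not chunk:
            break
        received += chunk
    writer.join()
    return bytes(received)

def full_duplex_demo(size=1 << 20):
    """Both ends send `size` bytes to each other concurrently."""
    a, b = socket.socketpair()
    with a, b:
        results = {}
        t = threading.Thread(
            target=lambda: results.update(b=exchange(b, b"B" * size, size)))
        t.start()
        results["a"] = exchange(a, b"A" * size, size)
        t.join()
    return results
```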

  Another thing to note: These concepts are from the BSD sockets API,
which is a C-language API.  Python may not expose everything the same
way.  But generally, Python makes your life easier by handling details
for you, so that is a good thing.  :)

> The socket would know what size buffer to use via a previously sencommand.

  In Python, I presume tracking the length of the buffer is handled
automatically, and all you have to do is tell Python to send some data
to a socket.

  At the system call level, the arguments to send(2) include a
pointer to a buffer and the length of the buffer.  That buffer is what
send(2) uses to do its work.
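  Python's socket.send() corresponds closely to that call: you pass a
bytes-like object (buffer and length travel together), and it returns
how many bytes the kernel accepted, which may be fewer than you
offered.  sendall() hides the retry loop.  Written out by hand, using
a memoryview so each retry points further into the same buffer
without copying, it looks roughly like this (`send_everything` is a
hypothetical name, not a library function):

```python
import socket

def send_everything(sock, data):
    """Roughly what sendall() does: keep calling send() until every byte is out."""
    view = memoryview(data).cast("B")   # view a C-contiguous buffer as raw bytes
    total = 0
    while total < len(view):            # len() now counts bytes, not elements
        # Like send(2): we hand over a pointer (the slice start) and a length.
        total += sock.send(view[total:])
    return total
```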

  There may be other buffers involved, too, in the kernel, network
stack, or network hardware.  These will copy data from your program's
buffer and then do things with the data.  This lets your program
submit data and then move on, while the network does network things in
the background.  But you generally don't have to worry about these.
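  If you're curious, you can ask the kernel how big its per-socket
buffers are.  The numbers are platform-dependent (Linux, for example,
reports double the value you set, to allow for bookkeeping overhead),
so treat this as an inspection aid rather than something to tune
blindly.  `kernel_buffer_sizes` is a made-up helper name.

```python
import socket

def kernel_buffer_sizes(sock):
    """Return the kernel's (send, receive) buffer sizes for `sock`, in bytes."""
    send_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    recv_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return send_size, recv_size

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(kernel_buffer_sizes(s))  # values vary by OS and configuration
```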

> Optionally I could have a separate socket for every command?

  Yes, with the caveat that I'm not entirely sure what you may be
implying with "command".  But if you mean one program could create a
socket, connect() it to another host over TCP, send a single
command, then shutdown() and close() [destroy] the socket, yes, that
is perfectly possible.
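  A one-connection-per-command client could be as small as this
sketch.  The command format (one line of ASCII text) and the
convention that the server replies and then closes are assumptions
for illustration, as is the name `send_command`.  It uses the := 
operator, so it needs Python 3.8 or later.

```python
import socket

def send_command(host, port, command, timeout=30.0):
    """Open a fresh TCP connection, send one command, return the whole reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        sock.shutdown(socket.SHUT_WR)     # no more commands on this connection
        reply = bytearray()
        while chunk := sock.recv(4096):   # empty bytes = server closed its side
            reply += chunk
    return bytes(reply)                   # leaving the with-block closed the socket
```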

  As I mentioned earlier, if you're sending a large number of small
commands, then the overhead involved in setting up and tearing down
TCP connections may be a performance drag.  But if commands take a
long time to execute, then you may not care.  In which case, by all
means, create new TCP connections for each command, if that's easier.

>>   TCP and UDP do not recognize a concept of "client" or "server".
>> There are sender and receiver.
>>
> I appreciate that.  However, I have to create both the client and server,
> and I need to know how to separate the components.

  I understand.  But again I did not want to assign semantics when I
wasn't sure what you meant.  :-)

>> > Can the server send data to the client without a request?  (Delayed
>> > output?)
>>
>>   You'll have to clarify the above.  I suspect you're assuming
>> semantics that TCP and UDP do not provide.
>
> Indeed, this is really protocol level, isn't it?

  That depends on who you ask.  The OSI seven-layer network model is
popular, but doesn't map cleanly onto the Internet Protocol suite.
In the world of IPv4, you have:

Network layer = This includes IP itself, along with ICMP (ping), ARP,
PPP, and other support protocols.  It also technically includes the
network card and transmission medium.  Or rather, those concepts are
abstracted away by the network layer.  There is no network card, only
IP.

Transport layer = Almost always TCP or UDP, plus a few rarer
protocols.

Application layer = Everything above TCP and UDP.  Code you write, or
which is provided by Python libraries.

> Client sends command to Server
>                                                Server receives command,
> sends ack msg
> client knows server got msg
>                                                server sends 'busy' msg
> client knows server is busy
>                                                server sends results
> client finally gets data

  Depending on what you need, the above could be done implicitly, by
using TCP and letting the send() and recv() calls block.  TCP will
handle the busy/acknowledgment stuff.  The downside is your program
will stop until the calls unblock.  (Also bear in mind that a TCP
acknowledgment only means the data reached the other machine's network
stack, not that the program there has read or acted on it.)
Alternatively, you can use TCP but still layer your own commands on
top of it, if you want your program to remain in full control.  There
are also ways to let TCP do the flow control work in the background
and notify your program asynchronously, or let you check in on
progress: non-blocking sockets combined with select() or poll(), for
example.
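  One way to check in on progress without blocking is Python's select
module: ask whether a socket has data waiting, with a timeout, and go
do other work if it doesn't.  A sketch (`poll_for_reply` is a
hypothetical name); when juggling many sockets, the higher-level
selectors module is usually more convenient.

```python
import select
import socket

def poll_for_reply(sock, timeout=0.0):
    """Return waiting data, b"" at end-of-file, or None if nothing has arrived yet."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None             # nothing yet: the caller can go do other work
    return sock.recv(65536)     # won't block: select says data (or EOF) is ready
```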

> Thank you, Ben.

  You're very welcome.  :-)

-- Ben

