So, this is the first post in a series on programming with C, using the latest C11 standard, officially known as ISO/IEC 9899:2011.
I’ve read many books about programming. Nowadays I mostly read books on topics that don’t deal with specific languages, but with code optimization, refactoring, design patterns, team management and the like.
But I’ve read plenty of language-specific books on C, C++, C#, Java, Python, Ruby, Erlang and Perl, and common to most of them is that the examples they provide are very arbitrary, in the sense that they are not very useful for real-world programming.
I believe a book is better written to cover a language using some sort of common theme/application, such as graphics, network or any other field-specific programming topic.
That way you get to introduce the language in a useful context, allowing the reader to immediately produce something useful, instead of creating a poorly implemented generic Customer/Orders based application they will hopefully never use in real life.
So, I’ve started working on a book called “Real C, Real Value” that addresses this issue. I’m prepared to spend roughly two years writing it, as I do this in my spare time. I work as a C#/C++ developer and have more than enough to keep me occupied during working hours.
Please send me an e-mail if you’d like to be a technical reviewer or proofreader, as I’m planning to publish this book in electronic format to begin with, contacting a publisher only when and if the book becomes popular.
Well, now let’s get started on the topic of this post: Introduction to TCP/IP Sockets Programming using C11.
Why did I choose this topic?
First off, I’ve worked as a TCP/IP instructor, so the topic interests me. Second, networking is all around us all the time, literally speaking.
Third, it makes for a good topic to present new language constructs and techniques in the new C11 standard. Fourth, it is of utmost importance to know TCP/IP and Sockets programming (or at least TCP/IP) if you plan on creating applications that share information with others, in other words if you plan on writing anything that does not live in complete isolation.
I’ll use this first post to introduce the TCP/IP protocol for readers not familiar with the basics of networking on a programmatic level. In a sense this post will cover the bare minimum knowledge needed to follow along with this blog series.
It will not build any real networking programs yet, but I promise that my next post will!
Ok, enough chatter.
Introduction to the TCP/IP networking protocol(s)
I’m not going to write about the whole history of TCP/IP. There are numerous articles on the web that do this remarkably well, such as the one on Wikipedia. I’ll explain the basics of the protocol in words that I believe will give you the understanding of the topic needed to follow my posts, nothing more, nothing less.
Please understand that this introduction only serves as a minimum common denominator of knowledge needed to follow my posts, and that each topic will get more substantial when we start working with the elements in code.
The protocol defines two major parts, TCP and IP.
We’ll start off in the back end with IP.
The IP part of TCP/IP
IP stands for “Internet Protocol” and defines the rules for transferring data from one computer to the next. Note that I say from computer to computer, not from application to application. The latter is the work of TCP.
An IP address takes the form xxx.xxx.xxx.xxx in version 4 of the protocol. These are four fields of 1 byte (8 bits) each, totalling 32 bits of actual data per address. The valid range for each field (or octet) is 0-255. An example IP address would be something like 192.168.0.1, which is probably the default internal address of the home router/modem you use to connect to the internet. Just run “ifconfig” in your *NIX/OS X shell of choice or “ipconfig” in your cmd prompt on Windows to reveal the information. Here is a restricted sample of my current computer’s ifconfig output:
lo0: flags=8049<up,loopback,running,multicast> mtu 16384
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
en1: flags=8863<up,broadcast,smart,running,simplex,multicast> mtu 1500
inet6 fe80::226:bbff:fe0b:c465%en1 prefixlen 64 scopeid 0x5
inet 192.168.0.103 netmask 0xffffff00 broadcast 192.168.0.255
This illustrates that my local IP address on the active interface is 192.168.0.103. The 192.168.x.x and 10.x.x.x series of addresses (along with 172.16.x.x through 172.31.x.x) are special ranges known as “private” addresses. These addresses are not routable on the internet, but work fine on a local network, such as behind a router or ADSL/cable connection.
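To make the 32-bit layout a bit more concrete, here is a minimal sketch in C that turns the dotted text form into a single 32-bit number and checks whether it falls in a private range. It relies on the POSIX function inet_pton; the helper names parse_ipv4 and is_private_ipv4 are my own inventions for illustration, and the 172.16.0.0/12 block is included because it belongs to the same set of private ranges as the two mentioned above.

```c
#include <arpa/inet.h>    /* inet_pton, ntohl (POSIX) */
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET */

/* Hypothetical helper: parse dotted text such as "192.168.0.103" into a
   32-bit value in host byte order. Returns false if the text is not a
   valid IPv4 address. */
bool parse_ipv4(const char *text, uint32_t *out)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1)
        return false;
    *out = ntohl(addr.s_addr);   /* network byte order -> host byte order */
    return true;
}

/* Hypothetical helper: true if the address lies in a private range. */
bool is_private_ipv4(uint32_t ip)
{
    return (ip & 0xFF000000u) == 0x0A000000u    /* 10.0.0.0/8     */
        || (ip & 0xFFF00000u) == 0xAC100000u    /* 172.16.0.0/12  */
        || (ip & 0xFFFF0000u) == 0xC0A80000u;   /* 192.168.0.0/16 */
}
```

Feeding it “192.168.0.103” gives the value 0xC0A80067, one byte per field (C0 = 192, A8 = 168, 00 = 0, 67 = 103), and is_private_ipv4 reports it as private.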
The reason why you can connect to the internet when your computer has a private address is because your router/modem acts as a “forwarder” of the data sent from your computer.
It accomplishes this via something called “Network Address Translation” or NAT for short. Your router will have an external, valid internet address obtained from your ISP while connecting and it uses that address to forward any messages coming from its local network. Your local address is kept in a lookup table on the router, so that more than one machine on the same local network may use the same router/modem.
A router uses an IP -> MAC address mapping internally to make sure that the data is sent to the correct receiver, given that a machine on the local network could in theory change its IP address between the time a request is made and the time the router receives the reply. Though this rarely happens, it is a possibility that the router must be prepared to deal with.
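The lookup table idea can be sketched as a toy in C. This is only an illustration under my own assumptions, not how any particular router implements NAT: the struct and function names are hypothetical, the table is a fixed-size array, and the starting external port 40000 is an arbitrary choice. It also keys on port numbers, which are introduced later in this post, because that is what lets the router tell two local machines apart when replies come back.

```c
#include <stddef.h>
#include <stdint.h>

#define NAT_MAX        16       /* toy limit, arbitrary */
#define NAT_FIRST_PORT 40000u   /* arbitrary first external port */

struct nat_entry {
    uint32_t internal_ip;     /* local machine, e.g. 192.168.0.103 */
    uint16_t internal_port;   /* port used by the local application */
    uint16_t external_port;   /* port the router uses on its public side */
};

struct nat_table {
    struct nat_entry entries[NAT_MAX];
    size_t count;
};

/* Outgoing traffic: return the external port for (ip, port), creating a
   new mapping if none exists. Returns 0 when the table is full. */
uint16_t nat_outbound(struct nat_table *t, uint32_t ip, uint16_t port)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].internal_ip == ip &&
            t->entries[i].internal_port == port)
            return t->entries[i].external_port;

    if (t->count == NAT_MAX)
        return 0;

    struct nat_entry *e = &t->entries[t->count];
    e->internal_ip   = ip;
    e->internal_port = port;
    e->external_port = (uint16_t)(NAT_FIRST_PORT + t->count);
    t->count++;
    return e->external_port;
}

/* Incoming reply: find the local machine behind an external port.
   Returns NULL when no mapping exists. */
const struct nat_entry *nat_inbound(const struct nat_table *t,
                                    uint16_t external_port)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].external_port == external_port)
            return &t->entries[i];
    return NULL;
}
```

Two machines that both send from local port 51000 end up with different external ports, so a reply arriving on either port can be handed back to the right machine.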
So what is this MAC address? It is NOT an Apple product. It stands for Media Access Control, and is a “unique” series of numbers identifying each physical network interface (think USB wireless adapter, network card etc.) that is hard-coded into each interface. In the full ifconfig output for my en1 interface, the MAC address shows up as:
ether 00:26:bb:0b:c4:65
Minus the “ether”, this is a hexadecimal address defining this interface uniquely.
So a mapping in the router may be visualized as:
192.168.0.103 -> 00:26:bb:0b:c4:65
This mapping is kept in something called an ARP table, named after the Address Resolution Protocol (ARP) that is used to build it.
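As a rough picture of such a table, here is a small C sketch of an IP -> MAC lookup. The struct layout and the names arp_lookup and mac_to_text are hypothetical and only meant to illustrate the mapping; a real operating system or router maintains this table itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct arp_entry {
    uint32_t ip;       /* IPv4 address as a 32-bit number */
    uint8_t  mac[6];   /* 48-bit hardware address */
};

/* Return the MAC address mapped to ip, or NULL if there is no entry. */
const uint8_t *arp_lookup(const struct arp_entry *table, size_t n,
                          uint32_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].ip == ip)
            return table[i].mac;
    return NULL;
}

/* Format a MAC address as "xx:xx:xx:xx:xx:xx" (17 characters + NUL). */
void mac_to_text(const uint8_t mac[6], char buf[18])
{
    snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

With a single entry for 192.168.0.103 (0xC0A80067 as a number), looking up that address and formatting the result gives back 00:26:bb:0b:c4:65, the mapping shown above.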
The other address worth noting from the ifconfig output is the 127.0.0.1 address.
This is the “Local Loopback” address, another special IP address that’s not routable on the internet.
This address serves internal purposes for the protocol and will be explained in greater detail when we start programming. For now remember that: ANY DATA SENT TO THE LOOPBACK ADDRESS IS IMMEDIATELY RETURNED TO YOUR LOCAL MACHINE, without involving the router.
It is present on every host and can be used even when a computer has no other interfaces (i.e. is not connected to the network). The loopback address for IPv6 is 0:0:0:0:0:0:0:1 (or just ::1).
A related class contains the link-local, or “auto-configuration”, addresses. For IPv4, such addresses begin with 169.254. For IPv6, any address whose first 16-bit chunk lies between FE80 and FEBF (the fe80::/10 block) is a link-local address; in practice you will almost always see FE80, as in my ifconfig output. These addresses can only be used for communication between hosts connected to the same network; routers will not forward packets that have such addresses as their destination.
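If you want to recognise these special addresses in code, a sketch along the following lines should work on a POSIX system. The enum and the function name classify_address are my own, made up for this example; IN6_IS_ADDR_LOOPBACK and IN6_IS_ADDR_LINKLOCAL are standard macros from netinet/in.h, and the latter tests for the fe80::/10 prefix. Note that the zone suffix ifconfig prints (such as %en1) is not part of the address itself, so pass the text without it.

```c
#include <arpa/inet.h>    /* inet_pton, ntohl */
#include <netinet/in.h>   /* struct in6_addr, IN6_IS_ADDR_* macros */
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

enum addr_class { ADDR_INVALID, ADDR_LOOPBACK, ADDR_LINK_LOCAL, ADDR_OTHER };

/* Hypothetical helper: classify an IPv4 or IPv6 address given as text. */
enum addr_class classify_address(const char *text)
{
    struct in_addr  v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, text, &v4) == 1) {
        uint32_t ip = ntohl(v4.s_addr);
        if ((ip >> 24) == 127)                    /* 127.0.0.0/8    */
            return ADDR_LOOPBACK;
        if ((ip & 0xFFFF0000u) == 0xA9FE0000u)    /* 169.254.0.0/16 */
            return ADDR_LINK_LOCAL;
        return ADDR_OTHER;
    }
    if (inet_pton(AF_INET6, text, &v6) == 1) {
        if (IN6_IS_ADDR_LOOPBACK(&v6))            /* ::1       */
            return ADDR_LOOPBACK;
        if (IN6_IS_ADDR_LINKLOCAL(&v6))           /* fe80::/10 */
            return ADDR_LINK_LOCAL;
        return ADDR_OTHER;
    }
    return ADDR_INVALID;
}
```

The IPv4 loopback test covers the whole 127.x.x.x block rather than just 127.0.0.1, which matches the 0xff000000 netmask shown for lo0 in the ifconfig output.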
IP keeps data flowing from one location to another via connected routers, which in turn connect local networks. In general these networks are known as WANs (Wide Area Networks) and LANs (Local Area Networks).
Here is a visual representation of the general dataflow:
An IP data packet consists of multiple parts that describe both the data it carries and the packet itself. The two main parts are a header, which identifies (among other things) where the packet comes from (source) and where it’s headed (destination), and a body that holds the data being carried. Here is a visual representation of the IP packet and its parts:
The protocol version described here is version 4 of IP (IPv4).
As you can see from my sample ifconfig output, there is now a new kid in town called IP V6. It is a redesigned version of the protocol and is still in its early stages regarding widespread use.
But every device created in recent years includes support for V6 of the protocol, so it’s pretty evident that this will be the “next big thing” to happen to TCP/IP.
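To show where the source and destination actually live, here is a hedged C sketch that reads a few fields out of the first 20 bytes of a raw IPv4 packet. The struct ipv4_summary and the function names are hypothetical; the byte offsets follow the standard IPv4 header layout (version in the top 4 bits of byte 0, protocol in byte 9, source address in bytes 12-15, destination address in bytes 16-19).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ipv4_summary {
    unsigned version;       /* 4 for IPv4 */
    unsigned header_len;    /* header length in bytes (20 or more) */
    unsigned total_len;     /* header + body, in bytes */
    unsigned protocol;      /* what the body holds: 6 = TCP, 17 = UDP */
    uint32_t source;        /* where the packet comes from */
    uint32_t destination;   /* where the packet is headed */
};

/* Read a 32-bit big-endian (network byte order) value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns false if the buffer is too short or is not an IPv4 header. */
bool parse_ipv4_header(const uint8_t *pkt, size_t len,
                       struct ipv4_summary *out)
{
    if (len < 20)
        return false;

    out->version    = pkt[0] >> 4;               /* top 4 bits    */
    out->header_len = (pkt[0] & 0x0Fu) * 4u;     /* bottom 4 bits */
    if (out->version != 4 || out->header_len < 20 || out->header_len > len)
        return false;

    out->total_len   = ((unsigned)pkt[2] << 8) | pkt[3];
    out->protocol    = pkt[9];
    out->source      = read_be32(pkt + 12);
    out->destination = read_be32(pkt + 16);
    return true;
}
```

Everything after header_len bytes is the body, which for our purposes will be a TCP or UDP packet.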
We’ll cover IP V6 in later posts, but a little bit of information is provided here.
The V6 version of IP has a different addressing scheme, and looks something like this:
IP V6 uses 16 bytes for addressing and is by convention represented as groups of hexadecimal digits, separated by colons.
Each group of digits represents 2 bytes (16 bits) of the address, and there are eight groups, totalling 128 bits. Leading zeros in a group may be omitted.
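The textual rules are easy to try out with the POSIX functions inet_pton and inet_ntop, which convert between the text form and the 16-byte binary form. The wrapper name normalize_ipv6 below is hypothetical, and this is just a sketch of the round trip, not a full validator.

```c
#include <arpa/inet.h>    /* inet_pton, inet_ntop */
#include <netinet/in.h>   /* struct in6_addr, INET6_ADDRSTRLEN */
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET6, socklen_t */

/* Parse IPv6 text into its 16-byte binary form and write it back out in
   the shortened notation. The out buffer should hold at least
   INET6_ADDRSTRLEN bytes. Returns false if the text is not a valid
   IPv6 address or the buffer is too small. */
bool normalize_ipv6(const char *text, char *out, size_t out_size)
{
    struct in6_addr addr;   /* 16 bytes = 128 bits */

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return false;
    return inet_ntop(AF_INET6, &addr, out, (socklen_t)out_size) != NULL;
}
```

Passing in the fully written-out form fe80:0000:0000:0000:0226:bbff:fe0b:c465 should give back the shorter fe80::226:bbff:fe0b:c465 seen in my ifconfig output, with the leading zeros dropped and the run of all-zero groups collapsed to a double colon.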
We’ll get much deeper into IP when we start programming, but this should serve as a sound basis.
The TCP (and UDP + SCTP) parts of TCP/IP
TCP stands for Transmission Control Protocol, and is responsible for the accurate delivery of data between applications running on different computers. UDP stands for User Datagram Protocol and is also responsible for delivering data, but it makes no attempt to recover from errors: a checksum lets the receiver discard corrupted packets, but nothing is retransmitted, so UDP accepts packet drops, something TCP does not. UDP is suitable for things like streaming video, where each packet is useful on its own. If some packets drop, you’ll just end up with a video image of slightly poorer quality.
The newer Stream Control Transmission Protocol (SCTP) is also a reliable, connection-oriented transport mechanism. It is message-stream-oriented (not byte-stream oriented like TCP) and provides multiple streams multiplexed over a single connection.
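As a small preview of the Sockets API covered in the next post, this is roughly how the choice between TCP and UDP shows up in C: it is a single argument to the socket() call. The function names open_transport_socket and socket_type_of are hypothetical helpers of mine; SCTP is left out because support for it varies between systems.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>   /* socket, getsockopt, SOCK_STREAM, SOCK_DGRAM */

/* Create an IPv4 socket: SOCK_STREAM selects TCP (reliable byte stream),
   SOCK_DGRAM selects UDP (individual packets that may be dropped).
   Returns a descriptor, or -1 on failure. The caller must close() it. */
int open_transport_socket(bool reliable)
{
    int type = reliable ? SOCK_STREAM : SOCK_DGRAM;

    return socket(AF_INET, type, 0);   /* 0 = default protocol for type */
}

/* Ask the system which type an open socket has; returns -1 on error. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
        return -1;
    return type;
}
```

Nothing is sent on the network here; the sockets are only created, which is enough to see that the reliable and the drop-tolerant transport are requested in almost exactly the same way.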
Both TCP and UDP use addresses, called port numbers, to identify applications within hosts. TCP, SCTP and UDP are called end-to-end transport protocols because they carry data all the way from one program to another (whereas IP only carries data from one host to another).
The main thing relating to TCP/IP you need to understand from this post is that it takes two parts to communicate successfully between computer programs: a network address and a port number (plus, of course, the data to transfer).
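In the C Sockets API those two parts travel together in one structure, struct sockaddr_in for IPv4. Here is a minimal sketch of filling it in; make_endpoint is a hypothetical helper name, while the structure, its fields, htons and inet_pton are standard POSIX.

```c
#include <arpa/inet.h>    /* inet_pton, htons */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>       /* memset */
#include <sys/socket.h>   /* AF_INET */

/* Combine a network address and a port number into one endpoint.
   Returns false if ip_text is not a valid IPv4 address. */
bool make_endpoint(const char *ip_text, uint16_t port,
                   struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;        /* IPv4 */
    out->sin_port   = htons(port);    /* port, in network byte order */
    return inet_pton(AF_INET, ip_text, &out->sin_addr) == 1;
}
```

A call such as make_endpoint("192.168.0.103", 8080, &ep) describes “the application listening on port 8080 on that machine”; the port number 8080 is just an example value.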
Here’s a visual representation of the packets involved and their content placeholders:
Just to make things clear:
TCP is a lot more than what I’ve covered here, but this is the basic understanding needed to program with the C Sockets API.
For completeness I include this visual view of the TCP packet header:
I hope that you’ve found this post interesting and somewhat informative.
It may look complicated at first, but it’s really not that bad. Especially compared to 3D programming with advanced linear algebra and stuff like that.
In my next post I’ll cover some basics of Sockets and how to start using the C Sockets API.
I found this explanation for “What happened to TCP/IP 5” after some googling:
IPng, Internet Protocol next generation, was conceived in 1994 with a goal for implementations to start flooding out by 1996 (yeah, like that ever happened). IPv6 was supposed to be the “god-send” over the well-used IPv4: it increased the number of bytes used in addressing from 4 bytes to 16 bytes, it introduced anycast routing, it removed the checksum from the IP layer, and lots of other improvements. One of the fields kept, of course, was the version field: these 4 bits identify this IP header as being of version “4” when there is a 4 in there, and presumably they would use a “5” to identify this next gen version. Unfortunately, that “5” was already given to something else.
In the late 1970’s, a protocol named ST (the Internet Stream Protocol) was created for the experimental transmission of voice, video, and distributed simulation. Two decades later, this protocol was revised to become ST2 and started to get implemented into commercial projects by groups like IBM, NeXT, Apple, and Sun. Wow did it differ a lot. ST and ST+ offered connections, unlike their connection-less IPv4 counterpart. It also guaranteed QoS. ST and ST+ were already given that magical “5”.