Overview
When you deal with external binary data in Python, there are a couple of ways to get that data into a data structure. You can use the ctypes module to define the data structure, or you can use the struct module.
You will see both methods used when you explore tool repositories on the web. This article shows you how to use each one to read an IPv4 header off the network. It’s up to you to decide which method you prefer; either way will work fine.
ctypes is a foreign function library for Python. It provides C-compatible data types and lets you call functions in shared libraries.
struct converts between Python values and C structs that are represented as Python bytes objects.
So ctypes handles binary data types in addition to a lot of other functionality, while handling binary data is the main purpose of the struct module.
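As a rough illustration of that difference, the sketch below packs two made-up values into bytes with struct and wraps one of them in a fixed-size ctypes type. The values 513 and 7 are arbitrary and do not come from any packet.

```python
import struct
from ctypes import c_uint16, sizeof

# struct: Python values <-> bytes, described by a format string.
# "!HB" = network byte order, one 2-byte and one 1-byte unsigned integer.
packed = struct.pack("!HB", 513, 7)
unpacked = struct.unpack("!HB", packed)

# ctypes: C-compatible types with a fixed size.
number = c_uint16(513)

print(packed)            # b'\x02\x01\x07'
print(unpacked)          # (513, 7)
print(sizeof(number))    # 2
```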
Let’s see how these two libraries are used when we need to decode an IPv4 header off the network.
First, here’s the structure of the IPv4 header. This is from the IETF RFC 791:
A summary of the contents of the internet header follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Example Internet Datagram Header
Initial Data from the Network
We need some data to work with, so let’s get a single packet from the network. This little snippet should do fine. I ran this on Linux.
import socket
import sys

def sniff(host):
    # a raw socket usually requires root privileges
    sniffer = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    sniffer.bind((host, 0))
    sniffer.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
    # read a single packet; recvfrom() returns (data, address), so keep only the data
    raw_packet, _ = sniffer.recvfrom(65535)
    return raw_packet

if __name__ == '__main__':
    if len(sys.argv) == 2:
        host = sys.argv[1]
    else:
        host = '192.168.1.69'
    buff = sniff(host)
We just grab a single raw packet from the network and put it into a variable, buff.
So now that we have binary data, let’s look at how to use it.
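Raw sockets generally need elevated privileges, so for experimenting offline a hand-built header can stand in for buff. The following is only a sketch: the name sample_header and every field value in it (addresses, lengths, identification) are assumptions made up for illustration, and the checksum is left at zero rather than computed. The fields are packed in network (big-endian) byte order, which is how they appear on the wire.

```python
import socket
import struct

# Hypothetical field values, chosen only for illustration.
version_ihl = (4 << 4) | 5        # version 4, header length of 5 words -> 0x45

sample_header = struct.pack(
    "!BBHHHBBH4s4s",
    version_ihl,
    0,                            # type of service
    84,                           # total length
    0x1C46,                       # identification
    0x4000,                       # flags (Don't Fragment set) + fragment offset 0
    64,                           # time to live
    socket.IPPROTO_ICMP,          # protocol number (1)
    0,                            # checksum, not computed in this sketch
    socket.inet_aton("192.168.1.69"),
    socket.inet_aton("8.8.8.8"),
)

print(len(sample_header))         # 20
print(sample_header.hex())
```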
ctypes module
The following code snippet defines a new class, IP, that can read a packet and parse the header into its separate fields.
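from ctypes import *
import socket
import struct

class IP(Structure):
    # bit fields are listed low-order bits first, which matches a little-endian host
    # multi-byte fields (len, id, offset, sum) are read in host byte order;
    # socket.ntohs() converts them from network byte order if you need the values
    _fields_ = [
        ("ihl", c_ubyte, 4),
        ("version", c_ubyte, 4),
        ("tos", c_ubyte, 8),
        ("len", c_ushort, 16),
        ("id", c_ushort, 16),
        ("offset", c_ushort, 16),
        ("ttl", c_ubyte, 8),
        ("protocol_num", c_ubyte, 8),
        ("sum", c_ushort, 16),
        ("src", c_uint32, 32),
        ("dst", c_uint32, 32)
    ]

    def __new__(cls, socket_buffer=None):
        # build the instance straight from the raw bytes
        return cls.from_buffer_copy(socket_buffer)

    def __init__(self, socket_buffer=None):
        # human readable IP addresses
        # "<L" restores the original byte order on a little-endian host
        self.src_address = socket.inet_ntoa(struct.pack("<L", self.src))
        self.dst_address = socket.inet_ntoa(struct.pack("<L", self.dst))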
You can see that the _fields_ structure defines each part of the header, giving the width in bits as the last argument. Being able to specify the bit width is handy. Our IP class inherits from the ctypes Structure class, which specifies that we must have a defined _fields_ structure before any instance is created.
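To see the bit-width argument in isolation, here is a minimal sketch that maps a single byte onto two 4-bit fields. The class name FirstByte is hypothetical, and the mapping of fields to the high or low half of the byte depends on the platform; the comments describe the usual result on a little-endian host.

```python
from ctypes import Structure, c_ubyte, sizeof

class FirstByte(Structure):
    # Two 4-bit fields share one unsigned byte.
    _fields_ = [
        ("ihl", c_ubyte, 4),      # low-order bits on a little-endian host
        ("version", c_ubyte, 4),  # high-order bits on a little-endian host
    ]

first = FirstByte.from_buffer_copy(b"\x45")

print(sizeof(FirstByte))          # 1: both fields fit in a single byte
print(first.version, first.ihl)   # typically 4 5 on a little-endian host
```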
Class Instantiation
The wrinkle with the ctypes Structure abstract base class is the __new__ method. See the documentation for full details: ctypes module.
The __new__ method takes the class reference as the first argument. It creates and returns an instance of the class, which is then passed to the __init__ method.
We create the instance normally, but underneath, Python first invokes the class method __new__, which fills in the _fields_ data structure from the buffer, and then calls the __init__ method on the new instance. As long as you’ve defined the structure beforehand, just pass the __new__ method the external (network packet) data, and the fields magically appear as attributes on your instance.
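The order of those two calls can be sketched with a tiny structure. The class Pair and its fields are hypothetical and exist only to show that the fields are already populated by the time __init__ runs.

```python
from ctypes import Structure, c_ubyte

class Pair(Structure):
    _fields_ = [("first", c_ubyte), ("second", c_ubyte)]

    def __new__(cls, data=None):
        # Step 1: create the instance and copy the raw bytes into its fields.
        return cls.from_buffer_copy(data)

    def __init__(self, data=None):
        # Step 2: the fields are already filled in, so derived values can be computed.
        self.total = self.first + self.second

pair = Pair(b"\x01\x02")
print(pair.first, pair.second, pair.total)   # 1 2 3
```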
struct module
The struct module provides format characters that you use to specify the structure of the binary data. The first character (in our case, !) specifies the “endianness” of the data; ! means network (big-endian) byte order, which is how the multi-byte fields of an IP header arrive on the wire. See the documentation for full details: struct module.
import ipaddress
import struct

class IP:
    def __init__(self, buff=None):
        # the fixed part of the header is the first 20 bytes of the packet
        header = struct.unpack('!BBHHHBBH4s4s', buff[:20])
        self.ver = header[0] >> 4
        self.ihl = header[0] & 0xF
        self.tos = header[1]
        self.len = header[2]
        self.id = header[3]
        self.offset = header[4]
        self.ttl = header[5]
        self.protocol_num = header[6]
        self.sum = header[7]
        self.src = header[8]
        self.dst = header[9]
        # human readable IP addresses
        self.src_address = ipaddress.ip_address(self.src)
        self.dst_address = ipaddress.ip_address(self.dst)
        # map protocol constants to their names
        self.protocol_map = {1: "ICMP", 6: "TCP", 17: "UDP"}
Here are the individual parts of the header.
- B 1 byte (ver, hdrlen)
- B 1 byte tos
- H 2 bytes total len
- H 2 bytes identification
- H 2 bytes flags + frag offset
- B 1 byte ttl
- B 1 byte protocol
- H 2 bytes checksum
- 4s 4 bytes src ip
- 4s 4 bytes dst ip
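A quick way to check the sizes in the list above is struct.calcsize. This sketch assumes the ! prefix (network byte order), which uses standard sizes with no padding between fields.

```python
import struct

# Format characters from the list above, in header order.
parts = ["B", "B", "H", "H", "H", "B", "B", "H", "4s", "4s"]

# "!" selects standard sizes with no padding between fields.
sizes = [struct.calcsize("!" + part) for part in parts]
total = struct.calcsize("!" + "".join(parts))

print(sizes)   # [1, 1, 2, 2, 2, 1, 1, 2, 4, 4]
print(total)   # 20
```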
Everything is pretty straightforward, but with ctypes, we could specify the bit-width of the individual pieces. With struct, there’s no format character for a nybble (4 bits), so we have to do some manipulation to get the ver and hdrlen from the first part of the header.
Binary Manipulations
The wrinkle with struct in this example is that we need to do some manipulation of header[0], which contains a single byte, but we need to create two variables from that byte, each containing a nybble.
High nybble
We have one byte and for the ver variable, we want the high-order nybble. The typical way you get the high nybble of a byte is to right-shift. We right-shift the byte by 4 places, which is like prepending 4 zeros at the front so the last 4 bits fall off, leaving us with the first nybble:
0 1 0 1 0 1 1 0 >> 4
-----------------------------
0 0 0 0 0 1 0 1
Low nybble
We have one byte and for the hdrlen variable, we want the low-order nybble. The typical way you get the low nybble of a byte is to AND it with 0xF (00001111):
0 1 0 1 0 1 1 0 &F
0 0 0 0 1 1 1 1
-----------------------------
0 0 0 0 0 1 1 0
Let’s look at an example in the Python REPL:
>>> m = 66
>>> m
66
>>> bin(m)
'0b1000010' # or 0100 0010
>>> bin(m>>4)
'0b100' # or 0100
>>> bin(m&0xF)
'0b10' # or 0010
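Now, more specifically to our IPv4 case, the first byte in the header is almost always 0x45 = 69 decimal = 01000101 binary (version 4 and a header length of five 32-bit words, that is, a 20-byte header with no options). See what that looks like when we right-shift it by 4 and then AND it with 0xF: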
>>> '{0:08b}'.format(0x45)
'01000101'
>>> '{0:04b}'.format(0x45>>4)
'0100'
>>> '{0:04b}'.format(0x45&0xF)
'0101'
You don’t have to know binary manipulation backward and forward for decoding an IP header, but there are some patterns like these (shift and AND) you will see over and over again as you code and as you explore other hackers’ code.
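The same shift-and-AND pattern can be wrapped in small helpers. The function names below are hypothetical. The second helper applies the pattern to the 16-bit flags and fragment offset field, which RFC 791 divides into 3 flag bits and 13 offset bits, and it assumes the field was unpacked in network byte order.

```python
def split_nibbles(byte):
    """Return the (high, low) 4-bit halves of a single byte value."""
    return byte >> 4, byte & 0xF

def split_flags_offset(field):
    """Split a 16-bit value into its top 3 bits and its low 13 bits."""
    return field >> 13, field & 0x1FFF

print(split_nibbles(0x45))          # (4, 5)
print(split_flags_offset(0x4000))   # (2, 0): Don't Fragment set, offset 0
```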
That seems like a lot of work, doesn’t it? In the case where we have to do some bit shifting, it does take effort. But for many cases (e.g. ICMP), every field falls on an 8-bit (byte) boundary and so is very simple to set up. Here is an “Echo Reply” ICMP message; you can see that each parameter of the ICMP header can be defined in a struct with one of the existing format letters (BBHHH) (RFC 777):
Echo or Echo Reply Message
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Identifier          |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Data ...
+-+-+-+-+-
A quick way to parse that would simply be:
import struct

class ICMP:
    def __init__(self, buff):
        # the echo header is 8 bytes, in network byte order
        header = struct.unpack('!BBHHH', buff[:8])
        self.type = header[0]
        self.code = header[1]
        self.sum = header[2]
        self.id = header[3]
        self.seq = header[4]
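In a captured packet the ICMP header sits right after the IP header, so you need the IP header length to find it. The sketch below builds a made-up packet (all field values are assumptions for illustration, with checksums left at zero), computes the offset from the low nybble of the first byte, which counts 32-bit words, and unpacks the 8 ICMP bytes from there.

```python
import struct

# Hypothetical packet: a 20-byte IPv4 header followed by an ICMP echo request header.
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 28, 1, 0, 64, 1, 0,
    bytes([192, 168, 1, 69]),
    bytes([192, 168, 1, 1]),
)
icmp_header = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1)
packet = ip_header + icmp_header

# The header length is counted in 32-bit words, so multiply by 4 to get bytes.
ihl = packet[0] & 0xF
offset = ihl * 4

icmp_type, code, checksum, ident, seq = struct.unpack(
    "!BBHHH", packet[offset:offset + 8]
)

print(offset)                           # 20
print(icmp_type, code, hex(ident), seq) # 8 0 0x1234 1
```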
Conclusion
You can use either the ctypes module or the struct module to read and parse binary data. Here is an example of instantiating the class no matter which method you use. You instantiate the IP class with your packet data in the variable buff:
mypacket = IP(buff)
print(f'{mypacket.src_address} -> {mypacket.dst_address}')
With ctypes, make sure you define your _fields_ structure and hand the data to it in the __new__ method. When you instantiate the class, you’ll have access to the data attributes automatically.
With struct, you define how to read the data with a format string. For data attributes that don’t lie on an 8-bit (byte) boundary, you may need to do some binary manipulation.
In short, use whichever method fits your brain. But always be aware that you may see code from others that uses a different method. Hopefully, now you’ll see it and understand it.