Lab 07: Man-In-The-Middle

Table of contents
  1. Lab 07: Man-In-The-Middle
  2. Introduction
    1. Logistics
    2. Learning objectives
    3. Getting the configuration
    4. Generating your .env file
    5. Network topology
  3. Code introduction
    1. Compilation
    2. Step 1: Disconnecting the two hosts
      1. Running the executable
      2. Success criteria
    3. Step 2: Exploring netcat
      1. Take notes
    4. Step 3: The full exploit
      1. Implementation
        1. Parsing TCP headers
        2. Reaching the data
        3. The TCP checksum
      2. Compilation
      3. Testing
        1. Success criteria
      4. Demo
      5. Submission

Introduction

This lab is the logical extension of Lab 06: we finally use the infrastructure we’ve been building to conduct an active exploit. In Lab 06, we explored ways to poison the ARP cache of a victim machine so that it starts sending its packets to us. In this lab, we’ll aim to act as the Man In The Middle (MITM) between two communicating hosts. Sitting between the two, we’d effectively be acting as a router. The advantage we gain is that we can see all the packets exchanged between the two hosts!

Logistics

We will continue with the same set of tools from Lab 01, namely:

  1. Wireshark to visually see packets and protocols.
    • Install this on your local machine, so you can see things visually.
  2. If you are comfortable with command line, you can also use tshark to observe the same packets and protocols, directly on the server machine.

  3. scp or rsync will prove useful for downloading packet captures from the server to your local machine. They should be available by default on your Linux distribution.

Learning objectives

After completing this lab, you should be able to:

  • Conduct a Man-In-The-Middle (MITM) attack using Address Resolution Protocol (ARP) cache poisoning.

  • Modify TCP packets on the fly to violate the integrity of communication packets.

Getting the configuration

You can find the starter setup for this lab under the 07_Lab07 directory of the csse341-labs repository. If you have set up your private repository correctly, you can fetch the latest version of the labs using the following sequence of commands:

  1. Synchronize with the labs remote using git fetch upstream.

  2. Pull the latest changes from the main branch of the class repository:

    $ git pull upstream main
    
  3. Push the starter setup to your repository so you can start modifying it:

    $ git add 07_Lab07
    $ git push origin main
    

Generating your .env file

Before we spin up our containers, there are some configuration variables that we must generate. To do so, please run the gen_env_file.sh script from the lab repository directory as follows:

$ ./gen_env_file.sh

If run correctly, you will find the following new files:

  1. .env (hidden file - use ls -al to see it) contains your UID and GID variables.

  2. connect_*.sh: utility scripts to connect to each container in this lab.

  3. run_*.sh: utility scripts that allow you to run commands on a container without logging into it.

Network topology

In this lab, we will be working with three machines connected to the same local network. They will live on the same subnet and all can access each other directly. The machines are:

  1. hostA with IP address 10.10.0.4
  2. hostB with IP address 10.10.0.5
  3. attacker with IP address 10.10.0.10

Code introduction

We will continue with a structure similar to the previous lab, but we’ll extend it with a few more utilities. Since we will be observing and modifying TCP packets, we will add some utility functions to parse TCP packets.

First, copy over your solution to the previous lab so we can reuse the implementation from there. To do so, from the top level directory of the current lab (i.e., 07_Lab07), do the following:

$ cp ../06_Lab06/volumes/code/src/arp_util.c volumes/code/src/arp_util.c

If you modified any other files, please be careful when copying them. Make sure not to interfere with any functionality that I have already implemented for you. It is better to ask for clarification before copying more code into this lab.

As in the previous lab, we structure the code as follows:

  • The include directory contains definitions for the functions and utility headers.

  • The src directory contains the implementation files. Of particular interest to us are:

    1. arp_util.c: This is the file you should copy over from the previous lab.
    2. send_arp.c: This is the same send_arp.c from the previous lab. It is here for your convenience; you don’t necessarily need it.
    3. poison.c: This file contains code that runs a dual ARP cache poisoning to make the attacker act as the MITM between the two hosts. It relies on the send_arp_packets function we implemented in the previous lab.
    4. sniff.c: This file contains code to sniff TCP packets coming from either hostA or hostB.

Compilation

As in the previous lab, we will use cmake as our build system to facilitate the resolution of dependencies. To build your code, you should first generate the appropriate makefiles as follows.

On your server (not a container), navigate to the volumes/code directory and then do the following:

$ mkdir build
$ cd build
$ cmake ..

Once we have generated the build files (you only need to do that once), you can compile your code after any change using make in the build directory.

Step 1: Disconnecting the two hosts

We would first like to be able to completely disconnect hostA from hostB. We will be doing that by acting as the MITM between the two, then dropping all packets received from either. In other words, if the attack is successful, when hostA wants to talk to hostB, it will send its packet to us (the attacker container): we will just drop that packet. Similarly for hostB, it will send packets destined for hostA’s IP address to us: we will also drop those.

This way, even though hostA and hostB seemingly get valid ARP replies when they send out ARP requests, they cannot communicate with each other.

For this to work, we would need to poison the ARP cache on two fronts. The first poisons hostA’s cache while the second poisons hostB’s cache. You can use whichever technique from Lab 06 that you’d like and is effective.

Here is the gist of this attack:

  • Every time, we’d like to generate two packets: one to poison hostA’s cache, and another to poison hostB’s cache.
  • Once you implement this, monitor the ARP caches on hostA and hostB to make sure your attack is successful (recall that you can use arp -an to check the content of the ARP cache).

I have provided you with a file under src/poison.c that allows you to send ARP packets to two destinations using multi-processing. The main process that you launch when executing this program will fork (i.e., replicate) itself and then both processes start sending ARP packets. What you have to do in the file is to call the send_arp_packets function with the right arguments from two different spots in the code.

  1. The first is at line 155 (this will run in one process).
  2. The second is at line 163 (this will run in another, completely different, process).

You will have to select the arguments for send_arp_packets in each case so that we can poison both hostA’s cache and hostB’s cache. The outcome of this attack should be the following:

  • In hostA, the ARP cache should map hostB’s IPv4 address to the attacker’s MAC address.

  • In hostB, the ARP cache should map hostA’s IPv4 address to the attacker’s MAC address.

Please note that this C code does not work with ARP replies since those require us to supply two destination MAC addresses, one for each process. I was too lazy to implement that and will leave it to you to complete the exploit if you’d like (assuming you found that replies work from Lab 06).
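The fork pattern described above can be sketched in isolation as follows. This is only an illustration of how one process becomes two, each handling one victim; poison_one is a hypothetical stand-in for the call to send_arp_packets (whose parameters are not shown here), and poison_both is likewise a made-up name rather than anything in src/poison.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the lab's send_arp_packets call: it only
 * reports which process would be poisoning which victim. */
static int poison_one(const char *victim_name) {
  printf("[pid %d] poisoning the cache of %s\n", (int)getpid(), victim_name);
  return 0;
}

/* Fork once: the child handles the first victim, the parent the second.
 * Returns 0 if both sides succeeded, -1 otherwise. */
int poison_both(const char *first, const char *second) {
  fflush(stdout); /* avoid duplicating buffered output in the child */
  pid_t pid = fork();
  if (pid < 0) {
    perror("fork");
    return -1;
  }
  if (pid == 0) {
    /* Child process: a completely separate copy of the program. */
    int rc = poison_one(first);
    fflush(stdout);
    _exit(rc == 0 ? 0 : 1);
  }
  /* Parent process: continues here while the child runs on its own. */
  int parent_rc = poison_one(second);
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    perror("waitpid");
    return -1;
  }
  if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
    return -1;
  }
  return parent_rc;
}
```

Calling poison_both("hostA", "hostB") prints one line per process, each with a different pid, which is a quick way to convince yourself that two instances really are running.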

Running the executable

To build this file, move into the build directory and call make. The executable poison should show up under the bin/ directory in build/. After that connect to your attacker container (or use the ./run_attacker.sh script) and run this executable:

(attacker) $ sudo /volumes/code/build/bin/poison -h
Usage: ./bin/poison [OPTIONS]

Options:
  -s, --source <mac>         Source MAC address
  -d, --destination <mac>    Destination MAC address (default: 0)
  -v, --victim <ip>          Victim's IP address
  -t, --target <ip>          Target's IP address
  -n, --num-packets <count>  Number of packets to send (default: -1)
  -a, --arp <type>           Type of packets to send (request, reply, gratuitous)
  -h, --help                 Display this help message

Then you can invoke this executable by passing the right arguments that correspond to your chosen method of ARP cache poisoning.

Recall that you can get your own MAC address using cat /sys/class/net/eth0/address.

Note that when you run this code, you will notice no difference in the output on the screen, but rest assured, there are two instances of it running.

Success criteria

If your attack is successful, the hosts will no longer be able to talk to each other. Make sure to test the following cases:

  1. ping from hostA to hostB, no packets should arrive, but the packets should show up at the attacker. Use tcpdump to make sure they show up.
  2. ping from hostB to hostA, no packets should arrive, but the packets should show up at the attacker. Use tcpdump to make sure they show up.

Please demo this step to me before you move on to the following one.

Step 2: Exploring netcat

In this lab, we will intercept traffic between two netcat applications running on hostA and hostB, and play a little prank. We will do a more interesting (and more nefarious) exploit in the labs to come.

Let’s first understand how netcat works so we can plan our attack accordingly. Grab three terminal windows, two on hostA and one on hostB (you can also use the ./run_*.sh scripts, but you’d still need three terminal windows).

On hostA start a packet capture for all IP traffic.

(hostA) $ sudo tcpdump -i eth0 ip -w /volumes/netcat.pcap

Then start the server on either host, I will go with hostA:

(hostA) $ nc -l 1234

On the other machine (hostB in my case), connect to the server:

(hostB) $ nc hostA 1234

Now type a few words on hostB and press <Enter>. Those same words will show up on hostA, where the netcat server is running. It is a simple way of testing whether two hosts can connect and exchange packets.

Stop the packet capture, download the .pcap file, and open it in Wireshark. You will notice that a new protocol shows up, namely TCP, which stands for Transmission Control Protocol. We will explore TCP in depth later on; all we care about now is finding where the words we typed are.

Take notes

By observing the pcap captures, locate the words you typed during the experiment in the captured packets. You will need to expand the TCP protocol header to be able to see those and answer the following questions. There is no worksheet for this lab, but please take note of your answers to these questions, as you’ll need them in the next step.

  1. Grab a TCP packet, and open its corresponding IPv4 header. What is the value of the protocol number in the IPv4 header? Record this value in your notes.
  2. Locate the packets that contain the data that you type into your terminal.
  3. For those packets containing the data, open their TCP header, what is the value of the flags field? Which flags are actually set? Take note of those flags.

Step 3: The full exploit

Our main goal here is to keep hostA and hostB communicating, but to observe their packets and modify their content. To do so, we must sniff all packets that are destined for either host, modify them, and then send them back out. This is a data integrity attack since we are modifying the contents of the communication between hostA and hostB.

For our specific purposes, here’s what we want to do:

  1. Listen for TCP packets coming from either hostA or hostB.

  2. If the packet does not contain netcat data (Hint: use the flags value you recorded in the netcat experiment), skip to step 4.

  3. If the packet contains netcat data (i.e., it contains messages), modify the content of those messages to your liking.

  4. Send the packet back on the wire (use pcap_inject).

The C file under src/sniff.c sets you up to sniff on the network only for TCP packets that are coming either from hostA or hostB. Therefore, we can skip the steps we did in the early labs by peeling out the layers. The filter we pass to libpcap guarantees that we only see TCP packets.

Implementation

Your task in this lab is to implement the function parse_tcp. We define this function in include/tcp_util.h and implement it in src/tcp_util.c. You will not need to change src/sniff.c at all, it already calls the parse_tcp function. Your job is to then parse the TCP packets and modify those that need modification, and then forward them to their destination.

Before moving on, this step requires us to store the MAC addresses of hostA and hostB so we can access them in the code. It is ok to hard-code those values in src/tcp_util.c. I have already done that for you in the constants HOST_A_MAC and HOST_B_MAC that you can find at lines 16 and 17.

Please double check the values there against the actual MAC addresses of hostA and hostB. Docker might assign different addresses on your machine than on mine, in which case you will need to update them accordingly.
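To see why these addresses matter: a frame captured by the attacker was addressed to the attacker’s MAC (that is the whole point of the poisoning), so re-injecting it unchanged would not deliver it to the real recipient. The sketch below shows one plausible way to re-address a frame before forwarding it. The helper name retarget_frame and its byte-array parameters are hypothetical; they are not part of the starter code, and the way HOST_A_MAC and HOST_B_MAC are actually stored may differ.

```c
#include <net/ethernet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the starter code): rewrite the Ethernet
 * addresses of a captured frame so that it leaves the attacker and reaches
 * the real recipient. */
void retarget_frame(uint8_t *frame,
                    const uint8_t attacker_mac[ETHER_ADDR_LEN],
                    const uint8_t real_dst_mac[ETHER_ADDR_LEN]) {
  struct ether_header *eth = (struct ether_header *)frame;

  /* The frame now originates from us... */
  memcpy(eth->ether_shost, attacker_mac, ETHER_ADDR_LEN);
  /* ...and must be delivered to the host that owns the destination IP. */
  memcpy(eth->ether_dhost, real_dst_mac, ETHER_ADDR_LEN);
}
```

Which MAC to pass as real_dst_mac depends on the destination IPv4 address of the packet: hostB’s MAC for packets destined for 10.10.0.5, and hostA’s MAC for packets destined for 10.10.0.4.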

Parsing TCP headers

You will need to parse the TCP header for this task. You can use the struct tcphdr provided by the Linux kernel headers to do so. To use it, add this line to the top of your file:

#include <linux/tcp.h>

Then, you can use it in the same way we did for all previous packet headers. For example, given a packet pointer pkt, we could do:

struct tcphdr *tcp = (struct tcphdr*)(pkt + sizeof(struct ether_header) + sizeof(struct iphdr));

Here are the contents of that structure (vscode should help here as well):

struct tcphdr {
  __be16  source;
  __be16  dest;
  __be32  seq;
  __be32  ack_seq;
#if defined(__LITTLE_ENDIAN_BITFIELD)
  __u16  res1:4,
    doff:4,
    fin:1,
    syn:1,
    rst:1,
    psh:1,
    ack:1,
    urg:1,
    ece:1,
    cwr:1;
#elif defined(__BIG_ENDIAN_BITFIELD)
  __u16  doff:4,
    res1:4,
    cwr:1,
    ece:1,
    urg:1,
    ack:1,
    psh:1,
    rst:1,
    syn:1,
    fin:1;
#else
#error  "Adjust your <asm/byteorder.h> defines"
#endif
  __be16  window;
  __sum16  check;
  __be16  urg_ptr;
};

The fact that the flags are split into individual bits makes life a lot easier. For example, if I want to check whether the header has both the PUSH and SYN flags set, I could simply do:

if(tcp->syn && tcp->psh) {
  // found it.
}

Of course, you’d need to check for the flags you care about.
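When comparing what your code sees against what Wireshark shows, it can help to print the flags of each header. The following is a minimal debugging sketch built on the bit fields shown above; describe_tcp_flags is a hypothetical helper, not something the starter code provides.

```c
#include <linux/tcp.h>
#include <stdio.h>

/* Hypothetical debugging helper (not part of the starter code): writes the
 * names of the flags that are set in a TCP header into out, separated by
 * spaces, for example "SYN ACK". */
void describe_tcp_flags(const struct tcphdr *tcp, char *out, size_t out_len) {
  int n = snprintf(out, out_len, "%s%s%s%s%s%s",
                   tcp->syn ? "SYN " : "", tcp->ack ? "ACK " : "",
                   tcp->psh ? "PSH " : "", tcp->fin ? "FIN " : "",
                   tcp->rst ? "RST " : "", tcp->urg ? "URG " : "");

  /* Drop the trailing space if something was written and it fit. */
  if (n > 0 && (size_t)n < out_len) {
    out[n - 1] = '\0';
  }
}
```

Calling it on every sniffed packet and printing the result gives you a running log that you can line up with the flags column of your packet capture.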

Reaching the data

If the packet contains data, then we need a way to access that data, and also to know how large it is. This will require us to peek a bit into the IPv4 header and the TCP header. As we will see later in class, TCP headers can have variable-length options fields. This makes accessing the data a bit annoying.

Luckily for us, the TCP header contains a field called the “data offset,” which tells us where the data starts, as an offset from the start of the TCP header. By design, the length of the TCP header is always a multiple of 32 bits (i.e., 4 bytes), so the field only needs 4 bits: it stores the number of 4-byte words in the header.

For example, if the data offset is 5 (the minimum, for a header with no options), then the header is actually \(5 \times 4\) bytes long, which is 20 bytes.

Therefore, to calculate the start of our data segment, we’d do something like:

// assume we created a struct tcphdr *tcp earlier...
char *data = (char*)tcp;
uint16_t tcp_hdr_len = tcp->doff * 4;
data = data + tcp_hdr_len;

Now, you can access the data carried by the TCP segment. That is great, but how do I know when to stop reading data? For that, we need the help of the IPv4 header.

The IP header contains a 16-bit field called tot_len that represents the total length of the packet. This includes the IP header, the TCP header, and the data (excluding the Ethernet header).

Therefore, we can calculate the data length as follows:

uint16_t tot_len = ntohs(ip->tot_len);

// we are making an assumption here, but we'll let it go for now.
// talk to me if you'd like to really know what's going on.
uint16_t iphdr_len = sizeof(struct iphdr);
uint16_t tcp_hdr_len = tcp->doff * 4;

uint16_t data_len = tot_len - iphdr_len - tcp_hdr_len;

// the following loop iterates over the data and replaces all a's with b's
char *data = (char*)tcp + tcp_hdr_len;
int i = 0;
for(; i < data_len; i++, data++) {
  if(*data == 'a') {
    *data = 'b';
  }
}

You might want to do something a bit better than just replacing a’s with b’s, but you get the gist.
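If you want to be more defensive, the sketch below locates the payload without relying on fixed header sizes. It assumes that the shortcut taken above is that the IPv4 header carries no options; instead it reads the IPv4 header length from the ihl field (also counted in 4-byte words) and rejects packets whose lengths are inconsistent. The name find_tcp_payload and its parameters are hypothetical and not part of the starter code.

```c
#include <arpa/inet.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the starter code).
 * ip_pkt points at the start of the IPv4 header and avail is the number of
 * captured bytes from that point on. Returns a pointer to the TCP payload
 * and stores its length in *len, or returns NULL if the lengths do not add
 * up. */
uint8_t *find_tcp_payload(uint8_t *ip_pkt, size_t avail, size_t *len) {
  if (avail < sizeof(struct iphdr)) {
    return NULL;
  }
  struct iphdr *ip = (struct iphdr *)ip_pkt;
  size_t ip_hdr_len = (size_t)ip->ihl * 4; /* ihl counts 4-byte words */
  size_t tot_len = ntohs(ip->tot_len);     /* IP header + TCP header + data */

  if (ip_hdr_len < sizeof(struct iphdr) || tot_len > avail ||
      ip_hdr_len + sizeof(struct tcphdr) > tot_len) {
    return NULL;
  }

  struct tcphdr *tcp = (struct tcphdr *)(ip_pkt + ip_hdr_len);
  size_t tcp_hdr_len = (size_t)tcp->doff * 4; /* doff counts 4-byte words */

  if (tcp_hdr_len < sizeof(struct tcphdr) ||
      ip_hdr_len + tcp_hdr_len > tot_len) {
    return NULL;
  }

  *len = tot_len - ip_hdr_len - tcp_hdr_len;
  return ip_pkt + ip_hdr_len + tcp_hdr_len;
}
```

With a packet pointer pkt that starts at the Ethernet header, you would pass pkt + sizeof(struct ether_header) as ip_pkt and the captured length minus the Ethernet header size as avail.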

The TCP checksum

The last step we need to worry about is the checksum again (recall the ICMP checksum from the previous lab). The TCP header also contains a checksum field, but computing it is a bit of a pain. It requires us to peek back into the IP header and build a pseudo header from there.

Here’s the description from RFC 793:

The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header and text.  If a
segment contains an odd number of header and text octets to be
checksummed, the last octet is padded on the right with zeros to
form a 16 bit word for checksum purposes.  The pad is not
transmitted as part of the segment.  While computing the checksum,
the checksum field itself is replaced with zeros.

The checksum also covers a 96 bit pseudo header conceptually
prefixed to the TCP header.  This pseudo header contains the Source
Address, the Destination Address, the Protocol, and TCP length.
This gives the TCP protection against misrouted segments.  This
information is carried in the Internet Protocol and is transferred
across the TCP/Network interface in the arguments or results of
calls by the TCP on the IP.

                 +--------+--------+--------+--------+
                 |           Source Address          |
                 +--------+--------+--------+--------+
                 |         Destination Address       |
                 +--------+--------+--------+--------+
                 |  zero  |  PTCL  |    TCP Length   |
                 +--------+--------+--------+--------+

The TCP Length is the TCP header length plus the data length in
octets (this is not an explicitly transmitted quantity, but is
computed), and it does not count the 12 octets of the pseudo
header.

To avoid dealing with this fugliness, I have provided you with a TCP header checksum calculation routine. You can find its definition in include/tcp_util.h and its implementation in src/tcp_util.c.

struct pseudo_tcp_hdr {
  uint32_t saddr;
  uint32_t daddr;
  uint8_t zero;
  uint8_t ptcl;
  uint16_t tcp_len;
};

/**
* Compute the 16-bit checksum for a TCP packet.
*
* @param tcp   The TCP header.
* @param ip    The IP header.
*
* @return the 16 bits checksum computed according to the TCP specs.
*/
uint16_t compute_tcp_checksum(struct tcphdr *tcp, struct iphdr *ip);

You can use this function by first setting the TCP header’s checksum field to 0, then calling it and passing the TCP header and the IP header as arguments.

Note that if your checksum computation is incorrect, packets will not reach user space; the kernel will drop them earlier. However, they will still show up in a tcpdump capture, which will tell you that the checksum is incorrect. So make sure to have a tcpdump session running when debugging. Welcome to the world of network debugging.
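If you want to see the RFC 793 description in action, here is an independent sketch of the computation that works directly on bytes in network order. It is not the provided compute_tcp_checksum routine, whose internals are not shown here; tcp_checksum_sketch and sum_be16 are hypothetical names, and the protocol parameter is the IPv4 protocol number you recorded in Step 2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add up a buffer as big-endian 16-bit words. An odd trailing byte is padded
 * on the right with a zero byte, as RFC 793 requires. */
static uint32_t sum_be16(const uint8_t *buf, size_t len, uint32_t acc) {
  size_t i = 0;
  for (; i + 1 < len; i += 2) {
    acc += (uint32_t)((buf[i] << 8) | buf[i + 1]);
  }
  if (i < len) {
    acc += (uint32_t)(buf[i] << 8);
  }
  return acc;
}

/* Hypothetical sketch (not the lab's routine): checksum over the 12-byte
 * pseudo header followed by the TCP segment (header plus data). The result
 * is a host-order number whose high byte goes first on the wire. */
uint16_t tcp_checksum_sketch(const uint8_t src[4], const uint8_t dst[4],
                             uint8_t protocol, const uint8_t *segment,
                             uint16_t seg_len) {
  uint8_t pseudo[12];
  memcpy(pseudo, src, 4);               /* Source Address */
  memcpy(pseudo + 4, dst, 4);           /* Destination Address */
  pseudo[8] = 0;                        /* zero */
  pseudo[9] = protocol;                 /* PTCL */
  pseudo[10] = (uint8_t)(seg_len >> 8); /* TCP Length, high byte */
  pseudo[11] = (uint8_t)(seg_len & 0xff);

  uint32_t acc = sum_be16(pseudo, sizeof pseudo, 0);
  acc = sum_be16(segment, seg_len, acc);

  /* Fold the carries back in (one's complement addition). */
  while (acc >> 16) {
    acc = (acc & 0xffff) + (acc >> 16);
  }
  return (uint16_t)~acc;
}
```

A handy property for debugging: if the checksum field of the segment already holds the correct value, running this over the segment returns 0. A nonzero result on a packet you are about to inject suggests that something went wrong when you modified it.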

Compilation

To compile your work, follow the same steps as usual:

  • Run make (or make all) in the build directory.

  • Check that build/bin contains the sniff executable.

Testing

Testing this step is a bit involved since we need to run things across several machines.

To test your implementation, your poisoning program must be running in the background. Therefore, you will need to launch the poison binary in one attacker terminal, and then launch the sniff binary in another. As in previous labs, the sniff program requires the attacker’s MAC address as a command line argument. You can always pass it using cat /sys/class/net/eth0/address.

Once the attack is running, start a netcat server on hostA using nc -l 1234. Then connect to it from hostB using nc hostA 1234.

Success criteria

Your attack is successful if you can observe the following behavior (assuming you decided to change all characters to a).

  1. Starting a netcat on hostA, hostB can successfully connect to hostA.

  2. All packets between hostA and hostB go through the attacker machine.

  3. Packets not containing data pass through the attacker without modification.

  4. Packets containing netcat data are all modified according to your own design (in my case, I replaced all characters with a).

  5. If you type words on hostB and send them, they will show up as all a’s on hostA.

Demo

Once you are confident in your implementation, please demo it to your instructor for grading.

Submission

Please submit your modified .c and .h files to the appropriate Gradescope dropbox.