dtpx: borrowing bandwidth across machines
TL;DR
- In college we had a 1 GB per day data cap, and a movie was about a gigabyte. I wanted to beat per-device caps like that: each device spends its own quota on a slice, then relays it to you over the unmetered local network, so you get the whole file for a fraction of the data. (It also helps when a source throttles speed per IP.)
- So I built a distributed file downloader in 2021, in Python over raw TCP sockets.
- A tracker keeps a list of peer machines. Each peer downloads one byte-range in parallel and sends it back to the client, which joins the parts into the final file.
- There is also a 2021 paper on an enhanced version that streams and merges chunks as they download, so there is no wait at the end. I had forgotten it until writing this.
- The real value was learning TCP, sockets, Python concurrency (the GIL), and TLS the hard way.
dtpx was the first thing I built that felt like a system, not just a program. It mostly worked. Coming back to it five years later, the code still shows where I got stuck.
The idea
This started in college. We all had a 1 GB per day data cap, and a movie was about a gigabyte. One movie used your whole day.
So I took it as a problem to solve. The limit was not speed, it was the per-device cap. But a hostel or a college network has many devices, each with its own cap, all sitting on the same local network.
The idea: split one file into slices and give each slice to a different device. Each device downloads its slice over its own internet connection, spending a bit of its own daily cap. Then it sends that slice to me over the local network, which is not metered. I end up with the whole movie while spending only my share of the data.
The same trick helps anywhere there are per-device caps, like hotel or hostel WiFi. It is the BitTorrent tracker model, but for any plain HTTP URL instead of a swarm of seeders. I called it dtpx. (I first named it DDM, for Distributed Download Manager. The old name is still in the temp directory paths.)
How it worked
Three roles, three processes, talking over plain TCP sockets:
- The tracker is a registry. Peer servers say addme and it records them. The client says sendpeerslist and it returns the list.
- Each peer server waits for a job, downloads its assigned byte-range using an HTTP Range request, and sends those bytes back to the client.
- The client asks the tracker who is available, splits the file into one slice per peer, sends the slices out, and joins the parts back into the final file.
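The slicing step can be sketched roughly like this. It is a minimal illustration, not the code from the repo: split_ranges and range_header are hypothetical names, and the even split with the remainder spread over the first slices is an assumption about how one might do it.

```python
def split_ranges(total_size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into contiguous, inclusive byte ranges, one per peer."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never hand out an empty slice
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder
        end = start + length - 1                 # HTTP byte ranges are inclusive
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the HTTP header a peer would send for its slice."""
    return {"Range": f"bytes={start}-{end}"}
```

For a 10-byte file and 3 peers, split_ranges(10, 3) gives (0, 3), (4, 6), (7, 9). A server that supports range requests answers each one with 206 Partial Content.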
If there are no peers, or the server does not support range requests, the client downloads the whole file itself with a pool of local threads. So it never fully fails. Worst case, it is a normal download.
The parts I got stuck on
I did not remember any of the hard parts. The code did. Each workaround points at one.
1. TLS won, so I cheated
The most telling line in the repo:
resp = http.request("GET",
                    url.replace("https", "http"),  # ← gave up on TLS
                    retries=retries, timeout=timeout,
                    preload_content=False, headers=headers)
It strips the s off every HTTPS URL before downloading. Right below it sits a dead except SSLError handler. Certificate verification through a proxy, in urllib3, in that setup, beat me, so I skipped TLS entirely. It worked, but only on servers that still answered over plain HTTP, and it quietly turned off security on every download.
Lesson: "make it work" and "make it correct" are different goals. Be honest about which one you are at.
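For contrast, here is a small sketch of what keeping verification on could look like, using only the standard library rather than urllib3. The make_tls_context helper and its ca_bundle parameter are hypothetical; the extra CA file stands in for whatever certificate a proxy in the path might require, which is an assumption about the original setup.

```python
import ssl


def make_tls_context(ca_bundle=None):
    """Return a TLS context that verifies certificates and hostnames.

    ca_bundle is a hypothetical path to extra CA certificates, for example
    the CA of a proxy that re-signs traffic.
    """
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    context = ssl.create_default_context()
    if ca_bundle is not None:
        # Trust the extra CA in addition to the system defaults.
        context.load_verify_locations(cafile=ca_bundle)
    return context
```

The context can then be passed along, for example urllib.request.urlopen(url, context=make_tls_context()), so the URL keeps its https scheme and a bad certificate fails loudly instead of being skipped.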
2. TCP is a stream, not a mailbox
I treated the socket like it delivered whole messages:
msg = self.client_conn.recv(1024)
msg = json.loads(msg.decode()) # assumes the whole message arrived in one recv()
It worked in testing because my messages were tiny and arrived in one packet. TCP makes no such promise. It is a byte stream, and a message split across packets would have broken json.loads.
The file transfer had the same problem. How does the receiver know the file is done? There is no length sent anywhere. The sender just closes the connection:
# peer server: send the part, then close to signal the end
while chunk:
    self.client_conn.send(chunk)
    chunk = file.read(3000)
self.client_conn.shutdown(socket.SHUT_RDWR)

# client: read until the socket closes
chunk = self.sock.recv(1024)
while chunk:
    file.write(chunk)
    chunk = self.sock.recv(1024)
Closing the connection means end of file. It works, but it cannot tell "done" apart from "connection dropped", and nothing checks the data. A cut-off part is accepted as complete.
Lesson: decide how messages are framed before you decide what goes in them. Put a length in front, or use a delimiter. One recv is not one message.
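A length prefix is the simplest version of that. The sketch below is one way to do it, not what dtpx did: send_msg, recv_exact, and recv_msg are hypothetical helpers, and the 4-byte big-endian header and the JSON message shape are assumptions.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def send_msg(sock, obj):
    """Send one JSON message, preceded by its length in bytes."""
    payload = json.dumps(obj).encode()
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # The peer closed early: this is a drop, not a clean end.
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Read one length-prefixed JSON message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return json.loads(recv_exact(sock, length).decode())
```

Because the receiver knows how many bytes to expect, a connection that closes early raises an error instead of passing as a finished transfer. The same prefix works for file parts: send the part's size first, then the bytes.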
3. "Address already in use"
Every socket binds a port and sets SO_REUSEADDR:
self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self.sock.bind(('', self.client_server_bind_port))
Restart the program, get OSError: Address already in use, wait, try again. That is TIME_WAIT: the just-closed connection still holding the port. SO_REUSEADDR on every socket is what that loop left behind.
Lesson: a socket outlives your process. Closing it is not instant.
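A small sketch of the pattern, assuming a hypothetical open_listener helper: set SO_REUSEADDR before bind (it has no effect afterwards), and close the socket deterministically so a failed setup does not leak it.

```python
import socket


def open_listener(port, host=""):
    """Create a listening TCP socket that can rebind a recently used port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set before bind() to take effect.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()  # do not leak the socket if setup fails
        raise
    return sock
```

Used as `with open_listener(port) as server: ...`, the socket is closed when the block exits, even on an exception. Exact SO_REUSEADDR behaviour differs between operating systems, so treat this as a starting point rather than a guarantee.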
4. The GIL, and a thread inside a process inside a thread
The peer server stacks concurrency in layers:
# a thread per connected client...
class PeerClientThread(threading.Thread):
    def run(self):
        ...
        # ...spawns a process to run the download...
        process = multiprocessing.Process(
            target=MultithreadedDownloader().download, args=(...))
        process.start()
        process.join()
        # ...and that downloader spawns its own threads.
A thread per client, a process to run the download, and threads inside that. That is me running into Python's Global Interpreter Lock before I had the words for it. Threads are fine for the waiting, which is most of a download. The process was to get past the GIL for the CPU-bound parts.
Lesson: concurrency and parallelism are different problems. In Python you pick threads, processes, or async depending on which one you need.
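For the waiting-heavy part, a single thread pool is usually enough. This is a sketch with a stand-in download: fetch_part and download_all are hypothetical names, and time.sleep takes the place of the real HTTP Range request so the example runs without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_part(part):
    """Stand-in for a Range download: sleeps instead of doing network I/O."""
    index, start, end = part
    time.sleep(0.05)  # the GIL is released while a thread waits
    return index, end - start + 1  # (part index, bytes "downloaded")


def download_all(parts, workers=4):
    """Run the parts concurrently on one flat pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fetch_part, parts))
```

Because the threads spend their time blocked on I/O, they overlap even with the GIL in place. A separate process only pays off when there is real CPU-bound work to run in parallel.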
5. Tuning the relay by hand
A commit titled "Tuned Chunk Sizes" was just this:
- chunk = file.read(1024)
+ chunk = file.read(3000)
Small, but the instinct was right: the relay was slow, so I turned a knob. What I missed: the HTTP download read 256 KB at a time while the socket relay read 1 KB, then 3 KB. The relay was the real bottleneck and I only nudged it.
Lesson: measure first, and tune the slowest part, not the easiest one.
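A tiny harness makes the knob measurable. The relay function below is a hypothetical in-memory stand-in for the socket relay: it copies between file-like objects and reports how many reads and how much time a given chunk size costs. Real numbers would have to come from real sockets on the real network.

```python
import io
import time


def relay(src, dst, chunk_size):
    """Copy src to dst in chunk_size reads; return (bytes_copied, reads, seconds)."""
    copied = 0
    reads = 0
    started = time.perf_counter()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        reads += 1
    return copied, reads, time.perf_counter() - started


if __name__ == "__main__":
    data = bytes(100 * 1024 * 1024)  # a 100 MiB part, all zeros
    for size in (1024, 3000, 256 * 1024):
        copied, reads, seconds = relay(io.BytesIO(data), io.BytesIO(), size)
        print(f"chunk={size:>7}  reads={reads:>7}  seconds={seconds:.3f}")
```

For a 100 MiB part, 1 KB chunks need 102,400 reads, 3,000-byte chunks need 34,953, and 256 KB chunks need 400. Read count is only a proxy for cost, which is why the harness also times each run.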
6. WSL2 had its own network
A commit titled "Made some minor changes regarding WSL2" moves every path from /home/... to /mnt/c/Users/.... The small rename hid a real problem. WSL2 runs in its own NAT'd network with its own IP, so "connect to localhost" and "reach the other machine on the LAN" stopped being simple. And reaching other machines is the whole point of a multi-machine app.
Lesson: the environment is part of the system. Networking code is only as portable as the network it runs on.
Was it a good idea?
It depends on what is actually limiting you.
If everyone shares one internet connection, like a single home router, this does nothing. Every slice still comes through that one pipe, so I have just added a hop and copied the data an extra time.
It only helps when each device has its own limit. Two cases: a per-device data cap, where each device spends its own quota and the local relay between devices is free, or a source that throttles speed per IP, where ten peers on ten IPs pull more in total than one connection could. Both are narrow, but both are real, and the data cap one is the problem I started with.
The peers hold none of the file themselves, so this is not really like a torrent. It is a distributed download proxy pool.
The part I forgot
While writing this I went looking and found something I had completely forgotten. In 2021 we published a paper on an enhanced version of this: Enhanced Distributed Downloading Technique using DTPX.
The paper fixes the biggest weakness of the code above. In the code, a peer downloads its whole range, then sends the finished part back, and the client merges only after every part has arrived. The waiting at the end is wasted time.
The paper's version streams as it downloads. For every 5 MB a peer pulls, it packs that into a JSON chunk with a count field for ordering and sends it to the client right away. The client merges these chunks while the rest is still downloading. By the time the last chunk arrives, the file is already assembled, so the extra merge time at the end is zero.
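The merge-as-you-go idea can be sketched as writing each chunk straight to its final offset. This is an illustration, not the paper's code: StreamingMerger is a hypothetical class, and it assumes count is a zero-based index over the whole file and that every chunk except the last has the same size. The paper may number chunks differently.

```python
import io


class StreamingMerger:
    """Write chunks to their final position as they arrive, in any order."""

    def __init__(self, out, total_chunks, chunk_size):
        self.out = out                # a seekable binary file object
        self.chunk_size = chunk_size  # e.g. 5 * 1024 * 1024 for 5 MB chunks
        self.missing = set(range(total_chunks))

    def add(self, count, data):
        """Place chunk number `count`; ignore duplicates and unknown counts."""
        if count not in self.missing:
            return
        self.out.seek(count * self.chunk_size)
        self.out.write(data)
        self.missing.discard(count)

    @property
    def complete(self):
        return not self.missing
```

Because every write lands at its final offset, there is nothing left to join when the last chunk arrives. The missing set also tells the client which chunks to ask for again.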
On a 1 GB file over a 1 MB/s link (about 8 Mbps), the paper measured about 17 minutes for a normal download, 9.4 minutes for plain distributed download, and 8.5 minutes for this. The whole gain over plain distributed download is that overlap.
The paper also covers the things I was about to list as missing: a 30 second timer per chunk that reassigns a slice if a peer goes quiet, and an optional MD5 check. It even names the per-IP case directly, as a way around Fair Usage Policy limits on a single IP.
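Both checks are small. A rough sketch, with hypothetical helper names and a made-up bookkeeping shape (a dict of slice id to the time its last chunk arrived); only the 30 second timeout and the use of MD5 come from the paper.

```python
import hashlib
import io


def md5_of_stream(stream, block_size=1024 * 1024):
    """Hash a binary stream in blocks so a large file never sits in memory."""
    digest = hashlib.md5()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def stale_slices(last_seen, now, timeout=30.0):
    """Return the ids of slices whose peer has been quiet longer than timeout."""
    return [slice_id for slice_id, seen in last_seen.items()
            if now - seen > timeout]
```

The client would compare md5_of_stream on the finished file against a known digest, and hand anything returned by stale_slices to another peer. MD5 catches accidental corruption but is not safe against deliberate tampering.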
So the optimization I "thought of" while reading this back was not new. I had designed it, written it up, and forgotten it. The public code is the baseline. The enhanced version lived in a paper.
What I would still change
- Ship the paper's design. The streaming merge, the per-chunk timeout and reassignment, and the MD5 check were all written up. The public code never got them.
- Keep the TLS. Fix the certificate path instead of deleting the s.
- Use asyncio. The whole thing is I/O-bound fan-out, which is what async is for. The threads inside processes would become much simpler.
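The asyncio shape of that fan-out, as a sketch: fetch_part and download_all are hypothetical names, and asyncio.sleep stands in for an awaitable network call so the example runs without a network or a third-party HTTP client.

```python
import asyncio


async def fetch_part(index, start, end):
    """Stand-in for an async Range download."""
    await asyncio.sleep(0.01)  # yields to the event loop while "waiting"
    return index, end - start + 1  # (part index, bytes "downloaded")


async def download_all(ranges):
    """Start every part at once and wait for all of them."""
    tasks = [fetch_part(i, start, end) for i, (start, end) in enumerate(ranges)]
    # gather() returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(download_all([(0, 99), (100, 149)])))
```

One event loop replaces the thread, process, and inner threads: each part is a coroutine, and the waiting overlaps without any of them.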
I am not going to rebuild it. The point was never the program. It was the first time TCP, sockets, concurrency, and TLS stopped being textbook words and started pushing back.