Have you ever had to copy millions upon millions of little files across your network very, very quickly? Have you exhausted all of your other command line hacks yet? Of course you have, or you wouldn’t be reading this. (Or you’re my mom.)
Ok… I get that the audience for this type of thing is rather limited. But this is one of those posts that will get more hits from me than from anybody else. This is strictly for demonstrating how to send thousands or millions of little files across a network using bash, tar, and netcat. (Mom, you can stop reading now. I don’t make a cameo in the video, so you can skip this one. Thanks for the click though.)
The Code: One-liners are a beautiful thing
tar -czf - [source_dir] | nc [destination_ip] [destination_port]
nc -l -p [local_port] | tar -C [destination_dir] -xzf -
Running cygwin-X from one of my XP boxes, I tunneled into two different Linux boxes (krispc7 == Ubuntu 11.04 && kris@bt == BackTrack 5).
ssh -X kris@krispc7
ssh -X kris@bt
I did this to work entirely within a native Linux environment, mainly because I’ve only ever done this with cygwin in the past. It also lets me demonstrate everything on the same screen using terminator (my favorite GUI shell) without having to run multiple desktop recorders. I don’t actually need the X forwarding, and I’m sure my performance suffered because of it. Additionally, the files being copied were on a separate Windows file server (we’ll call him e5). So that throws the whole speed thing out the window. Combine that with my extremely verbose switches and you could probably print the files out of one machine and physically scan them into the other machine quicker than the actual copy process took.
Like I said… for demonstration purposes only. Running the compression and netcat instance on a third party machine is just plain stupid in this situation if you’re trying to move stuff really fast (not to mention that this particular hack box has no legs at all). The ideal environment would be to run the talkie box command on the actual talking box.
I ssh into bt (I know everyone roots into their bt boxes… but I don’t allow root to ssh anything), go to the network shared directory on e5 that contains the subfolders with the millions of little files, and initiate the talkie side of the command. I then ssh into krispc7 and initiate the listening side of the command.
…Actually it’s the other way around… but you get the idea (“YOU”, is me talking to myself in my own post. Now I’m omnisciently referring to myself in the third person twice removed… and you thought you had problems.)
So listening box is listening, and talking box is waiting for me to hit enter. In the bottom right of the video is a simple while loop I used to count the number of new files in the destination directory.
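The exact loop from the video isn't reproduced in this post, so here is a minimal sketch of what such a counter could look like. The function names count_files and watch_count and the 5-second default interval are my own placeholders, not what appears on screen.

```shell
#!/usr/bin/env bash

# Count regular files under a directory, recursively.
# (Assumes no file names contain newlines, which would inflate the count.)
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print a timestamped running total every few seconds until Ctrl-C.
# Second argument is the interval in seconds (hypothetical default: 5).
watch_count() {
    local dir=$1 interval=${2:-5}
    while true; do
        printf '%s  %s files\n' "$(date +%T)" "$(count_files "$dir")"
        sleep "$interval"
    done
}
```

Run something like `watch_count [destination_dir] 5` in a spare terminal pane on the listening box while the transfer is going.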
I let the copy/nc job run for about ten minutes before I killed the video. But I cut a lot out while editing… so 10 minutes happens in less than two. (Who really wants to watch a video of files being copied?).
What’s happening here you ask?
tar is bundling the files into a single compressed stream on (what is supposed to be) the local machine, and instead of writing that stream out to a zip or tarball file I’m simply piping it into netcat, which sends the data over a TCP connection to a specific port on the destination. The listening box in turn monitors the defined port (9998 in my video) for any incoming data and pipes it into tar to be decompressed and unpacked in the output location of choice.
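If you want to convince yourself that the two tar halves fit together before involving a network, here is a rough sketch that runs both ends on one machine. A plain pipe stands in for the netcat connection, and the temp directories and file names are made up for the demo.

```shell
#!/usr/bin/env bash

src=$(mktemp -d)    # hypothetical stand-in for the source directory
dest=$(mktemp -d)   # hypothetical stand-in for the destination directory

# Create a handful of little files to copy.
for i in 1 2 3 4 5; do
    echo "file $i" > "$src/file_$i.txt"
done

# Talking half | listening half.
# In the real transfer, the two nc processes sit where this pipe is.
tar -C "$src" -czf - . | tar -C "$dest" -xzf -

# Count what arrived on the receiving side.
copied=$(find "$dest" -type f | wc -l | tr -d ' ')
echo "copied $copied files"
```

Nothing ever lands on disk as an intermediate tarball, which is the whole point: the archive only exists as a stream between the two tar processes.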
Maybe tomorrow I’ll run a test that involves copying a bunch of stuff back and forth between two high-end machines (without any man in the middle), and compare the speeds when using different types of compression. Then compare those to a standard scp, windows drag and drop file copy, and my favorite… xxcopy.
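I haven't run that comparison, so there are no numbers here. As a starting point, this is a sketch of how the compression side of it might be measured. It only looks at gzip levels and only reports stream size, and the function names compressed_size and compare_levels are placeholders of mine.

```shell
#!/usr/bin/env bash

# Report how many bytes a directory's tar stream takes up at a given gzip level.
compressed_size() {
    local dir=$1 level=$2
    tar -C "$dir" -cf - . | gzip "-$level" | wc -c | tr -d ' '
}

# Print the compressed size at a fast, a default, and a thorough gzip level.
compare_levels() {
    local dir=$1 level
    for level in 1 6 9; do
        printf 'gzip -%s: %s bytes\n' "$level" "$(compressed_size "$dir" "$level")"
    done
}
```

Wrapping a call in bash's `time` keyword, as in `time compressed_size [source_dir] 9`, would add the speed half of the picture. Whether a higher level is worth it probably depends on whether the CPU or the network is the bottleneck.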
Until then, enjoy the show. (Always launch the videos in full screen to watch in HD).