Uuencoding for You

by Charles A. Gimon

for INFO NATION

Newcomers to Usenet can be puzzled by those big blocks of jumbled letters and symbols in some newsgroups. Those bricks of text are actually computer files--programs, pictures, sounds--that have been converted into text.

The programs and protocols that send Usenet news were designed to work with text: information that is in letters, numerals, punctuation, and so on. E-mail is also a text-based medium. More specifically, Usenet and E-mail use a subset of ASCII--the American Standard Code for Information Interchange. Every character in ASCII equals a number: for example, a capital Q is equal to 81. In fact, if you write that number in binary using ones and zeroes, you can see exactly how your computer represents a "Q" in bits.

01010001 = 81 = Q
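If you have Python handy (an assumption; any programming language would do), you can check this yourself. The built-in functions ord and chr convert between a character and its code number:

```python
# Look up the code number for capital Q and show it as eight binary digits.
code = ord("Q")
bits = format(code, "08b")    # "08b" = binary, zero-padded to eight digits
print(code, bits)             # 81 01010001

# Going the other way: binary digits -> number -> character.
letter = chr(int("01010001", 2))
print(letter)                 # Q
```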

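Standard ASCII defines 128 characters, numbered 0 to 127, so seven bits are enough for any of them. Computers normally store each character in an eight-bit byte (eight ones or zeroes in binary), which can hold 256 different values; the extra 128 values are used by various "extended" character sets. Either way, every letter (or numeral, or punctuation mark) in a text message takes up eight bits.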

Text, as many programs work with it, uses only part of the ASCII character set, often only the printable characters from 32 (the space) to 126 (the tilde, ~), plus line breaks. If a combination of bits shows up that makes a character outside of that subset, the program may choke on it. This is why E-mail and Usenet can't carry "binary" files (pictures, sounds, database files, compiled programs in machine language, compressed files, and so on) directly.
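Here is a rough sketch of the kind of check a text-only program might make. The function name is_plain_text is made up for this example, and it assumes "safe" means the printable characters 32 through 126 plus line breaks; real mail and news software varied in exactly what it would accept.

```python
def is_plain_text(data: bytes) -> bool:
    """Return True if every byte is printable ASCII (32-126) or a line break."""
    return all(32 <= b <= 126 or b in (10, 13) for b in data)

print(is_plain_text(b"Hello, Usenet!\r\n"))   # True
# A few bytes such as a binary file might contain: 0 and 200 are out of range.
print(is_plain_text(bytes([0, 200, 65])))     # False
```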

If you want to send someone a picture by E-mail, or post a sound file to Usenet, you have to convert it to a text format. By far the most common way to do this is by a process called uuencoding. "UU" is short for unix-to-unix: the method was developed for unix-based systems. Uuencoding will take any file and convert it to one of those blocks of text you may have seen. The block of text can be converted back to the original file by running the process in reverse, which is called, of course, uudecoding.

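The process is not that complicated: a binary file is a long series of ones and zeroes. Count them off six at a time. Take each six-digit binary number and add thirty-two. Six bits can only hold a number from 0 to 63, so the result always falls between 32 and 95, and each of those numbers can be represented as a text character. (In practice, the program reads three bytes, or 24 bits, at a time, which split evenly into four six-bit groups; every three bytes of the file become four characters of text.) For example, let's say the first six ones and zeroes (the first six bits) of a file are 101100.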

101100 in binary equals 44 in base ten.

44 plus 32 equals 76.

76 equals ASCII character "L".

So the uuencoding program would print an "L" and go on to the next six bits. Uudecoding does exactly the same thing in reverse: the program reads each ASCII character, subtracts 32 from the corresponding number, and prints out the bits to reconstitute the original file.
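The same arithmetic fits in a few lines of Python. The function names here are invented for this illustration, and like the example above they handle just one six-bit group at a time:

```python
def encode_group(bits: str) -> str:
    """Six binary digits -> uuencoded character (value plus 32)."""
    value = int(bits, 2)          # "101100" -> 44
    return chr(value + 32)        # 44 + 32 = 76 -> "L"

def decode_char(ch: str) -> str:
    """Uuencoded character -> six binary digits (code minus 32)."""
    return format(ord(ch) - 32, "06b")

print(encode_group("101100"))   # L
print(decode_char("L"))         # 101100
```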

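Uuencoded files should look very familiar after a while. Every line except the last will be the same length (61 characters in the usual layout), and the characters will appear to be random (but they're really no more random than the original file was). Each full line starts with an "M", which is actually a length code: "M" is character 77, and 77 minus 32 is 45, the number of bytes from the original file packed into that line. The last line of data is usually shorter and starts with a different character. There's a "begin" line at the start and an "end" line at the finish, and the "begin" line carries a file name for the reconstituted file (along with a three-digit number, such as 644, that sets the unix file permissions).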
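To see where the "M" and the line lengths come from, here is a minimal encoder in Python. It is a sketch under some assumptions: 45 bytes per line, a file mode of 644 on the "begin" line, and a backtick (`) written in place of a space for the value zero, a common convention that keeps mail programs from trimming blanks off the ends of lines. The function names are made up; Python's standard binascii.b2a_uu function does the same per-line step, and the last line uses it as a cross-check.

```python
import binascii

def uu_char(value: int) -> str:
    """Turn a number from 0 to 63 into a character by adding 32.
    Zero is written as a backtick instead of a space (a common convention)."""
    return "`" if value == 0 else chr(value + 32)

def uuencode_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as one line: a length character, then the data."""
    if len(chunk) > 45:
        raise ValueError("one line holds at most 45 bytes")
    out = [uu_char(len(chunk))]                   # 45 bytes -> 45 + 32 = 77 -> "M"
    padded = chunk + b"\0" * (-len(chunk) % 3)    # pad to whole 3-byte groups
    for i in range(0, len(padded), 3):
        group = int.from_bytes(padded[i:i + 3], "big")   # 3 bytes = 24 bits
        for shift in (18, 12, 6, 0):                     # = four six-bit pieces
            out.append(uu_char((group >> shift) & 0x3F))
    return "".join(out)

def uuencode(data: bytes, filename: str, mode: str = "644") -> str:
    """Wrap the encoded lines between a 'begin' line and an 'end' line."""
    lines = [f"begin {mode} {filename}"]
    for i in range(0, len(data), 45):
        lines.append(uuencode_line(data[i:i + 45]))
    lines.append(uu_char(0))       # a zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"

print(uuencode(b"Cat", "cat.txt"), end="")
# Cross-check one line against Python's standard library (prints True).
print(uuencode_line(b"Cat") + "\n" == binascii.b2a_uu(b"Cat", backtick=True).decode("ascii"))
```

Run on the three letters "Cat", the sketch prints a "begin 644 cat.txt" line, the data line #0V%T, a line holding only a backtick, and "end". The "#" is the length code: character 35, minus 32, means three bytes.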

So how can you send and receive these files? For many people, here's the best news of all: the programs you already use for E-mail or Usenet news (Eudora, Pegasus, Free Agent) will do the work for you. They may use the term "attachments", but uuencoding is usually what they're really doing behind the scenes. If you get on the Internet from a unix shell account, you can use the programs on your system (such as the tin or trn newsreaders) to decode files, or you can encode and decode files right from the shell prompt. Try typing "uuencode" at the prompt to see if it's available.

Once in a while, you may find yourself with a chunk of uuencoded text that's slipped away from your mail or news program. All is not lost: there are standalone programs available that will uuencode and uudecode. Try ftp to oak.oakland.edu, in the directory /SimTel/msdos/decode; the file uuexe651.zip is an excellent uuencoder/uudecoder for DOS. (An archie search for the same file will turn up many alternate sites, and those sites will have plenty of other uuencoding programs to sample as well.) These programs will both encode and decode--you only have to get one program.

If you end up with a text file on your home computer that has unrelated garbage at the beginning or end of the uuencoded text, don't worry about the extra stuff: unless it's really big, the uudecoding program will ignore it. The "begin" and "end" lines are what the program will pay attention to.
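A decoder that ignores the extra material can be sketched like this. It is a simplified illustration, not a copy of any particular program: it assumes one file in one block, looks for a line of the form "begin (mode) (name)", decodes until it reaches "end", and skips everything before and after. It also puts back any trailing blanks that a mail program may have stripped from the ends of lines. The function name uudecode is just a label for this example.

```python
def uudecode(text: str) -> tuple[str, bytes]:
    """Find the first begin/end block in text and decode it.

    Returns (file name, file contents). Lines before "begin" and after
    "end" are ignored.
    """
    lines = iter(text.splitlines())
    filename = None
    for line in lines:                          # skip junk before "begin"
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "begin" and parts[1].isdigit():
            filename = parts[2]
            break
    if filename is None:
        raise ValueError("no 'begin' line found")

    data = bytearray()
    for line in lines:
        if line.rstrip() == "end":              # anything after this is never read
            return filename, bytes(data)
        if not line:
            continue
        count = (ord(line[0]) - 32) & 0x3F      # length character: bytes on this line
        values = [(ord(ch) - 32) & 0x3F for ch in line[1:]]
        needed = (count + 2) // 3 * 4           # characters that should follow
        values += [0] * (needed - len(values))  # restore stripped trailing blanks
        line_bytes = bytearray()
        for i in range(0, needed, 4):
            v0, v1, v2, v3 = values[i:i + 4]
            group = (v0 << 18) | (v1 << 12) | (v2 << 6) | v3   # back to 24 bits
            line_bytes += group.to_bytes(3, "big")
        data += line_bytes[:count]              # drop the padding bytes
    raise ValueError("no 'end' line found")
```

Given a saved news article with headers above the "begin" line and a signature below the "end" line, the sketch returns just the file name and the decoded bytes.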

Something that you will notice, though, is that a uuencoded block will generally be no more than 50 kilobytes or so. Some mail programs (and America Online's software) can't handle files over a certain size, so uuencoding programs will slice up the file into several sections. You may have seen this before in Usenet: part 1/3, part 2/3, part 3/3. The uudecoding program will reassemble the parts for you. If you end up with uuencoded text by itself, you'll have to check and see if there are "begin" and "end" lines on it. If not, check and see if there is some kind of header that says this chunk is part 2 of 3, for example. If you have three files that you know are parts one, two and three, you can give them names like a1.uue, a2.uue, a3.uue, and the uudecoding program you are using will put them back together and decode them for you. If you don't have "begin" and "end" lines, or you don't have all the parts of the file, you're pretty much out of luck.
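Splitting is simple because uuencoded text is made of independent lines. Here is a sketch of cutting an encoded file into parts of at most about 50,000 characters, breaking only between lines, and joining them back in order. The function name, the size limit, and the sample file name are assumptions for illustration; real posting programs also add a "part 1/3"-style header to each piece, which the decoder then has to skip.

```python
def split_into_parts(encoded: str, max_chars: int = 50_000) -> list[str]:
    """Cut uuencoded text into pieces of at most max_chars, breaking only between lines."""
    parts, current, size = [], [], 0
    for line in encoded.splitlines(keepends=True):
        if current and size + len(line) > max_chars:   # line won't fit: start a new part
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        parts.append("".join(current))
    return parts

# A made-up encoded file: 2,000 full 61-character lines, each plus a newline.
encoded = "begin 644 picture.gif\n" + ("M" + "A" * 60 + "\n") * 2000 + "`\nend\n"
parts = split_into_parts(encoded)
for number, part in enumerate(parts, start=1):
    print(f"part {number}/{len(parts)}: {len(part)} characters")

# Reassembly is just putting the parts back in order before decoding.
rejoined = "".join(parts)
print(rejoined == encoded)
```

Only the first part holds the "begin" line and only the last holds "end", which is why a missing piece leaves you stuck.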

You can uuencode text, too, to thwart simple text snooping. It's almost too easy to sniff out a certain word in Usenet, and search engines like AltaVista or DejaNews can archive your comments practically forever. If you want to send mail to a mailing list, or post a message to Usenet, but you don't want it to be easily searchable by a certain word, you can uuencode the text with the same programs you use to uuencode binary files. This does not mean you've made it secret or secure--uuencoding, despite the name, is not the same thing as encryption--but it will defeat persons or computers who are doing simple searches for messages that contain a certain word.
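Python's standard binascii module can show the effect on a one-line message. Because uuencoded output uses only the characters from 32 to 95 (or 96, if a program writes backticks), it contains no lowercase letters at all, so a search for an ordinary lowercase word will never match (an uppercase word could still turn up by chance). Decoding it again takes one call, which is why this is no protection against anyone who actually wants to read it. The message text is made up for the example.

```python
import binascii

message = b"meet me at the usual place"
encoded = binascii.b2a_uu(message).decode("ascii")
print(encoded, end="")                      # one line of uuencoded text
print("usual" in encoded)                   # False: no lowercase letters survive
print(binascii.a2b_uu(encoded) == message)  # True: anyone can decode it
```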

The only fly in the ointment is that there's another way to encode binary files floating around. There is a standard for attaching files to Internet mail messages called MIME (for Multipurpose Internet Mail Extensions). When the MIME standard was voted on several years ago, uuencoding was passed over in favor of an alternative system called base64. The two methods are very similar: both cut the file into six-bit groups, but they turn those groups into different characters. Uuencoding adds 32 to each group, while base64 looks each group up in a table of 64 characters: the capital letters, the lowercase letters, the numerals, "+", and "/". What was so special about base64 was that choice of characters. Some of the punctuation characters that uuencoding uses are not reliably translated into EBCDIC (Extended Binary Coded Decimal Interchange Code), a character set used mostly on IBM mainframe computers, and some are replaced by accented letters in the national versions of ASCII used in several European countries. The MIME standards committee decided on base64 because its characters come through the same on all of those systems.
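The difference is easy to see side by side. This sketch cuts the bytes into six-bit groups once, then maps the same groups both ways. The helper names are made up, and it skips the "=" padding that base64 adds when the input isn't a multiple of three bytes, so it only matches Python's standard base64.b64encode for inputs of that size.

```python
import base64
import string

# Base64's table of 64 characters, in order: A-Z, a-z, 0-9, "+", "/".
BASE64_TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def six_bit_groups(data: bytes) -> list[int]:
    """Cut bytes into six-bit numbers, padding the last group with zero bits."""
    bits = "".join(format(b, "08b") for b in data)
    bits += "0" * (-len(bits) % 6)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

groups = six_bit_groups(b"Cat")
uu_style = "".join(chr(g + 32) for g in groups)        # uuencoding: add 32
b64_style = "".join(BASE64_TABLE[g] for g in groups)   # base64: look it up in the table
print(groups)      # [16, 54, 5, 52]
print(uu_style)    # 0V%T
print(b64_style)   # Q2F0
print(b64_style == base64.b64encode(b"Cat").decode("ascii"))  # True
```

Notice that the uuencoded version uses "%", a punctuation mark, while the base64 version sticks to letters and numerals.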

Now, in the real world, almost everyone uses uuencoding--but once in a while, you will run into a file in base64 encoding. If you do, there are many standalone programs available to decode those blocks of text, too. A good one for the DOS/Windows world is mpack15d.zip, available at oak.oakland.edu in the same directory /SimTel/msdos/decode.

It's an old stereotype of the new user: someone posts a message to an alt.binaries.* newsgroup on Usenet asking, "How can I see the pictures??" Uuencoding and uudecoding are how it happens. Your client programs like Free Agent or Eudora may do the work for you, but knowing what's happening behind the window on your screen will make you a more informed Internet user, and better able to handle those situations when an "attachment" gets detached.

 

Note (September 20, 2003): Today mail attachments are typically in base64 encoding. The principles are still the same.

 

Charles Gimon teaches an Intro to the PC class at the English Learning Center in South Minneapolis. He can be reached at gimonca@skypoint.com.