Read This Before You Write a Newsreader, News Transport System, etc.
--------------------------------------------------------------------
By Tom Limoncelli
Version 1.12. Updated 1995-03-30

This document is not a FAQ. A FAQ implies that someone asked questions and someone else answered said questions. That's not what this document is about. This document is written because of all the people that didn't ask, or didn't know to ask, and got in trouble because of it.

People constantly post to the net that they are writing some kind of software and in the process of asking other questions they reveal that they are doing something else that is, well, stupid. This document attempts to point out common stupid mistakes so that you can avoid them.

Wow! How amazingly useful! Isn't that nice of Tom? Well, actually it's a sad statement on the understanding of netnews technology that makes these stupid mistakes so common. Very common. Common enough to make this document useful.

So, this is not a FAQ, this is a warning.

Point #1: "I think I'll write a newsreader!"

Stop. Stop right there. I have a suggestion that will save you a lot of time: Go to a movie. Rent a video. Volunteer with a local first-aid squad. Feed the homeless. Make a sandwich, walk onto the street and when you see a homeless person say, "HERE!" Just do anything other than write a newsreader. There are enough already. We should have started neutering newsreader authors a long time ago.

"...but I'm going to write one that does something nobody has done before!" Yeah, right. Before you say that, learn all the features of nn, trn, strn, gnus, and tin. Now tell me you've thought of something new. Can you add this new feature to one of the old newsreaders? Yes you can. Plus, you'll save a million sysadmins grief when they have to go and install yet another newsreader and figure out how to build it. It's much easier to install an updated version of an old newsreader.

"...well I could, but I really want to make my system stand-alone". Then go stand alone in a corner until you've changed your mind. If you don't, you'll spend all of your time writing silly parts like user interfaces, figuring out your command structure, getting it to compile on every kind of Unix in the world, etc. Heck, you'll waste most of your time just writing the install script.

Really! Just add your feature to some other newsreader. All the boring parts are written for you. For example, a scoring system for articles could be the basis for an entire newsreader. However, by adding this feature to trn (creating strn), the author was able to build an entire newsreader with all those great trn features, but concentrate on what he considered "fun" (i.e. the scoring system).

Here's another example: TMNN was an attempt to make an entirely new netnews system where things would be a lot more hypertext'ish. Rather than just add these brilliant ideas to a newsreader (or to C News and modify a newsreader to take advantage of the new data), the author tried to re-invent the entire news transport. The project was never completed. I'm sure the author didn't get to spend much time on the part that really interested him either.

Point #2: If you are writing a newsreader or transport from scratch, here's what I think the areas of interesting research would be:

Wait a second, you aren't sure what's a transport and what's a newsreader? Well, that's a sure sign that you shouldn't be writing this software just yet. Much of the work of writing any software that interacts with netnews can be avoided if you KNOW THE CURRENT TECHNOLOGY FIRST. Read all the RFCs (822, 977, 1036), install C News or INN once or twice. Install tin, trn, nn, and readnews. Read the O'Reilly book.
If anything, read the 2 Usenix papers about C News (same place you'll find the C News code), Rich's Usenix paper on INN (stored where INN is stored), and Kurt Lidl's paper on MUSE (distributing netnews via MBONE multicast) (ftp://ftp.uu.net/networking/news/muse/usenix-muse.ps.gz). Heck, they're just plain good reading for anyone that writes software.

Point #2.5: If you are writing news software (transport or reader) for a non-Unix system then it's still important to have experience with the Unix systems that are available. In fact, considering the horrible I/O throughput on most Intel-based computers, you have 2x the reason to have studied the papers about C News and INN because they have optimized the amount of I/O to a minimum.

If you are writing a newsreader you have 10x the reason to learn a number of Unix newsreaders. There is over 100 years of combined software engineering experience in those readers. Where else do you have the opportunity to learn from that much experience? (Maybe while writing accounting software but who would want to do that?)

Also, PC news posting agents are NOTORIOUS for not following the RFCs and making the mistakes listed here. So, go slowly, read the resources, do your homework, get lots of advice, and then go for it.

Point #3: If you are writing a newsreader or transport from scratch, here's what I think the areas of interesting research would be:

Advanced user interfaces -- I don't mean Athena Widgets vs. MOTIF vs. OpenWindows, etc. All those damn GUI-based newsreaders add ABSOLUTELY nothing to the state of the art... except maybe permitting the mouse-generation to access the technically elite Usenet (which, being a "More Power To The People"-kind-of-guy, I feel this is a good thing). However, I think an advanced UI is something that reads news for you. Read the Usenix papers on RightPages or Ferret. Or, how about something that lets me post articles in a way that lets me communicate better (for some definition of "better").
Something that cross-references articles in a way that is more useful than currently available.

Disconnected Mode -- Every pissant BBS user uses QWK and has all sorts of fancy QWK newsreaders. Nobody has invented something as nice as this for netnews. On the other hand, QWK *SUCKS*. Boy does it suck! It is the embodiment of BAD SOFTWARE DESIGN. It is the best example of everything wrong with the way PC software authors create systems. Then there is SOUP; the spec is available as "soup12.zip" from all good SimTel mirrors. I haven't read it yet, but the author tells me it is much better than QWK. Software can be found by looking in the FAQs for comp.os.msdos.mail-news and alt.usenet.offline-reader. Investigate this before you go reinventing the wheel.

Better posting -- Wanna be famous? Make a seriously amazing MIME posting tool. You could be responsible for the next explosion of netnews bandwidth as everyone uses your MIME authoring tool to make megabyte posts full of text, sounds, and graphics. Or, make an idiot-proof, GUI-based, bullet-proof posting system that doesn't lock you into just the standard headers. Want real fame? Separate the posting mechanism from the newsreader. Define an interface between newsreaders and newsposters and then make a couple newsposters. Try to get every newsreader to add support to your newsposter. Then we'll hear things like, "I use trn with FreshPost" and "Oh, I use trn too but I use MagicPost, it has better MIME capabilities". Fame and fortune await you!

New storage systems -- Everyone talks about storing netnews in a compressed form, as a database, as a flat-file, on a special "netnewsfs" filesystem, etc. etc. Nobody actually implements it. What's stopping you? How about a storage system that makes expires happen blindingly fast? How about a storage system that makes reading the "next article" a fast operation (note: the next article is not numerically next if you are using a threaded newsreader).
INN has hooks for these kinds of things and all INN utilities use these hooks. Make the change once in the right place and all (or most) INN code supports it. Time after time people have suggested using an SQL database to store articles, the history file, the kill file, the X-Files, etc. Why not actually implement it and see if these are good ideas?

HypertextNews -- Why store quoted text? Why not just store a code which specifies the quoted article and which lines? Newsreaders that support it could let you click on the quoted text and view the lines specified, the whole article, or whatever! Make a system that is also backwards compatible or figure out how to expire such a news database and you'll win the Nobel Prize!

There are plenty of ideas where those came from. Please don't just write yet another newsreader.

Point #4: Never re-invent the wheel.

Why write a text editor when you can just call $EDITOR? (Unless your amazing new feature is a better editor... in which case you shouldn't be writing a newsreader, you should be writing something that all newsreaders could be calling as $EDITOR.)

Why get bogged down writing tons of NNTP code when you can link to a pre-written client library? NNTP-t5 and INN both generate ready-to-use client libraries that anyone can link to and they do all the work for you. Best of all, they are similar enough that you can write your code so that it works when linked to either.

It would be nice if someone wrote a library with all the same calls that read everything off the disk instead of via NNTP. Then you could link an NNTP-based newsreader to this library and turn it into a non-NNTP-based newsreader. Why not write a new library that checks a flag and reads news via either NNTP, the file system, a special compressed system, or smoke signals?

Point #5:

THINGS TO DO OR NOT TO DO:
--------------------------

DO USE "MODE READER": When you talk to an NNTP port, the first thing you should do is send the command "mode reader".
Pay attention to the response codes. "500" means "I don't know that command" (proceed as normal), "200" means "good". Anything else and you don't want to talk to this server.

DO USE A PRE-WRITTEN DATABASE: Don't use your own database, use NOV. Link to the NOV library so you don't have to implement any of it. It does all the work for you. Kill your sysadmin if they want to install tin's database, trn's database, nn's database, etc. (unless you get your hard disks for free).

POSTING: DON'T VALIDATE HEADERS: When the user wants to post an article, give them an editor with the minimum headers and accept whatever you get back. If any changes were made, send everything verbatim to "inews -h". "inews"'s job is to validate the headers, insert missing ones, silently delete certain ones, etc. Don't try to do all this work in the newsreader. Sysadmins often hack their inews to add some special feature... don't undo their work or require them to re-add this hack to every newsreader they install! (The NNTP POST command is the same as piping to "inews -h" except you must include a "From:" header).

POSTING: DON'T WORK TOO HARD #1: The "inews -h" command requires only two valid headers: "Subject:" and "Newsgroups:". Don't send it anything else (unless the user inserts it themselves). For example, why figure out how to format the date properly? The format is very specific and if you get it wrong, the transport silently drops the article. Why try at all when you know that "inews -h" generates a perfect one for you? Also, if the user inserts a "Reply-To: foo@bar", let them. Don't try to validate it; if they put in a non-functioning address it's not your job to care.

POSTING: DON'T WORK TOO HARD #2: The NNTP "POST" command requires only 3 valid headers: "Subject:", "Newsgroups:", and "From:". It will generate the rest if they are left out. Don't do the work yourself. RFC977 says that you must generate all required headers, but that isn't a good idea, as authors learned.
That's why it is important to educate yourself about the RFCs, as well as how they got implemented.

POSTING: DON'T GENERATE A PATH HEADER: Don't generate a Path: header. Period. With networks changing so often, it is impossible to generate one that is correct for all sites. Let "inews" or NNTP's "POST" command generate it for you. They will generate it properly because they were installed (and maybe modified) by someone that understands the site's special configuration. The newsreader is often installed by someone different, often Joe Loser who thinks netnews was invented 3 months ago when he first discovered alt.sex.

POSTING: MORE HEADERS NOT TO GENERATE: When generating a post's headers, don't insert the Date: header, munge the Sender: or From: headers, etc. That is inews's job. "inews"'s only purpose in life is to take the crap that the user input, add the missing required headers, check and fix obvious errors, and reject what it can't fix. inews will send it to the spool or post it via NNTP. Why does everyone think they can out-do inews by doing the work themselves?

POSTING: The importance of the Date: header: The Date: header is critical to the news transport because this makes it possible to expire netnews. Therefore, the Date: header has to be one of a couple very specific formats so that transport software authors aren't chasing a moving target. Since every site that touches an article must re-parse this date (and it is slow to parse), C News and INN have optimized on one particular date format. The other formats are handled in a manner that isn't as fast. So, output Date: formats like C News does. Better yet, if you are a posting agent DO NOT GENERATE the Date: header and let inews (or the NNTP "POST" command) generate it for you!

DATE HEADER MANIA: The Date: header that you generate should always use your local GMT timezone offset.
However, if you want to be a really cool newsreader author, make sure your program displays that header in the local timezone of the person reading the message (i.e. convert the header to the local time when displaying it). Remember to provide a way to see the original header (i.e. the "show all headers" command shouldn't do the conversion).

NNTP POSTING: DON'T USE IHAVE: Use the NNTP "POST" command, not the NNTP "IHAVE" command. If you use NNTP's "ihave" command then you have spent about a week duplicating all the work that inews (or NNTP's "POST" command) does, wasted another week of programming time to get everything "just right"... and when someone installs your software on an INN server, they'll find that it doesn't work. Duuuh!

WHEN YOUR USER IS IDLE, DON'T GENERATE TRAFFIC: If your user isn't typing, mouse'ing, clicking, etc. your newsreader shouldn't be generating work for the server. Imagine 5,000 users all leaving your program running when they leave for lunch. EXCEPTION: If you are implementing some fancy read-ahead model, but then you shouldn't be reading too far ahead if the user seems to have walked away from their terminal, eh?

IF YOU LOSE YOUR CONNECTION, HANDLE IT TRANSPARENTLY: Write your code so that if your NNTP connection closes you handle it gracefully. You shouldn't go into an infinite loop, or spin in an "open->error->open" loop chewing up CPU time. If you can't re-open the server, tell the user but don't core-dump.

WHEN YOUR CONNECTION IS CLOSED, RECONNECT GRACEFULLY: Write your code so that you reconnect without the user being warned a zillion times. Maybe put "Reconnecting to server" in a status line, but don't require the user to click on "OK". Give the user the feeling that they always have a connection, even when they are talking to a server that disconnects after 30 seconds of idle time.

WHEN YOUR CONNECTION IS CLOSED, DON'T RECONNECT UNTIL YOU HAVE TO: If your connection closed it was for a good reason.
Either you closed it because your user was idle, or the server closed it because it felt your user was idle, or maybe the server went down. Don't reconnect until you need to issue your next NNTP command. Example: If a server has 400 connections when it reboots, you don't want 400 clients all pummeling it with packets trying to start new connections while it is trying to come up. Plus, when the service is operational again, only those connections that are actively used should be reconnecting anyway. If you delay reconnecting until the user needs it, the load on the server will be smoothed out since everyone won't be connecting at the same time.

WHEN YOUR USER IS IDLE FOR A LONG TIME, DISCONNECT: If your user is idle for more than 5 minutes, why not close the NNTP connection? If you followed the above advice, the reconnect will be seamless and the users will not notice.

WHEN YOUR CLIENT IS IDLE FOR A LONG TIME, DISCONNECT: NNTP servers should disconnect if a connection hasn't seen traffic for 5-10 minutes. Let the newsadmin set this time limit, and let them disable this feature if they need to.

In a perfect world, all newsreaders disconnect after 5 minutes of idle time, all servers will disconnect after 5 minutes of idle time, and all re-connects will be transparent to the user. However, since we don't live in a perfect world, we have to do our best to do our share.

DISCONNECT EVERY 4 HOURS: Whether idle or not, disconnect from the server every 4 hours. This lets any file handle leaks on the server get flushed out. If you followed the above advice about reconnecting, your users won't notice.

DON'T DISCONNECT BETWEEN EVERY COMMAND: I hate to embarrass anyone but the authors of NETSCAPE made the mistake in a beta version (the current one is fixed) where they closed the connection after *every* *single* *article*. You could just hear your system performance DIE as your kernel locks out everything trying to fork() fast enough to keep up with the NetScape users.
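The connection advice above (connect only when a command needs it, hang up after 5 idle minutes, recycle every 4 hours, retry once instead of spinning) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any real newsreader: the class name LazyConnection, the opener callable, and the tick() method are all hypothetical, and the sketch only sends commands (a real client would also read the server's replies and re-issue things like "mode reader" after a reconnect).

```python
import time

IDLE_LIMIT = 5 * 60        # seconds of idleness before we hang up
MAX_AGE = 4 * 60 * 60      # recycle the connection every 4 hours


class LazyConnection:
    """Opens the server connection only when a command needs it."""

    def __init__(self, opener, clock=time.monotonic):
        self._opener = opener      # hypothetical: callable returning a socket-like object
        self._clock = clock
        self._sock = None
        self._opened_at = 0.0
        self._last_used = 0.0

    def connected(self):
        return self._sock is not None

    def close(self):
        """Hang up politely: send QUIT, then close the socket."""
        if self._sock is None:
            return
        try:
            self._sock.sendall(b"QUIT\r\n")
        except OSError:
            pass                   # server already gone; nobody left to be polite to
        finally:
            self._sock.close()
            self._sock = None

    def tick(self):
        """Call from the UI loop; never reconnects, only hangs up when due."""
        if self._sock is None:
            return
        now = self._clock()
        if now - self._last_used >= IDLE_LIMIT or now - self._opened_at >= MAX_AGE:
            self.close()

    def send(self, line):
        """Send one command, connecting first if needed; retry once, never loop."""
        self.tick()                # drop an idle or too-old connection first
        for attempt in range(2):
            if self._sock is None:
                self._sock = self._opener()    # may raise OSError: tell the user
                self._opened_at = self._clock()
            try:
                self._sock.sendall(line.encode("ascii") + b"\r\n")
                self._last_used = self._clock()
                return
            except OSError:
                self._sock.close()
                self._sock = None
                if attempt == 1:
                    raise
```

The opener might be something like a lambda wrapping socket.create_connection() for your site's server. The point of the design is that nothing here runs on a timer against the server: an idle user costs the server nothing, and 400 clients don't all reconnect the moment the server comes back up.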
DON'T CONNECT IF YOU DON'T HAVE TO: nnpost (part of NN) connects to the NNTP port, then puts you in the editor. 15 minutes later, you have completed writing your post and the server has disconnected you because your connection was idle. Now it has to re-connect to do the actual posting. The opposite is just as bad: "/bin/rnews -U" (part of the INN distribution) connects to the server every time it runs, even if it doesn't need to send anything. (This actually triggers a bug in certain operating systems. Someone forgot to test the OS to see how it handled a connection being created then closed, with no read()'s or write()'s on it in between.)

DISCONNECT CORRECTLY: If you drop the NNTP connection, drop it gracefully. Send "QUIT\r\n" on the socket, then close it. When might you want to do this? For example, if a user cancels any kind of operation while a transaction is in progress with the news server, you may want to abort the news stream. Don't just disconnect the stream! Ungraceful disconnects annoy news administrators because they show up in logs.

IF YOUR CONNECTION CLOSES FOR GOOD, DON'T GO CRAZY: Sometimes a connection dies because a machine is down or doing maintenance or maybe the permission file just changed and you no longer have permission to talk to that server. All NNTP-based newsreaders should handle this gracefully.

DON'T GENERATE VANITY HEADERS: Don't include a header that identifies what newsreader the user is using. Son-of-RFC1036 explicitly states that this is A Bad Thing. If you haven't seen this header before it basically looks like: "X-Newsreader: this was posted by a user that uses FooReader v33.1 which is the software that I wrote and I put this header in because I'm boring and immature and think that I can make myself famous by adding this header when it really just shows how shallow I am." Well, you're not completely shallow, but you should watch out for the neutering patrol (see Point #1 above).
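To pull the posting advice above together, here is a minimal Python sketch: give the editor only the minimum headers, pass back whatever the user wrote without validating it, open the connection only once the article is finished, and always leave with QUIT. The function names (editor_template, wire_format, post), the PostError class, and the opener callable are hypothetical. The numeric codes other than the "mode reader" ones (200/201 greeting, 340, 240) and the dot-stuffing rule come from RFC977, not from this document, and the latin-1 encoding is just an assumption for the sketch.

```python
class PostError(Exception):
    """The server said something other than what we hoped for."""


def editor_template(newsgroups, subject, from_addr=None):
    """Minimal skeleton for the user's editor; no Date:, Path:, or vanity headers."""
    headers = []
    if from_addr is not None:              # only NNTP POST needs From:
        headers.append("From: " + from_addr)
    headers.append("Newsgroups: " + newsgroups)
    headers.append("Subject: " + subject)
    return "\n".join(headers) + "\n\n"     # one truly empty line ends the headers


def wire_format(article):
    """Turn the edited text into the bytes sent after POST, verbatim otherwise."""
    lines = article.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                        # empty piece after the final newline
    out = []
    for line in lines:
        if line.startswith("."):
            line = "." + line              # RFC977 dot-stuffing
        out.append(line)
    out.append(".")                        # a lone dot terminates the article
    return ("\r\n".join(out) + "\r\n").encode("latin-1")   # encoding is an assumption


def _expect(stream, *codes):
    """Read one reply line and insist that it starts with one of `codes`."""
    reply = stream.readline().decode("latin-1").rstrip("\r\n")
    if reply[:3] not in codes:
        raise PostError(reply or "connection closed")
    return reply[:3]


def post(article, opener):
    """Connect only now that the article is finished, post it, then QUIT."""
    stream = opener()      # hypothetical: returns a binary read/write file object
    try:
        _expect(stream, "200")             # greeting; 201 would mean no posting
        stream.write(b"mode reader\r\n")
        stream.flush()
        _expect(stream, "200", "500")      # 500: unknown command, proceed as normal
        stream.write(b"POST\r\n")
        stream.flush()
        _expect(stream, "340")             # server is ready for the article
        stream.write(wire_format(article))
        stream.flush()
        _expect(stream, "240")             # article accepted
    finally:
        try:
            stream.write(b"QUIT\r\n")      # always leave politely
            stream.flush()
        except OSError:
            pass
        stream.close()
```

A newsreader would call editor_template(), run $EDITOR on the result, and call post() only after the editor exits, so the server never sees a connection sitting idle while the user types. On a system with a local inews, piping the same edited text to "inews -h" replaces post() entirely.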
TRASH CERTAIN HEADERS: News posting agents shouldn't generate NNTP-Posting-Host: or Path: headers. News transports that receive posts (i.e. the NNTP "POST" command or non-NNTP inews commands) should notice attempts by users to supply their own NNTP-Posting-Host: or Path: headers and delete them. Of course, the transport should add replacement headers. My point is that if a user tries to supply an NNTP-Posting-Host: or Path: header, it should be silently replaced by the transport (or the mechanism that accepts posts).

EDUCATE YOURSELF: Reading RFC1036 and RFC977 in one sitting in a quiet library was the best investment I ever made. Read RFC822 too, but it might put you to sleep so read it in bed. (Certainly do not read it while operating heavy machinery.) You learn all sorts of requirements you may not have known of and they explain many issues. They also tell you certain things that were tried one way but failed, and (therefore) why it is standard practice to NOT do those things.

TIPS WHEN REPLYING AND DOING A FOLLOWUP-POST: Don't reply to the user listed in the Path: header. The Path: header is just informational as far as you are concerned, unless you are a news transport (C News or INN).

Don't ignore the "Reply-To:" header when doing a reply, and don't invent some data-structure that will prevent you from using the Reply-To: header. FIDO sites have a From: but no Reply-To: field. So, FIDO gateway software just drops the Reply-To: header *OR* promotes the Reply-To: header to replace the From: header "so that replies work right". Well, Mr. Snoop-FIDO-Dog, first of all, to be grammatically correct it's "work correctly". Secondly, you've just broken the spec.

There are other pairs of headers that work like this. For example, Followup-To:/Newsgroups: is similar to Reply-To:/From:. However, don't forget to implement the RFC1036 requirement that "Followup-To: poster" (yes, the string p-o-s-t-e-r) means that if the user tries to do a Followup, do a Reply instead.
If this happens, don't forget to check for the existence of a Reply-To: header!

DON'T CONFUSE THE HEADER AND THE BODY: Between the header and the body of an article is one blank line. It doesn't have anything on it. No spaces, no tabs, no nothing. After that blank line, don't fuss with what you find. I've seen FIDO software that finds headers inside the body and treats them like real headers. For example, such broken software would find 3 "headers" in the above paragraph ("TIPS WHEN REPLYING...") and try to process them.

(to be continued...)

[ A related note: The Good Net-Keeping Seal of Approval attempts to establish a standard for newsreader behavior on Usenet. For more information on the aims and requirements of the Good Net-Keeping Seal of Approval, see http://www.media.mit.edu/people/rnewman/Good_Netkeeping_Seal ]

--
Tom Limoncelli -- tal@plts.org (home) -- tal@big.att.com (work)
"Would you compare your system administrator to `Indiana Jones' or `Tank Girl'?" "Both!"