The Northern Spy
August 2002
An Internet History and Primer
by
Rick Sutcliffe
Introduction
All right, Nellie, today the Spy puts on his teaching hat for a primer on Internet history and usage. He was there, an early Internet user, but has forgotten a lot of this stuff himself, and has to ask you to look up some of it on the net for a reminder.
Disclosure statement
The Spy's Arjay Enterprises owns Arjay Web services http://www.arjayweb.com which in turn runs a domain name registration service at http://www.webnamesource.com and a web hosting service at http://www.webnamehost.net. These are not big time money making get rich quick schemes, but do pay for his own voracious appetite for domain names and hosting space. Mention of these services herein ought to be taken as providing examples, not as an attempt to influence.
History
Before it was called that, the Internet had its start in 1969 when the United States Department of Defense commissioned ARPANET (the Advanced Research Projects Agency Network). Between September and December of that year, the first four nodes (UCLA, Stanford Research Institute, University of California Santa Barbara (UCSB), and University of Utah) came on line. The following year saw the first cross country line, and by 1971 there were fifteen institutions with 23 nodes. Meanwhile, other networks had formed, some among the universities and colleges of one state, others within a single institution.
By 1972 the first eMail programs were in use, with much the same basic functionality one sees today. Messages could be listed, filed, forwarded, and selectively responded to. If memory serves me correctly, FTP (File Transfer Protocol) was also used to transfer messages, and this practice may have survived for some time afterwards. However, eMail proved to be the killer app, consuming some 75% of all traffic a year later. 1973 also saw the origin of ARPANET news, the first international connections (to England and Norway) and Bob Metcalfe's invention of Ethernet. By this time, there may have been 2000 users on ARPANET and numerous localized networks had been created.
The Queen of England made the news by sending her first eMail message in 1976. UUCP (Unix-to-Unix CoPy) was first used the same year. The TCP/IP protocol was in use by 1978, USENET started in 1979, BITNET (because it's time) in 1981 (between Yale and the City University of New York) and CSNET (Computer Science Net) also began that year. All these would eventually become combined into a single network, but initially operated independently.
Modern internets, that is, a series of computers connected specifically with TCP/IP, began in January 1983. This year also saw the use of the term "The Internet" for the system of connected internets, gateways established to BITNET and other nets, the departure of several ARPANET nodes for the new Internet, and the invention of nameservers. Before that, one needed to know the path data had to take to get from one node to another. This year and the next saw widespread networking among universities in Europe and Japan, with Canada developing NetNorth in 1984 and connecting through a gateway to BITNET (however, most of us who used it called this BITNET North).
1984 also saw the deployment of DNS (the Domain Name System), which connects names such as apple.com to numbers such as 17.254.3.183. The first domains in .com, .edu, .gov, .net, .uk, and .org space were registered during 1985. In 1986 the U.S. National Science Foundation (NSF) created NSFNET with six supercomputers, a configuration that became the backbone for widespread expansion of the net.
By 1987, the Internet had 11000 hosts, about 10% on BITNET. A year later there were over 60000. 1988 was also the year the famous Internet worm brought down about 6000 hosts. CERT (Computer Emergency Response Team) was founded to coordinate alerts to such threats. Several countries, including Canada, joined NSFNET this year, although connectivity was fragmentary, with individual providers hooking up independently to various gateways.
In 1988, while preparing a text on ethical and social issues for Charles Merrill, I coined the terms "Fourth Civilization" "New Renaissance" and "Metalibrary," the latter referring to the coming hyper-linked global electronic library. We've begun building toward this facility, but the current world wide web is merely a primitive prototype of what the Metalibrary will yet become when it fulfils the premises on which the information age is built.
To this point most of the network was government-run or sponsored, with private involvement minimal, and by this time, there were nearly twenty countries connected to NSFNET. But in 1989, Compuserve and MCI Mail established gateways to the Internet, allowing it to route commercial messages (I could send from my personal Compuserve account to my university one--wow!). The camel had its nose in the tent, and the rapid commercialization of the net was to follow.
1990 saw the first dial up commercial access, and the merger of the ten provincial authorities in Canada into CA*Net (originally CDN Net), taking the control of entities such as BCNet from the universities into a quasi-public independent realm.
In 1991 Tim Berners-Lee turned the net upside down by developing the World Wide Web and the first browsers, the newest killer app. By the following year the number of hosts had exceeded one million and InterNIC was created by NSF to provide domain name registration and other services, initially with a monopoly in this area.
By 1995 NSFNET's role as connectivity provider had been taken over by numerous interconnected providers, and NSF reverted to its roots, beginning development of a new high speed research network. Also by this time the http protocol had surpassed all others in traffic. Domain registration now cost money, but in 1999 would become competitive as other registrars were allowed to enter the field (and drive the price down). I recall remarking to my technician about this time (to his great amusement) that I'd thought of several rather nasty ways to bring down the Internet using eMail, an insight that was to prove all too prophetic. No one is amused any more, and I routinely delete all uninvited mail attachments.
As the decade closed, many of the Internet's functions had become privatized, or placed at arm's length from government, at least in the industrialized world. Throughout this period the number of country domains multiplied, though countries such as China acted contrary to the premises of the age and imposed severe restrictions on use and content. Meanwhile, hacker attacks, particularly of the denial of service variety, have proliferated. By 2002, the number of hosts was estimated at just under 150 million and the number of domains was in the millions.
During this time I've personally gone from building my own computers, stringing three wire RS-232 connections from room to room to share printing facilities, and using 110 Baud acoustic modems for long distance communications to having 1G Ethernet at work (10 G switches now available) and high speed cable at home. No more breakout boxes to configure crossover cables; my Mac does that automatically, regardless of what I plug in.
So much for nostalgia. Let's turn to the nuts and bolts of how the system currently operates.
How it works today:
When someone sends out a request (say, using a browser) for a page, using a URL (Uniform Resource Locator) such as http://www.arjay.bc.ca/store.htm, the part before the colon and two slashes (http) specifies the protocol of the request (this one is HyperText Transfer Protocol). The part between the two slashes and the next slash is the name of an Internet domain, and the part after the first single slash specifies a path name in the file system of the target system. Thus, this URL says to use HTTP to communicate with the domain www.arjay.bc.ca and fetch the file store.htm.
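The split just described can be sketched in a few lines of Python using the standard library's urllib.parse module. This is only an illustration; the variable names are the sketch's own.

```python
from urllib.parse import urlsplit

# The example URL from the paragraph above.
url = "http://www.arjay.bc.ca/store.htm"
parts = urlsplit(url)

protocol = parts.scheme    # the part before the colon and two slashes
domain = parts.hostname    # between the two slashes and the next slash
path = parts.path          # from the first single slash onward

print(protocol, domain, path)
```

Note that the path comes back with its leading slash; the file name itself is what follows it.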
In this case, the requesting system sends a DNS query, which is referred to the nameservers run by the .ca national domain central registry; these look in the registry's database to find the authoritative nameserver for that domain. Said nameserver is then queried to translate the domain name into an IP number (in this case 66.96.246.52), and this is used to locate the desired host (each of which answers to one or more such addresses), find a route through the net to it, and pass the request for the file to the http server. The requester (and some intermediate machines) may store or cache the name and IP address associated with it for a time so that subsequent requests (if any) in the next little while can be replied to more quickly.
The numbers in each of the four groups of an IP number run from zero to 255, by the way, not just to 100 as I've seen some people mistakenly claim. This means there are 4 294 967 296 possible machine addresses, a limit toward which the net is rushing rather quickly, prompting several suggested reforms of the system. The best route may, by the way, change during the course of a transmission, so that not all data packets between two points necessarily travel the same path.
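The arithmetic behind that figure is easy to check. The following is a minimal Python sketch using the standard ipaddress module; the helper name is_valid_ipv4 is invented for this illustration.

```python
import ipaddress

# Four groups, each with 256 possible values (0 through 255).
total_addresses = 256 ** 4
print(total_addresses)

# The address quoted earlier, viewed as a single 32-bit number.
addr = ipaddress.IPv4Address("66.96.246.52")
as_integer = int(addr)

def is_valid_ipv4(text):
    """Return True if text is a dotted quad with every group in 0..255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False
```

A group of 256 or more is rejected, while 255 in every position is accepted.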
So much for the low level nitty gritty, Nellie. How does an individual get set up with her own functioning web site? To answer this, it is necessary to take a close look at names.
The domain system:
Each host on the Internet has a name, such as arjay.ca. The last part is the top level domain, and is usually a universal one such as .com, .net, .org, .info, .biz, or .name. Alternately, it may be an institutional one, like .gov or .mil. Finally, it can be a country domain (always two letters). Examples of the latter include .ca, .uk, .de, .us, or .ie. Most country domains require one to live or do business in the country to own a domain there (regulations vary). However, a few countries have cut agreements for private firms to manage their domain systems in return for a cut of the action, and these are usually open to all. Thus, .ws (Western Samoa) is widely advertised as meaning Web Site, and .tv (Tuvalu) is sold for the abbreviation's recognition factor. Also in this open country domain category are .cc (Cocos Is), .bz (Belize) and .nu (Niue).
The part of the domain name prior to the last period is called the second level domain, and it is this name you register at an appropriate registrar for the desired top level (most registrars sell the universal and open country domains, plus a few of the less restrictive others). Subject to national restrictions mentioned above, anyone can register a second level name through any top level domain, provided it does not infringe on a trademark or trade name. Thus, one could not secure use of cocacola.com, ibm.net, or enron.info, as company lawyers would soon come calling. There is a dispute-resolving mechanism, and its tribunals generally return names to their trade owners.
Domain names may contain numbers, letters, and hyphens only, and are not case-sensitive. Some top level domain registries require names to be at least three characters, or perhaps charge a premium for very short names. Occasionally in the past names like business.com sold for large sums, but this rarely happens any more.
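As a rough sketch, the character rule can be written as a Python check on one dot-separated piece (label) of a name. The 63-character limit and the ban on a leading or trailing dash come from the DNS standards rather than from this column, and the function names are invented for the example.

```python
import re

# Letters, digits, and the ASCII dash only. The length limit and the
# rule against starting or ending with a dash are assumptions taken
# from the DNS standards, not from the text above.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_label(label):
    """Check one dot-separated piece of a domain name."""
    return LABEL.fullmatch(label) is not None

def same_domain(a, b):
    """Domain names are not case-sensitive, so compare in lower case."""
    return a.lower() == b.lower()
```

Individual registries may add their own rules on top of this, such as the minimum length mentioned above.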
Individual directories that partition the files on a site may be designated with third level domain names (subdomains), such as www.arjay.ca or modula2.arjay.ca. Alternately, some countries have at times used third level domains for geopolitical divisions, creating such names as arjay.bc.ca where the bc indicates British Columbia. The .name registry only allows third level names to be registered. Thus one may not register hacker.name, only nellie.hacker.name.
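To make the level numbering concrete, here is a small Python sketch that splits a name into its levels, top level first. The function name domain_levels is invented for the illustration.

```python
def domain_levels(name):
    """Split a host name into its labels, top level first."""
    labels = name.lower().rstrip(".").split(".")
    return list(reversed(labels))

levels = domain_levels("modula2.arjay.ca")
top_level = levels[0]       # "ca"
second_level = levels[1]    # "arjay"
third_level = levels[2]     # "modula2"
```

Applied to nellie.hacker.name, the same function gives name, hacker, and nellie in that order.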
Once a person has chosen a name, the next step is to go to a domain name registrar or subcontractor thereof, and attempt to register it. "Attempt" because most easy-to-think-up names (including EveryWordInTheDictionary.com) are already taken. Some creativity is indicated. Thus mygreatbusiness.com may be taken, but mygreatbusiness.net, my-great-business.com, or 4mygreatbusiness.com may be available.
Names can be registered for periods of one to ten years, and depending on the registrar, the top level domain, and the services offered, cost anywhere from eight dollars a year up (way up). Good registrars charge under $20 per year for the universal domains and offer easy-to-use password-protected tools for making changes to the domain information. They will also lock your domain (or allow you to do it) so that it cannot be transferred by a hijacker without your knowledge. Poorer ones offer few or no tools, don't respond to eMail queries, and are sitting ducks for hackers.
Observe that you do not, strictly speaking, own a domain name. You merely rent its use for a specified period. If you fail to pay the annual registration fee, your domain expires, and when the registrar releases it, anyone can register it. Many people watch expirations closely, hoping to secure popular names for their own purposes (often attaching them to a porn site). The most common reason for an unexpected expiry is that the registrant provided an eMail address on registration, but changed it later without telling the registrar. A renewal message went out, but no one got it.
Initially, a newly registered domain name doesn't point anywhere except to a generic site on the registrar's system that typically presents advertising for the registrar. So the next step is to develop a web site using appropriate tools and secure web hosting from a commercial provider. When selling web hosting, the host gives the names and IP addresses of two or more nameserver (DNS) machines. These machines have names like ns1.webnamehost.net and know where to find your site on their system, so any query sent to them will be correctly directed. Return to the registration service where you bought the name and use its tools to change the nameservers to those you were supplied by your host, wait up to three days for the change to "take" and after that, any queries sent to your site's name arrive at the correct destination (query to top level registry, then nameserver, then site).
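The chain just described (registry, then nameserver, then site) can be modelled as a toy in Python. Real lookups use the DNS protocol rather than dictionaries; ns2.webnamehost.net and every address below are invented for the sketch, with the addresses drawn from the range reserved for documentation.

```python
# Toy model only: every address here is a made-up documentation value.

# Top level registry: which nameservers are authoritative for a domain.
REGISTRY = {
    "mygreatbusiness.com": ["ns1.webnamehost.net", "ns2.webnamehost.net"],
}

# Each nameserver: which IP address answers for each site it knows.
NAMESERVERS = {
    "ns1.webnamehost.net": {"mygreatbusiness.com": "192.0.2.10"},
    "ns2.webnamehost.net": {"mygreatbusiness.com": "192.0.2.10"},
}

def resolve(domain):
    """Follow the chain: registry, then nameserver, then site address."""
    for ns in REGISTRY.get(domain, []):
        records = NAMESERVERS.get(ns, {})
        if domain in records:
            return records[domain]
    return None   # unregistered, or the nameservers do not know the site

def change_nameservers(domain, new_nameservers):
    """Stand-in for the registrar's tool that repoints a name."""
    REGISTRY[domain] = list(new_nameservers)
```

If the name is pointed at nameservers that know nothing about the site, resolve returns None, which is the toy equivalent of traffic never arriving.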
When DNS queries are done at a host's nameserver, numerous sites may be dispatched from the same IP address. This is called name-based hosting, and is now common. There aren't enough IP addresses around for every host to have its own, so this is the only way the current expansion can continue unabated.
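A minimal sketch of how one address can serve several sites: the server reads the Host header that HTTP/1.1 browsers send with each request and picks the matching site. The directory paths and the function name pick_site are hypothetical, not any particular server's layout.

```python
# Hypothetical table: several site names served from one IP address.
SITES = {
    "www.arjay.ca": "/home/arjay/public_html",
    "modula2.arjay.ca": "/home/modula2/public_html",
}

def pick_site(request_text):
    """Return the document root for the Host named in an HTTP request."""
    for line in request_text.split("\r\n")[1:]:   # skip the request line
        if line == "":
            break                                 # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]   # drop any port
            return SITES.get(host)
    return None

request = "GET /index.html HTTP/1.1\r\nHost: modula2.arjay.ca\r\n\r\n"
print(pick_site(request))
```

A request naming a host that is not in the table gets None here; a real server would typically fall back to a default site instead.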
Another (usually more expensive) option is to obtain an exclusive IP address from the hosting provider. A registrar can probably handle routing to such sites also, if they provide DNS service (not all do). If so, create "A" records in that nameserver's database with the IP number of your site, and you're in business. If the registrar does not do this, the web host should (through its DNS), and if not, there are third parties such as zoneedit.com that exist solely to provide DNS services (In this case, you have registration, DNS and hosting at three different locations).
Some web hosts want customers to transfer technical control of the domain to them, or more accurately, to an account they have with a registrar, so they can control the DNS themselves. This makes things more convenient for them, and relieves the customer of understanding the plumbing described here, but it isn't necessary. More knowledgeable webmistresses like Nellie are better off retaining technical control of the domain so they can make a fast switch if the web host goes out of business (Alas, all too many do).
With whom could you host? Let me count the places.
Kinds of hosting:
The first option is free hosting. Free services have the advantage of cost, but two disadvantages, besides the tendency to change to paid plans or go out of business without warning. The first is that you end up with a URL that looks like http://www.everybodyknowsthisisafreeservice.com/ourcommunity/customershomepages/~me/index.html which is not only ugly, but the search engines know it's free and may not index it. You can use the DNS service to redirect your short domain name like me.com to the long (real) one, but the long one will show up in the browser navigation window. The second is that the workaround has its own cost. Some DNS services will offer frame redirection, where the short name shows in the navigation window, but what viewers are seeing is a small dummy web site with no content that is a frame around a window with the real content. Search engines cannot see past the dummy site to the real one, so you have to give them the real URL, which they can tell is free, and... You get what you pay for.
Most commercial hosting is done using the Red Hat Linux flavour of UNIX running on one of many servers located in a well-connected data centre. These house up to thousands of server boxes, and there are many such centres. Typical providers allow the use of programming languages such as Perl, C++, Java, and PHP, and provide a cgi-bin directory to store these programs so they can be called from web page front ends for special effects. They also allow round the clock FTP access to update sites, an anonymous FTP directory, statistics programs to determine the number of "hits" on the site, POP accounts to receive mail, SMTP facilities to send it, the ability to create subdomains, access to a database program, and possibly several other goodies, such as free shopping carts (Full e-commerce with credit card authentication is much more expensive, though).
Usually one can enable a set of extensions to allow Microsoft's Front Page to work with Linux sites, though support for this option may be sketchy. Look for quality options such as a catch-all eMail address, unlimited mail forwarding, and a system-wide porn and spam ban (Without this your site could be blocked because it shares the same address as the black hat).
All this activity is directed by the site owner using a control panel. The main ones are made by Plesk, Ensim, CPanel, Alabanza, and Raq. All offer similar functionality. These are front ends for a set of underlying scripts that allow a user control over a variety of the web site's functions without having to involve tech support, thereby lowering the cost of hosting. I've used all these at one time or another, but prefer CPanel and Ensim to the others for speed, ease of use, reliability, and features.
Linux pricing varies from $5 to $50 a month, depending on how much space the site needs and how much traffic (bandwidth) it generates. A starter site requires under 10M of space and no more than 25M of bandwidth a month. At the low end of this price range, expect to pay by the year, as monthly charges eat up too much in credit card fees to make it worthwhile (For instance, WebNameHost.net is a Red Hat host that specializes in small, high quality sites and does only annual contracts). All in all, Linux offers good speed, many choices, and the best security.
Windows-based hosting using Microsoft server products is a second choice. This works best with Microsoft's own page preparation products such as Front Page or Word's HTML save function (provided your visitors only use Microsoft's browsers), but is more open to hacker attacks. Many large data centres no longer allow these servers, as attacks on them deny traffic to the other machines in the centre as well. Some corporations have adopted exclusive Microsoft policies, so they also use these servers, but they can afford the large support staffs required to keep them in operation. Support costs are high, and hosting tends to be more costly and less reliable.
A brand new third option is to host with a company using Apple's new Xserve machines in their server farm. This is, of course, also UNIX, but the BSD flavour. So far, there aren't many such companies active, but one that has recently announced it's coming is XServeRack.com (no pricing as yet available).
Recommendation: find web sites on similar topics to yours that you like and that are fast. Find out who hosts them and see if you can cut a deal. Usually the host will have sample packages on display on their site, but other possibilities can often be negotiated. Perhaps they'll throw in a little extra bandwidth, a domain name registration, or give you two small sites for the price of one medium sized one. You never know till you ask.
What can go wrong?
If any of the information in the top level domain registry or in the DNS is corrupted, or if the domain's user has allowed it to expire, or if the webmistress of the site has rearranged the pages so the URL is no longer valid, traffic never reaches the requested page. If a section of the network near the source or destination of the request is under hacker attack, the traffic may be too slow for practical use. If the company providing your services goes broke, your site will be hosed, and you'll be out the remaining charges. If you prepare your pages in Word and try to view them with a non-MS browser, they won't display correctly. Hey, life is an adventure.
What to do when the site is up and running?
Make sure everything is operational, and that all pages in the site have a title tag and meta tags (assignment: look it up) for description, author, and keywords. Every image tag should have an alt field (preferably containing a keyword), and the keywords you list should actually occur in the text of that page. Check your html syntax with a tool designed for the purpose, and do likewise for the links to ensure they're all valid. Only then should you submit your pages to search engines. The most important are Google, Yahoo, AOL, MSN, and Northern Light, followed by regional engines such as Canada.com or MyBC.com. Buying a submission service may work, but is likely to generate more junk eMail to you than hits on your web site. INeedHits.com is cheap and reliable, but like all these services, does not submit to the major engines. Those you have to do by hand.
In the up to six months before the search engine spiders your site to see what is there, you want to find other related sites and offer to trade links with them. The more sites that link to yours, the higher the ranking on search engines. During this time, you can also refine the look and feel of the site so visitors stick there long enough to engage their thinking and perhaps to buy something from you. By the time you've finished all this, you'll not only know more, you'll be on your way to becoming famous.
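Part of that checklist can be automated. The following Python sketch uses the standard html.parser module to look for a title tag, the three meta tags, and images lacking alt text. It does not check syntax or links, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser

class PageChecker(HTMLParser):
    """Collect the items the checklist above asks for."""
    def __init__(self):
        super().__init__()
        self.has_title = False
        self.meta_names = set()
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.has_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta_names.add(attrs["name"].lower())
        elif tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1

def check_page(html_text):
    """Return a list of problems; an empty list means none were found."""
    checker = PageChecker()
    checker.feed(html_text)
    problems = []
    if not checker.has_title:
        problems.append("missing title tag")
    for wanted in ("description", "author", "keywords"):
        if wanted not in checker.meta_names:
            problems.append("missing meta " + wanted)
    if checker.images_missing_alt:
        problems.append("%d image(s) without alt text" % checker.images_missing_alt)
    return problems
```

Run check_page over the text of each page before submitting it; anything it reports is worth fixing first.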
'Nuff said for now, Nellie. In a later lecture I may chat about setting up your own hosting company.
References
Zakon, Robert H. Hobbes' Internet Timeline v5.6 http://www.zakon.org/robert/Internet/timeline/ (2002 07 08)
Sutcliffe, R. The Fourth Civilization--Technology, Ethics and Society commissioned by Charles E. Merrill Publishing but not published by them http://www.arjay.ca/EthTech/Text/index.html Copyright 1988, 1998, 2002
XServeRack: http://www.xserverack.com/index.php
--The Northern Spy