Copyright © 1995-2010
iNet//Web Solutions
2811 La Cuesta Drive
Los Angeles, CA 90046
(323)851-6400
Fax (323)851-4570
 
[ Logo]

DATABASES


What's a Database and Why Do I Care...?

Inevitably, within the first five minutes of a telephone conversation with a potential new client, we get around to the subject of how many products the company has in its catalog, and how we're going to get that list on the Internet. Of all the variables which determine the underlying architecture of a site, and therefore the bottom-line cost, this is one of the most important.

The number of products will determine the entire file structure and inner working methods of the site. And the more complex the programming to handle the products, the more costly the project.

It's a "row-of-dominos" decision process ... the number of products determines the storage file format, which determines the "search" procedure we must use, which determines how the catalog pages will be delivered to the shopper, which in turn can limit the choice of server operating system. Every one of these dominos has a direct effect on the cost of the site.

First, some basic stuff ... regardless of the size of your catalog, all product information is stored in one or more data files on the server. When a shopper asks for a specific product, the details are extracted from those files and displayed - price, description, photo, technical data, required additional items (batteries, or a stand), multiple discount pricing, and so on. The format of the file which stores this information is at the crux of the decision sequence - and the format decision is largely determined by the number of products.

All databases have essentially the same elements: they contain "records" - an entry for every product - and each record contains several "fields" - one for every fact about that product. So, for instance, the Men's Clothing Database may contain a record for a pair of pants, which contains one field for the stock number, one for the manufacturer, another for the color, the material, the available sizes, the shipping weight, the price, and so on ... as many fields as are needed. We can represent that record like this, using commas to separate the fields for clarity:

MW6323,Bronco Manufacturing,Blue,Denim,30-44,2.6,29.95

There are three different ways to store all this information which are in general use on the internet: Flat-files (sometimes called Text files), Hashed files, and SQL files. Let's deal with them one at a time, and talk about the advantages and disadvantages of each ...

FLAT-FILE DATABASES: If you type in all the information about all your products using the format above, and save it as a plain text file, you have created a Flat-file database. Every product is listed on a separate line - which means the "records" are separated by carriage returns - and all the details are listed on that one line - the "fields" are separated by commas.

Applications: Databases of 2 to 50 records.
Advantages: The first is the most obvious - simplicity. A Flat-file database is easy to produce, easy to read, and easy to extract data from. There are no restrictions on their use: any person can use any computer with any Operating System to produce one, and any computer with any Operating System can be the server for a site using Flat-file databases ... text is the simplest, most common file type which is shared by virtually every computer in the world.
Disadvantages: The very simplicity of a Flat-file database is also its greatest drawback. In order to extract the record for the product which happens to be on line number 999 of the file, we have to read the first 998 lines, check if that's the product we're seeking, reject it, and go get the next line. The larger the number of records, the more inefficient this system becomes. And the more inefficient the search, the slower the download to the shopper ... and the slower the download, the faster the shopper will leave. Can you imagine how long it would take American Airlines to find your reservation if they used a Flat-file system?!?
HASHED FILE DATABASES: It would solve the Flat-file's biggest problem if we could jump straight to line 999 ... and that's what a Hashed-file system can do. A unique value representing every product in the database is defined - say, the stock number - and that "key" value is stored at the beginning of a Hashed-file database together with the line number where the rest of the information is stored (that's an over simplification, but it's an accurate metaphor). When you want to extract the information about the blue jeans, ask for product number MW6323 ... the Hashed-file directory looks up the entry "MW6323999", and immediately jumps to line 999.

Applications: In general, 50 to 1000 records.
Advantages: Hashed-file databases have three main advantages: random access, random access, and random access. The ability to create and use hashed files is also built into PERL, one of the most common programming languages on the internet ... so a hashed file system can be used on any server which has PERL, and that's most of 'em.
Disadvantages: If you need to access the data using any method other than random access, you're back using a Flat-file database. For instance, if you need to extract a subset of the whole database - if you want all men's pants which are blue - you'll need to read in every line of the database and check it, just as you would with a Flat-file system. Inefficient ... equals slow ... equals the sound of shoppers leaving. As long as you remember your reservation number ("010302AQX3614-US"), American Airlines could call up your file quickly; but if you forget it ... Another disadvantage: the minimum size of a PERL hashed file is 16k ... one product, 16k, no rebate.
SQL DATABASES: "Structured Query Language" ... that should give you a big fat hint! This isn't just a technique, it's a full-blown computer programming language in its own right, one which has become the standard way of dealing with databases and extracting the information you need. Not just on the internet, either - we're talking on computers, period. SQL offers random access, not just by key values, but random access to any piece of data anywhere in the file, which in many applications contains literally millions of records.

Applications: from 1 to millions of records.
Advantages: Using SQL, you can call up a list of all the men's blue jeans available in size 36 which cost under $40.00 manufactured by a company whose name contains a "Q" ... and you can get that list of records out of a 500,000 line database almost as quickly as you can get the first record out of a flat-file database. SQL is universal, it's fast, and it's flexible: if you can define the parameters of the search you want to do, SQL can do it for you.
Disadvantages: Your site must be hosted on a server which has SQL capabilities, and that restriction not only narrows your choices, it could have a major impact on the cost.
BUT ... READ ON: ORACLE and several other companies have successfully marketed high-end (and high-cost) operating systems for servers to handle large databases ... and if you need Oracle SQL, internet hosting companies are going to pass that cost on to you. These operating systems can cost tens of thousands of dollars ... or, you could spend a couple of hundred dollars and have a universally available SQL system which will do almost everything the big guys' systems will do, and just as fast.

We at have successfully created several database-driven sites using MySQL, which is a very polished subset of the full SQL - and the cost is $200 for a lifetime license. Don't let the phrase "subset" bother you - it's 99.3% compatible, and in 11 years of working with it, I haven't figured out what that 0.7% is ...

Please don't get me wrong - Oracle's system is the best, the sine qua non, and it will do stuff none of the others will do, thank you very much. But for a small- or medium-sized company whose needs can only be met by a system with SQL flexibility and speed, there are legitimate alternatives to $130,000 price tags - and MySQL is one of them. It provides all the advantages of SQL, and none of the disadvantages.

One final word about cost is probably in order. At iNet, we decided early on that we would not price a site by the number of pages, as many designers do; to us, the number of pages has no practical bearing on the time it takes to develop a site, and therefore is a lousy way to charge for a job. Rather, we estimate the number of hours work it will take to complete the tasks, then multiply that figure by hourly rates ... that total price is guaranteed, Assuming only that the basic requirements don't change, the cost won't change, either, no matter how long it takes us to program the site, nor how many changes and corrections are needed.

... and these are the three solutions to question number one in database site design: how many products in the catalog? We've designed sites which use Flat-file databases, and sites which use Hashed-file databases, and sites which use SQL databases ... we have the experience and the know-how to translate your product line onto the Web. So why not give us a call, and put our team to work for you? The consultation is free!

Sincerely,

bowker@iNetWebInc.com
President, Odd Days
(323) 851-6400





What's a
Database?
Database
Architecture
On-line
Catalogs
Shopping
Carts
Secure
Ordering
Getting
Noticed
Getting
Paid
Dictionary
of Terms
Home
Page


Home Page