I am designing a database to store various types of logfile lines, which record proxy-server activity on firewall machines.
Because the lines are not related, it is hard to come up with a model that groups them into categories. In addition, it seems more efficient to create tables for fields that contain long strings, such as the path to a resource on a domain, index them with integer IDs, and then reference those IDs as foreign keys in a logfile table.
your data model looks fine, except that you don't have to split things out unless it will represent a real savings somewhere
for example, take username
if username is some humongous 64-character string, and let's say there are only a few distinct ones, and let's say there are hundreds of thousands of line records, then by substituting userid as a FK to a separate user table, you've saved a boatload of space in the line table
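That saving is easy to put numbers on. A back-of-envelope sketch, where the row count and field widths are illustrative assumptions, not figures from the post:

```python
# Back-of-envelope for the username example above.
# All numbers are illustrative assumptions.
rows = 500_000    # "hundreds of thousands of line records"
wide = 64         # bytes to store the raw 64-character username per row
narrow = 4        # bytes for an integer userid foreign key
saved = rows * (wide - narrow)
print(saved)  # 30000000 bytes, i.e. ~30 MB shaved off the line table
```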
otherwise, i don't see the point
i mean, if you were after a specific user's lines, you could just as easily say
select line.*
  from line, user
 where user.userid = line.userid
   and user.username = 'joe.bloggs'
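That join is easy to exercise end to end. A minimal SQLite sketch of the normalized layout; the table and column names follow the discussion, and the sample data is made up:

```python
import sqlite3

# Normalized layout: the long username is stored once in `user`,
# and each `line` row carries only a small integer foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userid INTEGER PRIMARY KEY, username TEXT UNIQUE);
    CREATE TABLE line (lineid INTEGER PRIMARY KEY,
                       userid INTEGER REFERENCES user(userid),
                       bytes  INTEGER);
""")
conn.execute("INSERT INTO user VALUES (1, 'joe.bloggs')")
conn.executemany("INSERT INTO line (userid, bytes) VALUES (?, ?)",
                 [(1, 512), (1, 2048)])

# Fetching one user's lines is still a simple two-table join:
rows = conn.execute("""
    SELECT line.lineid, line.bytes
      FROM line, user
     WHERE user.userid = line.userid
       AND user.username = 'joe.bloggs'
""").fetchall()
print(rows)  # [(1, 512), (2, 2048)]
```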
The design looks fine, and I am presuming you need to optimise the design for speed of data retrieval.
Coming from a multivalue background I would implement the design as is, with each of the index files keyed on ID and containing a multivalued list of the line IDs. My personal preference would be to assign IDs using a sequentially assigned base-64 key value: 6 digits give you about 68 billion separate IDs, which should be enough! I'd add inversions of these IDs for ease of lookup, i.e. a user_id file keyed on username which holds userID as its only data value, a one-to-one reference table.
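The 6-digit figure checks out: 64^6 is just under 69 billion. A sketch of such a key generator, where the 64-symbol digit alphabet is an arbitrary choice of mine, any 64-symbol set works:

```python
# One possible 64-symbol digit alphabet (an assumption, not prescribed above).
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "+/"
)

def encode_base64_id(n: int, width: int = 6) -> str:
    """Render a sequential integer ID as a fixed-width base-64 string."""
    digits = []
    for _ in range(width):
        n, r = divmod(n, 64)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

print(64 ** 6)                        # 68719476736 -> ~68 billion distinct IDs
print(encode_base64_id(0))            # '000000'
print(encode_base64_id(64 ** 6 - 1))  # '//////'
```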
I am no Oracle expert, but my understanding is that Oracle has abandoned any pretence of being a true relational database and implemented multivalued fields in 8i. Maybe you could use those to store the index records.
In multivalue, you could then do queries like:
BSELECT mailfrom_id "JOE BLOGGS" emailID
BSELECT mailfrom emailID lineID
LIST line time day month path file bytes ....
where the first command does one database read to get the relevant emailID for JOE BLOGGS, the second reads the index record for that ID and returns the list of line IDs, which are passed to the third command, which in this example just lists the data values for those IDs. Let's say JOE BLOGGS has 30 line records out of a database of 1.5 million lines. How many database reads would this model perform? One for the ID lookup, one for the index read, then 30 detail lines: 32 in total.
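That read count can be sanity-checked with a toy model, using Python dicts to stand in for the keyed files; the names and IDs are illustrative:

```python
# Dicts standing in for the multivalue files described above.
mailfrom_id = {"JOE BLOGGS": 7}                          # inversion: name -> emailID
mailfrom = {7: list(range(101, 131))}                    # index: emailID -> 30 line IDs
line = {lid: {"bytes": 512} for lid in range(101, 131)}  # detail records

reads = 0
email_id = mailfrom_id["JOE BLOGGS"]; reads += 1     # 1 read: ID lookup
line_ids = mailfrom[email_id];        reads += 1     # 1 read: index record
details = [line[lid] for lid in line_ids]            # 30 reads: detail lines
reads += len(line_ids)
print(reads)  # 32
```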