I need some help designing a database containing information from 70GB of proxy logs.
To keep it basic, there are mainly 4 fields I'm interested in:
Userid, IP Address, Time, URL
I have a Perl script to insert the data parsed from the logs. I'm wondering, though: can I get away with a single table with the four columns above, or do I need a cleverer design than that? As the table fills up I'm wary that queries such as SELECT URL FROM tablename WHERE Userid = 'ed' will take longer and longer to run. What would be the optimal design for something like this? Also, which database product would be best: MySQL or PostgreSQL?
You would create your table and then create indexes on it:

CREATE TABLE tablename (column1 type1, column2 type2, ...);
CREATE INDEX indexname ON tablename (column1, ...);
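For the four fields in the question, a minimal sketch might look like this (the proxy_log table name and the column types are illustrative assumptions, and the syntax works in both MySQL and PostgreSQL):

CREATE TABLE proxy_log (
    userid     VARCHAR(64),
    ip_address VARCHAR(45),  -- long enough for an IPv6 address in text form
    log_time   TIMESTAMP,
    url        TEXT
);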
Then you can forget about the indexes and concentrate on populating and querying the table; the database maintains the indexes automatically in the background on every insert.
You create the indexes according to how you query. If you always filter on column1 and column2 together, create one composite index covering both columns. Otherwise, create a separate index on each column; that way the query planner has a usable index no matter which single column a given query filters on.
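Applied to the question's schema, a sketch of both approaches (the proxy_log name and column names are the illustrative assumptions from the example above, and the index names are arbitrary):

-- If queries always filter on userid and log_time together,
-- one composite index serves them both:
CREATE INDEX idx_userid_time ON proxy_log (userid, log_time);

-- Otherwise, index each frequently filtered column separately:
CREATE INDEX idx_userid ON proxy_log (userid);
CREATE INDEX idx_time ON proxy_log (log_time);

-- The query from the question can then use idx_userid
-- (or the leading column of idx_userid_time):
SELECT url FROM proxy_log WHERE userid = 'ed';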