If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > one movie db, built up from multiple sources

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 05-09-09, 09:40
ipconfig ipconfig is offline
Registered User
 
Join Date: May 2009
Posts: 5
one movie db, built up from multiple sources

Hi all,

There probably already is a thread on this same subject, but I wasn't able to find one, so I decided to put this question here.

Here's the situation. I want to create a movie database. I've got several sources giving me movie data. One source may give me a movie title and a description, while another source might give me a list of actors for this movie. A new source should be able to add to/update the data already present in the database.

Now my question is, what's the best way to go about designing such a database? Should I create some set of 'master tables' to store the movies? Should I also store the 'raw' data from the sources?

I hope someone can point me in the right direction.
Reply With Quote
  #2 (permalink)  
Old 05-09-09, 10:39
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
yes, you will need some master tables to store all the data for your movies

the best way to design these tables is to implement the principles of normalization

yes, it's a good idea to have "loading tables" for each source of information, into which the imported data is first loaded, and from which the data is extracted, transformed (edited, verified, translated, etc.), and finally loaded into your master tables

simple, innit

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 05-21-09, 04:29
ipconfig ipconfig is offline
Registered User
 
Join Date: May 2009
Posts: 5
Yeah, sounds like a nice solution, thanks for your reply

To elaborate it some more:
Would you put the 'raw' and 'final' data in different databases, or just in the same?
And, would you store some kind of source id in the master tables (to know which data came from which source)? If so, where would you store these?
Reply With Quote
  #4 (permalink)  
Old 05-21-09, 06:59
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by ipconfig
Would you put the 'raw' and 'final' data in different databases, or just in the same?
same database, yeah, but different tables

Quote:
Originally Posted by ipconfig
And, would you store some kind of source id in the master tables (to know which data came from which source)? If so, where would you store these?
if i needed to track where the data came from, then yeah, i would, and i would store that in the master table

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 05-21-09, 12:52
ipconfig ipconfig is offline
Registered User
 
Join Date: May 2009
Posts: 5
Quote:
Originally Posted by r937
same database, yeah, but different tables
Alright, will do that.

Quote:
Originally Posted by r937
if i needed to track where the data came from, then yeah, i would, and i would store that in the master table

Ok, I think I should only add the source id to specific tables, like movie_description or movie_image, since movie_base would contain a factual set of information (like title, year, etc.), which should be independent of the source, right?

Another question, what if I wanted to store multilingual information? The focus should definitely remain on the main language, but having more than one language would be a nice extra feature. I guess adding a 'language_id' field to every master table would not be a very nice solution, is it? Should I maybe create a different set of tables for those other languages?

Thanks in advance
Reply With Quote
  #6 (permalink)  
Old 05-21-09, 13:29
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by ipconfig
I guess adding a 'language_id' field to every master table would not be a very nice solution, is it?
what makes you think it's not a nice solution?

question for you: would the master_base table need language_id? if not, why not?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 05-21-09, 13:51
ipconfig ipconfig is offline
Registered User
 
Join Date: May 2009
Posts: 5
Quote:
Originally Posted by r937
what makes you think it's not a nice solution?
Well I sort of get the feeling that the data would get kind of 'messy', but yeah, maybe I can just put it in the same tables.

Quote:
Originally Posted by r937
question for you: would the master_base table need language_id? if not, why not?
I think it does, because most movies have a language dependent title.

I'm almost beginning to think I also need a table movie_title_translations or something, which links title translations to the movies in movie_base, since one movie can have a multilingual title. Then I could make english the default title language, and put the language_id field inside movie_title_translations instead of movie_base.. But then another problem arises, i.e. how to check whether a certain title is a translation of an existing one.

..but maybe I'm just making things more complex than they actually are
Reply With Quote
  #8 (permalink)  
Old 05-23-09, 09:15
ipconfig ipconfig is offline
Registered User
 
Join Date: May 2009
Posts: 5
Ok, so I've made a first rough concept Let's highlight some stuff:

I think I should make a table movies, which contains factual information (title, year, duration, etc.). Next to that I think I'll create a table movies_region for language dependent data, like this:

movies_region
movie_id [int]
language_id [int]
title [varchar]
premiere_date [timestamp]

This way every movie can be linked to multiple region dependent data. Do you think this is a good way of setting things up?
Also, what is a good primary key in this movies_region table? Would it be the combination of movie_id and language_id, or something else?

Next point. For example, the movie genres. Would this be a good practice?

genres
movie_id [int]
genre_id [int]
primary key: combination of movie_id and genre_id

genre_list
id [int, auto_increment]
genre [varchar]
language_id [int]
primary key: id
unique: combination of genre and language_id

Are the names of the tables/fields like they should, or should I name the id field in genre_list genre_id for example?

The rest of the concept pretty much follows this pattern. I hope I'm thinking in the right direction! Thanks for any feedback
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On