There probably already is a thread on this same subject, but I wasn't able to find one, so I decided to put this question here.
Here's the situation. I want to create a movie database. I've got several sources giving me movie data. One source may give me a movie title and a description, while another source might give me a list of actors for this movie. A new source should be able to add to/update the data already present in the database.
Now my question is, what's the best way to go about designing such a database? Should I create some set of 'master tables' to store the movies? Should I also store the 'raw' data from the sources?
I hope someone can point me in the right direction.
yes, you will need some master tables to store all the data for your movies
the best way to design these tables is to implement the principles of normalization
yes, it's a good idea to have "loading tables" for each source of information, into which the imported data is first loaded, and from which the data is extracted, transformed (edited, verified, translated, etc.), and finally loaded into your master tables
Yeah, sounds like a nice solution, thanks for your reply
To elaborate it some more:
Would you put the 'raw' and 'final' data in different databases, or just in the same?
And, would you store some kind of source id in the master tables (to know which data came from which source)? If so, where would you store these?
if i needed to track where the data came from, then yeah, i would, and i would store that in the master table
Ok, I think I should only add the source id to specific tables, like movie_description or movie_image, since movie_base would contain a factual set of information (like title, year, etc.), which should be independent of the source, right?
Another question, what if I wanted to store multilingual information? The focus should definitely remain on the main language, but having more than one language would be a nice extra feature. I guess adding a 'language_id' field to every master table would not be a very nice solution, is it? Should I maybe create a different set of tables for those other languages?
Well I sort of get the feeling that the data would get kind of 'messy', but yeah, maybe I can just put it in the same tables.
Originally Posted by r937
question for you: would the master_base table need language_id? if not, why not?
I think it does, because most movies have a language dependent title.
I'm almost beginning to think I also need a table movie_title_translations or something, which links title translations to the movies in movie_base, since one movie can have a multilingual title. Then I could make english the default title language, and put the language_id field inside movie_title_translations instead of movie_base.. But then another problem arises, i.e. how to check whether a certain title is a translation of an existing one.
..but maybe I'm just making things more complex than they actually are
Ok, so I've made a first rough concept Let's highlight some stuff:
I think I should make a table movies, which contains factual information (title, year, duration, etc.). Next to that I think I'll create a table movies_region for language dependent data, like this:
This way every movie can be linked to multiple region dependent data. Do you think this is a good way of setting things up?
Also, what is a good primary key in this movies_region table? Would it be the combination of movie_id and language_id, or something else?
Next point. For example, the movie genres. Would this be a good practice?
genre_id [int] primary key: combination of movie_id and genre_id
id [int, auto_increment]
language_id [int] primary key: id unique: combination of genre and language_id
Are the names of the tables/fields like they should, or should I name the id field in genre_listgenre_id for example?
The rest of the concept pretty much follows this pattern. I hope I'm thinking in the right direction! Thanks for any feedback