Results 1 to 9 of 9
  1. #1
    Join Date
    Nov 2011
    Posts
    13

    Unanswered: Made my first proper postgresql database - Any advice?

    Hi guys,

    I would really like some advice from any DB gurus who have a few minutes free. After doing some reading and playing with sqlfiddle over the weekend I have constructed this schema and it is the first proper one I've ever done, so I am sure I've made some poor choices.

    This is what I've got for a "job" and "advert" site:

    Code:
    --
    -- Application Database Structure
    --
    
    -- Company Table
    CREATE TABLE company (
        id SERIAL PRIMARY KEY,
        name varchar(60) NOT NULL
    );
    
    -- Country Table
    CREATE TABLE country (
        id SERIAL PRIMARY KEY,
        name varchar(255) NOT NULL
    );
    
    -- Location Table
    CREATE TABLE location (
        id SERIAL PRIMARY KEY,
        name varchar(255) NOT NULL,
        country_id integer NOT NULL REFERENCES country (id),
        coordinate varchar(255) NOT NULL
    );
    
    -- Source Table
    CREATE TABLE source (
        id SERIAL PRIMARY KEY,
        name varchar(60) NOT NULL
    );
    
    -- Payment Method Table
    CREATE TABLE payment_method (
        id SERIAL PRIMARY KEY,
        name varchar(60) NOT NULL
    );
    
    -- Payment Table
    CREATE TABLE payment (
        -- Generic
        id SERIAL PRIMARY KEY,
        ip varchar(255),  
        email varchar(255) NOT NULL,    
        amount varchar(255),
        payment_method_id integer NOT NULL REFERENCES payment_method (id),
        -- Credit Card
        card_name varchar(255),
        card_expiration varchar(255),
        -- Paypal
        paypal_email varchar(255),
        paypal_transactionid varchar(255)
    );
    
    -- Job Table
    CREATE TABLE job (
        -- Identification
        id SERIAL PRIMARY KEY,
        solrid varchar (10),
        -- Job Information
        title varchar(60) NOT NULL,
        description varchar(255) NOT NULL,
        truncated_description varchar(255) NOT NULL,
        keyword_description varchar(255) NOT NULL,
        how_to_apply varchar(255) NOT NULL,
        website_url varchar(255) NOT NULL,  
        logo_url varchar(255),
        page_url varchar(255) NOT NULL,
        -- Dates
        date timestamp NOT NULL,
        expires timestamp NOT NULL,   
        -- Linked Tables
        company_id integer NOT NULL REFERENCES company (id),
        source_id integer NOT NULL REFERENCES source (id),
        location_id integer NOT NULL REFERENCES location (id),
        payment_id integer NOT NULL REFERENCES payment (id),
        -- Status Flags
        active boolean DEFAULT FALSE,
        premium boolean DEFAULT FALSE,
        indexed boolean DEFAULT FALSE,
        error boolean DEFAULT FALSE,
          -- Email Flags
        email_reminder boolean DEFAULT FALSE,
        email_confirmation boolean DEFAULT FALSE
    );
    
    -- Advert Table
    CREATE TABLE advert (
        -- Identification
        id SERIAL PRIMARY KEY,
        -- Content
        title varchar(60) NOT NULL,
        description varchar(255) NOT NULL,
        page_url varchar(255) NOT NULL,
        -- Dates
        date timestamp NOT NULL,
        expires timestamp NOT NULL,
        -- Email Flags    
        email_reminder boolean DEFAULT FALSE,
        email_confirmation boolean DEFAULT FALSE
        -- Linked Tables
        source_id integer NOT NULL REFERENCES source (id),
        payment_id integer NOT NULL REFERENCES payment (id),  
        -- Status Flag
        active boolean DEFAULT FALSE,
        error boolean DEFAULT FALSE
    );
    Could anyone give me some guidance / possible improvements before I start building my site about this DB? Any help would would be really appreciated.
    Last edited by Spadez; 02-03-13 at 06:03.

  2. #2
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    OK, here are some comments.:


    First: why varchar(255)?

    There is absolutely nothing special about 255.
    It's (technically) the same constraint as 341 or 46537

    If you were assuming this would "optimize" anything, then you were wrong. If you don't know the length of a column (or don't want to constrain it) use text or varchar without a limit.

    What is the table "source" for? The name and the column names don't reveal anything about what it's for. (You could have called it "thing" as well).

    If the coordinate column is just for information, that using a varchar seems OK. If you want to do more with that (e.g. proximity search and other things) using a point or even the PostGIS extension with real longitude/latitude support might be more approriate.

    Shouldn't the payment table link to a "customer" or "user" table? What is the email column for in that table? (Again a poor choice of names. customer_email or user_email might be better if that's what is being stored there).

    The amount should be a numeric column (e.g. decimal(12,2). Having numbers in text column is a very, very bad idea. This will give you a lot of trouble in the long run).

    As you already have a countries table: do you expect payments in different currencies? If yes, you should have the currency in the payment table as well.

    Postgres also has a native IP address data type. You might want to consider that instead of varchar.

    Why do "description" and "truncated_description" have the same maximum length? (again those "magic" 255...) I'd simply use one column called description with datatype "text".

    Using reserved words for column names is a bad idea as well. You should rename the "date" column in the job table to something else. Which "date" exactly is that? The publish_date? creation_date? valid_from?

    Also if this is a real date, you should use the datatype "date", not timestamp (that is unless you actually do need the time information that comes with a timestamp). The same goes for "expires" (which is a better name). But unless your expiration includes a time part, I'd use the "date" data type there as well.

    What is the "indexed" column for in the "job" table?

    Why does the "job" table link to the "payment" table? And is there always exactly one payment for each job? (The same goes for the advert table). Or is it possible that e.g. an advert is being paid in two payments? If yes, you need a m:n relation between payment and job or advert.

    You might also want to consider using SQL comments to describe the table and columns. Those comments are displayed in most SQL GUI tools and help you understand your database:

    Code:
    comment on table country is 'Stores the countries';
    comment on column country.id is 'Primary key';
    comment on table source is 'The source of foobar';
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

  3. #3
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    you dont' 'have' to use an autonumber (SERIAL) for everything. soemtimes it makes sense to use other vlaues.. a so called natural key.
    ferintance I'd be tempted to use an alpha code that made immediate sense when looking at the table (eg CHQ:Cheque, EFT: Electronic Funds Transfer, TT.... so on)

    As Sindho notes above columns should be big enough for the anticipated values. although disk space is re;atively cheap so the need to store say 89 in place of 1989 has largely gone there

    im not entoirely convinced that you have properly normalised the design
    ferinstace you have a column logo_url in the jobs advert. are you really saying that a specific job advert has a specific logo, or are you instead meaning its the company offering the job's logo in which case it shoudl be in the company entity

    equally be careful with default values
    ferisntance I can understand why the error in advert may be default false, but should the active column also be false or should it be true. you can argue it each way. but if that represents a process whereby an advert has to be checked prior to use, then presuambly who checked/released it and when is also important?

    Keywords
    always tricky call, but if this is supposed to be a live website, then presuambly you'd expect people to search or match based on keywords. the way you are modelling it at present presents two problems

    COMPUTER LITERATE, Computer Literate, computer literate and not neccesarily the same (they could be if you used the correct matching technique (say always store as upper (or lower) case). but that doesn't allow for typos so you'd probably need something to verify keywords make sense.

    having, say a comma separated list of attributres all in one column is a possible symtom of issues. I'f be tempted to push keywords (for a jon) into a junction table comprising the PK of JOBS and the PK of a new table Keywords. I'd also be tempted to defien the Keywords so that you could pattern match or associate with other similar Keywords

    eg
    finance: accountancy
    dba: database and so one
    what self respecting recruitment website wouldn't offer say DBA as equivalent to developer
    I'd rather be riding on the Tiger 800 or the Norton

  4. #4
    Join Date
    Nov 2011
    Posts
    13
    Thank you so much for the advice, I really wasnt expecting such a comprehensive reply!

    I need some time to look through this and make the changes but you did raise one really important clock. I will have the ability to extend the advert / job which would mean a second payment and increase the expiration.

    How can I make it so I could receive multiple payments on a single job / advert? At the moment I link the ID's but this is only possible if there is one ID.

  5. #5
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    well what do you think
    how would you model that
    ..is it a effectively a new advert
    ..should your payments handling model be changed

    when you are modellinbg something its nearly always smarter to use natural keys if available. its also nearly always smarter to use interantional standards on data if available


    ferinstance you have a table for countries and defined it using a an autogenerfated key. instead you could have defiend it suing say the ISO 2 or possibly better) 3 digit country code, soemthign that is guaranteed to be unique
    https://www.google.co.uk/#hl=en&tbo=...w=1703&bih=915

    it also means that you can probably get the list of countries from somewhere else.
    I'd rather be riding on the Tiger 800 or the Norton

  6. #6
    Join Date
    Nov 2011
    Posts
    13
    It wouldn't be a new advert it would be an extension so that means I would need to be able to link more than one payment_id to an advert. I don't know of a way to do this other than encasing everything in another table which links a payment to an advert

    ID
    Payment
    Advert

    Just so I understand correctly, are you saying ditch the uniqueID for the country and replace it with the countrycode?

  7. #7
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    if you are using a relational db, such as any SQL db, then there needs to be a method of uniquely identifying any one specific row. how you do that is up to you
    SQL by Design: How to Choose a Primary Key

    use a surrogate key if there is no clear candidate key, or if the obvious natural key coudl be a problem (performance wise), of if the candiate primary key coudl be subject to change.

    choosing to use the ISO 2 or 3 digit key isn't right, and equally it isn't wrong. it is however in my books an obvious choice. aside from Jugoslavia not many countries disappear, new one do appear eg South & (North) Sudan from Sudan. often iots a matter of opinion. for me the code exists its maintained elsewhere its an obvious candidate. however you could just as equally assert that a an autonumber column is just as relevant. the main difference (and its actually no thtat important here) is readability and legibiltiy.
    an autonumber is precisely that its a numeric value. the ISO 3 digit code for country or currency is an international standard. do if you see 'GBR' or 'USA' or 'AUS' its immediatel;y obvious in your code what the value is. numeric values can be subverted (ferinstance people who insist on trying to apply meaning to an autonumber column outside the system. ferinstance accontants trying to use an autonumber in place of an order, grn or other sequential number.

    as to your problem over payments
    how do you handle the fact that there is an ad extension
    how does that affect your current model. can it live with what you are proposing.
    bear in mind its your model, supposedly modelling a real world requirement. so how would the business handle this.
    I'd rather be riding on the Tiger 800 or the Norton

  8. #8
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote Originally Posted by shammat View Post
    Using reserved words for column names is a bad idea as well.
    Like to emphasize that once more.
    Somewhere between " too small" and " too large" lies the size that is just right.
    - Scott Hayes

  9. #9
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    All have been good comments - I would add that the company table appears to be a bit sparse. Typically, I would include supplemental detail... address/contact information come to mind.

    Source table, with ID and name. ???

    You may want to create a contacts table. (names, addresses, email, phone numbers,contact type, etc.) Then, depending upon usage, you could set up many-to-many relationships as needed, between contacts and companies, contacts and source, contacts and payments, etc. This way, if a contact changes their email or telephone number, (or, name changes, etc.) you only need to change it in one place.

    Does the payments table represent payments made to you, or payments you make? In either case, this is an instance where a timestamp field is probably a good idea. I can understand not wanting to store an entire credit card number, but storing the last 4 digits is handy when talking to someone - especially a corporate someone, where there could be many folks with a corporate card.

    On larger jobs, you (and your client) probably would like multiple payments, instead of getting the check at the end. (Most places want to see the end product before releasing money, if there's just one payment.) Here's a link to an article with a schedule of:

    30% on sign-up (secures studio time)
    30% on sign-off of creative design
    30% on sign-off of working site (preview)
    10% balance due on go-live

    which would be best handled with a many-to-many (M:N) relationship.
    Last edited by loquin; 02-12-13 at 16:58.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •