Results 1 to 12 of 12
  1. #1
    Join Date
    Feb 2008
    Posts
    4

    Help to search a dB of text.

    For starters, I know almost nothing about databases. I'll have to hire someone to make this... But...

    I want to create a dB with some general fields of numbers and text per database entry. (In the end, I think there will be about 7 million entries with about 20 fields each.) There be about 5 text fields per database entry each containing about 3-30 pages of text.

    I am looking for the correct database to permit me to search the text fields. I am looking for the most flexible (and greatest number of) search operands available.

    I am hoping to run this on a linux box. Any ideas? I'm sure you guys know what to do!!

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Running on a Linux box still leaves just about all of the database engines available. You lose two major players in Micrsoft Jet (the database engine most often used by Microsoft Access) and Microsoft SQL Server, but even with those two removed from consideration that leaves the field wide open.

    There are many ways to search databases, each with slightly different techniques, advantages, and problems. You'll have to give us (or at least me anyway) a much clearer picture of what you expect to do in order to get any useful help from us.

    -PatP

  3. #3
    Join Date
    Feb 2008
    Posts
    4

    ok

    So, the database contains 6 million entries. There are 10 fields per entry that I wish to be able to perform a text search upon. (So, we hace 60 million text fields) Each text field contains around 1000 to 100000 words.

    I want to search all of the 60 million text fields with search operands such as:

    "printer" AND "paper"
    print$ ADJ/n paper <--- $ = wildcard and ADJ/n = within n words


    etc, etc. In fact, I want to have ultimate flexibility in the manner which I search the text.

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    The best text search I've seen for a database is Microsoft's Full-Text search, but that doesn't run on Linux.

    If you want to do your searches inside the database, DB2 and Oracle both offer tools that support strong text searching like you've described. Both packages are quite expensive, which I think is something that you want to avoid.

    There are many ways to get what you have requested from other database engines. These all involve trade offs of coding complexity versus disk space versus query time.

    I'll give the other folks here on DBForums a shot at this question before I go too far trying to ferret out an answer for you. There are many database features on many database engines... There might be one that I don't know about yet that provides exactly what you've requested.

    -PatP

  5. #5
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    PostgreSQL has full text search capabilities, as well as some fuzzy string comparisons (soundex and several similar algorithms)

    Ref Full Text Search, Fuzzy string matching

    Since it's totally free of charge, download it & try it out. (install on whatever OS you would like. (linux server performance with many concurrent connections is better than windows servers.)
    Last edited by loquin; 02-29-08 at 02:14.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I would say that if you plan to get someone to build this for you then don't worry too much about the "how" concern yourself with the "what" and thee "why". i.e. forumlate your business rules - get them nailed down. Tell your guy what the output needs to be, what its purpose is (not how things are stored, not how things are searched).

    Once your developer has put together a technical spec we can review it if you like, based on your business rules & requirements, and let you know if it looks ok.

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    "The moment you let your 'users' make design decisions is the moment that a system will fail"
    George
    Home | Blog

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by pootle flump
    I would say that if you plan to get someone to build this for you then don't worry too much about the "how" concern yourself with the "what" and thee "why". i.e. forumlate your business rules - get them nailed down. Tell your guy what the output needs to be, what its purpose is (not how things are stored, not how things are searched).
    Kinda disagree here...experience shows that whoever he contracts with is most likely to recommend the platform with which they are most comfortable. I don't think it is a bad idea for him to research database engines, choose the most appropriate one, and then find a database developer who is experienced with it.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Feb 2008
    Posts
    4
    Thanks for the inputs guys. I think I will try postgresql!

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by blindman
    Kinda disagree here...experience shows that whoever he contracts with is most likely to recommend the platform with which they are most comfortable. I don't think it is a bad idea for him to research database engines, choose the most appropriate one, and then find a database developer who is experienced with it.
    I'll remember that next time you recommend SQL Server over Oracle

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Yeah, but I've developed applications for both.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I'll take issue with blindman's position on this issue.

    The world database world divides pretty neatly into developers and users. Users know what they need in terms of information (output) and usually of data (input). They rarely know much about the process of getting from data to informaiton, and even more rarely have the ability to add significant value to the process of getting from data to information. Developers know the processing options and how to use them, and are usually well suited to making the processing happen. Each side does what it does well, and only in very rare cases do you find people that can do both parts effectively.

    Let the users do what they do well (produce data and use information). Let the developers do what they do well (convert data into information). Using the wrong person for a task is much like using a hammer to drive screws... It can be done, it is often fun to watch (from a safe distance), but at least in my experience it is very rarely productive.

    When a client starts to tell me what tools to use and how to use them, my response is almost always to thank them for their interest in my services and get the heck out of there. Staying around to fight that battle is a poor use of my time, rarely profitable, and infuriating for both me and the client.

    -PatP

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •