If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Help to search a dB of text.

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 02-28-08, 20:55
jphudy jphudy is offline
Registered User
 
Join Date: Feb 2008
Posts: 4
Help to search a dB of text.

For starters, I know almost nothing about databases. I'll have to hire someone to make this... But...

I want to create a dB with some general fields of numbers and text per database entry. (In the end, I think there will be about 7 million entries with about 20 fields each.) There be about 5 text fields per database entry each containing about 3-30 pages of text.

I am looking for the correct database to permit me to search the text fields. I am looking for the most flexible (and greatest number of) search operands available.

I am hoping to run this on a linux box. Any ideas? I'm sure you guys know what to do!!
Reply With Quote
  #2 (permalink)  
Old 02-28-08, 21:03
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
Running on a Linux box still leaves just about all of the database engines available. You lose two major players in Micrsoft Jet (the database engine most often used by Microsoft Access) and Microsoft SQL Server, but even with those two removed from consideration that leaves the field wide open.

There are many ways to search databases, each with slightly different techniques, advantages, and problems. You'll have to give us (or at least me anyway) a much clearer picture of what you expect to do in order to get any useful help from us.

-PatP
Reply With Quote
  #3 (permalink)  
Old 02-28-08, 21:12
jphudy jphudy is offline
Registered User
 
Join Date: Feb 2008
Posts: 4
ok

So, the database contains 6 million entries. There are 10 fields per entry that I wish to be able to perform a text search upon. (So, we hace 60 million text fields) Each text field contains around 1000 to 100000 words.

I want to search all of the 60 million text fields with search operands such as:

"printer" AND "paper"
print$ ADJ/n paper <--- $ = wildcard and ADJ/n = within n words


etc, etc. In fact, I want to have ultimate flexibility in the manner which I search the text.
Reply With Quote
  #4 (permalink)  
Old 02-28-08, 21:21
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
The best text search I've seen for a database is Microsoft's Full-Text search, but that doesn't run on Linux.

If you want to do your searches inside the database, DB2 and Oracle both offer tools that support strong text searching like you've described. Both packages are quite expensive, which I think is something that you want to avoid.

There are many ways to get what you have requested from other database engines. These all involve trade offs of coding complexity versus disk space versus query time.

I'll give the other folks here on DBForums a shot at this question before I go too far trying to ferret out an answer for you. There are many database features on many database engines... There might be one that I don't know about yet that provides exactly what you've requested.

-PatP
Reply With Quote
  #5 (permalink)  
Old 02-29-08, 01:10
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
PostgreSQL has full text search capabilities, as well as some fuzzy string comparisons (soundex and several similar algorithms)

Ref Full Text Search, Fuzzy string matching

Since it's totally free of charge, download it & try it out. (install on whatever OS you would like. (linux server performance with many concurrent connections is better than windows servers.)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin; 02-29-08 at 01:14.
Reply With Quote
  #6 (permalink)  
Old 02-29-08, 03:07
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
I would say that if you plan to get someone to build this for you then don't worry too much about the "how" concern yourself with the "what" and thee "why". i.e. forumlate your business rules - get them nailed down. Tell your guy what the output needs to be, what its purpose is (not how things are stored, not how things are searched).

Once your developer has put together a technical spec we can review it if you like, based on your business rules & requirements, and let you know if it looks ok.
Reply With Quote
  #7 (permalink)  
Old 02-29-08, 04:02
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
"The moment you let your 'users' make design decisions is the moment that a system will fail"
__________________
George
Twitter | Blog
Reply With Quote
  #8 (permalink)  
Old 02-29-08, 10:14
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by pootle flump
I would say that if you plan to get someone to build this for you then don't worry too much about the "how" concern yourself with the "what" and thee "why". i.e. forumlate your business rules - get them nailed down. Tell your guy what the output needs to be, what its purpose is (not how things are stored, not how things are searched).
Kinda disagree here...experience shows that whoever he contracts with is most likely to recommend the platform with which they are most comfortable. I don't think it is a bad idea for him to research database engines, choose the most appropriate one, and then find a database developer who is experienced with it.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #9 (permalink)  
Old 02-29-08, 21:50
jphudy jphudy is offline
Registered User
 
Join Date: Feb 2008
Posts: 4
Thanks for the inputs guys. I think I will try postgresql!
Reply With Quote
  #10 (permalink)  
Old 03-01-08, 03:11
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Quote:
Originally Posted by blindman
Kinda disagree here...experience shows that whoever he contracts with is most likely to recommend the platform with which they are most comfortable. I don't think it is a bad idea for him to research database engines, choose the most appropriate one, and then find a database developer who is experienced with it.
I'll remember that next time you recommend SQL Server over Oracle
Reply With Quote
  #11 (permalink)  
Old 03-01-08, 09:31
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Yeah, but I've developed applications for both.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #12 (permalink)  
Old 03-01-08, 11:50
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
I'll take issue with blindman's position on this issue.

The world database world divides pretty neatly into developers and users. Users know what they need in terms of information (output) and usually of data (input). They rarely know much about the process of getting from data to informaiton, and even more rarely have the ability to add significant value to the process of getting from data to information. Developers know the processing options and how to use them, and are usually well suited to making the processing happen. Each side does what it does well, and only in very rare cases do you find people that can do both parts effectively.

Let the users do what they do well (produce data and use information). Let the developers do what they do well (convert data into information). Using the wrong person for a task is much like using a hammer to drive screws... It can be done, it is often fun to watch (from a safe distance), but at least in my experience it is very rarely productive.

When a client starts to tell me what tools to use and how to use them, my response is almost always to thank them for their interest in my services and get the heck out of there. Staying around to fight that battle is a poor use of my time, rarely profitable, and infuriating for both me and the client.

-PatP
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On