| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

02-28-08, 20:55
|
|
Registered User
|
|
Join Date: Feb 2008
Posts: 4
|
|
|
Help to search a dB of text.
|
|
For starters, I know almost nothing about databases. I'll have to hire someone to make this... But...
I want to create a dB with some general fields of numbers and text per database entry. (In the end, I think there will be about 7 million entries with about 20 fields each.) There be about 5 text fields per database entry each containing about 3-30 pages of text.
I am looking for the correct database to permit me to search the text fields. I am looking for the most flexible (and greatest number of) search operands available.
I am hoping to run this on a linux box. Any ideas? I'm sure you guys know what to do!!
|
|

02-28-08, 21:03
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
|
|
Running on a Linux box still leaves just about all of the database engines available. You lose two major players in Micrsoft Jet (the database engine most often used by Microsoft Access) and Microsoft SQL Server, but even with those two removed from consideration that leaves the field wide open.
There are many ways to search databases, each with slightly different techniques, advantages, and problems. You'll have to give us (or at least me anyway) a much clearer picture of what you expect to do in order to get any useful help from us.
-PatP
|
|

02-28-08, 21:12
|
|
Registered User
|
|
Join Date: Feb 2008
Posts: 4
|
|
|
ok
|
|
So, the database contains 6 million entries. There are 10 fields per entry that I wish to be able to perform a text search upon. (So, we hace 60 million text fields) Each text field contains around 1000 to 100000 words.
I want to search all of the 60 million text fields with search operands such as:
"printer" AND "paper"
print$ ADJ/n paper <--- $ = wildcard and ADJ/n = within n words
etc, etc. In fact, I want to have ultimate flexibility in the manner which I search the text.
|
|

02-28-08, 21:21
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
|
|
The best text search I've seen for a database is Microsoft's Full-Text search, but that doesn't run on Linux.
If you want to do your searches inside the database, DB2 and Oracle both offer tools that support strong text searching like you've described. Both packages are quite expensive, which I think is something that you want to avoid.
There are many ways to get what you have requested from other database engines. These all involve trade offs of coding complexity versus disk space versus query time.
I'll give the other folks here on DBForums a shot at this question before I go too far trying to ferret out an answer for you. There are many database features on many database engines... There might be one that I don't know about yet that provides exactly what you've requested.
-PatP
|
|

02-29-08, 01:10
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
|
|
PostgreSQL has full text search capabilities, as well as some fuzzy string comparisons (soundex and several similar algorithms)
Ref Full Text Search, Fuzzy string matching
Since it's totally free of charge, download it & try it out. (install on whatever OS you would like. (linux server performance with many concurrent connections is better than windows servers.)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
Last edited by loquin; 02-29-08 at 01:14.
|

02-29-08, 03:07
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
I would say that if you plan to get someone to build this for you then don't worry too much about the "how" concern yourself with the "what" and thee "why". i.e. forumlate your business rules - get them nailed down. Tell your guy what the output needs to be, what its purpose is (not how things are stored, not how things are searched).
Once your developer has put together a technical spec we can review it if you like, based on your business rules & requirements, and let you know if it looks ok.
|
|

02-29-08, 04:02
|
|
www.gvee.co.uk
|
|
Join Date: Jan 2007
Location: UK
Posts: 10,156
|
|
"The moment you let your 'users' make design decisions is the moment that a system will fail"
|
|

02-29-08, 10:14
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Quote:
|
Originally Posted by pootle flump
I would say that if you plan to get someone to build this for you then don't worry too much about the "how" concern yourself with the "what" and thee "why". i.e. forumlate your business rules - get them nailed down. Tell your guy what the output needs to be, what its purpose is (not how things are stored, not how things are searched).
|
Kinda disagree here...experience shows that whoever he contracts with is most likely to recommend the platform with which they are most comfortable. I don't think it is a bad idea for him to research database engines, choose the most appropriate one, and then find a database developer who is experienced with it.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

02-29-08, 21:50
|
|
Registered User
|
|
Join Date: Feb 2008
Posts: 4
|
|
Thanks for the inputs guys. I think I will try postgresql!
|
|

03-01-08, 03:11
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
Quote:
|
Originally Posted by blindman
Kinda disagree here...experience shows that whoever he contracts with is most likely to recommend the platform with which they are most comfortable. I don't think it is a bad idea for him to research database engines, choose the most appropriate one, and then find a database developer who is experienced with it.
|
I'll remember that next time you recommend SQL Server over Oracle 
|
|

03-01-08, 09:31
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Yeah, but I've developed applications for both.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

03-01-08, 11:50
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
|
|
I'll take issue with blindman's position on this issue.
The world database world divides pretty neatly into developers and users. Users know what they need in terms of information (output) and usually of data (input). They rarely know much about the process of getting from data to informaiton, and even more rarely have the ability to add significant value to the process of getting from data to information. Developers know the processing options and how to use them, and are usually well suited to making the processing happen. Each side does what it does well, and only in very rare cases do you find people that can do both parts effectively.
Let the users do what they do well (produce data and use information). Let the developers do what they do well (convert data into information). Using the wrong person for a task is much like using a hammer to drive screws... It can be done, it is often fun to watch (from a safe distance), but at least in my experience it is very rarely productive.
When a client starts to tell me what tools to use and how to use them, my response is almost always to thank them for their interest in my services and get the heck out of there. Staying around to fight that battle is a poor use of my time, rarely profitable, and infuriating for both me and the client.
-PatP
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|