dBforums > General > Database Concepts & Design > Preliminary Research - Search Engine DB Design
#1
06-02-11, 13:15
bryceray
Registered User
 
Join Date: Jun 2011
Posts: 2
Preliminary Research - Search Engine DB Design

Hello,

This is my first post in this forum, but I look forward to becoming an active member as I explore the world of databases. I'm a web developer and have experience building databases for simple websites. I've decided to work on a personal project to deepen my understanding of databases and database design: a website search engine that can crawl and search a website.

It has become obvious to me that one of the most important factors in creating an efficient search engine is a well-designed database. I'm interested in doing some reading (books/papers/tutorials) on databases specific to search engines. However, searching the web, I haven't been able to find much research.

My questions to the members of this forum are:
Do you know of any good research that has been done on the subject of search engine databases? If not, what are some important concepts I should research that are essential to developing a solid search engine database?

The search engine I plan to develop should scale from a small website to a large website. It is not intended to be a Google-like search engine that scours the web; it is intended to be an internal website search engine.

I appreciate any advice you might be able to offer. Thanks!
#2
06-02-11, 14:59
blindman
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
It would seem that researching Google's, Bing's, and Yahoo's search engine database designs would still be a good start, even if you are not planning on replicating that project's scale.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
#3
06-02-11, 15:15
Pat Phelan
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
Searching is one of the "holy grail" topics in computing. There is always a lot of research being done on the subject, and I don't know of anything that approaches a "Unified Field Theory" for computer searching. There are several hotbeds of search research going on right now, each of which has specific features/benefits/restrictions.

What kind of material do you wish to search? This is probably the key question you need to answer before you can start any real reading or learning. Searching pictures is very different from searching audio, and both are very different from searching text. Each medium requires different tactics and radically different processes.

Once you define your media, you need to decide what is important to be able to find as a result of your search. Macro searches deal with huge amounts of information, with the goal of getting the searcher into the right vicinity and then letting the searcher refine their search. A macro search would be like specifying that you are interested in forests or beaches on Google Earth, then letting the search find where you should look so that you can refine your search criteria.

Micro searches usually follow the macro search; they are designed to deal with far less data and to return very specific results. An example would be feeding a specific pattern of notes or lyrics to SoundHound and having it find that pattern within a given song or playlist. These searches are not practical on large amounts of data, but they can be just what is needed to get to a final answer in a search process.
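The macro/micro split carries over to plain text. As a minimal sketch (the two-pass structure, the hypothetical function names `macro_search` and `micro_search`, and the word-overlap scoring are illustrative assumptions, not a method described in this thread), a coarse pass can shortlist pages that share words with the query, and a fine pass can then look for an exact phrase only within that shortlist:

```python
import re

def tokenize(text):
    # Lowercase and keep runs of letters/digits as words.
    return re.findall(r"[a-z0-9]+", text.lower())

def macro_search(pages, query, limit=2):
    # Coarse pass: score every page by how many distinct query words it
    # contains, and keep only the best-scoring few as candidates.
    query_words = set(tokenize(query))
    scored = []
    for url, text in pages.items():
        score = len(query_words & set(tokenize(text)))
        if score > 0:
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:limit]]

def micro_search(pages, candidates, phrase):
    # Fine pass: look for the exact phrase (as a word sequence) only inside
    # the shortlisted pages, and report the word offsets where it starts.
    target = tokenize(phrase)
    hits = {}
    for url in candidates:
        words = tokenize(pages[url])
        offsets = [i for i in range(len(words) - len(target) + 1)
                   if words[i:i + len(target)] == target]
        if offsets:
            hits[url] = offsets
    return hits

pages = {
    "/about": "We build database tools and design search forms for small teams.",
    "/blog/index-design": "Index design matters: a good index design speeds up search.",
    "/contact": "Contact the team by email.",
}
candidates = macro_search(pages, "index design search")
print(candidates)
print(micro_search(pages, candidates, "index design"))
```

The expensive exact-match scan only runs on the shortlist, which is the point of letting the cheap pass go first.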

Text searches can be very different from "context aware" searches because there is only one context. Even though text can appear in multiple languages or dialects, the fundamental nature of text limits the amount of data and the ways that data can be organized, so text searching requires a different mindset and different practices to achieve good results.
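For plain text, the structure most text search engines build on is an inverted index: a map from each normalized word to the documents that contain it. A minimal in-memory sketch, assuming simple lowercase-and-split normalization (real engines typically add stemming, stop-word handling, and language-specific rules) and the hypothetical helper names `build_index` and `search_all`:

```python
import re
from collections import defaultdict

def tokenize(text):
    # Normalize: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, query):
    # AND query: return the documents that contain every query word.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    1: "Designing a search engine database",
    2: "Database tuning for large websites",
    3: "Search tips for small websites",
}
index = build_index(docs)
print(sorted(search_all(index, "database")))
print(sorted(search_all(index, "search websites")))
```

A query never scans the documents themselves; it only looks up and intersects the small sets stored under each query word.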

If you could offer some information about what kind(s) of data your website(s) will host, what you expect as search criteria, and what you want as a result of the search, then I could offer more specific suggestions on where you should look for more information.

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
#4
06-02-11, 19:54
bryceray
Registered User
 
Join Date: Jun 2011
Posts: 2
Thanks for the responses.

blindman
I've been looking at documents on Google. The problem is that it is hard to visualize scaling it down to a single website. I'm sure it's possible, though, and I'll definitely be spending some time studying the way they do it.

Pat Phelan
At the moment I am going to concentrate on standard text searching. This will include text web pages and documents (PDF, PPT, DOC, etc.). I know there are ways to extract text from the documents; I'm more concerned with a flexible database structure that will allow me to scale. Once I get it set up, I also plan to experiment with tuning to get as much efficiency out of it as possible.
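One common relational layout for text search stores each crawled document once and keeps the inverted index as a postings table. The sketch below uses Python's built-in `sqlite3` module; the table and column names (`documents`, `terms`, `postings`), the hypothetical helpers `add_document` and `search`, and the summed-term-frequency ranking are illustrative assumptions, not a recommended production design (a real system would likely lean on a database's built-in full-text search feature or a dedicated search library).

```python
import re
import sqlite3
from collections import Counter

SCHEMA = """
CREATE TABLE documents (
    doc_id   INTEGER PRIMARY KEY,
    url      TEXT NOT NULL UNIQUE,
    doc_type TEXT NOT NULL          -- e.g. 'html', 'pdf', 'ppt'
);
CREATE TABLE terms (
    term_id INTEGER PRIMARY KEY,
    term    TEXT NOT NULL UNIQUE
);
-- One row per (term, document) pair: the inverted index.
-- The composite primary key also serves lookups by term_id.
CREATE TABLE postings (
    term_id   INTEGER NOT NULL REFERENCES terms(term_id),
    doc_id    INTEGER NOT NULL REFERENCES documents(doc_id),
    frequency INTEGER NOT NULL,
    PRIMARY KEY (term_id, doc_id)
);
"""

def tokenize(text):
    # Lowercase and keep runs of letters/digits as words.
    return re.findall(r"[a-z0-9]+", text.lower())

def add_document(conn, url, doc_type, text):
    # Store the document, then one posting per distinct term with its count.
    cur = conn.execute(
        "INSERT INTO documents (url, doc_type) VALUES (?, ?)", (url, doc_type))
    doc_id = cur.lastrowid
    for term, count in Counter(tokenize(text)).items():
        conn.execute("INSERT OR IGNORE INTO terms (term) VALUES (?)", (term,))
        (term_id,) = conn.execute(
            "SELECT term_id FROM terms WHERE term = ?", (term,)).fetchone()
        conn.execute(
            "INSERT INTO postings (term_id, doc_id, frequency) VALUES (?, ?, ?)",
            (term_id, doc_id, count))
    return doc_id

def search(conn, query):
    # Rank documents by the summed frequency of the query terms they contain.
    terms = sorted(set(tokenize(query)))
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    sql = f"""
        SELECT d.url, SUM(p.frequency) AS score
        FROM postings p
        JOIN terms t ON t.term_id = p.term_id
        JOIN documents d ON d.doc_id = p.doc_id
        WHERE t.term IN ({placeholders})
        GROUP BY d.doc_id
        ORDER BY score DESC, d.url
    """
    return conn.execute(sql, terms).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_document(conn, "/index.html", "html",
             "Search our docs and search our blog. Search everything.")
add_document(conn, "/guide.pdf", "pdf", "A guide to database search.")
add_document(conn, "/slides.ppt", "ppt", "Quarterly results.")
print(search(conn, "search database"))
```

Keeping terms in their own table keeps each postings row narrow (two integer ids and a count), and extras such as a positions column or per-field weights can be added to `postings` later without touching `documents`.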