I have recently created a search system at my company using the Regain Search at regain - Home to search through all support tickets. I run some queries in SQL to extract my data, export to Excel, import to Access, run some additional queries, create a report, export the report to .txt files, and use a file splitter to split the HTML text into pages before handing it off to the indexer. This results in 90,000+ pages that are indexed and I only update about 500 or so daily as the rest are closed support incidents which do not change. The result is a great search functionality but it is a very tedious and manual process to update records.
I'd like to revamp this to an enterprise type search that updates automatically each night. I have built a SQL 2005 server and linked it to the production server with which I have read access to. I'm trying to figure out the best approach to take from here. The tables are essentially as follows:
So do I run a "select into" statement to make one large non-normalized table that can be indexed and searches run against? Are there some examples or books on this? Will I need reporting services? The most important thing to maintain is the fast query speed. Right now all queries return almost instantly. Any help is greatly appreciated.
So do I run a "select into" statement to make one large non-normalized table that can be indexed and searches run against?
No, stick with what works. Regain is effectively a front-end to Lucene. Lucene is a full-text indexing engine, which is essentially your "large non-normalized table". The Lucene FAQ has some info on how to index a database. You basically want to have Lucene pull data in from your database.
I'd like to revamp this to an enterprise type search that updates automatically each night.
Why each night? Create a table that logs changes. Triggers update the log table. And a script pings it every few minutes to pull changes into the Lucene store.
Thanks for the reply. So if the Lucene engine is the way to go above SQLs indexing capabilities, I will stick with it. There must be a way to simplify my process though and possibly automate it. Are there any examples that I can look at? Unfortunately, for our production database I only have read access and am not allowed to read from it during business hours for performance reasons. It needs to poll for changes after hours.