If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Oracle > Fast search

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 12-14-09, 18:00
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
Fast search

Hi,

We have an user registration system, which handles:
the registration of new users
the update of existing ones
and the search for already existing ones.

In total there are already around 2million registered users, and the search( like %searchterm%) among the existing customers is very slow(10-15 min).


I would like to have your opinion on the proper design and implementation ideeas for such a situation, and not necessarily on the existing implementation which might be flawed.


The customers have:
1) a name (textual, up to 300 characters, however most of the records have around 30-100 caracters)
2) an andress.
I'll be happy if we could search for already existing customers based on name only within seconds.

Currently we have an index on the name field, but the index , as expected doesn't work on searches that start with a wildacard '%searchterm'


I assume this is a classical issue, and I would like to know if there are any ideeas on how to provide for a fast search of a search such as
user_name like %new% with the end goal to avoid having twice the same user.



I did my homework and started to search the web, thinking mainly at the google like solution, and I came across a few ideeas
1) the inverted indexing .
However this will solve the issue of multiple words as search criteria, and as in our case we are talking about user names which can be very unique, I assume that it may not map exactly to the model of a search engine, that searches for words(out of the dictionary mainly) without using wildcards.

2) Then I thought of oracle search text
3) and or a reverse index.
I do not have that much experience with Oracle search text though, but I'm willing to invest time and study/learn about the solution that you will recommend.

I have posted here because we are currently using Oracle:
Operating system Sun Solaris 9.
RDBMS Oracle 10.2.0.3.
Application server BEA WebLogic Server 8.1

I have as well checked a bit Apache Lucene, however I couldn't find any db information about the core design, the only info were the javadocs.

Or maybe is it too much to ask for seconds while searching for username LIKE %searchterm% ? Is it feasible?


Thank you!

Last edited by jimmyy; 12-14-09 at 18:05.
Reply With Quote
  #2 (permalink)  
Old 12-14-09, 18:13
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
I forgot to mention that the user names are not only english, they can be in other languages as well, but mainly latin characters, a very few cyrilics, but no chinese/japanese nor arabic characters.
Reply With Quote
  #3 (permalink)  
Old 12-14-09, 20:46
anacedent anacedent is offline
Registered User
 
Join Date: Aug 2003
Location: Where the Surf Meets the Turf @Del Mar, CA
Posts: 6,151
It should NOT take multiple minutes to do a Full Table Scan on a table with only 2 million rows.

Code:
16:41:44 SQL> alter system flush shared_pool;  

System altered.

16:43:01 SQL> select sysdate from dual;

SYSDATE
-------------------
2009-12-14 16:43:18

16:43:18 SQL> select count(*) from tsttbl;

  COUNT(*)
----------
    603018
16:43:21 SQL>
Something else must be slowing down the application.


ALTER SESSION SET SQL_TRACE=TRUE;
-- invoke the slow SQL code
ALTER SESSION SET SQL_TRACE=FALSE;

now find the trace file within ./udump folder
tkprof <trace_file.trc> trace_results.txt explain=<username>/<password>

post the contents of trace_results.txt back here
__________________
You can lead some folks to knowledge, but you can not make them think.
The average person thinks he's above average!
For most folks, they don't know, what they don't know.
Reply With Quote
  #4 (permalink)  
Old 01-11-10, 11:14
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
Hello, and Happy New Year.

I have manged to get the tkprof file. Please find it attached.
I have started checking a bit on the internet to learn how to read it, but I must admit it is not very easy, and I couldn't find any proper documentation.

Jimmyy
Attached Files
File Type: txt trace_results_4dbforums.txt (92.5 KB, 70 views)
Reply With Quote
  #5 (permalink)  
Old 01-18-10, 11:40
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
Hi,

anacedent helped me to identify the most time consuming part of the SQL query of this ORacle text serach.

SELECT/*+DYNAMIC_SAMPLING(0) INDEX(T "DR$TRI_TO_FTS_IDX$X")*/ DISTINCT
TOKEN_TEXT FROM "TRI"."DR$TRI_TO_FTS_IDX$I" T WHERE TOKEN_TEXT LIKE
:lkexpr ESCAPE '\' and (TOKEN_TYPE IN (0, 4, 6, 604) OR (TOKEN_TYPE
BETWEEN 16 AND 74) OR (TOKEN_TYPE
BETWEEN 616 AND 674))

Code:
call       count    cpu     elapsed     disk      query       current    rows
------- ------  ------ ---------- ---------- --------- -------- -----
Parse       2      0.00       0.00          0          0          0         0
Execute     2      0.01       0.00          0          0          0         0
Fetch        2     34.26     106.35      67399      67412       0        178
------- ------  ------ ---------- ---------- -------- --------- -----
total         6      34.27     106.36      67399      67412       0       178
I have searched the internet to find how to interpret it, but couldn't find much help, except for general advices that the buffering is not optimal, because there are too many disk reads.

I have found the details of the DR$TRI_TO_FTS_IDX$X :

Index name: DR$TRI_TO_FTS_IDX$X (this index is on the column TOKEN_TEXT)
Index Type: Normal
Uniqueness: NonUnique
Status: Valid
Table: DR$TRI_TO_FTS_IDX$I"
TableType: Table
Buffer Pool: Default
Partitioned: No
Temporary: No
Initial Transactions: 2
Max Transactions: 255
Initial extent size: 65536
Minimum extents: 1
Percent Free: 10
Degree: 1
Instances: 1
BLevel: 2
Leafblocks: 35205
Distinct keys:6900716
Avg LeafBlocks per Key: 1
Avg DataBlocks per Key: 1
Clustering factor: 5995030
NumRows:6900716
Samplesize: 6900716
Generated: N
Join Index: No
Size in MB: 288
Number Extents: 107

I do not know where and how to continue the investigation on improving the response time for this Oracle Full Text Search. Any help is appreciated.

Last edited by jimmyy; 01-18-10 at 11:46.
Reply With Quote
  #6 (permalink)  
Old 01-20-10, 13:57
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
Any ideeas on where to digg next?

Jimmyy
Reply With Quote
  #7 (permalink)  
Old 02-06-10, 04:42
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
Hi,

I was thinking of trying to cluster this index DR$TRI_TO_FTS_IDX$X
Could it improve the performance?
Reply With Quote
  #8 (permalink)  
Old 02-14-10, 10:25
jimmyy jimmyy is offline
Registered User
 
Join Date: Mar 2002
Posts: 16
Even though it looks like this subject is not very interesting, I'll still post here, because hopefully I'll find a solution and somebody else might benefit from it.


I have researched and found the following.
1) In almost all the examples the Oracle Full Text Search(as in this example: ...where contains(title,'Oracle')) is not used with wildcards. So I wonder if that's the best way to go when it comes to searching with wildcards. I'll have to check further, as so far I have seen only examples.

2) The index is a CTXSYS.CONTEXT type index on a clob column. This clob column named "full_text_search" is a concatenation of all the text fields in the main table.
I'll try to see if the performance is improved by limiting the number of columns concatenated, because the Full Name column is the most important one. I'll need to read more about this type of index to better understand it.

3) I need to read more about the "lexer" parameter from the creation of the index, for the moment I do not know exactly how it works, I have seen that it helps group the text into categories, and I guess it is much more than this.

I have found some material that I'll read from here and hopefully I'll be more knowledgable and manage to find the issue.


Basically as it is shown in the tkprof result attached in a previous post, the index is used and not a full table scan, so I assume the issue is with the full text search index.

Last edited by jimmyy; 02-14-10 at 10:28.
Reply With Quote
  #9 (permalink)  
Old 02-15-10, 06:07
mishaalsy mishaalsy is offline
Registered User
 
Join Date: Aug 2009
Posts: 262
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On