Results 1 to 9 of 9

Thread: Fast search

  1. #1
    Join Date
    Mar 2002

    Unanswered: Fast search


    We have an user registration system, which handles:
    the registration of new users
    the update of existing ones
    and the search for already existing ones.

    In total there are already around 2million registered users, and the search( like %searchterm%) among the existing customers is very slow(10-15 min).

    I would like to have your opinion on the proper design and implementation ideeas for such a situation, and not necessarily on the existing implementation which might be flawed.

    The customers have:
    1) a name (textual, up to 300 characters, however most of the records have around 30-100 caracters)
    2) an andress.
    I'll be happy if we could search for already existing customers based on name only within seconds.

    Currently we have an index on the name field, but the index , as expected doesn't work on searches that start with a wildacard '%searchterm'

    I assume this is a classical issue, and I would like to know if there are any ideeas on how to provide for a fast search of a search such as
    user_name like %new% with the end goal to avoid having twice the same user.

    I did my homework and started to search the web, thinking mainly at the google like solution, and I came across a few ideeas
    1) the inverted indexing .
    However this will solve the issue of multiple words as search criteria, and as in our case we are talking about user names which can be very unique, I assume that it may not map exactly to the model of a search engine, that searches for words(out of the dictionary mainly) without using wildcards.

    2) Then I thought of oracle search text
    3) and or a reverse index.
    I do not have that much experience with Oracle search text though, but I'm willing to invest time and study/learn about the solution that you will recommend.

    I have posted here because we are currently using Oracle:
    Operating system Sun Solaris 9.
    RDBMS Oracle
    Application server BEA WebLogic Server 8.1

    I have as well checked a bit Apache Lucene, however I couldn't find any db information about the core design, the only info were the javadocs.

    Or maybe is it too much to ask for seconds while searching for username LIKE %searchterm% ? Is it feasible?

    Thank you!
    Last edited by jimmyy; 12-14-09 at 18:05.

  2. #2
    Join Date
    Mar 2002
    I forgot to mention that the user names are not only english, they can be in other languages as well, but mainly latin characters, a very few cyrilics, but no chinese/japanese nor arabic characters.

  3. #3
    Join Date
    Aug 2003
    Where the Surf Meets the Turf @Del Mar, CA
    Provided Answers: 1
    It should NOT take multiple minutes to do a Full Table Scan on a table with only 2 million rows.

    16:41:44 SQL> alter system flush shared_pool;  
    System altered.
    16:43:01 SQL> select sysdate from dual;
    2009-12-14 16:43:18
    16:43:18 SQL> select count(*) from tsttbl;
    16:43:21 SQL>
    Something else must be slowing down the application.

    -- invoke the slow SQL code

    now find the trace file within ./udump folder
    tkprof <trace_file.trc> trace_results.txt explain=<username>/<password>

    post the contents of trace_results.txt back here
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  4. #4
    Join Date
    Mar 2002
    Hello, and Happy New Year.

    I have manged to get the tkprof file. Please find it attached.
    I have started checking a bit on the internet to learn how to read it, but I must admit it is not very easy, and I couldn't find any proper documentation.

    Attached Files Attached Files

  5. #5
    Join Date
    Mar 2002

    anacedent helped me to identify the most time consuming part of the SQL query of this ORacle text serach.

    :lkexpr ESCAPE '\' and (TOKEN_TYPE IN (0, 4, 6, 604) OR (TOKEN_TYPE
    BETWEEN 616 AND 674))

    call       count    cpu     elapsed     disk      query       current    rows
    ------- ------  ------ ---------- ---------- --------- -------- -----
    Parse       2      0.00       0.00          0          0          0         0
    Execute     2      0.01       0.00          0          0          0         0
    Fetch        2     34.26     106.35      67399      67412       0        178
    ------- ------  ------ ---------- ---------- -------- --------- -----
    total         6      34.27     106.36      67399      67412       0       178
    I have searched the internet to find how to interpret it, but couldn't find much help, except for general advices that the buffering is not optimal, because there are too many disk reads.

    I have found the details of the DR$TRI_TO_FTS_IDX$X :

    Index name: DR$TRI_TO_FTS_IDX$X (this index is on the column TOKEN_TEXT)
    Index Type: Normal
    Uniqueness: NonUnique
    Status: Valid
    Table: DR$TRI_TO_FTS_IDX$I"
    TableType: Table
    Buffer Pool: Default
    Partitioned: No
    Temporary: No
    Initial Transactions: 2
    Max Transactions: 255
    Initial extent size: 65536
    Minimum extents: 1
    Percent Free: 10
    Degree: 1
    Instances: 1
    BLevel: 2
    Leafblocks: 35205
    Distinct keys:6900716
    Avg LeafBlocks per Key: 1
    Avg DataBlocks per Key: 1
    Clustering factor: 5995030
    Samplesize: 6900716
    Generated: N
    Join Index: No
    Size in MB: 288
    Number Extents: 107

    I do not know where and how to continue the investigation on improving the response time for this Oracle Full Text Search. Any help is appreciated.
    Last edited by jimmyy; 01-18-10 at 11:46.

  6. #6
    Join Date
    Mar 2002
    Any ideeas on where to digg next?


  7. #7
    Join Date
    Mar 2002

    I was thinking of trying to cluster this index DR$TRI_TO_FTS_IDX$X
    Could it improve the performance?

  8. #8
    Join Date
    Mar 2002
    Even though it looks like this subject is not very interesting, I'll still post here, because hopefully I'll find a solution and somebody else might benefit from it.

    I have researched and found the following.
    1) In almost all the examples the Oracle Full Text Search(as in this example: ...where contains(title,'Oracle')) is not used with wildcards. So I wonder if that's the best way to go when it comes to searching with wildcards. I'll have to check further, as so far I have seen only examples.

    2) The index is a CTXSYS.CONTEXT type index on a clob column. This clob column named "full_text_search" is a concatenation of all the text fields in the main table.
    I'll try to see if the performance is improved by limiting the number of columns concatenated, because the Full Name column is the most important one. I'll need to read more about this type of index to better understand it.

    3) I need to read more about the "lexer" parameter from the creation of the index, for the moment I do not know exactly how it works, I have seen that it helps group the text into categories, and I guess it is much more than this.

    I have found some material that I'll read from here and hopefully I'll be more knowledgable and manage to find the issue.

    Basically as it is shown in the tkprof result attached in a previous post, the index is used and not a full table scan, so I assume the issue is with the full text search index.
    Last edited by jimmyy; 02-14-10 at 10:28.

  9. #9
    Join Date
    Aug 2009

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts