Results 1 to 12 of 12
  1. #1
    Join Date
    Mar 2004
    Posts
    370

    Question Unanswered: LIKE is considered awful from performance point of view?

    Hi all,
    We here had a conversation with a collegue that recommand not to use an index on varchar2 columns for searching. He think that Oracle index is not suitable for LIKE operations on column and that is better to index these varchar2 things (mapped to String in our java app) by a java search API called lucene.
    Do you have any idea or related experience? It will be my pleasure to learn from you...
    -Thanks

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Your colleague is insane - or to put it more mildly, mistaken ;-)

    How can "Lucene" possibly search for data in an Oracle database faster than Oracle can? Assuming Lucene can work with Oracle at all, it will be a layer on top of Oracle, i.e. an overhead. Your Java app will first full scan the table for all the data, and then search it. That can't be faster than Oracle searching it for you.

    Indexing is fine on VARCHAR2 columns. LIKE can use indexes for "leading edge" searches (e.g. col like 'abc%'). It will perform a full table scan for other searches (e.g. col like '%xyz') - which is OK if your table isn't too big and/or you don't need to do it too often.

    Oracle has something called Oracle Text to handle indexing of sigificantly large amounts of text.

  3. #3
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    No his colleague is not insane and might be correct after all.

    Lucene is a full-text search engine and it will create an index that will allow quick access to the indexed information (like Google or other search engines).

    It will definitely not cause a full table scan if setup correctly.

    You are right as far as Oracle Text is concerned. It can do the same, but AFAIK it is either a separate (costly) license, or only available with the enterprise edition.

    If you are only querying single column values than a proper indexed table might actually suit your needs. But as long as we don't know what and how you are searching your data (and how that looks like), there is no way we can give a decent recommendation.

  4. #4
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >Lucene is a full-text search engine and it will create an index
    >It will definitely not cause a full table scan if setup correctly.
    Please explain in detail how an index will be created WITHOUT doing a FTS?
    How does Lucerne keep its index current/accurate with INSERTed or UPDATEd data?
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  5. #5
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Quote Originally Posted by shammat
    You are right as far as Oracle Text is concerned. It can do the same, but AFAIK it is either a separate (costly) license, or only available with the enterprise edition.
    Oracle Text is bundled with every Oracle editions : from Oracle XE to Enterprise edition.

    As andrewst said, this is the way to go if LIKE queries are too slow.

    Oracle Text Application Developer's Guide

    Oracle Text Reference

    HTH & Regards,

    rbaraer
    ORA-000TK : No bind variable detected... Shared Pool Alert code 5 - Nuclear query ready .

  6. #6
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by RBARAER
    Oracle Text is bundled with every Oracle editions : from Oracle XE to Enterprise edition.
    Ah! Good to know. I'm pretty sure this wasn't the case with 8i because a customer once considered it, and dropped it due to the additional license cost.

    But still: Lucene is a viable option for implementing full text search.

  7. #7
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Quote Originally Posted by shammat
    But still: Lucene is a viable option for implementing full text search.
    I just had a quick glance at Lucene here :
    How can I use Lucene to index a database?

    Connect to the database using JDBC and use an SQL "SELECT" statement to query the database. Then create one Lucene Document object per row and add it to the index. You will probably want to store the ID column so you can later access the matching items. For other (text) columns it might make more sense to only index (not store) them, as the original data is still available in your database.
    Manually SELECT new data through JDBC so as to have them indexed. Then search the index, find the IDs quickly, OK, but then query the database so as to get the actuel data. This does NOT seem scalable to me.

    Lucene seems a very good option if you want to index OS files, but not a good solution for text data stored in a database. If it were a DBMS without full text indexing, then OK, but Oracle + Oracle Text is definitely a better option IMHO.

    Regards,

    rbaraer
    ORA-000TK : No bind variable detected... Shared Pool Alert code 5 - Nuclear query ready .

  8. #8
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Quote Originally Posted by shammat
    But still: Lucene is a viable option for implementing full text search.
    ... of flat files maybe. You have answered Anacedent's valid question: how is Lucene going to search an Oracle database without performing a full table scan on the data. Does it go directly to the OS files underlying the database? If so, how can it extract any sensible meaning from the data unless it has an Oracle engine clone built into it?

    I imagine more likely works like this:
    1) application loads all data from table into memory, or into a flat file
    2) application calls Lucene process to search data in memory/file

    If the amount of data is small then this seems pointless; if large then it seems very inefficient.

  9. #9
    Join Date
    Mar 2004
    Posts
    370

    Thumbs up

    WOW! Thanks friends
    This thread is really useful for me and of course my colleague I learned a lot, and yep as I guessed lucene has to do FTS at least first time it goes to build indexes. (and whenever a DML is occured?!).

    Note for Tony Andrews:
    Our queries are not all "leading edge", we will have some queries for searching in our varchar2(4000) column for a phrase by LIKE '%phrase%'; Your recommandation about using Oracle Text is viable for me but our product will be shipped not only on Oracle but also SQL Server and MySQL (and maybe other DBMSs at future) so we want a solution with minimum affect in app code that works reasonably on all of these.

    Do you suggest any particular tip or trick to enhance the performance withot using Oracle Text and by regular B-tree indexes?

    -Thanks again for all great posts...

  10. #10
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    I believe he definitively wants Lucene, since he is not looking for a particular option especific to a database. Lucene is *generic* on this aspect, as it can be used with any database.

    Don't expect this *generic* solution by efficient though.

  11. #11
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Quote Originally Posted by Ach
    (and whenever a DML is occured?!).
    In a perfect world, yes.

    As I see it, if you use Lucene, you will have to maintain the indexes externally, so you should mark every record to be indexed with a "To be indexed" flag (after each insert/update), and then from Lucene SELECT the records with this flag set to "YES" every n minutes, index them and then update all of them to put the flag to "NO". Now in order to deal with deleted records, you could put them in a separate table, and as you do for inserts/updates, periodically SELECT this table, tell Lucene to remove them from the index, and then empty this table...

    As you must see, all of this is a LOT of overhead for something that, as I told you earlier and as Tony emphasized, will certainly NOT be efficient/scalable. Even searches won't be that efficient since once Lucene has given the ids to you, you have to query the database on all these ids to retrieve the actual data...

    I would be very interested in returns on the efficiency of Lucene (in such a scenario, with data within the database) compared to LIKE searches. If the benefit is not that much (say 20% faster searches), the whole manual update work/overhead would really make this solution pointless IMO.

    If you limit the types of LIKE searches that can be performed, you can also think about using a FUNCTION INDEX, such as an index on SUBSTR(stringfield, 1, 5) to index the first 5 letters of a string field... But again this is a database-specific solution.

    HTH & Regards,

    rbaraer
    ORA-000TK : No bind variable detected... Shared Pool Alert code 5 - Nuclear query ready .

  12. #12
    Join Date
    Mar 2004
    Posts
    370

    Thumbs up

    Thanks friends,
    You are right RBARAER, in overall it will be more expensive to do this sort of job by lucene. But I should think twice about another solutions on SQL Server (yep I know it HAS an indexing service like Oracle Text) and also on Mysql that AFAIK does not have similar full text service.

    Also I will consider FBI too, thanks again for your time and help.

    -Good luck.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •