Results 1 to 4 of 4
  1. #1
    Join Date
    Jul 2007
    Posts
    3

    Red face Unanswered: Full Text Index not working when populated from nvarchar. Bug?

    Hi, I was wondering if any SQL Server gurus out there could help me...

    I have a table I'm trying to apply a full text catalog to, however no results are ever returned due to the text column being cataloged being of varbinary(max) that's being populated from a converted nvarchar(max) value.

    To re-create the problem quickly...

    If I populate the column via
    CONVERT(varbinary(max), 'test text')
    then there is no problem, I get results as expected.

    However if I populate the column via
    CONVERT(varbinary(max), CAST('test text' as nvarchar(max)))
    no results are ever returned.

    Is this a bug with SQL Server 2005 Full Text Indexing? I'm happily creating full text catalogs when an nvarchar is not getting converted into a varbinary.

    I'm setting the Document Type column to '.html' (I've tried changing this to '.txt' in case it was a fault with the html ifilter but the problem persists so I believe I can rule this out).

    The reason I need to convert an nvarchar to varbinary is that the table holds multi-lingual text and I'm adding a html meta tag <META NAME="MS.LOCALE" CONTENT="ES"> to the beginning in order for the full text indexing word breaker to select the correct language to catalog the text with. The aim being to provide more relevant searches in users native languages (I've read a few articles that describe this technique, but it's the first time I've tried to apply it).

    Any pointers / suggestions would be greatly appreciated. Cheers,
    Gavin.

    Below is a T-SQL script you can run to demonstrate the effect I'm experiencing...

    Code:
    -- Create test database
    CREATE DATABASE FullTextTest
    GO
    USE FullTextTest
    GO
    
    -- Create test data table
    CREATE TABLE TestTable
    (
    pk UNIQUEIDENTIFIER NOT NULL CONSTRAINT tablePK PRIMARY KEY,
    varbinarycol VARBINARY(MAX),
    documentExtension VARCHAR(5),
    )
    GO
    
    -- The below single entry WILL BE FOUND (the text source is being entered directly)
    INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(), CONVERT(VARBINARY(MAX),'<META NAME="MS.LOCALE" CONTENT="EN">test entry 1'), '.html')
    
    -- The bellow two entries below WILL NOT BE FOUND (the text source is taken from an NVARCHAR(MAX) value)
    INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(), CONVERT(VARBINARY(MAX), CAST('<META NAME="MS.LOCALE" CONTENT="EN">test entry 2' AS NVARCHAR(MAX))), '.html')
    INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(), CONVERT(VARBINARY(MAX), CAST('<META NAME="MS.LOCALE" CONTENT="EN">test entry 3' AS NVARCHAR(MAX))), '.html')
    GO
    
    -- Create the full text catalog
    sp_fulltext_database 'enable'
    GO
    CREATE FULLTEXT CATALOG TEST AS DEFAULT
    GO
    CREATE FULLTEXT INDEX ON TestTable (varbinarycol TYPE COLUMN documentExtension LANGUAGE 1033)
    KEY INDEX tablePK
    GO
    
    -- NOTE: You might need to give the catalog a chance to build before running the script below.
    
    -- Now do a search that SHOULD RETURN 3 ROWS of data, but ONLY 1 ROW IS RETURNED
    SELECT CAST(varbinarycol AS NVARCHAR(MAX)) FROM TestTable WHERE CONTAINS(varbinarycol, 'test')

  2. #2
    Join Date
    Aug 2002
    Location
    Scotland
    Posts
    1,578

  3. #3
    Join Date
    Jul 2007
    Posts
    3
    Hi Satya,

    I know varbinary is supported by FT, but if you run my example code you'll see it has problems when an nvarchar is being converted into varbinary. Those rows of data don't show us when you do a CONTAINS or FREETEXT search.

    This is my problem as I need to take nvarchars in order allow full multi-lingual functionality support.

    Regards,
    Gavin.

  4. #4
    Join Date
    Jul 2007
    Posts
    3
    I've also tried adding the following extra meta tag - <META http-equiv="Content-Type" content="text/html; charset=utf-16">

    But this seemed to break the Full Text Index even further. If applied to the example T-SQL I give above then even the non-nvarchar input is no longer returned:

    INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(), CONVERT(VARBINARY(MAX),'<META http-equiv="Content-Type" content="text/html; charset=utf-16"><META NAME="MS.LOCALE" CONTENT="EN">test entry 1'), '.html')

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •