  1. #1
    Join Date
    Jun 2007
    Posts
    12

    Unanswered: String tokenizer

    I have searched the forums and the internet, but can only find Oracle examples and nothing for SQL Server.

    Does anyone here have an example of a Microsoft SQL Server compatible string tokenizer, or can you point me in the right direction?

    The list I am parsing does not have a set number of levels; some entries may have 1 level, some may have 50.

    Any help would be beneficial.

  2. #2
    Join Date
    Jun 2009
    Posts
    26
    I don't believe SQL Server has any built-in tokenizing functions. In fact, its string manipulation functions are quite primitive.
    I'm guessing your only real option would be to create a custom function in the database that tokenizes the string. Your SQL could then call the tokenizer function as required.

  3. #3
    Join Date
    Nov 2003
    Location
    denver
    Posts
    11
    SQL Server does not have the same capabilities as Oracle, but we can create our own functions using SQL Server user-defined functions.

    SQL Server currently has two functions for locating text within a string: CHARINDEX, which finds a literal substring, and PATINDEX, which matches a pattern. The example below uses CHARINDEX.

    Example:
    USE AdventureWorks;
    GO
    SELECT CHARINDEX('arm', Title)
    FROM Production.Document
    WHERE DocumentID = '1';
    GO

    --Returns
    7
    The returned position can then be passed to the SUBSTRING function to extract the part of the string up to that position.
    Example:

    SELECT Substring (Title, 1, CHARINDEX('arm', Title))
    FROM Production.Document
    WHERE DocumentID = '1';


    Cheers
    Shailesh Patangay

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    What exactly do you mean by a "tokenizer"? Are you thinking yacc/lex or just keyword recognition?

    Microsoft and their predecessors have repeatedly avoided the idea of making Transact-SQL into a full blown scripting language. I think that was a good choice, since it helps keep a clear separation between the database and the applications that use the database.

    Microsoft produced a novel work-around for this problem with the CLR. The CLR allows developers to actually host end-user code (VB, C#, etc) within the SQL Server. This is very much a "mixed bag" in my opinion, but a good solution for preserving the distinction while allowing application code to run on the SQL Server.

    If you can explain what "tokenizing" means to you, I bet that there's a clean and supportable way to provide it.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Jun 2007
    Posts
    12
    spatan - Correct me if I am wrong, but I believe that will only give me the first occurrence and not the Nth.
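
    On the Nth occurrence: CHARINDEX accepts an optional third argument, the position at which to start searching, so later occurrences can be reached by searching again from just past the previous hit. A minimal sketch, assuming a made-up sample string; @s, @n, @pos and @i are illustrative names, not anything from the posts above:

    ```sql
    -- Hypothetical sample string; all variable names are illustrative.
    DECLARE @s varchar(200), @n int, @pos int, @i int;
    SET @s = '** a ** b ** c ** d';
    SET @n = 3;      -- which occurrence of '**' to look for
    SET @pos = 0;
    SET @i = 0;

    WHILE @i < @n
    BEGIN
        -- Search again, starting one character past the previous match.
        SET @pos = CHARINDEX('**', @s, @pos + 1);
        IF @pos = 0 BREAK;   -- fewer than @n occurrences in the string
        SET @i = @i + 1;
    END

    SELECT @pos AS NthPosition;  -- 0 means the Nth occurrence was not found
    ```

    The loop is plain procedural T-SQL, so it should also work inside a user-defined function if the lookup needs to be called from a query.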

    Pat - Essentially I am trying to gain access to parts of a string. The format is as follows:

    ** <name1> ** <date1> <body1> ** <name2> ** <date2> <body2> ...

    Usually, if I am doing this in a programming language, I would use a string tokenizer to gain access to the values. I would know that the first token has the name and the second has the date and body.

    Essentially I am looking to split this up to gain access to the information within. If you know of a better way to do this or to get at this information, please let me know. I am open to alternate solutions.

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I know a fair number of programming languages (dozens), but the only ones that I know of that incorporate "string tokenizers" into the language are SNOBOL, WADL, and the Georgia Tech version of RATFOR. There are a number of other languages that sneak equivalent functionality in via RegExp parsers (like AWK and Perl).

    Your example looks a little bit like XML, but it appears to be missing the close tags. Many databases can parse XML nicely. Is there any chance that you're trying to parse XML? It would be formatted more like:
    Code:
    <record><name>Pat Phelan</name><date>Only ladies</date></record>
    -PatP

  7. #7
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by Pat Phelan
    I know a fair number of programming languages (dozens), but the only ones that I know of that incorporate "string tokenizers" into the language are SNOBOL, WADL, and the Georgia Tech version of RATFOR.
    Java has one as well

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'm pretty sure that Java does that via a JAR library, not as part of the language itself.

    -PatP

  9. #9
    Join Date
    Jun 2007
    Posts
    12
    The programming language I was referencing was Java, and it is part of the language; you just have to include it in the header ... Anyway, I didn't mean to mislead you with the <>. The items in <> are just placeholders for data, so instead of putting a date I just put <date1>. An example with real data would look like the following:

    ** Joe Smith (js) ** Monday, January 26, 2009 12:04 PM Emailed listing over to customer

    and then the format is repeated with a different user and date and body depending on what has been put in.

    Hopefully that makes more sense. I may have found a solution, but will try it out before I rule on that. I am still open to suggestions.
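
    One possible way to split that format in plain T-SQL is to walk the string from one '**' delimiter to the next and store each piece with its ordinal. This is only a sketch under assumptions: the delimiter never appears inside a name or body, any text before the first '**' is ignored, and all variable, table and column names are made up for illustration.

    ```sql
    -- Sample value copied from the post above; every name here is illustrative.
    DECLARE @s varchar(8000), @delim varchar(10);
    DECLARE @start int, @next int, @ord int;
    DECLARE @tokens TABLE (ord int, token varchar(8000));

    SET @s = '** Joe Smith (js) ** Monday, January 26, 2009 12:04 PM Emailed listing over to customer';
    SET @delim = '**';
    SET @start = CHARINDEX(@delim, @s);   -- position of the first delimiter, 0 if none
    SET @ord = 0;

    WHILE @start > 0
    BEGIN
        SET @start = @start + LEN(@delim);           -- first character after the delimiter
        SET @next = CHARINDEX(@delim, @s, @start);   -- next delimiter, 0 if there is none
        SET @ord = @ord + 1;

        -- Take the text up to the next delimiter, or to the end of the string.
        INSERT INTO @tokens (ord, token)
        SELECT @ord,
               LTRIM(RTRIM(SUBSTRING(@s, @start,
                   CASE WHEN @next = 0 THEN LEN(@s) ELSE @next - @start END)));

        SET @start = @next;
    END

    -- Odd-numbered tokens are names; the even-numbered token after each holds date and body.
    SELECT n.token AS name, d.token AS date_and_body
    FROM @tokens AS n
    JOIN @tokens AS d ON d.ord = n.ord + 1
    WHERE n.ord % 2 = 1;
    ```

    The sketch leaves the date and body together in one column, because the sample does not show a separator between them; splitting them would need a further rule, such as cutting after the AM/PM marker.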

  10. #10
    Join Date
    Sep 2009
    Posts
    1
    This site has code for SQL Server to Tokenize a String

    SpatialDB Advisor: String Tokenizer for SQL Server 2008 written in TSQL
