Thread: Count HTML tags

  1. #1
    Join Date
    Apr 2012
    Posts
    1

    Unanswered: Count HTML tags

    Hello all, I have lots of web pages coded in HTML, and I want to count the HTML tags in each page and put the counts in an array.

    The result would look like this:

    p ; 33
    a ; 57
    .. ; ..

    I don't want to do this by listing all the HTML tags beforehand and checking whether each one appears in the page.

    Thanks for your ideas.

  2. #2
    Join Date
    Sep 2009
    Location
    Ontario
    Posts
    1,057
    Provided Answers: 1
    Two questions:
    Is <p> </p> two "p" or one "p" and one "/p"?
    Are all tags in the same case, and if not, do you want them all reported as lower case?

    Code:
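    #!/bin/bash
    # Pass 1: read "$1" one character at a time and write each tag name to tagfile.
    # Closing tags are recorded separately as "/p"; comments show up as "!--".
    tagname=""
    found=n
    : > tagfile                           # start with an empty tag list
    while IFS= read -r -n1 character      # bash-specific: one character per read
    do
      character=${character,,}            # fold to lower case (bash 4+)
      if [ "$character" = "<" ]; then     # a tag starts here; do not store the "<"
        tagname=""
        found=y
      elif [ "$found" = y ]; then
        case "$character" in
          ">"|" "|""|$'\t'|$'\r')         # ">" or whitespace ends the name ("" is a newline)
            [ -n "$tagname" ] && echo "$tagname" >> tagfile
            tagname=""
            found=n
            ;;
          *)
            tagname=$tagname$character
            ;;
        esac
      fi
    done < "$1"

    # Pass 2: sort the names and count each run of identical lines.
    sort tagfile > tagfile.sorted
    count=0
    prev=""
    first=y
    while IFS= read -r tag
    do
      if [ "$first" = y ]; then
        prev=$tag
        first=n
      fi
      if [ "$tag" != "$prev" ]; then
        echo "$prev ; $count"
        count=0
        prev=$tag
      fi
      count=$((count + 1))
    done < tagfile.sorted
    [ "$first" = n ] && echo "$prev ; $count"   # print the last group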
    How you read a file one character at a time depends on which shell you are using. The input also contains shell metacharacters (<, >, & and quotes), so you may have to find workarounds such as quoting every variable.
    Also, although it is probably bad form, HTML allows a tag to be split across lines by carriage returns and line feeds.
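
If a character loop turns out to be too slow or too shell-specific, the same count can probably be had from a pipeline. This is only a sketch under some assumptions: `count_tags` is a made-up function name, `grep -o` is not in POSIX (GNU and BSD grep both have it), and the pattern only looks at the name right after each "<", so it ignores comments and doctype lines but can be fooled by a stray "<" followed by a letter inside scripts or text.

```shell
#!/bin/sh
# count_tags FILE  (hypothetical helper name)
# Prints one "name ; count" line per distinct tag name found in FILE.
# Closing tags are reported separately, e.g. "/p".
count_tags() {
  # grep -o prints only the matched part: "<" or "</" plus a tag name.
  # Assumption: grep supports -o (GNU and BSD grep do; POSIX does not require it).
  grep -o '</\{0,1\}[A-Za-z][A-Za-z0-9-]*' "$1" |
    cut -c2- |                           # drop the leading "<"
    tr '[:upper:]' '[:lower:]' |         # report every name in lower case
    sort |
    uniq -c |                            # count runs of identical names
    awk '{ print $2 " ; " $1 }'          # reorder to "name ; count"
}
```

Call it as `count_tags page.html`. Because only the name directly after "<" is matched, a tag whose attributes continue on the next line is still counted once. To count `<p>` and `</p>` under the same name, one option is to add `sed 's|^/||'` after the `cut` step.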
