Interface HTMLParser


public interface HTMLParser
A front end to a DOM parser that can handle HTML.
Since:
1.5.2
Author:
Russell Gold, Bernhard Wagner
  • Method Summary

    Modifier and Type   Method                                                        Description
    String              getCleanedText(String string)                                 Removes any string artifacts placed in the text by the parser.
    void                parse(URL baseURL, String pageText, DocumentAdapter adapter)  Parses the specified text string as a Document, registering it in the HTMLPage.
    boolean             supportsForceTagCase()                                        Returns true if this parser supports forcing the upper/lower case of tag and attribute names.
    boolean             supportsParserWarnings()                                      Returns true if this parser can display parser warnings.
    boolean             supportsPreserveTagCase()                                     Returns true if this parser supports preservation of the case of tag and attribute names.
    boolean             supportsReturnHTMLDocument()                                  Returns true if this parser can return an HTMLDocument object.
  • Method Details

    • parse

      void parse(URL baseURL, String pageText, DocumentAdapter adapter) throws IOException, SAXException
      Parses the specified text string as a Document, registering it in the HTMLPage. Any error reporting will be annotated with the specified URL.
      Throws:
      IOException
      SAXException
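      To illustrate the parse contract, the following Java sketch assumes a parser that accepts only well-formed XHTML and delegates to the JDK's built-in XML DocumentBuilder. A real HTML front end would not be this strict. SketchDocumentAdapter and XhtmlOnlyParser are hypothetical names: the real DocumentAdapter's methods are not listed on this page, so the sketch assumes only that it can receive a Document. It shows the two declared exceptions passing through to the caller and one possible way of annotating errors with the base URL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in for DocumentAdapter: this sketch assumes only that the
// adapter can receive the parsed Document.
interface SketchDocumentAdapter {
    void setDocument(Document document);
}

// Hypothetical parser that accepts only well-formed XHTML, using the JDK's XML parser.
class XhtmlOnlyParser {
    void parse(URL baseURL, String pageText, SketchDocumentAdapter adapter)
            throws IOException, SAXException {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new SAXException(e);
        }
        // Ignore warnings; surface recoverable and fatal errors as exceptions
        // (this also stops the JDK's default handler from printing them to stderr).
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        InputSource source = new InputSource(new StringReader(pageText));
        source.setSystemId(baseURL.toExternalForm()); // record the page URL as the system ID
        try {
            adapter.setDocument(builder.parse(source));
        } catch (SAXException e) {
            // Annotate the error with the page URL, as the parse() contract describes.
            throw new SAXException(baseURL + ": " + e.getMessage(), e);
        }
    }
}
```

      For example, parsing "<html><body>" with the base URL http://example.com/page.html fails with a SAXException whose message begins with that URL, so the failing page can be identified from the error alone.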
    • getCleanedText

      String getCleanedText(String string)
      Removes any string artifacts placed in the text by the parser. For example, a parser may choose to encode an HTML entity as a special character. This method should convert that character to normal text.
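      A minimal sketch of getCleanedText, assuming (hypothetically) that the parser decodes the &nbsp; entity to U+00A0 NO-BREAK SPACE and that callers want an ordinary space back. Which special characters a real parser introduces depends on the implementation; NbspCleaner is an illustrative name, not a library class.

```java
// Hypothetical cleaner: assumes the parser decoded "&nbsp;" as U+00A0 (NO-BREAK SPACE)
// and that callers want an ordinary space in its place.
class NbspCleaner {
    private static final char NO_BREAK_SPACE = '\u00A0';

    String getCleanedText(String string) {
        if (string == null) {
            return null; // nothing to clean
        }
        // Map every parser-introduced no-break space back to a plain space.
        return string.replace(NO_BREAK_SPACE, ' ');
    }
}
```

      Under this assumption, getCleanedText("Fix\u00A0me") returns "Fix me", and text without the special character is returned unchanged.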
    • supportsPreserveTagCase

      boolean supportsPreserveTagCase()
      Returns true if this parser supports preservation of the case of tag and attribute names.
    • supportsForceTagCase

      boolean supportsForceTagCase()
      Returns true if this parser supports forcing the upper/lower case of tag and attribute names.
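      The two case-handling queries are most useful to a caller deciding whether a requested setting can be honored. The sketch below assumes a hypothetical TagCase setting and falls back to the parser's default behavior when the corresponding capability is missing; TagCaseCapabilities, TagCase, and TagCasePolicy are illustrative names and are not part of the documented interface.

```java
// Hypothetical subset of HTMLParser: only the two case-handling queries.
interface TagCaseCapabilities {
    boolean supportsPreserveTagCase();
    boolean supportsForceTagCase();
}

// Hypothetical caller-side setting for the case of tag and attribute names.
enum TagCase { PRESERVE, FORCE_UPPER, FORCE_LOWER, PARSER_DEFAULT }

class TagCasePolicy {
    // Returns the requested setting if the parser reports support for it;
    // otherwise falls back to PARSER_DEFAULT (leave the parser's own behavior alone).
    static TagCase effective(TagCaseCapabilities parser, TagCase requested) {
        switch (requested) {
            case PRESERVE:
                return parser.supportsPreserveTagCase() ? requested : TagCase.PARSER_DEFAULT;
            case FORCE_UPPER:
            case FORCE_LOWER:
                return parser.supportsForceTagCase() ? requested : TagCase.PARSER_DEFAULT;
            default:
                return TagCase.PARSER_DEFAULT;
        }
    }
}
```

      Falling back rather than throwing is one design choice among several; a stricter caller could instead reject a parser that cannot honor the requested setting.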
    • supportsReturnHTMLDocument

      boolean supportsReturnHTMLDocument()
      Returns true if this parser can return an HTMLDocument object.
    • supportsParserWarnings

      boolean supportsParserWarnings()
      Returns true if this parser can display parser warnings.
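      Because each capability is exposed as a boolean query, a caller holding several parser implementations can select one that meets its needs. This sketch is hypothetical: ParserCapabilities mirrors only the four queries documented above, and FixedCapabilities and ParserSelector are illustrative helpers, not classes from the library.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical view of the four capability queries documented above.
interface ParserCapabilities {
    boolean supportsPreserveTagCase();
    boolean supportsForceTagCase();
    boolean supportsReturnHTMLDocument();
    boolean supportsParserWarnings();
}

// Hypothetical fixed-value implementation describing one parser's capabilities.
class FixedCapabilities implements ParserCapabilities {
    final String name;
    private final boolean preserve, force, htmlDocument, warnings;

    FixedCapabilities(String name, boolean preserve, boolean force,
                      boolean htmlDocument, boolean warnings) {
        this.name = name;
        this.preserve = preserve;
        this.force = force;
        this.htmlDocument = htmlDocument;
        this.warnings = warnings;
    }

    public boolean supportsPreserveTagCase() { return preserve; }
    public boolean supportsForceTagCase() { return force; }
    public boolean supportsReturnHTMLDocument() { return htmlDocument; }
    public boolean supportsParserWarnings() { return warnings; }
}

class ParserSelector {
    // Returns the first candidate, in list order, that satisfies the requirement.
    static <P extends ParserCapabilities> Optional<P> firstMatching(
            List<P> candidates, Predicate<? super P> requirement) {
        return candidates.stream().filter(requirement).findFirst();
    }
}
```

      For example, a caller that needs an HTMLDocument and visible warnings could pass the requirement p -> p.supportsReturnHTMLDocument() && p.supportsParserWarnings() and receive an empty Optional when no candidate qualifies.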