Class ParseElement

Together, ParseElement and ParseList are a powerful DOM querying tool.

Their class architecture is designed to encourage method chaining: class methods are strung together to describe complex queries against the DOM. Each class method returns either a final query result (such as ParseElement.innerHTML) or a new instance of ParseElement or ParseList (such as ParseElement.element). See Usage for examples.
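The chaining idea can be illustrated without a DOM. The following is a minimal sketch, not the ParseElement implementation: SketchNode, ChainSketch, and the sample data are hypothetical stand-ins, and strict handling is left out. It only shows the two kinds of return value, a new wrapper that continues the chain and a plain value that ends it.

```typescript
// Hypothetical stand-in for a DOM node: a label, some text, and children.
interface SketchNode {
  label: string;
  text: string;
  children: SketchNode[];
}

// Hypothetical wrapper that mirrors the chaining pattern only.
class ChainSketch {
  constructor(readonly raw: SketchNode) {}

  // Returns a new wrapper, so further calls can be chained.
  child(label: string): ChainSketch {
    const found = this.raw.children.filter((c) => c.label === label)[0];
    // An empty node (not null) keeps the chain usable when nothing matches.
    return new ChainSketch(found ?? { label, text: "", children: [] });
  }

  // Returns a final query result, which ends the chain.
  text(): string {
    return this.raw.text.trim();
  }
}

const chainRoot: SketchNode = {
  label: "page",
  text: "",
  children: [
    {
      label: "container",
      text: "",
      children: [{ label: "artist", text: "  Artist A ", children: [] }],
    },
  ],
};

const chainedText = new ChainSketch(chainRoot).child("container").child("artist").text();
const missingText = new ChainSketch(chainRoot).child("nope").child("artist").text();
```

In this sketch a failed lookup yields an empty wrapper, so the second chain still completes and produces an empty string.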

Internally, this class stores any element as an HTMLElement in ParseElement.raw. Entities returned by methods such as querySelector(someDomString) are typed as Element, even if they actually contain the properties of some more descriptive class such as HTMLAnchorElement. In the future, I may try to infer types from the requested DOMString. Until then, use ParseElement.prop and ParseElement.propAsNum to access any Element properties not explicitly accessible through class methods.
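The typing gap described above can be sketched with plain objects. This is an illustration under stated assumptions, not the code behind ParseElement.prop: BaseShape and LinkShape are hypothetical stand-ins for Element and HTMLAnchorElement, and readProp and readPropAsNum are hypothetical helpers that look a property up by name when the static type does not declare it.

```typescript
// Hypothetical stand-ins: BaseShape plays the role of Element,
// LinkShape the role of a more descriptive class such as HTMLAnchorElement.
interface BaseShape {
  tagName: string;
}
interface LinkShape extends BaseShape {
  href: string;
  tabIndex: number;
}

// Looks up a property by name on a value whose static type does not declare it.
function readProp(target: BaseShape, propName: string, defaultVal = ""): string {
  const value = (target as unknown as Record<string, unknown>)[propName];
  return value === undefined || value === null ? defaultVal : String(value);
}

// Same lookup, with the result parsed as a number.
function readPropAsNum(target: BaseShape, propName: string, defaultNum = 0): number {
  const parsed = parseFloat(readProp(target, propName));
  return isNaN(parsed) ? defaultNum : parsed;
}

const anchorLike: LinkShape = { tagName: "A", href: "https://example.com/", tabIndex: 3 };
// Widening hides href and tabIndex from the type checker,
// much as a query result typed as Element would.
const widened: BaseShape = anchorLike;

const hrefValue = readProp(widened, "href");
const tabIndexValue = readPropAsNum(widened, "tabIndex");
const missingValue = readProp(widened, "alt", "no alt");
```

Writing widened.href directly would not compile, which is why a lookup by property name is needed until the type can be inferred.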

Usage

Initialization (for all other examples)

// create a new ParseElement from existing DOM element containing data to be scraped
const page = new ParseElement(scrapedPageElement, 'some page');

Example 1: Extract a number from an element, defaulting to 0

const extractedNumber = page
    .element('.inner > p.num') // inherits parseElement.strict = false
    .number(); // inherits strict, returns 0 if a number cannot be extracted

Example 2: Extract a link, throwing an error if it cannot be found

const extractedLink = page
    .element('.inner > a.link', 'some link', true) // overrides parseElement.strict = false
    .href(); // inherits strict, throws error if link not found

Here, if .inner > a.link cannot be found, an error will be thrown:

Cannot find element: some page > some link.

If the element is found, but has no href value, this error will be thrown:

No href found for element: scrape of some page > some link

Example 3: Extract the name of the second artist in a list (uses ParseList):

const secondArtistInList = page
    .element('div.music-element', 'some container', true)
    .list('div.artist-blocks', 'artist elements')
    .element(1, 'second artist')
    .element('p.artist-name', 'artist name')
    .text();

If the second artist block is found, but not the nested 'p.artist-name', an error is thrown:

Cannot find element: some page > some container > artist elements > second artist

Hierarchy

  • ParseElement

Index

Constructors

constructor

  • new ParseElement(element: Element, description?: string, strict?: boolean): ParseElement

Properties

description

description: string = "HTML Element"

entities

entities: any = new AllHtmlEntities()

Decodes HTML character entities

raw

raw: HTMLElement

HTMLElement, set during instance construction

strict

strict: boolean
  • true: method failures result in thrown errors instead of proceeding
  • false: method failures return empty (not null) elements or default values. Allows method chaining to continue indefinitely
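The two strict modes, and the way a per-call argument can override the instance setting, can be sketched as follows. StrictSketch, its get method, and the error text are hypothetical and only illustrate the pattern of a parameter defaulting to this.strict; they are not taken from ParseElement.

```typescript
// Hypothetical lookup wrapper that illustrates strict handling only.
class StrictSketch {
  constructor(
    private readonly values: Record<string, string>,
    readonly strict: boolean = false
  ) {}

  // strict defaults to the instance setting but can be overridden per call.
  get(key: string, strict: boolean = this.strict, defaultVal = ""): string {
    if (Object.prototype.hasOwnProperty.call(this.values, key)) {
      return this.values[key];
    }
    if (strict) {
      // Strict mode: fail loudly instead of proceeding.
      throw new Error(`Cannot find value: ${key}`);
    }
    // Non-strict mode: return a default so the caller can keep going.
    return defaultVal;
  }
}

const lenient = new StrictSketch({ title: "Blue" });
const lenientResult = lenient.get("missing"); // inherits strict = false

let strictMessage = "";
try {
  lenient.get("missing", true); // overrides strict for this call only
} catch (err) {
  strictMessage = (err as Error).message;
}

const strictInstance = new StrictSketch({}, true);
const overriddenResult = strictInstance.get("missing", false, "n/a");
```

The design choice sketched here is that strictness is decided once per instance and only overridden where a particular lookup needs different behavior.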

Methods

element

  • element(htmlQuery: string, targetDescription?: string, strict?: boolean): ParseElement

href

  • href(strict?: boolean, defaultVal?: string): string

innerHTML

  • innerHTML(strict?: boolean, defaultVal?: string, decode?: boolean): string

innerText

  • innerText(strict?: boolean, defaultVal?: string, trim?: boolean, decode?: boolean): string

list

  • list(htmlQuery: string, targetDescription?: string, strict?: boolean): ParseList

number

  • number(strict?: boolean, defaultNum?: number): number

prop

  • prop(propName: string, strict?: boolean, defaultVal?: string): string
  • Get the value of any property of the raw DOM entity stored in this class, such as the alt text for an image element wrapped in ParseElement. Do not use for properties that have already been explicitly implemented by ParseElement.

    Parameters

    • propName: string

      element property to retrieve, such as alt.

    • Default value strict: boolean = this.strict
    • Default value defaultVal: string = ""

    Returns string

propAsNum

  • propAsNum(propName: string, strict?: boolean, defaultNum?: number): number

textContent

  • textContent(strict?: boolean, defaultVal?: string, trim?: boolean, decode?: boolean): string
  • Parameters

    • Default value strict: boolean = this.strict
    • Default value defaultVal: string = ""
    • Default value trim: boolean = true

      trim whitespace from either end of return string

    • Default value decode: boolean = false

      decode HTML character entities into the Unicode characters they represent

    Returns string
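The effect of the trim and decode parameters can be sketched on a plain string. This is an assumption-laden illustration: cleanText and decodeEntities are hypothetical helpers, the entity table is deliberately tiny (the class itself relies on AllHtmlEntities for decoding), and decoding before trimming is a choice made for this sketch, not a documented order.

```typescript
// Tiny, deliberately incomplete entity table; a real decoder covers far more.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00a0",
};

// Decodes named entities from the table above plus decimal and hex references.
// String.fromCharCode limits this sketch to the Basic Multilingual Plane.
function decodeEntities(input: string): string {
  return input.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match: string, body: string) => {
    if (body.charAt(0) === "#") {
      const isHex = body.charAt(1) === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return String.fromCharCode(code);
    }
    // Unknown names are left untouched.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Mirrors the documented defaults: trim = true, decode = false.
function cleanText(raw: string, trim = true, decode = false): string {
  const decoded = decode ? decodeEntities(raw) : raw;
  return trim ? decoded.trim() : decoded;
}

const trimmedOnly = cleanText("  Tom &amp; Jerry  ");
const trimmedAndDecoded = cleanText("  Tom &amp; Jerry  ", true, true);
const decodedOnly = cleanText(" caf&#233; ", false, true);
```

With the defaults, entities are left as written and only surrounding whitespace is removed; passing decode = true additionally turns the reference into its character.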

Generated using TypeDoc