ParseElement | MuCritic

Together, ParseElement and ParseList are a powerful DOM querying tool.

Both classes are designed for method chaining: class methods are strung together to describe complex queries against the DOM. Each class method returns either a final query result (such as ParseElement.innerHTML) or a new instance of ParseElement or ParseList (such as ParseElement.element). See Usage for examples.

Internally, this class stores any element as an HTMLElement in ParseElement.raw. Entities returned by methods such as querySelector(someDomString) are typed as Element, even if they actually contain the properties of some more descriptive class such as HTMLAnchorElement. In the future, I may try to infer types from the requested DOMString. Until then, ParseElement.prop and ParseElement.propAsNum provide access to any Element properties not explicitly accessible through class methods.

Usage

Initialization (for all other examples)

// create a new ParseElement from an existing DOM element containing the data to be scraped
const page = new ParseElement(scrapedPageElement, 'some page');

Example 1: Extract a number from an element, defaulting to 0

const extractedNumber = page
    .element('.inner > p.num') // inherits parseElement.strict = false
    .number(); // inherits strict, returns 0 if a number cannot be extracted

Example 2: Extract link from element, throwing an error if not found

const extractedLink = page
    .element('.inner > a.link', 'some link', true) // overrides parseElement.strict = false
    .href(); // inherits strict, throws error if link not found

Here, if .inner > a.link cannot be found, an error will be thrown:

Cannot find element: some page > some link.

If the element is found, but has no href value, this error will be thrown:

No href found for element: scrape of some page > some link

Example 3: Extract the name of the second artist in a list (uses ParseList):

const secondArtistInList = page
    .element('div.music-element', 'some container', true)
    .list('div.artist-blocks', 'artist elements')
    .element(1, 'second artist')
    .element('p.artist-name', 'artist name')
    .text();

If the second artist block is found, but not the nested 'p.artist-name', an error is thrown:

Cannot find element: some page > some container > artist elements > second artist
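The chaining behaviour in these examples can be illustrated with a minimal sketch. It is not the MuCritic implementation: FakeNode and ChainSketch are hypothetical names, a plain object tree stands in for the DOM so the code runs without a browser, and the way the description path is assembled is an assumption based on the error messages above.

```typescript
// Hypothetical stand-in for a DOM node, so the sketch runs without a browser.
interface FakeNode {
    text: string;
    children: Record<string, FakeNode>;
}

// Hypothetical class illustrating chaining; not the real ParseElement.
class ChainSketch {
    private readonly node: FakeNode | null;
    private readonly description: string;
    private readonly strict: boolean;

    constructor(node: FakeNode | null, description = 'element', strict = false) {
        this.node = node;
        this.description = description;
        this.strict = strict;
    }

    // Look up a child by key. The child inherits strict unless it is overridden.
    child(key: string, targetDescription = 'element', strict = this.strict): ChainSketch {
        const path = `${this.description} > ${targetDescription}`;
        const found = this.node?.children[key] ?? null;
        if (found === null && strict) {
            throw new Error(`Cannot find element: ${path}`);
        }
        // In lenient mode an empty wrapper is returned, so the chain can continue.
        return new ChainSketch(found, path, strict);
    }

    // Final query result: the node's text, or a default when the node is empty.
    text(defaultVal = ''): string {
        return this.node?.text ?? defaultVal;
    }
}

const page = new ChainSketch(
    { text: '', children: { container: { text: 'hello', children: {} } } },
    'some page',
);

// Lenient mode: the missing 'artist' child yields an empty wrapper, not an error.
console.log(page.child('container', 'some container').child('artist', 'second artist').text());
```

Passing true as the third argument to the first child call makes the later lookup throw instead, because the strict setting is inherited down the chain.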

Hierarchy

ParseElement

Constructors

constructor

new ParseElement(element: Element, description?: string, strict?: boolean): ParseElement

- Defined in helpers/parsing/parseElement.ts:93
Parameters
- element: Element
- Default value description: string = "element"
- Default value strict: boolean = false
  
  see ParseElement.strict
Returns ParseElement

Properties

description

description: string = "HTML Element"

entities

entities: any = new AllHtmlEntities()

Decodes HTML character entities

raw

raw: HTMLElement

HTMLElement, set during instance construction

strict

strict: boolean

true: method failures result in thrown errors instead of proceeding
false: method failures return empty (not null) elements or default values. Allows method chaining to continue indefinitely

Methods

element

element(htmlQuery: string, targetDescription?: string, strict?: boolean): ParseElement

- Defined in helpers/parsing/parseElement.ts:125
Finds a nested element using .querySelector(someDomString)

Parameters
- htmlQuery: string
  
  a valid DOMString
- Default value targetDescription: string = "element"
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
Returns ParseElement

href

href(strict?: boolean, defaultVal?: string): string

- Defined in helpers/parsing/parseElement.ts:144
See HyperlinkElementUtils.href

Parameters
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultVal: string = ""
Returns string

innerHTML

innerHTML(strict?: boolean, defaultVal?: string, decode?: boolean): string

- Defined in helpers/parsing/parseElement.ts:163
See Element.innerHTML

Parameters
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultVal: string = ""
- Default value decode: boolean = false
Returns string

innerText

innerText(strict?: boolean, defaultVal?: string, trim?: boolean, decode?: boolean): string

- Defined in helpers/parsing/parseElement.ts:185
See HTMLElement.innerText

Parameters
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultVal: string = ""
- Default value trim: boolean = true
  
  trim whitespace from either end of return string
- Default value decode: boolean = false
  
  decode HTML character entities into the characters they represent
Returns string
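As a rough illustration of the trim and decode options, the sketch below cleans a string in two steps. The names decodeEntities and cleanText are hypothetical, the tiny decoder is only a stand-in for the AllHtmlEntities instance stored in ParseElement.entities (it handles five named entities plus numeric references), and decoding before trimming is an assumption about ordering.

```typescript
// Stand-in decoder covering a handful of named entities; a real entity library covers far more.
const NAMED_ENTITIES: Record<string, string> = {
    amp: '&',
    lt: '<',
    gt: '>',
    quot: '"',
    apos: "'",
};

// Hypothetical helper: decode named, decimal and hexadecimal character references.
function decodeEntities(input: string): string {
    return input.replace(
        /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
        (match: string, body: string) => {
            if (body.startsWith('#x')) {
                return String.fromCodePoint(parseInt(body.slice(2), 16));
            }
            if (body.startsWith('#')) {
                return String.fromCodePoint(parseInt(body.slice(1), 10));
            }
            // Unknown named entities are left untouched.
            return NAMED_ENTITIES[body] ?? match;
        },
    );
}

// Hypothetical helper mirroring the trim and decode parameters documented above.
function cleanText(raw: string, trim = true, decode = false): string {
    const decoded = decode ? decodeEntities(raw) : raw;
    return trim ? decoded.trim() : decoded;
}

console.log(cleanText('  Simon &amp; Garfunkel  ', true, true)); // Simon & Garfunkel
```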

list

list(htmlQuery: string, targetDescription?: string, strict?: boolean): ParseList

- Defined in helpers/parsing/parseElement.ts:209
Finds a nested list of elements using .querySelectorAll(someDomString).

Parameters
- htmlQuery: string
  
  a valid DOMString
- Default value targetDescription: string = "list"
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
Returns ParseList

number

number(strict?: boolean, defaultNum?: number): number

- Defined in helpers/parsing/parseElement.ts:227
Parameters
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultNum: number = 0
Returns number

The text content of an element, converted to a number
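One way such a conversion could work is sketched below. This is an assumption, not the code in parseElement.ts: textToNumber is a hypothetical name, and the rule that empty or non-numeric text counts as a failure is chosen here for illustration.

```typescript
// Hypothetical helper: convert text to a number, falling back or throwing on failure.
function textToNumber(text: string | null, strict = false, defaultNum = 0): number {
    const trimmed = (text ?? '').trim();
    // Number('') is 0, so empty text is treated explicitly as "no number found".
    const parsed = trimmed === '' ? Number.NaN : Number(trimmed);
    if (!Number.isNaN(parsed)) {
        return parsed;
    }
    if (strict) {
        throw new Error(`Cannot convert to number: "${trimmed}"`);
    }
    return defaultNum;
}

console.log(textToNumber(' 42 ')); // 42
console.log(textToNumber('n/a')); // 0
console.log(textToNumber(null, false, -1)); // -1
```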

prop

prop(propName: string, strict?: boolean, defaultVal?: string): string

- Defined in helpers/parsing/parseElement.ts:243
Get the value of any property of the raw DOM entity stored in this class, such as the alt text for an image element wrapped in ParseElement. Do not use for properties that have already been explicitly implemented by ParseElement.

Parameters
- propName: string
  
  element property to retrieve, such as alt.
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultVal: string = ""
Returns string
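The idea of reading a property by name from a loosely typed entity can be sketched as follows. readProp and fakeImage are hypothetical, a plain object stands in for the raw element, and the error message is invented for the example.

```typescript
// Hypothetical helper: read any property by name and return it as a string.
function readProp(raw: object, propName: string, strict = false, defaultVal = ''): string {
    const value: unknown = (raw as Record<string, unknown>)[propName];
    if (value !== undefined && value !== null) {
        return String(value);
    }
    if (strict) {
        throw new Error(`No property "${propName}" found`);
    }
    return defaultVal;
}

// Plain object standing in for an image element.
const fakeImage = { alt: 'album cover', height: 300 };

console.log(readProp(fakeImage, 'alt')); // album cover
console.log(readProp(fakeImage, 'height')); // 300, as a string
console.log(readProp(fakeImage, 'title', false, 'untitled')); // untitled
```

Because every value comes back as a string here, numeric properties such as height are better served by a number-returning variant, which is the role ParseElement.propAsNum plays.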

propAsNum

propAsNum(propName: string, strict?: boolean, defaultNum?: number): number

- Defined in helpers/parsing/parseElement.ts:263
Same as ParseElement.prop, but attempts to parse the output to a number.

Parameters
- propName: string
  
  numeric element property to retrieve, such as height.
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultNum: number = 0
Returns number

textContent

textContent(strict?: boolean, defaultVal?: string, trim?: boolean, decode?: boolean): string

- Defined in helpers/parsing/parseElement.ts:279
See Node.textContent

Parameters
- Default value strict: boolean = this.strict
  
  see ParseElement.strict
- Default value defaultVal: string = ""
- Default value trim: boolean = true
  
  trim whitespace from either end of return string
- Default value decode: boolean = false
  
  decode HTML character entities into the characters they represent
Returns string
