Decodes HTML character entities
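As a rough illustration of what entity decoding involves, here is a minimal sketch. The `decodeEntities` helper and its small entity table are hypothetical and are not taken from this library, whose own decoder may work differently (for example, by delegating to the browser).

```typescript
// Hypothetical helper for illustration only; not part of ParseElement.
// Covers numeric references and a handful of named entities.
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', "'"],
  ['nbsp', '\u00a0'],
]);

function decodeEntities(input: string): string {
  return input.replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi,
    (match: string, body: string): string => {
      if (body[0] === '#') {
        const isHex = body[1] === 'x' || body[1] === 'X';
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave references outside the Unicode range untouched.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      // Unknown named entities are returned unchanged.
      return NAMED_ENTITIES.get(body) ?? match;
    },
  );
}

const decoded = decodeEntities('Tom &amp; Jerry &#169; &#x41;');
```

The replacement runs in a single pass, so a double-escaped input such as `&amp;lt;` decodes one level only, to `&lt;`.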
HTMLElement, set during instance construction
Finds a nested element using .querySelector(someDomString).
a valid DOMString
trim whitespace from either end of return string
convert HTML character entities to UTF
Finds a nested list of elements using .querySelectorAll(someDomString).
a valid DOMString
The text content of an element, converted to a number
Get the value of any property of the raw DOM entity stored in this class, such as the alt text for an image element wrapped in ParseElement. Do not use for properties that have already been explicitly implemented by ParseElement.
element property to retrieve, such as alt.
Same as ParseElement.prop, but attempts to parse the output to a number.
numeric element property to retrieve, such as height.
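The idea behind prop and propAsNum can be sketched without a browser. In the sketch below, `readProp` and `readPropAsNum` are hypothetical stand-ins that operate on a plain object rather than a real DOM element, and returning NaN when parsing fails is an assumption; the documentation above does not say how propAsNum handles that case.

```typescript
// Stand-in for ParseElement.raw; in a browser this would be an HTMLElement.
type RawLike = object;

// Hypothetical equivalent of prop: read any property by name.
function readProp(raw: RawLike, name: string): unknown {
  return (raw as Record<string, unknown>)[name];
}

// Hypothetical equivalent of propAsNum: parse the property as a number.
// Returning NaN on failure is an assumption of this sketch.
function readPropAsNum(raw: RawLike, name: string): number {
  return parseFloat(String(readProp(raw, name)));
}

// A plain object playing the role of a wrapped image element.
const image = { alt: 'Album cover', height: 120 };
const altText = readProp(image, 'alt');
const heightValue = readPropAsNum(image, 'height');
```

Looking properties up by name trades compile-time type checking for flexibility, which is why the return type here is `unknown` rather than a specific string or number type.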
See Node.textContent
trim whitespace from either end of return string
convert HTML character entities to UTF
Together, ParseElement and ParseList are a powerful DOM querying tool.
Their class architecture is implemented to encourage method chaining, stringing together class methods to describe complex queries from the DOM. Each class method returns either a final query result (such as ParseElement.innerHTML) or a new instance of ParseElement or ParseList (such as ParseElement.element). See usage for examples.
Internally, this class stores any element as an HTMLElement in ParseElement.raw. Entities returned by methods such as querySelector(someDomString) are typed as Element, even if they actually contain the properties of some more descriptive class such as HTMLAnchorElement. In the future, I may try to infer types through the requested DOMString. Until then, ParseElement.prop and ParseElement.propAsNum can be used to access any Element properties not explicitly accessible through class methods.
Usage
Initialization (for all other examples)
// create a new ParseElement from existing DOM element containing data to be scraped
const page = new ParseElement(scrapedPageElement, 'some page');
Example 1: Extract a number from an element, defaulting to 0
const extractedNumber = page
  .element('.inner > p.num') // inherits parseElement.strict = false
  .number(); // inherits strict, returns 0 if a number cannot be extracted
Example 2: Extract link from element, throwing an error if not found
const extractedLink = page
  .element('.inner > a.link', 'some link', true) // overrides parseElement.strict = false
  .href(); // inherits strict, throws error if link not found
Here, if .inner > a.link cannot be found, an error will be thrown:
Cannot find element: some page > some link
If the element is found but has no href value, this error will be thrown:
No href found for element: scrape of some page > some link
Example 3: Extract the name of the second artist in a list (uses ParseList):
const secondArtistInList = page
  .element('div.music-element', 'some container', true)
  .list('div.artist-blocks', 'artist elements')
  .element(1, 'second artist')
  .element('p.artist-name', 'artist name')
  .text();
If the second artist block is found, but not the nested 'p.artist-name', an error is thrown:
Cannot find element: some page > some container > artist elements > second artist
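The label paths in these error messages suggest that each wrapper carries its parent's label forward. The sketch below illustrates that chaining pattern under stated assumptions: `ChainSketch` and `FakeNode` are hypothetical, a keyed Map stands in for real DOM queries, and the "No number found" message is invented for illustration. It is not the library's implementation.

```typescript
// Hypothetical stand-in for a DOM node: optional text plus children keyed by selector.
interface FakeNode {
  text?: string;
  children?: Map<string, FakeNode>;
}

class ChainSketch {
  constructor(
    readonly raw: FakeNode | undefined,
    readonly label: string,
    readonly strict: boolean = false,
  ) {}

  // Returns a new wrapper so calls can be chained; strict is inherited unless overridden.
  element(key: string, label: string, strict: boolean = this.strict): ChainSketch {
    const path = `${this.label} > ${label}`;
    const found = this.raw?.children?.get(key);
    if (found === undefined && strict) {
      throw new Error(`Cannot find element: ${path}`);
    }
    return new ChainSketch(found, path, strict);
  }

  // Final query result: ends the chain.
  number(): number {
    const parsed = parseFloat(this.raw?.text ?? '');
    if (Number.isNaN(parsed)) {
      // Invented message; the real wording is not documented here.
      if (this.strict) throw new Error(`No number found for element: ${this.label}`);
      return 0;
    }
    return parsed;
  }
}

const sketchPage = new ChainSketch(
  { children: new Map([['.inner > p.num', { text: '42' }]]) },
  'some page',
);
const foundNumber = sketchPage.element('.inner > p.num', 'some number').number(); // 42
const missingNumber = sketchPage.element('.inner > p.other', 'other number').number(); // 0
```

Because a non-strict lookup returns a wrapper around nothing instead of throwing, the rest of the chain can still run and fall back to a default at the final step.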