Available · knowledge-base
Website crawling follows same-origin links, respects robots settings, uses sitemap hints where available, and stores extraction metadata for diagnostics.
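As a rough illustration of what "same-origin" means for link following, the sketch below filters a page's links down to those that share its scheme, host, and port. It is not the product's crawler: the helper names `origin` and `same_origin_links` are hypothetical, and it assumes an origin is compared as scheme, host, and port with default ports filled in.

```python
from urllib.parse import urljoin, urlsplit

# Default ports, so that https://example.com and https://example.com:443 compare equal.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port) for a URL; hypothetical helper for illustration."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin_links(page_url, hrefs):
    """Resolve each href against page_url and keep only same-origin results."""
    base = origin(page_url)
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # handles relative and absolute hrefs
        if origin(absolute) == base:
            kept.append(absolute)
    return kept
```

Under this definition a different subdomain, a different scheme, or a non-HTTP link such as `mailto:` counts as a different origin and is dropped.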
JavaScript-heavy sites
Rendered crawling is still foundation work; a browser-rendered ingestion path is not fully enabled. If a site renders most of its content client-side, verify the extracted text before relying on it.
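One way to spot-check a page before trusting its extraction is to compare how much visible text the raw HTML carries against whether it loads scripts. The sketch below is a heuristic only, not part of the product: the function name `looks_client_rendered` and the `min_text_chars` threshold of 200 are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are not visible page text.
_NON_TEXT_TAGS = ("script", "style", "noscript")


class _TextStats(HTMLParser):
    """Counts visible text characters and <script> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.text_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in _NON_TEXT_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _NON_TEXT_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_client_rendered(html, min_text_chars=200):
    """Heuristic: the page loads scripts but ships very little visible text."""
    stats = _TextStats()
    stats.feed(html)
    stats.close()
    return stats.script_tags > 0 and stats.text_chars < min_text_chars
```

A page flagged this way is a candidate for manual review of its extracted text; a page that is not flagged can still have important content injected client-side, so the check narrows the list rather than replacing verification.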