Norconex Web Crawler



Other names: Norconex HTTP Collector
Developer(s): Norconex Inc.
Initial release: 2016
Stable release: 3.0.2 / 2022-01-05
Repository: GitHub Repository
Written in: Java
Operating system: Cross-platform
License: Apache License
Website: Norconex Web Crawler

Norconex Web Crawler is free and open-source web crawling and web scraping software written in Java and released under an Apache License. It can export data to many repositories, such as Apache Solr, Elasticsearch,[1] Microsoft Azure Cognitive Search, and Amazon CloudSearch.[2][3][4]

The crawler can be run on its own or embedded in a Java application.[5][6]

Key features include:

  • Multi-threaded crawling
  • Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
  • Extraction of metadata associated with documents
  • Support for pages rendered with JavaScript
  • Incremental crawls
  • Support for external commands to parse or manipulate documents
  • Delivery of extracted data to a variety of repositories

Organizations and products using Norconex Web Crawler include the Apache Solr ecosystem, the Department of National Defence, Universities Canada, and the U.S. Department of Education.[7][8]

History

Norconex Web Crawler was released as free and open-source software in 2013.[9]

References

  1. "Enhance Your Search Capabilities with Norconex Web Crawler: Indexing Data to Elasticsearch". Medium. 12 April 2024.
  2. "Committers". opensource.norconex.com.
  3. Hoppa, Jocelyn (10 February 2020). "Importing Data from the Web with Norconex & Neo4j". Graph Database & Analytics.
  4. "Deploy a Norconex HTTP Collector Indexer Plugin | Cloud Search". Google for Developers.
  5. Valcheva, Silvia (11 February 2018). "10 Best Open Source Web Crawlers: Web Data Extraction Software". Blog For Data-Driven Business.
  6. "Norconex HTTP Collector". Softpedia. 9 July 2023. Retrieved 25 September 2023.
  7. "SolrEcosystem - Solr - Apache Software Foundation". cwiki.apache.org.
  8. "Norconex Crawler Users". opensource.norconex.com.
  9. "Norconex Gives Back to Open-Source – Norconex Inc". Retrieved 25 September 2023.
