SmartReader: Get Clean Articles

A .NET Standard library to extract the main content of a web page

SmartReader is designed to remove the clutter from a web page (ads, sidebars, and so on) and give you just the content. The core algorithm is a port of the Mozilla Readability library, which is stable and used in production inside Firefox. By relying on a library maintained by a competent organization like Mozilla, we can build on their hard and well-tested work.

SmartReader also adds some improvements over the original library, extracting more and better metadata:

  • the site name
  • the author and publication date
  • the language
  • the excerpt of the article
  • the featured image
  • a list of the images found (it can optionally also download them and store them as data URIs)
  • an estimate of the time needed to read the article
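
As a rough illustration of how the metadata listed above might be read, here is a minimal sketch. It assumes the SmartReader NuGet package is referenced and that the `Reader` and `Article` types expose the members used below, so check the names against the version you install. The URL is a hypothetical placeholder, not a real article.

```csharp
using System;
using SmartReader;

class Program
{
    static void Main()
    {
        // Hypothetical URL, used only for illustration.
        var reader = new Reader("https://example.com/some-article");

        // Downloads the page and runs the extraction algorithm.
        Article article = reader.GetArticle();

        // IsReadable signals whether any main content was found.
        if (!article.IsReadable)
        {
            Console.WriteLine("No main content could be extracted.");
            return;
        }

        // Metadata corresponding to the items in the list above.
        Console.WriteLine($"Title:          {article.Title}");
        Console.WriteLine($"Site name:      {article.SiteName}");
        Console.WriteLine($"Author:         {article.Author}");
        Console.WriteLine($"Published:      {article.PublicationDate}");
        Console.WriteLine($"Language:       {article.Language}");
        Console.WriteLine($"Excerpt:        {article.Excerpt}");
        Console.WriteLine($"Featured image: {article.FeaturedImage}");
        Console.WriteLine($"Time to read:   {article.TimeToRead}");
    }
}
```

Any of these values may be empty when the page does not provide the corresponding information, so real code should check for missing values before using them.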