Vilistextum 2.6.5 - fault-tolerant HTML to text converter

From: Patric Mueller (bhaak_at_bigfoot.com)
Date: 06/01/04

    Date: 1 Jun 2004 10:50:00 GMT

    Announcing version 2.6.5 of Vilistextum.

    Vilistextum is a small and fast HTML to text converter.
    It is quite fault-tolerant and deals well with badly formed HTML.
    It has full support for different character sets (e.g. Unicode).

    Some features:
    ==============
    * understands HTML 3.2 up to 4.01 and XHTML 1.0
    * supports various multibyte encodings (Unicode, Shift_JIS, ...)
    * output can be optimized for ebook reading
    * converts characters and entities between 128 and 159 from the
      Windows-1252 character set to meaningful strings in ISO-8859-1
    * GUI front end using Kaptain
    * creates footnotes for links
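
The Windows-1252 conversion in the feature list can be illustrated with a minimal Python sketch. It only handles decimal numeric references such as `&#147;`. The replacement table, the `?` fallback, and the helper names `cp1252_byte_to_text` and `replace_cp1252_refs` are hypothetical choices made for this illustration and are not taken from Vilistextum itself.

```python
import re

# Illustrative replacements for a few Windows-1252 code points in 0x80-0x9F.
# This table is an assumption for the sketch, not Vilistextum's own table.
CP1252_REPLACEMENTS = {
    "\u20ac": "EUR",   # 0x80 euro sign
    "\u2026": "...",   # 0x85 horizontal ellipsis
    "\u2018": "'",     # 0x91 left single quotation mark
    "\u2019": "'",     # 0x92 right single quotation mark
    "\u201c": '"',     # 0x93 left double quotation mark
    "\u201d": '"',     # 0x94 right double quotation mark
    "\u2013": "-",     # 0x96 en dash
    "\u2014": "--",    # 0x97 em dash
    "\u2122": "(TM)",  # 0x99 trade mark sign
}

# Decimal numeric character references such as &#147;
NUMERIC_REF = re.compile(r"&#(\d+);")


def cp1252_byte_to_text(code: int) -> str:
    """Map a code in 128..159 to a string that fits in ISO-8859-1 (hypothetical helper)."""
    try:
        ch = bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's cp1252 codec.
        return "?"
    # None of the defined characters in this range exist in ISO-8859-1,
    # so anything missing from the illustrative table falls back to "?".
    return CP1252_REPLACEMENTS.get(ch, "?")


def replace_cp1252_refs(html: str) -> str:
    """Replace &#128;..&#159; with readable text; leave other references alone."""
    def repl(match):
        code = int(match.group(1))
        if 128 <= code <= 159:
            return cp1252_byte_to_text(code)
        return match.group(0)
    return NUMERIC_REF.sub(repl, html)


print(replace_cp1252_refs("&#147;Hi&#148; &#150; &#153; &#65;"))
# prints: "Hi" - (TM) &#65;
```

The lookup goes through Python's `cp1252` codec first, so the table only needs to cover the final Unicode characters rather than raw byte values.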

    Changes:
    ========
    * Vilistextum now recognizes the character set even if it is declared with
            <META http-equiv="charset" content="utf-8">
    * BUGFIX: --no-title in combination with --shrink-lines didn't work
    * BUGFIX: HTML tags inside SCRIPT elements are now ignored
    * BUGFIX: the last word of the document was sometimes not output
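
The first change concerns the non-standard `http-equiv="charset"` declaration. The following minimal Python sketch shows how a converter might look for a charset in both that form and the usual `http-equiv="Content-Type"` form, using the standard `html.parser` module. The class `CharsetSniffer` and the function `detect_charset` are hypothetical names, and this is not Vilistextum's own code.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Remember the first charset declared in a <meta> tag (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        http_equiv = attributes.get("http-equiv", "").strip().lower()
        content = attributes.get("content", "")
        if http_equiv == "content-type":
            # Usual form: content="text/html; charset=utf-8"
            for part in content.split(";"):
                key, _, value = part.partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().strip("\"'").lower() or None
        elif http_equiv == "charset":
            # Non-standard form mentioned in the changelog: content="utf-8"
            self.charset = content.strip().lower() or None


def detect_charset(html):
    """Return the declared charset in lower case, or None if none is found."""
    sniffer = CharsetSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.charset


print(detect_charset('<META http-equiv="charset" content="utf-8">'))
# prints: utf-8
```

Only the first declaration that yields a charset is kept. A real converter would also have to re-decode the document once the charset is known, which this sketch leaves out.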

    Download:
    =========
    http://homepage.sunrise.ch/mysunrise/bhaak/vilistextum/vilistextum-2.6.5.tar.gz
    http://homepage.sunrise.ch/mysunrise/bhaak/vilistextum/vilistextum-2.6.5.tar.bz2

    Homepage:
    =========
    http://homepage.sunrise.ch/mysunrise/bhaak/vilistextum/

    ##########################################################################
    # Send submissions for comp.os.linux.announce to: cola@stump.algebra.com #
    # PLEASE remember a short description of the software and the LOCATION. #
    # This group is archived at http://stump.algebra.com/~cola/ #
    ##########################################################################
