Re: YOU ALL SUCK!

From: Ian Sedwell (ian.sedwell_at_btclick.com)
Date: 09/10/04


Date: Fri, 10 Sep 2004 16:15:53 +0000 (UTC)

On 2004/09/10 01:05, in article
Xns955FCC8E99C10RepublicPicturesLtd@216.221.81.119, "Tris Orendorff"
<triso@remove-me.cogeco.ca> wrote:

> carl.scharenberg@gmail.com (Carl Scharenberg) wrote in
> news:e930c085.0409020529.2db830fc@posting.google.com:
>
>
>>> This seems to be of somewhat better quality than the output of the
>>> typical random-text generator. Can anyone suggest something on CPAN
>>> useful for such?
>>
>> You can do this by analyzing a sample text at a higher level. Instead
>> of generating text from the frequency of single letters, you generate
>> using the frequencies of 2, 3, or 4-letter sequences. You analyze a
>> large text so you have a database of frequencies. When generating each
>> new character you look at the frequencies of the letters given that the
>> 3 previous letters are 'the'. The possibilities are a space, 'i'
>> (their), 'y' (they), and some others. Overall it will generate words
>> and even phrases that seem to almost make sense. It is neat stuff.
>
> This is known as a Markov Chain and it works even better if you generate using
> words rather than letters.
> Using letters creates both words and non-words. The output is written in the same
> style as the input text.
>
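
For anyone who wants to try the letter-level approach quoted above, here is a minimal Python sketch. It assumes a context of 3 letters and a toy sample string; build_model and generate are names made up for this illustration, not functions from any CPAN or Python package.

```python
import random
from collections import Counter, defaultdict

def build_model(text, order=3):
    """Map each `order`-letter context to a Counter of the letters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length=200, rng=None):
    """Extend `seed` one letter at a time, weighting each choice by observed frequency."""
    rng = rng or random.Random()
    order = len(seed)  # the seed must be exactly one context long
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break  # this context never had a successor in the sample
        letters = list(followers)
        weights = list(followers.values())
        out += rng.choices(letters, weights=weights)[0]
    return out

# Toy sample (an assumption for the demo); a real run needs a much larger text.
sample = "the cat sat on the mat. they then thanked their neighbours there. "
model = build_model(sample, order=3)
print(sorted(model["the"].items()))  # letters seen after 'the', with counts
print(generate(model, "the", length=60, rng=random.Random(0)))
```

The word-level variant mentioned above should work the same way if the text is first split into a list of words and the contexts are tuples of words instead of substrings.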

There's a technique for checking the authorship of a text by comparing the
frequency of triples in a text of known provenance against the frequency of
the same triples in the suspect text. The larger the texts, the more reliable
the comparison. Presumably, if one sampled a large enough corpus from a single
author, one could build a Markov chain from it and generate text in the style
of that author.
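
A rough sketch of that kind of comparison, in Python. Two things here are assumptions rather than part of the technique as described: the triples are taken to be letter triples, and cosine similarity is used as the comparison measure simply because it is easy to compute. The names trigram_profile and cosine_similarity are hypothetical.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Relative frequency of each letter triple in `text` (lower-cased)."""
    text = text.lower()
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine of the angle between two frequency profiles (0.0 if either is empty)."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Placeholder strings (assumptions for the demo); real use needs long texts.
known = "the larger the texts, the better the accuracy of the method"
suspect = "the better the method, the larger the texts it can handle"
score = cosine_similarity(trigram_profile(known), trigram_profile(suspect))
print(f"similarity: {score:.3f}")
```

A score near 1 means the two texts use the same triples in similar proportions. With samples this short the number means very little, which is the point about larger texts giving better accuracy.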


