Random name generator using Markov Chains

DrakilorP2P · #1 06-30-2008, 11:19 PM

This script, given a database of example names, will produce new random names using what it learned by parsing the example names. Some assembly required.

<tech>
The next character in the string is a random function of the previous character(s). For instance, if the script finds that the letter "f" often follows the letter "e" and that "q" almost never follows "e", it will replicate this by more often choosing "f" than "q" when faced with an "e". This is a first order chain. A second order chain will do the same, except choose the next character based on the two previous characters, such that "y" could often follow "to".
</tech>

Fun fact: Markov chains have been used for spam filtering. They have also been used to elude spam filters.

In order to produce original names, you will need to input a fairly large database of example names. By customizing the database, you will be able to choose what kind of names you want to generate. If you want English-sounding names, you input English names, if you want alien-sounding names, you input alien names.
There are some things you need to consider however. If the database is too large, the infamous loop limit will eat you if the performance demon doesn't. If it's too small, you'll often find yourself looking at unoriginal names or blatant copies of names already in the database.

You'll be able to choose what order the chains should be. This also gives you some options. A first order chain will often produce gibberish, while a fourth order chain will often produce unoriginal results. I've found that a third order chain will often produce non-gibberish with only occasional copies.

Fun fact: Markov chains are being used in Google's PageRank formula.

These names were generated in order-3 with a database of a thousand medium length names:

Quote:

rebe lucy kristine jessie danicher orah alba cara rita wilagros tessie wina son claudra alexis

These on the other hand, are order-3, but generated with only a small database of famous scientists. As you can see, most are duplicates.

Quote:

piccardson ein heisenberg curie shroedinger ein langmeir fowler shroedingevin tesla piccard vers planck bohr henriot planck heisenberg kramerschaffelt fowler shroedinger

Bottom line: the quality of the names depends entirely on your input data.
I suggest a decently large database with a fair share of longer names and second or third order chains.

PHP Code:


			
//#CLIENTSIDE

function onCreated()

{

  doNotCallThisFunction();

  

  //Use this.nameArray as the database. Generate third order chains.

  parseAndLearnArray(this.nameArray, 3);

  

  echo("---------------");

  echo("Markov: " @ generate());

  echo("Markov: " @ generate());

  echo("Markov: " @ generate());

  echo("Markov: " @ generate());

  echo("Markov: " @ generate());

}



function generate(c, currentWord)

{

  if (temp.c == null) {

    temp.c = "";

    for (temp.i=0; temp.i<this.order; temp.i++) temp.c @= "#";

  }

  

  if (currentWord == null) currentWord = "";

  

  temp.total = 0;

  for (temp.character: makevar("this.chain_key_" @ temp.c.substring(0, this.order))) {

    temp.frequency = makevar("this.chain_key_" @ temp.c.substring(0, this.order) @ "_" @ temp.character);

    temp.total += temp.frequency;

  }

  

  if (temp.total > 0) {

    temp.r = int(random(1, temp.total+1));

    for (temp.character: makevar("this.chain_key_" @ temp.c.substring(0, this.order))) {

      temp.frequency = makevar("this.chain_key_" @ temp.c.substring(0, this.order) @ "_" @ temp.character);

      temp.r -= temp.frequency;

      

      if (temp.r <= 0) {

        if (temp.character == "#") return temp.currentWord;

        

        temp.currentWord @= temp.character;

        temp.c = temp.c.substring(1, temp.c.length() - 1) @ temp.character;

        break;

      }

    }

  }

  else {

    return temp.currentWord;

  }

  

  return generate(temp.c, temp.currentWord);

}



function parseAndLearnText(text)

{

  temp.bakText = temp.text;

  temp.text = "";

  for (temp.i=0; temp.i<this.order; temp.i++) temp.text @= "#";

  temp.text @= temp.bakText @ "#";

  

  for (temp.i=0; temp.i<temp.text.length() - this.order; temp.i++) {

    temp.tmpText = temp.text.substring(temp.i, this.order);

    temp.nextCharacter = temp.text.substring(temp.i + this.order, 1);

    

    if (!makevar("this.chain_key_" @ temp.tmpText) == null) {

      makevar("this.chain_key_" @ temp.tmpText) = {};

    }

    if (!makevar("this.chain_key_" @ temp.tmpText @ "_" @ temp.nextCharacter)) {

      makevar("this.chain_key_" @ temp.tmpText @ "_" @ temp.nextCharacter) = 0;

      makevar("this.chain_key_" @ temp.tmpText).add(temp.nextCharacter);

    }

    makevar("this.chain_key_" @ temp.tmpText @ "_" @ temp.nextCharacter) += 1;

  }

}



function parseAndLearnArray(textArray, order)

{

  if (temp.order == null) temp.order = 3;

  this.order = temp.order;

  

  for (temp.line: temp.textArray) {

    parseAndLearnText(line);

  }

}



//It's most likely better to read these from a file

function doNotCallThisFunction()

{

  this.nameArray = {

"tesla",

"heisenberg",

"curie",

"richardson",

"bohr",

"brillouin",

"fowler",

"shroedinger",

"langmeir",

"planck",

"lorentz",

"henriot",

"kramers",

"piccard",

"verschaffelt",

"einstein",

"langevin"

};

}

As previously mentioned, some assembly required. You probably want to load all the names from a text file rather than putting them directly in an array.

Fun fact: Markov chains have been used to create chatbots.

Some more examples using larger databases than Graal can handle follows.

More than five thousand names, second order chains. More original but also some gibberish.

Quote:

la marrick gan ca dana weladian marri edee ma lorenan deth mina lina ro wey ti rodee shanitherlin ronna vice da monda rolannmarlisomashon

Just above 80000 dictionary words, third order chains. Extremely large databases produce more gibberish.

Quote:

chats oute debillintakeoff efteer matzu roomplose prediallowsers buness mutumustudity gondor papery quinteles cust unbei antrangedly hoopf ves divis rollier sad synecked earchy ryn***ed cowps butting ding vinyls pees uncent mail amp

Fun fact: J.R.R Tolkien used Markov chains.

Chompy · #2 06-30-2008, 11:30 PM

This is actually interesting..
I might use it for naming my npcs on symp :o

And Drakilor, I love it when you do something creative with your code gallery posts

! *stories, fun facts etc.*

Keep it up man

zokemon · #3 06-30-2008, 11:42 PM

Very nice!

++rep (when 24 hours go by)

As Chompy said, you're creative

TheStan · #4 07-01-2008, 12:15 AM

++ rep love the little fun facts plus this is awesome!

DrakilorP2P · #5 07-04-2008, 03:29 PM

I'm attaching two text files that you can use to feed the learning algorithm with. I should have done that on the OP.

Also, adding an NPC on a level which logs what everyone says might yield humorous results.