Name Generation
I've looked into the names used by different cultures in and around Prax. The intention is that players should be able to get an idea of a character's culture from his or her name. There are two main ways of generating such names: the syllable-merging approach and the n-gram model approach. I've written a little program to do both.

Syllable Merging

Michael Harvey developed a program that creates random words and names from a set of syllables (the original zip file is available). I translated the program to Delphi to give it a nicer user interface; hopefully it's intuitive to use!

[ Zipped executable (runs under Win3.1 and later) | Source | How it works (Michael's original document) ]

Here are some example name element files (the raw outputs need judgement before use):
N-gram model approach

The other way to do this is to build a 'model' of the language. The idea is that the preceding few letters in a word determine what the next letter could be. Let's say we're looking at bigrams, sequences of two letters (n = 2). If we take all the words in the language sample we've got, we can list all the bigrams that occur in those words. We can also list, for each bigram, the letters that come after it, and we record 'end of word' as a possible successor for a bigram. We end up with a list of all the bigrams in the language sample, how frequent they are, and which letters follow them. We also keep a list of the initial bigrams, so we know how words are allowed to start. This is our model of the language.

To generate new words, we pick a random starting bigram from the list of initial bigrams. This gives us the first two letters of our word. We then look up that bigram in our main list of bigrams, which gives us a list of letters that can follow it. We pick one of those at random, and that gives us the third letter of our word. We then take the bigram formed by the second and third letters and look it up in the list of bigrams; from this, we generate the fourth letter. We then use the third and fourth letters to generate the fifth, and so on until we choose an 'end of word' marker.

Using larger values for n means that the generated words conform more closely to the words in the language sample, but there is a tendency to recycle the existing words if the sample is small. I find that using trigrams (n = 3) works well when there are a few hundred words.
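The procedure described above can be sketched in a few lines of Python. This is only an illustration and not the Delphi program itself: the function names, the `$` end-of-word marker, the `max_len` safety cap and the sample words are all made up for the example. Successor letters are stored once per occurrence, so a random pick is weighted by how often each letter followed the n-gram in the sample; that weighting is an assumption about how the frequencies are used.

```python
import random
from collections import defaultdict

END = "$"  # hypothetical end-of-word marker; must not occur in the sample words


def build_model(words, n=2):
    """Return (initial n-grams, successors of every n-gram) for the sample."""
    starts = []                     # how words are allowed to start
    followers = defaultdict(list)   # n-gram -> letters seen after it
    for word in words:
        word = word.lower()
        if len(word) < n:
            continue                # too short to contain a single n-gram
        starts.append(word[:n])
        padded = word + END
        for i in range(len(word) - n + 1):
            # Repeated entries keep the frequency information.
            followers[padded[i:i + n]].append(padded[i + n])
    return starts, followers


def generate(starts, followers, n=2, rng=random, max_len=30):
    """Build one word letter by letter until the end marker is drawn."""
    word = rng.choice(starts)
    while len(word) < max_len:      # assumed cap, in case the model loops
        nxt = rng.choice(followers[word[-n:]])
        if nxt == END:
            break
        word += nxt
    return word


if __name__ == "__main__":
    # Placeholder sample, not real Praxian or Tsolyani names.
    sample = ["karas", "karan", "maran", "sarak", "tamar"]
    starts, followers = build_model(sample, n=2)
    print([generate(starts, followers, n=2) for _ in range(5)])
```

Passing `n=3` to both functions switches the sketch to trigrams. With a sample as small as the placeholder one, most outputs will simply be words from the sample, which is the recycling effect mentioned above.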
I've mainly used this program to generate words for Tekumel-based games (chiefly Tsolyani ones). The word list I use is based on one posted to the Tekumel mailing list and on name lists at the Tekumel website. Here are some sample words.
Read Issaries's IP statements.
This page maintained by Neil Smith (webmaster@wimp.freeuk.com)