The numerics library contains plenty of useful functionality to generate random and pseudo-random data. This can be useful when creating sample databases, data for data visualizations, text for prototypes and whatnot. Here is an overview of the data generators you can find in the numerics library.

Lorem Ipsum data generator

The Lipsum generator outputs an arbitrary piece of pseudo-language text, somewhat like this:

“Lorem ipsum dolor sit amet sadipscing et sit rebum dolores consequat qui eros aliquyam illum. Tation eum vero duo nonumy ipsum takimata dolor dolor minim dolor sed amet et labore takimata ex. Kasd et aliquyam sed quod sadipscing amet dolor eum sed du is elitr sadipscing consectetuer iriure. Gubergren dolore ipsum esse labore praesent sadipscing lorem euismod et ut duo magna nobis zzril. Ipsum gubergren dolor est blandit invidunt lorem et nibh et et duo gubergren diam amet. At dolor lorem lorem do lor lorem aliquyam feugiat duo est volutpat nibh tempor duo consetetur consequat aliquyam euismod. Euismod at minim takimata lorem et wisi justo veri nonummy ipsum eirmod tempor et eum. At eirmod kasd amet magna sea sanctus rebum diam quis amet et se lam adipiscing consetetur. Takimata illum kasd voluptua no vel amet lorem dolor consetetur molestie ut et sed at. Nam est rebum tempor diam et eos magna et aliquip.”

By default, the generator produces 25 words:

DataGenerator.RandomLipsum()

The optional parameter specifies how many words the text should contain:

DataGenerator.RandomLipsum(numWords: 145)

Note that the generator does not produce paragraph breaks. You can, however, easily create paragraphs by calling the method in a loop in your own application.
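The loop mentioned above could look roughly like the following sketch. The helper class LipsumParagraphs and its Build method are hypothetical names introduced here for illustration; they are not part of the library. The generator is passed in as a delegate so that the sketch stays self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the library.
public static class LipsumParagraphs
{
    // Calls the supplied generator once per paragraph and joins the
    // results with a blank line between them.
    // `generate` stands in for the library call, for example
    //     n => DataGenerator.RandomLipsum(numWords: n)
    public static string Build(Func<int, string> generate, int paragraphCount, int wordsPerParagraph)
    {
        var paragraphs = new List<string>();
        for (int i = 0; i < paragraphCount; i++)
        {
            paragraphs.Add(generate(wordsPerParagraph));
        }
        return string.Join(Environment.NewLine + Environment.NewLine, paragraphs);
    }
}
```

Assuming the RandomLipsum signature shown earlier, a call such as LipsumParagraphs.Build(n => DataGenerator.RandomLipsum(numWords: n), 3, 60) would then give three paragraphs of 60 words each.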

References

See Wikipedia on the famous and widely used Lorem Ipsum text.

DataStore, uniqueness and custom data sets

Custom data set for data generation

All data generation is based on a base set defined in the DataStore. If you wish to use your own set of names (e.g. for localization or a particular business context), you can access the sets as public static values. For example:

DataStore.EnglishMaleName = new[]{"John", "Peter", "David"};
DataGenerator.ResetMarkovChains();

This changes the male name set, and the reset ensures that the data is re-initialized. The ResetMarkovChains method drops the cached values and is essential if you use your own data sets.

On uniqueness of the generated items

By default, the Markov chain class ensures uniqueness across the generated sets by retaining a cache of used items. This means that if your custom data set is too small relative to the number of items you wish to generate, the chain will fail to find new variations and will get stuck in an endless loop. For this reason the MarkovNameGenerator class has a property called EnsureUniqueness, which can be set to false to bypass the check. Small data sets can then generate large numbers of items, but uniqueness is no longer guaranteed.

Random person data

To generate a set of 100 full names with middle initials you would use:

DataGenerator.RandomPersonNameCollection(PersonDataType.FullNameWithMiddleInitial, 100);

The PersonDataType offers the following options:

  • MaleFirstName: the first name of a male person
  • FemaleFirstName: the first name of a female person
  • FirstName: the first name of a male or female person
  • FamilyName: a family name
  • FullName: a male or female first name followed by a family name
  • FullNameWithMiddleInitial: same as FullName but with a middle initial between the first name and family name

A typical batch with the FullNameWithMiddleInitial would produce something like:

{Zacheatha L. Elson, Tiago X. Pepall, Chasia U. Rtinez, Nashe F. Zalez, Rianai O. Orriso, Rissac Y. Llingto, Ustian J. Brown, Makayla V. Green, Ouise T. Brera, Lilanes K. Ramirez, Roder C. Valji, Kenda N. Rnanayake, Crain S. Alker, Destony M. Arrott, Tinia V. Angels, Eagan C. Orres, Stino P. Omvillin, Kiann X. Selvaratnam, Orinee U. Lliams, Miniqueli H. Homas, Jagger M. Ewartin, Utumn O. Ensdales, Ashton G. Scott, Ytony X. Rrison, Arian G. Ilson, Brysta M. Herby, Emanuel V. Moore, Dolynn O. Ukins, Endale A. Arker, Hector K. Tchelson, Emiano F. Adhyad, Daria Q. Illips, Elynn A. Artine, Ordyn G. Llinson, Akota C. Aylore, Palond D. Young, Jaden Z. Millin, Arlton D. Aller, Conra M. Wilson, Ylerm T. Edeson, Ennen X. Ashingbert, Chelesli Z. Lington, Ustus U. Hompso, Ebeca A. Pperez, Celynn N. Feuvre, Anabet I. Davis, Emiah B. Artinez, Llyssan Q. Dhyad, Metric O. Hayes, Lilan X. Urner, Ystalia N. Lvaratnam, Ydenz G. Helly, Thenry W. Deson, Alinee Q. Witte, Jacke A. Mitchez, Ximothy X. Lstridge, Annabetha E. Llingber, Argan B. Millen, Renna Z. Perdu, Ydenny N. Mckiddie, Aydenz B. Denial, Arrio U. Llips, Nnethan E. Bryant, Itney F. Garcia, Eshaun X. Parker, Gustyn I. Thompson, Riannah S. Gauge, Llian L. Martin, Leigh W. Jones, Ielleen Z. Rsnelson, Asonn U. Virji, Nkierce K. Khambaita, Amustu P. Larke, Imond S. Martinez, Ilipe G. Llingbe, Tephery O. Wardli, Nnethe S. Mccalman, Attien C. Gelson, Eonard W. Larker, Lvatonel P. Butler, Nathenriq Z. Arrison, Lvince J. Hnson, Elyssa I. Amirez, Elisa R. Nnett, Erick A. White, Onaldon F. Philling, Isiah V. Ckson, Harley U. Simmons, Aristinee K. Wright, Ricento A. Coooks, Ainey O. Eatherby, Gunnar L. Davie, Livian B. Harris, Ngelo O. Henderson, Marileen M. Walker, Freden L. Weild, Eesen P. Warter, Lania P. Ennett, Esonnon K. Elshel, Aliya N. Wrigue,}

The default call to this method returns a batch of 15 person names with the FullNameWithMiddleInitial option.

You can also generate single names using

DataGenerator.RandomPersonName();

The method takes an optional PersonDataType parameter, which defaults to FullNameWithMiddleInitial.

Stochastic numbers

Because many of the statistical distributions depend on stochastic generators, this code is documented in the Statistics section.

Random Addresses

The random address generator is in fact a combination of the following separate generator methods:

  • RandomCompanyName: generates a random company name
  • RandomStreetName: generates a random address line (Street plus optionally a house number)
  • RandomStateName: generates a random state name
  • RandomCountryName: generates a random country name
  • RandomZipCode: generates a random zip code

The DataGenerator.RandomAddress() method has a flags enum parameter which allows you to combine different options. The default gives a typical batch like the one below:

Xenom Insurances
School Lane 55
J1Z 4N1 Brighton
Gloucestershire, Tanzania, United Republic of

Oraculepa
Manor Road 496
S3 6ZR Romford
Nova Scotia, Poland

Mephos+
Park Lane 17
Z2A 8Z7 Worcester
Saskatchewan, Greenland

Getaur Software
Church Road 235
09757 Auburn Hills
NS, Estonia

Random document titles

This is yet another Markov chain generator, which produces random titles. A typical batch looks like this:

  • A micro-scale using ipad technology.
  • Diverging trends in computer games.
  • Articificial intelligence of pseudo-variant systems in fluid motion of blue gas.
  • Algorithmic beauty of javascript development.
  • Invariant systems in computing using Sharepoint.
  • Google search infrastructure at low altitude.
  • Minor changes in fluid motion graphics in computer games.
  • Articificial intelligence of large data at extreme height.
  • Personalization and searching mechanisms.
  • Marketing of pseudo-variant systems .
  • Western Europe at low temperature.
  • Abstract patterns of large data sets.
  • Data transformation challenges using ipad technology.
  • Global economy on linux.
  • Diverging trends in western europe.
  • Abstract patterns of clinical trials in fluid motion of metallic structures through middle-management in relation to develop a simple solutions for SOA architecture.
  • Sequence analysis using ipad technology.
  • Global economy on a simple appliance using modern algebraic approaches.
  • Graph databases for complex problems.

You can generate a list of 20 titles using:

var list = DataGenerator.RandomTitles(20);

Note that, as with other items generated using a Markov chain, you can alter the DataStore (in this case the DataStore.DocumentTitleSample collection), where a base set of document titles is defined. See “DataStore, uniqueness and custom data sets” for more information.

File extensions

You can generate a random file extension (or a collection of extensions) using

DataGenerator.RandomFileExtension()

A typical batch will generate something like

  • eml
  • maq
  • api
  • xml
  • pptx
  • c
  • pl
  • rpmsg
  • csv
  • bak

Note that the ‘.’ is not added and that the extensions are in lower case. If necessary you can capitalize the first letter using the TextExtensions.Capitalize method.

The generator has an optional parameter which specifies the data set to use when sampling:

  • CommonExtensions: various well-known file extensions (pdf, txt, png and so on)
  • OfficeExtension: typical set of Microsoft Office (2010 or above) file extensions (docx, pptx and so on)

The default will sample extensions from both collections.

An additional utility method is the GetFileExtensionDescription method which attempts to fetch the description (associated application or owner) of the extension. Using this you can get the following type of information:

gif: Graphics Interchange Format that supports animation. Created by CompuServe and used primarily for web use.

htm: Hyper Text Markup. This markup language is used for web design.

shtml: HTML file that supports Server Side Includes (SSI).

pcl: Printer Control Language file. PCL is a Page Description Language developed by HP.

rm: RealAudio video file.

Note that, as with other generated items, you can alter the DataStore (in this case the DataStore.CommonExtensions and DataStore.OfficeExtensions collections), where the base sets of file extensions are defined. See “DataStore, uniqueness and custom data sets” for more information.

Markov chain generator

There are two types of Markov generators in RadMath:

  • MarkovNameGenerator: A generator which takes a list of strings and combines them into new ones
  • MarkovTextGenerator: A generator which takes a portion of text and outputs another text which in style and punctuation is similar to the given one.

Both generators can be tuned in accuracy and variation by feeding them with a bigger dataset and by allowing more memory consumption, which allows a bigger pool of variations.

For a readable explanation of how Markov chain generators function, see this article by Jeff Atwood. For more thorough pointers, see the article in Wikipedia.

Custom usage of the MarkovNameGenerator

If you wish to use the MarkovNameGenerator for a custom set of strings, you need to proceed as follows:

  • Define a sample set of strings from which the generator will collect portions according to the accuracy (called the order). The strings are expected to have some affinity (say medical terms, legal terms and so on), although technically this is of no importance.
  • Instantiate the generator by specifying:
    • The sample or set. The larger the set, the more variations are possible.
    • The order: the length of the strings which will be taken from the pool (samples). The higher the order, the more resemblance there will be with the original set. The default value is 3.
    • The minimum length of the strings. The default is 5.
  • Read the NextName property, which returns a new string or combination.

Note that there is an EnsureUniqueness property (which by default is true); see “DataStore, uniqueness and custom data sets” for more information.
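To make the roles of the sample set, the order, the minimum length and the uniqueness check more concrete, here is a minimal, self-contained sketch of a character-based Markov name generator. It is not the library's MarkovNameGenerator: the class name SimpleMarkovNames, the seed parameter, the length cap and the maxAttempts guard are assumptions made for this illustration, and the real implementation may work differently. Unlike the behaviour described earlier, the sketch throws after a fixed number of attempts instead of looping forever when no new variation can be found.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative sketch only; all names here are hypothetical.
public sealed class SimpleMarkovNames
{
    private const char End = '\0';   // marks the end of a sample string
    private const int MaxLength = 40; // safety cap on the generated length
    private readonly Dictionary<string, List<char>> transitions = new Dictionary<string, List<char>>();
    private readonly List<string> starts = new List<string>();
    private readonly HashSet<string> used = new HashSet<string>();
    private readonly Random random;
    private readonly int order;
    private readonly int minLength;

    public bool EnsureUniqueness { get; set; } = true;

    public SimpleMarkovNames(IEnumerable<string> samples, int order = 3, int minLength = 5, int? seed = null)
    {
        this.order = order;
        this.minLength = minLength;
        random = seed.HasValue ? new Random(seed.Value) : new Random();

        foreach (var raw in samples)
        {
            var sample = raw.Trim();
            if (sample.Length < order) continue; // too short to provide a window
            starts.Add(sample.Substring(0, order));
            // Record which character (or End) follows each window of `order` characters.
            for (int i = 0; i + order <= sample.Length; i++)
            {
                var key = sample.Substring(i, order);
                var next = i + order < sample.Length ? sample[i + order] : End;
                if (!transitions.TryGetValue(key, out var followers))
                {
                    followers = new List<char>();
                    transitions[key] = followers;
                }
                followers.Add(next);
            }
        }
        if (starts.Count == 0) throw new ArgumentException("No sample is at least 'order' characters long.");
    }

    public string Next(int maxAttempts = 10000)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            // Begin with the opening window of a random sample, then extend it
            // one character at a time based on the last `order` characters.
            var name = new StringBuilder(starts[random.Next(starts.Count)]);
            while (name.Length < MaxLength)
            {
                var key = name.ToString(name.Length - order, order);
                var followers = transitions[key];
                var next = followers[random.Next(followers.Count)];
                if (next == End) break;
                name.Append(next);
            }

            var result = name.ToString();
            if (result.Length < minLength) continue;               // too short, try again
            if (EnsureUniqueness && !used.Add(result)) continue;   // already produced, try again
            return result;
        }
        throw new InvalidOperationException("No new name found; the sample set may be too small.");
    }
}
```

With a tiny sample set such as {"John", "Peter", "David"} and order 3, only the three original names can ever be produced, so a fourth call with EnsureUniqueness left at true exhausts its attempts. This is the situation in which setting the flag to false becomes necessary.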

Custom usage of the MarkovTextGenerator

The text generator works analogously to the name generator but does not keep a cache of the sample. This means that you do not need to instantiate the MarkovTextGenerator and can use the static Generate method:

MarkovTextGenerator.Generate(sample)

where ‘sample’ is some portion of text. Optional parameters are:

  • The size of the text to be generated. The larger the size the bigger the sample should be in order to have some uniqueness and variety.
  • The order or window of the sampling. The bigger the window, the more realistic the text will seem and the fewer aberrations or oddities it will contain.
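The effect of the size and order parameters can be seen in the following minimal sketch of a Markov text generator. It is an illustration under stated assumptions, not the library's code: it works on whole words (the library may well use characters or another unit), and the class name SimpleMarkovText, the default values and the seed parameter are hypothetical. The size is assumed to be a positive number of words.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; all names and defaults here are hypothetical.
public static class SimpleMarkovText
{
    public static string Generate(string sample, int size = 50, int order = 2, int? seed = null)
    {
        var random = seed.HasValue ? new Random(seed.Value) : new Random();
        var words = sample.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length <= order) return string.Join(" ", words); // sample too small to build a chain

        // Map every run of `order` consecutive words to the words that follow it in the sample.
        var followers = new Dictionary<string, List<string>>();
        for (int i = 0; i + order < words.Length; i++)
        {
            var key = string.Join(" ", words, i, order);
            if (!followers.TryGetValue(key, out var list))
            {
                list = new List<string>();
                followers[key] = list;
            }
            list.Add(words[i + order]);
        }

        // Seed the output with a random window from the sample.
        int start = random.Next(words.Length - order);
        var output = new List<string>();
        for (int i = 0; i < order; i++) output.Add(words[start + i]);

        while (output.Count < size)
        {
            var key = string.Join(" ", output.GetRange(output.Count - order, order));
            if (!followers.TryGetValue(key, out var options))
            {
                // Dead end (the window only occurs at the very end of the sample):
                // continue from another random window.
                start = random.Next(words.Length - order);
                for (int i = 0; i < order && output.Count < size; i++) output.Add(words[start + i]);
                continue;
            }
            output.Add(options[random.Next(options.Count)]);
        }

        // Trim in case `size` is smaller than the seeding window.
        return string.Join(" ", output.GetRange(0, Math.Min(size, output.Count)));
    }
}
```

With a small sample and a large order, almost every window has only one possible follower, so the output mostly reproduces the sample verbatim. This is why a larger requested size calls for a larger sample.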

For your convenience, several texts have been added to the library from which text can be generated:

  • DataStore.BulgarianSample: A Bulgarian text from the ‘Epical Songs’ by Pencho Slaveykov.
  • DataStore.BiologySample: A biology sample from ‘On the Origin of Species’ by Charles Darwin.
  • DataStore.LatinSample: A Latin sample from the ‘Principia Mathematica’ by Isaac Newton.
  • DataStore.SpanishSample: A Spanish sample from ‘Los cuatro jinetes del apocalipsis’ by Vicente Blasco Ibáñez.
  • DataStore.PhilosophySample: A philosophy sample from ‘A treatise of human nature’ by David Hume.
  • DataStore.English1Sample: A sample from ‘The Iliad’ by Homer.
  • DataStore.English2Sample: A sample from ‘The Hound of the Baskervilles’ by Sir Arthur Conan Doyle.

These texts were taken from the freely available texts of Project Gutenberg.

The DataGenerator contains a wrapper method RandomTextVariation which achieves the same result and is a simple alias for the MarkovTextGenerator. The overload with a TextSamples enumeration parameter allows you to easily access the aforementioned text samples, for example

DataGenerator.RandomTextVariation(TextSamples.Bulgarian,1500);

This generates a random Bulgarian text (a variation on the sample) of 1500 words, which could look something like this:

“и се промъкнах,

видях ива, видях кърви… И не сетих как измъкнах

остро ножче из сърце му и в прегръдки си обвих го…

Нек’ сега ни се нарадват, мене майка, нему татко:

мъртви ние пак се любим и смъртта за нас е сладка!

не в черковний двор зариха на любовта двете жъртви –

тамо ровят само тия, дето истински са мъртви –

а погребаха ни тука, на брегът край таз долина…

Той израстна кичест явор, а до него аз калина; –

той ме е прегърнал с клони, аз съм в него вейки свряла,

за сърцата що се любят и смъртта не е раздяла.

Дълго аз стоях и слушах, там под сянката унесен,

и това що чух, изпях го в тази моя тъжна песен.

…”

Alternatively, a typical output using

MarkovTextGenerator.Generate(DataStore.English2Sample)

would give:

“Glanced at me from between swollen lids. It was she, then, who wept in the night, and if she did so her husband must know it. Yet he had taken the obvious risk of discovery in declaring that it was not so. Why had he done this? and why did she weep so bitterly? already round this pale-faced, handsome, black-bearded man there was gathering an atmosphere of mystery and of gloom. It was he who had been the first thing to do was to see the grimpen postmaster and find whether the test telegram had really been placed in barrymore’s own hands. Be the answer what it might, i should at least have something to report to Sherlock Holmes.”

Random strings and letters

The DataGenerator contains two overloaded methods related to random letters and strings:

  • RandomString (length, type) which generates a random string of a certain length using the character set defined by the type:
    • UpperCaseLetters: upper case letters
    • LowerCaseLetters: lower case letters
    • Numbers: numbers
    • French: characters like ‘è’ and ‘ç’
    • Special: diverse characters like ‘$’, ‘§’ and so on
    • Brackets: the brackets ‘[‘, ‘{‘, ‘]’ and so on
  • RandomString(length, input) which generates a random string of a certain length by sampling from the given input string.

If you wish to generate a single random letter, you could simply call the method with a length of one, but there is also an alias for this called RandomLetter, which likewise has an optional character type parameter.

For example, if you execute the following

DataGenerator.RandomString(10, CharType.UpperCaseLetters);
DataGenerator.RandomString(15, CharType.UpperCaseLetters | CharType.LowerCaseLetters);
DataGenerator.RandomString(25, CharType.Numbers | CharType.Special);
DataGenerator.RandomString(17, CharType.French | CharType.Brackets);
DataGenerator.RandomString(48);

This returns something like the following:

PHQCDKROFX
KwUDNmctduRBIcv
c{ùwku;bcµl>µ9cv£v§£°my15
(à[{[)èéèè]{èé[à}
hWH}[ey0vçq)O°Wm*]<TQoXFµ9pXw8bJx°°Nh]àBMaI£èkBY
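The idea of combining character sets through flags and then sampling from the resulting pool can be sketched as follows. This is not the library's implementation: the CharPool enum is a hypothetical, reduced stand-in for CharType, and the RandomStrings class with its two methods exists only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical, reduced stand-in for the library's CharType flags.
[Flags]
public enum CharPool
{
    UpperCase = 1,
    LowerCase = 2,
    Digits = 4
}

// Illustrative sketch only; not part of the library.
public static class RandomStrings
{
    // Samples `length` characters (with repetition) from the given pool string.
    public static string FromPool(int length, string pool, Random random)
    {
        if (string.IsNullOrEmpty(pool))
            throw new ArgumentException("The pool must contain at least one character.", nameof(pool));

        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            builder.Append(pool[random.Next(pool.Length)]);
        }
        return builder.ToString();
    }

    // Builds the pool from the selected flags, then samples from it.
    public static string FromFlags(int length, CharPool pools, Random random)
    {
        var pool = new StringBuilder();
        if (pools.HasFlag(CharPool.UpperCase)) pool.Append("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
        if (pools.HasFlag(CharPool.LowerCase)) pool.Append("abcdefghijklmnopqrstuvwxyz");
        if (pools.HasFlag(CharPool.Digits)) pool.Append("0123456789");
        return FromPool(length, pool.ToString(), random);
    }
}
```

In this sketch, RandomStrings.FromFlags(15, CharPool.UpperCase | CharPool.Digits, new Random()) plays the role of the flag-based overload, while FromPool corresponds to the overload that samples from a given input string.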

Random dates and times

The .NET Framework has no dedicated ‘time’ data type; instead it offers TimeSpan and DateTime, the latter of which includes a time-of-day component. The DataGenerator therefore has two methods related to this:

  • RandomDate(begin, end): generates a random date (and random time) within the specified interval. There is also an overload which does not take any argument and will return a random date within the full span of the DateTime data type.
  • RandomTimeSpan(begin, end): generates a random time span which, if added to the begin date, will not exceed the specified end date.
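One way to obtain both behaviours is to pick a random number of ticks within the interval, as in the sketch below. The RandomDates class and its method names are hypothetical and only illustrate the idea; the library's own implementation is not shown in this document and may differ.

```csharp
using System;

// Illustrative sketch only; not part of the library.
public static class RandomDates
{
    // Returns a date (with time) that is not earlier than `begin` and not later than `end`.
    public static DateTime Between(DateTime begin, DateTime end, Random random)
    {
        if (end < begin) throw new ArgumentException("end must not precede begin.");
        long rangeTicks = (end - begin).Ticks;
        long offset = (long)(random.NextDouble() * rangeTicks);
        return begin.AddTicks(offset);
    }

    // Returns a non-negative time span that, added to `begin`, does not exceed `end`.
    public static TimeSpan SpanWithin(DateTime begin, DateTime end, Random random)
    {
        if (end < begin) throw new ArgumentException("end must not precede begin.");
        long rangeTicks = (end - begin).Ticks;
        return TimeSpan.FromTicks((long)(random.NextDouble() * rangeTicks));
    }
}
```

Because Random.NextDouble returns a value of at least 0.0 and less than 1.0, the offset never exceeds the length of the interval, which is what keeps both results inside the requested range.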