{"id":337,"date":"2023-01-31T14:30:32","date_gmt":"2023-01-31T09:30:32","guid":{"rendered":"https:\/\/bernd32.xyz\/blog\/?p=337"},"modified":"2023-01-31T14:33:55","modified_gmt":"2023-01-31T09:33:55","slug":"generating-random-words-in-js","status":"publish","type":"post","link":"https:\/\/bernd32.xyz\/blog\/?p=337","title":{"rendered":"Generating random words in JS"},"content":{"rendered":"<p>Russian version is <a href=\"https:\/\/bernd32.xyz\/blog\/?p=295\">here<\/a>.<\/p>\n<p>Let&#8217;s say we had an idea to create a JS script that would generate random words (nicknames).<\/p>\n<p>Let&#8217;s start with the simplest approach first. If we just generate random letters and <span class=\"HwtZe\" jsaction=\"mouseup:Sxi9L,BR6jm; mousedown:qjlr0e\" jsname=\"jqKxS\" lang=\"en\"><span jsaction=\"agoMJf:PFBcW;MZfLnc:P7O7bd;nt4Alf:pvnm0e,pfE8Hb,PFBcW;B01qod:dJXsye;H1e5u:iXtTIf;lYIUJf:hij5Wb;bmeZHc:iURhpf;Oxj3Xe:qAKMYb,yaf12d\" jsname=\"txFAF\" class=\"jCAhz ChMk0b\" jscontroller=\"Gn4SMb\"><span class=\"ryNqvb\" jsaction=\"click:E6Tfl,GFf3ac,tMZCfe; contextmenu:Nqw7Te,QP7LD; mouseout:Nqw7Te; mouseover:E6Tfl,c2aHje\" jsname=\"W297wb\">make words using these letters<\/span><\/span><\/span>, they will look unnatural and unsightly. Here are some examples:<\/p>\n<ul dir=\"auto\">\n<li>srjxdq<\/li>\n<li>moyssj<\/li>\n<li>ywtckmw<\/li>\n<li>wjvzw<\/li>\n<li>xtwey<\/li>\n<\/ul>\n<p>etc.<\/p>\n<p>As you can see, this approach does not allow us to generate words that even remotely resemble natural ones &#8211; at the output, we simply generate a set of meaningless letters that looks more like passwords than words. Here are two points how to make the generated random words look more natural (in my opinion):<\/p>\n<ol>\n<li>Avoid occurrences of more than two vowels\/consonants when generating a word. This problem is trivial and I will not consider it.<\/li>\n<li>Pick random letters for a word based on their weight. Weights in this case will be the frequency of letters in English. This way we have to reduce\/increase the chance that a certain letter will end up in our generated word, and rarely used letters such as Q, Z and X will appear in our words much less often than E, T, A, O , I, which are statistically the most frequent in English words.<\/li>\n<\/ol>\n<p><span class=\"HwtZe\" jsaction=\"mouseup:Sxi9L,BR6jm; mousedown:qjlr0e\" jsname=\"jqKxS\" lang=\"en\"><span jsaction=\"agoMJf:PFBcW;MZfLnc:P7O7bd;nt4Alf:pvnm0e,pfE8Hb,PFBcW;B01qod:dJXsye;H1e5u:iXtTIf;lYIUJf:hij5Wb;bmeZHc:iURhpf;Oxj3Xe:qAKMYb,yaf12d\" jsname=\"txFAF\" class=\"jCAhz ChMk0b\" jscontroller=\"Gn4SMb\"><span class=\"ryNqvb\" jsaction=\"click:E6Tfl,GFf3ac,tMZCfe; contextmenu:Nqw7Te,QP7LD; mouseout:Nqw7Te; mouseover:E6Tfl,c2aHje\" jsname=\"W297wb\">Using just these two approaches, we generate much more \u201cnatural\u201d words.<\/span><\/span> Here are some e<span jsaction=\"agoMJf:PFBcW;MZfLnc:P7O7bd;nt4Alf:pvnm0e,pfE8Hb,PFBcW;B01qod:dJXsye;H1e5u:iXtTIf;lYIUJf:hij5Wb;bmeZHc:iURhpf;Oxj3Xe:qAKMYb,yaf12d\" jsname=\"txFAF\" class=\"jCAhz ChMk0b\" jscontroller=\"Gn4SMb\"><span class=\"ryNqvb\" jsaction=\"click:E6Tfl,GFf3ac,tMZCfe; contextmenu:Nqw7Te,QP7LD; mouseout:Nqw7Te; mouseover:E6Tfl,c2aHje\" jsname=\"W297wb\">xamples:<\/span><\/span><\/span><\/p>\n<p><img decoding=\"async\" alt=\"screenshot_0.png\" src=\"https:\/\/github.com\/bernd32\/nickname-generator\/raw\/main\/screenshots\/screenshot_0.png?raw=true\" \/><\/p>\n<p>Now it&#8217;s much better. <span class=\"HwtZe\" jsaction=\"mouseup:Sxi9L,BR6jm; mousedown:qjlr0e\" jsname=\"jqKxS\" lang=\"en\"><span jsaction=\"agoMJf:PFBcW;MZfLnc:P7O7bd;nt4Alf:pvnm0e,pfE8Hb,PFBcW;B01qod:dJXsye;H1e5u:iXtTIf;lYIUJf:hij5Wb;bmeZHc:iURhpf;Oxj3Xe:qAKMYb,yaf12d\" jsname=\"txFAF\" class=\"jCAhz ChMk0b\" jscontroller=\"Gn4SMb\"><span class=\"ryNqvb\" jsaction=\"click:E6Tfl,GFf3ac,tMZCfe; contextmenu:Nqw7Te,QP7LD; mouseout:Nqw7Te; mouseover:E6Tfl,c2aHje\" jsname=\"W297wb\">Let&#8217;s analyze the 2nd point in more detail.<\/span><\/span><\/span><\/p>\n<h3>Algorithm for selecting random array elements based on weights in JS<\/h3>\n<p>A relatively simple implementation of such an algorithm is the transformation of a series of rational numbers s1 (array), which are weights for elements, into a series of numbers s2, which is obtained by cumulative addition of numbers:<\/p>\n<p><img decoding=\"async\" alt=\"equation\" data-canonical-src=\"https:\/\/latex.codecogs.com\/gif.image?%5Cdpi%7B110%7D%5Cbg%7Bwhite%7DS_%7Bn%7D%5CRightarrow&amp;space;S_%7Bcumulative%7D&amp;space;=&amp;space;%5Csum_%7Bi=1%7D%5E%7Bn%7Da_%7Bi%7D+(a_%7Bi-1%7D%5Cvee&amp;space;0)\" src=\"https:\/\/camo.githubusercontent.com\/48505af5b168b04e2cf0bcf0230e079dd717b0a9190334f4968b2cc1a186496b\/68747470733a2f2f6c617465782e636f6465636f67732e636f6d2f6769662e696d6167653f25354364706925374231313025374425354362672537427768697465253744535f2537426e25374425354352696768746172726f772673706163653b535f25374263756d756c61746976652537442673706163653b3d2673706163653b25354373756d5f253742693d312537442535452537426e253744615f253742692537442b28615f253742692d312537442535437665652673706163653b3029\" \/><\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const items = [ 'a', 'b', 'c' ]; \r\nconst weights = [ 3, 7, 1 ];<\/code><\/pre>\n<\/div>\n<p>Prepare an array of weights via cumulative addition (i.e. a cumulativeWeights list that will have the same number of elements as the original weights list of weights). In our case, such an array will look like this:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>cumulativeWeights = [3, 3 + 7, 3 + 7 + 1] = [3, 10, 11]<\/code><\/pre>\n<\/div>\n<p>We now generate a randomNumber from 0 to the highest cumulative weight value. In our case, the random number will be in the range [0..11]. Let&#8217;s say randomNumber = 8.<\/p>\n<p>Loop through the cumulativeWeights array from left to right and select the first element that is greater than or equal to randomNumber. We will use the index of such an element to select an element from an array of elements.<\/p>\n<p>The idea behind this approach is that higher weights will &#8220;occupy&#8221; more numerical space. Therefore, there is a higher probability that the random number will end up in the \u201cnumber bucket\u201d with a higher weight.<\/p>\n<p>I&#8217;ll try to demonstrate this using the example of my script:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const weights = [3, 7, 1 ]; \r\nconst cumulativeWeights = [3, 10, 11]; \r\n\/\/ <span class=\"HwtZe\" jsaction=\"mouseup:Sxi9L,BR6jm; mousedown:qjlr0e\" jsname=\"jqKxS\" lang=\"en\"><span jsaction=\"agoMJf:PFBcW;MZfLnc:P7O7bd;nt4Alf:pvnm0e,pfE8Hb,PFBcW;B01qod:dJXsye;H1e5u:iXtTIf;lYIUJf:hij5Wb;bmeZHc:iURhpf;Oxj3Xe:qAKMYb,yaf12d\" jsname=\"txFAF\" class=\"jCAhz ChMk0b\" jscontroller=\"Gn4SMb\"><span class=\"ryNqvb\" jsaction=\"click:E6Tfl,GFf3ac,tMZCfe; contextmenu:Nqw7Te,QP7LD; mouseout:Nqw7Te; mouseover:E6Tfl,c2aHje\" jsname=\"W297wb\">In a pseudo-view, we can represent cumulativeWeights like this:<\/span><\/span><\/span>\r\nconst pseudoCumulativeWeights = [ 1, 2, 3, \/\/ &lt;-- [3] numbers\r\n4, 5, 6, 7, 8, 9, 10, \/\/ &lt;-- [7] numbers\r\n11, \/\/ &lt;-- [1] number \r\n];<\/code><\/pre>\n<\/div>\n<p>As you can see, heavier weights occupy a higher numerical space and therefore have a higher chance of being randomly selected. The percentage of selection chance for the weights elements will be as follows:<\/p>\n<p>Element 3: \u2248 27%,<\/p>\n<p>Element 7: \u2248 64%,<\/p>\n<p>Element 1: \u2248 9%<\/p>\n<p>In general, the function looks something like this:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code><\/code><\/pre>\n<pre><code data-hcb-clip=\"4\">function weightedRandom(items, weights) {\r\nif (items.length !== weights.length) {\r\nthrow new Error('Arrays of elements and weights must be the same size');\r\n}\r\n\r\nif (!items.length) {\r\nthrow new Error('Array elements must not be empty');\r\n}\r\nconst cumulativeWeights = [];\r\nfor (let i = 0; i &lt; weights.length; i += 1) {\r\ncumulativeWeights[i] = weights[i] + (cumulativeWeights[i - 1] || 0);\r\n}\r\n\r\nconst maxCumulativeWeight = cumulativeWeights[cumulativeWeights.length - 1];\r\n\r\nconst randomNumber = maxCumulativeWeight * Math.random();\r\n\r\nfor (let itemIndex = 0; itemIndex &lt; items.length; itemIndex += 1) {\r\nif (cumulativeWeights[itemIndex] &gt;= randomNumber) {\r\nreturn items[itemIndex];\r\n}\r\n}\r\n}<\/code><\/pre>\n<\/div>\n<h4><span class=\"HwtZe\" jsaction=\"mouseup:Sxi9L,BR6jm; mousedown:qjlr0e\" jsname=\"jqKxS\" lang=\"en\"><span jsaction=\"agoMJf:PFBcW;MZfLnc:P7O7bd;nt4Alf:pvnm0e,pfE8Hb,PFBcW;B01qod:dJXsye;H1e5u:iXtTIf;lYIUJf:hij5Wb;bmeZHc:iURhpf;Oxj3Xe:qAKMYb,yaf12d\" jsname=\"txFAF\" class=\"jCAhz ChMk0b\" jscontroller=\"Gn4SMb\"><span class=\"ryNqvb\" jsaction=\"click:E6Tfl,GFf3ac,tMZCfe; contextmenu:Nqw7Te,QP7LD; mouseout:Nqw7Te; mouseover:E6Tfl,c2aHje\" jsname=\"W297wb\">How can the word generation algorithm be even better?<\/span><\/span><\/span><\/h4>\n<p>This script is more of an example of using an algorithm for selecting a random element of an array based on their weight, so I did not go deep into linguistics and artificial intelligence algorithms. But offhand, unsightly combinations of some vowel and consonant pairs that do not occur in real words and looks unnatural. Examples:<\/p>\n<ul dir=\"auto\">\n<li>satlenl<\/li>\n<li>tohhi<\/li>\n<li>tiowh<\/li>\n<li>aahepw<\/li>\n<\/ul>\n<p>etc.<\/p>\n<p>The simplest solution to this issue is to limit the alternation of more than two vowels \/ consonants in a row:<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>if (vowelCounter &gt;= maxVowelsInRow) { i -= 1; continue;}<\/code><\/pre>\n<\/div>\n<p>and<\/p>\n<div class=\"hcb_wrap\">\n<pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>if (consonantCounter &gt;= maxConsonantsInRow) { i -= 1; continue;}<\/code><\/pre>\n<\/div>\n<p>Let the values maxConsonantsInRow = 1 and maxVowelsInRow = 1, then the generated words will look something like this:<\/p>\n<p><img decoding=\"async\" alt=\"screenshot_1.png\" src=\"https:\/\/github.com\/bernd32\/nickname-generator\/raw\/main\/screenshots\/screenshot_1.png?raw=true\" \/><\/p>\n<p>Note here that &#8220;th&#8221; and &#8220;ae&#8221; are digrams, and count as one letter.<\/p>\n<p>The obvious disadvantage of this approach is that the generated words are more of the same type and with much less variative potential. Therefore, in this problem there is a huge scope for improving the algorithm.<\/p>\n<p>The full version of the script can be found here: <a href=\"https:\/\/github.com\/bernd32\/nickname-generator\">https:\/\/github.com\/bernd32\/nickname-generator<\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Russian version is here. Let&#8217;s say we had an idea to create a JS script that would generate random words (nicknames). Let&#8217;s start with the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[17,16],"class_list":["post-337","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-algorithms","tag-javascript"],"_links":{"self":[{"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=\/wp\/v2\/posts\/337","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=337"}],"version-history":[{"count":5,"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=\/wp\/v2\/posts\/337\/revisions"}],"predecessor-version":[{"id":342,"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=\/wp\/v2\/posts\/337\/revisions\/342"}],"wp:attachment":[{"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=337"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=337"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bernd32.xyz\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=337"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}