Generating random words in JS

Russian version is here.

Let’s say we want to write a JS script that generates random words (nicknames).

Let’s start with the simplest approach. If we just generate random letters and string them together, the resulting words look unnatural and unsightly. Here are some examples:

  • srjxdq
  • moyssj
  • ywtckmw
  • wjvzw
  • xtwey

etc.

As you can see, this approach does not produce anything that even remotely resembles natural words: the output is just a string of meaningless letters that looks more like a password than a word. In my opinion, two things can make the generated words look more natural:

  1. Avoid more than two vowels or consonants in a row. This part is trivial, so I will not dwell on it here.
  2. Pick the letters of a word at random according to their weights, where the weights are the frequencies of the letters in English. This raises or lowers the chance that a given letter ends up in the generated word, so rarely used letters such as Q, Z and X appear much less often than E, T, A, O and I, which are statistically the most frequent letters in English.
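To make point 2 concrete, here is a minimal sketch of a frequency-weighted letter picker. It assumes approximate, commonly cited English letter frequencies (in percent) as the weights; the table in the original script may use different values, and `randomLetter` is a hypothetical helper name. The selection walks the weights as a running subtraction, which is equivalent to the cumulative-weight method described below.

```javascript
// Assumption: approximate English letter frequencies in percent (commonly cited
// values); the table in the original script may differ.
const LETTER_WEIGHTS = {
  a: 8.2, b: 1.5, c: 2.8, d: 4.3, e: 12.7, f: 2.2, g: 2.0, h: 6.1, i: 7.0,
  j: 0.15, k: 0.77, l: 4.0, m: 2.4, n: 6.7, o: 7.5, p: 1.9, q: 0.095,
  r: 6.0, s: 6.3, t: 9.1, u: 2.8, v: 0.98, w: 2.4, x: 0.15, y: 2.0, z: 0.074,
};

const letters = Object.keys(LETTER_WEIGHTS);
const weights = Object.values(LETTER_WEIGHTS);
const totalWeight = weights.reduce((sum, w) => sum + w, 0);

// Hypothetical helper: returns one letter with probability proportional to its weight.
function randomLetter() {
  let remaining = Math.random() * totalWeight;
  for (let i = 0; i < letters.length; i += 1) {
    remaining -= weights[i];
    if (remaining < 0) {
      return letters[i];
    }
  }
  return letters[letters.length - 1]; // guard against floating-point rounding
}

// A 6-letter word drawn by frequency (point 1, the vowel/consonant rule, is not applied).
const word = Array.from({ length: 6 }, () => randomLetter()).join('');
console.log(word);
```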

Using just these two approaches, we generate much more “natural” words. Here are some examples:

[Screenshot: examples of generated words]

That’s much better. Let’s look at the second point in more detail.

Algorithm for selecting random array elements based on weights in JS

A relatively simple way to implement such an algorithm is to transform the series of numbers s1 (an array of weights, one per element) into a series s2 of running totals obtained by cumulative addition:

$$ s_2[i] = \sum_{k=1}^{i} s_1[k], \qquad i = 1, \ldots, n $$

const items = [ 'a', 'b', 'c' ]; 
const weights = [ 3, 7, 1 ];

Prepare a cumulativeWeights array by cumulative addition; it has the same number of elements as the original weights array. In our case, it looks like this:

cumulativeWeights = [3, 3 + 7, 3 + 7 + 1] = [3, 10, 11]
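The cumulative step on its own, as a minimal sketch (`cumulativeSum` is a hypothetical helper name, not taken from the original script):

```javascript
// The example items and weights from above.
const items = ['a', 'b', 'c'];
const weights = [3, 7, 1];

// Hypothetical helper: each entry is its own weight plus all the weights before it.
function cumulativeSum(values) {
  const result = [];
  let runningTotal = 0;
  for (const value of values) {
    runningTotal += value;
    result.push(runningTotal);
  }
  return result;
}

const cumulativeWeights = cumulativeSum(weights);
console.log(cumulativeWeights); // [ 3, 10, 11 ]
```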

Next, generate a randomNumber between 0 and the highest cumulative weight. In our case, randomNumber will be in the range [0, 11). Let’s say randomNumber = 8.

Loop through the cumulativeWeights array from left to right and find the first element that is greater than or equal to randomNumber. Its index is the index of the item to pick from the items array. With randomNumber = 8, the first such element is 10 at index 1, so we pick 'b'.
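The lookup step in isolation, using the fixed randomNumber = 8 from the example instead of a random one. This is a sketch: `pickIndex` is a hypothetical name, and it returns -1 if randomNumber is larger than every cumulative weight, which cannot happen when randomNumber is drawn from [0, 11).

```javascript
const items = ['a', 'b', 'c'];
const cumulativeWeights = [3, 10, 11];

// Hypothetical helper: index of the first cumulative weight >= randomNumber,
// or -1 if randomNumber exceeds every cumulative weight.
function pickIndex(cumulative, randomNumber) {
  for (let i = 0; i < cumulative.length; i += 1) {
    if (cumulative[i] >= randomNumber) {
      return i;
    }
  }
  return -1;
}

const index = pickIndex(cumulativeWeights, 8);
console.log(index, items[index]); // 1 b
```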

The idea behind this approach is that higher weights will “occupy” more numerical space. Therefore, there is a higher probability that the random number will end up in the “number bucket” with a higher weight.

I’ll try to demonstrate this using the example of my script:

const weights = [3, 7, 1];
const cumulativeWeights = [3, 10, 11];
// Conceptually, cumulativeWeights splits the numbers 1..11 into buckets:
const pseudoCumulativeWeights = [
  1, 2, 3,              // <-- 3 numbers (weight 3)
  4, 5, 6, 7, 8, 9, 10, // <-- 7 numbers (weight 7)
  11,                   // <-- 1 number (weight 1)
];

As you can see, heavier weights occupy more numerical space and therefore have a higher chance of being selected. Each item’s chance is its weight divided by the total weight (11):

  • 'a' (weight 3): 3/11 ≈ 27%
  • 'b' (weight 7): 7/11 ≈ 64%
  • 'c' (weight 1): 1/11 ≈ 9%
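These percentages are easy to check empirically. A minimal sketch, assuming a compact stand-in for the weighted pick (the hypothetical `pick` below uses the same cumulative-weight method with a running total) and an arbitrary sample size of 100,000:

```javascript
const items = ['a', 'b', 'c'];
const weights = [3, 7, 1];

// Hypothetical stand-in: cumulative-weight selection with a running total.
function pick(items, weights) {
  const total = weights.reduce((sum, w) => sum + w, 0);
  const randomNumber = Math.random() * total;
  let cumulative = 0;
  for (let i = 0; i < items.length; i += 1) {
    cumulative += weights[i];
    if (cumulative >= randomNumber) {
      return items[i];
    }
  }
  return items[items.length - 1]; // guard against floating-point rounding
}

// Draw many samples and compare the observed shares with weight / totalWeight.
const SAMPLES = 100000;
const counts = { a: 0, b: 0, c: 0 };
for (let n = 0; n < SAMPLES; n += 1) {
  counts[pick(items, weights)] += 1;
}

for (const item of items) {
  const share = ((counts[item] / SAMPLES) * 100).toFixed(1);
  console.log(`${item}: ${share}%`); // roughly 27%, 64% and 9%
}
```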

Putting it all together, the function looks like this:

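/**
 * Picks a random item; each item's chance is proportional to its weight.
 * @param {Array} items - the items to choose from
 * @param {number[]} weights - non-negative weights, one per item
 */
function weightedRandom(items, weights) {
  if (items.length !== weights.length) {
    throw new Error('Arrays of items and weights must be the same size');
  }

  if (!items.length) {
    throw new Error('Items array must not be empty');
  }

  // Running totals, e.g. [3, 7, 1] -> [3, 10, 11].
  const cumulativeWeights = [];
  for (let i = 0; i < weights.length; i += 1) {
    cumulativeWeights[i] = weights[i] + (cumulativeWeights[i - 1] || 0);
  }

  // The last running total is the sum of all weights.
  const maxCumulativeWeight = cumulativeWeights[cumulativeWeights.length - 1];

  // Random number in the range [0, maxCumulativeWeight).
  const randomNumber = maxCumulativeWeight * Math.random();

  // The first bucket whose upper bound reaches randomNumber wins.
  for (let itemIndex = 0; itemIndex < items.length; itemIndex += 1) {
    if (cumulativeWeights[itemIndex] >= randomNumber) {
      return items[itemIndex];
    }
  }
}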
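When generating words, the same letter weights are reused for every letter, so recomputing cumulativeWeights on each call is wasted work. A hedged variant (not the original script’s code): precompute the cumulative array once and, since it is sorted in ascending order for non-negative weights, find the bucket with a binary search instead of a linear scan. `createWeightedSampler` is a hypothetical name, and the optional `random` parameter exists only so the sampler can be tested deterministically.

```javascript
// Hypothetical helper: precomputes cumulative weights once and returns a sampler.
// Assumes non-negative weights with a positive total.
function createWeightedSampler(items, weights) {
  if (items.length !== weights.length) {
    throw new Error('Arrays of items and weights must be the same size');
  }
  if (!items.length) {
    throw new Error('Items array must not be empty');
  }

  const cumulative = [];
  let runningTotal = 0;
  for (const weight of weights) {
    runningTotal += weight;
    cumulative.push(runningTotal);
  }

  // Binary search for the first index whose cumulative weight is >= target.
  return function sample(random = Math.random) {
    const target = random() * runningTotal;
    let low = 0;
    let high = cumulative.length - 1;
    while (low < high) {
      const mid = (low + high) >> 1;
      if (cumulative[mid] >= target) {
        high = mid;
      } else {
        low = mid + 1;
      }
    }
    return items[low];
  };
}

// Usage: build once, then sample as many times as needed.
const sample = createWeightedSampler(['a', 'b', 'c'], [3, 7, 1]);
console.log(sample()); // 'a', 'b' or 'c', with 'b' most often
```

The linear scan is perfectly fine for three items; the binary search only starts to matter with longer weight tables or very many draws.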

How can the word generation algorithm be even better?

This script is mainly an example of selecting a random array element based on weights, so I did not go deep into linguistics or artificial intelligence algorithms. Even so, it is easy to spot unsightly vowel and consonant combinations that do not occur in real words and look unnatural. Examples:

  • satlenl
  • tohhi
  • tiowh
  • aahepw

etc.

The simplest fix is to limit the number of vowels or consonants in a row. Inside the letter loop, when a counter reaches its limit, the letter is discarded and the same position is tried again:

if (vowelCounter >= maxVowelsInRow) { i -= 1; continue;}

and

if (consonantCounter >= maxConsonantsInRow) { i -= 1; continue;}
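A hedged sketch of how these checks could fit into a generator loop. It follows the same idea as the snippets above (when a limit is hit, discard the letter and retry the same position) but uses a `while` loop instead of `i -= 1`. Everything here is an assumption for illustration: `generateWord` is a hypothetical name, the weight table is a reduced subset of approximate English letter frequencies, and unlike the original script it does not treat digraphs such as “th” as a single letter.

```javascript
const VOWELS = new Set(['a', 'e', 'i', 'o', 'u']);

// Assumption: a reduced weight table (approximate English frequencies, in percent).
const LETTERS = ['e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'u'];
const WEIGHTS = [12.7, 9.1, 8.2, 7.5, 7.0, 6.7, 6.3, 6.1, 6.0, 4.3, 4.0, 2.8];
const TOTAL = WEIGHTS.reduce((sum, w) => sum + w, 0);

// Weighted pick via running subtraction (same idea as cumulative weights).
function randomLetter() {
  let remaining = Math.random() * TOTAL;
  for (let i = 0; i < LETTERS.length; i += 1) {
    remaining -= WEIGHTS[i];
    if (remaining < 0) {
      return LETTERS[i];
    }
  }
  return LETTERS[LETTERS.length - 1];
}

// Hypothetical generator: rejects any letter that would exceed the allowed run length.
// Assumes both limits are at least 1; otherwise no letter is ever accepted.
function generateWord(length, maxVowelsInRow = 2, maxConsonantsInRow = 2) {
  let word = '';
  let vowelCounter = 0;
  let consonantCounter = 0;
  while (word.length < length) {
    const letter = randomLetter();
    if (VOWELS.has(letter)) {
      if (vowelCounter >= maxVowelsInRow) continue; // too many vowels: retry this position
      vowelCounter += 1;
      consonantCounter = 0;
    } else {
      if (consonantCounter >= maxConsonantsInRow) continue; // too many consonants: retry
      consonantCounter += 1;
      vowelCounter = 0;
    }
    word += letter;
  }
  return word;
}

console.log(generateWord(7, 1, 1)); // vowels and consonants strictly alternate
```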

If we set maxConsonantsInRow = 1 and maxVowelsInRow = 1, the generated words will look something like this:

[Screenshot: examples of generated words with one vowel and one consonant in a row]

Note that “th” and “ae” are digraphs here and count as a single letter.

The obvious drawback of this approach is that the generated words become more uniform and much less varied, so there is plenty of room to improve the algorithm.

The full version of the script can be found here: https://github.com/bernd32/nickname-generator


Guide to installing a shadowsocks+v2ray server with traffic obfuscation (Cloudflare) over TLS on Debian 10

This is an English translation of my post; the original Russian version is here.

Just a few words before we get started. You can skip this part if you want.

Introduction

With internet regulation and censorship on the rise, states increasingly engaging in online surveillance, and state cyber-policing capabilities rapidly evolving globally, concerns about regulatory “chilling effects” online (the idea that laws, regulations, or state surveillance can deter people from exercising their freedoms or engaging in legal activities on the internet) have taken on greater urgency and public importance [1]. Today, the most popular way to bypass internet censorship is to use a VPN service. However, VPNs have significant drawbacks, which can be fully or partially overcome by setting up your own shadowsocks server. In this guide I will show you how to do it.

You might reasonably ask: why bother when VPN services already exist? To begin with, here are the pros and cons of a VPN service compared with a shadowsocks server:

Pros of VPN services:

The user does not need any technical knowledge or time-consuming configuration: just install a VPN client and use it. Setting up an SS server, especially with traffic obfuscation, requires skills and knowledge that most users do not have.

Cons:

– VPN services can be slow, including the paid ones. I won’t even mention the free ones, as they are often extremely slow and may not provide adequate privacy protection. ISPs may intentionally throttle the speed of suspicious encrypted traffic originating from VPNs. This issue can be addressed by employing traffic obfuscation through basic TLS encryption, which appears legitimate to your ISP.

– VPNs are not entirely secure. A determined adversary can find you relatively quickly: either the VPN service may hand your information over to the authorities, or your ISP may track you with a so-called “correlation attack”. In such an attack, the ISP compares the IP address used to access certain online content or a restricted website with the IP addresses connected at that time, which can reveal an internet dissident’s real IP address. In this context, SS + v2ray + TLS is a safer option for users living in totalitarian countries such as Russia or China. Incidentally, both the Shadowsocks protocol and v2ray were developed by users in China.

Besides speed and security, circumventing internet censorship and surveillance with SS+v2ray can be completely free! You just need a free virtual server (for example, Oracle Cloud, which has an unlimited trial period) and a free domain (such as a .tk domain from Freenom). However, free services carry some risk: you don’t fully own them, and both the VPS and the domain could be taken away from you at any moment.


Steps to set up your SS server:

– Getting a virtual server (VPS) running on Debian (you can use any distro you want, but in this tutorial I’m using Debian 10)

– Getting a domain (you can go with any domain, Freenom’s .tk for example)

– Signing up on Cloudflare and linking the domain there

– Deploying shadowsocks and a web server on the VPS

– Getting a free SSL certificate and setting up traffic obfuscation

– Setting up a client for Windows/Android/iOS/Linux

Let’s get started.

Getting a virtual server (VPS)

Any inexpensive virtual server provider will do. Oracle Cloud and Microsoft Azure work fine, and they are free too (though with a limit on traffic). There is nothing complicated about getting a virtual server; just make sure that you get a dedicated static IP address and that ports 80, 443 and 22 are open (they usually are by default). You can also pick a suitable VPS from this list: https://bitcoin-vps.com/
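Once the VPS is running, you can check from your own computer whether these ports are reachable. A minimal Node.js sketch, where `checkPort` and the script name check-ports.js are hypothetical; run it as `node check-ports.js <your VPS IP>`. Keep in mind that before the web server is installed nothing listens on ports 80 and 443, so “refused” there usually just means the port is reachable but unused, whereas “timeout” more often points to a firewall silently dropping packets. Some firewalls reject connections instead, so treat the result as a hint rather than proof.

```javascript
const net = require('net');

// Hypothetical helper: tries a TCP connection and reports
// 'open', 'refused', 'timeout' or 'error'.
function checkPort(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (status) => {
      socket.destroy();
      resolve(status); // a promise settles only once, so later calls are ignored
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish('open'));
    socket.once('timeout', () => finish('timeout'));
    socket.on('error', (err) => finish(err.code === 'ECONNREFUSED' ? 'refused' : 'error'));
  });
}

// Usage: node check-ports.js <VPS IP address>
const host = process.argv[2];
if (host) {
  (async () => {
    for (const port of [22, 80, 443]) {
      console.log(`port ${port}: ${await checkPort(host, port)}`);
    }
  })();
}
```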

Getting a domain

Get any domain you like; .tk domains are free (you can get one here: https://www.freenom.com).

Signing up on Cloudflare and linking the domain (adding DNS records)

In this guide I’ll use one of my domains, bernd32.xyz, as an example; replace it with your own. In Cloudflare, we need to create two DNS records pointing to the IP address of the SS server, one for bernd32.xyz and one for www.bernd32.xyz:

1) “A” record with the name “www” and IP address of your VPS

2) “A” record with the name “bernd32.xyz” and IP address of your VPS

Next, click “Continue”. Cloudflare will then give you name servers, which you need to enter in your domain registrar’s control panel. If you got your free .tk domain from Freenom, the control panel page might look something like this:

Wait a few hours for the DNS changes to propagate. In the meantime, go to the Cloudflare Firewall settings and set the Security Level to “Essentially Off”:
