⏩ Boost NodeJS Performance with Threads

Vijay Rangan
6 min read · Aug 4, 2023

NodeJS is known for its non-blocking, event-driven architecture, making it a popular choice for building scalable and high-performance applications. However, when it comes to handling computationally intensive tasks or large-scale data processing, the single-threaded nature of NodeJS can become a bottleneck.

Fortunately, NodeJS provides a solution in the form of Worker Threads, which enable developers to harness the power of multi-threading to improve performance and process data more efficiently.

Today I will walk you through how I used worker threads to significantly boost the performance of a real world problem that I solved for one of my clients.


We’ll be generating training data for an ML model for anomaly detection. If you’re familiar with ML at all, you’ll know that a model needs large amounts of data to do its job well.

All the code I’ve used in this post is available at https://github.com/vjrngn/node-worker-threads

Approach

To truly understand the benefits of using threads, we will create two versions of our NodeJS application.

  1. Single-Threaded Version: A straightforward implementation that processes the entire dataset sequentially on the main thread.
  2. Multi-Threaded Version: We will leverage Worker Threads to distribute the workload across multiple threads.

Setup

We’ll generate 1,000,000 (1 million) records of random transaction data — first on a single thread, then with the workload split across multiple threads.

To generate random data, we’ll use the popular Faker library. If you’re following along, paste the following code into a file named random-data-generator.js.

const { faker } = require('@faker-js/faker');

const numTransactions = 1000000;
const numSellers = 1000;
const numCustomers = 10000;

// Generate a set of 1,000 unique sellers (or merchants)
const sellers = Array.from({ length: numSellers }).map(() => {
  return {
    id: faker.string.uuid(),
    accountId: faker.string.uuid()
  };
});

// Generate a set of 10,000 unique customers
const customers = Array.from({ length: numCustomers }).map(() => {
  return {
    id: faker.string.uuid(),
    accountId: faker.string.uuid()
  };
});

// Generate the transactions. The result isn't stored anywhere —
// we only care how long the work takes.
Array.from({ length: numTransactions }).map(() => {
  return {
    id: faker.string.uuid(),
    timestamp: faker.date.past({ years: 1 }).toLocaleDateString(),

    // customer data
    customerId: faker.helpers.arrayElement(customers.map(c => c.id)),
    customerAccountId: faker.helpers.arrayElement(customers.map(c => c.accountId)),

    // seller data
    sellerId: faker.helpers.arrayElement(sellers.map(s => s.id)),
    sellerAccountId: faker.helpers.arrayElement(sellers.map(s => s.accountId)),

    // transaction data
    amount: faker.number.float({ min: 1000, max: 100000 }),
    currencyCode: faker.finance.currencyCode(), // ISO 4217 currency code
    paymentMethod: faker.helpers.arrayElement([
      "CREDIT_CARD",
      "DEBIT_CARD",
      "UPI",
      "BANK_TRANSFER"
    ])
  };
});

🪡 Single-Threaded Approach

This approach is straightforward. Let’s set up the following in index.js. The code simply imports the module we created above.

const startTime = Date.now();
require('./random-data-generator.js');
const endTime = Date.now();

console.log(`That took ${endTime - startTime}ms`);

On my machine, a MacBook Pro M1, the process takes 69,976ms (roughly 70 seconds). Not bad given we’re generating a million records. If this were an API call, though, it would be baaaad. Let’s see how threads can make this better 👇

🤫 On a side note, any module or file that is imported is executed immediately. Running node index.js imports random-data-generator.js and runs the loop we defined.

🧵 Multi-Threaded Approach

Let’s modify our code as follows. In index.js, we’ll bring in the worker_threads module. There’s quite a bit of change, so just come along for the ride for now and I’ll explain how it all fits together shortly.

const { Worker } = require('worker_threads');
const { faker } = require('@faker-js/faker');
const os = require('os');
const path = require('path');

const numTransactions = 1000000;
const numSellers = 1000;
const numCustomers = 10000;

const threads = os.cpus().length; // 1 thread per core

// Generate a set of 1,000 unique sellers (or merchants)
const sellers = Array.from({ length: numSellers }).map(() => {
  return {
    id: faker.string.uuid(),
    accountId: faker.string.uuid()
  };
});

// Generate a set of 10,000 unique customers
const customers = Array.from({ length: numCustomers }).map(() => {
  return {
    id: faker.string.uuid(),
    accountId: faker.string.uuid()
  };
});

let transactions = [];
const startTime = Date.now(); // start the clock once, before spawning any workers

for (let i = 0; i < threads; i++) {
  const worker = new Worker(
    path.resolve(__dirname, 'random-data-generator.js'), {
      workerData: {
        numTransactions: numTransactions / threads,
        sellers,
        customers
      }
    }
  );

  worker.on('message', (data) => {
    transactions = transactions.concat(data);
  });

  worker.on('exit', () => {
    if (transactions.length === numTransactions) {
      const endTime = Date.now();
      console.log(`That took ${endTime - startTime}ms`);
    }
  });
}

We need a few changes in our random-data-generator.js as well.

const { faker } = require('@faker-js/faker');
const { workerData, parentPort } = require('worker_threads');

const { numTransactions, sellers, customers } = workerData;

const transactions = Array.from({ length: numTransactions }).map(() => {
  return {
    id: faker.string.uuid(),
    timestamp: faker.date.past({ years: 1 }).toLocaleDateString(),

    // customer data
    customerId: faker.helpers.arrayElement(customers.map(c => c.id)),
    customerAccountId: faker.helpers.arrayElement(customers.map(c => c.accountId)),

    // seller data
    sellerId: faker.helpers.arrayElement(sellers.map(s => s.id)),
    sellerAccountId: faker.helpers.arrayElement(sellers.map(s => s.accountId)),

    // transaction data
    amount: faker.number.float({ min: 1000, max: 100000 }),
    currencyCode: faker.finance.currencyCode(), // ISO 4217 currency code
    paymentMethod: faker.helpers.arrayElement([
      "CREDIT_CARD",
      "DEBIT_CARD",
      "UPI",
      "BANK_TRANSFER"
    ])
  };
});

parentPort.postMessage(transactions);

Running node index.js with this yields a result of 10,831ms (or 10.8 seconds). That’s nearly 7x faster!

🤔 How it works

Alrighty, let’s dig into the details.

const { Worker } = require('worker_threads'); 

We start by importing the Worker class from the built-in worker_threads module. Inside the loop, we create one Worker instance per thread; the first constructor argument is the path to the script each thread (or worker) will run.

// threads = os.cpus().length
for (let i = 0; i < threads; i++) {
  // ...
  const worker = new Worker(
    path.resolve(__dirname, 'random-data-generator.js'), {
      workerData: {
        numTransactions: numTransactions / threads,
        sellers,
        customers
      }
    }
  );
  // ...
}

In our experiment, we create as many threads as there are cores in our machine — const threads = os.cpus().length . This ensures that we take maximum advantage of all our processors by performing data generation in parallel.

{
  workerData: {
    numTransactions: numTransactions / threads,
    sellers,
    customers
  }
}

As the second argument, we configure each worker to process a subset of the total transactions, specified by numTransactions: numTransactions / threads.
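One caveat worth flagging (my note, not from the original): numTransactions / threads is only a whole number when the core count divides one million evenly. Array.from({ length: 83333.33 }) truncates the length, so on, say, a 12-core machine the workers would produce slightly fewer records in total and the transactions.length === numTransactions check would never fire. A safe split hands the remainder to the last worker:

```javascript
// Split `total` items across `workers`, giving the remainder to the last one.
function splitWork(total, workers) {
  const base = Math.floor(total / workers);
  const chunks = Array.from({ length: workers }, () => base);
  chunks[workers - 1] += total - base * workers;
  return chunks;
}

const chunks = splitWork(1000000, 12);
console.log(chunks.reduce((a, b) => a + b, 0)); // 1000000
```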

We want all our transactions to be between the same set of customers and sellers, so we move the fake customer and seller data to our main thread and pass them in as data to the worker. Next, we access this data in our worker script random-data-generator.js like so:

const { workerData, parentPort } = require('worker_threads');

const { numTransactions, sellers, customers } = workerData;

// ...

workerData can be used in worker threads and gives us access to the data we make available from the main thread.

NodeJS threads use messages to communicate with each other and with the main thread. parentPort allows a worker thread to send messages to the main thread using the postMessage method. Notice the last line of random-data-generator.js:

parentPort.postMessage(transactions);

This call sends the generated transactions from each worker back to the parent.

Note: When threads share memory, you may want to use a SharedArrayBuffer (together with Atomics) to ensure threads don’t step on each other’s toes and corrupt data. Our example passes data by message instead, so I’ve chosen not to use it.
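For completeness, here is a tiny sketch of that idea (single-threaded here for brevity; in real use you would pass the buffer to workers via workerData): a SharedArrayBuffer is visible to every thread that holds a reference to it, and Atomics makes updates race-free:

```javascript
const sab = new SharedArrayBuffer(4); // room for one Int32
const counter = new Int32Array(sab);

// In a worker, a plain `counter[0]++` could race with other threads;
// Atomics.add performs the read-modify-write atomically.
Atomics.add(counter, 0, 1);
Atomics.add(counter, 0, 1);

console.log(Atomics.load(counter, 0)); // 2
```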

If you’re a visual learner like me, you’ll probably appreciate the image below, which visualises the multi-threaded approach.

Visualisation of delegating work to threads in NodeJS

Conclusion

We saw how using threads to off-load CPU intensive tasks can significantly help speed up our application. NodeJS threads are great for these types of applications.

They are not a good candidate for I/O-bound tasks (file and network access), since NodeJS’s event loop already handles those efficiently without extra threads.

In a future post, I will talk about NodeJS’s cluster module that allows us to horizontally scale our web applications to all available cores and maximise resource utilisation.

👉 GitHub Repo: https://github.com/vjrngn/node-worker-threads

If you liked this article and found it useful, don’t be shy… show some love with a 👏

Until next time…
