Data processing using JavaScript built-in methods

Introduction

Building a UI on the client side with JavaScript usually involves retrieving JSON data from a server through REST APIs and processing it. Similarly, a Node.js application may retrieve data from a data source and process it. For small-scale data processing, JavaScript has very good built-in methods, which are fun to work with. This article takes an example data set and processes it to gain insight into the data.

Arrow functions

Although all the data processing methods mentioned in this post work with regular callback functions, arrow functions, introduced in ES6, are well suited to this. This section provides a quick introduction to arrow functions. A detailed overview is available at MDN.

Arrow functions have a shorter syntax than regular functions. An example is given below. The callback function for map() is a regular function at line 8 and an arrow function at lines 15 and 20. An arrow function has three parts:

  • Parameters: The first part is the parameter list, (name) in this example. Multiple parameters are comma separated. If there is only one parameter, the parentheses can be omitted. If there are no parameters, empty parentheses must be specified.
  • Arrow: There must be no line break between the parameters and the arrow.
  • Body: If the body is a single expression whose value is returned, it can be written without curly braces and without the return keyword. This is called a concise body, as in line 15. Otherwise, curly braces enclose the body and an explicit return statement is needed, which is called a block body, as in line 20.

arrarjs-snippet
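The three parts can also be seen in a minimal sketch. The names array and the helper names below are made up for illustration and are not taken from the original snippet.

```javascript
// Hypothetical sample values, for illustration only.
const names = ['Roger', 'Rafael', 'Novak'];

// Regular function as the callback for map().
const lengthsRegular = names.map(function (name) {
  return name.length;
});

// Arrow function with a concise body: the expression is the return value.
const lengthsConcise = names.map(name => name.length);

// Arrow function with a block body: an explicit return is required.
const lengthsBlock = names.map((name) => {
  const trimmed = name.trim();
  return trimmed.length;
});

// No parameters: empty parentheses are required.
const greet = () => 'hello';

// Multiple parameters: comma separated, inside parentheses.
const fullName = (first, last) => `${first} ${last}`;

console.log(lengthsRegular, lengthsConcise, lengthsBlock);
console.log(greet(), fullName('Roger', 'Federer'));
```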

Other properties of Arrow functions are:

  • Arrow functions do not create their own this; this resolves to the enclosing lexical context.
  • The arguments object is not available.
  • They are not recommended as object methods; they are best suited for non-method functions such as callbacks.
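The first two properties can be illustrated with a small sketch. The counter object and the sum function are hypothetical names used only for this example.

```javascript
const counter = {
  count: 0,
  // Regular method: `this` is the object the method is called on.
  addAll(items) {
    // The arrow callback has no `this` of its own, so `this` is still `counter`.
    items.forEach(() => { this.count += 1; });
    return this.count;
  },
};

// Rest parameters take the place of the missing `arguments` object.
const sum = (...numbers) => numbers.reduce((acc, n) => acc + n, 0);

console.log(counter.addAll(['a', 'b', 'c'])); // 3
console.log(sum(1, 2, 3));                    // 6
```

Had addAll itself been written as an arrow function, this inside it would not refer to counter, which is why arrow functions are a poor fit for object methods.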

Data processing methods

Array has many methods for processing data. The following methods are discussed in this post; Object.keys() is a method of Object rather than Array, but it is used alongside them. None of these methods mutates the original array.


Array.prototype.reduce()
Array.prototype.find()
Array.prototype.filter()
Array.prototype.map()
Array.prototype.every()
Object.keys()

Processing Tennis Grand Slam men’s singles final data

Data

Tennis Grand Slam men’s singles final data from 1877 to 2017 is available as JSON in data.js, which is used in the test. It is an array of objects. Each object is referred to as a ‘record’ in this post. An example for the year 2016 is given below. The ‘data’ parameter of the functions mentioned in this post is the data from data.js.

{
 "year": "2016",
 "tournament": "U.S. Open",
 "winner": "Stan Wawrinka",
 "runner": "Novak Djokovic"
}, {
 "year": "2016",
 "tournament": "Wimbledon",
 "winner": "Andy Murray",
 "runner": "Milos Raonic"
}, {
 "year": "2016",
 "tournament": "French Open",
 "winner": "Novak Djokovic",
 "runner": "Andy Murray"
}, {
 "year": "2016",
 "tournament": "Australian Open",
 "winner": "Novak Djokovic",
 "runner": "Andy Murray"
}

Code

GrandSlamSingles.js is written as a CommonJS module. It is the main module of this project.

Copy-paste friendly code is available at https://github.com/sbalagop/arrayjs/blob/master/GrandSlamSingles.js

Find a winner

Given a year and a tournament, find the winner. Passing “2017” and “Australian Open” returns “Roger Federer”.

Array.prototype.find() executes the callback function on the elements of the data array in order. As soon as a record matches the given year and tournament, find() returns that record without checking the rest. So find() returns only one value, the first matching one. If there is no match, undefined is returned. find() was introduced in ES6.

arrayjs_findAWinner
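A minimal sketch of such a lookup follows. It assumes a small inline sample in place of data.js, and the function name findWinner is illustrative; the name and signature used in GrandSlamSingles.js may differ.

```javascript
// Hypothetical sample standing in for data.js.
const data = [
  { year: '2017', tournament: 'Australian Open', winner: 'Roger Federer', runner: 'Rafael Nadal' },
  { year: '2016', tournament: 'U.S. Open', winner: 'Stan Wawrinka', runner: 'Novak Djokovic' },
  { year: '2016', tournament: 'Wimbledon', winner: 'Andy Murray', runner: 'Milos Raonic' },
];

// Returns the winner's name, or undefined when no record matches.
const findWinner = (data, year, tournament) => {
  const record = data.find(r => r.year === year && r.tournament === tournament);
  return record ? record.winner : undefined;
};

console.log(findWinner(data, '2017', 'Australian Open')); // Roger Federer
console.log(findWinner(data, '1800', 'Wimbledon'));       // undefined
```

The guard on record matters because find() returns undefined when nothing matches, and reading .winner from undefined would throw.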

Find all the winners of the year

Given a year, find the winners of all tournaments in that year. Return only each tournament’s name and its winner. Passing “2016” should return:


[{"tournament": "U.S. Open", "winner": "Stan Wawrinka"},
{"tournament": "Wimbledon", "winner": "Andy Murray"},
{"tournament": "French Open", "winner": "Novak Djokovic"},
{"tournament": "Australian Open", "winner": "Novak Djokovic"}]

Array.prototype.filter() returns an array of all the matching elements, in this case the elements matching the given year. Since the return value is an array, we call the map() function on it (chaining).

Array.prototype.map() executes the callback once for each element in the array and returns a new array of result records. It is used to transform the array elements. Since only ‘tournament’ and ‘winner’ properties should be in the result, we use map to transform each record to only include these two properties. As mentioned earlier, the original array is not modified.

arrayjs_findWinnersOfTheYear
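The filter-then-map chain can be sketched as below, again with a hypothetical inline sample instead of data.js and an illustrative function name.

```javascript
// Hypothetical sample standing in for data.js.
const data = [
  { year: '2017', tournament: 'Australian Open', winner: 'Roger Federer', runner: 'Rafael Nadal' },
  { year: '2016', tournament: 'U.S. Open', winner: 'Stan Wawrinka', runner: 'Novak Djokovic' },
  { year: '2016', tournament: 'Wimbledon', winner: 'Andy Murray', runner: 'Milos Raonic' },
  { year: '2016', tournament: 'French Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
  { year: '2016', tournament: 'Australian Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
];

const findWinnersOfTheYear = (data, year) =>
  data
    .filter(record => record.year === year)   // keep only the given year
    .map(record => ({                         // keep only two properties
      tournament: record.tournament,
      winner: record.winner,
    }));

console.log(findWinnersOfTheYear(data, '2016'));
console.log(data.length); // 5, the original array is unchanged
```

The parentheses around the object literal in map() are needed; without them the braces would be parsed as a block body.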

Search using space separated keywords

Given a string of space-separated keywords, return the records that match every keyword. A keyword can match any property value of the record. For example, given “2016 Andy”, return the records that match both 2016 and Andy. The result is:


[{
"year": "2016",
"tournament": "Wimbledon",
"winner": "Andy Murray",
"runner": "Milos Raonic"
}, {
"year": "2016",
"tournament": "French Open",
"winner": "Novak Djokovic",
"runner": "Andy Murray"
}, {
"year": "2016",
"tournament": "Australian Open",
"winner": "Novak Djokovic",
"runner": "Andy Murray"
}]

This is my favorite. First we split the input string on spaces. The filter() used in line 60 filters out the empty strings: since an empty string is falsy, those entries are dropped.

filter() at line 62 returns the records that match all the keywords.

Array.prototype.every() calls the callback on each array element and returns true only if the callback returns true for all the elements. If the callback returns false for any element, every() returns false immediately.

At line 64, every() is called on the keywords array, since we want every keyword to be matched by the record. Its callback iterates over each property of the record and checks whether at least one property value includes the keyword.

Object.keys() returns the names of the object’s own enumerable properties as an array. It is a convenient way to iterate over an object’s properties.

arrayjs_search
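The steps above can be sketched as follows. The sample data and the function name search are assumptions, and Array.prototype.some() is used here for the "at least one property" check, which may not be how the original code does it.

```javascript
// Hypothetical sample standing in for data.js.
const data = [
  { year: '2017', tournament: 'Australian Open', winner: 'Roger Federer', runner: 'Rafael Nadal' },
  { year: '2016', tournament: 'U.S. Open', winner: 'Stan Wawrinka', runner: 'Novak Djokovic' },
  { year: '2016', tournament: 'Wimbledon', winner: 'Andy Murray', runner: 'Milos Raonic' },
  { year: '2016', tournament: 'French Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
  { year: '2016', tournament: 'Australian Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
];

const search = (data, query) => {
  // Split on spaces and drop the empty strings left by repeated spaces.
  const keywords = query.split(' ').filter(keyword => keyword);

  // Keep a record only if every keyword occurs in at least one property value.
  return data.filter(record =>
    keywords.every(keyword =>
      Object.keys(record).some(key => record[key].includes(keyword))));
};

console.log(search(data, '2016 Andy').map(record => record.tournament));
// [ 'Wimbledon', 'French Open', 'Australian Open' ]
```

One consequence of using every() is that an empty query matches all records, because every() returns true for an empty array.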

String.prototype.includes() was introduced in ES6. It does a case-sensitive match. The following regular-expression-based check can be used for a case-insensitive search. It assumes that the keyword contains no regular expression metacharacters.


var regex = new RegExp(keyword, 'gi');

return record[k].match(regex) !== null;
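A slightly more defensive variant is sketched below. The helper names escapeRegExp and includesIgnoreCase are hypothetical; escaping the keyword first keeps characters such as the dots in “U.S.” from being treated as regular expression syntax.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// True when value contains keyword, ignoring case.
const includesIgnoreCase = (value, keyword) =>
  new RegExp(escapeRegExp(keyword), 'i').test(value);

console.log(includesIgnoreCase('Andy Murray', 'andy')); // true
console.log(includesIgnoreCase('U.S. Open', 'u.s.'));   // true
console.log(includesIgnoreCase('UXSX Open', 'u.s.'));   // false
```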

Who won the most number of tournaments?

This should return { ‘winners’: [‘Roger Federer’], ‘maxWins’: 18 }. Roger Federer won the men’s grand slam singles 18 times, which is the most so far.

Array.prototype.reduce() calls the callback on each element, passing an accumulator (acc) on each call, to reduce the elements to a single value or a single object. The initial value of the accumulator can be provided as the second argument to reduce(). The arguments passed to the callback are the accumulator, the current element, the current index and the array.

First, we need to find the total wins for each player. We use reduce() to reduce the array to a single object with player names as the property names and their total wins as the values. In line 94, we pass an empty object as the initial accumulator. If the winner of the current record is already in the accumulator object, the winning count is incremented; otherwise it is initialized to 1. At the end of the callback the accumulator must be returned, as in line 93, so that it is passed to the callback for the next element. The total wins of each player are stored in totalWins.

Next, we need to find out who has the most wins. Object.keys() is used to iterate over all the winners, and reduce() is used to find the maximum number of wins and the players who have it. In theory, several players could share the same maximum, so an array is used at line 109 in the initial object, though as of 2017 the result contains only [‘Roger Federer’].
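As a small illustration of the callback arguments, the sketch below sums three made-up numbers and records what the callback receives on each call.

```javascript
const steps = [];

const total = [5, 10, 15].reduce((acc, current, index, array) => {
  // Record the arguments of each call for inspection.
  steps.push(`acc=${acc} current=${current} index=${index} of ${array.length}`);
  return acc + current; // becomes `acc` on the next call
}, 0);                  // initial value of the accumulator

console.log(steps);
// [ 'acc=0 current=5 index=0 of 3',
//   'acc=5 current=10 index=1 of 3',
//   'acc=15 current=15 index=2 of 3' ]
console.log(total); // 30
```

If the initial value is omitted, reduce() uses the first element as the accumulator and starts calling the callback at index 1.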

arrayjs_findMostTournamentWinner
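The two reduce() steps can be sketched as below. The inline sample is hypothetical and much smaller than data.js, so the result differs from the one quoted above; the function name follows the snippet label and may not match the module’s export.

```javascript
// Hypothetical sample standing in for data.js.
const data = [
  { year: '2017', tournament: 'Australian Open', winner: 'Roger Federer', runner: 'Rafael Nadal' },
  { year: '2016', tournament: 'U.S. Open', winner: 'Stan Wawrinka', runner: 'Novak Djokovic' },
  { year: '2016', tournament: 'Wimbledon', winner: 'Andy Murray', runner: 'Milos Raonic' },
  { year: '2016', tournament: 'French Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
  { year: '2016', tournament: 'Australian Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
];

const findMostTournamentWinner = (data) => {
  // Step 1: tally wins per player, e.g. { 'Novak Djokovic': 2, ... }.
  const totalWins = data.reduce((acc, record) => {
    acc[record.winner] = (acc[record.winner] || 0) + 1;
    return acc;
  }, {});

  // Step 2: track the maximum and every player who reaches it.
  return Object.keys(totalWins).reduce((acc, player) => {
    const wins = totalWins[player];
    if (wins > acc.maxWins) {
      return { winners: [player], maxWins: wins };
    }
    if (wins === acc.maxWins) {
      acc.winners.push(player);
    }
    return acc;
  }, { winners: [], maxWins: 0 });
};

console.log(findMostTournamentWinner(data));
// { winners: [ 'Novak Djokovic' ], maxWins: 2 } for this sample only
```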

Who won the most consecutive tournaments?

Donald Budge won six consecutive Grand Slam singles titles, from Wimbledon 1937 to the U.S. Open 1938, which is the most consecutive wins.

reduce() is used to reduce the records to a single object holding the result. The initial accumulator is passed in line 149. The comments in the code explain the logic.

arrayjs_findMostConsecutiveWinner
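One way to express this with a single reduce() is sketched below. It assumes that the records are in chronological order (or its exact reverse, as in the 2016 example above), uses a hypothetical inline sample, and is not necessarily the logic of the original code.

```javascript
// Hypothetical sample standing in for data.js, newest record first.
const data = [
  { year: '2017', tournament: 'Australian Open', winner: 'Roger Federer', runner: 'Rafael Nadal' },
  { year: '2016', tournament: 'U.S. Open', winner: 'Stan Wawrinka', runner: 'Novak Djokovic' },
  { year: '2016', tournament: 'Wimbledon', winner: 'Andy Murray', runner: 'Milos Raonic' },
  { year: '2016', tournament: 'French Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
  { year: '2016', tournament: 'Australian Open', winner: 'Novak Djokovic', runner: 'Andy Murray' },
];

const findMostConsecutiveWinner = (data) => {
  const result = data.reduce((acc, record) => {
    // Extend the running streak if the same player won again, else restart at 1.
    const streak = record.winner === acc.current.winner ? acc.current.streak + 1 : 1;
    acc.current = { winner: record.winner, streak };

    if (streak > acc.maxStreak) {
      // New longest streak: this player replaces the previous leaders.
      acc.maxStreak = streak;
      acc.winners = [record.winner];
    } else if (streak === acc.maxStreak && !acc.winners.includes(record.winner)) {
      // Another player has tied the longest streak.
      acc.winners.push(record.winner);
    }
    return acc;
  }, { current: { winner: null, streak: 0 }, winners: [], maxStreak: 0 });

  return { winners: result.winners, maxStreak: result.maxStreak };
};

console.log(findMostConsecutiveWinner(data));
// { winners: [ 'Novak Djokovic' ], maxStreak: 2 } for this sample only
```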

Time complexity

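All the functions mentioned above run in O(n) time, where n is the number of records. This is the best possible for unsorted data, since in the worst case every record has to be examined. The keyword search also does work proportional to the number of keywords and the number of properties per record, both of which are small here.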

Testing

Mocha and Chai are used for BDD unit testing. The test is at test/test.js. The test requires the data.js and ../GrandSlamSingles.js. Part of the test is shown below, while others are collapsed. test/test.js contains the complete test.

arrayjs_test

Download

  1. git clone https://github.com/sbalagop/arrayjs.git
  2. npm install
  3. npm test

The ‘npm test’ will show the following results:

arrayjs_test_results

References

  1. Arrow functions
  2. Array

 

 
