Extracting Functions and Shifting Left
February 06, 2019
Let’s take a look at our example from last time:
const get = require('./get'); // get(url, callback)
const githubApi = 'https://api.github.com';
const githubEventsUrlFor = username => `${githubApi}/users/${username}/events`;
const githubRepoUrlFor = repo => `${githubApi}/repos/${repo}`;
const fetchReposForLatestActivity = (user, callback) => {
let timeout = setTimeout(function() {
callback(new Error('Timed out'));
}, 3000);
get(githubEventsUrlFor(user), (err, allEvents) => {
if (err) {
return callback(err);
}
const events = allEvents.slice(0, 3);
const repos = [];
let remainingCount = events.length;
events.forEach(event => {
get(githubRepoUrlFor(event.repo.name), (err, data) => {
if (err) {
return callback(err);
}
repos.push({
author: data.owner.login,
avatar: data.owner.avatar_url,
name: data.name,
full_name: data.full_name,
url: data.html_url,
description: data.description
});
remainingCount -= 1;
if (remainingCount <= 0) {
clearTimeout(timeout);
callback(null, repos)
}
});
});
});
};
fetchReposForLatestActivity('machuga', (err, repos) => {
if (err) {
console.error(err);
return;
}
console.log(repos);
});
We have a number of different nesting levels going on here and a lot of work happening in one named function. As with most other areas of programming, this code would probably do well to be broken apart into logical, well named functions. But asynchronous code, especially in languages that are lexically scoped like JavaScript, can sometimes be a bit more challenging to tease apart. We will start with some low hanging fruit and then move to more involved extractions in later posts.
Extract HTTP Requests
The first thing I want to do is extract the embedded HTTP requests into named functions, that way it is clear what they are doing, and they are easier for us to talk about. A win for both you the reader of this article and for people who will eventually read your asynchronous code.
const fetchRepo = (repoName, callback) => {
get(githubRepoUrlFor(repoName), callback);
};
const fetchEventsForUser = (user, callback) => {
get(githubEventsUrlFor(user), callback);
}
Something to note here is that I was able to pass the callback sent to our new
functions directly to the get
call without an intermediate callback. Why does
this work? We left the invoking code expecting the same callback signature, and
all of our callbacks expect (error, result)
as an argument list. So we can
save ourself a frame on the callstack (extra compute time and extra garbage to
be collected), and skip over reduncancy that would look like:
const fetchRepo = (repoName, callback) => {
get(githubRepoUrlFor(repoName), (err, repo) => {
if (err) {
return callback(err);
}
callback(null, repo);
});
};
This small set of extractions makes our main fetching function look like this:
const fetchReposForLatestActivity = (user, callback) => {
let timeout = setTimeout(function() {
callback(new Error('Timed out'));
}, 3000);
fetchEventsForUser(user, (err, allEvents) => {
if (err) {
return callback(err);
}
const events = allEvents.slice(0, 3);
const repos = [];
let remainingCount = events.length;
events.forEach(event => {
fetchRepo(event.repo.name, (err, data) => {
if (err) {
return callback(err);
}
repos.push({
author: data.owner.login,
avatar: data.owner.avatar_url,
name: data.name,
full_name: data.full_name,
url: data.html_url,
description: data.description
});
remainingCount -= 1;
if (remainingCount <= 0) {
clearTimeout(timeout);
callback(null, repos)
}
});
});
});
};
That makes me feel a bit better. A small win, but the asynchronous patterns are still the same, as is the nesting level. Well until we think of something, let’s just consider breaking our problem down into smaller logical chunks.
Encapsulating Logic
Let’s start by separating the idea of fetching a user from the concept of making 3 additional requests for the individual repositories. I won’t cover how the iteration works in this post, but will cover that bit more in the next.
I think our best place to separate this code out into a new function is right after where the events are sliced into a subset of the collection. However, I am blocked by one thing:
clearTimeout(timeout);
In that line I’m trying to clear out the timeout we set at the top of the
fetchReposForLatestActivity
function. We do this in case any of the HTTP
requests take too long, error out, or our counter never reaches 0. Looking
closely, this also includes the fetchEventsForUser
request.
This could be a blocker because I don’t want to need to keep timeout
from the
lexical scope of the parent function. Thankfully, I don’t notice any reason
this timeout needs to be cleared at the end of particular requests, just that
it needs to be cleared if all requests complete successfully. In fact…I think
there is an error here. We want the timeout to be cleared on failure too!
Cleaning up old crufty code can be a great way to uncover bugs. Well we can
address that in a bit - one step at a time when refactoring.
Back on topic, let’s yank all is code into yet another asynchronous function and invoke it from the same place.
const fetchReposForEvents = (events, callback) => {
const repos = [];
let remainingCount = events.length;
events.forEach(event => {
fetchRepo(event.repo.name, (err, data) => {
if (err) {
return callback(err);
}
repos.push({
author: data.owner.login,
avatar: data.owner.avatar_url,
name: data.name,
full_name: data.full_name,
url: data.html_url,
description: data.description
});
remainingCount -= 1;
if (remainingCount <= 0) {
callback(null, repos)
}
});
});
};
const fetchReposForLatestActivity = (user, callback) => {
let timeout = setTimeout(function() {
callback(new Error('Timed out'));
}, 3000);
fetchEventsForUser(user, (err, allEvents) => {
if (err) {
return callback(err);
}
fetchReposForEvents(allEvents.slice(0, 3), (err, repos) => {
if (err) {
return callback(err);
}
clearTimeout(timeout);
callback(null, repos);
});
});
};
I like this a lot better for a few reasons:
- I don’t have to care about how events are iterated over inside of the
fetchReposForLatestActivity
function anymore - I’ve decoupled the concept. Instead, now it is the responsibility offetchReposForEvents
to care about the iteration over the multiple HTTP requests, and how repos are trimmed down. - It shortens up the body of
fetchReposForLatestActivity
into a nice consumable chunk. I can see two main error cases right up front, see when the timeout gets cleared, and what gets sent back to the callback. - That bug we found regarding the
timeout
not being cleared when errors occur does not need to interfere with thefetchReposForEvents
inner workings, it will just deal with errors or success inside of the callback it fires upon completion.
So did we lessen the nesting level? Technically we did, since the iteration is no longer within this function, but the callback nesting is still the same.
This wouldn’t inherently be a problem in a real codebase, but I’m trying to teach you something, so this simply will not do.
We’ll start looking at some new techniques soon for flatting out our callbacks soon, but in the next post I think it’d be a good idea to explain how waiting for multiple callbacks to complete were handled in this app.
You can get a before and after comparison of this as a gist here.