Analyzing JavaScript Scope

JavaScript’s scoping rules are pretty darn simple, but they have a few implications that surprise novice and experienced developers alike. Luckily, most of those implications can be detected through static code analysis.

We can use a tool like Esprima to do our scope analysis. Esprima is a JavaScript parser that builds a pretty detailed, and thus useful, abstract syntax tree (AST). The AST tells us how the statements in our JavaScript file are structured, but in its raw form it’s way too complex. It is hard to make any relevant claims about scope from the raw AST, and that’s why we have to transform it into something more meaningful for our purposes.

But first, let’s have a look at what we might want to detect.

What’s in a scope?
What’s in the global scope?
How are scopes nested?
Are identifiers shadowed by other identifiers?
Are variable declarations hoisted?
Where are closures and which scope(s) do they close over?

Besides a few general claims, we look into three topics that are implied by the way scoping works in JavaScript: shadowing, hoisting and closures. Let’s get started.

All the code is written in CoffeeScript. Read here why.

syntax = esprima.parse text, range: true, loc: true

This stores the AST in the syntax variable, along with range and location information for each node. text is the JavaScript source, as a string.
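To make the shape of that tree a bit more concrete, here is roughly what the parser gives us for the one-line program `var a = 4;`, written out by hand as a plain JavaScript object. The field names follow the ESTree format that Esprima emits; range and loc are left out for brevity, and the statementTypes helper is hypothetical, purely for illustration.

```javascript
// Hand-written approximation of the AST for `var a = 4;`.
// range/loc fields are omitted to keep the example short.
const syntax = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: 4, raw: "4" }
        }
      ]
    }
  ]
};

// Hypothetical helper: list the node types of the top-level statements.
function statementTypes(program) {
  return program.body.map((statement) => statement.type);
}

console.log(statementTypes(syntax)); // [ 'VariableDeclaration' ]
```

Even this tiny program nests four levels deep, which is why we want a flatter, scope-oriented structure.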

Before we go into transforming this AST, let’s think about our target structure. Scopes in JavaScript can be nested, and we generally speak of global scope and function scope. Unlike C, for example, JavaScript doesn’t really have block scope for variables declared with var. There are a few exceptions, such as the with statement and the catch clause of try/catch, but those shall not concern us right now. The structure I have in mind is as follows:

Global Scope
- Variable
- Variable
- Variable
- Function a()
  - Variable
  - Variable
  - Function foo()
    - Variable
  - Function bar()
- Function b()
  - Variable

So, we basically nest functions within each other. Each function can have functions and variables as children. The global scope is special: it isn’t really a function, and it is the only scope without a parent. This leaves us with a certain set of object types (let’s just call them “classes”):

Scope, of which the “global scope” object will be an instance
Function, which will inherit from Scope
Variable

Function and Variable also share certain characteristics: both have a parent scope, and both may be assigned to identifiers, but not necessarily (think of anonymous functions that are never assigned to anything). You might also want different variations of Variable, depending on how it was created (by a variable declaration or as a parameter).

The classes in more detail

So, what attributes do we need in those classes?

Variable

Let’s start with the simplest one.

class Variable
  constructor: (@node, @parentScope) ->
    @type = @node.type
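    # Declarators carry their name in id; parameters are plain Identifier nodes
    @name = if @node.id? then @node.id.name else @node.name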
    @shadowedBy = []
    @hoisted = false

That was pretty simple. We keep a reference to the actual node object (as returned by Esprima) and its parent scope. Then we extract the variable name and the type of node. We also initialize the list of variables it is shadowed by with an empty array, and by default we assume its declaration is not hoisted.

Additionally, we could also save @node.loc, which represents the start and end positions of the declaration in the source code, but for our purposes, we don’t need it.
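One detail worth keeping in mind: the two kinds of node that end up as a Variable keep their name in different places. A declarator wraps an Identifier in its id property, while a simple function parameter is an Identifier itself. A small sketch in plain JavaScript, using hand-written nodes and a hypothetical variableName helper:

```javascript
// Sketch: where the name lives on the two node kinds that become a Variable.
function variableName(node) {
  // VariableDeclarator nodes wrap an Identifier in `id`;
  // simple parameters are Identifier nodes themselves.
  return node.type === "VariableDeclarator" ? node.id.name : node.name;
}

// Hand-written nodes, shaped like the ones the parser produces.
const declarator = {
  type: "VariableDeclarator",
  id: { type: "Identifier", name: "a" },
  init: null
};
const parameter = { type: "Identifier", name: "callback" };

console.log(variableName(declarator)); // a
console.log(variableName(parameter));  // callback
```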

Function

Although it inherits from Scope, we will look at Function first, as it shares a lot with variables.

class Function extends Scope
  constructor: (@node, @parentScope, @identifier) ->
    @type = @node.type
    @shadowedBy = []
    @isAnonymous = false
    if @node.id?
      @name = @node.id.name
    else if @identifier?
      @name = @identifier.name
    else
      @isAnonymous = true
      @name = "(anonymous function)"

    super(@node, @parentScope)
    @params = (new Variable(param, @) for param in @node.params)

    @children.push @params

As you can see, we start off the same way we did with variables: we create references to the source node, the parent scope and the type. The name, however, depends on the way the function was created. There are three possibilities:

Function declaration
```
function foo() {}
```
Here, the name is defined through the declaration.
Function expression with assignment
```
var foo = function() {}
```
If no name is declared, but the function expression is a variable initialization, we give it the name of the variable (and store a reference to the identifier).
Function expression without assignment
```
bar(function() {});
```
In this case, we just have to be honest and call this an anonymous function. Poor child.

Finally, params keeps track of the function’s parameters, which are also one kind of the function’s children.
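The naming rule on its own can be sketched in a few lines of plain JavaScript. The functionName helper and the hand-written nodes are only illustrations of the three cases above, not part of the module:

```javascript
// Sketch of the naming rule, assuming ESTree-style nodes.
// `identifier` is the optional Identifier node the function is assigned to.
function functionName(node, identifier) {
  if (node.id != null) return node.id.name;       // function foo() {}
  if (identifier != null) return identifier.name; // var foo = function() {}
  return "(anonymous function)";                  // bar(function() {})
}

// Hand-written nodes for the three cases.
const declaration = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "foo" },
  params: []
};
const expression = { type: "FunctionExpression", id: null, params: [] };
const variableId = { type: "Identifier", name: "foo" };

console.log(functionName(declaration));            // foo
console.log(functionName(expression, variableId)); // foo
console.log(functionName(expression));             // (anonymous function)
```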

Scope

Let’s just start with the constructor.

class Scope
  constructor: (@node, @parentScope = null) ->
    body = if @parentScope? then @node.body.body else @node.body
    @loc = @node.loc

    @children = []
    @variables = []
    @functions = []

    @hoisting = false
    @hoistingPosition = null

    for statement, index in body
      # Set hoistingPosition to first statement
      @hoistingPosition = statement.loc if 0 == index
      @parseNode statement

    @name = "GLOBAL" if not @parentScope?

    # Registering children
    @children.push @variables
    @children.push @functions

The first difference is that the parent scope is null by default. The Scope class is used for the global scope, so this only makes sense.

The next interesting property is body. This is an array of statements which represent the body block of the scope, i.e. in a function, that would be anything between the two curly braces { and }.
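The reason for the two different paths is that the root Program node holds its statements directly in body, while a function node wraps them in a BlockStatement first. Here is a minimal sketch with hand-written nodes; note that this hypothetical statementsOf helper checks the node type, whereas the constructor above checks for a parent scope.

```javascript
// Return the list of statements that make up a scope's body.
// Program nodes hold the statements directly; function nodes wrap
// them in a BlockStatement, hence the extra `.body`.
function statementsOf(node) {
  return node.type === "Program" ? node.body : node.body.body;
}

const program = { type: "Program", body: [{ type: "EmptyStatement" }] };
const fn = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "a" },
  params: [],
  body: {
    type: "BlockStatement",
    body: [{ type: "EmptyStatement" }, { type: "EmptyStatement" }]
  }
};

console.log(statementsOf(program).length); // 1
console.log(statementsOf(fn).length);      // 2
```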

We also save a reference to the scope’s start and end points through loc, as mentioned before. This information may be useful later on for an application based on the scope analysis.

Next, we initialize a few arrays: children, which we’ve been using in the Function constructor already, and one each for variables and functions. Let’s ignore the two hoisting-related variables for now, as well as the loop, and jump right to the end. If the scope has no parent, it must be the global scope, so we name it accordingly. We also register variables and functions as children of the scope (in a function, those are accompanied by the parameters).

Building the scope tree

We build the tree of nested scopes by populating the functions and variables arrays using a recursive function, parseNode. This process is kicked off by looping over all the (top-level) statement nodes in the scope’s body and applying the parseNode function.

This recursive function looks at the type of node and, depending on this, either calls itself on certain properties of the node or adds a function/variable to the scope’s lists.

  parseNode: (node, identifier) ->
    return unless node?

    switch node.type
      when "ExpressionStatement"
        @parseNode(node.expression)

      when "FunctionExpression"
        @functions.push new Function(node, @, identifier)

      when "CallExpression"
        @parseNode node.callee
        @parseNode(argument) for argument in node.arguments

      when "VariableDeclaration"
        for declarator in node.declarations
          variable = new Variable(declarator, @)
          variable.hoisted = @hoisting
          if variable.node.init
            # Pass the declared identifier along so a function expression can take its name
            @parseNode(variable.node.init, declarator.id)
          @variables.push variable

      when "FunctionDeclaration"
        @functions.push new Function(node, @)

      when "IfStatement"
        @parseNode node.test
        @parseNode node.consequent
        @parseNode node.alternate

This is just an excerpt of the actual function, covering about a third of the node types you can encounter (or rather: that I have encountered). Look here for the full source code.

This way, we can get the complete, structured parse tree by just calling:

syntax = esprima.parse text, range: true, loc: true
parseTree = new Scope syntax
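To get a feeling for the result, here is a stripped-down sketch of the same idea in plain JavaScript. It is not the module itself: it handles only variable and function declarations, works on a hand-written AST instead of real parser output, and all helper names (buildScope, visit, outline and the little node factories) are made up for this example.

```javascript
// Build a nested scope object from a Program or function node.
function buildScope(node, parent = null, name = "GLOBAL") {
  const scope = { name, parent, variables: [], functions: [] };
  // Program holds statements directly, functions wrap them in a block.
  const body = parent === null ? node.body : node.body.body;
  for (const statement of body) visit(statement, scope);
  return scope;
}

// Only two node types are covered here; everything else is ignored.
function visit(node, scope) {
  if (node == null) return;
  switch (node.type) {
    case "VariableDeclaration":
      for (const declarator of node.declarations) {
        scope.variables.push(declarator.id.name);
      }
      break;
    case "FunctionDeclaration":
      scope.functions.push(buildScope(node, scope, node.id.name));
      break;
  }
}

// Render the tree in the outline format sketched earlier.
function outline(scope, depth = 0) {
  const pad = "  ".repeat(depth);
  const lines = [];
  for (const v of scope.variables) lines.push(`${pad}- Variable ${v}`);
  for (const f of scope.functions) {
    lines.push(`${pad}- Function ${f.name}()`);
    lines.push(...outline(f, depth + 1));
  }
  return lines;
}

// Tiny factories for hand-written nodes.
const id = (name) => ({ type: "Identifier", name });
const decl = (name) => ({
  type: "VariableDeclaration",
  kind: "var",
  declarations: [{ type: "VariableDeclarator", id: id(name), init: null }]
});
const fn = (name, statements) => ({
  type: "FunctionDeclaration",
  id: id(name),
  params: [],
  body: { type: "BlockStatement", body: statements }
});

// Equivalent of: var x; function a() { var y; function foo() {} }
const program = {
  type: "Program",
  body: [decl("x"), fn("a", [decl("y"), fn("foo", [])])]
};

const tree = buildScope(program);
console.log(["Global Scope", ...outline(tree)].join("\n"));
```

The printed outline has the same shape as the target structure from the beginning: variables and functions listed under their scope, with nested functions indented one level further.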

But we’re not quite done yet.

Detect hoisting

Hoisting is a simple thing to detect, if you phrase it simply enough. For our purposes, we put it this way:

If a variable declaration comes after any other statement anywhere in a scope block, it is going to be hoisted.

Remember the hoisting boolean we defined in the Scope class? That’s just a switch we turn on as soon as we encounter any node that is not of type VariableDeclaration. This includes the expressions used to initialize a variable, because they are passed to parseNode as well, so an initialized declaration flips the switch for every declaration that follows it, for example:

var a = 4;

To achieve what we want, we insert the following statement right before the switch in the parseNode function:

  @hoisting = true unless node.type is "VariableDeclaration"

Now we only have to save the hoisting information in newly declared variables, and we have seen the code for this already above, in the parseNode function:

          variable = new Variable(declarator, @)
          variable.hoisted = @hoisting
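For illustration, here is what the flag is meant to catch, followed by a simplified version of the detection in plain JavaScript. The hoistedNames helper is hypothetical: it only looks at the top-level statements of one scope and, unlike parseNode, does not descend into initializers.

```javascript
// The behaviour we want to flag: `late` is declared after another statement.
function hoistingDemo() {
  var log = [];
  log.push(typeof late); // `late` already exists here, but is still undefined
  var late = 4;          // the declaration is hoisted, the assignment stays put
  log.push(late);
  return log;
}

console.log(hoistingDemo()); // [ 'undefined', 4 ]

// Simplified sketch of the detection on a list of statement nodes.
function hoistedNames(statements) {
  let hoisting = false;
  const hoisted = [];
  for (const statement of statements) {
    if (statement.type !== "VariableDeclaration") {
      hoisting = true; // any other statement turns the switch on
      continue;
    }
    for (const declarator of statement.declarations) {
      if (hoisting) hoisted.push(declarator.id.name);
    }
  }
  return hoisted;
}

// Hand-written statement list mirroring the body of hoistingDemo.
const varDecl = (name) => ({
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    { type: "VariableDeclarator", id: { type: "Identifier", name }, init: null }
  ]
});
const demoBody = [
  varDecl("log"),
  { type: "ExpressionStatement" },
  varDecl("late"),
  { type: "ExpressionStatement" },
  { type: "ReturnStatement" }
];

console.log(hoistedNames(demoBody)); // [ 'late' ]
```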

Detect shadowing

Shadowing is a bit trickier, as it goes in two directions: identifiers can shadow other identifiers, and consequently, identifiers can be shadowed by other identifiers. We want to save this information in both cases. However, we cannot do this while we are building the scope tree, as we would need to access the parent scope while it is still being constructed. Instead, we create a recursive function on the Scope class that we call after the parse tree has been created.

  checkForShadowing: ->
    for child in @children
      for id in child
        id.shadows = id.getShadowedIdentifier()

  constructShadowingInformation: ->
    for id in @functions
      id.constructShadowingInformation()
    @checkForShadowing()

There we go! For each scope, we loop over its children (functions, variables, parameters) and check if there are identifiers that are shadowed by them. This is done on each identifier class using the following function:

  getShadowedIdentifier: ->
    scope = @parentScope.parentScope

    while scope?
      id = scope.getIdentifier(@name)
      if id isnt null
        id.shadowedBy.push @
        return id
      else
        scope = scope.parentScope
    null

getIdentifier is just a function that returns the appropriate object for a name in a certain scope.
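Since getIdentifier isn’t shown here, the following is only a guess at what such a lookup could look like, written in plain JavaScript together with a port of getShadowedIdentifier. The scope and identifier objects are hand-built stand-ins for the classes above, with children as a list of lists (variables, functions, parameters).

```javascript
// Assumed lookup: search all child lists of one scope for a given name.
function getIdentifier(scope, name) {
  for (const group of scope.children) {
    for (const id of group) {
      if (id.name === name) return id;
    }
  }
  return null;
}

// Port of getShadowedIdentifier: walk outwards, starting one scope
// above the scope the identifier was declared in.
function getShadowedIdentifier(id) {
  let scope = id.parentScope.parentScope;
  while (scope != null) {
    const found = getIdentifier(scope, id.name);
    if (found !== null) {
      found.shadowedBy.push(id); // record the relation in both directions
      return found;
    }
    scope = scope.parentScope;
  }
  return null;
}

// Stand-in for: var x; function a() { var x; }
const globalScope = { name: "GLOBAL", parentScope: null, children: [] };
const outerX = { name: "x", parentScope: globalScope, shadowedBy: [] };
const a = { name: "a", parentScope: globalScope, shadowedBy: [], children: [] };
const innerX = { name: "x", parentScope: a, shadowedBy: [] };
globalScope.children.push([outerX], [a]);
a.children.push([innerX]);

innerX.shadows = getShadowedIdentifier(innerX);
outerX.shadows = getShadowedIdentifier(outerX);

console.log(innerX.shadows === outerX);       // true
console.log(outerX.shadowedBy[0] === innerX); // true
console.log(outerX.shadows);                  // null
```

Starting the search at the grandparent of the identifier is what keeps an identifier from being reported as shadowing itself.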

The end

We made it! We built a program that turns Esprima’s AST into a tree of nested scopes with shadowing and hoisting information. Look at the complete source code here, created as a Node.js module.

Why CoffeeScript?

I created this module as part of my thesis project, a JavaScript Scope Inspector for Atom, which is why I wrote it in CoffeeScript. (Also, CoffeeScript is quite convenient.) I have yet to extract the module and publish it on npm.

If you are using Atom and develop JavaScript on a regular basis, please check out the Scope Inspector package and enable metrics tracking. You will help me a great deal with my thesis :)

Closure detection

This will be the next big step for the parser. When it’s done, I will probably write about it as well.
