JavaScript. The Core: 2nd Edition

This is the second edition of the JavaScript. The Core overview lecture, devoted to the ECMAScript programming language and the core components of its runtime system.

Audience: experienced programmers, professionals.

The first edition of the article covers generic aspects of the JS language, using abstractions mostly from the ES3 spec, with some references to the appropriate changes in ES5 and ES6+.

Starting with ES2015, the specification changed the descriptions and structure of some core components, introduced new models, etc. In this edition we focus on the newer abstractions and updated terminology, while still maintaining the very basic JS structures which stay consistent throughout the spec versions.

This article covers the ES2017+ runtime system.

Note: the latest version of the ECMAScript specification can be found on the TC-39 website.

We start our discussion with the concept of an object, which is fundamental to ECMAScript.

ECMAScript is an object-oriented programming language, with the concept of an object as its core abstraction.

Def. 1: Object: An Object is a collection of properties, and has a single prototype object. The prototype may be either an object or the null value.

Let’s take a basic example of an object. A prototype of an object is referenced by the internal [[Prototype]] property, which to user-level code is exposed via the __proto__ property.

For the code:

let point = {
  x: 10,
  y: 20,
};

we have the structure with two explicit own properties and one implicit __proto__ property, which is the reference to the prototype of point:

Figure 1. A basic object with a prototype.

Note: objects may store also symbols. You can get more info on symbols in this documentation.
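As a minimal sketch of the structure described above, the standard reflection methods can be used to inspect the own properties and the prototype of `point`. The `id` symbol is a hypothetical name, introduced here only to illustrate the note about symbol keys.

```javascript
let point = {
  x: 10,
  y: 20,
};

// Own string-keyed properties only; the prototype link is not listed.
console.log(Object.keys(point)); // ['x', 'y']

// The standard way to read the internal [[Prototype]].
console.log(Object.getPrototypeOf(point) === Object.prototype); // true

// Hypothetical symbol key, used only for illustration.
const id = Symbol('id');
point[id] = 1;

// Symbol keys are own properties too, but they are reported separately.
console.log(Object.getOwnPropertySymbols(point).length); // 1
console.log(Object.keys(point).length); // 2
```

`Object.getPrototypeOf` is generally preferable to `__proto__` in new code, since it also works for objects which do not inherit from `Object.prototype`.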

The prototype objects are used to implement inheritance with the mechanism of dynamic dispatch. Let’s consider the prototype chain concept to see this mechanism in detail.

Every object receives its prototype when it is created. If the prototype is not set explicitly, the object receives the default prototype as its inheritance object.

Def. 2: Prototype: A Prototype is a delegation object used to implement prototype-based inheritance.

The prototype can be set explicitly either via the __proto__ property, or via the Object.create method:

// Base object.
let point = {
  x: 10,
  y: 20,
};
 
// Inherit from `point` object.
let point3D = {
  z: 30,
  __proto__: point,
};
 
console.log(
  point3D.x, // 10, inherited
  point3D.y, // 20, inherited
  point3D.z  // 30, own
);

Note: by default objects receive Object.prototype as their inheritance object.

Any object can be used as a prototype of another object, and the prototype itself can have its own prototype. If a prototype has a non-null reference to its own prototype, and so on, the resulting sequence of objects is called the prototype chain.

Def. 3: Prototype chain: A prototype chain is a finite chain of objects used to implement inheritance and shared properties.

Figure 2. A prototype chain.

The rule is very simple: if a property is not found in the object itself, there is an attempt to resolve it in the prototype; in the prototype of the prototype, etc. — until the whole prototype chain is considered.

Technically this mechanism is known as dynamic dispatch or delegation.

Def. 4: Delegation: a mechanism used to resolve a property in the inheritance chain. The process happens at runtime, hence is also called dynamic dispatch.

Note: in contrast with static dispatch when references are resolved at compile time, dynamic dispatch resolves the references at runtime.
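The lookup rule can be sketched as an explicit loop over the prototype chain. The `lookup` helper below is a hypothetical name, not a built-in, and it is a simplification: it ignores accessor properties and proxies, which the real property lookup also handles.

```javascript
// Hypothetical helper: resolves `key` by walking the prototype chain.
function lookup(object, key) {
  let current = object;

  while (current !== null) {
    // Found as an own property of the current link of the chain.
    if (Object.prototype.hasOwnProperty.call(current, key)) {
      return current[key];
    }
    // Otherwise delegate to the prototype.
    current = Object.getPrototypeOf(current);
  }

  // The whole chain was considered without success.
  return undefined;
}

let point = {x: 10, y: 20};
let point3D = {z: 30, __proto__: point};

console.log(lookup(point3D, 'z')); // 30, own
console.log(lookup(point3D, 'x')); // 10, found in `point`
console.log(typeof lookup(point3D, 'toString')); // 'function', found in Object.prototype
console.log(lookup(point3D, 'w')); // undefined
```

The loop terminates because a prototype chain is finite and always ends with the null value.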

And if a property eventually is not found in the prototype chain, the undefined value is returned:

// An "empty" object.
let empty = {};
 
console.log(
 
  // function, from default prototype
  empty.toString,
   
  // undefined
  empty.x
 
);

As we can see, a default object is actually never empty — it always inherits something from the Object.prototype. To create a prototype-less dictionary, we have to explicitly set its prototype to null:

// Doesn't inherit from anything.
let dict = Object.create(null);
 
console.log(dict.toString); // undefined

The dynamic dispatch mechanism allows full mutability of the inheritance chain, providing an ability to change a delegation object:

let foo = {x: 10};
let bar = {x: 20};
 
let baz = {__proto__: foo};
console.log(baz.x); // 10
 
// Change the delegate:
Object.setPrototypeOf(baz, bar);
console.log(baz.x); // 20

From the example of Object.prototype, we see that the same prototype can be shared across multiple objects. Class-based inheritance in ECMAScript is implemented on this principle. Let’s see an example, and look under the hood of the “class” abstraction in JS.

When several objects share the same initial state and behavior, they form a classification.

Def. 5: Class: A class is a formal abstract set which specifies initial state and behavior of its objects.

In case we need multiple objects inheriting from the same prototype, we could of course create this one prototype, and have each newly created object explicitly inherit from it:

// Generic prototype for all letters.
let letter = {
  getNumber() {
    return this.number;
  }
};
 
let a = {number: 1, __proto__: letter};
let b = {number: 2, __proto__: letter};
// ...
let z = {number: 26, __proto__: letter};
 
console.log(
  a.getNumber(), // 1
  b.getNumber(), // 2
  z.getNumber(), // 26
);

We can see these relationships in the following figure:

Figure 3. A shared prototype.

However, this is obviously cumbersome. The class abstraction serves exactly this purpose: being syntactic sugar (i.e. a construct which semantically does the same, but in a much nicer syntactic form), it allows creating multiple such objects with a convenient pattern:

class Letter {
  constructor(number) {
    this.number = number;
  }
 
  getNumber() {
    return this.number;
  }
}
 
let a = new Letter(1);
let b = new Letter(2);
// ...
let z = new Letter(26);
 
console.log(
  a.getNumber(), // 1
  b.getNumber(), // 2
  z.getNumber(), // 26
);

Note: class-based inheritance in ECMAScript is implemented on top of the prototype-based inheritance.

Technically a “class” is represented as a “constructor function + prototype” pair. Thus, a constructor function creates objects, and also automatically sets the prototype for its newly created instances. This prototype is stored in the <ConstructorFunction>.prototype property.

Def. 6: Constructor: A constructor is a function which is used to create instances, and automatically set their prototype.
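The two responsibilities named in Def. 6 can be sketched by approximating what the `new` operator does for a plain constructor function. The `construct` helper is a hypothetical name and a simplification: it does not apply to functions declared with `class` (which cannot be called without `new`), and it assumes that `Ctor.prototype` is an object.

```javascript
// Hypothetical helper approximating `new Ctor(...args)`
// for plain constructor functions.
function construct(Ctor, ...args) {
  // 1. Create an object whose prototype is `Ctor.prototype`.
  const instance = Object.create(Ctor.prototype);

  // 2. Run the constructor body with `this` set to the new object.
  const result = Ctor.apply(instance, args);

  // 3. An object returned explicitly by the constructor wins.
  const isObject =
    (typeof result === 'object' && result !== null) ||
    typeof result === 'function';

  return isObject ? result : instance;
}

function Letter(number) {
  this.number = number;
}

Letter.prototype.getNumber = function() {
  return this.number;
};

let a = construct(Letter, 1);

console.log(a.getNumber()); // 1
console.log(Object.getPrototypeOf(a) === Letter.prototype); // true
console.log(a instanceof Letter); // true
```

In real code the built-in `new` operator (or `Reflect.construct`) should be used; the sketch only makes the two steps visible.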

It is possible to use a constructor function explicitly. Moreover, before the class abstraction was introduced, JS developers used to do so for lack of a better way (we can still find a lot of such legacy code all over the internet):

function Letter(number) {
  this.number = number;
}
 
Letter.prototype.getNumber = function() {
  return this.number;
};
 
let a = new Letter(1);
let b = new Letter(2);
// ...
let z = new Letter(26);
 
console.log(
  a.getNumber(), // 1
  b.getNumber(), // 2
  z.getNumber(), // 26
);

And while creating a single-level constructor was pretty easy, the inheritance pattern required much more boilerplate. Currently this boilerplate is hidden as an implementation detail, and that is exactly what happens under the hood when we create a class in JavaScript.

Note: constructor functions are just implementation details of the class-based inheritance.
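To give an idea of the boilerplate in question, here is one common way the inheritance pattern used to be written by hand. `CapitalLetter` and `isCapital` are hypothetical names used only for illustration, and the sketch omits details which `class ... extends` also takes care of, such as linking the constructors themselves.

```javascript
function Letter(number) {
  this.number = number;
}

Letter.prototype.getNumber = function() {
  return this.number;
};

// Hypothetical child "class", used only for illustration.
function CapitalLetter(number) {
  // Run the parent constructor on the new instance.
  Letter.call(this, number);
}

// Link the prototypes, so that instances inherit `getNumber`.
CapitalLetter.prototype = Object.create(Letter.prototype);
CapitalLetter.prototype.constructor = CapitalLetter;

CapitalLetter.prototype.isCapital = function() {
  return true;
};

let a = new CapitalLetter(1);

console.log(a.getNumber()); // 1, inherited from Letter.prototype
console.log(a.isCapital()); // true, from CapitalLetter.prototype
console.log(a instanceof Letter); // true
```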

Let’s see the relationships of the objects and their class:

Figure 4. A constructor and objects relationship.

The figure above shows that every object has an associated prototype. Even the constructor function (class) Letter has its own prototype, which is Function.prototype. Notice that Letter.prototype is the prototype of the Letter instances, that is, a, b, and z.
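These relationships can be checked directly from user code. The following sketch only uses standard reflection methods and a reduced version of the Letter class:

```javascript
class Letter {
  constructor(number) {
    this.number = number;
  }
}

let a = new Letter(1);

// A class is a function under the hood.
console.log(typeof Letter); // 'function'

// Instances delegate to `Letter.prototype`.
console.log(Object.getPrototypeOf(a) === Letter.prototype); // true

// The constructor function itself delegates to `Function.prototype`.
console.log(Object.getPrototypeOf(Letter) === Function.prototype); // true

// The prototype refers back to its constructor.
console.log(Letter.prototype.constructor === Letter); // true

// And `Letter.prototype` is an ordinary object with the default prototype.
console.log(Object.getPrototypeOf(Letter.prototype) === Object.prototype); // true
```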

Now that we understand the basic relationships between ECMAScript objects, let’s take a deeper look at the JS runtime system. As we will see, almost everything there can also be presented as an object.

To execute JS code and track its runtime evaluation, ECMAScript spec defines the concept of an execution context. Logically execution contexts are maintained using a stack (the execution context stack as we will see shortly), which corresponds to the generic concept of a call-stack.

Def. 7: Execution context: An execution context is a specification device that is used to track the runtime evaluation of the code.

There are several types of ECMAScript code: the global code, function code, eval code, and module code; each code is evaluated in its execution context. Different code types, and their appropriate objects may affect the structure of an execution context: for example, generator functions save their generator object on the context.

Let’s consider a recursive function call:

function recursive(flag) {
 
  // Exit condition.
  if (flag === 2) {
    return;
  }
 
  // Call recursively.
  recursive(++flag);
}
 
// Go.
recursive(0);

When a function is called, a new execution context is created, and pushed onto the stack — at this point it becomes an active execution context. When a function returns, its context is popped from the stack.

A context which calls another context is called a caller. And a context which is being called, accordingly, is a callee. In our example the recursive function plays both roles, of a callee and of a caller, when it calls itself recursively.

Def. 8: Execution context stack: An execution context stack is a LIFO structure used to maintain control flow and order of execution.

For our example from above we have the following stack “push-pop” modifications:

Figure 5. An execution context stack.

As we can also see, the global context is always at the bottom of the stack; it is created prior to the execution of any other context.
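The push and pop modifications for the recursive example can be imitated with a plain array. This is only a model: the `stack` and `trace` arrays are hypothetical stand-ins, since the real execution context stack is a specification device and is not accessible from user code.

```javascript
// Hypothetical model: a plain array stands in for the execution context stack.
const stack = ['global'];
const trace = [];

function recursive(flag) {
  // Entering the function: its context is pushed.
  stack.push(`recursive(${flag})`);
  trace.push(stack.join(' > '));

  if (flag < 2) {
    recursive(flag + 1);
  }

  // Returning from the function: its context is popped.
  stack.pop();
}

recursive(0);

console.log(trace);
// [
//   'global > recursive(0)',
//   'global > recursive(0) > recursive(1)',
//   'global > recursive(0) > recursive(1) > recursive(2)'
// ]

console.log(stack); // ['global']
```

After the outermost call returns, only the global context remains, which matches the LIFO order described in Def. 8.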

In general, the code of a context runs to completion. However, as we mentioned above, some objects, such as generators, may violate the LIFO order of the stack. A generator function may suspend its running context, and remove it from the stack before completion. Once a generator is activated again, its context is resumed and is pushed onto the stack again:

function *gen() {
  yield 1;
  return 2;
}
 
let g = gen();
 
console.log(
  g.next().value, // 1
  g.next().value, // 2
);

The yield expression here returns the value to the caller, and pops the context. On the second next call, the same context is pushed onto the stack again, and is resumed. Such a context may outlive the caller which creates it, hence the violation of the LIFO structure.

Note: you can read more about generators and iterators in this documentation.
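The following sketch illustrates a suspended context outliving the function which created it. The names `createIds` and `ids` are hypothetical; the point is only that the local variable `next` survives between the `next()` calls, although `createIds` has already returned.

```javascript
function createIds() {
  function* ids() {
    let next = 1;
    while (true) {
      // Suspend here, handing the current value to the caller.
      yield next++;
    }
  }

  // The generator object is returned; `createIds` itself finishes here.
  return ids();
}

let gen = createIds();

console.log(gen.next().value); // 1
console.log(gen.next().value); // 2
console.log(gen.next().value); // 3
```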

We shall now discuss the important components of an execution context; in particular we should see how the ECMAScript runtime manages variable storage, and the scopes created by nested blocks of code. This is the generic concept of lexical environments, which is used in JS to store data, and to solve the “Funarg problem” with the mechanism of closures.

Every execution context has an associated lexical environment.

Def. 9: Lexical environment: A lexical environment is a structure used to define association between identifiers appearing in the context with their values. Each environment can have a reference to an optional parent environment.

Technically, an environment is a pair, consisting of an environment record (an actual table which maps identifiers to values), and a reference to the parent (which can be null).

For the code:

let x = 10;
let y = 20;
 
function foo(z) {
  let x = 100;
  return x + y + z;
}
 
foo(30); // 150

The environment structures of the global context, and a context of the foo function would look as follows:

Figure 6. An environment chain.

Logically this resembles the prototype chain which we’ve discussed above. And the rule for identifier resolution is very similar: if a variable is not found in the own environment, there is an attempt to look it up in the parent environment, in the parent of the parent, and so on, until the whole environment chain is considered.

Def. 10: Identifier resolution: the process of resolving a variable (binding) in an environment chain. An unresolved binding results in a ReferenceError.

This explains why variable x is resolved to 100 and not to 10, why we can access parameter z — it’s also just stored on the activation environment, and why we can access variable y — it is found in the parent environment.
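The environment chain and identifier resolution can be modeled in a few lines. The `Environment` class below is a hypothetical teaching model, not an API of the language: real environments are internal to the engine. It mirrors the two environments of the example, the global one and the activation environment of `foo`.

```javascript
// Hypothetical model of a lexical environment:
// an environment record plus a reference to the parent.
class Environment {
  constructor(record = {}, parent = null) {
    this.record = new Map(Object.entries(record));
    this.parent = parent;
  }

  // Identifier resolution: own record first, then the parent chain.
  lookup(name) {
    if (this.record.has(name)) {
      return this.record.get(name);
    }
    if (this.parent === null) {
      throw new ReferenceError(`${name} is not defined`);
    }
    return this.parent.lookup(name);
  }
}

const globalEnv = new Environment({x: 10, y: 20});
const fooEnv = new Environment({z: 30, x: 100}, globalEnv);

console.log(fooEnv.lookup('x')); // 100, own binding shadows the global one
console.log(fooEnv.lookup('y')); // 20, found in the parent environment
console.log(fooEnv.lookup('z')); // 30, the parameter

console.log(
  fooEnv.lookup('x') + fooEnv.lookup('y') + fooEnv.lookup('z') // 150
);
```

Looking up a name which is absent from every record, such as `fooEnv.lookup('w')`, throws a ReferenceError in this model, as Def. 10 states.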

Similarly to prototypes, the same parent environment can be shared by several child environments: for example, two global functions share the same global environment.

Note: you can get detailed information about lexical environment in this article.

Environment records differ by type. There are object environment records and declarative environment records. On top of the declarative record there are also function environment records and module environment records. Each record type has properties specific only to it. However, the generic mechanism of identifier resolution is common across all the environments, and doesn’t depend on the type of a record.

An example of an object environment record is the record of the global environment. Such a record also has an associated binding object, which may store some properties from the record, but not the others, and vice versa. The binding object can also be provided as the this value.

// Legacy variables using `var`.
var x = 10;
 
// Modern variables using `let`.
let y = 20;
 
// Both are added to the environment record:
console.log(
  x, // 10
  y, // 20
);
 
// But only `x` is added to the "binding object".
// The binding object of the global environment
// is the global object, and equals to `this`:
 
console.log(
  this.x, // 10
  this.y, // undefined!
);
 
// Binding object can store a name which is not
// added to the environment record, since it's
// not a valid identifier:
 
this['not valid ID'] = 30;
 
console.log(
  this['not valid ID'], // 30
);

This is depicted in the following figure:

Figure 7. A binding object.

Notice that the binding object exists to cover legacy constructs such as var-declarations and with-statements, which also provide their object as a binding object. These exist for historical reasons, from the time when environments were represented as simple objects. Currently the environment model is much more optimized; as a result, however, we can’t access bindings as properties anymore.

We have already seen how environments are related via the parent link. Now we shall see how an environment can outlive the context which creates it. This is the basis for the mechanism of closures which we’re about to discuss.

Functions in ECMAScript are first-class.

Def. 11: First-class function: a function which can be used as normal data: be stored in a variable, passed as an argument, or returned as a value from another function.

The concept of first-class functions is related to the so-called “Funarg problem” (or “the problem of a functional argument”), which arises when a function has to deal with free variables.

Def. 12: Free variable: a variable which is neither a parameter, nor a local variable of this function.

Let’s take a look at the Funarg problem, and see how it’s solved in ECMAScript.

Consider the following code snippet:

let x = 10;
 
function foo() {
  console.log(x);
}
 
function bar(funArg) {
  let x = 20;
  funArg();
}
 
// Pass `foo` as an argument to `bar`.
bar(foo); // 10, not 20!

For the function foo the variable x is free. When the foo function is activated (via the funArg parameter) — where should it resolve the x binding? From the outer scope where the function was created, or from the caller scope, from where the function is called? As we see, the caller, that is the bar function, also provides the binding for x — with the value 20.

The use-case described above is known as the downward funarg problem, i.e. an ambiguity in determining the correct environment of a binding: should it be the environment of the creation time, or the environment of the call time?

This is solved by an agreement of using static scope, that is the scope of the creation time.

Def. 13: Static scope: a language implements static scope, if only by looking at the source code one can determine in which environment a binding is resolved.

Technically the static scope is implemented by capturing the environment where a function is created.

Note: you can read about static and dynamic scopes in this article.

For our example the foo function’s captured environment is the global environment:

Figure 8. A closure.

We can see that an environment references a function, which in turn references the environment back.

Def. 14: Closure: A closure is a function which captures the environment where it’s defined. Further this environment is used for identifier resolution.

The second sub-type of the Funarg problem is known as the upward funarg problem. The only difference here is that the captured environment outlives the context which creates it.

Let’s see the example:

function foo() {
  let x = 10;
   
  // Closure, capturing environment of `foo`.
  function bar() {
    return x;
  }
 
  // Upward funarg.
  return bar;
}
 
let x = 20;
 
// Call to `foo` returns `bar` closure.
let bar = foo();
 
bar(); // 10, not 20!

Again, technically it doesn’t differ from the same exact mechanism of capturing the definition environment. It is just that in this case, had we not had the closure, the activation environment of foo would have been destroyed. But we captured it, so it cannot be deallocated, and is preserved to support static scope semantics.

Often there is an incomplete understanding of closures: developers usually think about closures only in terms of the upward funarg problem (and practically it really makes more sense). However, as we can see, the technical mechanism for the downward and upward funarg problems is exactly the same, and it is the mechanism of static scope.
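An everyday downward funarg is a callback passed to an array method. In the sketch below (the `scaleAll` name is hypothetical) the arrow function is a closure just as much as a returned function would be: `factor` is a free variable for it, and it is resolved in the environment where the arrow function was created.

```javascript
function scaleAll(numbers, factor) {
  // The arrow function is passed "down" to `map`.
  // `factor` is free for it, and is resolved in the environment of `scaleAll`.
  return numbers.map(n => n * factor);
}

console.log(scaleAll([1, 2, 3], 10)); // [10, 20, 30]
```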

As we mentioned above, similarly to prototypes, the same parent environment can be shared across several closures. This allows accessing and mutating the shared data:

function createCounter() {
  let count = 0;
 
  return {
    increment() { count++; return count; },
    decrement() { count--; return count; },
  };
}
 
let counter = createCounter();
 
console.log(
  counter.increment(), // 1
  counter.decrement(), // 0
  counter.increment(), // 1
);

Since both closures, increment and decrement, are created within the scope containing the count variable, they share this parent scope. That is, capturing always happens “by-reference” — meaning the reference to the whole parent environment is stored.

We can see this in the following picture:

Figure 9. A shared environment.

Some languages may capture by-value, making a copy of a captured variable, and do not allow changing it in the parent scopes. However in JS, to repeat, it is always the reference to the parent scope.

Note: implementations may optimize this step, and not capture the whole environment. Even when capturing only the free variables which are actually used, they still maintain the invariant of mutable data in parent scopes.
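The by-reference behavior can be observed by changing a variable after a closure over it has already been created. The names in this sketch (`createAccount`, `read`, `deposit`) are hypothetical.

```javascript
function createAccount() {
  let balance = 0;

  // The closure is created while `balance` is still 0.
  const read = () => balance;

  // The binding changes after the closure already exists.
  balance = 100;

  return {
    read,
    deposit(amount) { balance += amount; },
  };
}

let account = createAccount();

// A by-value capture would have reported 0 here.
console.log(account.read()); // 100

account.deposit(50);
console.log(account.read()); // 150
```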

So all identifiers are statically scoped. There is however one value which is dynamically scoped in ECMAScript. It’s the value of this.

The this value is a special object which is dynamically and implicitly passed to the code of a context. We can consider it as an implicit extra variable, which is accessible but cannot be assigned.

The purpose of the this value is to execute the same code for multiple objects.

Def. 15: This: an implicit context object accessible from a code of an execution context — in order to apply the same code for multiple objects.

The major use-case is class-based OOP. An instance method (which is defined on the prototype) exists as a single copy, but is shared across all the instances of this class.

class Point {
  constructor(x, y) {
    this._x = x;
    this._y = y;
  }
 
  getX() {
    return this._x;
  }
 
  getY() {
    return this._y;
  }
}
 
let p1 = new Point(1, 2);
let p2 = new Point(3, 4);
 
// Can access `getX`, and `getY` from
// both instances (they are passed as `this`).
 
console.log(
  p1.getX(), // 1
  p2.getX(), // 3
);

When the getX method is activated, a new environment is created to store local variables and parameters. In addition, the function environment record gets the [[ThisValue]] passed, which is bound dynamically depending on how the function is called. When it’s called with p1, the this value is exactly p1, and in the second case it’s p2.
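Since the this value depends on the call, it can also be supplied explicitly by the caller. The sketch below uses the standard `call`, `apply` and `bind` methods of functions; the standalone `getX` function and the plain objects are assumptions made for brevity.

```javascript
function getX() {
  return this._x;
}

let p1 = {_x: 1};
let p2 = {_x: 3};

// The caller chooses the `this` value for each call.
console.log(getX.call(p1)); // 1
console.log(getX.apply(p2)); // 3

// `bind` creates a new function with a fixed `this` value.
const getP1X = getX.bind(p1);

console.log(getP1X()); // 1
console.log(getP1X.call(p2)); // still 1: a bound `this` is not overridden
```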

Just to show the dynamic nature of the this value, consider this example, which we leave to the reader as an exercise to solve:

function foo() {
  return this;
}
 
let bar = {
  foo,
 
  baz() {
    return this;
  },
};
 
// `foo`
console.log(
  foo(),       // global or undefined
 
  bar.foo(),   // bar
  (bar.foo)(), // bar
 
  (bar.foo = bar.foo)(), // global
);
 
// `bar.baz`
console.log(bar.baz()); // bar
 
let savedBaz = bar.baz;
console.log(savedBaz()); // global

Note: you can get a detailed explanation of how the this value is determined, and why the code from above works the way it does, in the appropriate chapter.

Arrow functions are special in terms of the this value: their this is lexical, not dynamic. I.e. their function environment record does not provide a this value, and it’s taken from the parent environment.

var x = 10;
 
let foo = {
  x: 20,
 
  // Dynamic `this`.
  bar() {
    return this.x;
  },
 
  // Lexical `this`.
  baz: () => this.x,
 
  qux() {
    // Lexical this within the invocation.
    let arrow = () => this.x;
 
    return arrow();
  },
};
 
console.log(
  foo.bar(), // 20, from `foo`
  foo.baz(), // 10, from global
  foo.qux(), // 20, from `foo` and arrow
);
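The practical consequence of a lexical this shows up most often in callbacks. The `Counter` class below is a hypothetical example: the arrow callback sees the instance, while the regular function callback gets its own dynamic this, which is `undefined` here because class bodies are strict code and `forEach` passes no this value unless asked to.

```javascript
class Counter {
  constructor() {
    this.count = 0;
  }

  // Arrow callback: `this` is taken from `addAll`, i.e. it is the instance.
  addAll(items) {
    items.forEach(() => { this.count++; });
    return this.count;
  }

  // Regular function callback: `this` is `undefined` inside it.
  addAllBroken(items) {
    items.forEach(function() { this.count++; });
    return this.count;
  }
}

let counter = new Counter();

console.log(counter.addAll(['a', 'b', 'c'])); // 3

try {
  counter.addAllBroken(['a']);
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```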

As we said, in the global context the this value is the global object (the binding object of the global environment record). Previously there was only one global object. In the current version of the spec there might be multiple global objects, which are part of code realms. Let’s discuss this structure.

Before it is evaluated, all ECMAScript code must be associated with a realm. Technically a realm just provides a global environment for a context.

Def. 16: Realm: A code realm is an object which encapsulates a separate global environment.

When an execution context is created it’s associated with a particular code realm, which provides the global environment for this context. This association further stays unchanged.

Note: a direct realm equivalent in browser environment is the iframe element, which exactly provides a custom global environment. In Node.js it is close to the vm module.
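As a sketch of what a separate global environment means in practice, the Node.js `vm` module mentioned in the note can evaluate code against a fresh global object with its own set of built-ins. This example is Node.js-specific and assumes a CommonJS script; in a browser, a similar effect is observed with objects coming from an iframe.

```javascript
const vm = require('node:vm');

// Evaluate an array literal in a new context,
// which has its own global object and its own `Array`.
const foreignArray = vm.runInNewContext('[1, 2, 3]');

// It is a real array...
console.log(Array.isArray(foreignArray)); // true

// ...but it was created by the `Array` of another global environment.
console.log(foreignArray instanceof Array); // false
console.log(Object.getPrototypeOf(foreignArray) === Array.prototype); // false
```

This is why `Array.isArray` is usually preferred over `instanceof Array` when values may cross such boundaries.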

The current version of the specification doesn’t provide an ability to explicitly create realms, but they can be created implicitly by implementations. There is a proposal, though, to expose this API to user code.

Logically though, each context from the stack is always associated with its realm:

Figure 10. A context and realm association.

Now we’re getting closer to the bigger picture of the ECMAScript runtime. However, we still need to see the entry point to the code, and the initialization process. This is managed by the mechanism of jobs and job queues.

Some operations can be postponed, and executed as soon as there is an available spot on the execution context stack.

Def. 17: Job: A job is an abstract operation that initiates an ECMAScript computation when no other ECMAScript computation is currently in progress.

Jobs are enqueued on the job queues, and in the current spec version there are two job queues: ScriptJobs and PromiseJobs.

The initial job on the ScriptJobs queue is the main entry point to our program, the initial script which is loaded and evaluated: a realm is created, a global context is created and associated with this realm, it’s pushed onto the stack, and the global code is executed.

Notice that the ScriptJobs queue manages both scripts and modules.

Further, this context can execute other contexts, or enqueue other jobs. An example of a job which can be spawned and enqueued is a promise job.

When there is no running execution context and the execution context stack is empty, the ECMAScript implementation removes the first pending job from a job queue, creates an execution context and starts its execution.

Example:

// Enqueue a new promise on the PromiseJobs queue.
new Promise(resolve => setTimeout(() => resolve(10), 0))
  .then(value => console.log(value));
 
// This log is executed earlier, since it's still a
// running context, and job cannot start executing first
console.log(20);
 
// Output: 20, 10

Note: you can read more about promises in this documentation.
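The ordering can be observed more directly with an already resolved promise. In the sketch below the `order` array is a hypothetical log. Note that `setTimeout` is provided by the host (a browser or Node.js) rather than by ECMAScript, and its callbacks run only after the pending promise jobs.

```javascript
const order = [];

// Timer callback: scheduled by the host environment.
setTimeout(() => order.push('timeout'), 0);

// Promise job: enqueued right away, since the promise is already resolved.
Promise.resolve().then(() => order.push('promise job'));

// Synchronous code of the currently running context.
order.push('sync');

// Inspect the log after everything above has run.
setTimeout(() => console.log(order), 10);
// ['sync', 'promise job', 'timeout']
```

The promise job cannot start before the running context finishes, so `'sync'` is always logged first, even though the promise was resolved earlier in the code.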

Async functions can await promises, so they also enqueue promise jobs:

async function later() {
  return await Promise.resolve(10);
}
 
(async () => {
  let data = await later();
  console.log(data); // 10
})();

Note: read more about async functions here.
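A small sketch makes the suspension point visible. The `task` function and the `log` array are hypothetical names: the code before the first `await` runs synchronously as part of the call, and the rest continues later as a promise job, after the caller has moved on.

```javascript
const log = [];

async function task() {
  // Runs synchronously, as part of the call itself.
  log.push('task: start');

  // Suspends the function; the rest is continued by a promise job.
  await null;

  log.push('task: resumed');
}

const finished = task();
log.push('caller: continues');

finished.then(() => console.log(log));
// ['task: start', 'caller: continues', 'task: resumed']
```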

Now we’re very close to the final picture of the current JS Universe. We shall now see the main owners of all those components we discussed, the Agents.

Concurrency and parallelism are implemented in ECMAScript using the Agent pattern. The Agent pattern is very close to the Actor pattern: a lightweight process with a message-passing style of communication.

Def. 18: Agent: An agent is an object encapsulating execution context stack, set of job queues, and code realms.

Depending on the implementation, an agent can run on the same thread, or on a separate thread. A worker agent in the browser environment is an example of the Agent concept.

Agents are state-isolated from each other, and can communicate by sending messages. Some data can be shared between agents though, for example SharedArrayBuffers. Agents can also combine into agent clusters.
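Both communication styles can be sketched with the Node.js `worker_threads` module, assuming that a worker thread plays the role of a separate agent (the browser counterpart would be a Web Worker). The example is Node.js-specific and assumes a CommonJS script; `workerSource` and `result` are hypothetical names.

```javascript
const { Worker } = require('node:worker_threads');

// Memory which is visible to both agents.
const shared = new SharedArrayBuffer(4);
const view = new Int32Array(shared);
view[0] = 1;

// Source of the worker agent, passed as a string for brevity.
const workerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  const view = new Int32Array(workerData);
  Atomics.add(view, 0, 41);        // mutate the shared memory
  parentPort.postMessage('done');  // notify the parent with a message
`;

const worker = new Worker(workerSource, { eval: true, workerData: shared });

// Resolves with the shared value once the worker reports back.
const result = new Promise((resolve, reject) => {
  worker.once('message', () => resolve(Atomics.load(view, 0)));
  worker.once('error', reject);
});

result.then(value => console.log(value)); // 42
```

Everything except the shared buffer stays isolated: the worker has its own global environment, its own execution context stack and its own job queues.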

So below is the picture of the ECMAScript runtime:

Figure 11. ECMAScript runtime.

And that is it; that’s what happens under the hood of the ECMAScript engine!

Now we come to an end, and this is the amount of information on the JS core which we can cover within an overview article. As we mentioned, JS code can be grouped into modules, properties of objects can be tracked by Proxy objects, etc. There are many user-level details which you can find in various documentation on the JavaScript language.

Here though we tried to represent the logical structure of an ECMAScript program itself, and hopefully it clarified these details. If you have any questions, suggestions or feedback, as always I’ll be glad to discuss them in comments.

I’d like to thank the TC-39 representatives and spec editors who helped with clarifications for this article. The discussion can be found in this Twitter thread.

Good luck in studying ECMAScript!

Written by: Dmitry Soshnikov
Published on: November 14th, 2017
