Reactive WebRTC conference
WebRTC can be used to build a plugin-free video conference with screensharing in pure JavaScript. Observables — proposed for ES7 — let application logic be expressed in a clean, functional and composable way.
Currently WebRTC is supported only in Chrome and Firefox. Screensharing works in both Chrome and Firefox but each browser has its own way of exposing it.
The code samples use the promise-based WebRTC API (2015). Currently only Firefox implements it natively; fortunately it is easy to polyfill.
A polyfill for Observables is also used.
Additionally a small WebSocket Node.js server is presented, along with the Chrome extension required for screensharing. The entire application can be run inside a Docker container.
The examples assume basic familiarity with:
- ES6 Maps
- WebSockets
- Promises
- WebRTC
Observables
Traditional JavaScript uses ad-hoc callbacks for all kinds of completion notifications. For example the following code initializes the application when all resources on a page have been loaded:
window.addEventListener('load', initApplication);
But that code contains a subtle race condition — if the document is loaded before the event listener has been registered then the application will never be initialized.
A work-around is to first check if the document is ready and if not add the listener:
if (document.readyState === 'complete') {
  initApplication();
} else {
  window.addEventListener('load', initApplication);
}
An alternative approach involves decoupling the potentially asynchronous event (loading the page) from application initialization and using Promises to represent the asynchronous event:
var documentReady = (function() {
  if (document.readyState === 'complete') {
    return Promise.resolve();
  }
  return new Promise(function(resolve) {
    window.addEventListener('load', resolve);
  });
}());
Then the clients of this API do not need to be aware of the document’s loading status:
function initApplication() {
  print('Init app!');
}
documentReady.then(initApplication);
Several DOM APIs that initially used callbacks have already started to adopt promises (e.g. Web Crypto, Web MIDI and WebRTC). This is an approach recommended to specification authors by the W3C TAG.
The fact that once a Promise is resolved its status and value never changes makes Promises good for representing events that happen only once — loading an image or a remote resource. But Promise is inappropriate for events that happen multiple times — like mouse movements or messages from a remote server. One of the solutions to handling multiple values in an asynchronous way is the Observable.
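This write-once behaviour is easy to observe directly; in the sketch below the second resolve call is simply ignored:

```javascript
// A promise settles exactly once: later resolve/reject calls are ignored,
// and subscribers attached after settlement still receive the first value.
var settled = new Promise(function(resolve) {
  resolve('first');
  resolve('second'); // ignored — the promise is already settled
});
var observed = [];
settled.then(function(value) { observed.push(value); });
settled.then(function(value) { observed.push(value); });
// both callbacks receive 'first'
```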
There is a similarity between synchronous concepts of a regular function returning a single value and a generator returning multiple values and the asynchronous concepts of promises that resolve to a single value and observables that model asynchronous streams of multiple values.
| | One | Many |
|---|---|---|
| Sync | Value | Iterable |
| Async | Promise | Observable |
function print(msg) {
  var special = false;
  if (typeof msg === 'undefined') {
    msg = 'undefined';
    special = true;
  }
  if (msg === null) {
    msg = 'null';
    special = true;
  }
  // `output` refers to a list element defined on the page
  var li = document.createElement('LI');
  li.textContent = msg;
  li.className = special ? 'primitive' : '';
  output.appendChild(li);
}
function clear() {
  output.innerHTML = '';
}
clear();
A simple value is singular and synchronous:
var value = 17;
print(value);
A Promise represents a value that is delivered asynchronously. Once a promise is resolved its value never changes:
var value = new Promise(function(resolve) {
  setTimeout(function() {
    resolve(42);
  }, 1000);
});
value.then(print);
An iterable (e.g. an array or a generator) represents multiple synchronous values:
var source = [ { clientX: 10 }, { clientX: 20 } ];
source.map(function(e) { return e.clientX; })
  .filter(function(x) { return x % 2 === 0; })
  .forEach(print);
Observables deliver multiple values asynchronously.
The example below captures all mouse movements within two seconds and prints their X coordinates:
var source = Observable.fromEvent(document, 'mousemove')
  .takeUntil(Observable.timeout(2000));
source.map(function(e) { return e.clientX; })
  .filter(function(x) { return x % 2 === 0; })
  .forEach(print);
An observable has a similar interface to an array but the party in control is different. In arrays and iterables the consumer pulls values out of the iterable. Observables on the other hand push values into all registered subscribers.
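The pull/push distinction can be made concrete with a stripped-down observable. This is an illustrative sketch (MiniObservable is a made-up name, not the polyfill used in this article): the producer calls next on its subscribers, while map and filter merely transform values on their way through.

```javascript
// Minimal push-based observable: the producer decides when values flow,
// the consumer only reacts to them.
function MiniObservable(producer) {
  this.subscribe = producer; // producer receives an observer { next }
}
MiniObservable.prototype.map = function(fn) {
  var self = this;
  return new MiniObservable(function(observer) {
    self.subscribe({ next: function(v) { observer.next(fn(v)); } });
  });
};
MiniObservable.prototype.filter = function(pred) {
  var self = this;
  return new MiniObservable(function(observer) {
    self.subscribe({ next: function(v) { if (pred(v)) observer.next(v); } });
  });
};
MiniObservable.prototype.forEach = function(fn) {
  this.subscribe({ next: fn });
};

// The producer pushes values; nothing happens until someone subscribes.
var numbers = new MiniObservable(function(observer) {
  [1, 2, 3, 4].forEach(function(n) { observer.next(n); });
});
var results = [];
numbers.map(function(x) { return x * 10; })
  .filter(function(x) { return x > 15; })
  .forEach(function(x) { results.push(x); });
// results is now [20, 30, 40]
```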
Signaling
WebRTC allows creating direct connections between peers but to setup these connections an intermediary is needed — the signaling channel.
The signaling channel is used to exchange metadata about media or data channels (session descriptions) and network connection (ICE candidates). The exact method for signaling is not mandated by the WebRTC specification and can be anything (e.g. XMPP, WebSockets, SIP, …).
When a direct connection between peers has been established the signaling channel is no longer needed.
The signaling channel in this article uses WebSockets. It exposes a messages property, an observable that emits messages received by the WebSocket, and a next method for sending data over the connection.
function WebSocketSignalingChannel(address) {
  this.socket = new WebSocket(address);
  var socketObservable = Observable.fromWebSocket(this.socket);
  this.messages = socketObservable.map(function(event) {
    return JSON.parse(event.data);
  });
}
WebSocketSignalingChannel.prototype.next = function(object) {
  this.socket.send(JSON.stringify(object));
};
To allow multiple conversations at once each conference is assigned an identifier. The room identifier is either taken from the current page’s query string or generated as a random hex-encoded 10-byte sequence obtained from crypto.getRandomValues:
function getRoom() {
  var room = location.search.substring(1);
  if (room.length !== 20) {
    room = Array.prototype.map.call(crypto.getRandomValues(new Uint8Array(10)), function(b) {
      return b.toString(16);
    }).map(function pad(b) {
      return ('00' + b).slice(-2);
    }).join('');
    history.replaceState(null, '', '?' + room + '#conference');
  }
  return room;
}
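The padding step above matters because Number.prototype.toString(16) drops leading zeros, so bytes below 16 would produce a single hex digit and shorten the identifier. A self-contained illustration (toHex is a made-up name that combines both map steps):

```javascript
// Convert an array-like of bytes into fixed-width hex:
// each byte always contributes exactly two characters.
function toHex(bytes) {
  return Array.prototype.map.call(bytes, function(b) {
    return ('00' + b.toString(16)).slice(-2);
  }).join('');
}
toHex([5, 255, 0]); // '05ff00' — without padding it would be the ambiguous '5ff0'
```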
Initialization of the WebSocket signaling channel uses the current scheme and host:
function getSignalingChannel(room) {
  var url = location.protocol.replace('http', 'ws') + '//' + location.host + '/rooms/' + room;
  return new WebSocketSignalingChannel(url);
}
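The scheme replacement maps http to ws and https to wss, so a page served securely automatically gets a secure WebSocket. A standalone copy of the derivation (signalingUrl and the host names are made up for illustration):

```javascript
// Derive the WebSocket URL from the page's scheme and host:
// 'http:' → 'ws:', 'https:' → 'wss:'.
function signalingUrl(protocol, host, room) {
  return protocol.replace('http', 'ws') + '//' + host + '/rooms/' + room;
}
signalingUrl('https:', 'example.com', 'a1b2'); // 'wss://example.com/rooms/a1b2'
signalingUrl('http:', 'localhost:8080', 'a1b2'); // 'ws://localhost:8080/rooms/a1b2'
```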
Session description exchange
Signaling is used to transfer session descriptions and ICE candidates.
There are two kinds of session descriptions exchanged by peers:
- offer is created by the initiator,
- answer is created by the second party and sent back to the initiator.
Additionally each peer has two slots for session descriptions: local description for the description of itself and remote description for the description of the other peer.
| | Peer 1 | Peer 2 |
|---|---|---|
| 1 | create offer | |
| 2 | set offer as local description | |
| 3 | send offer to peer 2 | |
| 4 | | set offer as remote description |
| 5 | | create answer |
| 6 | | set answer as local description |
| 7 | | send answer to peer 1 |
| 8 | set answer as remote description | |
| 9 | exchange ICE candidates | exchange ICE candidates |
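The sequence above can be sketched with stub peers; the real RTCPeerConnection methods have the same promise-based shape. Everything here (negotiate, stubPeer, the log strings) is made up for illustration — in the real application the descriptions travel over the signaling channel rather than a local function call:

```javascript
// Stub peers simply record which negotiation step ran.
function stubPeer(name, log) {
  return {
    createOffer: function() { log.push(name + ' creates offer'); return Promise.resolve({ type: 'offer' }); },
    createAnswer: function() { log.push(name + ' creates answer'); return Promise.resolve({ type: 'answer' }); },
    setLocalDescription: function(d) { log.push(name + ' sets local ' + d.type); return Promise.resolve(); },
    setRemoteDescription: function(d) { log.push(name + ' sets remote ' + d.type); return Promise.resolve(); }
  };
}
function negotiate(caller, callee) {
  var offer, answer;
  return caller.createOffer()                                                      // step 1
    .then(function(o) { offer = o; return caller.setLocalDescription(offer); })    // step 2
    .then(function() { return callee.setRemoteDescription(offer); })               // steps 3-4
    .then(function() { return callee.createAnswer(); })                            // step 5
    .then(function(a) { answer = a; return callee.setLocalDescription(answer); })  // step 6
    .then(function() { return caller.setRemoteDescription(answer); });             // steps 7-8
}
var log = [];
negotiate(stubPeer('peer 1', log), stubPeer('peer 2', log));
```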
The RTCPeer wrapper object handles incoming messages, acting as either peer depending on the message received. The join message is emitted by the signaling channel when another peer joins the channel. The second peer never sees the join message of the first peer; it is instead presented with an offer.
RTCPeer.prototype.handleMessage = function(body) {
  var peer = this.peer, type = body.type;
  if (type === 'join') {
    // peer 1
    return peer.createOffer().then(this.setLocal);
  } else if (type === 'offer') {
    // peer 2: the answer may only be created after the remote description is set
    return peer.setRemoteDescription(new RTCSessionDescription(body)).then(function() {
      return peer.createAnswer();
    }).then(this.setLocal);
  } else if (type === 'answer') {
    // peer 1
    peer.setRemoteDescription(new RTCSessionDescription(body));
  } else if (type === 'candidate') {
    // both peers
    peer.addIceCandidate(new RTCIceCandidate(body));
  }
  return Promise.resolve({ target: this });
};
Each call to handleMessage returns a promise resolving to a reply that may need to be sent back to the other peer. The function below creates replies by mapping incoming messages into responses:
function getReplies(messages, peers) {
  return messages.map(function(message) {
    return peers.get(message.from).handleMessage(message.body);
  }).unwrapPromises().filter(function(reply) {
    return reply.response;
  }).map(function(reply) {
    return {
      to: reply.target.id,
      body: reply.response
    };
  });
}
This function accepts two parameters:
- messages — an observable emitting commands (a signaling channel),
- peers — a lazy map that for each key retrieves an RTCPeer object, creating a new one if it does not exist yet.
The unwrapPromises function turns a stream of promises into a stream of the values these promises resolve to.
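unwrapPromises itself comes from the Observable polyfill; conceptually it resolves each pushed promise and forwards its value downstream. A minimal array-based sketch of the idea (not the polyfill's actual implementation — the function name is reused for illustration only):

```javascript
// Each item (a promise or a plain value) is normalized with
// Promise.resolve and its eventual value handed to the consumer.
function unwrapPromises(items, onValue) {
  items.forEach(function(item) {
    Promise.resolve(item).then(onValue);
  });
}
var out = [];
unwrapPromises([Promise.resolve('a'), 'b'], function(value) { out.push(value); });
// once the microtask queue drains, out contains ['a', 'b']
```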
ICE and streams
ICE allows both peers to find the best communication path between them even if both of them are behind NAT. Each possible way to connect peers is known as an ICE candidate. To generate candidates two kinds of servers are used:
- STUN — to determine the public IP and detect if peers are behind a NAT,
- TURN — acts as a proxy or a relay server if no direct connection can be made.
The implementation uses a simple object wrapper around WebRTC RTCPeerConnections which adds two observable properties:
- streams — remote streams that have been added to the peer,
- iceCandidates — ICE candidates used to connect the two peers together.
function RTCPeer(id, peer) {
  this.id = id;
  this.peer = peer;
  var iceEvents = Observable.fromEvent(peer, 'icecandidate');
  this.iceCandidates = iceEvents.filter(function(event) {
    return event.candidate;
  }).map(function(event) {
    return {
      target: this,
      candidate: event.candidate
    };
  }, this);
  var addStreamEvents = Observable.fromEvent(peer, 'addstream');
  this.streams = addStreamEvents.map(function(event) {
    return {
      target: this,
      stream: event.stream
    };
  }, this);
  this.setLocal = this.setLocal.bind(this);
}
RTCPeer.prototype.setLocal = function(description) {
  return this.peer.setLocalDescription(description).then(function() {
    return { target: this, response: description };
  }.bind(this));
};
An observable that emits RTCPeer objects is converted into a stream of ICE candidates. Because mapping peers into their ICE candidates produces a nested observable (a stream of streams of candidates), mergeMap is used to map and flatten the structure in a single step.
function getCandidates(objects) {
  return objects.mergeMap(function(peer) {
    return peer.iceCandidates;
  }).map(function(event) {
    var candidate = JSON.parse(JSON.stringify(event.candidate));
    candidate.type = 'candidate';
    return {
      to: event.target.id,
      body: candidate
    };
  });
}
Remote streams are obtained by mapping created RTCPeer objects into their streams.
function getRemoteStreams(objects) {
  return objects.mergeMap(function(peer) {
    return peer.streams;
  }).map(function(event) {
    return event.stream;
  });
}
Peers
Peers are stored using a small utility object — the LazyMap. A lazy map produces values for keys on demand. This map additionally provides an observable that emits the produced objects.
function LazyMap(create) {
  var map = new Map, generators = [];
  this.get = function(key) {
    if (map.has(key)) {
      return map.get(key);
    }
    var value = create(key);
    map.set(key, value);
    generators.forEach(function(generator) {
      generator.next(value);
    });
    return value;
  };
  this.objects = new Observable(function observe(generator) {
    generators.push(generator);
    return generator;
  });
}
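The memoizing behaviour is the important part: a peer object is created the first time a message arrives from an unknown sender, and every later lookup returns the same object. A self-contained usage sketch (CachingMap is a trimmed stand-in for LazyMap with the observable side-channel omitted):

```javascript
// create() runs only on first access; later get() calls return the cache.
function CachingMap(create) {
  var map = new Map();
  this.get = function(key) {
    if (!map.has(key)) {
      map.set(key, create(key));
    }
    return map.get(key);
  };
}
var created = [];
var peers = new CachingMap(function(id) {
  created.push(id);
  return { id: id };
});
peers.get('alice');
peers.get('alice'); // cached — create() is not called again
peers.get('bob');
// created === ['alice', 'bob']
```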
The function below creates RTCPeer objects with peer connections on demand, using the specified local stream. The configuration object contains the addresses of STUN and TURN servers.
function createMap(stream) {
  var configuration = {
    iceServers: [{
      url: 'stun:stun.l.google.com:19302'
    }, {
      url: 'stun:stun.services.mozilla.com'
    }, {
      url: 'turn:turn.anyfirewall.com:443?transport=tcp',
      credential: 'webrtc',
      username: 'webrtc'
    }]
  };
  return new LazyMap(function(id) {
    var peer = new RTCPeerConnection(configuration);
    peer.addStream(stream);
    return new RTCPeer(id, peer);
  });
}
User interface
The user interface object displays remote streams by creating small <video> elements. For convenience the connection link is also displayed as a QR code.
function UI(container) {
  var bigRemoteView = container.querySelector('.remote');
  var selfView = container.querySelector('.self');
  var statusLabel = container.querySelector('.status');
  var substatusLabel = container.querySelector('.substatus');
  var participants = container.querySelector('.participants');
  function setParticipantsView() {
    var videos = participants.querySelectorAll('video:not(.self)');
    Array.from(videos).forEach(function(video) {
      video.hidden = video.src === bigRemoteView.src;
    });
  }
  function setBigSource() {
    bigRemoteView.src = this.src;
    setParticipantsView();
  }
  function addRemoteStream(stream) {
    var remoteView = document.createElement('video');
    participants.insertBefore(remoteView, participants.firstChild);
    remoteView.addEventListener('click', setBigSource);
    remoteView.autoplay = true;
    remoteView.src = URL.createObjectURL(stream);
    if (!bigRemoteView.src) {
      setBigSource.call(remoteView);
    }
  }
  var qrCode = new QRCode(substatusLabel.querySelector('.qr-link'), {
    width: 128,
    height: 128
  });
  return {
    updateStatus: function(message) {
      statusLabel.textContent = message;
    },
    showConnectionLink: function(link) {
      substatusLabel.querySelector('.link').textContent = link;
      qrCode.clear();
      qrCode.makeCode(link);
      var hidden = Array.from(substatusLabel.querySelectorAll('[hidden]'));
      hidden.forEach(function(element) {
        element.hidden = false;
      });
    },
    hideConnectionLink: function() {
      substatusLabel.querySelector('.center').hidden = true;
    },
    addRemoteStream: addRemoteStream,
    setLocalStream: function(stream) {
      selfView.src = URL.createObjectURL(stream);
    }
  };
}
The function below provides a higher level interface that switches UI state based upon the function called.
function withUI(ui) {
  return {
    addLocalStream: function(stream) {
      ui.setLocalStream(stream);
      ui.updateStatus('waiting for someone to connect...');
      ui.showConnectionLink(location.href);
    },
    updateStatus: function(message) {
      if (message.body.type === 'join') {
        ui.updateStatus('calling...');
      } else if (message.body.type === 'offer') {
        ui.updateStatus('incoming call...');
      }
    },
    addRemoteStream: function(stream) {
      ui.updateStatus('');
      ui.hideConnectionLink();
      ui.addRemoteStream(stream);
    }
  };
}
Conference
// polyfill Uint8Array.prototype.map where missing
if (!Uint8Array.prototype.map) {
  Uint8Array.prototype.map = Array.prototype.map;
}
document.createElement = document.ownerDocument.createElement.bind(document.ownerDocument);
Media to be shared are selected through a simple form. Each radio button specifies the constraints to be used for selecting the media source, either as:
- a data-source attribute in JSON format,
- a value property that points to a global function returning a promise with constraints.
<input type="radio" name="source"
       data-source='{"audio":true,"video":false}'> Audio only
<input type="radio" name="source" checked
       data-source='{"audio":false,"video":true}'> Video only
<input type="radio" name="source"
       data-source='{"audio":true,"video":true}'> Audio + Video
<input type="radio" name="source"
       value="sourceChromeScreen"> Screensharing in Chrome
<input type="radio" name="source"
       data-source='{"audio":false,"video":{"mediaSource":"screen"}}'> Screensharing in Firefox
<input type="radio" name="source"
       data-source='{"audio":false,"video":true,"fake":true}'> Fake media in Firefox
<input type="button" value="Connect">
The selected radio button is turned into a promise that resolves to constraints:
var form = document.querySelector('form');
form.hidden = true;
var source = Array.from(form.source).find(function(source) {
  return source.checked;
});
var constraints;
if (source.dataset.source) {
  constraints = Promise.resolve(JSON.parse(source.dataset.source));
} else {
  constraints = window[source.value]();
}
The constraints are passed to getUserMedia, which exchanges them for a MediaStream object, and then connect is invoked:
var mediaDevices = navigator.mediaDevices;
var getUserMedia = mediaDevices.getUserMedia.bind(mediaDevices);
constraints.then(getUserMedia).then(connect);
The function that joins all the pieces — connect — passes messages and streams from the created objects into ui. At the end two observables — one for ICE candidates and one for message replies — are merged together and sent into the signaling channel.
function connect(stream) {
  var peers = createMap(stream);
  var signalingChannel = getSignalingChannel(getRoom());
  // pass events to UI object
  var ui = withUI(UI(document));
  ui.addLocalStream(stream);
  signalingChannel.messages.forEach(ui.updateStatus);
  getRemoteStreams(peers.objects).forEach(ui.addRemoteStream);
  // get responses
  var candidates = getCandidates(peers.objects);
  var replies = getReplies(signalingChannel.messages, peers);
  candidates.merge(replies).observe(signalingChannel);
}
Serverless mode
By substituting the WebSocket signaling channel object with a different object that has the same interface it is possible to build a completely serverless WebRTC conference. Instead of WebSocket messages, users need to exchange the communication data manually (for example via IM or e-mail).
The following object uses two text fields for outgoing (send) and incoming (receive) messages and two buttons — one to interpret received data and one to create an offer and initiate the exchange.
function StaticSignalingChannel(send, receive, receiveButton, offerButton) {
  this.objects = [];
  this.sendInput = send;
  this.messages = new Observable(function(generator) {
    var done = false,
        decoratedGenerator = decorate(generator, function() {
          done = true;
        });
    receiveButton.addEventListener('click', function(e) {
      e.preventDefault();
      var messages = JSON.parse(receive.value);
      messages.forEach(function(message) {
        if (done) return;
        decoratedGenerator.next(message);
      }, this);
    }.bind(this));
    receiveButton.disabled = false;
    offerButton.addEventListener('click', function(e) {
      e.preventDefault();
      decoratedGenerator.next({
        body: {
          type: 'join'
        }
      });
    }.bind(this));
    offerButton.disabled = false;
    return decoratedGenerator;
  });
  this.next = function(message) {
    this.objects.push(message);
    send.value = JSON.stringify(this.objects);
  };
  this.throw = console.error.bind(console);
}
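The decorate helper above comes from the full source and is not shown in the article. A minimal sketch consistent with how it is used here: it wraps a generator-style observer and runs a callback once the stream is terminated, so the channel can stop delivering messages afterwards (this is an assumption about its behaviour, not the actual implementation):

```javascript
// Wrap a generator-style observer; onDone fires when the stream ends,
// whether by error (throw) or by completion (return).
function decorate(generator, onDone) {
  return {
    next: function(value) {
      if (generator.next) return generator.next(value);
    },
    throw: function(error) {
      onDone();
      if (generator.throw) return generator.throw(error);
    },
    return: function(value) {
      onDone();
      if (generator.return) return generator.return(value);
    }
  };
}
var received = [], ended = false;
var wrapped = decorate({ next: function(v) { received.push(v); } }, function() { ended = true; });
wrapped.next('hello'); // forwarded to the underlying observer
wrapped.return();      // marks the stream as done
// received === ['hello'], ended === true
```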
Initialization of this object does not require the room identifier.
function getSignalingChannel() {
  var $ = document.querySelector.bind(document);
  return new StaticSignalingChannel(
    $('#sendInput'), $('#receiveInput'),
    $('#receiveBtn'), $('#offerBtn'));
}
WebSocket signaling server
The server used is a simple node.js application that accepts WebSocket connections and allows two-way communication between users.
This server can also be run in a secure (HTTPS) mode that is required for screensharing in Chrome.
var crypto = require('crypto');
var express = require('express');
var WebSocketServer = require('ws').Server;
var app = express();
var server;
if (option('secure')) {
  var fs = require('fs');
  var credentials = {
    key: fs.readFileSync('cert/server.key', 'utf8'),
    cert: fs.readFileSync('cert/server.crt', 'utf8')
  };
  server = require('https').createServer(credentials, app);
} else {
  server = require('http').createServer(app);
}
app.use(express.static(__dirname + '/public'));
function sendToOneClient(clientId, client) {
  return client.id === clientId;
}
function sendToOthers(ownConnection, client) {
  return ownConnection.upgradeReq.url === client.upgradeReq.url &&
    client !== ownConnection;
}
function parseJSON(data) {
  try {
    return JSON.parse(data);
  } catch (e) {
    console.warn('Cannot parse JSON', e);
  }
}
function option(name) {
  var env = process.env;
  return env['npm_config_' + name] || env[name.toUpperCase()] || env['npm_package_config_' + name];
}
var configuration = {
  server: server
};
if (option('origin')) {
  configuration.verifyClient = function(info) {
    return info.origin === option('origin');
  };
} else {
  console.warn('Verifying client connections disabled!');
}
var ws = new WebSocketServer(configuration);
ws.on('connection', function(connection) {
  connection.id = crypto.randomBytes(20).toString('hex');
  ws.clients.filter(sendToOthers.bind(null, connection)).forEach(function(client) {
    client.send(JSON.stringify({
      from: connection.id,
      body: {
        type: 'join'
      }
    }));
  });
  connection.on('message', function(data) {
    var message = parseJSON(data);
    if (!message) return;
    var target = message.to ? sendToOneClient.bind(null, message.to) : sendToOthers.bind(null, connection);
    ws.clients.filter(target).forEach(function(client) {
      client.send(JSON.stringify({
        from: connection.id,
        body: message.body
      }));
    });
    console.log('received:', message);
  });
});
server.listen(option('port'));
console.info('Open: http' + (option('secure') ? 's' : '') + '://localhost:' + option('port'));
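The option helper looks configuration up in three environment sources in priority order: npm command-line flags (npm_config_*), plain environment variables, and package.json config defaults (npm_package_config_*). A standalone copy demonstrating the precedence (the env values below are made up):

```javascript
// Lookup order: npm --flag values win over plain environment variables,
// which win over package.json config defaults.
function option(name) {
  var env = process.env;
  return env['npm_config_' + name] || env[name.toUpperCase()] || env['npm_package_config_' + name];
}
process.env.npm_package_config_port = '3000';
process.env.PORT = '9090';
process.env.npm_config_port = '8080';
option('port'); // '8080' — the npm flag takes precedence
delete process.env.npm_config_port;
option('port'); // '9090' — falls back to the environment variable
```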
Screensharing
In Firefox and in Chrome before version 34 a special constraint (mediaSource: "screen") allowed using windows or screens as inputs for WebRTC. Since version 34 Chrome requires a custom extension with the desktopCapture permission for screensharing. The extension uses the chrome.desktopCapture API to retrieve a stream identifier that is then used on the page.
The implementation consists of two parts — the content script and the background page. The content script interacts with the site and simply forwards messages to and from the background page.
var port = chrome.runtime.connect();
port.onMessage.addListener(function (message) {
  window.postMessage(message, '*');
});
window.addEventListener('message', function (event) {
  port.postMessage(event.data);
});
The background page invokes the desktopCapture function and posts a message with the stream identifier (if any) back to the content script.
chrome.runtime.onConnect.addListener(function (port) {
  // bind the handler to the port so that this.postMessage and
  // this.sender inside onMessage refer to the connected port
  port.onMessage.addListener(onMessage.bind(port));
});
function onMessage(message) {
  if (message.command == 'get-sourceid' && !message.type) {
    chrome.desktopCapture.chooseDesktopMedia(['screen', 'window'],
      // tab is required so that the sourceId can be used there
      this.sender.tab,
      onStream.bind(this, message.id));
  } else if (message.command == 'check-installed' && !message.type) {
    this.postMessage({
      id: message.id,
      command: 'check-installed',
      type: 'result'
    });
  }
}
function onStream(id, streamId) {
  this.postMessage({
    command: 'get-sourceid',
    type: 'result',
    id: id,
    streamId: streamId
  });
}
The stream identifier is requested from the web page and received through postMessage:
function sourceChromeScreen() {
  var constraints = {
    audio: false,
    video: {
      mandatory: {
        chromeMediaSource: 'desktop',
        maxWidth: screen.width > 1920 ? screen.width : 1920,
        maxHeight: screen.height > 1080 ? screen.height : 1080
      },
      optional: []
    }
  };
  var id = Math.random();
  return new Promise(function(resolve, reject) {
    window.addEventListener('message', function getScreen(e) {
      if (e.data.id === id && e.data.type === 'result' && e.data.command === 'get-sourceid') {
        window.removeEventListener('message', getScreen);
        constraints.video.mandatory.chromeMediaSourceId = e.data.streamId;
        if (e.data.streamId) {
          resolve(constraints);
        } else {
          reject(Error('Missing source parameter'));
        }
      }
    });
    window.postMessage({
      command: 'get-sourceid',
      id: id
    }, '*');
  });
}
For the complete source code of this application (both server and client side) see Reactive WebRTC Conference.
Revisions
- Initial version.