
Real-time face tracking in a browser using TensorFlow.js. Part 4

by admin

In Part 4 (you read the first, second, and third parts, right?) we return to our goal of creating a Snapchat-style face filter, using what we have already learned about tracking faces and adding 3D visualization through ThreeJS. In this article, we will use the key face points to render a 3D model on top of the webcam video and have a little fun with augmented reality.


You can download a demo version of this project. You may need to enable WebGL support in your web browser to get the performance you need. You can also download the code and files for this series. It is assumed that you are familiar with JavaScript and HTML and have at least a basic understanding of how neural networks work.

Adding 3D graphics with ThreeJS

This project will be based on the face tracking project code we created at the beginning of this series. We will add a 3D scene overlay to the original canvas.

ThreeJS makes it relatively easy to work with 3D graphics, so we’re going to use this library to render virtual glasses over our faces.

At the top of the page we need to include two script files: one for ThreeJS and one for the GLTF loader that will load the virtual glasses model we are going to use:

<script src="https://cdn.jsdelivr.net/npm/three@0.123.0/build/three.min.js"></script><script src="https://cdn.jsdelivr.net/npm/three@0.123.0/examples/js/loaders/GLTFLoader.js"></script>

To keep things simple, and to avoid worrying about how to put the webcam texture into the 3D scene, we can overlay an additional transparent canvas and draw the virtual glasses on it. Using the CSS below, placed above the body tag, we put the output into a container and add an overlay canvas:

<style>
    .canvas-container {
        position: relative;
        width: auto;
        height: auto;
    }
    .canvas-container canvas {
        position: absolute;
        left: 0;
        width: auto;
        height: auto;
    }
</style>
<body>
    <div class="canvas-container">
        <canvas id="output"></canvas>
        <canvas id="overlay"></canvas>
    </div>
    ...
</body>

The 3D scene requires several variables, and we can add a helper function that loads a 3D model from a GLTF file:

let output = null;
let model = null;
let renderer = null;
let scene = null;
let camera = null;
let glasses = null;

function loadModel( file ) {
    return new Promise( ( res, rej ) => {
        const loader = new THREE.GLTFLoader();
        loader.load( file, function ( gltf ) {
            res( gltf.scene );
        }, undefined, function ( error ) {
            rej( error );
        } );
    });
}

We can now initialize all the components inside our async block, starting with the overlay canvas size, just as we did for the output canvas:

(async () => {
    ...
    let canvas = document.getElementById( "output" );
    canvas.width = video.width;
    canvas.height = video.height;
    let overlay = document.getElementById( "overlay" );
    overlay.width = video.width;
    overlay.height = video.height;
    ...
})();

You also need to set up the renderer, scene, and camera variables. Even if you are not familiar with 3D perspective and camera math, you don't have to worry: this code simply positions the scene camera so that the width and height of the webcam video match the 3D space coordinates:

(async () => {
    ...
    // Load Face Landmarks Detection
    model = await faceLandmarksDetection.load(
        faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
    );
    renderer = new THREE.WebGLRenderer({
        canvas: document.getElementById( "overlay" ),
        alpha: true
    });
    camera = new THREE.PerspectiveCamera( 45, 1, 0.1, 2000 );
    camera.position.x = videoWidth / 2;
    camera.position.y = -videoHeight / 2;
    camera.position.z = -( videoHeight / 2 ) / Math.tan( 45 / 2 ); // distance to z should be tan( fov / 2 )
    scene = new THREE.Scene();
    scene.add( new THREE.AmbientLight( 0xcccccc, 0.4 ) );
    camera.add( new THREE.PointLight( 0xffffff, 0.8 ) );
    scene.add( camera );
    camera.lookAt( { x: videoWidth / 2, y: -videoHeight / 2, z: 0, isVector3: true } );
    ...
})();

We only need to add one line of code to the trackFace function to render the scene on top of the face tracking output:

async function trackFace() {
    const video = document.querySelector( "video" );
    output.drawImage(
        video,
        0, 0, video.width, video.height,
        0, 0, video.width, video.height
    );
    renderer.render( scene, camera );
    const faces = await model.estimateFaces( {
        input: video,
        returnTensors: false,
        flipHorizontal: false,
    });
    ...
}

The last piece of this puzzle before we can display virtual objects on our face is loading the 3D model of the virtual glasses. We found a pair of heart-shaped glasses by Maximkuzlin on SketchFab. You can download and use a different object if you wish.

Here is how to load the object and add it to the scene before calling trackFace:
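// Glasses from https://sketchfab.com/3d-models/heart-glasses-ef812c7e7dc14f6b8783ccb516b3495c
glasses = await loadModel( "web/3d/heart_glasses.gltf" );
scene.add( glasses );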

Placing virtual glasses on the person being tracked

Now begins the fun part – putting on our virtual glasses.

The labeled annotations provided by the TensorFlow face tracking model include a midwayBetweenEyes array of coordinates, in which the X and Y coordinates map to the screen and the Z coordinate adds depth into the screen. This makes placing the glasses over our eyes a fairly easy task.

We need to negate the Y coordinate because the positive Y axis points down in the two-dimensional screen coordinate system but up in the 3D scene coordinate system. We also subtract the camera's distance (its depth) from the Z coordinate value to get the correct distance within the scene.

glasses.position.x = face.annotations.midwayBetweenEyes[ 0 ][ 0 ];
glasses.position.y = -face.annotations.midwayBetweenEyes[ 0 ][ 1 ];
glasses.position.z = -camera.position.z + face.annotations.midwayBetweenEyes[ 0 ][ 2 ];

Now we have to calculate the orientation and scale of the glasses. This is possible if we determine the "up" direction relative to our face, which points to the top of our head, and the distance between our eyes.

We can estimate the "up" direction using the same midwayBetweenEyes point we used to position the glasses, together with the tracked point at the bottom of the nose, and then normalize the resulting vector as follows:

glasses.up.x = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
glasses.up.y = -( face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ] );
glasses.up.z = face.annotations.midwayBetweenEyes[ 0 ][ 2 ] - face.annotations.noseBottom[ 0 ][ 2 ];
const length = Math.sqrt( glasses.up.x ** 2 + glasses.up.y ** 2 + glasses.up.z ** 2 );
glasses.up.x /= length;
glasses.up.y /= length;
glasses.up.z /= length;

To get the relative size of the head, we can calculate the distance between the eyes:

const eyeDist = Math.sqrt(
    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
);

Finally, we scale the glasses based on the eyeDist value and rotate them around the Z-axis using the angle between the "up" vector and the Y-axis, as in the snippet below taken from the full listing. And voila!
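// Scale to the size of the head and align with the "up" vector
glasses.scale.x = eyeDist / 6;
glasses.scale.y = eyeDist / 6;
glasses.scale.z = eyeDist / 6;
glasses.rotation.y = Math.PI;
glasses.rotation.z = Math.PI / 2 - Math.acos( glasses.up.x );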

Run your code and check the result.


Before we move on to the next part of this series, let's look at the complete code put together:

Full code listing
<html>
<head>
    <title>Creating a Snapchat-Style Virtual Glasses Face Filter</title>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/three@0.123.0/build/three.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/three@0.123.0/examples/js/loaders/GLTFLoader.js"></script>
</head>
<style>
    .canvas-container {
        position: relative;
        width: auto;
        height: auto;
    }
    .canvas-container canvas {
        position: absolute;
        left: 0;
        width: auto;
        height: auto;
    }
</style>
<body>
    <div class="canvas-container">
        <canvas id="output"></canvas>
        <canvas id="overlay"></canvas>
    </div>
    <video id="webcam" playsinline style="visibility: hidden;width: auto;height: auto;"></video>
    <h1 id="status">Loading...</h1>
    <script>
    function setText( text ) {
        document.getElementById( "status" ).innerText = text;
    }

    function drawLine( ctx, x1, y1, x2, y2 ) {
        ctx.beginPath();
        ctx.moveTo( x1, y1 );
        ctx.lineTo( x2, y2 );
        ctx.stroke();
    }

    async function setupWebcam() {
        return new Promise( ( resolve, reject ) => {
            const webcamElement = document.getElementById( "webcam" );
            const navigatorAny = navigator;
            navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
            if( navigator.getUserMedia ) {
                navigator.getUserMedia( { video: true },
                    stream => {
                        webcamElement.srcObject = stream;
                        webcamElement.addEventListener( "loadeddata", resolve, false );
                    },
                    error => reject());
            }
            else {
                reject();
            }
        });
    }

    let output = null;
    let model = null;
    let renderer = null;
    let scene = null;
    let camera = null;
    let glasses = null;

    function loadModel( file ) {
        return new Promise( ( res, rej ) => {
            const loader = new THREE.GLTFLoader();
            loader.load( file, function ( gltf ) {
                res( gltf.scene );
            }, undefined, function ( error ) {
                rej( error );
            } );
        });
    }

    async function trackFace() {
        const video = document.querySelector( "video" );
        output.drawImage(
            video,
            0, 0, video.width, video.height,
            0, 0, video.width, video.height
        );
        renderer.render( scene, camera );
        const faces = await model.estimateFaces( {
            input: video,
            returnTensors: false,
            flipHorizontal: false,
        });
        faces.forEach( face => {
            // Draw the bounding box
            const x1 = face.boundingBox.topLeft[ 0 ];
            const y1 = face.boundingBox.topLeft[ 1 ];
            const x2 = face.boundingBox.bottomRight[ 0 ];
            const y2 = face.boundingBox.bottomRight[ 1 ];
            const bWidth = x2 - x1;
            const bHeight = y2 - y1;
            drawLine( output, x1, y1, x2, y1 );
            drawLine( output, x2, y1, x2, y2 );
            drawLine( output, x1, y2, x2, y2 );
            drawLine( output, x1, y1, x1, y2 );

            glasses.position.x = face.annotations.midwayBetweenEyes[ 0 ][ 0 ];
            glasses.position.y = -face.annotations.midwayBetweenEyes[ 0 ][ 1 ];
            glasses.position.z = -camera.position.z + face.annotations.midwayBetweenEyes[ 0 ][ 2 ];

            // Calculate an Up-Vector using the eyes position and the bottom of the nose
            glasses.up.x = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
            glasses.up.y = -( face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ] );
            glasses.up.z = face.annotations.midwayBetweenEyes[ 0 ][ 2 ] - face.annotations.noseBottom[ 0 ][ 2 ];
            const length = Math.sqrt( glasses.up.x ** 2 + glasses.up.y ** 2 + glasses.up.z ** 2 );
            glasses.up.x /= length;
            glasses.up.y /= length;
            glasses.up.z /= length;

            // Scale to the size of the head
            const eyeDist = Math.sqrt(
                ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
                ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
                ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
            );
            glasses.scale.x = eyeDist / 6;
            glasses.scale.y = eyeDist / 6;
            glasses.scale.z = eyeDist / 6;
            glasses.rotation.y = Math.PI;
            glasses.rotation.z = Math.PI / 2 - Math.acos( glasses.up.x );
        });
        requestAnimationFrame( trackFace );
    }

    (async () => {
        await setupWebcam();
        const video = document.getElementById( "webcam" );
        video.play();
        let videoWidth = video.videoWidth;
        let videoHeight = video.videoHeight;
        video.width = videoWidth;
        video.height = videoHeight;

        let canvas = document.getElementById( "output" );
        canvas.width = video.width;
        canvas.height = video.height;
        let overlay = document.getElementById( "overlay" );
        overlay.width = video.width;
        overlay.height = video.height;

        output = canvas.getContext( "2d" );
        output.translate( canvas.width, 0 );
        output.scale( -1, 1 ); // Mirror cam
        output.fillStyle = "#fdffb6";
        output.strokeStyle = "#fdffb6";
        output.lineWidth = 2;

        // Load Face Landmarks Detection
        model = await faceLandmarksDetection.load(
            faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
        );

        renderer = new THREE.WebGLRenderer({
            canvas: document.getElementById( "overlay" ),
            alpha: true
        });
        camera = new THREE.PerspectiveCamera( 45, 1, 0.1, 2000 );
        camera.position.x = videoWidth / 2;
        camera.position.y = -videoHeight / 2;
        camera.position.z = -( videoHeight / 2 ) / Math.tan( 45 / 2 ); // distance to z should be tan( fov / 2 )
        scene = new THREE.Scene();
        scene.add( new THREE.AmbientLight( 0xcccccc, 0.4 ) );
        camera.add( new THREE.PointLight( 0xffffff, 0.8 ) );
        scene.add( camera );
        camera.lookAt( { x: videoWidth / 2, y: -videoHeight / 2, z: 0, isVector3: true } );

        // Glasses from https://sketchfab.com/3d-models/heart-glasses-ef812c7e7dc14f6b8783ccb516b3495c
        glasses = await loadModel( "web/3d/heart_glasses.gltf" );
        scene.add( glasses );

        setText( "Loaded!" );
        trackFace();
    })();
    </script>
</body>
</html>

What’s next? What if we also add facial emotion detection?

Would you believe it's all possible on one web page? By adding 3D objects to the real-time face tracking feature, we've created magic with a camera right in the web browser. You might think: "But heart-shaped glasses exist in real life…" And it's true! What if we created something really magical, like a hat… that knows how we feel?

In the next article, let's create a magic hat (like at Hogwarts!) that detects emotions, and see if we can make the impossible possible by pushing the TensorFlow.js library even further! See you tomorrow, same time.
