
Augmented Reality App Development

Prof. Dr. Ansgar Gerlicher

Stuttgart Media University

Stuttgart, Germany


Agenda

  1. General Intro to AR
  2. ARKit Overview
  3. Basics on using ARKit
    • World Tracking / Plane detection
    • Image Detection
    • Object Detection
  4. Assignments - hands-on AR apps

General Intro


Gartner Hype Cycle 2018

Source: Gartner


Short AR History


Source: Wikipedia


Definition of Augmented Reality

What is the difference between the following examples?


AR or what?


AR or what?


AR or what?


Youtube: Zombies, Run!


What is Augmented Reality?


AR Use-Cases

  • Retail
  • Games and Entertainment
  • Navigation
  • Marketing
  • Military
  • Industry
  • Print media
  • Medicine, e.g. surgery

Example Retail

AR in Retail - Stuttgart Media University, semester project winter term 2017/18


Example Entertainment

AR in Entertainment - Stuttgart Media University, semester project winter term 2017/18


Example Automotive

AR in Automotive Example - Playspace, project in cooperation with University of Swinburne / Daimler / gerenwa


New: Visual Positioning System (VPS)

Youtube: VPS Google I/O 2018


Future AR?: Hyper-Reality

Vimeo: Hyper-Reality by Keiichi Matsuda


AR vs. VR vs. MR

Source: Microsoft


Virtual Reality

Youtube: VR Coastiality


Mixed Reality

Youtube: Microsoft MR


Mixed Reality Spectrum

Source: Microsoft


Hardware Requirements for AR apps


Examples for AR capable smartphones


Example AR Glasses


Source: tomsguide


Future AR Hardware


AR Software Requirements

  1. Understand the environment as well as possible:
    • Track objects and their locations, the movements of objects and of the device, and understand the lighting conditions. Locate the position and orientation of the device (smartphone or glasses) in relation to its surroundings.
  2. Render virtual objects so that they blend in with the environment, based on that environmental understanding:
    • They must be correct in position and size, must not overlap real objects, and their lighting must be rendered correctly (see the lighting sketch below).
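The lighting part of requirement 2 can be made concrete with ARKit, which we cover later in this deck. A minimal sketch, assuming sceneView is an already running ARSCNView (the function name is illustrative):

import ARKit

func updateLighting(for sceneView: ARSCNView) {
    // let SceneKit adapt scene lighting to ARKit's ambient light estimate
    sceneView.automaticallyUpdatesLighting = true
    // or inspect the estimate yourself; ~1000 lumens is a neutral indoor value
    if let estimate = sceneView.session.currentFrame?.lightEstimate {
        print("ambient intensity: \(estimate.ambientIntensity) lm, "
            + "color temperature: \(estimate.ambientColorTemperature) K")
    }
}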

AR Frameworks


Framework Functionalities


Marker Tracking


Image Tracking

  • Also known as Natural Feature Tracking
  • Recognizes natural features in video images (U. Neumann, S. You, 1999)
  • Images are easier to recognize if they have high contrast, contain many detectable features, and their real-world size is known

3D Rendering


Camera, Viewport, Field of View


Multi Targets


Text Recognition


Object Tracking


Location Tracking


Light Estimation


Binaural Sound


SLAM


Motion Tracking


Understanding the Environment


Masking Virtual Objects


Youtube: masking objects ARKit 3 People Occlusion / Masking Feature


Interacting with the Environment


3D Reconstruction


Youtube: 3D reconstruction


Semantic Scene Understanding


Youtube: Semantic Scene Understanding


Processing Pipeline


ARKit overview

Apple introduced the ARKit framework with iOS 11 at WWDC 2017; ARKit 2.0 followed with iOS 12 in September 2018. It provides an API for developing AR applications that augment the camera view and place virtual 3D objects in the real world, using your iOS device as a window onto them.


ARKit Agenda

You will learn how to create your own AR apps that can detect planes, images and 3D objects, and place virtual content relative to them.


Assignment 1 - First AR app

Let’s jump right in and create a first AR app in Xcode:

Assignment 1 - First AR App


Assignment 2 - Using QuickLook API and creating USDZ files

The Quick Look API is a very easy way to get a nice AR impression. Let’s try it:

Assignment 2 - Placing AR Objects using QuickLook


Important basic classes and protocols in ARKit

Now that we’ve created our first two AR apps (that was easy, right?), let’s try to get a better understanding of the most important ARKit classes and protocols used:

  • ARSession and ARWorldTrackingConfiguration to run and configure the AR experience
  • ARSCNView and ARSCNViewDelegate to render the scene and receive events
  • ARAnchor and ARPlaneAnchor to represent real-world positions and detected planes
  • SCNNode and SCNPlane to place SceneKit geometry in the scene

On the next few slides we will use these classes and protocols to detect planes, visualize them, and place objects in the scene.


Detecting Planes with ARKit

In order to detect planes, you first need to create an ARSession and configure it using an ARConfiguration object. For the configuration, use the ARWorldTrackingConfiguration class and turn on plane detection as shown here:

let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = [.horizontal, .vertical]

Next create an ARSCNView (if not already available) and run its session with the configuration object:

let sceneView = ARSCNView()
sceneView.session.run(configuration)

Now your scene view is configured to detect horizontal and vertical planes in the real world.


Using ARSCNViewDelegate to receive information on detected planes

In order to be notified by ARKit when a plane is detected, your ViewController should implement the following ARSCNViewDelegate method:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
}

This method tells the delegate that a SceneKit node corresponding to a new AR anchor has been added to the scene; it is called whenever ARKit detects a new plane, and the plane is added as a node at the anchor position. The parameters are the renderer displaying the scene, the SCNNode that was added, and the ARAnchor it corresponds to (an ARPlaneAnchor in the case of plane detection).
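For this callback to fire, the view controller must adopt the ARSCNViewDelegate protocol and be set as the scene view’s delegate. A minimal sketch (the class name and outlet are illustrative, not taken from the assignments):

import UIKit
import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // register for renderer(_:didAdd:for:) and related callbacks
        sceneView.delegate = self
    }
}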


Rendering a detected plane in the AR Scene

Within the renderer callback method we can use the provided information to visualize the detected planes. For this, the SCNPlane class can be used. The following code shows how an SCNPlane is created based on the information from the ARAnchor:

 guard let planeAnchor = anchor as? ARPlaneAnchor else { return }
 let extentPlane: SCNPlane = SCNPlane(width: CGFloat(planeAnchor.extent.x), height: CGFloat(planeAnchor.extent.z))

First we cast the ARAnchor to an ARPlaneAnchor in order to get information on its extent. Then we create the SCNPlane using the extent as the width and height of the new object. An SCNPlane represents a rectangle with controllable width and height; only one side of the plane is visible.

In order to render a 3D object in the scene we need to create an SCNNode. This can be done as follows:

let extentNode = SCNNode(geometry: extentPlane) 
// position the node
extentNode.simdPosition = planeAnchor.center

// `SCNPlane` is vertically oriented in its local coordinate space, so
// rotate it to match the orientation of `ARPlaneAnchor`.
extentNode.eulerAngles.x = -.pi / 2
// Make the visualization semitransparent
extentNode.opacity = 0.3
// add the node to the scene
node.addChildNode(extentNode)
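Note that ARKit keeps refining a detected plane as the device moves, so the anchor’s center and extent change over time. A sketch of keeping the visualization in sync, assuming the extent node created above is the anchor node’s first child, uses the delegate’s renderer(_:didUpdate:for:) callback:

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor,
          let extentNode = node.childNodes.first,
          let extentPlane = extentNode.geometry as? SCNPlane else { return }
    // resize and re-center the visualization to match the refined anchor
    extentPlane.width = CGFloat(planeAnchor.extent.x)
    extentPlane.height = CGFloat(planeAnchor.extent.z)
    extentNode.simdPosition = planeAnchor.center
}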

Detecting touches and using ARHitTestResult to place objects in the scene

In order to detect touches on a view, the touchesBegan() method can be overridden. To find out which object was touched in the 3D scene, the hitTest() method of ARSCNView can be used. It allows you to find hits with different types of objects in a scene, such as feature points and planes. The following code shows how an existing plane can be hit-tested:

override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    super.touchesBegan(touches, with: event)
    guard let touch = touches.first else { return }
    let results: [ARHitTestResult] = self.sceneView.hitTest(touch.location(in: self.sceneView),
                                                            types: [.existingPlaneUsingExtent])
    guard let hitResult = results.last else { return }
    // let's show our model in the scene
    self.renderSomethingHere(hitFeature: hitResult)
}

The hitTest() method returns an array of ARHitTestResult objects. The example above tests whether the location of a touch on the view would “hit” an existing plane with an extent (a size) in the world context.


Rendering something at the location of the hitResult

To render a node in the 3D scene based on the found hit location, it is necessary to retrieve the position and orientation of the hit test result relative to the world coordinate system. This is done by using the property worldTransform.

This is a transform matrix that indicates the intersection point between the detected surface and the ray that created the hit test result. A hit test projects a 2D point in the image or view coordinate system along a ray into the 3D world space and reports results where that line intersects detected surfaces.

In order to translate this into a location within 3D space, x, y and z coordinates are needed. They can be extracted using SCNVector3Make as follows:

func renderSomethingHere(hitFeature: ARHitTestResult) {
    let hitTransform = SCNMatrix4(hitFeature.worldTransform)
    // get the coordinates for the node from the transform's translation
    let hitPosition = SCNVector3Make(hitTransform.m41,
                                     hitTransform.m42,
                                     hitTransform.m43)
    // create a clone of an existing node
    let node = self.someSCNNode!.clone()
    // set the position within 3D space
    node.position = hitPosition
    // add the node to the scene
    self.sceneView.scene.rootNode.addChildNode(node)
}

In the above example an SCNNode is cloned, positioned and then added to the scene as a child node. Let’s see how such a node can be created:


Loading models from a USDZ file into an SCNNode

As we’ve learned in the second assignment, it is quite easy to display USDZ files using Quick Look. Without Quick Look, we can load them into an SCNNode by using the class SCNReferenceNode; it is still fairly simple, though. Here is some example code:

let url = Bundle.main.url(forResource: "Tugboat", withExtension: "usdz")
if let refNode = SCNReferenceNode(url: url!) {
    // the file was found so load the model            
    refNode.load()
    // set our boat variable
    self.boat = refNode
    // scale the boat and make it much smaller
    self.boat?.scale.x = 0.05
    self.boat?.scale.y = 0.05
    self.boat?.scale.z = 0.05
}

Assignment 3 - Plane Detection and Object Placement

That’s it. You can now use the SCNNode and add it to the scene. Now let’s try it out, hands on:

Assignment 3 - Plane Detection and Placing Objects


ARKit Image Detection

For the detection and tracking of images, ARKit essentially provides two classes: ARReferenceImage, which describes the images to look for, and ARImageAnchor, which represents a detected image in the session.

Images and their movements are tracked, and virtual objects can be placed relative to the tracked positions of those images.


Configuring Image Detection

To enable image detection the following steps are necessary:

  1. Load one or more ARReferenceImage resources from your app’s asset catalog.
  2. Create an image-tracking configuration and pass those reference images to its trackingImages property (with a world-tracking configuration, the equivalent property is detectionImages).
  3. Use the run(_:options:) method to run a session with your configuration.

The code below shows how to execute these steps when starting or restarting the AR experience:

let configuration = ARImageTrackingConfiguration()
        
guard let trackedImages = ARReferenceImage.referenceImages(inGroupNamed: "Photos", bundle: Bundle.main) else {
    print("No images available")
    return
}
        
configuration.trackingImages = trackedImages
configuration.maximumNumberOfTrackedImages = 1
// Run the view's session
sceneView.session.run(configuration)

Visualize Image Detection Results

When one of the reference images is detected, the session automatically adds a corresponding ARImageAnchor to its list of anchors. To react to an image detection, implement for example the renderer(_:didAdd:for:) method as follows:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // load a SceneKit scene and take its first child node as our content
    let shipScene = SCNScene(named: "art.scnassets/ship.scn")!
    let shipNode = shipScene.rootNode.childNodes.first!
    // rotate the content to lie flat on the detected image
    shipNode.eulerAngles.x = -.pi / 2
    node.addChildNode(shipNode)
}

Position an object on the detected image

To use the detected image as a trigger for AR content, you’ll need to know its position and orientation, its size, and which reference image it is. The anchor’s inherited transform property provides position and orientation, and its referenceImage property tells you which ARReferenceImage object was detected. If your AR content depends on the extent of the image in the scene, you can then use the reference image’s physicalSize to set up your content, as shown in the code below.

guard let imageAnchor = anchor as? ARImageAnchor else { return }
let referenceImage = imageAnchor.referenceImage
    
// Create a plane to visualize the initial position of the detected image.
let plane = SCNPlane(width: referenceImage.physicalSize.width, height: referenceImage.physicalSize.height)
let planeNode = SCNNode(geometry: plane)
planeNode.opacity = 0.25
    
// `SCNPlane` is vertically oriented in its local coordinate space;
// rotate it to lie flat on the detected image.
planeNode.eulerAngles.x = -.pi / 2
    
// Add the plane visualization to the scene.
node.addChildNode(planeNode)

Provide Your Own Reference Images

You can provide your own images as references. This is done by creating an AR resource group in your asset catalog and then adding the image files (JPG or PNG) via drag and drop. The following points should be considered:

When you add an image, Xcode will warn you if its quality is not good enough for reliable detection.

For each session you should load one resource group. Apple recommends using no more than 25 images in one session for performance reasons. If you want to use more images in your app, that is possible, but you should then load a different resource group, for example depending on the context or location, as sketched below.
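A sketch of switching resource groups at runtime; sceneView is assumed to be the app’s ARSCNView, and the group names are illustrative:

func switchDetectionImages(toGroupNamed groupName: String) {
    // e.g. groupName = "Lobby" or "Gallery", depending on where the user is
    guard let images = ARReferenceImage.referenceImages(inGroupNamed: groupName, bundle: nil) else {
        print("resource group \(groupName) not found")
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionImages = images
    // drop anchors from the previous group when switching context
    sceneView.session.run(configuration, options: [.removeExistingAnchors])
}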


Assignment 4 - Image Tracking

Now you should have enough information to develop your own AR image tracking app. Have fun!

Assignment 4 - Image Tracking


ARKit Object Tracking

With iOS 12, Apple also supports 3D object detection. To detect objects, you load ARReferenceObject resources from your asset catalog and assign them to the configuration’s detectionObjects property; the rest is the same as with detecting images. Here is some example code for the setup:

let configuration = ARWorldTrackingConfiguration()
guard let referenceObjects = ARReferenceObject.referenceObjects(inGroupNamed: "gallery", bundle: nil) else {
    fatalError("Missing expected asset catalog resources.")
}
configuration.detectionObjects = referenceObjects
sceneView.session.run(configuration)

When ARKit detects one of the reference objects, the session automatically adds a corresponding ARObjectAnchor to its list of anchors. A sketch of reacting to such an anchor follows below.
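A minimal sketch of handling the detection in the delegate; the marker sphere is illustrative:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let objectAnchor = anchor as? ARObjectAnchor else { return }
    // the reference object's name tells us what was recognized
    print("detected object: \(objectAnchor.referenceObject.name ?? "unknown")")
    // place content relative to the detected object, e.g. a small marker sphere
    let marker = SCNNode(geometry: SCNSphere(radius: 0.01))
    node.addChildNode(marker)
}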


Creating ARKit ReferenceObjects for tracking

Reference objects are created using the scanning app provided by Apple. But it is also possible to scan objects in your own app, by using the ARObjectScanningConfiguration class. Here is some example code:

let configuration = ARObjectScanningConfiguration()
configuration.planeDetection = .horizontal
sceneView.session.run(configuration, options: .resetTracking)

After scanning, call the session’s createReferenceObject() method and then export or use the scanned object, as sketched below.
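A sketch of that step, assuming scanning has finished and that transform, center and extent describe the scanned region, while outputURL is a writable file URL (all four names are assumptions):

sceneView.session.createReferenceObject(
    transform: transform,   // position and orientation of the scanned region
    center: center,         // center of the bounding volume in that space
    extent: extent          // size of the bounding volume
) { referenceObject, error in
    if let referenceObject = referenceObject {
        // persist as an .arobject file for use in an asset catalog
        try? referenceObject.export(to: outputURL, previewImage: nil)
    } else {
        print("scan failed: \(error?.localizedDescription ?? "unknown error")")
    }
}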

More information about this process can be found in Apple’s ARKit documentation.


Assignment 5 - Object detection

Now let’s try scanning and detecting objects with ARKit:

Assignment 5 - Object Detection


Assignment 6 - Bonus for the quick

This was all too easy and you are bored? Go on and try out Apple’s example game, and maybe you can create your own version of it!

Bonus: Assignment 6 - Build, Run and Play around with Apples SwiftShot Game


References

ARKit Documentation