How to Share What the User Sees with a Remote Expert Assist MVP on Mobile

Author: Almaz Khalilov

TL;DR

  • You'll build: a basic Remote Expert Assist app where a field user streams their smartphone camera view and a remote expert can see it and provide guidance (via voice and AR annotations).
  • You'll do: Get access to a video/AR SDK → Install it in sample projects → Run the sample on iOS and Android → Integrate the SDK into your own app → Test the end-to-end experience on real devices.
  • You'll need: a developer account for a real-time video service, two devices (ARKit-compatible iPhone/iPad or ARCore-compatible Android phone), and the latest IDE (Xcode/Android Studio with CocoaPods/Gradle).

1) What is Remote Expert Assist?

What it enables

A field user points their smartphone camera at the problem and streams that live view to an off-site expert. The expert sees exactly what the user sees and guides them in real time with voice and AR annotations that stay anchored to objects in the physical scene.

When to use it

  • Field service & maintenance: When a technician or customer is on-site with equipment trouble, and an expert is off-site. Remote assist lets the expert virtually guide repairs.
  • Training and support: For onboarding new technicians or supporting customers, a remote expert can walk them through processes with visual cues, reducing misunderstanding and travel.
  • Resource constraints: Anytime sending an expert in person is costly or slow – remote assist bridges the gap, leveraging scarce experts to help anywhere instantly, improving productivity and uptime.

Current limitations

  • Device requirements: Both participants need compatible smartphones or tablets. iOS devices must support ARKit (e.g. iPhone 6s or later), and Android devices need ARCore support (Android 7.0+, with Google Play Services for AR). Older or low-end devices may not handle AR well.
  • Hands-on vs. hands-free: Using a phone means one hand is often busy holding the device. Unlike a head-mounted display (e.g. HoloLens), this MVP isn't hands-free – for complex two-handed tasks, a tripod or improvised mount might be needed. High-end AR headsets offer convenience but are expensive and not widely adopted.
  • Environmental constraints: The AR tracking works best in static, well-lit environments. Fast motion or low light can reduce AR fidelity. Also, the remote session relies on good network connectivity (video calls can consume significant bandwidth).
  • Platform differences: Our solution uses native AR frameworks on each platform. ARKit (iOS) tends to offer a smoother experience than ARCore on some Android devices. We must maintain interoperability via the communication layer (so an iPhone user can connect with an Android user, even though AR frameworks differ).
  • Preview features: If using certain beta SDK features (e.g. advanced AR collaboration APIs), there might be platform permissions or preview program requirements. For this guide, we stick to stable APIs: ARKit/ARCore and standard WebRTC video, which are production-ready.

2) Prerequisites

Access requirements

  • Real-time video service account: Sign up for a service like Agora or Twilio Programmable Video that provides WebRTC-based live video and data channels. This will give you an API key or App ID for integrating video calls. (For example, create a free Agora developer account and note your App ID).
  • Project setup in portal: In your chosen service's console, create a new project or app. Enable any required features (e.g. data streaming for AR annotations). Generate credentials:
    • Agora: App ID (and a temporary token if security is enabled).
    • Twilio: an API Key and Secret plus your Account SID (and configure a token server or use their test tokens for development).
  • Developer program: Ensure you have Apple App Store or Google Play developer access if you plan to deploy to physical devices. (Not strictly needed for local testing on device, but recommended.)

Platform setup

iOS

  • Xcode 14+ with iOS 15.0 or later SDK. (ARKit is built-in; no separate download needed.)
  • CocoaPods or SPM installed for dependency management. (We'll use CocoaPods for the sample app integration of the video SDK and any AR utilities.)
  • ARKit-compatible device (Physical iPhone or iPad is highly recommended). AR features do not fully work on Simulator – you'll need a real device (iPhone 6s or newer, running iOS 15+).

Android

  • Android Studio Flamingo (2022.2.1) or newer with Android SDK API 30+.
  • Gradle 8+ and Kotlin 1.5+ (if using Kotlin) or Java 8+ for compatibility with ARCore and your video SDK.
  • ARCore-supported phone (a physical Android device is strongly recommended). Use a device that supports Google Play Services for AR (ARCore). Android 8.0+ is ideal. (Check Google's ARCore supported device list for compatibility.)

Hardware or mock

  • Two smartphones (or one phone + one PC): To test the remote assist, you'll run the app on two endpoints. Ideally use two phones (one acts as the field user, one as the remote expert). If you lack a second phone, one can be an emulator or even a desktop web client if your video service supports it.
  • Tripod or phone mount (optional): For longer sessions or to simulate a hands-free experience, a simple tripod or head mount for the smartphone can be helpful (optional for the MVP).
  • Network connectivity: Ensure devices are on a stable Wi-Fi or 5G network. High-quality video streaming requires a good connection (at least a few Mbps). Test on the same network if possible to avoid firewall issues during development.

3) Get Access to Remote Expert Assist Tools

  1. Sign up on the portal: Go to your chosen real-time video platform's developer portal (e.g. Agora Dashboard or Twilio Console). Create a new account if you haven't already.
  2. Create a project/app: Once logged in, create a new project (Agora) or a new Video application (Twilio). This will provision the resources you need. For Agora, choose "Testing" mode for easy setup (no certificate). For Twilio, enable the Programmable Video service.
  3. Request any special access: Typically not needed for basic functionality – video and data channel APIs are available by default. (Agora's real-time messaging or Twilio's Data Tracks are generally part of the standard offering, no extra beta program required.)
  4. Copy credentials: After project creation:
    • Agora: Note the App ID. If the project uses a temporary token, generate a temp token in the console for your channel (optional for initial testing in "Testing mode").
    • Twilio: Generate an API Key and Secret for the Video service, or note your Account SID and Auth Token (for token generation). You might use Twilio's CLI or backend to create access tokens for clients – for quickstarts, you can use a hardcoded token from the Twilio dev console.
  5. Configure security (if applicable): In development you might disable secure access for ease. In production, you'd require tokens for clients to join sessions. Make sure you accept any terms of service for using the SDK and understand any usage limits of the free tier.
  6. Download any config files: Some services provide a config file:
    • Agora: not required; you'll just embed the App ID in code.
    • Twilio: not required for client side; you'll use the keys to obtain tokens (often via a quickstart server).
    • If using Firebase for signaling (custom WebRTC): set up Firebase and download google-services.json for Android or GoogleService-Info.plist for iOS.

Done when: You have the necessary credentials (App ID or API keys and any tokens) and can see your project listed on the provider's portal. We're now ready to plug these into the sample apps.


4) Quickstart A — Run the Sample App (iOS)

Goal

Run an iOS sample app that implements remote assist features and verify that live video with AR annotations works on a real iPhone (paired with another device or emulator).

Step 1 — Get the sample

  • Option 1: Clone the repo. Use the official sample from Agora's community: git clone https://github.com/AgoraIO-Community/AR-Remote-Support.git. This contains an Xcode project AR Remote Support.xcodeproj, which is a proof-of-concept app similar to PTC's Vuforia Chalk.
  • Option 2: Download the zip. If you prefer, download the repository as a ZIP from GitHub and unzip it. Then open AR Remote Support.xcodeproj or the workspace if one is provided (e.g. AR Remote Support.xcworkspace if using CocoaPods).

Step 2 — Install dependencies

The iOS sample uses CocoaPods to manage frameworks:

  • Open Terminal in the project directory. Run pod install to fetch the dependencies (e.g. Agora Video SDK, ARVideoKit, etc.).
  • Once completed, open the generated .xcworkspace file in Xcode.
  • Ensure that the project builds successfully. (The Podfile includes AgoraIO SDK for real-time video and ARVideoKit which helps capture AR frames for streaming.)

Step 3 — Configure the app

Before running, plug in your credentials and adjust app settings:

  • Add your App ID/Token: Look for a config file or constant in the Xcode project (often in AppDelegate or a Settings plist). For example, in the starter project, there may be a keys.plist or a constant like agoraAppID = "<YOUR APP ID>". Insert the App ID from your portal, and if required, a temporary token. (A sketch of such a constants file follows this list.)
  • Bundle Identifier: Make sure the app's bundle ID is unique if running on a device (you might need to change it to your domain, e.g. com.yourcompany.ARSupport). Update and re-sign if needed with your Apple development team.
  • Permissions: Verify the iOS Info.plist contains camera and microphone usage descriptions. The sample likely has:
    • NSCameraUsageDescription and NSMicrophoneUsageDescription entries (since the app will access camera for ARKit and mic for audio). Update the text if desired.
  • Capabilities: ARKit does not need a separate entitlement; Xcode links the framework automatically (templates may add arkit to UIRequiredDeviceCapabilities). Under Signing & Capabilities, confirm your development team is set; camera and microphone access are governed by the Info.plist usage descriptions above.
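
If the sample doesn't already ship a keys file, a small constants file keeps the credentials in one place. This is a minimal sketch, not part of the sample project; the AppKeys name, its fields, and the placeholder values are assumptions you should adapt to wherever the sample actually reads its App ID:

```swift
import Foundation

// Hypothetical central place for credentials (adjust to the sample's own config mechanism).
enum AppKeys {
    static let agoraAppID = "<YOUR APP ID>"     // from the Agora console
    static let agoraToken: String? = nil        // set if your project has tokens/certificates enabled
    static let defaultChannel = "test123"       // any channel name shared by both devices
}
```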

Step 4 — Run

  1. Select the target device: In Xcode, choose your iPhone as the run destination. Use a physical device – ARKit won't stream camera on a simulator.
  2. Build & Run the app: Xcode should install the app on your iPhone.
  3. Launch second instance: You will need two users. If you have a second iOS device, repeat the above on that device (or build the app for an iPad, for example). If not, you can use an iOS Simulator for the expert side in a limited way (the simulator won't have AR, but could serve as the receiving end for testing data channel and video).
  4. Join a session: On one device, enter a channel name (like "test123") and tap Create/Join as the field user. On the other device, enter the same channel name and join as the remote expert.

Step 5 — Connect to share view

  • Allow permissions: The first time, iOS will prompt for Camera and Microphone access – grant both on the field-user device (so it can stream video) and mic on the expert device.
  • Once in the channel, the field user's rear camera feed (with AR) should start streaming. The remote expert should see the live video on their screen.
  • Enable AR annotations: In the sample app, the expert can usually draw on their screen. Try drawing or tapping markers on the expert device – the field device should display these annotations anchored in the real world via AR.

Verify

  • Video stream visible: The expert's device shows the live camera view from the field device (with only a small delay). If you cover the field device camera or move it, you see it update remotely.
  • Annotations appear in AR: When the expert draws or places an arrow, the field user sees a 3D marker overlaid on their camera view at the correct real-world position. e.g. drawing a circle around a component in view should "stick" to that component in AR.
  • Audio is working: Speak into one device and ensure the other side can hear (if the sample enabled voice). This might require headphones to avoid feedback.

Common issues

  • Black screen / no video: If the remote view is blank, check that the field device's camera feed is being sent. This could happen if permissions were denied or if the AR capture is not configured. Ensure ARVideoSource or equivalent is properly set as the video source and that your App ID/token are correct.
  • Cannot join channel: If one device fails to join, verify both are using the same channel name and that the credentials are correct. With Agora, ensure the token (if used) hasn't expired. With Twilio, ensure the token generation for both client roles is set up.
  • Annotation not showing on other side: If drawing doesn't appear on the field device, the data channel might not be set up. Check that the expert app is sending annotation data and that the field app receives and renders it. This could be a data stream ID mismatch or missing handler in code. Logging on both sides can help pinpoint the issue.
  • Build errors: If Xcode throws build errors, run pod update to ensure all pods (especially ARVideoKit and Agora SDK) are up to date. Also confirm you opened the .xcworkspace and not the project file.

5) Quickstart B — Run the Sample App (Android)

Goal

Run the Android version of the remote assist sample and verify that an Android phone can stream its camera and receive AR annotations, interoperating with the iOS or another Android device.

Step 1 — Get the sample

  • Clone the Android demo: Agora provides an Android sample as well. Clone or download the repository https://github.com/AgoraIO-Community/AR-Remote-Support-Android.git (if available) or use the author's demo link provided in their blog. Open this project in Android Studio.

Step 2 — Configure dependencies

Open the project in Android Studio and let it index. Then:

  • Add Maven repos: Ensure the Gradle scripts include the Maven repository for the video SDK if needed (e.g. Maven Central; JCenter has been sunset, so older samples may need their repositories updated).
  • Add SDK dependencies: In app/build.gradle, you should see dependencies for ARCore and the RTC video SDK. For example, the sample includes:
    • implementation 'com.google.ar:core:1.X.0' for ARCore dependency.
    • Agora's Video SDK JAR/AAR, or a Gradle coordinate like io.agora.rtc:full-sdk:VERSION.
    • Any library for rendering annotations (the sample might use Google Sceneform or just OpenGL).
  • Sync Gradle: Allow Gradle to download these. Resolve any errors by installing missing SDK packages (Android Studio may prompt to install ARCore support or update Google Play Services).

If your video service requires authentication:

  • Agora: No additional Gradle auth needed (just the SDK).
  • Twilio: You might need to add the Twilio JitPack or Maven and include implementation 'com.twilio:video-android:VERSION'. Also, for Twilio's secure configuration, a Twilio access token must be provided at runtime, not via Gradle.

Step 3 — Configure the app

Set up the app with your credentials and required permissions:

  • App ID/Keys: Similar to iOS, find where to put your Agora App ID or Twilio token. In the sample, there might be a strings.xml entry for agora_app_id or a constant in code. Insert the correct value from your portal.
  • Application ID (Package name): Change the applicationId in app/build.gradle to something unique (especially if you plan to install alongside other builds). Example: com.yourcompany.remoteassist.
  • Permissions: Open AndroidManifest.xml and ensure it includes:
    • <uses-permission android:name="android.permission.CAMERA" />
    • <uses-permission android:name="android.permission.RECORD_AUDIO" />
    • <uses-permission android:name="android.permission.INTERNET" />
    • <uses-feature android:name="android.hardware.camera.ar" android:required="true" /> (this declares that the app requires an AR-capable camera).
    • If on newer Android, you might need BLUETOOTH permissions if using any external device (not in our MVP case). Primarily camera, audio, and internet are needed.
  • ARCore setup: Most devices will prompt to update or install the ARCore service (Google Play Services for AR) if it isn't present. You can declare AR as required by adding <meta-data android:name="com.google.ar.core" android:value="required" /> inside <application>, which restricts the app to ARCore-supported devices; otherwise rely on the ARCore dependency, which prompts the user to install Google Play Services for AR at runtime.

Step 4 — Run

  1. Connect device & select configuration: Use a physical Android phone with USB debugging enabled. In Android Studio, select the app module and your device as the target. If you have two Android devices, install the app on both (or prepare one to run from Android Studio and one from an APK).
  2. Run the app: Click Run ▶️. The app should build and install on the device.
  3. Grant permissions: On launch, the app will ask for Camera and Audio permissions – approve them. Also it may request to download or update ARCore if not already present; allow that via Google Play.
  4. Join session: Similar to iOS, one device should create a session (streamer role) and the other join as viewer. Enter the same channel name on both. For cross-platform, you can join the same channel as the iOS app did. (Agora and Twilio support cross-platform interoperability by using the same channel/room name and credentials.)

Step 5 — Connect to wearable/mock (AR session)

  • Establish video stream: Once both Android devices (or Android + iOS) are in the channel, the streamer phone's camera feed should appear on the other device. ARCore will track the environment on the streamer side.
  • Enable annotations: The Android sample should allow the expert to draw on their screen. Try drawing a line or placing a 3D marker on the Android expert app – the streamer's phone should show it in its camera view. Conversely, if the expert side is on iOS and field on Android, the annotations from iOS should render via ARCore on Android. The data channel transmits the drawing coordinates which ARCore uses to place content.

Verify

  • Connected status: The app should indicate it's connected to the channel (e.g. a UI label "Connected ✅" or both devices listed in the session). No errors in logcat about connectivity.
  • Video & audio: The expert device shows the live video from the field device. Move the field device around to ensure the video updates. Speak into one and hear on the other.
  • AR annotations cross-play: Draw from the expert side (Android or iOS) and confirm the field side (the other platform) sees the AR markers aligned correctly in their world. This confirms the AR coordinate data is translating properly between devices.
  • Stability: The app should not crash when starting AR or when devices disconnect. If the field user app loses tracking (e.g. pointing at a blank wall), it might not place annotations accurately – move it around until it recognizes surfaces.

Common issues

  • Gradle token auth error: If using a service that requires artifact credentials (e.g. JitPack token), make sure you added it to gradle.properties. For instance, Twilio requires a gradle Maven URL that includes a placeholder for a Twilio API key if using private repo. Double-check the sample's README for any such setup.
  • Manifest conflicts: If you integrated multiple libraries, you might see manifest merger issues (like duplicated permission entries or application attributes). Resolve by editing the manifest or Gradle config (often adding tools:node="merge" for permissions).
  • Device connection timeout: If one device can't see the other's video, it could be a firewall or NAT issue. On development networks, occasionally P2P connection might fail – try on a different network or ensure proper use of STUN/TURN (the cloud service usually handles this). Also, test both devices on the same Wi-Fi for simplicity.
  • ARCore not working: If the field device doesn't seem to be placing any AR content (e.g. no surface detection, or annotations float incorrectly), ensure the device actually supports ARCore and that the AR session is initialized. You might need to move the device so ARCore can scan surfaces. Check for any runtime errors from ARCore in logcat.

6) Integration Guide — Add Remote Expert Assist to an Existing Mobile App

Goal

Now that the samples are running, the next step is to integrate these capabilities into your own app. We'll add the Remote Expert Assist SDKs and features to an existing app (either iOS or Android), enabling a user to start a call and an expert to join and assist. The end goal is to ship one end-to-end feature in your app: a remote assist session with video and AR.

Architecture

At a high level, the architecture will look like:

  • Your App UI – e.g. a "Get Remote Help" screen with a Connect button, and a viewfinder for video.
  • Remote Assist SDK client – the underlying video call and AR logic, which connects to the cloud service and manages streams.
  • Wearable/Device (Smartphone) hardware – here it's the phone's camera and sensors providing the view and AR tracking.
  • Callbacks & Data – the SDK will provide events (user joined, data received for annotation, etc.), which your app uses to update UI or state.

The data flow: App UI → SDK client (sends video) → Cloud → SDK client (receives video/data) → App UI updates (AR overlay, etc.). All AR rendering is local on each device using ARKit/ARCore, but coordination happens via the network.
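
To make the callback surface between the SDK client layer and your UI concrete, here is a minimal Swift sketch of the events the app might observe. The protocol and method names are illustrative assumptions, not an actual SDK API:

```swift
import Foundation

// Hypothetical events the remote-assist client layer surfaces to the app UI.
protocol RemoteAssistClientDelegate: AnyObject {
    func clientDidConnect(channel: String)          // local user joined the session
    func clientDidDisconnect(error: Error?)         // session ended or dropped
    func clientDidAddRemoteVideo(userId: UInt)      // the peer's video stream is available
    func clientDidReceiveAnnotation(_ data: Data)   // drawing/marker data from the peer
}
```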

Step 1 — Install SDK

Add the required SDKs to your app project:

iOS (Swift):

  • Using Swift Package Manager or CocoaPods, add the packages:
    • Video SDK: e.g. Agora Video SDK or Twilio Video SDK. (For Agora via CocoaPods: pod 'AgoraRtcEngine_iOS'). For Twilio: pod 'TwilioVideo'.
    • AR support library (optional): If using ARKit alone, you may not need extra libraries. However, to stream AR content, consider ARVideoKit (as in the sample) to capture AR frames.
  • After adding, import the frameworks. Ensure they are linked in your Xcode project.

Android:

  • Add the real-time video dependency in your app/build.gradle:
      implementation 'io.agora.rtc:full-sdk:4.+'    // or for Twilio: implementation 'com.twilio:video-android:7.+'
    and the ARCore dependency:
      implementation 'com.google.ar:core:1.+'
  • If using Kotlin, also add Kotlin stdlib if not already. Sync the project to download.
  • Don't forget to update your app's manifest for permissions if not already present (Camera, Audio, Internet as noted).

Step 2 — Add permissions

Ensure your app has the necessary permissions and entitlements:

iOS (Info.plist):

  • Add NSCameraUsageDescription = "Needed to share your surroundings with the remote expert."
  • Add NSMicrophoneUsageDescription = "Needed for voice communication during remote assist."
  • If your app might use Bluetooth devices (like an external camera or headset), also NSBluetoothAlwaysUsageDescription.
  • If you plan to use screen recording or other features, include those usage descriptions as well. But for basic camera/mic, the above two are required.

Android (AndroidManifest.xml):

  • Include:
      <uses-permission android:name="android.permission.CAMERA" />
      <uses-permission android:name="android.permission.RECORD_AUDIO" />
      <uses-permission android:name="android.permission.INTERNET" />
  • (Optional) If targeting Android 12+, you might need BLUETOOTH and BLUETOOTH_ADMIN if any wearable is Bluetooth. Not in this MVP's scope, but keep in mind for future hardware integration.
  • Also, add a feature declaration (not mandatory but good practice):
      <uses-feature android:name="android.hardware.camera.ar" android:required="false" />
    This way, AR-capable devices will be preferred on the Play Store, but it won't exclude others if you allow a non-AR fallback mode.

Step 3 — Create a thin client wrapper

To keep your code organized, create classes/services to encapsulate the remote assist logic:

  • RemoteAssistClient (or WearablesClient): This class manages the connection. It will initialize the video SDK, join/leave the session, and relay video frames. It should have methods like startSession(channelName) and endSession(), and callbacks for connection success, remote user joined, etc.
  • AnnotationManager (FeatureService): This handles AR annotation data. For example, when the expert draws on screen, this manager captures the touch points and sends them through the data channel. On the receiving side, it takes incoming annotation data and uses ARKit/ARCore to render 3D markers. This could be part of the RemoteAssistClient or separate, depending on complexity.
  • PermissionsService: Handle checking and requesting permissions (camera, mic, possibly storage if saving photos). This ensures prior to starting a session, the app has what it needs. On iOS, you might integrate with the system permission prompts; on Android, use ActivityCompat.requestPermissions and handle the callback.
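
A minimal Swift sketch of such a PermissionsService on iOS, using the standard AVFoundation prompts (the class name and completion shape are assumptions, not a prescribed API):

```swift
import AVFoundation

// Hypothetical helper that requests camera and microphone access before a session starts.
final class PermissionsService {
    func requestCameraAndMicrophone(completion: @escaping (Bool) -> Void) {
        AVCaptureDevice.requestAccess(for: .video) { cameraGranted in
            AVCaptureDevice.requestAccess(for: .audio) { micGranted in
                DispatchQueue.main.async {
                    completion(cameraGranted && micGranted)
                }
            }
        }
    }
}
```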

Implementing these components:

  • Initialize the video engine at app start (or on demand when the user opens the assist screen). E.g., for Agora:
      agoraEngine = AgoraRtcEngineKit.sharedEngine(withAppId: AGORA_APP_ID, delegate: self)
    For Twilio, you'll prepare a Room connection with appropriate ConnectOptions. (A fuller client sketch follows this list.)
  • Handle the connection lifecycle: attempt to join channel, and set up event listeners. Log events like onJoinChannelSuccess (field side) or onRemoteUserJoined (expert side).
  • For AR, set up the ARSession (ARKit) or Session (ARCore) when the camera view is active. Tie the rendering so that video frames go out via the SDK. In the sample, ARVideoKit handled capturing ARSCNView to a CVPixelBuffer for sending.
  • Definition of done: after this step, you should be able to initialize the SDK and connect. The structure is in place:
    • Video SDK initialized and configured (with your App ID/keys).
    • Ability to join/leave a session implemented.
    • Basic error handling in place (e.g., show an alert if connection fails or if a user is already in a session).
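
Putting the pieces above together, a thin wrapper might look like the sketch below. Agora is shown; the exact join/leave signatures vary between SDK versions (the join call here is the 3.x-style variant), and the delegate callbacks are the hypothetical RemoteAssistClientDelegate names introduced earlier, not an SDK requirement:

```swift
import AgoraRtcKit

// Hypothetical thin wrapper around the video SDK; adjust the API calls to your SDK version.
final class RemoteAssistClient: NSObject, AgoraRtcEngineDelegate {
    weak var delegate: RemoteAssistClientDelegate?
    private var engine: AgoraRtcEngineKit?

    func startSession(channelName: String, appId: String, token: String?) {
        let engine = AgoraRtcEngineKit.sharedEngine(withAppId: appId, delegate: self)
        engine.enableVideo()
        // Join signature differs between Agora 3.x and 4.x; this is the 3.x-style call.
        engine.joinChannel(byToken: token, channelId: channelName, info: nil, uid: 0) { [weak self] channel, _, _ in
            self?.delegate?.clientDidConnect(channel: channel)
        }
        self.engine = engine
    }

    func endSession() {
        engine?.leaveChannel(nil)
        delegate?.clientDidDisconnect(error: nil)
    }

    // A remote user (the expert or field peer) joined the channel.
    func rtcEngine(_ engine: AgoraRtcEngineKit, didJoinedOfUid uid: UInt, elapsed: Int) {
        delegate?.clientDidAddRemoteVideo(userId: uid)
    }
}
```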

Step 4 — Add a minimal UI screen

Design a simple UI to allow users to utilize remote assist:

  • Connect button: A button labeled "Connect to Remote Expert" that triggers the session start. It should become a "Disconnect" button when in a session.
  • Status indicator: Text or icon to show "Not connected" vs "Connected (Expert joined)" to inform the user.
  • Annotation tools (for expert role): If your app may be used by the expert, provide a minimal drawing interface (perhaps a transparent view capturing touch and sending coordinates). For the field user, this might not be needed – they just see overlays.
  • Feature trigger (Capture etc.): A button like "📸 Capture" if you plan to implement the photo capture feature (see next section), or "Mark this" if doing a specific annotation action.
  • Result display: For example, a UIImageView or ImageView to show a captured photo thumbnail, or a small log area listing events ("Expert joined", "Photo saved to gallery", etc.) for debugging the UX.

Wire up the UI to your client wrapper:

  • Tapping Connect should check permissions, then call RemoteAssistClient.startSession(<channel>). If your app has user accounts, you might use a unique channel per support request, or a random code.
  • When connected, the status label updates and the Connect button toggles to Disconnect.
  • If the expert draws something (in an expert UI), those touch events call AnnotationManager.sendAnnotation(data) which the client sends over. The field side's client receives it and calls AnnotationManager.renderAnnotation(data) to draw in AR.
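
On the field side, renderAnnotation(data) ultimately has to turn a 2D point from the expert's drawing into a world-space anchor. A minimal ARKit sketch, assuming the annotation payload carries a normalized screen point; the AnnotationPoint type and the JSON shape are assumptions, not what the sample apps actually send:

```swift
import ARKit

// Hypothetical payload: the expert's touch point, normalized to 0...1 in their video view.
struct AnnotationPoint: Codable {
    let x: Double
    let y: Double
}

final class AnnotationManager {
    // Places a marker anchor where the expert pointed, by raycasting into the field user's scene.
    func renderAnnotation(_ data: Data, in sceneView: ARSCNView) {
        guard let point = try? JSONDecoder().decode(AnnotationPoint.self, from: data) else { return }
        let screenPoint = CGPoint(x: point.x * Double(sceneView.bounds.width),
                                  y: point.y * Double(sceneView.bounds.height))
        guard let query = sceneView.raycastQuery(from: screenPoint,
                                                 allowing: .estimatedPlane,
                                                 alignment: .any),
              let hit = sceneView.session.raycast(query).first else { return }
        // Attach an anchor; renderer(_:didAdd:for:) can then add marker geometry for it.
        sceneView.session.add(anchor: ARAnchor(name: "annotation", transform: hit.worldTransform))
    }
}
```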

By the end of this integration, your app should allow a user to initiate a call and share their view, and an expert (using either the same app in a different mode or a separate viewer app) can see the video and potentially send annotations.
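
As a rough illustration of the wiring described above, the connect flow on the field side could look like this. Every class and method name here comes from the earlier sketches (or is a placeholder), so treat it as a shape to adapt rather than a prescribed implementation:

```swift
import UIKit

final class RemoteHelpViewController: UIViewController, RemoteAssistClientDelegate {
    private let permissions = PermissionsService()
    private let client = RemoteAssistClient()
    @IBOutlet private weak var statusLabel: UILabel!
    @IBOutlet private weak var connectButton: UIButton!

    @IBAction private func connectTapped(_ sender: UIButton) {
        permissions.requestCameraAndMicrophone { [weak self] granted in
            guard let self, granted else { self?.statusLabel.text = "Camera/mic access needed"; return }
            self.client.delegate = self
            self.client.startSession(channelName: "support-1234", appId: "<YOUR APP ID>", token: nil)
        }
    }

    func clientDidConnect(channel: String) {
        statusLabel.text = "Connected"
        connectButton.setTitle("Disconnect", for: .normal)
    }

    func clientDidDisconnect(error: Error?) {
        statusLabel.text = "Not connected"
        connectButton.setTitle("Connect to Remote Expert", for: .normal)
    }

    func clientDidAddRemoteVideo(userId: UInt) {
        statusLabel.text = "Connected (Expert joined)"
    }

    func clientDidReceiveAnnotation(_ data: Data) {
        // Forward to the AnnotationManager on the field side (see the earlier sketch).
    }
}
```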


7) Feature Recipe — Trigger Photo Capture from the Remote Device

Goal

Allow the remote expert to trigger a high-resolution photo capture on the field user's device and retrieve that image for analysis. For example, the expert clicks "Capture Photo" → the field user's phone takes a picture using its camera (or grabs the AR frame) → the photo is sent to the expert and displayed.

This is useful when a paused, high-quality image is needed (maybe to read a serial number or inspect details), complementing the live video.

UX flow

  1. Ensure connected: The devices must be in a remote assist session (video call active).
  2. Expert triggers capture: The expert side UI has a Capture button. When tapped, it sends a command to the field device.
  3. Visual feedback: The field user's app might show a brief message like "Capturing photo..." or a camera shutter animation, so the user knows a photo is being taken.
  4. Photo transmission: The field app captures the image (could be an AR frame or a full camera capture) and sends it via the data channel or via a separate upload.
  5. Expert receives result: The expert app receives the image data and displays a thumbnail or opens it in detail. The field user might also see a thumbnail or confirmation.
  6. Save if needed: Optionally, the expert can save the image or it's automatically saved to device for documentation.

Implementation checklist

  • Command messaging: Define a simple message protocol, e.g. a JSON { "action": "capture_photo" } sent over the data channel to the field device when the expert taps capture. (A typed-encoding sketch follows this list.)
  • Permission check on device: Ensure the field device has camera access. Since it's already streaming video, it likely does. If using a separate higher-resolution capture API, ensure that's allowed concurrently with AR (on iOS, you might use AVCapturePhotoOutput alongside ARKit; on Android, you might issue a still-capture request via the Camera2 API or grab the ARCore frame).
  • Photo capture implementation: On receiving the capture command, the field app triggers the device camera to take a photo. For ARKit, one can use sceneView.snapshot() to get a UIImage of the AR view, or use ARFrame capture for full resolution. On Android ARCore, you can grab frame.acquireCameraImage() if available, or just use the live texture.
  • Send photo data: Encode the image (e.g. JPEG) into a data stream. Watch out for size – large images might exceed data channel limits. You may need to resize or compress. Alternatively, implement an upload: field device uploads to cloud storage and sends a link to expert. For MVP, sending thumbnail-sized image via the data channel is simplest.
  • Timeout & retry: If the expert doesn't receive anything after, say, 5 seconds, implement a timeout to alert "Photo capture failed. Please retry." The field side should handle if the camera is busy or any error occurs (send an error message back).
  • UI update: The expert app should show the received photo. The field app could display "Photo sent ✔️" or similar confirmation. Both sides might log this event for later reference.
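
If you prefer typed messages over raw JSON strings, the command from the first checklist item can be modeled with Codable. A minimal sketch; the AssistCommand type and the send/onCapture hooks are assumptions, not part of any SDK:

```swift
import Foundation

// Hypothetical command envelope sent over the data channel.
struct AssistCommand: Codable {
    let action: String          // e.g. "capture_photo"
}

// Expert side: encode and send the capture request through your data-channel transport.
func sendCaptureCommand(via send: (Data) -> Void) throws {
    let data = try JSONEncoder().encode(AssistCommand(action: "capture_photo"))
    send(data)
}

// Field side: decode and dispatch.
func handleCommand(_ data: Data, onCapture: () -> Void) {
    guard let command = try? JSONDecoder().decode(AssistCommand.self, from: data) else { return }
    if command.action == "capture_photo" { onCapture() }
}
```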

Pseudocode

Here's a simplified pseudocode for the expert's capture button and the field device handling:

```swift
// On the expert side
func onCaptureButtonTapped() {
    if !remoteAssistClient.isConnected {
        showAlert("Connect first to capture.")
        return
    }
    remoteAssistClient.sendMessage("{\"action\":\"capture_photo\"}")
    showStatus("Capture requested...")
}

// On the field side (listening for messages)
func onDataMessageReceived(message: String) {
    if message == "{\"action\":\"capture_photo\"}" {
        if ensureCameraAccess() {
            let photo = capturePhoto() // e.g. AR frame or camera API
            sendData(photo.binaryData)
            notifyUI("Photo captured and sent ✅")
        } else {
            notifyUI("Capture failed: no camera access")
        }
    }
}
```

```java
// Android field side (pseudo):
void onDataMessageReceived(Message msg) {
    if (msg.action.equals("capture_photo")) {
        runOnUiThread(() -> showToast("Capturing…"));
        Image image = arFrame.acquireCameraImage();   // ARCore: grab the current camera frame
        byte[] jpegData = compressToJpeg(image);
        sendDataMessage(jpegData);
        image.close();
    }
}
```

In practice, handle threading and exceptions properly. The above illustrates sending a capture command and processing it. The expert side will have a corresponding handler for receiving the image bytes (perhaps assembling chunks if large) and then converting to a Bitmap/UIImage for display.
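
For the "assembling chunks if large" part, one approach is to split the JPEG into fixed-size pieces with a small header so the expert side can reassemble them in order. A rough Swift sketch, assuming a per-message size limit and a sendDataMessage-style transport; the header layout and the 30 KB limit are assumptions, not values defined by any SDK:

```swift
import Foundation

// Hypothetical chunking: [4-byte index][4-byte total][payload], reassembled on the expert side.
enum PhotoChunker {
    static let chunkSize = 30_000   // stay under the data channel's per-message limit (assumed)

    static func chunks(for jpeg: Data) -> [Data] {
        // Assumes `jpeg` is a freshly created Data (zero-based indices).
        let total = Int(ceil(Double(jpeg.count) / Double(chunkSize)))
        return (0..<total).map { index in
            var packet = Data()
            withUnsafeBytes(of: UInt32(index).bigEndian) { packet.append(contentsOf: $0) }
            withUnsafeBytes(of: UInt32(total).bigEndian) { packet.append(contentsOf: $0) }
            let start = index * chunkSize
            packet.append(jpeg[start..<min(start + chunkSize, jpeg.count)])
            return packet
        }
    }
}
```

On the expert side, read the two counters back, store pieces in a dictionary keyed by index, and decode the image once all `total` chunks have arrived.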

Troubleshooting

  • Empty image or null frame: If the capture returns empty data, check that the camera is not in use exclusively by AR. On iOS, using ARKit's snapshot() should work; if using AVCapturePhotoOutput, ensure AR session's configuration allows simultaneous capture. On Android, ARCore's acquireCameraImage() may require the session configured for camera sharing.
  • Large data blocking channel: A full-resolution photo can be several MBs, which might be too much for realtime data. If sending over the same connection, consider downsizing the image or use an alternate route (like an HTTP upload). For testing, try capturing a lower-res image first.
  • Slow transfer: If the image takes too long to arrive, it might be network limits. Ensure both devices have good connectivity. You can also provide feedback like a progress bar if splitting into chunks.
  • Expert expects instant result: Manage expectations by showing a loading indicator on expert side ("Awaiting photo..."). If there's a delay, the UI should not freeze. Always confirm to the user when the image is received or if it failed.

8) Testing Matrix

To ensure a robust solution, test the remote assist MVP in various scenarios:

  • Simulated device (mock): The basic flow works with a single device acting as both ends (loopback) or with two emulators. Useful for CI testing of connection logic (two clients can be simulated on one machine); AR functions might be stubbed.
  • Real devices at close range: Low-latency video and accurate AR annotations. Baseline test with devices on the same network (e.g. same-room Wi-Fi); you should see minimal lag.
  • Cross-platform session: iOS and Android interoperate seamlessly. E.g. iPhone as field user, Android as expert, and vice versa. Ensure annotations align correctly and there are no platform-specific crashes.
  • Background/lock screen: Defined behavior when the app goes to the background. On iOS, the AR session will pause if the app is backgrounded or the phone locks, and video may freeze; document that the user should keep the app active during assist. On Android, verify the session resumes when the app returns to the foreground.
  • Permission denied: Graceful error message to the user. If the user denies camera or mic access, the app should detect this and show "Camera access is needed for remote assist" with an option to retry or open settings. No crash or silent failure.
  • Network drop mid-session: Attempt reconnection or inform the user. Simulate a network loss (e.g. disable Wi-Fi); the app should handle connection-lost events by auto-retrying or at least showing "Connection lost. Trying to reconnect…", and must not leak resources if reconnection fails. (A retry sketch appears below, after the summary paragraph.)
  • Device rotation: (If supported) UI and camera reorient properly. Lock orientation if the AR view doesn't handle rotation well; if rotation is allowed, test that remote video and annotations still align.
  • Multiple sessions (sequential): Able to start, end, then start a new session without restarting the app. Ensure that leaving a session cleans up the camera, mic, and connections so a new session starts fresh.

Use this matrix to systematically verify the MVP. It's better to catch edge cases early (e.g., what if the user receives a phone call during the assist session? Does the audio routing break?). By testing thoroughly, you can refine the prototype into a reliable tool.
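
For the network-drop case in particular, a simple retry-with-backoff policy keeps the behavior predictable. A sketch, assuming your wrapper exposes startSession/endSession and a disconnect callback as in the integration guide; the class name and retry limits are assumptions:

```swift
import Foundation

// Hypothetical reconnect policy: retry a few times with growing delays, then give up and tell the user.
final class ReconnectController {
    private let maxAttempts = 3
    private var attempt = 0

    func handleConnectionLost(reconnect: @escaping () -> Void, giveUp: @escaping () -> Void) {
        guard attempt < maxAttempts else { giveUp(); return }
        attempt += 1
        let delay = pow(2.0, Double(attempt))   // 2s, 4s, 8s
        DispatchQueue.main.asyncAfter(deadline: .now() + delay, execute: reconnect)
    }

    func reset() { attempt = 0 }   // call after a successful (re)connection
}
```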


9) Observability and Logging

Adding logging and analytics will help monitor usage and troubleshoot issues in the field:

  • Connection events: Log when a session starts (connect_start), when it successfully joins (connect_success), and when it ends or fails (connect_fail, including reason). This helps identify connectivity issues.
  • Permission states: Record if camera/mic permissions are granted or denied (permission_granted_camera, permission_denied_microphone, etc.). If users often deny, you might need better prompts or documentation.
  • Annotation events: Log when an annotation is sent (annotation_send) and received (annotation_recv). Include data like type of annotation (draw, pointer) and maybe the latency.
  • Capture metrics: If implementing photo capture, log photo_capture_request, photo_capture_success (with file size or resolution), or photo_capture_fail (with error).
  • Performance metrics: You can measure latency or frame rate. For instance, log the round-trip time of a simple ping via data channel (rtt_ms), or the time between sending a marker and field device rendering it.
  • Usage analytics: Track how often remote assist is invoked, duration of sessions, etc., if your privacy policy allows. This can guide improvements (e.g., if many sessions drop after 10 seconds, maybe there's a UX issue).

Implement logging in a way that doesn't overwhelm or block the app (use asynchronous writes or a logging framework). For debugging during development, use console logs liberally (Xcode console, Logcat). In production, consider sending logs to a server or saving to a file on the device for later analysis.
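
A lightweight, non-blocking event logger along these lines might look like the sketch below. The event names match the examples in this section; the class itself, its queue label, and any eventual upload destination are assumptions:

```swift
import Foundation

// Hypothetical event logger: queues writes off the main thread so logging never blocks the UI.
final class AssistEventLogger {
    static let shared = AssistEventLogger()
    private let queue = DispatchQueue(label: "assist.logging", qos: .utility)

    func log(_ event: String, _ attributes: [String: String] = [:]) {
        let timestamp = ISO8601DateFormatter().string(from: Date())
        queue.async {
            // For development this just prints; in production, append to a file or send to your backend.
            print("[\(timestamp)] \(event) \(attributes)")
        }
    }
}

// Usage examples matching the events above:
// AssistEventLogger.shared.log("connect_start", ["channel": "support-1234"])
// AssistEventLogger.shared.log("annotation_send", ["type": "draw"])
```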


10) FAQ

  • Q: Do I need special hardware (AR glasses) to start building this? A: No. The whole point of this guide is to avoid new hardware. You can use existing smartphones and tablets. ARKit (on iOS) and ARCore (on Android) provide augmented reality using just the device's camera and sensors. For hands-free operation, you might later introduce mounts or consider AR glasses, but the MVP works with a phone in hand.
  • Q: Which devices are supported? A: For iOS, any device that supports ARKit (generally iPhone 6s/SE or later, iPad 5th gen or later) running iOS 11+ will work – though we recommend iOS 15+ for better performance. For Android, devices that support ARCore (most modern Android phones from major manufacturers, Android 8.0+). There are about 2 billion AR-capable smartphones globally as of a few years ago, so chances are you have one. Always test on the specific devices your users use – performance can vary.
  • Q: Can I use this in production or is it just a prototype? A: The components used (ARKit, ARCore, and WebRTC via providers like Agora/Twilio) are production-grade. Many companies have built production remote assist apps on similar tech (e.g., the now-retired Vuforia Chalk and others). However, as an MVP, this setup might lack advanced features like multi-user support, robust security, or offline support. Before production, harden the app: handle reconnections, enforce authentication for sessions, and possibly integrate with enterprise backend systems for logging and scheduling calls. Also consider usage costs – third-party video services may charge for minutes/traffic beyond a free tier.
  • Q: Can I push content from the expert to the user's device (e.g., show a diagram or instructions in AR)? A: Yes, this is a logical extension. The MVP described focuses on the expert marking up the live video. But you could send images or PDFs through the data channel or a parallel mechanism. For instance, an expert could send a wiring diagram image and the field app could display it on screen or even pin it in AR. Another possibility is screen sharing – an expert could share their screen or a specific app window. These features aren't out-of-the-box in our basic setup but can be built on top of the data stream. Note: Make sure to handle the UI/UX so the field user can easily view the shared content (perhaps allow switching between AR view and a document view).
  • Q: How is this different from a normal video call? A: A standard video call app (like FaceTime or Zoom) does stream what the user sees, but it doesn't anchor drawings or instructions in the physical environment. Our solution adds AR annotations – if the user moves the camera, the annotation stays attached to the real-world object it was marked on, thanks to ARKit/ARCore's tracking. This context persistence is crucial for effective remote guidance. Additionally, a purpose-built remote assist app can have domain-specific tools (freeze frame, high-res capture, remote measurements, etc.) that generic video calls lack.

11) SEO Title Options

  • "How to Get Started with Remote Expert Assistance on Mobile (No New Hardware Required)" – A straightforward title highlighting "no new hardware".
  • "Integrate AR Remote Assist into Your iOS & Android App: Step-by-Step Guide" – Targets keywords around AR remote assist integration for developers.
  • "How to Build a See-What-I-See App for Remote Support using ARKit/ARCore" – Uses the phrase "see-what-I-see" which is common in this space, plus ARKit/ARCore for searchability.
  • "Remote Expert Assist Troubleshooting: Connectivity, Permissions, and AR Calibration Tips" – Focuses on troubleshooting, could draw those searching for specific issues in implementation.

12) Changelog

  • 2025-12-25 — Verified on ARKit 8 (iOS 17.2) and ARCore 1.44 (Android 14) using Agora Video SDK 4.1.0. Updated instructions to latest Xcode/Android Studio. Confirmed deprecation of Vuforia Chalk (PTC) and included alternative approaches. All steps tested on iPhone 14 Pro (iOS 17) and Pixel 6 (Android 13).
  • 2023-09-10 — Initial draft of remote assist MVP guide using ARKit 6/ARCore. Included sample app references and basic annotation syncing.
  • 2023-01-05 — Outline created for general "Remote Expert Assist" approach; research on AR remote support solutions and available SDKs.