How to Describe Scenes Out Loud with AI Glasses on iOS and Android

By Almaz Khalilov

TL;DR

  • You’ll build: an accessibility mobile app that uses smart glasses to capture photos and provides audio scene descriptions to the user.
  • You’ll do: Get access → Install the Meta Wearables SDK → Run the sample app (iOS/Android) → Integrate into your app → Test with a wearable or mock device.
  • You’ll need: a Meta developer account, a supported pair of AI glasses (or a simulator), an iPhone/Android phone for development.

1) What is the AI Glasses Scene Description MVP?

What it enables

  • Hands-free scene understanding: Captures images from camera-equipped glasses and describes the scene in front of the user via audio. This gives blind or low-vision users immediate detail about their surroundings.
  • Real-time assistance: Pairs the glasses’ perspective (first-person camera) with phone-based AI to identify objects, read text, and narrate what’s around. The glasses’ open-ear speakers deliver the description so users can keep their hands and eyes free.
  • Multimodal AI integration: Leverages smartphone AI (on-device or cloud) to convert images to words. For example, it can read signs, recognize faces or currency, and describe scenes – e.g. “a man crossing the street with a dog” – all through voice feedback.

When to use it

  • Accessibility apps: Ideal for building apps for blind or low-vision users where an audio-first interface is key. It provides “instant understanding of surroundings” without needing a phone in hand.
  • Hands-busy scenarios: In any app where users can’t easily use a touchscreen (driving assistance, hands-free shopping, etc.), smart glasses capture + audio output offers a seamless experience.
  • Rapid prototyping of wearables: This MVP is perfect as a proof-of-concept for emerging AI glasses capabilities – you can test vision AI features (scene description, object detection) quickly using the provided toolkit before full product development.

Current limitations

  • Device coverage / preview constraints: Currently only Ray-Ban Meta (Gen 1 & 2) and Oakley Meta HSTN glasses are supported in the developer toolkit preview. The preview is geo-restricted to countries where Meta AI glasses are sold. More devices (and possibly future “display” glasses) will be added over time.
  • Permissions & platform constraints: You must pair the glasses via the official Meta app on your phone for the SDK to work. The phone app needs Bluetooth permissions (and Nearby Devices on Android) to communicate with the glasses. Internet connectivity is required for cloud AI features (for richer scene descriptions).
  • Known missing APIs: This is an early-access SDK, so not all hardware capabilities are open. For example, there’s no raw high-resolution photo access – capturing a photo just grabs a frame from the low-power video stream (approx. 1440×1080, not full camera quality). You cannot remap glasses’ physical controls or directly run custom code on the glasses. Also, current supported glasses have no AR display, and display-out functionality isn’t available in the SDK yet. These limitations will likely improve in future versions of the toolkit.

2) Prerequisites

Access requirements

  • Meta developer account: Create or sign in to the Meta Wearables Developer portal.
  • Join the Wearables SDK preview: Ensure your account is enrolled in the AI Glasses Device Access Toolkit preview (available to developers in supported countries). You might need to request access and agree to preview terms.
  • Set up an organization/project: In the Wearables Developer Center, create an organization (if not auto-created) and a project for your app. This will give you a Project ID/App ID used by the SDK.
  • Accept preview terms: Acknowledge any developer agreements for the Wearables toolkit (e.g. Meta’s Wearables Developer Terms and Acceptable Use Policy) before proceeding.

Platform setup

iOS

  • Xcode 15 or later with the iOS 17 SDK (or newer). The toolkit supports devices running iOS 15.2+, but you need a current Xcode on your Mac to build against the SDK.
  • Swift Package Manager (built into Xcode) or CocoaPods (if using pods) installed for dependency management.
  • Physical iPhone running iOS 15.2 or later. Connecting to Bluetooth glasses does not work in the Simulator, so a real device is recommended; the Simulator can still be used with the Mock Device Kit for basic testing.

Android

  • A recent release of Android Studio with the Android 13 (API 33) SDK or newer installed. Minimum supported OS is Android 10 (API 29).
  • Gradle (8.x) and Kotlin (1.8+). The sample uses Gradle Kotlin DSL, so be ready to edit build.gradle.kts files.
  • Physical Android phone (Android 10 or later). While you can use an emulator for some development, a real device with Bluetooth is strongly recommended for connecting to the glasses. (The SDK’s Mock Device Kit can simulate a glasses device if needed.)

Hardware or mock

  • Supported wearable device: Either a pair of Ray-Ban Meta smart glasses or Oakley Meta smart glasses (developer mode enabled if required). Ensure they are charged and you have the latest firmware via the Meta AI app.
  • Or, Mock device kit: If you don’t have hardware, use the provided Mock Device in the SDK to simulate glasses. The toolkit allows pairing a virtual glasses device for testing.
  • Bluetooth ready: Make sure Bluetooth is enabled on your phone and that you understand permission prompts (e.g. iOS will ask for Bluetooth access; Android will request Nearby Devices permission for Bluetooth). This is critical for the phone to communicate with the glasses.

3) Get Access to the AI Glasses Toolkit

  1. Go to the Wearables Developer Portal: Visit the Meta Wearables Developer Center and sign in. Navigate to the Wearables Device Access Toolkit section.
  2. Request preview access: If the toolkit is still in limited preview, request access by enrolling in the program. This might involve filling out a brief form and agreeing to terms. (If the preview is open, you can proceed immediately.)
  3. Accept the terms: Read and accept the Meta Wearables Developer Terms and any NDA or acceptable use policies for the glasses SDK.
  4. Create a project: In the developer console, create a new Wearables project (you might be prompted to create an Organization first). Give your project a name and select the glasses platform. This will generate an App ID / Project ID for you.
  5. Download credentials (if any): Depending on the SDK, you may need to grab some config files or keys:
    • iOS: You might need to download a plist or note your App ID to configure in Xcode. (In some cases, Meta might provide a .plist or an entitlement file to add to your app.)
    • Android: Note the App ID and possibly a client token. You may need to add these to your app’s manifest or as metadata. Also prepare a GitHub Personal Access Token (with read:packages scope) for fetching the SDK packages.
  6. Verify setup: After setting up, you should see your project listed in the portal, along with whatever credentials or IDs are needed to initialize the SDK. Done when: you have a Project/App ID (and any keys or config files) ready. You’re now set to run the sample app provided by Meta.

4) Quickstart A — Run the Sample App (iOS)

Goal

Run the official iOS sample app and verify that your iPhone can connect to the glasses and stream camera footage. You’ll ensure the “describe scene” feature can be triggered (though the sample might simply display the image) with a paired glasses or mock device.

Step 1 — Get the sample

  • Option 1: Clone the repo. Clone the official repository meta-wearables-dat-ios from GitHub:

    ```bash
    git clone https://github.com/facebook/meta-wearables-dat-ios.git
    ```

    Open the Xcode project or workspace in the samples/CameraAccess folder (if provided).

  • Option 2: Download the repository as a ZIP from GitHub and open it in Xcode. Ensure you open the sample app project included.

The sample app is a simple demonstration of camera access. It will let you connect to glasses and capture a photo (or stream video) using the SDK.

Step 2 — Install dependencies

The iOS SDK is distributed via Swift Package Manager:

  • In Xcode, go to File > Add Packages.... Enter the package URL: https://github.com/facebook/meta-wearables-dat-ios.
  • Choose the latest available version (developer preview release). For example, select version 0.3.0 (or latest tag).
  • Add the package to the sample app target. Xcode will fetch the SDK package. This includes the core SDK and camera APIs.

If using CocoaPods: if Meta provides a pod, add it to your Podfile, run pod install, and open the generated .xcworkspace. (SPM is recommended in this preview.)

Step 3 — Configure app

A few settings need to be configured in the Xcode project:

  • Bundle ID: Change the app’s bundle identifier to something unique (e.g. com.yourname.GlassesSample). If you registered an App ID in the portal, you can use that here.
  • Add required capabilities: In Signing & Capabilities, no extra capability is needed for foreground use; only add Background Modes → “Uses Bluetooth LE accessories” if you want the glasses connection to survive backgrounding.
  • Info.plist permissions: Add NSBluetoothAlwaysUsageDescription (a message explaining why your app needs Bluetooth access to connect to glasses). Also add NSCameraUsageDescription if the sample uses the phone camera too (not usually needed just for glasses input).
  • (Optional) Meta config: If the portal provided any config (like an API key or requiring an entitlement), add it now. For instance, some kits use a key in Info.plist or an entitlement file – check documentation if applicable.

Step 4 — Run

  1. In Xcode, select the CameraAccess sample app scheme (or whatever the sample target is named).
  2. Choose a deployment target – select your iPhone device (make sure it’s connected via USB or Wi-Fi).
  3. Build and Run the app on the iPhone. The app should launch on the device.
  4. On first launch, grant any permissions the app asks for (Bluetooth permission popup, etc.). iOS will ask for Bluetooth access the first time the app tries to scan/connect to the glasses.

Step 5 — Connect to wearable/mock

Now pair the app with the glasses:

  • Using real glasses: Ensure your Ray-Ban/Oakley glasses are paired to your phone via the Meta (Facebook View/Meta AI) app and turned on. The sample app will likely detect the glasses via the SDK; if there is a “Connect” button, tap it and the SDK will connect over BLE. Watch for an LED indicator on the glasses confirming the connection.
  • Using Mock device: The SDK can simulate a glasses device if no hardware is present. Follow the SDK docs to enable a mock device (often by calling a function to create a Mock device). The sample app might include a toggle for “Use Mock Device” – enable it if needed, which creates a virtual camera feed.
  • Grant permissions: The first time connecting, you might get a system Bluetooth pairing request (to bond with the glasses) – accept it. Also ensure the app has Bluetooth permission (check iOS Settings if unsure).

Verify

  • Connected status: The app should indicate a successful connection (e.g. a “Connected to Glasses” message or icon). In logs, you might see connection callbacks.
  • Video/Photo capture: If the sample streams video, you should see a preview from the glasses’ camera on the phone screen (point the glasses around to test). If it’s photo-based, tap a Capture button in the app; you should hear the glasses’ camera shutter sound and see the captured image appear in the app.
  • Data flow: The key verification is that an image from the glasses is delivered to the app. This confirms the SDK, Bluetooth, and pairing all work.

Common issues

  • Build error (module not found): If Xcode complains it “cannot find module MWDAT” (the SDK), the Swift Package may not have been added to the app target. Remove and re-add the package, and confirm it is linked against the sample target. If using CocoaPods, ensure you opened the .xcworkspace after pod install.
  • No glasses found: If the app can’t find or connect to the glasses, ensure the glasses are already paired to the phone via the Meta app (the SDK doesn’t handle initial OS-level pairing). Also verify Bluetooth is on, and try restarting the glasses. If using Mock mode, make sure to initialize the mock device as per docs.
  • Permission denied: If connection fails immediately, iOS may have blocked Bluetooth scanning. Check that NSBluetoothAlwaysUsageDescription is in Info.plist and that the user allowed the permission. If not, iOS may not show a prompt again – you might need to instruct the user to enable Bluetooth for the app in Settings.
  • App crashes on launch: This could happen if a required key or configuration is missing. For example, some SDKs require a certain usage description or an App ID to be set. Review any documentation for required Info.plist entries beyond Bluetooth (none are documented for this toolkit aside from analytics opt-out, which is optional).

5) Quickstart B — Run the Sample App (Android)

Goal

Run the official Android sample app and verify that your Android phone can connect to the glasses and stream the camera feed. By the end, you should see that pressing a capture button in the app triggers the glasses camera and returns an image, confirming end-to-end functionality.

Step 1 — Get the sample

  • Clone the Android SDK repo: Use Git to clone meta-wearables-dat-android from GitHub:

    ```bash
    git clone https://github.com/facebook/meta-wearables-dat-android.git
    ```

    Open the project (it may be in samples/CameraAccess) in Android Studio.

  • (If no separate sample is provided, the repository likely includes sample code in a module or README instructions.) Once opened, let Gradle sync the project.

Step 2 — Configure dependencies

The Android SDK is distributed via GitHub Packages, so you need to add the Maven repo and a token:

  • Add Maven repository: In the project’s settings.gradle.kts, add the GitHub Packages repository. For example:

    ```kotlin
    maven {
        url = uri("https://maven.pkg.github.com/facebook/meta-wearables-dat-android")
        credentials {
            username = "" // not used
            password = System.getenv("GITHUB_TOKEN") ?: localProperties.getProperty("github_token")
        }
    }
    ```

    This points Gradle to the Meta Wearables package feed.

  • Add GitHub token: Obtain a GitHub personal access token with at least read:packages scope. Add it either as an environment variable GITHUB_TOKEN or in your local.properties as github_token=<YOUR_TOKEN>. This allows Gradle to authenticate to download the SDK.

  • Sync Gradle: Click “Sync Project” in Android Studio. Gradle should now be able to fetch the Wearables SDK artifacts from GitHub.

Step 3 — Configure app

Prepare the sample app settings:

  • Application ID: Change the applicationId in app/build.gradle (or the module’s build.gradle.kts) to a unique package (e.g., com.yourname.glassessample). If the developer portal required a specific package name, use that.

  • Add required permissions: Open AndroidManifest.xml and ensure the following permissions are present, to allow Bluetooth communication with the glasses:

    ```xml
    <uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />
    <uses-permission android:name="android.permission.BLUETOOTH_SCAN" />
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
    ```

    • On Android 12+, BLUETOOTH_CONNECT is needed to communicate with paired devices, and BLUETOOTH_SCAN (with Fine Location) for discovery if needed. Include ACCESS_FINE_LOCATION because BLE scan permissions are tied to location on some versions.
    • Also add any other permissions the sample might require (e.g., internet if your AI calls need it, or camera if saving images).
  • Compile SDK settings: Make sure compileSdk and targetSdk in build.gradle are set to 33 or above (so the runtime Bluetooth permissions can be requested properly); a sample module configuration is sketched after this list.

  • (Optional) Meta configuration: If the portal gave any config values (like an App ID or API key), add them. Often this could be in strings.xml or as meta-data in the manifest:

    ```xml
    <meta-data android:name="com.meta.wearable.ProjectID" android:value="YOUR_APP_ID"/>
    ```

    (The exact name is hypothetical; check docs for any required meta-data keys.)
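For reference, a minimal module-level build.gradle.kts might look like the sketch below. The package name and version numbers are placeholders (the mwdat coordinates mirror the dependency block shown later in this guide); adjust them to your project and to the SDK’s README.

```kotlin
// app/build.gradle.kts — illustrative sketch only; verify coordinates and versions against the repo README
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
}

android {
    namespace = "com.yourname.glassessample"      // hypothetical package name
    compileSdk = 34

    defaultConfig {
        applicationId = "com.yourname.glassessample"
        minSdk = 29                               // Android 10, the minimum supported OS
        targetSdk = 34
        versionCode = 1
        versionName = "0.1"
    }
}

dependencies {
    // Wearables SDK artifacts (developer preview coordinates as referenced in this guide)
    implementation("com.meta.wearable:mwdat-core:0.3.0")
    implementation("com.meta.wearable:mwdat-camera:0.3.0")
    // Optional: mock device support for testing without hardware (assumed to share the same version)
    implementation("com.meta.wearable:mwdat-mockdevice:0.3.0")
}
```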

Step 4 — Run

  1. In Android Studio, select the app run configuration (the sample app).
  2. Choose a deployment target: connect your Android phone via USB (enable USB debugging) and select it. (You can use an emulator if testing with the Mock Device, but real device is better for Bluetooth.)
  3. Click Run. The app will build and install on the phone. Watch Logcat for any errors during startup.
  4. Grant permissions: The first time, the app should prompt for the Nearby Devices permission (for Bluetooth). Approve it. If it tries to access storage or camera, grant those too as needed.

Step 5 — Connect to wearable/mock

  • Pair the glasses: Make sure your glasses are already paired to the phone at the OS level. (Go to Bluetooth settings – the Ray-Ban/Oakley should appear as connected, likely through the Meta companion app.)
  • Connect via app: In the sample app, tap the Connect button or follow its instructions. The Wearables SDK will scan for the known glasses and connect. If not already bonded, Android will prompt to confirm pairing – accept it.
  • Using Mock device: If you have no hardware, use the SDK’s Mock Device. The Android SDK provides a mwdat-mockdevice library, and the sample app might include a developer setting to use a mock glasses device. Enable that (or, in code, initialize a MockDeviceManager) to simulate a connected pair of glasses. This feeds a dummy camera stream (often a test pattern or sample video) to the app.
  • Permissions on device: Ensure Bluetooth is on. Also, if using the glasses’ microphone or speakers, the glasses will appear as a Bluetooth audio device – you might need to grant microphone permission if you plan to use voice, or handle audio routing (the sample likely doesn’t yet use mic).

Verify

  • Connected status: The app should report a successful connection (e.g., “Glasses connected” message in UI or logs). On the glasses, you might hear a tone or see an LED confirming connection.
  • Image capture works: Use the app’s UI to trigger the camera. For instance, press a Capture Photo button. The glasses should snap a photo (check for a shutter sound or LED flash on the frame). The app should then display the captured image or confirm it received the frame. If a live stream preview is available, you should see the environment from the glasses on your phone screen.
  • End-to-end feature: If the sample is wired to an AI description service, it might show a text description or speak it. (If not, at least you have the image data which is the input needed for the scene description feature.)

Common issues

  • Gradle authentication error: If build fails with errors like “Could not resolve com.meta.wearable:… Unauthorized,” it means the GitHub Packages token isn’t set or is incorrect. Fix: double-check the token in local.properties and that it has the correct scope. You can also test by logging in to GitHub Packages manually. Once fixed, run Gradle sync again.
  • Manifest merger conflict: If integrating into an existing app, you might encounter conflicts (e.g. if the app already requests Bluetooth permissions or uses a provider authority that the sample also uses). Resolve duplicates by editing the Manifest or Gradle config as needed (usually not an issue for the fresh sample).
  • No device connection: If the app doesn’t find the glasses, ensure your phone is really paired with them (use the Meta app to pair if not). On Android 12+, also ensure you granted Nearby Devices permission (check in Settings > Apps > YourApp > Permissions). If using an emulator, remember BLE may not work – stick to physical device or use the MockDevice mode.
  • Device connection timeout: Sometimes the connection can time out if the glasses are asleep or not in range. If you see a long wait and then failure, try waking the glasses (wear them or press the capture button on them) and attempt to connect again. Keep the glasses close to the phone.
  • Permissions prompts not showing: If you missed a permission (e.g. you denied it initially), you might need to enable it in system settings as Android won’t prompt twice. Always verify all required permissions are granted when troubleshooting connection problems.

6) Integration Guide — Add Scene Description to an Existing App

Goal

Integrate the Wearables SDK into your own app and implement one end-to-end feature: capturing an image with glasses and getting an audio description of the scene. We’ll outline a simple architecture and steps to add the necessary components to your app.

Architecture

Your app will communicate with the glasses through the SDK and use an AI service for scene description:

```text
[Mobile App UI] → [Wearables SDK Client] → [Glasses Camera] → (image frame)
    → [Phone AI Model/Service] → (description text)
    → [Text-to-Speech] → (audio output to glasses/phone)
```

  1. The user triggers a capture in the UI (e.g., presses a "Describe Scene" button).
  2. The SDK client connects to glasses and captures a photo from the glasses’ camera.
  3. The photo is sent to an AI model/service (on the phone or cloud) which returns a text description of the image.
  4. The description is then read out via Text-to-Speech through the phone or glasses speakers, and possibly shown in the app UI for confirmation.

Step 1 — Install SDK

iOS: In your existing Xcode project, add the Meta Wearables SDK via Swift Package Manager (same steps as in Quickstart). For a manual integration, you could also add the meta-wearables-dat-ios package reference in your Package.swift or Podfile. Ensure the package is added to your app target and that you import the framework (import MetaWearablesSDK or similar) in your code.

Android: Add the Wearables SDK to your app’s Gradle config:

  • Add the GitHub Maven repo (with credentials) in your settings.gradle if not already.

  • Add the dependency in your module’s build.gradle: for example,

    ```gradle
    implementation("com.meta.wearable:mwdat-core:0.3.0")
    implementation("com.meta.wearable:mwdat-camera:0.3.0")
    ```

    (You may also include mwdat-mockdevice for testing).

  • Sync Gradle to fetch the libraries.

After this, you’ll have access to the SDK APIs to connect to glasses and request images.

Step 2 — Add permissions

Ensure your app’s permission setup covers what’s needed:

iOS (Info.plist):

  • NSBluetoothAlwaysUsageDescription – explain that Bluetooth is used to connect to the smart glasses.
  • NSCameraUsageDescription – if you plan to use the phone camera as fallback or for any scanning.
  • NSMicrophoneUsageDescription – if you will use voice commands or audio input from glasses’ mics (not required for just photo capture, but possibly for full assistive app).
  • (No specific entitlement or capabilities beyond this; the SDK uses standard CoreBluetooth and network APIs.)

Android (AndroidManifest.xml):

  • <uses-permission android:name="android.permission.BLUETOOTH_CONNECT" /> – required for connecting to already-paired BLE devices.
  • <uses-permission android:name="android.permission.BLUETOOTH_SCAN" /> – required if your app needs to scan for the glasses (likely yes when initiating connection).
  • <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" /> – required on some Android versions when performing BLE scans.
  • Also ensure you have internet permission if your AI description uses a cloud API, and audio permissions if using microphone.
  • (No special feature tags are strictly needed, but you may declare <uses-feature android:name="android.hardware.bluetooth_le" android:required="true"/> to indicate BLE requirement to the Play Store.)
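Declaring the permissions in the manifest is not enough on Android 12+: BLUETOOTH_CONNECT and BLUETOOTH_SCAN are runtime permissions and must be requested before connecting. Below is a minimal sketch using the AndroidX Activity Result API; the PermissionsHelper class is our own illustrative wrapper, not part of the Meta SDK.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.os.Build
import androidx.activity.ComponentActivity
import androidx.activity.result.contract.ActivityResultContracts

// Construct this helper in onCreate() so the result launcher is registered before the activity starts.
class PermissionsHelper(private val activity: ComponentActivity) {

    // Runtime Bluetooth permissions exist only on Android 12+ (API 31); older versions gate BLE scans on location.
    private val required: Array<String> =
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
            arrayOf(Manifest.permission.BLUETOOTH_CONNECT, Manifest.permission.BLUETOOTH_SCAN)
        } else {
            arrayOf(Manifest.permission.ACCESS_FINE_LOCATION)
        }

    private var onResult: ((Boolean) -> Unit)? = null

    private val launcher = activity.registerForActivityResult(
        ActivityResultContracts.RequestMultiplePermissions()
    ) { results ->
        onResult?.invoke(results.values.all { it })   // true only if every permission was granted
    }

    fun hasAllPermissions(): Boolean = required.all { perm ->
        activity.checkSelfPermission(perm) == PackageManager.PERMISSION_GRANTED
    }

    fun requestPermissions(callback: (Boolean) -> Unit) {
        onResult = callback
        launcher.launch(required)
    }
}
```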

Step 3 — Create a thin client wrapper

It’s best practice to wrap the SDK usage in your own classes for easier management:

  • WearablesClient: A singleton or manager class that handles connecting/disconnecting to the glasses and exposing camera streams. For example, it might use the SDK’s connect method and provide callbacks for connection status and incoming frames.
  • FeatureService (SceneDescriber): A service or component responsible for taking a capture from the glasses and sending it to the AI. It can have a method like describeCurrentScene() which internally triggers the camera via WearablesClient, gets the image, calls the AI model, and returns the description.
  • PermissionsService: Utility to check and request permissions (Bluetooth, etc.) at runtime, so your feature won’t run unless appropriate permissions are granted.

Implement these in a minimal way for the MVP:

  • The WearablesClient should manage the SDK’s lifecycle: initialize the SDK (if needed), attempt connection when your app starts or when needed, handle reconnections if the glasses disconnect (e.g., if they go out of range or power off), and provide an interface to request a photo.
  • The SceneDescriber can encapsulate the logic to call the image-captioning AI. For the MVP, this could be a call to an external API (like Azure Cognitive Services “Describe Image” or an OpenAI Vision model) or a local ML model that generates a caption from the image; a cloud-backed sketch follows the “Definition of done” list below.
  • The PermissionsService should unify permission prompts so that if, for example, Bluetooth permission isn’t granted, you prompt the user before trying to connect.
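As a concrete starting point, here is a minimal Kotlin sketch of these abstractions. All names are illustrative, not part of the Meta SDK; the actual connection and capture calls would live inside your WearablesClient implementation and depend on the toolkit’s real API.

```kotlin
import android.graphics.Bitmap

// Thin abstractions so the rest of the app never touches the SDK directly.

interface WearablesClient {
    val isConnected: Boolean
    suspend fun connect(): Boolean                  // wraps the SDK's connect flow
    suspend fun capturePhoto(): Bitmap              // wraps the SDK's camera capture
    fun onConnectionChanged(listener: (Boolean) -> Unit)
}

interface SceneDescriber {
    suspend fun describe(image: Bitmap): String     // returns a short caption of the scene
}

interface PermissionsService {
    fun hasAllPermissions(): Boolean
    fun requestPermissions(callback: (Boolean) -> Unit)
}

// The end-to-end feature composes the pieces:
class DescribeSceneUseCase(
    private val glasses: WearablesClient,
    private val describer: SceneDescriber,
) {
    suspend fun describeCurrentScene(): String {
        check(glasses.isConnected) { "Glasses are not connected" }
        val photo = glasses.capturePhoto()
        return describer.describe(photo)
    }
}
```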

Definition of done:

  • The Wearables SDK is initialized (either on app launch or when first needed) without crashes.
  • You can establish and maintain a connection to the glasses, with appropriate user feedback if the connection drops (and an ability to retry).
  • Capturing an image from the glasses returns data successfully.
  • The app gracefully handles errors (e.g., if glasses are not found or AI call fails, the user gets a message, not a silent failure or crash).
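To make the SceneDescriber concrete, the sketch below posts a JPEG to a hypothetical captioning endpoint and reads back a one-line description. The URL, auth header, and JSON shape are placeholders; substitute the request format of whichever vision API or self-hosted model you actually use.

```kotlin
import android.graphics.Bitmap
import android.util.Base64
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import org.json.JSONObject
import java.io.ByteArrayOutputStream
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical cloud captioner: POSTs a base64 JPEG and expects {"description": "..."} back.
class CloudSceneDescriber(
    private val endpoint: String = "https://example.com/v1/describe",   // placeholder URL
    private val apiKey: String,
) : SceneDescriber {

    override suspend fun describe(image: Bitmap): String = withContext(Dispatchers.IO) {
        // Compress before upload: the glasses frame (~1440×1080) is plenty for captioning.
        val jpeg = ByteArrayOutputStream()
            .also { image.compress(Bitmap.CompressFormat.JPEG, 80, it) }
            .toByteArray()
        val body = JSONObject()
            .put("image_base64", Base64.encodeToString(jpeg, Base64.NO_WRAP))
            .toString()

        val conn = (URL(endpoint).openConnection() as HttpURLConnection).apply {
            requestMethod = "POST"
            connectTimeout = 10_000
            readTimeout = 20_000
            doOutput = true
            setRequestProperty("Content-Type", "application/json")
            setRequestProperty("Authorization", "Bearer $apiKey")
        }
        try {
            conn.outputStream.use { it.write(body.toByteArray()) }
            val response = conn.inputStream.bufferedReader().use { it.readText() }
            JSONObject(response).optString("description", "No description available")
        } finally {
            conn.disconnect()
        }
    }
}
```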

Step 4 — Add a minimal UI screen

Design a simple UI to drive the feature:

  • A “Connect Glasses” button (or switch) to initiate connection via the SDK. Show the connection status (disconnected/connecting/connected) with a label or colored indicator.
  • A “Describe Scene” button. When tapped, it will trigger the photo capture and AI description flow.
  • A result display area: After the description is obtained, show the text on screen (for sighted testers or for saving the result). Also include an image thumbnail if useful (for debugging, since the target user might not see it but it helps in development).
  • Optionally, a log or status area to show messages like “Capturing…”, “Describing…”, or error messages (to aid testing).

Keep the UI simple and accessible: use large buttons and consider voice-over labels since the target users might rely on screen readers.
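A minimal version of this screen in Jetpack Compose might look like the sketch below; the state names and callbacks are placeholders you would wire to your WearablesClient and SceneDescriber.

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.semantics.contentDescription
import androidx.compose.ui.semantics.semantics
import androidx.compose.ui.unit.dp

// Placeholder UI state; hold it in a ViewModel fed by your wrapper classes.
data class SceneUiState(
    val connectionStatus: String = "Disconnected",
    val statusMessage: String = "",
    val description: String = "",
)

@Composable
fun SceneDescriberScreen(
    state: SceneUiState,
    onConnect: () -> Unit,
    onDescribe: () -> Unit,
) {
    Column(modifier = Modifier.padding(24.dp)) {
        Text(text = "Glasses: ${state.connectionStatus}")

        Button(
            onClick = onConnect,
            modifier = Modifier
                .fillMaxWidth()
                .padding(top = 16.dp)
                .semantics { contentDescription = "Connect glasses" },   // screen-reader label
        ) { Text("Connect Glasses") }

        Button(
            onClick = onDescribe,
            enabled = state.connectionStatus == "Connected",             // only allow when connected
            modifier = Modifier
                .fillMaxWidth()
                .padding(top = 16.dp)
                .semantics { contentDescription = "Describe the scene in front of me" },
        ) { Text("Describe Scene") }

        Text(text = state.statusMessage, modifier = Modifier.padding(top = 16.dp))  // "Capturing…" etc.
        Text(text = state.description, modifier = Modifier.padding(top = 8.dp))     // latest description
    }
}
```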


7) Feature Recipe — Trigger Photo Capture from Glasses and Describe the Scene

Goal

When the user taps “Describe Scene” in your app, the app will command the glasses to capture a photo, send that photo to an AI to get a description, then read the description out loud. We’ll break down the flow and key steps.

UX flow

  1. Ensure connected: The app should only allow the describe action if the glasses are currently connected. If not, prompt the user to connect first.
  2. Tap Describe Scene: User taps the button in the app.
  3. Show progress UI: Immediately give feedback (e.g., a “Capturing…” message or a loading spinner).
  4. Capture via glasses: The app invokes the SDK to take a photo. The glasses capture an image from the wearer’s viewpoint.
  5. Process image: The image is sent to an AI model (on device or cloud) that generates a descriptive caption of the scene.
  6. Provide output: The resulting description text is spoken aloud (Text-to-Speech) and also displayed on screen (for confirmation or for users with some vision).
  7. Show completion: Update the UI (e.g., show the text and maybe a “Done ✅” message). Reset the state or allow another capture.

Implementation checklist

  • Connected state verified: Before capturing, check that WearablesClient.isConnected is true. If not, guide the user to connect (don’t attempt capture).
  • Permissions verified: Ensure the app has required permissions (Bluetooth, internet, audio). If not, handle by requesting permissions or showing a warning.
  • Capture request issued: Use the SDK method to take a photo. This might be something like glassesCamera.takePhoto() or starting a stream and grabbing a frame. Handle the asynchronous nature – it may take a second or two to get the image.
  • Timeout & retry: Implement a timeout in case the glasses fail to respond (e.g., if they went out of range or there’s a Bluetooth hiccup). If no image is returned in, say, 5 seconds, cancel the request and inform the user. Optionally, automatically retry once.
  • Call AI service: Once you have an image (likely as a UIImage or bitmap), send it to your AI captioning service. This could be an HTTP call to a cloud API or running an on-device model. (For an MVP, using a cloud API might be simplest – e.g., Azure’s Computer Vision Describe API or an open-source model hosted on a server.)
  • Handle AI response: Parse the returned description text. If the AI gives multiple sentences, you might take the first or the most relevant.
  • Text-to-Speech output: Use AVSpeechSynthesizer on iOS or TextToSpeech on Android to speak the description. Route the audio to the glasses (if they are the current audio output device) – usually if the glasses are connected as a Bluetooth audio device, the TTS will play through them by default.
  • UI update & persistence: Display the description text in the UI (so users with partial sight or a co-located assistant can see it). Optionally save the text (and image) locally or to a history, if that’s a desired feature (not essential for MVP).
  • Reset state: After speaking, the app might reset the UI from “Capturing…” back to ready. Make sure to handle multiple requests (don’t allow spamming the button; maybe disable the Describe button until the current cycle finishes).

Pseudocode

Here’s a simplified pseudocode for the capture-and-describe action:

```swift
func onDescribeSceneTapped() {
    guard glassesClient.isConnected else {
        showAlert("Please connect your smart glasses first.")
        return
    }
    guard permissionsService.allPermissionsGranted() else {
        permissionsService.requestNeededPermissions()
        return
    }
    statusLabel.text = "Capturing…"
    Task {
        do {
            let photo = try await glassesClient.capturePhoto()           // asynchronous call to SDK
            statusLabel.text = "Analyzing…"
            let description = try await AIService.describeImage(photo)
            // Speak out the description
            SpeechService.speak(description)
            // Update UI
            imageView.image = photo
            resultLabel.text = "\"\(description)\""
            statusLabel.text = "Done ✅"
        } catch {
            statusLabel.text = "Capture failed 😔"
            log("Error during describe flow: \(error)")
        }
    }
}
```

And similarly on Android (using coroutines or callbacks):

```kotlin
if (!wearablesClient.isConnected) {
    Toast.makeText(this, "Connect glasses first", Toast.LENGTH_SHORT).show()
    return
}
if (!permissionsService.hasAllPermissions()) {
    permissionsService.requestPermissions()
    return
}
statusText.text = "Capturing…"
wearablesClient.capturePhoto(
    onSuccess = { photo ->
        statusText.text = "Analyzing…"
        AIService.describeImage(
            photo,
            onSuccess = { description ->
                tts.speak(description, TextToSpeech.QUEUE_FLUSH, null, "sceneDesc")
                imageView.setImageBitmap(photo)
                resultText.text = "\"$description\""
                statusText.text = "Done ✅"
            },
            onFailure = { error ->
                statusText.text = "AI description failed"
                Log.e("App", "describeImage error", error)
            }
        )
    },
    onFailure = { error ->
        statusText.text = "Capture failed"
        Log.e("App", "capturePhoto error", error)
    }
)
```

This pseudocode omits some details (like ensuring the calls happen on appropriate threads, handling timeouts, etc.), but illustrates the flow.
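For example, here is a sketch of the timeout-and-retry handling mentioned above, assuming the suspend-based WearablesClient wrapper from the integration guide:

```kotlin
import android.graphics.Bitmap
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.withTimeout

// Wrap the capture in a 5-second timeout and retry once before giving up.
suspend fun captureWithTimeout(
    glasses: WearablesClient,
    timeoutMs: Long = 5_000,
    retries: Int = 1,
): Bitmap? {
    repeat(retries + 1) { attempt ->
        try {
            return withTimeout(timeoutMs) { glasses.capturePhoto() }
        } catch (e: TimeoutCancellationException) {
            // Glasses didn't respond in time (asleep, out of range, BT hiccup) — retry or bail out.
            if (attempt == retries) return null
        }
    }
    return null   // unreachable; satisfies the compiler
}
```

A null result from this helper is where you show the “Glasses not responding, please try again” message instead of leaving the UI stuck on “Capturing…”.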

Troubleshooting

  • Capture returns empty: If you consistently get a null or zero-length image, check that the glasses have not gone into standby. Also verify you are following the SDK’s capture procedure (some require starting a preview stream first). If the problem persists, reconnect the glasses or make sure they are awake (for Ray-Ban Meta, unfolding the glasses wakes them).
  • Capture hangs or times out: This can happen if the Bluetooth connection is weak or the glasses are busy. Implement a timeout on the capture future/promise (if SDK doesn’t already). If a timeout occurs, try canceling and informing the user (“Glasses not responding, please try again”). A quick power cycle of the glasses might help if it persists.
  • “Instant” results expectation: Users might expect the description immediately. In reality, capturing and AI processing may take a few seconds. To manage this:
    • Provide feedback during processing (“Analyzing…”) so the user knows work is in progress.
    • Use a short audio cue when the capture happens (e.g., play a camera shutter sound or a beep) so the user knows the image was taken.
    • Consider using a progress indicator if the AI call is lengthy, or a chime when the description is ready (in addition to the speech).
  • Inaccurate descriptions: If the AI sometimes gives wrong or vague descriptions, that’s expected for an MVP (AI is not perfect). You can mitigate by choosing a robust model and possibly allowing a second attempt. For now, focus on making sure some description comes through; refine accuracy later.
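For the audio cue suggested above, a short confirmation beep can be produced with Android’s ToneGenerator (a bundled shutter-sound asset played via MediaPlayer works just as well):

```kotlin
import android.media.AudioManager
import android.media.ToneGenerator
import android.os.Handler
import android.os.Looper

// Short beep so the user knows the photo was taken. Volume is 0–100; the tone plays on the
// current audio output (the glasses, if they are connected as a Bluetooth audio device).
fun playCaptureBeep(durationMs: Int = 150) {
    val tone = ToneGenerator(AudioManager.STREAM_MUSIC, 80)
    tone.startTone(ToneGenerator.TONE_PROP_BEEP, durationMs)
    // Release the native resources once the tone has finished playing.
    Handler(Looper.getMainLooper()).postDelayed({ tone.release() }, durationMs + 50L)
}
```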

8) Testing Matrix

| Scenario | Expected Result | Notes |
| --- | --- | --- |
| Mock device (no glasses) | Feature works fully with simulated input. | Use the SDK’s Mock Device Kit to generate a test image and ensure the AI returns a description. Great for CI and development without hardware. |
| Real device, close range | Low-latency capture and description. | With the phone and glasses within a few feet of each other, the capture-to-speech loop should complete within ~2-4 seconds depending on AI speed. This is your baseline ideal scenario. |
| Real device, far/obstructed | Possibly slower, or capture may fail. | Test with the glasses at the edge of Bluetooth range or with obstacles. The connection might drop or the image might take longer. The app should handle failure gracefully (e.g., a timeout message). |
| App in background/locked | Operation pauses or fails gracefully. | The SDK likely requires the app in the foreground. If the user invokes Siri/Google Assistant or locks the screen mid-process, the capture may fail. Ensure no crashes – you may simply need to reconnect when the app resumes. Document that usage requires the app to be active. |
| Permission denied (user) | Clear error and recovery path. | If the user denies Bluetooth or Nearby Devices permission, the app should show “Bluetooth permission needed” and direct them to enable it. The feature should not fail silently. |
| Mid-action disconnect | Attempt reconnection or inform the user. | Turn the glasses off or Bluetooth off during an ongoing capture. The app should handle the exception (stop waiting, tell the user the connection was lost) and perhaps automatically try to reconnect. No crashes or stuck UI. |
| Multiple requests in succession | Each description completes or is queued properly. | Tap “Describe” repeatedly. The app should queue requests or disable the button to prevent overlapping requests. A second tap must not crash or break the first operation. |

(Feel free to expand this matrix with more scenarios, e.g., different lighting conditions for the camera – though that affects AI accuracy more than app behavior.)


9) Observability and Logging

To monitor and debug the feature in the field, add logging for key events and metrics:

  • Connection events: Log when you start connecting (connect_start), when the glasses successfully connect (connect_success), and if it fails or disconnects (connect_fail or disconnected). Include reason codes if available. This helps identify if disconnections are frequent.
  • Permission status: Log whether required permissions are granted or not at app launch (permission_ok = true/false) so you can spot if that’s a common issue.
  • Capture events: Log when a capture is initiated (capture_start with a timestamp). When an image is received, log capture_success along with the size or data length of the image (to verify image quality). If a capture errors or times out, log capture_fail with the exception message.
  • AI description events: Log when you send an image to the AI (describe_start) and when you get a response (describe_success). Include how long the AI took (you can timestamp before and after to compute latency, e.g., describe_ms=1234). If the AI call fails, log describe_fail with error info (timeout, API error, etc.).
  • Output events: Log when you speak out the description (tts_start and maybe tts_complete). If using glasses audio, note if output is routed to glasses or phone.
  • Metrics: It’s useful to measure:
    • Total time from button tap to speech output (total_flow_ms).
    • Frequency of failures (e.g., how often capture_fail or describe_fail happen).
    • Reconnection attempts count if you implement auto-reconnect (reconnect_attempt count).
  • User analytics (if appropriate): Since this is an accessibility feature, you might log anonymized usage stats like how many descriptions per day a user performs, to gauge engagement. (Be mindful of privacy – avoid logging actual image content or spoken text, as that could be sensitive.)

By logging these, during testing (or beta release) you can pinpoint where issues occur: e.g., if logs show many connect_fail, you know connectivity is a pain point; if describe_ms is very high, perhaps switch to a faster model or ensure the image size isn’t too large, etc.
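A minimal sketch of such an event logger is shown below (Logcat-based here; in practice you would forward the events to your analytics or crash-reporting backend):

```kotlin
import android.os.SystemClock
import android.util.Log

// Minimal structured logger for the capture-and-describe flow described above.
object FlowLog {
    private const val TAG = "SceneDescribe"
    private var flowStartMs = 0L

    fun event(name: String, extra: String = "") =
        Log.i(TAG, if (extra.isEmpty()) name else "$name $extra")

    fun flowStarted() {                        // call on the "Describe Scene" tap
        flowStartMs = SystemClock.elapsedRealtime()
        event("capture_start")
    }

    fun flowFinished(success: Boolean) {       // call after TTS completes or on failure
        val totalMs = SystemClock.elapsedRealtime() - flowStartMs
        event("flow_complete", "success=$success total_flow_ms=$totalMs")
    }
}

// Usage:
// FlowLog.flowStarted()
// FlowLog.event("connect_success")
// FlowLog.event("capture_success", "bytes=${jpeg.size}")   // jpeg is your captured image data
// FlowLog.flowFinished(success = true)
```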

(Note: If you integrate with the Meta SDK’s analytics, be aware you can opt out as shown in the iOS README. But for your own debugging, the above custom logs are invaluable.)


10) FAQ

  • Q: Do I need the actual hardware to start developing?

    A: Not initially. The Meta Wearables SDK includes a Mock Device Kit that lets you simulate a pair of glasses. You can develop and even test the capture logic using this simulator on the iOS Simulator or Android Emulator. However, for a real user test (especially for latency and real-world conditions), you will eventually need a physical pair of supported glasses. The mock is great for early development and CI tests, but nothing beats real hardware for final validation.

  • Q: Which smart glasses are supported by this solution?

    A: At the time of writing, the SDK preview supports Ray-Ban Meta smart glasses (1st and 2nd generation) and Oakley Meta glasses. These are camera-equipped glasses that pair with Meta’s app. Support for additional devices (including future glasses with displays or other brands) is expected later. If you don’t have these, you can use the Mock Device until you get access to hardware.

  • Q: Can I ship this app to production?

    A: Not yet for a broad audience. The Wearables Device Access Toolkit is in developer preview – meaning it’s intended for testing and prototyping. Only select partners can publish integrations to the general public during this phase. For now, you can distribute your app to testers (TestFlight, internal testing APKs, etc.) within your organization. Meta has indicated that general availability for publishing will come in 2026. Keep an eye on Meta’s announcements for when the toolkit exits preview. Also note, features and APIs might change during the preview period.

  • Q: Can I push content to the glasses (like AR visuals or audio)?

    A: Audio output – yes, indirectly. The glasses act like Bluetooth headphones, so any audio your app plays (like the speech output) can play through the glasses’ speakers. There’s no special SDK call needed; just use the phone’s audio APIs. For visuals/AR – current supported devices (Ray-Ban Meta) do not have any see-through display, so there is no concept of showing an image or AR content on the glasses. And the SDK in its initial release does not support rendering to the glasses even for devices with displays (those APIs are not open yet). Essentially, the glasses are used for input (camera, mic) and output (speakers) only. In future, if Meta releases glasses with displays or allows on-glasses apps, that may change, but as of now, your app’s visual output should be on the phone.

  • Q: How is privacy handled?

    A: This is a critical question for any camera-based assistive app. By default, images captured from the glasses are processed by your app (and any cloud service you call). You should inform users that photos of their surroundings are being analyzed by AI. The Meta SDK itself may collect some usage data (like how often sensors are used) – you can opt-out of certain data collection as noted in the iOS SDK docs. When using cloud AI services, ensure you comply with privacy requirements (don’t send images without user consent, use secure connections, and perhaps provide an offline mode with on-device models for sensitive scenarios). Also, the glasses have an LED that lights up when camera is in use as a privacy feature (to alert people around the user). Encourage users to be mindful of where they point the camera.

  • Q: Can I use a different AI model (e.g., my own ML model) for the description?

    A: Absolutely. The beauty of this architecture is that the AI processing is decoupled from the glasses. You can use any image-to-text model – from cloud APIs to open-source models like BLIP or LLaVA. Meta’s toolkit doesn’t provide a scene description API out of the box (at least not in this preview), so you bring your own. For faster results and offline use, you might use an on-device model optimized for mobile. For more detailed captions, an online service might perform better. You could even offer a choice in settings. Just ensure the model can be invoked via your app (for on-device, you might use Core ML or TFLite; for cloud, an HTTPS API).

  • Q: What about other features like object recognition or text reading?

    A: Once you have the glasses camera feed in your app, you can implement numerous computer vision features. Scene description is just one. You could integrate OCR to read text (for example, using Google ML Kit or Tesseract for on-device OCR), or detect specific objects (perhaps using a custom classifier or detection model). The Meta glasses essentially become a versatile remote camera for your app’s AI. Many assistive apps combine features: reading documents, identifying products, finding people, etc. The provided sample and SDK focus on camera access; it’s up to you to layer the right AI on top. Keep in mind performance and privacy when adding more features.


11) SEO Title Options

  • “How to Get Access to Meta’s Smart Glasses SDK and Run Your First App (iOS & Android)” – A guide on enrolling in the Meta wearables preview and trying the sample app.
  • “Integrate AI Glasses Scene Description into Your Mobile App: Step-by-Step Tutorial” – Learn how to add smart glasses camera and AI scene narration to an existing app.
  • “How to Trigger Photo Capture from Ray-Ban Meta Glasses in Your App (Hands-Free Camera)” – Developer guide to controlling the Ray-Ban Meta glasses camera from a mobile app.
  • “Smart Glasses Accessibility App Troubleshooting: Pairing, Permissions, and Common Errors” – Tips for resolving issues when building apps with the Meta Wearables toolkit.

12) Changelog

  • 2026-01-01 — Verified on Wearables SDK v0.3.0 (developer preview), iOS 17.2 (Xcode 15.2), Android 14 (API 34) with Ray-Ban Meta (Gen 2) hardware and Mock Device Kit. All steps and code updated for latest toolkit APIs.