How to Build a "What Am I Looking At?" Feature with Object Detection or Segmentation on Mobile
- Author: Almaz Khalilov
TL;DR
- You’ll build: A mobile app feature that identifies what the user sees through a wearable camera, by detecting objects in view and optionally highlighting them.
- You’ll do: Get access → Install SDK → Run sample → Integrate into your app → Test on device/mock
- You’ll need: Meta preview account, Ray-Ban Meta glasses (or Mock kit), Xcode 15 / Android Studio 2022+
1) What is this Feature?
"What am I looking at?" is a computer vision feature that tells users about the objects in front of them. It leverages on-device vision techniques – primarily object detection and image segmentation – to recognize and locate real-world objects through a camera (like AR glasses or a phone).
What it enables
- Identify objects in real time: The app can detect multiple objects in the camera feed and label them (e.g. "cat", "coffee mug", "street sign") as the user looks around. This gives immediate context about the user's environment.
- Highlight precise object regions: Using segmentation, the app can outline the exact shape of an object a user is looking at with pixel-level outlines. For example, it could mask out a product on a shelf or separate a person from the background for AR effects.
- Hands-free assistance: Combined with wearable cameras and audio, it enables experiences like describing surroundings to a visually impaired user or providing info on landmarks the user looks at. The feature can speak object names via the glasses' speakers for a fully hands-free experience.
When to use it
- Use object detection if… you need to quickly identify and localize several items with minimal compute. Bounding boxes are fast and efficient, giving coarse positions of objects which is often enough for counting, tracking, or simple alerts. For example, an app can use detection to name what's in front of the user or to trigger an action when a certain object appears.
- Use image segmentation if… your use-case demands pixel-perfect detail of the scene. Segmentation "wins" when precise shapes or area measurements are required. This is ideal for overlaying AR content behind objects (occlusion), measuring object size, or any scenario where understanding the exact boundaries of an object is critical (e.g. guiding a robot or performing medical image analysis). Segmentation can also separate foreground from background for creative effects (blurring everything except what the user focuses on).
- Often, a combination is best: detect objects first to get class labels, then apply segmentation on a particular object of interest for fine detail. In practice, object detection provides a quick list of "what's there," and segmentation can refine "where exactly it is" if needed.
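To make that trade-off concrete, here is a minimal Kotlin sketch of the two result shapes an app typically works with; the types and helper are illustrative placeholders, not part of any Meta SDK.

```kotlin
// Hypothetical result types -- illustrative only, not from the Wearables SDK.
data class BoundingBox(val left: Float, val top: Float, val right: Float, val bottom: Float)

// Object detection: a coarse box plus a class label per object.
data class Detection(val label: String, val confidence: Float, val box: BoundingBox)

// Segmentation: a per-pixel mask (e.g. 0 = background, 1 = object) at image resolution.
data class SegmentationMask(val width: Int, val height: Int, val mask: ByteArray)

// Detection alone is often enough to answer "what's there?" in plain language.
fun describeScene(detections: List<Detection>): String =
    if (detections.isEmpty()) "I'm not sure what that is."
    else "I see: " + detections.joinToString { "${it.label} (${(it.confidence * 100).toInt()}%)" }
```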
Current limitations
- Device and SDK constraints: This feature currently relies on the Meta Wearables Device Access Toolkit (SDK) which is in developer preview. Only Meta's supported smart glasses (e.g. Ray-Ban Meta and Oakley Meta HSTN) are accessible in this preview. You must be in a supported country and accepted to the preview program to use the SDK. Moreover, apps built with this SDK cannot yet be shipped publicly (general release is targeted for 2026).
- Recognition scope: The vision models (object detectors/segmenters) have a fixed set of classes they can recognize. If an object isn't in the model's training classes, the app might not identify it or could label it incorrectly. For example, a generic COCO dataset model might label a novel gadget as "unknown" or as the nearest known category. This feature is not magic – it answers "What am I looking at?" only for the categories it was trained on.
- Performance considerations: Image segmentation is more computationally heavy than detection. Real-time segmentation on a mobile device or wearable can be slow or energy-intensive, especially with high-resolution glasses imagery. Developers may need to downsample frames or run models on device neural accelerators. Similarly, streaming video from glasses has limits (720p at 30fps max) due to Bluetooth bandwidth. Expect some latency between capture and identification.
- Permissions and privacy: Accessing a wearable camera requires user consent at multiple levels. The glasses must be paired and the user must allow your app to use them via the Meta AI companion app. Your app also needs Bluetooth permissions to communicate with the device. Additionally, capturing imagery of surroundings raises privacy concerns – apps should handle data securely and respect bystanders' privacy (no unauthorized recording).
2) Prerequisites
Before diving in, ensure you have the necessary accounts, hardware (or simulators), and development tools.
Access requirements
- Meta developer account: Create or log in to the Meta Wearables Developer Center. You'll need to apply for the Wearables Device Access Toolkit preview. Access is currently limited – only approved developers in supported regions can fully use the toolkit.
- Join the preview program: If required, join the appropriate developer organization/team on the portal and agree to any preview NDA or terms. Enable any beta features for the wearables SDK if prompted.
- Create a project/app ID: Once accepted, set up a project in the Wearables Developer Center. Define an App Name/ID (bundle identifier for iOS or package name for Android) that will integrate the glasses. This registers your app with the service.
- Accept terms and download credentials: You may need to accept the Meta Wearables Developer Terms and Acceptable Use Policy. In the portal, create a release channel for testing and add your test users. Download any provided config or keys. For iOS: there is no separate API key, but ensure your app's bundle ID matches the project. For Android: you will need a GitHub personal access token to fetch the SDK packages (details below).
Platform setup
iOS
- Xcode 15+ with iOS 15.2 or later SDK (the minimum OS for Meta glasses support is iOS 15.2). Ensure Swift and Swift Package Manager are updated (the Wearables SDK uses Swift Package distribution).
- CocoaPods (optional): If you prefer CocoaPods or if some sample dependencies use it, have it installed. However, the Meta Wearables SDK for iOS is easily added via Swift Package Manager.
- Physical iPhone (required): Use an actual iPhone or iPad for testing. The glasses connect via Bluetooth to the device, which is not possible to simulate in Xcode Simulator. Testing on a real device is strongly recommended.
Android
- Android Studio (Arctic Fox+) with Android SDK API 29+ (Android 10). Ensure Gradle is up to date. The Wearables SDK supports Android 10 and above.
- Kotlin 1.5+: The sample and integration will likely use Kotlin (Java is also usable, but the SDK docs use Kotlin examples). Set your project's language level appropriately.
- Physical Android phone (recommended): Use a real Android device running Android 10 or higher. While you can compile on an emulator, you cannot pair or stream from physical glasses to an emulator. Bluetooth and camera streaming require a real phone. (If using the Mock Device Kit, an emulator might simulate some behavior, but a real device gives the best test.)
Hardware or mock
- Meta smart glasses (recommended): A pair of Ray-Ban Meta AI glasses or Oakley Meta glasses is needed to fully test the feature on hardware. You should set up the glasses normally with the Meta AI companion app first.
- Mock Device Kit (optional): If you don't have the hardware, the Meta SDK provides a Mock Device Kit to simulate a glasses device. This allows basic testing of your integration on device or emulator. You can pair a "virtual device" and simulate its camera stream and state.
- Bluetooth enabled: Make sure the smartphone's Bluetooth is turned on. Understand the platform's permission prompts for Bluetooth (e.g., iOS will require user permission via a usage description, Android needs runtime permission on Android 12+). Without Bluetooth, your app cannot communicate with the wearable.
- Meta AI app installed: Install the Meta AI companion app (the official app for the glasses) on your test phone. The SDK relies on this app to bridge connections and permissions to the glasses. Log in to the Meta AI app with the same account that has developer access, and pair it with your glasses.
3) Get Access to the Wearables SDK
Now that prerequisites are set, let's obtain the SDK and permissions for using the glasses:
- Go to the Wearables Developer Center: Visit the Meta Wearables Developer Center and sign in. In the dashboard, locate the Device Access Toolkit section.
- Request preview access: Follow the prompts to apply for the Wearables Device Access Toolkit developer preview (if you haven't already). You might need to fill out a form about your intended use cases. Approval could take time; ensure your Meta account email is verified.
- Accept the agreement: Once approved, you'll see the toolkit available in your account. Accept any Terms of Service for the preview. Meta will require agreement to specific policies for using the glasses SDK.
- Create a project: In the developer center, create a new Wearables project for your app. You will define an app name and platform (iOS, Android, or both). Note the App ID or any GUID it provides. This project ties your mobile app to the glasses permissions.
- Set up organization and testers: If not done, create or join an organization on the portal (this is used to manage who can test your integration). Inside your project, create a release channel (e.g., "Internal Test") and add yourself or testers to it. This will allow the Meta AI app to recognize your app as allowed to connect.
- Download credentials/config: The current SDK doesn't require an API key file for on-device use, but you may need certain identifiers:
- iOS: No config file is needed to download. Ensure your Xcode project's bundle ID exactly matches the App ID from the developer center, and that your provisioning profile includes associated domains if specified by Meta (check the docs; as of now, connection is handled via the Meta AI app and Bluetooth, so no special entitlements file is required).
- Android: No JSON config file is required. However, you will need a GitHub Personal Access Token to fetch the SDK packages (since the Android SDK is hosted on GitHub Packages). Generate a classic token with the read:packages scope from GitHub and note it for later. Also ensure your app's applicationId matches what you registered.
- Meta AI app linking: Open the Meta AI companion app on your phone. In settings, enable Developer Mode for your glasses if available. This might involve entering a code from the dev center into the app. Developer Mode ensures the glasses can stream to test apps. Also, after installing your test app, the Meta AI app should prompt to "Allow <YourApp> to access glasses" – you'll handle that during testing.
Done when: you have the Wearables SDK available (via Swift Package or Gradle), your Meta developer project is configured, and you possess any tokens or IDs needed. You should see your created app/project listed in the Wearables portal, and your glasses should be visible/linked in the Meta AI app ready for developer use.
4) Quickstart A — Run the Sample App (iOS)
Goal
Run the official iOS sample app to verify that your iPhone can connect to the glasses and capture a camera feed. This will confirm the SDK and device link are working, before you add object recognition logic.
Step 1 — Get the sample
- Option 1: Clone the repo. Clone the official Meta Wearables SDK repository:

      git clone https://github.com/facebook/meta-wearables-dat-ios.git

  Then open the Xcode project found under samples/CameraAccess in the repository.
- Option 2: Download from Developer Center. On the Wearables Developer Center, find the iOS sample app link (if provided as a zip or Xcode project) and download it. Open the project in Xcode.
The sample app "CameraAccess" demonstrates connecting to glasses and streaming or taking a photo.
Step 2 — Install dependencies
The sample uses the Wearables SDK package. If it's not already configured, add it:
- Swift Package Manager: In Xcode, go to File → Add Packages.... Enter the package URL https://github.com/facebook/meta-wearables-dat-ios (or select it if Xcode suggests it). Choose the latest version (e.g. 0.x.y) and add the package to the sample app target. This will download the SDK. (The sample project may have a Package.swift or Swift Package reference already – if so, Xcode should prompt to resolve it. You might just need to open the .xcworkspace if CocoaPods was used.)
- CocoaPods (if needed): If the sample came with a Podfile instead, run pod install to fetch dependencies. (The Meta SDK might not be on CocoaPods yet; SPM is the primary method.)
After this step, ensure the project builds without missing package errors. The Wearables SDK includes modules for core connection and camera control.
Step 3 — Configure app
Before running, adjust a few settings in the sample:
- Bundle ID: In Xcode, select the project target → Signing & Capabilities. Change the Bundle Identifier to the one you registered in the developer portal (e.g. com.yourname.WhatAmILookingAt). Ensure a valid Team/Provisioning Profile is set so you can run on device.
- Info.plist permissions: Verify the sample's Info.plist has an NSBluetoothAlwaysUsageDescription key with a usage message (if it's missing, add it). For example: "This app uses Bluetooth to connect to AR glasses." This ensures iOS will prompt for Bluetooth access. (No camera/microphone description is needed for glasses usage.)
- Capabilities: If the sample uses Background modes (for Bluetooth), check that Background Fetch or Uses Bluetooth LE accessories are enabled if required. (In many cases, not needed unless you want the connection to persist when app is in background.)
- Meta config: If the documentation or sample code requires an organization or project ID, set those. For example, some sample code might have placeholders like let ORG_ID = "your-org-id" or similar. Fill them with values from the developer center if applicable.
Step 4 — Run
- Select the target device: In Xcode's toolbar, select your iPhone as the run destination (connect your device via USB or network and ensure it's trusted).
- Build & Run: Click Run (▲). Xcode will build the app and install it on your iPhone. If asked for codesign permissions, approve. Watch for the app launching on your phone.
- Allow Bluetooth: On first launch, you should get an iOS permission prompt: "App wants to use Bluetooth." Accept this, or the app won't be able to find the glasses.
The app's UI should appear, likely with a connect button or similar.
Step 5 — Connect to wearable/mock
Now pair the app with the glasses (or mock):
- Pair with glasses: Make sure your Ray-Ban Meta glasses are powered on and connected to the Meta AI companion app on the phone. In the sample app, tap the Connect button (or whatever UI initiates connection). The SDK will hand off to the Meta AI app to handle pairing. You might see a system dialog or the Meta AI app UI asking "Allow this app to access glasses camera?" – confirm it. Once allowed, the sample app should indicate it's connected.
- Use Mock mode (if no hardware): The sample may allow adding a Mock Device. Ensure you added the mwdat-mockdevice module in Step 2 if needed. In the app, you might see an option to connect to a virtual device, which will simulate a camera. (You may have to enable Developer Mode in the Meta AI app and add a mock device via the portal.)
- Grant any other permissions: If the sample triggers a camera or photo library permission (unlikely for this scenario), grant those as well. Primarily, Bluetooth permission is the key one. If using mock, allow any prompts that appear.
Verify
- Connected status: The app should update to show it's connected to the glasses (e.g. a status label "Glasses Connected" or a green indicator).
- Camera streaming or capture works: Try the feature. If it's a live view, you should see video from the glasses on your phone screen. If it's a capture button, press it — a photo taken through the glasses' camera should appear in the app. This confirms the app can receive frames from the wearable.
- (No object labels yet): The stock sample likely doesn't perform object recognition on the images — it just displays them. That's okay; our concern here is that the pipeline from glasses to app is working.
Common issues
- Build error ("No such module" or code sign failure): If Xcode complains it cannot find
MetaWearablesDATmodule, ensure the Swift Package was added properly and try File → Packages → Resolve Package. For code signing issues, make sure your personal team is selected and the bundle ID is unique (Apple doesn't allow duplicates). - App can't find glasses: If the sample app says "No device found" or timeouts when connecting, check that your glasses are connected in the Meta AI app first. The Meta AI app must be running (in background is fine) and the glasses must be paired to the phone's Bluetooth. Also verify your app is listed as a tester in the portal (otherwise the Meta AI app will refuse the connection). If all else fails, toggle Bluetooth off/on and reboot the glasses, then try again.
- Permission denied: If you declined the Bluetooth permission initially, the app won't connect. iOS doesn't allow prompting twice quickly; you'll need to go to Settings → YourApp → Bluetooth and enable it, then relaunch the app. Similarly, if the Meta AI app prompt to authorize your app was denied, you may need to remove the app and reinstall (or find the setting in Meta AI app to grant permission).
- Meta AI app not responding: Sometimes the handoff might not occur if the Meta AI app isn't running in background. Launch the Meta AI app manually, ensure the glasses are connected there, then switch back to your test app and attempt connect again.
5) Quickstart B — Run the Sample App (Android)
Goal
Run the official Android sample app to verify that an Android phone can connect to the glasses and stream imagery. This will mirror the iOS quickstart on Android Studio.
Step 1 — Get the sample
- Clone the repo: Clone the Android SDK repository:

      git clone https://github.com/facebook/meta-wearables-dat-android.git

  In Android Studio, choose File → Open and open the samples/CameraAccess project within the cloned repository.
- (If provided via portal:) Alternatively, download any Android sample app zip from the developer center and open it in Android Studio.
The sample project will contain an Android app module demonstrating core functions (connecting to glasses, starting a camera session).
Step 2 — Configure dependencies
The Android SDK is distributed via GitHub Packages, which requires authentication:
- Add Maven repository: In the project's settings.gradle (or Gradle settings), add the GitHub Packages Maven URL. The sample's README provides the snippet – essentially:

      maven {
          url = uri("https://maven.pkg.github.com/facebook/meta-wearables-dat-android")
          credentials {
              username = "" // not needed
              password = System.getenv("GITHUB_TOKEN") ?: localProperties.getProperty("github_token")
          }
      }

  Make sure your local.properties contains github_token=<YOUR_GITHUB_PAT> with the token you created. This allows Gradle to download the SDK. (If using the sample repo, this may be pre-set; just supply the token.)
- Add SDK dependencies: In the app module's build.gradle.kts, add the Wearables SDK libraries. For example, inside dependencies add:

      implementation("com.meta.wearable:mwdat-core:0.3.0")
      implementation("com.meta.wearable:mwdat-camera:0.3.0")
      implementation("com.meta.wearable:mwdat-mockdevice:0.3.0")

  Use the latest version from the GitHub repo tags/releases. The mwdat-core module provides the base connection, mwdat-camera handles camera control, and mwdat-mockdevice adds simulation support.
- Sync Gradle: Click "Sync Project" in Android Studio. Gradle will fetch the packages. If it prompts for credentials, double-check the token setup. A successful sync means the SDK is integrated.
Step 3 — Configure app
Adjust the sample app configuration:
- Application ID: Open app/build.gradle and ensure the applicationId matches what you registered (e.g. "com.yourcompany.whatami"). If you change it, also update the package name in the AndroidManifest and refactor any imports as needed.
- AndroidManifest entries: Add required permissions and features:
  - In AndroidManifest.xml, include:

        <uses-permission android:name="android.permission.BLUETOOTH" />
        <uses-permission android:name="android.permission.BLUETOOTH_ADMIN" />
        <!-- For Android 12+ -->
        <uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />
        <uses-permission android:name="android.permission.BLUETOOTH_SCAN" />

    A uses-permission entry for CAMERA is not needed (we are not using the phone's camera). If using microphone or audio, add RECORD_AUDIO as appropriate.
  - If the sample manifest has a placeholder meta-data entry like an app ID or client key, insert the correct value from the portal. (For this SDK, likely not needed; the Meta AI app handles linking by package name.)
  - Optionally, declare the Bluetooth LE feature, which just indicates the app expects BLE support:

        <uses-feature android:name="android.hardware.bluetooth_le" android:required="true"/>

- Gradle settings: Make sure minSdk is 29 or higher (as required by the SDK). If your test device is Android 13+ and you want to suppress the notification permission prompt (our app likely doesn't need notifications), you can also add this line to the AndroidManifest:

      <uses-permission android:name="android.permission.POST_NOTIFICATIONS" tools:node="remove"/>

- Build configuration: Ensure the app is set to use Debug signing (Android Studio will handle it with a debug keystore by default). No special signing config is needed for testing on device.
Step 4 — Run
- Select run configuration: In Android Studio, select the app module run configuration.
- Choose a device: Connect your Android phone via USB (enable USB debugging and authorize the PC). Select that device from the device dropdown. (Ensure the device has the Meta AI app installed and glasses paired from earlier steps.)
- Run the app: Click Run ▶️. The app will compile and install on the phone. Watch the Android logcat for any errors during startup.
When the app launches, it should show a simple UI (perhaps a "Connect" button and a viewfinder or placeholder for the camera stream).
Step 5 — Connect to wearable/mock
- Connect to glasses: On your Android phone, ensure Bluetooth is on and the glasses are paired (they should appear as a connected device in the Meta AI app). In the sample app, tap Connect (or similar). The Wearables SDK will likely pop up a system dialog via the Meta AI app asking for permission. Accept any prompt saying "Allow this app to access camera on your glasses". Upon success, the sample app will connect to the glasses.
- Grant Bluetooth permissions: If your phone runs Android 12 or higher, you will see a dialog: "Allow YourApp to find, connect to, and determine position of nearby devices?" for Bluetooth. Choose Allow. (This corresponds to the BLUETOOTH_CONNECT/SCAN permissions.)
- Mock device use: If you have no physical glasses, ensure you included mwdat-mockdevice and that the sample has a developer option to add a mock. The process might involve enabling Developer Mode in the Meta AI app; the sample may then offer a "Connect Mock Device" option which simulates a camera feed. Follow Meta's docs for using the Mock Device Kit on Android (usually, you pair to a virtual device shown in the developer center, and the SDK connects as if it were hardware).
- Pairing troubleshooting: The Meta AI companion app must be installed. On Android, the Wearables SDK might open the Meta AI app or run an internal service to link. If nothing happens, open the Meta AI app manually, then try again.
Verify
- App is connected: The sample app should indicate a successful connection (e.g., a status text or toast "Connected to Glasses"). In logcat, look for logs confirming connection establishment.
- Image data is received: If it's a streaming sample, you might see live video from the glasses in the app UI. If it's on-demand capture, try pressing the capture button and see if an image is taken and displayed. You should see the world through the glasses on your phone screen, confirming the camera frames are coming through.
- No crashes or hangs: The app should remain responsive while streaming. If you disconnect or turn off glasses, the app should handle it (perhaps showing "disconnected" status).
Common issues
- Gradle token error (401 Unauthorized): If the build sync failed with authentication errors, the GitHub Packages token wasn't configured properly. Double-check that github_token in local.properties is correct and that you added the Maven repo in settings.gradle. Also ensure the token has the correct scopes. Then sync again.
- Manifest merger conflict: If you added permissions or the SDK has its own manifest entries, you might see merge warnings. Resolve by ensuring no duplicate entries, or using tools:node="replace" if needed. Typically, just having the permissions once is enough.
- Cannot find glasses device: If the app keeps waiting for connection or says no device, confirm that the Meta AI app is running on the phone and the glasses are connected to it. The Meta AI app on Android might need to be opened at least once after installation to initialize permissions. Also verify your app's package name is registered and added to the test channel (the Meta AI app only permits known dev apps).
- Connection timeout: Bluetooth can be finicky. If connection fails, try toggling Bluetooth, or "forget" the glasses and re-pair in the Meta AI app. Also, ensure only one phone is trying to connect – glasses can typically only pair to one device at a time.
- App not listed in Meta AI: On Android, if your test app isn't recognized, you might not get the permission prompt. Ensure the account logged in on the Meta AI app is the developer account that has your app in its org. If using a different phone or account, add that account as a tester in the dev center.
6) Integration Guide — Add Object Recognition to Your Mobile App
Goal
Now that basic connectivity works, integrate the Wearables SDK and a vision model into your own app. We'll set up the architecture so your app can capture images from the glasses and run an on-device AI model (object detector or segmenter) to identify what's in view, then display the results to the user.
Architecture
At a high level, the flow will be:
App UI → WearablesClient (SDK wrapper) → Glasses camera → image frames → Vision Model (object detection/segmentation on device or cloud) → results → UI update/storage.
In practice, the Meta glasses act as a sensor, streaming images to the phone. Your app uses the SDK to grab those images and then can feed them into a neural network (e.g., a TensorFlow Lite or Core ML model) to analyze what the user is looking at. Finally, you present the object names or other info on screen (or via audio feedback).
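For the "Vision Model" stage of that flow, one common on-device option on Android is ML Kit's object detector (a TFLite or custom model would slot into the same place). The sketch below is a minimal example assuming a Bitmap handed over by your glasses wrapper; it is not part of the Wearables SDK.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

// One possible VisionService backend: ML Kit's on-device object detector.
// The Bitmap is assumed to come from your glasses wrapper (e.g. a captured frame).
fun detectObjects(bitmap: Bitmap, onResult: (List<String>) -> Unit) {
    val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
        .enableMultipleObjects()
        .enableClassification() // coarse labels; swap in a custom model for finer classes
        .build()
    val detector = ObjectDetection.getClient(options)

    detector.process(InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0))
        .addOnSuccessListener { objects ->
            // Flatten to label strings; fall back to "object" when classification is unsure.
            val labels = objects.map { it.labels.firstOrNull()?.text ?: "object" }
            onResult(labels)
        }
        .addOnFailureListener { onResult(emptyList()) }
}
```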
Step 1 — Install the SDK
Integrate the Wearables SDK into your app project (if you haven't already during sample testing):
iOS
- Swift Package Manager: In your app's Xcode project, add the package facebook/meta-wearables-dat-ios (as in Quickstart A). Select the latest version tag. This will give you access to frameworks like MetaWearablesDAT (the core SDK). Alternatively, if Meta ever provides a CocoaPod, you could add pod 'MetaWearablesDAT', but currently SPM is the way.
- Link frameworks: After adding, check that the package's frameworks are in the "Frameworks, Libraries, and Embedded Content" for your app target. Xcode should handle this automatically. No additional iOS system frameworks are needed besides the defaults (CoreBluetooth etc., which are used internally).
Android
- Gradle dependency: Add the Maven repo and dependencies as done in Quickstart B. In your app's build.gradle:

      repositories {
          maven {
              url "https://maven.pkg.github.com/facebook/meta-wearables-dat-android"
              credentials { ... }
          }
          google()
          mavenCentral()
      }

      dependencies {
          implementation "com.meta.wearable:mwdat-core:0.3.0"
          implementation "com.meta.wearable:mwdat-camera:0.3.0"
          implementation "com.meta.wearable:mwdat-mockdevice:0.3.0" // optional, for testing
      }

  (Use the latest version available.) This brings the SDK into your app. After syncing, you can import the SDK classes in Kotlin/Java.
Step 2 — Add permissions
Integrating means your app will need to declare the proper permissions to use the glasses and possibly run the vision model:
iOS (Info.plist)
- Add NSBluetoothAlwaysUsageDescription with a user-facing reason, if not already present. e.g., "Allow Bluetooth access to connect to your AR glasses."
- (If your app will use microphone on glasses for voice commands, also add NSMicrophoneUsageDescription. For simply receiving images and showing results, mic may not be needed.)
- You do not need NSCameraUsageDescription to receive the glasses camera feed (since you're not using the phone's built-in camera). The glasses feed is authorized through the Meta AI app instead.
- No special entitlements or background modes are required unless you plan background operation. (If so, enable the Bluetooth LE Accessory background mode so the connection can stay alive when the app goes background.)
Android (AndroidManifest.xml)
- Declare Bluetooth permissions as discussed:

      <uses-permission android:name="android.permission.BLUETOOTH" />
      <uses-permission android:name="android.permission.BLUETOOTH_ADMIN" />
      <uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />
      <uses-permission android:name="android.permission.BLUETOOTH_SCAN" />

  For Android 12+, you'll also request BLUETOOTH_CONNECT (and SCAN if needed) at runtime — see the sketch after this list.
- If your object recognition model will run on-device, no additional permissions are needed for that (unless you load a model from external storage or use the phone camera – which we aren't). The images come from the glasses via the SDK, not from user storage.
- If using Mock Device in development, no extra permission beyond Bluetooth is needed (the mock feed is delivered through the SDK).
- (Optional) If you plan to use text-to-speech or voice input in your feature, remember to include android.permission.RECORD_AUDIO (for voice input) or use the TTS API (which requires no permission).
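For the runtime side of those Bluetooth permissions, here is a minimal Android sketch using the standard ActivityResult API; the class name and callback handling are illustrative.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.os.Build
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

// Construct this in onCreate(); registerForActivityResult must be called before the Activity starts.
class PermissionsHelper(private val activity: AppCompatActivity) {

    // Android 12+ uses the BLUETOOTH_CONNECT / BLUETOOTH_SCAN runtime permissions.
    private val requiredPermissions =
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S)
            arrayOf(Manifest.permission.BLUETOOTH_CONNECT, Manifest.permission.BLUETOOTH_SCAN)
        else emptyArray() // pre-12: the manifest-declared BLUETOOTH permissions are enough

    private val launcher = activity.registerForActivityResult(
        ActivityResultContracts.RequestMultiplePermissions()
    ) { results -> /* inspect results; disable the Connect button if anything was denied */ }

    fun hasPermissions(): Boolean = requiredPermissions.all {
        ContextCompat.checkSelfPermission(activity, it) == PackageManager.PERMISSION_GRANTED
    }

    fun requestPermissions() {
        if (!hasPermissions()) launcher.launch(requiredPermissions)
    }
}
```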
Step 3 — Create a thin client wrapper
To keep your code clean, wrap the SDK interactions and the vision AI in a few manager classes:
- WearablesClient (e.g. WearablesClient.swift or WearablesClient.kt): This class will manage connecting to and disconnecting from the glasses, and listening for image frames. Using the SDK, it can expose callbacks like onPhotoReceived(Bitmap/UIImage). It should handle the asynchronous nature of connecting (perhaps via a delegate or LiveData/Flow for connection status).
- VisionService (e.g. ObjectRecognitionService): Encapsulate your object detection/segmentation logic. For instance, it can load a Core ML model or TFLite model at startup, and provide a method detectObjects(in image) -> [String] (returning object labels or a structured result). Having this separate makes it easy to swap models or call cloud APIs if needed.
- PermissionsService: (Optional) A utility to check and request permissions (Bluetooth, etc.) in a user-friendly way. On iOS, you might trigger the Bluetooth permission and handle the callback; on Android, use ActivityCompat.requestPermissions for Bluetooth if not already granted.
By organizing code this way, your ViewController/Activity can remain thin, simply orchestrating calls between the UI and these services.
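A minimal Kotlin sketch of those seams might look like the following; the interface names and signatures are hypothetical placeholders that you would back with the actual Wearables SDK calls and your chosen model.

```kotlin
import android.graphics.Bitmap

// Hypothetical seams -- back these with the real SDK calls and your chosen model.
interface WearablesClient {
    val isConnected: Boolean
    fun connect(onStateChange: (connected: Boolean) -> Unit)
    fun disconnect()
    fun capturePhoto(onResult: (Result<Bitmap>) -> Unit)
}

interface VisionService {
    // Runs the detector/segmenter on a background thread; returns human-readable labels.
    fun detectObjects(image: Bitmap): List<String>
}
```

Keeping the UI code dependent only on these interfaces means you can swap the real glasses for the Mock Device Kit, or swap an ML Kit detector for a TFLite segmenter, without touching the screen logic.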
Definition of done:
- The SDK is initialized (e.g. any required setup calls are made when app launches or when user invokes the feature). For example, ensure the WearablesClient knows about the release channel or any config needed to identify your project.
- Connection lifecycle handled: The app can connect to the glasses when needed, reconnect if the connection drops (maybe with an automatic retry or a user "Reconnect" button), and disconnect gracefully (e.g. when the feature is turned off or app exits). No resource leaks or orphan Bluetooth sessions.
- Image pipeline ready: The app receives images (frames or photo captures) from the glasses through the SDK reliably.
- Vision processing integrated: When an image is received, it is immediately passed into the VisionService for analysis, and the results are obtained.
- User feedback & errors: Any errors (failure to connect, model load failure, etc.) are caught and either shown to the user in an appropriate message or logged. For example, if connection fails due to a timeout, your WearablesClient can call a delegate like onError("Cannot connect to glasses, please check Bluetooth."). Likewise, if the detection model fails to initialize, log it and inform the user that recognition is unavailable.
Step 4 — Add a minimal UI screen
Design a simple interface to let the user invoke the feature and see results:
- "Connect Glasses" button: Allows the user to initiate the connection. This could toggle to "Disconnect" when already connected. It should reflect the current status (enabled only when appropriate).
- Status indicator: A small indicator (could be text like "Connected ✅" / "Disconnected ❌" or an icon) that shows whether the wearable is currently linked.
- "Identify" or Camera trigger button: A button labeled "What am I looking at?" or "Capture". When pressed, it triggers the capture of a frame (if not continuously streaming). In a streaming scenario, this might not be needed, but you might still have it to perform a one-time analysis.
- Results display: A view to show what was recognized. For object detection, this could be a scrolling list of identified object names with confidence scores. For segmentation, it could be an image view that overlays colored masks or boundaries on the captured photo. At minimum, show text like "I see: a cat, a cup, and a book." to the user. This could be in a UILabel/TextView or even spoken via TTS for accessibility.
- Thumbnail or camera view: It's often useful to display the last image the app analyzed (so the user knows what the system is answering about). You can show a small thumbnail of the captured frame, possibly with annotations (boxes or masks). This also reassures the user that the system captured the correct scene.
With this UI, the user can connect their glasses, tap a button to ask "What am I looking at?", and get a visual/textual answer.
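On Android, this screen can be sketched in a few lines of Jetpack Compose; the state parameters and wiring are illustrative, and an equivalent SwiftUI screen would mirror the same structure on iOS.

```kotlin
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

@Composable
fun WhatAmILookingAtScreen(
    connected: Boolean,          // from the WearablesClient connection state
    status: String,              // "Connected ✅" / "Capturing…" / "Done ✅"
    labels: List<String>,        // last recognition result
    onConnectClick: () -> Unit,
    onIdentifyClick: () -> Unit
) {
    Column(modifier = Modifier.padding(16.dp)) {
        Text(text = status)
        Button(onClick = onConnectClick) {
            Text(if (connected) "Disconnect" else "Connect Glasses")
        }
        Button(onClick = onIdentifyClick, enabled = connected) {
            Text("What am I looking at?")
        }
        if (labels.isNotEmpty()) {
            Text("I see: " + labels.joinToString())
        }
    }
}
```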
7) Feature Recipe — Recognize Objects from a Wearable Photo Capture
Let's walk through the core user story: the user presses a button in your app to identify what they're looking at. The app will capture a photo through the glasses, run object recognition on it, and present the answer.
Goal
When the user taps "Capture" (or asks via voice, if extended), the app will use the glasses to take a photo of the user's viewpoint. That image is then analyzed to detect objects, and the app outputs something like: "You're looking at a coffee mug on a table." Optionally, it could highlight the mug in the image or speak the result.
UX flow
- Ensure connection: The user's glasses are connected (if not, prompt them to connect first). The UI should indicate if not connected.
- Tap Capture: The user initiates the action by tapping the "What am I looking at?" button.
- Show progress: The app gives feedback (e.g., a loading spinner or "Capturing…" label) while it triggers the photo capture and waits for analysis. This might take a second or two (the image has to travel over Bluetooth, then the model has to run).
- Receive result: The app obtains the image and immediately runs the object detection/segmentation. Once the model finishes, it produces results (object labels and possibly their positions).
- Display output: The app updates the UI to show the identified object(s). For example, display text "📷: Cup (98% confidence)" and overlay a bounding box on the photo around the cup. Save the result (image and labels) if needed for later review. Clear the loading state.
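If you also want to speak the result (as the goal above suggests), Android's built-in TextToSpeech API needs no extra permission. A minimal sketch, with hypothetical naming:

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Speaks the recognition result aloud, e.g. "You're looking at a coffee mug and a book."
class ResultSpeaker(context: Context) : TextToSpeech.OnInitListener {
    private val tts = TextToSpeech(context, this)
    private var ready = false

    override fun onInit(status: Int) {
        ready = status == TextToSpeech.SUCCESS
        if (ready) tts.language = Locale.US
    }

    fun speak(labels: List<String>) {
        if (!ready || labels.isEmpty()) return
        val sentence = "You're looking at " + labels.joinToString(" and ")
        tts.speak(sentence, TextToSpeech.QUEUE_FLUSH, null, "what-am-i-looking-at")
    }

    fun shutdown() = tts.shutdown()
}
```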
Implementation checklist
- Connection verified: Before capturing, check WearablesClient.isConnected. If false, either auto-connect or prompt "Please connect your glasses first." This prevents a bad call.
- Permissions verified: Ensure Bluetooth (and any others) are granted. On Android, this might mean calling checkSelfPermission(BLUETOOTH_CONNECT) and requesting if not. On iOS, by the time the glasses are connected, permission was already handled.
- Capture request issued: Call the appropriate SDK method to take a photo. For example, the SDK might have takePhoto() or you might start a stream and grab a frame. The Wearables SDK likely provides an API to capture a still image from the glasses' camera.
- Timeout & retry: Implement a timeout for the capture in case something hangs (e.g., if the glasses are unresponsive). For instance, if no image arrives in, say, 5 seconds, cancel the request and show an error. Also handle failures by offering a retry ("Capture failed, try again").
- Process result: When the image is received, pass it to your vision model (VisionService). Make sure this happens off the main thread (use a background queue or coroutine) because ML inference can be heavy. Once done, update the UI on the main thread with the findings.
- Persist and update UI: Save the photo and recognition result if your app needs to (e.g., to a gallery or history log). Update the UI elements: set the thumbnail image, populate a list of detected objects or draw overlays. Then display a success state (e.g., a small checkmark or toast "Saved ✅").
Pseudocode
Here's a simplified pseudo-code integrating these steps:
func onCaptureButtonTapped() {
guard wearablesClient.isConnected else {
showAlert("Please connect your glasses first.")
return
}
if !permissionsService.checkAllPermissions() {
permissionsService.requestPermissions()
return
}
statusLabel.text = "Capturing…"
wearablesClient.capturePhoto { result in
switch result {
case .success(let image):
DispatchQueue.global().async {
let labels = visionService.detectObjects(in: image)
DispatchQueue.main.async {
imageView.image = image
labelsView.text = "I see: " + labels.joined(separator: ", ")
statusLabel.text = "Done ✅"
}
saveImage(image, with: labels)
}
case .failure(let error):
DispatchQueue.main.async {
statusLabel.text = "Capture failed 😢"
log("Capture error: \\(error)")
}
}
}
}
fun onCaptureButtonClicked() {
if (!wearablesClient.isConnected) {
Toast.makeText(ctx, "Connect your glasses first", Toast.LENGTH_SHORT).show()
return
}
if (!permissionsService.hasPermissions()) {
permissionsService.requestPermissions(activity)
return
}
statusText.text = "Capturing…"
wearablesClient.capturePhoto { result ->
if (result.isSuccess) {
val bitmap = result.getOrNull()!!
// Run object detection in background
CoroutineScope(Dispatchers.IO).launch {
val labels = visionService.detectObjects(bitmap)
withContext(Dispatchers.Main) {
imageView.setImageBitmap(bitmap)
labelsText.text = "I see: ${labels.joinToString()}"
statusText.text = "Done ✅"
}
saveImage(bitmap, labels)
}
} else {
statusText.text = "Capture failed 😕"
Log.e("App", "Capture error", result.exceptionOrNull())
}
}
}
(Pseudo-code assumes capturePhoto is an async API provided by the SDK. In reality it might involve registering a listener for a frame.)
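The "Timeout & retry" item from the checklist can be implemented by bridging the callback into a coroutine and bounding it with withTimeoutOrNull. This sketch assumes the hypothetical WearablesClient wrapper and capturePhoto callback from the integration guide, not a real SDK signature.

```kotlin
import android.graphics.Bitmap
import kotlinx.coroutines.suspendCancellableCoroutine
import kotlinx.coroutines.withTimeoutOrNull
import kotlin.coroutines.resume

// Bridges the callback-style capture into a suspend call, bounded by a timeout.
suspend fun captureWithTimeout(
    client: WearablesClient,   // hypothetical wrapper from the integration guide
    timeoutMs: Long = 5_000
): Bitmap? = withTimeoutOrNull(timeoutMs) {
    suspendCancellableCoroutine { cont ->
        client.capturePhoto { result ->
            cont.resume(result.getOrNull()) // null on failure; caller shows a retry option
        }
    }
}
```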
Troubleshooting
- Capture returns empty: If the image comes back blank or null, check the flow. Did the glasses' shutter actually fire? Sometimes the Meta AI app might not grant access if your app isn't set up correctly – ensure your app is added as a trusted tester in the portal. Also verify the glasses' camera isn't already in use by another app. Look at logs from the SDK; you might need to call a different API for grabbing a frame. If the problem persists, attempt a reconnect sequence before capture.
- Capture hangs or times out: If capturePhoto() never invokes its callback, it could be a Bluetooth issue. Implement a timeout: e.g., if there is no response in 5s, cancel and inform the user. On repeated hangs, try restarting the glasses. It's also possible the glasses went to sleep – the Meta AI app might need to wake them. Ensure the glasses have adequate battery. As a fallback, you could use the Mock Device to isolate whether the issue is hardware.
- "Instant display" expectation: Users might expect immediate results. To manage this:
- Show a progress indicator during the analysis. Even 1-2 seconds can feel long, so use a spinner or an animation ("🔎 Analyzing…").
- If doing segmentation which could be slower, consider showing a low-res preview quickly, then refining. Or show partial results (e.g., show the first detected object name while still processing others).
- Use concise language in the UI. Instead of a static loading message, update the status with something like "Identifying objects…".
- Ensure the results are worth the wait: if the system is uncertain, you might say "I see something that looks like a cup." rather than nothing at all.
- Optionally, utilize the glasses' audio: a brief camera shutter sound on capture and a "ding" on result can give the user immediate feedback beyond visuals.
8) Testing Matrix
Test the feature under various scenarios to ensure robustness:
| Scenario | Expected Outcome | Notes |
|---|---|---|
| Mock device (simulated) | Feature works with virtual feed. | Use this in CI or if no hardware. Ensures code logic is correct independent of real BT quirks. |
| Real device, close range | Low latency, high accuracy. | Glasses near the object in good lighting. Baseline ideal case. |
| Low-light environment | Possibly weaker detection. | The model might fail to identify in dark conditions. Consider messaging: "Couldn't see clearly." |
| Background/lock screen | Graceful pause or error message. | If user tries to trigger while app is backgrounded or phone is locked, the capture might not happen. The app should handle this (perhaps disable capture button when not active). Document that it only works when app is open (current preview limitation). |
| User denies permission | Clear error & prompt recovery. | E.g., if Bluetooth permission was denied, the app should detect this and explain how to enable it. The feature should not just silently fail. |
| Disconnect mid-action | App detects disconnect and recovers. | If glasses disconnect or turn off during capture, the app should timeout and show "Glasses disconnected." It might auto-retry connecting or at least not crash. No infinite waits or app freezes. |
| Multiple objects in view | All major objects identified. | Test scenes with more than one object. Ensure the app can list or highlight all (up to the model's capability) and not just one. |
| No recognizable object | Returns "Not sure" gracefully. | If user looks at something very uncommon or an empty scene, the app might return no labels. It should handle "I'm not sure what that is" or similar user feedback, rather than showing a blank result. |
Make sure to test on both iOS and Android if you support both, as each may have platform-specific quirks.
9) Observability and Logging
Adding logging and analytics will help you maintain the feature's quality:
- Connection events: Log when you start connecting (connect_start), when the glasses successfully connect (connect_success), and if it fails (connect_fail). Include timestamps and error details for failures (these can be used to analyze connectivity issues).
- Permission state: Log whether required permissions are granted or not at app launch (permission_state: granted/denied). This can help you see if user errors are common (e.g., many users not enabling Bluetooth).
- Feature usage: For each capture, log capture_start (user initiated), and then capture_success or capture_fail with relevant info. If it succeeds, also log what was detected (not user-visible, but to telemetry – e.g., "detected: cup, book"). If it fails, log why (timeout, exception).
- Performance metrics: Measure the end-to-end latency of the "what am I looking at" query. Log capture_duration_ms for how long the whole process took (from button tap to result). Break it down if possible: camera latency vs inference latency. This helps identify bottlenecks.
- Reconnection count: If you implement auto-reconnect, keep a count of how many reconnection attempts occur (reconnect_attempts). If a certain device or scenario causes frequent drops, you'd catch that.
- Error tracking: Use your logging to feed into an error reporting service. For instance, if object detection throws an exception on certain images, log it (perhaps with a tag for the model/label issue) and report it to your analytics.
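To keep these event names consistent, a thin logging helper is handy. The sketch below simply funnels events to Logcat plus a pluggable analytics sink; the names match the list above, and the sink is whatever analytics SDK you use.

```kotlin
import android.util.Log

// Minimal event logger -- swap the sink for your analytics SDK of choice.
object FeatureTelemetry {
    var sink: (event: String, props: Map<String, Any?>) -> Unit = { _, _ -> }

    fun log(event: String, props: Map<String, Any?> = emptyMap()) {
        Log.d("WhatAmILookingAt", "$event $props") // no image data, only event metadata
        sink(event, props)
    }
}

// Usage during a capture:
// FeatureTelemetry.log("capture_start")
// FeatureTelemetry.log("capture_success", mapOf("labels" to labels, "capture_duration_ms" to elapsed))
// FeatureTelemetry.log("capture_fail", mapOf("reason" to "timeout"))
```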
By logging these, you can improve the feature over time, see how often users use it, and how well it performs in the wild. (Be mindful of privacy: don't log any actual image data or personal info, only high-level events and perhaps category labels.)
10) FAQ
- Q: Do I need the actual glasses hardware to start developing? A: Not initially. Meta provides a Mock Device Kit that lets you simulate a glasses device without physical hardware. You can develop and even run object detection on sample images via the mock. However, to truly test end-to-end (especially latency and Bluetooth interactions), the real glasses are recommended. Many developers start with the mock for convenience, then do final testing on the hardware.
- Q: Which wearable devices are supported by the SDK? A: Currently, the SDK supports the Ray-Ban Meta AI Glasses and the Oakley Meta HSTN glasses in the preview. Support for other models (Oakley Meta Vanguard, Meta Ray-Ban with Display) is planned, but models that include a display still won't allow custom graphics. In short, only Meta's own smart glasses launched in 2023+ are supported. No generic AR devices or older Ray-Ban Stories are supported in this toolkit.
- Q: Can I ship this feature to production (App Store/Play Store) now? A: Not to the general public. The Wearables Device Access Toolkit is in developer preview. You can develop and test internally (and share test builds within your organization or via TestFlight/Play Beta), but you cannot publish an app using it publicly yet. Meta aims for general availability in 2026 for broad release. Until then, any app using this SDK must be distributed privately to approved testers.
- Q: Does the SDK or glasses do the object recognition for me? A: No – the SDK only provides access to the glasses' hardware (camera, microphones, etc.). It does not include any object detection or AI models. You must bring your own computer vision model. This could be a lightweight on-device model (Core ML, TFLite, etc.) as we did, or you could send images to a cloud service (like an ML backend or Vision API) if latency allows. The good news is you have full control to choose a model that fits your needs.
- Q: Can I use voice commands like "Hey Meta, what am I looking at?" A: Not in this preview. The glasses' built-in Meta AI voice assistant and wake-word ("Hey Meta") are not exposed to third-party apps yet. However, you can implement your own voice command within your app (using the glasses' microphone audio via Bluetooth). For example, you could have a button in-app to start listening and use speech-to-text to trigger the capture. But integration with the native Meta AI is not available in the toolkit currently.
- Q: Can I push content or AR visuals to the glasses (e.g., show text in the HUD)? A: Not at this time. The current SDK is focused on input from the glasses (camera, audio) and not output to displays. Even though Meta has glasses with displays (Ray-Ban Meta Display), the preview only allows camera imagery to be received, not sending graphics to the heads-up display. You also cannot create custom LED notifications on the Ray-Ban (no API for the LED). So for now, all user feedback (like object names or AR overlays) will be on the phone app, not in-glasses. Meta may add display support in future updates of the SDK.
- Q: How accurate is the object recognition? A: That depends entirely on the model you use. For example, if you use a COCO-pretrained MobileNet SSD, it will recognize 80 common classes with decent but not perfect accuracy. There will be times it's wrong or unsure. You can improve accuracy by training a custom model for your use case, or by using a more advanced model (at the cost of performance). Also, the glasses' ultra-wide camera might distort objects at edges, affecting accuracy. In testing, ensure the main subject is relatively centered for best results. Over time, you might incorporate an AI assistant (like Meta's LLaVA or a multimodal model) for a more holistic description, but that's beyond simple detection vs segmentation.
- Q: What about power and battery life? A: Continuously streaming video from the glasses and running AI will use battery on both the glasses and phone. The glasses battery will drain faster when the camera is active (just like recording video). The phone's battery will also be used by the Bluetooth and CPU/GPU during vision processing. In our quickstart, the feature is used on-demand (one photo at a time), which is gentle on battery. If you plan to offer a continuous mode (e.g., constantly identifying objects as the user walks), be mindful of heat and battery. You might limit resolution or frame rate, or auto-shutdown after a period. For now, it's best used in short bursts.
11) SEO Title Options
- "How to Build a 'What Am I Looking At' App using Object Detection vs Segmentation" – A descriptive guide title emphasizing the core question and techniques.
- "Get Started with Meta Wearables SDK: Identify What Your Glasses See (iOS/Android)" – Emphasizes the wearable SDK and the feature outcome.
- "Object Detection or Segmentation? Choosing the Best Vision Approach for AR Glasses" – Addresses the comparison angle for SEO, targeting readers deciding between techniques.
- "Step-by-Step: Integrating Smart Glasses Camera and On-Device AI into Your Mobile App" – Broad but hits keywords like smart glasses, on-device AI, integration guide.
- "Troubleshooting Object Recognition on Smart Glasses: Tips for Connections & Permissions" – Focused title that could draw those facing issues with the setup.
(Choose a title that matches your emphasis—if the article's focus is the how-to, the first or second option works well. For a more comparison-focused read, the third option highlights it.)
12) Changelog
- 2025-12-31 — Verified on Wearables SDK v0.3.0 (developer preview), iOS 17.2, Android 13 with Ray-Ban Meta glasses (hardware and mock). Updated object detection vs segmentation use-case comparisons. Initial publication of guide.