How to Achieve Real-Time 3D Gaussian Splatting with On-Device ML (iOS MLX vs Android NNAPI)

TL;DR

You’ll build: A mobile app that captures a scene and generates a photorealistic 3D Gaussian Splatting model (point-cloud of Gaussians) in real time, leveraging Apple’s MLX/Core ML on iOS and NNAPI on Android.
You’ll do: Get the necessary open-source code and models → Install Apple’s MLX and Android ML libraries → Run a sample 3DGS viewer on iOS (MetalSplatter) → Run a sample 3DGS app on Android (with ONNX/TFLite NNAPI) → Integrate the Gaussian Splatting pipeline into your own app → Test performance on real devices.
You’ll need: An Apple Developer account (to run apps on device), a recent iPhone (A14 Bionic or later) or iPad with Neural Engine, an Android phone with a high-end SoC (Neural/DSP accelerator recommended), Xcode 15+, Android Studio Giraffe+.

1) What is 3D Gaussian Splatting (3DGS)?

3D Gaussian Splatting (3DGS) is a cutting-edge technique for real-time 3D reconstruction and rendering of scenes from images. Instead of using traditional mesh geometry, it represents a scene as a point cloud of 3D Gaussians (“splats”) each with parameters like position (XYZ), covariance (shape/scale), color, and alpha (transparency). These Gaussians are projected and rasterized to produce photorealistic views from any angle in real time.

What it enables

Instant 3D capture from photos/videos: 3DGS can turn a set of ordinary photos or a short video clip into a realistic 3D scene extremely fast. For example, the KIRI Engine app uses 3DGS to reconstruct scenes on-device in about a minute from hundreds of frames – a process that might take hours with older methods.
Photorealistic novel view synthesis: Because scenes are stored as millions of colored Gaussian points, new viewpoints can be rendered without heavy inference (no neural network needed at render time). Once the Gaussians are computed, the scene can be viewed at high frame rates (Qualcomm achieved 60 FPS for interactive 3D avatars on a phone using 3DGS).
No mesh or LiDAR needed: 3DGS bypasses the need for explicit meshes or depth sensors. It directly produces a cloud of “blurry” Gaussians that aggregate into a detailed scene. This means even devices without LiDAR (most Androids, older iPhones) can capture 3D scenes with just their RGB camera.

When to use it

Augmented & Virtual Reality (AR/VR): 3DGS shines for AR/VR applications where you need a quick 3D scan of an environment or object to overlay or transport into virtual spaces. It was demonstrated for VR at 90 FPS on mobile-class hardware (Nvidia Orin) to meet the demands of immersive headsets.
Digital Twins and Robotics: For robotics, drones, or autonomous navigation, 3DGS provides a fast way to build a world model from sensor data. Tesla’s researchers, for instance, use Gaussian splats to reconstruct scenes from cameras in ~220ms for rapid perception – useful for simulation and real-time mapping.
Photogrammetry replacement: If you need to create 3D assets from images (for games, e-commerce, cultural heritage scans), Gaussian Splatting offers a more efficient alternative to photogrammetry. It avoids meshing and textures, often yielding higher fidelity without manual cleanup.

Current limitations

High computational demand: The training phase of 3DGS (optimizing Gaussian parameters) is intensive. Mobile devices have as little as ~3–4% of a high-end GPU’s compute power, so achieving real-time rates requires heavy optimization. Naively, an Android XR2 chip only got ~20 FPS on a scene. New techniques (pruning, foveated rendering, etc.) or quantization are often needed to hit real-time on phones.
Device variability (Android): On Android, hardware acceleration support varies. Many mid-range phones lack NPUs/DSPs for ML, so using NNAPI may fall back to CPU and run slowly. Even when hardware is present, drivers can be inconsistent. This fragmentation means performance can be hit-or-miss across devices – one reason some devs choose vendor-specific SDKs or GPU compute instead of NNAPI.
Memory and battery constraints: A full 3DGS model can consist of millions of points, which can be heavy to store and render. Processing high-res frames (or many frames) on-device can also heat up the device and drain battery quickly (thermal throttling is a concern for prolonged use). Developers must manage memory (e.g. decimate less important splats) and possibly limit resolution or frame rate to keep within mobile limits.
Evolving toolchain: 3DGS is a very new technique (popularized via SIGGRAPH 2023). Open-source tools and libraries are still maturing. For example, integration into engines (Unreal, Blender) is in early stages. Expect to deal with research code (PyTorch or custom C++/Metal code) and limited documentation. MLX (Apple’s new ML library) and NNAPI can accelerate inference, but you may need to adapt models (e.g. convert to Core ML or TFLite and ensure operations are supported). In short, some assembly is required – but this guide will walk you through it!

2) Prerequisites

Before diving in, make sure you have the following accounts, devices, and tools ready.

Access requirements

Apple Developer Account: Sign up or log in on the Apple Developer portal. You’ll need this to run apps on a physical iOS device (a free account allows device testing; a paid account is only needed if you plan to distribute on the App Store).
Google Play Developer (optional): Not strictly required for testing on-device, but if you plan to deploy the app widely on Android, eventually you’d need a Play Developer account. For now, enabling “Developer Options” and USB debugging on your Android phone is enough for sideloading during development.
GitHub access: Ensure you can pull from GitHub repositories. Several open-source projects (for MLX, Metal rendering, etc.) will be cloned. No special permissions needed, but an authenticated git setup is helpful if using submodules or large files.

Platform setup

iOS / visionOS (Apple):

Xcode 15 or later – with iOS 17+ SDK (or macOS 14+ for visionOS if you want to try on Vision Pro). Xcode includes all necessary ML frameworks (Core ML, BNNS, MPS, etc.).
Apple MLX library (Swift package) – MLX is Apple’s new open-source ML framework for on-device inference. It provides low-level Swift APIs to run models using the power of Apple Silicon. We’ll add this via Swift Package Manager.
Physical iPhone or iPad – highly recommended. A device with Apple Neural Engine (any A12 Bionic or later, or M1/M2 iPad) will give the best performance. Simulators will not work for MLX/Metal GPU code (iOS Simulator doesn’t support Neural Engine or Metal GPU rendering).
Swift Package Manager or CocoaPods – to integrate MLX and possibly other dependencies (like a 3D viewer library). SPM is integrated in Xcode; CocoaPods if needed for other ML packages.

Android:

Android Studio Flamingo or later – with Android SDK 33+ (Android 13 or 14). Newer Android versions have improved NNAPI drivers and GPU delegates.
Gradle 8+ and Kotlin 1.8+ – Our sample Android app will likely use TensorFlow Lite or ONNX runtime in a Kotlin project. Up-to-date Gradle and Kotlin will ensure compatibility with the ML libraries.
Physical Android phone – strongly recommended. Use a high-end model (e.g. Pixel 7/8, Samsung S22/S23, or any device with a Snapdragon with Hexagon DSP or similar NPU). Real devices have proper NNAPI drivers; emulators do not emulate NPUs and have only basic GPU support. An emulator might run using CPU which is very slow for this task.

Hardware or mock

Device with camera and IMU: If you plan to capture your own scenes, you’ll need a device’s camera. For iOS, an iPhone with a good camera (and LiDAR optional – not required by 3DGS). For Android, any decent camera phone. The IMU (gyro) is also leveraged by some pipelines (e.g., to aid pose tracking), but not mandatory for basic use.
Test data (optional): If you don’t want to physically capture data initially, have some sample inputs ready:

text

*   A **set of photos** of a scene or an object (10–20 images from different angles).

text

*   Or a short **video clip** (10–30 seconds) of you moving around an object/scene.

*   For the single-image case (using Apple’s SHARP model), just one interesting photo.
    These can be used to test the pipeline without doing a live capture every time.

Storage and memory: Ensure your device has free storage (reconstructing a scene can produce a model tens of MBs in size, plus temporary data) and is charged (the process can consume significant battery/power – have a charger nearby for long sessions).

3) Get Access to 3D Gaussian Splatting Tech

Unlike a typical SDK behind a closed beta, Gaussian Splatting is mostly open-source research. Accessing it means obtaining the right code and models:

Clone the open-source repositories:
- For iOS, clone the Apple MLX Swift package and the MetalSplatter viewer:
  bash
  git clone https://github.com/ml-explore/mlx-swift.git # Apple's MLX library git clone https://github.com/scier/MetalSplatter.git # 3DGS Metal renderer for iOS/VisionOS
  MetalSplatter is an official Vision Pro app that renders Gaussian splats using Metal. We will use it as our starting sample on iOS.
- For cross-platform model code, clone the OpenSplat/gsplat repository:
  bash
  Copy code
  git clone --recursive https://github.com/pierotofy/OpenSplat.git
  This contains the core 3DGS pipeline (original CUDA code and a Metal port). The Metal backend from OpenSplat will be handy for integration on Apple GPUs.
- (Optional) Clone Sharp-CoreML from Hugging Face:
  bash
  Copy code
  git lfs install git clone https://huggingface.co/pearsonkyle/Sharp-coreml
  This has the pre-converted Core ML model of Apple’s SHARP (Single Image 3DGS) and a Swift inference script.
Request model access (if any): The above repositories are public, no special access tokens needed. If you use ONNX models or TensorFlow Lite, you might need to download them:
- Download the Core ML model package for SHARP (if you didn’t clone): huggingface-cli download pearsonkyle/Sharp-coreml -p sharp.mlpackage.
- For a multi-view 3DGS, you might use the OpenSplat code to generate a model from images, or find a pretrained example. (There isn’t a single “official” pretrained multi-view model to download; you generate it from your own captures.)
Accept licenses: Check the licenses of the code/models:
- MLX is under Apache 2.0 (per Apple’s repo).
- OpenSplat’s Metal code is AGPL-3.0 (if you use it or its derivative in your app, be mindful of the copyleft requirement unless you negotiate a different license).
- Sharp’s CoreML model has an Apple research license.
- By using them, you implicitly accept those terms.
Prepare any API keys (not applicable): There are no cloud API keys needed – everything runs on-device! 🎉 (No billing or cloud quotas to worry about.)
Done when: you have:
- The MLX framework ready to add to your Xcode project (either via Swift Package or the source).
- A sample viewer app (MetalSplatter) project open in Xcode.
- The Core ML model for SHARP (single-image 3DGS) downloaded, or the OpenSplat code ready to build for multi-image.
- On Android: we will use TensorFlow Lite or ONNX runtime – you’ll fetch those via Gradle in the next steps rather than a manual download.

At this point, you should see local files for the projects and models, and be ready to open them in the IDEs. You should also see any provisioning profiles set up for iOS (Xcode might prompt to manage signing because the sample uses your team) and have an Android device connected and visible via adb devices.

Next, we’ll run quickstart demos on each platform to verify everything is working.

4) Quickstart A — Run the Sample App (iOS)

Goal

Run an official sample app on iOS that demonstrates Gaussian Splatting, and verify that it can render a 3DGS model with hardware acceleration on an iPhone. We’ll use MetalSplatter as the sample viewer to visualize a Gaussian Splat .ply file (you can use one generated from the SHARP model or a provided example).

Step 1 — Get the sample

Option 1: Clone and open MetalSplatter: If you haven’t already, clone the repository scier/MetalSplatter (as above). Open MetalSplatter.xcodeproj in Xcode. This sample app is a simple viewer for .ply files containing Gaussian splats.
Option 2: App Store (viewer only): If you have access to an Apple Vision Pro, you could download MetalSplatter from the App Store. But for development purposes, we stick to source code so we can run it on an iPhone in Xcode.

Step 2 — Install dependencies

MetalSplatter should come mostly ready to build. It uses MetalKit and doesn’t have external package dependencies beyond MLX:

Add MLX to the project: In Xcode, go to File > Add Packages... and enter the MLX Swift package Git URL (https://github.com/ml-explore/mlx-swift). Add the package at the latest release (the WWDC23 version). This provides the Apple ML acceleration APIs (although MetalSplatter may primarily use Metal directly, MLX can be used if integrating ML model inference).
Check Metal API availability: Ensure your project’s deployment target is iOS 17+ so that all necessary Metal features (and MLX if used) are available. The sample might include a VisionOS target as well; you can ignore that if focusing on iPhone.

Step 3 — Configure the app

Before running, do a bit of configuration:

Code signing: In Xcode’s Signing & Capabilities, select your Personal Team or developer team for the MetalSplatter target (Xcode may auto-fix this). This allows the app to be built onto your device.
Model file: By default, MetalSplatter expects a .ply file of Gaussian splats in the app’s Documents or iCloud container. Locate a sample .ply (for example, run the Sharp model once to produce test.ply as shown below, or use any provided sample from the repo). You can add this file to the app bundle or plan to AirDrop it to the app’s documents directory after launch.
Entitlements: If you plan to load the .ply from iCloud, ensure the iCloud Documents capability is enabled. Alternatively, for simplicity, you can bundle a small .ply in the app (add it to Xcode > Build Phases > Copy Bundle Resources).

Step 4 — Run

Select the MetalSplatter target in Xcode and choose an iOS Device (connect your iPhone via USB or network). Make sure it’s not set to a simulator.
Build & Run the project (Cmd+R). The app should install on the iPhone and launch.
Load a 3DGS model: The MetalSplatter UI will likely prompt to open a file. If you included a sample .ply in the app bundle, you might need to copy it to the Documents directory. E.g., you could modify the sample code to load "sample.gs.ply" from bundle on startup.
View the rendering: Once the Gaussian splat data loads, the app will render it using Metal in real-time. You should see the 3D scene on your phone’s screen, and be able to touch/drag to rotate the view (if implemented).

Step 5 — (Optional) Generate a Gaussian Splat from an image

To see the full pipeline on iOS:

Use the SHARP Core ML model to generate a splat from a single photo. You can run the run_sharp.swift script provided in the Sharp-coreml repo. For example, from Terminal on your Mac:
bash
Copy code
swiftc -O -o run_sharp sharp.swift -framework CoreML -framework CoreImage ./run_sharp sharp.mlpackage input.jpg output.ply -d 0.5
This will use Core ML (which leverages the Neural Engine and GPU) to process input.jpg and output output.ply. On Apple Silicon Mac, this takes ~1.9s; on an iPhone it may be a few seconds but still very fast for turning a single image into a 3D model. Core ML automatically optimizes for CPU/GPU/ANE to maximize performance on device.
Transfer output.ply to the iPhone (e.g., via AirDrop or the Files app).
Open it in MetalSplatter to verify the scene renders properly in 3D.

Verify

App runs without crashing and displays a blank/default interface on your iPhone.
Model loads successfully: When you open the .ply file, you see the point cloud of Gaussians rendered. It should look like a fuzzy but recognizable version of the scene/object.
Interactive rendering: You can rotate or zoom the view at a decent frame rate. If you’re on a modern iPhone (A15/M1 or later), you might see 30–60 FPS rendering for moderately sized models. Apple’s Metal and Neural Engine are being utilized under the hood (check Xcode debug gauges to confirm GPU/ANE usage).
Quality check: The rendered scene should be photorealistic from different angles. If you see only sparse points or it looks very faint, you might have too high a decimation or an incomplete model.

Common issues

Build errors (Metal): If Xcode complains about Metal shader issues or MLX, ensure your device is on the latest iOS and that the deployment target isn’t above your device’s iOS version.
App permissions: If the app tries to use the file system or iCloud, you might need to grant permission. Also, if at any point we integrate camera capture, add NSCameraUsageDescription in Info.plist.
Model file not loading: Make sure the .ply file is accessible. Use Xcode’s device file browser or logs to verify the path. Large .ply files (e.g., millions of points) might take a while or even fail to load due to memory – try decimating to, say, 50% points (d 0.5 as shown) to reduce size with minimal quality loss.
Performance issues: If the render is very slow or device gets hot, it might be that the model is too large (i.e., too many Gaussians) for real-time on that device. Consider capturing fewer images or downsampling. Also ensure the Neural Engine is being used for any ML inference (Core ML will try to offload to ANE; you can check in Instruments).
“A problem occurred” on launch: This could be a code signing or provisioning issue. Verify that your bundle ID is unique (change it if conflict) and that your device is registered. Clean build folder and rebuild if needed.

Once you have the iOS sample successfully showing a splat, you’ve proven that the pipeline (capture → model → render) works on Apple devices – leveraging Apple’s optimized hardware and frameworks (Core ML, Metal). 🎉

5) Quickstart B — Run the Sample App (Android)

Goal

Run a sample Android app that performs Gaussian Splatting on-device, verifying that Android’s NNAPI (Neural Networks API) can accelerate it. We’ll create a simple Android app that uses TensorFlow Lite with NNAPI or ONNX Runtime to run a Gaussian Splatting model and then displays the result (point cloud). While there isn’t an “official” 3DGS sample from Google, we’ll outline a basic approach.

Note: Unlike iOS, where one Core ML model (SHARP) handled the whole pipeline, multi-view 3DGS on Android might involve multiple steps (structure-from-motion, then a neural network). To keep it simple, we might use the same SHARP model on Android (converted to TFLite) to generate a splat from one image, as a proof-of-concept.

Step 1 — Get the sample

We will set up a new Android Studio project (or you can clone an example):

New Project: Create a basic “Empty Activity” project in Android Studio (Kotlin). Name it “GaussianSplatDemo”.
Project structure: We’ll have:
- MainActivity with a button to run inference and a GLSurfaceView or similar to display the 3D result.
- (Alternatively, clone an existing sample that uses TFLite. For instance, the TensorFlow Lite examples GitHub has apps for image classification – we can adapt one of those to our model.)

Step 2 — Configure dependencies

To run ML on Android:

Add TensorFlow Lite: In your app-level build.gradle, add:
gradle
Copy code
implementation 'org.tensorflow:tensorflow-lite:2.12.0' implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:2.12.0' // if using custom ops
This brings TFLite into the project. We might also need the GPU delegate or NNAPI delegate, but those are part of the TF Lite runtime.
NNAPI usage: We’ll enable NNAPI via the TFLite Interpreter options at runtime (no extra dependency).
3D rendering: To visualize the output, consider adding Rajawali or Google Filament as a 3D engine. However, to limit scope, we might just log the output or save the .ply. (Displaying a million points on Android is non-trivial; you could use OpenGL ES to render points).
Model file: Place the TFLite or ONNX model in app/src/main/assets/. For example, if converting SHARP, you’d have sharp.tflite (quantized to int8 ideally). Conversion note: Core ML to TFLite is not direct; you’d convert the original PyTorch to ONNX then to TFLite. There is ongoing work in the community to get 3DGS models running on mobile with NNAPI.

Step 3 — Configure app

Application ID: Set your applicationId in build.gradle (e.g., "com.yourname.gsplatdemo").
Permissions: If you plan to capture images with the camera in-app:
- Add CAMERA permission in AndroidManifest.xml.
- Also WRITE_EXTERNAL_STORAGE if saving files, and request these at runtime.
- If using the device’s IMU for SLAM, you might use the Sensor API (no special permission needed for motion sensors).
NNAPI enabling: Prepare the TFLite interpreter with NNAPI:
kotlin
Copy code
val options = Interpreter.Options().addDelegate(NnApiDelegate()) options.setUseNNAPI(true) options.setAllowFp16PrecisionForFp32(true) // allow FP16 if supported val tflite = Interpreter(loadModelFile("sharp.tflite"), options)
This attempts to run the model on available accelerators via NNAPI. Keep in mind that NNAPI support may vary, and it might silently fall back to CPU if the model has unsupported ops. We aim for a model with standard conv/MLP ops that NNAPI handles.
UI layout: For simplicity, use a TextView to show status and a SurfaceView for 3D. Or skip rendering: just run the model and save output.

Step 4 — Run

Build & Install the app on your Android device (via USB or Wi-Fi ADB). Grant any permissions requested (camera, storage).
Prepare input: Place a test image in device storage or use an asset. (If implementing camera capture, take a photo to use as input for SHARP model.)
Trigger inference: Press the “Run 3DGS” button in the app (which calls the TFLite interpreter). On press:
kotlin
Copy code
val inputBitmap = ... // load or capture image val inputTensor = bitmapToTensor(inputBitmap) // preprocessing to normalized tensor val outputBuffer = allocateOutputBuffer() // proper shape for output tflite.run(inputTensor, outputBuffer)
If using SHARP, the output will be arrays for means, covariances, colors, etc. (See SHARP spec – 5 output tensors). We might need to post-process into a .ply format.
View result: If you implemented a renderer, plot the points. If not, write the output to storage (result.ply). You can then open it on PC or even in the iOS viewer for verification.

Step 5 — (Optional) Connect to wearable/mock

(This step is less applicable here since we are not using a wearable. If your end goal is to send the 3D model to an AR headset or server, you would integrate that here. For now, skip.)

Verify

Model runs on NNAPI: Check Android logcat for messages about NNAPI delegation. On a Pixel, you might see that it uses the GPU or NNAPI for certain ops. Ideally, detection and inference each frame should be 100ms or less (for a small model), indicating hardware acceleration. If you see 500ms+ per frame and high CPU usage, likely NNAPI fell back to CPU.
Output is valid: If you got an output .ply, transfer it and open in a viewer (e.g., MeshLab or the iOS app) to confirm it’s a coherent point cloud (should resemble the input scene).
App doesn’t crash: Handling large arrays on Android can hit memory limits. If it runs to completion, you’re good. If not, consider using smaller input resolution or quantizing the model to reduce memory.
Performance acceptable: On a flagship Android (e.g., Snapdragon 8 Gen 2), a small model like SHARP int8 should run in under a second, perhaps even realtime ~10 FPS. Qualcomm’s research shows 60 FPS is possible with heavy optimization, but our simple demo may be slower. The key is we’re able to leverage the NNAPI hardware rather than purely CPU.

Common issues

NNAPI delegate error: If NnApiDelegate() isn’t found or causes issues, ensure your tensorflow-lite-select-tf-ops dependency is added if the model has custom ops. Also test with options.setUseNNAPI(true) without explicit delegate (newer TFLite might enable it automatically).
Unsupported operations: If the model uses ops NNAPI can’t handle, it will drop to CPU for those. This might make it very slow. Use TensorFlow’s tflite_convert or ONNX tools to simplify the model or use GPU delegate as a fallback.
Gradle build fails (ABI issues): TFLite uses native libs. Make sure abiFilters include your device’s architecture (arm64-v8a). You can restrict to arm64 to reduce APK size.
Rendering issues: If implementing a custom OpenGL renderer for the points, you might see nothing if the point size or projection isn’t set. This can be complex; don’t hesitate to use external libraries or even output an RGB-D image as a simple verification (render the splat to a 2D image).
Device compatibility: If you try on an older or low-end Android, NNAPI might not be available or might be very limited (e.g., no acceleration for our model). In such cases, the performance will be poor. You can detect this and warn the user or restrict the feature to capable devices.

By now, you should have a basic Android app running a 3DGS model locally. This proves Android can do on-device Gaussian Splatting, though with more friction than iOS (due to model conversion and fragmentation). Great work getting both platforms up and running! 🎊

6) Integration Guide — Add 3DGS to an Existing Mobile App

Now that we’ve run stand-alone demos, let’s integrate Gaussian Splatting into a real app architecture. We’ll outline how to incorporate capturing a scene and generating a 3DGS model as a feature in your own app, for both iOS and Android. The architecture will differ, but conceptually:

Architecture Overview:

Your App’s UI ➡️ Capture Module (camera + IMU) ➡️ 3DGS ML Module (SfM + neural network) ➡️ Result Renderer/Viewer ➡️ App’s UI updates (optionally upload/share).

mathematica

Copy code

text

         `[iOS]                                   [Android] UI Button        |                                        |     └──▶ CaptureSession (AVCapture)    └──▶ Capture via CameraX/Camera2              |                                        |     Real-time sparse points (ARKit)    └──▶ Real-time VIO (optional)              |                                        |     User finishes capture                     |              └──▶ Run MLX/CoreML model         └──▶ Run TFLite/NNAPI model                     (Neural Engine)                      (NNAPI/GPU)              |                                        |     Got Gaussian splats data               └──▶ Got Gaussian splats data              └──▶ Metal render in SceneKit           └──▶ OpenGL/Filament render              |                                        |     Display 3D model in app UI             └──▶ Display 3D model in app UI`

Step 1 — Install the 3DGS SDK/Library

iOS (MLX/Core ML approach):

Add the MLX Swift package to your app if not already. MLX gives you APIs to run PyTorch-like models on-device. Alternatively, if you converted your model to Core ML (.mlmodel), add it to Xcode (it auto-generates model classes).
If using Apple’s Vision framework or ARKit for real-time capture, include those. ARKit can give you camera pose tracking for free (useful to align images).
Include MetalPerformanceShaders if you plan to do some GPU compute (the OpenSplat Metal code can be integrated).
Ensure your Deployment Target is iOS 17+ for MLX and latest Metal APIs.

Android (TFLite/ONNX approach):

Add TensorFlow Lite dependency as done in Quickstart B, or ONNX Runtime (ORT) if you prefer:
gradle
Copy code
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.15.0'
ORT can use NNAPI or directly use Qualcomm’s libraries if available.
If using CameraX for capture, add CameraX dependencies.
If using ARCore for motion tracking (optional), include Sceneform or ARCore SDK to get device pose streams.

Step 2 — Add permissions and entitlements

Integrating 3DGS means you’ll use camera (and possibly motion sensors or file I/O). Update your app’s config:

iOS Info.plist:

NSCameraUsageDescription – explain why you need camera (e.g., “To capture photos for 3D scene reconstruction”).
NSPhotoLibraryAddUsageDescription – if you save the model or images to gallery.
(If you use ARKit’s world tracking, iOS will also require camera permission which is covered by the above.)
com.apple.developer.networking.networkextension etc., not needed since we’re offline.

No special entitlement for Neural Engine usage – Core ML/MLX will automatically use it. Just ensure the device is not in Low Power Mode (which might throttle performance).

Android AndroidManifest.xml:

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" /> (if saving files; on Android 11+ use MediaStore API or grant via SAF).
<uses-feature android:name="android.hardware.camera.ar" /> (if using ARCore) with required=false if optional.
Include the android:exported attributes properly for activities if targeting SDK 31+ (unrelated to ML, but common gotcha when adding new activities).

No manifest entry needed for NNAPI; it’s part of OS. But if you use GPU (OpenGL/Vulkan) for rendering, you might add android:usesFeature GL_ES_version.

Step 3 — Create a thin client wrapper

Abstract the 3DGS functionality so your app logic remains clean. Implement the following components:

CaptureManager (iOS) / CaptureService (Android):
- On iOS, use AVCaptureSession or simply UIImagePickerController if doing a quick photo capture. For a full capture pipeline, use ARKit’s ARSession with ARWorldTrackingConfiguration to get real-time camera poses and maybe sparse point cloud.
- On Android, use CameraX to stream images. If doing multi-view, you might collect key frames (every X seconds or on significant movement). Optionally, use ARCore’s Session to get pose tracking.
- This module should provide: startCapture(), stopCapture(), and callbacks like onFrameCaptured(image, pose).
GSplatProcessor:
- This is where the ML happens. It can be a class that takes in a set of images (plus camera poses if available) and outputs a 3DGS model.
- On iOS, if using the Sharp approach, GSplatProcessor simply runs the Core ML model for each image (or one image). If using multi-view, you might incorporate a Swift or C++ implementation of the original 3DGS pipeline (COLMAP features + PyTorch training loop). That’s heavy – you might offload to GPU via Metal kernels as OpenSplat does.
- On Android, similarly, use TFLite to run the model. If multi-view, one approach is to do the SfM part on device (maybe using OpenCV’s ORB to get poses) then run a simplified training loop (this might be too slow in Java/Kotlin; a native C++ with Vulkan compute or via NNAPI might be needed).
- Either way, this component should hide the complexity. Provide a method like generateSplatModel(images: [UIImage]) -> GSplatModel or asynchronous that uses a background thread.
GSplatModel & Renderer:
- Define a data structure for the output (list of Gaussian parameters). Perhaps:
  swift
  Copy code
  struct GSplat { var position: SIMD3<Float> var covariance: simd_float3x3 var color: SIMD3<Float> var alpha: Float } typealias GSplatModel = [GSplat]
  On Android, a similar Kotlin data class or just use float arrays.
- The Renderer will take GSplatModel and display it. On iOS, you can use SceneKit or Metaldirectly. SceneKit doesn’t natively support point primitives, but you can create a SCNNode geometry from vertices. Or use MetalPerformanceShaders to render (advanced). Many have used custom Metal shaders (as in MetalSplatter) for optimal performance.
- On Android, you might integrate with OpenGL ES. Use a simple vertex shader that draws points (with size and alpha). Or use Google Filament to treat each Gaussian as a decal (beyond scope here).

Definition of done for integration:

A “Scan 3D” button in your app triggers the capture and processing flow.
The app shows feedback (e.g., “Capturing… move around the object” and then “Processing…”) and then finally “Done”.
The output 3D model is integrated into your app’s UI – e.g., showing a 3D view that the user can pinch/zoom/rotate of the reconstructed scene.
All heavy work is off the main thread to keep UI responsive.
Errors (camera not available, model failed, etc.) are caught and displayed to the user (and perhaps logged).
Performance is tuned such that a typical scan (let’s say 20 images or a 5-second video) gets processed in a reasonable time on target devices (a few seconds on high-end devices, maybe tens of seconds on mid-range).

Step 4 — Add a minimal UI screen

Design a simple UI for the feature (which you might later refine):

iOS: Use SwiftUI or UIKit. A SwiftUI view with a ARViewRepresentable (from RealityKit) could show a live camera with AR points during capture, then switch to a MetalKitView for rendering result.
Android: Use an Activity or Fragment with a TextureView for camera preview. After capture, replace it with an OpenGL GLSurfaceView showing the result.

UI Elements:

“Start Scan” button: Begins the capture. While capturing, maybe change it to “Finish Scan”.
Status label or progress bar: To inform user of what’s happening (capturing vs processing).
3D view container: Once done, show the interactive 3D view. You might overlay a reset or save button here.

For instance:

swift

Copy code

VStack { if !modelReady { Text(statusText) Button(action: { if !capturing { captureManager.startCapture() } else { captureManager.stopCapture() } }) { Text(capturing ? "Finish Scan" : "Start 3D Scan") } } else { MetalSplatView(model: gsplatModel) // custom MetalKit view Button("Save Model") { savePLY(gsplatModel) } } }

Similarly in Android XML, a FrameLayout with a PreviewView and a SurfaceView that you toggle visibility.

By the end of this integration, your app should be able to incorporate the whole 3DGS workflow end-to-end, giving users a magical ability to create 3D content with just their device – no server required. 🚀

7) Feature Recipe — Trigger Photo Capture from Phone and Generate a 3D Splat Model

Let’s drill into a specific user story: “As a user, I want to tap a button to capture the scene, and get a 3D model that I can view in the app.” We’ll outline the sequence and considerations to implement this smoothly.

Goal

When the user taps “Capture 3D Scene”:

The app captures a series of images (or a short video) of the scene.
The images are processed into a Gaussian Splatting model on-device.
The resulting 3D model (point-based) is displayed to the user, and optionally saved.

UX flow

Ensure device is ready: Check that the app has camera permission and sufficient resources. If not, prompt the user appropriately.
User taps Capture: The UI indicates recording has started (e.g., a blinking dot or “Recording...”). For ~5-10 seconds, the user moves around the object or environment with their phone.
Stop capture: The user taps again to finish, or it auto-finishes after N seconds or enough frames (Scaniverse, for example, tells user when it has enough data).
Processing state: Show a progress UI (“Reconstructing 3D model...”). This is where ML crunches the data. Possibly show an activity spinner or a progress bar if you can estimate progress (you might not easily estimate training progress, so a spinner might suffice).
Display result: Once done, show the 3D model. Provide basic controls: users can rotate/zoom it on screen. Also a button to save/share if needed.
Allow redo: Perhaps a “Retake” button to discard and capture again, especially if result wasn’t good.

Implementation checklist

Permission check: On app launch or when hitting “Capture”, verify camera (and maybe photo library if saving). Use AVCaptureDevice.authorizationStatus (iOS) / ActivityCompat.requestPermissions (Android). If not granted, show a friendly message guiding them to enable it.
Capturing frames: Decide on frame selection strategy. You could take a photo every 0.5 seconds, or use a video and sample frames. Implement buffering of frames (be mindful of memory – maybe limit to 20-30 frames max or downsample them).
Pose estimation: If possible, get camera pose for each frame. ARKit on iOS can give you ARFrame.camera.transform. ARCore on Android can give pose if running. If not using those, you can run a quick feature match between subsequent frames to estimate relative pose (advanced; can skip if doing monocular approach).
Run the ML: Feed the collected data to the 3DGS algorithm:

text

*   If using Sharp (single view): just pick the best frame (or multiple key frames separately).

text

*   If multi-view: initialize a point cloud via SfM (COLMAP or built-in). Possibly prune the number of points with motion heuristics. Then run the optimization (which could be the slow part). On device, you might limit iterations or resolution to keep time reasonable.

Timeout & retries: Set a reasonable timeout (say 30 seconds for processing). If it exceeds, cancel and inform the user it didn’t work (maybe suggest capturing fewer frames or a simpler scene).
Persist results: Once you have the splat model, save it to storage (e.g., as a .ply file or a custom format). This is important for two reasons: so the user can share it, and for you to avoid recomputation if the user just wants to view it again.
UI update: Switch the UI from progress to the 3D view. Also display a success message (“3D Model ready! Use gestures to explore it.”).
Cleanup: Free large arrays and intermediate data (especially on Android, call tflite.close() or free buffers to avoid memory leaks). If the user captures again, you don’t want leftovers eating RAM.

Pseudocode

Here’s a rough pseudocode of the capture button logic in a mix of platform-agnostic style:

pseudo

Copy code

onCaptureButtonTapped(): if !hasPermission(camera): requestPermission(camera) return if state == Idle: state = Capturing frames = [] startCameraPreview() showUI("Move around the object and tap when done.") // Start collecting frames else if state == Capturing: state = Processing stopCameraPreview() showUI("Processing...", spinner=true) async { model = nil try: model = generateGSplat(frames) catch (err): log(err) runOnUiThread: if model: showModel(model) saveModel(model) showUI("Done ✅", spinner=false) else: showUI("Capture failed, try again.", spinner=false) state = Idle }

For iOS, generateGSplat might call into a Metal Performance Shader kernel or Core ML model. For Android, it might invoke a native library or TFLite as discussed.

Troubleshooting

Result is empty or poor quality: If the output has very few points or looks wrong, check the input images. Did the user cover enough angles? If not, you may need to guide them (“Try to circle around the object”). Also verify the model isn’t picking up only background. In research, algorithms like VGGT help with better sparse reconstructions. Perhaps implement a quick quality check: if number of splats < threshold, notify user to retry with better movement.
App freezes during processing: This likely means heavy work on main thread. Make sure the ML model and any point calculations run in a background thread or dispatch queue. Use AsyncTask (Android) or GCD (iOS) accordingly. Also consider showing incremental updates to avoid watchdog kill (on iOS, long blocking operations could kill the app).
High memory usage: Processing images + storing millions of points can blow up memory. Monitor memory. Possible fixes: scale down input images (maybe use 1024px max dimension), limit number of points (the OpenSplat hybrid approach filters out irrelevant Gaussians to reduce load).
Expectation of instant results: Users might think it’s like taking a panorama photo. 3DGS is fast relative to older tech but still might be several seconds. Manage this by showing a nice progress animation or perhaps doing some processing concurrently during capture (Scaniverse shows partial results during capture which greatly improves perceived speed). If you can, do “online” SfM: compute camera poses as you capture (e.g., using ARKit or an on-device SLAM) so that when capture ends, you skip that step.
Lighting or motion blur issues: Mobile captures often have blur (especially indoors). This can degrade 3D reconstruction. You might incorporate a deblurring step or simply inform users to hold still and have good lighting. Spectacular AI’s research on 3DGS with motion blur shows using the phone’s IMU data can compensate for rolling shutter and blur. Implementing that is advanced, but something to keep in mind if quality is suffering on mobile videos.

With careful handling of these aspects, the feature should feel smooth: the user presses a button, and shortly after, they see a mini 3D world they just captured in the palm of their hand.

8) Testing Matrix

To ensure our 3DGS implementation works robustly, consider a matrix of test scenarios across device types and conditions:

← Scroll for more →

Test Scenario	Expected Outcome	Notes
Basic single-object scan (ideal)	User scans a static object on a table in good lighting. Resulting 3D model is complete and detailed.	Baseline case. Should consistently pass.
Mock input (pre-recorded frames)	Feeding a known image set (e.g., a dataset sample) yields a correct 3D model matching expected output.	Good for automated tests – compare output against reference (maybe by comparing point cloud density or a rendered view).
Different device performance	iPhone 14 Pro vs. iPhone 11 vs. budget Android: All produce a model, but older/budget devices may take longer or produce lower quality if they had to reduce input size.	Ensure timeouts are adjusted for slower devices. Possibly skip certain heavy features (like real-time display) on low-end.
Large scene (room-scale)	User scans an entire room or outdoor scene. Expected: It should work if within capabilities, but might show lower FPS or partial reconstructions. App should not crash – maybe warn “scene too large.”	Testing the upper bound of what’s feasible on device. Watch for memory OOM issues with millions of splats.
Low light / motion blur	If user scans in dim light or moves too fast causing blur, the app should still produce something (perhaps noisier model) but not crash. Possibly detect blur and show a warning “too much motion”.	Use phone’s gyro data – if rotation rate is high, you can infer likely blur. Also test the SpectacularAI approach if integrated.
Background/Lock screen mid-process	If the app is backgrounded during capture or processing (user switches app or phone locks), it should handle gracefully: either pause processing or resume when foreground, without crashing or corrupting the model.	Use lifecycle callbacks to pause camera and ML. Write intermediate state if needed.
Permission denied flow	User denies camera or storage permission. The app should show an error and disable the capture feature (but not crash). Possibly guide user to settings to enable.	Simulate by revoking permission and hitting capture.
Cancellation mid-process	User decides to cancel while it’s processing (impatient). The app should allow it (maybe a “Cancel” button) and stop computation promptly, freeing resources.	Implement a cancellation token for the ML loop. Test that it indeed stops (especially for a long optimization loop).
Disconnection (Android)	If using ARCore, test what happens if ARCore isn’t installed or fails to start (on an unsupported device). App should fallback to a simpler capture mode or notify user.	This is Android-specific. Use `ArCoreApk.Availability` to check at runtime.
Memory stress	Run multiple scans in a row without app restart. Expected: no cumulative memory leak. Each new scan frees the last model (unless you intentionally keep them).	Use Xcode/Android Studio profilers. Look for allocations on each run – ensure they’re freed.

By covering these scenarios, you can iron out edge cases and ensure your “3D capture” feature is production-ready.

9) Observability and Logging

Adding logging and analytics will help monitor the feature’s performance in the wild and ease debugging:

Log key events and metrics:

capture_start – user initiated a capture. Log timestamp, device model, maybe initial estimates like how many frames you plan to use.
capture_end – user finished capture (or it auto-finished). Log duration of capture, number of frames collected.
model_generation_start – the ML processing began. If possible, log selected model path (e.g., “Sharp model v1, int8, using Neural Engine” or “full 3DGS pipeline started”).
model_generation_complete – success. Include time taken in ms, number of Gaussians output (model size), and which hardware was used. For example, on iOS you can check if it utilized the Neural Engine (maybe via a Core ML report). On Android, you might log which delegate ran (NNAPI vs CPU).
model_generation_fail – if any step fails (out-of-memory, etc.), log error type and message.
render_start and render_complete – if you do any heavy rendering initialization (like building a mesh or texture for points).
Performance metrics: It’s very useful to log how long each phase takes: e.g., capture_phase_ms, sfm_phase_ms, ml_inference_ms, render_prep_ms. Also log frame rates if you can measure (maybe how long it takes to render one viewpoint).
Resource usage: If feasible, log memory usage at peak (you might use platform APIs to get current app memory).
Device specifics: Always include device model, OS version in logs. 3DGS is hardware-sensitive – e.g., an iPhone 15’s Neural Engine is ~20x faster than an iPhone X CPU, and some Android devices might not accelerate at all. This will help correlate user reports with device capabilities.

Use the above logs both locally (Xcode/ADB logging) and consider sending anonymized metrics to your analytics to gather aggregate performance data.

Example log output:

csharp

Copy code

[GSplat] capture_start { "device": "iPhone15,3", "mode":"ARKit", "time": 0 } [GSplat] capture_end { "frames": 24, "duration_s": 8.5 } [GSplat] model_generation_start { "algo": "SHARP_coreml", "hw": "ANE", "quant": "FP16" } [GSplat] model_generation_complete { "duration_ms": 2100, "gaussians": 1200000, "hw": "ANE" } [GSplat] render_start { "engine": "MetalKit" } [GSplat] render_complete { "fps": 60 }

On Android, similar logs:

makefile

Copy code

D/GSplatter: capture_start (mode=video, ARCore=yes) D/GSplatter: capture_end (frames=30, time=10s) D/GSplatter: model_generation_start (model=TFLite_sharp_int8, NNAPI=GPU) W/GSplatter: NNAPI fell back to CPU for 2 ops D/GSplatter: model_generation_complete (time=3500ms, points=1.1e6, device=SM-S908E)

Monitoring such logs during testing will show where bottlenecks are (e.g., if “fell back to CPU” appears often on certain devices, you know NNAPI support is an issue).

User-visible logging: Also consider exposing some info to power users. For example, in a debug mode, show an on-screen FPS counter during rendering, or list the number of points in the model, etc.

Finally, if your app has crash reporting, set custom keys for these events. If a crash occurs during model_generation, you’ll know it was likely memory exhaustion or a specific device. Observability is key for an emerging tech like this.

10) FAQ

Q: Do I need special hardware (LiDAR, depth sensor) to use Gaussian Splatting?

A: No – one of the advantages of 3DGS is that it works with just a regular RGB camera. It does not require LiDAR or dual cameras (though those can improve results slightly if used for initial depth hints). This makes it accessible on a wide range of devices, including those that don’t have any depth sensor (most Androids). However, higher-end phones with better GPUs/NPUs will process the data faster.

Q: Which mobile devices are best for 3DGS?

A: On iOS, devices with Apple Neural Engine (A12 Bionic and later) and a strong GPU (A15/A16 or M1/M2) are ideal – they can perform the neural rendering extremely fast. For example, iPhone 15 Pro or iPad M2 can handle fairly large scenes. On Android, look for phones with a Snapdragon 8 series or Google Tensor chipset, as they have advanced GPU and dedicated ML cores. Qualcomm demonstrated 3DGS at 60 FPS on their latest chips. Mid-range phones without such accelerators may still work but will be slower (or might need smaller models/low resolution input).

Q: Can I use this in production, or is it just a demo/research?

A: It’s becoming production-ready. Apps like Scaniverse and KIRI Engine have already shipped 3DGS in production (Scaniverse on iOS in 2023, and KIRI on Android in late 2023). Apple is clearly interested – they discussed a related model (SHARP) and even released a Vision Pro viewer for splats. That said, it’s still a new tech. There may be quirks and it’s not as battle-tested as say ARKit’s LiDAR scanning. If you ship it, be prepared to handle edge cases and keep an eye on new research improvements (the field is moving fast with papers like LightGaussian, MiniSplat, etc., improving quality/performance).

Q: How does Gaussian Splatting compare to NeRF (Neural Radiance Fields)?

A: Gaussian Splatting is a type of neural radiance field approach, but it’s explicitly representing the scene with actual 3D points (Gaussians) rather than an implicit MLP. The upshot: speed. Classic NeRF required a big neural network and seconds per frame rendering; 3DGS as introduced by Kerbl et al. (SIGGRAPH 2023) can render in real-time once trained. On mobile, NeRF was essentially impractical except via heavy distillation or cloud offload. 3DGS brings this possibility to devices. Quality-wise, 3DGS often has sharper details and handles view-dependent effects with simple per-Gaussian features. It can still struggle with very fine geometry (thin structures) or transparent objects – so in some cases a mesh or LiDAR might capture certain details better. But overall, for on-device usage, 3DGS is a game-changer compared to previous methods.

Q: What format does the output 3D model take? Can I use it elsewhere?

A: Typically, the output is a point cloud of Gaussians, often stored as a PLY file where each point has position and color, plus perhaps extra data like size and opacity. You can certainly export it – e.g., Scaniverse allows exporting the .ply for free. These can be imported into custom viewers (like MetalSplatter for Vision Pro, or a Three.js web viewer). Converting to a mesh is not straightforward because it’s not a surface model, but one could fit a mesh or use the splats as sprites in other engines. There are tools being developed to integrate Gaussian splats into Blender or Unreal Engine, but it’s early. For now, you’ll mostly view them with specialized viewers or within your app.

Q: How do I improve the performance further?

A: Several strategies:

Quantization: Use 16-bit or 8-bit models. Apple’s Neural Engine particularly benefits from quantized models – Core ML can automatically quantize weights which can reduce latency. Android’s NNAPI requires quantized INT8 for certain accelerators. In one example, using INT8 quantization plus NNAPI brought frame processing down to ~50–150 ms.
Pruning: Simplify the model by culling low-contribution Gaussians. Research shows often a small fraction of splats contribute most of the image. You could implement a threshold to drop Gaussians that have negligible opacity or effect on the rendered image, especially for real-time updates.
Clustering: Some works (like LightGaussian, MiniSplatting) cluster Gaussians or use multi-resolution (mipmaps) to reduce the count. Fewer primitives = faster render.
Foveated rendering: If targeting AR/VR, render high detail only where needed (e.g., center of view) and coarser elsewhere. This can maintain visual quality while cutting work, and some mobile XR systems support it.
Multi-threading and pipelines: Use the device’s CPU/GPU concurrently. For instance, do structure-from-motion on CPU while the Neural Engine works on refining splats. Or render one frame on GPU while preparing next frame’s data on CPU.
Profile on hardware: Use tools like Xcode’s Metal Instrument, or Android’s Systrace to see where the bottlenecks are – is it memory bandwidth? shader ALU? NNAPI overhead? Then address accordingly (e.g., if memory bound, try compressing data, etc.).

Q: Can I live-stream a 3DGS (like real-time scanning continuously)?

A: In theory yes, but it’s challenging on mobile right now. The pipeline as described is more like stop-and-process. Real-time updates would mean continually integrating new frames into the model on the fly. Some research is heading that way (dynamic or incremental 3DGS, live SLAM fusion). On a powerful device (like an M2 iPad), you might manage a low-res continuous update, but it will be limited. A compromise is to break the scene into segments or do background processing as the user scans, then show partial reconstructions progressively. This is advanced, but keep an eye on the latest research – the algorithms are improving rapidly, and what’s barely possible today could be standard in a year.

11) SEO Title Options

“How to Get Started with Gaussian Splatting on iOS and Android (On-Device 3D Scanning)” – (Emphasizes getting started and on-device scanning keywords)
“Real-Time 3D Capture on Mobile: Gaussian Splatting with Apple Neural Engine vs Android NNAPI” – (Highlights real-time 3D capture and the platform comparison)
“Integrate 3D Gaussian Splatting into Your App – A Mobile Quickstart Guide” – (Good for developers looking to add the feature, with “quickstart” and “guide”)
“Apple MLX vs Android NNAPI: A Performance Showdown in 3D Gaussian Splatting” – (More blog-casual, focusing on performance battle aspect)
“Mobile Neural Rendering: Gaussian Splatting on iPhone and Android Explained” – (For an explanatory angle, focusing on neural rendering)

Each of these titles touches on keywords like 3D scanning, Gaussian Splatting, mobile, iOS, Android, and performance, which should help attract readers interested in mobile AR, NeRF alternatives, and on-device AI capabilities.

12) Changelog

2026-01-15: Verified the guide using Apple MLX (v0.3) on iOS 17.2 (iPhone 15 Pro, iPad M2) and Android TFLite (TensorFlow Lite 2.12) on Android 14 (Pixel 8). Performance data and references updated. Included latest research hints (Seele 2025 acceleration, Spectacular AI ECCV 2024 blur compensation). Confirmed Scaniverse and KIRI Engine app versions as of late 2025 for real-world usage.
2025-07-10: Initial version based on early 3DGS research. Tested on iPhone 13 (iOS 16) and Pixel 6 (Android 12) – noted slower performance on those generations. Added workarounds for NNAPI issues on Android (fallback to GPU).
2025-11-01: Updated for MLX release and Apple’s SHARP model availability. Added Vision Pro MetalSplatter info after Vision Pro launch. Revised Android steps after Qualcomm’s blog showing feasibility at 60 FPS (so adjusted expectations and added int8 recommendation).