What to Expect
Geometric vision research sits at the front end of our content production pipeline and is primarily responsible for recovering the spatial structure of scenes from raw 2D image data.
This work is evaluated against rigorous metrics, with the goal of recovering spatial structure with minimal error at the highest possible processing speed.
Results from this role feed directly into the downstream Asset Generative research team.
As an intern, you will work on the company's Cloud AI Infrastructure.
What You’ll Do
Rapid SOTA Reproduction & Adaptation:
Implement training configs, hyperparameters, and losses/constraints (photometric reprojection, geometric consistency, uncertainty calibration).
Deliver train/val/test data pipelines, A/B baselines, and reproducibility scripts; produce alignment and error reports (depth metrics, pose errors).
Training Data Design & Metric Gains:
Design training datasets (sampling, rebalancing, synthesis, augmentation);
implement pre/post-processing to improve metrics (distortion correction, color consistency, depth fusion and hole filling, temporal consistency and uncertainty integration).
Job Requirements
Basic requirements:
Equirectangular/cubemap camera models, stitching/slicing, exposure/color alignment, seam/geometric consistency, and depth-map stitching.
Multi-view geometry, triangulation/bundle adjustment (BA), depth fusion, and uncertainty estimation; solid grasp of numerical optimization and linear algebra for vision.
Transformers: attention/positional encodings, cross-view and temporal attention, cost-volume and epipolar-aware constraints, long-context memory;
Diffusion models: forward/reverse processes, noise schedules, conditional guidance, and sampling/distillation; experience with extensions or improvements that preserve camera-model/epipolar consistency, scale observability, and uncertainty calibration.
Training, Evaluation & Coding Experience:
Proficient in Python 3.x and PyTorch; strong with NumPy/SciPy/OpenCV; able to deliver end-to-end (E2E) training pipelines.
Familiarity with multi-GPU/distributed training, dataloader performance, mixed precision; able to export ONNX/TorchScript and validate accuracy/latency.
Solid C++ (C++17 or later) and experience with at least one of Eigen, Ceres, g2o, OpenCV, or Open3D.
Familiar with build tools such as CMake/Ninja and package managers such as vcpkg/Conan.
Comfortable building, running, and debugging in Docker (relevant to duties such as image size optimization and stability testing).
Fluent with, and able to lightly extend, at least one of COLMAP, OpenMVG, or OpenMVS.
Preferred
Publications/open source: papers at CVPR/ICCV/ECCV/NeurIPS/ICLR or notable open-source repositories.
Model-level contributions (architecture/loss changes, domain transfer, use of sparse supervision). A compensation uplift is possible.
Experience optimizing model inference efficiency with TensorRT, OpenVINO, or TVM.
Candidates with more than one year of relevant work experience (including internship experience) will be given priority consideration.
