Snowseer

Achieving minimal-shot autonomy by recognising constants across environments.

Toronto, January 2021. The green overlay is the inferred position of the road under the snow, transferred from matched images of the same location in summer. Nothing in the processing pipeline has been trained in snowy conditions.

Minimal-shot autonomy

Minimal-shot autonomy asks how a system survives in unfamiliar environments. The commonly accepted answer is to collect more data and retrain. That answer works, but it is impractical: it does not scale across the long tail of conditions a deployed vehicle, robot, or drone may encounter in the real world. Snow, dust, smoke, washouts and variable human infrastructure are obvious examples. A perception system that depends on having been trained on each new condition is not only expensive and cumbersome to develop; it necessarily lags every condition it has not yet encountered.

The protagonist of this project is the humble snow-clearing vehicle, dedicated to a task ripe for automation. Maintaining a workforce of trained drivers who are engaged for only a small part of the year, yet must deploy at very short notice, is inefficient and expensive. More importantly, snow-clearing is a geographically bounded, repetitive and speed-limited operation over roads whose geometry is already known. Unlike open-ended urban driving, the operational objective is narrow: keep the vehicle aligned with a known road corridor while avoiding people, vehicles, kerbs, signs, parked cars and other obstacles. This makes the snow plough a particularly strong candidate for autonomy.

A snow plough's job is simple: sweep the road clear. The obvious difficulty is that, while the plough is in operation, the road is necessarily invisible. Kerbs are buried, lane markings are gone, and the boundary between tarmac and verge is no longer defined. A self-driving stack trained on clear road conditions, applied directly to the plough's camera, will report with calibrated confidence that the entire scene is road and should be cleared. The traditional remedy, annotating a labelled snow dataset large enough to cover the long tail of road, weather and time-of-day combinations, is uneconomic and chronically incomplete. The principle proposed in this project requires no new training data and applies across the broader concept of minimal-shot autonomy.

For almost every operating environment where autonomy fails for lack of data, an adjacent regime exists, temporally or seasonally or geographically, where data is plentiful and rich, and whose key components remain the same across environments. The road which needs to be ploughed this winter is the same road it was in the summer. Its appearance has changed, but its position in space and relative to local landmarks has not.

Snowseer is a canonical demonstration of leveraging training data via structural constants across environments. It transfers knowledge across regimes to achieve minimal-shot autonomy.

The constants-bridge

The core of this project is a simple idea: a constants-bridge. This is a composition that takes a model trained on regime A, an inference target in regime B, and a known invariant linking the two, and uses that invariant to transfer the model's output into regime B without retraining.
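The shape of that composition can be written down directly. A minimal sketch, in which every name is illustrative rather than part of the actual codebase:

```python
def constants_bridge(model_a, invariant_map, query_b, prior_a):
    """Transfer model_a's output from regime A to regime B via a shared invariant.

    model_a       -- perception model trained only on regime A (e.g. clear-season)
    invariant_map -- estimates the transform linking a regime-B query to a
                     regime-A prior via features that survive the regime change
    query_b       -- observation in the unfamiliar regime B (e.g. a snow frame)
    prior_a       -- paired observation in the data-rich regime A
    """
    prediction_a = model_a(prior_a)              # inference where the model is valid
    transform = invariant_map(query_b, prior_a)  # anchor on what stayed the same
    return transform(prediction_a)               # carry the prediction into regime B
```

In Snowseer, `model_a` is the road segmenter, `invariant_map` is feature matching plus homography fitting, and `transform` is the mask warp.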

The invariant in this work is geometric. The road sits where it sat last summer, in the same place relative to other landmarks. The shape is more general. Anything that stays observable and unchanged between two regimes is a candidate bridge: anatomy across medical imaging environments, terrain across illumination states, scene structure across weather conditions.

The constituent parts are not new. Geometric scene analysis, classical RANSAC, and pretrained feature matchers and segmenters have been combined in many ways across the computer-vision literature. The contribution of this submission is to identify the invariant structure of the environment itself as the transfer mechanism, and to give an end-to-end working demonstration of the property the brief calls for: generalisation through understanding, not memorisation. The feature matcher in our pipeline is not generalising its recognition of "snow"; it has never been trained on snow. The generalisation comes from what stays the same.

Matched correspondences between a snow-covered street in Gällivare, Sweden (left) and a summer image of the same location (right). The lines connect features that haven't changed across seasons. No correspondences land on the road surface itself. The matcher anchors on the consistent non-road features, and the homography fitted to those carries the road mask from the summer image into the snow frame.

Snowseer

A pre-trained feature matcher establishes correspondences between the live snow frame and a clear-season prior of the same location. A homography (a projective transformation) fitted to those correspondences geometrically registers the clear-season prior to the snow frame. A pre-trained segmenter produces a road mask on the clear prior, and that mask is warped through the homography onto the winter image, producing an overlay of where the road sits underneath the snow.
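The dense mask warp is performed by an image-warping routine (OpenCV's `warpPerspective`, for instance), but the underlying per-point form of the projective transform is a few lines of numpy. A minimal sketch, with an illustrative pure-translation homography:

```python
import numpy as np

def apply_homography(H, points):
    """Map Nx2 pixel coordinates through a 3x3 homography H.

    This is the per-point form of the warp that carries the summer road
    mask into the snow frame; a dense warp applies it to every pixel.
    """
    pts = np.hstack([points, np.ones((len(points), 1))])  # to homogeneous coords
    mapped = pts @ H.T                                    # projective transform
    return mapped[:, :2] / mapped[:, 2:3]                 # back to pixel coords

# Illustrative homography: shift every prior-space point 10 px right, 5 px down.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0,  5.0],
              [0.0, 0.0,  1.0]])
corners = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 50.0], [0.0, 50.0]])
warped = apply_homography(H, corners)  # (0, 0) -> (10, 5), and so on
```

A real fitted homography additionally encodes the rotation, scale and perspective change between the two camera poses; the division by the homogeneous coordinate is what makes the transform projective rather than affine.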

Per snow frame, six steps:

  1. Pull the live snowy frame from the plough's camera.
  2. Pull a clear-season prior of approximately the same coordinates.
  3. Match the two with the DISK + LightGlue feature-matching models.
  4. Estimate a homography with the USAC-MAGSAC robust estimator.
  5. Run a Mask2Former segmenter on the clear prior only, to obtain the true position of the road in clear conditions.
  6. Warp the road mask into the snow frame via the homography and overlay it onto the plough's view.
   ┌──────────────┐                    ┌──────────────────────┐
   │  Snow frame  │                    │  Clear-prior frame   │  Any geo-tagged
   │   (live)     │                    │                      │  clear-weather
   │              │                    │                      │  imagery
   └──────┬───────┘                    └──────────┬───────────┘
          │                                       │
          │                                       ▼
          │                            ┌──────────────────────┐
          │                            │ Mask2Former segmenter│  Pretrained
          │                            │                      │
          │                            └──────────┬───────────┘
          │                                       │ Road mask in prior space
          │                                       │
          └►  DISK + LightGlue feature matching ◄─┘             Pretrained
                              │
                              │
                              ▼ Image correspondences
                    USAC-MAGSAC homography
                              │
                              ▼
                  warp prior mask → snow space
                              │
                              ▼
                    fuse over K=3 priors  +  EMA over time for smoothness
                              │
                              ▼
                  ┌──────────────────────┐
                  │  Road overlay on the │
                  │      snow frame      │
                  └──────────────────────┘
One frame from a clip of driving in Toronto. Top-left: the snowy input. Top-right: the naive road-segmented baseline. Bottom-left: the paired summer prior with successful road segmentation (because the road is visible). Bottom-right: the cross-season overlay produced by warping the prior's road mask through the homography into the snow frame.

The video processor wraps the per-image-pair pipeline in three layers: a track loader that indexes the snow and summer streams by GPS pose, a prior pool that returns the K = 3 nearest summer captures by distance for each snow frame, and an exponential moving average (α = 0.4) on the binary road mask that produces a smoother continuous render of the transferred road position.
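The prior pool and the smoothing layer reduce to a few lines each. A minimal sketch, with planar coordinates standing in for GPS pose and all names illustrative:

```python
import numpy as np

def nearest_priors(query_xy, prior_poses, k=3):
    """Return indices of the k summer captures closest to the snow frame's pose."""
    d = np.linalg.norm(np.asarray(prior_poses, dtype=float)
                       - np.asarray(query_xy, dtype=float), axis=1)
    return np.argsort(d)[:k]

def ema_smooth(masks, alpha=0.4):
    """Exponential moving average over a sequence of binary road masks.

    Turns per-frame binary decisions into a temporally smooth continuous
    overlay; alpha = 0.4 weights the newest frame against the running state.
    """
    smoothed, state = [], None
    for m in masks:
        m = np.asarray(m, dtype=float)
        state = m if state is None else alpha * m + (1 - alpha) * state
        smoothed.append(state)
    return smoothed
```

A higher α tracks mask changes faster at the cost of more flicker; 0.4 is the balance used for the demo clips.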

The output of the Snowseer pipeline is restricted to presenting only where the road is expected to be, not where the snow should necessarily be cleared by the plough. A broader autonomous snow-clearing system would integrate Snowseer with other sensing and safety processes (lidar, depth, obstacle detection), each unburdened of the road-position problem on a buried road.

Components

Component          Role                                             Model                    Dataset
Feature detector   Locate keypoints in each image                   DISK (NeurIPS '20)       MegaDepth
Feature matcher    Pair keypoints across the snow / summer images   LightGlue (ICCV '23)     MegaDepth
Homography fit     Robust geometric registration of the pair        USAC-MAGSAC (CVPR '20)   n/a
Road segmenter     Produce a road mask on the summer image          Mask2Former (CVPR '22)   Cityscapes

Crucially, every component is used without any retraining: none is fine-tuned on snow.

Demo material

The demo material consists of video clips from snow-covered streets in Toronto (January 2021 and February 2025), obtained from the Boreas dataset. The pipeline produces a continuous green road-region overlay tracking the buried road frame by frame. A side-by-side naive baseline (the same segmenter applied directly to the snow frame) is included for contrast: the naive overlay sprawls across frames into clearly non-road territory, while the cross-season overlay tracks the buried road continuously through the clip on a pipeline whose components have never seen snow.

Toronto, January 2021. Four panels per frame: snow input (top-left), naive segmenter baseline applied directly to snow (top-right), paired summer prior with road segmentation (bottom-left), cross-season overlay onto the snow frame (bottom-right).
Toronto, February 2025. Four panels per frame: snow input (top-left), naive segmenter baseline applied directly to snow (top-right), paired summer prior with road segmentation (bottom-left), cross-season overlay onto the snow frame (bottom-right).

The same pipeline was also verified on a set of 18 image pairs from different Nordic regions, covering distinct snow scenes, road layouts, lighting conditions and environments. These served as proof-of-concept tests early in development and are the foundation the video pipeline was built on.

Each pair below is a 2×2: snow query (top-left), naive segmenter baseline applied directly to snow with the road approximation in red (top-right), paired summer prior of the same scene (bottom-left), cross-season overlay with the road transferred onto the snow frame in green (bottom-right).

Snow query, Gällivare residential street with red houses. Naive Cityscapes road segmentation applied directly to the snow frame, painting the class across snow and sky. Paired summer prior of the same Gällivare scene. Cross-season overlay: green road mask transferred from the summer prior onto the snow frame.
Gällivare, Sweden.
Snow query, Kiruna residential street. Naive Cityscapes road segmentation on the Kiruna snow frame. Paired summer prior of the same Kiruna scene. Cross-season overlay: green road mask transferred onto the Kiruna snow frame.
Kiruna, Sweden.
Snow query, Luleå street. Naive Cityscapes road segmentation on the Luleå snow frame. Paired summer prior of the same Luleå scene. Cross-season overlay: green road mask transferred onto the Luleå snow frame.
Luleå, Sweden.

The entire pipeline is reproducible from a clean repository clone with the command make reproduce.

Limitations

Some artefacts in the overlay are inherited from the summer prior.
Where the front of the summer capture vehicle is visible in the prior, the warped road mask begins a short distance ahead of the snow camera rather than directly under it. Where a parked car or other obstacle sits on the road in the prior, the segmenter selects the road around the obstacle and the overlay carries that cutout forward into the snow frame. Artefacts of this kind are not a property of the constants-bridge principle but an implementation challenge. Both are tractable with reasonable engineering, for example by extrapolating the road below the visible mask boundary, or by fusing several priors of the same scene so that any single prior's occlusions are filled in by the others.
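That multi-prior fill-in is a one-liner once masks from several priors are warped into the same snow-frame coordinates. A minimal sketch, assuming co-registered binary masks:

```python
import numpy as np

def fuse_prior_masks(warped_masks, min_votes=1):
    """Fuse road masks warped from several priors of the same scene.

    A parked car in one summer prior leaves a hole in that prior's road
    mask; voting across priors fills any hole that is road in at least
    `min_votes` of them. min_votes=1 is a plain union of the masks.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in warped_masks])
    return stack.sum(axis=0) >= min_votes
```

Raising `min_votes` trades occlusion fill-in for robustness against a single badly-registered prior polluting the fused mask.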

The pipeline is not real-time.
Partly due to lack of access to a snow plough and vast amounts of snow, but also due to computational constraints, the road overlay pipeline currently runs offline, after the images are taken. This is a barrier to deployment, but it is an implementation task, certainly surmountable if this project were to take a more operational form; nothing in the principle stops a knowledge-transfer system running live. The matching pass dominates per-frame compute, at around 16 s per frame on a Mac CPU, and demo clips build end-to-end in roughly an hour. Real-time operation needs a substantially faster matcher and segmenter, which is a deployment-engineering problem rather than a research question, and it is the first item in the next-steps section.

The system cannot currently be deployed on arbitrary data.
The present iteration of the pipeline is contingent on a certain format of high-quality clear-road imagery. Generalising the system to operate on any road with Google Street View (or a comparable source) coverage is feasible, since the pipeline is substrate-agnostic in principle, but the current code does not support this and is geared toward producing the specific material in this demonstration. Integrating a wider source corpus is a natural next step.

Next steps

The key contribution of this project is not to revolutionise snow ploughing but to present constants as a bridge: a general principle for information transfer across environments. Other projects with similar structural characteristics should certainly be explored.

Taking Snowseer to an operational level is primarily a matter of upgrading its constituent components.

The snowplough's road-position channel is one consumer of this appliance. The same appliance, with the same recipe, could power fog, dust, smoke, heavy-rain and night driving, as well as heads-up display navigation, seeing around corners, and many more use cases.

Specific development ideas the prize money could fund

  1. Real-time matcher: use GPU acceleration to bring per-frame matching from around 16 s on Mac CPU to under 1 s on a deployment-class device. Required for live operation.
  2. Visual place recognition front-end: replace the GPS-pose lookup with a learned recognition step so the appliance works in GPS-denied environments and without prior pose.
  3. Multi-source clear-season image bank: curate and integrate a wider source corpus (Mapillary global, Street View, operator captures) so any covered road can be a deployment target.
  4. Hardware prototype: a battery-powered processing unit running the live appliance with a simple HUD-esque output, demonstrating the snow, fog and night-driving consumer scenarios end-to-end.

Reproduce

git clone https://github.com/aturner22/snowseer; cd snowseer
uv sync --python 3.12
export MAPILLARY_TOKEN=<token>
make reproduce

make reproduce runs three steps sequentially (~3 hours on Mac CPU): the January 2021 clip, the February 2025 clip, and the 18-pair static-stills precursor. make track TRACK=<id> runs a single track ad-hoc. make stills runs only the static-stills demo (MAPILLARY_TOKEN required).

Companion notebook: analysis.ipynb.