AI Foundations in Visual Model Training
AI model training starts with data. In visual AI, the most valuable data is video collected from cameras. High-quality video data helps models learn motion, context, and behavior. This matters for developers and city planners, since models need real-world variety, and the process requires careful data curation, annotation, and iteration. Training visual AI models demands labeled frames, bounding boxes, and temporal consistency so that computer vision systems generalize across conditions.
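To make those labelling requirements concrete, here is a minimal sketch of how a labeled video frame might be represented in Python. The schema and field names are illustrative assumptions, not a standard format used by Project Hafnia; the track ID is what carries temporal consistency from frame to frame.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BoundingBox:
    # Pixel coordinates of one object in the frame
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str      # e.g. "vehicle", "pedestrian"
    track_id: int   # the same ID across frames gives temporal consistency

@dataclass
class AnnotatedFrame:
    video_id: str                    # which source video the frame belongs to
    frame_index: int                 # position of the frame within that video
    boxes: List[BoundingBox] = field(default_factory=list)

# One labeled frame: two tracked objects in frame 120 of a camera feed
frame = AnnotatedFrame(
    video_id="camera_07_2024-05-01",
    frame_index=120,
    boxes=[
        BoundingBox(34, 50, 210, 180, "vehicle", track_id=3),
        BoundingBox(400, 60, 460, 200, "pedestrian", track_id=9),
    ],
)
```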
However, sourcing compliant video for computer vision poses challenges. Legal frameworks such as GDPR constrain how public video can be stored and reused, and in Europe the AI Act adds another compliance layer, so regulation-ready pipelines are essential. As a result, many AI developers struggle to obtain ethically sourced, auditable footage. To reduce this friction, initiatives centralize data libraries with traceability and enforce privacy and compliance across the pipeline.
Annotation accuracy and dataset diversity determine model performance. If labels are inconsistent, models underperform; if scenes lack diversity, visual language model outputs fail in complex urban situations. Teams therefore focus on pre-annotated sequences and implement quality and compliance checks at every stage. For example, controlled workflows give traceability for each annotated video data asset, so teams can verify provenance and auditing records.
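As a rough illustration of this kind of per-asset check, the sketch below validates that an annotated asset uses an agreed label vocabulary and carries basic provenance fields. The label set, field names, and rules are assumptions for illustration, not Project Hafnia's actual checks.

```python
from typing import Dict, List

ALLOWED_LABELS = {"vehicle", "pedestrian", "bicycle"}          # assumed label vocabulary
REQUIRED_PROVENANCE = ("source", "licence", "capture_date")    # assumed audit fields

def check_asset(labels: List[str], provenance: Dict[str, str]) -> List[str]:
    """Return human-readable issues for one annotated video asset; empty means it passes."""
    issues = [f"missing provenance field: {key}"
              for key in REQUIRED_PROVENANCE if key not in provenance]
    issues += [f"label not in agreed vocabulary: {label}"
               for label in labels if label not in ALLOWED_LABELS]
    return issues

# Example: one mistyped label and a missing licence record are both flagged
print(check_asset(["vehicle", "pedstrian"],
                  {"source": "camera_07", "capture_date": "2024-05-01"}))
```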
For organizations building operational systems, the difference between detection and explanation is critical. visionplatform.ai transforms detections into reasoning by coupling an on-prem language model with event-level context, which helps operators act faster. For practitioners aiming to deploy AI models in control rooms, visual AI must deliver not only accuracy but also explainability and auditable workflows.
Finally, to accelerate AI development, teams must balance compute, annotation, and dataset variety. GPUs and cloud microservices shorten iteration cycles, and curated, ethically sourced video reduces legal risk. Consequently, teams can train computer vision models that perform reliably in complex urban environments.
Project Hafnia: Vision and Goals
Project Hafnia is a 12-month initiative designed to create a regulated platform for video data and model training. The program focuses on collecting compliant video data and building pipelines that support training visual AI at scale. Specifically, Project Hafnia aims to democratize AI model training by making high-quality video data available under a controlled-access licence. The effort targets smart cities and public agencies that need regulation-ready tools for model development.
Milestone Systems leads the program, and Project Hafnia's roadmap set milestones for data collection, annotation, model fine-tuning, and deployment. The timeline moved from pilot captures to full-scale data library creation within the year. To ensure regulatory-compliant handling, the project emphasised privacy-by-design and auditable documentation. The work helped cities test models without compromising data privacy or locking themselves into a single vendor.
Thomas Jensen said, “Artificial intelligence is a transformative technology, with access to high-quality training data being a key challenge. Project Hafnia is designed to create the world’s smartest, fastest, and most responsible platform for video data and AI model training.” This quote frames the intent and the urgency. As part of that intent the effort included early access pilots in multiple cities, and it set out to meet EU AI Act and GDPR obligations.
Project Hafnia also plans to support fine-tuning of vision language models (VLMs) so that models reflect European values and constraints. The program includes pre-annotated collections, which allow computer vision developers to start with quality labels. The platform thus supports training visual AI models while retaining traceability and auditable provenance for every annotated video data asset.
Teams that want to explore advanced forensic capabilities can look at practical examples such as natural-language forensic search. visionplatform.ai's approach to forensic search complements these efforts by offering on-prem reasoning and search across VMS records, which helps in operationalising the datasets created under Project Hafnia.

NVIDIA Partnership and Technology Stack
The collaboration with NVIDIA and Nebius provided essential technical depth. Milestone Systems partnered with NVIDIA to accelerate the pipeline for training and curation, and the platform integrates NVIDIA's ecosystem and NeMo Curator to manage labelled assets. Specifically, NVIDIA NeMo Curator on NVIDIA DGX and cloud instances enabled fast, regulation-ready workflows for data curation and dataset versioning. The stack also links with Nebius for cloud orchestration and microservices.
NVIDIA NeMo Curator plays a central role in dataset curation. Teams use the tool to annotate, validate, and export compliant video data for training. The combination of the curator and surrounding AI tools lets engineers manage large-scale annotated video data while enforcing privacy, traceability, and quality checks. In addition, the pipeline supports the creation of a data library that houses pre-annotated sequences and metadata for provenance.
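The export step can be pictured as writing a manifest that ties each exported clip to its provenance metadata. The snippet below is a tool-agnostic sketch of that idea; it does not use the actual NeMo Curator API, and all field names are illustrative assumptions.

```python
import json
from pathlib import Path

def export_manifest(clips, out_dir: str) -> Path:
    """Write a JSON manifest so every exported clip stays traceable to its source."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {
        "version": 1,
        "clips": [
            {
                "file": clip["file"],
                "source_camera": clip["source_camera"],          # where the footage came from
                "licence": clip["licence"],                      # controlled-access licence reference
                "annotation_status": clip["annotation_status"],  # e.g. "pre-annotated", "validated"
            }
            for clip in clips
        ],
    }
    path = out / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Example: two clips exported together with their provenance records
export_manifest(
    [
        {"file": "genoa_001.mp4", "source_camera": "cam_12",
         "licence": "CAL-2025-001", "annotation_status": "validated"},
        {"file": "genoa_002.mp4", "source_camera": "cam_14",
         "licence": "CAL-2025-001", "annotation_status": "pre-annotated"},
    ],
    out_dir="export",
)
```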
Project Hafnia’s technical choices included containerised microservices, traceable labelling systems, and a pipeline that supports visual language model training. This architecture helps teams fine-tune vision language model components that link video frames to textual descriptions. To illustrate the practical effect, the project extended to Genoa as a pilot city to validate the stack in live urban environments.
Beyond curation, the partnership produced an NVIDIA AI blueprint for video that outlines GPU-accelerated model training patterns, and it introduced processes to handle compliant data across jurisdictions. The joint approach supports AI developers who need a reproducible pipeline and compliance documentation. For organisations focused on on-prem solutions, visionplatform.ai complements cloud curation by keeping video and models local, reducing cross-border risks.
Finally, the stack included support for vision language models, and it provided tooling to annotate complex behaviours. This helped computer vision developers bootstrap models that link events to language, so operators receive meaningful, explainable outputs rather than raw detections.
GPU-Accelerated AI Model Training
GPUs change the economics of model training. They reduce training time from days to hours, and they allow multiple experiments to run in parallel. With GPUs, teams can iterate faster, explore hyperparameters, and deliver higher-quality models. For video workloads the parallelism of GPUs is especially valuable, because video frames create large tensors and long time-series sequences.
Training visual AI models on GPUs yields clear throughput gains. For example, using DGX-class systems can cut epoch time significantly. In Project Hafnia, GPU-accelerated pipelines helped models converge faster, which meant more experiments per month. NeMo Curator on NVIDIA DGX Cloud supported data preprocessing and batch augmentation, and it helped maintain consistent data feeds for training visual AI.
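The basic mechanics of GPU-accelerated training can be shown with a minimal PyTorch loop. This is a sketch only: the tiny random tensors stand in for real video clips, and the model is a placeholder rather than anything used in Project Hafnia. The key point is that both the model and each batch are moved to the GPU when one is available.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 256 "frames" of 3x64x64 with binary labels, standing in for real clips
frames = torch.randn(256, 3, 64, 64)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(frames, labels), batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU when present
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):                           # a few epochs are enough to show the loop
    for batch_frames, batch_labels in loader:
        batch_frames = batch_frames.to(device)   # move data to the GPU alongside the model
        batch_labels = batch_labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(batch_frames), batch_labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```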
Real and synthetic video data differ in compute demand. Synthetic sequences require rendering and physics simulation up front, but they reduce annotation overhead. Real traffic video from the pilots captures true sensor noise and environmental complexity. Combining both types lets teams strike a balance: synthetic data broadens scenario coverage while real footage provides realism and robust generalisation. The pipeline therefore mixed real and synthetic datasets to produce models trained for diverse conditions.
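A simple way to picture that mixing is chaining the two sources into one loader, as in the PyTorch sketch below. The random tensors are placeholders for real and rendered footage; in practice a weighted sampler could bias batches toward real footage if synthetic data dominates.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for the two sources: real pilot footage and rendered synthetic sequences
real_frames = TensorDataset(torch.randn(800, 3, 64, 64), torch.randint(0, 2, (800,)))
synthetic_frames = TensorDataset(torch.randn(1200, 3, 64, 64), torch.randint(0, 2, (1200,)))

# ConcatDataset chains the two, so one loader draws shuffled batches from both sources
mixed = ConcatDataset([real_frames, synthetic_frames])
loader = DataLoader(mixed, batch_size=32, shuffle=True)

print(f"{len(real_frames)} real + {len(synthetic_frames)} synthetic = {len(mixed)} samples")
```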
Cost efficiencies appear when GPUs deliver more trained models per dollar spent. The system-level gains included lower iteration cost and faster fine-tuning cycles. For teams that need to deploy AI models in production, the result is faster rollouts and better model lifecycle management. Additionally, GPU acceleration supports on-prem inference on edge devices such as NVIDIA Jetson, which helps cities deploy models without sending raw video to the cloud.
Overall, GPU-based pipelines, combined with curated annotated video data, let teams accelerate ai while keeping quality and compliance in focus. This model also supports a transition from pure video analytics to AI-assisted operations where models do more than detect; they explain, verify, and recommend actions.
Smart Cities Deployment Case Study
Genoa served as the first full-scale deployment for Project Hafnia. The city integrated curated, compliant video data into systems that support traffic management and urban sensing. Project Hafnia collected annotated sequences, then used models trained on that data to provide actionable insights. For example, the system improved vehicle flow analytics and helped planners identify congestion hot spots.
Through the pilots, models drove analytics that mattered for operations: occupancy counts, flow rates, and event summaries. This kind of output complements advanced forensic search features; control rooms can query incidents using natural language and then verify footage quickly. For readers interested in practical examples, visionplatform.ai documents its on-prem forensic search process, which turns VLM outputs into searchable, human-readable descriptions.
Project Hafnia’s rollout demonstrated measurable operational improvements. Cities saw faster incident verification and lower response times. The models trained on curated data delivered fewer false positives than legacy analytics, which reduced operator workload. In addition, the curated datasets helped create fine-tuned models that matched local conditions without sacrificing privacy and compliance.
Beyond safety, the deployment improved planning. The system provided data for heatmap occupancy analytics, and it informed decisions on lane adjustments and signal timings. For airport or transport operators wanting similar insights, resources such as vehicle detection and classification examples show how object-level data supports broader operations.
Finally, the Genoa pilot validated that compliant video data and strong curation deliver urban analytics that scale. The deployment convinced other cities to request early access and to consider similar pilots. The project therefore created a template for responsible technology adoption in urban environments.

Traffic Management and Ethical Data Governance
Traffic management is a primary use case for video-based AI. Using curated datasets, teams can train models to support intelligent traffic control and transportation analytics. These models power applications such as queue detection, vehicle counts, and anomaly flags. When deployed responsibly they help reduce congestion and improve safety.
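To show how detections turn into the counts and flags mentioned above, here is a minimal sketch. It assumes an upstream detector already provides per-frame class labels; the label names and the queue threshold are illustrative assumptions, not values from any deployed system.

```python
from collections import Counter
from typing import Dict, List

def summarise_frame(detections: List[str], queue_threshold: int = 8) -> Dict[str, object]:
    """Turn one frame's detection labels into counts and a simple queue flag."""
    counts = Counter(detections)
    vehicles = counts.get("car", 0) + counts.get("bus", 0) + counts.get("truck", 0)
    return {
        "vehicle_count": vehicles,
        "per_class": dict(counts),
        "queue_suspected": vehicles >= queue_threshold,  # naive rule, tuned per camera in practice
    }

# Example: nine vehicles in view trips the naive queue flag
print(summarise_frame(["car"] * 7 + ["bus", "truck", "pedestrian"]))
```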
Ethical governance forms the backbone of data sharing. Project Hafnia adopted controlled-access licences so researchers and AI developers could use compliant data without exposing identities. This regulatory-compliant model supports privacy and compliance by default. The platform applied privacy-preserving techniques and auditable pipelines, which made each dataset traceable and auditable.
Controlled access also means organisations can fine-tune without data leaving their jurisdiction. For teams that prefer on-prem solutions, visionplatform.ai keeps video, models, and reasoning inside the operational environment, which reduces cross-border data risk. This approach helps systems meet the EU AI Act while enabling model fine-tuning and deployment of AI solutions in secure contexts.
Privacy-by-design measures included pre-annotation at capture, controlled redaction, and metadata management. The legal and technical architecture provided traceability, which satisfies both auditors and procurement teams. In practice this allowed cities to deploy ai-driven traffic management tools while preserving citizens’ rights and data privacy.
Ethical sourcing also matters at scale. By using ethically sourced, annotated video data and clear licences the initiative reduced ambiguity about reuse. As a result, cities could deploy models without compromising safety or compliance. The combination of data curation, regulation-ready processes, and GPU-accelerated training created a realistic path to deploy ai models that improve urban mobility, public safety, and operational efficiency.
FAQ
What is Project Hafnia?
Project Hafnia is a 12-month initiative led by Milestone Systems to build a platform for compliant video data and model training. The programme focuses on secure curation, annotation, and accessible datasets for AI development.
Who are the main partners in the project?
Milestone Systems partnered with NVIDIA and Nebius to deliver the technical stack and cloud orchestration. The collaboration combined data curation tools, GPU acceleration, and regulatory workflows.
How does NeMo Curator help?
NeMo Curator streamlines dataset labelling, validation, and export for training pipelines. It supports traceable curation and helps produce regulation-ready datasets that are suitable for model fine-tuning.
Where has Project Hafnia been deployed?
Genoa was an early deployment city that validated the platform in a real urban environment. The pilots demonstrated improvements in traffic management and operational analytics.
How does GPU acceleration improve training?
GPUs reduce training time and allow more experiments per cycle, which increases model quality and lowers iteration cost. The result lets teams fine-tune models faster and deploy AI solutions more quickly.
Can cities maintain data privacy while using these models?
Yes. Controlled-access licences, pre-annotation, and privacy-by-design pipelines make datasets auditable and compliant. These mechanisms support regulation-ready deployments without compromising data privacy.
How do vision language models fit into the system?
Vision language models convert video events into descriptive text, enabling natural-language search and forensic workflows. This enhances operator understanding and supports automated reasoning inside control rooms.
What role does visionplatform.ai play?
visionplatform.ai offers an on-prem Vision Language Model and agent layer that turns detections into reasoning and action. This complements cloud curation by keeping video and models local, improving compliance and operational value.
How are synthetic and real video data balanced?
Teams combine synthetic video to widen scenario coverage with real footage to capture sensor noise and realism. This hybrid strategy improves generalisation for computer vision models.
How can an organisation get early access or learn more?
Many pilots offered early access to cities and research partners to validate the approach. Interested organisations should consult project partners and technical documentation to plan compliant deployments.