Hi everyone — looking for feedback on a new infrastructure project we launched called vMetal. It's a bare metal management platform for GPU clusters that handles machine discovery, PXE booting, and lifecycle management without the OpenStack complexity. It's built around Kubernetes-native workflows so you can hand it off to teams or drop it into an existing platform. A lot of the infra platforms used for this today were designed 20 years ago (VMware, OpenStack, NVIDIA BCM, MAAS, etc.), while newer tools usually solve only a small piece of the stack. Neither was built with modern GPU cluster ops in mind. In practice, most setups end up stitching things together or building custom provisioning pipelines.
With vMetal we took a different approach: treat physical machines like programmable infrastructure resources. Compared to tools like MAAS or Tinkerbell, vMetal is designed around a few ideas:

- Bare metal lifecycle automation: automatically discover machines on the network, boot them, install OS images, and reprovision nodes as hardware moves between clusters or workloads. Built on Metal3 and Ironic.

- Built for GPU cluster ops: supports environments where nodes frequently move between clusters, capacity pools, or tenant workloads.

- Direct Kubernetes integration: provisioned machines can be attached directly to Kubernetes clusters as nodes or assigned to infrastructure pools.

- Works with Kubernetes multi-tenancy layers: integrates with vCluster (virtual clusters) and vNode (node-level isolation) so machines can move from bare metal provisioning into multi-tenant Kubernetes environments.

We've shared a few other infrastructure projects here before (DevPod, vCluster), and the feedback from HN has been incredibly helpful. Curious how others here are handling bare metal provisioning today — MAAS, Ironic, Metal3, Tinkerbell, something custom?
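Since vMetal builds on Metal3, the declarative model underneath looks roughly like a standard Metal3 BareMetalHost resource, which tells Ironic how to reach a machine's BMC and what image to provision. This is a sketch of the upstream Metal3 API, not vMetal's own CRDs; the host name, BMC address, and image URLs below are placeholders:

```yaml
# A Metal3 BareMetalHost declaring one physical machine.
# All names, addresses, and URLs here are illustrative.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: gpu-node-01
  namespace: metal3
spec:
  online: true                          # power the machine on once registered
  bootMACAddress: "00:1a:2b:3c:4d:5e"   # NIC used for PXE booting
  bmc:
    # Out-of-band management endpoint (Redfish in this example)
    address: redfish://10.0.0.10/redfish/v1/Systems/1
    credentialsName: gpu-node-01-bmc-secret   # Secret with BMC username/password
  image:
    # OS image Ironic writes to disk during provisioning
    url: http://images.example.com/ubuntu-22.04.qcow2
    checksum: http://images.example.com/ubuntu-22.04.qcow2.sha256sum
```

Applying a manifest like this (plus the referenced credentials Secret) is what "treating machines as programmable resources" amounts to in practice: registration, inspection, and provisioning are driven by reconciling the desired state in the CRD rather than by imperative scripts.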
Open to any feedback, positive or negative.
Comments URL: https://news.ycombinator.com/item?id=47433599