hub-spoke – Let's Build Azure!

TL;DR

Starting September 30, 2025, your VMs will be blocked from the public Internet, unless you have outbound controls in place. This means:

no pulling updates from public package repositories
no pulling containers from public registries
no pushing telemetry to monitoring tools
no making calls to third-party APIs
etc., etc.

Avoid this surprise. Understand your options and the trade-offs between them.

In The Name of Security

Outbound by default is a security risk. Zero trust means you don’t trust anything by default – including egress traffic. Endpoint detection and response (EDR) and extended detection and response (XDR) technologies, such as Defender for Endpoint, are not perfect. Zero-day exploits leave your machine vulnerable, even if you diligently apply the latest security patches.

The defense in depth mindset means you have multiple layers of controls to prevent attacks and data exfiltration. Managing egress traffic adds controls to the network layer (above and beyond controls you have on the VM itself).

As a relevant example of defense in depth, assume you are managing an API only intended for trusted third parties. There are many ways to implement authentication and authorization between clients connecting to your API and the services handling those requests (access tokens, OAuth, mutual TLS, etc.). If the source IP(s) of the trusted party are known and consistent, you can add another layer to your defenses with source IP restrictions. This creates a situation where the API is only accessed from a known location (the IP address) using a proven identity (the authentication process).
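To make the "known location plus proven identity" idea concrete, here is a minimal Python sketch. The trusted range, the shared token, and the function names are all made up for illustration, and the token comparison is only a stand-in for real authentication such as OAuth or mutual TLS.

```python
import hmac
import ipaddress

# Hypothetical values for illustration only.
TRUSTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
EXPECTED_TOKEN = "example-shared-secret"


def from_known_location(source_ip: str) -> bool:
    """Layer 1: is the caller's source IP inside a trusted range?"""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


def has_proven_identity(token: str) -> bool:
    """Layer 2: stand-in for real authentication (OAuth, mutual TLS, ...)."""
    return hmac.compare_digest(token, EXPECTED_TOKEN)


def allow_request(source_ip: str, token: str) -> bool:
    """Both layers must pass; failing either one rejects the request."""
    return from_known_location(source_ip) and has_proven_identity(token)
```

A stolen token used from an unknown address fails the first layer, and a request from a trusted address without a valid token fails the second.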

Analogy to Pull It All Together

I love using analogies to explain technical concepts. I’ll be referring to an analogy when explaining concepts in this article.

Consider a sensitive government research facility at an undisclosed location with strict policies for sending communication packages. Assume your VM is an analyst communicating with an affiliate in a different country. There is a certain amount of trust between the two countries, but the sensitive nature of things does not allow blind trust between the two (packages are not allowed to flow freely between the two).

How will the request and response packages between these two affiliates be managed? Keep reading…

What About Network Security Groups (NSGs)?

NSGs allow you to define network layer controls for specific subnets and/or network cards (NICs). You can specify acceptable inbound and outbound IP address and port combinations.

NSGs complement how you manage outbound traffic, but they are not involved in sending the packets to the public Internet.

To explain in the analogy, the department supervisor acts like a NIC-level NSG and applies criteria to determine whether the analyst’s outbound package is going to an acceptable destination. The package is then handed to the facility supervisor, who acts like a subnet-level NSG. Using their own criteria, the facility supervisor also makes sure the package is going to an acceptable destination.

At this point, the facility supervisor hands the package to a courier. The courier ensures the package (which is going to a different country) is marked in a way that responses can be returned, without exposing the origin of the package (which is the undisclosed facility).

While the department supervisor and facility supervisor act like NSGs, the courier plays a very different function and is not an NSG. This article focuses on the different ways to set up couriers.
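The two supervisors can be sketched in a few lines of Python. This is a simplified model, not Azure's implementation: the `Rule` class, the example rule sets, and the addresses are hypothetical. It reflects two behaviors of NSGs: rules are checked in priority order (lowest number first, first match wins), and an outbound packet must pass the NIC-level NSG and then the subnet-level NSG.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    priority: int          # lower numbers are evaluated first
    destination: str       # CIDR prefix, e.g. "0.0.0.0/0"
    port: Optional[int]    # None matches any port
    allow: bool


def evaluate(rules: Sequence[Rule], dest_ip: str, dest_port: int) -> bool:
    """Return the action of the first matching rule, in priority order."""
    address = ipaddress.ip_address(dest_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        in_prefix = address in ipaddress.ip_network(rule.destination)
        port_ok = rule.port is None or rule.port == dest_port
        if in_prefix and port_ok:
            return rule.allow
    return False  # nothing matched; this sketch treats that as a deny


def outbound_allowed(nic_nsg, subnet_nsg, dest_ip: str, dest_port: int) -> bool:
    """Department supervisor (NIC) first, then facility supervisor (subnet)."""
    return evaluate(nic_nsg, dest_ip, dest_port) and evaluate(subnet_nsg, dest_ip, dest_port)


# Hypothetical rule sets: only HTTPS to one partner range is acceptable.
NIC_NSG = [Rule(100, "198.51.100.0/24", 443, True), Rule(4096, "0.0.0.0/0", None, False)]
SUBNET_NSG = [Rule(200, "198.51.100.0/24", None, True), Rule(4096, "0.0.0.0/0", None, False)]
```

Notice that nothing here translates addresses or delivers the packet. Both functions only answer "is this destination acceptable?", which is exactly why NSGs are not couriers.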

Important Networking Concepts

Before jumping into the design options, it’s worth taking a moment to clarify a few applicable networking concepts. If you want to dive further into these, I’ve included links to articles I’ve found helpful.

Hub-and-Spoke Will Avoid Headaches

An ounce of prevention is worth a pound of treatment

Using hub-and-spoke design can dramatically simplify how you solve the dilemma private subnets introduce.

It’s rare to see an entire application landscape (both Production and Non-Production environments) running within a single VNET. At a minimum, you want to have a Production and Non-Production environment for each application and those should be in a separate VNET.

I totally understand you can technically put all the environments for your application(s) in a single VNET; however, it reflects low operational maturity. This typically implies the team doing development has full access to the production environments, not to mention you are exposed to lateral movement by an attacker.

Moreover, if you’re putting both Production and Non-Production environments in the same VNET, it complicates access controls and will make it nearly impossible to pass a compliance audit for standards like SOC2, ISO27001, HIPAA/HITRUST, or NIST 800-53!

With that said, it is also very common for an application to communicate with other applications/services that are not exposed to the public Internet. These applications/services may be running in:

other Azure VNETs
other public clouds
other private clouds
other on-premises datacenters

Regardless of which specific scenario, “hub-and-spoke topology” is a tried-and-true scalable design to manage private network traffic flows between these services and public Internet egress.

Explained in simplest terms, individual VNETs are peered with a single common VNET. If you diagram it out, this looks like a wagon wheel: the common VNET is the hub (connected to the axle) and the spokes span outward.

The hub network acts as a common point for communication that passes outside a spoke VNET (whether the destination is the Internet, a different spoke peered to the hub, a spoke peered to a different hub, or somewhere on the other end of a VPN or ExpressRoute connection).
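The wagon wheel can be modeled as a tiny graph. The sketch below is only a mental model with made-up VNET names. It captures the fact that VNET peering is not transitive: two spokes that are each peered with the hub cannot talk to each other directly, so anything leaving a spoke passes through the hub. In a real deployment that transit also needs routing and a firewall or router in the hub, which this sketch ignores.

```python
# Hypothetical network names for illustration.
HUB = "hub-vnet"
PEERINGS = {
    frozenset({HUB, "spoke-prod"}),
    frozenset({HUB, "spoke-nonprod"}),
}
# Destinations that are only reachable through the hub in this model.
EXTERNAL_VIA_HUB = {"internet", "on-premises"}


def peered(a: str, b: str) -> bool:
    """True when the two VNETs have a direct peering."""
    return frozenset({a, b}) in PEERINGS


def path(source: str, destination: str):
    """Return the list of hops from source to destination, or None."""
    if peered(source, destination):
        return [source, destination]
    if not peered(source, HUB):
        return None  # the source is not a spoke of this hub
    if destination in EXTERNAL_VIA_HUB or peered(HUB, destination):
        return [source, HUB, destination]
    return None
```

Calling `path("spoke-prod", "spoke-nonprod")` or `path("spoke-prod", "internet")` shows the hub as the middle hop in both cases, which is what makes it the natural place for shared controls.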

Dive Deeper into Hub and Spoke

Hub-and-spoke topologies can be implemented through structured VNET peering, the Azure Virtual WAN service, or Azure Virtual Network Manager.

Source Network Address Translation (SNAT)

Source Network Address Translation (SNAT) is the process of translating a private IP address to a public IP address. The service performing SNAT uses ports to keep track of the different translations to make sure response packets (which are sent to the public IP address) get routed back to the correct private IP address (the requester).

Going back to the analogy, the courier needs to mark the package in a way that the recipient can send a response, but since the origin is an undisclosed facility, the courier can’t just put the origin address on the package – the courier applies a return address so when the response comes back the courier can forward the package back to the facility of origin.

If the courier gets overwhelmed with packages, shipping to the destination will slow down or the courier may just start rejecting new packages. A high volume of outbound requests can lead to SNAT exhaustion, which means outbound requests will intermittently fail. This point is discussed later and is important to understand when selecting the best option for managing outbound traffic.
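Here is a toy version of the courier's ledger in Python. It is a deliberately simplified sketch: real SNAT implementations track full connection tuples and can reuse a port for different destinations, and the `SnatTable` class and addresses are invented for this example. What it does show is the port-to-requester mapping and what "exhaustion" means when the pool of ports runs dry.

```python
class SnatTable:
    """Maps each outbound flow to a port on a single public IP."""

    def __init__(self, public_ip, ports):
        self.public_ip = public_ip
        self._free = list(ports)   # ports not currently in use
        self._by_port = {}         # public port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """Pick a free public port and remember who it belongs to."""
        if not self._free:
            raise RuntimeError("SNAT exhaustion: no free ports")
        port = self._free.pop(0)
        self._by_port[port] = (private_ip, private_port)
        return self.public_ip, port

    def route_response(self, public_port):
        """Find the private requester for a response arriving on a port."""
        return self._by_port[public_port]

    def release(self, public_port):
        """Return a port to the pool once the connection is finished."""
        del self._by_port[public_port]
        self._free.append(public_port)
```

With a pool of only two ports, a third simultaneous request raises the exhaustion error until one of the earlier connections is released. Scale the pool up and you have the same failure mode the options below try to avoid.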

Dive Deeper into SNAT

Use Source Network Address Translation (SNAT) for outbound connections provides a deeper explanation of SNAT. All of the options discussed in this article rely on SNATing to manage outbound requests.

Give Me Some Options Already

Secure your subnet via private subnet and explicit outbound methods provides a good technical description of options. This article is not intended to regurgitate information in that article; rather, it provides a more opinionated take on the options and decision flow from an architect’s perspective.

Virtual Appliance/Firewall (My Preference)

This is a “go-to” design for enterprise security. I am disappointed how little focus the aforementioned article gave to this approach. This design effectively shifts details for managing Internet outbound requests to the firewall implementation, thereby reducing concerns on how to achieve outbound Internet connections from the spoke VNET. In other words, the team creating application specific infrastructure going into the VNET (the “applistructure”) only has to make sure the firewall will allow requests to the destinations – they don’t need to worry about implementing other methods described in this article.

When you are using the previously described hub-and-spoke design, user-defined routes (UDRs) send requests destined outside the immediate VNET (targeting public Internet, other spokes, VPN gateways, etc.) to a firewall that resides in the hub VNET. This can be either an Azure Firewall or, if you prefer third-party solutions, a virtual appliance from a vendor like Palo Alto, Cisco, or Check Point.
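The routing decision can be sketched with a longest-prefix match, which is how a route is selected when several prefixes cover the same destination. The route table below is hypothetical: the spoke address space, the firewall's private IP, and the next-hop labels are assumptions chosen for the example.

```python
import ipaddress

# Hypothetical effective routes for a subnet in a spoke VNET.
ROUTES = [
    ("10.1.0.0/16", "VnetLocal"),                 # the spoke's own address space
    ("0.0.0.0/0", "VirtualAppliance 10.0.0.4"),   # UDR: everything else goes to the hub firewall
]


def next_hop(dest_ip: str, routes=ROUTES):
    """Return the next hop of the most specific route containing dest_ip."""
    address = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop
```

Traffic to `10.1.2.3` stays inside the spoke because the /16 is more specific, while traffic to another spoke or to a public address only matches the default route and lands on the firewall.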

In the hub-and-spoke rant above, I link to articles describing different methods to implement that pattern and those all include details about enabling firewalls in the hub.

Regardless of the technology, the firewall is responsible for SNATing outbound requests to the public Internet – but it only allows requests if firewall rules agree.

Pointing to the analogy, the firewall is like a courier that handles forwarding the package to its destination. Unlike other methods described in this article, this Firewall Courier applies their own set of rules to ensure the destination is acceptable. The Firewall Courier may even inspect contents of the package to make sure nothing that shouldn’t be sent gets sent.
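As a rough sketch of the Firewall Courier's behavior, the Python below checks a destination against an allow-list and only then "stamps" the package with the firewall's public address. It does not represent the rule syntax of Azure Firewall or any vendor product; the allow-list entries, the public IP, and the `forward` function are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list and firewall address for illustration.
ALLOWED_FQDNS = ["*.ubuntu.com", "mcr.microsoft.com"]
ALLOWED_PORT = 443
FIREWALL_PUBLIC_IP = "203.0.113.5"


def forward(source_ip: str, dest_fqdn: str, dest_port: int) -> dict:
    """Deny by default; SNAT only the requests that match a rule."""
    name = dest_fqdn.lower()
    permitted = dest_port == ALLOWED_PORT and any(
        fnmatchcase(name, pattern) for pattern in ALLOWED_FQDNS
    )
    if not permitted:
        return {"action": "deny", "source_seen_by_destination": None}
    # The destination only ever sees the firewall's public address,
    # never the private source_ip of the VM in the spoke.
    return {"action": "allow", "source_seen_by_destination": FIREWALL_PUBLIC_IP}
```

The application team's job shrinks to one question: is my destination on the list? Everything about how the packet actually reaches the Internet stays with the firewall.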

NAT Gateway

This approach is the most scalable method of managing outbound requests, but it is not the most secure – it doesn’t apply any restrictions to outbound requests.

There are a few situations to use the NAT Gateway approach.

Hub-and-spoke with a firewall is just too much for the situation (too complicated, too costly and you don’t need the security, etc.), AND you need more scalability than Public Load Balancer or Public IP options provide
The hub firewall’s outbound requests exceed the capacity of its Public IP or Public Load Balancer (SNAT exhaustion), so you need to place a NAT Gateway between the firewall and the public Internet
The hub firewall appliance cannot be associated with Public IP addresses or use Public Load Balancer (for some odd reason)

Reverting to the analogy, the NAT Gateway Courier doesn’t apply rules to determine if the destination is acceptable nor does it inspect package contents. This means, if you are directly sending traffic to a NAT Gateway, NSGs are the only method you can use to control acceptable destinations (which can be a very complex endeavor). Of course, if the Firewall Courier is subcontracting to NAT Gateway Courier, the Firewall Courier is only passing along packages that have been approved (so you get both scalability of NAT Gateway and security of a firewall).

Dive Deeper into NAT Gateway

Azure NAT Gateway documentation provides a lot more detail on how the NAT Gateway scales and can be implemented.

Public Load Balancer with Outbound Rules

WARNING: Understand the risks whenever you put VMs behind a Public Load Balancer

The public Internet is more vicious than you think. If you don’t believe me, create a honeypot and see how long it takes for your machine to get attacked or even fully pwned – you may even find your device in search results on shodan.io

This design leaves all the VMs receiving traffic from the Public Load Balancer exposed to the public Internet – so inbound NSGs are essential. NSGs provide a line of defense before inbound packets get to the VM itself (and NSGs have a default rule that denies all inbound requests from the Internet).

If you are putting a pool of virtual appliances behind the Public Load Balancer, that is a slightly different scenario (vendors harden those machines), but NSGs are still recommended to ensure administration ports cannot be accessed by untrusted sources over the public Internet.

Having fully emphasized the security concerns, this design has limitations:

All VMs sending outbound traffic through the Public Load Balancer must be in the same VNET
If VMs send too much traffic through the Public Load Balancer (SNAT exhaustion), you need to associate more Public IP addresses with the Load Balancer’s front end. There is a limit to the number of Public IP addresses you can associate with the Public Load Balancer, so there is a hard ceiling – but honestly, if you have that many VMs in the same VNET, you may want to reconsider your network design.
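The trade-off in the second limitation is simple arithmetic, sketched below in Python. The defaults are assumptions you should verify against the current Load Balancer documentation before relying on them: 64,000 SNAT ports per frontend public IP, and per-instance allocations rounded down to a multiple of 8. Both function names are made up for this example.

```python
def ports_per_instance(frontend_ips: int, backend_instances: int,
                       ports_per_ip: int = 64_000, multiple: int = 8) -> int:
    """SNAT ports each VM gets when the pool is split evenly."""
    if frontend_ips < 1 or backend_instances < 1:
        raise ValueError("need at least one frontend IP and one backend instance")
    share = (frontend_ips * ports_per_ip) // backend_instances
    return share - (share % multiple)  # round down to the allocation multiple


def frontend_ips_needed(backend_instances: int, ports_wanted_per_instance: int,
                        ports_per_ip: int = 64_000) -> int:
    """Smallest number of frontend IPs that covers the desired allocation."""
    total = backend_instances * ports_wanted_per_instance
    return -(-total // ports_per_ip)  # ceiling division
```

Under those assumptions, one frontend IP shared by 10 VMs leaves 6,400 ports per VM, and giving 100 VMs 1,024 ports each needs two frontend IPs. Every VM you add to the pool either shrinks everyone's share or pushes you toward the cap on frontend IPs.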

Dive Deeper into Public Load Balancer

Use Source Network Address Translation (SNAT) for outbound connections describes this approach in more detail, but it also describes several of the outbound methods in this article.

Instance Level Public IP Address

WARNING: Understand the risks whenever you associate a Public IP address with a VM

Just like the Public Load Balancer approach, inbound NSGs are critical if you use this design.

From a cost and scalability perspective, this design only provides a single VM the ability to communicate with the Internet. Other VMs in the same network cannot connect to the Internet. If you need to enable outbound communication on many VMs, consider one of the other approaches.

In Closing

There is no silver bullet for enabling outbound connectivity; each approach has trade-offs in security, scalability, and cost. After reading this and examining the decision tree, you are well-equipped to move from problem (my VMs can’t initiate requests to the public Internet) to solution (one of the four approaches described in this article).

Private Subnets: Don’t Let Your VM Get Trapped