Skip to main content

Azure Active Directory’s gateway is on .NET Core 3.1!

Azure Active Directory’s gateway service is a reverse proxy that fronts hundreds of services that make up Azure Active Directory (Azure AD). If you’ve used services such as office.com, outlook.com, or xbox.live.com, then you’ve used Azure AD’s gateway. The gateway provides features such as TLS termination, automatic failovers/retries, geo-proximity routing, throttling, and tarpitting to services in Azure AD. The gateway is present in more than 53 Azure datacenters worldwide and serves ~115 Billion requests each day. Up until recently, Azure AD’s gateway was running on .NET Framework 4.6.2. As of September 2020, it’s running on .NET Core 3.1.

Motivation for porting to .NET Core

The gateway’s scale of execution results in significant consumption of compute resources, which in turn costs money. Finding ways to reduce the cost of executing the service has been a key goal for the team behind it. The buzz around .NET Core’s focus on performance caught our attention, especially since TechEmpower listed ASP.NET Core as one of the fastest web frameworks on the planet. We ran our own benchmarks on gateway prototypes on .NET Core and the results made the decision very easy: we must port our service to .NET Core.

Does .NET Core performance translate to real-life cost savings?

It absolutely does. In Azure AD gateway’s case, we were able to cut our CPU costs by 50%.

The gateway used to run on IIS with .NET Framework 4.6.2. Today, it runs on IIS with .NET Core 3.1. The image below shows that our CPU usage was reduced by half on .NET Core 3.1 compared to .NET Framework 4.6.2 (effectively doubling our throughput).

Azure Active Directory’s gateway service is on .NET Core

As a result of the gains in throughput, we were able to reduce our fleet size from ~40k cores to ~20k cores (50% reduction).

Image CoresReduction 2

How was the port to .NET Core achieved?

The porting was done in 3 phases.

Phase 1: Choosing an edge webserver.

When we started the porting effort, the first question we had to ask ourselves was which of the 3 webservers in .NET Core do we pick?

We ran our production scenarios on all 3 webservers, and we realized it all came down to TLS support. Given the gateway is a reverse proxy, support for a wide range of TLS scenarios is critical.

Kestrel:

  • When we started our migration (November 2019), Kestrel did not support client certificate negotiation nor revocation on a per-hostname basis. In .NET 5.0, support for these features was added.
  • As for .NET 5.0, Kestrel (via its reliance on SslStream) does not support CTL stores on a per hostname basis. Support is expected in .NET 6.0.

HTTP.sys:

  • HTTP.sys server had a disconnect between the TLS configuration at Http.Sys layer and the .NET implementation: Even when a binding is configured to not negotiate client certificates, accessing the Client certificate property in .NET Core triggers an unwanted TLS renegotiation.

  • For example, performing a simple null check in C# renegotiates the TLS handshake:

    if (HttpContext.Connection.ClientCertificate != null)
    

    This has been addressed in: https://github.com/dotnet/aspnetcore/issues/14806 in ASP.NET Core 3.1. At the time, when we started the port in November 2019, we were on ASP.NET Core 2.2 and therefore did not pick this server.

IIS:

  • IIS met all our requirements for TLS, so that’s the webserver we chose.

Phase 2: Migrating the application and dependencies.

As with many large services and applications, Azure AD’s gateway has many dependencies. Some were written specifically for the service, and some written by others inside and outside of Microsoft. In certain cases, those libraries were already targeting .NET Standard 2.0. For others, we updated them to support .NET Standard 2.0 or found alternative implementations, e.g. removing our legacy Dependency Injection library and instead using .NET Core’s built-in support for dependency injection. The .NET Portability Analyzer was of great help in this step.

For the application itself:

  • Azure AD’s gateway used to have a dependency on IHttpModule and IHttpHandler from classic ASP.NET, which don’t exist in ASP.NET Core. So, we re-wrote the application using the middleware constructs in ASP.NET Core.
  • One of the things that really helped throughout the migration is Azure Profiler (a service that collects performance traces at runtime on Azure VMs). We deployed our nightly builds to test beds, used wrk2 as a load agent to test the scenarios under stress and collected Azure Profiler traces. These traces would then inform us of the next tweak necessary to extract peak performance from our application.

Phase 3: Rolling out gradually.

The philosophy we adopted during rollout was to discover as many issues as possible with little or no production impact.

  • We deployed our initial builds to test, integration and DogFood environments. This led to early discovery of bugs and helped in fixing them before hitting production.
  • After code complete, we deployed the .NET Core build to a single production machine in a scale unit. A scale unit is a load balanced pool of machines.
    • The scale unit had ~100 machines, where 99 machines were still running our existing .NET Framework build and only 1 machine was running the new .NET Core build.
    • All ~100 machines in this scale unit receive the exact type and amount of traffic. Then, we compared status codes, error rates, functional scenarios and performance of the single machine to the remaining 99 machines to detect anomalies.
    • We wanted this single machine to behave functionally the same as the remaining 99 machines, but have much better performance/throughput and that’s what we observed.
  • We also “forked” traffic from live production scale units (running .NET Framework build) to .NET Core scale units to compare and contrast as indicated above.
  • Once we reached functional equivalence, we started expanding the number of Scale units running .NET Core and gradually expanded to an entire datacenter.
  • Once an entire datacenter was migrated, the last step was to gradually expand worldwide to all the Azure datacenters where Azure AD’s gateway service has a presence. Migration done!

Learnings

  • ASP.NET Core is strict about RFCs. This is a very good thing as it drives good practices across the board. However, classic ASP.NET and .NET Framework were quite a bit more lenient and that causes some backwards compatibility issues:
  • There was a performance bottleneck in FileBufferingReadStream‘s CopyToAsync() method due to multiple 1 byte copies of a n byte stream. This has been addressed in .NET 5.0 by picking a default buffer size of 4K: https://github.com/dotnet/aspnetcore/issues/24032
  • Be aware of classic ASP.NET quirks:
    • Trailing whitespace is auto-trimmed in the path:
      • foo.com/oauth   ?client=abc is trimmed to foo.com/oauth?client=abc on classic ASP.NET.
      • Over the years, customers/downstream services have taken a dependency on this path being trimmed and ASP.NET Core does not auto-trim the path. So, we had to trim trailing whitespace to mimic classic ASP.NET behavior.
    • Content-Type header is auto-generated if missing:
      • When the response is larger than zero bytes, but Content-Type header is missing, classic ASP.NET generates a default Content-Type:text/html header. ASP.NET Core does not force generate a default Content-Type header and clients who assume Content-Type header is always sent in the response start having issues. We mimicked the classic ASP.NET behavior by adding a default Content-Type when it is missing from downstream services.

Future

Porting to .NET Core resulted in doubling the throughput for our service and it was a great decision to move. Our .NET Core journey will not stop after porting. For the future, we are looking at:

The post Azure Active Directory’s gateway is on .NET Core 3.1! appeared first on .NET Blog.



source https://devblogs.microsoft.com/dotnet/azure-active-directorys-gateway-service-is-on-net-core-3-1/

Comments

Popular posts from this blog

Fake CVR Generator Denmark

What Is Danish CVR The Central Business Register (CVR) is the central register of the state with information on all Danish companies. Since 1999, the Central Business Register has been the authoritative register for current and historical basic data on all registered companies in Denmark. Data comes from the companies' own registrations on Virk Report. There is also information on associations and public authorities in the CVR. As of 2018, CVR also contains information on Greenlandic companies, associations and authorities. In CVR at Virk you can do single lookups, filtered searches, create extracts and subscriptions, and retrieve a wide range of company documents and transcripts. Generate Danish CVR For Test (Fake) Click the button below to generate the valid CVR number for Denmark. You can click multiple times to generate several numbers. These numbers can be used to Test your sofware application that uses CVR, or Testing CVR APIs that Danish Govt provide. Generate

How To Iterate Dictionary Object

Dictionary is a object that can store values in Key-Value pair. its just like a list, the only difference is: List can be iterate using index(0-n) but not the Dictionary . Generally when we try to iterate the dictionary we get below error: " Collection was modified; enumeration operation may not execute. " So How to parse a dictionary and modify its values?? To iterate dictionary we must loop through it's keys or key - value pair. Using keys

How To Append Data to HTML5 localStorage or sessionStorage?

The localStorage property allows you to access a local Storage object. localStorage is similar to sessionStorage. The only difference is that, while data stored in localStorage has no expiration time untill unless user deletes his cache, data stored in sessionStorage gets cleared when the originating window or tab get closed. These are new HTML5 objects and provide these methods to deal with it: The following snippet accesses the current domain's local Storage object and adds a data item to it using Storage.setItem() . localStorage.setItem('myFav', 'Taylor Swift'); or you can use the keyname directly as : localStorage.myFav = 'Taylor Swift'; To grab the value set in localStorage or sessionStorage, we can use localStorage.getItem("myFav"); or localStorage.myFav There's no append function for localStorage or sessionStorage objects. It's not hard to write one though.The simplest solution goes here: But we can kee