Skip to main content

Announcing Version 1.0 of .NET for Apache Spark

Today, we announce the release of version 1.0 of .NET for Apache® Spark™, an open source package that brings .NET development to the Apache® Spark™ platform. This release is possible due to the combined efforts of Microsoft and the open source community. Version 1.0 includes support for .NET applications targeting .NET Standard 2.0 or later. Access to the Apache® Spark™ DataFrame APIs (versions 2.3, 2.4 and 3.0) and the ability to write Spark SQL and create user-defined functions (UDFs) are also included in the release.

The .NET Bot

The following code snippet is an example of using Spark to produce a word count from a document (browse the full sample here):

var docs = spark.Read().Option("header", true).Csv("documents.csv");
var filCol = Functions.Col("file");
var words = docs
    .Select(
        fileCol,
        // "a b c" => ["a", "b", "c"]
        Functions.Split(
            Functions.Col("words"), " ")
        .Alias("wordList"))
    // flatten into one row per word
    .Select(
        fileCol,
        // 1: ["a", "b", "c"] => 1: "a", 2: "b", 3: "c"
        Functions.Explode(
            Functions.Col("wordList"))
        .Alias("word"))
    .GroupBy(fileCol, Functions.Lower(Functions.Col("word")))
    .Count();

Background

.NET for Apache® Spark™ launched two years ago to address increasing demand from the .NET community for an easier way to build big data applications. A recent survey confirmed the biggest motivation to use the package is to take advantage of existing .NET development skills and resources, including the enormous .NET ecosystem of existing libraries and frameworks. The team is committed to the continuous evolution of the product to integrate the latest features and keep the API current with the latest Spark versions. For more about the history of the project and key contributors, read the full announcement.

Get Started

There are several options to get started. First, read the full .NET for Apache Spark 1.0 announcement. Then you can:

The post Announcing Version 1.0 of .NET for Apache Spark appeared first on .NET Blog.



source https://devblogs.microsoft.com/dotnet/announcing-version-1-0-of-net-for-apache-spark/

Comments

Popular posts from this blog