Nested JSON data processing using Apache Spark with Coding

This presentation explains how to process nested JSON data using Apache Spark and includes the code needed for each step.

1
Nested JSON Data processing using Apache Spark
2
Instructions for use
  • Let us read a public JSON dataset available on
    the internet, extract the required fields from
    the nested data, and analyze the dataset to get
    some insights. Here I'm using the public Baby
    names dataset for this demo.

What are we doing in this demo?
  • Read data from the URL using the Scala API
  • Convert the read data into a dataframe
  • Extract the required fields from the nested JSON
    dataset
  • Analyze the data by writing queries
  • Visualize the processed data
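The first step above, reading the JSON from a URL into a string, can be sketched as follows. This is a minimal sketch meant for spark-shell or a notebook cell. The deck does not give the dataset URL, so a temporary local file (a hypothetical stand-in, along with the sampleJson content) is used in its place. For the real dataset, url would be its http(s) address.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

// Hypothetical stand-in for the remote dataset: a tiny JSON document
// written to a temporary file so that the example runs offline.
val sampleJson = """{"meta":{"view":{"name":"Baby Names"}},"data":[["row-1","2013","OLIVIA"]]}"""
val tmp = Files.createTempFile("baby-names", ".json")
Files.write(tmp, sampleJson.getBytes(StandardCharsets.UTF_8))

// In the real demo this would be the dataset's http(s) URL (not given in the deck).
val url = tmp.toUri.toURL

// Read the whole response into one String and always close the source.
val source = Source.fromURL(url, "UTF-8")
val jsonString = try source.mkString finally source.close()

println(jsonString.take(60))
```

Keeping the whole document in a single string works for a demo-sized dataset. For large files, reading directly with the Spark reader would be the more usual choice.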

3
After this, we use the jsonString val created
above to create a dataframe using the Spark API.
We need to import spark.implicits._ to convert a
Sequence of Strings to a Dataset, and then we
create a dataframe out of it.
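A minimal sketch of this step, assuming a spark-shell or notebook session. The jsonString here is a small hypothetical sample shaped like the dataset (a meta object plus a data array of string arrays), not the real download, and the app name is arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in a notebook or spark-shell,
// and starts a local one otherwise.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-demo") // arbitrary name
  .getOrCreate()
import spark.implicits._ // provides toDS() on Seq[String]

// Hypothetical sample standing in for the downloaded jsonString.
val jsonString =
  """{"meta":{"view":{"name":"Baby Names"}},
    |"data":[["row-1","2013","OLIVIA"],["row-2","2013","LIAM"]]}""".stripMargin

// Seq[String] -> Dataset[String], then let Spark parse it as JSON.
// Each element of the Seq is treated as one JSON document.
val jsonDF = spark.read.json(Seq(jsonString).toDS())

jsonDF.printSchema()
```

Because the whole document is one element of the Seq, jsonDF has a single row, and the records live inside its data column.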
4
Now let us see the schema of the JSON using the
printSchema method.
5
 |-- data: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)
The JSON also contains metadata about the data.
Let's not worry about it for now, but you can have
a look at it when you run this on your machine.
The metadata mainly holds the column definitions,
which I have extracted for you to give a better
understanding of the data we will work on.
6
  • Each element of the data array contains the
    fields below, which we are going to analyze.
  • meta
  • Year
  • first_name
  • County
  • Sex
  • Count
  • Sid
  • Id
  • Position
  • created_at
  • created_meta
  • updated_at
  • updated_meta
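Field names like these can be pulled out of the metadata with Spark itself. The sketch below assumes the column definitions sit under meta.view.columns, each with a name attribute. That path is an assumption, since the deck does not show the metadata schema, and the sample JSON is hypothetical. Run printSchema on your own data to confirm the path.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-demo") // arbitrary name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample; the meta.view.columns layout is an assumption.
val jsonString =
  """{"meta":{"view":{"columns":[{"name":"sid"},{"name":"Year"},{"name":"first_name"}]}},
    |"data":[["row-1","2013","OLIVIA"]]}""".stripMargin
val jsonDF = spark.read.json(Seq(jsonString).toDS())

// Selecting a field through an array of structs yields an array of that field;
// explode then produces one row per column name.
val columnNames = jsonDF
  .select(explode(col("meta.view.columns.name")).as("name"))
  .as[String]
  .collect()
  .toSeq

columnNames.foreach(println)
```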

7
But how can we extract these data fields from the
JSON? Let's select data from the jsonDF dataframe
we created. It looks something like this:
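As a rough illustration of what that selection returns, the sketch below selects the data column from a hypothetical two-record sample. The sample values are made up, and only the overall shape (an array of string arrays) follows the schema shown earlier.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-demo") // arbitrary name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample with two records inside the data array.
val jsonString = """{"data":[["row-1","2013","OLIVIA"],["row-2","2013","LIAM"]]}"""
val jsonDF = spark.read.json(Seq(jsonString).toDS())

// The result is a single row whose data column holds every record.
jsonDF.select("data").show(truncate = false)

// Number of records nested inside that single row.
val recordCount = jsonDF.select(size(col("data"))).as[Int].head()
println(recordCount)
```

Since all records sit in one row, they have to be unnested before individual fields can be queried.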
8
Now we have to extract the fields within this
data. To do this, let us first create a temporary
view of this dataframe and use the explode
function to extract the Year, Name, County, and
gender fields. To use the explode function, we
should first import the Spark SQL functions.
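A minimal sketch of the explode step, assuming a spark-shell or notebook session. The sample rows, the view name records, and the array positions 8 to 11 are all assumptions: they suppose that eight system fields come before year, first name, county, and sex in each record. Check the column order in the metadata of the real dataset before relying on these indexes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-demo") // arbitrary name
  .getOrCreate()
import spark.implicits._

// Hypothetical record layout: 8 system fields, then year, name, county, sex, count.
def row(year: String, name: String, county: String, sex: String, count: String): String =
  Seq("sid", "id", "0", "0", "", "0", "", "{ }", year, name, county, sex, count)
    .map(v => "\"" + v + "\"")
    .mkString("[", ",", "]")

val rows = Seq(
  row("2013", "OLIVIA", "KINGS", "F", "10"),
  row("2013", "LIAM", "KINGS", "M", "8")
).mkString(",")
val jsonString = "{\"data\":[" + rows + "]}"
val jsonDF = spark.read.json(Seq(jsonString).toDS())

// explode turns the outer array into one row per record.
jsonDF.select(explode(col("data")).as("record"))
  .createOrReplaceTempView("records")

// Array positions are zero-based in Spark SQL; 8 to 11 are assumed here.
val insightData = spark.sql(
  """select record[8]  as year,
    |       record[9]  as name,
    |       record[10] as county,
    |       record[11] as sex
    |from records""".stripMargin)

insightData.show()
```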
9
Now let us see the schema of the insightData.
10
Let me show you the contents of the insightData
dataframe using the display method available in
Databricks.
11
Now let us write a query to find the most popular
first letter of baby names in each year.
insightData.select("year", "name").createOrReplaceTempView("yearname")
val dis = spark.sql("""
  select year, firstLetter, count, ranks
  from (
    select year, firstLetter, count,
           rank() over (partition by year order by count desc) as ranks
    from (
      select year, left(name, 1) as firstLetter, count(1) as count
      from yearname
      group by year, firstLetter
      order by year desc, count desc
    ) Y
  ) Z
  where ranks = 1
  order by year desc
""")
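The same ranking can also be expressed with the DataFrame API instead of SQL. The sketch below is an alternative formulation, not the query from the slides, and it runs against a small hypothetical insightData sample so that it is self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank, substring}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-demo") // arbitrary name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample standing in for the real insightData.
val insightData = Seq(
  ("2013", "OLIVIA"), ("2013", "OWEN"), ("2013", "LIAM"),
  ("2012", "EMMA"), ("2012", "ETHAN"), ("2012", "AVA")
).toDF("year", "name")

// Count names per year and first letter.
val letterCounts = insightData
  .select(col("year"), substring(col("name"), 1, 1).as("firstLetter"))
  .groupBy("year", "firstLetter")
  .count()

// Rank the letters within each year by descending count.
val byYear = Window.partitionBy("year").orderBy(col("count").desc)

// Keep the top-ranked letter (or letters, on ties) for each year.
val topLetters = letterCounts
  .withColumn("ranks", rank().over(byYear))
  .where(col("ranks") === 1)
  .orderBy(col("year").desc)

topLetters.show()
```

Note that rank() assigns the same rank to ties, so a year can return more than one letter, which matches the behavior of the SQL version.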
12
Now let's visualize this data using the graphs
available in Databricks.
14
Apache Spark Integration Services
  • With 15 years in data analytics technology
    services, Aegis Softwares Canada experts offer
    a wide range of Apache Spark implementation,
    integration, and development solutions, along
    with 24/7 support.

15
AEGIS SOFTWARE
OFFSHORE SOFTWARE DEVELOPMENT COMPANY
INDIA (Head Office) 319, 3rd Floor, Golden Plaza,
Tagore Road, Rajkot 360001 Gujarat, India
CANADA (Branch Office) 2 Robert Speck
Parkway, Suite 750, Mississauga, ON
Ontario-L4Z1H8, Canada.
info_at_aegissoftwares.com
www.aegissoftwares.com
16
Thank you