Top 20 Data Science Interview Questions and Answers – IQ Online Training - PowerPoint PPT Presentation

Title:

Top 20 Data Science Interview Questions and Answers – IQ Online Training

Description:

Data science deals with systems and processes that extract knowledge from data in its various forms by using certain. To take a Data Science online training course at flexible hours, you can always reach IQ Online Training.

Slides: 14
Provided by: charlie963
Transcript and Presenter's Notes

1
DATA SCIENCE
LEARN ANYTIME ANYWHERE
Contact: Cell 17325938450, 919989527180, 91 6302830890
Mail: info_at_iqtrainings.com
2
What is data cleansing?
Data cleansing, or data cleaning, improves the
quality of data by identifying errors and
inconsistencies in the data and removing them.
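The definition above can be illustrated with a minimal sketch in plain Python. The record layout, the field names and the valid age range of 0 to 120 are assumptions made for this example only; real cleaning rules depend on the data set.

```python
def clean_records(records):
    """Drop records with missing or illegal values, normalize names, remove duplicates."""
    cleaned, seen = [], set()
    for record in records:
        name, age = record.get("name"), record.get("age")
        if not name or age is None:            # missing value
            continue
        if not 0 <= age <= 120:                # illegal value (assumed valid range)
            continue
        name = " ".join(name.split()).title()  # unify varying representations
        key = (name, age)
        if key in seen:                        # duplicate entry
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw data with typical quality problems.
raw = [
    {"name": "asha rao", "age": 31},
    {"name": "Asha  Rao", "age": 31},    # same person, different representation
    {"name": "Li Wei", "age": None},     # missing age
    {"name": "Tom Diaz", "age": 430},    # illegal age
    {"name": "Maria Silva", "age": 45},
]
cleaned = clean_records(raw)
print(cleaned)
```

Only the two valid, distinct records survive; the other three are dropped as a duplicate, a missing value and an illegal value.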
Explain why data cleaning plays a major role in
data analysis.
Cleaning data that is taken from many different
sources and arranging it in a format that any
data scientist or data analyst can use easily is
a difficult process. As the volume of generated
data and the number of data sources increase, the
time it takes to clean the data increases as
well. Because cleaning takes up a major part of
the time, it has become a major part of the
analysis task.
Which of the two would you prefer for text
analysis, Python or R?
Python is mostly preferred because of the pandas
library, which provides easy-to-use data
structures. When it comes to ad hoc analysis and
exploring databases, however, R does better.
3
What is logistic regression?
Logistic regression is a statistical method for
analyzing a dataset in which one or more
independent variables determine an outcome. The
outcome is categorical, most often binary (for
example yes or no).
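As a rough illustration of how independent variables map to an outcome, the sketch below applies the logistic (sigmoid) function to a weighted sum of features. The weights, bias and feature values are made-up numbers for this example, not fitted coefficients, and the 0.5 decision threshold is an assumption.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the positive outcome for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for two independent variables.
weights = [0.8, -0.5]
bias = -1.0

p = predict_probability([3.0, 1.0], weights, bias)
label = "yes" if p >= 0.5 else "no"   # assumed decision threshold of 0.5
print(round(p, 3), label)
```

In practice the weights and bias would be estimated from training data rather than chosen by hand.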
Differentiate between univariate, bivariate and
multivariate analysis.
These are statistical analysis techniques that
are distinguished by the number of variables
involved at one time.
Univariate analysis involves only one variable,
for example a pie chart of sales.
Bivariate analysis examines the relationship
between two variables, for example a scatter plot
of sales against spending.
Multivariate analysis involves more than two
variables.
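The distinction can be made concrete with a small sketch: a univariate summary looks at one variable on its own, while a bivariate measure such as the Pearson correlation relates two variables. The sales and spending figures are invented for the example.

```python
import math
import statistics

# Hypothetical figures for five periods.
sales = [10.0, 12.0, 15.0, 19.0, 24.0]
spending = [1.0, 2.0, 3.0, 4.0, 5.0]

# Univariate: summarize a single variable.
mean_sales = statistics.mean(sales)
spread_sales = statistics.stdev(sales)


def pearson(xs, ys):
    """Bivariate: Pearson correlation coefficient between two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(var_x * var_y)


r = pearson(spending, sales)
print(mean_sales, round(spread_sales, 2), round(r, 3))
```

A multivariate analysis would extend the same idea to three or more variables at once, for example with multiple regression.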
4
Explain the steps in making a decision tree.
1. Take the entire data set as input.
2. Look for a split (one that divides the given
data into two sets) that maximizes the separation
of the classes, and split the input data on it.
3. Re-apply steps 1 and 2 to each of the split
data sets.
4. Stop when a stopping criterion is met.
5. If the tree has been split too many times,
clean it up. This step is called pruning.
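The "look for a split" step can be sketched as follows. The text does not say how separation of the classes is measured; this example assumes Gini impurity on a single numeric feature, which is one common choice. The data values are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: customer age and whether the customer bought.
ages = [22, 25, 30, 41, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)
```

A full tree builder would call `best_split` recursively on each resulting subset until a stopping criterion, such as a maximum depth or a minimum group size, is met.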
Compare SAS, R and Python.
SAS: This is one of the analytical tools
popularly used by some of the big companies. It
has some of the best statistical functions and a
GUI, but it comes with a price tag, which limits
its use by small companies.
R: R covers this drawback of SAS because it is an
open source tool. This could be the reason for
its wide use by academia and the research
community. R is mostly used for statistical
computation, reporting and graphical
representation. Since it is an open source tool,
updates reach users quickly.
5
Python: Python is also an open source programming
language. It is one of the easier programming
languages to learn, and it can be integrated
easily with most other tools and technologies. It
is a very robust language with a large number of
libraries and modules created by the community.
What is the difference between data mining and
data profiling?
Data profiling: It targets individual attributes
and gives information about them, such as
discrete values and their frequencies, data type,
length and value range.
Data mining: It focuses on detecting unusual
records, cluster analysis, and dependencies or
relations between several different attributes.
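A minimal sketch of data profiling for a single attribute, covering the items listed above (value frequencies, data type, length and value range). The function name and the sample column are hypothetical, and the sketch assumes the column holds at least one non-missing value.

```python
from collections import Counter


def profile_column(values):
    """Summarize one attribute: types, value frequencies, missing count, range, length."""
    present = [v for v in values if v is not None]
    return {
        "types": sorted({type(v).__name__ for v in present}),
        "frequencies": Counter(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "max_length": max(len(str(v)) for v in present),
    }


# Hypothetical column of city names with one missing entry.
cities = ["Pune", "Delhi", "Pune", None, "Mumbai"]
report = profile_column(cities)
print(report)
```

Data mining, by contrast, would look across several attributes at once, for example to find clusters or unusual records.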
6
How are statistics used by data scientists?
Statistics are of great use to data scientists
for identifying hidden insights and patterns and
for converting big data into big insights about
customer behavior and expectations. This helps
data scientists learn about everything from
customer behavior to customer conversion and
build powerful data models for inferences and
predictions. In this way they help businesses
give customers what they want, when they want
it.
• What are some of the common problems faced by
data analysts?
• The following are some of the most common
problems a data analyst faces.
• Common misspellings
• Duplicate entries
• Missing values
• Illegal values
• Varying value representations
• Identifying overlapping data

7
What is the goal of A/B testing?
A/B testing is a statistical experiment with two
variants, A and B, that is run to identify which
of the two performs better, for example which
version of a webpage. When running a banner ad,
for instance, it is used to compare click-through
rates.
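The banner-ad example can be sketched as a comparison of two click-through rates. The text does not name a statistical test; the two-proportion z-test below is an assumption chosen for illustration, and the click and impression counts are invented.

```python
import math


def click_through_rate(clicks, impressions):
    """Share of impressions that resulted in a click."""
    return clicks / impressions


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic for the difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical results for banner variants A and B.
ctr_a = click_through_rate(200, 10_000)
ctr_b = click_through_rate(260, 10_000)
z = two_proportion_z(200, 10_000, 260, 10_000)

# Two-sided p-value under the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(ctr_a, ctr_b, round(z, 2), round(p_value, 4))
```

A small p-value suggests that the difference between the two variants is unlikely to be due to chance alone.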
What are the different data validation methods
that data analysts use?
Data analysts usually use two methods for data
validation:
• Data screening
• Data verification
Explain the hierarchical clustering algorithm.
A hierarchical clustering algorithm creates a
hierarchical structure by combining or dividing
existing groups. The structure shows the order in
which the groups are merged or divided.
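The combining variant can be sketched as a bottom-up (agglomerative) procedure that records the order of merges. Using one-dimensional points and single linkage (distance between the closest members of two clusters) is an assumption made to keep the example short.

```python
def agglomerative(points):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical one-dimensional data.
merges = agglomerative([1.0, 2.0, 6.0, 7.5, 20.0])
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

The list of merges is the hierarchy: cutting it at a chosen distance yields a flat set of clusters.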
8
What are the various steps in an analytics
project?
• Understand what exactly the business problem
is.
• Explore the given data and become familiar with
it.
• Prepare the data for modeling.
• Run the model, analyze the result and iterate
until the best outcome is achieved.
• Validate the model using a new data set.
• Implement the model, track the results and
analyze the performance of the model.
What is linear regression?
Linear regression describes the relation between
a dependent variable and an independent variable.
It is mostly used for predictive analysis, for
example of sales or prices, where it predicts
values in a continuous range rather than
classifying them into categories.
9
The following are the three steps involved in
linear regression:
• Determining and analyzing the direction and
correlation of the data
• Estimating the model
• Making sure the model is useful and has good
validity
Linear regression is mostly used in cases where
we want to determine the cause of an effect. For
instance, with linear regression we can estimate
the effect of a certain action on the various
outcomes and on the final outcome.
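Estimating the model can be sketched with ordinary least squares for one independent variable. The ad spend and sales figures are invented and lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data following sales = 2 * ad_spend + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 6.0 + intercept   # predict sales for an unseen spend of 6
print(slope, intercept, predicted)
```

The sign of the slope gives the direction of the relationship, and the prediction is a continuous value rather than a category.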
What is the normal distribution?
The normal distribution is a continuous
probability distribution in which the values of a
continuous variable are spread along a
symmetrical, bell-shaped curve. It is very useful
in statistics and in analyzing variables and
their relationships. By the Central Limit
Theorem, as the sample size increases, the
distribution of the sample mean approaches a
normal distribution even when the underlying data
is not normally distributed.
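The effect of sample size can be seen in a small simulation, sketched here under a few assumptions: samples are drawn from a uniform distribution (which is not normal), the seed is fixed so the run is repeatable, and the number of trials is arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(sample_size, trials=2000):
    """Means of repeated samples drawn from a uniform (non-normal) distribution."""
    return [
        statistics.fmean(random.random() for _ in range(sample_size))
        for _ in range(trials)
    ]


small = sample_means(2)
large = sample_means(50)

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

# For a normal distribution, about 68% of values lie within one
# standard deviation of the mean.
center = statistics.fmean(large)
within_one_sd = sum(abs(m - center) <= spread_large for m in large) / len(large)
print(round(spread_small, 3), round(spread_large, 3), round(within_one_sd, 2))
```

With larger samples the means cluster more tightly around 0.5, and the share of them within one standard deviation comes close to the value expected for a normal curve.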
10
• What is clustering? What are the properties of
clustering algorithms?
• Clustering is a method that divides the data
set into clusters or natural groups.
• Properties of clustering algorithms are:
• Hierarchical or flat
• Hard or soft
• Iterative
• Disjunctive

What is machine learning?
Machine learning is a field of artificial
intelligence (AI) in which systems are given the
ability to learn automatically and make decisions
with very little human intervention.
11
What is a hash table?
A hash table is a data structure used to
implement an associative array, that is, a map of
keys to values. It uses a hash function to
compute an index into an array of slots, from
which the desired value is fetched.
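The description above can be sketched as a small class. Handling collisions by keeping a list of pairs in each slot (separate chaining) and the default of 8 slots are assumptions of this example; the text does not specify either. Python's built-in `dict` is the production-ready equivalent.

```python
class HashTable:
    """Minimal hash table using separate chaining for collisions."""

    def __init__(self, slots=8):
        self.slots = [[] for _ in range(slots)]

    def _index(self, key):
        # The hash function maps the key to an index into the array of slots.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("clicks", 200)
table.put("impressions", 10_000)
table.put("clicks", 260)   # updates the value stored under the same key
print(table.get("clicks"), table.get("impressions"), table.get("missing"))
```

Lookups only scan the one slot the key hashes to, which is what keeps access fast when the keys are spread evenly.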
What types of bias can occur during sampling?
• Selection bias
• Undercoverage bias
• Survivorship bias
12
Come, start learning! It's your time to join us!
Enroll now to avail a FREE live demo.
13
Thank you..!!