Outlier Detection in Data Mining: An Essential Component of Semiconductor Manufacturing


Provided by: yieldWerx_YMS


Transcript and Presenter's Notes

Outlier detection is a critical research field within data mining because of its wide range of applications, including fraud detection, cybersecurity, health diagnostics and, significantly, semiconductor manufacturing. It refers to identifying data points that deviate significantly from expected patterns, which can provide crucial insights into the data. However, several factors often complicate the process: the ambiguous boundary between outliers and normal behavior, evolving definitions of 'normal', techniques that are specific to one application, and noisy data that mimics outliers.
This review article analyzes the most advanced outlier detection methods and outlines prospects for future research.

Defining Outliers

The term outlier refers to a data point that deviates significantly from expected behavior or is substantially dissimilar from the other points in a dataset. Outliers have various causes, including mechanical faults, changes in system behavior, human errors, and environmental changes. Identifying and handling outliers remains a complex, ongoing problem in machine learning and data mining. The task goes by several names, such as outlier mining, novelty detection, outlier modeling, and anomaly detection.

Techniques for Outlier Detection

The approaches to identifying outliers are many and varied, and each relies on a different principle. The key methods are highlighted below.

Statistical-Based Methods

These methods measure how far a data point deviates from a statistical model. They assume that regular data points occur in high-probability regions of a stochastic model, while outliers fall in low-probability regions.

Distance-Based Methods

Distance-based methods focus on the distance of a data point from other points. An outlier, in this context, is a data point that lies exceptionally far from the others.

Density-Based Methods

This approach treats points in sparse regions as outliers relative to points in denser regions. The central idea is that a data point located in a low-density region is likely to be an outlier.
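
The statistical, distance and density ideas described above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function names, the use of a z-score for the statistical view, the k-th nearest neighbor distance for the distance view, a fixed-radius neighbor count for the density view, and the toy data are all choices made here and do not come from the presentation.

```python
import math
import statistics


def zscore_scores(values):
    """Statistical view: distance from the mean in standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) / std for v in values]


def knn_distance_scores(points, k=3):
    """Distance view: distance from each point to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def neighbor_counts(points, radius=1.0):
    """Density view: number of other points within `radius` (low = sparse)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


# Hypothetical toy data: nine readings near 10 plus one at 14,
# and a tight 4 x 4 grid of 2-D points plus one far-away point.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 14.0]
grid = [(x * 0.5, y * 0.5) for x in range(4) for y in range(4)] + [(6.0, 6.0)]

z = zscore_scores(readings)
d = knn_distance_scores(grid, k=3)
c = neighbor_counts(grid, radius=1.0)

print("highest z-score at index", z.index(max(z)))
print("largest k-NN distance at index", d.index(max(d)))
print("fewest neighbors at index", c.index(min(c)))
```

In each case the score only ranks the points. Where to place the cut-off between normal and outlying is an application-specific decision that this sketch leaves open.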

Clustering-Based Methods

Clustering-based techniques classify data points as outliers if they do not belong to any cluster or if they are far from their nearest cluster centroid.

Graph-Based Methods

By constructing a graph that represents the relationships among data points, graph-based methods identify outliers as nodes whose characteristics differ substantially from those of other nodes.

Ensemble-Based Methods

These methods combine multiple outlier detection techniques to produce a more robust and accurate detection process.

Learning-Based Methods

Often using supervised or semi-supervised machine learning models, these techniques learn normal behavior patterns from labeled data and classify deviating instances as outliers.

Handling Outliers

Handling outliers remains a contentious topic. In some cases, outliers are viewed as erroneous data and discarded, but in other instances, they are treated as integral parts of the dataset.
Eliminating outliers that are in fact accurate data may lead to the loss of critical information. Several techniques have been proposed for handling outliers, such as visual examination, univariate and multivariate methods, and reducing the influence of outliers during model training. Overall, the approach depends largely on the context and often requires analytical reasoning, intuition, and deliberate decision-making.

Applications of Outlier Detection

The applications of outlier detection span many domains, such as data and process logs, fraud and intrusion detection, security and surveillance, healthcare and medical diagnostics, transactional data sources, sensor networks and databases, data quality and cleaning, time-series monitoring and data streams, and the Internet of Things (IoT).
Significantly, in the semiconductor manufacturing industry, outlier detection can play a vital role in detecting anomalies in manufacturing processes, leading to improved quality control, fault detection, and lot control.

Emerging Techniques: Deep Learning and Ensemble Approaches

Recent years have seen increased interest in deep learning and ensemble techniques for outlier detection. Deep learning-based approaches, primarily autoencoders and deep neural networks (DNNs), have demonstrated promising results in detecting complex and subtle outliers, especially in high-dimensional data.
For example, an autoencoder, a popular deep learning architecture, is trained to reconstruct its input data. The reconstruction error is then used as the anomaly score. A high error indicates that the data point is hard to model and is therefore likely to be an outlier.

Ensemble techniques combine multiple outlier detection models to increase robustness and accuracy. They often use several different base detection algorithms or multiple configurations of a single base algorithm. The final decision is usually based on a majority vote, an average, or another rule for combining the base detectors' results.

Both techniques have promising applications in the semiconductor industry. They can detect minute faults or anomalies in manufacturing processes that traditional methods may overlook, potentially saving significant resources and increasing overall efficiency.

The Challenge of Scalability and the Role of Distributed Detection Techniques

As data size increases, the number of outliers and the computational cost of detection also increase, making the process slow and costly. This is especially relevant in the semiconductor manufacturing industry, where terabytes of data are generated daily.
Scalable outlier detection techniques therefore become necessary for large datasets. To address this, distributed outlier detection techniques have been proposed. They partition the original data into several subsets and distribute them across the nodes of a distributed system for parallel processing. After local outlier detection is performed on each node, the results are aggregated to produce the final result. These techniques are effective in managing large datasets, reducing computational costs, and speeding up the detection process.
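
The partition, detect locally, then aggregate flow described above can be sketched as follows. This is a minimal single-machine illustration under stated assumptions: worker threads stand in for the nodes of a distributed system, the local detector is a median/MAD-based modified z-score with the commonly used 0.6745 constant and 3.5 cut-off, and all function names and the sample data are hypothetical rather than taken from the presentation.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split (index, value) records into n_parts round-robin subsets."""
    return [records[i::n_parts] for i in range(n_parts)]


def local_outliers(subset, threshold=3.5):
    """Run on one 'node': flag records with a large modified z-score."""
    values = [v for _, v in subset]
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in this subset, nothing to compare against
    return [idx for idx, v in subset if 0.6745 * abs(v - med) / mad > threshold]


def distributed_outliers(values, n_parts=4):
    """Partition the data, detect locally in parallel, aggregate the indices."""
    records = list(enumerate(values))
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        per_node = pool.map(local_outliers, partition(records, n_parts))
    return sorted(idx for found in per_node for idx in found)


# Hypothetical data: 40 readings between 10.0 and 10.4 with two injected faults.
values = [10.0 + 0.1 * (i % 5) for i in range(40)]
values[7] = 25.0
values[22] = -4.0
print(distributed_outliers(values, n_parts=4))
```

One design point worth noting: each node sees only its own subset, so a point that is unusual relative to the full dataset may look different locally. How the data is partitioned and how local results are merged are choices a real system would need to make deliberately.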

Outlier Detection in the Semiconductor Manufacturing Industry: Fault Detection and Quality Control

Outlier detection is especially important in the semiconductor manufacturing industry, where precision and accuracy are critical. The manufacturing processes generate enormous amounts of data from various sources, such as machine logs, sensors, and quality control tests.

Detecting outliers in this data can help identify potential faults in the manufacturing process early, thus preventing the production of faulty chips, reducing waste, and saving costs. For instance, a sudden change in sensor readings during a particular manufacturing stage could be an outlier, indicating a potential issue in that stage.

Moreover, outlier detection can play a significant role in quality control. By identifying anomalies in test data, it can help pinpoint chips that may not perform as expected. This can enhance the overall quality of the products, leading to better reliability and customer satisfaction.

To summarize, outlier detection plays a pivotal role in enhancing the efficiency, quality, and cost-effectiveness of semiconductor manufacturing, which further highlights the need for advanced and scalable outlier detection techniques in the industry.
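
The sensor example mentioned above, a sudden change in readings during a manufacturing stage, can be sketched as a rolling check that compares each new reading with the window of readings just before it. This is a hedged illustration only: the function name, the window length, the median/MAD-based score with its 0.6745 constant and 3.5 cut-off, and the synthetic readings are assumptions made here, not details from the presentation.

```python
import statistics


def sudden_changes(readings, window=10, threshold=3.5, min_mad=1e-9):
    """Return indices whose reading departs sharply from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        med = statistics.median(recent)
        mad = statistics.median([abs(r - med) for r in recent])
        # Modified z-score of the new reading relative to the recent window;
        # min_mad guards against division by zero on a perfectly flat window.
        score = 0.6745 * abs(readings[i] - med) / max(mad, min_mad)
        if score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical sensor trace: small periodic variation around 50.0
# with a single spike injected at index 20.
trace = [50.0 + 0.2 * ((i * 7) % 5 - 2) for i in range(30)]
trace[20] = 58.0
print(sudden_changes(trace))
```

Because the reference window is built from the median and the median absolute deviation, a single spike has limited effect on the readings that follow it. A sustained level shift would keep being flagged until the window fills with the new level, which may or may not be the desired behavior for a given process.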

Conclusions

While each outlier detection technique has its own strengths and weaknesses, the field continues to evolve and warrants continued research. Progress depends on a comprehensive understanding of each method's performance, the issues it addresses, and how the methods compare. This understanding will provide valuable insights for future work in outlier detection.