Continuity Equations: Analytical Monitoring of Business Processes and Anomaly Detection in Continuous Auditing


1
Continuity Equations: Analytical Monitoring of
Business Processes and Anomaly Detection in
Continuous Auditing

Michael G. Alles
Alexander Kogan
Miklos A. Vasarhelyi
Jia Wu
Rutgers University
November 2005

2
Data-oriented Continuous Auditing (CA): Automation of Substantive Testing

- Formalization of business process (BP) rules as data integrity constraints.
- Verification of data integrity → identification of exceptions.
- Selection of critical BP metrics and development of stable business flow (continuity) equations.
- Monitoring of continuity equation residuals → identification of anomalies.

3
Establishing Data Integrity: A Procurement Example

- Referential integrity along the business cycle and identification of completed cycles
  - P.O. → shipment receipt → voucher payment.
- Identification of data consistency issues and automatic alarms to resolve exceptions
  - Changes in purchase order vendor numbers
  - Discrepancies between the totals and the sums of line items
  - Discrepancies between matched voucher amounts.
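A minimal Python sketch of these checks, assuming hypothetical in-memory records: the `PurchaseOrder` layout, the function names, the dict shapes, and the 0.01 tolerance are illustrative choices, not the authors' schema or system.

```python
from dataclasses import dataclass, field


@dataclass
class PurchaseOrder:
    # Hypothetical record layout, for illustration only.
    po_id: str
    vendor_id: str
    total: float
    line_items: list = field(default_factory=list)  # line-item amounts


def completed_cycles(pos, receipt_po_ids, voucher_po_ids):
    """PO ids that reached both receipt and voucher payment (P.O. -> receipt -> voucher)."""
    return sorted(p.po_id for p in pos
                  if p.po_id in receipt_po_ids and p.po_id in voucher_po_ids)


def vendor_changes(vendor_history):
    """PO ids whose vendor number changed; maps PO id -> vendor ids in recorded order."""
    return sorted(po_id for po_id, vendors in vendor_history.items() if len(set(vendors)) > 1)


def total_mismatches(pos, tolerance=0.01):
    """PO ids whose header total differs from the sum of their line items."""
    return sorted(p.po_id for p in pos if abs(p.total - sum(p.line_items)) > tolerance)


def voucher_mismatches(matched_amounts, tolerance=0.01):
    """Ids whose matched (receipt amount, voucher amount) pair disagrees."""
    return sorted(key for key, (received, vouchered) in matched_amounts.items()
                  if abs(received - vouchered) > tolerance)


pos = [PurchaseOrder("PO1", "V1", 100.0, [60.0, 40.0]),
       PurchaseOrder("PO2", "V2", 50.0, [20.0, 20.0])]
print(completed_cycles(pos, {"PO1", "PO2"}, {"PO1"}))              # ['PO1']
print(vendor_changes({"PO1": ["V1", "V1"], "PO2": ["V2", "V7"]}))  # ['PO2']
print(total_mismatches(pos))                                        # ['PO2']
```

Each rule is a separate function so that each exception type can raise its own alarm.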

4
Detection of Exceptions

- Referential integrity violations
  - PO without matching requisition
  - Received item without matching PO
  - Payments without matching received items
- Data integrity violations
  - PO has zero order quantity
  - Received item has negative quantity
  - Invalid payment check numbers (e.g., all 0s)
  - Gross payment amount is smaller than net payment amount
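The exception rules above can be sketched as simple record-level checks. This is a minimal illustration assuming hypothetical dict layouts and field names (`po_id`, `req_id`, `check_no`, `gross`, `net`, and so on); treating an empty check number like an all-zero one is also an assumption.

```python
def find_exceptions(requisition_ids, pos, receipts, payments):
    """Return (rule, record id) pairs for each violated rule.

    Hypothetical layouts:
      pos:      dicts with po_id, req_id, quantity
      receipts: dicts with receipt_id, po_id, quantity
      payments: dicts with payment_id, receipt_id, check_no, gross, net
    """
    exceptions = []
    po_ids = {po["po_id"] for po in pos}
    receipt_ids = {r["receipt_id"] for r in receipts}

    for po in pos:
        if po["req_id"] not in requisition_ids:
            exceptions.append(("PO without matching requisition", po["po_id"]))
        if po["quantity"] == 0:
            exceptions.append(("PO has zero order quantity", po["po_id"]))

    for r in receipts:
        if r["po_id"] not in po_ids:
            exceptions.append(("Received item without matching PO", r["receipt_id"]))
        if r["quantity"] < 0:
            exceptions.append(("Received item has negative quantity", r["receipt_id"]))

    for p in payments:
        if p["receipt_id"] not in receipt_ids:
            exceptions.append(("Payment without matching received item", p["payment_id"]))
        # An empty or all-zero check number is treated as invalid.
        if not p["check_no"].strip("0"):
            exceptions.append(("Invalid payment check number", p["payment_id"]))
        if p["gross"] < p["net"]:
            exceptions.append(("Gross payment smaller than net payment", p["payment_id"]))

    return exceptions
```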

5
Advanced Analytics in CA: BP Modeling Using Continuity Equations

- Continuity equations
  - Statistical models capturing relationships between various business processes.
  - Can be used as expectation models in the analytical procedures of continuous auditing.
  - Originated in the physical sciences (various conservation laws, e.g., of mass and momentum).
- Continuity equations are developed using the methodologies of
  - Simultaneous equation modeling (SEM)
  - Multivariate time series modeling (MTSM).
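The slides estimate continuity equations with SEM and MTSM. As a much simpler stand-in, closer in spirit to a plain linear regression, the sketch below fits one lagged equation Receive(t) = a + b·P.O.(t - lag) by ordinary least squares and keeps the lag with the smallest mean squared residual. It is not SEM or MTSM, and the function names and lag search are hypothetical.

```python
def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_lagged(driver, response, max_lag):
    """Fit response(t) = a + b*driver(t - lag) for lag = 1..max_lag; keep the best lag."""
    best = None
    for lag in range(1, max_lag + 1):
        x = driver[:-lag]   # driver(t - lag)
        y = response[lag:]  # response(t), aligned with x
        a, b = ols(x, y)
        # Mean (not total) squared residual, since the sample shrinks as lag grows.
        mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
        if best is None or mse < best["mse"]:
            best = {"lag": lag, "a": a, "b": b, "mse": mse}
    return best


po = [10, 12, 9, 15, 11, 14, 13, 10, 16, 12]
receive = [5, 7] + [0.9 * v for v in po[:-2]]  # synthetic: receipts follow P.O.s by 2 periods
print(fit_lagged(po, receive, max_lag=3)["lag"])
```

The synthetic data is constructed so that the lag-2 equation fits exactly. Real procurement data would need the multi-equation treatment described on the slide.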

6
Basic Procurement Cycle
P.O.(t1) → Receive(t2) → Voucher(t3), with time lags t2-t1 (order to receipt) and t3-t2 (receipt to voucher)

7
Continuity Equations of Basic Procurement Cycle

- Receive(t2) = P.O.(t1)
- Voucher(t3) = Receive(t2)
- Aren't partial deliveries allowed?
- Are all orders delivered after exactly the same time lag?
- Are there any feedback loops?

8
Inferred Analytical Model of Procurement

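$$\text{P.O.}(t) = 0.24\,\text{P.O.}(t-4) + 0.25\,\text{P.O.}(t-14) + 0.56\,\text{Receive}(t-15) + e_{PO}$$
$$\text{Receive}(t) = 0.26\,\text{P.O.}(t-4) + 0.21\,\text{P.O.}(t-6) + 0.60\,\text{Voucher}(t-10) + e_{R}$$
$$\text{Voucher}(t) = 0.73\,\text{Receive}(t-1) - 0.25\,\text{P.O.}(t-7) + 0.22\,\text{P.O.}(t-17) + 0.24\,\text{Receive}(t-17) + e_{V}$$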
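A minimal sketch of how the inferred model could produce one-step expectations. The slide's coefficients are read as sums of lagged terms, with the only minus sign on the 0.25 P.O.(t-7) term. The sketch assumes no intercept and sets the error terms e_PO, e_R, e_V to zero for the point prediction. The function names are hypothetical.

```python
# Coefficients transcribed from the slide; no intercept is assumed, and the
# error terms are taken as zero for a point prediction.
def predict_po(po, receive, t):
    return 0.24 * po[t - 4] + 0.25 * po[t - 14] + 0.56 * receive[t - 15]


def predict_receive(po, voucher, t):
    return 0.26 * po[t - 4] + 0.21 * po[t - 6] + 0.60 * voucher[t - 10]


def predict_voucher(po, receive, t):
    return (0.73 * receive[t - 1] - 0.25 * po[t - 7]
            + 0.22 * po[t - 17] + 0.24 * receive[t - 17])


def predict_all(po, receive, voucher, t):
    """One-step predictions for period t; requires history back to t-17."""
    if t < 17:
        raise ValueError("need history back to t-17")
    return {"P.O.": predict_po(po, receive, t),
            "Receive": predict_receive(po, voucher, t),
            "Voucher": predict_voucher(po, receive, t)}
```

Because the lags reach back 17 periods, no prediction is available for the first 17 periods of a series.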

9
Detection of Anomalies

- Anomalies are detected if
  Observed P.O.(t) < Predicted P.O.(t) - Var
  or
  Observed P.O.(t) > Predicted P.O.(t) + Var
- Similarly for
  - Receive(t)
  - Voucher(t)
- Var = acceptable threshold of variance.
- If there is an anomaly → generate an alarm!
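The detection rule above translates directly into code. This minimal sketch assumes per-metric series held in dicts and a separate Var per metric. The layout and the function name are hypothetical.

```python
def detect_anomalies(observed, predicted, var):
    """Return (metric, t) pairs where the observation falls outside predicted +/- Var.

    observed / predicted: metric name -> list of values per period
    var: metric name -> acceptable threshold of variance (hypothetical values)
    """
    alarms = []
    for metric, values in observed.items():
        for t, (obs, pred) in enumerate(zip(values, predicted[metric])):
            if obs < pred - var[metric] or obs > pred + var[metric]:
                alarms.append((metric, t))
    return alarms


observed = {"P.O.": [100, 130, 70], "Receive": [50, 52, 51]}
predicted = {"P.O.": [100, 100, 100], "Receive": [50, 50, 50]}
print(detect_anomalies(observed, predicted, {"P.O.": 20, "Receive": 5}))
# [('P.O.', 1), ('P.O.', 2)]
```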

10
Steps of Analytical Modeling and Monitoring Using Continuity Equations

- Choose essential business processes to model (purchasing, payments, etc.).
- Define (physical, financial, etc.) metrics to represent each process, e.g., amount of purchase orders, quantity of items received, number of payment vouchers processed.
- Choose the levels of aggregation of metrics
  - By time (hourly, daily, weekly), by business unit, by customer or vendor, by type of products or services, etc.
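Choosing an aggregation level amounts to bucketing transactions before modeling. The sketch below is a minimal illustration. It assumes hypothetical transaction dicts with `timestamp`, `amount`, and a grouping field such as `vendor`, and it uses ISO weeks for weekly buckets, which is one possible convention among several.

```python
from collections import defaultdict
from datetime import datetime


def period_key(ts, level):
    """Map a timestamp to an hourly, daily, or ISO-weekly bucket label."""
    if level == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if level == "day":
        return ts.date().isoformat()
    if level == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown level: {level}")


def aggregate(transactions, level="day", by=None):
    """Sum transaction amounts per (period, group); by=None puts everything in one group."""
    totals = defaultdict(float)
    for tx in transactions:
        group = tx[by] if by else "all"
        totals[(period_key(tx["timestamp"], level), group)] += tx["amount"]
    return dict(totals)


purchase_orders = [
    {"timestamp": datetime(2005, 11, 1, 9, 30), "vendor": "V1", "amount": 100.0},
    {"timestamp": datetime(2005, 11, 1, 14, 0), "vendor": "V1", "amount": 50.0},
    {"timestamp": datetime(2005, 11, 2, 10, 0), "vendor": "V2", "amount": 75.0},
]
print(aggregate(purchase_orders, level="day", by="vendor"))
# {('2005-11-01', 'V1'): 150.0, ('2005-11-02', 'V2'): 75.0}
```

Finer levels give more observations for estimation but noisier series. The slides leave the choice to the modeler.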

11
Steps of Analytical Modeling and Monitoring Using Continuity Equations - II

- Identify and estimate stable statistical relationships between business process metrics: Continuity Equations (CEs).
- Define acceptable thresholds of variance from the expected relationships.
- If the variances (residuals) exceed the acceptable levels, alert human auditors to investigate the anomaly (i.e., the relevant sub-population of transactions).
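One simple way to "define acceptable thresholds" is to scale the spread of historical residuals. This is a minimal sketch assuming that rule, with a hypothetical multiplier `k`. The slides do not say how thresholds were set beyond the prediction-interval approach on slide 15.

```python
import statistics


def threshold_from_history(actual, predicted, k=3.0):
    """Acceptable variance = k times the sample standard deviation of historical residuals."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return k * statistics.stdev(residuals)


def alarms_for_auditors(actual, predicted, threshold):
    """(period, residual) pairs whose residual exceeds the threshold in absolute value."""
    return [(t, a - p) for t, (a, p) in enumerate(zip(actual, predicted))
            if abs(a - p) > threshold]


limit = threshold_from_history([10, 12, 9, 11, 10], [10, 11, 10, 11, 10], k=2.0)
print(alarms_for_auditors([10, 15, 10], [10, 10, 10], limit))  # [(1, 5)]
```

Each flagged period points auditors to the sub-population of transactions aggregated into that period.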

12
How Do We Evaluate CE Models?

- A linear regression model is the classical benchmark for comparison.
- Models are compared on two aspects:
  - Prediction accuracy.
  - Anomaly detection capability.
- Mean Absolute Percentage Error (MAPE) is used to measure prediction accuracy:
  $$\text{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\lvert \text{predicted}_t - \text{actual}_t \rvert}{\lvert \text{actual}_t \rvert} \times 100\%$$
- A good analytical model is expected to have high prediction accuracy, i.e., low MAPE.
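A direct Python version of the MAPE formula. The function name is hypothetical. MAPE is undefined when an actual value is zero, which can happen with sparse, highly disaggregated metrics.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


print(mape([100, 200], [110, 180]))  # 10.0
```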

13
Prediction Accuracy Comparison: Results Analysis

- Prediction accuracy comparison results:
  - Linear regression (best).
  - Multivariate time series (middle).
  - Simultaneous equations (worst).
  - Difference is small (< 2%).
  - Noise in our data sets may pollute the results.
- Prediction accuracy is relatively good for all three models:
  - MAPE is around 0.40 (Leitch and Chen 2003).
  - Other studies report over 100% MAPE.

14
Simulating Error Stream: The Ultimate Test of CA Analytics

- Seed errors of various magnitudes into a randomly chosen subset of the holdout sample.
- Identify anomalies as those observations in the holdout sample for which the variance exceeds the acceptable threshold of variance.
- Test whether the anomalies are the observations with seeded errors, and count the number of false positives and false negatives.
- Repeat this simulation several times, choosing a different random subset to seed errors into each time.
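One simulation round can be sketched as follows, assuming errors are seeded as upward multiplicative distortions (actual × (1 + magnitude)) and anomalies are flagged with a fixed absolute threshold. Both are simplifying assumptions, and the function names are hypothetical.

```python
import random


def simulate_round(actual, predicted, threshold, n_seed, magnitude, rng):
    """Seed errors into a random subset, flag anomalies, and count FP / FN."""
    seeded = set(rng.sample(range(len(actual)), n_seed))
    # Assumed error form: the seeded value is inflated by `magnitude` times its actual value.
    observed = [a * (1 + magnitude) if i in seeded else a for i, a in enumerate(actual)]
    flagged = {i for i, (o, p) in enumerate(zip(observed, predicted))
               if abs(o - p) > threshold}
    false_positives = len(flagged - seeded)   # flagged but not seeded
    false_negatives = len(seeded - flagged)   # seeded but missed
    return false_positives, false_negatives


def repeat_simulation(actual, predicted, threshold, n_seed, magnitude, runs=10, seed=0):
    """Run several rounds, each with a different random subset of seeded observations."""
    rng = random.Random(seed)
    return [simulate_round(actual, predicted, threshold, n_seed, magnitude, rng)
            for _ in range(runs)]
```

A fixed seed for `random.Random` makes the rounds reproducible, which helps when comparing models on the same seeded subsets.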

15
Acceptable Threshold of Variance

- What should be used as the acceptable threshold of variance?
  - Prediction interval: the confidence interval for the predicted value of the variable.
- Anomalies are detected if
  - Value in the observation < lower confidence limit, or
  - Value in the observation > upper confidence limit.
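For a simple linear regression, the standard prediction interval for a new observation at x0 is ŷ0 ± crit · s · sqrt(1 + 1/n + (x0 - x̄)²/Sxx). The sketch below assumes a single regressor. By default it uses a normal quantile for `crit`, which is only an approximation. For small samples, pass the Student-t quantile with n - 2 degrees of freedom instead. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def prediction_interval(x, y, x0, level=0.95, crit=None):
    """Prediction interval for a new y at x0 under the fitted line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    if crit is None:
        crit = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    y0 = a + b * x0
    half_width = crit * s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return y0 - half_width, y0 + half_width


def outside_interval(value, interval):
    """Anomaly rule from the slide: below the lower or above the upper limit."""
    lower, upper = interval
    return value < lower or value > upper


interval = prediction_interval([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], x0=6)
print(outside_interval(12.0, interval), outside_interval(20.0, interval))  # False True
```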

16
Error Seeding Procedure

- To simulate an anomaly detection scenario, we seed errors into the hold-out data set (47 obs.).
- Original anomalies are detected before error seeding.
- Errors are seeded into 8 randomly selected observations that do not have original anomalies.
- 5 different error magnitudes are used for each round of error seeding (10%, 50%, 100%, 200% and 400% of the actual value of the seeded observation).
- The above procedure is repeated 10 times to reduce the variance of the results.
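The seeding design on this slide (exclude original anomalies, seed 8 observations, try 5 magnitudes per round, repeat 10 times) can be sketched as below. The sketch assumes a fixed absolute threshold in place of each model's prediction interval and upward multiplicative errors, and it seeds the same 8 observations at every magnitude within a round. All three are interpretive assumptions.

```python
import random

MAGNITUDES = [0.10, 0.50, 1.00, 2.00, 4.00]  # 10%, 50%, 100%, 200%, 400% of actual value


def detection_rates(actual, predicted, threshold, n_seed=8, repeats=10, seed=0):
    """Average detection rate per error magnitude over `repeats` seeding rounds."""
    rng = random.Random(seed)

    def flagged(values):
        return {i for i, (v, p) in enumerate(zip(values, predicted)) if abs(v - p) > threshold}

    original = flagged(actual)  # original anomalies, found before seeding
    clean = [i for i in range(len(actual)) if i not in original]
    hits = {m: 0 for m in MAGNITUDES}
    for _ in range(repeats):
        chosen = rng.sample(clean, n_seed)  # seeded observations for this round
        for m in MAGNITUDES:
            seeded = list(actual)
            for i in chosen:
                seeded[i] = actual[i] * (1 + m)
            detected = flagged(seeded)
            hits[m] += sum(1 for i in chosen if i in detected)
    return {m: hits[m] / (n_seed * repeats) for m in MAGNITUDES}
```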

17
Measuring Anomaly Detection

- False positive error (false alarm, Type I error): a non-anomaly mistakenly detected by the model as an anomaly. Decreases efficiency.
- False negative error (Type II error): an anomaly the model fails to detect. Decreases effectiveness.
- For clarity of presentation, results are reported as the detection rate: the rate of successful detection of seeded errors.
  - Detection rate = 1 - False negative error rate
- A good analytical model is expected to have good anomaly detection capability: a low false negative error rate (i.e., a high detection rate) and a low false positive error rate.
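These measures can be computed from the seeded and flagged index sets of one run. This is a minimal sketch. Computing the false positive rate over the non-seeded observations is an assumption, because the slides do not state the denominator. The function name is hypothetical.

```python
def anomaly_metrics(seeded, flagged, n_obs):
    """Error rates for one simulation run.

    seeded: indices with seeded errors; flagged: indices the model flagged;
    n_obs: holdout size. The false positive rate uses non-seeded observations
    as its denominator (an assumption).
    """
    seeded, flagged = set(seeded), set(flagged)
    fn_rate = len(seeded - flagged) / len(seeded)
    fp_rate = len(flagged - seeded) / (n_obs - len(seeded))
    return {"detection_rate": 1 - fn_rate,
            "false_negative_rate": fn_rate,
            "false_positive_rate": fp_rate}
```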

18
Simulated Error Correction

- CA makes it possible to investigate a detected anomaly in (nearly) real time.
- Anomaly investigation can likely correct a detected problem in (nearly) real time.
- With real-time problem correction, the analytical BP models use the actual (not erroneous) values for future predictions.
- Real-time error correction is likely to benefit future anomaly detection, and the magnitude of this benefit can be evaluated using simulation.
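The benefit of correction shows up even with a toy predictor. This sketch uses a naive one-lag predictor, predicted(t) = phi · history(t-1), in place of the MTSM model used in the study. `phi` and the function name are hypothetical. Without correction, the erroneous value feeds the next prediction and produces a spurious residual one period later.

```python
def one_step_residuals(true_series, phi, error_at, error_size, corrected):
    """Residuals of predicted(t) = phi * history(t-1) after one seeded error.

    If corrected, investigation restores the true value in the history before
    the next prediction; otherwise the erroneous value keeps feeding predictions.
    """
    observed = list(true_series)
    observed[error_at] += error_size
    history = list(observed)
    if corrected:
        history[error_at] = true_series[error_at]
    return [observed[t] - phi * history[t - 1] for t in range(1, len(observed))]


series = [100.0] * 5
print(one_step_residuals(series, 1.0, error_at=2, error_size=50.0, corrected=False))
# [0.0, 50.0, -50.0, 0.0]
print(one_step_residuals(series, 1.0, error_at=2, error_size=50.0, corrected=True))
# [0.0, 50.0, 0.0, 0.0]
```

In the uncorrected run, the -50.0 residual at the next period would trigger a second, false alarm.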

19
Benefit of Real-time Error Correction: MTSM

20
Anomaly Detection Rate Comparison Results

21
False Positive Error Comparison

22
Anomaly Detection Rate Comparison: Results Analysis

- SEM and MTSM outperform the linear regression model when the error magnitudes are large, even though linear regression has a slightly better detection rate when the error magnitudes are small.
- It is more important to detect material errors than non-material errors.

23
Concluding Remarks

- New CA-enabled analytical audit methodology: simultaneous relationships between highly disaggregated BP metrics.
- How to automate the inference and estimation of numerous CE models?
- How to identify and remove outliers from the historical data to estimate statistically valid CEs (step-wise re-estimation of CEs)?
- How to identify the need to re-estimate a CE model (trends in residuals)?
- How to make it worthwhile (trade-off between effectiveness, efficiency and timeliness)?
- Are there any patterns in the detected errors?