Title: Adaptive Testing, Oracle Generation, and Test Case Ranking for Web Services
- Wei-Tek Tsai
- Software Research Laboratory
- Computer Science Engineering Department
- Arizona State University
- wtsai_at_asu.edu
Table of Contents
- Background
- Existing Dilemmas for SOA
- Introduction to WebStrar
- Difference between Blood and WS Group Testing
- Testing Process
- BBS Case Study
- Impact of Training Sizes and Target Sizes
- Impacts of Training on Test Case Ranking
- Conclusions and Future Work
Background
- Software development is shifting away from the product-oriented paradigm to the service-oriented paradigm.
- Service-Oriented Architecture (SOA) and its implementation, Web Services (WS), have received significant attention as major computer companies are all adopting this new approach to develop software and systems.
- However, trustworthiness becomes a serious problem, and appropriate tradeoffs have to be made during the WS testing phase.
Verification of Web Services
- Collaborative Testing: Cooperation and collaboration among different testing activities and stakeholders, including service providers, service consumers, and service brokers.
- Specification-Based Testing: SOA proposes a fully specification-based process. WS define an XML-based protocol stack to facilitate service inter-communication and inter-operation. Specifications such as WSDL, OWL-S, and WSFL describe the service features. Hence, test cases need to be generated based on these specifications.
Existing Dilemmas for SOA (continued)
- Run-time Testing: Most WS activities, such as service publishing, discovering, matching, composition, binding, execution, and monitoring, are done at runtime. Thus, verification and testing, including test case generation, test execution, test evaluation, and model checking, must be done at run-time.
- Different implementations of the same specification: For the same specification of a service requirement, many alternative implementations may be available online. Effective algorithms are needed to rank and select the best WS.
Introduction to WebStrar
- WebStrar: Infrastructure for Web Services Testing, Reliability Assessment, and Ranking. It is an infrastructure that facilitates the development of Web services, trustworthy Web services, and their applications. It provides the public (service providers, brokers, requestors, researchers, and regulators) on-line access to the tools and databases that enable describing (specifying), finding, scripting (composing complex services from existing services), testing, verification, validation, experimentation, and reliability evaluation of Web services.
- WebStrar uses WS group testing to rank services belonging to the same specification.
Current and Trustworthy Web Service Models
[Diagram: in the current Web service model, clients reach service providers through a UDDI service broker. In the trustworthy Web service model based on testing, a trustworthy service broker adds a test master with a database of test scripts, a registry, check-in and check-out interfaces, a service binder, and an acceptance interface between clients and service providers.]
WebStrar Infrastructure
[Diagram: service providers submit WS and test cases. Specifications (WSDL, OWL-S, DAML-S) feed test case generators, model checking, and the test case oracle, which populate a test case database managed by the WS test master. Reliability models, test case ranking, oracle updates, a reliability database, test case validation, and service composition feed a trustworthy WS repository containing WS ranking, a WS directory, and composite WS. Service requestors / clients access the data: reliability and ranks.]
Difference between Blood and WS Group Testing
Testing Process
- Test a large number of WS at both the unit and integration levels.
- At each level, the testing process has two phases:
- Training Phase, and
- Volume Testing Phase.
Phase 1: Training Phase
- Select a subset of WS randomly from the set of all WS to be tested. The size of the subset, called the Training Size, can be decided experimentally.
- Apply each test case in the given set of test cases to test all the WS in the selected subset.
- Voting: For each test input, the outputs from the WS under test are voted on by a stochastic voting mechanism based on majority and deviation voting principles.
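The voting step above can be sketched as follows. This is a minimal plurality rule, not the deck's exact stochastic mechanism; the function name and confidence formula are illustrative assumptions.

```python
from collections import Counter

def establish_oracle(outputs):
    """Pick the most common output among the WS under test as the oracle,
    and compute a confidence level as the fraction of WS that agree with
    it. (Illustrative plurality rule, not the paper's exact stochastic
    voting mechanism.)"""
    counts = Counter(outputs)
    majority_output, votes = counts.most_common(1)[0]
    return majority_output, votes / len(outputs)

# Example: five WS answer the same test input.
oracle, confidence = establish_oracle([42.0, 42.0, 42.0, 41.9, 37.5])
# oracle is 42.0 with confidence 0.6 (3 of 5 WS agree)
```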
Phase 1: Training Phase (Contd)
- Oracle establishment: If a clear majority output is found, that output is used to form the oracle of the test case that generated it. A confidence level is defined based on the extent of the majority. The confidence level will also be dynamically adjusted in Phase 2.
- Test case ranking: Test cases are ranked according to their fault detection capacity, which is proportional to the number of failures they detect. In Phase 2, the higher-ranked test cases will be applied first to eliminate the WS that fail them.
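The ranking rule above amounts to sorting test cases by detected failures; the data shapes and names below are illustrative assumptions.

```python
def rank_test_cases(failures_detected):
    """failures_detected maps each test case id to the set of WS that
    failed it in the training phase. Rank test cases by fault detection
    capacity (number of failures detected), highest first, so that
    Phase 2 applies the most potent test cases earliest."""
    return sorted(failures_detected,
                  key=lambda tc: len(failures_detected[tc]),
                  reverse=True)

training_results = {
    "tc1": {"ws3"},                # detected 1 failure
    "tc2": {"ws3", "ws7", "ws9"},  # detected 3 failures
    "tc3": set(),                  # detected none
}
ranking = rank_test_cases(training_results)
# ranking == ["tc2", "tc1", "tc3"]
```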
Phase 1: Training Phase (Contd)
- WS ranking: The stochastic voting mechanism not only finds a majority output, but also ranks the WS under group testing according to their average deviation from the majority output.
- By the end of the training phase, we have:
- Tested and ranked the selected WS,
- Ranked the potency of the test cases, and
- Established the oracle for the test cases and their confidence levels.
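The deviation-based WS ranking can be sketched numerically; the averaging rule and names here are illustrative assumptions for the simple case of numeric outputs.

```python
def rank_ws(deviations):
    """deviations maps each WS to its absolute deviations from the
    majority (oracle) output, one per test case. Rank WS by average
    deviation, smallest (closest to the majority) first. (Illustrative
    numeric sketch of the deviation-voting idea.)"""
    avg = {ws: sum(ds) / len(ds) for ws, ds in deviations.items()}
    return sorted(avg, key=avg.get)

devs = {"ws1": [0.0, 0.2], "ws2": [0.0, 0.0], "ws3": [1.5, 2.5]}
ranking = rank_ws(devs)
# ranking == ["ws2", "ws1", "ws3"]
```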
Phase 1: Training Phase (Contd)
Phase 2: Volume Testing Phase
- This phase continues testing the remaining WS and any newly arrived WS, based on the profiles and history (test case effectiveness, oracle, and WS ranking) obtained in the training phase.
- By the end of Phase 2:
- All available WS have been tested,
- A short ranked list of WS is produced,
- Test cases are updated and ranked, and
- Oracles and their confidence levels are updated.
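The elimination loop implied by the two phases can be sketched as below; this is a simplified assumption-laden sketch (the real phase also updates oracles, confidence levels, and rankings, which are omitted here).

```python
def volume_test(ws_list, ranked_tests, run_test, target_size):
    """Phase 2 sketch: apply the highest-ranked test cases first and
    eliminate a WS as soon as it fails one, stopping once only
    target_size candidates remain. run_test(ws, tc) returns True if
    ws passes test case tc. Returns the surviving WS and the number
    of test runs spent."""
    survivors = list(ws_list)
    runs = 0
    for tc in ranked_tests:
        for ws in list(survivors):
            runs += 1
            if not run_test(ws, tc):
                survivors.remove(ws)
                if len(survivors) <= target_size:
                    return survivors, runs
    return survivors, runs

# Hypothetical services: "bad" fails the top-ranked test case.
passes = lambda ws, tc: not (ws == "bad" and tc == "tc_top")
survivors, runs = volume_test(["good1", "bad", "good2"],
                              ["tc_top", "tc_2"], passes, target_size=2)
# survivors == ["good1", "good2"] after 2 test runs
```

Applying the most potent test cases first is what makes the phase cheap: faulty WS are dropped before the weaker test cases are ever run against them.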
Best Buy Stock WS Specification
Best Buy Stock WS Case Study
Impact of Training Sizes and Target Sizes
Impacts of Target Size on Testing Cost
- The smaller the target size, the lower the cost, because more WS can be eliminated sooner.
- The differences between curves 1 to 12 are small, while a large gap exists between curves 12 and 13. The reason is that there are 12 fault-free WS under test, from which no failures are detected. If these fault-free WS are in the current target set, any WS will be eliminated if a single failure is detected.
- When the target size moves from 12 to 13 or higher, the testing cost increases sharply, because the algorithm must find a better WS among a set of imperfect WS.
Impacts of Training Size on Testing Cost
- The smaller the training size, the lower the cost.
- When the training size is less than or equal to the target size, increasing the training size does not increase the cost (the initial part of the curves is flat). When the training size exceeds the target size, the cost increases as the training size increases.
- When the training size equals the total number of WS under test, the process becomes exhaustive testing and no test runs can be saved.
Oracle Establishment and Confidence
- Note that the oracle is established by majority voting.
- If the training size is small, the confidence decreases, and it is even possible that an incorrect answer gets the majority vote.
- Also, an incorrect WS does not always produce an incorrect answer; it produces an incorrect answer only some of the time.
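The risk of a wrong majority at small training sizes can be illustrated numerically. The error probability below is a hypothetical parameter, and the worst-case assumption that all wrong answers coincide is mine, not the deck's.

```python
import random

def wrong_majority_rate(n_ws, p_wrong, trials=10_000, seed=0):
    """Monte Carlo sketch: each of n_ws services independently gives a
    wrong answer with probability p_wrong, and (worst case) all wrong
    answers coincide. Estimate how often the wrong answer wins the
    majority vote."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        wrong = sum(rng.random() < p_wrong for _ in range(n_ws))
        if wrong > n_ws - wrong:  # wrong answers form the majority
            bad += 1
    return bad / trials

# With a hypothetical 30% per-service error rate, a training size of 3
# elects a wrong oracle far more often than a training size of 15
# (roughly 0.22 versus 0.05 analytically).
small = wrong_majority_rate(3, 0.3)
large = wrong_majority_rate(15, 0.3)
```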
The Impacts of Training Size on Oracle
Impacts of Training on Test Case Ranking
Conclusions
- This paper proposed an efficient process to test a large number of Web services designed based on the same specification.
- The experimental results reveal that the smaller the training size, the lower the cost.
- However, a small training size can lead to an incorrect oracle, leading to incorrect WS ranking.
- A small training size can also lead to incorrect test case ranking, resulting in a higher testing cost in Phase 2.
- Therefore, it is critical to select a reasonably sized training set in WS group testing.
Future Work
- Address the impact of the age of test cases; an adaptive window is needed for this.
- Develop a stochastic algorithm to perform the majority voting automatically for complex outputs.