Title: Use of AI algorithms in design of Web Application Security Testing Framework
1. Use of AI algorithms in design of Web Application Security Testing Framework
2. Or: a non-monkey approach to hacking web applications
- By fyodor and meder
- fygrave_at_o0o.nu meder_at_o0o.nu
3. No, we are not writing another web scanner!!
4. Agenda
- Why hack web applications?
- What scanners do, and why they are useless (or not)
- What else could be done, but isn't (yet)
- Introduction to YAWATT
  - User-session-based approach
  - Distributed
  - Intelligent (or not?)
  - Modular
  - Coverage beyond that of an application security scanner
5. Background of this work
- STIF, STIF2: an agent-based, cooperative, automated hacking environment (http://o0o.nu/sec/STIF)
6. So, why go for the web?
- They learnt to configure their firewalls
- They learnt to disable services they don't want
- They finally know how to use nmap (and even Nessus!!)
- But they still want the web
- And they can't learn to code
7. So why web applications?
- Applications are getting more complex
- Multilayered frameworks make it even more fun
- The number of web-application-based services is growing
- The number of web application programmers is increasing (home-brewed web applications), but...
8. ...the web application remains a larger hole into one's network
- Web application programmers' skills aren't usually the best
- Firewalls are there just to let you in
- Application firewalls can stop a limited number of web application attacks, but are useless when it comes to detecting logical vulnerabilities
- IDS systems aren't smart enough to pick up on application attacks
9. Scanners... use of...
- checking for enumeration ... YES
- checking for execution ... YES
- checking if we can drop table ... YES
- checking if we can drop database ... YES ...
- CANNOT CONNECT TO APPLICATION
10. Scanners: summary
- Nessus et al. don't see web applications beyond the underlying software configuration
- Libwhisker/Nikto: signature based. Relatively primitive. Efficient for default bugs
- Wikto/e-Or: session-aware coding-flaw scanners
- Kavado/Appscan/Webinspect/N-Stalker/Watchfire Appscan: intelligent scanners. Session-aware. Closed black boxes (some allow scripted plugins)
11. Why scanners ain't enough
- Single-host based
- Commercial scanners are black boxes (not extensible, not correctable)
- Little or no control over the hacking process
- Not easily extensible on the fly with new automation modules
- Often primitive, signature-based logic
12. What would we like to have?
- Maximum automation of the web hacking process
- Minimum code writing
- Autonomous functionality
- Knowledge transfer
- Ability to add hacks on the fly
- Dealing with uncertainty in an intelligent way
- Learning from valid user session data
13. Other good things to have
- Ability to test new classes of bugs (e.g. session hijacking)
- Ability to attack a web application from multiple locations (bypassing IP restrictions, improving the brute-forcing process)
- Ability to automate testing of application logic bugs
- Ability to make intelligent guesses
14. Introducing YAWATT: method
15. User sessions
- User sessions: collections of user request/response pairs (URL, name/value pairs, session information and selected HTTP protocol data)
- Classified user session data includes semantic classification of URLs, parameters, responses and HTTP protocol data (server type, backend system(s) if visible, unusual HTTP header content)
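The user-session model on this slide can be sketched as a pair of Python data classes. This is a minimal illustration only: the names `Exchange`, `UserSession` and `labels` are hypothetical and are not YAWATT's actual data model. Each exchange holds the URL, the name/value pairs, the selected HTTP data and whatever labels the classifiers have attached.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Exchange:
    """One request/response pair observed in a user session (hypothetical model)."""
    method: str
    url: str
    params: Dict[str, str] = field(default_factory=dict)   # name/value pairs
    status: Optional[int] = None                            # HTTP response status
    headers: Dict[str, str] = field(default_factory=dict)  # selected response headers
    labels: List[str] = field(default_factory=list)        # classifier output, e.g. "login"


@dataclass
class UserSession:
    """Ordered request/response pairs plus session-level information."""
    session_id: str
    exchanges: List[Exchange] = field(default_factory=list)

    def add(self, exchange: Exchange) -> None:
        self.exchanges.append(exchange)

    def server_types(self) -> Set[str]:
        # One piece of HTTP protocol data a classifier might summarise: the Server header
        return {e.headers["Server"] for e in self.exchanges if "Server" in e.headers}

    def urls_labelled(self, label: str) -> List[str]:
        # URLs that a classifier has tagged with the given label
        return [e.url for e in self.exchanges if label in e.labels]
```

Keeping the labels on each exchange means later stages can query a session by semantic class without having to re-run the classifiers.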
16. User sessions
- User session data can be obtained from:
  - Proxy servers (Burp, Paros, ...)
  - Web server logs
  - Browser automation scripts (e.g. the WATIR framework)
  - Spiders (Burp)
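Web server logs are the simplest of these feeders to sketch. The example below assumes Apache/NCSA common log format. Combined-format lines also parse, because the trailing referer and user-agent fields are ignored. It also assumes that grouping by client address is an acceptable stand-in for real session tracking. Function and field names are hypothetical. Logs record only the query string, so POST parameters are not recovered this way.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional
from urllib.parse import parse_qsl, urlsplit

# Apache/NCSA common log format: host ident user [time] "METHOD target PROTO" status size
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one log line into a request record, or None if it does not parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    target = urlsplit(m.group("target"))
    return {
        "host": m.group("host"),
        "time": m.group("time"),
        "method": m.group("method"),
        "path": target.path,
        "params": dict(parse_qsl(target.query)),   # only GET parameters appear in logs
        "status": int(m.group("status")),
    }


def sessions_from_log(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group records by client address: a crude stand-in for real session tracking."""
    sessions: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            sessions[record["host"]].append(record)
    return dict(sessions)
```

Lines that do not match the format are skipped rather than raising an error, so a partially corrupted log still feeds whatever it can.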
17. Less code, more automation
- Application content is learnt from user sessions (data feeders)
- Additional application information could be gathered by agent plugins (e.g. directory-splitting tests)
- User session data is classified by:
  - Semantic and functional classification of URLs
  - HTTP protocol classifiers (server type, cookies, ...)
  - Session classifiers
  - Input data classification: type, semantics
  - Output classification (application error detection, redirects, bogus responses, etc.)
- Test cases are grouped into suites and executed in groups:
  - Stateless tests
  - Stateful tests
  - Mixed
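Output classification can be illustrated with a small rule-based classifier. The error signatures, label names and the "empty 200 response" heuristic in this sketch are illustrative assumptions, not YAWATT's rule set. The header lookup is case-sensitive, so it assumes headers have already been normalised to `Location`.

```python
import re
from typing import Dict, List

# Illustrative error signatures (assumptions, not an exhaustive or official list)
ERROR_SIGNATURES = {
    "sql_error": re.compile(r"SQL syntax|ORA-\d{5}|mysql_fetch|ODBC", re.IGNORECASE),
    "stack_trace": re.compile(r"Traceback \(most recent call last\)|Exception in thread"),
    "php_error": re.compile(r"<b>(?:Warning|Fatal error)</b>:", re.IGNORECASE),
}


def classify_response(status: int, headers: Dict[str, str], body: str) -> List[str]:
    """Return output-classification labels for one HTTP response."""
    labels = []
    if 300 <= status < 400 and "Location" in headers:
        labels.append("redirect")
    if status >= 500:
        labels.append("server_error")
    for label, pattern in ERROR_SIGNATURES.items():
        if pattern.search(body):
            labels.append(label)
    if status == 200 and not body.strip():
        labels.append("empty_response")   # one example of a "bogus" response
    return labels
```

A response can carry several labels at once, for example `server_error` together with `sql_error`. A test plugin can then subscribe to whichever combination it cares about.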
18. Classification process as new data arrives in the system
19. Go intelligent
- Main components:
  - Web application component (URL) classification
  - Semantic classification of web application input data
  - LSI-based mapping and comparison of web content (in response analysers)
  - Use of external search engines
  - Limited binary analysis of downloaded files (decoding PDF, DOC, RTF; other formats later)
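The slide does not say how YAWATT implements LSI, so the following is only a sketch of the general technique applied to comparing pages. The steps are: build a term-document count matrix A; compute a rank-k truncated SVD; represent each page as a row of V_k·Σ_k; compare pages by cosine similarity. To stay dependency-free, the SVD is computed here by power iteration on AᵀA. A real implementation would more likely call a library routine such as `numpy.linalg.svd`. All function names are hypothetical.

```python
import math
import random
import re
from collections import Counter
from typing import List, Tuple

Matrix = List[List[float]]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def term_doc_matrix(docs: List[str]) -> Matrix:
    """Rows are terms, columns are documents, entries are raw term counts."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[float(c[t]) for c in counts] for t in vocab]


def top_eigenpairs(m: Matrix, k: int, iters: int = 500, seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]          # work on a copy; deflation mutates it
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):             # deflate: M <- M - lam * v v^T
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_doc_vectors(docs: List[str], k: int = 2) -> Matrix:
    """Documents as rows of V_k * Sigma_k from the truncated SVD of the term-document matrix."""
    a = term_doc_matrix(docs)
    d = len(docs)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(d)] for i in range(d)]
    pairs = top_eigenpairs(ata, k)     # eigenvalues of A^T A are the squared singular values
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(d)]


def cosine(x: List[float], y: List[float]) -> float:
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)
```

In a response analyser, a new page whose latent vector lies close to a known login page or error page could inherit that page's classification. This is where the reduced space helps: two pages can be close even when they share few exact words.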
20. Knowledge transfer to the machine
- Possibility to create new classification rules on the fly (and let the system re-learn from them)
- Possibility to reclassify application responses
- Possibility to add new testing plugins on the fly
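Adding a classification rule on the fly and letting the system "re-learn" can be approximated by re-labelling every stored item whenever the rule set changes. This is a minimal sketch: `RuleClassifier` and its regex-per-label rules are hypothetical. Full re-labelling is a simplifying assumption and may be too slow for large data sets.

```python
import re
from typing import Dict, Set


class RuleClassifier:
    """URL classification rules that an analyst can add while the system runs (hypothetical)."""

    def __init__(self) -> None:
        self.rules: Dict[str, re.Pattern] = {}     # label -> compiled regex
        self.labels: Dict[str, Set[str]] = {}      # URL -> labels currently assigned

    def classify(self, url: str) -> Set[str]:
        return {label for label, rx in self.rules.items() if rx.search(url)}

    def ingest(self, url: str) -> Set[str]:
        """Classify a newly observed URL and remember it for later re-classification."""
        self.labels[url] = self.classify(url)
        return self.labels[url]

    def add_rule(self, label: str, pattern: str) -> None:
        """Add (or replace) a rule, then 're-learn' by re-labelling every stored URL."""
        self.rules[label] = re.compile(pattern, re.IGNORECASE)
        for url in self.labels:
            self.labels[url] = self.classify(url)
```

For example, a URL ingested before any "login" rule existed picks up the `login` label as soon as the analyst adds one.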
21. How is URL classification used?
- Vulnerability scenario testing uses a classifier subscription mechanism
- For example, a login page tester will need the login, executable and session classes
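The subscription mechanism can be sketched as a dispatcher. Each test plugin declares the set of classes it requires, and it is invoked only for URLs whose labels include all of them. `Dispatcher`, `subscribe` and `publish` are hypothetical names for illustration.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Plugin = Callable[[str], None]


class Dispatcher:
    """Runs test plugins whose required URL classes are all present (hypothetical sketch)."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[FrozenSet[str], Plugin]] = []

    def subscribe(self, required: Set[str], plugin: Plugin) -> None:
        self.subscriptions.append((frozenset(required), plugin))

    def publish(self, url: str, labels: Set[str]) -> int:
        """Invoke every matching plugin for this URL; return how many ran."""
        ran = 0
        for required, plugin in self.subscriptions:
            if required <= labels:          # subset test: all required classes present
                plugin(url)
                ran += 1
        return ran
```

With this design, the login page tester from the slide would be registered once with `{"login", "executable", "session"}`. It would then never run against, say, a static page, without containing any URL-matching logic of its own.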
22. How does input data semantics identification happen?
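The slide gives no details of the identification step, so the following is only one plausible sketch. It combines the value's syntactic type (matched against a few regexes) with a semantic hint taken from the parameter name. The type patterns and name hints are illustrative assumptions.

```python
import re
from typing import Tuple

# Ordered: the first pattern that matches the whole value wins (illustrative assumptions)
VALUE_TYPES = [
    ("integer", re.compile(r"-?\d+")),
    ("email", re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("hex_token", re.compile(r"[0-9a-f]{16,}", re.IGNORECASE)),
    ("url", re.compile(r"https?://\S+", re.IGNORECASE)),
]

# Substrings of parameter names that hint at their meaning (hypothetical)
NAME_HINTS = {
    "password": ("pass", "pwd"),
    "username": ("user", "login", "uid"),
    "session": ("sess", "sid", "token"),
    "file": ("file", "path", "page", "include"),
}


def classify_value(value: str) -> str:
    for label, rx in VALUE_TYPES:
        if rx.fullmatch(value):
            return label
    return "string"


def classify_param(name: str, value: str) -> Tuple[str, str]:
    """Return (value type, semantic guess) for one name/value pair."""
    lowered = name.lower()
    semantic = next(
        (sem for sem, hints in NAME_HINTS.items() if any(h in lowered for h in hints)),
        "unknown",
    )
    return classify_value(value), semantic
```

A parameter classified as `("string", "file")` is an obvious candidate for file-inclusion tests. A `("hex_token", "session")` pair points at the session-handling tests instead.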
23. How the classified user session data is used
24. Additional research directions
- Other ideas to work on:
  - Detection of hidden parameters
  - Identification of hidden URLs
  - Identification of negative and positive responses
  - Detection of application failures and redirects
  - Evaluation and priority-based execution of plugins
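One simple way to approach identification of negative and positive responses is to compare each response against known baselines. Examples would be a failed-login page and a successful-login page captured from user sessions. This sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def label_response(body: str, negative_baseline: str, positive_baseline: str,
                   threshold: float = 0.8) -> str:
    """Label a response by the closer of two known responses, or 'unknown' if neither is close."""
    neg = similarity(body, negative_baseline)
    pos = similarity(body, positive_baseline)
    if max(neg, pos) < threshold:
        return "unknown"          # matches neither baseline: a candidate for analyst review
    return "negative" if neg >= pos else "positive"
```

The "unknown" outcome is arguably the most useful one during brute-forcing or logic testing. A response that resembles neither baseline often signals an application failure or an unexpected code path.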
25. A note on distributed architecture
- Cooperative agents infrastructure
  - Design of a cooperative agent system
  - Multi-platform
  - Portable
26. Distributed architecture
27. Distributed architecture (another look)
28. What the distributed approach gives us
- DDoS: EASY!!!
- Distributed brute-forcing: bypassing IP-based restrictions and bandwidth limitations
- IDS: more tricks
- Bypassing packet filtering restrictions:
  - an agent behind the firewall!
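Distributed brute-forcing starts with splitting the candidate list between agents so that no two agents try the same word. The sketch below deals candidates out round-robin; `partition` and the agent names are hypothetical.

```python
from typing import Dict, Iterable, List


def partition(candidates: Iterable[str], agents: List[str]) -> Dict[str, List[str]]:
    """Deal candidates round-robin so agents get disjoint, near-equal shares."""
    shares: Dict[str, List[str]] = {agent: [] for agent in agents}
    for i, word in enumerate(candidates):
        shares[agents[i % len(agents)]].append(word)
    return shares
```

Round-robin dealing, rather than contiguous blocks, has two effects. The most likely guesses at the top of a ranked dictionary are tried in parallel by all agents. Each source address also sends only about 1/N of the attempts, which is what helps against the per-IP restrictions mentioned above.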
29. Communication framework
- Modified version of Spread
  - Robust
  - Reliable message delivery
  - Portable (Windows/Unix)
  - Available in C/C++ and Java flavours; bindings exist for Python and Ruby!
30. In progress
- Agents communicate via messages
- Task distribution algorithms in progress
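Spread handles message delivery; how tasks are divided among agents is a separate question, and the slide lists it as work in progress. One common heuristic, not necessarily what YAWATT will use, is greedy longest-processing-time-first assignment: sort tasks by estimated cost, then give each one to the currently least-loaded agent. Task names and costs in this sketch are hypothetical estimates.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tasks(tasks: List[Tuple[str, int]], agents: List[str]) -> Dict[str, List[str]]:
    """Greedy least-loaded assignment: largest tasks first, each to the lightest agent.

    tasks: (task_name, estimated_cost) pairs; the costs are assumed estimates.
    """
    heap = [(0, agent) for agent in agents]      # (current load, agent name)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {agent: [] for agent in agents}
    for task, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, agent = heapq.heappop(heap)        # agent with the smallest load so far
        plan[agent].append(task)
        heapq.heappush(heap, (load + cost, agent))
    return plan
```

The plan could then be sent to each agent as a message over the group communication layer. Because the heuristic needs only cost estimates, it can be re-run whenever an agent joins or leaves.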
31. More on intelligence
- Aside from application vulnerabilities, other things of interest are:
  - Email addresses and user IDs that can be seen within web content
  - Domain names (within web pages, comments, binary files, etc.)
  - Building target-oriented dictionary files (used by brute-force cracking modules)
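Harvesting addresses, user IDs and domain names, and turning them into a target-oriented dictionary, can be sketched with regular expressions and a frequency count. The patterns are deliberately simple assumptions: real e-mail and hostname grammars differ. Treating the local part of an address as a likely user ID is a heuristic, not a rule.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Illustrative patterns; real-world e-mail and hostname grammars are looser/stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")
HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)")
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{3,}")   # words of 4+ characters


def harvest(text: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (e-mail addresses, domain names, likely user IDs) found in text."""
    emails = {m.group(0).lower() for m in EMAIL_RE.finditer(text)}
    domains = {m.group(1).lower() for m in EMAIL_RE.finditer(text)}
    domains |= {h.lower().rstrip(".") for h in HOST_RE.findall(text)}
    user_ids = {e.split("@", 1)[0] for e in emails}   # local part of each address
    return emails, domains, user_ids


def build_dictionary(pages: Iterable[str], size: int = 20) -> List[str]:
    """Rank words and harvested user IDs by frequency across the target's pages."""
    counts: Counter = Counter()
    for page in pages:
        counts.update(w.lower() for w in WORD_RE.findall(page))
        counts.update(harvest(page)[2])
    return [word for word, _ in counts.most_common(size)]
```

Ranking by frequency puts the company name, product names and staff IDs near the top. These are exactly the words a generic wordlist is likely to miss.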
32. Other good things
- Add your plugin code on the fly (attack automation plugins via the subscription mechanism, classification plugins, etc.)
- Can't be simpler
33. Look mah, no hands!
- No reload is needed; plugins are executed the next time new data is processed
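"No reload needed" can be approximated by re-scanning a plugin directory each time a record is processed and loading any new `*.py` files with `importlib`. The plugin contract here is an assumption made for the sketch: each file defines `process(record)`. `PluginDirectory` is likewise a hypothetical name.

```python
import importlib.util
import os
from types import ModuleType
from typing import Dict, List


class PluginDirectory:
    """Loads new *.py plugins from a directory whenever data is processed (hypothetical).

    Assumed contract: each plugin file defines process(record) -> None.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.loaded: Dict[str, ModuleType] = {}   # file name -> loaded module

    def _refresh(self) -> None:
        for fname in sorted(os.listdir(self.path)):
            if fname.endswith(".py") and fname not in self.loaded:
                spec = importlib.util.spec_from_file_location(
                    fname[:-3], os.path.join(self.path, fname))
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)   # run the plugin file's top-level code
                self.loaded[fname] = module

    def process(self, record: dict) -> List[str]:
        """Pick up plugins added since the last record, then run all of them on this record."""
        self._refresh()
        ran = []
        for fname, module in self.loaded.items():
            if hasattr(module, "process"):
                module.process(record)
                ran.append(fname)
        return ran
```

This sketch only notices new files. Picking up edits to an already-loaded plugin would also need a modification-time check and a re-import.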
34. ...beyond the norms of the average application scanner
- Integration and use of other tools to collect and analyse data (search engine queries, ...)
- Integration with other tools (script in Python or Ruby, or hack a plugin in Java or C)
- If you like your favourite application hax0r tool, you can still use it (and feed the data to us!)
35. Other remainders
- Direct interaction with the analyst (not fully implemented yet)
36. Other remainders
- Data lookup and data mining services for plugins (via DataMiner, a MySQL database wrapper)
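The DataMiner interface is not described, so this sketch only shows the kind of lookup a plugin might need: all URLs carrying a given classification. It uses Python's built-in `sqlite3` module purely as a self-contained stand-in for MySQL. The table layout and function names are hypothetical.

```python
import sqlite3
from typing import Iterable, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a label store; sqlite3 stands in for MySQL here."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS labels ("
               "url TEXT NOT NULL, label TEXT NOT NULL, UNIQUE (url, label))")
    return db


def record_labels(db: sqlite3.Connection, url: str, labels: Iterable[str]) -> None:
    """Store classifier output; duplicate (url, label) pairs are ignored."""
    db.executemany("INSERT OR IGNORE INTO labels (url, label) VALUES (?, ?)",
                   [(url, label) for label in labels])
    db.commit()


def urls_with(db: sqlite3.Connection, label: str) -> List[str]:
    """Lookup service for plugins: every URL carrying the given label."""
    rows = db.execute("SELECT url FROM labels WHERE label = ? ORDER BY url", (label,))
    return [row[0] for row in rows]
```

Against a real MySQL server, the same queries would go through a MySQL driver, whose placeholder syntax may differ from sqlite3's `?`.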
37. Other nice-to-have things in progress
- Propagation module: manual or automated agent installation on a vulnerable server (controlled worm-spreading capability!)
38. Demo
- The code is spaghetti (sorry about that)
- Will demonstrate the functional bits
39. Questions and answers
- Sample questions, pick one ---------)
  - Why another web hacking tool?
  - Can you do X too...?
40. Thanks
- Thanks for your patience
- The code, slides and docs will be available in a while: http://o0o.nu/sec
41. XCon plug
- XCon2006, the Fifth Information Security Conference, will be held in Beijing, China, on August 22-24, 2006
- Speaking: a bit late, but you can try casper_at_xfocus.org
- Attending should be possible and interesting
- No politics! :-)
- Thanks!