Competitive Intelligence and the Web, presented at AMCIS 2003, Tampa, Florida, by Dr. Robert J. Boncella

1
Competitive Intelligence and the Web
Presented at AMCIS 2003, Tampa, Florida
by Dr. Robert J. Boncella, Washburn University
2
Competitive Intelligence
the process of ethically collecting, analyzing
and disseminating accurate, relevant, specific,
timely, foresighted and actionable intelligence
regarding the implications of the business
environment, competitors and the organization
itself
3
Competitive Intelligence Process
  • Planning and direction: working with decision
    makers to discover and hone their intelligence
    needs
  • Collection activities: conducted legally and
    ethically
  • Analysis: interpreting data and compiling
    recommended actions
  • Dissemination: presenting findings to decision
    makers
  • Feedback: taking into account the response of
    decision makers and their needs for continued
    intelligence

4
CI and The Web
  • A business Web site will contain a variety of
    useful information:
  • company history, corporate overviews, business
    visions
  • product overviews, financial data, sales
    figures
  • annual reports, press releases, biographies of
    top executives, locations of offices, and hiring
    ads.
  • An example of this information is
    http://www.google.com/about.html
  • The cost of this information is, for the most
    part, free.
  • Access to open sources does not require
    proprietary software of the kind that a number
    of commercial databases require.

5
The Web Structure and Information Retrieval
  • HTTP protocol and the use of Uniform Resource
    Locators (URL)
  • Mathematical network of nodes and arcs
  • Information Retrieval (IR)
  • follows the links (arcs)
  • from document to document (node to node)
  • Retrieve documents so their content can be
    evaluated and a new set of URLs would be
    available to follow
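
The retrieval loop described on this slide can be sketched as a breadth-first graph traversal. This is only an illustration under stated assumptions: the `WEB` dictionary is a made-up in-memory link graph standing in for real HTTP fetches, and the names `crawl` and `fetch_links` are hypothetical.

```python
from collections import deque

# Hypothetical link graph: each URL (node) maps to the URLs it links to (arcs).
WEB = {
    "http://example.com/": ["http://example.com/about", "http://example.com/products"],
    "http://example.com/about": ["http://example.com/"],
    "http://example.com/products": ["http://example.com/products/a"],
    "http://example.com/products/a": [],
}

def crawl(start, fetch_links, max_pages=100):
    """Breadth-first traversal: visit a page, then queue the URLs it exposes."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)              # a real crawler would evaluate content here
        for link in fetch_links(url):    # the new set of URLs available to follow
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

pages = crawl("http://example.com/", lambda url: WEB.get(url, []))
```

Keeping a `seen` set is what stops the traversal from looping when pages link back to each other, as the home page and the "about" page do here.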

6
Issues Associated With CI and The Web
  • Information Gathering
  • Information Analysis
  • Information Verification
  • Information Security

7
Information Gathering
8
General Web Search Engines
  • Architecture
  • Web Crawlers (Web Spiders) are used to collect
    Web pages using graph searching techniques
  • An indexing method is used to index collected Web
    pages and store the indices into a database.
  • Retrieval and ranking methods that are used to
    retrieve search results from the database and
    present ranked results to users.
  • A user interface that allows users to query the
    database and customize their searches
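
The indexing and ranking components above can be illustrated with a toy inverted index. This is a minimal sketch, assuming plain term-frequency scoring; the slides do not say which ranking method any particular engine uses, and the page names and texts are invented.

```python
from collections import Counter, defaultdict

def build_index(pages):
    """Inverted index: map each term to the pages containing it, with counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][url] = count
    return index

def search(index, query):
    """Score each page by summed term frequency; highest score first."""
    scores = Counter()
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Ties are broken alphabetically so the ordering is deterministic.
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical collected pages (what a crawler would have stored).
PAGES = {
    "a.html": "annual report and sales figures",
    "b.html": "press release about sales sales growth",
    "c.html": "company history",
}
index = build_index(PAGES)
results = search(index, "sales report")
```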

9
Domain Specific Web Search Engines
  • Northern Light, a search engine for commercial
    publications, in the domains of business and
    general interest.
  • EDGAR is the United States Securities and
    Exchange Commission's clearinghouse of publicly
    available company information and filings.
  • Westlaw is a search engine for legal materials.
  • OVID Technologies provides a user interface that
    unifies searching across many subfields and
    databases of medical information.

10
Meta-search engine
  • Upon receipt of query connects to several general
    search engines
  • Returns integrated results of searches
  • examples
  • www.metacrawler.com
  • www.dogpile.com
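
The slide does not say how a meta-search engine integrates the result lists it receives. One common approach, used here purely as an assumption, is reciprocal rank fusion: each engine contributes 1/(k + rank) to a page's score. The engine result lists below are invented.

```python
from collections import defaultdict

def merge_results(result_lists, k=60):
    """Reciprocal rank fusion: pages ranked well by several engines rise to the top."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results from two general search engines.
engine_a = ["x.com", "y.com", "z.com"]
engine_b = ["y.com", "w.com", "x.com"]
merged = merge_results([engine_a, engine_b])
```

Pages returned by both engines (`y.com`, `x.com`) outrank pages returned by only one, which is the practical benefit of integrating results.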

11
Difficulties with Information Gathering
  • Time to carry out search
  • Number of pages returned
  • Currency of information
  • Accessible pages
  • Web contains about 552.5 billion pages
  • Growth rate of 7.3 million pages per day
  • Surface Web vs. Deep Web
  • Surface Web: pages freely available to the public
  • Deep Web: dynamic pages, intranets, proprietary
    databases
  • Surface Web contains about 2.5 billion pages
  • Deep Web contains about 550 billion pages (about
    200 times more)
  • Charge for Web retrieval

12
Information Analysis (Web Mining)
13
Web Page Content
  • Focused Spiders (On Line)
  • Return Appropriate Set of Pages
  • Intelligent Agent
  • User Interface
  • CI Spider by Chau and Chen, University of Arizona
  • Answers On-line by Answer Chase

14
Search Result Mining
  • Text Mining (Off Line)
  • Automate the task of organizing and summarizing
    numerous pages
  • Requires automated analysis of natural language
    texts
  • Commercially available text mining applications,
    e.g. TextAnalyst by Megaputer
  • ANN solution: SITEX by Fukuda et al.
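
As a rough illustration of automated summarizing, the sketch below extracts the most distinctive terms of each page using TF-IDF weighting. This is an assumed, generic technique; it is not how TextAnalyst or SITEX works, and the stop-word list and sample pages are invented.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}  # tiny illustrative list

def tokenize(text):
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def top_terms(docs, n=3):
    """Return the n highest TF-IDF terms for each document."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    summary = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        scores = {t: c * math.log(len(docs) / doc_freq[t]) for t, c in tf.items()}
        summary[name] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return summary

DOCS = {
    "page1": "The company announced record sales and record profit.",
    "page2": "The company is hiring engineers for a new office.",
}
summary = top_terms(DOCS, n=1)
```

Terms that appear on every page (here "company") score zero, so the summary favors what sets each page apart.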

15
Web Structure
  • Page Rank
  • Utilized in keyword searching of web
  • Measure of the number of back links to a page
  • Importance of a page determined by the number of
    links to the page
  • A page's priority is determined by this measure
  • Implemented in the Google search engine
  • Hyperlink-Induced Topic Search (HITS)
  • Hub and Authority measures associated with each
    page
  • Hub - a page that contains links to authoritative
    pages
  • Authority - best page (source) for the requested
    information
  • Starts with a keyword search that returns a set
    of pages
  • These pages are then scored as hubs and
    authorities
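
Both link-analysis measures can be sketched in a few lines. This is an illustrative simplification, not the production algorithms: the damping factor of 0.85, the fixed iteration count, the sum-based normalization in `hits`, and the four-page link graph are all assumptions. Every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages      # a page with no links spreads rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def hits(links, iterations=50):
    """Return (hub, authority) scores, rescaled to sum to 1 after each update."""
    hub = {p: 1.0 for p in links}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in links}
        for page, outlinks in links.items():
            for target in outlinks:
                auth[target] += hub[page]
        total = sum(auth.values()) or 1.0
        auth = {p: v / total for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in outlinks) for p, outlinks in links.items()}
        total = sum(hub.values()) or 1.0
        hub = {p: v / total for p, v in hub.items()}
    return hub, auth

# Hypothetical result set from a keyword search.
LINKS = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1"],
    "auth1": [],
    "auth2": [],
}
rank = pagerank(LINKS)
hub, auth = hits(LINKS)
```

In this graph `auth1` has the most back links, so it comes out on top under both PageRank and the HITS authority score, while `hub1` (which links to both authorities) gets the highest hub score.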

16
Web Usage
  • Data mining on Web logs
  • Web logs contain clickstream data
  • Server side
  • Information about pages provided
  • Client side
  • Information about pages requested
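
A server-side Web log is often stored in Common Log Format. The sketch below, which assumes that format and uses invented log lines, counts how often each page was served and reconstructs a simple clickstream per client address.

```python
import re
from collections import Counter

# Hypothetical server-side log lines in Common Log Format.
LOG_LINES = [
    '10.0.0.1 - - [18/Apr/2003:10:00:00 -0500] "GET /products.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [18/Apr/2003:10:00:05 -0500] "GET /jobs.html HTTP/1.0" 200 1200',
    '10.0.0.1 - - [18/Apr/2003:10:01:00 -0500] "GET /jobs.html HTTP/1.0" 200 1200',
]

# Groups: 1 client host, 2 timestamp, 3 method, 4 path, 5 status, 6 bytes sent.
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$')

def page_counts(lines):
    """Count successful (status 200) requests per page."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group(5) == "200":
            counts[m.group(4)] += 1
    return counts

def clickstreams(lines):
    """Group requested paths by client address, in log order."""
    paths = {}
    for line in lines:
        m = LINE.match(line)
        if m:
            paths.setdefault(m.group(1), []).append(m.group(4))
    return paths

counts = page_counts(LOG_LINES)
streams = clickstreams(LOG_LINES)
```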

17
Information Verification
18
Techniques to Verify Accuracy of Information
  • Deep web sources are more reliable than surface
    web sources
  • Confirm with non-web source
  • Answer the following
  • Who is the author?
  • Who maintains the web site?
  • How current is the web page?
  • Observe the Top Level Domain (TLD) of the URL
  • A tilde (~) within the URL denotes a personal web
    page
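
The URL checks can be automated with the standard library. This sketch assumes the personal-page marker is a tilde (~) in the URL path, a common convention on university servers; the function name `url_clues` and the sample URL are hypothetical.

```python
from urllib.parse import urlparse

def url_clues(url):
    """Extract simple credibility clues from a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "tld": tld,                           # e.g. "edu", "gov", "com"
        "personal_page": "~" in parsed.path,  # assumed convention for personal pages
    }

clues = url_clues("http://www.example.edu/~jdoe/research.html")
```

These clues only suggest where to look harder; they do not by themselves establish who the author is or how current the page is.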

19
Domain Names
  • Original TLDs
  • .com
  • .edu
  • .gov
  • .net
  • .org
  • New TLDs
  • .aero (for the air-transport industry)
  • .biz (for businesses)
  • .coop (for cooperatives)
  • .info (for all uses)
  • .museum (for museums)
  • .name (for individuals)
  • .pro (for professions)

20
Information Security
21
Information Security Issues
  • Assuring the privacy and integrity of private
    information
  • Managed with usual computer and network security
    methods
  • Assuring the accuracy of a firm's public
    information
  • Defend against
  • Web hijacking
  • Web defacing
  • Cognitive hacking (semantic attack)
  • Negative information
  • Reference: Cybenko, Giani, and Thompson (2002)
  • Avoiding unintentionally revealing information
    that ought to be private

22
Web Hijacking
Due to a bug in CNN's software, when people at
the spoofed site clicked on the "E-mail This"
link, the real CNN system distributed a real CNN
e-mail to recipients with a link to the spoofed
page. With each click at the bogus site, the
real site's tally of most popular stories was
incremented for the bogus story.
Allegedly this hoax was started by a researcher
who sent the spoofed story to three users of
AOL's Instant Messenger chat software.
Within 12 hours more than 150,000 people had
viewed the spoofed page.
23
Web Defacing
In February 2001 the New York Times web site was
defaced by a hacker identified as "splurge" from
a group called Sm0ked Crew, which had a few
days previously defaced sites belonging to
Hewlett-Packard, Compaq, and Intel.
24
Cognitive Hacking
  • Cognitive hacking is the manipulation of
    perception.
  • Causes
  • disgruntled customers/employees
  • competition
  • random act of vandalism

25
Two types of cognitive hacking
  • Single source cognitive hacking
  • occurs when a reader does not know who posted
    the information and has no way of verifying it
    or contacting its author.
  • Multiple source cognitive hacking
  • occurs when there are several sources for a
    topic; it becomes a concern when the
    information is not accurate.

26
Categories of Cognitive Attacks
  • Overt
  • No attempt is made to conceal overt cognitive
    attacks
  • website defacements.
  • Covert
  • Provision of misinformation
  • the intentional distribution or insertion of
    false or misleading information intended to
    influence readers' decisions and/or activities

27
Emulex and Mark Jakob
  • On 8/25/2000 a press release distributed by
    financial news services stated that Emulex
    revised its per share gain to a per share loss
  • Price per share of Emulex moved from $104.00 to
    $43.00 in 16 minutes
  • The press release was false - fabricated by Mark
    Jakob, who was at the time on the wrong side of
    a stock short sale.
  • Jakob launched this press release via Internet
    Wire, an LA-based firm that distributes press
    releases.

28
The Jonathan Lebed Case
According to the US Securities and Exchange
Commission, 15-year-old Jonathan Lebed earned
between $12,000 and $74,000 daily over six months,
for a total gain of $800,000. Lebed would buy a
block of FTEC stock and then, using only AOL
accounts with fictitious names, he would post a
message like the one below. Doing this a number
of times, he increased the daily trading volume of
FTEC from 60,000 shares to more than one million.

DATE: 2/03/00 3:43pm Pacific Standard Time
FROM: LebedTG1
FTEC is starting to break out! Next week, this
thing will EXPLODE . . .
Currently FTEC is trading for just $2 1/2. I am
expecting to see FTEC at $20 VERY SOON . . . Let
me explain why . . . Revenues for the year should
very conservatively be around $20 million.
The average company in the industry trades with a
price/sales ratio of 3.45. With 1.57 million
shares outstanding, this will value FTEC
at . . . $44. It is very possible that FTEC will
see $44, but since I would like to remain very
conservative . . . my short term price target on
FTEC is still $20! The FTEC offices are extremely
busy . . . I am hearing that a number of
HUGE deals are being worked on. Once we get some
news from FTEC and the word gets out about the
company . . . it will take-off to MUCH
HIGHER LEVELS! I see little risk when purchasing
FTEC at these DIRT-CHEAP PRICES.
FTEC is making TREMENDOUS PROFITS and is trading
UNDER BOOK VALUE!!! This is the #1 INDUSTRY you
can POSSIBLY be in RIGHT NOW. There are thousands
of schools nationwide who need FTEC to install
security systems . . . You can't find a better
positioned company than FTEC! These prices are
GROUND-FLOOR! My prediction is that this will be
the #1 performing stock on the NASDAQ in 2000. I
am loading up with all of the shares of FTEC I
possibly can before it makes a run to $20.
Be sure to take the time to do your research on
FTEC! You will probably never come across an
opportunity this HUGE ever again in your entire
life.
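
The valuation claimed in the posted message can be reproduced as simple arithmetic. The sketch below only checks the calculation, assuming the figures are in US dollars; the inputs are the poster's own unverified claims, which is exactly what makes this kind of message persuasive.

```python
# Figures as claimed in the posted message (unverified, assumed to be in dollars).
revenue = 20_000_000             # claimed annual revenue
price_to_sales = 3.45            # claimed industry average price/sales ratio
shares_outstanding = 1_570_000   # claimed shares outstanding

# Implied share price = (revenue x price/sales ratio) / shares outstanding
implied_price = revenue * price_to_sales / shares_outstanding
print(f"Implied price per share: ${implied_price:.2f}")
```

The result rounds to the $44 quoted in the message, so the arithmetic is internally consistent; the deception lies in the inputs and the hype, not in the division.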
29
POSSIBLE COUNTERMEASURES
  • Single source
  • Authentication of source
  • Information "trajectory" modeling
  • Ulam games
  • Multiple Sources
  • Source Reliability via Collaborative Filtering
    and Reliability reporting
  • Detection of Collusion by Information Sources
  • Byzantine Generals Models

30
Countermeasures: Single Source
  • Authentication of Source
  • Due diligence
  • Implied verification - PKI (Digital Signature)
  • Information Trajectory
  • Variation on a theme
  • e.g. the Lebed case is a variation of the
    pump-and-dump scheme
  • Ulam Games
  • Model that assumes false information
  • How fast can that be determined using questions
    to and answers from the source
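
An Ulam game asks how many yes/no questions are needed to pin down a fact when the source may lie a bounded number of times. The sketch below uses a deliberately naive strategy, assumed here for illustration: repeat each binary-search question 2k + 1 times and take the majority answer, which tolerates up to k lies in total. Better strategies need fewer questions; the helper names are hypothetical.

```python
def find_secret(low, high, ask, max_lies):
    """Binary search in which each question is repeated 2*max_lies + 1 times."""
    repeats = 2 * max_lies + 1
    questions = 0
    while low < high:
        mid = (low + high) // 2
        yes_votes = sum(ask(mid) for _ in range(repeats))
        questions += repeats
        if yes_votes > repeats // 2:     # majority says "secret <= mid"
            high = mid
        else:
            low = mid + 1
    return low, questions

def make_source(secret, lie_on):
    """Hypothetical source that lies on the question numbers listed in lie_on."""
    state = {"n": 0}
    def ask(mid):
        state["n"] += 1
        truth = secret <= mid
        return (not truth) if state["n"] in lie_on else truth
    return ask

ask = make_source(secret=37, lie_on={2})      # the source lies once, on question 2
value, questions = find_secret(1, 64, ask, max_lies=1)
```

With 64 candidates and one permitted lie, this strategy spends 6 rounds of 3 questions each, and still recovers the secret despite the lie.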

31
Countermeasures: Multiple Sources
  • Collaborative filtering and reliability reporting
  • a site keeps records and uses those records
    to verify future claims by those with access to
    publishing on the site.
  • Detection of Collusion by Information Sources
  • Linguistic analysis
  • Determine if different sources are by the same
    author
  • Byzantine generals model
  • a message communicating system has two types of
    processes: reliable and unreliable.
  • Given a number of processes from this system,
    determine the type of each process.
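
Linguistic analysis for detecting a shared author can be approximated by comparing how often each text uses common function words. The sketch below is a crude stylometric heuristic, assumed for illustration only; the word list is arbitrary and a high score is a hint worth investigating, not proof of common authorship.

```python
import math
from collections import Counter

# Small, arbitrary list of function words; real stylometry uses far larger sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "i", "it", "this", "will"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def same_author_score(text_a, text_b):
    """Similarity in [0, 1]; higher means more similar function-word usage."""
    return cosine(profile(text_a), profile(text_b))

post_a = "I am expecting to see this stock at the top. This will be the best."
post_b = "I think this will be the best stock to own in the market."
score = same_author_score(post_a, post_b)
```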

32
Countermeasures: Negative Information
  • Monitor Web Sites
  • 5360 URLs contain the phrase "Microsoft sucks"
  • Use an intelligent agent (IA) to monitor
  • Text mining for type of negative information
  • Respond accordingly
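
A monitoring agent of this kind can be sketched as a scan over already-collected page text. The watch list, the company name "Acme", and the page contents below are all invented; a real agent would also need to fetch pages and handle far subtler language than simple substring matches.

```python
NEGATIVE_PHRASES = ["sucks", "scam", "rip-off"]   # hypothetical watch list

def flag_pages(pages, company, phrases=NEGATIVE_PHRASES):
    """Return {url: [phrases found]} for pages that mention the company
    together with at least one watched phrase."""
    flagged = {}
    for url, text in pages.items():
        lowered = text.lower()
        if company.lower() not in lowered:
            continue
        hits = [p for p in phrases if p in lowered]
        if hits:
            flagged[url] = hits
    return flagged

# Hypothetical page texts gathered by a crawler.
PAGES = {
    "http://example.org/review": "Acme sucks and their support is a scam.",
    "http://example.org/news": "Acme opens a new office.",
    "http://example.org/other": "This product sucks.",
}
flagged = flag_pages(PAGES, "Acme")
```

The list of matched phrases per page gives a first, rough classification of the type of negative information before anyone decides how to respond.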

33
Countermeasures: Unintentional Disclosure
  • Carry out a CI project against yourself

34
Conclusions
  • Reconcile deep web vs. surface web
  • Determine when all pages are needed vs. the
    right set of pages
  • Automate authoritative page selection
  • Consumer Reports type process
  • e.g. posting a Web page in the early '90s (Yahoo)
  • Automate detection of
  • false information
  • inaccurate information
  • negative information

35
Slides: http://www.washburn.edu/cas/cis/boncella
E-mail: bob.boncella_at_washburn.edu
36
References
Aaron, R. D. and Naylor, E. Tools for Searching the Deep Web, Competitive Intelligence Magazine, (4:4), Online at http://www.scip.org/news/cimagazine_article.asp?id=156. (date of access April 18, 2003).
Calishain, T. and Dornfest, R. (2003) Google Hacks: 100 Industrial-Strength Tips & Tools, Sebastopol, CA: O'Reilly & Associates.
Chakrabarti, S. (2003) Mining the Web: Discovering Knowledge from Hypertext Data, San Francisco, CA: Morgan Kaufmann.
Chen, H., Chau, M., and Zeng, D. (2002) CI Spider: A Tool for Competitive Intelligence on the Web, Decision Support Systems, (34:1) pp. 1-17.
Cybenko, G., Giani, A., and Thompson, P. (2002) Cognitive Hacking: A Battle for the Mind, IEEE Computer (35:8) August, pp. 50-56.
Dunham, M. H. (2003), Data Mining: Introductory and Advanced Topics, Upper Saddle River, NJ: Prentice Hall.
Fleisher, C. S. and Bensoussan, B. E. (2000) Strategic and Competitive Analysis, Upper Saddle River, NJ: Prentice Hall, 2003.
Fuld, L. (1995) The New Competitor Intelligence, New York: Wiley.
Herring, J. P. (1998) "What Is Intelligence Analysis?" Competitive Intelligence Magazine, (1:2), pp. 13-16. http://www.scip.org/news/cimagazine_article.asp?id=196
37
References
Kleinberg, J. M. (1999), Authoritative Sources in a Hyperlinked Environment, Journal of the ACM (46:5), pp. 604-632, September.
Krasnow, J. D. (2000), The Competitive Intelligence and National Security Threat from Website Job Listings, http://csrc.nist.gov/nissc/2000/proceedings/papers/600.pdf. (date of access April 18, 2003).
Lyman, P. and Varian, H. R. (2000) Internet Summary, Berkeley, CA: How Much Information Project, University of California, Berkeley, http://www.sims.berkeley.edu/research/projects/how-much-info/internet.html. (date of access April 18, 2003).
Murray, M. and Narayanaswamy, R. (2003) The Development of a Taxonomy of Pricing Structures to Support the Emerging E-business Model of "Some Free, Some Fee", Proceedings of SAIS 2003, pp. 51-54.
Page, Lawrence, and Brin, Sergey, The Anatomy of a Large-Scale Hypertextual Web Search Engine, http://www-db.stanford.edu/backrub/google.html, 1998. (date of access April 22, 2003).
Schneier, Bruce (2000) Semantic Attacks: The Third Wave of Network Attacks, Crypto-gram Newsletter, October 15, 2000, http://www.counterpane.com/crypto-gram-0010.html. (date of access April 18, 2003).
SCIP (Society of Competitive Intelligence Professionals) http://www.scip.org/. (date of access April 18, 2003).