[1]Principia Cybernetica Web
Web Connectivity Analysis
There exist different algorithms to extract information from the pattern of links
(connectivity) between web pages
____________________________________________________________________________
The links connecting documents in the web are in principle all equivalent: the
web itself does not express a preference for one link or one document above
another. Yet, the connectivity or pattern of linkages between pages does contain
a lot of implicit information about the relative importance of links. The author
of a web document will normally only include links to other documents that are
relevant to the general subject of the page, and of sufficient quality. Thus,
locating one document relevant to your goals may be sufficient to guide you to
further information on that issue. High-quality documents, which contain clear,
accurate and useful information, are likely to have many links pointing to them,
while low-quality documents will get few or no links. Thus, although no explicit
preference function is attached to a link, there is a preference implicit in the
total number of links pointing to a document. This preference is produced
collectively, by the group of all web authors.
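The idea that the number of incoming links acts as a collective, implicit preference can be illustrated in a few lines. This is a minimal Python sketch, not a method described on this page: the function name `in_degree`, the toy page names, and the choice to ignore self-links and repeated links from the same page are all assumptions.

```python
from collections import Counter


def in_degree(links):
    """Count how many distinct pages link to each page.

    links: iterable of (source, target) pairs. Repeated pairs and
    self-links are skipped (an assumption of this sketch), so a single
    author cannot inflate a page's count.
    """
    seen = set()
    counts = Counter()
    for src, dst in links:
        if src == dst or (src, dst) in seen:
            continue
        seen.add((src, dst))
        counts[dst] += 1
    return counts


# Hypothetical toy web; page names are illustrative only.
links = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b"), ("c", "a"), ("a", "c")]
counts = in_degree(links)
print(counts.most_common(1))  # [('c', 3)]
```

Page `c` is linked from three different pages and so comes out on top, even though no single link says anything about its quality.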
There exist different mathematical techniques to extract this information.
Recently, two types of algorithms have been developed for this purpose: PageRank
(Brin & Page 1998) and HITS (Kleinberg 1998). Both use a bootstrapping approach:
they determine the quality or "authority" of a web page on the basis of the
number and quality of the pages that link to it. Since the definition is
recursive (a page has high quality if many high quality pages point to it), the
algorithm needs several iterations to determine the overall quality of a page.
Mathematically, this is equivalent to computing the principal eigenvector of a
matrix that represents the linking pattern in the selected part of the web.
PageRank uses (a normalized version of) the linking matrix directly; HITS uses
the products of the matrix with its transpose. The latter method distinguishes
two types of pages: authorities, which are pointed to by many good "hubs"
(indexes or lists of web pages), and hubs, which point to many good authorities.
In combination with a keyword search, which restricts the pages for which the
quality is computed to a specific problem "neighborhood", these methods appear
to return considerably better answers to a query.
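The recursive definition above is usually computed by power iteration: start from uniform scores and repeatedly propagate them along the links until they stop changing, which approximates the principal eigenvector. The following is a simplified Python sketch under stated assumptions, not the implementation used by Google or by the Clever project. The damping factor `d = 0.85` and the even redistribution of rank from pages without out-links follow the commonly cited PageRank formulation rather than anything stated on this page, and the function names and the toy neighborhood are hypothetical. The HITS part alternates the two updates whose repeated application corresponds to multiplying by the link matrix and its transpose.

```python
import math


def _pages(links):
    """All pages that appear as the source or the target of a link."""
    return sorted(set(links) | {t for targets in links.values() for t in targets})


def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on {page: [pages it links to]}."""
    pages = _pages(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for src, targets in links.items():
            for dst in targets:
                # Each page splits its rank equally over its out-links.
                new[dst] += d * rank[src] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


def _normalize(scores):
    """Scale a score vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm else scores


def hits(links, tol=1e-10, max_iter=200):
    """Hub/authority iteration in the style of HITS; returns (hubs, authorities)."""
    pages = _pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        # Authority of a page: sum of the hub scores of pages pointing to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        new_auth = _normalize(new_auth)
        # Hub score of a page: sum of the authority scores of pages it points to.
        new_hub = _normalize(
            {p: sum(new_auth[t] for t in links.get(p, [])) for p in pages}
        )
        delta = sum(abs(new_auth[p] - auth[p]) + abs(new_hub[p] - hub[p]) for p in pages)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Hypothetical "neighborhood" returned by a keyword search.
web = {"index": ["x", "y", "z"], "list": ["x", "y"], "x": ["y"], "y": [], "z": []}
ranks = pagerank(web)
hubs, auths = hits(web)
print(max(ranks, key=ranks.get))  # y
print(max(hubs, key=hubs.get), max(auths, key=auths.get))  # index y
```

On this toy neighborhood, `y`, which is linked from three pages, gets the highest PageRank and authority score, while `index`, which points to three pages, is the strongest hub.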
The disadvantage of these methods is that they are static: they merely use the
(rather sparse) linking pattern that already exists; they do not allow the web to
adapt to the way it is used, as the [2]learning web algorithms propose. However,
the two approaches can complement each other, as the use of connectivity
matrices does not require these matrices to have only binary values (either
there is a link or there is not). The [3]learning web and other techniques will
produce less sparse matrices with numerical values that can be analysed in the
same way, and that are likely to give more fine-grained and reliable results.
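Because connectivity analysis only iterates over the entries of a link matrix, those entries need not be 0 or 1. A minimal sketch of PageRank over weighted links, assuming non-negative weights such as (hypothetical) counts of how often each link was followed: each page splits its rank over its out-links in proportion to the weights. The function name, weights and page names are illustrative assumptions, and the sketch does not implement the learning web algorithms themselves.

```python
def weighted_pagerank(weights, d=0.85, tol=1e-10, max_iter=200):
    """PageRank on {page: {target: weight}} with non-negative weights.

    Each page splits its rank over its out-links in proportion to the
    link weights, rather than equally as in the binary case.
    """
    pages = sorted(set(weights) | {t for ws in weights.values() for t in ws})
    n = len(pages)
    totals = {p: sum(weights.get(p, {}).values()) for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Pages with zero total outgoing weight are treated as dangling.
        dangling = sum(rank[p] for p in pages if totals[p] == 0)
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for src, ws in weights.items():
            if totals[src] == 0:
                continue
            for dst, w in ws.items():
                new[dst] += d * rank[src] * w / totals[src]
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical usage-derived weights, e.g. how often each link was followed.
usage = {"home": {"a": 9.0, "b": 1.0}, "a": {"home": 1.0}, "b": {"home": 1.0}}
ranks = weighted_pagerank(usage)
print(ranks["a"] > ranks["b"])  # True
```

With equal weights on all out-links of a page, this behaves like ordinary binary-link PageRank; unequal weights let a frequently used link pass on more of its source's rank.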
Various Links on Web Connectivity Analysis
* [4]The Clever Project: research at IBM Almaden based on Kleinberg's HITS
method; see also: [5]Jon Kleinberg's Homepage with several papers, including:
[6]Authoritative sources in a hyperlinked environment, in: Proc. 9th ACM-SIAM
Symposium on Discrete Algorithms, 1998.
* The PageRank algorithm is being used in the [7]Google search engine, and is
sketched in: Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd.
[8]The PageRank Citation Ranking: Bringing Order to the Web (Manuscript in
progress), [9]The Anatomy of a Large-Scale Hypertextual Web Search Engine, by
S. Brin & L. Page, and in: [10]Efficient Crawling Through URL Ordering, by J.
Cho, H. Garcia-Molina & L. Page
* the [11]Web Archeology project at Digital Research
* [12]Xerox PARC UIR Webology: "information ecology" research by Pitkow,
Pirolli and others, including the papers: [13]Silk from a Sow's Ear:
Extracting Usable Structures from the Web and [14]Life, Death, and Lawfulness
on the Electronic Frontier
* [15]WebQuery: Searching and Visualizing the Web Through Connectivity: a paper
by J. Carriere and R. Kazman
* [16]Web Structure Analysis: a collection of links
* [17]Project Aristotle(sm): Automated Categorization of Web Resources: various
links
* [18]Cybermetrics: a list of papers applying bibliometric (citation) methods
to the web.
* [19]Information Retrieval and Information Extraction on the web: a very rich
list of publications and other resources
* [20]Graph structure in the web, a paper by A. Broder et al., analysing the
structure revealed by a huge crawl through hundreds of millions of pages
* [21]Quiver: proposes search engines based on the Spectral Filtering
algorithms developed by Kleinberg
____________________________________________________________________________
[22]Copyright © 2000 Principia Cybernetica - [23]Referencing this page
Author
F. [24]Heylighen
Date
May 31, 2000 (modified)
Mar 24, 1999 (created)
____________________________________________________________________________
[31]Discussion
____________________________________________________________________________
* [32]Cupter innovation, Comment by Joe Whome
* [33]Global Hierarch, Comment by Jody Wanabee
References
1. http://pespmc1.vub.ac.be/WEBCONAN.html#PCP-header
2. http://pespmc1.vub.ac.be/LEARNWEB.html
3. http://pespmc1.vub.ac.be/LEARNWEB.html
4. http://www.almaden.ibm.com/cs/k53/clever.html
5. http://simon.cs.cornell.edu/home/kleinber/kleinber.html
6. http://simon.cs.cornell.edu/home/kleinber/auth.ps
7. http://www.google.com/
8. http://www-db.stanford.edu/~backrub/pageranksub.ps
9. http://google.stanford.edu/~backrub/google.html
10. http://www-db.stanford.edu/~cho/crawler-paper/
11. http://www.research.digital.com/SRC/personal/Krishna_Bharat/WebArcheology/
12. http://www.parc.xerox.com/istl/projects/uir/projects/Webology.html
13. http://www.acm.org/sigchi/chi96/proceedings/papers/Pirolli_2/pp2.html
14. http://www.acm.org/sigchi/chi97/proceedings/paper/jp-www.htm
15. http://www.cgl.uwaterloo.ca/Projects/Vanish/webquery-1.html
16. http://www.cs.rutgers.edu/~davison/web-structure/
17. http://www.iastate.edu/~CYBERSTACKS/Aristotle.htm
18. http://www.cindoc.csic.es/cybermetrics/links03.html
19. http://www.mri.mq.edu.au/~einat/web_ir/
20. http://www.almaden.ibm.com/cs/k53/www9.final/
21. http://www.quiver.com/
22. http://pespmc1.vub.ac.be/COPYR.html
23. http://pespmc1.vub.ac.be/REFERPCP.html
24. http://pespmc1.vub.ac.be/HEYL.html
25. http://pespmc1.vub.ac.be/DEFAULT.html
26. http://pespmc1.vub.ac.be/ORG.html
27. http://pespmc1.vub.ac.be/^COLDEV.html
28. http://pespmc1.vub.ac.be/WEBRESEA.html
29. http://pespmc1.vub.ac.be/ADAPNET.html
30. http://pespmc1.vub.ac.be/COLLFILT.html
31. http://pespmc1.vub.ac.be/MAKANNOT.html
32. http://pespmc1.vub.ac.be/Annotations/WEBCONAN.0.html
33. http://pespmc1.vub.ac.be/Annotations/WEBCONAN.1.html
34. http://pespmc1.vub.ac.be/hypercard.acgi$annotform?