Web Mapping Constructs.
8th September 1998
Mapping complex, non-linear hypertext information structures.
As far as I know, no readily available software can map complex
structures. Existing tools fall short in several respects.
Many published "site maps" don't acknowledge (or possibly don't have) any
link other than strictly hierarchical ones. Tools for graphically
viewing web sites are few, and those that exist don't seem to recognise
inter-relationships.
The whole point of the web was that documents can be
referenced arbitrarily, without insisting that all links be hierarchical
in nature. Many sites (including Yahoo,
Alta-Vista, Webcrawler etc.) exist solely to
help navigate these complex structures. Unfortunately, all the site map
management tools seem to ignore this. While many sites have
tree structures, others do not (e.g. Yahoo,
while appearing on the surface to be purely tree-based, in fact has
"aliases" that link across the tree).
In my experience, there are more relationships mis-expressed as
trees than there are true trees. So-called family
trees tend to grow into rambling structures that interconnect (and not
just among the low IQ; there is substantial inter-breeding in the royal
houses of Europe).
Additionally, when composing large structures, or making
databases browsable via CGI or canned HTML, even moderately sized
databases can become large, unwieldy structures (imagine drawing a
single map for Microsoft's corporate site!). In the case of the
Lepidoptera project, 475kB of base database, with indices and lookup
tables, became 2MB of source data, which in turn became 4,140 HTML
files totalling 12MB, in addition to 1,868 image files; the concept of
a map quickly became irrelevant.
"Meta" anything in a computing context
is
basically self-describing: meta-data is data describing data (this can
be definitions of fields and tables, or the more interesting
descriptors (things like accuracy and source comments). Meta-maps are
maps that describe maps. The "index maps" present in map books that
identify the relationships between maps are one common example of
meta-maps. |
Instead, a rough "meta-map" concept was used: rather than mapping
individual pages, a map was made of the relationships between types of
pages. (A diagram was constructed, albeit using a demonstration
program from a programming tool. The deficiencies in this particular
diagram were immediately obvious, but there was neither the time nor
the tools to write software that would produce such a document.) The
type definitions used, in hindsight, conform rather closely to the
classes/entities that would have resulted from a proper analysis using
standard methodologies (say Codd and/or Booch), although code reuse is
not really an issue.
Besides helping programmers develop and manage the document space,
such a map, combined with good naming conventions (the scheme
originally chosen did not support spaces or hyphens in entity names,
but underscores in their place helped), can help explain the
document to users and project management.
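To make the meta-map idea concrete, here is a minimal Python sketch. It assumes each page has already been tagged with a page type; the page names and types below are hypothetical. It collapses page-to-page hyperlinks into type-to-type arrows and counts how many page links each arrow stands for.

```python
from collections import Counter

def build_meta_map(page_types, page_links):
    """Collapse a page-level link graph into a type-level "meta-map".

    page_types: dict mapping page name -> page type (hypothetical tagging)
    page_links: iterable of (source_page, target_page) hyperlinks
    Returns a Counter mapping (source_type, target_type) -> number of page links.
    """
    meta = Counter()
    for src, dst in page_links:
        meta[(page_types[src], page_types[dst])] += 1
    return meta

# Hypothetical pages from a small specimen site.
page_types = {
    "index.html": "Home",
    "family_a.html": "Family",
    "family_b.html": "Family",
    "sp_001.html": "Specimen",
    "sp_002.html": "Specimen",
    "sp_003.html": "Specimen",
}
page_links = [
    ("index.html", "family_a.html"),
    ("index.html", "family_b.html"),
    ("family_a.html", "sp_001.html"),
    ("family_a.html", "sp_002.html"),
    ("family_b.html", "sp_003.html"),
    ("sp_001.html", "family_a.html"),
]

for (src, dst), count in sorted(build_meta_map(page_types, page_links).items()):
    print(f"{src} -> {dst}: {count} page link(s)")
```

Six page links reduce to three type-level arrows here; on a real site the reduction is far larger, which is what keeps the diagram readable.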
A personal, black-box reverse engineering of
the perceived concepts behind Yahoo(TM). I have no knowledge of the
actual technology used; this is my best guess, and how I would do it.
http://www.yahoo.com/
Yahoo, one of the earliest "web catalogues" (or, more recently,
"portals"), holds a large amount of data but appears to have a simple
internal structure. Only two entity types are required: the "Category"
and the "Web site". A category can contain other objects such as
sub-categories or web sites, and is linked to all its ancestor
categories. This would have been a simple, pure tree structure except
for the (very useful) aliasing to related, but not equivalent,
categories across the structure. For example, the category page for
Entertainment : Music : Software contains aliases to
Business_and_Economy : Companies : Music : Software and
Computers_and_Internet : Software : Shareware : Microsoft_Windows :
Desktop_Themes : Individual_Themes : Music, two related categories
that do not qualify to be merged.
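The two-entity model guessed at above can be sketched in Python. This is a minimal sketch under the same black-box assumptions; the class and field names are hypothetical, not Yahoo's. Each category has one parent, which gives the tree and the breadcrumb of ancestors, plus a separate list of aliases, which are the only cross-links that break the pure tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WebSite:
    title: str
    url: str

@dataclass(eq=False)  # compare categories by identity, not by contents
class Category:
    name: str
    parent: Optional["Category"] = field(default=None, repr=False)
    children: List["Category"] = field(default_factory=list, repr=False)
    sites: List[WebSite] = field(default_factory=list)
    # Cross-tree links to related categories; these break the pure tree.
    aliases: List["Category"] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Category":
        child = Category(name, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> List["Category"]:
        """The chain from the root down to this category (the breadcrumb)."""
        chain: List[Category] = []
        node: Optional[Category] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def path(self) -> str:
        # Skip the top-level root when printing the path.
        return " : ".join(c.name for c in self.ancestors()[1:])

root = Category("Top")
music_sw = root.add_child("Entertainment").add_child("Music").add_child("Software")
biz_sw = (root.add_child("Business_and_Economy").add_child("Companies")
          .add_child("Music").add_child("Software"))
music_sw.aliases.append(biz_sw)
music_sw.sites.append(WebSite("Example music software", "http://example.com/"))

print(music_sw.path())                       # Entertainment : Music : Software
print([a.path() for a in music_sw.aliases])  # ['Business_and_Economy : Companies : Music : Software']
```

Keeping aliases in a field separate from `children` means the tree (and its breadcrumbs) stays intact, while a map drawn from the data can still show the cross-links.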
Rob Malda's "Everything" project is one of the purer demonstrations
that relationships between items can be other than strictly
hierarchical. The concept is basically one of user-created "nodes",
filled with descriptive text, each of which can be linked to any other
node in the system. A node need not exist to be linked to; instead,
anyone following a "null" link is invited to create that node. The
project also attempts to impose a limited natural selection mechanism
on node content, by providing an alternate data set for each node and
asking viewers to judge which is superior. Presumably at some point
the lesser would be deleted.
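The node mechanics described above can be sketched in a few lines of Python. This illustrates the idea only and is not the Everything project's code; the class and method names are hypothetical. Links are stored by title, so a link may point at a node that does not exist yet, and each node holds competing write-ups whose votes decide which one is shown.

```python
from typing import Dict, Iterable, List, Optional

class Writeup:
    def __init__(self, text: str):
        self.text = text
        self.votes = 0

class Node:
    def __init__(self, title: str):
        self.title = title
        self.writeups: List[Writeup] = []
        self.links: List[str] = []  # titles of linked nodes; targets need not exist

    def best(self) -> Optional[Writeup]:
        """The write-up currently judged superior (highest vote count)."""
        return max(self.writeups, key=lambda w: w.votes, default=None)

class NodeSpace:
    def __init__(self):
        self.nodes: Dict[str, Node] = {}

    def create(self, title: str, text: str, links: Iterable[str] = ()) -> Node:
        node = self.nodes.setdefault(title, Node(title))
        node.writeups.append(Writeup(text))  # a second create adds an alternative
        node.links.extend(links)
        return node

    def follow(self, title: str) -> str:
        node = self.nodes.get(title)
        if node is None:
            # A "null" link: invite the reader to create the missing node.
            return f"'{title}' does not exist yet. Would you like to create it?"
        return node.best().text

    def prune(self, title: str) -> None:
        """Keep only the superior write-up (the 'natural selection' step)."""
        node = self.nodes[title]
        node.writeups = [node.best()]

space = NodeSpace()
moth = space.create("moth", "A mostly nocturnal lepidopteran.", links=["butterfly"])
space.create("moth", "Like a butterfly, but fuzzier.")
moth.writeups[0].votes += 1
moth.writeups[1].votes += 3
print(space.follow("moth"))       # Like a butterfly, but fuzzier.
print(space.follow("butterfly"))  # 'butterfly' does not exist yet. Would you like to create it?
```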
Images of New Zealand Lepidoptera.
Crosby, Dugdale, Thoreau. 1998. Manaaki Whenua Press (currently in press).
The NZ Lepidoptera Project is an effort to publish electronically the
type specimen details of the indigenous Lepidoptera (moths and
butterflies) of New Zealand. Although some of the complexities of
taxonomic classification were ignored for this project, the structure
was still complex because of the need for multiple access paths to the
main information page. In addition to three variants on taxonomic
identity, access paths were added by collector, author, collection
locality, and the institution currently holding the specimen. All
paths lead to the same type of page (one of ~2,200). The current build
has ~4,500 pages and ~120,000 hyperlinks.
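One way to generate multiple access paths is to build one index per path from the same specimen records, so that every index points into the single specimen page type. The sketch below is a hypothetical Python illustration, not the project's actual build scripts; the record fields, placeholder values, and file naming are assumptions, and the taxonomic paths are left out.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical non-taxonomic access paths (field names are assumptions).
ACCESS_PATHS = ["collector", "author", "locality", "institution"]

def build_indices(records: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """For each access path, map each key value to the specimen pages it lists."""
    indices: Dict[str, Dict[str, List[str]]] = {p: defaultdict(list) for p in ACCESS_PATHS}
    for rec in records:
        page = f"{rec['id']}.html"  # every path ends at the same page type
        for path in ACCESS_PATHS:
            indices[path][rec[path]].append(page)
    return indices

def count_index_links(indices: Dict[str, Dict[str, List[str]]]) -> int:
    """Total hyperlinks from index pages to specimen pages."""
    return sum(len(pages) for index in indices.values() for pages in index.values())

# Placeholder records, not real specimen data.
records = [
    {"id": "sp0001", "collector": "Collector_A", "author": "Author_X",
     "locality": "Locality_1", "institution": "Institution_P"},
    {"id": "sp0002", "collector": "Collector_A", "author": "Author_Y",
     "locality": "Locality_2", "institution": "Institution_P"},
    {"id": "sp0003", "collector": "Collector_B", "author": "Author_X",
     "locality": "Locality_1", "institution": "Institution_Q"},
]

indices = build_indices(records)
print(dict(indices["collector"]))
# {'Collector_A': ['sp0001.html', 'sp0002.html'], 'Collector_B': ['sp0003.html']}
print(count_index_links(indices))  # 12: each record is linked once per access path
```

Each record contributes one link per access path, so the page-level link count grows with records times paths, while the type-level map stays at four index types pointing to one specimen type.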
Details of scheme used.
The tool used need not be expensive.
For the previously mentioned project, the tool was a demonstration
applet from Sun Microsystems' Java Development Kit, with shadows added
afterwards in a generic paint program.
The diagramming tool used needs to be able to construct entities and
link them together.
- An entity is simply a box, and a link between entities is simply a
line.
- Arrowheads show the directions in which travel is possible.
- If travel in both directions is possible, two arrowheads should be
drawn, facing away from each other.
- Adding a "shadow" under entities that map to a large number of pages
gives some broad idea of scale. (The arbitrary rule used here was that
a level of shadow was added on each link where the average ratio was
at least 1:20; this gave at most 3 shadows under any one entity.)
- For ease of readability, the first entity/page presented to the user
is displayed in the top-left of the document.
- For clarity, some entities may be omitted. Toolbars, advertisements
etc. are not shown; these should instead be described in text
accompanying the diagram.
- Each entity relates to either a single handcrafted page or a single
script that produces one type of page.
- Don't display duplicate structures; for example, the version
restricted to 8.3 filenames is not drawn separately.
- Don't display unrelated structures.
- Bear in mind that there are other relationships that are
important, including run order, file dependencies etc.
- A single entity represents only a single type of page, and should be
simple enough to explain in a single sentence, in the context of the
whole project.
- One entity, one script.
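The conventions above are simple enough to generate automatically. The following Python sketch emits a Graphviz DOT description of an entity diagram: entities become boxes, one-way links get a single arrowhead, two-way links use `dir=both` (an arrowhead at each end, facing away from each other), and extra node outlines (`peripheries`) stand in for the hand-drawn shadows. Two points are assumptions, not the original method: the 1:20 rule is read as "each link whose average fan-out is at least 20 adds one shadow level to the entity it leads to", and each entity takes its level from the first (shortest) path from the entry entity. The entity names and ratios are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# (from entity, to entity, average target pages per source page, two-way?)
Link = Tuple[str, str, float, bool]

# Assumed reading of the 1:20 rule.
SHADOW_RATIO = 20.0

def shadow_levels(entry: str, links: List[Link]) -> Dict[str, int]:
    """Breadth-first walk from the entry entity, accumulating shadow levels."""
    outgoing: Dict[str, List[Tuple[str, float]]] = {}
    for src, dst, ratio, _ in links:
        outgoing.setdefault(src, []).append((dst, ratio))
    levels = {entry: 0}
    queue = deque([entry])
    while queue:
        src = queue.popleft()
        for dst, ratio in outgoing.get(src, []):
            if dst not in levels:  # first (shortest) path from the entry wins
                levels[dst] = levels[src] + (1 if ratio >= SHADOW_RATIO else 0)
                queue.append(dst)
    return levels

def to_dot(entry: str, links: List[Link]) -> str:
    """Render the entity diagram as Graphviz DOT text."""
    levels = shadow_levels(entry, links)
    names = [entry]  # declare the entry entity first
    for src, dst, _, _ in links:
        for name in (src, dst):
            if name not in names:
                names.append(name)
    lines = ["digraph site {", "  node [shape=box];"]
    for name in names:
        # One outline, plus one extra outline per shadow level.
        lines.append(f'  "{name}" [peripheries={levels.get(name, 0) + 1}];')
    for src, dst, _, two_way in links:
        attr = " [dir=both]" if two_way else ""
        lines.append(f'  "{src}" -> "{dst}"{attr};')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical entities and ratios, loosely modelled on a specimen site.
links: List[Link] = [
    ("Home", "Family_index", 30.0, False),
    ("Family_index", "Species_list", 25.0, True),
    ("Species_list", "Specimen_page", 8.0, True),
]
print(to_dot("Home", links))
```

The output can be rendered with Graphviz, for example `dot -Tpng site.dot -o site.png`. In this example, Family_index and Species_list both sit below a link that crosses the 1:20 threshold, so Species_list and Specimen_page each end up with two shadow levels. Graphviz's default top-to-bottom layout places the entry entity in the top rank, which only approximates the top-left convention.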
Further thoughts
None of the ideas here are being actively pursued. My work is in the
field of bioinformatics/biodiversity, so more general software
engineering is not a priority. However, I am still interested in these
ideas. If you are too, contact me. My philosophical position suggests
that any code created here that isn't specifically owned by anybody
else (such as my employers) would ideally become Open Source software.
- Drawing flow diagrams is a simple task that doesn't need a tool any
more complex than a generic diagramming tool. However, extending the
code slightly to keep related attributes associated with their
respective entities would greatly benefit users. From there it is
only a small step to generating at least skeleton code. Unfortunately,
writing small-scale CASE tools is not a priority for me.
- The above is a very rough idea shaped by the tools I had at my
disposal. I am interested in expanding the ideas used, and would
probably write a small tool for this if I had access to a good C++
compiler/GUI (I learnt Borland's (now Inprise's) OWL toolkit in
tertiary education).
- Other ideas to be shaped into a standard:
  - Numeric page counters: either fan-out or total
  - Differentiate between script-generated and hand-coded pages
  - Optionality: the pages generated by script x lead to either page
    type y or page type z
  - Ownership of individual scripts: who will your changes impact?
  - Sub- and super-diagrams; information complexity hiding
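As a rough illustration of keeping attributes with entities, here is a Python sketch in which everything is hypothetical: the field names, the owners, and the layout of the generated stub. Each entity records its one-sentence description, its owner, whether it is script-generated or hand-coded, and its outgoing links. From that it can print a skeleton script per generated entity (one entity, one script) and answer "who will your changes impact?" by listing the owners of entities that link to the changed one.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Entity:
    name: str
    description: str          # the one-sentence explanation of this page type
    owner: str                # who maintains the script (hypothetical field)
    generated: bool = True    # script-generated (True) or hand-coded (False)
    links_to: List[str] = field(default_factory=list)

def skeleton(entity: Entity) -> str:
    """Emit a stub script for one entity: one entity, one script."""
    if not entity.generated:
        return f"# {entity.name}: hand-coded page; no script generated"
    lines = [
        f"# {entity.name}: {entity.description}",
        f"# Owner: {entity.owner}",
        f"def render_{entity.name.lower()}(record):",
        "    page = []",
    ]
    for target in entity.links_to:
        # link_to_* helpers are placeholders for the developer to fill in.
        lines.append(f"    page.append(link_to_{target.lower()}(record))  # TODO")
    lines.append("    return '\\n'.join(page)")
    return "\n".join(lines)

def impacted_owners(entities: Dict[str, Entity], changed: str) -> List[str]:
    """Owners affected by a change: the entity's own owner plus those linking to it."""
    owners = {entities[changed].owner}
    for entity in entities.values():
        if changed in entity.links_to:
            owners.add(entity.owner)
    return sorted(owners)

entities = {
    "Home": Entity("Home", "Entry page listing the access paths.", "web team",
                   generated=False, links_to=["Collector_index"]),
    "Collector_index": Entity("Collector_index", "One page per collector.", "alice",
                              links_to=["Specimen_page"]),
    "Specimen_page": Entity("Specimen_page", "Details of one type specimen.", "bob"),
}
print(skeleton(entities["Collector_index"]))
print(impacted_owners(entities, "Specimen_page"))  # ['alice', 'bob']
```

The generated stub is itself valid Python once the placeholder `link_to_*` helpers exist; the point is that the diagram, not a separate document, becomes the source of the skeleton.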
© Landcare Research New Zealand Ltd 1998
Dominic J. Thoreau