Methods for implementing transclusion of text into HTML pages
Copyright (c) 13/06/1996 Andrew Pam of Xanadu Australia
Second draft 28/06/1996 by Andrew Pam
Third draft 04/08/96 by Andrew Pam
This is a draft document and not to be regarded as final.
Introduction
An important requirement of Xanalogical systems is the ability to
transclude, or virtually include, portions of one or more documents
into another. This
enables composite documents to be constructed where each reader obtains
the pieces from the original publisher. HTML already permits images in
formats including X bitmaps, GIF and JFIF (JPEG) to be transcluded with
the <IMG> tag and other document types with the <EMBED>
tag but unfortunately does not support the transclusion of text.
This document examines methods by which support for text transclusion
can be implemented with currently available technologies on the WWW.
- Java applet
The applet would take the URL and range to transclude as parameters
and would attempt to retrieve the requested text and display it
within the rectangular space reserved for use by the applet.
Note: JavaScript cannot be used because, although it can cause a URL to
be loaded into a window, it cannot access data from a URL without
visibly loading the entire page.
- HTML:
- <APPLET>
- PROS:
- Allows the remainder of the page to continue loading and
rendering while text transclusions are being retrieved. Doesn't require
any changes to the server. Supported by many browsers with more to
follow. The code can automatically and transparently be downloaded when
required.
- CONS:
- Not supported by older browsers that don't have Java.
Doesn't use a distinct <TEXT> HTML tag. Japanese and other fonts
are probably not implemented yet unless we do it from scratch.
Transcluded text must appear within the reserved rectangular applet
window and probably can't be richly formatted as with HTML.
- Netscape plug-in
Documents containing text transclusions would have to be given a
particular file extension (for example *.thtml) and MIME-type (for
example text/thtml). When such a document is requested the plug-in
would be invoked in the background and would parse the incoming HTML
stream looking for <TEXT> tags and forwarding the rest, and the result
of retrieving the transclusions, to the invoking Netscape window.
- HTML:
- <TEXT>
- PROS:
- Supports full HTML markup including Japanese and other foreign
fonts supported by the browser.
- CONS:
- Not supported by the many browsers that don't implement Netscape
plug-ins. Requires the user to download the plug-in before they can
correctly view pages containing text transclusions. Requires server
administrators to configure a new MIME-type and publishers to name
files containing text transclusions with a different file extension.
Finally, the necessary features are not yet implemented by Netscape!
- Server Side Include with a CGI script
Documents would invoke a CGI script which would attempt to retrieve
the URL and range specified as parameters to the script.
- HTML:
- <!-- #include --> or <!-- #exec -->
- PROS:
- Supports full HTML markup.
- CONS:
- Not supported by servers that don't implement server side
included CGI scripts. Doesn't use a distinct <TEXT> HTML tag.
Transcluded text is retrieved by the server on behalf of the user,
rather than directly by the user.
- Parser/filter CGI script
The script would retrieve each HTML page at the URL specified as its
parameter and parse it, looking for <TEXT> tags and attempting to
retrieve the requested URL and range, then inserting the retrieved
text into the HTML page as it is output. The script should also
parse hyperlinks (<A HREF> tags) and change the destination to
lead back to the script itself with the original link destination as
a script parameter, so that the script will continue to be invoked
to parse all HTML pages retrieved even when links are followed.
- HTML:
- <TEXT>
- PROS:
- Supports full HTML markup. Should work with all known servers and
browsers.
- CONS:
- Requires that documents containing text transclusions be accessed
via the CGI script. Transcluded text is retrieved by the server on
behalf of the user, rather than directly by the user.
- Browser implementation
The browser would directly recognise and interpret <TEXT> tags in HTML
to request the specified range and URL and insert it inline.
- HTML:
- <TEXT>
- PROS:
- Probably the most efficient solution.
- CONS:
- Doesn't support users of other browsers.
- Server gateway
The server would parse all HTML files and request the necessary
transcluded material while serving each document. Mr. Yousuke Igarashi
<yousuke@crew.sfc.keio.ac.jp> suggested making this a proxy module for a
web server such as Apache, which would allow users of other servers to
set a transclusion-supporting server as their proxy.
- HTML:
- <TEXT>
- PROS:
- Supports full HTML markup. Should work with all known browsers.
- CONS:
- Doesn't support users of other servers. Transcluded text is
retrieved by the server on behalf of the user, rather than directly by
the user.
Because all of these methods have their strengths and weaknesses we
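will probably want to implement more than one. Counting the methods
in the order listed above, I believe we decided to start with method 4
(the parser/filter CGI script) and probably add methods 5 (browser
implementation) and 6 (server gateway) later.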
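As a rough illustration of the parser/filter approach, the following Python sketch expands each <TEXT> tag and redirects every hyperlink back through the script. It is only a sketch under assumptions: the script address /cgi-bin/transclude, the "url" parameter name and the fetch callable are all hypothetical, range handling is left out, and relative link destinations are passed through unresolved.

```python
import re
from urllib.parse import quote

# Hypothetical address of the filter script itself (an assumption).
SCRIPT_URL = "/cgi-bin/transclude"

# <TEXT SRC=...>; any further attributes are ignored in this sketch.
TEXT_TAG = re.compile(r'<TEXT\s+SRC="?([^"\s>]+)"?[^>]*>', re.IGNORECASE)
# <A ... HREF="destination"
HREF = re.compile(r'(<A\s[^>]*?HREF=)"([^"]*)"', re.IGNORECASE)


def filter_page(html, fetch):
    """Expand <TEXT> tags, then route hyperlinks back through the script.

    `fetch` is any callable that maps a URL to the text found there.
    """
    # Replace each <TEXT> tag with the text retrieved from its SRC.
    html = TEXT_TAG.sub(lambda m: fetch(m.group(1)), html)

    # Point each link at the script, passing the original destination
    # as a parameter so that followed links are filtered as well.
    def reroute(m):
        target = quote(m.group(2), safe="")
        return m.group(1) + '"' + SCRIPT_URL + "?url=" + target + '"'

    return HREF.sub(reroute, html)
```

In a real CGI script, fetch would perform an HTTP request; here it is a parameter so that the sketch can be tried with a simple dictionary lookup.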
I propose that the new HTML tag should be something like this:
<TEXT SRC=[URL] {(PLAIN|RANGE)={[start]},{[end]}} {WIDTH=[X] HEIGHT=[Y]}>
Where braces {} enclose optional elements, brackets [] enclose variable
parameters and parentheses () contain mutually exclusive alternatives
separated by vertical bars |.
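To make the notation concrete, here is a small Python sketch that reads one such tag into a record. The names Transclusion and parse_text_tag are made up for illustration, and the sketch assumes that attribute values are either double-quoted or contain no spaces and that [start] and [end] are plain byte offsets.

```python
import re
from typing import NamedTuple, Optional


class Transclusion(NamedTuple):
    src: str
    mode: Optional[str]    # "PLAIN", "RANGE", or None for the whole document
    start: Optional[int]
    end: Optional[int]
    width: Optional[int]
    height: Optional[int]


# NAME=value, where the value is double-quoted or runs to the next space.
ATTRIBUTE = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s>]+))')


def parse_text_tag(tag: str) -> Transclusion:
    attrs = {}
    for m in ATTRIBUTE.finditer(tag):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs[m.group(1).upper()] = value
    if "SRC" not in attrs:
        raise ValueError("SRC is mandatory")
    if "PLAIN" in attrs and "RANGE" in attrs:
        raise ValueError("PLAIN and RANGE are mutually exclusive")
    mode = "PLAIN" if "PLAIN" in attrs else "RANGE" if "RANGE" in attrs else None
    start = end = None
    if mode is not None:
        # Either side of the comma may be left empty.
        first, _, last = attrs[mode].partition(",")
        start = int(first) if first else None
        end = int(last) if last else None
    width = int(attrs["WIDTH"]) if "WIDTH" in attrs else None
    height = int(attrs["HEIGHT"]) if "HEIGHT" in attrs else None
    return Transclusion(attrs["SRC"], mode, start, end, width, height)
```

For example, parse_text_tag('<TEXT SRC=http://example.org/doc.html RANGE=100,250>') yields a record with mode "RANGE", start 100 and end 250, and with width and height left unset.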
Parameters:
- SRC=[URL]
- Mandatory. Specifies the source document from which text is to
be transcluded. [URL] must be the URL of a text document of
some kind, HTML or otherwise.
- (PLAIN|RANGE)={[start]},{[end]}
- Optional. If this parameter is omitted, the source document
will be transcluded in its entirety. [start] and [end] must be
byte offsets within the file. If PLAIN is used, only the text
of the source document is parsed; all tags are omitted both in
determining the specified offsets and in the transclusion.
If RANGE is used, the source document is transcluded verbatim.
It is probably an error for both [start] and [end] to be
omitted. If either or both are out of range, any in-range
portion selected should probably still be delivered.
- WIDTH=[X] HEIGHT=[Y]
- Optional. [X] and [Y] are in pixels and would allow the browser
to reserve a rectangular space to present the transcluded text
rather than having to wait for it to arrive before continuing
the layout of the page, in exactly the same fashion as the <IMG>
tag is typically implemented. If the transcluded text will not
fit in the reserved space, scroll bars could be displayed.
It probably doesn't make sense to implement support for these
parameters in the server-side implementations (methods 3, 4 and 6:
the server side include, the parser/filter CGI script and the
server gateway).
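The PLAIN and RANGE behaviour described above might be sketched in Python as follows. Several points are assumptions rather than part of the proposal: the function name select, treating [end] as exclusive, and removing tags with a simple pattern instead of a full HTML parser.

```python
import re

# Anything between "<" and ">" is treated as a tag (a simplification).
TAG = re.compile(rb"<[^>]*>")


def select(document: bytes, mode=None, start=None, end=None) -> bytes:
    """Return the part of `document` chosen by a PLAIN or RANGE parameter."""
    if mode == "PLAIN":
        # Tags are dropped before offsets are counted, so they appear
        # neither in the offsets nor in the transcluded result.
        document = TAG.sub(b"", document)
    # An omitted offset means the start or end of the document, and
    # out-of-range offsets are clamped so that any in-range portion
    # is still delivered.
    lo = 0 if start is None else max(0, start)
    hi = len(document) if end is None else min(len(document), end)
    return document[lo:hi]
```

With the document b"<P>Hello <B>world</B></P>", RANGE=3,8 selects b"Hello", while PLAIN=6,100 counts offsets in the tag-free text and selects b"world" even though the end offset is out of range.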
The intention is to have a facility in authoring programs that permits
the author to create transclusions by indicating an insertion point,
viewing the document from which they wish to transclude, and marking
the region to be transcluded, much in the manner of a traditional
"cut and paste" operation except that what is actually pasted is the
reference to the transcluded portion rather than the literal text.
Initially, this could be a small editing program purely for adding
transclusions to existing documents. It has also been suggested that
people might wish to add transclusions by hand, in which case it might
be desirable to have other ways of specifying the start and end of
the range besides just the byte offsets, which are inconvenient to
determine by hand.
Possible extensions to [start] and [end] values (for discussion):
- HTML target anchors <A NAME="target">, indicated by prefixing the
target name with a hash mark. This should reference the start of
an anchor. Byte offsets from this position could be permitted by
appending a plus or minus and the number of bytes. Examples:
RANGE=#start,#end
RANGE=#start+5,#end-1
- Paragraphs (<P> in HTML), indicated by the letter P and the
paragraph number counting from the beginning of the document.
Sentences and words could also be supported similarly to paragraphs,
but at additional computational expense. Example:
RANGE=P5,P9-3
- Offsets from pattern matches, suggested by Paul Haeberli
<paul@sgi.com> in his "Merge" script.
This could be signified by enclosing the pattern in slashes, single
quotes or double quotes; the chosen delimiter may not appear within
the pattern. Examples:
RANGE='"-15
RANGE=105,/day./+6
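For discussion, here is a Python sketch of how a single [start] or [end] value in these extended forms might be resolved to a byte offset. It makes several assumptions that the proposal leaves open: the function name resolve is hypothetical, patterns are matched as literal text delimited by slashes only, offsets count from the start of the match, and anchor names contain no plus or minus signs. Paragraph references are omitted.

```python
import re

SPEC = re.compile(r"""
    (?: \#(?P<anchor>[\w.]+)       # target anchor name
      | /(?P<pattern>[^/]*)/       # literal pattern between slashes
      | (?P<offset>\d+) )          # plain byte offset
    (?P<delta>[+-]\d+)? $          # optional signed adjustment in bytes
""", re.VERBOSE)


def resolve(spec: str, document: bytes) -> int:
    """Turn one [start] or [end] value into a byte offset in `document`."""
    m = SPEC.match(spec)
    if m is None:
        raise ValueError("unrecognised position: %r" % spec)
    if m.group("anchor") is not None:
        # Reference the start of the <A NAME="..."> tag itself.
        name = re.escape(m.group("anchor").encode())
        found = re.search(rb'<A\s+NAME="?' + name + rb'"?\s*>',
                          document, re.IGNORECASE)
        if found is None:
            raise ValueError("no such anchor: %r" % spec)
        base = found.start()
    elif m.group("pattern") is not None:
        base = document.find(m.group("pattern").encode())
        if base < 0:
            raise ValueError("pattern not found: %r" % spec)
    else:
        base = int(m.group("offset"))
    return base + int(m.group("delta") or 0)
```

A value such as #start+5 then resolves to five bytes past the start of the anchor named "start", and /day./+6 to six bytes past the first occurrence of "day.".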
Comments welcome!