Methods for implementing transclusion of text into HTML pages

Copyright (c) 13/06/1996 Andrew Pam of Xanadu Australia
Second draft 28/06/1996 by Andrew Pam
Third draft 04/08/96 by Andrew Pam
This is a draft document and not to be regarded as final.

Introduction

An important requirement of Xanalogical systems is the ability to transclude, or virtually include, portions of one or more documents into another. This enables composite documents to be constructed where each reader obtains the pieces from the original publisher. HTML already permits images in formats including X bitmaps, GIF and JFIF (JPEG) to be transcluded with the <IMG> tag, and other document types with the <EMBED> tag, but unfortunately it does not support the transclusion of text. This document examines methods by which support for text transclusion can be implemented with currently available technologies on the WWW.

Java applet

Note: JavaScript cannot be used because, although it can cause a URL to be loaded into a window, it cannot access data from a URL without visibly loading the entire page.

HTML:: <APPLET>
PROS:: Allows the remainder of the page to continue loading and rendering while text transclusions are being retrieved. Doesn't require any changes to the server. Supported by many browsers with more to follow. The code can automatically and transparently be downloaded when required.
CONS:: Not supported by older browsers that don't have Java. Doesn't use a distinct <TEXT> HTML tag. Japanese and other fonts are probably not implemented yet unless we do it from scratch. Transcluded text must appear within the reserved rectangular applet window and probably can't be richly formatted as with HTML.

Netscape plug-in

HTML:: <TEXT>
PROS:: Supports full HTML markup including Japanese and other foreign fonts supported by the browser.
CONS:: Not supported by the many browsers that don't implement Netscape plug-ins. Requires the user to download the plug-in before they can correctly view pages containing text transclusions. Requires server administrators to configure a new MIME-type and publishers to name files containing text transclusions with a different file extension. Finally, the necessary features are not yet implemented by Netscape!

Server Side Include with a CGI script

HTML::  or
PROS:: Supports full HTML markup.
CONS:: Not supported by servers that don't implement server side includes capable of running CGI scripts. Doesn't use a distinct <TEXT> HTML tag. Transcluded text is retrieved by the server on behalf of the user, rather than directly by the user.

Parser/filter CGI script

HTML:: <TEXT>
PROS:: Supports full HTML markup. Should work with all known servers and browsers.
CONS:: Requires that documents containing text transclusions be accessed via the CGI script. Transcluded text is retrieved by the server on behalf of the user, rather than directly by the user.
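
As a rough illustration of the parser/filter approach, the sketch below scans a page for <TEXT> tags and substitutes the text fetched from each SRC. It is only a minimal sketch under stated assumptions: the tag syntax matched here (<TEXT SRC="...">) is a guess at the proposed form, the names expand_transclusions and fetch_url are hypothetical, and the other parameters are ignored.

```python
import re
from typing import Callable
from urllib.request import urlopen

# Assumed tag form: <TEXT SRC="url" ...>. Other attributes are skipped over.
TEXT_TAG = re.compile(r'<TEXT\s+SRC="?([^"\s>]+)"?[^>]*>', re.IGNORECASE)


def expand_transclusions(html: str, fetch: Callable[[str], str]) -> str:
    """Replace every <TEXT SRC=...> tag in html with the text fetch(url) returns."""
    return TEXT_TAG.sub(lambda match: fetch(match.group(1)), html)


def fetch_url(url: str) -> str:
    """Retrieve a source document on behalf of the reader (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

A CGI script built this way would read the requested document, pass it through expand_transclusions(page, fetch_url) and write the result to the browser, which is why the text is retrieved by the server rather than directly by the user.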

Browser implementation

HTML:: <TEXT>
PROS:: Probably the most efficient solution.
CONS:: Doesn't support users of other browsers.

Server gateway

HTML:: <TEXT>
PROS:: Supports full HTML markup. Should work with all known browsers.
CONS:: Doesn't support users of other servers. Transcluded text is retrieved by the server on behalf of the user, rather than directly by the user.

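Because all of these methods have their strengths and weaknesses we will probably want to implement more than one. Numbering the methods in the order presented above, I believe we decided to start with method 4 (the parser/filter CGI script) and probably methods 5 (browser implementation) and 6 (server gateway) later.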

I propose that the new HTML tag should be something like this:

Where braces {} enclose optional elements, brackets [] enclose variable parameters and parentheses () contain mutually exclusive alternatives separated by vertical bars |.

Parameters:

SRC=[URL]: Mandatory. Specifies the source document from which text is to be transcluded. [URL] must be the URL of a text document of some kind, HTML or otherwise.
(PLAIN|RANGE)={[start]},{[end]}: Optional. If this parameter is omitted, the source document will be transcluded in its entirety. [start] and [end] must be byte offsets within the file. If PLAIN is used, only the text of the source document is parsed; all tags are omitted both in determining the specified offsets and in the transclusion. If RANGE is used, the source document is transcluded verbatim. It is probably an error for both [start] and [end] to be omitted. If either or both are out of range, any in-range portion selected should probably still be delivered.
WIDTH=[X] HEIGHT=[Y]: Optional. [X] and [Y] are in pixels and would allow the browser to reserve a rectangular space to present the transcluded text rather than having to wait for it to arrive before continuing the layout of the page, in exactly the same fashion as the <IMG> tag is typically implemented. If the transcluded text will not fit in the reserved space, scroll bars could be displayed. It probably doesn't make sense to implement support for these parameters in the server-side implementations (methods 3, 4 and 6, counting in the order presented above: the Server Side Include, the parser/filter CGI script and the server gateway).
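
The difference between PLAIN and RANGE can be sketched as follows. This is a hedged illustration rather than a settled definition. The function name select_range is hypothetical. It assumes that an omitted [start] means the beginning of the document, that an omitted [end] means its end, and that [end] is exclusive, none of which the draft specifies. Out-of-range offsets are clamped so that any in-range portion is still delivered.

```python
import re
from typing import Optional

# Crude tag matcher: anything between "<" and ">" counts as a tag.
TAG = re.compile(rb"<[^>]*>")


def select_range(source: bytes, mode: str,
                 start: Optional[int] = None,
                 end: Optional[int] = None) -> bytes:
    """Return the portion of source selected by PLAIN or RANGE byte offsets."""
    if mode == "PLAIN":
        # Tags are dropped before offsets are counted and from the output.
        source = TAG.sub(b"", source)
    elif mode != "RANGE":
        raise ValueError("mode must be PLAIN or RANGE")
    # Assumption: missing offsets default to the document boundaries,
    # and [end] is treated as exclusive.
    lo = 0 if start is None else max(0, start)
    hi = len(source) if end is None else min(len(source), end)
    return source[lo:hi] if lo < hi else b""
```

For the source <P>Hello <B>world</B></P>, offsets 6 to 11 under PLAIN select "world", because the tags are not counted, while offsets 0 to 8 under RANGE select the verbatim bytes "<P>Hello".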

The intention is to have a facility in authoring programs that permits the author to create transclusions by indicating an insertion point, viewing the document from which they wish to transclude, and marking the region to be transcluded, much in the manner of a traditional "cut and paste" operation except that what is actually pasted is the reference to the transcluded portion rather than the literal text.

Initially, this could be a small editing program purely for adding transclusions to existing documents. It has also been suggested that people might wish to add transclusions by hand, in which case it might be desirable to have other ways of specifying the start and end of the range besides just the byte offsets, which are inconvenient to determine by hand.

Possible extensions to [start] and [end] values (for discussion):

HTML target anchors <A NAME="target">, indicated by prefixing the target name with a hash mark. This should reference the start of an anchor. Byte offsets from this position could be permitted by appending a plus or minus and the number of bytes. Examples:
Paragraphs (<P> in HTML), indicated by the letter P and the paragraph number counting from the beginning of the document. Sentences and words could also be supported similarly to paragraphs, but at additional computational expense. Example:
Offsets from pattern matches, suggested by Paul Haeberli <paul@sgi.com> in his "Merge" script. This could be signified by enclosing the pattern with slashes, single or double quotes as delimiters which may not appear within the pattern. Examples:
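
To make the discussion concrete, here is one way such values might be resolved to byte offsets. Everything about the concrete syntax is an assumption made for illustration: "#name", "#name+10" and "#name-5" for anchors, "P2" for the second paragraph counting from 1, and a pattern in slashes or quotes treated as a regular expression that resolves to the start of its first match. The function name resolve_position is hypothetical, and anchor names are assumed not to contain plus or minus signs.

```python
import re


def resolve_position(spec: str, doc: bytes) -> int:
    """Translate one [start] or [end] value into a byte offset within doc."""
    if spec.isdigit():
        # Plain byte offset, as in the basic proposal.
        return int(spec)

    anchor = re.fullmatch(r"#([^+-]+)([+-]\d+)?", spec)
    if anchor:
        name = re.escape(anchor.group(1).encode("ascii"))
        tag = re.search(rb'<A\s+NAME="?' + name + rb'"?\s*>', doc, re.IGNORECASE)
        if tag is None:
            raise ValueError("no such anchor: " + spec)
        # Offset is relative to the start of the anchor tag.
        return tag.start() + int(anchor.group(2) or 0)

    para = re.fullmatch(r"P(\d+)", spec)
    if para:
        starts = [t.start() for t in re.finditer(rb"<P\b[^>]*>", doc, re.IGNORECASE)]
        number = int(para.group(1))
        if not 1 <= number <= len(starts):
            raise ValueError("no such paragraph: " + spec)
        return starts[number - 1]

    if len(spec) >= 2 and spec[0] in "/'\"" and spec[-1] == spec[0]:
        found = re.search(spec[1:-1].encode("ascii"), doc)
        if found is None:
            raise ValueError("pattern not found: " + spec)
        return found.start()

    raise ValueError("unrecognised position: " + spec)
```

Resolving every form to a plain byte offset keeps the extensions as a front end to the basic [start] and [end] mechanism, so the range selection itself would not need to change.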

Comments welcome!