Re: Submission of draft-pam-html-fine-trans-00.txt

To: xanadu@xxxxxxxxxxxxx
Subject: Re: Submission of draft-pam-html-fine-trans-00.txt
From: Lasse Hillerøe Petersen <lassehp@xxxxxxxxxx>
Date: Mon, 3 Mar 1997 15:41:47 +0100
In-reply-to: <199702250105.MAA27729@xxxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: xanadu@xxxxxxxxxxxxxxxxx
Sender: xanni@xxxxxxxxxxxxxxxxx
>INTERNET DRAFT                                                Andrew Pam
><draft-pam-html-fine-trans-00.txt>                        Project Xanadu
>Expires 2 September 1997                                    2 March 1997
>Syntax
>
>   The proposal is to add an HTML tag with the following syntax:
>
>   <TEXT SRC=(URI) {PLAIN} {RANGE=(start),(end)}>

Comment #1. Since what is included is, in a broad sense, a (dynamic) image
of a different text, why can't the IMG element simply be extended to
permit this? OK, this may cause problems. In that case, INCLUDE
may be better than TEXT.

>   Where parentheses () enclose variable parameters and braces {}
>   enclose optional elements.
>
>   The SRC attribute is mandatory, and specifies the source document
>   from which text is to be transcluded.  (URI) must be the Universal
>   Resource Identifier of a plain text or HTML document.
>   If the source
>   document is HTML and the optional PLAIN attribute is specified,
>   all HTML tags are removed and all SGML entities converted to the
>   characters they represent.  If PLAIN is omitted, the source document
>   is transcluded verbatim.  In either case, only the contents of the
>   <BODY> element of the source document is transcluded.  If the source
>   document is already plain text, the PLAIN attribute has no effect.

Comment #2. Why should the client modify the included text when a Plain
attribute is present? And why only a Plain attribute? Why not allow for the
future, and have a Type attribute? (In principle, adding a Type attribute
to the IMG element would be a more unified approach.) A possibility could
be to allow for a Type=MIME-Type. An inclusion of an HTML document by <IMG
Src="text.html" Type="text/plain"> would permit the browser to render the
HTML text as is (useful for HTML guides, embedding the inclusions in
<PRE></PRE>), whereas <IMG Src="text.html"> would include the text as HTML,
and apply HTML-rendering to the text.
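
A minimal sketch of the distinction in Comment #2, written in Python purely
for illustration. The function name render_inclusion and its default are
hypothetical; nothing here is taken from the draft or from any browser.

```python
import html


def render_inclusion(source: str, mime_type: str = "text/html") -> str:
    """Return the markup a browser might splice in for an included document.

    Hypothetical helper: `mime_type` plays the role of the proposed Type
    attribute on the inclusion element.
    """
    if mime_type == "text/plain":
        # Show the source as is: escape the markup characters and wrap the
        # result in PRE so the tags are displayed rather than interpreted.
        return "<PRE>" + html.escape(source, quote=False) + "</PRE>"
    # Otherwise pass the source through for normal HTML rendering.
    return source
```

With Type="text/plain" the included tags become visible text; without it
they are handed to the HTML renderer unchanged.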

Comment #3. The IMG and APPLET elements both have an Alt attribute.
Wouldn't it be desirable to have this as well?

Comment #4. The draft does not seem to specify how "active" elements should
affect included HTML?
#4a. Assume <A Href="url"><STRONG><IMG Src="text.html"></STRONG></A>.
Should the text be rendered as a bold link?
#4b. What if the source is a part of the source document? What if the part
is not balanced? Should the part be checked for correctness/balance?
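
One way the balance check asked about in #4b could be approached is
sketched below in Python. It is deliberately strict and is an assumption,
not something the draft specifies: HTML allows some end tags (such as </P>)
to be omitted, and this sketch would flag those as unbalanced. The names
BalanceChecker, is_balanced and the VOID set are made up for the example.

```python
from html.parser import HTMLParser

# Elements that never take an end tag. The exact set is an assumption
# made for this sketch.
VOID = {"br", "hr", "img", "input", "meta", "link", "base", "area", "col"}


class BalanceChecker(HTMLParser):
    """Track open elements on a stack and note any mismatched end tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # End tag with no matching open element inside the fragment.
            self.balanced = False


def is_balanced(fragment: str) -> bool:
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Balanced only if no mismatch was seen and nothing is left open.
    return checker.balanced and not checker.stack
```

A fragment cut out of the middle of a document, such as
"b</STRONG> and <EM>c", fails this check because it closes an element it
never opened and leaves another one open.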

Comment #5. It seems the intention of the Plain attribute is to prevent
remote markup from interfering with the including document. Wouldn't it be
desirable to have a more fine-grained control over this, so for example
emphasis can be retained, while references are removed (keeping EM, STRONG,
CITE, but omitting A)?
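
The finer-grained control suggested in Comment #5 might look roughly like
the following Python sketch. The class TagFilter, the function
filter_markup and the default list of kept elements are hypothetical; the
choice to drop all attributes from kept elements is also an assumption.

```python
import html
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Keep a whitelist of elements; drop other tags but keep their text."""

    def __init__(self, keep):
        super().__init__(convert_charrefs=True)
        self.keep = {name.lower() for name in keep}
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # Attributes are dropped on purpose (an assumption of this sketch).
            self.out.append(f"<{tag.upper()}>")

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.out.append(f"</{tag.upper()}>")

    def handle_data(self, data):
        # Entities were decoded by the parser; re-escape the text on output.
        self.out.append(html.escape(data, quote=False))


def filter_markup(fragment, keep=("em", "strong", "cite")):
    parser = TagFilter(keep)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Applied to 'See <A Href="x.html">the <EM>old</EM> report</A>', this keeps
the emphasis and the link text but removes the reference itself.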

Comment #6. This relates to Comments #4 and #5. What if the included HTML
document part contains a reference to a local anchor? Is this anchor
modified to be absolute, or can it refer to an anchor in the including
document? What about embedded images?
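
The two possible answers to Comment #6 can be made concrete with a short
Python sketch. The function resolve_reference and its flag are
hypothetical, and http://example.org/doc-b.html in the usage note is a
made-up address for the including document.

```python
from urllib.parse import urljoin


def resolve_reference(ref: str, source_url: str, including_url: str,
                      anchors_follow_source: bool = True) -> str:
    """Resolve a reference found inside an included part.

    Hypothetical policy switch: a fragment-only reference such as "#sec2"
    either stays with the source document or is rebound to the including
    document. Everything else (other links, embedded images) is resolved
    against the source, so relative image paths keep working.
    """
    if ref.startswith("#") and not anchors_follow_source:
        return urljoin(including_url, ref)
    return urljoin(source_url, ref)
```

For a source at http://foo.bar.org/reports/myreport.html, the reference
"fig1.gif" resolves to http://foo.bar.org/reports/fig1.gif in either mode,
while "#sec2" ends up pointing at the source or at
http://example.org/doc-b.html depending on the flag.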

Comment #7. While pattern matching is nice, it could be expensive. Should
the matching be performed by the browser, or the server of the included
text source?
If the latter, how is this indicated to the server? A normal IMG Src
inclusion simply results in a plain HTTP transfer of the image data, but if
the TEXT inclusion was only a small part of a huge document, the transfer
of the whole document would be highly inefficient. Would it be desirable to
include location specification in the URL, rather than as attributes to the
TEXT element? That is: would it be better to extend the URL definition to
allow for document parts?

Comment #8. How is the version of HTML used in a source for inclusion
passed on to an including document? How is compatibility maintained in
general? The included text is not a complete HTML document. (And as
mentioned, it is difficult even to assure that it is a valid HTML document
by itself.)

Comment #9. Should included text be regarded as atomic; that is, can there
be references to a document part, which refers to a slice of the document
that crosses a further included text? (Part a of doc A is included in
paragraph b of doc B; doc C includes some text from paragraph b.) This is
obviously related to how to refer into a document in the first place, as
mentioned in #7. Whereas it probably makes only little sense to talk about
a section of an image, it would make a lot of sense to talk about text
slices that cross inclusion boundaries.


There are two separate problems that have to be addressed - HOW TO TAKE
SOMETHING OUT and HOW TO PUT SOMETHING IN:

Problem #1. How to refer to document parts, or how to construct documents
that are just parts of other documents.
Currently there is no well-defined way to request a part of a document from
an HTTP server, and there is no URL notation to achieve this. Anchors
provide marks in the text that can be used for this, but only allow
reference to parts that have been anticipated when the document was
written. An including document cannot arbitrarily pick a part of the text
to quote. Using character positions (byte indexes) is not perfect, but
there currently is no standard namespace that can be used to refer into
HTML documents.

Problem #2. Inclusion of any object in another document.
Currently, a browser can only display images and applets inside documents.
What is necessary is to allow (at least) textual inclusion. This raises the
question of whether such an inclusion should be regarded as happening at
parse time, so that included HTML will be parsed just as the containing
document. This however means that parsing the HTML will become complicated
and recursive. I don't know if SGML permits this. Alternatively, the
browser process displaying the including document will have to embed
another browser process (displayed as an image?) in the window. This would
probably affect further cross-boundary inclusion.

I am tempted to guess that HTML can't be stretched enough to accommodate
these two problems, without severe kludging. (Verbed form of kludge, sorry
if that's not proper usage of the word.;-)

An example of how this could be made to work:
Let's assume we allow recursive HTML parsing.
Extend the IMG element with a Type attribute. Omitting the type means
implicitly using the type indicated by the HTTP transmission.

<IMG Src="http://foo.bar.org/reports/myreport.html//character/200,267/"
Type="text/html" Headerlevel=3 Base=localanchors Hide=globalanchors>

This could mean that headers from level 1 should be remapped to level 3
(probably because the location of the inclusion is already under a H3
header.)
Local anchors are made global relative to their original base, so
references to images will still be correct. Links out of the document are
hidden.
The source is myreport.html, with the included part comprised by characters
200 to 267. The URL has been extended to contain provisions for referring
to elements of a document. I chose an extra // pair for this; what will in
fact be most practical has to be examined further. I used a character index
to refer to the part. This is not important; what is important is that the
slicing is part of the URL, not of the inclusion element.
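
To show that such a URL can be taken apart mechanically, here is a small
Python sketch of the //character/start,end/ notation used in the example.
The names PART_RE, split_part_url and slice_document are hypothetical, and
the sketch assumes 0-based character positions with both ends inclusive,
which the example above does not settle.

```python
import re

# Matches "<document URL>//character/<start>,<end>/" as in the example.
# The non-greedy prefix skips the "//" after the scheme because only a
# "//" followed by "character/" satisfies the rest of the pattern.
PART_RE = re.compile(
    r"^(?P<doc>.+?)//character/(?P<start>\d+),(?P<end>\d+)/?$"
)


def split_part_url(url: str):
    """Split an extended URL into (document URL, (start, end) or None)."""
    match = PART_RE.match(url)
    if match is None:
        return url, None
    return match.group("doc"), (int(match.group("start")),
                                int(match.group("end")))


def slice_document(text: str, part):
    """Return the addressed slice; assumes 0-based, inclusive positions."""
    if part is None:
        return text
    start, end = part
    return text[start:end + 1]
```

Because the slice is carried in the URL, either side could do the cutting:
a server that understands the notation can return only the part, and a
browser can fall back to fetching the whole document and slicing locally.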

Inclusion interests me, because I have implemented it partially (for a
specific purpose) using CGI. Inclusion will make many things a LOT easier,
but it also raises many questions. In fact I think it highlights a problem
of HTML and the WWW in general, namely that there is no enforced concept of
containment - the apparent hierarchical structure of URLs does not
necessarily convey any meaningful structure, but is only a convenience for
the publisher of documents, because it mirrors the storage structure on the
server. The only containment is between the host storing the document and
the document itself. In the above example <http://foo.bar.org/reports>
would probably be an index of reports, which can be meaningfully said to
contain myreport (and probably other reports), but this would only be a
local convention. An OOPL-like namespace would be much better, IMO.

Well, sorry for the length and the digressions. I hope this input will be
of some value to you, and that you don't feel I have wasted your time. I
would be delighted to hear what you think about this, but don't feel
obliged to reply if all I had to say was old news to you. Use my ideas any
way you like. I hope your transclusion or inclusion proposal will make it
into the WWW as we know it soon. It is needed.

-Lasse

--
Lasse Hillerøe Petersen     ! "Business as usual is
Systems Administrator       !  no longer acceptable"
Information & Media Science !       -Gilbert F. Amelio
Aarhus University, DENMARK  !