[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]
Version Compare/Merge: Remaining Issues
- To: <xanatech>
- Subject: Version Compare/Merge: Remaining Issues
- From: Marc Stiegler <marcs>
- Date: Sat, 7 Apr 90 23:33:01 PDT
Interesting issues include:
- How do we display nested movements in our
compare/merge window?
- How do we deal with vcopies?
- Under what circumstances do we get a result that is
not heuristically optimal, and what should we do about it?
- What other enhancements/alterations may be
desireable?
- Finally, what other kinds of comparison information
might be interesting in other comparison displays? All we
have discussed so far is focused strictly on creating the
Release 1 Tapestry comparison. There is another comparison
that interests me: the Quick Compare. A separate message
documenting that, while it is fresh in my mind, will follow
another day.
Meanwhile, here are discussions of the other issues:
NESTED MOVEMENT
Displaying nested movements is a moderately
difficult problem. I did not think it through clearly
enough at Capabilities Review time (as dean said at the
time, what all did marcs NOT think of? :-) The issue is
that with nested movement, a specific chunk of moved
material may have been moved several times, i.e., there
are several movement locations, and the particular place
you click may be a part of several different movement
groups (for example, if you click on a sentence that was
moved inside a paragraph that was itself moved to a new
chapter, if you click on the sentence, there are 2
moved-blocks and 2 moved locations associated with it).
There are several approaches; there is no risk that we
cannot solve this problem; but it is not yet solved. I
will be thinking about it.
VCOPIES
Vcopies can be dealt with more or less the same way
we deal with insertions and deletions. The most bizarre
case is when there are several copies in the original
and several in the final. The following algorithm will
work well enough for a first go:
- Map as many copies one-to-one from the
original onto the final as possible. A pair of copies
should be mapped onto each other if they are "near"
each other in the working document: here, "nearness"
is a count of the number of movement runs separating
them in the working document. Map the 2 nearest
vcopies (one from the original, one from the final)
onto each other, repeatedly, until you run out of
vcopies in one or the other.
-If there are left over copies in the
original, treat them as deletions. If there are left
over copies in the final, treat them as insertions.
TRUE DEFICIENCIES
Under what circumstances do we get poor results? The
simplest case is a 5-part group of movement runs with no
reverses. This is not as common an event as it might
sound: remember, a run only qualifies as a movement run
if it is not properly adjacent to EITHER of its
neighbors. If you throw out reverses as movements, you
now need to move ALL the movement runs large distances,
being careful to ensure that no original neighbors wind
up next to each other at the far end (else, they were
merely reversed).
The simplest case where this occurs is in the
transformation from "A B C D E" to "B E C A D". In this
case, we would say that A was moved between C and D.
The algorithm has a 50% chance of making an error, and
saying that D was moved behind A, then C was moved in
front of A.
This can be characterized as an error in
lookahead: without looking ahead to future
transformations, the anchor method, in its fixation on
finding a movement that makes a small object adjacent
to a large object, may not notice that it could save
an extra movement later on by moving the large object
between the two small ones that will eventually
surround it. I believe that we should implement this
extra bit of lookahead, so that this 5-part-rearrange
works correctly.
There is a more general version of this problem.
To guarantee heuristically optimal results, we would
have to search the tree of alternative movement
sequences. Though there is a warm place in my heart
for the alpha-beta minimax algorithm needed to do this
efficiently, I do not believe it is worth the effort
for Release 1. With just a one-ply lookahead, our
system will, I believe, create simpler transformations
than normal humans would. 'Tis sufficient.
MORE MINOR ISSUES:
I may have performed overkill in defining the size
of a composite run to have 2 parts. We may want to go
back to defining it to be just the sum of the sizes of
the movement runs it contains. This will increase the
amount of nesting, but decrease the motion of large
objects.
The algorithm for composing movement runs in the
working document does NOT guarantee the maximum
separation of size, into the very largest and very
smallest objects. It can fail for reasons of lookahead
as well. If we get counterintuitive results (too many
insertions and deletions being moved around), we can
enhance this as well.
Intuition suggests that, if objects are sufficiently
different in size, it is easier to understand moving two
small objects than to move one large object.
Consequently the improvement to the anchor method I
propose above, to move the larger object between the two
smaller objects to save a move, will sometimes produce
heuristically better results that are unfortunately
counterintuitive. My personal experiments suggest that
this effect starts to appear when the anchor is larger
than the sum of the sizes of its properly adjacent
neighbors. We may wind up implementing this during Beta
as well.
With a crew of critics like Xanadu, I'm sure someone
will come up with even more amusing instances where this
algorithm does not create the most intuitive
representation...I look forward with some trepidation to
seeing the counterexamples.
--marcs
P.S. This is still no where near being the most important thing
we have to think about. Absolutely everything else that
people are working on is, in fact, more important.