[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]
Version Compare/Merge: The Problem
- To: <xanatech>
- Subject: Version Compare/Merge: The Problem
- From: Marc Stiegler <marcs>
- Date: Sat, 7 Apr 90 22:56:40 PDT
Character movement is surprisingly difficult to dissect
analytically. Let us look at the simplest possible
documents that demonstrate movement: let us take the
document "A B" as the original, and the document "B A" as
the final. There are 4 ways to describe the
transformation: we could say
- A and B were transposed;
- A was moved behind B;
- B was moved in front of A;
- A was moved behind B while at the same time B was
moved in front of A.
Of these only one is intuitively a poor
description--the last description saying that both were
moved, which is clearly redundant. Each of the other three
descriptions will be considered "correct" in different
circumstances, depending on the nature of A and B. If A and
B are both single characters, or if they are of
approximately the same size (both sentences, both
paragraphs, both chapters), they would be described as
"transposed". If A is substantially larger than B, then we
would say that B was moved around A, and vice versa.
Alas, a naive computer program would choose the fourth
description: after all, both A and B moved, and so the
program would list them both as having moved. So we need to
develop a more sophisticated algorithm that will construct
"simple", "intuitive" descriptions of the transformation.
First we need to define what it means to have a
"simple" or "intuitive" description. After much
contemplation, I offer the following heuristic. A simple
description is one which:
- minimizes the number of movements
- moves small objects large distances rather than
moving large objects small distances
- Minimizes the number of objects moved large
distances: another way of saying this is, prefers local
movement to global movement.
I would like to say something about minimizing nesting
of movement as well, but in fact nested motion is in some
circumstances quite intuitive. For example, if you moved a
sentence in which a mispelled word had its letters
transposed, you have a nested motion: the characters are
moved inside the sentence that is moved, as in, "I the big
red package have recieved" being rewritten as "I have
received the big red package". You would say "I flipped the
'i' and the 'e' in 'received', then moved 'have received'".
You would NOT say, "I moved 'have rec', then I moved 'e'".
Both descriptions fullfill the requirement of minimizing
motion--both take 2 moves. But the second description
violates the rule "minimize the number of objects moved
large distances".
Anyway, within the context of these 3 rules for a
simple description, I would say "minimize the level of
nesting", though the first 3 rules leave rather little room
for nesting control.
The above is only a heuristic specification; the result
is NOT necessarily "intuitively optimal". However, I hereby
define the term "heuristically optimal" to mean that an
algorithm generates a transformation that meets the
heuristic specification.
The algorithm described next generates heuristically
optimal descriptions for most common editing situations; it
generates valid descriptions under all circumstances.
--marcs