Skip to content

Introduction to Unordered Tree Edit Distance (UTED)

While visual inspection allows for identifying similar and different lineages, it is insufficient for quantific scientific analysis. Therefore, mathematical tools are required to objectively analyze and compare such structures. This module focuses on finding such patterns across lineages using an unordered tree edit distance (UTED) algorithm developed by Zhang and Shasha, 1989. UTED computes the minimum number of operations to transform one tree into the other, regardless of labels. In simpler words, how many nodes do we have to add or remove to transform one tree so that it looks identical to the other one (regardless of labels). Apart from adding and removing nodes the algorithm may also substitute nodes, with any cost defined by the user. The fact the algorithm can substitute nodes without any cost means that this algorithm is capable of only checking the topology of the 2 trees, from now substituting for cost of zero will be called matching. Summarizing there are 3 operations that will be used for transforming trees:

  • Adding nodes: The same cost as removing
  • Removing nodes: The same cost as adding nodes
  • Substituting/Matching nodes: Costs less than adding/removing the nodes being compared.

Adding, Removing and Matching/Substituting always respect hierarchy. If a node n1 is matched to a node n2 the descendants of n1 will only be matched with descendants of n2.


Why UTED?

There are multiple algorithms that specialize in comparing tree graphs, like the ones used in phylogeny research. All these algorithms, have one feature in common, they require the label of the node, thus instead of transforming one tree to the other in terms of topology they aim to create the exact copy of the other tree, including the labels. These algorithms would find great use in organisms like C. Elegans where all of their lineages have been thoroughly studied and labelled. Other specimens may not have such a simplistic and well defined development so it is impossible to label them correctly.

UTED is label agnostic, meaning it can compare lineages without needing any prior knowledge of the naming of the cell, however this strength comes with a significant drawback: the algorithm must map nodes from one tree to another which makes time consumed scale exponentially to the number of nodes, because there are many possible mappings. This pairing always respects the hierarchical structure of the trees, thus the descendants of a node n that has been mapped to a specific node n of the other tree, will not be mapped to nodes of the ancestors of n'

uted_explanation


Different Tree approximations - making UTED faster

There are multiple ways to match 2 random trees that have more than 2 nodes, but the algorithm will try to find the best matching, the one that will need the least amount of operations to transform one tree into the other. Of course this proccess of checking multiple matchings needs computational power and time. To solve this problem, we developed some approximations to make computation more efficient.

  1. Original Tree: This algorithm is the simplest, as the dataset is used without changes to produce the distance. So the algorithm will either:

    • Match a node, where the cost will be 0
    • Add/Remove a node to/from a tree, costing 1.

    Normal tree is the most accurate, but it does not scale well with big datasets, meaning that computation time scales exponentially with the number of nodes.

  2. Downsampled Tree: This algorithm reconstructs the tree but skips nodes every n time points, so the result will be similar to the first tree but with less accuracy. In other words, it simulates the same dataset with bigger time resolution, meaning less points.

    • Match a node, where the cost will be 0
    • Add/Remove a node to/from a tree, costing 1.

    Downsampled Tree is slightly less accurate or equal (according to the downsampling rate) to normal tree, but much faster.

  3. Reduced Tree: This algorithm will reconstruct the trees, but every chain will be replaced with a node of a length attribute that will have the same value as the chains length. Meaning that the number of nodes in the new trees will be equal to the number of chains in the original lineage.This approximation completely changes the behaviour of the algorithm, as it will match whole chains with other chains and not just nodes. Such an algorithm may be extremely useful for fast but not precise results, however some biological question may be better answered using this algorithm.

    • Substitute a node, the cost will be the absolute difference of the length of the 2 chains
    • Add/Remove a node, will add/remove a node of size equal to the length of the existing chain, so the cost will be the value of the length of the chain.

    This algorithm is extremely lightweight and fast; however, the score is different than the normal tree. Some users may prefer this algorithm due to its capability of comparing chains instead of nodes, making it easily interpretable. Many other developers use the same or similar algorithms to this one Natesan et al, 2024

  4. Bound Reduced Tree: This algorithm will use the same tree as the reduced tree, however the edit operations are modified so that big chains that have similar length will be easier to be matched together. Has the strengths of reduced tree, but the resulting distance is closer to the that of the normal tree.

    • Match a node, the cost will be the absolute difference of the length of the 2 chains divided by the sum of both of them, so the result will be between 0 and 1
    • Add a node, will add a node of size 1
    • Remove a node, will remove a node of size 1

    This algorithm has all the advantages of reduced tree, but tries to reduce its disadvantages but adding a new way of comparing chains.

tree_styles

To inspect any style:

from LineageTree import tree_styles
tree_styles.tree_style["simple"].value(parameters)

The need for normalization - Making UTED interpretable

UTED is great to measure distance between 2 tree graphs, but the resulting distances range from zero to infinity, meaning that the results are not directly interpretable. Having this in mind two large trees that look similar may have a bigger distance than two relatively small, but very disimilar trees.

Thus, to overcome these limitations and convert them to a similarity measure we implemented 2 normalization algorithms:

  • Sum: The maximum distance between 2 random trees is the cost to convert one tree to null and then create the other tree from null.
  • Max: The result distance is divided by the cost to create the largest of the 2 trees from scratch, this algorithm primarily produces results in the range of 0 and 1, but sometimes for very disimilar trees the result may be greater than 1.

Template to create new styles:

from LineageTree import lineageTree
from LineageTree.tree_styles import abstract_trees

class tree_style_template(abstract_trees):

   ### The user has to implement these methods

   def get_tree(self)-> tuple[dict[int, list], dict[int, int | float]]:
       out_dict = {}
       times_or_other_attribute = {}
       ... # If time_scale exists for cross embryo comparison use it here
       return (out_dict, # The hierarchy of the nodes should be a dict(unique_node:[successors])
               times_or_other_attribute) # An attribute that may be used for comparisons.

   def _edist_format(self, adj_dict): # This methods seldom needs changing, creates the adjacency matrix 
       return super()._edist_format(adj_dict)# and provides the corresponding nodes that between lt and the tree style

   def delta(self, x, y, corres1, corres2, times1, times2)-> int|float:
       if x is None and y is None:
           return 0
       if x is None: # Cost for removing a node
           return times2[corres2[y]]
       if y is None: # Cost for adding a node
           return times1[corres1[x]]
       len_x = times1[corres1[x]]
       len_y = times2[corres2[y]]
       return abs(len_x - len_y) # Cost for matching a node

   def get_norm(self)->int|float:
       ... # how will the normalization work, should return a function that
       ... # calculates the norm for every subtree of the starting tree
       return value

   @staticmethod
   def handle_resolutions(
       time_resolution1: float | int,
       time_resolution2: float | int,
       gcd, #the greatest common divisor of all the time resolutions of datasets in the manager.
       downsample: int,
   ) -> tuple[int | float, int | float]: #Needed for cross embryo analysis
       ...
       return (time_resolution_fix_for_dataset_1   # This list will be used as the time scale
               ,time_resolution_fix_for_dataset_2) # parameter when comparing.

Testing the approximations

This section is focused on showcasing the advantages and disadvantages of the tree approximations shown in the previous section.

synthetic_trees

This plot showcases several interesting aspects of the approximations:

  1. All of the approximations scale well, as they produce the same results for the same trees regardless of the length of the chains.

  2. The Original tree is very slow compared to the rest and the downsampled algorithm shines for bigger trees, as it proves to be extremely fast while being acuurate.

  3. It is easy to see where each approximation succeeds or fails, if we consider that comparing the original distances is the best result we can get. Downsampled tree, will almost always have the closes result to the origial tree. Considering DAll, normalized reduced tree performs very well for datasets with different time resolution.

All in all, the downsampled tree is the best approximation to use for distance calculation. However, the reduced type algorithms match chains instead of nodes, meaning that in developmental biology terms, that it matches cell lifetimes together, which may prove an invaluable tool for developmental biology. All of the approximations have a use, so the user should select the appropriate one according to the specifics of the problem.

Using the Matching Component of UTED

UTED calculates distances between two tree graphs by matching their nodes. Extracting the matched nodes provides important information, particularly for labeled datasets. For example, when calculating the distance between two embryos, we can identify which clone or sublineage in one embryo corresponds to a clone or sublineage in the other. To support this, UTED includes functions specifically designed to find the matching between two lineages.

Tree distance Graphs

The distance value is a useful metric for quantifying the similarity between two lineages, but it does not provide detailed information about, where the similarities and disimilarities occur within the trees. To address this, we developed tree distance graphs by leveraging the matched pairs produced during the mapping process.

In these graphs, each matched chain is colored according to the value of its subtree, providing a visual representation of mapping quality. Tree distance graphs reveal two important aspects:

  • Variance: The color spectrum highlights the distances of mapped chains. Colors representing good mappings indicate low variance during development.
  • Gain or Loss of Function: Unmapped regions indicate lineages that exist in one tree but not the other. This may reflect biological differences, such as the emergence or loss of a lineage, or it may point to data quality issues.

example

API reference

lineagetree.measure

Functions:

Name Description
clear_comparisons

Clears the comparisons saved on the LineageTree object.

labelled_mappings

Returns the labels or IDs of all the nodes in the subtrees compared.

plot_tree_distance_graphs

Plots the subtrees compared and colors them according to the quality of the matching of their subtree.

unordered_tree_edit_distance

Compute the unordered tree edit distance from Zhang 1996 between the trees spawned

unordered_tree_edit_distances_at_time_t

Compute all the pairwise unordered tree edit distances from Zhang 996 between the trees spawned at time t

clear_comparisons

clear_comparisons(lT: LineageTree)

Clears the comparisons saved on the LineageTree object.

Parameters:

Name Type Description Default
lT
LineageTree

The LineageTree object

required
Source code in src/lineagetree/measure/uted.py
def clear_comparisons(lT: LineageTree):
    """Clears the comparisons saved on the LineageTree object.

    Parameters
    ----------
    lT : LineageTree
        The LineageTree object
    """
    lT._comparisons.clear()

labelled_mappings

labelled_mappings(
    lT: LineageTree,
    n1: int,
    n2: int,
    end_time: int | None = None,
    norm: Literal["max", "sum", None] = "max",
    style: (
        Literal[
            "simple",
            "normalized_simple",
            "full",
            "downsampled",
        ]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
) -> dict[str, list[str]]

Returns the labels or IDs of all the nodes in the subtrees compared.

Parameters:

Name Type Description Default
lT
LineageTree

The LineageTree instance.

required
n1
int

id of the first node to compare

required
n2
int

id of the second node to compare

required
end_time
int

The final time point the comparison algorithm will take into account. If None or not provided all nodes will be taken into account.

None
norm
('max', 'sum')

The normalization method to use, defaults to 'max'.

"max"
style
"simple", "full", "downsampled", "normalized_simple

Which tree approximation is going to be used for the comparisons, defaults to 'simple'.

"simple"
downsample
int

The downsample factor for the downsampled tree approximation. Used only when style="downsampled".

2

Returns:

Type Description
dict mapping str to list of str
  • 'matched' The labels of the matched nodes of the alignment.
  • 'unmatched' The labels of the unmatched nodes of the alginment.
Source code in src/lineagetree/measure/uted.py
def labelled_mappings(
    lT: LineageTree,
    n1: int,
    n2: int,
    end_time: int | None = None,
    norm: Literal["max", "sum", None] = "max",
    style: (
        Literal["simple", "normalized_simple", "full", "downsampled"]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
) -> dict[str, list[str]]:
    """
    Returns the labels or IDs of all the nodes in the subtrees compared.


    Parameters
    ----------
    lT : LineageTree
        The LineageTree instance.
    n1 : int
        id of the first node to compare
    n2 : int
        id of the second node to compare
    end_time : int, optional
        The final time point the comparison algorithm will take into account.
        If None or not provided all nodes will be taken into account.
    norm : {"max", "sum"}, default="max"
        The normalization method to use, defaults to 'max'.
    style : {"simple", "full", "downsampled", "normalized_simple} or TreeApproximationTemplate subclass, default="simple"
        Which tree approximation is going to be used for the comparisons, defaults to 'simple'.
    downsample : int, default=2
        The downsample factor for the downsampled tree approximation.
        Used only when `style="downsampled"`.

    Returns
    -------
    dict mapping str to list of str
        - 'matched' The labels of the matched nodes of the alignment.
        - 'unmatched' The labels of the unmatched nodes of the alginment.
    """
    parameters = (
        end_time,
        convert_style_to_number(style=style, downsample=downsample),
    )
    n1, n2 = sorted([n1, n2])
    lT._comparisons.setdefault(parameters, {})
    if lT._comparisons[parameters].get((n1, n2)):
        tmp = lT._comparisons[parameters][(n1, n2)]
    else:
        tmp = __unordereded_backtrace(
            lT, n1, n2, end_time, norm, style, downsample
        )
    btrc = tmp["alignment"]
    tree1, tree2 = tmp["trees"]

    (
        *_,
        corres1,
    ) = tree1.edist
    (
        *_,
        corres2,
    ) = tree2.edist

    if norm not in lT.norm_dict:
        raise Warning("Select a viable normalization method (max, sum, None)")
    matched = []
    unmatched = []
    if style not in ("full", "downsampled"):
        for m in btrc:
            if m._left != -1 and m._right != -1:
                cyc1 = lT.get_chain_of_node(corres1[m._left])
                if len(cyc1) > 1:
                    node_1, *_ = cyc1
                elif len(cyc1) == 1:
                    node_1 = cyc1.pop()
                cyc2 = lT.get_chain_of_node(corres2[m._right])
                if len(cyc2) > 1:
                    node_2, *_ = cyc2
                elif len(cyc2) == 1:
                    node_2 = cyc2.pop()
                matched.append(
                    (
                        lT.labels.get(node_1, node_1),
                        lT.labels.get(node_2, node_2),
                    )
                )

            else:
                if m._left != -1:
                    node_1 = lT.get_chain_of_node(corres1.get(m._left, "-"))[0]
                else:
                    node_1 = lT.get_chain_of_node(corres2.get(m._right, "-"))[
                        0
                    ]
                unmatched.append(lT.labels.get(node_1, node_1))
    else:
        for m in btrc:
            if m._left != -1 and m._right != -1:
                node_1 = corres1[m._left]
                node_2 = corres2[m._right]
                matched.append(
                    (
                        lT.labels.get(node_1, node_1),
                        lT.labels.get(node_2, node_2),
                    )
                )
            else:
                if m._left != -1:
                    node_1 = corres1[m._left]
                else:
                    node_1 = corres2[m._right]
                unmatched.append(lT.labels.get(node_1, node_1))
    return {"matched": matched, "unmatched": unmatched}

plot_tree_distance_graphs

plot_tree_distance_graphs(
    lT: LineageTree,
    n1: int,
    n2: int,
    end_time: int | None = None,
    norm: Literal["max", "sum", None] = "max",
    style: (
        Literal[
            "simple",
            "normalized_simple",
            "full",
            "downsampled",
        ]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
    colormap: str = "cool",
    default_color: str = "black",
    size: float = 10,
    lw: float = 0.3,
    ax: list[Axes] | None = None,
    vmin=None,
    vmax=None,
) -> tuple[plt.figure, plt.Axes]

Plots the subtrees compared and colors them according to the quality of the matching of their subtree.

Parameters:

Name Type Description Default
lT
LineageTree

The LineageTree instance.

required
n1
int

id of the first node to compare

required
n2
int

id of the second node to compare

required
end_time
int

The final time point the comparison algorithm will take into account. If None all nodes will be taken into account.

None
norm
('max', 'sum')

The normalization method to use.

"max"
style
"simple", "full", "downsampled", "normalized_simple

Which tree approximation is going to be used for the comparisons.

"simple"
downsample
int

The downsample factor for the downsampled tree approximation. Used only when style="downsampled".

2
colormap
str

The colormap used for matched nodes, defaults to "cool"

"cool"
default_color
str

The color of the unmatched nodes, defaults to "black"

'black'
size
float

The size of the nodes, defaults to 10

10
lw
float

The width of the edges, defaults to 0.3

0.3
ax
ndarray

The axes used, if not provided another set of axes is produced, defaults to None

None
vmin

Values within the range [vmin, vmax] from the input data will be linearly mapped to [0, 1]. vmin defaults to the 0.05 quantile of the values of the unordered tree edist distances of the subtrees. vmax defaults to the 0.95 quantile of the values of the unordered tree edist distances of the subtrees.

None
vmax

Values within the range [vmin, vmax] from the input data will be linearly mapped to [0, 1]. vmin defaults to the 0.05 quantile of the values of the unordered tree edist distances of the subtrees. vmax defaults to the 0.95 quantile of the values of the unordered tree edist distances of the subtrees.

None

Returns:

Type Description
Figure

The figure of the plot

Axes

The axes of the plot

Source code in src/lineagetree/measure/uted.py
def plot_tree_distance_graphs(
    lT: LineageTree,
    n1: int,
    n2: int,
    end_time: int | None = None,
    norm: Literal["max", "sum", None] = "max",
    style: (
        Literal["simple", "normalized_simple", "full", "downsampled"]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
    colormap: str = "cool",
    default_color: str = "black",
    size: float = 10,
    lw: float = 0.3,
    ax: list[plt.Axes] | None = None,
    vmin=None,
    vmax=None,
) -> tuple[plt.figure, plt.Axes]:
    """
    Plots the subtrees compared and colors them according to the quality of the matching of their subtree.

    Parameters
    ----------
    lT : LineageTree
        The LineageTree instance.
    n1 : int
        id of the first node to compare
    n2 : int
        id of the second node to compare
    end_time : int
        The final time point the comparison algorithm will take into account.
        If None all nodes will be taken into account.
    norm : {"max", "sum"}, default="max"
        The normalization method to use.
    style : {"simple", "full", "downsampled", "normalized_simple} or TreeApproximationTemplate subclass, default="simple"
        Which tree approximation is going to be used for the comparisons.
    downsample : int, default=2
        The downsample factor for the downsampled tree approximation.
        Used only when `style="downsampled"`.
    colormap : str, default="cool"
        The colormap used for matched nodes, defaults to "cool"
    default_color : str
        The color of the unmatched nodes, defaults to "black"
    size : float
        The size of the nodes, defaults to 10
    lw : float
        The width of the edges, defaults to 0.3
    ax : np.ndarray, optional
        The axes used, if not provided another set of axes is produced, defaults to None
    vmin, vmax: float, optional
        Values within the range ``[vmin, vmax]`` from the input data will be
        linearly mapped to ``[0, 1]``.
        *vmin* defaults to the 0.05 quantile of the values of the unordered tree edist distances of the subtrees.
        *vmax* defaults to the 0.95 quantile of the values of the unordered tree edist distances of the subtrees.

    Returns
    -------
    plt.Figure
            The figure of the plot
    plt.Axes
            The axes of the plot
    """
    parameters = (
        end_time,
        convert_style_to_number(style=style, downsample=downsample),
    )
    n1, n2 = sorted([n1, n2])
    lT._comparisons.setdefault(parameters, {})
    if lT._comparisons[parameters].get((n1, n2)):
        tmp = lT._comparisons[parameters][(n1, n2)]
    else:
        tmp = __unordereded_backtrace(
            lT, n1, n2, end_time, norm, style, downsample
        )
    btrc: Alignment = tmp["alignment"]
    tree1, tree2 = tmp["trees"]
    _, times1 = tree1.tree
    _, times2 = tree2.tree
    (
        *_,
        corres1,
    ) = tree1.edist
    (
        *_,
        corres2,
    ) = tree2.edist
    delta_tmp = partial(
        tree1.delta,
        corres1=corres1,
        corres2=corres2,
        times1=times1,
        times2=times2,
    )

    if norm not in lT.norm_dict:
        raise Warning("Select a viable normalization method (max, sum, None)")
    matched_right = []
    matched_left = []
    colors = {}
    if style not in ("full", "downsampled"):
        for m in btrc:
            if m._left != -1 and m._right != -1:
                cyc1 = lT.get_chain_of_node(corres1[m._left])
                if len(cyc1) > 1:
                    node_1, *_, l_node_1 = cyc1
                    matched_left.append(node_1)
                    matched_left.append(l_node_1)
                elif len(cyc1) == 1:
                    node_1 = l_node_1 = cyc1.pop()
                    matched_left.append(node_1)

                cyc2 = lT.get_chain_of_node(corres2[m._right])
                if len(cyc2) > 1:
                    node_2, *_, l_node_2 = cyc2
                    matched_right.append(node_2)
                    matched_right.append(l_node_2)

                elif len(cyc2) == 1:
                    node_2 = l_node_2 = cyc2.pop()
                    matched_right.append(node_2)

                colors[node_1] = __calculate_distance_of_sub_tree(
                    lT,
                    node_1,
                    node_2,
                    btrc,
                    corres1,
                    corres2,
                    delta_tmp,
                    lT.norm_dict[norm],
                    tree1.get_norm(node_1),
                    tree2.get_norm(node_2),
                )
                colors[node_2] = colors[node_1]
                colors[l_node_1] = colors[node_1]
                colors[l_node_2] = colors[node_2]
    else:
        for m in btrc:
            if m._left != -1 and m._right != -1:
                node_1 = corres1[m._left]
                node_2 = corres2[m._right]

                if (
                    lT.get_chain_of_node(node_1)[0] == node_1
                    or lT.get_chain_of_node(node_2)[0] == node_2
                    and (node_1 not in colors or node_2 not in colors)
                ):
                    matched_left.append(node_1)
                    l_node_1 = lT.get_chain_of_node(node_1)[-1]
                    matched_left.append(l_node_1)
                    matched_right.append(node_2)
                    l_node_2 = lT.get_chain_of_node(node_2)[-1]
                    matched_right.append(l_node_2)
                    colors[node_1] = __calculate_distance_of_sub_tree(
                        lT,
                        node_1,
                        node_2,
                        btrc,
                        corres1,
                        corres2,
                        delta_tmp,
                        lT.norm_dict[norm],
                        tree1.get_norm(node_1),
                        tree2.get_norm(node_2),
                    )
                    colors[l_node_1] = colors[node_1]
                    colors[node_2] = colors[node_1]
                    colors[l_node_2] = colors[node_1]
    if ax is None:
        fig, ax = plt.subplots(nrows=1, ncols=2, sharey=True)
    cmap = colormaps[colormap]
    if vmin is None:
        vmin = np.percentile(list(colors.values()), 5)
    if vmax is None:
        vmax = np.percentile(list(colors.values()), 95)
    c_norm = mcolors.Normalize(vmin, vmax)
    colors = {c: cmap(c_norm(v)) for c, v in colors.items()}
    lT.plot_subtree(
        lT.get_ancestor_at_t(n1),
        end_time=end_time,
        size=size,
        selected_nodes=matched_left,
        color_of_nodes=colors,
        selected_edges=matched_left,
        color_of_edges=colors,
        default_color=default_color,
        lw=lw,
        ax=ax[0],
    )
    lT.plot_subtree(
        lT.get_ancestor_at_t(n2),
        end_time=end_time,
        size=size,
        selected_nodes=matched_right,
        color_of_nodes=colors,
        selected_edges=matched_right,
        color_of_edges=colors,
        default_color=default_color,
        lw=lw,
        ax=ax[1],
    )
    return ax[0].get_figure(), ax

unordered_tree_edit_distance

unordered_tree_edit_distance(
    lT: LineageTree,
    n1: int,
    n2: int,
    end_time: int | None = None,
    norm: Literal["max", "sum", None] = "max",
    style: (
        Literal[
            "simple",
            "normalized_simple",
            "full",
            "downsampled",
        ]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
    return_norms: bool = False,
) -> float | tuple[float, tuple[float, float]]

Compute the unordered tree edit distance from Zhang 1996 between the trees spawned by two nodes n1 and n2. The topology of the trees are compared and the matching cost is given by the function delta (see edist doc for more information). The distance is normed by the function norm that takes the two list of nodes spawned by the trees n1 and n2.

Parameters:

Name Type Description Default
lT
LineageTree

The LineageTree instance.

required
n1
int

id of the first node to compare

required
n2
int

id of the second node to compare

required
end_time
int

The final time point the comparison algorithm will take into account. If None or not provided all nodes will be taken into account.

None
norm
('max', 'sum')

The normalization method to use, defaults to 'max'.

"max"
style
('simple', 'normalized_simple', 'full', 'downsampled')

Which tree approximation is going to be used for the comparisons.

"simple"
downsample
int

The downsample factor for the downsampled tree approximation. Used only when style="downsampled".

2

Returns:

Type Description
float

The normalized unordered tree edit distance between n1 and n2

Source code in src/lineagetree/measure/uted.py
def unordered_tree_edit_distance(
    lT: LineageTree,
    n1: int,
    n2: int,
    end_time: int | None = None,
    norm: Literal["max", "sum", None] = "max",
    style: (
        Literal["simple", "normalized_simple", "full", "downsampled"]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
    return_norms: bool = False,
) -> float | tuple[float, tuple[float, float]]:
    """
    Compute the unordered tree edit distance from Zhang 1996 between the trees spawned
    by two nodes `n1` and `n2`. The topology of the trees are compared and the matching
    cost is given by the function delta (see edist doc for more information).
    The distance is normed by the function norm that takes the two list of nodes
    spawned by the trees `n1` and `n2`.

    Parameters
    ----------
    lT : LineageTree
        The LineageTree instance.
    n1 : int
        id of the first node to compare
    n2 : int
        id of the second node to compare
    end_time : int, optional
        The final time point the comparison algorithm will take into account.
        If None or not provided all nodes will be taken into account.
    norm : {"max", "sum"}, default="max"
        The normalization method to use, defaults to 'max'.
    style : {"simple", "normalized_simple", "full", "downsampled"} or TreeApproximationTemplate subclass, default="simple"
        Which tree approximation is going to be used for the comparisons.
    downsample : int, default=2
        The downsample factor for the downsampled tree approximation.
        Used only when `style="downsampled"`.

    Returns
    -------
    float
        The normalized unordered tree edit distance between `n1` and `n2`
    """
    parameters = (
        end_time,
        convert_style_to_number(style=style, downsample=downsample),
    )
    n1, n2 = sorted([n1, n2])
    lT._comparisons.setdefault(parameters, {})
    if lT._comparisons[parameters].get((n1, n2)):
        tmp = lT._comparisons[parameters][(n1, n2)]
    else:
        tmp = __unordereded_backtrace(
            lT, n1, n2, end_time, norm, style, downsample
        )
    btrc = tmp["alignment"]
    tree1, tree2 = tmp["trees"]
    _, times1 = tree1.tree
    _, times2 = tree2.tree
    (
        nodes1,
        adj1,
        corres1,
    ) = tree1.edist
    (
        nodes2,
        adj2,
        corres2,
    ) = tree2.edist
    delta_tmp = partial(
        tree1.delta,
        corres1=corres1,
        corres2=corres2,
        times1=times1,
        times2=times2,
    )

    if norm not in lT.norm_dict:
        raise ValueError(
            "Select a viable normalization method (max, sum, None)"
        )
    cost = btrc.cost(nodes1, nodes2, delta_tmp)
    norm_values = (tree1.get_norm(n1), tree2.get_norm(n2))
    if return_norms:
        return cost, norm_values
    return cost / lT.norm_dict[norm](norm_values)

unordered_tree_edit_distances_at_time_t

unordered_tree_edit_distances_at_time_t(
    lT: LineageTree,
    t: int,
    end_time: int | None = None,
    style: (
        Literal[
            "simple",
            "full",
            "downsampled",
            "normalized_simple",
        ]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
    norm: Literal["max", "sum", None] = "max",
    recompute: bool = False,
) -> dict[tuple[int, int], float]

Compute all the pairwise unordered tree edit distances from Zhang 996 between the trees spawned at time t

Parameters:

Name Type Description Default
lT
LineageTree

The LineageTree instance.

required
t
int

time to look at

required
end_time
int

The final time point the comparison algorithm will take into account. If None all nodes will be taken into account.

None
style
('simple', 'full', 'downsampled', 'normalized_simple')

Which tree approximation is going to be used for the comparisons.

"simple"
downsample
int

The downsample factor for the downsampled tree approximation. Used only when style="downsampled".

2
norm
('max', 'sum')

The normalization method to use.

"max"
recompute
bool

If True, forces to recompute the distances

False

Returns:

Type Description
dict mapping a tuple of tuple that contains 2 ints to float

a dictionary that maps a pair of node ids at time t to their unordered tree edit distance

Source code in src/lineagetree/measure/uted.py
def unordered_tree_edit_distances_at_time_t(
    lT: LineageTree,
    t: int,
    end_time: int | None = None,
    style: (
        Literal["simple", "full", "downsampled", "normalized_simple"]
        | type[TreeApproximationTemplate]
    ) = "simple",
    downsample: int = 2,
    norm: Literal["max", "sum", None] = "max",
    recompute: bool = False,
) -> dict[tuple[int, int], float]:
    """Compute all the pairwise unordered tree edit distances from Zhang 996 between the trees spawned at time `t`

    Parameters
    ----------
    lT : LineageTree
        The LineageTree instance.
    t : int
        time to look at
    end_time : int
        The final time point the comparison algorithm will take into account.
        If None all nodes will be taken into account.
    style : {"simple", "full", "downsampled", "normalized_simple"} or TreeApproximationTemplate subclass, default="simple"
        Which tree approximation is going to be used for the comparisons.
    downsample : int, default=2
        The downsample factor for the downsampled tree approximation.
        Used only when `style="downsampled"`.
    norm : {"max", "sum"}, default="max"
        The normalization method to use.
    recompute : bool, default=False
        If True, forces to recompute the distances

    Returns
    -------
    dict mapping a tuple of tuple that contains 2 ints to float
        a dictionary that maps a pair of node ids at time `t` to their unordered tree edit distance
    """
    if not hasattr(lT, "uted"):
        lT.uted = {}
    elif t in lT.uted and not recompute:
        return lT.uted[t]
    lT.uted[t] = {}
    roots = lT.time_nodes[t]
    for n1, n2 in combinations(roots, 2):
        key = tuple(sorted((n1, n2)))
        lT.uted[t][key] = lT.unordered_tree_edit_distance(
            n1,
            n2,
            end_time=end_time,
            style=style,
            downsample=downsample,
            norm=norm,
        )
    return lT.uted[t]