Tuesday, April 5, 2011

Is a KD-Tree a unique ordering of a given data set?

Given a set of data points, a kdtree is created over them, but is this kdtree a unique one?

From stackoverflow
  • It appears to depend on how you construct the tree. The Wikipedia article mentions how the selection of the median point affects whether the generated tree is balanced or not. If a different point is selected, then the tree will not be balanced but will still be a kd-tree. Therefore, the answer to your question depends on exactly how your tree construction algorithm chooses the splitting planes.

  • I don't think so.

    If the answer to your question were 'yes' then i think that would mean that the choice of dimension and value for each split were chosen by some objective criterion. The value of course is selected according to a precise algorithm (i.e., calculating the median of all points to be split in that dimension, but not the dimension. Most KD-Tree algorithms select the dimension to split just by alternating through the available dimensions. Some algorithms just randomly select the dimension to split on.

    This is very different than a C4.5 (Decision Tree) because there, the dimension and value to split on are chosen by an objective criterion, i.e., entropy minimization (for categorical variables) or variance (for continuous variables).

    serina : @doug:well, is the root unique?
    doug : well the value is unique but only once the dimension is specified (and i believe this includes root, as for all 'nodes'); again, though even for the root node, the choice of dimension in the implementations i'm aware of (and i checked several before responding to your Q) is just cycling through the available dimensions.

0 comments:

Post a Comment