Tuesday, April 17, 2012

Postpruning - First Whack


Zoo dataset A-B, before and after post-pruning.
It's funny I'm working on pruning for decision trees while earlier I was out in the yard pruning real trees.

It would be nice to eliminate the construction of these redundant nodes in the first place, but that seems to mean comparing the counts of the dataset and sub-datasets to see if the test is redundant which in Linq could be expensive.

It's a good thing I implemented edges because in previous versions of this function I was trying to remove and add nodes which threw an exception because you can't modify collections while your iterating over them.  I got around this by filling a list of replacement structures which was iterated over after the main routine was finished.  What was really a deal breaker is that this changed the order of the branches.  I finally realized I don't need to modify any collections I can just modify the nodes and edges.

Yay edges!

        private DtBranch Postprune(DtBranch branch)
        {
            DtBranch bottom = branch;
            //
            if (branch.HasAllLeafChildren)
                return branch;
            else if (branch.HasSingleChild)
            {
                DtBranch child = branch.Children.Single() as DtBranch;
                bottom = Postprune(child);
                bottom.Edge = branch.Edge;
                bottom.Edge.To = bottom;
            }
            else
                foreach (DtNode child in branch.Children)
                    if(child.Kind == DtElementKind.Branch)
                        Postprune(child as DtBranch);
            //
            return bottom;
        }

No comments: