Python Find All Parent/Child Nodes in Pandas DataFrame - Listing Subtree Descendants

In this quick article, we'll look at how to list all parent/child nodes in a Pandas DataFrame. This can be useful in scenarios such as verifying trees or networks.

Step 1: Prepare Hierarchical data

Let's prepare the hierarchical data used in the examples:

import pandas as pd

df = pd.DataFrame({
    'parent': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'child': [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
})
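Before traversing the data, it can help to check whether it is a strict tree, that is, whether every child has exactly one parent. The sketch below is one way to do that with plain Pandas; the names `parents_per_child` and `multi_parent` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'parent': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'child': [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
})

# Count distinct parents per child; more than one means the data is not a strict tree
parents_per_child = df.groupby('child')['parent'].nunique()
multi_parent = parents_per_child[parents_per_child > 1].index.tolist()
print(multi_parent)  # [6]
```

In this data, node 6 is a child of both 2 and 3, so it can be reached through more than one path.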

Sometimes the parent or child information is stored in the index. In that case, the index can be transferred to a column with:

df['index1'] = df.index
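As an alternative sketch, `reset_index` moves a named index into a regular column in one call. The frame `df_indexed` and the index name `parent` below are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical frame where the parent ids live in the index
df_indexed = pd.DataFrame(
    {'child': [1, 2, 3]},
    index=pd.Index([0, 0, 1], name='parent'),
)

# reset_index() turns the named index into an ordinary 'parent' column
df_flat = df_indexed.reset_index()
print(df_flat.columns.tolist())  # ['parent', 'child']
```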

Step 2: Install Python package networkx

As a second step, install the networkx package:

pip install networkx

This package is used for creation, manipulation, and analysis of the structure and features of complex networks.

Step 3: Listing Subtree Descendants with NetworkX in Pandas DataFrame

Finally, to get all descendants of node 1 in this DataFrame, we first convert the DataFrame rows to edges of a NetworkX directed graph:

import networkx as nx

g = nx.DiGraph()
g.add_edges_from(df[['parent', 'child']].to_records(index=False))

The subtree rooted at node 1 can then be listed with:

from networkx.algorithms.traversal.depth_first_search import dfs_tree

x = dfs_tree(g, 1)
x.edges()

This results in:

OutEdgeView([(1, 3), (1, 4), (3, 6), (3, 7), (4, 8), (4, 9)])
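If only the set of reachable nodes is needed, rather than the subtree's edges, NetworkX also provides `nx.descendants` and `nx.ancestors`. The sketch below rebuilds the same graph with `nx.from_pandas_edgelist` as an alternative to `add_edges_from`; both functions return sets, so `sorted` is used here to get a stable order.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    'parent': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'child': [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
})

# create_using=nx.DiGraph keeps the parent -> child direction of each row
g = nx.from_pandas_edgelist(
    df, source='parent', target='child', create_using=nx.DiGraph
)

# All nodes reachable from node 1 (children, grandchildren, ...)
print(sorted(nx.descendants(g, 1)))  # [3, 4, 6, 7, 8, 9]

# All nodes from which node 6 can be reached (parents, grandparents, ...)
print(sorted(nx.ancestors(g, 6)))    # [0, 1, 2, 3]
```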

Visually, the subtree rooted at node 1 looks like this:

1
├── 3
│   ├── 6
│   └── 7
└── 4
    ├── 8
    └── 9

Step 4: Set List of Descendants for Each Row (Optional)

In this step we add a new column that lists, for each row, all descendants of that row's parent.

def get_descendants(parent):
    # Edges of the DFS tree rooted at `parent`; the second item of each edge is a descendant
    descendants = list(dfs_tree(g, parent).edges())
    return [x[1] for x in descendants]


df["descendants"] = df["parent"].apply(get_descendants)

This creates a new column with a list of all descendants of the current parent. The lists follow the order of the edge view, which groups edges by their source node:

   parent  child                  descendants
0       0      1  [1, 2, 3, 4, 6, 7, 8, 9, 5]
1       0      2  [1, 2, 3, 4, 6, 7, 8, 9, 5]
2       1      3           [3, 4, 6, 7, 8, 9]
3       1      4           [3, 4, 6, 7, 8, 9]
4       2      5                       [5, 6]
5       2      6                       [5, 6]
6       3      6                       [6, 7]
7       3      7                       [6, 7]
8       4      8                       [8, 9]
9       4      9                       [8, 9]
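When the DataFrame has many rows per parent, it may be cheaper to compute each parent's descendants once and reuse the result. The sketch below is one way to do that, assuming the unordered set returned by `nx.descendants` is acceptable; the `lookup` dict and the `descendants_sorted` column name are illustrative choices, not part of the original example.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    'parent': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'child': [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
})

g = nx.DiGraph()
g.add_edges_from(zip(df['parent'], df['child']))

# One traversal per unique parent instead of one per row
lookup = {p: sorted(nx.descendants(g, p)) for p in df['parent'].unique().tolist()}

# Assign the precomputed list to every row of that parent
df['descendants_sorted'] = [lookup[p] for p in df['parent']]
print(df['descendants_sorted'].iloc[0])  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because `nx.descendants` returns a set, each node appears once even when it is reachable through several parents.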

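If you prefer not to depend on NetworkX, you can use the custom version below. It is a bit slower than the previous version, since it filters the whole DataFrame on every recursive call. It lists nodes in depth-first order and repeats a node that is reachable through more than one parent: for parent 0 it returns [1, 3, 6, 7, 4, 8, 9, 2, 5, 6], with node 6 appearing twice.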

def get_children(parent_id):
    list_of_children = []

    def dfs(parent_id):
        # Direct children of the current node
        child_ids = df[df["parent"] == parent_id]["child"]
        if child_ids.empty:
            return
        for child_id in child_ids:
            list_of_children.append(child_id)
            # Recurse into the child's own children
            dfs(child_id)

    dfs(parent_id)
    return list_of_children

df["descendants"] = df["parent"].apply(get_children)
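The recursive version scans the DataFrame on every call and would not terminate if the data contained a cycle. A possible variant, sketched below, builds a parent-to-children mapping once and walks it iteratively while tracking visited nodes. The names `children_of` and `get_descendants_fast` are hypothetical, and this is only one way to approach it.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'parent': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'child': [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
})

# Build the parent -> children mapping once, instead of filtering df per call
children_of = defaultdict(list)
for parent, child in zip(df['parent'], df['child']):
    children_of[parent].append(child)


def get_descendants_fast(parent_id):
    """Iterative depth-first walk; each node is listed at most once."""
    seen = {parent_id}  # guards against cycles and repeated nodes
    result = []
    # Reverse so that children are popped in their original order
    stack = list(reversed(children_of.get(parent_id, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        result.append(node)
        stack.extend(reversed(children_of.get(node, [])))
    return result


print(get_descendants_fast(0))  # [1, 3, 6, 7, 4, 8, 9, 2, 5]
```

Unlike the recursive version, node 6 is listed only once here, because it is skipped when reached a second time through parent 2.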