- Graph $G=(V,E,X_V,X_E)$, where
- $V$ is the set of vertex ids; without loss of generality, $V=\{1,\ldots, n\}$.
- $E \subseteq V \times V$ is the set of edges.
- $X_V$ and $X_E$ are the sets of node and edge attributes, respectively.

The **adjacency matrix** $A \in \mathbb{A}^{n \times n}$ is a square matrix whose elements indicate whether pairs of vertices have a relationship in some appropriate space $\mathbb{A}$.

- In the simplest case, $\mathbb{A} = \{0,1\}$, and $A_{ij} = 1$ if there is a connection from node $i$ to $j$, and otherwise $A_{ij} = 0$ (indicating a non-edge).
- If there are edge attributes (e.g., edge relations), then $\mathbb{A}$ is some appropriate space that represents these relations (often encoded as a binary vector) or some real-valued connection strength $\mathbb{A} \subseteq \mathbb{R}^d$. This turns the adjacency matrix into an **adjacency tensor** ${\bf A}$.
- Self-loops, e.g., $A_{ii}$, will encode **node attributes** (self-relations).
- For an undirected graph, keep in mind that $A$ is a symmetric matrix ($A_{ij}=A_{ji}$). For example, the graph below has adjacency matrix: $$ A = \begin{bmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 0 \end{bmatrix} $$
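As a quick sanity check in code (a minimal NumPy sketch), the example adjacency matrix above is symmetric, and its row sums recover the node degrees:

```python
import numpy as np

# Adjacency matrix of the 4-node example graph above.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])

# Undirected graph: the matrix equals its transpose.
assert (A == A.T).all()

# Row sums give the degree of each node.
degrees = A.sum(axis=1)
print(degrees)  # [1 3 2 2]
```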

- Now, we would like to **learn** a function $f({\bf A})$ that produces a representation of graph ${\bf A} \in \mathbb{A}^{n\times n}$.

- These tasks are **inductive graph tasks**:
- We train on a set of graphs.
- We apply the trained model to a different set of graphs.

- Given a set of node labels as training data (e.g., label $\in$ {Bot, Real user}),
- predict the labels of unlabeled nodes.

**In graphs, the boundaries between *training* and *test* data can sometimes be blurry.**

- We will cover the different training and test splits in our class on *Positional vs Structural Node Embeddings*.

- Assume we have a function $f: \mathbb{A}^{n\times n} \times V \to \mathbb{R}^d, d \geq 1$.
- $f({\bf A},v)$ produces a representation of node $v \in V$ in graph ${\bf A} \in \mathbb{A}^{n\times n}$.

Assumption:

- Vertex ids are arbitrary; the output of our graph predictions should not change if we permute them.

Theory:

A sequence of $n^2$ random variables $(X_1,\ldots,X_{n^2}) \in \mathbb{A}^{n^2}$ is said to be a graph whenever its distribution has the following property: $$ P(X_1,\ldots,X_{n^2}) = P(\pi \circ (X_1,\ldots,X_{n^2})), $$ for all $\pi \in \Pi$, where $\Pi \subseteq \mathbb{S}_{n^2}$ is a special subset of permutations of the symmetric group $\mathbb{S}_{n^2}$ of order $n^2$.

- The symmetric group $\mathbb{S}_k$ is the group of permutations of the elements $(1,\ldots,k)$.

**Q**: Consider the graph

Define $\text{vec}(A) = (0,0,0,0,1,\ldots,0,1,1,1)$. Consider a subgroup $\Pi \subseteq \mathbb{S}_{n^2}$ (a subgroup is a subset of a group that also forms a group) defined by some unknown bijection between the elements in the sequence and the edges in the adjacency matrix $A$. All we know about $\Pi$ is that for some $\pi \in \Pi$ we obtain $\pi \circ \text{vec}(A) = (0,0,0,0,0,\ldots,1,1,1,1)$.

**Mark the only permutation performed by $\pi' \in \mathbb{S}_{n^2} \backslash \Pi$.**

- That is, mark the **only** permutation that would be a *valid* permutation for a set but **invalid** for a graph.

- $(0,0,0,1,0,\ldots,1,1,1,1)$
- $(0,0,0,0,0,\ldots,1,1,1,1)$
- $(0,0,0,0,1,\ldots,0,1,1,1)$
- $(0,0,0,0,1,\ldots,1,0,1,1)$
- $(0,0,0,0,1,\ldots,1,1,0,1)$
- $(0,0,0,0,1,\ldots,1,1,1,0)$

- It is easier to define the graph permutations over the adjacency tensor ${\bf A}$ using all the elements of $\mathbb{S}_n$ rather than a subgroup of $\mathbb{S}_{n^2}$.
- A **group action** is the application of a group representation of an element of the group onto some object.
- For instance, the group action "$\circ$" of permutation $\pi \in \mathbb{S}_n$ on the adjacency matrix is denoted $\pi \circ {\bf A}$ and jointly permutes the rows and columns of ${\bf A}$ according to the permutation $\pi$.

- We call ${\bf A}$ and $\pi \circ {\bf A}$ isomorphic graphs, where $\pi \in \mathbb{S}_n$ is an element of the symmetric group of order $n$ and "$\circ$" is the appropriate action (representation) over the input.
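As a concrete sketch in NumPy (under one common convention, where the permutation is represented as a permutation matrix $P$ and $\pi \circ A = P A P^\top$), the joint row-and-column action preserves the structure of the graph:

```python
import numpy as np

A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])

pi = [2, 0, 3, 1]              # a permutation of the vertex ids
P = np.eye(4, dtype=int)[pi]   # its permutation-matrix representation

# The group action jointly permutes rows and columns: pi ∘ A = P A Pᵀ.
A_pi = P @ A @ P.T

# A and pi ∘ A are isomorphic: same edges up to relabeling, so e.g.
# the degree multisets coincide (a necessary condition for isomorphism).
assert sorted(A.sum(axis=1).tolist()) == sorted(A_pi.sum(axis=1).tolist())
```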

Assume we have a function $f: \mathbb{A}^{n\times n} \times V \to \mathbb{R}^d$ that produces a representation of node $v \in V$ in graph ${\bf A}$.

Let's consider the Arctic food web graph below.

- We have added the constraint that $f$ is invariant to permutations of the vertex ids
- That is, $f(\pi \circ {\bf A}, \pi \circ v) = f({\bf A}, v).$

- Let $Z_v = f({\bf A},v)$ be a permutation-invariant representation of node $v \in V$.
- Representations with the same colors have the same representation vectors.


- Node representations that are invariant to permutations are called **structural node representations**.
- This is because nodes that are structurally equivalent on the graph *must* get the same representations.
- In the above picture the coloring is obtained through $f$ which is a Graph Neural Network (which we will see next).

**Definition (Structural node representations)**: Structural node representations of an $n\times n$ adjacency matrix ${\bf A}$ (or $n\times n \times k$, $k \geq 2$, adjacency tensor ${\bf A}$) are obtained from a function $f : \mathbb{R}^{n\times n} \to \mathbb{R}^{n \times d}$ where
$$ u \simeq v \implies (f({\bf A}))_{v,\cdot} = (f({\bf A}))_{u,\cdot},$$
where $(f({\bf A}))_{u,\cdot}$ denotes the row of $f({\bf A})$ corresponding to vertex $u$.

Moreover, $f$ must be **equivariant** to permutations of the adjacency matrix $\bf A$:
$$\forall \pi \in \mathbb{S}_n \quad \quad f(\pi \circ {\bf A}) = \pi \circ f({\bf A}),$$
noting that $\pi$ over ${\bf A}$ jointly permutes rows and columns, but applied at $f$ just permutes the rows.
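A quick numerical check of this equivariance (a toy sketch: take $f(A)$ to be the column of node degrees, which is a simple permutation-equivariant map):

```python
import numpy as np

def f(A):
    # Toy structural map: each node's representation is its degree (n x 1).
    return A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n = 5
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                    # a random simple undirected graph

pi = rng.permutation(n)
P = np.eye(n, dtype=int)[pi]

# pi acts on A by permuting rows AND columns, but on f(A) by rows only:
# f(pi ∘ A) = pi ∘ f(A).
assert np.array_equal(f(P @ A @ P.T), P @ f(A))
```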

**Definition (Most Expressive Structural node representations)**:
We say the structural representations are most expressive iff:

$$v \simeq u \Leftrightarrow f({\bf A})_{u,\cdot} = f({\bf A})_{v,\cdot},$$
that is, $f$ gives different embeddings to $u,v \in V$ if they are non-isomorphic, and the same embeddings if $u,v$ are isomorphic.

**Graph neural networks (GNNs):** GNNs output structural node embeddings.


(Murphy et al., 2019b) gave the following (then) surprising result. The work defined *Relational Pooling*, a neural representation defined as
$$
\Gamma(A,i) = \frac{1}{n!} \sum_{\pi \in \mathbb{S}_n} f(\pi \circ A, \pi \circ i),
$$
where $f: \mathbb{A}^{n^2} \times V \to \mathbb{R}^d$ is an arbitrary learnable function (e.g., neural network) that outputs a $d$-dimensional node representation of node $i\in V$ in graph $A \in \mathbb{A}^{n^2}$.

**Theorem 2.1** (Murphy et al., 2019b)
If node and edge attributes come from a finite set and $f: \mathbb{A}^{n^2} \times V \to \mathbb{R}^d$ is a universal function approximator, then $$
\Gamma(A,i) = \frac{1}{n!} \sum_{\pi \in \mathbb{S}_n} f(\pi \circ A, \pi \circ i)
$$
is a most-expressive graph representation of node $i\in V$ in graph $A \in\mathbb{A}^{n^2}$.

- For instance, $f$ could be an MLP or a recurrent neural network (RNN).
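A brute-force sketch of $\Gamma$ for a tiny graph: the function `f` below is a hand-picked, permutation-*sensitive* stand-in for a learnable network, yet averaging it over all $n!$ joint permutations yields an invariant representation:

```python
import itertools
import math
import numpy as np

def perm_graph(A, pi):
    """Group action pi ∘ A: jointly permute rows and columns."""
    idx = list(pi)
    return A[np.ix_(idx, idx)]

def f(A, i):
    # Deliberately permutation-SENSITIVE: weights entries by position and
    # uses the raw node id. Stands in for an arbitrary (e.g. neural) function.
    return float(A.flatten() @ np.arange(A.size) + 7 * i)

def relational_pooling(A, i):
    """Gamma(A, i): average f over all n! joint permutations."""
    n = A.shape[0]
    return sum(
        f(perm_graph(A, pi), pi.index(i))  # pi acts on the node id too
        for pi in itertools.permutations(range(n))
    ) / math.factorial(n)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # a 3-node path
pi = (2, 0, 1)
# Invariance: relabeling the graph (and the queried node) leaves Gamma fixed.
assert math.isclose(relational_pooling(A, 1),
                    relational_pooling(perm_graph(A, pi), pi.index(1)))
```

Note also that the two endpoint nodes of the path are isomorphic, so `relational_pooling(A, 0) == relational_pooling(A, 2)`, as a most-expressive structural representation requires.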

$k$-ary representations in Relational Pooling are equivalent to using subgraphs to represent a graph.

**Graph $k$-reconstruction conjecture**. Kelly et al. (1957) generalize the **graph reconstruction conjecture** by considering the multiset of subgraphs of size $k$, which we denote $D_k(G) = \{\{I(H): H \in S^{(k)}(G)\}\}$, where $S^{(k)}(G)$ is the set of all $\binom{n}{k}$ $k$-size subsets of $V$. We often call an element of $D_k(G)$ a $k$-card. Let $G$ and $H$ be graphs; then $H$ is a $k$-reconstruction of $G$ if $H$ and $G$ have the same $k$-deck, denoted $H \sim_k G$. **A graph $G$ is $k$-reconstructible if every $k$-reconstruction of $G$ is isomorphic to $G$, i.e., $H \sim_k G$ implies $H \cong G$.** Results for $k$-reconstruction usually state the least $k$ as a function of $n$ such that all graphs $G$ in $\mathcal{G}$ (or some subset) are $k$-reconstructible (Nydl, 2001). There exist extensive partial results in this direction, mostly describing $k$-reconstructibility (as a function of $n$) for particular families of graphs, such as trees, disconnected graphs, complete multipartite graphs, and paths. More concretely, Nydl (2001) and Spinoza and West (2019) showed graphs with $2k$ vertices that are not $k$-reconstructible. In practice, these results imply that for some fixed $k$ there will be graphs with not many more vertices than $k$ that are not $k$-reconstructible. Further, $k$-reconstructible graph functions such as the degree sequence and connectedness have been studied in Manvel (1974) and Taylor (1990), depending on the size of $k$. Please refer to Cotta et al. (2021) for a more complete discussion.
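To make the $k$-deck concrete, here is a brute-force sketch (practical only for tiny graphs, since it canonicalizes each $k$-card by trying all relabelings; `canon` and `k_deck` are hypothetical helper names):

```python
import itertools
from collections import Counter

def canon(adj_set, nodes):
    """Canonical form of the induced subgraph on `nodes`: the lexicographically
    smallest edge list over all relabelings (fine for tiny cards)."""
    best = None
    for pi in itertools.permutations(range(len(nodes))):
        relabel = dict(zip(nodes, pi))
        edges = {
            tuple(sorted((relabel[u], relabel[v])))
            for u, v in adj_set
            if u in relabel and v in relabel
        }
        key = tuple(sorted(edges))
        if best is None or key < best:
            best = key
    return best

def k_deck(edges, n, k):
    """Multiset D_k(G) of isomorphism classes of induced k-vertex subgraphs."""
    adj_set = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return Counter(
        canon(adj_set, nodes) for nodes in itertools.combinations(range(n), k)
    )

triangle = [(0, 1), (1, 2), (0, 2)]
path3 = [(0, 1), (1, 2)]
# The 1-decks agree (three bare vertices each), but the 2-decks differ:
print(k_deck(triangle, 3, 1) == k_deck(path3, 3, 1))  # True
print(k_deck(triangle, 3, 2) == k_deck(path3, 3, 2))  # False
```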

- Nydl, V. (2001). Graph reconstruction from subgraphs. Discrete Mathematics, 235(1-3):335–341.
- Spinoza, H. and West, D. B. (2019). Reconstruction from the deck of $k$-vertex induced subgraphs. Journal of Graph Theory, 90(4):497–522.
- Kostochka, A. V. and West, D. B. (2020). On reconstruction of n-vertex graphs from the multiset of ($n-l$)-vertex induced subgraphs. IEEE Transactions on Information Theory, PP:1–1.
- Manvel, B. (1974). Some basic observations on Kelly’s conjecture for graphs. Discrete Mathematics, 8(2):181–185.
- Taylor, R. (1990). Reconstructing degree sequences from k-vertex-deleted subgraphs. Discrete mathematics, 79(2):207–213.

Let's start with a CNN filter:

Note that the CNN filter is applied to a constant number of inputs in the last layer.

Topologically, an image is a lattice, and the convolution is a filter applied over a patch of the lattice.

Then, we apply this convolution in layers, multiple times. Note that each pixel in the previous layer has a *convolution representation* in the next layer.

Note that in CNNs:

- **Convolutions are applied over fixed-size neighborhoods of a vertex (pixel).**
- **Convolutions are permutation-sensitive.**

Today, we are going to explore convolutions over more complex topologies.

Let $\bf A$ represent an arbitrary simple (no loops) undirected social network graph. We now illustrate how a convolution should act over the topological space of Facebook friends:

The graph neural layer applied to Bruno will act on his neighbors

The graph neural layer applied to one of his friends will act on her neighbors

The representation ${\bf h}^{(k)}_v \in \mathbb{R}^{1 \times d}$, $d \in \mathbb{N}$, at layer $k$ of node $v \in V$ can be described through:

- Its own embedding at layer $k-1$, ${\bf h}^{(k-1)}_v \in \mathbb{R}^{1 \times d'}$, $d' \in \mathbb{N}$ and
- The embedding of its neighbors $({\bf h}^{(k-1)}_u)_{\forall u \in \mathcal{N}_v}$ where $\mathcal{N}_v$ is the set of neighbors of $v$ in $V$.

The resulting filter recursion is: $$ {\bf h}^{(k)}_v = \sigma\left( {\bf h}^{(k-1)}_v {\bf W}_\text{self}^{(k)} + \text{Pooling}\left(({\bf h}^{(k-1)}_u)_{\forall u \in \mathcal{N}_v}\right) {\bf W}_\text{neigh}^{(k)} + {\bf b}^{(k)} \right), $$ where ${\bf W}_\text{self}^{(k)} \in \mathbb{R}^{d' \times d}$, ${\bf W}_\text{neigh}^{(k)} \in \mathbb{R}^{d' \times d}$, ${\bf b}^{(k)} \in \mathbb{R}^{1 \times d}$, $d'\in \mathbb{N}$ and $\text{Pooling}$ is a function that outputs a value that does not depend on the order of the inputs.

Given a sequence (input vector) of arbitrary length $n \geq 1$ $$ {\bf x} = (x_1,\ldots,x_n) $$ a permutation $\pi$ over the integers $\{1, 2,\ldots,n\}$ is such that ${\bf x}_\pi = (x_{\pi_1},\ldots,x_{\pi_n})$ is the correspondingly permuted version of ${\bf x}$.

Pooling is such that $$ \text{Pooling}({\bf x}) = \text{Pooling}(\pi \circ {\bf x}), \quad \forall \pi \in \mathbb{S}_n, $$

See our lecture on set representations for a complete list of possible Pooling functions.

**Sum pooling example:**

Using sum pooling, the resulting filter recursion is: $$ {\bf h}^{(k)}_v = \sigma\left( {\bf h}^{(k-1)}_v {\bf W}_\text{self}^{(k)} + \sum_{u \in \mathcal{N}_v} {\bf h}^{(k-1)}_u {\bf W}_\text{neigh}^{(k)} + {\bf b}^{(k)} \right), $$ where again $\mathcal{N}_v$ are the neighbors of node $v$ in the graph.

This variant is commonly known as `GraphConv` in the literature, and has been thoroughly studied in Morris et al. (2019).
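The sum-pooling recursion can be sketched in a few lines of NumPy (assuming ReLU for $\sigma$; `graph_conv_layer` is a hypothetical helper, and `A @ H` collects each node's neighbor sum because row $v$ of $A$ selects exactly the neighbors of $v$):

```python
import numpy as np

def graph_conv_layer(A, H, W_self, W_neigh, b):
    """One sum-pooling filter step:
    h_v = ReLU(h_v W_self + (sum over u in N(v) of h_u) W_neigh + b)."""
    neigh_sum = A @ H  # row v holds the sum of v's neighbors' embeddings
    return np.maximum(0.0, H @ W_self + neigh_sum @ W_neigh + b)

# 3-node path graph, scalar features all ones, identity weights:
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
H = np.ones((3, 1))
W = np.ones((1, 1))
h1 = graph_conv_layer(A, H, W, W, np.zeros((1, 1)))
print(h1.ravel())  # [2. 3. 2.]  (i.e., 1 + degree of each node)
```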

Most existing Graph Neural Networks are a variant of the above recursion.

Graph Attention Networks (GATs) (Veličković et al., 2018) replace ${\bf h}^{(k-1)}_u$ inside the pooling with $\alpha({\bf h}^{(k-1)}_v,{\bf h}^{(k-1)}_u) {\bf h}^{(k-1)}_u$, where $\alpha(\cdot,\cdot) \in (0,1)$ is an attention mechanism (Bahdanau et al., 2015).

More generally, an edge function can also be added before the set representation function, to better represent the connection between $v$ and its neighbors: $$ {\bf h}^{(k)}_v = \sigma\left( {\bf h}^{(k-1)}_v {\bf W}_\text{self}^{(k)} + \text{Pooling}\!\left(\left(\text{edgefunc}\left({\bf h}^{(k-1)}_v,{\bf h}^{(k-1)}_u\right)\right)_{\forall u \in \mathcal{N}_v}\right) {\bf W}_\text{neigh}^{(k)} + {\bf b}^{(k)} \right), $$ where GAT is just one such example.

- GNNs assign the **same representation to isomorphic graphs**, in view of the required constraint that $f$ is invariant to permutations of the vertex ids.

- Additional desideratum: **different representations to non-isomorphic graphs**.
- Non-isomorphic graphs have different structures, so they should have different representations.
- A downstream classifier can then push them together, if needed.

**Definition (Most Expressive function)**:
A function is most expressive if it gives the same representation to isomorphic graphs and can give different representations to non-isomorphic graphs.

**Q:** Are GNNs most-expressive representations for graphs?

**A:** No, there exist pairs of non-isomorphic graphs that cannot be distinguished by *standard* GNNs (Xu et al., 2019).

Independently of the learning procedure, certain non-isomorphic graphs will get the same representation.

*Example 1: Circular Skip Link (CSL) task (Murphy et al., 2019)*

- If GNNs are not most expressive, is there a way of quantifying their expressive power?
- Do all GNNs have the same expressive power?

We will see that we can quantify the expressive power of GNNs in terms of the WL test, a well-known heuristic used in graph theory to test graph isomorphism.

Recall our desideratum, which is not obtained by standard GNNs: **different representations to non-isomorphic graphs.**

Obviously, if a network is able to test for graph isomorphism, then it is able to assign a unique representation to each isomorphic class.

- But testing for graph isomorphism is not an easy problem.

The graph isomorphism problem is believed to be NP-intermediate (i.e., it is not known to be solvable in polynomial time, nor to be NP-complete).

The WL test is a combinatorial heuristic to test graph isomorphism.

- If two graphs are distinguished by the WL test, they are non-isomorphic. **The converse is not true!**

*Figure credit: "Expressive power of graph neural networks and the Weisfeiler-Lehman test".*

**Theorem (Xu et al. 2019, Morris et al. 2019)**: the expressive power of GNNs is upper bounded by the WL test.

Remark: since there exist pairs of non-isomorphic graphs that are indistinguishable by the WL test, there exist pairs of non-isomorphic graphs that are indistinguishable by GNNs.

GNNs are at most as expressive as the WL test in distinguishing non-isomorphic graphs.
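To make the WL test concrete, here is a small sketch of 1-WL color refinement (adjacency given as a neighbor dict; `wl_histogram` is a hypothetical helper), together with the classic failure case of a 6-cycle versus two disjoint triangles, both 2-regular:

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-WL color refinement; `adj` maps node -> list of neighbors.
    Returns the final color histogram (a graph invariant)."""
    colors = {v: 0 for v in adj}  # uniform initial color
    for _ in range(rounds):
        # New signature: own color + multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        palette = {s: c for c, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return Counter(colors.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

# Non-isomorphic, yet WL cannot tell them apart (both are 2-regular):
print(wl_histogram(cycle6) == wl_histogram(two_triangles))  # True
# But WL easily separates the cycle from a path:
print(wl_histogram(cycle6) == wl_histogram(path6))          # False
```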

Let us see an example of non-isomorphic graphs that are indistinguishable by a GraphConv-based GNN.

- We will use Pytorch Geometric for our GNN implementation.

In your environment on the Scholar cluster, install PyG as follows:

```
pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
```

where `TORCH` is simply the version of PyTorch you have installed, which you can find with Python as

```
import torch
print(torch.__version__)
```

In [1]:

```
# @title [RUN] Import modules
import os.path as osp
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import scipy.io as sio
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from scipy.special import comb
from sklearn.metrics import accuracy_score
from torch_geometric.data import InMemoryDataset
from torch_geometric.data.data import Data
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GraphConv, global_add_pool
from torch_geometric.utils import to_networkx, to_undirected
```

In [2]:

```
# @title [RUN] Helper functions for plots and visualisations
def get_color(node_emb, hashmap):
    max_val = max(hashmap.values(), default=0) + 1
    crt_node_emb = np.round(node_emb, decimals=2)
    if tuple(crt_node_emb) not in hashmap:
        hashmap[tuple(crt_node_emb)] = max_val
        max_val += 1
    crt_node_emb = hashmap[tuple(crt_node_emb)]
    # Map float number to a color
    crt_color = cm.tab10(crt_node_emb, bytes=True)
    crt_color = (
        crt_color[0] / 255.0,
        crt_color[1] / 255.0,
        crt_color[2] / 255.0,
        crt_color[3] / 255.0,
    )
    return crt_color


def draw_one_graph(
    ax, edges, hash, label=None, node_emb=None, layout=None,
):
    graph = nx.Graph()
    graph.add_edges_from(zip(edges[0], edges[1]))
    node_pos = layout(graph)
    # Add colors according to node embedding
    if node_emb is not None:
        color_map = [
            get_color(node_emb[node_id], hash)
            for node_id in graph.nodes()
        ]
        nx.draw_networkx_nodes(
            graph, node_pos, node_color=color_map, nodelist=graph.nodes(), ax=ax
        )
        nx.draw_networkx_edges(graph, node_pos, ax=ax)
        nx.draw_networkx_labels(graph, node_pos, ax=ax)
    else:
        nx.draw_networkx(graph, node_pos, ax=ax)


def gallery(
    graphs,
    hash=None,
    labels=None,
    node_emb=None,
    max_fig_size=(40, 10),
    layout=nx.layout.kamada_kawai_layout,
):
    if hash is None:
        hash = {}
    num_graphs = len(graphs)
    ff, axes = plt.subplots(
        1, num_graphs, figsize=max_fig_size, subplot_kw={"xticks": [], "yticks": []}
    )
    if num_graphs == 1:
        axes = [axes]
    if node_emb is None:
        node_emb = num_graphs * [None]
    if labels is None:
        labels = num_graphs * [" "]
    for i in range(num_graphs):
        draw_one_graph(
            axes[i],
            graphs[i].edge_index.numpy(),
            hash,
            labels[i],
            node_emb[i],
            layout,
        )
        if labels[i] != " ":
            axes[i].set_title(f"Target: {labels[i]}", fontsize=28)
        axes[i].set_axis_off()
    plt.show()
```

Let us consider three exemplary graphs chosen from the Graph8c dataset, which contains all the possible connected non-isomorphic simple graphs with 8 nodes.

**The goal is to understand whether a GraphConv-based GNN can disambiguate them.**

To do so, we are going to assign a different class to each of them, and we are going to place them in our training set. A GNN can disambiguate them if and only if it can learn to correctly predict their targets.

In [3]:

```
# @title [RUN] `Graph8c` data retrieval
# Let's get three (selected) graphs from the Graph8c dataset from
# “Breaking the Limits of Message Passing Graph Neural Networks”
# (https://arxiv.org/pdf/2106.04319.pdf)
!wget https://raw.githubusercontent.com/balcilar/gnn-matlang/main/dataset/graph8c/raw/graph8c.g6
```

In [4]:

```
IDS = [379, 4625, 4657]


class Grapg8cDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(Grapg8cDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ["graph8c.g6"]

    @property
    def processed_file_names(self):
        return "data.pt"

    def download(self):
        # Download to `self.raw_dir`.
        pass

    def process(self):
        # Read data into huge `Data` list.
        dataset = nx.read_graph6(self.raw_file_names[0])
        data_list = []
        for i, datum in enumerate(dataset):
            if i not in IDS:
                continue
            x = torch.ones(datum.number_of_nodes(), 1)
            edge_index = to_undirected(
                torch.tensor(list(datum.edges())).transpose(1, 0)
            )
            data_list.append(
                Data(edge_index=edge_index, x=x, y=torch.tensor(IDS.index(i)))
            )
        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]
        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
```

In [5]:

```
ds = Grapg8cDataset(root="tmp/Graph8c")
G1, G2, G3 = ds[0], ds[1], ds[2]
```

Let us assign a different color for every different input feature.

In [6]:

```
gallery(
    [G1, G2, G3],
    labels=np.array([G1.y.item(), G2.y.item(), G3.y.item()]),
)
```

Now, it is time to implement our GNN and train to distinguish these three.

So far so good, it seems an easy task.

In [7]:

```
class MLP(nn.Module):
    """A simple feed-forward neural network"""

    def __init__(self, in_dim, emb_dim, num_layers=2):
        super(MLP, self).__init__()
        layer_list = []
        layer_list.append(torch.nn.Linear(in_dim, emb_dim))
        for _ in range(num_layers - 1):
            layer_list.append(torch.nn.BatchNorm1d(emb_dim))
            layer_list.append(torch.nn.ReLU())
            layer_list.append(torch.nn.Linear(emb_dim, emb_dim))
        self.layers = torch.nn.Sequential(*layer_list)

    def forward(self, x):
        return self.layers(x)
```

In [8]:

```
class GraphConvNet(torch.nn.Module):
    def __init__(self, in_dim, emb_dim, out_dim, num_layers, num_final_layers=1):
        super(GraphConvNet, self).__init__()
        self.convs = torch.nn.ModuleList()
        for layer in range(num_layers):
            self.convs.append(
                GraphConv(
                    emb_dim if layer != 0 else in_dim,
                    emb_dim,
                )
            )
        self.pool = global_add_pool
        self.final_layers = MLP(emb_dim, out_dim, num_layers=num_final_layers)

    def forward(self, data):
        h_node = data.x
        for conv in self.convs:
            h_node = F.relu(conv(h_node, data.edge_index))
        h_graph = self.pool(h_node, data.batch)
        return self.final_layers(h_graph), h_node.detach()
```

In [9]:

```
# @title [RUN] Hyperparameters GNNs
BATCH_SIZE = 128 # @param {type:"integer"}
NUM_EPOCHS = 25 # @param {type:"integer"}
HIDDEN_DIM = 64 # @param {type:"integer"}
NUM_LAYERS = 4 # @param {type:"integer"}
LR = 0.001 # @param {type:"number"}
SEED = 42 # @param {type:"integer"}
DEVICE = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
```

In [10]:

```
# @title [RUN] Set the seed
torch.manual_seed(SEED)
np.random.seed(SEED)
```

In [11]:

```
def train(train_loader, model, optimiser, loss_fn, metric_fn):
    """Train model for one epoch"""
    model.train()
    total_loss = 0
    num_graphs = 0
    for data in train_loader:
        optimiser.zero_grad()
        data = data.to(DEVICE)
        y_hat, _ = model(data)
        loss = loss_fn(y_hat, data.y)
        loss.backward()
        optimiser.step()
        total_loss += loss.item() * len(data.y)
        num_graphs += len(data.y)
    return total_loss / num_graphs


def evaluate(loader, model, metric_fn):
    """Evaluate model on dataset"""
    y_pred, y_true = [], []
    model.eval()
    for data in loader:
        data = data.to(DEVICE)
        y_hat, _ = model(data)
        y_pred.append(y_hat.detach().cpu())
        y_true.append(data.y.detach().cpu())
    y_pred = torch.cat(y_pred, dim=0)
    y_true = torch.cat(y_true, dim=0)
    return metric_fn(y_true, y_pred)
```

In [12]:

```
def run(
    model,
    train_loader,
    loaders,
    loss_fn,
    metric_fn,
    use_scheduler=False,
    print_steps=True,
):
    """Train the model for NUM_EPOCHS epochs"""
    # Instantiate optimiser and scheduler
    optimiser = optim.Adam(model.parameters(), lr=LR)
    # NOTE: DECAY_STEP and DECAY_RATE must be defined when use_scheduler=True.
    scheduler = (
        optim.lr_scheduler.StepLR(optimiser, step_size=DECAY_STEP, gamma=DECAY_RATE)
        if use_scheduler
        else None
    )
    curves = {name: [] for name in loaders.keys()}
    for epoch in range(NUM_EPOCHS):
        train_loss = train(
            train_loader, model, optimiser, loss_fn, metric_fn
        )
        if scheduler is not None:
            scheduler.step()
        for name, loader in loaders.items():
            curves[name].append(evaluate(loader, model, metric_fn))
        if print_steps:
            print(
                f"[Epoch {epoch}]",
                f"train loss: {train_loss:.6f}",
                end=" ",
            )
            for name, metric in curves.items():
                print(f"{name} metric: {metric[-1]:.3f}", end=" ")
            print("\n")
    return curves
```

In [13]:

```
# Instantiate our models
graphconv_model = GraphConvNet(
    in_dim=ds.num_features,
    emb_dim=HIDDEN_DIM,
    out_dim=ds.num_classes,
    num_layers=NUM_LAYERS,
).to(DEVICE)
```

In [14]:

```
# Create the Dataloader
train_loader = DataLoader(ds, BATCH_SIZE, shuffle=True)
```

In [15]:

```
# Train our GNN model
_ = run(
    graphconv_model,
    train_loader,
    {"train": train_loader},
    loss_fn=F.cross_entropy,
    metric_fn=lambda x, y: accuracy_score(x, y.argmax(-1)),
)
```

Since the accuracy is 2/3, the GNN model is able to distinguish only 2 of the 3 graphs.

- And since we had 3 graphs, this implies that 2 of them are indistinguishable.

In [16]:

```
_, G1_h_nodes = graphconv_model.to("cpu")(G1)
_, G2_h_nodes = graphconv_model.to("cpu")(G2)
_, G3_h_nodes = graphconv_model.to("cpu")(G3)
stateful_hash = {}
gallery(
    [G1, G2, G3],
    hash=stateful_hash,
    labels=np.array([G1.y.item(), G2.y.item(), G3.y.item()]),
    node_emb=[G1_h_nodes.numpy(), G2_h_nodes.numpy(), G3_h_nodes.numpy()],
)
```