Cora dataset description
The Cora dataset is a widely used citation-network dataset for tasks such as document classification and graph-network analysis. It consists of abstracts of 2708 machine-learning papers spanning 7 research areas. Each paper has a unique ID and belongs to exactly one of 7 classes: Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, and Theory.
Besides the citation links between papers, Cora also provides a bag-of-words representation of each paper: every paper is encoded as a 0/1 word vector (a multi-hot embedding with several 1s per row, not a one-hot vector), used as the node feature. Each entry indicates whether the corresponding vocabulary word occurs in that paper.
Cora is commonly used to develop and evaluate graph neural networks, e.g. for document classification, citation-network analysis, and node embedding.
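To make the "multi-hot, not one-hot" point concrete, here is a toy sketch of what one bag-of-words feature row looks like (the 8-word vocabulary is made up for illustration; Cora's real vocabulary has 1433 words):

```python
import torch

# Hypothetical vocabulary; a paper's feature row marks which words occur in it.
vocab = ["graph", "neural", "network", "gene", "rule", "theory", "agent", "prob"]
paper_words = {"graph", "neural", "network"}

x = torch.zeros(len(vocab))
for i, w in enumerate(vocab):
    if w in paper_words:
        x[i] = 1.0

print(x.tolist())  # several 1s in one row: multi-hot, not one-hot
```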
Printing Cora's statistics:
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid

dataset = Planetoid('./tmp/Cora', name='Cora', transform=T.NormalizeFeatures())
num_nodes = dataset.data.num_nodes
# For num. edges see:
# - https://github.com/pyg-team/pytorch_geometric/issues/343
# - https://github.com/pyg-team/pytorch_geometric/issues/852
num_edges = dataset.data.num_edges // 2
train_len = dataset[0].train_mask.sum()
val_len = dataset[0].val_mask.sum()
test_len = dataset[0].test_mask.sum()
other_len = num_nodes - train_len - val_len - test_len
print(f"Dataset: {dataset.name}")
print(f"Num. nodes: {num_nodes} (train={train_len}, val={val_len}, test={test_len}, other={other_len})")
print(f"Num. edges: {num_edges}")
print(f"Num. node features: {dataset.num_node_features}")
print(f"Num. classes: {dataset.num_classes}")
print(f"Dataset len.: {dataset.len()}")
GCN: Theory and Implementation
Convolution theorem: $f*g = F^{-1}(F(f)\cdot F(g))$. Given a graph signal $x$ and a convolution kernel $g$,
$x*g = U(U^T x \odot U^T g) = U(U^T x \odot g_{\theta}) = \widetilde D^{-1/2}\widetilde A\widetilde D^{-1/2} X \Theta$
where $A$ is the graph's adjacency matrix, $D$ its degree matrix, and $\widetilde D = D + \gamma I$, $\widetilde A = A + \gamma I$ (adding self-loops shrinks the eigenvalues $\lambda$ of the Laplacian matrix).
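As a quick numeric illustration of the normalization term $\widetilde D^{-1/2}\widetilde A\widetilde D^{-1/2}$ (plain torch on a toy 3-node path graph, with $\gamma = 1$ as in the standard GCN):

```python
import torch

# Toy 3-node undirected path graph: edges 0-1 and 1-2.
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
gamma = 1.0
A_tilde = A + gamma * torch.eye(3)          # \tilde A = A + γI (self-loops)
deg = A_tilde.sum(dim=1)                    # diagonal of \tilde D
D_inv_sqrt = torch.diag(deg.pow(-0.5))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # \tilde D^{-1/2} \tilde A \tilde D^{-1/2}
print(A_hat)                                # symmetric, entries in (0, 1]
```

Note that the result stays symmetric, which is why this normalization is preferred over the row-only version $\widetilde D^{-1}\widetilde A$.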
1. Computing $\widetilde D^{-1/2}\widetilde A\widetilde D^{-1/2}$
import torch
from torch_geometric.utils import add_remaining_self_loops, scatter
from torch_geometric.utils.num_nodes import maybe_num_nodes

def gcn_norm(edge_index, edge_weight=None, num_nodes=None,
             add_self_loops=True, flow="source_to_target", dtype=None):
    fill_value = 1.
    num_nodes = maybe_num_nodes(edge_index, num_nodes)
    if add_self_loops:  # add self-loops
        edge_index, edge_weight = add_remaining_self_loops(
            edge_index, edge_weight, fill_value, num_nodes)
    edge_weight = torch.ones((edge_index.size(1), ), dtype=dtype,
                             device=edge_index.device)
    row, col = edge_index[0], edge_index[1]
    idx = col
    deg = scatter(edge_weight, idx, dim=0, dim_size=num_nodes, reduce='sum')
    deg_inv_sqrt = deg.pow_(-0.5)
    deg_inv_sqrt.masked_fill_(deg_inv_sqrt == float('inf'), 0)
    edge_weight = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]
    return edge_index, edge_weight
Code explanation:
`edge_index, edge_weight = add_remaining_self_loops(edge_index, edge_weight, fill_value, num_nodes)` implements $\widetilde D = D + \gamma I$, $\widetilde A = A + \gamma I$.
`deg = scatter(edge_weight, idx, dim=0, dim_size=num_nodes, reduce='sum')` accumulates edge_weight at idx = edge_index[1] to obtain the degree vector deg, i.e. the diagonal of $\widetilde D$ (edge_weight holds the values placed onto that diagonal).
`deg_inv_sqrt = deg.pow_(-0.5)` yields $\widetilde D^{-1/2}$. Since degree entries equal to 0 become $\infty$ under the $-0.5$ power, `deg_inv_sqrt.masked_fill_(deg_inv_sqrt == float('inf'), 0)` resets them to 0. Finally, `edge_weight = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]` outputs the normalized edge_weight.
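The same normalization can be checked without PyG on a toy graph. The sketch below (plain torch, with `index_add_` standing in for `scatter`) reproduces gcn_norm's per-edge weights and compares them against the dense $\widetilde D^{-1/2}\widetilde A\widetilde D^{-1/2}$:

```python
import torch

# Toy 3-node path graph as a directed edge list, self-loops already appended.
edge_index = torch.tensor([[0, 1, 1, 2, 0, 1, 2],
                           [1, 0, 2, 1, 0, 1, 2]])
edge_weight = torch.ones(edge_index.size(1))
row, col = edge_index[0], edge_index[1]

# Degree via scatter-add on the target index, as gcn_norm does.
deg = torch.zeros(3).index_add_(0, col, edge_weight)
deg_inv_sqrt = deg.pow(-0.5)
deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
norm_weight = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]

# Cross-check against the dense formula.
A_tilde = torch.zeros(3, 3)
A_tilde[row, col] = 1.0
dense = torch.diag(deg_inv_sqrt) @ A_tilde @ torch.diag(deg_inv_sqrt)
print(torch.allclose(dense[row, col], norm_weight))  # True
```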
2. PairNorm
3. Implementation of GCNConv (adapted, with deletions, from torch_geometric.nn.GCNConv)
import torch
from torch import Tensor
from torch.nn import Parameter
from torch_geometric.nn import MessagePassing
from torch_geometric.nn.dense.linear import Linear
from torch_geometric.nn.inits import zeros
from torch_geometric.typing import Adj, OptTensor, SparseTensor
from torch_geometric.utils import spmm

class myGCNConv2(MessagePassing):
    def __init__(self, in_channels: int, out_channels: int,
                 add_self_loops: bool = True, bias: bool = True):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.add_self_loops = add_self_loops
        self.lin = Linear(in_channels, out_channels, bias=False,
                          weight_initializer='glorot')
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        super().reset_parameters()
        self.lin.reset_parameters()  # convolution layer
        zeros(self.bias)             # bias

    def forward(self, x: Tensor, edge_index: Adj,
                edge_weight: OptTensor = None) -> Tensor:
        edge_index, edge_weight = gcn_norm(  # yapf: disable
            edge_index, edge_weight, x.size(self.node_dim),
            self.add_self_loops, self.flow, x.dtype)
        x = self.lin(x)
        # propagate_type: (x: Tensor, edge_weight: OptTensor)
        out = self.propagate(edge_index, x=x, edge_weight=edge_weight,
                             size=None)
        if self.bias is not None:
            out = out + self.bias
        return out

    def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

    def message_and_aggregate(self, adj_t: SparseTensor, x: Tensor) -> Tensor:
        return spmm(adj_t, x, reduce=self.aggr)
Code explanation:
`x = self.lin(x)` computes $X' = X\Theta$ with $X \in \mathbb R^{n \times d_1}$ and $\Theta \in \mathbb R^{d_1 \times d_2}$, projecting X to a lower dimension. `out = self.propagate(edge_index, x=x, edge_weight=edge_weight, size=None)` computes out $= A'X' = \widetilde D^{-\frac 1 2}\widetilde A \widetilde D^{-\frac 1 2} X \Theta$, aggregating the transformed neighbor features $\{x'_1,\dots,x'_n\}$ into each target node's representation.
message and message_and_aggregate are hooks invoked by MessagePassing.propagate; in testing, removing them lowered the validation accuracy, so they are kept.
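What propagate does with these hooks can be sketched in plain torch: message computes $w_{ij} \cdot x_j$ per edge, and the aggregation scatter-sums the messages onto their target nodes (toy numbers below, not the PyG internals):

```python
import torch

# Edges (j -> i) with weights; out_i = Σ_j w_ij · x_j over incoming edges.
edge_index = torch.tensor([[0, 1, 2],   # source j
                           [1, 2, 1]])  # target i
edge_weight = torch.tensor([0.5, 0.25, 1.0])
x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

src, dst = edge_index[0], edge_index[1]
msg = edge_weight.view(-1, 1) * x[src]             # message(): w_ij * x_j per edge
out = torch.zeros_like(x).index_add_(0, dst, msg)  # aggregate: sum per target node
print(out)  # node 1 receives 0.5*x_0 + 1.0*x_2; node 2 receives 0.25*x_1
```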
4. Implementation of the network (GCN)
import torch
from torch import Tensor
from torch_geometric.nn import PairNorm

class GCN(torch.nn.Module):
    def __init__(
        self,
        num_node_features: int,
        num_classes: int,
        hidden_dim: int = 16,
        dropout_rate: float = 0.5,
    ) -> None:
        super().__init__()
        self.dropout1 = torch.nn.Dropout(dropout_rate)
        self.conv1 = myGCNConv2(num_node_features,
                                hidden_dim, add_self_loops=True)
        self.relu = torch.nn.ReLU(inplace=True)
        self.dropout2 = torch.nn.Dropout(dropout_rate)
        self.conv2 = myGCNConv2(hidden_dim, num_classes, add_self_loops=True)
        self.pn = PairNorm()

    def forward(self, x: Tensor, edge_index: Tensor) -> torch.Tensor:
        x = self.pn(x)
        x = self.dropout1(x)
        x = self.conv1(x, edge_index)
        x = self.relu(x)
        x = self.dropout2(x)
        x = self.conv2(x, edge_index)
        return x
Code explanation:
`x = self.pn(x)` applies PairNorm to x, after which each $x_i$ is roughly $N(0, s^2)$ and the feature norms across nodes are balanced; the effect is not pronounced here. The network stacks two GCN convolution layers with a ReLU activation in between, using dropout to mitigate overfitting.
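For reference, the core of PairNorm can be sketched in a few lines. This is a simplified version of the idea (center each feature dimension, then rescale rows to a shared average norm), not the exact implementation in torch_geometric.nn.PairNorm:

```python
import torch

def pair_norm(x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # Center each feature column across nodes.
    x = x - x.mean(dim=0, keepdim=True)
    # Rescale so the mean squared row norm becomes scale^2.
    row_norm_sq = (x * x).sum(dim=1).mean()
    return scale * x / (row_norm_sq.sqrt() + 1e-6)

x = torch.randn(5, 3) * 10 + 4   # features with large, offset norms
y = pair_norm(x)
# Columns are near zero-mean and rows now share a comparable norm scale.
print(y.mean(dim=0), (y * y).sum(dim=1).mean())
```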
Manual implementation of DropEdge
Idea: first convert the directed edges into undirected edges and save them in single_edge_index. Concretely, record each undirected edge exactly once in single_edge (which endpoint comes first is arbitrary), then convert single_edge into an undirected edge-index (a 2-dim tensor).
# single_edge_index
import torch

single_edge = set()  # each undirected edge stored exactly once
edge_index = dataset.data.edge_index
for i in range(edge_index.size(1)):
    u, v = int(edge_index[0][i]), int(edge_index[1][i])
    if (u, v) not in single_edge and (v, u) not in single_edge:
        single_edge.add((u, v))

single_edge_index = [[], []]
for u, v in single_edge:
    single_edge_index[0].append(u)
    single_edge_index[1].append(v)
single_edge_index = torch.tensor(single_edge_index)
Then a dropout_rate fraction of the undirected edges is discarded, and the remainder converted back into a directed edge-index:
def drop_edge(single_edge_index, dropout_rate):
    # number of edges to drop
    num_edges = single_edge_index.shape[1]
    num_drop = int(num_edges * dropout_rate)
    # randomly choose the edges to keep
    remain_indices = torch.randperm(num_edges)[num_drop:]
    remain_single_edges = single_edge_index[:, remain_indices]
    reverse_edges = torch.stack([remain_single_edges[1],
                                 remain_single_edges[0]], dim=0)
    remain_edges = torch.cat([remain_single_edges, reverse_edges], dim=1)
    return remain_edges
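A quick usage check (the function is repeated here so the snippet runs standalone): dropping 25% of 4 undirected edges keeps 3, i.e. 6 directed edges, and every kept edge appears in both directions.

```python
import torch

def drop_edge(single_edge_index, dropout_rate):
    # Drop a fraction of undirected edges, then mirror the rest.
    num_edges = single_edge_index.shape[1]
    num_drop = int(num_edges * dropout_rate)
    remain_indices = torch.randperm(num_edges)[num_drop:]
    remain_single_edges = single_edge_index[:, remain_indices]
    reverse_edges = torch.stack([remain_single_edges[1],
                                 remain_single_edges[0]], dim=0)
    return torch.cat([remain_single_edges, reverse_edges], dim=1)

# 4 undirected edges on 4 nodes.
single_edge_index = torch.tensor([[0, 0, 1, 2],
                                  [1, 2, 3, 3]])
remain = drop_edge(single_edge_index, 0.25)
print(remain.shape)  # torch.Size([2, 6])
```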