# How to build sequential recommendation systems with graph convolutional networks?

In one of the previous articles, we discussed sequential recommender systems in detail. Sequential recommendation systems recommend items to users by capturing the sequence of their interactions. Even though the user interacts with the system in sequential order, there are also several dependencies and relationships within these interactions that must be modeled. For such use cases, Jianxin Chang and his team proposed a solution in which graph convolutional networks are used to build efficient sequential recommendation systems. In this article, we'll take a close look at this approach, covering the following important points.

**Contents**

- Traditional recommendation systems
- Sequential recommendation systems
- Graph convolutional networks
- Sequential recommendation with graph convolutional networks
- Methodology

Let’s start the discussion by having a brief understanding of traditional and sequential recommender systems.

**Traditional recommendation systems**

Traditional recommendation systems generally fall into two categories: collaborative filtering and content-based filtering. Collaborative filtering predicts your preferences from the similar preferences of other users, whereas content-based filtering predicts your preferences based solely on your own search history. Both fail to predict the user's next move because they treat the interaction as static.

**Sequential recommendation systems**

Sequential recommendation systems try to understand user input over time and model it in sequential order. User interaction is primarily sequence-dependent, so these systems can capture changing user behaviors as trends evolve.

The image above represents sequential user behavior: the user buys a few items such as a dress, a handbag, and shoes, and the recommendation system predicts the next item to display.

**Graph convolutional networks**

A graph convolutional network is a class of neural networks used to process data represented as graphs. Graph convolutional networks are similar to convolutional neural networks in that both learn features from neighboring nodes. The main difference is that a convolutional neural network is designed to work on Euclidean, regularly structured data, while a graph neural network is designed to work on non-Euclidean, irregularly structured data.

The image above shows a convolutional neural network on the left and a graph convolutional network on the right.

**Sequential recommendation with graph convolutional networks**

**Challenges**

User behaviors in long sequences reflect implicit and noisy preference signals. Historical behavior sequences are rich, and users interact with many things they are not actually interested in. Such behavior does not reflect user preferences; these records act as noise and deteriorate the performance of the model.

User preferences also drift over time due to their diversity, changing with both slow and fast trends. Some preferences may be active while others are not. If we extract preference signals that are no longer active, or that the user no longer likes, this too degrades the performance of the model.

To meet these challenges, a graph-based method is used: a graph convolutional network extracts the implicit preference signals, and dynamic graph pooling then captures the dynamic preferences. From the user's long history, we first convert the loose item sequence into a tight item-item interest graph. We then design an attentive graph convolutional network that gathers weak signals into strong ones that accurately reflect user preferences. Finally, a dynamic graph pooling technique adaptively reserves the activated core preferences to predict the next item or user behavior.

**Formulation of the problem**

Let us give a formal definition of the sequential recommendation problem. Denote the item set by X, where 𝑥 ∈ X is an item. A user's interaction sequence is represented by {𝑥_1, 𝑥_2, . . . , 𝑥_𝑛}, where 𝑥_1 is the first interaction and 𝑥_𝑛 the last, listed in chronological order. We need to predict 𝑥_{𝑛+1}, the next item the user will prefer. The sequential recommendation problem can be formulated as follows:

**Input:** each user's historical interaction sequence {𝑥_1, 𝑥_2, . . . , 𝑥_𝑛}.

**Output:** the recommendation system estimates the probability of interaction for each candidate item and predicts 𝑥_{𝑛+1}.
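As a minimal illustration of this formulation (the item names are hypothetical), the input/target split of a user's history looks like:

```python
# A user's chronological interaction sequence {x_1, ..., x_n, x_{n+1}}
user_history = ["dress", "handbag", "shoes", "belt"]

# Input: the observed interactions; Target: the next item to predict
model_input = user_history[:-1]
target = user_history[-1]
```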

**Methodology**

The model presented below consists of four parts:

- Interest graph construction
- Interest-fusion graph convolutional layer
- Interest-extraction graph pooling layer
- Prediction layer

**Interest graph construction:** Here, we re-construct loose item sequences into tight item-item interest graphs based on metric learning. The challenge is that co-occurrence relationships are too sparse to yield a connected graph for every sequence.

We first build the raw graph: an undirected graph G = {V, E, 𝐴} representing each interaction sequence, where E denotes the set of edges to be learned and 𝐴 ∈ R^{𝑛×𝑛} is the corresponding adjacency matrix. Each vertex 𝑣_𝑖 ∈ V, where |V| = 𝑛, corresponds to an interacted item. Since we need a prior graph in which neighboring nodes are similar, we use node similarity metric learning. A good similarity metric function must be expressive while remaining of acceptable complexity. The metric function is formulated as

M_𝑖𝑗 = cos(𝑤 ⊙ h_𝑖, 𝑤 ⊙ h_𝑗)

where ⊙ denotes the Hadamard product, the vector 𝑤 holds trainable weights, and the vectors h_𝑖 and h_𝑗 are the item embeddings.
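A minimal NumPy sketch of this weighted cosine metric, assuming `h_i`, `h_j`, and the trainable weight vector `w` are plain arrays (the function name is ours):

```python
import numpy as np

def similarity(h_i, h_j, w):
    """Weighted cosine similarity M_ij = cos(w ⊙ h_i, w ⊙ h_j)."""
    a = w * h_i  # Hadamard product with the trainable weights
    b = w * h_j
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical embeddings yield a similarity of 1.0, orthogonal (weighted) embeddings yield 0.0.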

To stabilize the learning process and increase expressive power, the similarity metric function can be extended to a multi-head metric:

M_𝑖𝑗 = (1/𝜙) Σ_{𝛿=1}^{𝜙} M^𝛿_𝑖𝑗

where 𝜙 is the number of heads and M^𝛿_𝑖𝑗 is the similarity metric computed by the 𝛿-th head between the two item embeddings. Each head captures a different semantic perspective.
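The multi-head extension can be sketched the same way, with one weight vector per head stacked in a matrix `W` (shapes and names are our assumptions):

```python
import numpy as np

def multi_head_similarity(h_i, h_j, W):
    """Average of the per-head weighted cosine metrics; W has shape (heads, d)."""
    sims = []
    for w in W:  # one learnable weight vector per head
        a, b = w * h_i, w * h_j
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return float(np.mean(sims))
```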

The graph is then sparsified via 𝜀-sparseness: the elements of the adjacency matrix must be non-negative, but the cosine value lies in [−1, 1], so we keep only the edges whose metric value ranks among the largest:

A_𝑖𝑗 = 1 if M_𝑖𝑗 ≥ Rank_{𝜀𝑛²}(M), and 0 otherwise

where Rank_{𝜀𝑛²}(M) returns the 𝜀𝑛²-th largest value in the metric matrix M.
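A rough NumPy sketch of this 𝜀-sparsification, keeping the 𝜀𝑛² largest metric values as edges (the helper name is ours; ties at the threshold may keep a few extra edges):

```python
import numpy as np

def sparsify(M, eps):
    """Binarize the metric matrix M, keeping roughly eps * n^2 edges."""
    n = M.shape[0]
    k = max(1, int(eps * n * n))           # number of edges to keep
    threshold = np.sort(M, axis=None)[-k]  # Rank_{eps*n^2}(M): k-th largest value
    return (M >= threshold).astype(float)
```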

**Interest-fusion graph convolutional layer**

In this layer, important behaviors are dynamically reinforced and noisy behaviors are weakened.

For interest fusion via attentive graph convolution, an alignment score E_𝑖𝑗 is calculated to measure the importance of each source node 𝑣_𝑗 to the target node 𝑣_𝑖:

where σ is a nonlinear activation function and Aggregate can be the mean, the sum, the max, etc. Two further attention mechanisms are added, cluster-aware and query-aware attention, and the target node's cluster score and query score are summed to obtain the updated weight of source node 𝑗 toward target node 𝑖.
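The core attentive-convolution step can be sketched as below, using a simple concatenation-based scoring vector `w_att` and tanh as the nonlinearity σ — these are illustrative choices, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_aggregate(h_i, neighbors, w_att):
    """Score each neighbor j against target node i, then take a
    softmax-weighted sum (the Aggregate step) through a nonlinearity."""
    scores = np.array([w_att @ np.concatenate([h_i, h_j]) for h_j in neighbors])
    alpha = softmax(scores)              # alignment scores E_ij, normalized
    agg = alpha @ np.array(neighbors)    # weighted-sum aggregation
    return np.tanh(h_i + agg)            # sigma = tanh (illustrative)
```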

**Interest-extraction graph pooling layer:** Each user has different preferences at different times, so a dynamic graph pooling operation is performed to adaptively reserve the dynamically activated preferences. This involves several steps: interest extraction via graph pooling, assignment regularization (same mapping and single affiliation), and graph readout. At the readout step, we have a coarsened graph G* that represents the user's strong interest signals, and we perform a weighted readout over the raw node embeddings:

where 𝛾_𝑖 is the weight score of each node before pooling.
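A minimal sketch of such a weighted readout, with `H` holding the node embeddings row-wise and `gamma` the per-node weight scores (normalizing the weights is our simplifying assumption):

```python
import numpy as np

def weighted_readout(H, gamma):
    """Graph-level representation: gamma-weighted sum of node embeddings."""
    gamma = gamma / gamma.sum()  # normalize node weights
    return gamma @ H             # shape (d,)
```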

**Prediction layer:** After pooling, the pooled graph is flattened into a reduced sequence. To model interest evolution, since user preferences change over time, the final interest representation must account for the chronological relationship between user interests. To model this flattened sequence, we use

where AUGRU is a GRU with an attentional update gate.
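A simplified single step of such an attentional update gate (the reset gate is omitted for brevity; parameter names are ours): the attention score `a` scales the update gate, so low-attention inputs barely change the hidden state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def augru_step(h_prev, x, a, W_u, U_u, W_c, U_c):
    """One AUGRU-style step: the update gate is rescaled by attention score a."""
    u = a * sigmoid(W_u @ x + U_u @ h_prev)    # attentional update gate
    h_cand = np.tanh(W_c @ x + U_c @ h_prev)   # candidate state (reset gate omitted)
    return (1 - u) * h_prev + u * h_cand
```

With `a = 0` the hidden state passes through unchanged, which is how noisy, low-attention steps are suppressed.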

**Prediction:** We take the output of the interest-evolution layer as the current user interest, together with the graph-level representation from the interest-extraction layer, and concatenate them.

We use the negative log-likelihood as the loss function, which the optimization process minimizes, with L2 regularization to avoid overfitting.

where Θ is the set of trainable parameters, 𝜆 controls the strength of the penalty, O is the training set, and |O| is the number of training instances.
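A toy sketch of this regularized objective, assuming `probs` holds the model's predicted probability of the true next item for each training instance:

```python
import numpy as np

def nll_l2_loss(probs, params, lam):
    """Mean negative log-likelihood over |O| instances plus an L2 penalty
    of strength lam on the trainable parameters."""
    nll = -np.mean(np.log(probs))
    l2 = lam * sum(np.sum(p ** 2) for p in params)
    return nll + l2
```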

**Final words**

In this article, we looked at sequential recommender systems and how they differ from traditional recommender systems, what graph convolutional networks are, and why graph convolutional networks are used to build sequential recommender systems.

**Reference**

- Jianxin Chang, Chen Gao, Yu Zheng, Yiqun Hui, Yanan Niu, Yang Song, Depeng Jin, and Yong Li. "Sequential Recommendation with Graph Neural Networks." SIGIR 2021.