#a4paper
lictvison · 20 days
Note
a4,,, a4s design is so perfect..... may i have a4s hand in marriage. or just his hand
“Oh goodness dear! I’m afraid that won’t be possible, I’m a horrible husband!”
6 notes · View notes
globaltradesposts · 4 months
Text
Tumblr media
The demand for A4 copy paper remains consistently high worldwide, driven by the ever-growing need for documentation and printing in various sectors. This demand creates a lucrative market for exporters, but it also means facing stiff competition from other players vying for market share. To thrive in this environment, exporters must adopt innovative solutions that streamline operations and enhance visibility.
0 notes
palmspapers · 8 months
Text
PALM PAPERS (Email: [email protected]) We are a manufacturer, supplier and exporter of office copy paper from Thailand. We supply multipurpose printing and copy paper in A4 (210mm x 297mm), A3, Letter and Legal sizes, in 70gsm, 75gsm and 80gsm, made from 100% wood pulp and suitable for inkjet and laser printers, photocopiers and fax machines. Brands include Double A, Xerox, IK Plus, IK Yellow, Husky, Hammermill, Navigator, Excellent, Reflex and Paperline Gold. We also manufacture all types of office copy paper under our customers' own brand name and logo, offering free OEM and customized branding of A4 copy paper, as we intend to build long-term business relationships with our customers.
SPECIFICATION:
- Size: A4/A3/Letter/Legal/A5/F14 etc.
- Substance: 80/75/70 GSM
- Brightness: 102-104% and above
- Color: White
- Whiteness: CIE 167
- Grade: Multipurpose Premium Paper
- Roughness: 140ml/min
- Thickness: 110um
- Opacity: 95%
KEY PERFORMANCE:
- No jams in photocopy machines
- No doubling; sheets stay flat after copying
- Leaves no dust in the copy machine
- White and clean, smooth and bulky
- Best results in photocopiers, laser printers, inkjet printers, fax machines, etc.
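The grammage figures above (g/m²) translate directly into sheet weight via the sheet area. A small sketch of that arithmetic; the ream example is my own illustration, not from the listing:

```python
# Weight of one sheet = grammage (g/m^2) x sheet area (m^2).
# A4 is 210 mm x 297 mm, i.e. 0.06237 m^2.

def sheet_weight_g(gsm: float, width_mm: float = 210, height_mm: float = 297) -> float:
    """Weight in grams of a single sheet of the given grammage and size."""
    area_m2 = (width_mm / 1000) * (height_mm / 1000)
    return gsm * area_m2

# One 80gsm A4 sheet weighs about 5 g, so a 500-sheet ream is roughly 2.5 kg.
print(round(sheet_weight_g(80), 2))        # 4.99
print(round(sheet_weight_g(80) * 500, 1))  # 2494.8
```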
WhatsApp: +66 83 889 3912
Website: https://palmsa4papers.com
0 notes
anielskaaniela · 8 months
Text
How to Print and Assemble Free PDF Sewing Patterns Using A4 Paper Size
In this post you will learn how to print and assemble my free PDF sewing patterns using A4 paper. Do you love sewing and want to try some free PDF sewing patterns? If you are new to sewing or just want to save some money, PDF sewing patterns are a great option. You can download them from the internet and print them at home using your own printer and paper. However, printing and assembling PDF sewing…
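When printing patterns it helps to confirm the page really is A4 (210mm x 297mm) and that the printer scale is set to 100%. PDF tools report page sizes in PostScript points, so here is a sketch of the conversion involved; the 1pt tolerance is my own assumption:

```python
# Convert millimetres to PostScript points (1 inch = 25.4 mm = 72 pt),
# the unit PDF readers report page sizes in.
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0

def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to PostScript points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH

def is_a4(width_pt: float, height_pt: float, tol_pt: float = 1.0) -> bool:
    """Check whether a page size in points matches A4 (210 x 297 mm)."""
    a4_w, a4_h = mm_to_points(210), mm_to_points(297)
    return abs(width_pt - a4_w) <= tol_pt and abs(height_pt - a4_h) <= tol_pt

print(round(mm_to_points(210), 2))  # 595.28
print(is_a4(595.28, 841.89))        # True
```

US Letter (612 x 792 pt) fails this check, which is exactly the mismatch that distorts a pattern printed on the wrong paper setting.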
0 notes
manoaod · 2 years
Photo
“Bleu et violet” (“Blue and Purple”) is an illustration created during a workshop with the illustrator Téo Transinne.
21 x 29.7 cm, 2021
0 notes
shivamray · 2 months
Text
0 notes
permaytradings · 2 years
Text
1 note · View note
elinaline · 8 days
Text
Another day another \documentclass[notitlepage,a4paper,12pt]{report}
1 note · View note
sendaikoyama · 1 year
Text
Experiment
latex
\documentclass[a4paper]{jsarticle}
\usepackage{theorem}
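A minimal compilable sketch expanding the fragment above; the theorem environment, its body, and the pLaTeX toolchain (which the Japanese jsarticle class targets) are assumptions, not from the post:

```latex
\documentclass[a4paper]{jsarticle}
\usepackage{theorem} % theorem.sty: configurable theorem-like environments
\newtheorem{thm}{Theorem} % hypothetical environment for illustration
\begin{document}
\begin{thm}
Placeholder statement, not from the original post.
\end{thm}
\end{document}
```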
0 notes
olehswift4 · 2 years
Text
\documentclass[3p]{elsarticle}
%\documentclass[10pt,a4paper]{article}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{xparse}
\RequirePackage{etex}
\typeout{FONTSPEC LOADING}
\usepackage{savesym}
\usepackage{amsthm}
\usepackage{enumitem}
\usepackage{epsfig}
\usepackage[lofdepth,lotdepth]{subfig}
\usepackage{url}
\usepackage[noend]{algorithmic}
\usepackage{xcolor}
\usepackage{amsmath, amssymb}
\usepackage[ruled]{algorithm}
\usepackage[export]{adjustbox}
\usepackage{pstricks}
\usepackage{pst-all}
\usepackage{pst-plot}
\usepackage{pst-func,pst-math}
\usepackage{pgfplots}
\usepackage{tikz}
\usepackage{xltxtra}
\usepackage{bbm}
\usepackage{mathtools}
\newgray{shadecolor}{0.85}
\definecolor{shadecolor}{gray}{0.85}
\newgray{graylight}{0.75}
\newgray{grayplus}{0.35}
\newgray{graydark}{0.1}
\def\algorithmicrequire{\textbf{Input:}}
\def\algorithmicensure{\textbf{Output:}}
\def\algorithmicif{\textbf{if}}
\def\algorithmicthen{\textbf{then}}
\def\algorithmicelse{\textbf{else}}
\def\algorithmicelsif{\textbf{else if}}
\def\algorithmicfor{\textbf{for}}
\def\algorithmicforall{\textbf{for all}}
\def\algorithmicdo{}
\def\algorithmicwhile{\textbf{while}}
\def\algorithmicrepeat{\textbf{repeat}}
\def\algorithmicuntil{\textbf{until}}
\def\algorithmicloop{\textbf{loop}}
\newcommand{\MovieL}{\textsc{MovieLens}}
\newcommand{\NetF}{\textsc{Netflix}}
\newcommand{\RecS}{\textsc{RecSys 2016}}
\newcommand{\Out}{\textsc{Outbrain}}
\newcommand{\ML}{\textsc{ML}}
\newcommand{\userS}{\mathcal{U}}
\newcommand{\itemS}{\mathcal{I}}
\newcommand{\vecU}{\mathbf{U}}
\newcommand{\vecI}{\mathbf{V}}
\newcommand{\MAP}{\texttt{MAP}}
\newcommand{\NDCG}{\texttt{NDCG}}
\newcommand{\RecNet}{\texttt{NERvE}}
\newcommand{\RecNetE}{{\RecNet}$_{E\!\!\backslash}$}
\newcommand{\MostPop}{\texttt{MostPop}}
\newcommand{\BPR}{\texttt{BPR-MF}}
\newcommand{\CoFactor}{\texttt{Co-Factor}}
\newcommand{\LightFM}{\texttt{LightFM}}
\newcommand{\Loss}{\mathcal{L}}
\newcommand{\Trn}{\mathcal{S}}
\newcommand{\D}{\mathcal D}
\newcommand{\EE}{\mathbb E}
\newcommand{\Ind}{\mathbbm{1}}
\newcommand{\N}{\mathbb N}
\newcommand{\Input}{\mathcal X}
\newcommand{\R}{\mathbb R}
\newcommand{\prefu}{\renewcommand\arraystretch{.2} \begin{array}{c}
{\succ} \\ \mbox{{\tiny {\it u}}}
\end{array}\renewcommand\arraystretch{1}}
\newcommand{\graph}{\Omega}
\newcommand{\graphH}{\mathcal H}
\newcommand{\vertices}{\mathcal V}
\newcommand{\edges}{\mathcal E}
\newcommand{\Cset}{\mathcal M}
\newcommand{\Weight}{W}
\newcommand{\cover}{\mathcal C}
\newcommand{\Xset}{\mathcal X}
\newcommand{\covers}{{\mathcal K}}
\newcommand{\bfZ}{\mathbf{z}}
\newcommand{\rademacher}{\mathfrak{R}}
\newcommand{\DA}{^\downarrow}
\newcommand{\kasandr}{\textsc{Kasandr}}
\newcommand{\pandor}{\textsc{Pandor}}
\newcommand{\cmmnt}[1]{}
\newcommand{\MRA}[1]{{\color{black}#1}}
\newtheorem{theorem}{Theorem}
\newtheorem{theoremAp}{Theorem}
\newtheorem{definition}{Definition}
%\let\today\relax
\usepackage{etoolbox}
\makeatletter
\patchcmd{\ps@pprintTitle}{\footnotesize\itshape
Preprint submitted to \ifx\@journal\@empty Elsevier
\else\@journal\fi\hfill\today}{\relax}{}{}
\makeatother
\renewcommand*{\today}{October 10, 2020}
\journal{Journal Data Mining and Knowledge Discovery}
\bibliographystyle{elsarticle-num}
%%%%%%%%%%%%%%%%%%%%%%%
\pgfplotsset{compat=1.15}
\begin{document}
\begin{frontmatter}
\title{User Preference and Embedding Learning with Implicit Feedback for Recommender Systems}
\author[addressone]{Sumit Sidana}
\author[addresstwo]{Mikhail Trofimov}
\author[addressthree]{Oleh Horodnytskyi}
\author[addressone]{Charlotte Laclau}
\author[addressthree,addressfour]{Yury Maximov}
\author[addressone]{Massih-Reza Amini}
\cortext[mycorrespondingauthor]{Corresponding Author: Massih-Reza Amini}
\address[addressone]{University Grenoble Alpes CNRS/LIG, France}
\address[addresstwo]{Federal Research Center ``Computer Science and Control'' of the Russian Academy of Sciences}
\address[addressthree]{Skolkovo Institute of Science and Technology, Russia}
\address[addressfour]{Theoretical Division T-5/CNLS, Los Alamos National Laboratory, USA}
\begin{abstract}
In this paper, we propose a novel ranking framework for collaborative filtering with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show the minimization problem involves dependent random variables and provide a theoretical analysis by proving the consistency of the empirical risk minimization in the worst case where all users choose a minimal number of positive and negative items. We further derive a Neural-Network model that jointly learns a new representation of users and items in an embedded space as well as the preference relation of users over the pairs of items. The learning objective is based on three scenarios of ranking losses that control the ability of the model to maintain the ordering over the items induced from the users' preferences, as well as, the capacity of the dot-product defined in the learned embedded space to produce the ordering. The proposed model is by nature suitable for implicit feedback and involves the estimation of only very few parameters. Through extensive experiments on several real-world benchmarks on implicit data, we show the interest of learning the preference and the embedding simultaneously when compared to learning those separately. We also demonstrate that our approach is very competitive with the best state-of-the-art collaborative filtering techniques proposed for implicit feedback.
\end{abstract}
%\begin{keyword}
%\texttt{elsarticle.cls}\sep \LaTeX\sep Elsevier \sep template
%\MSC[2010] 00-01\sep 99-00
%\end{keyword}
\end{frontmatter}
%\linenumbers
\section{Introduction}
In recent years, recommender systems (RS) have attracted a lot of interest in both industry and academic research communities, mainly due to the new challenges that the design of a decisive and efficient RS presents. Given a set of customers (or users), the goal of an RS is to provide personalized recommendations of products that are likely to be of interest to each user. Typical examples of applications include the recommendation of movies (Netflix, Amazon Prime Video), music (Pandora), videos (YouTube), news content (Outbrain) or advertisements (Google). The development of an efficient RS is critical from both the company's and the consumer's perspective. On the one hand, users usually face a huge number of options: for instance, Amazon proposes over 20,000 movies in its selection, and it is therefore essential to help them make the best possible decision by narrowing down the choices they have to make. On the other hand, major companies report significant increases in traffic and sales coming from personalized recommendations: Amazon declares that recommendations generate $35\%$ of its sales, two-thirds of the movies watched on Netflix are recommended, and $28\%$ of ChoiceStream users said that they would buy more music, provided that the recommendations met their tastes and interests \footnote{Talk of Xavier Amatriain - Recommender Systems - Machine Learning Summer School 2014 @ CMU.}.
\smallskip
%Pazzani:2007:CRS:1768197.1768209
Two main approaches have been proposed to tackle this problem \cite{Ricci:2010:RSH:1941884}. The first one, referred to as the Content-Based recommendation technique \cite{reference/rsh/LopsGS11}, makes use of existing contextual information about the users (e.g. demographic information) or items (e.g. textual description) for recommendation. The second approach, referred to as collaborative filtering (CF) and undoubtedly the most popular one, relies on past interactions and recommends items to users based on the feedback provided by other, similar users. Feedback can be {\it explicit}, in the form of ratings; or {\it implicit}, which includes clicks, browsing over an item or listening to a song. Implicit feedback is readily available in abundance but is more challenging to take into account, as it does not clearly depict the preference of a user for an item. Explicit feedback, on the other hand, is very hard to get in abundance. The adaptation of CF systems designed for another type of feedback has been shown to be sub-optimal, as the basic hypothesis of these systems inherently depends on the nature of the feedback \cite{ir2004010}. Further, learning a suitable representation of users and items has been shown to be the bottleneck of these systems \cite{DBLP:conf/kdd/WangWY15}, mostly in cases where contextual information over users and items, which would allow a richer representation, is unavailable.
In this paper, we are interested in learning user preferences in RS, mostly provided in the form of implicit feedback. Our aim is twofold and concerns:
\begin{enumerate}
\item the development of a theoretical framework for learning user preference in recommender systems that justifies the learnability of pairwise ranking models proposed until now for this task, and its analysis in the worst case where all users provide a minimum of positive/negative feedback;
\item the design of a new neural-network model based on this framework that jointly learns the preference of users over pairs of items and their representations in an embedded space without requiring any contextual information.
\end{enumerate}
We extensively validate our proposed approach over five publicly available benchmarks with implicit feedback by comparing it to state-of-the-art models.
The remainder of this paper is organized as follows. In Section \ref{sec:sim}, we provide an overview of existing related methods. Then, Section~\ref{sec:model} defines the notations and the proposed framework and analyzes its theoretical properties.
Section~\ref{sec:experiment} is devoted to numerical experiments on five real-world benchmark datasets, including binarized versions of MovieLens and Netflix and two real datasets on online advertising. We compare different versions of our model with state-of-the-art methods, showing the appropriateness of our contribution. Finally, we summarize the study and give possible future research perspectives in Section~\ref{sec:conclusion}.
\section{State-of-the-art}\label{sec:sim}
This section provides an overview of the state-of-the-art approaches that are the most similar to ours.
\subsection{Neural Language Models}
Neural language models have proven successful in many natural language processing tasks, including speech recognition, information retrieval, and sentiment analysis. These models are based on the distributional hypothesis, stating that words occurring in similar contexts are similar. To capture such similarities, these approaches embed the word distribution into a low-dimensional continuous space using Neural Networks, leading to the development of several powerful and highly scalable language models such as the word2vec Skip-Gram (SG) model \cite{word_emb,mikolov_13}.
%shazeer2016swivel
\medskip
The recent work of \cite{levy_14} has shown new opportunities to extend the word representation learning to characterize more complicated pieces of information. This paper established the equivalence between the SG model with a negative sampling and implicitly factorizing point-wise mutual information (PMI) matrix. Further, they demonstrated that word embedding could be applied to different types of data, provided that it is possible to design an appropriate context matrix for them. This idea has been successfully applied to recommendation systems where different approaches attempted to learn representations of items and users in an embedded space to meet the problem of recommendation more efficiently \cite{guardia_15,liang_16,DBLP:conf/kdd/GrbovicRDBSBS15}.
% Covington:2016:DNN:2959100.2959190, He:2017:NCF:3038912.3052569
\medskip
In \cite{He:2017:NCF:3038912.3052569}, the authors used a bag-of-words vector representation of items and users, from which the latent representations of the latter are learned through word2vec.
\cite{liang_16} proposed a model that relies on the intuitive idea that the pairs of items which are scored in the same way by different users are similar. The approach reduces to finding both the latent representations of users and items, with the traditional Matrix Factorization (MF) approach, and simultaneously learning item embeddings using a co-occurrence shifted positive PMI (SPPMI) matrix defined by items and their context. The latter is used as a regularization term in the traditional objective function of MF. Similarly, in \cite{DBLP:conf/kdd/GrbovicRDBSBS15} the authors proposed Prod2Vec, which embeds items using a Neural-Network language model applied to a time series of user purchases. This model was further extended in \cite{vasile_16} who, by defining appropriate context matrices, proposed a new model called Meta-Prod2Vec. Their approach learns a representation of both items and side information available in the system. The embedding of additional information is further used to regularize the item embedding.
Inspired by the concept of word sequences, the approach proposed by \cite{guardia_15} defines the consumption of items by users as trajectories. The embedding of items is then learned using the SG model, and the users' embeddings are further used to predict the next item in the trajectory. In these approaches, the learned item and user representations are used for prediction with predefined or fixed similarity functions (such as dot products) in the embedded space.
Although learning user and item embeddings seems unavoidable in RS, as the interaction between users and items is generally the only available source of information characterizing them, many studies have pointed out that the second ingredient making an RS efficient is the learning of user preferences.
\subsection{Learning-to-Rank based Neural Networks for Recommender systems}
Initially developed for Information Retrieval (IR) tasks, with the aim of automatically tuning the parameters involved in combining different scoring functions, Learning-to-Rank approaches are grouped into three main categories: pointwise, listwise and pairwise \cite{Liu:2009}.
\medskip
Pointwise approaches \cite{Crammer01,Li08} assume that each query-document pair has an ordinal score. Ranking is then stated as a regression problem, in which the rank value of each document is estimated as an absolute quantity. In the case where relevance judgments are given as pairwise preferences (rather than relevance degrees), it is usually not straightforward to apply these algorithms. Moreover, pointwise techniques do not consider the inter-dependency among documents, so the position of documents in the final ranked list is missing from the regression-like loss functions used for parameter tuning. On the other hand, listwise approaches \cite{Shi:2010,Xu07,Xu08} take the entire ranked list of documents for each query as a training instance. As a direct consequence, these approaches are able to differentiate documents from different queries and to consider their position in the output ranked list at the training stage. Listwise techniques aim to optimize a ranking measure directly, so they generally face a complex optimization problem dealing with non-convex, non-differentiable and discontinuous functions. Finally, in pairwise approaches \cite{Cohen99,Freund03,Joachims02,PessiotTUAG07}, the ranked list is decomposed into a set of document pairs. Ranking is therefore considered as the classification of pairs of documents, such that a classifier is trained by minimizing the number of misorderings in the ranking. In the test phase, the classifier assigns a positive or negative class label to a document pair, indicating which of the two documents should be ranked higher.
\smallskip
Perhaps the first Neural Network model for ranking is RankProp, initially proposed by \cite{Caruana:1995}. RankProp is a pointwise approach that alternates between two phases: learning the desired real outputs by minimizing a Mean Squared Error (MSE) objective, and modifying the desired values themselves to reflect the current ranking given by the net. Later on, \cite{DBLP:conf/icml/BurgesSRLDHH05} proposed RankNet, a pairwise approach that learns a preference function by minimizing a cross-entropy cost over pairs of relevant and irrelevant examples. SortNet, proposed by \cite{DBLP:conf/icann/RigutiniPMB08,DBLP:journals/tnn/RigutiniPMS11}, also learns a preference function by minimizing a ranking loss over pairs of examples that are selected iteratively, with the overall aim of maximizing the quality of the ranking. The three approaches above consider the problem of Learning-to-Rank for IR without learning an embedding.
\MRA{The architecture of the proposed approach bears similarity to the wide and deep learning model, which also has two components \cite{Chen16}. As in our case, the wide part is a linear model that learns dense embeddings of the users and items, and is said to take care of memorization; its prediction function is the dot product of the learned weights with the user and item embeddings. The deep part is a fully connected deep neural network and takes care of generalization. As in our case, training takes place jointly.
The key difference, though, lies in the prediction function: in wide and deep learning, the goal is to predict a score (as in pointwise ranking), which is very different from the pairwise learning-to-rank function in our model. In recent years, most of the focus has been put on the development of pairwise approaches, as they have been shown to be more efficient than pointwise approaches for recommender systems.}
Perhaps the closest work to ours is \cite{HeLZNHC17}, in which the authors learn efficient user and item embeddings; the respective embeddings are then concatenated and passed through a stack of fully connected layers. However, no loss focusing on the quality of the representations is employed. In \cite{ZhangYHW16}, the authors develop deep learning techniques for textual data, particularly reviews, and optimize using rating-based feedback; this is not applicable to implicit feedback such as clicks. \cite{Zhang:2017:JRL:3132847.3132892} use heterogeneous sources such as review text, item images and ratings in order to learn a good representation of the item for top-$n$ recommendation. While very effective, the model does require such heterogeneous sources to be available. Finally, \cite{ChengKHSCAACCIA16} use wide and deep learning to combine the benefits of memorization and generalization for recommender systems.
In this study we tackle the consistency of the empirical risk minimization principle used to learn pairwise ranking models for RS and propose a Neural Network model, with a composite objective loss, which jointly influences the learning of the embeddings and the scoring function.
%\subsection{Deep Learning for Recommendation Systems}
%In this work, we propose to conduct learning-to-rank (L2R) and representation learning (RL) in a unified framework, where the embedded representations of users, items and the preferences of users over pairs of items are learned simultaneously.
\section{Framework and Model}\label{sec:model}
We denote by $\userS\subseteq \N$ (resp. $\itemS\subseteq \N$) the set of indexes over users (resp. the set of indexes over items). %Further, we suppose that users and items are represented in the same input space of dimension $k$, $\Input\subseteq \R^k$. We propose to learn this representation space using an embedded ranking algorithm (Section \ref{sec:RecNetModel}). For each user (resp. item) index $u\in\userS$ (resp. $i\in\itemS$), we then denote by $\vecU_u\in\Input$ (resp. $\vecI_i\in\Input$) its corresponding vector representation in the input space.
Further, for each user $u\in\userS$, we consider two subsets of items $\itemS^-_u\subset \itemS$ and $\itemS^+_u\subset \itemS$ such that:
\begin{itemize}
\item[$i)$] $\itemS^-_u\neq \varnothing$ and $\itemS^+_u \neq \varnothing$,
\item[$ii)$] for any pair of items $(i,i')\in\itemS^+_u\times \itemS^-_u$, $u$ has a preference, symbolized by~\!\!$\prefu$\!\!. Hence, $i\!\prefu\! i'$ implies that user $u$ prefers item $i$ over item $i'$.
\end{itemize}
From this preference relation, a desired output $y_{i,u,i'}\in\{-1,+1\}$ is defined over each triplet $(i,u,i')\in\itemS^+_u\times\userS\times\itemS^-_u$ as:
\begin{equation}
y_{i,u,i'}= \left\{
\begin{array}{ll}
1 & \mbox{if } i\!\prefu\! i', \\
-1 & \mbox{otherwise.}
\end{array}
\right.
\label{eq:Preference}
\end{equation}
\subsection{Learning objective}
Following \cite{rendle_09}, we consider the learning task that consists in finding a scoring function $f$ from a class of functions $\mathcal F=\{f\mid f: \itemS\times\userS\times \itemS\rightarrow \R\}$ that minimizes the ranking loss:
\begin{equation}
\label{eq:PrefObj}
\Loss(f)=\EE\left[\frac{1}{|\itemS^+_u||\itemS^-_u|}\sum_{i\in\itemS^+_u}\sum_{i'\in\itemS^-_u}\Ind_{y_{i,u,i'}f(i,u,i')<0}\right],
\end{equation}
%
where $|\cdot|$ denotes the cardinality of sets and $\Ind_{\pi}$ is the indicator function, equal to $1$ if the predicate $\pi$ is true and $0$ otherwise. Many approaches tackled this problem by proposing to learn a mapping function $\Phi:\userS\times\itemS\rightarrow \Input\subseteq \mathbb{R}^k$ that projects a pair of user and item indices into a feature space of dimension $k$, and a function $g:\mathcal X\times \mathcal X\rightarrow \R$, such that each function $f\in\mathcal F$ can be decomposed as:
\begin{equation*}
%\label{eq:deff}
\forall u\in\userS, (i,i')\in\itemS^+_u\times \itemS^-_u,~ f(i,u,i')=g(\Phi(u,i))-g(\Phi(u,i')).
\end{equation*}
The previous loss \eqref{eq:PrefObj} is a pairwise ranking loss, and it is related to the Area under the ROC curve \cite{Usunier:1121}. The learning objective is, hence, to find a function $f$ from the class of functions $\mathcal F$ with a small expected risk, by minimizing the empirical error over a training set
\[
S=\{(\bfZ_{i,u,i'}\doteq(i,u,i'),y_{i,u,i'})\mid u\in\userS, (i,i')\in\itemS^+_u\times \itemS^-_u\},
\]
constituted over $N$ users, $\userS=\{1,\ldots,N\}$, and their respective preferences over $M$ items, $\itemS=\{1,\ldots,M\}$ and is given by:
\begin{align}
\hat\Loss(f,S)&=\frac{1}{N}\sum_{u\in\userS}\frac{1}{|\itemS^+_u||\itemS^-_u|}\sum_{i\in\itemS^+_u}\sum_{i'\in\itemS^-_u} \Ind_{y_{i,u,i'}\left(f(i,u,i')\right)<0} \nonumber\\
&=\frac{1}{N}\sum_{u\in\userS}\frac{1}{|\itemS^+_u||\itemS^-_u|}\sum_{i\in\itemS^+_u}\sum_{i'\in\itemS^-_u} \Ind_{y_{i,u,i'}\left(g(\Phi(u,i))-g(\Phi(u,i'))\right)<0}.\label{eq:EmpRisk}
\end{align}
\MRA{The pairwise ranking loss \eqref{eq:EmpRisk} is equivalent to a classification loss over the pairs of examples. By this, the aim of the prediction function is not to predict the score ($+1$, relevant; or $-1$, irrelevant), but rather to preserve the relative order of preferences between two ratings given by the same user.} Different studies have shown the efficiency of jointly learning adapted user and item representations as well as the scoring function $g$ \cite{rendle_09}. However, this minimization problem involves dependent random variables since, for each user $u$ and item $i$, all comparisons $g(\Phi(u,i))-g(\Phi(u,i')); i'\in~\itemS^-_u$ involved in the empirical error \eqref{eq:EmpRisk} share the same observation $\Phi(u,i)$.
To the best of our knowledge, no study has considered the consistency of the empirical risk minimization principle that is generally used for this task. To tackle this problem, we build on \cite{Amini:15} and derive generalization error bounds for bounding the error \eqref{eq:PrefObj} with respect to \eqref{eq:EmpRisk}. The idea is based on graph coloring, introduced by \cite{Janson04RSA}, which consists in dividing a graph $\graph=(\vertices,\edges)$, whose nodes $\vertices$ represent the dependent variables, into $J$ sets of {\em independent} variables, called the exact proper fractional cover of $\graph$ and defined as:
\begin{definition}[Exact proper fractional cover of $\graph$, \cite{Janson04RSA}]
\label{def:chromatic}
Let $\graph=(\vertices,\edges)$ be a
graph. $\cover=\{(\Cset_j,\omega_j)\}_{j\in\{1,\ldots,J\}}$, for some positive integer $J$, with
$\Cset_j\subseteq\vertices$ and $\omega_j\in [0,1]$ is an exact proper
fractional cover of $\graph$, if:
i) it is {\em proper:} $\forall j,$ $\Cset_j$ is an {\em independent set}, i.e., there are no connections between vertices in~$\Cset_j$;
ii) it is an {\em exact fractional cover} of $\graph$: $\forall
v\in\vertices,\;\sum_{j:v\in\Cset_j}\omega_j= 1$.
\end{definition}
The weight $\Weight(\cover)$ of $\cover$ is given by: $\Weight(\cover)\doteq\sum_{j=1}^J\omega_j$ and the
minimum weight $\chi^*(\graph)=\min_{\cover\in\covers(\graph)} \Weight(\cover)$ over the set $\covers(\graph)$ of all exact proper fractional covers of $\graph$ is the {\em fractional chromatic number} of $\graph$.
Figure \ref{fig:ProperCover} depicts an exact proper fractional cover corresponding to the problem we consider, for a toy problem with $M=5$ items and a user $u$ preferring $|\itemS^+_u|=2$ items over $|\itemS^-_u|=3$ other ones. In this case, the nodes of the dependency graph correspond to the $6$ pairs involved in the empirical loss \eqref{eq:EmpRisk}: the pairs constituted by the user and each of the preferred items, together with the pairs constituted by the user and each of the non-preferred items. Among all the sets containing independent pairs of examples, the one shown in Figure \ref{fig:ProperCover}~$(c)$ is the exact proper fractional cover of $\graph$, and the fractional chromatic number is, in this case, $\chi^*(\graph)=|\itemS^-_u|=3$.
\begin{figure*}[t!]
\begin{center}
\includegraphics[width=.95\textwidth]{FncTrnsf3.pdf}
\end{center}
\caption{A toy problem with 1 user who prefers $|\itemS_u^+|=2$ items over $|\itemS_u^-|=3$ other ones (top). The dyadic representation of pairs constituted with the representation of the user and each of the representations of preferred and non-preferred items (middle). A different covering of the dependent set, $(a)$ and $(b)$; as well as the exact proper fractional cover, $(c)$, corresponding to the smallest disjoint sets containing independent pairs.}
\label{fig:ProperCover}
\end{figure*}
By combining the idea of graph coloring with the Laplace transform, \cite{Janson04RSA} proposed Hoeffding-like concentration inequalities for sums of dependent random variables. In \cite{UsunierAG05}, this result is extended to provide a generalization of the bounded differences inequality of \cite{mcdiarmid89method} to the case of interdependent random variables. This extension then paved the way for the definition of the {\em fractional Rademacher complexity}, which generalizes the idea of Rademacher complexity and allows one to derive generalization bounds for scenarios where the training data are interdependent.
In the worst-case scenario, where all users provide the fewest interactions over the items (which constitutes the bottleneck of all recommender systems):
\[
\forall u\in S, |\itemS^-_u|=n_*^-=\mathop{\min}_{u'\in S} |\itemS^-_{u'}| \text{,~and~~} |\itemS^+_u|=n_*^+=\mathop{\min}_{u'\in S} |\itemS^+_{u'}|,
\]
\noindent then, since $|\itemS^+_u||\itemS^-_u|\geq n_*^+n_*^-$ for every $u$, the empirical loss \eqref{eq:EmpRisk} is upper-bounded by:
\begin{equation}
\label{eq:EmpRisk2}
\hat\Loss(f,S)\le \hat\Loss_*(f,S)= \frac{1}{N}\frac{1}{n_*^- n_*^+}\sum_{u\in\userS}\sum_{i\in\itemS^+_u}\sum_{i'\in\itemS^-_u} \Ind_{y_{i,u,i'}f(i,u,i')<0}.
\end{equation}
Following \cite[Proposition 4]{RalaiAmin15}, a generalization error bound can be derived for the second term of the inequality above based on local Rademacher complexities, which involve second-order (i.e., variance) information and induce faster convergence rates.
For the sake of presentation, and in order to be in line with the learning of user and item representations in an embedded space introduced in Section \ref{sec:RecNetModel}, let us consider kernel-based hypotheses with $\kappa:\Input\times\Input\rightarrow\mathbb{R}$ a {\em positive semi-definite} (PSD) kernel and $\Phi:\userS\times\itemS \rightarrow \Input$ its associated feature mapping function. Further, we consider linear functions in the feature space with bounded norm:
\begin{equation*}
\mathcal G_B=\{g_{\boldsymbol{w}}\circ \Phi: (u,i)\in \userS\times\itemS \mapsto \langle \boldsymbol{w},\Phi(u,i)\rangle \mid ||\boldsymbol{w}|| \leq B\}
\end{equation*}
where $\boldsymbol{w}$ is the weight vector defining the kernel-based hypotheses and $\langle \cdot,\cdot\rangle$ denotes the dot product. We further define the following associated function class:
\begin{equation*}
\mathcal{F}_B=\{\bfZ_{i,u,i'}\doteq(i,u,i')\mapsto g_{\boldsymbol{w}}(\Phi(u,i))-g_{\boldsymbol{w}}(\Phi(u,i'))\mid g_{\boldsymbol{w}}\in \mathcal G_B\},
\end{equation*}
and the parameterized family $\mathcal{F}_{B,r}$ which, for $r>0$, is defined as:
\[
\mathcal{F}_{B,r} =
\{f:f\in\mathcal{F}_B,\mathbb{V}[f]\doteq\mathbb{V}_{\bfZ,y}[\Ind_{y f(\bfZ)<0}]\leq r\},
\]
where $\mathbb{V}[.]$ denotes the variance.
%
The fractional Rademacher complexity introduced in \cite{UsunierAG05} underpins our analysis:
\[
\rademacher_{S}(\mathcal{F})=\frac{2}{m}\mathbb{E}_{\xi}\sum_{j=1}^{n_*^-}\mathbb{E}_{\Cset_j}\sup_{f\in\mathcal{F}}\sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}}\xi_\alpha f(\bfZ_\alpha),
\]
where $m=N\times n_*^+\times n_*^-$ is the total number of triplets $\bfZ$ in the training set and $(\xi_i)_{i=1}^m$ is a sequence of
independent Rademacher variables verifying
$\mathbb{P}(\xi_i=1)=\mathbb{P}(\xi_i=-1)=\frac{1}{2}$.
\bigskip
\begin{theorem}
\label{thm:WorseCaseRecNet}
Let $\userS$ be a set of $N$ independent users, such that each user $u \in \userS$ prefers $n_*^+$ items over $n_*^-$ ones in a predefined set $\itemS$ of items. Let $S=\{(\bfZ_{i,u,i'}\doteq(i,u,i'),y_{i,u,i'})\mid u\in\userS, (i,i')\in\itemS^+_u\times \itemS^-_u\}$ be the associated training set; then, for any $\delta\in(0,1)$, the following generalization bound holds for all $f\in \mathcal{F}_{B,r}$ with probability at least $1-\delta$:
\begin{equation*}
\Loss(f)\le\hat\Loss_*(f,S) + \frac{2B\mathfrak{C}(S)}{Nn^+_*}+ \frac{5}{2}\left(\sqrt{\frac{2B\mathfrak{C}(S)}{Nn^+_*}}+\sqrt{\frac{r}{2}}\right)\sqrt{\frac{\log\frac{1}{\delta}}{n_*^+}}+\frac{25}{48}\frac{\log\frac{1}{\delta}}{n_*^+},
\end{equation*}
where $\mathfrak{C}(S)=\sqrt{ \frac{1}{n^-_*}\sum_{j=1}^{n_*^-}\mathbb{E}_{\Cset_j}\left[ \sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}}d(\bfZ_\alpha,\bfZ_{\alpha})\right]}$, $\bfZ_\alpha=(i_\alpha,u_\alpha,i'_\alpha)$
and also
$d(\bfZ_\alpha,\bfZ_{\alpha}) = \kappa(\phi,\phi) + \kappa(\phi',\phi') - 2\kappa(\phi,\phi')$ with $\phi = \Phi(u_\alpha,i_\alpha)$, $\phi' = \Phi(u_\alpha,i_\alpha')$.
\end{theorem}
The proof is given in the Appendix.
\bigskip
This result suggests that~:
\begin{itemize}
\item even though the training set $S$ contains interdependent observations, following \cite[Theorem 2.1, p. 38]{vapnik2000nature}, Theorem \ref{thm:WorseCaseRecNet} gives insight into the consistency of the empirical risk minimization principle with respect to the minimization of (Eq. \ref{eq:EmpRisk2}),
\MRA{\item in the case where the feature space $\Input\subseteq \mathbb{R}^k$ is of finite dimension, lower values of $k$ involve smaller kernel estimates and hence a smaller complexity term $\mathfrak{C}(S)$, which implies a tighter generalization bound.}
\end{itemize}
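The per-triplet quantity $d(\bfZ_\alpha,\bfZ_\alpha)$ appearing in the complexity term is the squared kernel-induced distance between the two pair representations $\Phi(u_\alpha,i_\alpha)$ and $\Phi(u_\alpha,i'_\alpha)$. A minimal sketch (the linear kernel is chosen purely for illustration; any PSD kernel can be plugged in):

```python
# d(Z, Z) = k(phi, phi) + k(phi', phi') - 2 k(phi, phi'), with phi = Phi(u, i)
# and phi' = Phi(u, i').  The linear kernel below is an illustrative choice.

def triplet_distance(phi, phi_prime, kappa):
    return kappa(phi, phi) + kappa(phi_prime, phi_prime) - 2 * kappa(phi, phi_prime)

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

phi = [1.0, 2.0]        # Phi(u, i)
phi_prime = [3.0, 0.0]  # Phi(u, i')
d = triplet_distance(phi, phi_prime, linear_kernel)
# For the linear kernel, d reduces to the squared Euclidean distance
# ||phi - phi'||^2 between the two pair representations.
```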
\subsection{A Neural Network model to learn user preference}
\label{sec:RecNetModel}
In this section we present a neural network, denoted {\RecNet}, that jointly learns the embedding representation $\Phi(.)$ and the scoring function $f(.)$ defined in the previous section. The input of the network is a triplet $(i,u,i')$ composed of the indexes of an item $i$, a user $u$ and a second item $i'$, such that the user $u$ has a preference over the pair of items $(i, i')$ expressed by the desired output $y_{i,u,i'}$, defined with respect to the preference relation $\!\prefu\!$ (Eq.~\ref{eq:Preference}). Each index in the triplet is then transformed into a corresponding binary indicator vector $\mathbf{i}, \mathbf{u},$ and $\mathbf{i}'$, with all components equal to $0$ except the one indicating the position of the user or item in its respective set, which is equal to $1$. Hence, the following one-hot vector corresponds to the binary vector representation of user $u\in\userS$:
\begin{center}
\begin{pspicture}(-1.5,-0.5)(6,0.5)
\rput(0.65,0.25){\tiny{$1$}}
\rput(0.65,-0.05){$\downarrow$}
\rput(1.3,-0){$\ldots$}
\rput(1.9,0.25){\tiny{$u\!-\!\!1$}}
\rput(1.9,-0.05){$\downarrow$}
\rput(2.55,0.25){\tiny{$u$}}
\rput(2.55,-0.05){$\downarrow$}
\rput(3.1,0.25){\tiny{$u\!+\!\!1$}}
\rput(3.1,-0.05){$\downarrow$}
\rput(3.65,0.){$\ldots$}
\rput(4.38,0.25){\tiny{$N$}}
\rput(4.38,-0.05){$\downarrow$}
\rput(2.1,-0.5){$\mathbf{u}^\top=(0,~\ldots,~0,~~~1,~~0,~\ldots,~0).$}
\label{uvector}
\end{pspicture}
\end{center}
\bigskip
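An indicator vector like the one above can be sketched as follows (a hypothetical helper, using 1-based positions as in the figure):

```python
def one_hot(index, size):
    """Binary indicator vector with a single 1 at position `index` (1-based),
    as used for the user and item inputs of the network."""
    v = [0] * size
    v[index - 1] = 1
    return v

u_vec = one_hot(3, 5)  # user u = 3 among N = 5 users
```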
The network then comprises three successive layers, namely the {\it Embedding}, {\it Mapping} and {\it Dense} hidden layers depicted in Figure \ref{fig:recnet}.
\begin{figure*}[!ht]
\begin{center}
\input{RecNet.tex}
\end{center}
\caption{The architecture of {\RecNet} trained to reflect the preference of a user $u$ over a pair of items $i$ and $i'$. }
\label{fig:recnet}
\end{figure*}
\begin{itemize}
\item The {\it Embedding} layer transforms the sparse binary representations of the user and each of the items to denser real-valued vectors. We denote by $\vecU_u$ and $\vecI_i$ the transformed vectors of user $u$ and item $i$; and $\mathbf{\mathbb{U}}=(\vecU_u)_{u\in\userS}$ and $\mathbf{\mathbb{V}}=(\vecI_i)_{i\in\itemS}$ the corresponding matrices. Note that as the binary indicator vectors of users and items contain one single non-null characteristic, each entry of the corresponding dense vector in the {\it Embedding} layer is connected by only one weight to that characteristic.
\item The {\it Mapping} layer is composed of two groups of units, each obtained from the element-wise product between the representation vector $\vecU_u$ of a user $u$ and the representation vector $\vecI_i$ of an item $i$, inducing the feature representation $\Phi(u,i)$ of the pair $(u,i)$.
\item Each of these units is also fully connected to the units of a {\it Dense} layer composed of successive hidden layers (see Section \ref{sec:experiment} for more details related to the number of hidden units and the activation function used in this layer).
\end{itemize}
The model is trained such that the output of each of the dense layers reflects the relationship between the corresponding item and the user; it is mathematically defined by a multivariate real-valued function $g(.)$.
Hence, for an input $(i,u,i')$, the output of each of the dense layers is a real-valued score that reflects a preference associated with the corresponding pair $(u,i)$ or $(u,i')$ (i.e. $g(\Phi(u,i))$ or $g(\Phi(u,i'))$). Finally, the prediction given by {\RecNet} for an input $(i,u,i')$ is:
\begin{equation}
\label{eq:defF}
f(i,u,i')=g(\Phi(u,i))-g(\Phi(u,i')).
\end{equation}
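The forward pass of Eq.~\eqref{eq:defF} can be sketched as follows (a simplified NumPy illustration with random toy weights; the actual model is the TensorFlow implementation used in Section \ref{sec:experiment}):

```python
# Simplified RecNet forward pass: embedding lookup, element-wise product
# (the Mapping layer), one ReLU dense layer g, and the pairwise output f.
# All weights are random toy values, not learned parameters.
import numpy as np

rng = np.random.default_rng(0)
k, n_users, n_items, n_hidden = 4, 3, 5, 8
U = rng.normal(size=(n_users, k))    # user embedding matrix
V = rng.normal(size=(n_items, k))    # item embedding matrix
W1 = rng.normal(size=(k, n_hidden))  # dense layer weights
w2 = rng.normal(size=n_hidden)       # linear output weights

def phi(u, i):
    """Mapping layer: element-wise product of user and item embeddings."""
    return U[u] * V[i]

def g(x):
    """Dense layer with ReLU activation followed by a linear output."""
    return float(np.maximum(x @ W1, 0.0) @ w2)

def f(i, u, i_prime):
    """Pairwise preference score of Eq. (defF)."""
    return g(phi(u, i)) - g(phi(u, i_prime))

score = f(0, 1, 2)  # > 0 when item 0 is predicted preferred over item 2 by user 1
```

Note that $f$ is antisymmetric in the two items by construction, i.e. $f(i,u,i')=-f(i',u,i)$.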
\subsection{Algorithmic implementation}
\MRA{The main difference with other approaches that also jointly learn the user and item embeddings and the scoring function $g$ \cite{HeLZNHC17,ZhangYHW16} is that here we consider a composite pairwise ranking loss defined as}~:
\begin{equation}\label{eq:rankingLoss_alpha}
\mathnormal
\Loss_{c,p}(f,\mathbb{U},\mathbb{V},\Trn)=\alpha\Loss_c(f,\Trn)+(1-\alpha)\Loss_p(\mathbb{U},\mathbb{V},\Trn),
\end{equation}
\MRA{ where $\alpha\in [0,1]$ is a real-valued parameter and the first term reflects the ability of the non-linear transformation of user and item feature representations, $g(\Phi(.,.))$, to respect the relative ordering of items with respect to users' preferences:}
\begin{equation}
\label{eq:ranking_loss}
\Loss_c(f,\Trn)=\frac{1}{|\Trn|}\sum_{(\bfZ_{i,u,i'},y_{i,u,i'})\in \Trn} \log(1+e^{y_{i,u,i'}(g(\Phi(u,i'))-g(\Phi(u,i)))}).
\end{equation}
\MRA{The second term focuses on the quality of the compact dense vector representations of items and users that have to be found, as measured by the ability of the dot-product in the resulting embedded vector space to respect the relative ordering of preferred items by users:}
\begin{equation}
\label{eq:embedding_loss}
\small
\Loss_p(\mathbb{U},\mathbb{V},\Trn)=\frac{1}{|\Trn|}\!\sum_{\Trn}\!\left[\log(1+e^{y_{i,u,i'}\vecU_u^\top(\vecI_{i'}-\vecI_{i})})+\!\lambda(\|\vecU_u\|_2^2\!+\!\|\vecI_{i'}\|_2^2\!+\!\|\vecI_{i}\|_2^2)\right]\!,
\end{equation}
where $\lambda$ is a regularization parameter for the user and items norms.
\MRA{The purpose of combining the two losses is to assess the impact of each on the learning of the final scoring function.}
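Under these definitions, the composite objective can be sketched numerically as follows (a NumPy illustration on hypothetical toy embeddings and scores, not the paper's TensorFlow implementation):

```python
# Composite pairwise ranking loss of Eq. (rankingLoss_alpha):
# alpha * L_c + (1 - alpha) * L_p, evaluated on toy batch arrays.
import numpy as np

def loss_c(g_pos, g_neg, y):
    """Logistic loss on g(Phi(u,i')) - g(Phi(u,i)) (Eq. ranking_loss)."""
    return float(np.mean(np.log1p(np.exp(y * (g_neg - g_pos)))))

def loss_p(U_u, V_i, V_ip, y, lam):
    """Embedding loss of Eq. (embedding_loss) with L2 regularisation."""
    margin = np.sum(U_u * (V_ip - V_i), axis=1)  # u^T (v_i' - v_i)
    reg = lam * (np.sum(U_u**2, axis=1) + np.sum(V_i**2, axis=1)
                 + np.sum(V_ip**2, axis=1))
    return float(np.mean(np.log1p(np.exp(y * margin)) + reg))

def composite_loss(alpha, g_pos, g_neg, U_u, V_i, V_ip, y, lam=1e-3):
    return (alpha * loss_c(g_pos, g_neg, y)
            + (1 - alpha) * loss_p(U_u, V_i, V_ip, y, lam))

y = np.array([1.0])                   # item i preferred over item i'
g_pos, g_neg = np.array([2.0]), np.array([0.0])
U_u = np.array([[1.0, 0.0]])
V_i, V_ip = np.array([[1.0, 0.0]]), np.array([[0.0, 0.0]])
total = composite_loss(0.5, g_pos, g_neg, U_u, V_i, V_ip, y, lam=0.0)
```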
\subsubsection*{Training phase}
The empirical minimization of the ranking losses is carried out by back-propagating \cite{Leon2012} the error-gradients from the output to both the deep and embedding parts of the model using mini-batch stochastic optimization (Algorithm~\ref{algRec}).
During training, the input layer takes a random set $\tilde \Trn_n$ of $n$ interactions, builds triplets $(i,u,i')$ from this set, and generates the sparse indicator representations corresponding to the picked user and the pair of items. The binary vectors of the examples in $\tilde \Trn_n$ are then propagated throughout the network, and the ranking error (Eq.~\ref{eq:rankingLoss_alpha}) is back-propagated.
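The triplet construction step can be sketched as follows (a hypothetical sampling routine; the actual implementation may build mini-batches differently):

```python
# Mini-batch triplet construction for the training loop: for each sampled
# user, pair a preferred item with a non-preferred one and label the
# triplet (a hypothetical sketch of the sampling step).
import random

def sample_minibatch(pos_items, neg_items, n, seed=None):
    """Draw n triplets (i, u, i') with label y = +1 (i preferred over i')."""
    rng = random.Random(seed)
    users = list(pos_items)
    batch = []
    for _ in range(n):
        u = rng.choice(users)
        i = rng.choice(pos_items[u])        # a preferred item of u
        i_prime = rng.choice(neg_items[u])  # a non-preferred item of u
        batch.append(((i, u, i_prime), +1))
    return batch

pos = {0: [1, 2], 1: [0]}
neg = {0: [3], 1: [2, 3]}
batch = sample_minibatch(pos, neg, 4, seed=13)
```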
\begin{algorithm}[!ht]
\caption{{\RecNet$_.$}: Learning phase}
\label{algRec}\begin{algorithmic}[J]
\REQUIRE \STATE $T$: maximal number of epochs
\STATE A set of users $\userS=\{1,\ldots,N\}$
\STATE A set of items $\itemS=\{1,\ldots,M\}$
\FOR{$ep=1,\dots,T$}
\STATE Randomly sample a mini-batch $\tilde \Trn_n\subseteq \Trn$ of size $n$ from the original user-item matrix
\FORALL{$((i,u,i'),y_{i,u,i'})\in \tilde \Trn_n$}
\STATE \textbf{Propagate} $(i,u,i')$ from the input to the output.
\ENDFOR
\STATE \textbf{Back-propagate} the pairwise ranking error (Eq.~\eqref{eq:rankingLoss_alpha}) estimated over~$\tilde \Trn_n$.
\ENDFOR
\ENSURE Users and items latent feature matrices $\mathbb{U}, \mathbb{V}$ and the model weights.
\end{algorithmic}
\end{algorithm}
\subsubsection*{Model Testing}
As for the prediction phase, shown in Algorithm \ref{alg:02}, a ranked list $\mathfrak N_{u,k}$ of the $k\ll M$ preferred items for each user in the test set is maintained while traversing the set $\mathcal I$. Given the latent representations of the triplets and the learned weights, the first two items in $\mathcal I$ are placed in $\mathfrak N_{u,k}$ so that the preferred one, $i^*$, is in the first position. Then, the algorithm retrieves the next item $i\in \mathcal I$ and compares it to $i^*$. This step is simply carried out by comparing the model's output over the concatenated binary indicator vectors of $(i^*, u, i)$ and $(i, u, i^*)$. Hence, if $f(i,u,i^*)>f(i^*,u,i)$, which from Equation \eqref{eq:defF} is equivalent to $g(\Phi(u,i))~>~g(\Phi(u,i^*))$, then $i$ is predicted to be preferred over $i^*$; $i \prefu i^*$; and it is put in the first place instead of $i^*$ in $\mathfrak N_{u,k}$. Here we assume that the predicted preference relation \!\!$\prefu$\!\! is transitive, which ensures that the predicted order in the list is respected. Otherwise, if $i^*$ is predicted to be preferred over $i$, then $i$ is compared to the second preferred item in the list, using the model's prediction as before, and so on. The new item $i$ is inserted in $\mathfrak N_{u,k}$ only if it is found to be preferred over another item in $\mathfrak N_{u,k}$.
\begin{algorithm}[t!]
\caption{{\RecNet$_.$}: Testing phase}
\label{alg:02}
\begin{algorithmic}[J]
\REQUIRE \STATE A user $u\in\userS$; A set of items $\itemS=\{1,\ldots,M\}$; \\
$\mathfrak N_{u,k}$: a list that will contain the $k$ preferred items of $u$ in $\itemS$;\\
$\mathfrak N_{u,k} \leftarrow \varnothing$;
\STATE $f$: the output of Algorithm \ref{algRec};
\STATE Apply $f$ to the first two items of $\mathcal I$, note the preferred one $i^*$ and place it at the top of $\mathfrak N_{u,k}$;
\FOR{$i=3,\dots,M$}
\IF {$g(\Phi(u,i))>g(\Phi(u,i^*))$}
\STATE Add $i$ to $\mathfrak N_{u,k}$ at rank 1
\ELSE
\STATE $j\leftarrow 1$
\WHILE {$j\le k$ AND $g(\Phi(u,i))<g(\Phi(u,i_g))$ {\color{gray} // where $i_g=\mathfrak N_{u,k}(j)$}}
\STATE $j\leftarrow j+1$
\ENDWHILE
\IF {$j\le k$}
\STATE Insert $i$ in $\mathfrak N_{u,k}$ at rank $j$
\ENDIF
\ENDIF
\ENDFOR
\ENSURE $\mathfrak N_{u,k}$;
\end{algorithmic}
\end{algorithm}
By repeating the process until the end of $\mathcal I$, we obtain a ranked list of the $k$ most preferred items for the user $u$. Algorithm \ref{alg:02} does not require an ordering of the whole set of items, since in most cases we are only interested in the relevancy of the top-ranked items for assessing the quality of a model. Further, its complexity is at most $O(k\times M)$, which is convenient when $M \gg 1$ and $k$ is sufficiently small ($k=10$ in our experiments). The merits of a similar algorithm have been discussed by \cite{Ailon08anefficient} but, as pointed out above, the basic assumption for inserting a new item in the ranked list $\mathfrak N_{u,k}$ is that the predicted preference relation induced by the model is transitive, which may not hold in general.
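The insertion strategy of Algorithm \ref{alg:02} can be sketched as follows (a simplified version that compares items through a given scorer $g$; the toy scores are illustrative):

```python
# Insertion-based top-k inference: maintain the ranked list by comparing
# each new item against the already ranked ones with the learned scorer g.
# Worst-case cost is O(k * M) comparisons.

def top_k_insertion(items, g_score, k):
    """Return the k highest-scoring items, built by pairwise insertion."""
    ranked = []
    for i in items:
        j = 0
        # walk down the list while the current item is predicted less preferred
        while j < len(ranked) and g_score(i) < g_score(ranked[j]):
            j += 1
        if j < k:
            ranked.insert(j, i)
            ranked = ranked[:k]  # keep at most k items
    return ranked

scores = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.7, 4: 0.1}  # toy g(Phi(u, i)) values
top3 = top_k_insertion(range(5), scores.get, 3)
```

As in the discussion above, correctness of the insertion order relies on the transitivity of the predicted preference relation.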
\smallskip
In our experiments, we also tested a more conventional inference algorithm which, for a given user $u$, consists in ordering the items in $\mathcal I$ with respect to the output of the function $g$; we did not find any substantial difference in the performance of {\RecNet}$_.$, as presented in the following section.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Experimental Results}\label{sec:experiment}
We conducted several experiments aimed at evaluating how the simultaneous learning of user and item representations, as well as of the preferences of users over items, can be efficiently handled by {\RecNet}$_.$. To this end, we considered five real-world benchmarks commonly used for collaborative filtering. We validated our approach with respect to the different hyper-parameters that impact the accuracy of the model and compared it with competitive state-of-the-art approaches.
\smallskip
We ran all experiments on a cluster of five {32-core Intel Xeon @ 2.6GHz CPU (with 20MB cache per core)} systems with {256 GB} of RAM running the {Debian GNU/Linux 8.6} operating system.\cmmnt{Finally, since \NetF\ and \kasandr\ data sets are quite large, we run experiments on these data sets on 2 GRID-GPU(s) each having 8 GPU(s) of their own with {4 Giga} RAM.}
All subsequently discussed components were implemented in Python3 using the TensorFlow library with version~1.4.0.\footnote{\url{https://www.tensorflow.org/}. }
\subsection{Datasets}
\label{sec:Data}
The datasets used in our experiments are~:
\begin{itemize}
\item {\MovieL}\footnote{\url{https://movielens.org/}} 100K (\ML-100K), {\MovieL} 1M (\ML-1M) \cite{Harper:2015:MDH:2866565.2827872} and {\NetF}\footnote{\url{http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a}}, which consist of user-movie ratings, on a scale of one to five, collected from a movie recommendation service and the Netflix company. The latter was released to support the Netflix Prize competition\footnote{J. Bennett and S. Lanning, The Netflix Prize (2007).}. \cmmnt{\ML-100K dataset gathers 100,000 ratings from 943 users on 1682 movies, \ML-1M dataset comprises of 1,000,000 ratings from 6040 users and 3900 movies and {\NetF} consists of 100 million ratings from 480,000 users and 17,000 movies.} For all three datasets, we only keep users who have rated at least five movies and remove users who gave the same rating for all movies. In addition, for {\NetF}, we take a subset of the original data and randomly sample $20\%$ of the users and $20\%$ of the items. In the following experiments, as we only compare with approaches developed for ranking purposes and our model is designed to handle implicit feedback, these three datasets are made binary such that a rating greater than or equal to 4 is set to 1, and to 0 otherwise.
\vspace{1mm}\item The {\kasandr}\footnote{\url{https://archive.ics.uci.edu/ml/datasets/KASANDR}} dataset contains the interactions and clicks done by the users of Kelkoo, an online advertising platform from Germany. It gathers 17,764,280 interactions from 521,685 users on 2,299,713 offers belonging to 272 categories and spanning across 801 merchants \cite{DBLP:conf/sigir/sidana17}. For \kasandr, we remove users who gave the same rating for all offers, as well as all those who never clicked \cmmnt{ (and always had a negative rating on all offers) } or always clicked on every offer showed to them.
\vspace{1mm}\item The {\pandor}\footnote{\url{https://archive.ics.uci.edu/ml/datasets/PANDOR}} collection is another publicly available dataset for online recommendation \cite{sidana18} provided by Purch (\url{http://www.purch.com/}). The dataset records 2,073,379 clicks generated by 177,366 users of one of Purch's high-tech websites over 9,077 ads they were shown during one month.
\end{itemize}
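The pre-processing applied to the rating datasets can be sketched as follows (a toy illustration of the filtering and binarization steps; the ratings dictionary is hypothetical):

```python
# Pre-processing of the rating datasets: keep users with at least five
# ratings, drop constant raters, and binarise ratings at the threshold 4.

def preprocess(ratings, min_ratings=5, threshold=4):
    """ratings: dict user -> dict item -> rating (1..5).
    Returns implicit feedback: user -> dict item -> 0/1."""
    out = {}
    for u, r in ratings.items():
        if len(r) < min_ratings:
            continue                  # too few ratings
        if len(set(r.values())) == 1:
            continue                  # same rating for all movies
        out[u] = {i: int(v >= threshold) for i, v in r.items()}
    return out

ratings = {
    "a": {1: 5, 2: 3, 3: 4, 4: 1, 5: 2},  # kept
    "b": {1: 4, 2: 4},                    # dropped: fewer than 5 ratings
    "c": {1: 3, 2: 3, 3: 3, 4: 3, 5: 3},  # dropped: constant rater
}
implicit = preprocess(ratings)
```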
Basic statistics on these collections after pre-processing, as discussed above, are presented in Table \ref{tab:dataset-description}.
\begin{table}[h!]
\centering
\caption{Statistics of various collections used in our experiments after preprocessing.}\vspace{-2mm}
\label{tab:dataset-description}
\begin{tabular}{lcccc}
\hline
Dataset & \# of users &\# of items & \# of interactions & Sparsity\\
\hline
\ML-100K & 943 & 1,682 & 100,000& 93.685\%\\
\ML-1M & 6,040 & 3,706 & 1,000,209 & 95.530\% \\
\NetF & 90,137 & 3,560 & 4,188,098 & 98.700\% \\
\kasandr&25,848&1,513,038&9,489,273&99.976\%\\
\pandor &5,894,431 & 14,716& 48,754,927 & 99.873\%\\
\hline
\end{tabular}
\end{table}
\subsection{Experimental set-up}
\textbf{Compared baselines.} In order to validate the framework defined in the previous section, we propose to compare the following approaches.
\begin{itemize}
\item {\CoFactor} \cite{liang_16}, developed for implicit feedback, constrains the objective of matrix factorization to jointly use item representations with a factorized, shifted positive pointwise mutual information matrix of item co-occurrence counts. The model was found to outperform WMF \cite{Hu:2008}, also proposed for implicit feedback.
\item {\LightFM} \cite{kula_15} was first proposed to deal with the cold-start problem using meta information. As with our approach, it relies on learning the embeddings of users and items with the Skip-gram model, but optimizes a pointwise likelihood-based ranking loss depending on the dot product of the user and item representations, adjusted by user and item feature biases.
\item {Neural Collaborative Filtering (NCF)\footnote{\url{https://github.com/hexiangnan/neural_collaborative_filtering}}} \cite{HeLZNHC17} jointly learns the user and item embeddings and the scoring function using neural networks, by minimizing a least-squares error. \item \MRA{{Wide \& Deep (W\&D)\footnote{\url{https://github.com/tensorflow/models/tree/master/official/r1/wide_deep}}} \cite{Chen16} jointly learns a linear model component with feature transformations and a neural-network component with embeddings.}
\item {\BPR} \cite{rendle_09} provides an optimization criterion based on implicit feedback, namely the maximum posterior estimator derived from a Bayesian analysis of the pairwise ranking problem, and proposes an algorithm based on stochastic gradient descent to optimize it. The model can further be extended to the explicit feedback case and is close to {\RecNet}$_p$. The main difference between the two is that the latter contains dense layers with rectified linear units between the mapping layer and the estimation of the scoring function $g$.
\item {\RecNet}$_p$\footnote{\MRA{\url{https://github.com/sumitsidana/NERvE}}} \MRA{focuses on the quality of the latent representation of users and items by learning the preference and the representation through the ranking loss $\Loss_p$ (Eq. \eqref{eq:embedding_loss}).}
\item {\RecNet}$_c$ \MRA{focuses on the accuracy of the score obtained at the output of the framework and therefore learns the preference and the representation through the ranking loss $\Loss_c$ (Eq. \eqref{eq:ranking_loss}).}
\item {\RecNet}$_{c,p}$ \MRA{uses a linear combination of $\Loss_p$ and $\Loss_c$ as the objective function, with $\alpha=\frac{1}{2}$.}
\end{itemize}
\textbf{Evaluation protocol.} For each dataset, we sort the interactions according to time and take 80\% for training the model and the remaining 20\% for testing it. Besides, we remove all users and offers that do not occur during the training phase.
We study the real-world scenario of predicting the right order over the set of all items for each user.
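This protocol can be sketched as follows (toy (user, item, timestamp) tuples; the sketch assumes a single global chronological ordering):

```python
# Temporal 80/20 split: sort interactions by timestamp, train on the first
# 80%, and remove from the test split any user or item unseen in training.

def temporal_split(interactions, train_frac=0.8):
    """interactions: list of (user, item, timestamp) tuples."""
    ordered = sorted(interactions, key=lambda x: x[2])
    cut = int(len(ordered) * train_frac)
    train, test = ordered[:cut], ordered[cut:]
    seen_users = {u for u, _, _ in train}
    seen_items = {i for _, i, _ in train}
    test = [t for t in test if t[0] in seen_users and t[1] in seen_items]
    return train, test

data = [("u1", "i1", 1), ("u2", "i2", 2), ("u1", "i2", 3),
        ("u2", "i1", 4), ("u3", "i1", 5)]
train, test = temporal_split(data)
# "u3" only appears in the test period, so its interaction is removed
```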
\smallskip
All comparisons are made on the basis of common ranking metrics, namely the Mean Average Precision (MAP) and the mean Normalized Discounted Cumulative Gain (NDCG). First, let us recall that the Average Precision at rank $\ell$ (AP$@\ell$) is defined from the precision $Pr(j)$, the fraction of recommended items up to rank $j$ that are clicked by the user:
\[
\text{AP}@\ell=\frac{1}{\ell}\sum_{j=1}^\ell r_{j} Pr(j),
\]
and the Normalized Discounted Cumulative Gain (nDCG) is defined as~:
\[
\text{nDCG}@\ell=\frac{\text{DCG}@\ell}{\text{IDCG}@\ell},
\]
where the relevance judgment at rank $j$, $r_j$, is binary (i.e. equal to $1$ when the $j^{th}$ top-ranked item is clicked or preferred, and $0$ otherwise), $\text{DCG}@\ell=r_1+\sum_{j=2}^\ell \frac{r_j}{\log_2 j}$ is the discounted cumulative gain at rank $\ell$, and $\text{IDCG}@\ell$ is the ideal discounted cumulative gain up to position $\ell$. The means of these AP and nDCG values across all users give the MAP and the NDCG. In the following results, we report both measures at ranks $\ell= 1$ and $\ell= 10$.
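Both measures can be computed directly from the binary relevance vector of a ranked list, as in the following sketch (which follows the definitions above; the relevance vector is a toy example):

```python
# AP@l and nDCG@l computed from the binary relevance vector r of a ranked
# list, following the definitions above (lists are 0-indexed internally).
import math

def ap_at(r, l):
    """AP@l = (1/l) * sum_{j=1..l} r_j * Pr(j)."""
    return sum(r[j] * sum(r[:j + 1]) / (j + 1) for j in range(l)) / l

def dcg_at(r, l):
    """DCG@l = r_1 + sum_{j=2..l} r_j / log2(j)."""
    return r[0] + sum(r[j] / math.log2(j + 1) for j in range(1, l))

def ndcg_at(r, l):
    ideal = sorted(r, reverse=True)  # best possible ordering of the list
    idcg = dcg_at(ideal, l)
    return dcg_at(r, l) / idcg if idcg > 0 else 0.0

r = [1, 0, 1, 1, 0]  # toy relevance judgments of a ranked list
ap3 = ap_at(r, 3)    # (1/3) * (1 + 0 + 2/3) = 5/9
ndcg3 = ndcg_at(r, 3)
```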
\bigskip
\textbf{Hyper-parameters tuning.} For all datasets, hyper-parameters tuning is done on a separate validation set among the following sets.
\begin{itemize}
\item The size of the embedding is chosen among $k \in \{1,\ldots,20\}$.
\item We use $\ell_2$ regularization on the embeddings and choose the hyperparameter $\lambda$ from the set~: $\lambda\in\{10^{-4},10^{-3},5.10^{-3},10^{-2},5.10^{-2}\}$.
\item We run {\RecNet} with one hidden layer with ReLU activation functions, where the number of hidden units is chosen in $\{16,32,64\}$.
\item In order to train $\RecNet$, we use Adam \cite{KingmaB14} and found the learning rate $\eta=10^{-3}$ to be the most efficient for all our settings.
For other parameters involved in Adam, i.e., the exponential decay rates for the moment estimates, we keep the default values ($\beta_1=0.9$, $\beta_{2}=0.999$ and $\epsilon=10^{-8}$).
\item Finally, we fix the number of epochs to be $T=10,000$ in advance and the size of mini-batches to $n=512$.
\end{itemize}
\iffalse
\begin{figure*}[!h]
\centering
\subfloat[{\ML-100K}]{
\includegraphics[width=0.325\textwidth]{PlotMap_Batches_ML100K.eps}
}
\subfloat[{\ML-1M}]{
\includegraphics[width=0.325\textwidth]{PlotMap_Batches_ML1M.eps}
}
\subfloat[{\kasandr}]{
\includegraphics[width=0.325\textwidth]{PlotMap_Batches_Kasandr.eps}
}
\caption{MAP@1 as a function of the number of batches for {\ML}-1M, {\ML}-100K and {\kasandr}.}
\label{fig:map_batch}
\end{figure*}
\fi
\begin{table}[t!]
\centering
\caption{Best parameters for {\RecNet}$_p$, {\RecNet}$_c$ and {\RecNet}$_{c,p}$; $k$ denotes the dimension of embeddings, $\lambda$ the regularization parameter. We also report the \# of hidden units per layer.}\vspace{-2mm}
\label{tab:param_all}
\resizebox{\textwidth}{!}{
\begin{tabular}{|c|ccc|ccc|ccc|ccc|ccc|}
\hline
& \multicolumn{3}{c|}{\ML-100K} & \multicolumn{3}{c|}{\ML-1M} & \multicolumn{3}{c|}{\NetF}& \multicolumn{3}{c|}{\kasandr}& \multicolumn{3}{c|}{\pandor}\\ \hline
& {\RecNet}$_c$ & {\RecNet}$_p$&{\RecNet}$_{c,p}$ &{\RecNet}$_c$ &{\RecNet}$_p$ &{\RecNet}$_{c,p}$ &{\RecNet}$_c$ &{\RecNet}$_p$ &{\RecNet}$_{c,p}$&{\RecNet}$_c$ &{\RecNet}$_p$ &{\RecNet}$_{c,p}$&{\RecNet}$_c$ &{\RecNet}$_p$ &{\RecNet}$_{c,p}$ \\ \hline
$k$ &$15$&$5$&$8$&$2$&$11$&$2$&$3$&$13$&$1$ &$4$&$16$&$14$ &$19$&$15$&$18$ \\
$\lambda$ &$10^{-3}$&$10^{-3}$&$10^{-3}$&$5.10^{-2}$&$10^{-4}$&$10^{-3}$&$10^{-4}$&$10^{-3}$&$10^{-3}$&$10^{-3}$&$10^{-4}$&$5.10^{-2}$ &$5.10^{-3}$&$5.10^{-4}$&$5.10^{-2}$\\
\# units &$32$&$16$&$16$&$32$&$64$&$32$&$32$&$64$& $64$ &$32$&$64$&$64$ &$64$&$64$&$64$ \\ \hline
\end{tabular}
}
\end{table}
Best hyperparameter values for {\RecNet}$_p$, {\RecNet}$_c$ and {\RecNet}$_{c,p}$ with respect to MAP$@\ell$ are reported in Table \ref{tab:param_all}. \MRA{It turns out that the best results are generally obtained with small dimensions $k$ of the user and item embedded vector spaces, which supports our theoretical analysis, where we found that small $k$ induces tighter generalization bounds. This observation on the embedding dimension also agrees with the conclusion of \cite{kula_15}, which uses the same technique for representation learning.} We exhibit the impact of the value of the hyper-parameter~$\alpha\in [0,1]$~(Eq.~\eqref{eq:rankingLoss_alpha}) on the learning of model parameters for the {\ML-1M} and {\pandor} datasets in Figure \ref{fig:alpha_impact}. As expected, the conjunction of both ranking losses (Eq.~\ref{eq:ranking_loss}) and (Eq.~\ref{eq:embedding_loss}), corresponding to situations where $\alpha\neq 0$ and $\alpha\neq 1$, always gives the best results.
\begin{figure}[t!]
\begin{tabular}{cc}
\begin{tikzpicture}[scale=0.7]
\begin{axis}[
xlabel=$\alpha$,
grid=major,
legend pos=south west
]
\addplot[color=green,mark=*] coordinates {
(0.1,0.058935361216730035)
(0.2,0.06749049429657794)
(0.3,0.051330798479087454)
(0.4,0.08555133079847908)
(0.5,0.055133079847908745)
(0.6,0.06749049429657794)
(0.7,0.045627376425855515)
(0.8,0.03897338403041825)
(0.9,0.009505703422053232)
};
\addlegendentry{MAP@1}
\addplot[color=red,mark=x] coordinates {
(0.1,0.07032108153781158)
(0.2,0.08244085340092945)
(0.3,0.0785461554710604)
(0.4,0.11408428390367553)
(0.5,0.07391740599915504)
(0.6,0.08319074778200253)
(0.7,0.0758502323616392)
(0.8,0.05643219264892268)
(0.9,0.02262357414448669)
};
\addlegendentry{MAP@5}
\addplot[color=blue,mark=o] coordinates {
(0.1,0.07671190677369386)
(0.2,0.08372035125837408)
(0.3,0.08604434486088477)
(0.4,0.1184116422234293)
(0.5,0.08084298992093669)
(0.6,0.09193501418311305)
(0.7,0.08534424225964152)
(0.8,0.06604887138632387)
(0.9,0.030530659665640652)
};
\addlegendentry{MAP@10}
\end{axis}
\end{tikzpicture} &
\begin{tikzpicture}[scale=0.7]
\begin{axis}[
xlabel=$\alpha$,
grid=major,
legend pos=south west
]
\addplot[color=green,mark=*] coordinates {
(0.1,0.0738938053097345)
(0.2,0.07960176991150443)
(0.3,0.08628318584070796)
(0.4,0.08517699115044247)
(0.5,0.1006637168141593)
(0.6,0.07300884955752213)
(0.7,0.061946902654867256)
(0.8,0.05088495575221239)
(0.9,0.00663716814159292)
};
\addlegendentry{MAP@1}
\addplot[color=red,mark=x] coordinates {
(0.1,0.08605457227138645)
(0.2,0.08485004916420846)
(0.3,0.089634341199606687)
(0.4,0.09396509341199605)
(0.5,0.11252212389380532)
(0.6,0.08043264503441494)
(0.7,0.0709070796460177)
(0.8,0.060324483775811205)
(0.9,0.0122787610619469)
};
\addlegendentry{MAP@5}
\addplot[color=blue,mark=o] coordinates {
(0.1,0.08783589689563141)
(0.2,0.09280689001264224)
(0.3,0.09678325607529148)
(0.4,0.10223546144121366)
(0.5,0.11888502598679589)
(0.6,0.08408220957999718)
(0.7,0.0749271316196095)
(0.8,0.06398019384745048)
(0.9,0.016548145806995363)
};
\addlegendentry{MAP@10}
\end{axis}
\end{tikzpicture}\\
(a) {\ML}-1M & (b) \pandor
\end{tabular}
\caption{MAP@1, MAP@5, MAP@10 as a function of the value of $\alpha$ for {\ML}-1M, and {\pandor}.}
\label{fig:alpha_impact}
\end{figure}
\iffalse
\begin{table}[]
\centering
\caption{popularity bias on datasets where k=1 is chosen}
\label{tab: popBiasInteracted}
\begin{tabular}{l|c|l|c|l|}
\cline{2-5}
& \multicolumn{2}{c|}{ML-100K} & \multicolumn{2}{c|}{ML-1M} \\ \cline{2-5}
& \multicolumn{1}{l|}{MAP@1} & MAP@10 & \multicolumn{1}{l|}{MAP@1} & MAP@10 \\ \hline
\multicolumn{1}{|l|}{Popularity} & 0.594 & \multicolumn{1}{c|}{0.659} & 0.646 & \multicolumn{1}{c|}{0.657} \\ \hline
\end{tabular}
\end{table}
\fi
\iffalse
Lastly, to train {\RecNet$_.$}, we fix the number of epochs to $T=10,000$ and the size of the mini-batches to $n=512$. To further avoid over-fitting (as shown in Figure \ref{fig:map_batch} (a)), we also use early-stopping. For the optimization of the different ranking losses, we use Adam \cite{DBLP:journals/corr/KingmaB14} and the learning rate $\eta$ is set to 1e-3 using a validation set. For other parameters involved in Adam, i.e., the exponential decay rates for the moment estimates, we keep the default values ($\beta_1=0.9$, $\beta_{2}=0.999$ and $\epsilon=10^{-8}$).
\fi
\subsection{Results}
Hereafter, we compare and summarize the performance of {\RecNet}$_.$ and the baseline methods on the various datasets. Since prediction over all offers is the more realistic setting, and predicting over shown offers only introduces the bias of the algorithm used to show them, we compute the results of predicting over all offers of the catalog for all baselines.
Empirically, we observed that the version of $\RecNet_{c,p}$ where both $\Loss_c$ and~$\Loss_p$ have an equal weight during training gives better results on average, and we therefore only report these results in the following.
Table \ref{tab:results_warm_all} reports all results. In each case, we statistically compare the performance of the algorithms: boldface indicates the highest performance, and the symbol $\DA$ indicates that a performance is significantly worse than the best result, according to a Wilcoxon rank-sum test at a p-value threshold of $0.01$ \cite{lehmann_06}.
When the prediction is made over all offers (see Table \ref{tab:results_warm_all}), we can make two observations. First, all the algorithms encounter an extreme drop in their performance in terms of MAP. Second, the {\RecNet} framework significantly outperforms all other algorithms on all datasets, and this difference is all the more important on {\kasandr}, where for instance {\RecNet$_{c,p}$} is on average 15 times more efficient.
\MRA{\paragraph{Comparisons with pointwise ranking approaches} CoFactor, LightFM and NCF are pointwise ranking models and, in the majority of cases, they perform worse than pairwise approaches on all datasets with respect to both performance measures, MAP and NDCG. These results are in line with other studies that compared the two approaches for recommender systems, as previously presented in \cite{rendle_09,DacremaCJ19}. The common point between these pointwise models is that they all learn user and item embeddings (whether with neural networks or not). On the \kasandr{} and \pandor{} datasets, which are larger than the other collections, NN-based models (i.e. NCF and W\&D) are more efficient than CoFactor and LightFM. These results suggest that, given sufficient data, neural networks are able to learn user and item representations that are more robust with respect to the related scoring function in the embedded space for implicit feedback.}
\MRA{\paragraph{Comparisons with BPR-MF} BPR-MF is a pairwise, non-neural-network approach and fails to capture the non-linearity of user-item interactions. Since \RecNet{} models user-item representations and minimizes the pairwise ranking loss simultaneously using a dense, fully connected network, it is able to learn non-linear relationships between user and item interactions and hence outperforms BPR-MF considerably.}
\paragraph{Comparison between {\RecNet} versions}
One can note that, whether optimizing the ranking losses of Eq. \eqref{eq:rankingLoss_alpha}, Eq. \eqref{eq:ranking_loss} or Eq. \eqref{eq:embedding_loss}, we simultaneously learn the representation and the preference function; the main difference lies in the amount of emphasis put on learning one or the other.
\MRA{Results presented in Table \ref{tab:results_warm_all} show that, on the larger datasets \kasandr{} and \pandor{}, where the number of interactions is higher than in the other collections, optimizing the linear combination of the pairwise ranking loss and the embedding loss ({\RecNet}$_{c,p}$) increases the quality of the overall recommendations compared to optimizing the standalone losses to learn the embeddings and the pairwise preference function.}
\section{Conclusion}\label{sec:conclusion}
We presented and analyzed a learning-to-rank framework for recommender systems which consists of learning user preferences over items. We showed that the minimization of the pairwise ranking loss over user preferences involves dependent random variables, and provided a theoretical analysis by proving the consistency of the empirical risk minimization in the worst case, where all users choose a minimal number of positive and negative items. From this analysis, we then proposed {\RecNet}, a new neural-network-based model for learning user preferences, in which the user and item representations and the function modeling the user's preference over pairs of items are learned simultaneously. The learning phase is guided by a ranking objective that captures both the ranking ability of the prediction function and the expressiveness of the learned embedded space, in which the preference of users over items is respected by the dot product defined over that space. The training of {\RecNet} is carried out with the back-propagation algorithm over mini-batches defined on a user-item matrix containing implicit information in the form of subsets of preferred and non-preferred items. The learning capability of the model over both the prediction and representation problems shows their interconnection, and also that the proposed double ranking objective conjugates them well. We assessed and validated the proposed approach through extensive experiments on five popular collections proposed for the task of recommendation. Furthermore, we studied two different settings for the prediction phase and demonstrated that the performance of each approach is strongly impacted by the set of items considered when making the prediction. We believe that our model is a fresh departure from models which learn a pairwise ranking function without knowledge of the embeddings, or which learn embeddings without learning any pairwise ranking function.
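The mini-batch construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the uniform sampling scheme, the toy matrix and all names are hypothetical, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Implicit feedback matrix: X[u, i] = 1 iff user u interacted with item i.
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]])

def sample_minibatch(X, batch_size, rng):
    # Draw (user, preferred item, non-preferred item) triplets, i.e. the
    # units over which the pairwise objective is back-propagated.
    n_users, _ = X.shape
    batch = []
    while len(batch) < batch_size:
        u = int(rng.integers(n_users))
        pos = np.flatnonzero(X[u] == 1)   # preferred items of user u
        neg = np.flatnonzero(X[u] == 0)   # non-preferred items of user u
        if pos.size == 0 or neg.size == 0:
            continue  # skip users lacking either kind of feedback
        batch.append((u, int(rng.choice(pos)), int(rng.choice(neg))))
    return batch

batch = sample_minibatch(X, batch_size=8, rng=rng)
```

Each triplet in the batch pairs one preferred and one non-preferred item of the same user, which is exactly the implicit-feedback structure the double ranking objective is trained on.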
For future work, we would like to extend {\RecNet} to take into account additional contextual information on users and/or items. More specifically, we are interested in the integration of data of different natures, such as the text or demographic information available in the \pandor\ dataset. We believe that this information can be taken into account without much effort and, by doing so, it should be possible to improve the performance of our approach while also tackling the problem of providing recommendations for new users or items, known as the cold-start problem. A second important extension will be the development of an online version of the proposed algorithm, making the approach suitable for real-time applications and online advertising. Finally, we have shown that choosing a suitable $\alpha$, which controls the trade-off between the ranking and embedding losses, greatly impacts the performance of the proposed framework, and we believe that an exciting extension will be to learn this hyper-parameter automatically and to make it adaptive during the training phase.
\section*{Acknowledgements}
This work was partly done under the Calypso project supported by the FEDER program from the R\'egion Auvergne-Rh\^one-Alpes.
The work of Yury Maximov at LANL was carried out under the auspices of the National Nuclear Security Administration of the US Department of Energy under Contract No.~DE-AC52-06NA25396 and CNLS/LANL support.
\section*{Appendix}
\begin{theoremAp}
Let $\userS$ be a set of $M$ independent users, such that each user $u \in \userS $ prefers $n_*^+$ items over $n_*^-$ ones in a predefined set of $\itemS$ items. Let $S=\{(\bfZ_{i,u,i'}\doteq(i,u,i'),y_{i,u,i'})\mid u\in\userS, (i,i')\in\itemS^+_u\times \itemS^-_u\}$ be the associated training set, then for any $1>\delta>0$ the following generalization bound holds for all $f\in \mathcal{F}_{B,r}$ with probability at least $1-\delta$:
\begin{align*}
\Loss(f)\le ~~&\hat\Loss_*(f,S) + \frac{2B\mathfrak{C}(S)}{Nn^+_*}+\\ &\frac{5}{2}\left(\sqrt{\frac{2B\mathfrak{C}(S)}{Nn^+_*}}+\sqrt{\frac{r}{2}}\right)\sqrt{\frac{\log\frac{1}{\delta}}{n_*^+}}+\frac{25}{48}\frac{\log\frac{1}{\delta}}{n_*^+},
\end{align*}
where $\mathfrak{C}(S)=\sqrt{ \frac{1}{n^-_*}\sum_{j=1}^{n_*^-}\mathbb{E}_{\Cset_j}\left[ \sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}}d(\bfZ_\alpha,\bfZ_{\alpha})\right]}$, $\bfZ_\alpha=(i_\alpha,u_\alpha,i'_\alpha)$ and
\begin{align*}
d(\bfZ_\alpha,\bfZ_{\alpha})=~&\kappa(\Phi(u_\alpha,i_\alpha),\Phi(u_\alpha,i_\alpha))\\ & +\kappa(\Phi(u_\alpha,i'_\alpha),\Phi(u_\alpha,i'_\alpha))-
2\kappa(\Phi(u_\alpha,i_\alpha),\Phi(u_\alpha,i'_\alpha)).
\end{align*}
\end{theoremAp}
\begin{proof}
Let $n_*^+$ and $n_*^-$ be, respectively, the minimum number of preferred and non-preferred items over all users $u\in\userS$; then we have:
\begin{equation}
\hat\Loss(f,S)\le \underbrace{\frac{1}{N}\frac{1}{n_*^- n_*^+}\sum_{u\in\userS}\sum_{i\in\itemS^+_u}\sum_{i'\in\itemS^-_u} \Ind_{y_{i,u,i'}f(i,u,i')<0}}_{=\hat\Loss_*(f,S)}.
\end{equation}
As the set of users $\userS$ is assumed to be independent, the exact fractional cover of the dependency graph corresponding to the training set $S$ is the union of the exact fractional covers associated with each user, where cover sets that do not contain any items in common are joined together.
Following \cite[Proposition 4]{RalaiAmin15}, for any $1>\delta>0$ we have with probability at least $1-\delta$:
\begin{equation*}
\begin{split}
\mathbb{E}_S[\hat\Loss_*(f,S)]&-\hat\Loss_*(f,S)\\
&\leq \inf_{\beta>0}\left( (1+\beta)\rademacher_{S}(\mathcal{F}_{B,r})+\frac{5}{4}\sqrt{\frac{2r\log\frac{1}{\delta}}{n_*^+}}+\frac{25}{16}\left(\frac{1}{3}+\frac{1}{\beta}\right)\frac{\log \frac{1}{\delta}}{n_*^+}\right).
\end{split}
\end{equation*}
The infimum is reached for $\beta^*=\sqrt{\frac{25}{16}\frac{\log \frac{1}{\delta}}{n_*^+\times \rademacher_{S}(\mathcal{F}_{B,r})}}$ which, plugged back into the upper bound and combined with equation \eqref{eq:EmpRisk2}, gives:
\begin{equation}
\label{eq:UpperBound}
\Loss(f) \le \hat\Loss_*(f,S) + \rademacher_{S}(\mathcal{F}_{B,r})+\frac{5}{2}\left(\sqrt{\rademacher_{S}(\mathcal{F}_{B,r})}+\sqrt{\frac{r}{2}}\right)\sqrt{\frac{\log\frac{1}{\delta}}{n_*^+}}+\frac{25}{48}\frac{\log\frac{1}{\delta}}{n_*^+}.
\end{equation}
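For completeness, the value of $\beta^*$ and the resulting bound follow from elementary calculus; the shorthand $R$ below is introduced here for readability and is not part of the paper's notation.

```latex
Writing $R \doteq \rademacher_{S}(\mathcal{F}_{B,r})$, the $\beta$-dependent
part of the bound is
$g(\beta)=\beta R+\frac{25}{16}\frac{\log\frac{1}{\delta}}{n_*^+}\frac{1}{\beta}$;
setting $g'(\beta)=0$ gives
$\beta^*=\frac{5}{4}\sqrt{\frac{\log\frac{1}{\delta}}{n_*^+ R}}$ with minimum value
$g(\beta^*)=\frac{5}{2}\sqrt{R}\sqrt{\frac{\log\frac{1}{\delta}}{n_*^+}}$,
which is exactly the $\sqrt{\rademacher_{S}(\mathcal{F}_{B,r})}$ term of
the bound in Eq.~\eqref{eq:UpperBound}.
```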
Now, for all $j\in\{1,\ldots,J\}$ and $\alpha \in \mathcal M_j$, let $(u_\alpha,i_\alpha)$ and $(u_\alpha,i'_\alpha)$ be the first and second pairs constructed from $\bfZ_\alpha$; then, from the bilinearity of the dot product and the Cauchy-Schwarz inequality, $\rademacher_{S}(\mathcal{F}_{B,r})$ is upper-bounded by:
\begin{align}
\frac{2}{m}\mathbb{E}_{\xi}\sum_{j=1}^{n_*^-}\mathbb{E}_{\Cset_j}\sup_{f\in\mathcal{F}_{B,r}} & \left\langle \boldsymbol{w},\sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}}\xi_\alpha\left( \Phi(u_\alpha,i_\alpha)-\Phi(u_\alpha,i'_\alpha)\right)\right\rangle \nonumber\\
& \le \frac{2B}{m}\sum_{j=1}^{n_*^-}\mathbb{E}_{\Cset_j} \mathbb{E}_{\xi}\left\| \sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}} \xi_\alpha(\Phi(u_\alpha,i_\alpha)-\Phi(u_\alpha,i'_\alpha))\right\| \nonumber \\
& \le \frac{2B}{m}\sum_{j=1}^{n_*^-}\left(\mathbb{E}_{\Cset_j,\xi}\left[ \sum_{\substack{\alpha,\alpha'\in\Cset_j \\ \bfZ_\alpha,\bfZ_{\alpha'} \in S}}\xi_\alpha\xi_{\alpha'}d(\bfZ_\alpha,\bfZ_{\alpha'})\right]\right)^{1/2},
\end{align}
where the last inequality follows from Jensen's inequality and the concavity of the square root, and
\[
d(\bfZ_\alpha,\bfZ_{\alpha'})=\left\langle \Phi(u_\alpha,i_\alpha)-\Phi(u_\alpha,i'_\alpha),\Phi(u_\alpha,i_\alpha)-\Phi(u_\alpha,i'_\alpha)\right\rangle.
\]
Further, for all $j\in\{1,\ldots,n^-_*\}$ and $\alpha,\alpha' \in \mathcal M_j$ with $\alpha\neq \alpha'$, we have $\mathbb{E}_\xi[\xi_\alpha \xi_{\alpha'}]=0$ \cite[p. 91]{Shawe-Taylor:2004:KMP:975545}, so:
\begin{align*}
\rademacher_{S}(\mathcal{F}_{B,r})\le & ~\frac{2B}{m}\sum_{j=1}^{n_*^-}\left(\mathbb{E}_{\Cset_j}\left[ \sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}} d(\bfZ_\alpha,\bfZ_{\alpha})\right]\right)^{1/2}\\
= &~ \frac{2Bn^-_*}{m}\sum_{j=1}^{n_*^-}\frac{1}{n^-_*}\left(\mathbb{E}_{\Cset_j}\left[ \sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}} d(\bfZ_\alpha,\bfZ_{\alpha})\right]\right)^{1/2}.
\end{align*}
By using Jensen's inequality and the concavity of the square root once again, we finally get
\begin{equation}
\label{eq:FracChromatic}
\rademacher_{S}(\mathcal{F}_{B,r})\le
\frac{2B}{Nn^+_*}\sqrt{\sum_{j=1}^{n_*^-}\frac{1}{n^-_*}\mathbb{E}_{\Cset_j}\left[ \sum_{\substack{\alpha\in\Cset_j \\ \bfZ_\alpha \in S}} d(\bfZ_\alpha,\bfZ_{\alpha})\right]}.
\end{equation}
The result follows from equations \eqref{eq:UpperBound} and \eqref{eq:FracChromatic}.
\end{proof}
\bibliography{NERvE}
\end{document}