Provided by: mlpack-bin_4.5.1-1build2_amd64

NAME

       mlpack_cf - collaborative filtering

SYNOPSIS

        mlpack_cf [-a string] [-A bool] [-m unknown] [-i string] [-I bool] [-N int] [-r double] [-S string] [-n int] [-z string] [-q unknown] [-R int] [-c int] [-s int] [-T unknown] [-t unknown] [-V bool] [-o unknown] [-M unknown] [-h -v]

DESCRIPTION

       This program performs collaborative filtering (CF) on the given dataset. Given a list of (user, item,
       preference) triples (the '--training_file (-t)' parameter), the program performs a matrix decomposition
       and can then perform a series of actions related to collaborative filtering. Alternatively, the program
       can load an existing saved CF model with the '--input_model_file (-m)' parameter and then use that model
       to provide recommendations or predict values.

       The input matrix should contain one rating per point, and each point should have three dimensions: the
       first dimension is the user, the second dimension is the item, and the third dimension is that user's
       rating of that item. Both the users and the items must be numeric indices, not names. The indices are
       assumed to start from 0.
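
       As a rough illustration of the index requirement, the Python sketch below maps names to 0-based
       indices and renders the result as CSV text. The sample ratings, the helper names, and the
       one-triple-per-line CSV layout are assumptions made for illustration; they are not taken from mlpack.

```python
import csv
import io

# Hypothetical raw ratings keyed by names; mlpack_cf expects numeric indices instead.
raw_ratings = [
    ("alice", "matrix", 5.0),
    ("alice", "inception", 3.0),
    ("bob", "matrix", 4.0),
    ("carol", "up", 2.5),
]


def to_index_triples(ratings):
    """Map user and item names to 0-based indices in order of first appearance."""
    user_ids, item_ids, triples = {}, {}, []
    for user, item, rating in ratings:
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        triples.append((u, i, rating))
    return triples, user_ids, item_ids


def triples_to_csv(triples):
    """Render one 'user,item,rating' line per rating (assumed file layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(triples)
    return buffer.getvalue()


triples, user_ids, item_ids = to_index_triples(raw_ratings)
csv_text = triples_to_csv(triples)
print(csv_text, end="")
```

       Keeping the 'user_ids' and 'item_ids' dictionaries makes it possible to translate recommended item
       indices back to names afterwards.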

       A set of query users for which recommendations can be generated may be specified with the '--query_file
       (-q)' parameter; alternatively, recommendations may be generated for every user in the dataset by
       specifying the '--all_user_recommendations (-A)' parameter. In addition, the number of recommendations
       to generate per user can be specified with the '--recommendations (-c)' parameter, and the number of
       similar users (the size of the neighborhood) to consider when generating recommendations can be
       specified with the '--neighborhood (-n)' parameter.

       For performing the matrix decomposition, the following optimization algorithms can be specified via the
       '--algorithm (-a)' parameter:

              •  'RegSVD' -- Regularized SVD using an SGD optimizer

              •  'NMF' -- Non-negative matrix factorization with alternating least squares update rules

              •  'BatchSVD' -- SVD batch learning

              •  'SVDIncompleteIncremental' -- SVD incomplete incremental learning

              •  'SVDCompleteIncremental' -- SVD complete incremental learning

              •  'BiasSVD' -- Bias SVD using an SGD optimizer

              •  'SVDPP' -- SVD++ using an SGD optimizer

              •  'RandSVD' -- Randomized SVD learning

              •  'QSVD' -- QUIC-SVD learning

              •  'BKSVD' -- Block Krylov SVD learning
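
       To give a feel for what a decomposition of this kind does, the following pure-Python sketch fits user
       and item factor vectors to the known ratings with a regularized SGD update, in the spirit of the
       'RegSVD' entry. It is a conceptual illustration only: the function names, the hyperparameters, and the
       sample ratings are assumptions, and mlpack's own implementation differs.

```python
import random


def rmse(ratings, user_f, item_f):
    """Root mean squared error of the predictions over the known ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(user_f[u], item_f[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


def factorize(ratings, rank=2, epochs=200, lr=0.05, reg=0.02, seed=1):
    """Fit user and item factor vectors to the known ratings with plain SGD."""
    rng = random.Random(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    user_f = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    item_f = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    initial_error = rmse(ratings, user_f, item_f)
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(a * b for a, b in zip(user_f[u], item_f[i]))
            err = r - pred
            for k in range(rank):
                pu, qi = user_f[u][k], item_f[i][k]
                # Gradient step on the squared error plus an L2 penalty.
                user_f[u][k] += lr * (err * qi - reg * pu)
                item_f[i][k] += lr * (err * pu - reg * qi)
    return user_f, item_f, initial_error


# Hypothetical (user, item, rating) triples with 0-based indices.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
user_f, item_f, initial_error = factorize(ratings)
final_error = rmse(ratings, user_f, item_f)
print(f"RMSE before: {initial_error:.3f}, after: {final_error:.3f}")
```

       The product of a user's factor vector and an item's factor vector then serves as the predicted rating
       for pairs that were never observed.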

       The following neighbor search algorithms can be specified via the '--neighbor_search (-S)' parameter:

              •  'cosine' -- Cosine Search Algorithm

              •  'euclidean' -- Euclidean Search Algorithm

              •  'pearson' -- Pearson Search Algorithm
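
       The three names refer to standard ways of comparing two vectors. The sketch below spells out the
       textbook definitions in plain Python; it is an assumption-level illustration of the measures
       themselves, not a description of how mlpack performs the search internally.

```python
import math


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in a))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

       Cosine and Pearson ignore the overall scale of a user's vector, while Euclidean distance does not,
       which is one reason the choice of measure can change which users count as neighbors.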

       The following weight interpolation algorithms can be specified via the '--interpolation (-i)' parameter:

              •  'average' -- Average Interpolation Algorithm

              •  'regression' -- Regression Interpolation Algorithm

              •  'similarity' -- Similarity Interpolation Algorithm
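
       Weight interpolation decides how much each neighbor contributes to a predicted rating. The sketch below
       illustrates two simple schemes under stated assumptions: equal weights for 'average', and weights
       proportional to a similarity score for 'similarity'. The function names and sample numbers are
       hypothetical, the exact formulas used by mlpack may differ, and 'regression' (which fits the weights)
       is left out.

```python
def average_weights(similarities):
    """Give every neighbor the same weight."""
    count = len(similarities)
    return [1.0 / count] * count


def similarity_weights(similarities):
    """Weight each neighbor in proportion to its similarity score."""
    total = sum(similarities)
    if total == 0:
        # Fall back to equal weights when no neighbor is similar at all.
        return average_weights(similarities)
    return [s / total for s in similarities]


def interpolate(neighbor_ratings, weights):
    """Combine the neighbors' ratings into one predicted rating."""
    return sum(w * r for w, r in zip(weights, neighbor_ratings))


neighbor_ratings = [4.0, 2.0]   # hypothetical ratings from two neighbors
similarities = [3.0, 1.0]       # hypothetical similarity scores
average_prediction = interpolate(neighbor_ratings, average_weights(similarities))
similarity_prediction = interpolate(neighbor_ratings, similarity_weights(similarities))
print(average_prediction, similarity_prediction)
```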

       The following rating normalization algorithms can be specified via the '--normalization (-z)' parameter:

              •  'none' -- No Normalization

              •  'item_mean' -- Item Mean Normalization

              •  'overall_mean' -- Overall Mean Normalization

              •  'user_mean' -- User Mean Normalization

              •  'z_score' -- Z-Score Normalization
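
       The mean-based normalizations can be pictured as subtracting an average from every rating before the
       decomposition. The following sketch shows the idea for the overall, per-user, and per-item means; the
       helper names and sample triples are assumptions, and 'z_score' (which additionally divides by a
       standard deviation) is omitted.

```python
from collections import defaultdict


def overall_mean_normalize(triples):
    """Subtract the mean of all ratings from every rating."""
    mean = sum(r for _, _, r in triples) / len(triples)
    return [(u, i, r - mean) for u, i, r in triples], mean


def group_mean_normalize(triples, key_index):
    """Subtract a per-group mean: key_index 0 groups by user, 1 groups by item."""
    sums, counts = defaultdict(float), defaultdict(int)
    for triple in triples:
        sums[triple[key_index]] += triple[2]
        counts[triple[key_index]] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    normalized = [(u, i, r - means[(u, i)[key_index]]) for u, i, r in triples]
    return normalized, means


# Hypothetical (user, item, rating) triples.
triples = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
by_overall, overall_mean = overall_mean_normalize(triples)
by_user, user_means = group_mean_normalize(triples, 0)
by_item, item_means = group_mean_normalize(triples, 1)
print(by_overall, by_user, by_item, sep="\n")
```

       Whatever is subtracted before training has to be added back to a predicted rating afterwards, so the
       means are returned alongside the normalized triples.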

       A trained model may be saved with the '--output_model_file (-M)' output parameter.

       To train a CF model on a dataset 'training_set.csv' using NMF for decomposition and save the trained
       model to 'model.bin', one could call:

       $ mlpack_cf --training_file training_set.csv --algorithm NMF --output_model_file model.bin

       Then, to use this model to generate recommendations for the list of users in the query set 'users.csv',
       storing 5 recommendations per user in 'recommendations.csv', one could call:

       $ mlpack_cf --input_model_file model.bin --query_file users.csv --recommendations 5 \
             --output_file recommendations.csv
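
       When the two calls are driven from a script, it can help to build the argument lists in one place.
       The Python sketch below does that using only options documented on this page; the helper names are
       hypothetical, and the commands are only executed if 'mlpack_cf' is found on the PATH.

```python
import shutil
import subprocess


def build_train_command(training_file, model_file, algorithm="NMF"):
    """Argument list for training a model and saving it."""
    return ["mlpack_cf", "--training_file", training_file,
            "--algorithm", algorithm, "--output_model_file", model_file]


def build_recommend_command(model_file, query_file, output_file, recommendations=5):
    """Argument list for generating recommendations from a saved model."""
    return ["mlpack_cf", "--input_model_file", model_file, "--query_file", query_file,
            "--recommendations", str(recommendations), "--output_file", output_file]


def run_if_available(command):
    """Run the command only when the program is on PATH; return whether it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


train = build_train_command("training_set.csv", "model.bin")
recommend = build_recommend_command("model.bin", "users.csv", "recommendations.csv")
print(" ".join(train))
print(" ".join(recommend))
```

       Passing the arguments as a list avoids shell quoting problems with file names; calling
       'run_if_available(train)' followed by 'run_if_available(recommend)' would then reproduce the two steps.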

OPTIONAL INPUT OPTIONS

       --algorithm (-a) [string]
              Algorithm used for matrix factorization.  Default value 'NMF'.

       --all_user_recommendations (-A) [bool]
              Generate recommendations for all users.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Trained CF model to load.

       --interpolation (-i) [string]
              Algorithm used for weight interpolation.  Default value 'average'.

       --iteration_only_termination (-I) [bool]
              Terminate only when the maximum number of iterations is reached.

       --max_iterations (-N) [int]
              Maximum  number  of  iterations.  If  set  to zero, there is no limit on the number of iterations.
              Default value 1000.

       --min_residue (-r) [double]
              Residue required to terminate  the  factorization  (lower  values  generally  mean  better  fits).
              Default value 1e-05.

       --neighbor_search (-S) [string]
              Algorithm used for neighbor search. Default value 'euclidean'.

       --neighborhood (-n) [int]
              Size of the neighborhood of similar users to consider for each query user. Default value 5.

       --normalization (-z) [string]
              Normalization performed on the ratings. Default value 'none'.

       --query_file (-q) [unknown]
              List of query users for which recommendations should be generated.

       --rank (-R) [int]
              Rank of decomposed matrices (if 0, a heuristic is used to estimate the rank). Default value 0.

       --recommendations (-c) [int]
              Number of recommendations to generate for each query user. Default value 5.

       --seed (-s) [int]
              Set the random seed (0 uses std::time(NULL)).  Default value 0.

       --test_file (-T) [unknown]
              Test set to calculate RMSE on.

       --training_file (-t) [unknown]
              Input dataset to perform CF on.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
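
       The options '--max_iterations (-N)', '--min_residue (-r)' and '--iteration_only_termination (-I)'
       interact with one another. The sketch below encodes one reading of their descriptions as a small
       Python predicate; the function name and the exact comparison operators are assumptions, not mlpack's
       implementation.

```python
def should_stop(iteration, residue, max_iterations=1000, min_residue=1e-5,
                iteration_only_termination=False):
    """Decide whether the factorization loop should end after this iteration."""
    # A max_iterations of zero is documented as "no limit".
    reached_limit = max_iterations != 0 and iteration >= max_iterations
    if iteration_only_termination:
        # Only the iteration count matters; the residue is ignored.
        return reached_limit
    return reached_limit or residue < min_residue


print(should_stop(1000, 1.0))
print(should_stop(10, 1e-6))
print(should_stop(10, 1e-6, iteration_only_termination=True))
```

       Under this reading, combining '--iteration_only_termination' with '--max_iterations 0' would leave
       no stopping condition at all, so that combination is best avoided.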

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [unknown]
              Matrix that will store output recommendations.

       --output_model_file (-M) [unknown]
              Output for trained CF model.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant  papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-4.5.1                                     29 January 2025                                    mlpack_cf(1)