{"id":15902,"date":"2022-02-22T13:51:04","date_gmt":"2022-02-22T13:51:04","guid":{"rendered":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/"},"modified":"2022-02-22T13:51:04","modified_gmt":"2022-02-22T13:51:04","slug":"how-to-build-a-recommender-system-with-tf-idf-and-nmf-python","status":"publish","type":"post","link":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/","title":{"rendered":"How To Build A Recommender System With TF-IDF And NMF (Python)"},"content":{"rendered":"<div id=\"narrow-cont\"><p>Topic clusters and recommender systems can help SEO experts to build a scalable <a href=\"https:\/\/www.searchenginejournal.com\/site-structure-internal-linking-seo\/351576\/\">internal linking architecture<\/a>.<\/p>\n<p>And as we know, internal linking can impact both user experience and search rankings. 
It\u2019s an area we want to get right.<\/p>\n<p>In this article, we will use Wikipedia data to build topic clusters and recommender systems with Python and the Pandas data analysis tool.<\/p>\n<p>To achieve this, we will use the Scikit-learn library, a free software machine learning library for Python, with two main algorithms:<\/p>\n<!-- \/wp:post-content -->\n\n<!-- wp:list -->\n<ul><li><a href=\"https:\/\/www.searchenginejournal.com\/google-tf-idf\/304361\/\"><strong>TF-IDF<\/strong><\/a>: Term frequency-inverse document frequency.<\/li>&#13;\n<li><strong>NMF<\/strong>: Non-negative matrix factorization, which is a group of algorithms in multivariate analysis and linear algebra that can be used to analyze dimensional data.<\/li>&#13;\n<\/ul><!-- \/wp:list --><!-- wp:paragraph --><p>Specifically, we will:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol><li>Extract all of the links from a Wikipedia article.<\/li>&#13;\n<li>Read text from Wikipedia articles.<\/li>&#13;\n<li>Create a TF-IDF map.<\/li>&#13;\n<li>Split queries into clusters.<\/li>&#13;\n<li>Build a recommender system.<\/li>&#13;\n<\/ol><!-- \/wp:list --><!-- wp:paragraph --><p>Here is an example of topic clusters that you will be able to build:\u00a0<\/p>\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"example of a topic cluster in pandas\" width=\"760\" height=\"256\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/topic-cluster-61fa08c7e0873-sej-768x259.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" 
data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/topic-cluster-61fa08c7e0873-sej-768x259.png\" alt=\"example of a topic cluster in pandas\"\/><\/noscript><\/div>\n<!-- \/wp:paragraph -->\n\n<!-- wp:image {\"id\":10178,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:paragraph -->\n<p>Moreover, here\u2019s the overview of the recommender system that you can recreate.<\/p>\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"example of a recommender system in pandas\" width=\"760\" height=\"318\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/recommender-system-61fa091e399b4-sej-768x321.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/recommender-system-61fa091e399b4-sej-768x321.png\" alt=\"example of a recommender system in pandas\"\/><\/noscript><\/div>\n<!-- \/wp:paragraph -->\n\n<!-- wp:image {\"id\":10186,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:heading -->\n<p>Ready? 
Let\u2019s get a few definitions and concepts you\u2019ll want to know out of the way first.<\/p>\n<h2>The Difference Between Topic Clusters &amp; Recommender Systems<\/h2>\n<p>Topic clusters and recommender systems can be built in different ways.<\/p>\n<p>In this case, the former is grouped by IDF weights and the latter by cosine similarity.\u00a0<\/p>\n<p>In simple SEO terms:<\/p>\n<ul><li><strong>Topic clusters<\/strong> can help to create an architecture where all articles are linked to.<\/li>&#13;\n<li><strong>Recommender systems<\/strong> can help to create an architecture where the most relevant pages are linked to.<\/li>&#13;\n<\/ul><h3 id=\"what-is-tf-idf\">What Is TF-IDF?<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>TF-IDF, or term frequency-inverse document frequency, is a <span style=\"font-weight: 400;\">figure that expresses the statistical importance of any given word to the document collection as a whole.<\/span><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph \/-->\n\n<!-- wp:paragraph -->\n<p>TF-IDF is calculated by multiplying term frequency and inverse document frequency.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code -->\n<pre>TF-IDF = TF * IDF<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:list -->\n<ul><li><strong>TF<\/strong>: Number of times a word appears in a document\/number of words in the document.<\/li>&#13;\n<li><strong>IDF<\/strong>: log(Number of documents \/ Number of documents that contain the word).<\/li>&#13;\n<\/ul><!-- \/wp:list --><!-- wp:paragraph --><p>To illustrate this, let\u2019s consider this situation with <strong>Machine Learning<\/strong> as a target word:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul><li>Document A contains the target word 10 times out of 100 words.<\/li>&#13;\n<li>In the entire corpus, 30 documents out of 200 documents also contain the target word.<\/li>&#13;\n<\/ul><p>Then, the formula would be:<\/p>\n<!-- \/wp:list -->\n\n<!-- 
wp:syntaxhighlighter\/code -->\n<pre>TF-IDF = (10\/100) * log(200\/30) = 0.1 * 1.897 \u2248 0.19 (natural log)<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:heading -->\n<h3>What TF-IDF Is Not<\/h3>\n<p>TF-IDF is not something new. It\u2019s not something that you need to optimize for.<\/p>\n<p>According to <a href=\"https:\/\/www.searchenginejournal.com\/google-tf-idf\/304361\/\">John Mueller<\/a>, it\u2019s an old information retrieval concept that isn\u2019t worth focusing on for SEO.<\/p>\n<p>There is nothing in it that will help you outperform your competitors.<\/p>\n<p>Still, TF-IDF can be useful to SEOs.<\/p>\n<p>Learning how TF-IDF works gives insight into how a computer can interpret human language.<\/p>\n<p>Consequently, one can leverage that understanding to improve the relevancy of the content using similar techniques.<\/p>\n<h3 id=\"what-is-non-negative-matrix-factorization-nmf\">What Is Non-negative Matrix Factorization (NMF)?<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>Non-negative matrix factorization, or NMF, is a dimension reduction technique often used in unsupervised learning. It approximates a non-negative matrix as the product of two smaller non-negative matrices.<\/p>\n<p>In this article, NMF will be used to group all the articles under a fixed number of topics that we choose.<\/p>\n<h3>Definition Of Topic Clusters<\/h3>\n<p>Topic clusters are groupings of related terms that can help you create an architecture where all articles are interlinked or on the receiving end of internal links.<\/p>\n<h3>Definition Of Recommender Systems<\/h3>\n<p>Recommender systems can help to create an architecture where the most relevant pages are linked to.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:heading -->\n<h2>Building A Topic Cluster<\/h2>\n<p>Topic clusters and recommender systems can be built in different ways.<\/p>\n<p>In this case, topic clusters are grouped by TF-IDF weights (factorized with NMF) and the recommender system by cosine similarity.<\/p>\n<h3 
id=\"extract-all-the-links-from-a-specific-wikipedia-article\">Extract All The Links From A Specific Wikipedia Article<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>Extracting links on a Wikipedia page is done in two steps.<\/p>\n<p>First, select a specific subject. In this case, we use the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_learning\" target=\"_blank\" rel=\"noopener\">Wikipedia article on machine learning<\/a>.<\/p>\n<p>Second, use the Wikipedia API to find all the internal links on the article.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph \/-->\n\n<!-- wp:paragraph \/-->\n\n<!-- wp:paragraph -->\n<p>Here is how to query the Wikipedia API using the Python requests library.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre>import requests&#13;\n&#13;\nmain_subject=\"Machine learning\"&#13;\n&#13;\nurl=\"https:\/\/en.wikipedia.org\/w\/api.php\"&#13;\nparams = {&#13;\n        'action': 'query',&#13;\n        'format': 'json',&#13;\n        'generator':'links',&#13;\n        'titles': main_subject,&#13;\n        'prop':'pageprops',&#13;\n        'ppprop':'wikibase_item',&#13;\n        'gpllimit':1000,&#13;\n        'redirects':1&#13;\n        }&#13;\n&#13;\nr = requests.get(url, params=params)&#13;\nr_json = r.json()&#13;\nlinked_pages = r_json['query']['pages']&#13;\n&#13;\npage_titles = [p['title'] for p in linked_pages.values()]<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:image {\"id\":10161,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:paragraph -->\n<p>At last, the result is a list of all the pages linked from the initial article.<\/p>\n<dl><dt>&#13;\n<div style=\"width: 490px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"all the pages linked\" width=\"480\" height=\"251\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" 
data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/list-of-wikipedia-articles-61fa0f26d56f8-sej-480x251.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/list-of-wikipedia-articles-61fa0f26d56f8-sej-480x251.png\" alt=\"all the pages linked\"\/><\/noscript><\/div>&#13;\n<\/dt>&#13;\n<\/dl><p>These links represent each of the entities used for the topic clusters.<\/p>\n<h3 style=\"text-align: left;\">Select A Subset Of Articles<\/h3>\n<p>For performance purposes, we will select only the first 200 articles (including the main article on machine learning).<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre># select first X articles&#13;\nnum_articles = 200&#13;\npages = page_titles[:num_articles] &#13;\n&#13;\n# make sure to keep the main subject on the list&#13;\npages += [main_subject] &#13;\n&#13;\n# make sure there are no duplicates on the list&#13;\npages = list(set(pages))<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:heading -->\n<h3 id=\"read-text-from-wikipedia-articles\">Read Text From The Wikipedia Articles<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>Now, we need to extract the content of each article\u00a0to perform the calculations for the\u00a0 TF-IDF analysis.<\/p>\n<p>To do so, we will fetch the API again for each of the pages stored in the pages variable.<\/p>\n<p>From each response, we will store the text from the page and add it to a list called text_db.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Note that you may need to install tqdm and lxml packages to use them.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code 
{\"language\":\"python\"} -->\n<pre>import requests&#13;\nfrom lxml import html&#13;\nfrom tqdm.notebook import tqdm&#13;\n&#13;\ntext_db = []&#13;\nfor page in tqdm(pages):&#13;\n    # Fetch the rendered HTML of the page from the Wikipedia API&#13;\n    response = requests.get(&#13;\n            'https:\/\/en.wikipedia.org\/w\/api.php',&#13;\n            params={&#13;\n                'action': 'parse',&#13;\n                'page': page,&#13;\n                'format': 'json',&#13;\n                'prop':'text',&#13;\n                'redirects':''&#13;\n            }&#13;\n        ).json()&#13;\n&#13;\n    raw_html = response['parse']['text']['*']&#13;\n    document = html.document_fromstring(raw_html)&#13;\n    # Keep only the text found in paragraph tags&#13;\n    text=\"\"&#13;\n    for p in document.xpath('\/\/p'):&#13;\n        text += p.text_content()&#13;\n    text_db.append(text)&#13;\nprint('Done')<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>This loop produces a list in which each element represents the text of the corresponding Wikipedia page.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre>## Print number of articles&#13;\nprint('Number of articles extracted: ', len(text_db))<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>Output:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code -->\n<pre>Number of articles extracted:  201<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>As we can see, there are 201 articles.<\/p>\n<p>This is because we added the article on \u201cMachine learning\u201d in addition to the first 200 links from that page.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph \/-->\n\n<!-- wp:paragraph -->\n<p>Furthermore, we can select the first article (index 0) and read the first 300 characters to gain a better understanding.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre># read first 300 characters of 1st article&#13;\ntext_db[0][:300]<\/pre>\n<!-- 
\/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>Output:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code -->\n<pre>'\\nBiology is the  scientific study of life.[1][2][3] It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field.[1][2][3] For instance, all organisms are made up of  cells that process hereditary information encoded in genes, which can '<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:heading -->\n<h3 id=\"create-a-tf-idf-map\">Create A TF-IDF Map<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>In this section, we will rely on pandas and TfidfVectorizer to create a Dataframe that contains the bi-grams (two consecutive words) of each article.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Here, we are using TfidfVectorizer.<\/p>\n<p>This is the equivalent of using CountVectorizer followed by TfidfTransformer, which you may see in other tutorials.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>In addition, we need to remove the \u201cnoise\u201d. 
In the field of Natural Language Processing, words like \u201cthe\u201d, \u201ca\u201d, \u201cI\u201d, \u201cwe\u201d are called \u201cstopwords\u201d.<\/p>\n<p>In the English language, <a href=\"https:\/\/www.searchenginejournal.com\/google-bert-misinformation\/332931\/\">stopwords have low relevancy<\/a> for SEOs and are overrepresented in documents.<\/p>\n<p>Hence, using nltk, we will pass a list of English stopwords to the TfidfVectorizer class.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre>import pandas as pd&#13;\nfrom sklearn.feature_extraction.text import TfidfVectorizer&#13;\nfrom nltk.corpus import stopwords&#13;\n&#13;\n# Create a list of English stopwords&#13;\n# (run nltk.download('stopwords') once if the corpus is missing)&#13;\nstop_words = stopwords.words('english')&#13;\n&#13;\n# Instantiate the class&#13;\nvec = TfidfVectorizer(&#13;\n    stop_words=stop_words, &#13;\n    ngram_range=(2,2), # bigrams&#13;\n    use_idf=True&#13;\n    )&#13;\n&#13;\n# Train the model and transform the data&#13;\ntf_idf =  vec.fit_transform(text_db)&#13;\n&#13;\n# Create a pandas DataFrame&#13;\n# (get_feature_names_out() needs scikit-learn 1.0+; older versions use get_feature_names())&#13;\ndf = pd.DataFrame(&#13;\n    tf_idf.toarray(), &#13;\n    columns=vec.get_feature_names_out(), &#13;\n    index=pages&#13;\n    )&#13;\n&#13;\n# Show the first lines of the DataFrame  &#13;\ndf.head()<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:image {\"id\":10157,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:paragraph -->\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"tfidf pandas result\" width=\"760\" height=\"274\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/tfidf-map-61fa0a5a54b34-sej-768x277.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img 
decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/tfidf-map-61fa0a5a54b34-sej-768x277.png\" alt=\"tfidf pandas result\"\/><\/noscript><\/div>\n<p>In the DataFrame above:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul><li>Rows are the documents.<\/li>&#13;\n<li>Columns are the bi-grams (two consecutive words).<\/li>&#13;\n<li>The values are the TF-IDF weights of each bi-gram in each document.<\/li>&#13;\n<\/ul><div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"word frequencies\" width=\"760\" height=\"163\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/word-frequencies-61fa0a8e30c29-sej-768x165.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/word-frequencies-61fa0a8e30c29-sej-768x165.png\" alt=\"word frequencies\"\/><\/noscript><\/div>\n<h3>Sort The IDF Vectors<\/h3>\n<p>Below, we sort the inverse document frequency (IDF) weights in ascending order, so the most common bi-grams come first.<\/p>\n<!-- \/wp:list -->\n\n<!-- wp:image \/-->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre># One row per bi-gram, holding its IDF weight&#13;\n# (get_feature_names_out() needs scikit-learn 1.0+)&#13;\nidf_df = pd.DataFrame(&#13;\n    vec.idf_, &#13;\n    index=vec.get_feature_names_out(),&#13;\n    columns=['idf_weights']&#13;\n    )&#13;\n    &#13;\nidf_df.sort_values(by=['idf_weights']).head(10)<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:image {\"id\":10159,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:paragraph 
-->\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"idf weights\" width=\"760\" height=\"331\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/idf_weights-61fa0ab2c1816-sej-768x334.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/idf_weights-61fa0ab2c1816-sej-768x334.png\" alt=\"idf weights\"\/><\/noscript><\/div>\n<p>Specifically, the IDF weights are calculated from the natural log of the number of articles divided by the number of articles containing each bi-gram. By default, scikit-learn also smooths this ratio and adds 1 to the result, so its values differ slightly from the plain formula shown below.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>The greater the IDF, the rarer the bi-gram is and the more specific it is to the articles that contain it.<\/p>\n<p>The lower the IDF, the more common the bi-gram is across all articles.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul><li>Mentioned in 1 out of 1 article = log(1\/1) = 0.0<\/li>&#13;\n<li>Mentioned in 1 out of 2 articles = log(2\/1) = 0.69<\/li>&#13;\n<li>Mentioned in 1 out of 10 articles = log(10\/1) = 2.30<\/li>&#13;\n<li>Mentioned in 1 out of 100 articles = log(100\/1) = 4.61<\/li>&#13;\n<\/ul><!-- \/wp:list --><!-- wp:paragraph \/--><!-- wp:heading --><h3 id=\"split-queries-into-clusters-using-nmf\">Split Queries Into Clusters Using NMF<\/h3>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph -->\n<p>Using the tf_idf matrix, we will split queries into topical clusters.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Each cluster will contain closely related bi-grams.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Firstly, we will use NMF to reduce the dimensionality of the 
matrix into topics.<\/p>\n<p>Simply put, we will group 201 articles into 25 topics.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre>from sklearn.decomposition import NMF&#13;\nfrom sklearn.preprocessing import normalize&#13;\n&#13;\n# (optional) Disable FutureWarning of Scikit-learn&#13;\nfrom warnings import simplefilter&#13;\nsimplefilter(action='ignore', category=FutureWarning)&#13;\n&#13;\n# select number of topic clusters&#13;\nn_topics = 25&#13;\n&#13;\n# Create an NMF instance&#13;\nnmf = NMF(n_components=n_topics)&#13;\n&#13;\n# Fit the model to the tf_idf&#13;\nnmf_features = nmf.fit_transform(tf_idf)&#13;\n&#13;\n# normalize the features&#13;\nnorm_features = normalize(nmf_features)<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>Comparing the shapes shows that the number of bi-gram columns stays the same, while the 201 article rows are replaced by 25 topic rows.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre># Compare processed VS unprocessed dataframes&#13;\nprint('Original df: ', df.shape)&#13;\nprint('NMF Processed df: ', nmf.components_.shape)<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>Secondly, for each of the 25 clusters, we will collect the 25 highest-weighted bi-grams as query recommendations.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code -->\n<pre># Create a DataFrame from the NMF components (one row per topic)&#13;\ncomponents = pd.DataFrame(&#13;\n    nmf.components_, &#13;\n    columns=[df.columns]&#13;\n    ) &#13;\n&#13;\nclusters = {}&#13;\n&#13;\n# Show top 25 queries for each cluster&#13;\nfor i in range(len(components)):&#13;\n    clusters[i] = []&#13;\n    loop = dict(components.loc[i,:].nlargest(25)).items()&#13;\n    for k,v in loop:&#13;\n        clusters[i].append({'q':k[0],'sim_score': v})<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:image {\"id\":10178,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:heading 
-->\n<p>Thirdly, we will create a DataFrame that shows the recommendations.<\/p>\n<pre># Create dataframe using the clustered dictionary&#13;\ngrouping = pd.DataFrame(clusters).T&#13;\ngrouping['topic'] = grouping[0].apply(lambda x: x['q'])&#13;\ngrouping.drop(0, axis=1, inplace=True)&#13;\ngrouping.set_index('topic', inplace=True)&#13;\n&#13;\ndef show_queries(df):&#13;\n    for col in df.columns:&#13;\n        df[col] = df[col].apply(lambda x: x['q'])&#13;\n    return df&#13;\n&#13;\n# Only display the query in the dataframe&#13;\nclustered_queries = show_queries(grouping)&#13;\nclustered_queries.head()<\/pre>\n\n<p>Finally, the result is a DataFrame showing 25 topics along with the top bigrams for each topic.<\/p>\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"example of a topic cluster in pandas\" width=\"760\" height=\"256\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/topic-cluster-61fa08c7e0873-sej-768x259.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/topic-cluster-61fa08c7e0873-sej-768x259.png\" alt=\"example of a topic cluster in pandas\"\/><\/noscript><\/div>\n<h2 id=\"build-recommender-system\">Building A Recommender System<\/h2>\n<!-- \/wp:heading -->\n\n<!-- wp:paragraph \/-->\n\n<!-- wp:paragraph -->\n<p>Now, instead of building topic clusters, we will build a recommender system using the same normalized features from the previous step.<\/p>\n<p>The normalized features are stored in the norm_features 
variable.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre># Compute the cosine similarity between every pair of pages&#13;\ndata = {}&#13;\n# create dataframe (one L2-normalized row of topic features per page)&#13;\nnorm_df = pd.DataFrame(norm_features, index=pages)&#13;\nfor page in pages:&#13;\n    # select the feature vector of the current page&#13;\n    recommendations = norm_df.loc[page,:]&#13;\n&#13;\n    # Compute cosine similarity (a dot product, since rows are normalized)&#13;\n    similarities = norm_df.dot(recommendations)&#13;\n&#13;\n    data[page] = []&#13;\n    loop = dict(similarities.nlargest(20)).items()&#13;\n    for k, v in loop:&#13;\n        # skip the page itself&#13;\n        if k != page:&#13;\n            data[page].append({'q':k,'sim_score': v})<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:paragraph -->\n<p>What the code above does is:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul><li>Loops through each of the pages selected at the start.<\/li>&#13;\n<li>Selects the corresponding row in the normalized dataframe.<\/li>&#13;\n<li>Computes the cosine similarity between that page and every page in the dataframe.<\/li>&#13;\n<li>Keeps the 20 highest-scoring pages, sorted by similarity score, and drops the page itself.<\/li>&#13;\n<\/ul><p>After the execution, we are left with a dictionary of pages containing lists of recommendations sorted by similarity score.<\/p>\n<!-- \/wp:list -->\n\n<!-- wp:image {\"id\":10181,\"sizeSlug\":\"full\",\"linkDestination\":\"none\"} \/-->\n\n<!-- wp:paragraph -->\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"similarity score\" width=\"760\" height=\"318\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/sim-score-61fa0bfe2c1ba-sej-768x321.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" 
src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/sim-score-61fa0bfe2c1ba-sej-768x321.png\" alt=\"similarity score\"\/><\/noscript><\/div>\n<p>The next step is to convert that dictionary into a DataFrame.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:syntaxhighlighter\/code {\"language\":\"python\"} -->\n<pre># convert dictionary to dataframe&#13;\nrecommender = pd.DataFrame(data).T&#13;\n&#13;\ndef show_queries(df):&#13;\n    for col in df.columns:&#13;\n        df[col] = df[col].apply(lambda x: x['q'])&#13;\n    return df&#13;\n&#13;\nshow_queries(recommender).head()<\/pre>\n<!-- \/wp:syntaxhighlighter\/code -->\n\n<!-- wp:image {\"id\":10185,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} -->\n<p>The resulting DataFrame shows the parent query along with sorted recommended topics in each column.<\/p>\n<div style=\"width: 770px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" alt=\"example of a recommender system in pandas\" width=\"760\" height=\"318\" data-srcset=\"\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/recommender-system-61fa091e399b4-sej-768x321.png\" class=\" b-lazy pcimg\"\/><span class=\"wp-caption-text\"><em>Screenshot from Pandas, February 2022<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2022\/02\/recommender-system-61fa091e399b4-sej-768x321.png\" alt=\"example of a recommender system in pandas\"\/><\/noscript><\/div>\n<p>Voil\u00e0!<\/p>\n<p>We are done building our own recommender system and topic cluster.<\/p>\n<h2>Interesting Contributions From 
The SEO Community<\/h2>\n<p>I am a big fan of Daniel Heredia, who has also played around with TF-IDF by <a href=\"https:\/\/www.danielherediamejias.com\/find-your-main-relevant-words-with-tf-idf-and-python\/\" target=\"_blank\" rel=\"noopener\">finding relevant words with TF IDF, textblob, and Python<\/a>.<\/p>\n<p>Python tutorials can be daunting.<\/p>\n<p>A single article may not be enough.<\/p>\n<p>If that is the case, I encourage you to read <a href=\"https:\/\/www.holisticseo.digital\/python-seo\/tf-idf-analyse\/\" target=\"_blank\" rel=\"noopener\">Koray Tu\u011fberk G\u00dcB\u00dcR\u2019s tutorial<\/a>, which presents a similar way to use TF-IDF.<\/p>\n<p>Billy Bonaros also came up with a creative application of TF-IDF in Python and showed <a href=\"https:\/\/predictivehacks.com\/how-to-create-a-powerful-tf-idf-keyword-research-tool\/\" target=\"_blank\" rel=\"noopener\">how to create a TF-IDF keyword research tool<\/a>.<\/p>\n<h2>Conclusion<\/h2>\n<p>In the end, I hope you have learned an approach here that can be adapted to any website.<\/p>\n<p>Understanding how topic clusters and recommender systems can help improve a website\u2019s architecture is a valuable skill for any SEO pro wishing to scale their work.<\/p>\n<p>Using Python and Scikit-learn, you have learned how to build your own \u2013 and have learned the basics of TF-IDF and of non-negative matrix factorization in the process.<\/p>\n<hr\/><p><em>Featured Image: Kateryna Reka\/Shutterstock<\/em><\/p>\n<!-- \/wp:image -->\n\n<!-- wp:heading \/-->\n<\/div>\r\n<br><a href=\"https:\/\/www.searchenginejournal.com\/topic-clusters-recommender-system\/436123\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Topic clusters and recommender systems can help SEO experts to build a scalable internal linking architecture. 
And as we know, internal linking can impact both&#8230;<\/p>\n","protected":false},"author":1,"featured_media":15903,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-15902","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-universe"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How To Build A Recommender System With TF-IDF And NMF (Python) - mailinvest.blog<\/title>\n<meta name=\"description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How To Build A Recommender System With TF-IDF And NMF (Python) - mailinvest.blog\" \/>\n<meta property=\"og:description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/\" \/>\n<meta property=\"og:site_name\" content=\"mailinvest.blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/freelanceracademic\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-02-22T13:51:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/02\/wikipedia-recommender-system-61fbdea205b07-sej.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1600\" \/>\n\t<meta property=\"og:image:height\" content=\"840\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin@mailinvest.blog\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin@mailinvest.blog\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/\"},\"author\":{\"name\":\"admin@mailinvest.blog\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\"},\"headline\":\"How To Build A Recommender System With TF-IDF And NMF (Python)\",\"datePublished\":\"2022-02-22T13:51:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/\"},\"wordCount\":1511,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/wikipedia-recommender-system-61fbdea205b07-sej.png\",\"articleSection\":[\"Tech 
Universe\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/\",\"name\":\"How To Build A Recommender System With TF-IDF And NMF (Python) - mailinvest.blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/wikipedia-recommender-system-61fbdea205b07-sej.png\",\"datePublished\":\"2022-02-22T13:51:04+00:00\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/wikipedia-recommender-system-61fbdea205b07-sej.png\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/wikipedia-recommender-system-61fbdea205b07-sej.png\",\"width\":1600,\"height\":840},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2022\\\/02\\\/22\\\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mailinvest.blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How To Build A Recommender System With TF-IDF And NMF (Python)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"name\":\"mailinvest.blog\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mailinvest.blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\",\"name\":\"mailinvest\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"width\":1000,\"height\":1000,\"caption\":\"mailinvest\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/freelanceracademic\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\",\"name\":\"admin@mailinvest.blog\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"caption\":\"admin@mailinvest.blog\"},\"sameAs\":[\"https:\\\/\\\/mailinvest.blog\",\"admin@mailinvest.blog\"],\"url\":\"https:\\\/\\\/mailinvest.
blog\\\/index.php\\\/author\\\/adminmailinvest-blog\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How To Build A Recommender System With TF-IDF And NMF (Python) - mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/","og_locale":"en_US","og_type":"article","og_title":"How To Build A Recommender System With TF-IDF And NMF (Python) - mailinvest.blog","og_description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","og_url":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/","og_site_name":"mailinvest.blog","article_publisher":"https:\/\/www.facebook.com\/freelanceracademic\/","article_published_time":"2022-02-22T13:51:04+00:00","og_image":[{"width":1600,"height":840,"url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/02\/wikipedia-recommender-system-61fbdea205b07-sej.png","type":"image\/png"}],"author":"admin@mailinvest.blog","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin@mailinvest.blog","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#article","isPartOf":{"@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/"},"author":{"name":"admin@mailinvest.blog","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4"},"headline":"How To Build A Recommender System With TF-IDF And NMF (Python)","datePublished":"2022-02-22T13:51:04+00:00","mainEntityOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/"},"wordCount":1511,"commentCount":0,"publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/02\/wikipedia-recommender-system-61fbdea205b07-sej.png","articleSection":["Tech 
Universe"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/","url":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/","name":"How To Build A Recommender System With TF-IDF And NMF (Python) - mailinvest.blog","isPartOf":{"@id":"https:\/\/mailinvest.blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#primaryimage"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/02\/wikipedia-recommender-system-61fbdea205b07-sej.png","datePublished":"2022-02-22T13:51:04+00:00","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","breadcrumb":{"@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#primaryimage","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/02\/wikipedia-recommender-system-61fbdea205b07-sej.png","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/02\/wikipedia-recommender-system-61fbdea205b07-sej.png","width":1600,"height":840},{"@type":"BreadcrumbList","@id":"https:\/\/mailinvest.blog\/index.php\/2022\/02\/22\/how-to-build-a-recommender-system-with-tf-idf-and-nmf-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mailinvest.blog\/"},{"@type":"ListItem","position":2,"name":"How To Build A Recommender System With TF-IDF And NMF (Python)"}]},{"@type":"WebSite","@id":"https:\/\/mailinvest.blog\/#website","url":"https:\/\/mailinvest.blog\/","name":"mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mailinvest.blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/mailinvest.blog\/#organization","name":"mailinvest","url":"https:\/\/mailinvest.blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","width":1000,"height":1000,"caption":"mailinvest"},"image":{"@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/freelanceracademic\/"]},{"@type":"Person","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4","name":"admin@mailinvest.blog","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","caption":"admin@mailinvest.blog"},"sameAs":["https:\/\/mailinvest.blog","admin@mailinvest.blog"],"url":"https:\/\/mailinvest.blog\/index.php\/author\/adminmailinvest-blog\/"}]}},"_links":{"self":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/15902","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts"}],"abo
ut":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/comments?post=15902"}],"version-history":[{"count":0,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/15902\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media\/15903"}],"wp:attachment":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media?parent=15902"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/categories?post=15902"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/tags?post=15902"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}