{"id":40444,"date":"2023-02-10T12:44:19","date_gmt":"2023-02-10T12:44:19","guid":{"rendered":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/"},"modified":"2023-02-10T12:45:15","modified_gmt":"2023-02-10T12:45:15","slug":"google-bard-ai-what-sites-were-used-to-train-it","status":"publish","type":"post","link":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/","title":{"rendered":"Google Bard AI &#8211; What Sites Were Used To Train It?"},"content":{"rendered":"<div id=\"narrow-cont\">\n<p>Google\u2019s Bard is based on the LaMDA language model, which was trained on Infiniset, a dataset built from Internet content. Very little is known about where that data came from or how Google obtained it.<\/p>\n<p>The 2022 LaMDA research paper lists percentages of the different kinds of data used to train LaMDA, but only 12.5% comes from a public dataset of crawled content from the web and another 12.5% comes from Wikipedia.<\/p>\n<p>Google is purposely vague about where the rest of the scraped data comes from, but there are hints of what sites are in those datasets.<\/p>\n<h2>Google\u2019s Infiniset Dataset<\/h2>\n<p>Google Bard is based on a language model called LaMDA, which is an acronym for <em>Language Model for Dialogue Applications<\/em>.<\/p>\n<p>LaMDA was trained on a dataset called Infiniset.<\/p>\n<p>Infiniset is a blend of Internet content that was deliberately chosen to 
enhance the model\u2019s ability to engage in dialogue.<\/p>\n<p><strong>The LaMDA research paper (<a href=\"https:\/\/arxiv.org\/pdf\/2201.08239.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>) explains why they chose this composition of content:<\/strong><\/p>\n<blockquote>\n<p>\u201c\u2026this composition was chosen to achieve a more robust performance on dialog tasks \u2026while still keeping its ability to perform other tasks like code generation.<\/p>\n<p>As future work, we can study how the choice of this composition may affect the quality of some of the other NLP tasks performed by the model.\u201d<\/p>\n<\/blockquote>\n<p>The research paper refers to <em>dialog<\/em> and<em> dialogs<\/em>, which is the spelling used for these words in computer science.<\/p>\n<p>In total, LaMDA was pre-trained on 1.56 trillion words of \u201c<em>public dialog data and web text<\/em>.\u201d<\/p>\n<p><strong>The dataset is composed of the following mix:<\/strong><\/p>\n<ul>\n<li>12.5% C4-based data<\/li>\n<li>12.5% English language Wikipedia<\/li>\n<li>12.5% code documents from programming Q&amp;A websites, tutorials, and others<\/li>\n<li>6.25% English web documents<\/li>\n<li>6.25% Non-English web documents<\/li>\n<li>50% dialogs data from public forums<\/li>\n<\/ul>\n<p>The first two parts of Infiniset (C4 and Wikipedia) are composed of data that is known.<\/p>\n<p>The C4 dataset, which will be explored shortly, is a specially filtered version of the Common Crawl dataset.<\/p>\n<p>Only 25% of the data is from a named source (the <em>C4<\/em> dataset and <em>Wikipedia<\/em>).<\/p>\n<p>The rest of the data that makes up the bulk of the Infiniset dataset, 75%, consists of words that were scraped from the Internet.<\/p>\n<p>The research paper doesn\u2019t say how the 
data was obtained from websites, what websites it was obtained from or any other details about the scraped content.<\/p>\n<p>Google only uses generalized descriptions like \u201cNon-English web documents.\u201d<\/p>\n<p>The word \u201cmurky\u201d describes something that is not explained and is mostly concealed.<\/p>\n<p>Murky is the best word for describing the 75% of data that Google used for training LaMDA.<\/p>\n<p>There are some clues that <em>may give a general idea<\/em> of what sites are contained within the 75% of web content, but we can\u2019t know for certain.<\/p>\n<h2>C4 Dataset<\/h2>\n<p>C4 is a dataset developed by Google in 2020. C4 stands for \u201c<em>Colossal Clean Crawled Corpus<\/em>.\u201d<\/p>\n<p>This dataset is based on the Common Crawl data, which is an open-source dataset.<\/p>\n<h3>About Common Crawl<\/h3>\n<p><a href=\"https:\/\/commoncrawl.org\/\" target=\"_blank\" rel=\"noopener\">Common Crawl<\/a> is a registered non-profit organization that crawls the Internet on a monthly basis to create free datasets that anyone can use.<\/p>\n<p>The Common Crawl organization is currently run by people who have worked for the Wikimedia Foundation, former Googlers, and a founder of Blekko, and it counts as advisors people like Peter Norvig, Director of Research at Google, and Danny Sullivan (also of Google).<\/p>\n<h3>How C4 is Developed From Common Crawl<\/h3>\n<p>The raw Common Crawl data is cleaned up by removing things like thin content, obscene words, lorem ipsum, and navigational menus, and by deduplicating the text, 
in order to limit the dataset to the main content.<\/p>\n<p>The point of filtering out unnecessary data was to remove gibberish and retain examples of natural English.<\/p>\n<p><strong>This is what the researchers who created C4 wrote:<\/strong><\/p>\n<blockquote>\n<p>\u201cTo assemble our base data set, we downloaded the web extracted text from April 2019 and applied the aforementioned filtering.<\/p>\n<p>This produces a collection of text that is not only orders of magnitude larger than most data sets used for pre-training (about 750 GB) but also comprises reasonably clean and natural English text.<\/p>\n<p>We dub this data set the \u201cColossal Clean Crawled Corpus\u201d (or C4 for short) and release it as part of TensorFlow Datasets\u2026\u201d<\/p>\n<\/blockquote>\n<p>There are other unfiltered versions of C4 as well.<\/p>\n<p>The research paper that describes the C4 dataset is titled, <a href=\"https:\/\/arxiv.org\/pdf\/1910.10683.pdf\" target=\"_blank\" rel=\"noopener\">Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (PDF)<\/a>.<\/p>\n<p>Another research paper from 2021, (<a href=\"https:\/\/arxiv.org\/pdf\/2104.08758.pdf\" target=\"_blank\" rel=\"noopener\">Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus \u2013 PDF<\/a>) examined the make-up of the sites included in the C4 dataset.<\/p>\n<p>Interestingly, the second research paper discovered anomalies in the original C4 dataset that resulted in the removal of webpages that were Hispanic and African American aligned.<\/p>\n<p>Hispanic aligned webpages were removed by the blocklist filter (swear words, etc.) 
at the rate of 32% of pages.<\/p>\n<p>African American aligned webpages were removed at the rate of 42%.<\/p>\n<p>Presumably those shortcomings have been addressed\u2026<\/p>\n<p>Another finding was that 51.3% of the C4 dataset consisted of webpages that were hosted in the United States.<\/p>\n<p>Lastly, the 2021 analysis of the original C4 dataset acknowledges that the dataset represents just a fraction of the total Internet.<\/p>\n<p><strong>The analysis states:<\/strong><\/p>\n<blockquote>\n<p>\u201cOur analysis shows that while this dataset represents a significant fraction of a scrape of the public internet, it is by no means representative of English-speaking world, and it spans a wide range of years.<\/p>\n<p>When building a dataset from a scrape of the web, reporting the domains the text is scraped from is integral to understanding the dataset; the data collection process can lead to a significantly different distribution of internet domains than one would expect.\u201d<\/p>\n<\/blockquote>\n<p>The following statistics about the C4 dataset are from the second research paper that is linked above.<\/p>\n<p><strong>The top 25 websites (by number of tokens) in C4 
are:<\/strong><\/p>\n<ol>\n<li>patents.google.com<\/li>\n<li>en.wikipedia.org<\/li>\n<li>en.m.wikipedia.org<\/li>\n<li>www.nytimes.com<\/li>\n<li>www.latimes.com<\/li>\n<li>www.theguardian.com<\/li>\n<li>journals.plos.org<\/li>\n<li>www.forbes.com<\/li>\n<li>www.huffpost.com<\/li>\n<li>patents.com<\/li>\n<li>www.scribd.com<\/li>\n<li>www.washingtonpost.com<\/li>\n<li>www.fool.com<\/li>\n<li>ipfs.io<\/li>\n<li>www.frontiersin.org<\/li>\n<li>www.businessinsider.com<\/li>\n<li>www.chicagotribune.com<\/li>\n<li>www.booking.com<\/li>\n<li>www.theatlantic.com<\/li>\n<li>link.springer.com<\/li>\n<li>www.aljazeera.com<\/li>\n<li>www.kickstarter.com<\/li>\n<li>caselaw.findlaw.com<\/li>\n<li>www.ncbi.nlm.nih.gov<\/li>\n<li>www.npr.org<\/li>\n<\/ol>\n<p><strong>These are the 25 most represented top-level domains in the C4 dataset:<\/strong><\/p>\n<div id=\"attachment_478956\" style=\"width: 667px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-478956 size-full b-lazy pcimg\" alt=\"Google Bard AI &amp;#8211; What Sites Were Used To Train It?\" width=\"657\" height=\"612\" data-sizes=\"auto, (max-width: 657px) 100vw, 657px\" data-srcset=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2023\/02\/c4-top-level-domains-63e4d64f81295-sej.png 657w, https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2023\/02\/c4-top-level-domains-63e4d64f81295-sej-480x447.png 480w\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2023\/02\/c4-top-level-domains-63e4d64f81295-sej.png\"\/><span class=\"wp-caption-text\">Screenshot from<em> Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus<\/em><\/span><noscript><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" 
data-src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2023\/02\/c4-top-level-domains-63e4d64f81295-sej.png\" alt=\"Google Bard AI &amp;#8211; What Sites Were Used To Train It?\"\/><\/noscript><\/div>\n<p>If you\u2019re interested in learning more about the C4 dataset, I recommend reading <a href=\"https:\/\/arxiv.org\/pdf\/2104.08758.pdf\" target=\"_blank\" rel=\"noopener\">Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus (PDF)<\/a> as well as the original 2020 research paper (<a href=\"https:\/\/arxiv.org\/pdf\/1910.10683.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>) for which C4 was created.<\/p>\n<h2>What Could Dialogs Data from Public Forums Be?<\/h2>\n<p>50% of the training data comes from \u201c<em>dialogs data from public forums<\/em>.\u201d<\/p>\n<p>That\u2019s all that Google\u2019s LaMDA research paper says about this training data.<\/p>\n<p>If one were to guess, Reddit and other top communities like StackOverflow are safe bets.<\/p>\n<p>Reddit is used in many important datasets such as ones <a href=\"https:\/\/arxiv.org\/abs\/1910.10683\" target=\"_blank\" rel=\"noopener\">developed by OpenAI called WebText2 (PDF)<\/a>, an open-source approximation of WebText2 called OpenWebText2 and Google\u2019s own <a href=\"https:\/\/arxiv.org\/pdf\/1910.10683.pdf\" target=\"_blank\" rel=\"noopener\">WebText-like (PDF)<\/a> dataset from 2020.<\/p>\n<p>Google also published details of another dataset of public dialog sites a month before the publication of the LaMDA paper.<\/p>\n<p>This dataset that contains public dialog sites is called MassiveWeb.<\/p>\n<p>We\u2019re not speculating that the MassiveWeb dataset was used to train LaMDA.<\/p>\n<p>But it contains a good example of what Google chose for another language model that focused on dialogue.<\/p>\n<p>MassiveWeb was 
created by DeepMind, which is owned by Google.<\/p>\n<p>It was designed for use by a large language model called Gopher (<a href=\"https:\/\/arxiv.org\/pdf\/2112.11446.pdf\" target=\"_blank\" rel=\"noopener\">link to PDF of research paper<\/a>).<\/p>\n<p>MassiveWeb uses dialog web sources that go beyond Reddit in order to avoid creating a bias toward Reddit-influenced data.<\/p>\n<p>It still uses Reddit. But it also contains data scraped from many other sites.<\/p>\n<p><strong>Public dialog sites included in MassiveWeb are:<\/strong><\/p>\n<ul>\n<li>Reddit<\/li>\n<li>Facebook<\/li>\n<li>Quora<\/li>\n<li>YouTube<\/li>\n<li>Medium<\/li>\n<li>StackOverflow<\/li>\n<\/ul>\n<p>Again, this isn\u2019t suggesting that LaMDA was trained with the above sites.<\/p>\n<p>It\u2019s just meant to show what Google could have used, by showing a dataset Google was working on around the same time as LaMDA, one that contains forum-type sites.<\/p>\n<h2>The Remaining 37.5%<\/h2>\n<p><strong>The last group of data sources are:<\/strong><\/p>\n<ul>\n<li>12.5% code documents from sites related to programming like Q&amp;A sites, tutorials, etc;<\/li>\n<li>12.5% Wikipedia (English)<\/li>\n<li>6.25% English web documents<\/li>\n<li>6.25% Non-English web documents.<\/li>\n<\/ul>\n<p>Google does not specify what sites are in the <em>Programming Q&amp;A Sites<\/em> category that makes up 12.5% of the dataset that LaMDA trained on.<\/p>\n<p>So we can only speculate.<\/p>\n<p>Stack Overflow and Reddit seem like obvious choices, especially since they were included in the MassiveWeb dataset.<\/p>\n<p>What \u201c<em>tutorials<\/em>\u201d sites were crawled? 
We can only speculate what those \u201ctutorials\u201d sites may be.<\/p>\n<p>That leaves the final three categories of content, two of which are exceedingly vague.<\/p>\n<p>English language Wikipedia needs no discussion; we all know Wikipedia.<\/p>\n<p><strong>But the following two are not explained:<\/strong><\/p>\n<p><em>English<\/em> and<em> non-English<\/em> language web pages are a general description of 12.5% of the content included in the dataset.<\/p>\n<p>That\u2019s all the information Google gives about this part of the training data.<\/p>\n<h2>Should Google Be Transparent About Datasets Used for Bard?<\/h2>\n<p>Some publishers feel uncomfortable that their sites are used to train AI systems because, in their opinion, those systems could in the future make their websites obsolete and cause them to disappear.<\/p>\n<p>Whether that\u2019s true or not remains to be seen, but it is a genuine concern expressed by publishers and members of the search marketing community.<\/p>\n<p>Google is frustratingly vague about the websites used to train LaMDA as well as what technology was used to scrape the websites for data.<\/p>\n<p>As was seen in the analysis of the C4 dataset, the methodology of choosing which website content to use for training large language models can affect the quality of the language model by excluding certain populations.<\/p>\n<p>Should Google be more transparent about what sites are used to train their AI or at least publish an easy to find transparency report about the data that was used?<\/p>\n<p><em>Featured image by Shutterstock\/Asier Romero<\/em><\/p>\n<\/div>\n<br \/><a href=\"https:\/\/www.searchenginejournal.com\/google-bard-training-data\/478941\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google\u2019s Bard is based on the LaMDA language model, trained on datasets based on Internet content called Infiniset of which little&#8230;<\/p>\n","protected":false},"author":1,"featured_media":40445,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-40444","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-universe"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Google Bard AI - What Sites Were Used To Train It? 
- mailinvest.blog<\/title>\n<meta name=\"description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Google Bard AI - What Sites Were Used To Train It? - mailinvest.blog\" \/>\n<meta property=\"og:description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/\" \/>\n<meta property=\"og:site_name\" content=\"mailinvest.blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/freelanceracademic\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-02-10T12:44:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-10T12:45:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mailinvest.blog\/wp-content\/uploads\/2023\/02\/google-bard-63e60ced1e153-sej.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1600\" \/>\n\t<meta property=\"og:image:height\" content=\"840\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin@mailinvest.blog\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin@mailinvest.blog\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/\"},\"author\":{\"name\":\"admin@mailinvest.blog\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\"},\"headline\":\"Google Bard AI &#8211; What Sites Were Used To Train It?\",\"datePublished\":\"2023-02-10T12:44:19+00:00\",\"dateModified\":\"2023-02-10T12:45:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/\"},\"wordCount\":1807,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/google-bard-63e60ced1e153-sej.jpg\",\"articleSection\":[\"Tech Universe\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/\",\"name\":\"Google Bard AI - What Sites Were Used To Train It? 
- mailinvest.blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/google-bard-63e60ced1e153-sej.jpg\",\"datePublished\":\"2023-02-10T12:44:19+00:00\",\"dateModified\":\"2023-02-10T12:45:15+00:00\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to 
buy.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/google-bard-63e60ced1e153-sej.jpg\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/google-bard-63e60ced1e153-sej.jpg\",\"width\":1600,\"height\":840},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2023\\\/02\\\/10\\\/google-bard-ai-what-sites-were-used-to-train-it\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mailinvest.blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Google Bard AI &#8211; What Sites Were Used To Train It?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"name\":\"mailinvest.blog\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mailinvest.blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\",\"name\":\"mailinvest\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"width\":1000,\"height\":1000,\"caption\":\"mailinvest\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/freelanceracademic\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\",\"name\":\"admin@mailinvest.blog\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"caption\":\"admin@mailinvest.blog\"},\"sameAs\":[\"https:\\\/\\\/mailinvest.blog\",\"admin@mailinvest.blog\"],\"url\":\"https:\\\/\\\/mailinvest.
blog\\\/index.php\\\/author\\\/adminmailinvest-blog\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Google Bard AI - What Sites Were Used To Train It? - mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/","og_locale":"en_US","og_type":"article","og_title":"Google Bard AI - What Sites Were Used To Train It? - mailinvest.blog","og_description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","og_url":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/","og_site_name":"mailinvest.blog","article_publisher":"https:\/\/www.facebook.com\/freelanceracademic\/","article_published_time":"2023-02-10T12:44:19+00:00","article_modified_time":"2023-02-10T12:45:15+00:00","og_image":[{"width":1600,"height":840,"url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2023\/02\/google-bard-63e60ced1e153-sej.jpg","type":"image\/jpeg"}],"author":"admin@mailinvest.blog","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin@mailinvest.blog","Est. reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#article","isPartOf":{"@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/"},"author":{"name":"admin@mailinvest.blog","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4"},"headline":"Google Bard AI &#8211; What Sites Were Used To Train It?","datePublished":"2023-02-10T12:44:19+00:00","dateModified":"2023-02-10T12:45:15+00:00","mainEntityOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/"},"wordCount":1807,"commentCount":0,"publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2023\/02\/google-bard-63e60ced1e153-sej.jpg","articleSection":["Tech 
Universe"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/","url":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/","name":"Google Bard AI - What Sites Were Used To Train It? - mailinvest.blog","isPartOf":{"@id":"https:\/\/mailinvest.blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#primaryimage"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2023\/02\/google-bard-63e60ced1e153-sej.jpg","datePublished":"2023-02-10T12:44:19+00:00","dateModified":"2023-02-10T12:45:15+00:00","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","breadcrumb":{"@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#primaryimage","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2023\/02\/google-bard-63e60ced1e153-sej.jpg","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2023\/02\/google-bard-63e60ced1e153-sej.jpg","width":1600,"height":840},{"@type":"BreadcrumbList","@id":"https:\/\/mailinvest.blog\/index.php\/2023\/02\/10\/google-bard-ai-what-sites-were-used-to-train-it\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mailinvest.blog\/"},{"@type":"ListItem","position":2,"name":"Google Bard AI &#8211; What Sites Were Used To Train It?"}]},{"@type":"WebSite","@id":"https:\/\/mailinvest.blog\/#website","url":"https:\/\/mailinvest.blog\/","name":"mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mailinvest.blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/mailinvest.blog\/#organization","name":"mailinvest","url":"https:\/\/mailinvest.blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","width":1000,"height":1000,"caption":"mailinvest"},"image":{"@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/freelanceracademic\/"]},{"@type":"Person","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4","name":"admin@mailinvest.blog","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","caption":"admin@mailinvest.blog"},"sameAs":["https:\/\/mailinvest.blog","admin@mailinvest.blog"],"url":"https:\/\/mailinvest.blog\/index.php\/author\/adminmailinvest-blog\/"}]}},"_links":{"self":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/40444","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts"}],"abo
ut":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/comments?post=40444"}],"version-history":[{"count":1,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/40444\/revisions"}],"predecessor-version":[{"id":40446,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/40444\/revisions\/40446"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media\/40445"}],"wp:attachment":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media?parent=40444"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/categories?post=40444"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/tags?post=40444"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}