LiveWhale automatically generates a sitemap, which helps search engines like Google index fresh, relevant content as quickly as possible, and a robots.txt file, which points search engines to the sitemap and instructs them to exclude certain URLs.
Google will crawl your site automatically anyway, and it should find your sitemap through robots.txt, but you can help it along by submitting your sitemap in Google Webmaster Tools (now called Google Search Console). You’ll need to create an account and add a property for your website. You may also need to complete some verification steps to prove to Google that you administer the site.
Once that’s done, select your property and find Crawl > Sitemaps in the main menu. Click Add/Test Sitemap and enter https://myschool.edu/sitemap.livewhale.xml, replacing myschool.edu with your own domain. Once the sitemap is submitted, Google will use it when indexing your site over time.
Note: Google decides how quickly and how often to re-index your site. Although you can request re-indexing of specific pages, it may still take up to several months for some content to appear in Google. If you update your pages frequently with content that Google’s algorithm deems high-quality, you should have no trouble getting indexed, but it can sometimes take a while.
LiveWhale makes an effort to include your most up-to-date, relevant content in the sitemap.
/_ (e.g., /_sample/index.php)

In LiveWhale 2.7.0 and later, news stories published within the last two days receive additional markup in your sitemap XML file, in accordance with the Google News <news:news> specification.
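For reference, a Google News entry in a sitemap generally has the shape below, following Google’s News sitemap format: a standard <url> entry with a <news:news> child that names the publication and gives the story’s publication date and title. This is a sketch of the specification, not LiveWhale’s literal output; the story URL, publication name, date, and title are hypothetical placeholders, and the exact fields LiveWhale emits may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <!-- Hypothetical story URL -->
    <loc>https://myschool.edu/news/example-story</loc>
    <news:news>
      <!-- Which publication the story belongs to (hypothetical name) -->
      <news:publication>
        <news:name>My School News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <!-- Publication date in W3C date format (hypothetical) -->
      <news:publication_date>2024-05-01</news:publication_date>
      <news:title>Example Story Title</news:title>
    </news:news>
  </url>
</urlset>
```

The news namespace is declared on the root <urlset> element, so ordinary sitemap entries and news-tagged entries can live in the same file.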
If you have content outside of LiveWhale that you want to include in your sitemap, create a file at /sitemap.custom.xml in your main web root. If that file exists, LiveWhale will detect it and include it in the main sitemap the next time the sitemap is generated.
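A minimal /sitemap.custom.xml might look like the sketch below. It assumes LiveWhale accepts a complete file in the standard sitemaps.org format (a <urlset> of <url> entries); this page does not say whether a full <urlset> or only bare <url> entries are expected, so confirm that before relying on it. The URLs and date are hypothetical examples of content hosted outside LiveWhale.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical /sitemap.custom.xml, assuming the standard sitemaps.org format -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- A page served by another system on your domain (hypothetical) -->
    <loc>https://myschool.edu/library/catalog/</loc>
    <!-- Optional: date the page last changed -->
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <!-- Query strings must escape & as &amp; (hypothetical URL) -->
    <loc>https://myschool.edu/apply/?term=fall&amp;year=2025</loc>
  </url>
</urlset>
```

Under the sitemaps.org protocol, each <loc> must be a fully qualified URL, and <lastmod> is optional.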
The robots.txt file points search engines to your sitemap and tells web crawlers not to visit certain URLs.
Pages visible to “Anyone with the link” are not listed in robots.txt, so you don’t have to worry about anyone finding your secret URLs there. Those pages do still include a robots “noindex” meta tag, so search engines should receive the instruction not to index them.