Set Up Site Crawling

Regularly capturing and translating new content is essential to maintaining a fully translated website. Content capture is triggered by loading a page on a translated site running through the Global Delivery Network (GDN). This can happen through organic traffic from end users or through bots indexing the translated site.

Rather than relying on organic traffic to capture new content in a timely manner, set up a web crawler (spider) to browse each page automatically. This is especially helpful if you use a staging environment, where organic traffic is low, to capture and translate content before it is pushed to production.

The GDN Crawler

The GDN Crawler is Smartling's own web crawling tool. It can be configured and scheduled from your GDN project to capture content for translation. Job creation and authorization can also be automated in line with your Smartling account configuration.

Alternative Crawlers

If you do not already use the GDN Crawler, there are cloud-based solutions (like Apify) and browser-based extensions that you can use, depending on your preference. Each crawler may have its own features, but the core functionality is the same: you specify a domain, and a bot identifies all the hyperlinks on a page, stores them in a queue, and systematically opens or downloads each page while queuing any additional hyperlinks it finds.
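
The queue-based behavior described above can be sketched in a few lines of Python. This is only an illustration of the general technique using the standard library. It is not the GDN Crawler, Apify, or any Smartling API, and the names `crawl`, `fetch_html`, `LinkParser`, and `max_pages` are hypothetical. It assumes the translated pages are publicly reachable and that links appear as ordinary `<a href>` tags in the HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Load a page over HTTP; loading the page is the step that matters here."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl restricted to the host of start_url.

    Returns the list of pages that were loaded, in visit order.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])   # pages waiting to be loaded
    seen = {start_url}           # pages already queued, to avoid repeats
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load (network or HTTP errors)
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("https://fr.example.com/")`, where the domain is a placeholder for one of your translated sites, would load each linked page on that host once. Restricting the crawl to a single host keeps the bot from wandering onto external sites.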

It is only necessary to load one translated page (one language version) to trigger content capture for all languages tied to a given source domain.

Depending on your web crawler, you may have to deselect the Protected checkbox within your translated site configuration.

Web crawlers cannot capture content that appears only after a user interaction, such as submitting a form.
