Continuing the FMEA: Crawling
Let’s have a look at how we start a crawl.

Not enough free vCores to start a container.
Since we know this can, and most likely will, happen, we check before we even show the “Start Crawl” button. If not enough resources are available, we show the user a message saying that demand is currently high. If the resources run out in the short window between the click on the button and the assignment of resources, we fail gracefully and report it to the user.
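The two-step guard described above can be sketched roughly like this. It is a minimal illustration, not the actual implementation: `VCoreQuota`, `VCORES_PER_CRAWL`, `allocate` and `InsufficientCapacityError` are hypothetical names, and `allocate` stands in for the real container creation call. The point is that the pre-check only decides whether to show the button, while the allocation itself is still guarded because capacity can disappear in between.

```python
from dataclasses import dataclass


class InsufficientCapacityError(Exception):
    """Hypothetical error: vCores ran out between the check and the allocation."""


@dataclass
class VCoreQuota:
    limit: int   # vCores reserved for us (assumed value)
    in_use: int  # vCores currently allocated to running Container Instances

    def free(self) -> int:
        return self.limit - self.in_use


VCORES_PER_CRAWL = 2  # hypothetical size of one crawler container


def can_show_start_button(quota: VCoreQuota) -> bool:
    # Pre-check: only offer "Start Crawl" if a container would fit right now.
    return quota.free() >= VCORES_PER_CRAWL


def allocate(quota: VCoreQuota) -> None:
    # Stand-in for the real container creation; fails if capacity vanished.
    if quota.free() < VCORES_PER_CRAWL:
        raise InsufficientCapacityError("not enough free vCores")
    quota.in_use += VCORES_PER_CRAWL


def start_crawl(quota: VCoreQuota) -> str:
    # The pre-check can race with other users, so failures here are
    # reported to the user instead of crashing the request.
    try:
        allocate(quota)
        return "started"
    except InsufficientCapacityError:
        return "Demand is currently high, please try again later."
```

For example, with `VCoreQuota(limit=10, in_use=8)` the button is shown, but if another crawl grabs a vCore before the click is processed, `start_crawl` returns the high-demand message rather than raising.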
We’re monitoring the resource usage to know how often this occurs.
If it only happens every now and then, it’s not a big deal. If it happens regularly, we need to handle it. The suggested solutions are either to ask Microsoft for more vCores (vCores are the limiting resource) or to spread the Container Instances over different regions.
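Spreading over regions could look something like the sketch below, which also counts rejections so the monitoring can tell “every now and then” from “regularly”. The free-vCore figures, the region order and the `pick_region` helper are all hypothetical; in practice the numbers would come from Azure’s quota and usage reporting.

```python
from collections import Counter
from typing import Optional

VCORES_PER_CRAWL = 2  # hypothetical size of one crawler container

# Hypothetical free-vCore figures per region.
free_vcores = {"westeurope": 1, "northeurope": 4, "swedencentral": 6}

# How often no region could host a crawl; feeds the monitoring.
rejections: Counter = Counter()


def pick_region(free: dict[str, int], preferred: list[str]) -> Optional[str]:
    """Return the first preferred region with room for one crawler, else None."""
    for region in preferred:
        if free.get(region, 0) >= VCORES_PER_CRAWL:
            return region
    rejections["no_capacity"] += 1  # nowhere to run: record it
    return None
```

Here `pick_region(free_vcores, ["westeurope", "northeurope", "swedencentral"])` falls through the full primary region and returns `"northeurope"`. Keeping the preferred order explicit lets the closest region win whenever it has capacity.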
Lighthouse or Google Search Console not responding as expected.
Always expect external resources to fail. This application will not fail because Lighthouse or Google Search Console misbehaves; we simply finish the crawl without the results from the failing service.
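One way to express “finish the crawl without the failing service” is to wrap each external call so that any exception is logged and turned into an empty result. This is a sketch under assumptions: `optional_step` is a hypothetical helper, and the two fetch functions are stubs standing in for the real Lighthouse and Google Search Console integrations (one deliberately failing).

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("crawler")


def optional_step(name: str, call: Callable[[], T]) -> Optional[T]:
    """Run an external call; on any failure, log it and return None."""
    try:
        return call()
    except Exception:
        log.exception("%s failed; finishing crawl without its result", name)
        return None


def fetch_lighthouse_report() -> dict:
    # Hypothetical stub: simulates Lighthouse misbehaving.
    raise TimeoutError("Lighthouse did not respond")


def fetch_search_console_data() -> dict:
    # Hypothetical stub: simulates a successful Google Search Console call.
    return {"clicks": 120}


def finish_crawl(pages: list[str]) -> dict:
    # The crawl result is always produced; external data is best effort.
    return {
        "pages": pages,
        "lighthouse": optional_step("Lighthouse", fetch_lighthouse_report),
        "search_console": optional_step("Google Search Console", fetch_search_console_data),
    }
```

`finish_crawl(["/"])` returns the pages and the Search Console data, with `"lighthouse"` set to `None`, so the report can simply omit that section.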
The Container Instance exits unexpectedly.
The container is configured with a Retry Policy, so it starts over if it exits unexpectedly. The Crawler App is built to continue where it left off, based on the SessionId.
If it fails more than three times, the error is written to a database and the instance is disposed of to free its resources.
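The resume-and-give-up behaviour can be simulated in a few lines. This is only a model of the idea, not the Crawler App itself: `SessionStore` is a hypothetical in-memory stand-in for the database, `crash_plan` fakes crashes, and the `while` loop in `run_with_retries` plays the role of the platform restarting the container. Progress is checkpointed per SessionId after every page, so a restart skips work already done, and the fourth failure records the error and disposes of the instance.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_FAILURES = 3  # give up after failing "more than three times"


@dataclass
class SessionStore:
    """Hypothetical in-memory stand-in for the session database, keyed by SessionId."""
    progress: dict[str, int] = field(default_factory=dict)  # pages completed
    failures: dict[str, int] = field(default_factory=dict)  # exits so far
    errors: dict[str, str] = field(default_factory=dict)    # recorded final error


def crawl(session_id: str, urls: list[str], store: SessionStore,
          crash_plan: dict[int, int]) -> None:
    """Crawl from the last checkpoint; crash_plan maps page index -> crashes left."""
    start = store.progress.get(session_id, 0)  # continue where we left off
    for i in range(start, len(urls)):
        if crash_plan.get(i, 0) > 0:
            crash_plan[i] -= 1
            raise RuntimeError(f"crashed on {urls[i]}")
        store.progress[session_id] = i + 1  # checkpoint after every page


def on_unexpected_exit(session_id: str, error: Exception, store: SessionStore,
                       dispose: Callable[[str], None]) -> bool:
    """Return True if the container should be restarted."""
    store.failures[session_id] = store.failures.get(session_id, 0) + 1
    if store.failures[session_id] > MAX_FAILURES:
        store.errors[session_id] = str(error)  # write the error to the database
        dispose(session_id)                    # free the reserved vCores
        return False
    return True


def run_with_retries(session_id: str, urls: list[str], store: SessionStore,
                     crash_plan: dict[int, int],
                     dispose: Callable[[str], None]) -> bool:
    """Simulates the container being restarted after each unexpected exit."""
    while True:
        try:
            crawl(session_id, urls, store, crash_plan)
            return True
        except RuntimeError as err:
            if not on_unexpected_exit(session_id, err, store, dispose):
                return False
```

With two one-off crashes the session finishes after two restarts without re-crawling earlier pages; a page that keeps crashing ends in a recorded error and a disposed instance.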
The Container Instance application never finishes.
To be fair, the developer of the Crawler App warned us about this. Under some conditions the program stops doing anything but keeps running. This gives us neither a clear signal that something went wrong nor a signal that it is done. With a limited number of reserved vCores this is very problematic, since the vCores stay reserved but unused (the cost itself is negligible; the blocked capacity is the problem).
We detect this with a scheduled Container Management Azure Function that checks each running Container Instance against the Last Updated timestamp of its Session in the database.
A Container Instance that has been running for a predefined amount of time without updating the Last Updated timestamp is shut down, which triggers the unexpected exit handler.
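The core check of that scheduled function might look like the following sketch. The 30-minute threshold, `RunningContainer`, `find_stalled` and the `stop` callback are hypothetical; in the real setup the list of running instances would come from Azure, the timestamps from the Session table, and the function would run on a timer trigger. Treating a session with no timestamp at all as stalled is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

STALE_AFTER = timedelta(minutes=30)  # hypothetical "predefined amount of time"


@dataclass
class RunningContainer:
    name: str        # Container Instance name
    session_id: str  # Session the crawler is working on


def find_stalled(running: list[RunningContainer],
                 last_updated: dict[str, datetime],  # SessionId -> Last Updated
                 now: datetime,
                 stale_after: timedelta = STALE_AFTER) -> list[str]:
    """Return names of containers whose session has not been updated recently."""
    stalled = []
    for container in running:
        updated = last_updated.get(container.session_id)
        # Assumption: a session with no timestamp at all counts as stalled.
        if updated is None or now - updated > stale_after:
            stalled.append(container.name)
    return stalled


def check_containers(running: list[RunningContainer],
                     last_updated: dict[str, datetime],
                     now: datetime,
                     stop: Callable[[str], None]) -> list[str]:
    """One scheduled run: shut down every stalled container."""
    stalled = find_stalled(running, last_updated, now)
    for name in stalled:
        stop(name)  # the shutdown then goes through the unexpected exit handling
    return stalled
```

Passing `now` in explicitly, instead of reading the clock inside, keeps the check deterministic and easy to test.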