Failure Mode and Effects Analysis (step 1)

Microsoft Entra ID

Failure Modes:

  1. Microsoft Entra ID isn’t available or can’t be reached due to a network problem. Redirection to the authentication endpoint fails.

    System Impact: The user can’t log in.
    Detection: The authentication middleware catches the error.
    Higher Level Impact: The system is unusable for the user.
    Mitigation: For now we rely on Microsoft’s availability. In the future we’ll add login by email as a backup, which will also serve users without a Microsoft or Google account.
  2. The user can’t log in.

    System Impact: The user can’t log in.
    Detection: Microsoft Entra ID handles this.
    Higher Level Impact: The system is unusable for the user.
    Mitigation: In the future, the backup login by email will be an option.

Google Authentication

Failure Modes:

  1. Google Auth isn’t available or can’t be reached due to a network problem. Redirection to the authentication endpoint fails.

    System Impact: The user can’t log in.
    Detection: The authentication middleware catches the error.
    Higher Level Impact: The system is unusable for the user.
    Mitigation: For now we rely on Google’s availability. In the future we’ll add login by email as a backup, which will also serve users without a Microsoft or Google account.
  2. The user can’t log in.

    System Impact: The user can’t log in.
    Detection: Google Auth handles this.
    Higher Level Impact: The system is unusable for the user.
    Mitigation: In the future, the backup login by email will be an option.
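
Both identity-provider sections separate two cases: the provider can't be reached (failure mode 1) and the sign-in itself fails (failure mode 2). A minimal Python sketch of how authentication middleware might tell the two apart and return a user-facing message is shown here. The exception classes, LoginResult, and handle_login are hypothetical names chosen for illustration; they are not part of any Microsoft or Google SDK, and the message texts are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class LoginFailure(Enum):
    PROVIDER_UNREACHABLE = "provider_unreachable"  # failure mode 1
    USER_REJECTED = "user_rejected"                # failure mode 2


class ProviderUnreachableError(Exception):
    """Hypothetical: the redirect to the authentication endpoint failed."""


class UserRejectedError(Exception):
    """Hypothetical: the provider was reachable but the sign-in failed."""


@dataclass
class LoginResult:
    ok: bool
    failure: Optional[LoginFailure] = None
    message: str = ""


def handle_login(start_login: Callable[[], None]) -> LoginResult:
    """Run the login step and map known errors to a user-facing result."""
    try:
        start_login()
    except ProviderUnreachableError:
        return LoginResult(
            ok=False,
            failure=LoginFailure.PROVIDER_UNREACHABLE,
            message="The sign-in service can't be reached. Please try again later.",
        )
    except UserRejectedError:
        return LoginResult(
            ok=False,
            failure=LoginFailure.USER_REJECTED,
            message="Sign-in failed. Please check your account and try again.",
        )
    return LoginResult(ok=True)
```

Keeping the two failure kinds distinct would also make it easier to offer the planned email login only where it helps.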

Crawl Container Instance(s)

Failure Modes:

  1. The Container Instance exits unexpectedly.

    System Impact: The crawl session isn’t finished.
    Detection: Azure’s retry mechanism. If all retries fail, our scheduled Container Management Function detects the container with a non-zero exit code.
    Higher Level Impact: Very inconvenient, since a crawl may take hours.
    Mitigation: Automatic retry on error. After 3 unsuccessful attempts, alert the support/dev team and display an error in the UI.
  2. The Container Instance application never finishes.

    System Impact: The crawl session isn’t finished.
    Detection: The scheduled Container Management Function detects a running container session with no results logged for a set time.
    Higher Level Impact: Very inconvenient, since a crawl may take hours.
    Mitigation: Automatic retry on error. After 3 unsuccessful attempts, alert the support/dev team and display an error in the UI.
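
The detection and mitigation rules for the crawl containers can be sketched as one decision function that a scheduled Container Management Function might run for each container. This is a sketch under stated assumptions, not the actual implementation: the CrawlContainer fields, the decide_action name, and the 30-minute stall limit are all hypothetical, and no Azure API is called here. Only the limit of 3 attempts comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

MAX_ATTEMPTS = 3                     # from the mitigation described above
STALL_LIMIT = timedelta(minutes=30)  # assumed value for the "set time"


class Action(Enum):
    NONE = "none"    # container is healthy or finished successfully
    RETRY = "retry"  # start the crawl again
    ALERT = "alert"  # notify support/dev team and show an error in the UI


@dataclass
class CrawlContainer:
    name: str
    running: bool
    exit_code: Optional[int]   # None while the container is still running
    last_result_at: datetime   # when the crawl last logged a result
    attempts: int              # attempts made so far, including this one


def decide_action(container: CrawlContainer, now: datetime) -> Action:
    """Classify one container according to the two failure modes."""
    if container.running:
        # Failure mode 2: still running but no results for too long.
        failed = now - container.last_result_at > STALL_LIMIT
    else:
        # Failure mode 1: stopped with a non-zero exit code.
        # A stopped container without an exit code is treated as failed.
        failed = container.exit_code != 0

    if not failed:
        return Action.NONE
    if container.attempts < MAX_ATTEMPTS:
        return Action.RETRY
    return Action.ALERT
```

For example, a container that crashed on its first attempt yields Action.RETRY, while the same crash on the third attempt yields Action.ALERT.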

Report Generator Container Instance(s)

Failure Modes:

  1. The API (the app inside the container) is unresponsive.

    System Impact: Reports cannot be generated.
    Detection: Client-side error handling.
    Higher Level Impact:
    Mitigation: Display an error in the UI so the user can try again. Log the incident, and figure out a way to restart the container instance.
  2. The Container Instance exits unexpectedly.

    System Impact: Reports cannot be generated.
    Detection: Azure’s retry mechanism.
    Higher Level Impact:
    Mitigation: Automatic retry on error. After 3 unsuccessful attempts, alert the support/dev team and display an error in the UI so the user can try again.
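
The client-side error handling for an unresponsive report API could look roughly like the following. It is written in Python purely to illustrate the logic; the real client may well be browser code. The request_report name, the 10-second timeout, and the error text are assumptions, and the opener parameter exists only so the function can be exercised without a network.

```python
import urllib.request

REQUEST_TIMEOUT_SECONDS = 10  # assumed; tune to how long a report may take


def request_report(url: str, opener=urllib.request.urlopen) -> dict:
    """Call the report API and turn connection problems into a UI error."""
    try:
        with opener(url, timeout=REQUEST_TIMEOUT_SECONDS) as response:
            return {"ok": True, "body": response.read()}
    except OSError as exc:
        # urllib.error.URLError, socket timeouts, and connection errors
        # are all subclasses of OSError.
        return {
            "ok": False,
            "error": "The report could not be generated. Please try again.",
            "detail": str(exc),  # for logging, not for display
        }
```

The detail field is meant for the log entry mentioned in the mitigation, while the error field is what the UI would show.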

About Stefan Bergfeldt

I go by the name Ordbajsarn, but my real name is Stefan Bergfeldt. I was born at Falu lasarett in August 1978 and grew up in Hedemora. Web developer, search engine optimizer, entrepreneur, and guitarist are other things you can call me, if Ordbajsarn doesn’t fit. I run the consulting firm CRS Webbproduktion and specialize in building cost-effective web solutions for small and medium-sized businesses.