ISSUE: SDDC Manager UI never loads because the known_hosts file is corrupted ("KNOWN_HOST_RETRIEVAL_FAILURE")

Symptoms:

/var/log/vmware/vcf/sddc-manager-ui-app/sddcManagerServer.log:

====================
Object.maximumInitializationPscError (/opt/vmware/vcf/sddc-manager-ui-app/server/src/errors/VCFError.js:100:5)\n at attemptPSCInitWithRetry (/opt/vmware/vcf/sddc-manager-ui-app/server/src/services/pscUtils.js:117:38)\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:97:5) {\n jse_shortmsg: 'Maximum number of PSC Initilization attempts reached. Canceling',\n jse_cause: VCFError [VError]: Failed to initiate PSC: Primary psc init failed and failover psc init also failed: Unable to retrieve iDP Metadata: 500 - \"\\\"Failed to establish SSH session to vcs01..\\\"\"\n at Object.initiatePscError (/opt/vmware/vcf/sddc-manager-ui-app/server/src/errors/VCFError.js:100:5)\n at attemptPSCInit (/opt/vmware/vcf/sddc-manager-ui-app/server/src/services/pscUtils.js:76:26)\n at runMicrotasks (<anonymous>)\n at processTicks

====================

Workaround/Solution:

The root cause was that /etc/vmware/vcf/commonsvcs/known_hosts had become corrupted. The SDDC Manager APIs use this file when connecting to other components.
Because the file was unusable, API connections from SDDC Manager to the management vCenter Server failed, which in turn prevented the UI from loading.

  1. Take a snapshot of the SDDC Manager VM through the vCenter UI.
  2. SSH to SDDC Manager as the vcf user, then switch to root.
  3. Take a backup of both known_hosts files.
  4. Run the two commands below on SDDC Manager to create the backups:

cp -rf /home/vcf/.ssh/known_hosts /home/vcf/.ssh/known_hosts.BACKUP

cp -rf /etc/vmware/vcf/commonsvcs/known_hosts /etc/vmware/vcf/commonsvcs/known_hosts.BACKUP

5. Next, check whether entries are missing from the known_hosts file on SDDC Manager or whether the file is corrupted, by running the command below:

root@sddcmgr [ ~ ]# curl localhost/appliancemanager/ssh/knownHosts | json_pp

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   423    0   423    0     0   5788      0 --:--:-- --:--:-- --:--:--  5875
{
   "causes" : [
      {
         "message" : "com.jcraft.jsch.JSchException: fromBase64: invalid base64 data",
         "type" : "java.lang.RuntimeException"
      },
      {
         "message" : "fromBase64: invalid base64 data",
         "type" : "com.jcraft.jsch.JSchException"
      },
      {
         "type" : "java.lang.ArrayIndexOutOfBoundsException"
      }
   ],
   "message" : "Not able to retrieve known hosts details. Please check the logs.",
   "arguments" : [],
   "errorCode" : "KNOWN_HOST_RETRIEVAL_FAILURE",
   "referenceToken" : "CND804"
}
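
The "fromBase64: invalid base64 data" cause in the output above suggests that at least one entry in the file has a key field that cannot be decoded. A minimal sketch for locating such lines follows, assuming the usual OpenSSH-style layout (host, key type, base64 key, optional comment). The function name check_known_hosts is hypothetical and is not part of SDDC Manager.

```shell
# check_known_hosts FILE
# Hypothetical helper: reports entries whose key field is missing or is not
# valid base64. Assumes "host keytype base64key [comment]" per line, with an
# optional leading @marker field.
check_known_hosts() {
    file="$1"
    lineno=0
    bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        # Skip empty lines and comment lines.
        case "$line" in
            ''|'#'*) continue ;;
        esac
        # The key is field 3, or field 4 when the line starts with a marker.
        key=$(printf '%s\n' "$line" | awk '{ if ($1 ~ /^@/) print $4; else print $3 }')
        if [ -z "$key" ] || ! printf '%s' "$key" | base64 -d >/dev/null 2>&1; then
            echo "line $lineno: key field is missing or not valid base64"
            bad=1
        fi
    done < "$file"
    # Exit status 0 means no problems were found, 1 means at least one bad line.
    return "$bad"
}
```

For example, running check_known_hosts /etc/vmware/vcf/commonsvcs/known_hosts prints one line per suspect entry. It only inspects the file and does not modify it.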

6. The /home/vcf/.ssh/known_hosts file still contained all the vCenter entries, so we replaced the contents of the following file with the contents of /home/vcf/.ssh/known_hosts:

/etc/vmware/vcf/commonsvcs/known_hosts
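
One way to carry out this replacement is sketched below. It overwrites the destination through shell redirection rather than with cp or mv, so the existing destination file keeps its owner and permissions. Whether the services require particular ownership on this file is not stated here, so treat that as a precaution. The function name restore_known_hosts is hypothetical.

```shell
# restore_known_hosts SRC DST
# Hypothetical helper: overwrite DST with the contents of SRC.
restore_known_hosts() {
    src="$1"
    dst="$2"
    # Refuse to continue if the source is missing or empty.
    if [ ! -s "$src" ]; then
        echo "source $src is missing or empty" >&2
        return 1
    fi
    # Redirection rewrites the existing file in place, keeping its owner and mode.
    cat "$src" > "$dst"
}
```

With the paths from this article, the call would be:

restore_known_hosts /home/vcf/.ssh/known_hosts /etc/vmware/vcf/commonsvcs/known_hosts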

7. Run the command below again and check the API output:

curl localhost/appliancemanager/ssh/knownHosts

{
         "host" : "192.168.1.221",
         "keyType" : "ssh-rsa",
         "key" : XXXX
}

8. Reboot the SDDC Manager; the UI should then load.
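
The check in step 7 can also be scripted so that it is easy to repeat after the reboot. The sketch below only looks for the errorCode field seen in the failing response from step 5. The full shape of a successful response is not shown in this article, so this is a heuristic rather than a complete validation. The function name known_hosts_api_ok is hypothetical.

```shell
# known_hosts_api_ok
# Hypothetical helper: reads the API response on stdin and succeeds only if
# it is non-empty and carries no "errorCode" field.
known_hosts_api_ok() {
    response=$(cat)
    [ -n "$response" ] || return 1
    case "$response" in
        *'"errorCode"'*) return 1 ;;
    esac
    return 0
}
```

A possible invocation on the SDDC Manager, using the same endpoint as above:

curl -s localhost/appliancemanager/ssh/knownHosts | known_hosts_api_ok && echo "known hosts retrieved"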