VCSA 6.5 – “Failed to start vpxd-svcs, vapi-endpoint services. Error: Operation timed out” after Cert Renewal


We have a single on-premises VCSA 6.5 instance that recently ran into the certificate expiration detailed in this KB:

https://kb.vmware.com/s/article/76719

 

All the certificates have been regenerated with the certificate-tool via the CLI, and they now show as current when checked with the one-liner from the KB above (all of them had expired a week earlier):

STORE MACHINE_SSL_CERT

Alias : __MACHINE_CERT

            Not After : Aug 18 19:56:50 2022 GMT

STORE TRUSTED_ROOTS

Alias : 9bd7b30bcb1dcecfe2491a3e91fcd3dd756f347f

            Not After : Aug  1 13:58:01 2028 GMT

Alias : c0af9d76ae9fab214298c6b11d4efb72f64b6c13

            Not After : Aug 13 18:18:55 2030 GMT

Alias : ac50bb369ff7dce7e8c372b9b3e50f6e3aaaa528

            Not After : Aug 13 18:20:03 2030 GMT

Alias : 3e816060d6322a45114eac30798edbf1a4a1397d

            Not After : Aug 13 18:28:26 2030 GMT

Alias : 074ddc83baeea4c6588f3f11837ed4fc77b25220

            Not After : Aug 13 19:21:38 2030 GMT

Alias : 4bbaf83d23a818f2e8122b60ca0edc6dabf76d7d

            Not After : Aug 13 19:33:49 2030 GMT

STORE TRUSTED_ROOT_CRLS

Alias : a45f284d7b9325005381b1b14d3ac3c823e104c9

Alias : 4b3b32cf9bb0d212aa6551bdd97dd3aaf029dde5

Alias : 02c60981250d68d94e1fcd31c93d0c50ae26d531

Alias : c4df908ec94dc3b1b774ca4a8768acfdbee90e59

Alias : f65b7ab274c5d949e8e914101797260d9e40fd70

Alias : 84d8635a51db3a011bab257873555c6776381d37

STORE machine

Alias : machine

            Not After : Aug 18 19:12:42 2022 GMT

STORE vsphere-webclient

Alias : vsphere-webclient

            Not After : Aug 18 19:12:43 2022 GMT

STORE vpxd

Alias : vpxd

            Not After : Aug 18 19:12:43 2022 GMT

STORE vpxd-extension

Alias : vpxd-extension

            Not After : Aug 18 19:12:44 2022 GMT

STORE SMS

Alias : sms_self_signed

            Not After : Aug  7 14:06:21 2028 GMT

STORE BACKUP_STORE

Alias : bkp___MACHINE_CERT

            Not After : Aug 18 19:11:39 2022 GMT

Alias : bkp_machine

            Not After : Aug 18 19:12:42 2022 GMT

Alias : bkp_vsphere-webclient

            Not After : Aug 18 19:12:43 2022 GMT

Alias : bkp_vpxd

            Not After : Aug 18 19:12:43 2022 GMT

Alias : bkp_vpxd-extension

            Not After : Aug 18 19:12:44 2022 GMT
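
To make the expiry dates in a listing like the one above easier to compare, a minimal Python sketch along these lines could parse it and report the days remaining per entry. It assumes the listing has been saved as text in exactly the layout shown (STORE / Alias / Not After lines). The function names parse_cert_listing and days_left are made up for illustration and are not part of any VMware tooling.

```python
from datetime import datetime, timezone


def parse_cert_listing(text):
    """Return (store, alias, not_after) tuples from a STORE/Alias/Not After listing.

    Entries without a "Not After" line (for example CRL aliases) are skipped.
    """
    entries = []
    store = alias = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("STORE "):
            store = line[len("STORE "):].strip()
        elif line.startswith("Alias :"):
            alias = line.split(":", 1)[1].strip()
        elif line.startswith("Not After :"):
            stamp = line.split(":", 1)[1].strip()
            # Collapse the padded day, e.g. "Aug  1 13:58:01 2028 GMT".
            stamp = " ".join(stamp.split())
            when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
            entries.append((store, alias, when.replace(tzinfo=timezone.utc)))
    return entries


def days_left(entries, now):
    """Return (store, alias, whole days until expiry) relative to `now` (UTC)."""
    return [(store, alias, (when - now).days) for store, alias, when in entries]
```

Fed the vpxd entry above with a reference time of 2020-08-18T21:10:50Z (the timestamp in the service-control output further down), days_left reports 729 days remaining, so an expired certificate in these stores would show up as a negative number.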

 

When I try to start all services now, service-control returns the following after ~5 minutes:

 

Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start vpxd-svcs, vapi-endpoint services. Error: Operation timed out

 

When using service-control to start just the vpxd-svcs service by itself, it returns the following error:

 

Perform start operation. vmon_profile=None, svc_names=['vmware-vpxd-svcs'], include_coreossvcs=False, include_leafossvcs=False

2020-08-18T21:10:50.484Z   Service vpxd-svcs state STOPPED

Error executing start on service vpxd-svcs. Details {
    "resolution": null,
    "detail": [
        {
            "args": [
                "vpxd-svcs"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'vpxd-svcs'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}

Service-control failed. Error {
    "resolution": null,
    "detail": [
        {
            "args": [
                "vpxd-svcs"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'vpxd-svcs'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}
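
For anyone wanting to pull the relevant fields out of output like this, here is a rough Python sketch that extracts the message id and args from each JSON object embedded in the text. It assumes the output was captured straight from the terminal with plain ASCII quotes (typographic quotes introduced by a web page will not parse), and extract_failures is a made-up helper name, not something service-control provides.

```python
import json


def extract_failures(output):
    """Return (id, args) pairs from every JSON object embedded in `output`."""
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        start = output.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value and reports where it ended.
            obj, end = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        for item in obj.get("detail") or []:
            found.append((item.get("id"), tuple(item.get("args") or [])))
        pos = end
    return found
```

Applied to the output above, both objects yield the same generic install.ciscommon.service.failstart id with vpxd-svcs as the only argument, which is why this output alone does not say what actually prevents the service from starting.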

 

The web UI returns the following 503 error (which it has been returning since the certs expired):

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000056033c080640] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

 

Can anyone point me to the specific log files I should be looking at to diagnose this and work out what is keeping the service from starting? I’ve already ruled out the following:

  • It’s not a disk space / log rotation issue
  • It’s not the PostgreSQL DB (for which I found a few threads, but it’s starting properly in our instance)
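
For reference, the disk-space side of that can be re-checked with a small Python sketch like the one below. The helper names usage_percent and over_threshold are illustrative, the 90% limit is an arbitrary assumption, and the mount points to pass in (for example / and /storage/log on an appliance) are assumptions to adjust to the actual layout.

```python
import shutil


def usage_percent(path):
    """Return the percentage of space used on the filesystem holding `path`."""
    total, used, _free = shutil.disk_usage(path)
    if total == 0:
        return 0.0  # pseudo-filesystems can report a zero size
    return 100.0 * used / total


def over_threshold(paths, limit=90.0):
    """Return {path: percent used} for paths at or above `limit` percent."""
    report = {}
    for path in paths:
        try:
            pct = usage_percent(path)
        except OSError:
            continue  # mount point does not exist on this system
        if pct >= limit:
            report[path] = round(pct, 1)
    return report
```

An empty result from over_threshold(["/", "/storage/log"]) would be consistent with the first point above. It says nothing about log rotation itself.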

 

Our last resort is to simply wipe and reinstall VCSA, but I’d like to avoid that if this can be fixed.
