Upgrading Cloud Director to version 10.2.2 should fix the crashing console proxy service, a problem that affects all versions from 10.1 through 10.2.1. Until you can upgrade, there is a workaround: set consoleproxy.cores.max to zero. You can do this either in the configuration of the Cloud Director cell (by adding the line consoleproxy.cores.max = 0) or with the command:
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n consoleproxy.cores.max -v "0"
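If you prefer the configuration file route, the line can be added or updated without duplicating it. The following is only a sketch: `set_property` is a hypothetical helper, not a Cloud Director tool, and it assumes the cell configuration is a plain key = value properties file such as /opt/vmware/vcloud-director/etc/global.properties. Try it on a copy of the file first.

```shell
# Hypothetical helper: set "key = value" in a properties file.
# Updates the key if it is already present, appends it otherwise.
set_property() {
    local file key value tmp rc
    file="$1"; key="$2"; value="$3"
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "=" }
        {
            # Compare the trimmed text before the first "=" with the key
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) { print k " = " v; found = 1 } else { print }
        }
        END { if (!found) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode of the target
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example (on a copy of the configuration file, path is an assumption):
# set_property /tmp/global.properties.copy consoleproxy.cores.max 0
```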
For additional information, check the awesome article by Bryan van Eeden and the incredible vCloud Vision blog (created by Rudolf Kleijwegt and Bryan van Eeden). Thanks, Bryan!
In our environment, that option did not help at all, so we created a bash script as a workaround. It checks for CLOSE-WAIT connections (the remote side has already closed these connections, but Cloud Director never closes its own end, so they stay open). If there are more than 9 such connections and no active user jobs on the cell, the script restarts the vmware-vcd service. This is the only way we found to clear these problematic connections, which otherwise make the console proxy inaccessible on the Cloud Director side.
Here is a small script that automates restarting the Cloud Director service when these problematic connections pile up and there are no active user jobs on the cell. It assumes the Console Proxy service listens on port 8443 of the Cloud Director cell; you can check this in the /opt/vmware/vcloud-director/etc/global.properties configuration file, option consoleproxy.port.https = 8443:
#!/bin/bash
# Count hung TCP VMRC sessions in CLOSE-WAIT on the console proxy port.
# tail -n +2 skips the header line that ss prints, so only connections are counted.
tcpclosewaitcount=$(/usr/sbin/ss --tcp state CLOSE-WAIT '( dport = :8443 or sport = :8443 )' | tail -n +2 | wc -l)
# Count active Cloud Director jobs initiated by users
vcdtaskscount=$(/opt/vmware/vcloud-director/bin/cell-management-tool cell -i "$(cat /var/run/vmware-vcd-cell.pid)" --status | grep "Job count" | awk '{ print $4 }')
# If there are more than 9 hung sessions and no active user jobs, restart the service and log the restart time.
# An unreadable job count defaults to 1, so the cell is not restarted when its status is unknown.
if [ "${tcpclosewaitcount:-0}" -gt 9 ] && [ "${vcdtaskscount:-1}" -eq 0 ]; then
    /usr/sbin/service vmware-vcd restart && echo "Cloud Director service restarted by VMRC script at $(date)" >> /root/cloudrestart.log
fi
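The decision the script makes (restart only when more than 9 connections are stuck and no user jobs are running) can be pulled into small functions and tried out without touching a live cell. This is a minimal sketch, not part of the original script: `count_close_wait` and `should_restart` are hypothetical names, and the sample input in the comment only imitates the shape of ss output.

```shell
# Hypothetical helpers for dry-running the restart decision.

# Count non-empty connection lines in ss-style output read from stdin,
# skipping the first (header) line.
count_close_wait() {
    awk 'NR > 1 && NF { n++ } END { print n + 0 }'
}

# Succeed only when the CLOSE-WAIT count exceeds the threshold (default 9)
# and the job count is exactly 0. A non-numeric value makes the test fail,
# so an unknown state never triggers a restart.
should_restart() {
    local close_wait jobs threshold
    close_wait="$1"; jobs="$2"; threshold="${3:-9}"
    [ "$close_wait" -gt "$threshold" ] && [ "$jobs" -eq 0 ]
}

# Example dry run with made-up input:
# n=$(printf 'Recv-Q Send-Q Local Peer\n1 0 10.0.0.5:8443 10.0.0.9:51000\n' | count_close_wait)
# if should_restart "$n" 0; then echo "would restart"; else echo "would not restart"; fi
```

Keeping the check separate from the restart command makes it easy to log what the script would have done for a few days before letting it restart anything.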
You can add this script to crontab. I recommend using different run times on different cells, so that all cells are never restarted at the same moment, which would cause an outage of the Cloud Director web panel. For example, use minutes 00,15,30,45 on the first cell to run the script every 15 minutes, 02,17,32,47 on the second cell to run it every 15 minutes with a 2-minute offset from the first cell, and so on. The entry below includes a user field (root), which is the format used in /etc/crontab and /etc/cron.d files:
02,17,32,47 * * * * root /root/tcpclosewaitrestart.sh
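The staggered minute lists can be generated instead of typed by hand. The sketch below is an illustration only: `cron_minutes` is a hypothetical helper, cells are numbered from 0, and the 15-minute period and 2-minute step are taken from the example above.

```shell
# Hypothetical helper: print the crontab minute field for a cell.
# $1 = cell index (0 for the first cell), $2 = period in minutes (default 15),
# $3 = offset step between cells in minutes (default 2).
cron_minutes() {
    local index period step minute list
    index="$1"; period="${2:-15}"; step="${3:-2}"
    minute=$(( ($index * $step) % $period ))
    list=""
    while [ "$minute" -lt 60 ]; do
        list="${list}$(printf '%02d' "$minute"),"
        minute=$(( $minute + $period ))
    done
    # Drop the trailing comma
    printf '%s\n' "${list%,}"
}

# Example: build the entry for the second cell (index 1)
# echo "$(cron_minutes 1) * * * * root /root/tcpclosewaitrestart.sh"
```

With the defaults, index 0 gives the first cell's timings and index 1 gives the second cell's timings from the example above.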
We've since upgraded our environment to 10.3.1, but I still sometimes see CLOSE-WAIT connections on Cloud Director cells, and we have to restart the services manually. We're currently investigating further and plan to open a new Service Request with VMware Global Support Services.