TruGrid SecureRDP - Mac Network Monitor Script

TruGrid Mac Network Test


The companion to the Windows .bat ping launcher for Mac users. Same idea: when an RDP session drops and you do not know whether the problem is the LAN, the ISP, or TruGrid, run this in the background and let it tell you which link broke when the disconnect happened.


Unlike the Windows version which opens three terminal windows running ping, the Mac version is a single zsh script that probes five targets every two seconds (public DNS, the default gateway, the TruGrid front end, and the currently assigned TruGrid relay), clusters any outages it sees, and writes a self-contained HTML report to the Desktop with a correlation verdict explaining which side of the path failed.


How to use it

  1. Open TextEdit on the Mac. Choose Format then Make Plain Text before doing anything else, or you will save an RTF and it will not run.
  2. Copy the script (below) into the empty document.
  3. Save the file to ~/Downloads (or anywhere convenient) with the name trugrid-mac-network-test.zsh.
  4. Open Terminal, then make the script executable by running chmod +x ~/Downloads/trugrid-mac-network-test.zsh.
  5. Run the script by typing ~/Downloads/trugrid-mac-network-test.zsh and pressing Return.
  6. Minimize the Terminal window and work normally. The script keeps probing for 60 minutes by default, or until you press Ctrl+C, or until it has caught 10 outages, whichever comes first.
  7. When the script ends, an HTML report opens automatically in the default browser. Attach that file to the support ticket.


Options


The script accepts a few optional flags. Append them to the run command in step 5.

  • --duration N runs for N minutes instead of the default 60.
  • --indefinite runs until Ctrl+C is pressed. The HTML report is generated when you stop it.
  • --no-open skips auto-opening the HTML report. Useful over SSH or in batch runs.
  • -h or --help prints usage.


If a disconnect is rare and you want the script to run all day until it catches one, use --indefinite.

What the report tells you


The report opens in the browser and has four parts:

  • Run summary: how many sample rounds completed, how many outages were recorded, and why the run stopped.
  • Correlation verdict: a plain-English sentence explaining where the failure was. For example: "At 14:32:18, only TruGrid path failed for 6s. Other connectivity healthy. Suggests a TruGrid Azure or transit issue, not your network."
  • Per-target statistics: success rate and latency stats (mean, median, P95, max) for each probed target.
  • Latency charts: one chart per target with red bands marking outages and a blue line showing latency over time.

If everything stayed up for the entire run, you will see a green "Connectivity was stable across all probed targets for the entire run" line in place of the verdict. In that case the support team will know the disconnect happened outside the probed paths, most often on the RDP host machine itself.


The script

Paste the contents of the script below into your TextEdit document at step 2.


#!/bin/zsh
# trugrid-mac-network-monitor.zsh
# Mac port of TruGrid SecureRDP Toolkit Test-TruGridNetworkMonitor.ps1
# v1.1
#
# Matches Windows behavior: 5 targets, 2s interval, outage clustering,
# per-target SVG charts, correlation verdict.

emulate -L zsh
setopt null_glob

# ===== Config =====
SampleIntervalSeconds=2
MaxRunDurationMinutes=60           # 60 min default (was 240; can override)
MaxOutageEvents=10
OutageThresholdConsecutive=2
ClusterWindowSeconds=10

ICMP_TIMEOUT_MS=1000
TCP_TIMEOUT_SEC=2
TCP_MAX_TIME_SEC=3

OutputDir="${HOME}/Desktop"
StopFlagPath="${TMPDIR:-/tmp}/TruGridDiag/netmonitor.stop"
# ==================

INDEFINITE=0
NO_OPEN=0
DURATION_OVERRIDE=""

usage() {
  cat <<EOF
Usage: ${0##*/} [options]
  --indefinite              Run until Ctrl+C or stop flag
  --duration N              Override max duration in minutes (default 60)
  --no-open                 Skip auto-open of HTML report
  -h, --help                Show this help
EOF
  exit 0
}

while (( $# > 0 )); do
  case "$1" in
    --indefinite) INDEFINITE=1; shift ;;
    --duration) DURATION_OVERRIDE="$2"; shift 2 ;;
    --no-open) NO_OPEN=1; shift ;;
    -h|--help) usage ;;
    *) echo "Unknown arg: $1" >&2; usage ;;
  esac
done

[[ -n "$DURATION_OVERRIDE" ]] && MaxRunDurationMinutes="$DURATION_OVERRIDE"

(( EUID == 0 )) && echo "Warning: running as root, lsof will not see your user's TruGrid client." >&2

RUN_STAMP=$(date +%Y%m%d_%H%M%S)
HOSTNAME_SHORT=$(hostname -s)
HTML_PATH="${OutputDir}/TruGridNetMonitor_${HOSTNAME_SHORT}_${RUN_STAMP}.html"

TMP_DIR="/tmp/trugrid-netmon.$$"
mkdir -p "$TMP_DIR" || { echo "Cannot create $TMP_DIR" >&2; exit 1; }
SAMPLES_TSV="$TMP_DIR/samples.tsv"
: > "$SAMPLES_TSV"

mkdir -p "$(dirname "$StopFlagPath")" 2>/dev/null
rm -f "$StopFlagPath"

INTERRUPTED=0
on_int() { INTERRUPTED=1; }
trap on_int INT TERM
trap "rm -rf $TMP_DIR" EXIT

# ---------- Helpers ----------

get_default_gateway() {
  route -n get default 2>/dev/null | awk '/gateway:/{print $2; exit}'
}

get_relay_ip() {
  local listen_data client_pid
  listen_data=$(lsof -iTCP -nP 2>/dev/null \
    | awk '
        NR>1 {
          local_side = $9
          sub(/->.*/, "", local_side)
          if (local_side ~ /^127\.0\.0\.1:43[0-9]+$/) print $2 "\t" local_side
        }' \
    | sort -u)
  [[ -z "$listen_data" ]] && return 0
  client_pid=$(echo "$listen_data" | sort -t: -k2 -n | tail -1 | cut -f1)

  local ws_ip app_ip
  ws_ip=$(dig +short +tries=1 +time=2 ws.trugrid.com A 2>/dev/null | grep -E '^[0-9]' | head -1)
  app_ip=$(dig +short +tries=1 +time=2 app.trugrid.com A 2>/dev/null | grep -E '^[0-9]' | head -1)

  lsof -i -nP -a -p "$client_pid" 2>/dev/null \
    | awk '/(ESTABLISHED|CLOSE_WAIT)/ {print $9}' \
    | awk -F'->' 'NF==2 {print $2}' \
    | grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:443$' \
    | grep -vE "^(${ws_ip:-NOMATCH}|${app_ip:-NOMATCH}):" \
    | awk -F: '{print $1}' \
    | sort -u | head -1
}

probe_icmp() {
  local h="$1" out
  out=$(ping -c 1 -W $ICMP_TIMEOUT_MS -q "$h" 2>&1)
  if echo "$out" | grep -q "round-trip"; then
    echo "$out" | grep "round-trip" | awk -F'/' '{print $5}' | awk '{printf "%.0f", $1+0.5}'
  fi
}

probe_tcp() {
  local h="$1" p="$2" t
  t=$(curl -s -o /dev/null -w "%{time_connect}" \
        --connect-timeout $TCP_TIMEOUT_SEC --max-time $TCP_MAX_TIME_SEC \
        "https://${h}:${p}/" 2>/dev/null)
  if [[ -n "$t" ]] && awk -v v="$t" 'BEGIN{exit (v > 0) ? 0 : 1}'; then
    awk -v v="$t" 'BEGIN{printf "%.0f", v*1000+0.5}'
  fi
}

# Perl-based ISO timestamp to epoch (reliable across timezones, ships on macOS)
iso_to_epoch() {
  local iso="$1"
  [[ -z "$iso" ]] && { echo 0; return; }
  perl -MTime::Piece -e '
    my $s = $ARGV[0];
    $s =~ s/([+-])(\d{2})(\d{2})$/$1$2:$3/;
    my $t = Time::Piece->strptime($s, "%Y-%m-%dT%H:%M:%S%z");
    print $t->epoch;
  ' "$iso" 2>/dev/null || echo 0
}

# ---------- Targets ----------

typeset -a TARGET_NAMES TARGET_TYPES TARGET_HOSTS TARGET_PORTS TARGET_KINDS

add_target() {
  TARGET_NAMES+=("$1"); TARGET_TYPES+=("$2"); TARGET_HOSTS+=("$3"); TARGET_PORTS+=("$4"); TARGET_KINDS+=("$5")
}

echo "TruGrid Network Monitor (Mac port v1.1)"
echo "============================================================"
echo ""

add_target "Cloudflare DNS (1.1.1.1)"           "icmp" "1.1.1.1" ""    "internet"
add_target "Google DNS (8.8.8.8)"               "icmp" "8.8.8.8" ""    "internet"

GW=$(get_default_gateway)
if [[ -n "$GW" ]]; then
  add_target "Local Gateway ($GW)" "icmp" "$GW" "" "gateway"
  echo "  Detected default gateway: $GW"
else
  echo "  Note: could not detect default gateway, LAN probe skipped."
fi

add_target "TruGrid Front End (ws.trugrid.com:443)" "tcp" "ws.trugrid.com" "443" "trugrid"

RELAY_IP=$(get_relay_ip)
if [[ -n "$RELAY_IP" ]]; then
  add_target "TruGrid Relay ($RELAY_IP)" "icmp" "$RELAY_IP" "" "relay"
  echo "  Found latest relay IP from lsof: $RELAY_IP"
else
  echo "  No active TruGrid session detected. Relay probe skipped."
fi

NUM_TARGETS=${#TARGET_NAMES[@]}
echo ""
if (( INDEFINITE )); then
  echo "Probing $NUM_TARGETS target(s) every ${SampleIntervalSeconds}s."
  echo "Stops after $MaxOutageEvents outage events, stop flag, or Ctrl+C."
else
  echo "Probing $NUM_TARGETS target(s) every ${SampleIntervalSeconds}s."
  echo "Stops after $MaxRunDurationMinutes min, $MaxOutageEvents outage events, stop flag, or Ctrl+C."
fi
echo "Will write report to: $HTML_PATH"
echo ""

# ---------- Main loop ----------

START_TS=$(date +%s)
END_TS=$(( START_TS + MaxRunDurationMinutes * 60 ))
SAMPLE_IDX=0
TOTAL_OUTAGES=0
STOP_REASON=""

typeset -A CONSEC_FAIL LAST_GLYPH OUTAGE_RECORDED
for ((i=1; i<=NUM_TARGETS; i++)); do
  CONSEC_FAIL[${TARGET_NAMES[$i]}]=0
  LAST_GLYPH[${TARGET_NAMES[$i]}]="?"
done

while (( INTERRUPTED == 0 )); do
  if [[ -f "$StopFlagPath" ]]; then
    STOP_REASON="GUI stop button"
    break
  fi
  if (( INDEFINITE == 0 )) && (( $(date +%s) >= END_TS )); then
    STOP_REASON="max duration reached"
    break
  fi
  if (( TOTAL_OUTAGES >= MaxOutageEvents )); then
    STOP_REASON="max outage events"
    break
  fi

  SAMPLE_IDX=$(( SAMPLE_IDX + 1 ))
  now_iso=$(date +%Y-%m-%dT%H:%M:%S%z)

  for ((i=1; i<=NUM_TARGETS; i++)); do
    t_type="${TARGET_TYPES[$i]}"
    t_host="${TARGET_HOSTS[$i]}"
    t_port="${TARGET_PORTS[$i]}"
    outfile="$TMP_DIR/probe_$i.out"
    if [[ "$t_type" == "icmp" ]]; then
      (probe_icmp "$t_host" > "$outfile") &
    else
      (probe_tcp "$t_host" "$t_port" > "$outfile") &
    fi
  done
  wait

  for ((i=1; i<=NUM_TARGETS; i++)); do
    name="${TARGET_NAMES[$i]}"
    outfile="$TMP_DIR/probe_$i.out"
    result=$(cat "$outfile" 2>/dev/null)
    rm -f "$outfile"

    if [[ -n "$result" ]]; then
      printf "%s\t%d\t%s\t%s\tOK\n" "$name" "$SAMPLE_IDX" "$now_iso" "$result" >> "$SAMPLES_TSV"
      LAST_GLYPH[$name]="+"
      CONSEC_FAIL[$name]=0
      unset "OUTAGE_RECORDED[$name]" 2>/dev/null
    else
      printf "%s\t%d\t%s\t\tFAIL\n" "$name" "$SAMPLE_IDX" "$now_iso" >> "$SAMPLES_TSV"
      LAST_GLYPH[$name]="X"
      CONSEC_FAIL[$name]=$(( CONSEC_FAIL[$name] + 1 ))
      if (( CONSEC_FAIL[$name] == OutageThresholdConsecutive )) && [[ -z "${OUTAGE_RECORDED[$name]:-}" ]]; then
        TOTAL_OUTAGES=$(( TOTAL_OUTAGES + 1 ))
        OUTAGE_RECORDED[$name]=1
        echo ""
        echo "[outage] $name failed ${OutageThresholdConsecutive} consecutive samples (event #${TOTAL_OUTAGES})"
      fi
    fi
  done

  elapsed=$(( $(date +%s) - START_TS ))
  elapsed_str=$(printf '%02d:%02d:%02d' $((elapsed/3600)) $((elapsed%3600/60)) $((elapsed%60)))
  status_line="[${elapsed_str}] s=${SAMPLE_IDX}"
  for ((i=1; i<=NUM_TARGETS; i++)); do
    n="${TARGET_NAMES[$i]}"
    short="${n%% \(*}"
    status_line+=" ${short}:${LAST_GLYPH[$n]}"
  done
  printf "\r%s\033[K" "$status_line"

  for ((s=0; s<SampleIntervalSeconds; s++)); do
    (( INTERRUPTED )) && break
    sleep 1
  done
done

(( INTERRUPTED )) && STOP_REASON="${STOP_REASON:-Ctrl+C}"
[[ -z "$STOP_REASON" ]] && STOP_REASON="loop exit"

END_TS_ACTUAL=$(date +%s)
ACTUAL_DURATION=$(( END_TS_ACTUAL - START_TS ))
echo ""
echo "Stop reason: $STOP_REASON"
echo "Samples collected: $SAMPLE_IDX rounds across $NUM_TARGETS targets"

rm -f "$StopFlagPath" 2>/dev/null

if (( SAMPLE_IDX == 0 )); then
  echo "No samples collected, skipping report."
  exit 0
fi

# ---------- Post-processing ----------

echo "Generating report..."

# Stats per target
echo "  computing per-target stats..."
compute_stats() {
  local target="$1"
  local total ok
  total=$(awk -F'\t' -v t="$target" '$1==t' "$SAMPLES_TSV" | wc -l | tr -d ' ')
  ok=$(awk -F'\t' -v t="$target" '$1==t && $5=="OK"' "$SAMPLES_TSV" | wc -l | tr -d ' ')
  : ${total:=0}
  : ${ok:=0}

  local rate="0.0"
  (( total > 0 )) && rate=$(awk -v o="$ok" -v t="$total" 'BEGIN{printf "%.1f", (o/t)*100}')

  if (( ok == 0 )); then
    echo "${total}|0|${rate}|-|-|-|-|-"
    return
  fi

  local lat_file="$TMP_DIR/lat.$RANDOM"
  awk -F'\t' -v t="$target" '$1==t && $5=="OK" {print $4}' "$SAMPLES_TSV" | sort -n > "$lat_file"
  local n=$(wc -l < "$lat_file" | tr -d ' ')
  : ${n:=0}
  local mean=$(awk '{s+=$1; n++} END{if(n>0) printf "%.0f", s/n; else print "-"}' "$lat_file")
  local med_i=$(( (n + 1) / 2 ))
  local p95_i=$(awk -v n="$n" 'BEGIN{i=int(n*0.95+0.5); if(i<1)i=1; if(i>n)i=n; print i}')
  local p99_i=$(awk -v n="$n" 'BEGIN{i=int(n*0.99+0.5); if(i<1)i=1; if(i>n)i=n; print i}')
  (( med_i < 1 )) && med_i=1
  (( med_i > n )) && med_i=$n
  local median=$(sed -n "${med_i}p" "$lat_file")
  local p95=$(sed -n "${p95_i}p" "$lat_file")
  local p99=$(sed -n "${p99_i}p" "$lat_file")
  local maxv=$(tail -1 "$lat_file")
  rm -f "$lat_file"

  echo "${total}|${ok}|${rate}|${mean}|${median}|${p95}|${p99}|${maxv}"
}

extract_outages() {
  local target="$1"
  awk -F'\t' -v t="$target" -v thr="$OutageThresholdConsecutive" '
    $1==t {
      if ($5 == "FAIL") {
        if (n == 0) { si = $2; sts = $3 }
        n++; ei = $2; ets = $3
      } else {
        if (n >= thr) printf "%s\t%d\t%d\t%s\t%s\n", t, si, ei, sts, ets
        n = 0
      }
    }
    END { if (n >= thr) printf "%s\t%d\t%d\t%s\t%s\n", t, si, ei, sts, ets }
  ' "$SAMPLES_TSV"
}

ALL_OUTAGES="$TMP_DIR/outages.tsv"
: > "$ALL_OUTAGES"
echo "  extracting outage events..."
for ((i=1; i<=NUM_TARGETS; i++)); do
  extract_outages "${TARGET_NAMES[$i]}" >> "$ALL_OUTAGES"
done

generate_verdicts() {
  local outage_count
  outage_count=$(wc -l < "$ALL_OUTAGES" | tr -d ' ')
  : ${outage_count:=0}
  if (( outage_count == 0 )); then
    echo "<p class=\"ok\">Connectivity was stable across all probed targets for the entire run.</p>"
    return
  fi

  local sorted="$TMP_DIR/outages_sorted.tsv"
  sort -t$'\t' -k4 "$ALL_OUTAGES" > "$sorted"

  echo "<ul class=\"verdicts\">"
  local cluster_kinds="" cluster_targets="" cluster_start=""
  local cluster_max_dur=0 last_end_epoch=0
  while IFS=$'\t' read -r tgt si ei sts ets; do
    local s_epoch e_epoch dur kind=""
    s_epoch=$(iso_to_epoch "$sts")
    e_epoch=$(iso_to_epoch "$ets")
    : ${s_epoch:=0}
    : ${e_epoch:=0}
    dur=$(( e_epoch - s_epoch ))
    (( dur < 1 )) && dur=$(( SampleIntervalSeconds * (ei - si + 1) ))

    for ((j=1; j<=NUM_TARGETS; j++)); do
      [[ "${TARGET_NAMES[$j]}" == "$tgt" ]] && kind="${TARGET_KINDS[$j]}" && break
    done

    if [[ -z "$cluster_start" ]]; then
      cluster_start="$sts"; last_end_epoch="$e_epoch"
      cluster_kinds="$kind"; cluster_targets="$tgt"; cluster_max_dur=$dur
    elif (( s_epoch - last_end_epoch <= ClusterWindowSeconds )); then
      cluster_kinds="$cluster_kinds $kind"
      cluster_targets="$cluster_targets|$tgt"
      (( e_epoch > last_end_epoch )) && last_end_epoch="$e_epoch"
      (( dur > cluster_max_dur )) && cluster_max_dur=$dur
    else
      emit_verdict "$cluster_start" "$cluster_max_dur" "$cluster_kinds" "$cluster_targets"
      cluster_start="$sts"; last_end_epoch="$e_epoch"
      cluster_kinds="$kind"; cluster_targets="$tgt"; cluster_max_dur=$dur
    fi
  done < "$sorted"

  [[ -n "$cluster_start" ]] && emit_verdict "$cluster_start" "$cluster_max_dur" "$cluster_kinds" "$cluster_targets"
  echo "</ul>"
}

emit_verdict() {
  local start_ts="$1" max_dur="$2" kinds="$3" targets="$4"
  local time_short
  time_short=$(echo "$start_ts" | awk -F'T' '{print $2}' | cut -c1-8)
  local has_inet=0 has_gw=0 has_tg=0 has_relay=0
  [[ " $kinds " == *" internet "* ]] && has_inet=1
  [[ " $kinds " == *" gateway "* ]] && has_gw=1
  [[ " $kinds " == *" trugrid "* ]] && has_tg=1
  [[ " $kinds " == *" relay "* ]] && has_relay=1

  local verdict
  if (( has_inet && has_gw && has_tg )); then
    verdict="At ${time_short}, total connectivity loss for ${max_dur}s. Likely full ISP or upstream outage."
  elif (( has_gw && ! has_inet )); then
    verdict="At ${time_short}, gateway failed for ${max_dur}s but internet stayed reachable. Suspicious, possibly a transient routing or LAN issue."
  elif (( has_inet && ! has_gw )); then
    verdict="At ${time_short}, internet probes failed for ${max_dur}s but gateway stayed up. Likely an ISP-side issue past your gateway."
  elif (( (has_tg || has_relay) && ! has_inet && ! has_gw )); then
    verdict="At ${time_short}, only TruGrid path failed for ${max_dur}s. Other connectivity healthy. Suggests a TruGrid Azure or transit issue, not your network."
  elif (( has_relay && ! has_tg )); then
    verdict="At ${time_short}, the assigned relay failed for ${max_dur}s but the TruGrid front end stayed reachable. Possible ACI rotation or relay-specific incident."
  else
    local names="${targets//\|/, }"
    verdict="At ${time_short}, outage affecting: ${names}. Duration ${max_dur}s. Manual review of the timeline recommended."
  fi
  echo "<li>${verdict}</li>"
}

echo "  building correlation verdict..."
VERDICT_HTML=$(generate_verdicts)

generate_svg() {
  local target="$1"
  local has_data
  has_data=$(awk -F'\t' -v t="$target" '$1==t && $5=="OK" {n++} END{print n+0}' "$SAMPLES_TSV")
  : ${has_data:=0}
  if (( has_data == 0 )); then
    echo "<p class=\"meta\">No successful samples to chart.</p>"
    return
  fi

  awk -F'\t' -v target="$target" -v thr="$OutageThresholdConsecutive" '
    BEGIN {
      W = 900; H = 200
      ML = 50; MR = 15; MT = 15; MB = 30
      PW = W - ML - MR; PH = H - MT - MB
      max_lat = 0; min_idx = -1; max_idx = -1
    }
    $1 == target {
      if (min_idx < 0) min_idx = $2
      max_idx = $2
      if ($5 == "OK") {
        v = $4 + 0
        if (v > max_lat) max_lat = v
        ok_idx[$2] = v
      } else {
        fail_idx[$2] = 1
      }
    }
    END {
      if (max_lat < 1) max_lat = 1
      span = max_idx - min_idx
      if (span < 1) span = 1

      ymax = max_lat
      if (ymax < 10) ymax = 10
      else if (ymax < 50) ymax = int((ymax + 9) / 10) * 10
      else if (ymax < 200) ymax = int((ymax + 24) / 25) * 25
      else ymax = int((ymax + 99) / 100) * 100

      printf "<svg viewBox=\"0 0 %d %d\" xmlns=\"http://www.w3.org/2000/svg\" style=\"width:100%%;max-width:%dpx;height:auto;\">\n", W, H, W
      printf "<rect x=\"%d\" y=\"%d\" width=\"%d\" height=\"%d\" fill=\"rgba(0,0,0,0.02)\" stroke=\"#ccc\" stroke-width=\"0.5\"/>\n", ML, MT, PW, PH
      for (gy = 1; gy <= 3; gy++) {
        ly = MT + (PH * gy / 4)
        printf "<line x1=\"%d\" y1=\"%.1f\" x2=\"%d\" y2=\"%.1f\" stroke=\"#ddd\" stroke-width=\"0.5\"/>\n", ML, ly, ML+PW, ly
      }

      run_start = -1
      for (idx = min_idx; idx <= max_idx; idx++) {
        if (idx in fail_idx) {
          if (run_start < 0) run_start = idx
          run_end = idx
        } else {
          if (run_start >= 0 && (run_end - run_start + 1) >= thr) {
            x1 = ML + ((run_start - min_idx) / span) * PW
            x2 = ML + ((run_end - min_idx + 1) / span) * PW
            printf "<rect x=\"%.1f\" y=\"%d\" width=\"%.1f\" height=\"%d\" fill=\"#c0392b\" opacity=\"0.18\"/>\n", x1, MT, x2-x1, PH
          }
          run_start = -1
        }
      }
      if (run_start >= 0 && (run_end - run_start + 1) >= thr) {
        x1 = ML + ((run_start - min_idx) / span) * PW
        x2 = ML + ((run_end - min_idx + 1) / span) * PW
        printf "<rect x=\"%.1f\" y=\"%d\" width=\"%.1f\" height=\"%d\" fill=\"#c0392b\" opacity=\"0.18\"/>\n", x1, MT, x2-x1, PH
      }

      in_seg = 0
      for (idx = min_idx; idx <= max_idx; idx++) {
        if (idx in ok_idx) {
          x = ML + ((idx - min_idx) / span) * PW
          y = MT + PH - (ok_idx[idx] / ymax) * PH
          if (in_seg == 0) { printf "<polyline points=\""; in_seg = 1 }
          printf "%.1f,%.1f ", x, y
        } else {
          if (in_seg) { printf "\" fill=\"none\" stroke=\"#1972F5\" stroke-width=\"1.4\"/>\n"; in_seg = 0 }
        }
      }
      if (in_seg) printf "\" fill=\"none\" stroke=\"#1972F5\" stroke-width=\"1.4\"/>\n"

      printf "<text x=\"%d\" y=\"%d\" font-size=\"10\" text-anchor=\"end\" fill=\"#888\">%dms</text>\n", ML-5, MT+5, ymax
      printf "<text x=\"%d\" y=\"%d\" font-size=\"10\" text-anchor=\"end\" fill=\"#888\">%dms</text>\n", ML-5, MT+PH/2+3, ymax/2
      printf "<text x=\"%d\" y=\"%d\" font-size=\"10\" text-anchor=\"end\" fill=\"#888\">0ms</text>\n", ML-5, MT+PH+3

      for (k = 0; k <= 4; k++) {
        tx = ML + (PW * k / 4)
        ti = min_idx + int(span * k / 4)
        printf "<line x1=\"%.1f\" y1=\"%d\" x2=\"%.1f\" y2=\"%d\" stroke=\"#888\" stroke-width=\"0.5\"/>\n", tx, MT+PH, tx, MT+PH+4
        printf "<text x=\"%.1f\" y=\"%d\" font-size=\"10\" text-anchor=\"middle\" fill=\"#888\">s%d</text>\n", tx, MT+PH+16, ti
      }

      print "</svg>"
    }
  ' "$SAMPLES_TSV"
}

echo "  rendering per-target tables..."
STATS_ROWS=""
for ((i=1; i<=NUM_TARGETS; i++)); do
  name="${TARGET_NAMES[$i]}"
  stats=$(compute_stats "$name")
  IFS='|' read -r total ok rate mean median p95 p99 maxv <<< "$stats"
  rate_class="ok"
  awk -v r="${rate:-0}" 'BEGIN{exit (r < 99) ? 0 : 1}' && rate_class="warn"
  awk -v r="${rate:-0}" 'BEGIN{exit (r < 95) ? 0 : 1}' && rate_class="fail"
  STATS_ROWS+="<tr><td>${name}</td><td>${total}</td><td class=\"${rate_class}\">${rate}%</td><td>${mean}</td><td>${median}</td><td>${p95}</td><td>${p99}</td><td>${maxv}</td></tr>"
done

echo "  generating per-target SVG charts..."
CHARTS_HTML=""
for ((i=1; i<=NUM_TARGETS; i++)); do
  name="${TARGET_NAMES[$i]}"
  CHARTS_HTML+="<h3>${name}</h3>$(generate_svg "$name")"
done

echo "  building outage table..."
OUTAGE_ROWS=""
while IFS=$'\t' read -r tgt si ei sts ets; do
  [[ -z "$tgt" ]] && continue
  dur_samples=$(( ei - si + 1 ))
  s_epoch=$(iso_to_epoch "$sts")
  e_epoch=$(iso_to_epoch "$ets")
  : ${s_epoch:=0}
  : ${e_epoch:=0}
  dur=$(( e_epoch - s_epoch ))
  (( dur < 1 )) && dur=$(( SampleIntervalSeconds * dur_samples ))
  OUTAGE_ROWS+="<tr><td>${tgt}</td><td>${sts}</td><td>${ets}</td><td>${dur}s</td><td>${dur_samples}</td></tr>"
done < "$ALL_OUTAGES"

OUTAGE_SECTION=""
if [[ -n "$OUTAGE_ROWS" ]]; then
  OUTAGE_SECTION="<h2>Outage Events</h2><table><tr><th>Target</th><th>Start</th><th>End</th><th>Duration</th><th>Samples</th></tr>${OUTAGE_ROWS}</table>"
fi

echo "  writing HTML..."

GEN_NOW=$(date +%Y-%m-%dT%H:%M:%S%z)
START_LOCAL=$(date -r $START_TS +%H:%M:%S)
END_LOCAL=$(date -r $END_TS_ACTUAL +%H:%M:%S)

cat > "$HTML_PATH" <<HTML
<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8">
<title>TruGrid Network Monitor - ${HOSTNAME_SHORT}</title>
<style>
:root { color-scheme: light dark; }
body { font-family: -apple-system, BlinkMacSystemFont, "Helvetica Neue", sans-serif; max-width: 1100px; margin: 2em auto; padding: 0 1.2em; line-height: 1.45; }
h1 { border-bottom: 3px solid #1972F5; padding-bottom: 0.3em; }
h2 { color: #1972F5; margin-top: 2em; font-size: 1.05em; text-transform: uppercase; letter-spacing: 0.5px; }
h3 { margin-top: 1.5em; font-size: 0.95em; color: #444; }
.meta { color: #888; font-size: 0.9em; }
table { border-collapse: collapse; width: 100%; margin-top: 0.5em; font-size: 0.9em; }
th, td { padding: 0.45em 0.7em; text-align: left; border-bottom: 1px solid #ddd; vertical-align: top; }
th { background: rgba(25,114,245,0.07); }
.ok   { color: #1f8a1f; font-weight: 600; }
.warn { color: #b8860b; font-weight: 600; }
.fail { color: #c0392b; font-weight: 600; }
.counts { display: flex; gap: 1em; flex-wrap: wrap; margin: 0.5em 0; }
.counts > div { background: rgba(25,114,245,0.06); padding: 0.5em 0.9em; border-radius: 4px; min-width: 100px; }
.counts > div .num { font-size: 1.4em; font-weight: 700; display: block; line-height: 1.1; }
.counts > div .lbl { font-size: 0.75em; color: #888; text-transform: uppercase; letter-spacing: 0.5px; }
ul.verdicts li { margin-bottom: 0.4em; }
.note { background: #fff8d6; padding: 0.6em 1em; border-left: 4px solid #e0c000; margin: 1.5em 0 0; font-size: 0.85em; }
code { font-family: ui-monospace, "SF Mono", Menlo, monospace; font-size: 0.9em; }
@media (prefers-color-scheme: dark) {
  body { background: #1a1a1a; color: #eee; }
  th { background: rgba(25,114,245,0.15); }
  th, td { border-bottom: 1px solid #333; }
  h3 { color: #ccc; }
  .counts > div { background: rgba(25,114,245,0.12); }
  .note { background: #2a260c; border-left-color: #b08800; color: #eee; }
}
</style></head><body>

<h1>TruGrid Network Monitor</h1>
<p class="meta">Generated ${GEN_NOW} &middot; Host: ${HOSTNAME_SHORT} &middot; Run: ${START_LOCAL} to ${END_LOCAL} (${ACTUAL_DURATION}s)</p>

<h2>Run Summary</h2>
<div class="counts">
  <div><span class="num">${SAMPLE_IDX}</span><span class="lbl">Sample rounds</span></div>
  <div><span class="num">${NUM_TARGETS}</span><span class="lbl">Targets</span></div>
  <div><span class="num">${TOTAL_OUTAGES}</span><span class="lbl">Outage events</span></div>
  <div><span class="num" style="font-size:0.95em;">${STOP_REASON}</span><span class="lbl">Stop reason</span></div>
</div>

<h2>Correlation Verdict</h2>
<div>${VERDICT_HTML}</div>

<h2>Per-Target Statistics</h2>
<p class="meta">Latency in milliseconds. Mean is sensitive to outliers; median and p95 are usually the interesting ones.</p>
<table>
<tr><th>Target</th><th>Samples</th><th>Success</th><th>Mean</th><th>Median</th><th>P95</th><th>P99</th><th>Max</th></tr>
${STATS_ROWS}
</table>

<h2>Latency Charts</h2>
${CHARTS_HTML}

${OUTAGE_SECTION}

<div class="note">v1.1 Mac port. Behavior mirrors Windows Test-TruGridNetworkMonitor.ps1. Relay IP discovered via lsof on the loopback 127.0.0.1:43xxx tunnel (Mac log does not record upstream IP, unlike Windows Connector log).</div>

</body></html>
HTML

echo "  HTML report: $HTML_PATH"
echo "REPORT_PATH=$HTML_PATH"
echo "Done."

[[ "$NO_OPEN" == "1" ]] || open "$HTML_PATH"


If none of this surfaces the cause, share the HTML report with TruGrid support along with the approximate time of the most recent disconnect.


HTML Report


Here is the resulting HTML report:



Updated on: 19/05/2026

Was this article helpful?

Share your feedback

Cancel

Thank you!