chore: [WIP] experimental parallel kokoro system test#17608
chore: [WIP] experimental parallel kokoro system test#17608chalmerlowe wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces parallel execution for system tests in .kokoro/system.sh with concurrency limits and log segregation, and isolates gcloud configurations per run using unique temporary directories. The feedback suggests robust cleanup of the temporary directories using shell traps to prevent leaks if commands fail under set -e, sanitizing package names used in directory creation, and refactoring the duplicate process reaping logic into a helper function to improve maintainability.
| local gcloud_config_dir=$(mktemp -d -t "gcloud-config-${package_name}-XXXXXX") | ||
| local CLOUDSDK_CONFIG="${gcloud_config_dir}" |
There was a problem hiding this comment.
Since set -e is active, if any command fails during the test execution, the function/subshell will exit immediately, and the manual cleanup at the end of the function will be bypassed, leaking the temporary directory.
Using a trap on EXIT ensures that the temporary directory is cleaned up regardless of how the subshell exits. We should use double quotes in the trap definition so that the local variable gcloud_config_dir is expanded immediately, as the local variable will be out of scope when the subshell actually exits.
Additionally, if package_name contains any slashes, mktemp will fail because the template cannot contain directory separators. Sanitizing the package name first prevents this issue.
| local gcloud_config_dir=$(mktemp -d -t "gcloud-config-${package_name}-XXXXXX") | |
| local CLOUDSDK_CONFIG="${gcloud_config_dir}" | |
| local safe_package_name=$(echo "${package_name}" | tr '/' '_') | |
| local gcloud_config_dir=$(mktemp -d -t "gcloud-config-${safe_package_name}-XXXXXX") | |
| trap "rm -rf '${gcloud_config_dir}'" EXIT | |
| local CLOUDSDK_CONFIG="${gcloud_config_dir}" |
| rm -rf "${gcloud_config_dir}" | ||
| return $res |
There was a problem hiding this comment.
With the trap on EXIT configured at the start of the function, we can safely remove the manual rm -rf call here. We should also unregister the trap using trap - EXIT before returning to prevent it from running again if the shell continues.
| rm -rf "${gcloud_config_dir}" | |
| return $res | |
| trap - EXIT | |
| return $res |
| # Parallel Execution Logic | ||
| MAX_JOBS=${MAX_JOBS:-4} | ||
| active_jobs=0 | ||
| declare -A job_pids | ||
| declare -A job_pkgs | ||
| failed_packages=() | ||
| passed_packages=() |
There was a problem hiding this comment.
To avoid duplicating the complex process reaping logic, we can extract it into a helper function reap_finished_jobs. This improves maintainability and readability.
# Parallel Execution Logic
MAX_JOBS=${MAX_JOBS:-4}
active_jobs=0
declare -A job_pids
declare -A job_pkgs
failed_packages=()
passed_packages=()
reap_finished_jobs() {
for pid in "${!job_pids[@]}"; do
if ! kill -0 "$pid" 2>/dev/null; then
wait "$pid" && status=0 || status=$?
finished_pkg=${job_pkgs[$pid]}
if [ "$status" -eq 0 ]; then
echo "✔ [SUCCESS] ${finished_pkg}"
passed_packages+=("$finished_pkg")
else
echo "✘ [FAILURE] ${finished_pkg} (Exit Code: ${status})"
failed_packages+=("$finished_pkg")
fi
unset "job_pids[$pid]"
unset "job_pkgs[$pid]"
active_jobs=$((active_jobs - 1))
fi
done
}| while [ "$active_jobs" -ge "$MAX_JOBS" ]; do | ||
| for pid in "${!job_pids[@]}"; do | ||
| if ! kill -0 "$pid" 2>/dev/null; then | ||
| wait "$pid" && status=0 || status=$? | ||
| finished_pkg=${job_pkgs[$pid]} | ||
| if [ "$status" -eq 0 ]; then | ||
| echo "✔ [SUCCESS] ${finished_pkg}" | ||
| passed_packages+=("$finished_pkg") | ||
| else | ||
| echo "✘ [FAILURE] ${finished_pkg} (Exit Code: ${status})" | ||
| failed_packages+=("$finished_pkg") | ||
| fi | ||
| unset "job_pids[$pid]" | ||
| unset "job_pkgs[$pid]" | ||
| active_jobs=$((active_jobs - 1)) | ||
| fi | ||
| done | ||
| sleep 0.1 | ||
| done |
There was a problem hiding this comment.
Use the extracted reap_finished_jobs helper function here to simplify the concurrency limit loop.
| while [ "$active_jobs" -ge "$MAX_JOBS" ]; do | |
| for pid in "${!job_pids[@]}"; do | |
| if ! kill -0 "$pid" 2>/dev/null; then | |
| wait "$pid" && status=0 || status=$? | |
| finished_pkg=${job_pkgs[$pid]} | |
| if [ "$status" -eq 0 ]; then | |
| echo "✔ [SUCCESS] ${finished_pkg}" | |
| passed_packages+=("$finished_pkg") | |
| else | |
| echo "✘ [FAILURE] ${finished_pkg} (Exit Code: ${status})" | |
| failed_packages+=("$finished_pkg") | |
| fi | |
| unset "job_pids[$pid]" | |
| unset "job_pkgs[$pid]" | |
| active_jobs=$((active_jobs - 1)) | |
| fi | |
| done | |
| sleep 0.1 | |
| done | |
| while [ "$active_jobs" -ge "$MAX_JOBS" ]; do | |
| reap_finished_jobs | |
| sleep 0.1 | |
| done |
| while [ "$active_jobs" -gt 0 ]; do | ||
| for pid in "${!job_pids[@]}"; do | ||
| if ! kill -0 "$pid" 2>/dev/null; then | ||
| wait "$pid" && status=0 || status=$? | ||
| finished_pkg=${job_pkgs[$pid]} | ||
| if [ "$status" -eq 0 ]; then | ||
| echo "✔ [SUCCESS] ${finished_pkg}" | ||
| passed_packages+=("$finished_pkg") | ||
| else | ||
| echo "✘ [FAILURE] ${finished_pkg} (Exit Code: ${status})" | ||
| failed_packages+=("$finished_pkg") | ||
| fi | ||
| unset "job_pids[$pid]" | ||
| unset "job_pkgs[$pid]" | ||
| active_jobs=$((active_jobs - 1)) | ||
| fi | ||
| done | ||
| sleep 0.1 | ||
| done |
There was a problem hiding this comment.
Use the extracted reap_finished_jobs helper function here to simplify the final reaping loop.
| while [ "$active_jobs" -gt 0 ]; do | |
| for pid in "${!job_pids[@]}"; do | |
| if ! kill -0 "$pid" 2>/dev/null; then | |
| wait "$pid" && status=0 || status=$? | |
| finished_pkg=${job_pkgs[$pid]} | |
| if [ "$status" -eq 0 ]; then | |
| echo "✔ [SUCCESS] ${finished_pkg}" | |
| passed_packages+=("$finished_pkg") | |
| else | |
| echo "✘ [FAILURE] ${finished_pkg} (Exit Code: ${status})" | |
| failed_packages+=("$finished_pkg") | |
| fi | |
| unset "job_pids[$pid]" | |
| unset "job_pkgs[$pid]" | |
| active_jobs=$((active_jobs - 1)) | |
| fi | |
| done | |
| sleep 0.1 | |
| done | |
| while [ "$active_jobs" -gt 0 ]; do | |
| reap_finished_jobs | |
| sleep 0.1 | |
| done |
Note
This is an experimental PR intended to test parallelization of system tests.