Problem
I was working on a custom GitHub Action that would evaluate a directory to ensure it conformed to certain standards (naming conventions, valid links, etc.). Its purpose is to make the review process easier for the Microsoft WhatTheHack repo.
My custom Action is called from a GitHub Actions workflow YAML file. You pass a specific path to the Action and it will evaluate that directory.
jobs:
  checklinks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Template Format Check
        uses: jordanbean-msft/github-action-wth-template-check@v0.3.1
        with:
          path: ${{ github.workspace }}/004-HotelCaliVegasDevHack
You can see the specific checks that have failed in the GitHub Actions workflow log.

Once I got this code working, I needed to run it against a GitHub repo that contained lots of directories that needed to be checked.

The naive solution is to copy/paste the YAML code over and over again for each directory to scan. Not a sustainable choice.
jobs:
  checklinks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Template Format Check
        uses: jordanbean-msft/github-action-wth-template-check@v0.3.1
        with:
          path: ${{ github.workspace }}/001-IntroToKubernetes
      - name: Template Format Check
        uses: jordanbean-msft/github-action-wth-template-check@v0.3.1
        with:
          path: ${{ github.workspace }}/002-IntroToAzureAI
      ...
The correct solution is to modify the GitHub Action to loop through all directories (specified via a flag). However, for fun, I wanted to see if there was a way I could do this in YAML.
Looping
Normally, you can’t write a loop in YAML. It is a data serialization format, not a programming language, so it has no control-flow constructs.
However, there are various workarounds depending on the flavor of YAML (for example, templating it with Jinja2). Most of these solutions involve writing “code that writes code”. You would write “pseudo-YAML” and your script would output a valid YAML file.
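As a rough illustration of “code that writes code”, here is a minimal shell sketch that prints one workflow step per directory name. The function name generate_steps is a hypothetical helper made up for this example, and the output is only the steps fragment, not a complete workflow file.

```shell
#!/usr/bin/env bash
# generate_steps is a hypothetical helper (not part of GitHub Actions):
# it prints one "Template Format Check" step for each directory name passed in.
generate_steps() {
  local dir
  for dir in "$@"; do
    # \$ keeps the GitHub expression literal; ${dir} is expanded by the shell.
    cat <<EOF
      - name: Template Format Check (${dir})
        uses: jordanbean-msft/github-action-wth-template-check@v0.3.1
        with:
          path: \${{ github.workspace }}/${dir}
EOF
  done
}

generate_steps 001-IntroToKubernetes 002-IntroToAzureAI
```

The generated text would still have to be pasted or redirected into a workflow file, which is exactly the extra build step that templating approaches require.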
GitHub Actions workflows can do this too, using the matrix keyword. Normally, this is used to build multiple versions of your application (such as targeting different platforms).
In my case, I used it to auto-generate the YAML needed to loop through all the directories to scan in parallel.
The Wrong Way
I will need a GitHub Actions workflow with two jobs.
First, I need to generate a list of the directories in the repo. A little shell scripting will do the trick.
ls -l | grep '^d' | awk -F ' ' '{print $9}' | grep -Po '\d{3}.*' | jq -R -s -c 'split("\n") | map(select(length > 0))'
I now have a JSON array of the names of all the directories I want to scan, which is the format the matrix keyword expects.
["000-HowToHack","001-IntroToKubernetes","002-IntroToAzureAI","003-DrivingMissData","004-HotelCaliVegasDevHack",...]
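Parsing the output of ls -l can break on unusual file names or a different column layout, so a glob-based variant may be worth considering. The sketch below is an alternative, not the pipeline used above: list_modules is a hypothetical helper, and it assumes the directories of interest start with three digits (slightly stricter than the grep pattern, which matches three digits anywhere in the line). It still relies on jq being installed.

```shell
#!/usr/bin/env bash
# list_modules is a hypothetical helper: prints a compact JSON array of the
# directories under $1 (default: current directory) whose names start with
# three digits.
list_modules() {
  local root="${1:-.}"
  (
    cd "$root" || exit 1
    # The trailing slash makes the glob match directories only; the -d test
    # skips the literal pattern when nothing matches.
    for d in [0-9][0-9][0-9]*/; do
      [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | jq -R -s -c 'split("\n") | map(select(length > 0))'
  )
}

list_modules .
```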
This string needs to be saved into a variable so it can be passed to the next job.
echo "::set-output name=matrix::$(ls -l | grep '^d' | awk -F ' ' '{print $9}' | grep -Po '\d{3}.*' | jq -R -s -c 'split("\n") | map(select(length > 0))')"
I need to run this shell command in a GitHub Action workflow job step.
- name: Generate matrix with all modules of WhatTheHack repository
  id: set-matrix
  run: |
    echo "::set-output name=matrix::$(ls -l | grep '^d' | awk -F ' ' '{print $9}' | grep -Po '\d{3}.*' | jq -R -s -c 'split("\n") | map(select(length > 0))')"
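Note that GitHub has since deprecated the ::set-output workflow command in favor of appending name=value lines to the file named by the GITHUB_OUTPUT environment variable. A minimal sketch of that style follows; set_output is a hypothetical helper name, and it only handles single-line values such as the compact JSON array produced here.

```shell
#!/usr/bin/env bash
# set_output is a hypothetical helper: appends "name=value" to the file that
# the runner exposes through $GITHUB_OUTPUT. Single-line values only.
set_output() {
  local name="$1" value="$2"
  printf '%s=%s\n' "$name" "$value" >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT is not set}"
}
```

Inside the run step, this could be called as set_output matrix "$(...)" with the same pipeline as before; the step id and the steps.set-matrix.outputs.matrix expression stay unchanged.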
I need to expose this array as a job output (which I named matrix) so that it is available to the next job.
generateInputPaths:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - name: Generate matrix with all modules of WhatTheHack repository
      id: set-matrix
      run: |
        echo "::set-output name=matrix::$(ls -l | grep '^d' | awk -F ' ' '{print $9}' | grep -Po '\d{3}.*' | jq -R -s -c 'split("\n") | map(select(length > 0))')"
  outputs:
    matrix: ${{ steps.set-matrix.outputs.matrix }}
I can now use that variable to define my matrix build strategy. The matrix build strategy will auto-generate a number of parallel jobs equal to the number of items in the array I passed in the needs.generateInputPaths.outputs.matrix variable. Each job will perform the same steps.
- Checkout the repo (which copies the repo code to the build agent filesystem)
- Run the template check on a specific directory (specified in the path variable).
checklinks:
  needs: generateInputPaths
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      path: ${{ fromJson(needs.generateInputPaths.outputs.matrix) }}
  steps:
    - uses: actions/checkout@v2
    - name: Template Format Check
      uses: jordanbean-msft/github-action-wth-template-check@v0.3.1
      with:
        path: ${{ github.workspace }}/${{ matrix.path }}
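Because every array element becomes its own job, it can help to preview the expansion locally before pushing. The sketch below is an assumption-laden helper rather than anything GitHub provides: preview_matrix is a hypothetical name, it requires jq, and it refuses arrays longer than 256 entries because a matrix can generate at most 256 jobs per workflow run.

```shell
#!/usr/bin/env bash
# preview_matrix is a hypothetical helper: given the JSON array as $1, it
# prints one "path=<dir>" line per job the matrix would create, and fails if
# the array has more than 256 entries (the per-run matrix job limit).
preview_matrix() {
  local json="$1" count
  count="$(printf '%s' "$json" | jq 'length')" || return 1
  if [ "$count" -gt 256 ]; then
    echo "matrix has $count entries; the limit is 256 jobs per workflow run" >&2
    return 1
  fi
  printf '%s' "$json" | jq -r '.[] | "path=\(.)"'
}

preview_matrix '["000-HowToHack","001-IntroToKubernetes"]'
```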
The entire GitHub Action workflow looks like this.
name: Check Template Format
on:
  workflow_dispatch:
jobs:
  # Job 1: build the JSON array of directory names and expose it as a job output.
  generateInputPaths:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Generate matrix with all modules of WhatTheHack repository
        id: set-matrix
        run: |
          echo "::set-output name=matrix::$(ls -l | grep '^d' | awk -F ' ' '{print $9}' | grep -Po '\d{3}.*' | jq -R -s -c 'split("\n") | map(select(length > 0))')"
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
  # Job 2: one parallel job per directory in the array.
  checklinks:
    needs: generateInputPaths
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        path: ${{ fromJson(needs.generateInputPaths.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v2
      - name: Print Directory Structure
        run: |
          ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/'
      - name: Template Format Check
        uses: jordanbean-msft/github-action-wth-template-check@v0.3.1
        with:
          # The matrix key is "path", so it is read as matrix.path.
          path: ${{ github.workspace }}/${{ matrix.path }}
Result
A run of this workflow shows that it auto-generated and executed 47 parallel jobs!

Of course, this is hilariously inefficient because it has to clone the GitHub repo over and over again (since each job runs on an independent build agent that does not share a file system).
Final Solution
The correct solution was to modify the GitHub Action so that it provides a flag & does the looping inside the JavaScript, so the repo is only cloned once.
- name: Template Format Check
  uses: jordanbean-msft/github-action-wth-template-check@v0.4.0
  with:
    path: ${{ github.workspace }}
    shouldScanSubdirectories: true
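The Action itself does this looping in JavaScript; as a rough shell analogue of the same idea (one checkout, one job, a loop over the directories), consider the sketch below. check_all is a hypothetical helper, and the checker command passed to it stands in for whatever validation is actually run.

```shell
#!/usr/bin/env bash
# check_all is a hypothetical helper: runs the given checker command once per
# directory under $1 whose name starts with three digits, and returns non-zero
# if any check failed.
check_all() {
  local root="$1"; shift
  local dir failed=0
  for dir in "$root"/[0-9][0-9][0-9]*/; do
    [ -d "$dir" ] || continue        # skip the literal pattern when nothing matches
    "$@" "${dir%/}" || failed=1      # keep going after a failure, similar to fail-fast: false
  done
  return "$failed"
}
```

A call such as check_all "$GITHUB_WORKSPACE" ./check-template.sh (where check-template.sh is a made-up script name) would run every check sequentially in a single job, trading parallelism for a single clone.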
But where’s the fun in that?
Thanks for the article. I am new to GitHub Actions but was looking to try something similar with microservices, where I am currently using dorny/paths-filter@v2 to check if a certain directory changed and then pass a service name to a reusable workflow that handles an AWS CDK deploy.
This works but I would have to copy/paste (as you illustrated in your article). I am sharing an artifact to avoid checking out the code repeatedly but I might try your matrix to avoid the copy/paste.
At one point I had a gulp file working great locally using child process spawns, but when I ran it on GitHub it just seemed to hang when the threads kicked off, so that was kind of a bummer.
Anyways thanks for the great article!