| author | St33v <github@f3rr3t.com> | 2026-02-01 14:53:04 +1100 |
|---|---|---|
| committer | St33v <github@f3rr3t.com> | 2026-02-01 14:53:04 +1100 |
| commit | bb93e8d7c0a37a38d80b75d55117a75aa53e1de1 (patch) | |
| tree | e1d1b4b8cd3c20b8dca38df5d141c1b337ee7780 | |
| parent | 70f2fc45af8a0ea98e0e6f7b4254928dc7bfe317 (diff) | |
robots / sitemap demo version
| -rwxr-xr-x | forge/script/gen-robots-sitemap.sh | 48 |
| -rw-r--r-- | forge/script/rsyncGlitch.txt | 51 |
2 files changed, 99 insertions, 0 deletions
diff --git a/forge/script/gen-robots-sitemap.sh b/forge/script/gen-robots-sitemap.sh
new file mode 100755
index 0000000..30dbdea
--- /dev/null
+++ b/forge/script/gen-robots-sitemap.sh
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+OUT_DIR="${1:-.faircamp_build}"
+SITE_URL="${2:-https://st33v.com}"
+
+cd "$OUT_DIR"
+
+# --- robots.txt ---
+cat > robots.txt <<EOF
+User-agent: *
+Allow: /
+
+Sitemap: ${SITE_URL%/}/sitemap.xml
+EOF
+
+# --- sitemap.xml ---
+# Include HTML pages + common content types; exclude obvious junk.
+# If you have multiple languages/hosts, we can expand later.
+tmp="$(mktemp)"
+find . -type f \( -name '*.html' -o -name '*.pdf' -o -name '*.mp3' -o -name '*.flac' \) \
+  ! -path './.git/*' ! -path './assets/*' ! -path './static/*' \
+  -print0 \
+| sort -z \
+| while IFS= read -r -d '' f; do
+    # Strip the leading ./ so ./path/index.html becomes path/index.html
+    path="${f#./}"
+    # Basic lastmod (UTC) from file mtime (GNU date; BSD date expects -r SECONDS)
+    lastmod="$(date -u -r "$f" +%Y-%m-%dT%H:%M:%SZ)"
+    printf '%s\t%s\n' "$path" "$lastmod"
+  done > "$tmp"
+
+{
+  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
+  printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
+  while IFS=$'\t' read -r path lastmod; do
+    # Minimally escape ampersands for XML (\& in the replacement is a literal &)
+    url="${SITE_URL%/}/$(printf '%s' "$path" | sed 's/&/\&amp;/g')"
+    printf '  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n' "$url" "$lastmod"
+  done < "$tmp"
+  printf '%s\n' '</urlset>'
+} > sitemap.xml
+
+rm -f "$tmp"
+
+echo "Wrote: $OUT_DIR/robots.txt"
+echo "Wrote: $OUT_DIR/sitemap.xml"
diff --git a/forge/script/rsyncGlitch.txt b/forge/script/rsyncGlitch.txt
new file mode 100644
index 0000000..0b3142b
--- /dev/null
+++ b/forge/script/rsyncGlitch.txt
@@ -0,0 +1,51 @@
+This is background for gpt.
+Here I explain the details of the static site generation schema for st33v.com.
+
+Briefly, the site is built from two 'faircamp's:
+  one for the main site and
+  song of the day (sotd), which lives in st33v.com/sotd (i.e. that's its base URL)
+
+detail:
+st33v@cr4y:~/dox/st33v.com$ tree -La 2
+.
+├── faircamp
+│   ├── campsite.png
+│   ├── catalog.eno
+│   ├── deploy.sh
+│   ├── drMorbius
+│   ├── eli
+│   ├── .faircamp_build
+│   ├── .faircamp_cache
+│   ├── robots.txt
+│   └── st33vTM
+├── forge
+│   ├── automationUseCase.txt
+│   ├── in
+│   ├── out
+│   ├── script
+│   └── template
+├── .git
+│   └─[redacted for clarity]
+└── sotd
+    ├── 2016-01-29-pluto
+    ├── 2026-01-29-devonian-dunkleosteus
+    ├── 2026-01-30-grouse
+    ├── 2026-01-30-llmtm
+    ├── catalog.eno
+    ├── .faircamp_build
+    ├── .faircamp_cache
+    └── sotd_cover.png
+
+The two static sites are held in the two .faircamp_build directories.
+
+rsync copies their contents to st33v.com and st33v.com/sotd, respectively.
+
+BUT the base site knows nothing of sotd, so the --delete directive deletes all of sotd.
+This is not what we want.
+
+Question: How can we protect sotd from the ravages of rsync?
+
+There is also a second question around the robots & sitemap generator. Are we allowed to have a robots.txt & sitemap.xml in sotd as well?
+
+Or is there a more elegant way to include the entire /sotd path in the first script?