author    St33v <github@f3rr3t.com>  2026-02-01 14:53:04 +1100
committer St33v <github@f3rr3t.com>  2026-02-01 14:53:04 +1100
commit    bb93e8d7c0a37a38d80b75d55117a75aa53e1de1 (patch)
tree      e1d1b4b8cd3c20b8dca38df5d141c1b337ee7780 /forge
parent    70f2fc45af8a0ea98e0e6f7b4254928dc7bfe317 (diff)
robots / sitemap demo version
Diffstat (limited to 'forge')
-rwxr-xr-x  forge/script/gen-robots-sitemap.sh  48
-rw-r--r--  forge/script/rsyncGlitch.txt        51
2 files changed, 99 insertions, 0 deletions
diff --git a/forge/script/gen-robots-sitemap.sh b/forge/script/gen-robots-sitemap.sh
new file mode 100755
index 0000000..30dbdea
--- /dev/null
+++ b/forge/script/gen-robots-sitemap.sh
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+OUT_DIR="${1:-.faircamp_build}"
+SITE_URL="${2:-https://st33v.com}"
+
+cd "$OUT_DIR"
+
+# --- robots.txt ---
+cat > robots.txt <<EOF
+User-agent: *
+Allow: /
+
+Sitemap: ${SITE_URL%/}/sitemap.xml
+EOF
+
+# --- sitemap.xml ---
+# Include HTML pages + common content types; exclude obvious junk.
+# If you have multiple languages/hosts, we can expand later.
+tmp="$(mktemp)"
+find . -type f \( -name '*.html' -o -name '*.pdf' -o -name '*.mp3' -o -name '*.flac' \) \
+ ! -path './.git/*' ! -path './assets/*' ! -path './static/*' \
+ -print0 \
+| sort -z \
+| while IFS= read -r -d '' f; do
+  # Turn ./path/index.html into path/index.html (strip the leading ./)
+ path="${f#./}"
+  # Basic lastmod (UTC) from file mtime (GNU `date -r FILE`; BSD date differs)
+ lastmod="$(date -u -r "$f" +%Y-%m-%dT%H:%M:%SZ)"
+ printf '%s\t%s\n' "$path" "$lastmod"
+ done > "$tmp"
+
+{
+ printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
+ printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
+ while IFS=$'\t' read -r path lastmod; do
+ # Escape ampersands minimally
+ url="${SITE_URL%/}/$(printf '%s' "$path" | sed 's/&/\&amp;/g')"
+ printf ' <url><loc>%s</loc><lastmod>%s</lastmod></url>\n' "$url" "$lastmod"
+ done < "$tmp"
+ printf '%s\n' '</urlset>'
+} > sitemap.xml
+
+rm -f "$tmp"
+
+echo "Wrote: $OUT_DIR/robots.txt"
+echo "Wrote: $OUT_DIR/sitemap.xml"
+
diff --git a/forge/script/rsyncGlitch.txt b/forge/script/rsyncGlitch.txt
new file mode 100644
index 0000000..0b3142b
--- /dev/null
+++ b/forge/script/rsyncGlitch.txt
@@ -0,0 +1,51 @@
+This is background for GPT.
+Here I explain the details of the static site generation scheme for st33v.com.
+
+Briefly, the site is built from two 'faircamp's:
+ one for the main site, and
+ song of the day (sotd), which lives at st33v.com/sotd (i.e. that's its base URL).
+
+detail:
+st33v@cr4y:~/dox/st33v.com$ tree -La 2
+.
+├── faircamp
+│   ├── campsite.png
+│   ├── catalog.eno
+│   ├── deploy.sh
+│   ├── drMorbius
+│   ├── eli
+│   ├── .faircamp_build
+│   ├── .faircamp_cache
+│   ├── robots.txt
+│   └── st33vTM
+├── forge
+│   ├── automationUseCase.txt
+│   ├── in
+│   ├── out
+│   ├── script
+│   └── template
+├── .git
+│   └─[redacted for clarity]
+└── sotd
+ ├── 2016-01-29-pluto
+ ├── 2026-01-29-devonian-dunkleosteus
+ ├── 2026-01-30-grouse
+ ├── 2026-01-30-llmtm
+ ├── catalog.eno
+ ├── .faircamp_build
+ ├── .faircamp_cache
+ └── sotd_cover.png
+
+The two static sites are held in the two .faircamp_build directories.
+
+rsync copies their contents to st33v.com and st33v.com/sotd, respectively.
+
+BUT the base site knows nothing of sotd, so the --delete option deletes all of sotd.
+This is not what we want.
+
+Question: How can we protect sotd from the ravages of rsync?
+
+There is also a second question, about the robots & sitemap generator: are we allowed to have a robots.txt & sitemap.xml in sotd as well?
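On the robots.txt side: crawlers only fetch robots.txt from the origin root, so a /sotd/robots.txt would simply be ignored. A sitemap at /sotd/sitemap.xml is allowed, but per the sitemaps.org protocol it may only list URLs at or below /sotd/. One tidy arrangement (a sketch, assuming these URLs) is a single root robots.txt that references both sitemaps:

```
User-agent: *
Allow: /

Sitemap: https://st33v.com/sitemap.xml
Sitemap: https://st33v.com/sotd/sitemap.xml
```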
+
+Or is there a more elegant way to include the entire /sotd path in the first script?
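One way the first script could fold in /sotd (a sketch, not the committed gen-robots-sitemap.sh; the build-dir paths and SITE_URL are assumptions read off the tree above): walk both .faircamp_build trees and emit a single root sitemap, so /sotd needs no sitemap of its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Emit one <url> line per HTML page in a build dir, under a URL prefix.
emit_urls() {  # emit_urls <build-dir> <url-prefix>
  local dir="$1" prefix="$2"
  ( cd "$dir"
    find . -type f -name '*.html' -print0 | sort -z |
      while IFS= read -r -d '' f; do
        printf '  <url><loc>%s/%s</loc></url>\n' "$prefix" "${f#./}"
      done )
}

# Combine the main and sotd builds into a single root sitemap.
gen_combined_sitemap() {  # gen_combined_sitemap <main-build> <sotd-build> <site-url>
  local main="$1" sotd="$2" site="${3%/}"
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
  printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  emit_urls "$main" "$site"
  emit_urls "$sotd" "$site/sotd"
  printf '%s\n' '</urlset>'
}
```

Usage would be something like:
gen_combined_sitemap faircamp/.faircamp_build sotd/.faircamp_build https://st33v.com > faircamp/.faircamp_build/sitemap.xml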
+