<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>sebiwi</title><link>https://sebiwi.github.io/tags/databases/</link><description>Recent content from sebiwi</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><managingEditor>contact.sebiwi@gmail.com (sebiwi)</managingEditor><webMaster>contact.sebiwi@gmail.com (sebiwi)</webMaster><lastBuildDate>Mon, 22 Jun 2026 09:00:00 +0200</lastBuildDate><atom:link href="https://sebiwi.github.io/tags/databases/index.xml" rel="self" type="application/rss+xml"/><item><title>Counting the florbs: BRIN indexes, rollup tables, and a 90-second query</title><link>https://sebiwi.github.io/blog/counting-the-florbs-brin-indexes-rollup-tables-and-a-90-second-query/</link><pubDate>Mon, 22 Jun 2026 09:00:00 +0200</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/blog/counting-the-florbs-brin-indexes-rollup-tables-and-a-90-second-query/</guid><description>A story about counting fifteen million of something, fast. What a BRIN index is and why it’s great, why it failed at volume, and how a daily rollup table fixed it, proven with query plans.</description><content:encoded>&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;I needed to count the florbs. All of them, across the whole system, over any
time window you care to name. The florbs lived in Postgres, so I tried using a
&lt;a href="https://www.postgresql.org/docs/current/brin-intro.html"&gt;BRIN index&lt;/a&gt;, which was great for small windows and useless for large ones.
Then I built a daily rollup table to pre-aggregate the counts, kept the BRIN
around for the small stuff, and ended up with a query that went from about 90
seconds to a handful of milliseconds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you keep reading, I’m going to tell you what a BRIN index is, why it’s
great, why it wasn’t enough, how a rollup table works, and how to use it for big
counts.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="the-florbs"&gt;The florbs&lt;/h2&gt;
&lt;p&gt;I’m not going to tell you what a florb is. It doesn’t matter (trust me). It&amp;rsquo;s
better this way. It&amp;rsquo;s more fun too. Here’s everything you actually need to know.&lt;/p&gt;
&lt;p&gt;There are about fifteen million florbs living in a Postgres database. There are
more of them every second. Each florb is either still florbing, has florbed
cleanly, or has failed to florb, and each one takes some amount of time to do
so. They belong to different tenants. One day, someone important walked over and
asked a perfectly reasonable question: &lt;em&gt;across all of them, what fraction of
florbs are failing right now? And last week? And over the last 90 days?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We already had a way to look at one tenant’s florbs. What we didn’t have was the
global view: every florb, every tenant, an arbitrary time window, the error rate
at a glance. So I went to build it.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s just counting, right?&lt;/p&gt;
&lt;h2 id="first-attempt-a-brin-index"&gt;First attempt: a BRIN index&lt;/h2&gt;
&lt;p&gt;The florbs live in one big ass table, and the only thing my query filters on is
&lt;code&gt;created_at&lt;/code&gt;. Count the rows in a time window, group by status, done. The naive
version is fine until the table gets big, and fifteen million rows is big enough
to make a plain sequential scan crawl. In other words, not something you want to
run on every dashboard load.&lt;/p&gt;
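&lt;p&gt;For reference, the naive version looks something like this. It is a sketch, not the exact production query: the table and column names come from the data model further down, and &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$2&lt;/code&gt; are assumed to be the bounds of the time window:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Count florbs per status inside a half-open time window [$1, $2).
SELECT status, COUNT(*) AS florbs
FROM florbs
WHERE created_at &amp;gt;= $1
  AND created_at &amp;lt; $2
GROUP BY status;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;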
&lt;p&gt;So then I index &lt;code&gt;created_at&lt;/code&gt;. Florbs are append-mostly, and they arrive roughly
in time order. This means that the table is already physically laid out on disk
in (roughly) &lt;code&gt;created_at&lt;/code&gt; order. That is the exact situation &lt;a href="https://www.postgresql.org/docs/current/brin-intro.html"&gt;BRIN indexes&lt;/a&gt; were built
for.&lt;/p&gt;
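&lt;p&gt;If you want to check that assumption on your own table instead of trusting me, the planner statistics expose it. A minimal sketch, assuming the table is called &lt;code&gt;florbs&lt;/code&gt; (as in the data model further down) and has been analyzed recently:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- A correlation close to 1 (or -1) means the physical row order tracks created_at,
-- which is when BRIN works well. Values near 0 mean the rows are scattered.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = &amp;#39;florbs&amp;#39;
  AND attname = &amp;#39;created_at&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;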
&lt;p&gt;A BRIN index (Block Range INdex) is super lazy. I like that. Instead of storing
one entry per row like a B-tree, it slices the table into ranges of physical
blocks and stores only the minimum and maximum value of the indexed column for
each range. That’s the whole idea. For a table sorted by time, each block range
covers a small slice of time, so when you ask for “florbs created last Tuesday”,
Postgres can skip every block range whose min and max don’t overlap Tuesday, and
only look at the few that do. Other databases do this too; they just don&amp;rsquo;t call
it BRIN. ClickHouse has &lt;a href="https://clickhouse.com/docs/optimize/skipping-indexes#minmax"&gt;minmax skip indexes&lt;/a&gt;, for example.&lt;/p&gt;
&lt;p&gt;The result is an index that is small (kilobytes, not gigabytes), costs almost
nothing to maintain on insert, and makes narrow time-window queries fast. A
one-day window dropped to around 268 milliseconds. Problem solved. Right?&lt;/p&gt;
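&lt;p&gt;To see how small “small” is on your own data, compare the index to the table it covers. A sketch, assuming the index name used later in this post:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- On-disk size of the BRIN index next to the size of the table itself.
SELECT pg_size_pretty(pg_relation_size(&amp;#39;idx_florbs_created_at_brin&amp;#39;)) AS brin_size,
       pg_size_pretty(pg_table_size(&amp;#39;florbs&amp;#39;)) AS table_size;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;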
&lt;h2 id="where-it-fell-apart"&gt;Where it fell apart&lt;/h2&gt;
&lt;p&gt;BRIN indexes are &lt;em&gt;lossy&lt;/em&gt;: a block range only tells Postgres that matching rows
&lt;em&gt;might&lt;/em&gt; live in those blocks, never that they definitely do. Postgres still has
to read every block in the matching ranges and re-check each row by hand. When
your window is narrow and only a handful of ranges match, that recheck is cheap.
When your window is wide, it&amp;rsquo;s expensive.&lt;/p&gt;
&lt;p&gt;A 30-day window matched about 4.4 million rows, roughly 30% of the table. At
that point the planner did the sensible thing and gave up on the index entirely:
if you’re going to read a third of the table anyway, the bookkeeping of a lossy
index is pure overhead, so it fell back to a parallel sequential scan. That scan
took about 89 seconds. The 90-day window was worse.&lt;/p&gt;
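&lt;p&gt;You can watch the planner make that call by asking for the plan at different window widths. A sketch, with the 30-day interval as an example value; swap in &lt;code&gt;&amp;#39;1 day&amp;#39;&lt;/code&gt; to compare. Keep in mind that &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; really runs the query, so the wide version will be slow:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*)
FROM florbs
WHERE created_at &amp;gt;= NOW() - INTERVAL &amp;#39;30 days&amp;#39;
GROUP BY status;
-- Narrow windows should show a Bitmap Heap Scan fed by the BRIN index.
-- Wide ones may show a (Parallel) Seq Scan instead, depending on your data and settings.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;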
&lt;p&gt;But then I started thinking about this differently. A sequential scan reads the
whole table no matter how wide your window is. One day or ninety days, same
cost, because it reads everything either way. There is no index in the world
that fixes this, because the problem was never finding the rows. The problem is
that there are simply too many of them to count on demand. The BRIN index was a
good answer to a subset of the question.&lt;/p&gt;
&lt;h2 id="second-attempt-a-rollup-table"&gt;Second attempt: a rollup table&lt;/h2&gt;
&lt;p&gt;Alright, counting fifteen million florbs on every request is too slow, so let&amp;rsquo;s not
do that. How about counting them once ahead of time and keeping the running totals
around?&lt;/p&gt;
&lt;p&gt;That’s a rollup table. Mine pre-aggregates the florbs by day, at a grain of
(tenant, day, status). A whole day’s worth of florbs for a given tenant and
status collapses into a single row holding a count. A wide-window query then
sums a few thousand pre-aggregated rows instead of scanning millions of raw
ones.&lt;/p&gt;
&lt;p&gt;There is one design decision here that matters more than all the others: &lt;strong&gt;store
the raw components, never the finished answers.&lt;/strong&gt; It is tempting to store an
&lt;code&gt;average_duration&lt;/code&gt; or a &lt;code&gt;failure_rate&lt;/code&gt; directly. Don’t do that. If you want to
combine two days, a stored average is useless, because the average of two
averages is a lie unless both days had exactly the same number of florbs.&lt;/p&gt;
&lt;p&gt;So instead of an average I store a &lt;code&gt;sum_duration_ms&lt;/code&gt; and a &lt;code&gt;duration_count&lt;/code&gt;, and
instead of a rate I store the raw counts per status. Then any span of days
reconstructs perfectly: the failure rate is &lt;code&gt;SUM(failed) / SUM(total)&lt;/code&gt;, and the
average duration is &lt;code&gt;SUM(sum_duration_ms) / SUM(duration_count)&lt;/code&gt;, both computed
at read time over however many days you asked for. Same numbers as the live
query, every single time.&lt;/p&gt;
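&lt;p&gt;Here is that arithmetic with two made-up days, to show why the components matter. The numbers are invented for illustration: a quiet day with 10 florbs averaging 100 ms, and a busy day with 90 florbs averaging 500 ms:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;WITH days (day, sum_duration_ms, duration_count) AS (
    VALUES (DATE &amp;#39;2026-06-01&amp;#39;,  1000::bigint, 10::bigint),  -- 10 florbs, 100 ms on average
           (DATE &amp;#39;2026-06-02&amp;#39;, 45000::bigint, 90::bigint)   -- 90 florbs, 500 ms on average
)
SELECT
    -- What you get if you had stored one average per day: (100 + 500) / 2 = 300
    AVG(sum_duration_ms::numeric / duration_count) AS average_of_averages,
    -- What the stored components give you: 46000 / 100 = 460
    SUM(sum_duration_ms)::numeric / SUM(duration_count) AS true_average
FROM days;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The stored-average version is off by 160 ms, because it gives the quiet day the same weight as the busy one.&lt;/p&gt;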
&lt;p&gt;The cool side effect is that this table barely grows. Its size depends on the
number of tenants, statuses, and days, not on the number of florbs. A year of
history is a few hundred thousand rows, which is kinda nothing.&lt;/p&gt;
&lt;p&gt;Reads route themselves by window width. Narrow windows (under a week) still hit
the live table through the BRIN index, where they’re fast, exact, and perfectly
fresh. Wide windows hit the rollup. Best of both worlds.&lt;/p&gt;
&lt;h3 id="keeping-it-fresh"&gt;Keeping it fresh&lt;/h3&gt;
&lt;p&gt;The rollup is rebuilt by a small loop that runs inside the app itself: refresh
once on startup, then every fifteen minutes. Since many instances of the app are
running, they all coordinate through a Postgres &lt;a href="https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS"&gt;advisory lock&lt;/a&gt;, so exactly
one of them does the work at a time and the rest happily do nothing (execute but
no-op). No cron, no extra service, no new infrastructure, fresh and frugal.&lt;/p&gt;
&lt;p&gt;Two interesting implementation details here. First, each refresh deletes and
rebuilds a trailing window of days rather than upserting, because a florb can
change its verdict after the fact (it goes from still florbing to florbed or
failed), which means a bucket’s count can go &lt;em&gt;down&lt;/em&gt;, even all the way to zero.
An upsert only touches the buckets that still exist in the fresh aggregate, so it
would leave the emptied ones behind. Second, it only recomputes the last few days,
because a florb’s timestamp never changes once it is created, so older buckets are
already final. That recompute is a quick, BRIN-assisted scan of recent rows. There&amp;rsquo;s one
assumption here, based on how my florbs behave: the trailing window has to be at
least as long as a florb can take to settle. If a florb could keep florbing for
longer than the window, it would age out still counted as running, and its final
verdict would never make it into the rollup. Your florbs may behave differently.&lt;/p&gt;
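&lt;p&gt;Put together, one refresh pass could look like the block below. This is a sketch rather than my actual code: it assumes the table names from the data model further down, a three-day trailing window, and an arbitrary lock key, and it runs everything as a single PL/pgSQL &lt;code&gt;DO&lt;/code&gt; block so the lock, the delete, and the rebuild share one transaction:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;DO $$
DECLARE
    -- Start of the trailing window, pinned to a UTC midnight. The subtraction
    -- happens on a plain timestamp so the session time zone cannot shift it.
    cutoff timestamptz :=
        (date_trunc(&amp;#39;day&amp;#39;, NOW() AT TIME ZONE &amp;#39;UTC&amp;#39;) - INTERVAL &amp;#39;3 days&amp;#39;) AT TIME ZONE &amp;#39;UTC&amp;#39;;
BEGIN
    -- Someone else is already refreshing: do nothing.
    IF NOT pg_try_advisory_xact_lock(8675309) THEN
        RETURN;
    END IF;

    -- Throw away every bucket in the trailing window...
    DELETE FROM florb_daily_stats
    WHERE day &amp;gt;= (cutoff AT TIME ZONE &amp;#39;UTC&amp;#39;)::date;

    -- ...and rebuild those buckets from the live table.
    INSERT INTO florb_daily_stats
        (tenant_id, day, status, count, sum_duration_ms, duration_count, refreshed_at)
    SELECT tenant_id,
           (created_at AT TIME ZONE &amp;#39;UTC&amp;#39;)::date,
           status,
           COUNT(*),
           COALESCE(SUM(duration_ms), 0),
           COUNT(duration_ms),
           NOW()
    FROM florbs
    WHERE created_at &amp;gt;= cutoff
    GROUP BY 1, 2, 3;
    -- The advisory lock is released automatically when the transaction ends.
END
$$;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;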
&lt;p&gt;There is also a small downside: rollup answers can be up to fifteen minutes
stale, and they are aligned to day boundaries at the very edges of the window.
On a 90-day dashboard, it&amp;rsquo;s a rounding error. On the narrow windows where you
would actually notice, you’re reading the live table anyway, so it doesn’t
apply.&lt;/p&gt;
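&lt;p&gt;If you want to put a number on “stale”, the &lt;code&gt;refreshed_at&lt;/code&gt; column of the rollup table (see the data model below) is enough. A small sketch:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- How long ago the most recently rebuilt bucket was written.
SELECT NOW() - MAX(refreshed_at) AS staleness
FROM florb_daily_stats;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;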
&lt;h2 id="the-data-model-if-you-want-to-steal-it"&gt;The data model, if you want to steal it&lt;/h2&gt;
&lt;p&gt;Here is the whole pattern, anonymized into the florb domain, ready to adapt. The
base table is whatever you already have; the only assumptions are a timestamp
and a status:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#414868;font-style:italic"&gt;-- The table you already have. Append-mostly, naturally ordered by time.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;CREATE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;TABLE&lt;/span&gt; florbs (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#bb9af7"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tenant_id &lt;span style="color:#9ece6a"&gt;INTEGER&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; status &lt;span style="color:#9ece6a"&gt;TEXT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt;, &lt;span style="color:#414868;font-style:italic"&gt;-- &amp;#39;running&amp;#39; | &amp;#39;completed&amp;#39; | &amp;#39;failed&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; created_at TIMESTAMPTZ &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;DEFAULT&lt;/span&gt; NOW(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; duration_ms &lt;span style="color:#9ece6a"&gt;BIGINT&lt;/span&gt; &lt;span style="color:#414868;font-style:italic"&gt;-- NULL until the florb finishes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Step one, the BRIN index, for the narrow windows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;CREATE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;INDEX&lt;/span&gt; CONCURRENTLY idx_florbs_created_at_brin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;ON&lt;/span&gt; florbs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;USING&lt;/span&gt; BRIN (created_at)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;WITH&lt;/span&gt; (pages_per_range &lt;span style="color:#9ece6a;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#e0af68"&gt;32&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Step two, the rollup table. Notice that it stores counts and duration
components, never averages or rates:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;CREATE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;TABLE&lt;/span&gt; florb_daily_stats (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tenant_id &lt;span style="color:#9ece6a"&gt;INTEGER&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;DATE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt;, &lt;span style="color:#414868;font-style:italic"&gt;-- (created_at AT TIME ZONE &amp;#39;UTC&amp;#39;)::date
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; status &lt;span style="color:#9ece6a"&gt;TEXT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;count&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;BIGINT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#e0af68"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sum_duration_ms &lt;span style="color:#9ece6a"&gt;BIGINT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#e0af68"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; duration_count &lt;span style="color:#9ece6a"&gt;BIGINT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#e0af68"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; refreshed_at TIMESTAMPTZ &lt;span style="color:#bb9af7"&gt;NOT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULL&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;DEFAULT&lt;/span&gt; NOW(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;KEY&lt;/span&gt; (tenant_id, &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt;, status)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;CREATE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;INDEX&lt;/span&gt; idx_florb_daily_stats_day &lt;span style="color:#bb9af7"&gt;ON&lt;/span&gt; florb_daily_stats (&lt;span style="color:#bb9af7"&gt;day&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Step three, the refresh. Delete a trailing window and rebuild it in one
transaction, behind an advisory lock so only one process runs it. Here &lt;code&gt;$1&lt;/code&gt; is
something like today minus three days:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;SELECT&lt;/span&gt; pg_try_advisory_xact_lock(&lt;span style="color:#e0af68"&gt;8675309&lt;/span&gt;); &lt;span style="color:#414868;font-style:italic"&gt;-- returns false for losers, which must skip the rest; one runner at a time.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#414868;font-style:italic"&gt;-- $1 must be a UTC-midnight boundary, or the oldest day is deleted but only partly rebuilt.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;DELETE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;FROM&lt;/span&gt; florb_daily_stats
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;WHERE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt; &lt;span style="color:#9ece6a;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; (&lt;span style="color:#db4b4b"&gt;$&lt;/span&gt;&lt;span style="color:#e0af68"&gt;1&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;AT&lt;/span&gt; TIME &lt;span style="color:#bb9af7"&gt;ZONE&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;&amp;#39;UTC&amp;#39;&lt;/span&gt;)::&lt;span style="color:#9ece6a"&gt;date&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;INSERT&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;INTO&lt;/span&gt; florb_daily_stats
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (tenant_id, &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt;, status, &lt;span style="color:#bb9af7"&gt;count&lt;/span&gt;, sum_duration_ms, duration_count, refreshed_at)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;SELECT&lt;/span&gt; tenant_id,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (created_at &lt;span style="color:#bb9af7"&gt;AT&lt;/span&gt; TIME &lt;span style="color:#bb9af7"&gt;ZONE&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;&amp;#39;UTC&amp;#39;&lt;/span&gt;)::&lt;span style="color:#9ece6a"&gt;date&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;AS&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; status,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;COUNT&lt;/span&gt;(&lt;span style="color:#9ece6a;font-weight:bold"&gt;*&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; COALESCE(&lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(duration_ms), &lt;span style="color:#e0af68"&gt;0&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;COUNT&lt;/span&gt;(duration_ms),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; NOW()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;FROM&lt;/span&gt; florbs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;WHERE&lt;/span&gt; created_at &lt;span style="color:#9ece6a;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#db4b4b"&gt;$&lt;/span&gt;&lt;span style="color:#e0af68"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;GROUP&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;BY&lt;/span&gt; tenant_id, &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt;, status;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And finally the read, reconstructing the stats from the stored components over
whatever window you want:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(&lt;span style="color:#bb9af7"&gt;count&lt;/span&gt;) &lt;span style="color:#bb9af7"&gt;AS&lt;/span&gt; total,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(&lt;span style="color:#bb9af7"&gt;count&lt;/span&gt;) FILTER (&lt;span style="color:#bb9af7"&gt;WHERE&lt;/span&gt; status &lt;span style="color:#9ece6a;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;&amp;#39;completed&amp;#39;&lt;/span&gt;) &lt;span style="color:#bb9af7"&gt;AS&lt;/span&gt; completed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(&lt;span style="color:#bb9af7"&gt;count&lt;/span&gt;) FILTER (&lt;span style="color:#bb9af7"&gt;WHERE&lt;/span&gt; status &lt;span style="color:#9ece6a;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;&amp;#39;failed&amp;#39;&lt;/span&gt;) &lt;span style="color:#bb9af7"&gt;AS&lt;/span&gt; failed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(&lt;span style="color:#bb9af7"&gt;count&lt;/span&gt;) FILTER (&lt;span style="color:#bb9af7"&gt;WHERE&lt;/span&gt; status &lt;span style="color:#9ece6a;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#9ece6a"&gt;&amp;#39;running&amp;#39;&lt;/span&gt;) &lt;span style="color:#bb9af7"&gt;AS&lt;/span&gt; running,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(sum_duration_ms) &lt;span style="color:#9ece6a;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;NULLIF&lt;/span&gt;(&lt;span style="color:#bb9af7"&gt;SUM&lt;/span&gt;(duration_count), &lt;span style="color:#e0af68"&gt;0&lt;/span&gt;) &lt;span style="color:#bb9af7"&gt;AS&lt;/span&gt; avg_duration_ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;FROM&lt;/span&gt; florb_daily_stats
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#bb9af7"&gt;WHERE&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt; &lt;span style="color:#9ece6a;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#db4b4b"&gt;$&lt;/span&gt;&lt;span style="color:#e0af68"&gt;1&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;AND&lt;/span&gt; &lt;span style="color:#bb9af7"&gt;day&lt;/span&gt; &lt;span style="color:#9ece6a;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#db4b4b"&gt;$&lt;/span&gt;&lt;span style="color:#e0af68"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That last query is the whole point. It touches a few thousand rows, reconstructs
the same totals and averages the live query would have computed, and returns
fast. Add a &lt;code&gt;tenant_id&lt;/code&gt; filter and you get a per-tenant view out of the same
table.&lt;/p&gt;
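&lt;p&gt;For example, a per-tenant failure rate over the same window might look like this. It is a sketch: &lt;code&gt;$3&lt;/code&gt; is an assumed extra parameter holding the tenant id, and &lt;code&gt;failure_rate&lt;/code&gt; is a name I made up for the derived column:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;SELECT
    SUM(count) AS total,
    SUM(count) FILTER (WHERE status = &amp;#39;failed&amp;#39;) AS failed,
    -- COALESCE turns "no failed rows" into 0; NULLIF avoids dividing by zero
    -- when the tenant has no florbs at all in the window.
    COALESCE(SUM(count) FILTER (WHERE status = &amp;#39;failed&amp;#39;), 0)
        / NULLIF(SUM(count), 0) AS failure_rate
FROM florb_daily_stats
WHERE day &amp;gt;= $1 AND day &amp;lt;= $2
  AND tenant_id = $3;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since the primary key starts with &lt;code&gt;tenant_id&lt;/code&gt; and &lt;code&gt;day&lt;/code&gt;, this lookup can be served by the primary key index.&lt;/p&gt;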
&lt;h2 id="proof-reading-the-query-plans"&gt;Proof: reading the query plans&lt;/h2&gt;
&lt;p&gt;Data trumps debate, so let’s look at what Postgres actually did. An
&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on the two paths tells the complete story.&lt;/p&gt;
&lt;p&gt;First, the wide-window query against the live table, the one we’re trying to
avoid (a narrower window than the 90-second worst case, but still wide enough to
make the point):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Parallel Bitmap Heap Scan on florbs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index: idx_florbs_created_at_brin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: lossy=154514
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Index Recheck: 26511
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: read=416403 (~3.3 GB off disk)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rows ≈ 1.78M
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Time: 11014 ms
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Everything painful is right there. &lt;code&gt;Heap Blocks: lossy&lt;/code&gt; and
&lt;code&gt;Rows Removed by Index Recheck&lt;/code&gt; are the BRIN tax: Postgres visited about 154,000
blocks the index said &lt;em&gt;might&lt;/em&gt; match, then threw away the rows that didn’t. It
read about 416,000 pages, roughly 3.3 GB, and aggregated 1.78 million rows, even
with parallel workers helping. Eleven seconds, and that was one of the faster
runs. The grouped variant took about 13.6 seconds.&lt;/p&gt;
&lt;p&gt;Now the same question, answered from the rollup:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#c0caf5;background-color:#1a1b26;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Bitmap Index Scan on idx_florb_daily_stats_day
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: day &amp;gt;= &amp;#39;2026-06-01&amp;#39; AND day &amp;lt;= &amp;#39;2026-06-14&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Bitmap Heap Scan on florb_daily_stats
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rows = 3031 Buffers: read=240
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Time: 29.875 ms
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Three thousand rows instead of nearly two million. Two hundred and forty pages
instead of four hundred thousand. About 30 milliseconds cold, and under 10
milliseconds once the cache is warm. That works out to roughly 370 times faster
while reading around 1,700 times fewer pages (416,403 versus 240).&lt;/p&gt;
&lt;p&gt;Same numbers out, three orders of magnitude less work to get them.&lt;/p&gt;
&lt;h2 id="final-thoughts"&gt;Final thoughts&lt;/h2&gt;
&lt;p&gt;I did consider other options before reaching for a second table. A big covering
B-tree index would have made the scan index-only, but it scales with the number
of rows matched, so 90-day windows would still be measured in seconds, and it is
a large index to maintain on every insert. A dedicated time-series database is
the textbook tool for this kind of question, except that florbs mutate after
they are created, and most time-series stores really don’t want you updating
points after the fact, so it was the wrong shape for the data. The data could
have been modeled differently to fit one, but that would still have meant adding an
extra infrastructure component, with its operational overhead. The boring
relational rollup won on simplicity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lessons learned&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Match the index to how you query the data, and when no index can save you,
stop querying the raw data and pre-aggregate it.&lt;/li&gt;
&lt;li&gt;BRIN was the right tool for small windows and a trap for large ones, and
knowing the difference is most of the job.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You still wanna know what a florb is? Why? Some mysteries are load-bearing.&lt;/p&gt;</content:encoded></item><item><title>You are not your code</title><link>https://sebiwi.github.io/comics/you-are-not-your-code/</link><pubDate>Mon, 11 Nov 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/you-are-not-your-code/</guid><description>You are not your code. This code looks like it's been written by an imbecil. That remark is about your code, not about you.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-11-11-you-are-not-your-code.jpg" /&gt;</content:encoded></item><item><title>DST</title><link>https://sebiwi.github.io/comics/dst/</link><pubDate>Mon, 04 Nov 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/dst/</guid><description>DST makes my head spin and I often wonder about the reasons we're still using it. It must be hard to tell people that all of their suffering was for nothing.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-11-04-dst.jpg" /&gt;</content:encoded></item><item><title>ext5</title><link>https://sebiwi.github.io/comics/ext5/</link><pubDate>Mon, 28 Oct 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/ext5/</guid><description>ext5 will be directly integrated into systemd.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-10-28-ext5.jpg" /&gt;</content:encoded></item><item><title>KPI</title><link>https://sebiwi.github.io/comics/kpi/</link><pubDate>Mon, 21 Oct 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/kpi/</guid><description>If you want to generate some bugs, always remember to fix your deadline.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-10-21-kpi.jpg" 
/&gt;</content:encoded></item><item><title>Libra</title><link>https://sebiwi.github.io/comics/libra/</link><pubDate>Mon, 14 Oct 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/libra/</guid><description>If I wanted my financial transaction information to be used for political advertisement, I'd give it to Cambridge Analytica myself.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-10-14-libra.jpg" /&gt;</content:encoded></item><item><title>The Powershell diaries: part 1</title><link>https://sebiwi.github.io/comics/the-powershell-diaries-part-1/</link><pubDate>Mon, 07 Oct 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/the-powershell-diaries-part-1/</guid><description>I spent 15 minutes trying to figure out how to get my Powershell version. Turns out the syntax is harder than quitting vim.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-10-07-the-powershell-diaries-part-1.jpg" /&gt;</content:encoded></item><item><title>42</title><link>https://sebiwi.github.io/comics/42/</link><pubDate>Mon, 30 Sep 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/42/</guid><description>If she'd written the Hitchhiker's guide to the Galaxy, the Answer to the Ultimate Question of Life, the Universe, and Everything would be vim.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-09-30-42.jpg" /&gt;</content:encoded></item><item><title>Disagreement</title><link>https://sebiwi.github.io/comics/disagreement/</link><pubDate>Mon, 23 Sep 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/disagreement/</guid><description>Person 1: Truth comes from disagreement, so may I suggest..? Person 2: No. Person 1: Disagreement should be constructive. 
Person 2: I disagree.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-09-23-disagreement.jpg" /&gt;</content:encoded></item><item><title>Technical</title><link>https://sebiwi.github.io/comics/technical/</link><pubDate>Mon, 16 Sep 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/technical/</guid><description>Often, people think all of their problems are technical. Most of the time, they're not.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-09-16-technical.jpg" /&gt;</content:encoded></item><item><title>Planning</title><link>https://sebiwi.github.io/comics/planning/</link><pubDate>Mon, 25 Mar 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/planning/</guid><description>I love planning meetings. Specially when we're supposed to plan every single aspect of a product's lifetime for the next 25 years</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-03-25-planning.jpg" /&gt;</content:encoded></item><item><title>Regex</title><link>https://sebiwi.github.io/comics/regex/</link><pubDate>Mon, 18 Mar 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/regex/</guid><description>I've heard people trying to evaluate performance by counting lines of code. That's attributing the same value to a variable assignment and a regular expression.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-03-18-regex.jpg" /&gt;</content:encoded></item><item><title>Datalake</title><link>https://sebiwi.github.io/comics/datalake/</link><pubDate>Mon, 11 Mar 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/datalake/</guid><description>Everyone talks about datalakes, no one talks about quality data. 
So almost every datalake looks like a shitpile.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-03-11-datalake.jpg" /&gt;</content:encoded></item><item><title>OOM</title><link>https://sebiwi.github.io/comics/oom/</link><pubDate>Mon, 04 Mar 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/oom/</guid><description>The OOM killer is supposed to kill processes based on an OOM score. In reality, it just kills the Java application.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-03-04-OOM.jpg" /&gt;</content:encoded></item><item><title>Agile says</title><link>https://sebiwi.github.io/comics/agile-says/</link><pubDate>Mon, 28 Jan 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/agile-says/</guid><description>It's funny to hear people say 'Agile says'. It's like there are rules to follow.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-01-28-agile-says.jpg" /&gt;</content:encoded></item><item><title>Status feature</title><link>https://sebiwi.github.io/comics/status-feature/</link><pubDate>Mon, 21 Jan 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/status-feature/</guid><description>Why did they add the status feature to GitHub? 
What is this, Facebook?</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-01-21-status-feature.jpg" /&gt;</content:encoded></item><item><title>On Operations, DevOps and soft skills</title><link>https://sebiwi.github.io/blog/on-operations-devops-and-soft-skills/</link><pubDate>Thu, 17 Jan 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/blog/on-operations-devops-and-soft-skills/</guid><description>Lessons from being the ops person embedded in a dev team, why DevOps is as much about communication and soft skills as tooling.</description><content:encoded>&lt;h2 id="lets-talk-about-communication-for-a-bit"&gt;Let’s talk about communication for a bit&lt;/h2&gt;
&lt;p&gt;One of the most interesting roles I’ve had to fulfill over the last couple of years
has been the “Operations guy working as a part of a Development team”. This is
a fascinating situation to find yourself in. Allow me to elaborate.&lt;/p&gt;
&lt;p&gt;Historically, Operations teams have been isolated from Development teams. &lt;strong&gt;Two
separate organizational entities&lt;/strong&gt;. The reasons for this were manifold:
Operations people were a scarce resource; they needed to accommodate vast
amounts of work for numerous Development teams (server provisioning for team A,
middleware configuration for team B, application deployment for team C); it
made sense for management to group people into teams using their skill set as
the sorting criterion&amp;hellip; you name it. In the end, most interactions between
Development teams and Operations teams happened through ticketing systems.&lt;/p&gt;


&lt;figure&gt;
 &lt;img src="https://sebiwi.github.io/images/soft-skills/effective-communication_hu_a182460679167c81.webp" srcset="https://sebiwi.github.io/images/soft-skills/effective-communication_hu_5bd9c7b7cbc82a9e.webp 480w, https://sebiwi.github.io/images/soft-skills/effective-communication_hu_417f41701efb82a4.webp 960w, https://sebiwi.github.io/images/soft-skills/effective-communication_hu_a182460679167c81.webp 1440w" sizes="(max-width: 880px) 92vw, 880px" width="1440" height="570" alt="Effective communication" loading="lazy" decoding="async"&gt;


 
 &lt;figcaption&gt;Effective communication&lt;/figcaption&gt;
 
&lt;/figure&gt;

&lt;p&gt;These workflows created and fueled most of the &lt;strong&gt;communication issues&lt;/strong&gt; within
organizations, and by doing so, created &lt;strong&gt;gargantuan bottlenecks&lt;/strong&gt; in the
development and deployment pipelines. Applications and services took months or even
years to ship into production. This, needless to say, was frustrating for
everyone involved in the process. It also had a huge impact on Time to Market.
No one was happy.&lt;/p&gt;
&lt;p&gt;While these situations still occur nowadays, the incorporation of Agile
approaches into Software Development is becoming increasingly common, and its
impact on Operations is clearly visible from an organizational point of view.
It is not rare, for example, to see &lt;strong&gt;Feature Teams within an organization.&lt;/strong&gt;
These are &lt;strong&gt;cross-functional&lt;/strong&gt;, which means that they tend to incorporate many
different profiles into their ranks. From conception and design through
implementation and deployment, product owners and designers work
hand in hand with developers. This most likely means that an &lt;strong&gt;OPS will also be a
part of the team.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="how-is-that-a-game-changer"&gt;How is that a game changer?&lt;/h2&gt;
&lt;p&gt;When working as an OPS in a Development, Feature or Product team, &lt;strong&gt;most of your
responsibilities will shift&lt;/strong&gt;, whether you realize it or not. It will not be
about taking team X’s artifact and deploying it on servers A, B and C anymore.
It is &lt;strong&gt;your team now&lt;/strong&gt;, and therefore, &lt;strong&gt;your artifact and your servers.&lt;/strong&gt; You will
most likely have to deal with a lot of things you are not used to dealing with.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This is a good thing. Embrace it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It is an opportunity. If you do things right, you will be able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use and apply your knowledge regarding technical architecture and systems. This
may concern not only the application itself, but also every service or platform
involved.&lt;/li&gt;
&lt;li&gt;Facilitate discussions and answer questions regarding your areas of
expertise. Once again, this not only includes the application, but also the
components around it.&lt;/li&gt;
&lt;li&gt;Show people what you do every day, and how you do it.
What’s the point of your work? Is it actually necessary? Isn’t it something
that needs to be done only at the beginning of an application’s lifetime?&lt;/li&gt;
&lt;li&gt;Improve the team’s dynamics: help them grow, and help them care. Don’t hold
back: from good development practices applied to Operations, to workshops on
existing processes.&lt;/li&gt;
&lt;li&gt;Learn, to a vast extent.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Seen in this light, the role of an OPS inside a development team is virtually
Tech Leading with a focus on operational aspects. In short, it’s mostly about soft
skills (even though technical skills are still required): the ability to
express ideas, explain complex subjects in simple terms, analyze and solve problems,
and share knowledge with your team. That’s a tremendous change. &lt;strong&gt;And the basic
ingredient for all of these is kindness.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="on-technical-architecture"&gt;On technical architecture&lt;/h2&gt;
&lt;p&gt;When working in an organizational setup like this, there is a high chance that you
will be the most technical person on the team. When I say technical, I mean
the person with the strongest background in Linux/Windows systems, middleware, networks and
virtualization. &lt;strong&gt;If not, that’s great news.&lt;/strong&gt; It means that you will have people
with whom you will be able to discuss all of these subjects, people who will
be able to challenge you, or remind you what’s important when you’re having
tunnel vision. In any case, you will most likely have to propose solutions to
many different issues.&lt;/p&gt;
&lt;p&gt;At some point you will have to deploy your application and its dependencies
somewhere. That’s where your knowledge of the subject comes in. Web servers,
application servers, reverse proxies, caching systems, databases, failover,
high availability, disaster recovery&amp;hellip; These are all things you will have to
analyze. Should you use nginx or Apache? PostgreSQL or MySQL? The answer is
always the same: &lt;strong&gt;it all depends on your needs&lt;/strong&gt;. Try to analyze what you need
before proposing a solution. Leverage your experience when doing so.&lt;/p&gt;
&lt;p&gt;Be cautious: &lt;strong&gt;this does not mean that you need to make all of these decisions
by yourself.&lt;/strong&gt; There are trade-offs for every single choice you will make.
These trade-offs will not only impact the functioning of the application
itself, but also its development. This means that your teammates’ opinion on the matter
is paramount. Your role is to explain the different options to the people
affected by the choice. Remember, consensus is key. Embrace challenge too.
Good ideas need human interaction, healthy conflict, argument and
debate. When doing so, remember to be kind. Don’t impose your opinions on
everyone else. Be constructive. And this is not specific to this section.
There should be an acronym for this, &lt;strong&gt;RTBK&lt;/strong&gt;. It should be used way more often
than &lt;a href="https://en.wikipedia.org/wiki/RTFM"&gt;RTFM&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This also applies to Continuous Integration and Continuous Deployment. What is
the simplest workflow that takes the application’s source code, shapes it
into the actual application and makes it accessible to users? Answering this has
tremendous value: it will allow you to automate time-consuming, repetitive
tasks, creating a safe automated pipeline, which will in turn allow everyone to
concentrate on delivering value. A piece of advice: start small, and work your
way up to something that suits your needs. “Simple is better” should be your
mantra. Once again, apply the knowledge acquired from previous experiences when
doing so.&lt;/p&gt;


&lt;figure&gt;
 &lt;img src="https://sebiwi.github.io/images/comics/2018-04-09-dear-jenkins_hu_806733c3c63d082e.webp" srcset="https://sebiwi.github.io/images/comics/2018-04-09-dear-jenkins_hu_2068d4103516a6d2.webp 480w, https://sebiwi.github.io/images/comics/2018-04-09-dear-jenkins_hu_806733c3c63d082e.webp 761w" sizes="(max-width: 880px) 92vw, 880px" width="761" height="570" alt="Why do you keep using Jenkins if you hate it?" loading="lazy" decoding="async"&gt;


 
 &lt;figcaption&gt;Improve, based on your experiences&lt;/figcaption&gt;
 
&lt;/figure&gt;

&lt;h2 id="on-facilitating-interactions"&gt;On facilitating interactions&lt;/h2&gt;
&lt;p&gt;More often than not, people on your team will have questions regarding your
field of expertise. Discussions will be held on subjects you know in depth. A
lot of terms will be thrown around: availability, redundancy, stress testing,
building and deployment. Once again, &lt;strong&gt;if they know everything there is to know
about these subjects, that’s a good thing too&lt;/strong&gt;. Either way, you might want to
step in and catalyze the discussion.&lt;/p&gt;


&lt;figure&gt;
 &lt;img src="https://sebiwi.github.io/images/comics/2018-01-29-communication_hu_135bb762f3aa6f26.webp" srcset="https://sebiwi.github.io/images/comics/2018-01-29-communication_hu_c7741e234da2e51e.webp 480w, https://sebiwi.github.io/images/comics/2018-01-29-communication_hu_65cc922aadbf2a18.webp 960w, https://sebiwi.github.io/images/comics/2018-01-29-communication_hu_135bb762f3aa6f26.webp 1221w" sizes="(max-width: 880px) 92vw, 880px" width="1221" height="449" alt="Individuals and interactions over processes and tools" loading="lazy" decoding="async"&gt;


 
 &lt;figcaption&gt;Individuals and interactions over processes and tools&lt;/figcaption&gt;
 
&lt;/figure&gt;

&lt;p&gt;Before doing so, &lt;strong&gt;remember to be kind&lt;/strong&gt;. It is a key aspect of human interaction,
and it will encourage participation, collaboration, and innovation.&lt;/p&gt;
&lt;p&gt;First off, explain the meaning of every concept being discussed to everyone. &lt;strong&gt;It
is crucial for every single person on the team to share the same language in
order to have effective interactions&lt;/strong&gt;. When having discussions about the
“bastion”, one person may be talking about the web interface of a Cloud
provider, whereas another one might be talking about a Linux server.&lt;/p&gt;
&lt;p&gt;Be clear with your communication. &lt;strong&gt;More specifically, work on explaining
things simply. I can’t stress this enough&lt;/strong&gt;. The ability to express complex
concepts in a simple fashion is priceless. It’s one of the most valuable skills
you can learn. Your teammates don’t necessarily need to know every single detail of the
subject. Saying “there was a problem with the application’s
configuration” is not the same as saying “the application crashed because we forgot to set the
heap’s maximum size”. A fundamental part of explaining things simply is being
capable of discerning the appropriate level of technical depth for each person.&lt;/p&gt;
&lt;p&gt;Sometimes, these discussions will start to consume the time allocated for other
purposes: stand up meetings, retrospectives, backlog grooming or others. While
it’s good to be able to facilitate these discussions, it is also important to
find the right venue for them. If none exists, you can propose a new
one yourself. Or even better, just make yourself available in order to discuss
these subjects. Individuals and interactions over processes and tools.&lt;/p&gt;
&lt;h2 id="on-sharing"&gt;On sharing&lt;/h2&gt;
&lt;p&gt;A large number of people see Operations as an impenetrable affair. It
is safe to say that it is a hard topic to grasp. This is mainly due to the huge
number of subjects it covers. Operations entails development, systems
administration, networks and security, just to name a few. &lt;strong&gt;This may sound
intimidating to most newcomers.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Nevertheless, &lt;strong&gt;this does not mean that they are not interested. They are often
just scared of the unknown&lt;/strong&gt;. Once people see that you are approachable, they
will start asking questions. Even if they don’t, you should ask them if there
are things that they want to learn. If things go right, you’ll get the
opportunity to share your knowledge.&lt;/p&gt;
&lt;p&gt;Before even thinking about doing so: &lt;strong&gt;RTBK&lt;/strong&gt;. If you don’t know what this means,
you’re skipping important parts of this article. Don’t. Go back, and take the
time to read them.&lt;/p&gt;
&lt;p&gt;There are two fundamental reasons to share your knowledge. First, &lt;strong&gt;mentoring
people is one of the most interesting and rewarding things you can do.&lt;/strong&gt;
Motivated people have &lt;strong&gt;tremendous potential&lt;/strong&gt;, and if they are willing to learn
and are interested in what you do, &lt;strong&gt;you can catalyze their growth to a great
extent.&lt;/strong&gt; Sometimes it only takes a &lt;strong&gt;little push for someone to discover a great
deal of capabilities.&lt;/strong&gt; You just need to know how to give the right push, in the
right direction.&lt;/p&gt;
&lt;p&gt;Second, it allows you to &lt;strong&gt;spread the knowledge&lt;/strong&gt;. If you’re the only person
working on these subjects, you will most likely become a single point of
failure. If something happens to what you built when you are not available to
fix it, the whole system goes down. It is reasonable to want &lt;strong&gt;to share that
responsibility with someone.&lt;/strong&gt; Do not expect them to become fully autonomous, or
a drop-in replacement for you, right away: they are already doing something else
as a full-time job. Still, having enough knowledge to be able to debug common
issues is already great.&lt;/p&gt;


&lt;figure&gt;
 &lt;img src="https://sebiwi.github.io/images/soft-skills/fullstack-devops-engineer_hu_d9b37f71dce5aaf8.webp" srcset="https://sebiwi.github.io/images/soft-skills/fullstack-devops-engineer_hu_b52410d2be8bd3eb.webp 480w, https://sebiwi.github.io/images/soft-skills/fullstack-devops-engineer_hu_e5fa7165791625bc.webp 960w, https://sebiwi.github.io/images/soft-skills/fullstack-devops-engineer_hu_d9b37f71dce5aaf8.webp 1098w" sizes="(max-width: 880px) 92vw, 880px" width="1098" height="738" alt="Fullstack DevOps engineer" loading="lazy" decoding="async"&gt;


 
 &lt;figcaption&gt;Fullstack DevOps engineer&lt;/figcaption&gt;
 
&lt;/figure&gt;

&lt;p&gt;The most efficient form of knowledge sharing is pair programming. It is
fundamental that your pair completely understands everything that is going on. You
can’t expect people to pick up Infrastructure as Code without any
knowledge of Linux systems, for example. This is not necessarily an issue, as
you can teach them as you go. Check that they understand what’s going on, and do
it often. Ask them to reformulate what you just said: drawing things works
great for this purpose.&lt;/p&gt;
&lt;p&gt;This is probably going to be a good exercise for you as well. Explaining a
concept to someone forces you to structure it differently in your head, and it
will allow you to know if you fully understand it too. If you do not, you can
look up the answer together. This is not a problem. More on this in the next
section.&lt;/p&gt;
&lt;p&gt;Later on, you can give them tasks you would normally work on by yourself. When
doing this, the most important part is to be able to select the right amount of
work, with the right complexity. It must be challenging enough for them to feel
excited, but not so hard that they become frustrated. Aim for balance.&lt;/p&gt;
&lt;p&gt;You can also use other formats in order to teach large groups of people at the
same time. Mob programming works great too, or code katas, if you have
interesting subjects you can share. &lt;a href="https://github.com/sebiwi/terraform-wrapper"&gt;You can teach them how to code their own
wrappers using Test-Driven Development&lt;/a&gt;, how to &lt;a href="https://sebiwi.github.io/blog/the-wizard-1/"&gt;test their infrastructure code&lt;/a&gt;,
or how to &lt;a href="https://sebiwi.github.io/blog/ara/"&gt;leverage monitoring or reporting tools in order to understand what’s
going on within your system&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="on-improving"&gt;On improving&lt;/h2&gt;
&lt;p&gt;Last but not least, you should contribute to the continuous improvement of the
team. When doing so, &lt;strong&gt;remember to be kind&lt;/strong&gt;. Make sure everyone on the team knows
how important this is. It is a must-have quality when trying to improve as a
whole, when proposing enhancements and giving feedback. Change your
formulations: switching from “you made a mistake” to “we made a mistake” will
help you stay away from finger pointing. &lt;strong&gt;Responsibilities should be collective.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It’s hard to be specific on this topic, since areas of improvement will vary
from team to team. I’ll give it a try though.&lt;/p&gt;
&lt;p&gt;At first, most people will refer to you as the DevOps of the team. Explain to them
that &lt;strong&gt;DevOps is not a role, but a culture of collaboration and communication&lt;/strong&gt;. It
is a goal you should all aim for.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fail&lt;/strong&gt;, and fail fast, too. Champion innovation: test new ideas, validate
them, and accept that sometimes they don’t work. It’s a good thing,
as long as you know when to seek an alternative, and you get to learn
something.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Work on quality&lt;/strong&gt;. Start doing code reviews, for example, if you are not doing
them already, and include infrastructure code as well. They are invaluable
for seeing what people are doing, correcting or improving certain
practices, and sharing the code. Have them read your code too. At first they
will be scared, and they will tell you that they don’t want to because they
won’t have any remarks or contributions. This is not true. They will read it,
and understand how it works. If they don’t, they can come over and ask for
clarifications. No single points of failure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Measure everything&lt;/strong&gt;. Preach the importance of monitoring and observability. Teach
them how to instrument code so that it is easily observable.
Show them the benefits of having structured, clear logs, and being able to
query the platform to have immediate answers on what is going on, at any time.
They’ll be on board as soon as you show them the advantages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lead by example.&lt;/strong&gt; Motivate them, code with them, show them that you can do the
same things you do in standard development when you’re doing Operations. Show
them Clean Code, Test-Driven Development and refactoring, applied to
infrastructure code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Learn from errors and mistakes.&lt;/strong&gt; If you have a production incident, analyze the
root cause so that you can learn from it, and prevent it in the future. Asking
yourself five times why something happened is a great way of finding root
causes. You will often see that what you thought was a technical issue actually
has an organizational or design root cause.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Say “I don’t know”, when you don’t know.&lt;/strong&gt; You are not meant to know everything.
Not knowing is not an issue. It is impossible to know everything. It is a good
thing to say it. Be honest. People will trust you more, and start doing it
themselves too.&lt;/p&gt;


&lt;figure&gt;
 &lt;img src="https://sebiwi.github.io/images/comics/2018-12-03-experience_hu_a54d13dc62cda518.webp" srcset="https://sebiwi.github.io/images/comics/2018-12-03-experience_hu_a44d028e156cece0.webp 480w, https://sebiwi.github.io/images/comics/2018-12-03-experience_hu_9ca93ddc917c89c8.webp 960w, https://sebiwi.github.io/images/comics/2018-12-03-experience_hu_a54d13dc62cda518.webp 1440w" sizes="(max-width: 880px) 92vw, 880px" width="1440" height="671" alt="I don&amp;#39;t know. Say it." loading="lazy" decoding="async"&gt;


 
 &lt;figcaption&gt;“I don&amp;rsquo;t know”. Say it.&lt;/figcaption&gt;
 
&lt;/figure&gt;

&lt;h2 id="theres-no-way-im-remembering-all-those-things"&gt;There’s no way I’m remembering all those things&lt;/h2&gt;
&lt;p&gt;You don’t have to. It all comes down to six basic principles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remember to be kind.&lt;/li&gt;
&lt;li&gt;Use your power for the greater good.&lt;/li&gt;
&lt;li&gt;Help people, explain things to them, resolve issues.&lt;/li&gt;
&lt;li&gt;Share what you know.&lt;/li&gt;
&lt;li&gt;Improve the team itself.&lt;/li&gt;
&lt;li&gt;Remember to be kind.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And most importantly, have fun. If it’s not fun, it’s probably not worth it.&lt;/p&gt;</content:encoded></item><item><title>ESx</title><link>https://sebiwi.github.io/comics/esx/</link><pubDate>Mon, 14 Jan 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/esx/</guid><description>You can get a neat mathematical formula to calculate the real name of an ECMAScript. I double dare you to try to get the same thing with Windows.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-01-14-ESx.jpg" /&gt;</content:encoded></item><item><title>Reality</title><link>https://sebiwi.github.io/comics/reality/</link><pubDate>Mon, 07 Jan 2019 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/reality/</guid><description>There's a difference between the planning and our current progress. There must be an issue with reality. Certainly.</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2019-01-07-reality.jpg" /&gt;</content:encoded></item><item><title>Retrospective</title><link>https://sebiwi.github.io/comics/retrospective/</link><pubDate>Mon, 31 Dec 2018 07:19:02 +0100</pubDate><author>sebiwi</author><guid>https://sebiwi.github.io/comics/retrospective/</guid><description>It's been a while since I started posting drawings in here. It's been fun. Maybe it's time for a little retrospective</description><content:encoded>&lt;img src="https://sebiwi.github.io/images/comics/2018-12-31-retrospective.jpg" /&gt;</content:encoded></item></channel></rss>