Building a Minimalistic BI Stack with PostgreSQL, FDW, and Superset – Looking for Feedback!

Hey everyone! I’m in the process of setting up a lightweight BI (Business Intelligence) architecture and would love to hear your thoughts before I go too far down the rabbit hole. Here’s the broad outline:

My Setup

  1. Dedicated PostgreSQL “BI” Server
    • I have a separate Postgres instance just for analytics (i.e., not the production DB). I really don’t want Superset or heavy queries hitting prod directly.
  2. Foreign Data Wrapper (FDW)
    • I’m using FDW on the BI Postgres to pull data from the production DB (and potentially from other sources in the future). This way, I can query external tables without actually copying them over or building huge ETL pipelines right now.
  3. Materialized Views
    • On the BI Postgres, I’m creating materialized views to store pre-aggregated data. The idea is to speed up reporting queries and reduce the load on both the BI and production DB. I’ll refresh these on a schedule (using either pg_cron or a simple script).
  4. Apache Superset for Visuals
    • I’ll point Superset at the BI Postgres so I can build dashboards from the materialized views. I’m trying to keep everything cost-effective and as simple as possible.

Why I’m Doing This

  • FinOps & Simplicity: I want to avoid a giant stack or expensive cloud services for moderate data volumes. We’re already comfortable with Postgres, so it made sense to leverage it fully.
  • Avoid Overloading Production: I really don’t want to connect Superset directly to the production DB. I’m hoping FDW plus a separate BI Postgres instance is a safe approach.

What I’m Unsure About

  1. FDW Performance: Have you seen any issues with slow queries when using FDW for relatively large tables? I’ve read about “query pushdown” being pretty good, but I’m worried about network overhead or unexpected performance hits.
  2. Materialized View Refresh: Right now, I’m planning on refreshing them daily (or maybe hourly for certain critical tables). Any advice on how to handle partial vs. full refreshes?
  3. Scaling: Is this going to become a nightmare if data volumes suddenly jump? I realize I might need to move to something more robust if we get hammered with queries. But how do you all handle that growth path?
  4. Security & Access Controls: Are there any best practices I should consider for user roles, especially to ensure people only see the data they’re allowed to see in Superset?

Questions for the Community

  • Is this approach workable in the long run, or is it an early-stage hack?
  • Any big gotchas with Postgres FDW or materialized views I should watch for?
  • Would you recommend a different setup if you were building from scratch?
  • Any cool tips for automating view refreshes (e.g., triggers, using pg_cron, or external schedulers)?

I’d really appreciate your feedback! If anyone has done something similar (or sees a looming pitfall), please let me know. My goal is to keep the architecture straightforward and cost-friendly, but also robust enough that I’m not drowning in queries or stale data.

Thanks in advance!