Superset Tutorial

Notes from an email exchange from Ben

The documentation is crappy :(. Their website is a work in progress (at best). You can post questions to their GitHub, although they rarely reply, and there's a Gitter that is somewhat more helpful. Ona will be working on documentation for Superset as we start to develop for it, and I can share those docs once we have them put together. This is currently a mid-term priority for us.
I'm not sure about overlays, CSS, and branding, but I can follow up on it for you.
Superset is the visualization layer that sits on top of Druid. Technically, when you write metrics (which I can show), you're writing Druid metrics in Superset. The two are intentionally heavily integrated. The data flow we'll be using is this:

APIs --> NiFi --> Kafka --> Druid / Superset
We will also have NiFi pass the data to HDFS (S3) for historical indexing.
So, all of the data goes into both Hadoop and Druid: Druid is for OLAP processing, and HDFS is for historical data.

Right now the servers all contain data from multiple clients as we're building out POCs of the platform, so we can't share access to the server directly. Are there particular questions you have about the configuration?
A majority of our time has been spent on data ingestion (NiFi) and some on aggregations (Druid / Superset). To date, we haven't done any custom development in Superset.
Drawing data from PostgreSQL would work, and we've done that with some deployments, although the performance / load times in Superset with a PG back-end are noticeably slower. It would be OK for internal-only work, although I would recommend against that approach in a PRD instance. Also note that you'll need to rebuild the dashboard when moving from a PG back-end to a Druid back-end, as PG relies on SQL whereas Druid queries rely on JSON. I can show you this on the call.

Druid queries and JSON queries in Superset are almost one and the same.
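
To make the SQL-versus-JSON difference concrete, here is a minimal sketch of the same daily aggregation written both ways. Only the stock_card_line_items datasource comes from these notes; the column names (occurred_date, quantity), the metric name, and the date range are hypothetical. The JSON follows the general shape of a Druid native timeseries query, but it is an illustration rather than a query taken from our deployment.

```python
import json

# PostgreSQL back-end: the aggregation is expressed as SQL.
# The column names (occurred_date, quantity) are hypothetical.
pg_query = """
SELECT date_trunc('day', occurred_date) AS day,
       SUM(quantity) AS total_quantity
FROM stock_card_line_items
WHERE occurred_date >= '2018-01-01' AND occurred_date < '2018-02-01'
GROUP BY 1
ORDER BY 1
"""


def build_druid_timeseries(datasource, metric_field, interval, granularity="day"):
    """Build a Druid native timeseries query as a plain dict (serialized to JSON)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],
        "aggregations": [
            # "total_quantity" is a hypothetical metric name.
            {"type": "longSum", "name": "total_quantity", "fieldName": metric_field}
        ],
    }


# Druid back-end: the same aggregation is expressed as a JSON document.
druid_query = build_druid_timeseries(
    "stock_card_line_items", "quantity", "2018-01-01/2018-02-01"
)
print(json.dumps(druid_query, indent=2))
```

Because the two forms share nothing structurally, a slice built against one back-end cannot simply be pointed at the other, which is why the dashboard has to be rebuilt.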

NiFi pulls data every minute (eventually OpenLMIS will push it instead) and publishes it directly to Kafka and Hadoop/Druid. Kafka passes it to Druid.

Outcome: need to further articulate

Superset instructions

Sources → Druid Datasources includes stock_card_line_items, which is what all the visualizations are based on right now.

Think of a "data source" as a table.

Each slice refers to a single data source. The same applies to filters.

A "slice" is a single component (i.e., a visualization) within a dashboard.

A filter will modify all the slices on a dashboard that use the filter's data source.

It's thus ideal, if possible, to have a single table per dashboard. That way, you can rely on just a single filter within the dashboard. (Otherwise, because you need a filter per data source, you can end up with filters which seem redundant to the user.)
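
The relationship between filters, slices, and data sources can be sketched as a small model. This is not Superset's internal code; the Slice and Filter types, the slice names, and the "requisitions" data source are hypothetical and only illustrate why a dashboard needs one filter per data source.

```python
from collections import namedtuple

# Hypothetical stand-ins for dashboard components; not Superset classes.
Slice = namedtuple("Slice", ["name", "datasource"])
Filter = namedtuple("Filter", ["datasource", "column", "value"])


def slices_affected(dashboard_slices, flt):
    """Return the names of the slices a filter modifies: those sharing its data source."""
    return [s.name for s in dashboard_slices if s.datasource == flt.datasource]


def filters_needed(dashboard_slices):
    """One filter per distinct data source is needed to cover every slice."""
    return len({s.datasource for s in dashboard_slices})


dashboard = [
    Slice("Stock on hand", "stock_card_line_items"),
    Slice("Consumption trend", "stock_card_line_items"),
    Slice("Requisition status", "requisitions"),  # hypothetical second data source
]

facility_filter = Filter("stock_card_line_items", "facility", "Clinic A")
affected = slices_affected(dashboard, facility_filter)
needed = filters_needed(dashboard)
```

Here the facility filter reaches only the two slices built on stock_card_line_items, so the third slice needs a second, seemingly redundant filter of its own.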

Clay will look into whether we can embed Superset's charts in other pages (e.g., within OpenLMIS).

Metrics vs. dimensions
Dimensions: The data you want to report on or group by.
Metrics: The numbers/calculations associated with your dimensions.
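
A minimal sketch of the distinction, in plain Python rather than Superset or Druid: the dimension is the field the rows are grouped by, and the metric is the number computed for each group. The rows and field names below are made up for illustration.

```python
from collections import defaultdict

# Made-up sample rows; the field names are hypothetical.
rows = [
    {"facility": "Clinic A", "product": "Gloves", "quantity": 10},
    {"facility": "Clinic A", "product": "Syringes", "quantity": 5},
    {"facility": "Clinic B", "product": "Gloves", "quantity": 7},
]


def aggregate(rows, dimension, metric_field):
    """Group rows by a dimension and sum a numeric field as the metric."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[metric_field]
    return dict(totals)


# Same metric (sum of quantity), two different dimensions.
by_facility = aggregate(rows, "facility", "quantity")
by_product = aggregate(rows, "product", "quantity")
```

Changing the dimension changes how the data is sliced; changing the metric changes what is calculated for each slice of the data.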

When creating a graph, using a coarser time granularity such as a month (rather than the default of a day) can make the line smoother.
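
The smoothing effect of a coarser granularity can be sketched in plain Python. The daily values below are made up, and the roll-up only approximates what a monthly granularity does: it sums each month's days into a single point.

```python
from collections import defaultdict
from datetime import date

# Made-up daily values that swing widely from one day to the next.
daily = {
    date(2018, 1, 1): 4,
    date(2018, 1, 2): 19,
    date(2018, 1, 3): 2,
    date(2018, 2, 1): 11,
    date(2018, 2, 2): 1,
    date(2018, 2, 3): 15,
}


def rollup_by_month(daily_values):
    """Sum daily values into one bucket per (year, month)."""
    monthly = defaultdict(int)
    for day, value in daily_values.items():
        monthly[(day.year, day.month)] += value
    return dict(sorted(monthly.items()))


monthly = rollup_by_month(daily)
```

Six jagged daily points collapse into two monthly points with similar totals, so the plotted line has far fewer spikes.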

Customization

A Superset installation includes and serves all of its web assets (HTML, images, etc.) in a manner which makes them easy to edit. Branding the UI this way is trivial but would complicate future upgrades. As the screenshot below suggests, however, Superset offers built-in support for the use of custom CSS. This CSS applies to the entirety of the page (as opposed to just the dashboard area) and presents a sustainable approach to the introduction of modest visual customizations.

Meanwhile, as the screenshot below suggests, Superset supports the inclusion of arbitrary HTML within slices.

By choosing a slice type of "Markup," "Separator," or "iFrame," developers may specify HTML for the slice. It can include any number of iFrames which, in turn, render other slices. A dashboard could thus potentially be composed of a single slice which includes many others in a carefully arranged and annotated manner.
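
As a rough sketch of that idea, the HTML for such a "Markup" slice could be generated rather than written by hand. The build_markup helper and the slice URLs are hypothetical placeholders (the real URL of a slice depends on the Superset installation), and the resulting string would still need to be pasted into the slice's HTML field manually.

```python
from html import escape


def build_markup(title, slice_urls, width=600, height=400):
    """Return HTML with a heading followed by one iframe per slice URL."""
    parts = ["<h2>{}</h2>".format(escape(title))]
    for url in slice_urls:
        parts.append(
            '<iframe src="{}" width="{}" height="{}" frameborder="0"></iframe>'.format(
                escape(url, quote=True), width, height
            )
        )
    return "\n".join(parts)


# Placeholder URLs; substitute the addresses of real slices.
urls = [
    "https://superset.example.org/placeholder-slice-1",
    "https://superset.example.org/placeholder-slice-2",
]
markup = build_markup("Stock overview", urls)
print(markup)
```

Labels or instructions for the user can be added to the same HTML between the iframes.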

TODO:

Ben Leibert will look into how to apply CSS and branding to Superset dashboards, as well as whether arbitrary HTML elements (e.g., labels, instructions) can be added to them.
Clay Crosby will look into whether slices may be embedded directly within external pages (e.g., within OpenLMIS) and, if so, how authorization would be handled.