Reporting Platform Configuration Guide and User Manual
The reporting platform will come with a default configuration and needs to be configured independently to fit the needs of the implementation. This often happens after the OpenLMIS system has been configured for a specific implementation. This document is a work in progress and will continuously change as we develop the reporting platform.
Superset
Superset is the front-end visualization layer in the reporting platform. It queries the data storage system, Druid or RDBMS, for data and presents it in dashboards. This section of the guide focuses on configuring Superset for your implementation, including a frequently asked questions area for identifying where to perform a specific task.
Terminology
Datasource - the Druid data source that contains the data model, accessible via Sources → Druid Datasources. Note that Superset also supports Postgres (or other SQL) backends via Sources → Tables
Druid Cluster - a higher-level data source. Currently the only cluster is Ona Data Warehouse
Dimension - a field on each record defined in the data source that stores a particular data element, like a column in a SQL database. Primarily used for Group Bys and for generating metrics. Viewable via the List Druid Column tab in the data source editor
Metric - a calculation performed on a dimension in the data source. Viewable via the List Druid Metric tab in the data source editor. By default, Superset will create a COUNT() metric which returns the count of all records in the data source, there are a couple of ways to create new metrics:
In the List Druid Column tab of the data source editor, open the column editor and select any number of the checkboxes Count Distinct, Sum, Min, Max. This will create a metric off of that dimension that will produce the expected output (e.g. a distinct count of values in that dimension)
In the List Druid Metric tab of the data source editor, click the plus sign. Specify a data source, a metric name, verbose name, and JSON query that references metrics and dimensions available in the data source. More info on this in the General Configuration section below
Chart - a visualization in Superset. Some of the key parts are:
Data source - see above
Visualization type - things like Line Chart, Bar Chart, Table, Map. Superset has an extensive list already, but please document new visualizations that we should add on our GitHub (PLD review)
Date range - date range over which to query. You can also specify a granularity which is particularly useful on chart visualizations to keep the lines smoother
Group By (optional but very frequently used) - dimension(s) to group the metric results by (e.g. show a visit count per facility
Metric - metric(s) to display in the visualization
Options - varies by type of visualization, but gives some control over how the results are displayed as well as whether things like legends appear
Filters and Result Filters - these let you filter out certain dimension values (e.g. null) or results (e.g. metric values > 100) at the Chart-level, so that those filters do not need to be applied at the dashboard-level
Note you can run and re-run your query by clicking the Run Query button in the top right
Note the Save button there also
Dashboard - a collection of Charts on a single webpage. Most dashboards should include a Filter Chart that lets the consumer restrict the results to a specific dimension, e.g. to a particular region
User - the unique login account which contains a username, password, and user roles. Note an administrator cannot currently change the password after initially creating the user
User role - a collection of security points that define the features and data sources in Superset that a particular user can access
Filtering
Filters can be applied in two places:
On a dashboard. This requires you to build a Chart of the type Filter and add it to the dashboard. You can specify the particular dimensions on which you want to be able to filter on the entire dashboard. The filter can also adjust the date range for all Charts in the dashboard.
In a Chart (see Filters and Result Filters above). You can apply filters on dimensions and metric values for a single Chart, for instance to create a Chart that only returns results that are above a predetermined risk threshold.
Security
Security --> List Users shows the list of users (both active and inactive). To create a new user, click the plus sign in the top right and fill out the required fields. You're only allowed to create one user per email address entered. Superset recommends deactivating users instead of deleting them. Note an administrator cannot currently change the password after initially creating the user in Superset's UI. This is logged here: https://github.com/apache/incubator-superset/issues/4518#issuecomment-369900509 . Instead, you need to do it in the fabmanager cli. To do this:
SSH into the server where Superset is being hosted
type this at the command line: fabmanager reset-password admin --app superset
Users can be assigned to multiple roles, so it's recommended to have a few types of users (admin, power user = Alpha, basic user = Gamma) and match those with data source-specific roles.
Security --> List User Roles shows the list of user roles available. The user role controls the features and data sources a particular user can access. Some useful security points:
datasource access on [data cluster] [data source] --> gives access to a specific data source within a cluster
metric access on [data cluster] [data source] [metric] --> gives access to a specific metric that has been marked restricted
Note you also need to mark the metric as "restricted" in the metric editor
Check out Apache's documentation here for more info: https://superset.incubator.apache.org/security.html
Known Metrics & Limitations
See Druid documentation here for information on writing custom metrics / JSON queries.
If you're using a RDBMS backend, you can write custom SQL queries via SQL Lab --> SQL Editor.
Tips & Tricks
It's useful to have a spare Chart that you use specifically for testing, I usually preface it with my initials and call it something like CSC_testing so anyone else looking at Superset knows not to add it to a dashboard. That way, you avoid the risk of having someone mistakenly deploy something not PRD-ready.
To generate a .csv dump of all of the data in your data source, create a Chart with visualization type "Table" and add every dimension available in the Group By field. If there are metrics that you want to see in this as well, include them. Run the query. Depending upon the size of your data set, this may take a little bit. Then, in the top right, click the .csv button to export the resulting table to .csv. This is really helpful for data validation.
You can generate an iframe for a particular Chart by clicking the </> icon in the top right of the Chart Editor and embed that Chart in a webpage.
External Documentation
Superset site: https://superset.incubator.apache.org/index.html
Gitter: https://gitter.im/airbnb/superset
GitHub: https://github.com/apache/incubator-superset/issues
How to Configure Superset
This section provides a quick reference on how to configure Superset in a simple, consumable quick view list.
Create a user
Security → List Users
Click the plus sign (+) in the top right
Fill out the required fields
Note: you can only create one user per email address
Save
Note: once you create a user, only that user can edit their password. Administrators cannot reset passwords in Superset’s UI currently.
Create a user role
Security → List Roles
Specify the security points
Access your datasource
Druid
Sources → Druid Datasources
RDBMS
Sources → Tables
View and edit columns - incl. column labels
Within each datasource (see #3 above), click List Columns
Specify whether you want the column to be available for Grouping By within queries or Filtering within queries/dashboards
To edit a column’s name
Click the Edit Record button to the left of the column’s name
Druid columns
Lst this in the Dimension Spec Json field:
{
"type" : "default",
"dimension" : "actual_dimension_name",
"outputName": "Desired Name",
"outputType": "STRING"
}
RDBMS columns
Fill in the “Verbose Name” field
Create basic metrics
Within each datasource (see #3 above), click List Columns
For any relevant column, click the checkboxes for Count Distinct, Sum, Min, or Max. This will auto-generate metrics on the List Druid Metrics tab. Currently, Sum, Min, and Max are selected by default for any numeric-type column
Create complex metrics
Within each datasource (see #3 above), click List Metrics
Druid
Asdf
RDBMS - option 1
Click the plus sign (+) to the top left
Fill in the following important fields when appropriate:
Name (Metric field) (required)
Description (optional)
Verbose Name (optional)
Type (required)
SQL expression (required)
This is a simple SQL expression between two or more columns, it supports basic aggregation functions like AVG, subtraction, division. It does not require SELECT or FROM clauses, and does not support more advanced clauses like WHERE, OVER, or setting variables.
Click Save
RDBMS - option 2
SQL Lab → SQL Editor
Write a full SQL expression, including SELECT, FROM, WHERE, etc. clauses.
Run Query shows your output in tabular format
When your query is complete, click the Visualize button and:
Specify your time column
Specify any columns that should be treated as dimensions
Specify any aggregate functions that should be applied
Select a Chart Type (this can be changed once you’re within the Chart Editor)
Specify a Datasource Name
Note this means any Charts built off of this new datasource will not be affected by filters applied to the “parent” datasource (where you’re selecting from in the SQL query)
See the Create/edit Charts section (#7) below
Create/edit Charts
Click the Charts button at the top
Click the Edit record button to the left of the chart name to view a description, owners, associated dashboards, and parameters JSON (including default filters) for the Chart
Click the Chart name to open the Chart Editor. Note that each visualization type has its own parameters, but we will cover the essentials here:
Edit Visualization Type
Edit time granularity and time range (Since and Until)
Specify which metrics to display
Specify which dimensions/columns to Group By
Edit Number format - this supports free text entries, so typing “100%” will convert the results into percentages
Apply any default filters (based on dimensions) or result filters (based on metric values)
Create/edit a dashboard
Click Dashboards at the top
To create a dashboard
Click the plus sign (+) in the top right
Specify a title and owners
Your dashboard is now available in the Dashboards menu, follow steps 8c below
To edit a dashboard
Click the Edit record button to the left of the dashboard name to view associated charts, owners, CSS and JSON metadata for the dashboard.
Click on the dashboard’s name to open it
Click the Edit Dashboard button in the top right to adjust positioning of charts, to add/remove charts, or to set a refresh schedule (found under the Actions dropdown)
Click Save
Click Switch to View Mode
Drill into a Chart
Dashboards → dashboard name
In the top right of a Chart, click the Explore Chart button (right-pointing arrow). This opens the Chart Editor (see #7c above)