Reporting Platform Configuration Guide and User Manual
The reporting platform will come with a default configuration and needs to be configured independently to fit the needs of the implementation. This often happens after the OpenLMIS system has been configured for a specific implementation. This document is a work in progress and will continuously change as we develop the reporting platform.
Superset
Superset is the front-end visualization layer in the reporting platform. It queries the data storage system, Druid or RDBMS, for data and presents it in dashboards. This section of the guide focuses on configuring Superset for your implementation, including a frequently asked questions area for identifying where to perform a specific task.
Terminology
- Datasource - the Druid data source that contains the data model, accessible via Sources → Druid Datasources. Note that Superset also supports Postgres (or other SQL) backends via Sources → Tables
- Druid Cluster - a higher-level data source. Currently the only cluster is Ona Data Warehouse
- Dimension - a field on each record defined in the data source that stores a particular data element, like a column in a SQL database. Primarily used for Group Bys and for generating metrics. Viewable via the List Druid Column tab in the data source editor
- Metric - a calculation performed on a dimension in the data source. Viewable via the List Druid Metric tab in the data source editor. By default, Superset will create a COUNT() metric which returns the count of all records in the data source, there are a couple of ways to create new metrics:
- In the List Druid Column tab of the data source editor, open the column editor and select any number of the checkboxes Count Distinct, Sum, Min, Max. This will create a metric off of that dimension that will produce the expected output (e.g. a distinct count of values in that dimension)
- In the List Druid Metric tab of the data source editor, click the plus sign. Specify a data source, a metric name, verbose name, and JSON query that references metrics and dimensions available in the data source. More info on this in the General Configuration section below
- Chart - a visualization in Superset. Some of the key parts are:
- Data source - see above
- Visualization type - things like Line Chart, Bar Chart, Table, Map. Superset has an extensive list already, but please document new visualizations that we should add on our GitHub (PLD review)
- Date range - date range over which to query. You can also specify a granularity which is particularly useful on chart visualizations to keep the lines smoother
- Group By (optional but very frequently used) - dimension(s) to group the metric results by (e.g. show a visit count per facility
- Metric - metric(s) to display in the visualization
- Options - varies by type of visualization, but gives some control over how the results are displayed as well as whether things like legends appear
- Filters and Result Filters - these let you filter out certain dimension values (e.g. null) or results (e.g. metric values > 100) at the Chart-level, so that those filters do not need to be applied at the dashboard-level
- Note you can run and re-run your query by clicking the Run Query button in the top right
- Note the Save button there also
- Dashboard - a collection of Charts on a single webpage. Most dashboards should include a Filter Chart that lets the consumer restrict the results to a specific dimension, e.g. to a particular region
- User - the unique login account which contains a username, password, and user roles. Note an administrator cannot currently change the password after initially creating the user
- User role - a collection of security points that define the features and data sources in Superset that a particular user can access
Filtering
Filters can be applied in two places:
- On a dashboard. This requires you to build a Chart of the type Filter and add it to the dashboard. You can specify the particular dimensions on which you want to be able to filter on the entire dashboard. The filter can also adjust the date range for all Charts in the dashboard.
- In a Chart (see Filters and Result Filters above). You can apply filters on dimensions and metric values for a single Chart, for instance to create a Chart that only returns results that are above a predetermined risk threshold.
Security
Security --> List Users shows the list of users (both active and inactive). To create a new user, click the plus sign in the top right and fill out the required fields. You're only allowed to create one user per email address entered. Superset recommends deactivating users instead of deleting them. Note an administrator cannot currently change the password after initially creating the user in Superset's UI. This is logged here: https://github.com/apache/incubator-superset/issues/4518#issuecomment-369900509 . Instead, you need to do it in the fabmanager cli. To do this:
- SSH into the server where Superset is being hosted
- type this at the command line: fabmanager reset-password admin --app superset
Users can be assigned to multiple roles, so it's recommended to have a few types of users (admin, power user = Alpha, basic user = Gamma) and match those with data source-specific roles.
Security --> List User Roles shows the list of user roles available. The user role controls the features and data sources a particular user can access. Some useful security points:
- datasource access on [data cluster] [data source] --> gives access to a specific data source within a cluster
- metric access on [data cluster] [data source] [metric] --> gives access to a specific metric that has been marked restricted
- Note you also need to mark the metric as "restricted" in the metric editor
Check out Apache's documentation here for more info: https://superset.incubator.apache.org/security.html
Known Metrics & Limitations
See Druid documentation here for information on writing custom metrics / JSON queries.
If you're using a RDBMS backend, you can write custom SQL queries via SQL Lab --> SQL Editor.
Tips & Tricks
- It's useful to have a spare Chart that you use specifically for testing, I usually preface it with my initials and call it something like CSC_testing so anyone else looking at Superset knows not to add it to a dashboard. That way, you avoid the risk of having someone mistakenly deploy something not PRD-ready.
- To generate a .csv dump of all of the data in your data source, create a Chart with visualization type "Table" and add every dimension available in the Group By field. If there are metrics that you want to see in this as well, include them. Run the query. Depending upon the size of your data set, this may take a little bit. Then, in the top right, click the .csv button to export the resulting table to .csv. This is really helpful for data validation.
- You can generate an iframe for a particular Chart by clicking the </> icon in the top right of the Chart Editor and embed that Chart in a webpage.
External Documentation
Superset site: https://superset.incubator.apache.org/index.html
Gitter: https://gitter.im/airbnb/superset
GitHub: https://github.com/apache/incubator-superset/issues
How to Configure Superset
This section provides a quick reference on how to configure Superset in a simple, consumable quick view list.
Create a user
- Security → List Users
- Click the plus sign (+) in the top right
- Fill out the required fields
- Note: you can only create one user per email address
- Save
- Note: once you create a user, only that user can edit their password. Administrators cannot reset passwords in Superset’s UI currently.
Create a user role
- Security → List Roles
- Specify the security points
Access your datasource
- Druid
- Sources → Druid Datasources
- RDBMS
- Sources → Tables
View and edit columns - incl. column labels
- Within each datasource (see #3 above), click List Columns
- Specify whether you want the column to be available for Grouping By within queries or Filtering within queries/dashboards
- To edit a column’s name
- Click the Edit Record button to the left of the column’s name
- Druid columns
- Lst this in the Dimension Spec Json field:
- {
- "type" : "default",
- "dimension" : "actual_dimension_name",
- "outputName": "Desired Name",
- "outputType": "STRING"
- }
- RDBMS columns
- Fill in the “Verbose Name” field
Create basic metrics
- Within each datasource (see #3 above), click List Columns
- For any relevant column, click the checkboxes for Count Distinct, Sum, Min, or Max. This will auto-generate metrics on the List Druid Metrics tab. Currently, Sum, Min, and Max are selected by default for any numeric-type column
Create complex metrics
- Within each datasource (see #3 above), click List Metrics
- Druid
- Asdf
- RDBMS - option 1
- Click the plus sign (+) to the top left
- Fill in the following important fields when appropriate:
- Name (Metric field) (required)
- Description (optional)
- Verbose Name (optional)
- Type (required)
- SQL expression (required)
- This is a simple SQL expression between two or more columns, it supports basic aggregation functions like AVG, subtraction, division. It does not require SELECT or FROM clauses, and does not support more advanced clauses like WHERE, OVER, or setting variables.
- Click Save
- RDBMS - option 2
- SQL Lab → SQL Editor
- Write a full SQL expression, including SELECT, FROM, WHERE, etc. clauses.
- Run Query shows your output in tabular format
- When your query is complete, click the Visualize button and:
- Specify your time column
- Specify any columns that should be treated as dimensions
- Specify any aggregate functions that should be applied
- Select a Chart Type (this can be changed once you’re within the Chart Editor)
- Specify a Datasource Name
- Note this means any Charts built off of this new datasource will not be affected by filters applied to the “parent” datasource (where you’re selecting from in the SQL query)
- See the Create/edit Charts section (#7) below
Create/edit Charts
- Click the Charts button at the top
- Click the Edit record button to the left of the chart name to view a description, owners, associated dashboards, and parameters JSON (including default filters) for the Chart
- Click the Chart name to open the Chart Editor. Note that each visualization type has its own parameters, but we will cover the essentials here:
- Edit Visualization Type
- Edit time granularity and time range (Since and Until)
- Specify which metrics to display
- Specify which dimensions/columns to Group By
- Edit Number format - this supports free text entries, so typing “100%” will convert the results into percentages
- Apply any default filters (based on dimensions) or result filters (based on metric values)
Create/edit a dashboard
- Click Dashboards at the top
- To create a dashboard
- Click the plus sign (+) in the top right
- Specify a title and owners
- Your dashboard is now available in the Dashboards menu, follow steps 8c below
- To edit a dashboard
- Click the Edit record button to the left of the dashboard name to view associated charts, owners, CSS and JSON metadata for the dashboard.
- Click on the dashboard’s name to open it
- Click the Edit Dashboard button in the top right to adjust positioning of charts, to add/remove charts, or to set a refresh schedule (found under the Actions dropdown)
- Click Save
- Click Switch to View Mode
Drill into a Chart
- Dashboards → dashboard name
- In the top right of a Chart, click the Explore Chart button (right-pointing arrow). This opens the Chart Editor (see #7c above)
OpenLMIS: the global initiative for powerful LMIS software