Superset Tutorial
Overview and General Information
Description
Superset is an open source dashboarding tool built by Airbnb developers and currently being incubated by Apache. It was selected as the data visualization component for the OLMIS reporting stack.
Terminology
Data source - the Postgres (or other SQL) table that you want to view or build charts off of. You can view them via Sources --> Tables
Dimension - a field on each record defined in the data source that stores a particular data element (a column in a SQL database). Primarily used for Group Bys and for generating metrics. Viewable via the List Column tab in the data source editor
Metric - a calculation performed on a dimension in the data source. Viewable via the List Metric tab in the data source editor. By default, Superset will create a COUNT() metric which returns the count of all records in the data source, there are a couple of ways to create new metrics:
In the List Druid Metric tab of the data source editor, click the plus sign. Specify a data source, a metric name, verbose name, and JSON query that references metrics and dimensions available in the data source. More info on this in the General Configruation section below
- Within a chart
Chart (fka "slice") - a visualization in Superset. Some of the key parts are:
- Data source - see above
- Visualization type - things like Line Chart, Bar Chart, Table, Map. Superset has an extensive list already, but new visualization styles canb e developed
- Date range - date range over which to query. You can also specify a granularity which is particularly useful on chart visualizations to keep the lines smoother
- Group By (optional but very frequently used) - dimension(s) to group the metric results by (e.g. show a visit count per facility)
- Metric - metric(s) to display in the visualization
- Options - varies by type of visualization, but gives some control over how the results are displayed as well as whether things like legends appear
- Filters and Result Filters - these let you filter out certain dimension values (e.g. null) or results (e.g. metric values > 100) at the chart-level, so that those filters do not need to be applied at the dashboard-level
- Note you can run and re-run your query by clicking the Run Query button in the top right
- Note the Save button there also
Dashboard - a collection of charts on a single page. Most dashboards should include a Filter slice that lets the consumer restrict the results to a specific dimension, e.g. to a particular region
User - the unique login account which contains a username, password, and user roles. Note an administrator cannot currently change the password after initially creating the user
User role - a collection of security points that define the features and data sources in Superset that a particular user can access
Filtering
Filters can be applied in a few places:
- On a dashboard. This requires you to build a slice of the type Filter and add it to the dashboard. You can specify the particular dimensions on which you want to be able to filter on the entire dashboard. The filter can also adjust the date range for all slices in the dashboard.
- In a dashboard. This requires you to build a slice of the type Filter and reference it in the JSON Metadata field available in the dashboard editor. This filter is applied at the dashboard-level, and cannot be edited by an average user.
- In a chart (see Filters and Result Filters above). You can apply filters on dimensions and metric values for a single chart, for instance to create a chart that only returns results that are above a predetermined risk threshold.
Security
Security --> List Users shows the list of users (both active and inactive). To create a new user, click the plus sign in the top right and fill out the required fields. You're only allowed to create one user per email address entered. Superset recommends deactivating users instead of deleting them. Note an administrator cannot currently change the password after initially creating the user in Superset's UI. This is logged here:https://github.com/apache/incubator-superset/issues/4518#issuecomment-369900509. Instead, you need to do it in the fabmanager cli. To do this:
- SSH into the server where Superset is being hosted
- type this at the command line: fabmanager reset-password admin --app superset
Users can be assigned to multiple roles, so it's recommended to have a few types of users (admin, power user = Alhpa, basic user = Gamma) and match those with data source-specific roles.
Security --> List User Roles shows the list of user roles available. The user role controls the features and data sources a particular user can access. Some useful security points:
- datasource access on [data cluster] [data source] --> gives access to a specific data source within a cluster
- metric access on [data cluster] [data source] [metric] --> gives access to a specific metric that has been marked restricted
- all datasource access on all_datasource_access --> gives access to all data sources, note the Alpha and Admin user roles have this security point. We've created Alpha_restricted which does not include this point, and should be combined with a datasource-specific user role
- Note you also need to mark the metric as "restricted" in the metric editor
To duplicate a user role, select the checkbox next to that role and under the Actions button at the bottom left, click the Copy button.
Check out Apache's documentation here for more info:https://superset.incubator.apache.org/security.html
Tips & Tricks
It's useful to have a spare slice that you use specifically for testing, I usually preface it with my initials and call it something like CSC_testing so anyone else looking at Superset knows not to add it to a dashboard. That way, you avoid the risk of having someone mistakenly deploy something not PRD-ready.
To generate a .csv dump of all of the data in your data source, create a slice with visualization type "Table" and add every dimension available in the Group By field. If there are metrics that you want to see in this as well, include them. Run the query. Depending upon the size of your data set, this may take a little bit. Then, in the top right, click the .csv button to export the resulting table to .csv. This is really helpful for data validation.
You can generate an iframe for a particular slice by clicking the </> icon in the top right of the Slice Editor and embed that slice in a webpage.
When using a remote database you might need to reset the connection (by typing in the database URL and specifically the password afresh) to get around
UnicodeDecodeError
reported by Superset
External Documentation
Specific Tasks
Create a user
- Note: you can only create one user per email address
- Note: once you create a user, only that user can edit their password. Administrators cannot reset passwords in Superset's UI currently, it needs to be done through the command line
- Security → List Users
- Click the plus sign (+) in the top right
- Fill out the required fields
- Save
Create a user role
- Security → List Roles
- Specify the security points
Access your datasource
- Sources → Druid Datasources
- Sources → Tables
- Druid
- PostgreSQL (or other RDBMS)
View and edit columns - incl. column labels
- Click the Edit Record button to the left of the column’s name
- Druid columns
- {
- "type" : "default",
- "dimension" : "actual_dimension_name",
- "outputName": "Desired Name",
- "outputType": "STRING"
- }
- Lst this in the Dimension Spec Json field:
- RDBMS columns
- Fill in the “Verbose Name” field
- Within each datasource (see #3 above), click List Columns
- Specify whether you want the column to be available for Grouping By within queries or Filtering within queries/dashboards
- To edit a column’s name
Create basic metrics
- Within each datasource (see #3 above), click List Columns
- For any relevant column, click the checkboxes for Count Distinct, Sum, Min, or Max. This will auto-generate metrics on the List Druid Metrics tab. Currently, Sum, Min, and Max are selected by default for any numeric-type column
Create complex metrics
- RDBMS - option 1
- Name (Metric field) (required)
- Description (optional)
- Verbose Name (optional)
- Type (required)
- SQL expression (required)
- This is a simple SQL expression between two or more columns, it supports basic aggregation functions like AVG, subtraction, division. It does not require SELECT or FROM clauses, and does not support more advanced clauses like WHERE, OVER, or setting variables.
- Click the plus sign (+) to the top left
- Fill in the following important fields when appropriate:
- Click Save
- RDBMS - option 2
- Specify your time column
- Specify any columns that should be treated as dimensions
- Specify any aggregate functions that should be applied
- Select a Chart Type (this can be changed once you’re within the Chart Editor)
- Specify a Datasource Name
- Note this means any Charts built off of this new datasource will not be affected by filters applied to the “parent” datasource (where you’re selecting from in the SQL query)
- See the Create/edit Charts section (#7) below
- SQL Lab → SQL Editor
- Write a full SQL expression, including SELECT, FROM, WHERE, etc. clauses.
- Run Query shows your output in tabular format
- When your query is complete, click the Visualize button and:
- RDBMS - option 1
- Within each datasource (see #3 above), click List Metrics
Create/edit Charts
- Edit Visualization Type
- Edit time granularity and time range (Since and Until)
- Specify which metrics to display
- Specify which dimensions/columns to Group By
- Edit Number format - this supports free text entries, so typing “100%” will convert the results into percentages
- Apply any default filters (based on dimensions) or result filters (based on metric values)
- Click the Charts button at the top
- Click the Edit record button to the left of the chart name to view a description, owners, associated dashboards, and parameters JSON (including default filters) for the Chart
- Click the Chart name to open the Chart Editor. Note that each visualization type has its own parameters, but we will cover the essentials here:
Create/edit a dashboard
- Click the plus sign (+) in the top right
- Specify a title and owners
- Your dashboard is now available in the Dashboards menu, follow steps 8c below
- Click the Edit record button to the left of the dashboard name to view associated charts, owners, CSS and JSON metadata for the dashboard.
- Click on the dashboard’s name to open it
- Click the Edit Dashboard button in the top right to adjust positioning of charts, to add/remove charts, or to set a refresh schedule (found under the Actions dropdown)
- Click Save
- Click Switch to View Mode
- Click Dashboards at the top
- To create a dashboard
- To edit a dashboard
Drill into a Chart
- Dashboards → dashboard name
- In the top right of a Chart, click the Explore Chart button (right-pointing arrow). This opens the Chart Editor (see #7c above)
- Apply a default filter to a dashboard
- Dashboard list → edit dashboard
- In the JSON Metadata field, enter this: "default_filters": "{\"id\": {\"dimension\": [\"value\"]}}"
- id is the NUMERIC id of the filter slice you want to filter by
- dimension is the dimension name (i.e. column name) that you want to filter by
- value is the dimension value that you want to filter to
- "default_filters": "{\"1705\": {\"country\": [\"Kenya\"]}}" this applies a default filter to a dashboard using chart id 1705, and it will only return results where the country column is equal to Kenya
- Apply default colors to charts
- Dashboard list → edit dashboard
- In the JSON Metadaata field, enter this: "label_colors": {"Value1": "color1", "value2": "color2"...}
- Color must be expressed using hex codes
OpenLMIS: the global initiative for powerful LMIS software