Managed SSBI with Power BI: Core Dimensions

In this post I want to revisit the core datasets I talked about in a previous post about Managed SSBI.

One problem we may have overlooked when building a bunch of core datasets in that post is that certain dimensions tend to be duplicated across the datasets. Imagine a scenario where the single master data source of a managed self-service setup is a data warehouse, which supplies all the required dimensions. When you have, for example, core reseller sales, internet sales, and finance datasets, each one will have its own copy of the calendar dimension and a few others. This is not ideal when you consider the extent of the duplication and the effort required.

This is where, once again, DirectQuery for Power BI datasets and Analysis Services (DQ for PBI datasets and AS) comes into play, allowing you to draw up a layered core dataset architecture. If we take the example of the AdventureWorks fact tables in the data warehouse (the single master data source), you can work out what the business processes are. Then, when you draw up an enterprise bus matrix, this is a snippet of what you will arrive at:
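
(A simplified sketch based on the AdventureWorks schema; the real matrix covers more dimensions and business processes.)

    Business process   | Calendar | Salesperson | Product
    -------------------+----------+-------------+--------
    Reseller sales     |    x     |      x      |    x
    Internet sales     |    x     |             |    x
    Finance            |    x     |             |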

You will note that dimensions such as Calendar (Date), Salesperson, Product and a few others span multiple business processes. These are ideal candidates for the first layer of the core datasets. The corresponding core workspace, with a dataset for each business process (with the exception of sales quota and inventory), looks like this:

Each dataset has at least one of the following dimensions: Calendar, Salesperson and Product. Hence, these are the ones I shall focus on in this post.

Since reseller sales has all three dimensions, I shall start with it and download the dataset from the service:

The reseller sales dataset

I will get to work by keeping Calendar, Salesperson and Product, and removing the Reseller Sales and Metrics tables:

Dataset stripped down to only the common dimensions

I will name this dataset Core Dimensions and publish it to the core workspace. This will be the dataset that all core datasets refer to for common dimensions.

Then, I will edit each core dataset, removing the dimensions that correspond to what’s in Core Dimensions. Next, I will connect to the data hub, select the Core Dimensions dataset and pick the required dimensions. Finally, I re-create the relationships, leaving the dataset looking like this (the new and improved reseller sales):

If you look at the lineage in the core workspace, this is what you will now see:

Note: For the steps to direct query a dataset, you can refer to my previous post on extending a core dataset at the edge.

Managed SSBI: Extending a Core Dataset at the Edge

My last post gave an introduction to managed self-service BI, and the concept of discipline at the core, flexibility at the edge.

Now, what happens when an analyst, for instance, has a set of sales target spreadsheets and wants to compare the figures with sales metrics so that salespeople’s performance can be measured? It certainly needs a new dataset. However, flexibility at the edge has to prevail in the right way. This post will look at how we can go about this while keeping to discipline at the core, flexibility at the edge.

Note: The analyst’s requirement is, at present, local to their group or department. It has not yet become an organizational requirement. That’s how most requirements start out: a requirement at the departmental level, which, once enough people start reaping the benefits within and outside the department, can get absorbed into the core.

To mash up data from a core dataset with the spreadsheet data, the analyst will typically have to either get a copy of the dataset or export the required data to spreadsheets. But this can cause a lot of problems, such as:

  • Core datasets are behind lock and key, and are meant only for reading
  • The dataset gets duplicated
  • The metrics and other analytics get diluted back into plain data
  • Multiple versions of the truth

This is where DirectQuery for Power BI datasets and Analysis Services comes in. This functionality (still in preview as of the date of this post) allows one to extend an existing dataset into their own. This means the analyst can now build a dataset with their sales targets, and then “pull in” what’s necessary from the core dataset. The content of the core dataset will not be a copy, but rather a reference, thereby reducing effort and preventing duplication.

Let us get into the details using the reseller sales example. The existing core reseller sales dataset consists of one fact table and three dimensions. A separate table for metrics serves as the home for the measures and KPIs:

Core reseller sales dataset
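
To give a feel for the pattern, measures homed on a Metrics table might be defined along these lines (the table and column names here are my assumptions, not necessarily those of the actual dataset):

    -- Base measure for the fact table, homed on the Metrics table
    Reseller Sales Amount = SUM ( 'Reseller Sales'[Sales Amount] )

    -- A margin measure defined against the same fact table
    Reseller Gross Margin =
    SUMX (
        'Reseller Sales',
        'Reseller Sales'[Sales Amount] - 'Reseller Sales'[Total Product Cost]
    )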

The analyst has reseller sales targets stored in a SharePoint folder across four annual files:

Step 1

The analyst needs to bring this into a new (edge) dataset. This video shows how that can be done:

Importing reseller sales targets from SharePoint

This is what the model will look like once the targets are imported:

Step 2

Bring in the core reseller sales.

Since the DQ for PBI datasets and AS feature is still in preview, it first needs to be enabled in Power BI Desktop.

Now, we direct query (and not Power Query) the core dataset from the data hub:

And follow these steps, as shown in the video:

  1. Select the dataset of choice
  2. Select the database (dataset) or tables that are required; in this example, everything
  3. And then you’ll have the core tables showing up within your local dataset
  4. Create the necessary relationships
  5. Create the necessary measures

Publish it, and now you’ve got an edge dataset extended off of a core dataset. Notice the measure % Reseller Performance, which was created using data from the core dataset (reseller sales) and the reseller targets.
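
As a rough sketch, that measure could look something like this in DAX (the table and column names are assumptions for illustration):

    % Reseller Performance =
    DIVIDE (
        SUM ( 'Reseller Sales'[Sales Amount] ),    -- pulled in from the core dataset
        SUM ( 'Reseller Targets'[Target Amount] )  -- from the local (edge) targets table
    )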

An introduction to a Power BI strategy designed with Discipline and Flexibility

One of the major components of the analytics solutions that I build for my customers is the business intelligence subsystem. Essentially, from an end user’s perspective, the BI subsystem is the entire solution, whereas the rest, the “big machine”, is under the hood, known only to a select few. However, a post on the big machine is for another time. In this post we shall focus on the BI subsystem.

Regardless of the size of the organization, it is essential to have a strategy for business intelligence and how it should be implemented. The strategy may not even need a big machine under the hood; in many cases its function can be handled by a tool like Power BI.

When putting together a business intelligence strategy using Power BI, Microsoft recommends three primary strategies that an organization can adopt. Out of these, the one I tend to go with is managed self-service BI, which brings forth the concept of discipline at the core, flexibility at the edge. This concept is the dominant strategy used for BI at Microsoft itself, and is explained very nicely in this article. It’s my personal favorite, because I find it an effective means of onboarding customers: once the core platform is built with the required standards (discipline), I can then help them adopt the solution from the edge, giving them the best of both worlds.

Source: Microsoft

The core consists of three elements:

  1. The single master data source (which in most cases is the “big machine”, where data from all relevant systems is brought together, processed and made available for analysis),
  2. Standard corporate BI (the measures and dimensions set up in a standard manner, applicable across the organization) and
  3. Consistent taxonomies (standard definitions and hierarchies established for organizational KPIs, along with the calculations that implement those definitions)

The latter two can be realized using Power BI datasets (sometimes in collaboration with a data warehouse).
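
To make that concrete, here is a hypothetical pair of measures of the kind a core dataset would standardize; the table and column names are illustrative, not from an actual model:

    -- One agreed-upon definition of the KPI, defined once for the whole organization
    Total Sales = SUM ( Sales[Sales Amount] )

    -- A standard year-over-year calculation built on the shared Calendar dimension
    Sales YoY % =
    VAR SalesLastYear =
        CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( 'Calendar'[Date] ) )
    RETURN
        DIVIDE ( [Total Sales] - SalesLastYear, SalesLastYear )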

Core

So, when implementing the core, standard corporate BI with consistent taxonomies can be built as a dataset (or a bunch of datasets) per business function: a sales dataset, a finance dataset, a marketing dataset and so forth.

The example below shows a finance dataset and two sales datasets at the core:

Core models in core workspace

Edge

The analytics are now made available for end users to build their reports and dashboards from at the edge, i.e., at their departmental or functional levels. Each user or group of users will create their own workspaces, where they self-serve the metrics and data from the core models and author content for various purposes. For instance, the sales team will have their package of monthly reports, a package of dashboards to present at the quarterly sales review, and so forth.

This way, the need for users and analysts to collect, collate and build their own datasets is greatly reduced, since the metrics and KPIs they need have already been created to a high standard. Note how the reports in the example below are sourced from the core models.

Sales analysis in an edge workspace

In the next couple of posts, I shall be getting a little into the internals of the core.

What’s the difference between Live Connection and DirectQuery in Power BI, and where do they fit?

Hot on the heels of the question I was asked a few days ago, comes another closely related one: “What’s the difference between Connect live to a data source and DirectQuery a data source in Power BI?”

We had already established that there are two ways to interact with data using Power BI: loading data into Power BI, and accessing the data source directly.

Connecting live and DirectQuery both fall into the latter method, but there is a difference.

In DirectQuery mode, you access the data source, such as a relational database or data mart, directly, but then create calculated columns or measures on top of it in Power BI, generating a data model layer, something similar to a database view, if you will. The data still lives at the data source, but is pulled through the data model in Power BI onto the visuals. End users and report creators will see and interact with the data model in Power BI.
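
For example, a measure like the following (hypothetical table and column names) would live in the Power BI model layer, while the rows themselves stay in the relational source and are queried only when a visual evaluates the measure:

    -- Defined in Power BI on top of a DirectQuery table
    Gross Margin % =
    DIVIDE (
        SUM ( FactSales[SalesAmount] ) - SUM ( FactSales[TotalProductCost] ),
        SUM ( FactSales[SalesAmount] )
    )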

In the case of connecting live, the data model itself lives at the source; you interact with it directly, and no data model layer is created in Power BI. All measures, calculated columns and KPIs are provided by the data model at the source, along with the data. End users and report authors will see and interact with this data model through Power BI.

If you compare these two methods on a conceptual level, DirectQuery mode suits self-service cases where you have data marts or a data warehouse on a relational database, and business users build their own data models off it for their business needs. The data marts or data warehouse integrate data from various systems and provide base measures with related dimensions. Business users may create custom measures and calculated columns on top of this to suit their reporting and analytic requirements, and then explore the data and build visual reports. Think of this as the data discovery phase of the self-service exercise.

Live connections would probably be used in scenarios where the analytic needs are better understood, and/or the type of analytics described above has matured and become mainstream in the organization. Here, data models are built off the data warehouse using Analysis Services (multidimensional or tabular), incorporating the measures, calculations and KPIs that were earlier part of the self-service data discovery exercise. Business users now have established reports and dashboards that showcase organizational performance, powered by established data models. Think of this as the phase where things have evolved into corporate BI that delivers real value.

[SUBJECT TO CHANGE] Out of the whole set of supported data sources, Power BI currently supports the following in DirectQuery mode:

  • SQL Server
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • SAP HANA
  • Oracle Database
  • Teradata Database
  • Amazon Redshift (Preview)
  • Impala (Preview)
  • Snowflake (Preview)

and the following using a Live connection:

  • Analysis Services Tabular
  • Analysis Services Multidimensional
  • Azure Analysis Services (Preview)

What is Power BI?

Power BI is a self-service business intelligence offering from Microsoft. It is primarily aimed at business users, allowing them to access corporate information (and even information outside of the organization) with little or no involvement from IT. Users can build their own business intelligence solutions, share and collaborate with team members, and finally publish this information.

From the mid 2000s, with the launch of SQL Server 2005, Microsoft has been striving to bring business intelligence to the masses. Business intelligence at that time was mostly used by large enterprises and was quite costly to put in place. So, when SQL Server 2005 came into being with a new and improved Analysis Services sporting the Unified Dimensional Model, all of a sudden even small organizations could implement BI.

Fast forward to the present: we have self-service BI and analytics running the show, and Microsoft is in the middle of the game once again with Power BI. If you would like to learn more about Power BI and what it offers, I have a curation on Curah! dedicated especially to this. The content of the curation will change as Power BI evolves, so go take a look: https://curah.microsoft.com/348325/what-is-power-bi.