Distinct Count of Dimension based on a Filtered Measure

Okay, so we have this brand new awesome project going on; first time on a Tabular model, and that too with a large number of measures connected to Power BI and all that jazz. The model would contain all the standard measures the organization used, while we built Power BI reports on top of these measures, with some visuals needing a bit of improvisation beyond the standard measures, i.e. requirements specific to the reports.

Enthusiastic as we were, one of the hardest nuts to crack, though it seemed so simple during requirements gathering, was to perform a distinct count of a dimension based on a filtered measure on a couple of the reports. To sketch it up with some context: you have products, several more dimensions, and a whole lot of measures, including one called Fulfillment (a calculation based on a couple of measures from two separate tables). The requirement was to get a count of all those products (filtered, of course, by the other slicers on the Power BI report) where Fulfillment was less than 100%, i.e. the number of products that had not reached their targets.

Simple as the requirement seemed, the hardest part in getting it done was our limited knowledge of DAX, specifically knowing which function to use. We first tried building it into the data model itself, but our choice of DAX formulae, and the number of records we had (50 million+), soon saw us running out of memory in seconds on a 28GB box; not too good, given the rest of the model didn't even utilize more than half the memory.

Using calculated columns was not a possibility either, since the columns that made up the measures behind Fulfillment were from a different table, and the calculation does not aggregate up.
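To illustrate why (with hypothetical names, not the actual model), a calculated-column attempt would have looked something like this; it is evaluated once per row at refresh time, so the resulting flag is static and cannot react to the slicers on the report:

// Hypothetical calculated column on the Product table; illustrative only.
// Referencing the [Fulfillment] measure implicitly wraps it in CALCULATE,
// evaluating it per product row at refresh/processing time. The flag is
// frozen at that point, so it cannot respond to report-level slicers.
ShortOfTarget = IF ( [Fulfillment] < 1, 1, 0 )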

Since it was a report requirement, we tried doing it visually, by pulling the Product and Fulfillment fields onto a table visual, and then filtering on Fulfillment, like so:

And then performed a distinct count on Product, and voila! it performed a count, but alas! we realized that the filter gets disabled/removed when that happens, which means the count always ends up being a count of all products.

A frenzy of DAX formulae at the Power BI level did not help either, until we tried out the VALUES() function, courtesy of direction from Gerhard Brueckl (b|t); trust the MVP community to have your back.

The VALUES() function returns a single-column table of the unique values of a specified column. Hence, using the FILTER() function we were able to extract the unique products where Fulfillment was less than 100%, and then use the COUNTROWS() function to simply count the number of records returned.

Product Short of Target =
VAR ProductCount =
    // Unique product codes where the Fulfillment measure is below 100%
    COUNTROWS ( FILTER ( VALUES ( 'Product'[ProductCode] ), [Fulfillment] < 1 ) )
RETURN
    // Show 0 instead of BLANK when every product has met its target
    IF ( ISBLANK ( ProductCount ), 0, ProductCount )
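For comparison, the same logic can also be expressed with CALCULATE() and DISTINCTCOUNT(); this is just a sketch of an equivalent formulation under the same assumptions, not what we shipped:

Product Short of Target (alt) =
// FILTER over VALUES acts as a filter argument to CALCULATE, restricting
// the distinct count to products whose Fulfillment is below 100%.
// Adding 0 converts a BLANK result to 0, like the IF/ISBLANK above.
CALCULATE (
    DISTINCTCOUNT ( 'Product'[ProductCode] ),
    FILTER ( VALUES ( 'Product'[ProductCode] ), [Fulfillment] < 1 )
) + 0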

It is very easy to overlook VALUES() as a function that would give you unique values. This is why it is important to have an overall understanding of what each DAX function can be used for, or at least get your hands on a DAX cheat sheet of sorts pinned against your wall. Glad this worked out though.


What’s the difference between Live Connection and DirectQuery in Power BI, and where do they fit?

Hot on the heels of the question I was asked a few days ago, comes another closely related one: “What’s the difference between Connect live to a data source and DirectQuery a data source in Power BI?”

We had already established that there are two methods in which we could interact with data using Power BI: loading data into Power BI and accessing the data source directly.

Connecting live and DirectQuery both fall into the latter method, but there is a difference.

In DirectQuery mode, you access the data source, such as a relational database or data mart, but then you create calculated columns or measures on top of it in Power BI, generating a data model layer, something similar to a database view, if you will. The data still lives at the data source, but is pulled through the data model on Power BI onto the visuals. End users and report creators see and interact with the data model on Power BI.
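As a sketch of what such a model-layer measure might look like, a report author could define something like the Fulfillment measure mentioned earlier on top of the DirectQuery tables (the table and column names here are hypothetical, purely for illustration):

// Hypothetical measure defined in Power BI over DirectQuery tables;
// Shipments and Targets are illustrative names, not from a real model.
// DIVIDE() returns BLANK instead of an error when the target is zero.
Fulfillment =
DIVIDE ( SUM ( Shipments[ShippedQty] ), SUM ( Targets[TargetQty] ) )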

In the case of Connect live, the data model itself is at the source, you interact with it directly and no data model layer is created on Power BI. All measures, calculated columns and KPIs are provided by the data model at the source, along with the data. End users and report authors will see and interact with this data model through Power BI.

If you compare these two methods on a conceptual level: DirectQuery mode is used in self-service scenarios where you have data marts or a data warehouse on a relational database, and business users build their own data models off it for their business needs. The data marts or data warehouse integrate data from various systems, and provide base measures with related dimensions. Business users may create custom measures and calculated columns on top of this to suit their reporting and analytic requirements, and then explore data and build visual reports. Think of this as the data discovery phase of the self-service exercise.

Live connections would probably be used in scenarios where the analytic needs are better understood, and/or the type of analytics described above has matured and become mainstream in the organization. Here, data models are built off the data warehouse using Analysis Services (Multidimensional or Tabular), with the measures, calculations and KPIs that were earlier part of the self-service (and data discovery) exercise incorporated into them. Business users now have established reports and dashboards that showcase organizational performance, powered by established data models. Think of this as the phase where things have evolved into corporate BI that gives real value.

[SUBJECT TO CHANGE] Out of the whole bunch of supported data sources, Power BI currently supports the following in DirectQuery mode:

  • SQL Server
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • SAP HANA
  • Oracle Database
  • Teradata Database
  • Amazon Redshift (Preview)
  • Impala (Preview)
  • Snowflake (Preview)

and the following using a Live connection:

  • Analysis Services Tabular
  • Analysis Services Multidimensional
  • Azure Analysis Services (Preview)

Consuming Live Data Sources vs On-Premises Data via Gateways

So I got this question today about something that seemed a little confusing to the questioner: two features of Power BI Pro.

Power BI Pro - Two Features

The question: "Why does it say 'Consume live data sources with full interactivity' as one feature, while the other says 'Access on-premises data using the Data Connectivity Gateways', when it is obvious that if you need to connect to an on-premises data source to consume live data it has to be through a gateway?"

Okay, this is how I would explain this:

'Consume live data sources with full interactivity' means that you can directly access live data sources, such as relational databases or data models built atop Analysis Services, without having to load the data into Power BI first. Power BI has two ways of providing data to the user to build reports: "Import" and "DirectQuery/Connect live". The former lets you connect to the data source, pull the required data into Power BI, build a model off that, and then let the consumer build reports off this data model; here the user hits Power BI for the data. The latter lets you connect directly to the data source via Power BI and build reports off the data structure that already exists at the source, either by creating a model layer or by using the model at the source. It is this latter method that this feature describes.

The other feature, 'Access on-premises data using the Data Connectivity Gateways', just means that in order for you to get data from an on-premises source, you need to use a Data Connectivity Gateway. The gateway is but a security mechanism that allows Power BI (which is a cloud service) to reach into a client's secure on-premises environment for data, regardless of whether you use the "Import" or the "DirectQuery/Connect live" mode.

Of course, if you were accessing a cloud-based data source such as an Azure SQL Database, an Azure SQL Data Warehouse or Excel files in a SharePoint Online folder, you would not need a gateway; you can access them using "Import" or consume live data with full interactivity using the "DirectQuery/Connect live" mode.

Theming in Power BI

Finally, we have theming in Power BI. A much requested and required feature, especially for organizations where using corporate color themes in everything they do is a way of life. Even when showcasing the capabilities of Power BI to potential clients, the questions sometimes boil down to simple things like customization of the color theme. This question can now be answered with a confident 'yes', rather than the thoughtful 'yes' that we blurt out while mentally going through the steps of applying colors widget by widget.

The March 2017 update of Power BI Desktop comes with a preview of Themes. Right now it is in its simplest form: you manually create a JSON file with a few attributes that set basic color themes for your reports. So all you have to do is create a file that looks like this:

{
    "name": "rainbow",
    "dataColors": [ "#FF0000", "#FF7F00", "#FFFF00", "#00FF00", "#0000FF", "#4B0082", "#9400D3" ],
    "background": "#FFFFFF",
    "foreground": "#9400D3",
    "tableAccent": "#FFFF00"
}

And then import it in Power BI Desktop, via Switch Theme > Import Theme on the Home ribbon:

Theme Import

And lo and behold, my rainbow theme is applied:

To revert, you just re-select the Default Theme.

Yes, it is old-school, but this is a preview, and only a few attributes are designed to be affected by the theme settings. However, it works, it gives us an idea of what's coming, and it also lets us pour in our suggestions.

What I really like about this is that you can have any number of colors listed out (usually around 8), with Power BI adding the default white and black. What I like even more is the list of accent colors generated off the main colors:

Theme Colors

All in all, these are exciting times. Things on the aesthetic customization front can only get better. To read more, check out the Power BI blog.

Azure Analysis Services

We've all seen how the world of data has been changing in the recent past. Many organizations have massive amounts of data, and many of them are running out of space to put it in. So naturally they turn to the cloud to store and process all of this data. The processed data can then be used for gaining insights in various ways. Apart from the popular forecasting and machine learning that is becoming a fad these days, there is a lot of traditional "business" analytics that businesses still want to see. Business users want to dive into their data, perform self-service analytics and do data discovery.

However, when you looked at the Microsoft cloud space, along with its data and analytics capabilities, you had the tools and services to store and process large amounts of data, but what you did not have was something you could create an analytical model out of, so that business users could easily consume it as part of their business intelligence routine. Of course you had Power BI, but that was more of a next step; plus Power BI is lightweight and cannot handle more than 10GB.

The closest we had, on the cloud, was to build an Azure VM with SQL Server installed on it, and build the analytic model using Analysis Services. But then there was licensing, and the maintenance overhead among other things, which did not make it a feasible option in a lot of cases.

And then Microsoft announced Azure Analysis Services a few months ago, a fully managed Platform-as-a-Service for analytic modeling. And suddenly there was hope. You no longer needed to write complex SQL against a SQL Data Warehouse, nor did you have to import processed data in the hundreds of thousands of rows into Power BI to create your own analytic model.

Azure Analysis Services is currently in its preview phase, and hence Microsoft has given it only Tabular capability for the time being, with Multidimensional hopefully coming some time later. In my opinion, that is just fine. One more thing though: if you remember, the on-prem version of Analysis Services uses Windows Authentication only; in other words, you needed to be on an Active Directory domain. So, on Azure, in order to access Azure Analysis Services, you need to be on Azure AD.

Let’s take a look at quickly setting up your first Azure Analysis Services database.

Creating a service instance is the usual process: Type in “Analysis Service”, and you would see it showing up in the Marketplace list:

Analysis Services on the Marketplace

Once you select Analysis Services, you would see the Analysis Services information screen:

Analysis Services information blade

And then all you need to do is supply the configurations/settings and you are done:

Settings/Configurations blade

When in Preview

At the time of writing, Analysis Services (preview) is only available in the South Central US and West Europe regions, so make sure that your resource group is created in one of those regions. The preview currently offers three standard pricing tiers and one developer pricing tier (at an estimated cost of ~50 USD per month). The developer tier, with 20 query processing units or QPUs (the unit of measure in Azure Analysis Services) and 3GB of memory cache, is ideal to get started. More info on QPUs and pricing here.

Identity Not Found error

Another problem that I ran across was the "Identity not found" error that comes up a few moments after I click the "Create" button and Azure starts provisioning the service. It claimed that the user specified under "Administrator" could not be found in Azure Active Directory, even though I had created such a user in AAD. The reason for this, and how to resolve it, is documented nicely here by @sqlchick. If you need further details on how to get your Office 365 tenant linked with your Azure subscription while integrating your Azure AD, you should definitely look at it.

Once provisioned, you can pause Analysis Services when it is not being used, so that you save dollars; the ability to switch between pricing tiers is expected in the future.

Analysis Services running