Something happened on the way to Analytics

I just spent the week onsite with a client’s business intelligence team. Apart from the amazing cultural experiences after each day’s work, working alongside a corporate technology team let me see things from their perspective in a way I had not before. The client was launching their new ERP system and starting to build an analytics system on Azure that would serve the business users’ needs with information from this new ERP, and from other systems.

Working for a BI technology solutions partner over the long run morphs your way of looking at things: technology comes first, trying to fit client needs to the types of solutions we do best; assuming that all tech people will be comfortable with the next big thing in technology; assuming that business users already know the first thing about BI; and believing that your architecture is always right.

Technology first

Despite knowing and understanding a client’s business needs, and architecting a solution for them, the zeal to implement that solution using the latest and greatest technologies is overwhelming, and is even “believed” to be the need of the hour. Very often the solution is already pre-designed in my head around new technologies.

Let’s say that an organization has indeed gone ahead and invested in a new platform such as Microsoft Azure as part of their initiative to put everything on the cloud. That should not be an automatic signal that they are ready to adopt everything Azure has to offer, even if it all comes as part of the package. They would rather work with the technologies closest to what they are already familiar with. So when I jumped in all guns blazing, yelling “Data Lake, Data Lake!” and “Power BI, Power BI!”, I should rightly have expected them to put up their hands and say “hold up!” It was a good thing that I didn’t even whisper “Databricks”.

Being a consultant and evangelizing technology is good. But when consulting with a team that has been working on traditional technologies for quite some time, and has had success doing so, it is vital to talk them through the paradigm shift in the way we do analytics and to guide them towards the new set of technologies, because “they are not a test-bed for Microsoft technologies”, and can easily be misconstrued as such if I seem to be forcing different technologies down their throats…

Names are misleading

When Azure SQL Data Warehouse was chosen to implement a multi-dimensional data warehouse, it may have seemed like the ideal choice. Why? Because it was plain to see from the keywords: “SQL” and “Warehouse”. However, Azure SQL Data Warehouse is ideal only when your data volumes are quite high, not when they are only a few hundred gigabytes. Armed with a few more reasons why not (a good reference for choosing Azure SQL Data Warehouse), I confronted them. The rebuke at the time was that they were getting good enough performance, and that cost wasn’t a problem. Until, of course, a few months later, when complex queries started hitting the system and, even though they could afford the cost, paying that amount no longer seemed worth the value it returned.

This is not the point where, as a consultant, I should go in saying “I told you so”. Rather, this is where I acknowledge the error, suggest alternatives, look into the alternatives they themselves have come up with, and work with them to plan for change (such as moving to Azure SQL Database).

BI what?

After all of this, it finally came down to which business intelligence tool the business users should be using. I “knew” the obvious choice “was” Power BI. However, the team was not convinced when I told them it was easy to build reports and dashboards with Power BI. Sure, and so was building reports and dashboards with the new Cognos, since they all knew Cognos so well. They even brought up the tool that came with their ERP, which did not even support grouping in reports, yet had been demonstrated by the ERP consultant as “easy to build” with. The lesson of the day: for anyone familiar with a particular tool, building reports in that tool will seem easy. Hence, it was decided to take a fact-based approach rather than an opinion-based one.

Then, just by chance, came the interesting bit of the fact-based debate, when I asked the question: “What type of reports will we be running for the users?” The answers all pointed towards operational reports and ad-hoc reports. The ad-hoc reports looked to be analytical in nature, but were provided in the form of a flat Excel file or a tabular report. The operational reports were all pages upon pages of tabular reports. Further digging revealed that with the new ERP launch, the ask was for more and more operational reports. And whose job was it to churn out these reports? The BI team’s, of course! A cringe-worthy moment.

Months after they had started down the path of business intelligence, and that too on Azure with its many options for providing analytics, the team was writing heavy queries for operational reports, because the ERP was only so good at providing the reports that were needed, and “naturally”, as a lot of business users expect, the job of supplying that demand had fallen on the BI folk. Heavy, complex queries were hitting the star schema on the SQL Data Warehouse. The results were then dumped into flat Excel files, and the Excel-comfortable users consumed them, and sometimes, interestingly… wait for it… <insert theme to Jaws> performed their own analysis on top of them, because what they wanted was not given to them the way they wanted it.

Back to the drawing board

It was evident that, for all the BI initiatives, what the users currently needed were operational reports. They also wanted ad-hoc reports presented in tabular form. It was equally evident that the architecture that had been adopted was not the right one.

Current Architecture

It was soon decided that the team would fall back, perform a reality check, and serve the users’ primary objective: operational reports. The architecture would be altered, but with less effort, to service this need, and with Cognos, since it could produce printable reports and the BI team was already familiar with it. Everyone understood and agreed that BI was not the need of the hour. However, the need for BI would come soon. Hence, it was decided that most of the ad-hoc reports would be built in an analytical fashion in Power BI, to tease the users with what was possible beyond the Excel-based method everyone was used to. Excel-based analysis would still be provided, but through Power BI datasets.

First Iteration of the New Way Forward

Crude, you may say, but this was most agreeable. Users would not be overwhelmed; the BI team would not be overworked; technology would not be given priority. The need of the hour would be serviced, and the need to come would slowly be infused into the users’ DNA. And since this is Azure, it allows for an initial architecture with less spend and fewer resources, letting the BI folk iteratively build out an analytics platform with an ever-improving architecture along the road of their BI journey.

The effect of Summarize by Default on a Tabular Model

Another of my recent experiences was a strange-looking query generated off my Power BI reports. These reports were running off a live connection to a tabular model on Analysis Services. And of course these queries ate up my CPU, leaving it at 100% for quite a long while, with the reports not responding. One thing about the reports was that they had quite a number of parameters (which, of course, were a client requirement we couldn’t do much about).

Here’s just a tip-of-the-iceberg sample of the huge query that gets generated:

Big Query

The data model was quite a large one, with 30+ measures and 10+ dimensions. One thing we had missed was that all the numeric fields that were not used as measures, such as the Year, MonthNumber and WeekNumber fields of the Calendar dimension, and the EmployeeAge field of the Employee dimension, had their Summarize By property set to Default. But this was not indicated in any way in Power BI, especially in the Fields pane, where fields that are summarized usually show up with a Σ symbol when the data model is a Power BI data model.
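
The usual mitigation, sketched here against the Calendar and Employee columns described above (the measure names are purely illustrative), is to set the Summarize By property of such columns to None, and to expose any aggregation that is genuinely needed as an explicit measure instead:

-- Illustrative explicit measures; the Year and EmployeeAge columns themselves
-- would have their Summarize By property set to None, so that Power BI never
-- has to aggregate them implicitly in its generated queries
Latest Year := MAX ( 'Calendar'[Year] )
Average Employee Age := AVERAGE ( 'Employee'[EmployeeAge] )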

The phenomenon is explained in detail in Pinal’s SQLAuthority. Make sure you take a look. Happy New Year!

Distinct Count of Dimension based on a Filtered Measure

Okay, so we have this brand new awesome project going on; first time on a Tabular model, and that too with a large number of measures, connected to Power BI, and all that jazz. The model contained all the standard measures the organization used, while we built some Power BI reports using these measures, with some visuals needing improvisation on top of the standard measures, i.e. requirements specific to the reports.

Enthusiastic as we were, one of the hardest nuts to crack, though it seemed so simple during requirements gathering, was performing a distinct count of a dimension based on a filtered measure on a couple of the reports. To sketch it up with some context: you have products, several more dimensions, and a whole lot of measures, including one called Fulfillment (which was a calculation based on a couple of measures from two separate tables). The requirement was to get a count of all those products (filtered, of course, by the other slicers on the Power BI report) wherever Fulfillment was less than 100%, i.e. the number of products that had not reached their targets.

Simple as the requirement seemed, the hardest part of getting it done was our limited knowledge of DAX, specifically knowing which function to use. We first tried building it into the data model itself, but our choice of DAX formulae and the number of records we had (50 million+) soon saw us running out of memory within seconds on a 28GB box; not too good, given that the rest of the model didn’t even use more than half the memory.

Using calculated columns was not a possibility, since the columns behind the measures that made up Fulfillment were from a different table, and the calculation does not aggregate up.

Since it was a report requirement, we tried doing it visually, by pulling the Product and Fulfillment fields onto a table visual and then filtering Fulfillment, like so:

We then performed a distinct count on Product, and voilà! It performed a count. But alas, we realized that the filter gets disabled/removed when that happens, which means the count always ends up being a count of all products.

A frenzy of DAX formulae at the Power BI level did not help either, until we tried out the VALUES() function, courtesy of direction from Gerhard Brueckl (b|t); trust the MVP community to have your back.

The VALUES() function returns a single-column table of the unique values from a specified column. Hence, using the FILTER() function, we were able to extract the unique products where Fulfillment was less than 100%, and then use the COUNTROWS() function to simply count the number of records returned.

Product Short of Target =
VAR ProductCount =
    -- Unique product codes, within the current filter context, where the Fulfillment measure is below 100%
    COUNTROWS ( FILTER ( VALUES ( 'Product'[ProductCode] ), [Fulfillment] < 1 ) )
RETURN
    -- Show 0 instead of a blank when no product falls short of its target
    IF ( ISBLANK ( ProductCount ), 0, ProductCount )
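
For comparison, a roughly equivalent sketch using CALCULATE and DISTINCTCOUNT, assuming the same 'Product'[ProductCode] column and [Fulfillment] measure:

-- Alternative formulation; note that this returns BLANK instead of 0 when no product falls short
Product Short of Target (Alt) =
CALCULATE (
    DISTINCTCOUNT ( 'Product'[ProductCode] ),
    FILTER ( VALUES ( 'Product'[ProductCode] ), [Fulfillment] < 1 )
)

Either way, VALUES() is what limits the evaluation to the distinct product codes visible in the current filter context.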

It is very easy to overlook VALUES as a function that gives you unique values. That is why it is important to have an overall understanding of what each DAX function can be used for, or at least to get your hands on a DAX cheat sheet of sorts and pin it against your wall. Glad this worked out, though.

Azure Analysis Services

We’ve all seen how the world of data has been changing in the recent past. Many organizations have massive amounts of data, and many of them are running out of space to put it in. So, naturally, they turn to the cloud to store and process all of this data. The processed data can then be used for gaining insights in various ways. Apart from the popular forecasting and machine learning that is becoming a fad these days, there is a lot of traditional “business” analytics that businesses still want to see. Business users want to dive into their data, perform self-service analytics, and do data discovery.

However, when you looked at the Microsoft cloud space, along with its data and analytics capabilities, you had the tools and services to store and process large amounts of data, but what you did not have was something you could create an analytical model out of, which business users could then easily consume as part of their business intelligence routine. Of course you had Power BI, but that was more of a next step; plus, Power BI is lightweight and cannot handle more than 10GB.

The closest we had on the cloud was to build an Azure VM with SQL Server installed on it, and build the analytical model using Analysis Services. But then there was licensing, and the maintenance overhead, among other things, which did not make it a feasible option in a lot of cases.

Then, a few months ago, Microsoft announced Azure Analysis Services, a fully managed Platform-as-a-Service offering for analytical modeling. And suddenly there was hope. You no longer needed to write complex SQL against a SQL Data Warehouse, nor did you have to import hundreds of thousands of rows of processed data into Power BI to create your own analytical model.

Azure Analysis Services is currently in its preview phase, hence Microsoft has given it only Tabular capability for the time being, with Multidimensional hopefully coming some time later. In my opinion, that is just fine. One more thing, though: if you remember, the on-premises version of Analysis Services uses Windows Authentication only; in other words, you needed to be on an Active Directory domain. Likewise, on Azure, in order to access Azure Analysis Services you need to be on Azure AD.

Let’s take a look at quickly setting up your first Azure Analysis Services database.

Creating a service instance follows the usual process: type in “Analysis Service”, and you will see it show up in the Marketplace list:

Analysis Services on the Marketplace

Once you select Analysis Services, you will see the Analysis Services information screen:

Analysis Services information blade

And then all you need to do is supply the configurations/settings, and you are done:

Settings/Configurations blade

When in Preview

At the time of writing, Analysis Services (preview) is only available in the South Central US and West Europe regions, so make sure that your resource group is created in one of those regions. The preview currently offers three standard pricing tiers and one developer pricing tier (at an estimated cost of ~50 USD per month). The developer tier, with 20 query processing units or QPUs (the unit of measure in Azure Analysis Services) and 3GB of memory cache, is ideal to get started with. More info on QPUs and pricing here.

Identity Not Found error

Another problem I ran across was the “Identity not found” error that came up a few moments after I clicked the “Create” button and Azure started provisioning my service. It claimed that the user specified under “Administrator” could not be found in Azure Active Directory, even though I had created such a user in AAD. The reason for this, and how to resolve it, is documented nicely here by @sqlchick. If you need further details on how to get your Office 365 tenant linked with your Azure subscription while integrating your Azure AD, you should definitely look at this.

Once provisioned, you can pause Analysis Services when it is not being used so that you can save dollars, while the ability to switch between pricing tiers is expected in the future.

Analysis Services running

Proactive Caching: Automatic Processing would not start

I had proposed using proactive caching for a near real-time cube, and the idea was that when the ETL was done every five minutes, proactive caching should automatically kick in to process the cube.

It seemed simple. Configuration was simple. And it worked as expected in the development and QA environments. But on UAT, proactive caching would simply not start. Everything was configured just as it was on dev and QA. Permissions were also perfect, but for some reason proactive caching would simply not kick in to automatically process the cube once the underlying table was updated. The ETL runs every 5 minutes and takes only a minute to update the underlying table, so I knew exactly when the silence interval should have kicked in and told Analysis Services to start processing – yet nothing happened.

Continue reading Proactive Caching: Automatic Processing would not start