Keeping in touch with Technology

This is not going to be a post about technology, nor about a technical solution to a problem, but about something equally important that I feel has helped my career quite a lot. The inspiration comes from a question an attendee asked me at yesterday’s SLDC meetup: how do I keep myself up to date with technology? My expertise lies in business intelligence and databases, especially on the Microsoft side of things, as did this particular attendee’s. However, his job had taken him on various stints across other technologies over the years, and he was now getting back into Microsoft technologies, but was concerned about the amount of catching up he had to do. So, back to the question, “how do you keep yourself updated?”, one that I know lingers in the minds of many. My answer to him was: “Nights”. But I think the right answer should have been: “Passion”.


In technology, I feel, one should have passion. Not the type of “passion” you see on every other CV describing its owner, but a real, undying love for the technology you work with. “But I do have passion”, some retort when I ask interviewees for demonstrable passion, “I came up with a great idea for a dashboard for my client, and built this really awesome dashboard that won my organization a lot of praise”. That’s not passion; that’s just doing your job. That’s what you are supposed to do. “But I really, really love working with SQL Server”, they’d say. Then show me something that showcases it, I ask; something beyond just doing your job. Mostly I don’t get an answer. Well, if you have passion, you need to go beyond.

Let’s bring in an analogy: marriage. You could marry either because you decide you need to settle down, have a husband or wife to take care of you, and start a family; or because you are really, madly, deeply in love with someone and want to spend your life with that person doing every crazy, stupid thing you ever wanted to. Passion for technology, for me, is like the latter. You need to go beyond the great job you do at work and explore the technology more: experiment → try new things → fail → fail again → read → learn → succeed → experiment more → teach → write → evangelize. But all this in your own time: a couple of hours at night, an evening on a weekend, on game night, on a Friday night when you should be drinking the wee hours away, one day of your vacation… It takes a little bit of sacrifice. But it does pay off.

Once you go through a process such as this, you will have a revelation: people start calling you an expert, you see opportunities in your career that you never saw before, and you get opportunities that you never dreamed of before. You will see things paying off. It has for me, several times.

All the people I know who are passionate about technology do the things they do because it comes from within: either they are hardwired that way, or they have come to realize how important it is and have cultivated the habits. If you were wondering what these people do, well, here are a few things…

Read. That’s the starting point. Read documentation, blogs and white papers to understand more. Then you can start experimenting: trying out tutorials, or scenarios of your own. And don’t just stop there; put your code into a repository such as GitHub, so you can easily pull it out to showcase at an interview. Write about your experiments: take notes and start a blog to share your learning with the world (and also to showcase at interviews). Talk about your research and experiments: organize a tech talk for your team or your organization, or be a little bold and speak at a local community event. You will eventually become known in the technology community, and the technology will automatically become associated with you. The more you write and speak about something, the more you understand it, and the more others associate you with it.

So, go ahead and work on it. Do it now. It’s hard at first. Persevere and it will pay off. I promise.


Unexpected Exception in Power BI connected to SQL Server 2016 Analysis Services

One interesting issue that reared its ugly head in my recent project was an unexpected exception that popped up on all visuals of a Power BI report. How it happened was quite strange: one moment all the reports were working fine; the next, when a report visual was modified and deployed to Power BI, all the visuals turned into grey boxes with a cross and a “See details” link. The details weren’t at all helpful. Just, “An unexpected exception occurred.”:

Clicking the “Copy Details” link, pasting into a text editor and quickly scanning through showed this:

And to make matters worse, all subsequent report loads, including those of other reports, would hang until Analysis Services was restarted. The cycle would start again when the offending report was opened.

A little digging around pointed at a bug in SQL Server 2016 SP1 Analysis Services. I’m not sure if the problem occurs on versions prior to SP1; however, there is a fix: SP1 CU5 (KB article: KB4037354). Once installed, the problem disappears. I also tested the issue on SQL Server 2017 Analysis Services, where it does not occur.

Speeding up ETL data cleansing

While I was at the PASS Summit this year, I was sitting in Paul Turley’s (b|t) session on creating integrated BI solutions. Paul was talking about how ETL takes up most of a BI solution’s development lifetime, regardless of how well you plan for it. That’s when someone from the audience asked: “…methodology or tool in the MS stack to shorten the time taken for ETL?”. Paul’s answer was: “There’s no silver bullet, but I’m going to say it’s Power Query”. This got me thinking, since for a few weeks my team had been struggling to get data cleansed on the project they were working on.

Despite having a data profiling document, which described how source data mapped to the data model, along with the rules that needed to be applied to the data, such as nullability, constraints, valid domain values and so forth, bad data continued to come in. Apart from bad data, the feeds contained scenarios that had not been identified or thought of before. Though we usually expect some amount of bad data or anomalies, which can be reported on, this was pretty bad. We needed a data set that would not fail the ETL, so that it could at least be tested against all the business logic we had identified as requirements.

The issue was that the team had to build the Integration Services packages first, then run them to finally find out the various data problems that kept spilling out. So it took a while for a developer to identify a problem, realize it was a data issue, report it to the client, have it fixed by the client (and sometimes fix the package as well), and repeat. There were times when I had to jump in to help figure out what the data problems were. But because getting onto a tool (especially a non-Microsoft one), connecting to the source, and then writing SQL queries to retrieve and check the data was quite a slow process, I preferred using Power Query.

Power BI is a self-service tool that can connect to many different types of data source. The best thing about it is the ease of pulling in data: building joins, filtering, sorting and performing all the data manipulations you want is made super easy. And you can save it all in just one file and use it the next time you need it. So I got it (or at least had formed that impression in my head), and related to it, when Paul named Power Query as the tool to speed up the lifetime of ETL development.

Explaining how you could go about it is another post for the future, but I am presenting it at the Sri Lankan Data Community Meetup on Nov 29th in Colombo, if you are interested 😉

Distinct Count of Dimension based on a Filtered Measure

Okay, so we have this brand new awesome project going on: our first time on the Tabular model, and that too with a large number of measures, connected to Power BI and all that jazz. The model would contain all the standard measures that the organization used, while we built Power BI reports using these measures, with some visuals needing improvisation on top of the standard measures, i.e. requirements specific to the reports.

Enthusiastic as we were, one of the hardest nuts to crack, though it seemed so simple during requirements gathering, was performing a distinct count of a dimension based on a filtered measure on a couple of the reports. To sketch it out with some context: you have products, several more dimensions, and a whole lot of measures, including one called Fulfillment (a calculation based on a couple of measures from two separate tables). The requirement was to get a count of all products (filtered, of course, by the other slicers on the Power BI report) where Fulfillment was less than 100%, i.e. the number of products that had not reached their targets.

Simple as the requirement seemed, the hardest part of getting it done was our limited knowledge of DAX; specifically, knowing which function to use. We first tried building it into the data model itself, but our choice of DAX formulae, and the number of records we had (50 million+), soon saw us running out of memory in seconds on a 28 GB box. Not too good, given the rest of the model didn’t even utilize more than half the memory.

Using calculated columns was not a possibility, since the columns that made up the measures behind Fulfillment were from a different table, and the calculation does not aggregate up.

Since it was a report requirement, we tried doing it visually, by pulling the Product and Fulfillment fields onto a table visual and then filtering on Fulfillment, like so:

We then performed a distinct count on Product, and voila! It performed a count; but alas, we realized the filter gets disabled and removed when that happens, which means the count always ends up being a count of all products.

A frenzy of DAX formulae at the Power BI level did not help either, until we tried the VALUES() function, courtesy of a pointer from Gerhard Brueckl (b|t); trust the MVP community to have your back.

The VALUES() function returns a single-column table of the unique values in a specified column. Hence, using the FILTER() function, we were able to extract the unique products where Fulfillment was less than 100%, and then use the COUNTROWS() function to simply count the number of records returned.

Product Short of Target =
VAR ProductCount =
    COUNTROWS ( FILTER ( VALUES ( 'Product'[ProductCode] ), [Fulfillment] < 1 ) )
RETURN
    IF ( ISBLANK ( ProductCount ), 0, ProductCount )
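
A quick way to sanity-check a measure like this is to evaluate it outside the report, for example with a DAX query run from SSMS or DAX Studio against the Tabular model. This is only a sketch: the 'Date'[CalendarYear] column used below to simulate a slicer is hypothetical, not part of the model described above.

```dax
-- Hypothetical DAX query to test the measure under a specific
-- filter context, mimicking what a report slicer would do.
EVALUATE
ROW (
    "Products Short of Target",
    CALCULATE ( [Product Short of Target], 'Date'[CalendarYear] = 2017 )
)
```

Because the measure iterates VALUES ( 'Product'[ProductCode] ) within the current filter context, the count responds to whatever slicers are applied, which is exactly what the visual-level filter approach could not do.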

It is very easy to overlook VALUES as a function that gives you unique values. Hence why it is important to have an overall understanding of what each DAX function can be used for, or at least to get your hands on a DAX cheat sheet of sorts to pin against your wall. Glad this worked out, though.

What’s the difference between Live Connection and DirectQuery in Power BI, and where do they fit?

Hot on the heels of the question I was asked a few days ago, comes another closely related one: “What’s the difference between Connect live to a data source and DirectQuery a data source in Power BI?”

We had already established that there are two methods in which we could interact with data using Power BI: loading data into Power BI and accessing the data source directly.

Connecting live and DirectQuery both fall into the latter method, but there is a difference.

In DirectQuery mode, you access the data source, such as a relational database or data mart, but then create calculated columns or measures on top of it in Power BI, generating a data model layer: something similar to a database view, if you may. The data still lives at the data source, but is pulled through the data model in Power BI onto the visuals. End users and report creators see and interact with the data model in Power BI.
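
For instance, a business user connecting to a sales data mart in DirectQuery mode might define a measure of their own in this Power BI model layer. A minimal sketch, assuming a hypothetical 'Sales' table with SalesAmount and CostAmount columns:

```dax
-- Hypothetical measure added in the Power BI data model layer
-- on top of a DirectQuery source; the source itself is unchanged.
Gross Margin % =
DIVIDE (
    SUM ( 'Sales'[SalesAmount] ) - SUM ( 'Sales'[CostAmount] ),
    SUM ( 'Sales'[SalesAmount] )
)
```

The underlying rows stay in the source database; Power BI pushes the queries behind each visual down to the source and evaluates the measure over the results.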

In the case of a live connection, the data model itself is at the source; you interact with it directly, and no data model layer is created in Power BI. All measures, calculated columns and KPIs are provided by the data model at the source, along with the data. End users and report authors see and interact with this data model through Power BI.

If you compare the two methods on a conceptual level, DirectQuery mode is used in self-service cases where you have data marts or a data warehouse on a relational database, and business users build their own data models off it for their business needs. The data marts or data warehouse integrate data from various systems and provide base measures with related dimensions. Business users may create custom measures and calculated columns on top of this to suit their reporting and analytic requirements, and then explore the data and build visual reports. Think of this as the data discovery phase of the self-service exercise.

Live connections would probably be used in scenarios where the analytic needs are better understood, and/or the type of analytics described above has matured and become mainstream in the organization. Here, data models are built off the data warehouse using Analysis Services (multidimensional or tabular), with the measures, calculations and KPIs that were earlier part of the self-service data discovery exercise incorporated into them. Business users now have established reports and dashboards that showcase organizational performance, powered by established data models. Think of this as the phase where things have evolved into corporate BI that delivers real value.

[SUBJECT TO CHANGE] Out of the whole set of supported data sources, Power BI currently supports the following in DirectQuery mode:

  • SQL Server
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • SAP HANA
  • Oracle Database
  • Teradata Database
  • Amazon Redshift (Preview)
  • Impala (Preview)
  • Snowflake (Preview)

and the following using a Live connection:

  • Analysis Services Tabular
  • Analysis Services Multidimensional
  • Azure Analysis Services (Preview)