Skip to content

PBI-Guy/PowerQuery

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Power Query

This repo is about showing the power of Power Query and how to transform your data with it. We will first focus on Excel and in a later stage, add Power BI as well.

Table of content

Secnario Description
Problem statement
Scenario 1 - Connect to one CSV file
Scenario 2 - Connect to folder and combine multiple CSV files
Scenario 3 - Create a data model
Bonus - Add a Date Table as dimension
Clean Up

Scenario Description

We're a global sales manager and want to analyze our sales performance across different dimensions like time, product, and many more. From our controlling team we receive a monthly CSV file with the current sales performance. With Power Query, we want to automate as much as possible to avoid manual tasks and be as efficient as possible.

Problem statement

As we receive each month one CSV file, it's hard to analyze revenue over time. Further, it takes a lot of manual effort as of now to combine the files to one table.

Scenario 1 - Connect to one CSV file

With Power Query, we can connect to different data sources. In the first scenario, we connect to one CSV file and explore the possibilities.

Before we begin, make sure to download the Data folder and store it e.g. on your Desktop.

Open Excel, select Data in the Ribbon, click on Get Data, choose From File, and hit From Text/CSV.

A window will pop up in which you can select one CSV file. Navigate to the CSV folder and select April 2018.csv. Confirm by selecting Import.

Excel will now create a connection to the CSV file and show a preview of the data. In a first step, we will just Load the data as is.

After hitting the load button, Excel will copy the data from the CSV file into Excel and create a new worksheet April 2018. Further, in the Queries & Connections pane, we will see one query April 2018 with 2249 rows loaded. In case you don't see the Queries & Connections pane, head over to Data and select Queries & Connections.

As we have now our data loaded into the Excel, we can easily create a Pivot Table. For that, click somewhere into your table (e.g. cell A2), select Insert from the Ribbon, and choose PivotTable.

In the pop up window, we'll see the Table April 2018 is selected as well as to create the Pivot Table in a new worksheet. Confirm with OK.

Now, we can easily select the required fields. For example, let's choose ProductCategory and SumOfAmount. Make sure ProductCategory is added to the Rows section and SumOfAmount is in the Values section. This way, we get a quick overview which product category has the most and least sales.

If we now add TranDate to the Columns section, we'll get a breakdown by date.

As this is quite a wide range and chaotic, it would be better to have some kind of hierarchy - Year, Quarter, Month, Day. For that purpose, we can leverage Power Query to create the desired columns and avoid manual work in future. Let's launch Power Query by selecting Data from the Ribbon, click on Get Data and choose Launch Power Query Editor....

A new window will pop up. This is the Power Query window in which we'll see on table already - our April 2018 CSV file which we loaded previously. If you click on it, you'll get a preview of your data.

One the left hand side (1), you'll see all table to which you have connected to. In the middle (2), you'll see a preview of your data. This is depending on which table you select on the left hand side. Lastly, at the right (3), you'll see all transformation steps applied to this specific table. As of now, we see three steps: Source - Promoted Headers - Changed Type. You can explore each step by selecting it and the preview will show you how the data looked like at this step. Once we refresh the data, Power Query will go through each step from top to down and perform each applied transformation. Once done, the data will be loaded into Excel. Due to that approach, we only need to modify and transform the data once and every time we refresh our data, all steps will be applied automatically.

For our purpose, let's add a Year, Quarter, Month, and Date column. We start with the Year column by selecting the TranDate from our April 2018 Table, go to Add Column in the Ribbon, choose Date, click on Year and select Year from the options.

This will add a new column Year as well as a new step into our Applied Steps section.

Repeat the steps for Quarter, Month, and Day. Keep in mind to select the TranDate before you add a new column.

Once finished, select Home in the Ribbon and hit Close & Load.

This way, Power Query will refresh our data with the new speps applied and add the four new columns.

Because the Pivot Table is connected to the April 2018 table, we need now to refresh the Pivot Table as well. To do so, go to your Pivot Table, select it, and press Refresh in the PivotTable Analyze Ribbon menu. Once the refresh has successfully completed, we'll see the four new columns.

Let's now adjust the Pivot Table to show Year and Month in the Columns section instead of days. As we can see, a hierarchy helps for a better overview. Obviously, we can also add Quarter and Day if needed.

But as we wish to analyze our revenue across multiple months to compare and identify trends, we would now need to connect to the other CSV files. One approach would be to repeat the steps and combine the tables in Power Query or Excel. But as we want to be as efficient as possible, let's leverage the Power of Power Query even more.

Scenario 2 - Connect to folder and combine multiple CSV files

In our existing file, we select Get Data from the Data menu in the Ribbon, click on From File and select From Folder.

In the pop up window browse to the CSV folder and confirm with Open.

Now, we're connected to the folder. In the preview screen, we can see all files which are stored in this folder with some meta data like modified date, created date, path, etc. As we're not interested in the meta data but rather in the binaries, we have to extract those and combine it into one big table. Power Qurey helps us to achieve our goal by providing a Combine button. If we select it, it would do the magic automatically for us but in this case, we want to do further transformation steps on top so we select Transform Data.

In Power Query, we see now two tables - one called April 2018 and the other one CSV. Select the CSV table and click on the two arrows in the Content column.

A new window will pop up in which we define our sample CSV. Based on it the schema will be defined (meaning which columns should be loaded). Per defalt, the first file is taken as sample but it can be overwritten at the top. In our case, we can leave it as is. Like before, when we connected to one single CSV file, we have now a preview of the data from the sample file. As the data looks good, we confirm with the OK button.

After a few seconds, we have one big table containing all CSV data!

Power Query created automatically a function which is used to extract the binaries. As of now, we do not need to take care of.

Let's do some further transformation before we load the data. First, we want to remove the Source.Name column as it's not needed. Right click on the column name and select Remove.

Second, we want to rename some columns. Double click on TranDate and rename it to Date. Please rename columns as follow:

Original Name Renamed
TranDate Date
Dept Country
SumofAmount Revenue
CustomerSegment Customer Segment
ProductCategory Product Category

All changes will not affect the CSV file and are only applied during the load of the data into Excel.

Once done, confirm by selecting Close & Load from the Home Ribbon.

A new worksheet with a new table CSV has been loaded. Instead of creating now a new Pivot Table, let's leverage the existing one and just change the data source. For that, we go to our Sheet2 in which our Pivot Table is, click into the Pivot Table (e.g. cell A6), select PivotTable Analyze in the Ribbon, and click on Change Data Source.

Change the Table/Range to CSV and confirm with OK.

Lastly, let's add Product Category to the Rows, Revenue to Values, and the automatic created field Months (Date) to the Columns secion.

Now, we can easily compare the numbers for each month. Let's create a column chart to visualize our result. Select Insert in the Ribbon and click on PivotChart.

In the pop up window select Column, make sure to select the first one, and confirm with OK.

Next, right click in the chart and choose Select Data.

Switch now Rows and Columns with the button in the middle and confirm with OK.

This way the Month column is on the X-Axis. As the Pivot Chart and Pivot Table are now connected, we can filter our Pivot Table and influence our Chart this way. Let's test it by selecting the filter icon next to Column Labels and filter down to Decor and Furniture.

Let's test the magic of Power Query and add a new CSV file into our folder. Go back to the Data folder and Drag & Drop the September 2018.csv file into the CSV folder.

Once done, go back to Excel and hit Refresh All from the Data Ribbon.

If everything works as expected, September data will be visible in our Pivot Table as well as the Chart! This way, we can now just add new files into the CSV folder, hit the Refresh button in Excel and everything else will magically be done for us!

If we go one step further and change the Columns fields now to Dept and remove Product Category, we see the Table as well as the Chart is influenced.

For reporting purpose, it would be easier if we would have names instead of numbers for our departement respectively countries. There are different ways to achieve it. One would be to use Power Query and add an additional column with an IF statement. For example, if the country code is 110, write USA, if 120, write Germany, etc. The issue with this approach is if a new country code appears or changes, we need to adjust our Power Query. Therefore, let's use a better approach and build a data model.

Scenario 3 - Create a data model

Within our CSV folder, we have a Country Codes.csv file which we can use in Excel and create a data model. We again connect to the Excel Sheet via Get Data form the Data Ribbon and select From File - From Text/CSV.

On the next screen, browse to your Data folder and select Country Codes.csv.

On the next screen, just hit Load.

Afterwards, go to the Data Ribbon and select Data Model - Relationships.

In the Manage Relationships window, select **New...*.

Now, create a relationship between the two tables CSV and Country_Codes on the columns Dept and Country Code as shown in the screen shot below. Confirm with OK.

Confirm once more by selecting Close.

Excel offers the possibility of showing the relationship between the tables in a view. To access it, select Data Model from the Data Ribon and choose Manage Data Model. A new window pops up in which you can select the Diagram View to see a visual representation of the relationships between the tables.

Let's close the Data Model window and return to our Sheet2. In here, we would like to add a new table - the Country Codes table - to our existing Pivot Table. To do so, choose More Tables... to create a new Pivot Table, which is required now.

A message pops up which indicates we need to create a new Pivot Table. Confirm by selecting Yes.

Now, all three tables are accessible. Add the Country column from the Country Codes table to the Columns section. Further, add the Product Category from the CSV table to the Rows section. Lastly, make sure Revenue is added to the Values.

Lastly, we can add a Pivot Chart to visualize our data in a column chart. Just follow the same steps as before b selecting Insert from the Ribbon menu - PivotChart - Column. If we swap column and rows once more, we got a chart like the one in the picture below.

Bonus - Add a Date Table as dimension

Following best practice and creating a start schema. it makes sense to add a Date table to our model instead of adding further columns like Year, Month, Day to our Fact (CSV) table. We can manually create one in Power Query. To do so, hit the Get Data button from the Data Ribbon and select From Other Source - Blank Query.

Power Query Editor window will appear. Here, select Advanced Editor from the Home Ribbon. Copy and paste the code below, confirm with Done.

    Source = CSV,
    MinDate = List.Min(Source[Date]),
    MaxDate = List.Max(Source[Date]),
    StartDate = #date(Date.Year(MinDate), 1, 1),
    EndDate = #date(Date.Year(MaxDate), 12, 31),
    DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ChangedType = Table.TransformColumnTypes(DateTable,{{"Column1", type date}}),
    RenamedColumns = Table.RenameColumns(ChangedType,{{"Column1", "Date"}}),
    InsertYear = Table.AddColumn(RenamedColumns, "Year", each Date.Year([Date]), Int64.Type),
    InsertMonth = Table.AddColumn(InsertYear, "Month", each Date.Month([Date]), Int64.Type),
    InsertDay = Table.AddColumn(InsertMonth, "Day", each Date.Day([Date]), Int64.Type),
    InsertQuarter = Table.AddColumn(InsertDay, "Quarter", each Date.QuarterOfYear([Date]), Int64.Type),
    InsertWeekOfYear = Table.AddColumn(InsertQuarter, "WeekOfYear", each Date.WeekOfYear([Date]), Int64.Type)
in
    InsertWeekOfYear

The code uses the CSV table as source and extract the min and max year from the Date column. Based on it, it creates dynamically a date table from the first day of the min Year until the last day of the max Year. In our example this means it create automatically a date table from 01. January 2018 until 31. December 2018.

Rename Query1 to Date through the Properties on the right hand side (or double click on the table name).

Now, hit the Close & Apply button from the Home Ribbon to load the new Date table.

Once loaded, create a relationship between the Date and CSV table by selecting Data in the Ribbon - Data Model - Relationships and hit New.... Make sure to create a new relationship between the CSV and Date table on the Date columns. Confirm by selecting OK and Close afterwards.

With our new dimension table Date and a correct relationship to our fact table CSV we can easily add any column to our Pivot Table and Chart.

Clean Up

As we don't need the April 2018 table anymore, we can easily delete it. To do so, open Queries and Connections through the Data Ribbon, right click on April 2018, and select Delete.

Confirm by clicking on Delete on the pop up window.

Delete the April 2018 worksheet and Sheet2 as well as it's not needed anymore.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors