Skip to content

ramsun/microtransaction-data-exploration

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

'''
Based on the data bellow, we can draw 3 conclusions
1.  There was an overwhemlming majority of men who were involved in microtransactions in the game 
    (84.03% of players), while the number of women who were involed in microtransactions was only 
    14.06%.
2.  The age demographics that are involved in the most amount of microtransactions are between the 
    ages of 19 and 26, where 57.12% of users fall.  There is also a sizeable demographic in the 
    15-18 range at 15.62%.
3.  Both the most popular and most profitable items were the same in this data set.  The most 
    valuable item was Oathbreaker, Last Hope of the Breaking Storm, which brought in 50.76.
'''
# import dependencies
import pandas as pd
import numpy as np
# create a raw data parent data frame from a given csv
path = "./purchase_data.csv"
raw_df = pd.read_csv(path)

# display the first 5 rows of the raw data
raw_df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase ID SN Age Gender Item ID Item Name Price
0 0 Lisim78 20 Male 108 Extraction, Quickblade Of Trembling Hands 3.53
1 1 Lisovynya38 40 Male 143 Frenzied Scimitar 1.56
2 2 Ithergue48 24 Male 92 Final Critic 4.88
3 3 Chamassasya86 24 Male 100 Blindscythe 3.27
4 4 Iskosia90 23 Male 131 Fury 1.44
# Player Count Analysis
# Identify the total number of unique SN tags from the raw data frame
SN_series = raw_df["SN"].value_counts()
player_count = len(SN_series)

# print the total to the notebook
print("Number of players in Heroes of Pyoli: " + str(player_count))
Number of players in Heroes of Pyoli: 576
# Purchasing Analysis (Total)
# Perform a summary analysis of the raw data and determine the 4 key aspects of the data,
# which includes the number of unique items, the average purchase price, the total number of
# Purchases, and the total amount of revenue that was made from the data set

# perform the summary analysis
item_series = raw_df["Item Name"].value_counts() 
num_unique_items = len(item_series)
revenue = raw_df["Price"].sum()
purchase_count = len(raw_df)
avg_purchase_price = round(revenue / purchase_count, 2)

# create a dictionary of the summary data and put it into a data frame for output
purchase_data = {"Number of Unique Items" : num_unique_items , 
                    "Average Purchase Price" : avg_purchase_price ,
                    "Total Number of Purchases" : purchase_count,
                    "Total Revenue" : revenue}
purchasing_analysis = pd.DataFrame(data=purchase_data, index = ["Purchasing Analysis"])

# output to notebook
purchasing_analysis
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Number of Unique Items Average Purchase Price Total Number of Purchases Total Revenue
Purchasing Analysis 179 3.05 780 2379.77
# Gender demographics (continued)
# shows the number of genders and their respective counts

#create a series based on each gender in the raw data
male_series = raw_df[raw_df.Gender == "Male"]["SN"].value_counts()
female_series = raw_df[raw_df.Gender == "Female"]["SN"].value_counts()
other_series = raw_df[raw_df.Gender == "Other / Non-Disclosed"]["SN"].value_counts()

# determine the count of each gender
male_count = len(male_series)
female_count = len(female_series)
other_count = len(other_series)

# calculate percentages
male_percentage = round(male_count/player_count * 100,2)
female_percentage = round(female_count/player_count * 100,2)
other_percentage = round(other_count/player_count * 100,2)

# initialize the output table with an index based on gender
gender_index_arr = ["Male", "Female", "Other / Non-Disclosed"]
gender_demographics = pd.DataFrame(index = gender_index_arr)

#assign each gender's data column by column
gender_demographics['Total Count'] = [male_count,female_count,other_count]
gender_demographics['Percentage of Players'] = [male_percentage,female_percentage,other_percentage]

# output to notebook
gender_demographics
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Total Count Percentage of Players
Male 484 84.03
Female 81 14.06
Other / Non-Disclosed 11 1.91
# Purchasing Analysis (Gender)
# This will be a purchasing analysis of the raw data based on gender

# initialize a new df that will contain summary purchase info based on gender
purchasing_analysis_df = raw_df.groupby("Gender").Price.agg(["count", "mean","sum"])

# total money spent based on gender of the user
female_revenue = raw_df[raw_df.Gender == "Female"].Price.sum()
male_revenue = raw_df[raw_df.Gender == "Male"].Price.sum()
other_revenue = raw_df[raw_df.Gender == "Other / Non-Disclosed"].Price.sum()

# average purchase count for each gender
female_average_purchase_total_per_person = round(female_revenue/female_count, 2)
male_average_purchase_total_per_person = round(male_revenue/male_count, 2)
other_average_purchase_total_per_person = round(other_revenue/other_count,2)

# assigns values to the output df and clean it up
purchasing_analysis_df["Average Purchase Total Per Person"] = [female_average_purchase_total_per_person,male_average_purchase_total_per_person,other_average_purchase_total_per_person]
purchasing_analysis_df["mean"] = round(purchasing_analysis_df["mean"],2)
purchasing_analysis_df = purchasing_analysis_df.rename(columns = {"count": "Purchase Count", "mean":"Average Purchase Price", "sum":"Total Purchase Value"})

# ouput to notebook
purchasing_analysis_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Average Purchase Total Per Person
Gender
Female 113 3.20 361.94 4.47
Male 652 3.02 1967.64 4.07
Other / Non-Disclosed 15 3.35 50.19 4.56
# Age demographics
# Come up with demographics data based on the number of unique users

# create the bins and labels
bins = [0,10,14,18,22,26,30,34,38,42,100]
age_ranges = ["<=10", "11-14", "15-18", "19-22", "23-26","27-30", "31-34", "35-38", "39-42", "43+"]

# create a new data frame that only keeps the purchases from unique users
unique_SN_df = raw_df
unique_SN_df = unique_SN_df.drop_duplicates(subset = 'SN', keep = 'first')

# create an age demographics report based off of the unique SN data frame
age_demographics_df = unique_SN_df.groupby(pd.cut(unique_SN_df["Age"], bins,labels = age_ranges))
age_demographics_df = age_demographics_df.count()
age_demographics_df = age_demographics_df.drop(columns = ['SN', 'Age','Gender','Item ID','Item Name','Price']) 
age_demographics_df = age_demographics_df.rename(columns = {"Purchase ID":"Total Count"}) 
age_demographics_df["Percentage of Players"] = round(age_demographics_df["Total Count"] / player_count * 100,2)

# output to notebook
age_demographics_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Total Count Percentage of Players
Age
<=10 24 4.17
11-14 15 2.60
15-18 90 15.62
19-22 178 30.90
23-26 151 26.22
27-30 48 8.33
31-34 27 4.69
35-38 25 4.34
39-42 14 2.43
43+ 4 0.69
# Purchasing Analysis (Age)

# create an age purchase report, which will now include every purchase in the analysis and not just unique purchases
age_purchase_analysis_df = raw_df.groupby(pd.cut(raw_df["Age"], bins,labels = age_ranges))
age_purchase_analysis_df = age_purchase_analysis_df.count()
age_purchase_analysis_df = age_purchase_analysis_df.drop(columns = ['SN', 'Age','Gender','Item ID','Item Name','Price']) 
age_purchase_analysis_df = age_purchase_analysis_df.rename(columns = {"Purchase ID":"Total Count"}) 
age_purchase_analysis_df["Percentage of Players"] = round(age_purchase_analysis_df["Total Count"] / player_count * 100,2)

# create a binned raw data data frame for analysis
raw_data_binned_df = raw_df
raw_data_binned_df["Binning"] = pd.cut(raw_data_binned_df["Age"], bins,labels = age_ranges)

# initialize the lists that will be used as column data for the output data frame
tot_pur_val_list = []
avg_pur_price_list = []
avg_pur_price_per_person_list = []

# create a list for the total purchase value and average purchase price based on the label
for label in age_ranges:
    tot_pur_val_list.append(raw_data_binned_df[raw_data_binned_df.Binning == label].Price.sum())
    avg_pur_price_list.append(round(raw_data_binned_df[raw_data_binned_df.Binning == label].Price.mean(),2))

# create a list that will make use of data from the a data frame in the previous cell
avg_pur_price_per_person_list = round(tot_pur_val_list / age_demographics_df["Total Count"],2)

# append the lists from the previous loop to the output data frame
age_purchase_analysis_df["Average Purchase Price"] = avg_pur_price_list
age_purchase_analysis_df["Total Purchase Value"] = tot_pur_val_list
age_purchase_analysis_df["Average Total Purchase Per Person"] = avg_pur_price_per_person_list

#clean up and print to notebook
age_purchase_analysis_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Total Count Percentage of Players Average Purchase Price Total Purchase Value Average Total Purchase Per Person
Age
<=10 32 5.56 3.40 108.96 4.54
11-14 19 3.30 2.68 50.95 3.40
15-18 113 19.62 3.03 342.91 3.81
19-22 254 44.10 3.04 771.89 4.34
23-26 207 35.94 3.06 634.24 4.20
27-30 63 10.94 2.88 181.23 3.78
31-34 38 6.60 2.73 103.68 3.84
35-38 35 6.08 3.55 124.35 4.97
39-42 15 2.60 3.37 50.50 3.61
43+ 4 0.69 2.77 11.06 2.76
# Top Spenders
# Determines the top 5 users who spent the most on the microtransactions

# create a new data frame grouped by the SN tag
gb_SN_df = raw_df.groupby("SN").Price.agg(["count","mean","sum"])

# create new df that will hold the rows information of the top 5 spenders
top_spenders_df = gb_SN_df.nlargest(5, 'sum')
top_spenders_df["mean"] = round(top_spenders_df["mean"],2)
top_spenders_df = top_spenders_df.rename(columns = {"count":"Purchase Count","mean":"Average Purchase Price", "sum": "Total Purchase Value"})

# output to notebook
top_spenders_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value
SN
Lisosia93 5 3.79 18.96
Idastidru52 4 3.86 15.45
Chamjask73 3 4.61 13.83
Iral74 4 3.40 13.62
Iskadarya95 3 4.37 13.10
# Most Popular Items
# Determines which items are the most popular based on number of sales

# create a new data frame grouped by each unique item id, name, and price
gb_items_df = raw_df.groupby(["Item ID", "Item Name", "Price"]).Price.agg(["count", "sum"])

# return the top 5 items based on purchase count
popular_items_df = gb_items_df.nlargest(5, 'count')
popular_items_df = popular_items_df.rename(columns = {"count":"Purchase Count", "sum":"Total Purchase Value"}) # clean headers

# output to notebook
popular_items_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Total Purchase Value
Item ID Item Name Price
178 Oathbreaker, Last Hope of the Breaking Storm 4.23 12 50.76
82 Nirvana 4.90 9 44.10
108 Extraction, Quickblade Of Trembling Hands 3.53 9 31.77
145 Fiery Glass Crusader 4.58 9 41.22
19 Pursuit, Cudgel of Necromancy 1.02 8 8.16
# Most Profitable Items
# Determinesthe top 5 which items provided the most profit in terms of total purchase value

# create a new data frame that organizes the raw data by the item id, item name, and price, all 
gb_items_df = raw_df.groupby(["Item ID", "Item Name", "Price"]).Price.agg(["count", "sum"])

# return the top 5 items based on purchase count
# create a new data frame that contains the rows of the items with the higest total purchase value (aka sum) 
profitable_items_df = gb_items_df.nlargest(5, 'sum')
profitable_items_df = popular_items_df.rename(columns = {"count":"Purchase Count", "sum":"Total Purchase Value"}) # clean headers

# output to notebook
profitable_items_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Total Purchase Value
Item ID Item Name Price
178 Oathbreaker, Last Hope of the Breaking Storm 4.23 12 50.76
82 Nirvana 4.90 9 44.10
108 Extraction, Quickblade Of Trembling Hands 3.53 9 31.77
145 Fiery Glass Crusader 4.58 9 41.22
19 Pursuit, Cudgel of Necromancy 1.02 8 8.16

About

An analysis of age and gender demographics as well as spending patterns of users in the game Heroes of Pymoli. Tables are created with pandas.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors