GitHub - ramsun/microtransaction-data-exploration: An analysis of age and gender demographics as well as spending patterns of users in the game Heroes of Pymoli. Tables are created with pandas.

'''
Based on the data bellow, we can draw 3 conclusions
1.  There was an overwhemlming majority of men who were involved in microtransactions in the game 
    (84.03% of players), while the number of women who were involed in microtransactions was only 
    14.06%.
2.  The age demographics that are involved in the most amount of microtransactions are between the 
    ages of 19 and 26, where 57.12% of users fall.  There is also a sizeable demographic in the 
    15-18 range at 15.62%.
3.  Both the most popular and most profitable items were the same in this data set.  The most 
    valuable item was Oathbreaker, Last Hope of the Breaking Storm, which brought in 50.76.
'''

# import dependencies
import pandas as pd
import numpy as np

# create a raw data parent data frame from a given csv
path = "./purchase_data.csv"
raw_df = pd.read_csv(path)

# display the first 5 rows of the raw data
raw_df.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Purchase ID	SN	Age	Gender	Item ID	Item Name	Price
0	0	Lisim78	20	Male	108	Extraction, Quickblade Of Trembling Hands	3.53
1	1	Lisovynya38	40	Male	143	Frenzied Scimitar	1.56
2	2	Ithergue48	24	Male	92	Final Critic	4.88
3	3	Chamassasya86	24	Male	100	Blindscythe	3.27
4	4	Iskosia90	23	Male	131	Fury	1.44

# Player Count Analysis
# Identify the total number of unique SN tags from the raw data frame
SN_series = raw_df["SN"].value_counts()
player_count = len(SN_series)

# print the total to the notebook
print("Number of players in Heroes of Pyoli: " + str(player_count))

Number of players in Heroes of Pyoli: 576

# Purchasing Analysis (Total)
# Perform a summary analysis of the raw data and determine the 4 key aspects of the data,
# which includes the number of unique items, the average purchase price, the total number of
# Purchases, and the total amount of revenue that was made from the data set

# perform the summary analysis
item_series = raw_df["Item Name"].value_counts() 
num_unique_items = len(item_series)
revenue = raw_df["Price"].sum()
purchase_count = len(raw_df)
avg_purchase_price = round(revenue / purchase_count, 2)

# create a dictionary of the summary data and put it into a data frame for output
purchase_data = {"Number of Unique Items" : num_unique_items , 
                    "Average Purchase Price" : avg_purchase_price ,
                    "Total Number of Purchases" : purchase_count,
                    "Total Revenue" : revenue}
purchasing_analysis = pd.DataFrame(data=purchase_data, index = ["Purchasing Analysis"])

# output to notebook
purchasing_analysis

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Number of Unique Items	Average Purchase Price	Total Number of Purchases	Total Revenue
Purchasing Analysis	179	3.05	780	2379.77

# Gender demographics (continued)
# shows the number of genders and their respective counts

#create a series based on each gender in the raw data
male_series = raw_df[raw_df.Gender == "Male"]["SN"].value_counts()
female_series = raw_df[raw_df.Gender == "Female"]["SN"].value_counts()
other_series = raw_df[raw_df.Gender == "Other / Non-Disclosed"]["SN"].value_counts()

# determine the count of each gender
male_count = len(male_series)
female_count = len(female_series)
other_count = len(other_series)

# calculate percentages
male_percentage = round(male_count/player_count * 100,2)
female_percentage = round(female_count/player_count * 100,2)
other_percentage = round(other_count/player_count * 100,2)

# initialize the output table with an index based on gender
gender_index_arr = ["Male", "Female", "Other / Non-Disclosed"]
gender_demographics = pd.DataFrame(index = gender_index_arr)

#assign each gender's data column by column
gender_demographics['Total Count'] = [male_count,female_count,other_count]
gender_demographics['Percentage of Players'] = [male_percentage,female_percentage,other_percentage]

# output to notebook
gender_demographics

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Total Count	Percentage of Players
Male	484	84.03
Female	81	14.06
Other / Non-Disclosed	11	1.91

# Purchasing Analysis (Gender)
# This will be a purchasing analysis of the raw data based on gender

# initialize a new df that will contain summary purchase info based on gender
purchasing_analysis_df = raw_df.groupby("Gender").Price.agg(["count", "mean","sum"])

# total money spent based on gender of the user
female_revenue = raw_df[raw_df.Gender == "Female"].Price.sum()
male_revenue = raw_df[raw_df.Gender == "Male"].Price.sum()
other_revenue = raw_df[raw_df.Gender == "Other / Non-Disclosed"].Price.sum()

# average purchase count for each gender
female_average_purchase_total_per_person = round(female_revenue/female_count, 2)
male_average_purchase_total_per_person = round(male_revenue/male_count, 2)
other_average_purchase_total_per_person = round(other_revenue/other_count,2)

# assigns values to the output df and clean it up
purchasing_analysis_df["Average Purchase Total Per Person"] = [female_average_purchase_total_per_person,male_average_purchase_total_per_person,other_average_purchase_total_per_person]
purchasing_analysis_df["mean"] = round(purchasing_analysis_df["mean"],2)
purchasing_analysis_df = purchasing_analysis_df.rename(columns = {"count": "Purchase Count", "mean":"Average Purchase Price", "sum":"Total Purchase Value"})

# ouput to notebook
purchasing_analysis_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Purchase Count	Average Purchase Price	Total Purchase Value	Average Purchase Total Per Person
Gender
Female	113	3.20	361.94	4.47
Male	652	3.02	1967.64	4.07
Other / Non-Disclosed	15	3.35	50.19	4.56

# Age demographics
# Come up with demographics data based on the number of unique users

# create the bins and labels
bins = [0,10,14,18,22,26,30,34,38,42,100]
age_ranges = ["<=10", "11-14", "15-18", "19-22", "23-26","27-30", "31-34", "35-38", "39-42", "43+"]

# create a new data frame that only keeps the purchases from unique users
unique_SN_df = raw_df
unique_SN_df = unique_SN_df.drop_duplicates(subset = 'SN', keep = 'first')

# create an age demographics report based off of the unique SN data frame
age_demographics_df = unique_SN_df.groupby(pd.cut(unique_SN_df["Age"], bins,labels = age_ranges))
age_demographics_df = age_demographics_df.count()
age_demographics_df = age_demographics_df.drop(columns = ['SN', 'Age','Gender','Item ID','Item Name','Price']) 
age_demographics_df = age_demographics_df.rename(columns = {"Purchase ID":"Total Count"}) 
age_demographics_df["Percentage of Players"] = round(age_demographics_df["Total Count"] / player_count * 100,2)

# output to notebook
age_demographics_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Total Count	Percentage of Players
Age
<=10	24	4.17
11-14	15	2.60
15-18	90	15.62
19-22	178	30.90
23-26	151	26.22
27-30	48	8.33
31-34	27	4.69
35-38	25	4.34
39-42	14	2.43
43+	4	0.69

# Purchasing Analysis (Age)

# create an age purchase report, which will now include every purchase in the analysis and not just unique purchases
age_purchase_analysis_df = raw_df.groupby(pd.cut(raw_df["Age"], bins,labels = age_ranges))
age_purchase_analysis_df = age_purchase_analysis_df.count()
age_purchase_analysis_df = age_purchase_analysis_df.drop(columns = ['SN', 'Age','Gender','Item ID','Item Name','Price']) 
age_purchase_analysis_df = age_purchase_analysis_df.rename(columns = {"Purchase ID":"Total Count"}) 
age_purchase_analysis_df["Percentage of Players"] = round(age_purchase_analysis_df["Total Count"] / player_count * 100,2)

# create a binned raw data data frame for analysis
raw_data_binned_df = raw_df
raw_data_binned_df["Binning"] = pd.cut(raw_data_binned_df["Age"], bins,labels = age_ranges)

# initialize the lists that will be used as column data for the output data frame
tot_pur_val_list = []
avg_pur_price_list = []
avg_pur_price_per_person_list = []

# create a list for the total purchase value and average purchase price based on the label
for label in age_ranges:
    tot_pur_val_list.append(raw_data_binned_df[raw_data_binned_df.Binning == label].Price.sum())
    avg_pur_price_list.append(round(raw_data_binned_df[raw_data_binned_df.Binning == label].Price.mean(),2))

# create a list that will make use of data from the a data frame in the previous cell
avg_pur_price_per_person_list = round(tot_pur_val_list / age_demographics_df["Total Count"],2)

# append the lists from the previous loop to the output data frame
age_purchase_analysis_df["Average Purchase Price"] = avg_pur_price_list
age_purchase_analysis_df["Total Purchase Value"] = tot_pur_val_list
age_purchase_analysis_df["Average Total Purchase Per Person"] = avg_pur_price_per_person_list

#clean up and print to notebook
age_purchase_analysis_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Total Count	Percentage of Players	Average Purchase Price	Total Purchase Value	Average Total Purchase Per Person
Age
<=10	32	5.56	3.40	108.96	4.54
11-14	19	3.30	2.68	50.95	3.40
15-18	113	19.62	3.03	342.91	3.81
19-22	254	44.10	3.04	771.89	4.34
23-26	207	35.94	3.06	634.24	4.20
27-30	63	10.94	2.88	181.23	3.78
31-34	38	6.60	2.73	103.68	3.84
35-38	35	6.08	3.55	124.35	4.97
39-42	15	2.60	3.37	50.50	3.61
43+	4	0.69	2.77	11.06	2.76

# Top Spenders
# Determines the top 5 users who spent the most on the microtransactions

# create a new data frame grouped by the SN tag
gb_SN_df = raw_df.groupby("SN").Price.agg(["count","mean","sum"])

# create new df that will hold the rows information of the top 5 spenders
top_spenders_df = gb_SN_df.nlargest(5, 'sum')
top_spenders_df["mean"] = round(top_spenders_df["mean"],2)
top_spenders_df = top_spenders_df.rename(columns = {"count":"Purchase Count","mean":"Average Purchase Price", "sum": "Total Purchase Value"})

# output to notebook
top_spenders_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Purchase Count	Average Purchase Price	Total Purchase Value
SN
Lisosia93	5	3.79	18.96
Idastidru52	4	3.86	15.45
Chamjask73	3	4.61	13.83
Iral74	4	3.40	13.62
Iskadarya95	3	4.37	13.10

# Most Popular Items
# Determines which items are the most popular based on number of sales

# create a new data frame grouped by each unique item id, name, and price
gb_items_df = raw_df.groupby(["Item ID", "Item Name", "Price"]).Price.agg(["count", "sum"])

# return the top 5 items based on purchase count
popular_items_df = gb_items_df.nlargest(5, 'count')
popular_items_df = popular_items_df.rename(columns = {"count":"Purchase Count", "sum":"Total Purchase Value"}) # clean headers

# output to notebook
popular_items_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

			Purchase Count	Total Purchase Value
Item ID	Item Name	Price
178	Oathbreaker, Last Hope of the Breaking Storm	4.23	12	50.76
82	Nirvana	4.90	9	44.10
108	Extraction, Quickblade Of Trembling Hands	3.53	9	31.77
145	Fiery Glass Crusader	4.58	9	41.22
19	Pursuit, Cudgel of Necromancy	1.02	8	8.16

# Most Profitable Items
# Determinesthe top 5 which items provided the most profit in terms of total purchase value

# create a new data frame that organizes the raw data by the item id, item name, and price, all 
gb_items_df = raw_df.groupby(["Item ID", "Item Name", "Price"]).Price.agg(["count", "sum"])

# return the top 5 items based on purchase count
# create a new data frame that contains the rows of the items with the higest total purchase value (aka sum) 
profitable_items_df = gb_items_df.nlargest(5, 'sum')
profitable_items_df = popular_items_df.rename(columns = {"count":"Purchase Count", "sum":"Total Purchase Value"}) # clean headers

# output to notebook
profitable_items_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

			Purchase Count	Total Purchase Value
Item ID	Item Name	Price
178	Oathbreaker, Last Hope of the Breaking Storm	4.23	12	50.76
82	Nirvana	4.90	9	44.10
108	Extraction, Quickblade Of Trembling Hands	3.53	9	31.77
145	Fiery Glass Crusader	4.58	9	41.22
19	Pursuit, Cudgel of Necromancy	1.02	8	8.16

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
Pandas - Heroes of Pymoli Analysis.ipynb		Pandas - Heroes of Pymoli Analysis.ipynb
README.md		README.md
purchase_data.csv		purchase_data.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages