Abstract |
WHAT MAKES A MOBILE GAME SUCCESSFUL: A DATA-DRIVEN ANALYSIS OF THE IOS GAME APP MARKET

A Thesis by PRADEEP KUMAR BALAN

Submitted to the Office of Graduate Studies of Texas A&M University-Commerce in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

August 2016

Approved by:
Advisor: Truong-Huy Nguyen
Committee: Jinoh Kim, Derek Harter
Head of Department: Sang C. Suh
Dean of the College: Brent Donham
Dean of Graduate Studies: Arlene Horne

Copyright © 2016 Pradeep Kumar Balan

ABSTRACT

Pradeep Kumar Balan, MS
Texas A&M University-Commerce, 2016
Advisor: Truong-Huy Nguyen, PhD

The global mobile application market is a billion-dollar market and one of the fastest growing global markets in terms of year-over-year revenue (Statista, n.d.-c). The iOS App Store and Google Play are considered the top players in this market (Statista, n.d.-b); they offer users millions of mobile applications (also known as apps) in different categories, whether to serve a purpose or to keep users occupied and entertained. As such, mobile app markets, although among the most lucrative, are extremely competitive: developers and design studios strive to create apps that attract as many customers as possible (success) and avoid being swamped by the wealth of other apps (failure). Unfortunately, for the iOS App Store, one of the most prominent app markets today, information on what separates success from failure is not available to developers. In this project, our goal was to shed some light on the secrets behind a mobile game app’s success by analyzing the metadata of more than 130,000 iOS game apps to identify the factors that influence the following features: (1) the average user rating and (2) the user rating count.
We suspected that an app’s success can be roughly approximated by the significance (i.e., rating × count) of its good ratings: if an app has many good ratings, it can be deemed successful. By estimating the performance of an app from these indicators, we hoped to help developers predict their apps’ prospective success in advance, so that they can build a game with a higher chance of succeeding in the market. We first constructed predictive models, including multiple linear regression, clustering, logistic regression, multi-layer perceptron, and Bayesian network models, that take descriptive features of a game as input (e.g., game genre, number of supported devices, developer’s name, price, and description) and return estimates of its average user rating. Finally, we compared the accuracy of these models and examined their advantages and disadvantages for users interpreting the results. We found that Bayesian network models outperformed all other models, with an accuracy of 74% when predicting the average user rating of a game app.

ACKNOWLEDGEMENTS

My journey through graduate school has been a fabulous one. Through this experience I met many people who encouraged me to complete my tasks by believing in my efforts. At one point, I believed that I would not achieve my dream, but the continuous support and guidance of my professors and peers led me to success. First, I would like to thank the Head of the Computer Science Department, Dr. Sang C. Suh, for accepting my admission to Texas A&M University-Commerce, without which my dream could not have been fulfilled. I would also like to express my deep gratitude to my major advisor and professor, Dr. Truong-Huy Nguyen, for his patient guidance, enthusiastic encouragement, tremendous support, and useful critiques of this research. He enriched my knowledge and played an important role in molding me into a better researcher.
His advice and assistance in keeping me on schedule were incredible. Dr. Nguyen is one of the best professors and researchers I have come to know, with zeal, passion, and innovation. My grateful thanks also go to my committee members, Dr. Jinoh Kim and Dr. Derek Harter, for their patience, their time, and their valuable suggestions. I would like to thank the Computer Science department office staff for their assistance throughout my work. I would also like to thank the Department of Computer Science for its financial support toward the successful completion of my thesis, without which my goals would not have been accomplished. Finally, I would like to thank my family members and my friends for their encouragement, and the United States of America for giving me a chance for career development and for its rich culture.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER
1. INTRODUCTION
    1.1 Statement of the Problem
    1.2 Purpose of the Study
    1.3 Research Questions
    1.4 Significance of the Study
    1.5 Data Description
    1.6 Definitions of Terms
    1.7 Limitations
    1.8 Assumptions
    1.9 Organization of Thesis Chapters
2. REVIEW OF THE LITERATURE
    2.1 iOS App Analysis
    2.2 Probabilistic Graphical Models (Bayes Network)
    2.3 Predictive Modeling
3. METHOD OF PROCEDURE
    3.1 Design of the Study
    3.2 Defining the Goal
    3.3 Collecting the Data
    3.4 Exploring and Managing the Data
        3.4.1 Exploring the Data
            3.4.1.1 Categorizing the features
                Business related features
                Game specific features
                Operational related features
                Performance related features
                Visual related features
            3.4.1.2 Visualizing the data
                Stacked Bar plot
        3.4.2 Managing the Data
            3.4.2.1 Cleaning the data
                Handle missing values
                Data transformation
            3.4.2.2 Sampling the data
    3.5 Modeling
        3.5.1 Random Model (Null model)
        3.5.2 Multiple Linear Regression
        3.5.3 Hierarchical Clustering
        3.5.4 Logistic Regression Model
        3.5.5 Multi-Layer Perceptron Model
        3.5.6 Bayesian Network Model
        3.5.7 Latent Dirichlet Allocation (LDA) Analysis
        3.5.8 Sentimental Analysis
        3.5.9 Visual Analysis
    3.6 Model Evaluation
        3.6.1 Confusion Matrix
        3.6.2 Relaxing the Accuracy Measure
    3.7 Result presentation and documentation
4. PRESENTATION OF FINDINGS
    4.1 Trial 1: Random Model as Baseline Model
    4.2 Trial 2: Multiple Linear Regression
    4.3 Trial 3: Hierarchical Clustering
    4.4 Trial 4: Clustering Based on Genres
    4.5 Trial 5: Logistic Regression Model
    4.6 Trial 6: Multi-Layer Perceptron (MLP) Model
    4.7 Trial 7: Simple Bayesian Network Model
    4.8 Trial 8: Bayesian Network Model with LDA Features
    4.9 Trial 9: Bayesian Network Model with Sentimental Features
    4.10 Trial 10: Bayesian Network Model with Visual Features
5. CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE RESEARCH
REFERENCES
VITA

LIST OF TABLES

TABLE
1. 42 fields of each app in Apple store
2. Business related features of each app in Apple store
3. Game specific features of each app in Apple store
4. Operational related features of each app in Apple store
5. Performance related features of each app in Apple store
6. Visual related features of each app in Apple store
7. Categories or classes of Average User Rating
8. Confusion Matrix
9. An example to show the calculation of modified accuracy measure
10. Random Model
11. Accuracy table for Random model
12. Multiple Linear Regression model
13. Accuracy table for Multiple Linear Regression model
14. Hierarchical Clustering model
15. Percentage distribution of apps between branches of the clusters
16. Clustering based on genres model
17. Logistic Regression model
18. Parameters considered to optimize weights and bias of Logistic Regression model
19. Accuracy table for Logistic Regression model
20. Multi-Layer Perceptron (MLP) model
21. Parameters considered to optimize weights and bias of MLP model
22. Accuracy table for Multi-Layer Perceptron (MLP) model
23. Simple Bayesian network model
24. Accuracy table for Simple Bayesian Network model
25. Bayesian network model with LDA features
26. Accuracy table for Bayesian Network model with LDA features
27. Emotions and Sentiments
28. Bayesian network model with Sentimental features
29. Accuracy table for Bayesian Network model with Sentimental features
30. Bayesian network model with Visual features
31. Accuracy table for Bayesian Network model with Visual features
32. Summary of the results of 10 trials in building the model

LIST OF FIGURES

FIGURE
1. Flow chart describing the steps involved to obtain solution
2. Stacked bar plot: Price vs Number of apps
3. Stacked bar plot: Supported Devices vs Number of apps
4. Stacked bar plot: Languages Supported vs number of apps
5. Stacked bar plot: Release Date vs number of apps
6. Bar plot: Average User Rating vs number of apps
7. Converting supportedDevices field into a numerical value
8. Converting isGameCenterEnabled field into a numerical value
9. Converting version field into a meaningful value
10. Converting languageCodesISO2A field into a numerical value
11. Converting genres field into a numerical value
12. Converting releaseDate field into a numerical value
13. Multi-Layer Perceptron model with one hidden layer
14. Simple Bayesian Network model
15. Features extracted from the description of the game apps
16. Visual Features extracted from the screenshotUrls of the game apps
17. Residual vs fitted values plot for Multiple Linear Regression model
18. Hierarchical Clustering Dendrogram
19. Tukey’s test on genres
20. Distance Matrix of mean difference
21. Multiple Comparison between all pairs of genres
22. Multi-Dimensional Scaling showing the clusters of genres
23. Multi-Layer Perceptron model
24. Bayesian Network with general features
25. Representation of the LDA features to train the model
26. Bayesian Network with LDA features (V1-V5)
27. Extraction of sentiments and emotions from the description
28. Histogram of Emotions on the training data
29. Histogram of Sentiments on the training data
30. Bayesian Network with Sentimental features
31. RGB Image of the screenshot and the histogram of the intensity
32. Gray Image of the screenshot and the histogram of the intensity
33. Bayesian Network with visual features

Chapter 1
INTRODUCTION

1.1 Statement of the Problem

The number of mobile applications in the app market has grown rapidly over the past few years (Statista, n.d.-c). By 2015, Google Play and the Apple App Store, the two major players in the app market, had made around 3 million apps available for users to download (Statista, n.d.-b). Some free apps have reached more than 100 million downloads, and some paid apps more than one million downloads, with very good average user ratings (success) (AndroidRank, n.d.). In contrast, the majority of apps had comparatively few downloads (e.g., fewer than 100,000; AndroidRank, n.d.) and received poor average user ratings (failure).
Failure of a game app in the app market leads to a decrease in business value, which needs to be addressed by the developers or the company. Being able to estimate the number of user ratings a game app will receive after launch is therefore desirable for the business.

1.2 Purpose of the Study

In this work, we analyzed data from one of the major categories in the Apple store, Games (Statista, n.d.-a), and mined this large dataset to build a predictive model that predicts the average rating of an app. The model takes descriptive features of a game (e.g., game genre, number of supported devices, developer’s name, price, and description) as input and predicts the app’s average rating as an indicator of its success in the market.

1.3 Research Questions

This analysis was driven by the following research questions, which arise from the perspectives of two stakeholders: the targeted audience of the game app (users) and the creators of the game app (developers).

With respect to the users’ perspective, the questions are:
1. In each of the game genres, what are the important factors that users care about in mobile games (such as graphics, speed, gameplay mechanics, or storyline)?
2. Do users’ expectations differ when playing a paid game as compared to a free game?

With respect to the developers’ perspective, the questions are:
1. (Before-launch Considerations) What critical features should developers pay attention to when developing a game app?
2. (After-launch Considerations) What aspects of a game can developers improve to increase its chance of success?

1.4 Significance of the Study

Most research on the mobile app market focuses on user reviews and user ratings (Chen, Lin, Hoi, Xiao, & Zhang, 2014; Fu et al., 2013; Kong, Cen, & Jin, 2015; Maalej & Nabil, 2015).
However, these works do not identify the factors that contribute to the success of a mobile app; instead, their goal is to make sense of the reviews and sort them into identified categories, such as whether a review reports bugs or defects that occur in the app. These analyses consider only two parameters, user review and user rating, and omit other important parameters of the apps. To our knowledge, no prior analysis included the general and technical information that developers and companies provide about their apps, such as price, genres, number of supported devices, and file size. In this work, we aim to bridge this gap by taking into consideration all the useful information available for the apps, such as price, genres, number of supported devices, file size, and number of supported languages, supplemented with extra features extracted from the games’ screenshots (such as vibrancy and intensity) and descriptions (such as distributions of discussed topics), to build a predictive model that predicts the two features indicative of a mobile app’s success: average rating and rating count. The predictive model developed in this study is validated and evaluated on test data, and the model-building process is iterated to obtain a model that yields good prediction accuracy. Developers and business administrators could use this model to assess a new game app and predict its chance of success before launch.

1.5 Data Description

We sampled only the game apps from the millions of apps available in the Apple store, since Games is the most popular category among users (Statista, n.d.-a). Technically, the sampling was done by filtering the apps and collecting information from the store for those apps whose primary genre is Games.
We collected the general information of all available game apps from the Apple store. This information includes major features such as price, user rating, file size, genres, and description. The task was carried out with the iTunes Search API (Apple iTunes, n.d.-b), provided by Apple, which returns general information from the Apple store in JSON format. The JSON data had to be processed to extract the features required for our analysis; we used Python and R scripts to extract and transform those features.

1.6 Definitions of Terms

In this section, we define the key words, acronyms, and phrases used in this thesis.

JSON. JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format. JSON defines a small set of formatting rules for the portable representation of structured data (ECMA-404, 2013).

App. A mobile app is a software program designed for small computing devices such as tablets and smartphones (AmericanDialect, n.d.).

Apple store. The Apple store is an online marketplace for purchasing software applications that are specific to Apple computers and devices (Apple iTunes, n.d.-a).

iTunes Search API. The iTunes Search API lets developers place search fields to search for content within the iTunes Store, App Store, iBooks Store, and Mac App Store. It is possible to search for a variety of content, including apps, iBooks, movies, podcasts, music, music videos, audiobooks, and TV shows (Apple iTunes, n.d.-b).

Deep Learning. A class of machine learning techniques that exploit many layers of non-linear information processing for supervised or unsupervised feature extraction and transformation, and for pattern analysis and classification (Deng & Yu, 2013).
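As a minimal sketch of the collection step described in Section 1.5, and not the scripts used in this study, the Python below builds a Search API query URL and filters a JSON response down to game apps. The function names, the search term, and every value in the sample response are hypothetical; the field names follow the app fields listed in Table 1, and the response layout (a `results` array) and the query parameters are assumptions based on Apple's public documentation.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "https://itunes.apple.com/search"

# Fields kept for analysis; names follow the app fields in Table 1.
FIELDS = ["trackName", "price", "averageUserRating", "userRatingCount"]


def build_search_url(term, country="us", limit=50):
    """Build a query URL for iOS apps (entity=software)."""
    params = {"term": term, "country": country, "entity": "software", "limit": limit}
    return SEARCH_URL + "?" + urlencode(params)


def extract_games(response_text, fields=FIELDS):
    """Parse a JSON response and keep apps whose primary genre is Games."""
    payload = json.loads(response_text)
    games = []
    for app in payload.get("results", []):
        if app.get("primaryGenreName") != "Games":
            continue  # skip business, education, travel, and other apps
        games.append({name: app.get(name) for name in fields})
    return games


# Hand-written stand-in for a real response; all values are invented.
SAMPLE_RESPONSE = json.dumps({
    "resultCount": 2,
    "results": [
        {"trackName": "Example Puzzle", "primaryGenreName": "Games",
         "price": 0.99, "averageUserRating": 4.5, "userRatingCount": 120},
        {"trackName": "Example Notes", "primaryGenreName": "Productivity",
         "price": 0.0, "averageUserRating": 3.0, "userRatingCount": 8},
    ],
})

games = extract_games(SAMPLE_RESPONSE)
print(build_search_url("puzzle"))
print(games)
```

The sketch deliberately avoids a network call so that it runs offline; in practice the URL would be fetched (for example with `urllib.request.urlopen`) and the response text passed to `extract_games`.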
1.7 Limitations

The analysis relied on historical data about the game apps collected from the Apple Store, and it assumes that the market will continue to evolve as steadily as those data suggest. Hence, any extreme change in the market or its trends may cause the predictive model to be inaccurate.

1.8 Assumptions

The description field provides a short explanation of the game, and we assumed that the features extracted from it would help us understand the core functions and characteristics of the game.

1.9 Organization of Thesis Chapters

The thesis chapters are organized as follows: Chapter 2 covers the literature review, which describes the related works that informed this study. Chapter 3 presents the methodology used to build and evaluate the model that makes predictions on test data. Chapter 4 presents the results and findings of each trial conducted to train the model. Chapter 5 summarizes the results and findings of the study and draws conclusions and recommendations for future work on the topic.

Chapter 2
LITERATURE REVIEW

2.1 iOS App Analysis

Few researchers have analyzed the mobile app market, mainly because the prevalent app markets are young; the iOS App Store officially launched in 2008. The existing analyses (Chen et al., 2014; Finkelstein et al., 2014; Fu et al., 2013; Kong et al., 2015; Maalej & Nabil, 2015) mainly focus on user reviews to determine the major bugs that lead to an app’s failure in the market. These researchers were also able to find the reasons for inconsistencies in the reviews and why a user liked or disliked an app. Based on their findings, they were able to describe how user reviews evolved over time. They proposed models that choose the apps with the best user experience, identify the reasons end users love or hate an app, and spot the problematic apps in the market.
They used a regularized regression model to predict the user rating score from user comments. Their other major contributions include topic modeling to group apps based on information from the user reviews.

2.2 Probabilistic Graphical Models (Bayes Network)

A Bayesian network is a graphical model that encodes the probabilistic relationships among variables of interest (Heckerman, 1997). In many previous works, Bayesian networks were used to determine the relationships between variables, learn a model from data, and make predictions. They appear to be a good fit for problems in which overfitting the data is a concern. Many researchers have described methods for constructing a Bayesian network and have used statistical tools to improve the model’s predictions (Heckerman, 1997).

2.3 Predictive Modeling

Many researchers have used predictive modeling, which involves creating, testing, validating, and evaluating a model to make probabilistic predictions about future events (Guzman & El-Halaby Muhammad, 2015). The models are usually built from data collected in the past to make predictions about the future. The best-known algorithms used in predictive modeling are regression, association, classification, clustering, decision trees, neural networks, and ensemble models that combine multiple predictive algorithms (Deng & Yu, 2013; Nielsen, n.d.; Özel & Karpat, 2005). These studies show how to apply these methods to different data to make predictions, and they compare the results of different models to explain which model performs best on a given problem.

Chapter 3
METHOD OF PROCEDURE

3.1 Design of Study

The work in this thesis proceeded in stages, as shown in Figure 1. The stages are iterative and their boundaries overlap. In many instances, it was necessary to loop back and forth between stages to make progress in the analysis.
The stages of our thesis work were as follows:
  Defining the goal
  Collecting the data
  Exploring and managing the data
  Building the model
  Evaluating and critiquing the model
  Presenting the results and documentation

Figure 1. Flow chart describing the steps involved to obtain solution.

3.2 Defining the Goal

The first stage of our analysis was to define the goal clearly and precisely. The goal should be measurable, so that we know our target and stay focused on it rather than drifting away from it. A clearly defined goal allowed us to assign objectives to each stage of the analysis. The following factors were considered in defining the goal:
  The resources available at the start of the analysis.
  The resources required to conduct the analysis.
  The time required by each stage of the analysis.
  The time required to achieve the target, i.e., to complete the entire analysis.
  The audience targeted by the analysis.
  The goal should have a concrete ending point.

Considering all these factors, the major goal of this study is to identify the factors that influence the prediction of a mobile game’s success in the app market.

A well-defined goal leads to hypotheses, which can then be turned into effective research questions. Once the thesis goal was defined, the next stage was to collect the data for the analysis.

3.3 Collecting the Data

This stage focused on identifying the right data for the analysis, collecting the data from the required source, and preparing the data for the next stage of the analysis.

As discussed in the introduction, Google Play and the Apple Store were the top players in the app market according to statistics released in 2015 (Statista, n.d.-b). Of the two, the Apple Store provides the iTunes Search API, through which developers and researchers can search a variety of content about the apps in the App Store (Apple iTunes, n.d.-b).
Our analysis required information about the game apps available in the app market. Hence, we collected data about game apps from the Apple Store using the iTunes Search API. Through the same API, the Apple Store also provides information about apps in other categories, such as business, education, entertainment, and travel (Apple iTunes, n.d.-b); however, we filtered for the game apps required by the analysis. Each game app in the Apple store is described by 42 fields that provide general information about the app, as shown in Table 1. Through the iTunes Search API, we collected information about 130,000 game apps from the Apple Store, after which the data were ready for exploration in the next stage of the analysis.

3.4 Exploring and Managing the Data

In exploring and managing the data, we cleaned the data, handled errors, transformed fields, and sampled the data. It is important to explore and manage the collected data before moving to the modeling stage of the analysis. The outcomes of this stage were as follows:
  It gave a clear idea of the data quality and the data quantity.
  It helped in selecting the features for the analysis.
  It helped in filtering out undesirable features that were not required for the analysis.
  It handled the missing values in the data.
  It spotted problems and cleaned the data.
  It transformed the fields into a usable format for the analysis.
- The data were sampled for modeling and validation.

Table 1
42 fields of each app in Apple store

_id | currency | formattedPrice
kind | genres | trackCensoredName
features | artistId | languageCodesISO2A
supportedDevices | genreIds | fileSizeBytes
isGameCenterEnabled | releaseDate | sellerUrl
screenshotUrls | sellerName | contentAdvisoryRating
ipadScreenshotUrls | bundleId | averageUserRatingForCurrentVersion
artworkUrl60 | trackId | userRatingCountForCurrentVersion
artworkUrl512 | trackName | artworkUrl100
artistViewUrl | primaryGenreName | trackViewUrl
artistName | primaryGenreId | trackContentRating
price | releaseNotes | averageUserRating
version | minimumOsVersion | userRatingCount
description | wrapperType | advisories

3.4.1 Exploring the Data

The first step in data exploration and management was to select and categorize the features based on practice.

3.4.1.1 Categorizing the features. We decided to group the features into the following categories:

- Business related features
- Game specific features
- Operational related features
- Performance related features
- Visual related features

Business related features. These features captured how the company tries to trade the app in the market and target the customers. Table 2 lists the business related features.

Table 2
Business related features of each app in Apple store

| S.No. | Features | Description about the features |
|---|---|---|
| 1 | price | Price of the game app |
| 2 | supportedDevices | List of devices supported by the game app |
| 3 | languageCodesISO2A | Languages supported by the game |
| 4 | releaseDate | Release date of the game app |
| 5 | version | Current version of the game app |

Game specific features. These features captured both the technical and the functional aspects of the game. Table 3 lists the game specific features.

Table 3
Game specific features of each app in Apple store

| S.No. | Features | Description about the features |
|---|---|---|
| 1 | genres | List of categories under which the game is grouped |
| 2 | fileSizeBytes | Size of the game app |
| 3 | description | Short description about the game |
| 4 | isGameCenterEnabled | Access to the Game Center |

Operational related features. These features captured how responsive the company is in answering users’ requests. Table 4 lists the operational related features.

Table 4
Operational related features of each app in Apple store

| S.No. | Features | Description about the features |
|---|---|---|
| 1 | releaseNotes | Developers’ notes on the current release of the game app |

Performance related features. These features captured how the application performs in the market. Table 5 lists the performance related features.

Table 5
Performance related features of each app in Apple store

| S.No. | Features | Description about the features |
|---|---|---|
| 1 | averageUserRating | Average user rating for the entire game app |
| 2 | userRatingCount | Number of users who rated the entire game app |
| 3 | averageUserRatingForCurrentVersion | Average user rating for the current version |
| 4 | userRatingCountForCurrentVersion | Number of users who rated the current version of the app |

Visual related features. These features captured the visual appeal of the game to its audience. Table 6 lists the visual related features.

Table 6
Visual related features of each app in Apple store

| S.No. | Features | Description about the features |
|---|---|---|
| 1 | vibrancy | Energy and dynamism of an image |
| 2 | intensity | Quality or brightness of an image |

3.4.1.2 Visualizing the data. The next step in the exploring stage was to visualize the data.

Stacked bar plot. The stacked bar plot is the histogram of discrete data (Mount, By, Porzak, & Mount, n.d.). Stacked bar plots of the categorical features (price, supported devices, languages, release date, and version) against averageUserRating were created to visualize the distribution of each feature and the distribution of averageUserRating within it.
Figure 2. Stacked bar plot: Price vs Number of apps.

Our initial impressions of these visualizations were as follows:

From the price vs. number of apps plot (Figure 2), we can see that more than 35,000 game apps are free of cost and that very few game apps cost more than $5.99 in the apps market. The graph exhibits a unimodal distribution (a single highest mode) of price.

Figure 3. Stacked bar plot: Supported Devices vs Number of apps.

From the supported devices vs. number of apps plot (Figure 3), we can see that more than 35,000 game apps support either 19 or 23 devices and that very few game apps support fewer than 15 devices in the apps market. The graph exhibits a multimodal distribution (more than one mode) of supported devices.

From the languages supported vs. number of apps plot (Figure 4), we can see that more than 45,000 game apps support the primary language, English, and that very few game apps support more than one language. The graph exhibits a unimodal distribution of languages supported.

Figure 4. Stacked bar plot: Languages Supported vs number of apps.

From the release date vs. number of apps plot (Figure 5), we can see that the number of game apps released every year gradually increased from 2008. More than 10,000 game apps were released in each of the years 2011 and 2012.

From the averageUserRating vs. number of apps plot (Figure 6), we can see that more than 40,000 game apps have a user rating greater than or equal to 3, which can be labeled as success in the apps market, and fewer than 10,000 game apps have a user rating less than 3, which indicates some problem in those game apps and can be labeled as failure in the apps market.

Figure 5. Stacked bar plot: Release Date vs number of apps.

Figure 6. Bar plot: Average User Rating vs number of apps.

Through the exploration stage, we acquired a feel for the data, i.e., the quality of the data and the relationships between the variables.
We then moved on to managing the data to make it ready for the modeling stage of the analysis.

3.4.2 Managing the Data

The second step in data exploration and management was to clean and sample the data. In cleaning the data, we handled the missing values and transformed the data. In sampling the data, we divided the data into two subsamples, a training set and a test set.

3.4.2.1 Cleaning the data. The first step in managing the data was to clean the data by handling the missing values and transforming the data. A raw dataset is rarely perfect, and most of the time its features are dirty and inconsistent. We had to ensure that the data were clean and free of errors before taking them to the modeling stage of the analysis.

Handle the missing values. While summarizing the data, we found that 60,000 game apps did not have a value for averageUserRating. Our first major decision was whether or not to include this feature in our analysis. Since averageUserRating is one of the most important features in predicting the success of a game app, we decided to include this feature and delete the game apps with missing values. Hence, we ended up with 56,000 game apps for the analysis.

Data Transformation. The main reason for data transformation is to ease the process of modeling and to understand the data better. We focused on converting features to numeric data and continuous features to discrete data.

Converting features to numeric data. Some of the features mentioned above were not readily usable in the analysis. They needed to be transformed or extracted into a meaningful format to build the model. The notable features to which a transformation was applied are as follows.

supportedDevices is a list of the names of the supported devices, which was converted into a numerical value by counting the number of devices as shown in Figure 7.

Figure 7. Converting supportedDevices field into a numerical value.
isGameCenterEnabled is a Boolean value that takes either true or false and was converted into the equivalent binary value of 1 or 0 as shown in Figure 8.

Figure 8. Converting isGameCenterEnabled field into a numerical value.

version is a string that has the format “major.minor.build.” For this analysis, we were mainly concerned with the major releases, and hence we omitted the minor and build parts of the version field as shown in Figure 9.

Figure 9. Converting version field into a meaningful value.

languageCodesISO2A is an array of the names of the languages supported by the game, which was converted into a numerical value by counting the number of languages supported as shown in Figure 10.

Figure 10. Converting languageCodesISO2A field into a numerical value.

genres is a list of the categories under which the game could be grouped, which was converted into a numerical value by counting the number of categories included in the genres as shown in Figure 11.

Figure 11. Converting genres field into a numerical value.

releaseDate consists of the year, month, day, and time of release of the game to the market. We were mainly concerned with the year of release, and thus omitted the other details of the field as shown in Figure 12.

Figure 12. Converting releaseDate field into a numerical value.

Converting continuous features to discrete data. The two major reasons to convert continuous features into discrete ones are as follows:

- Exact values matter less than the range in which the values fall.
- A nonlinear relationship exists between the input and the output variable.

Taking these two reasons into consideration, we converted the following features into discrete (categorical) data:

- price
- supportedDevices
- languagesSupported
- isGameCenterEnabled
- version
- releaseDate
- genres
- averageUserRating

3.4.2.2 Sampling the data.
During the development stage of the modeling procedure, we needed data to build the model and also data to test and debug the model on small subsamples, so that the model’s performance could be validated against new data.

Training set. In our analysis, 80% of the entire dataset (i.e., 44,800 game apps) was used as the training set, which was fed to the model-building algorithms in the modeling phase of the analysis.

Test set. The remaining 20% of the entire dataset (i.e., 11,200 game apps) was used as the test set, or hold-out data, which was fed to the resulting model to validate its performance and to check that it makes correct predictions.

At the end of the Exploring and Managing the Data stage, we had quality data that were free of missing values and to which the necessary transformations had been applied. At that point, the data were ready to be modeled and tested in the modeling stage of the analysis.

3.5 Modeling

Modeling is the process of extracting useful information from the data in order to achieve the desired goal. Modeling in data science is applied through various machine learning methods. Machine learning is a data analysis process of finding hidden insights by iteratively running the model-building algorithm (Mount et al., n.d.).

Mapping the thesis goal to the machine learning task is an important part of the modeling stage of the analysis. In our analysis, the success of a game app was quantified by its averageUserRating; the higher the averageUserRating, the more successful the game app is deemed to be. As such, we formulated the task of predicting game apps’ success as a classification problem, with averageUserRating acting as the target or output variable to be predicted and all other features as input variables. We further defined nine levels of success, represented by nine ranges of averageUserRating values, as shown in Table 7.

During model building, we measured the quality of the model during the training and also after the training.
The former is called model evaluation and the latter is called model validation. Model evaluation is done on the same dataset used for training the model, whereas model validation is done on a new dataset, called the test set or hold-out data, which is split off before training the model. In our analysis, accuracy was the major performance metric used to evaluate the model’s performance. The model evaluation is discussed in detail in the Model Evaluation section (Section 3.6).

Table 7
Categories or classes of Average User Rating

| Categories or classes | Average User Rating |
|---|---|
| Category-1 | 1 |
| Category-2 | 1.5 |
| Category-3 | 2 |
| Category-4 | 2.5 |
| Category-5 | 3 |
| Category-6 | 3.5 |
| Category-7 | 4 |
| Category-8 | 4.5 |
| Category-9 | 5 |

3.5.1 Random Model (Null Model)

The random model is a null model that makes “random guesses” (Mount et al., n.d.). Our actual model should work better than the null model, which acts as the lower bound on model performance. The random model can be thought of as the simplest possible model, built by merely making random predictions. In our analysis, the null model had an 11% chance (the probability of correctly picking one class out of nine) of predicting averageUserRating correctly. The actual models that we built performed better than the null model, i.e., the simplest possible model.

3.5.2 Multiple Linear Regression

In choosing between different methods to build a model, it is always good to start with a linear regression model, which predicts an outcome and also explains the relationship between the input and the output variables used to build the model. Once we had an idea of the relationship between the variables, it was easier to select the right model to improve accuracy. In multiple linear regression, we modeled the relationship between the output variable and the input variables as a straight line, i.e., the output variable was modeled as a linear function of the input variables (Mount et al., n.d.).
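To make the idea of a linear function of the input variables concrete, the following is a minimal sketch of an ordinary least squares fit via the normal equations, written in plain Python. It is not the tool used in the thesis; the function names `fit_linear` and `predict` are hypothetical, and the sketch assumes the input columns are not perfectly collinear.

```python
def fit_linear(X, y):
    """Ordinary least squares: solve (X^T X) b = X^T y.

    X is a list of feature rows, y the list of targets. Returns
    [intercept, coef_1, ..., coef_k]. Assumes X^T X is invertible.
    """
    rows = [[1.0] + list(r) for r in X]      # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(coef, x):
    """Predicted output: intercept plus weighted sum of the inputs."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))
```

In the setting of this chapter, each row of `X` would hold the numeric features of one game app and `y` its averageUserRating; how a continuous prediction is mapped back to a rating class is left open here.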
In our analysis, the actual model-building process started with a multiple linear regression model with averageUserRating as the output variable and price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input variables. The results of the multiple regression model were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

3.5.3 Hierarchical Clustering

Clustering is the process of partitioning the data into groups such that the points in the same cluster are more similar to each other than to the points in other clusters (Mount et al., n.d.). Hierarchical clustering is one of the most popular techniques used in clustering analysis; it builds clusters within clusters, presented as a dendrogram. A dendrogram is a graphical representation of nested clusters in the form of trees (Mount et al., n.d.).

In our analysis, clustering was used to find any significant differences between the game apps. The dissimilarity in clustering was represented as a distance function. We used the Euclidean distance function, the most commonly used distance function, to cluster the game apps based on the distance between two points in Euclidean space (Mount et al., n.d.). The results of the hierarchical clustering were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

3.5.4 Logistic Regression Model

In the logistic regression model, prediction is done by projecting an input vector onto a set of hyperplanes, each of which corresponds to a class; the distance from the input to a hyperplane reflects the probability that the input belongs to that particular class (Deeplearning.net, n.d.-b). Theano is a Python framework used to write deep learning models; deep learning is an emerging area of machine learning research (Deeplearning.net, n.d.-a).
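The prediction step of a multi-class logistic regression can be sketched in a few lines. The version below is plain Python rather than Theano, and it assumes the usual softmax formulation, in which each class has one weight vector and one bias; the names `softmax`, `class_probabilities`, and `predict_class` are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(W, b, x):
    """One score per class: dot(W[k], x) + b[k], then softmax."""
    scores = [sum(w * v for w, v in zip(row, x)) + bias
              for row, bias in zip(W, b)]
    return softmax(scores)


def predict_class(W, b, x):
    """Index of the most probable class."""
    probs = class_probabilities(W, b, x)
    return max(range(len(probs)), key=probs.__getitem__)
```

For the nine rating classes of Table 7, `W` would have nine rows, one per class, and the predicted index would be mapped back to a rating between 1 and 5.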
To apply deep learning techniques, which are among the more advanced machine learning techniques, in our analysis, we started with a simple logistic regression model built with the Theano framework and then moved on to a multi-layer perceptron model, also built with Theano, which is viewed as a deep learning model with multiple layers of abstraction. The results of the logistic regression model were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

3.5.5 Multi-Layer Perceptron Model

The Multi-Layer perceptron (MLP) model is an Artificial Neural Network (ANN), which models the relationship between a number of input variables and an output variable using a model of how a biological brain responds to stimuli from sensory inputs (Nielsen, n.d.). An MLP uses a network of artificial neurons, called nodes, to solve machine learning problems. MLP models are applied to problems where there is a complex relationship between the input variables and the output variable. These models consist of a number of intermediate layers, called hidden layers; the weights and biases assigned to the nodes of the network are optimized so that the model performs with the best possible accuracy (Deeplearning.net, n.d.-c). A simple example of an MLP model is shown in Figure 13.

Figure 13. Multi-Layer Perceptron model with one hidden layer (Deeplearning.net, n.d.-c).

In our analysis, we constructed a Multi-Layer perceptron model with 500 hidden layers to classify averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input parameters. The results of the Multi-Layer perceptron model were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.
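The forward pass of an MLP with a single hidden layer, as in Figure 13, can be sketched as follows. This is plain Python, not the Theano implementation used in the analysis; the tanh hidden activation and softmax output are assumptions about the architecture, and training (the optimization of weights and biases) is omitted.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, biases, inputs):
    """One fully connected layer: a weighted sum plus bias per node."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(x, W1, b1, W2, b2):
    """Input -> tanh hidden layer -> softmax class probabilities."""
    hidden = [math.tanh(a) for a in dense(W1, b1, x)]
    return softmax(dense(W2, b2, hidden))
```

Here `W1` has one row per hidden node and `W2` one row per output class, so the number of hidden nodes is set simply by the shape of `W1`.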
3.5.6 Bayesian Network Model

A Bayesian network combines graph theory and probability theory to represent the probability distribution over a set of random variables using nodes and directed arcs (Heckerman, 1997). A simple Bayesian network model is shown in Figure 14. Each node in a Bayesian network represents a random variable, and the directed arcs represent its probabilistic dependencies on other random variables. The structure of a Bayesian network represents the qualitative relationships between the variables, and the algorithms that learn this structure are classified as constraint-based or score-based learning algorithms (Scutari, 2009).

Figure 14. Simple Bayesian Network model.

In our analysis, we implemented a score-based learning algorithm to train the model; it assigns a score to each candidate Bayesian network and tries to maximize the score using heuristic search algorithms such as hill-climbing or tabu search (Scutari, 2009). Initially, we built a Simple Bayesian Network model to classify averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input parameters. Later on, we tried adding more features to the network, which is discussed in the following sections. The results of the Simple Bayesian Network model were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

3.5.6.1 Adding features to the model. So far, we had built models ranging from simple machine learning procedures (regression and clustering methods) to more complex and advanced machine learning procedures (Bayesian network and deep learning methods), all with the same set of input variables. In order to further improve the performance of the model, we decided to extract and add some more features as input to the model, which led to the Latent Dirichlet Allocation analysis, the Sentimental analysis, and the Visual analysis.
3.5.7 Latent Dirichlet Allocation (LDA) Analysis

LDA analysis on the game apps is a method of automatically extracting distinct topics from the collection of descriptions of the game apps (Blei, Ng, & Jordan, 2003). Each topic has probabilities of generating various words from the descriptions, such as challenging, puzzle, gameplay, weapon, etc.

In our analysis, we generated five topics from the descriptions of the game apps using Latent Dirichlet Allocation, a text mining algorithm (Figure 15), and recorded the probability distribution of the words from the descriptions over those topics. We named the features Topic1, Topic2, Topic3, Topic4, and Topic5 and included them in our analysis. We constructed a Bayesian network model to classify averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input parameters, together with Topic1, Topic2, Topic3, Topic4, and Topic5. The results of the Bayesian Network model with LDA features were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

Figure 15. Features extracted from the description of the game apps.

3.5.8 Sentimental Analysis

Sentimental analysis is the process of evaluating and classifying the emotions expressed in the description of the game apps (Cran.r-project.org, n.d.).

The NRC Word-Emotion Association Lexicon is a library of words in which each word is associated with scores for eight different emotions (anger, anticipation, disgust, joy, fear, sadness, surprise, and trust) and two sentiments (positive and negative) (Cran.r-project.org, n.d.). The total for each emotion is calculated by adding up the scores of the individual words of the sentence for that emotion. Some English words are neutral and are not included in the library.
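The scoring scheme described above can be illustrated with a toy lexicon. The three entries below are made up for the example and are not taken from the NRC lexicon, and `emotion_scores` is a hypothetical helper; the analysis itself relied on the lexicon cited above.

```python
import re
from collections import Counter

# Toy lexicon for illustration only (NOT actual NRC entries).
TOY_LEXICON = {
    "win": {"joy": 1, "positive": 1},
    "enemy": {"anger": 1, "fear": 1, "negative": 1},
    "surprise": {"surprise": 1},
}


def emotion_scores(text, lexicon=TOY_LEXICON):
    """Sum per-emotion scores over all lexicon words in the text.

    Words that are not in the lexicon are treated as neutral and
    contribute nothing.
    """
    totals = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        totals.update(lexicon.get(word, {}))
    return dict(totals)
```

Applied to an app description, the resulting totals (one per emotion and sentiment) could then serve as additional input features for a classifier.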
In our analysis, a sentiment analysis was conducted on the description of the game apps (Figure 15), and the scores of emotions and sentiments were recorded and included as features. A Bayesian network model was built to classify averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input parameters. The results of the Bayesian Network model with Sentimental features were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

3.5.9 Visual Analysis

A visual analysis was used to capture the intensity of the screenshots of the game apps, which we included as features in the analysis (Figure 16). The idea was to download the image from the link given in the screenshotUrls field of each game app and to extract the intensity matrices of the red, blue, and green colors of the image. The size of each intensity matrix was equal to the pixel size of the image. The mean, maximum, and minimum values of the matrices were calculated and included as features in the analysis. The downloaded image was also converted into a gray image, and the mean, maximum, and minimum values were calculated from its intensity matrix and included as features in the analysis.

A Bayesian network model was built to classify averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input parameters, together with the visual features mean_red, max_red, min_red, mean_blue, max_blue, min_blue, mean_green, max_green, min_green, mean_gray, max_gray, and min_gray. The results of the Bayesian Network model with visual features were tabulated and the findings are discussed in the presentation of findings chapter of the thesis.

Figure 16. Visual Features extracted from the screenshotUrls of the game apps.
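The twelve visual features named above can be computed as in the following sketch, which works on an image that has already been decoded into rows of (red, green, blue) tuples. Downloading and decoding the screenshot is omitted, and the gray conversion uses the common 0.299/0.587/0.114 luminance weights, which is an assumption since the conversion formula is not stated in the text.

```python
def visual_features(pixels):
    """Mean, max, and min intensity per color channel and for gray.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples
    with values from 0 to 255. The gray weights are an assumption.
    """
    flat = [p for row in pixels for p in row]
    channels = {
        "red": [p[0] for p in flat],
        "green": [p[1] for p in flat],
        "blue": [p[2] for p in flat],
        "gray": [0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]
                 for p in flat],
    }
    features = {}
    for name, values in channels.items():
        features["mean_" + name] = sum(values) / len(values)
        features["max_" + name] = max(values)
        features["min_" + name] = min(values)
    return features
```

The returned dictionary uses the same feature names as the text (mean_red, max_red, min_red, and so on), so it could be merged directly with the other features of an app.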
3.6 Model Evaluation

The questions that needed to be answered after building the model were as follows:

- How well did the model perform?
- How many predictions made by the model were correct?
- How many incorrect predictions were made by the model?
- How accurate was the model in making correct predictions?
- Did the model perform better than “the randomly guessed” values?

To answer these questions, the model was first evaluated against the test set, or hold-out data. Since our analysis was a classification problem with nine categories (Table 7), we used accuracy, the most common measure for evaluating classification problems.

3.6.1 Confusion Matrix

A confusion matrix summarizes a classifier’s accuracy by tabulating the predicted values against the actual values. It is a table that counts the predictions of each type against the actual ones, as shown in Table 8. Furthermore, various statistical tools such as charts and plots were used to interpret the results, and comparisons were made between the models to visualize graphically how well each model was trained to predict the success factors of an application.

Table 8
Confusion Matrix

| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | a | b |
| Actual Positive | c | d |

a = True Negative, b = False Positive, c = False Negative, d = True Positive

3.6.2 Relaxing the Accuracy Measure

The accuracy measure based on the confusion matrix described above was very stringent. For instance, if a ground truth value is 3.5, a predictor is flagged as incorrect regardless of whether the returned value is 3 or 1. In our project, we relaxed this strictness to give a predictor credit whenever its predictions are “close enough.” More specifically, in the example above, if the predictor returns “3,” i.e., less than or equal to half a star away from the actual value, we counted that as an acceptable prediction, while a value of “1,” i.e., more than half a star away, is treated as a miss.
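The relaxed measure can be expressed as a short function. The sketch below assumes ratings are given as numbers on the half-star scale of Table 7; the names `is_acceptable` and `relaxed_accuracy` are illustrative.

```python
def is_acceptable(actual, predicted, tolerance=0.5):
    """A prediction counts as correct when it lies within
    `tolerance` stars of the actual rating."""
    return abs(actual - predicted) <= tolerance


def relaxed_accuracy(actual, predicted, tolerance=0.5):
    """Fraction of predictions that are within the tolerance."""
    hits = sum(is_acceptable(a, p, tolerance)
               for a, p in zip(actual, predicted))
    return hits / len(actual)
```

With the example from the text, `is_acceptable(3.5, 3)` is true and `is_acceptable(3.5, 1)` is false; setting `tolerance=0` recovers the strict accuracy measure.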
As such, we relaxed the accuracy measure by a threshold value of 0.5 (averageUserRating) during the evaluation of the model, so that a predicted value (averageUserRating) within the 0.5 neighborhood of the actual value (averageUserRating) was defined as correct. Another example is shown in Table 9.

Table 9
An example to show the calculation of modified confusion matrix

| Actual averageUserRating | Predicted averageUserRating | Acceptance (Correct/Incorrect) |
|---|---|---|
| 4 | 3.5 | Correct |
| 4 | 4 | Correct |
| 4 | 4.5 | Correct |
| 4 | other values | Incorrect |

3.7 Result Presentation and Documentation

Presenting the results and documenting them is the final step of any data science process. It is always important to present the results in a meaningful and legible way so that the audience can comfortably understand the research being carried out. The results and findings of our analysis are presented in Chapter 4.

Chapter 4
PRESENTATION OF FINDINGS

In the modeling stage of our analysis, we followed different approaches to build a model that would predict averageUserRating. We also evaluated each model on the test set, or hold-out data, to measure its performance through accuracy metrics. In this chapter, we discuss the results and findings of the 10 different approaches we used to build and evaluate the model.

4.1 Trial-1: Random Model as Baseline Model

The first model we were interested in building was a null model that simply makes “random guesses.” We sampled random categorical values for averageUserRating and compared them with the actual values of averageUserRating on the test set. This model acted as the baseline for the rest of the analysis, and each actual model aimed to beat the performance of the null model. Table 10 details the parameters considered in this trial.

Table 10
Random model

| Parameter | Value |
|---|---|
| Number of observations | 11208 |
| Input Variables | Random AverageUserRating |
| Target Variable | Actual AverageUserRating |

Table 11 displays data regarding the accuracy achieved at the end of trial-1.
Table 11
Accuracy table for Random model

| On the test data | |
|---|---|
| Total observations | 11208 |
| Correctly predicted | 3590 |
| Incorrectly predicted | 7618 |
| Accuracy on making right predictions | 32.03% |

Findings

The random model produced an accuracy measure of 32.03% on the test set. For the rest of the analysis, we focused on improving on this baseline accuracy, which was achieved by randomly predicting the outcome of the target variable averageUserRating.

4.2 Trial-2: Multiple Linear Regression

In Trial-2, we built a multiple linear regression model to predict averageUserRating using price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, and releaseDate as input features to the model. We trained the model using 44,832 game apps and tested the model on the test set of 11,208 game apps. Table 12 details the parameters considered in this trial.

Table 12
Multiple Linear Regression model

| Parameter | Value |
|---|---|
| Number of observations in training data | 44832 |
| Number of observations in test data | 11208 |
| Input Variables | 1. Price 2. Supported Devices 3. Languages Supported 4. Genres 5. IsGameCenterEnabled 6. Version 7. ReleaseDate 8. FileSize 9. UserRatingCount |
| Target Variable | 1. AverageUserRating |

Figure 17. Residual vs Fitted values plot for Multiple Linear Regression model.

We plotted the predicted response of the linear model against the residual of the linear model in Figure 17. The residual of the linear model is the difference between the actual value and the predicted response. Each circle in the plot corresponds to a game app, and its distance from the zero line indicates how poor the prediction was for that particular game app. The regression assumes homoscedasticity, i.e., that the variance of the residuals does not change as a function of x. If that assumption holds, the red line should be relatively flat along the zero line. However, we observed non-linearity in the plot.

Table 13 displays data regarding the accuracy achieved at the end of trial-2.
Table 13
Accuracy table for Multiple Linear Regression model

| On the test data | |
|---|---|
| Total observations | 11208 |
| Correctly predicted | 5868 |
| Incorrectly predicted | 5340 |
| Accuracy on making right predictions | 52.35% |

Findings

The multiple linear regression model yielded an accuracy measure of 52.35%, which was an improvement over the baseline null model. Although the model improved on the random null model, the independent variables showed little significance with respect to the dependent variable, as shown in Figure 17.

4.3 Trial-3: Hierarchical Clustering

In Trial-3, we built a hierarchical clustering model to observe any significant difference between the game apps. The input parameters to the model were price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and averageUserRating. The output of the model was a dendrogram, which displays nested clusters in the form of trees. We used Euclidean distance as the dissimilarity measure in Euclidean space. Table 14 details the parameters considered in this analysis.

Table 14
Hierarchical Clustering model

| Parameter | Value |
|---|---|
| Number of observations in training data | 10000 |
| Input Variables | 1. Price 2. Supported Devices 3. Languages Supported 4. Genres 5. IsGameCenterEnabled 6. Version 7. ReleaseDate 8. FileSize 9. UserRatingCount 10. AverageUserRating |
| Distance Function | Euclidean Distance |

Figure 18. Hierarchical Clustering Dendrogram.

The dendrogram of the hierarchical clustering (Figure 18) showed that 95% of the game apps are in a single branch, indicating that most of the apps exhibit similar characteristics, and that only 5% of the apps are spread over all the other branches, indicating little difference between the game apps. Table 15 shows the percentage distribution of the game apps between the different branches of the nested clusters.

Table 15
Percentage distribution of apps between branches of the clusters

| Branch | No. of apps in a branch | Percentage |
|---|---|---|
| Branch-1 | 9569 | 95% |
| Branch-2 | 105 | 1% |
| Branch-3 | 87 | 0.87% |
| Branch-4 | 37 | 0.37% |
| Branch-5 | 33 | 0.33% |
| Branch-6 | 8 | 0.08% |
| Branch-7 | 7 | 0.07% |
| Branch-8 | 6 | 0.06% |
| Branch-9 | 3 | 0.03% |

Findings

Our results from applying hierarchical clustering showed that very few game apps differ significantly from the other game apps, which means that most of the game apps in the entire collection exhibit similar characteristics.

4.4 Trial-4: Clustering Based on Genres

In Trial-3 of hierarchical clustering, we did not see any significant difference between the game apps. So, in Trial-4 of the clustering analysis, we were interested in finding the difference between the apps by grouping them based on genres. Table 16 details the parameters considered in this analysis.

Table 16
Clustering based on genres model

| Parameter | Value |
|---|---|
| Number of observations | 50600 |
| Input Variables | 1. Price 2. Supported Devices 3. Languages Supported 4. Genres 5. IsGameCenterEnabled 6. Version 7. ReleaseDate 8. FileSize 9. UserRatingCount 10. AverageUserRating |
| Distance Function | Custom-built distance function based on Tukey’s test (Figure 20) |

4.4.1 Tukey’s test

Tukey's test is a single-step multiple comparison procedure and statistical test that is used on raw data to find means that are significantly different from each other (Brillinger, 1984; Mtu.edu, n.d.). In this trial, we were particularly interested in finding the difference between the groups using Tukey’s test and then building a distance matrix. Figure 19 displays a screenshot of the result of the Tukey’s test and how it was interpreted to construct a distance matrix for clustering. The distance matrix was used as an input to cluster the game apps based on genres.

Figure 19. Tukey’s test on genres.

Interpreting the results of Tukey’s test. The results of the Tukey’s test are in Figure 19.
The green box in Figure 19 indicates that the mean difference between the groups is greater than the threshold of 0.05; hence we rejected the null hypothesis and concluded that there is a significant mean difference between the groups. Conversely, the red box in Figure 19 indicates that the mean difference between the groups is less than the threshold of 0.05; hence we did not reject the null hypothesis and concluded that there is no significant mean difference between the groups.

Distance Matrix of mean difference. The distance matrix is a two-dimensional array containing the mean differences between the genres as the distance values, as shown in Figure 20. Each cell in the distance matrix corresponds to a pair of groups (i.e., genres) and the mean difference between them. If the difference between the two groups in a pair is greater than the threshold of 0.5, the cell takes the mean difference between the groups; otherwise it takes zero. Figure 21 shows the multiple comparison between all pairs of genres, with the distance on the x-axis and the genres on the y-axis.

Figure 20. Distance Matrix of mean difference.

Figure 21. Multiple Comparison between all pairs of genres.

Figure 22. Multi-Dimensional Scaling showing the clusters of genres.

Findings

The multidimensional scaling of the distance matrix displayed a U-shaped curve running from the lowest value to the highest value, as shown in Figure 22. When we clustered the genres on the multidimensional scaling, we observed that most of the genres (Action, Adventure, Racing, Puzzle, Arcade, etc.) are grouped under the same cluster, indicating that the apps exhibit similar characteristics and little variation.

4.5 Trial-5: Logistic Regression Model

As a first step towards deep learning, we started with a basic logistic regression model implemented using the Theano framework.
In Trial-5, we built a logistic regression model that takes price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input features to classify averageUserRating. Table 17 details the parameters considered in this trial.

Table 17
Logistic Regression model

  Number of observations in training data     35873
  Number of observations in validation data   8967
  Number of observations in test data         11200
  Input Variables                             1. Price
                                              2. Supported Devices
                                              3. Languages Supported
                                              4. Genres
                                              5. IsGameCenterEnabled
                                              6. Version
                                              7. ReleaseDate
                                              8. FileSize
                                              9. UserRatingCount
  Target Variable                             1. AverageUserRating

The major challenge in implementing the logistic regression model was learning the optimal parameters of the model by minimizing the loss function. After a number of iterations, we determined the model parameters that helped the model perform best. We trained the model for 1000 epochs with a mini-batch size of 600 and a learning rate of 0.13, as shown in Table 18.

Table 18
Parameters considered to optimize weights and bias of Logistic Regression model

  Number of epochs                                    1000
  Mini-batch size                                     600
  Total number of mini-batches in the training data   70
  Learning rate                                       0.13

Table 19 displays the accuracy achieved at the end of Trial-5. We trained the model on 35,873 game apps and tested it on 11,200 game apps, which yielded an accuracy of 47.07%.

Table 19
Accuracy table for Logistic Regression model (on the test data)

  Total observations                     11200
  Correctly predicted                    5272
  Incorrectly predicted                  5928
  Accuracy on making right predictions   47.07%

Findings

The logistic regression model produced an accuracy of 47.07%, which is an improvement over the baseline accuracy of 32.03%, but it did not improve on the multiple linear regression model (52.35%).
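The training procedure summarized in Table 18 (mini-batch gradient descent on a negative log-likelihood loss) can be illustrated with a minimal sketch. It is written in plain Python rather than Theano, it is not the code used in this trial, and the toy data and the names `train`, `mean_nll`, `toy_X`, and `toy_y` are assumptions introduced only for illustration. The default hyperparameters mirror Table 18.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    """Class probabilities for one feature vector x."""
    scores = [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(b))]
    return softmax(scores)

def mean_nll(W, b, X, y):
    """Mean negative log-likelihood, the loss minimized during training."""
    return -sum(math.log(predict_proba(W, b, x)[t]) for x, t in zip(X, y)) / len(X)

def train(X, y, n_classes, n_epochs=1000, batch_size=600, learning_rate=0.13):
    """Mini-batch gradient descent; the defaults mirror Table 18."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_epochs):
        for start in range(0, len(X), batch_size):
            batch = list(zip(X[start:start + batch_size], y[start:start + batch_size]))
            grad_W = [[0.0] * n_features for _ in range(n_classes)]
            grad_b = [0.0] * n_classes
            for x, t in batch:
                p = predict_proba(W, b, x)
                for k in range(n_classes):
                    err = p[k] - (1.0 if k == t else 0.0)  # dLoss/dScore_k
                    grad_b[k] += err / len(batch)
                    for j in range(n_features):
                        grad_W[k][j] += err * x[j] / len(batch)
            for k in range(n_classes):
                b[k] -= learning_rate * grad_b[k]
                for j in range(n_features):
                    W[k][j] -= learning_rate * grad_W[k][j]
    return W, b

def predict(W, b, x):
    """Index of the most probable class."""
    probs = predict_proba(W, b, x)
    return max(range(len(probs)), key=probs.__getitem__)

def accuracy(W, b, X, y):
    """Share of correctly predicted observations, as reported in the accuracy tables."""
    return sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(X)

# Hypothetical, already standardized toy data: two well-separated classes.
toy_X = [(-1.0, -0.8), (1.0, 0.8), (-0.9, -1.1), (0.9, 1.1),
         (-1.2, -1.0), (1.2, 1.0), (-0.8, -0.9), (0.8, 0.9)]
toy_y = [0, 1, 0, 1, 0, 1, 0, 1]

W, b = train(toy_X, toy_y, n_classes=2, n_epochs=50, batch_size=4)
print("toy accuracy:", accuracy(W, b, toy_X, toy_y))
```

The `accuracy` helper computes the same ratio as the "Accuracy on making right predictions" row of the accuracy tables, for example 5272 / 11200 in Table 19.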
4.6 Trial-6: Multi-Layer Perceptron (MLP) Model

In Trial-6, we implemented the deep learning concept using a Multi-Layer Perceptron (MLP) model with 500 hidden units. We built the MLP model using the Theano framework to classify averageUserRating using price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount. Table 20 details the parameters considered in this trial.

In the MLP model, L1 and L2 regularization terms are added to the loss function to penalize large weights, which can improve the performance of the model. After a number of iterations, we settled on the model parameters listed in Table 21. We trained the model for 1000 epochs with a mini-batch size of 500, a learning rate of 0.01, and an L2 regularization weight of 0.0001.

Table 20
Multi-Layer Perceptron (MLP) model

  Number of observations in training data     35873
  Number of observations in validation data   8967
  Number of observations in test data         11200
  Input Variables                             1. Price
                                              2. Supported Devices
                                              3. Languages Supported
                                              4. Genres
                                              5. IsGameCenterEnabled
                                              6. Version
                                              7. ReleaseDate
                                              8. FileSize
                                              9. UserRatingCount
  Target Variable                             1. AverageUserRating

Table 21
Parameters considered to optimize weights and bias of MLP model

  Number of epochs                                    1000
  Mini-batch size                                     500
  Total number of mini-batches in the training data   84
  Learning rate                                       0.01
  L1 regularization                                   0.00
  L2 regularization                                   0.0001
  Number of hidden units                              500

Figure 23. Multi-Layer Perceptron model.

Figure 23 graphically represents the MLP model, highlighting the inputs to the model and the weights assigned to the nodes, but it does not show the 500 hidden units that we used to train the MLP model. We trained the model on 35,873 game apps and tested it on 11,200 game apps, which yielded an accuracy of 47.28%. Table 22 displays the accuracy achieved at the end of Trial-6.

Table 22
Accuracy table for Multi-Layer Perceptron (MLP) model (on the test data)

  Total observations                     11200
  Correctly predicted                    5295
  Incorrectly predicted                  5905
  Accuracy on making right predictions   47.28%

Findings

The multi-layer perceptron model produced an accuracy of 47.28%, which is not a meaningful improvement over the logistic regression model (47.07%). Building a deeper network did not help us improve the accuracy of the model.

4.7 Trial-7: Simple Bayesian Network Model

In Trial-7, we built a Simple Bayesian network model to predict averageUserRating using price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, and userRatingCount as input to the model. Table 23 details the parameters considered in this trial.

Table 23
Simple Bayesian network model

  Number of observations in training data   44832
  Number of observations in test data       11208
  Input Variables                           1. Price
                                            2. Supported Devices
                                            3. Languages Supported
                                            4. Genres
                                            5. IsGameCenterEnabled
                                            6. Version
                                            7. ReleaseDate
                                            8. FileSize
                                            9. UserRatingCount
  Target Variable                           1. AverageUserRating
  Structure Learning algorithm              Hill-Climbing

We learned the structure of the Bayesian network model using a score-based learning algorithm (hill-climbing) with the training set of 44,832 game apps. Each node in the Bayesian network represents a random variable (feature), and directed arcs represent its probabilistic dependencies on other random variables (features). Figure 24 shows the Bayesian network model obtained as a result of structure learning. From the structure, we can infer that averageUserRating exhibits a direct dependency on userRatingCount and isGameCenterEnabled, and an indirect dependency on the other input features. We evaluated the Bayesian network model on the test set of 11,208 game apps to classify averageUserRating, which yielded an accuracy of 74.24%.

Figure 24. Bayesian Network with general features.

Table 24 displays the accuracy achieved at the end of Trial-7.
Table 24
Accuracy table for Simple Bayesian Network model (on the test data)

  Total observations                     11208
  Correctly predicted                    8321
  Incorrectly predicted                  2887
  Accuracy on making right predictions   74.24%

Findings

The Simple Bayesian network model produced an accuracy of 74.24%, which was an improvement over the logistic regression model and the MLP model.

4.8 Trial-8: Bayesian Network Model with LDA Features

To further improve the performance of the model, we decided to extract Latent Dirichlet Allocation (LDA) features and add them to our analysis. In Trial-8, we generated five topics using Latent Dirichlet Allocation (a text mining algorithm) and recorded the probability distribution of the words from the description over those topics, as shown in Figure 25. We named the features Topic1, Topic2, Topic3, Topic4, and Topic5. In the next step, we built a Bayesian network model to predict averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, userRatingCount, and the LDA features as input to the model. Table 25 details the parameters considered in this trial.

Figure 25. Representation of the LDA features to train the model.

Table 25
Bayesian network model with LDA features

  Number of observations in training data   44832
  Number of observations in test data       11208
  Input Variables                           1. Price
                                            2. Supported Devices
                                            3. Languages Supported
                                            4. Genres
                                            5. IsGameCenterEnabled
                                            6. Version
                                            7. ReleaseDate
                                            8. FileSize
                                            9. UserRatingCount
                                            10. LDA Topic-1 (V1)
                                            11. LDA Topic-2 (V2)
                                            12. LDA Topic-3 (V3)
                                            13. LDA Topic-4 (V4)
                                            14. LDA Topic-5 (V5)
  Target Variable                           1. AverageUserRating

We learned the structure of the Bayesian network model using a score-based learning algorithm with the training set of 44,832 game apps. Each node in the Bayesian network represents a random variable (feature), and directed arcs represent its probabilistic dependencies on other random variables (features).
Figure 26 shows the Bayesian network model obtained as a result of structure learning with LDA features. From the structure, we inferred that averageUserRating does not have any direct dependency on the other input features, only indirect dependencies. We evaluated the Bayesian network model on the test set of 11,208 game apps to classify averageUserRating, which yielded an accuracy of 73.68%.

Figure 26. Bayesian Network with LDA features (V1-V5).

Table 26 displays the accuracy achieved at the end of Trial-8.

Table 26
Accuracy table for Bayesian Network model with LDA features (on the test data)

  Total observations                     11208
  Correctly predicted                    7970
  Incorrectly predicted                  2847
  Accuracy on making right predictions   73.68%

Findings

The Bayesian network model with LDA features produced an accuracy of 73.68%, which was not an improvement over the Simple Bayesian network model. Adding the LDA features to the analysis did not help us improve the accuracy of the model.

4.9 Trial-9: Bayesian Network Model with Sentimental Features

In Trial-9, we decided to extract sentimental features and add them to our analysis. We conducted sentiment analysis to classify the emotions expressed in the descriptions of the game apps. In sentiment analysis, each word is associated with scores for eight different emotions (anger, anticipation, disgust, joy, fear, sadness, surprise, and trust) and two sentiments (positive and negative), as shown in Table 27. The total for each emotion is calculated by adding the scores of the individual words in the sentence, as shown in Figure 27.

Table 27
Emotions and Sentiments

  Emotions     anger, anticipation, disgust, fear, joy, sadness, surprise, trust
  Sentiments   positive, negative

Figure 27. Extraction of sentiments and emotions from the description.
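The aggregation step described above, in which word-level scores are summed into one total per emotion and sentiment, can be sketched as follows. This is an illustrative sketch only: the function name `emotion_totals` and the four-word `TOY_LEXICON` are hypothetical stand-ins, and the scores in it are made up for the example rather than taken from the lexicon used in this trial.

```python
import re

# The eight emotions and two sentiments listed in Table 27.
CATEGORIES = ("anger", "anticipation", "disgust", "fear", "joy",
              "sadness", "surprise", "trust", "positive", "negative")

# Hypothetical word scores, invented for illustration only.
TOY_LEXICON = {
    "win": {"joy": 1, "positive": 1},
    "battle": {"anger": 1, "negative": 1},
    "enjoy": {"joy": 1, "positive": 1},
    "treasure": {"joy": 1, "trust": 1, "anticipation": 1, "positive": 1},
}

def emotion_totals(description, lexicon=TOY_LEXICON):
    """Sum the per-word scores of a description into one total per category."""
    totals = {category: 0 for category in CATEGORIES}
    for word in re.findall(r"[a-z]+", description.lower()):
        # Words missing from the lexicon contribute nothing.
        for category, score in lexicon.get(word, {}).items():
            totals[category] += score
    return totals

features = emotion_totals("Win the battle and enjoy the treasure!")
print(features)
```

Each description then contributes ten numeric features, one per category, which matches input variables 10 to 19 in Table 28.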
In the next step, we built a Bayesian network model to predict averageUserRating based on price, supportedDevices, languagesSupported, isGameCenterEnabled, fileSizeBytes, version, genres, releaseDate, userRatingCount, and the sentimental features as input to the model. Table 28 details the parameters considered in this trial.

Table 28
Bayesian network model with Sentimental features

  Number of observations in training data   44832
  Number of observations in test data       11208
  Input Variables                           1. Price
                                            2. Supported Devices
                                            3. Languages Supported
                                            4. Genres
                                            5. IsGameCenterEnabled
                                            6. Version
                                            7. ReleaseDate
                                            8. FileSize
                                            9. UserRatingCount
                                            10. anger
                                            11. anticipation
                                            12. disgust
                                            13. fear
                                            14. joy
                                            15. sadness
                                            16. surprise
                                            17. trust
                                            18. positive
                                            19. negative
  Target Variable                           1. AverageUserRating

Figure 28. Histogram of Emotions on the training data.

Figure 29. Histogram of Sentiments on the training data.

Figure 28 and Figure 29 show the distribution of emotions and sentiments in the training data. We learned the structure of the Bayesian network model using a score-based learning algorithm with the training set of 44,832 game apps. Each node in the Bayesian network represents a random variable (feature), and directed arcs represent its probabilistic dependencies on other random variables (features). Figure 30 shows the Bayesian network model obtained as a result of structure learning with sentimental features. From the structure, we inferred that averageUserRating exhibits a direct dependency on userRatingCount and an indirect dependency on the other input features. We evaluated the Bayesian network model on the test set of 11,208 game apps to classify averageUserRating, which yielded an accuracy of 74.17%.

Figure 30. Bayesian Network with Sentimental features.

Table 29 displays the accuracy achieved at the end of Trial-9.
Table 29
Accuracy table for Bayesian Network model with Sentimental features (on the test data)

  Total observations                     11208
  Correctly predicted                    8023
  Incorrectly predicted                  2794
  Accuracy on making right predictions   74.17%

Findings

The Bayesian network model with sentimental features produced an accuracy of 74.17%, which was also not an improvement over the other Bayesian network models discussed earlier. Adding the sentimental features to the analysis did not help us improve the accuracy of the model.

4.10 Trial-10: Bayesian Network Model with Visual Features

In Trial-10, we decided to capture the intensity of the screenshots of the game apps and include it as features in the analysis. In the visual analysis, we downloaded the screenshots of each game app and extracted the intensity matrices of the red, blue, green, and gray channels of the image. We calculated the mean, maximum, and minimum of these intensity matrices and included them as features in the analysis. Figure 31 shows the RGB image of a screenshot and the histogram of its intensity distribution. Figure 32 shows the same screenshot converted to a gray image and the histogram of its intensity distribution. Table 30 details the parameters considered in this trial.

Figure 31. RGB Image of the screenshot and the histogram of the intensity.

Table 30
Bayesian network model with Visual features

  Number of observations in training data   44832
  Number of observations in test data       11208
  Input Variables                           1. Price
                                            2. Supported Devices
                                            3. Languages Supported
                                            4. Genres
                                            5. IsGameCenterEnabled
                                            6. Version
                                            7. ReleaseDate
                                            8. FileSize
                                            9. UserRatingCount
                                            10. mean_red
                                            11. max_red
                                            12. min_red
                                            13. mean_blue
                                            14. max_blue
                                            15. min_blue
                                            16. mean_green
                                            17. max_green
                                            18. min_green
                                            19. mean_gray
                                            20. max_gray
                                            21. min_gray
  Target Variable                           1. AverageUserRating

Figure 32. Gray Image of the screenshot and the histogram of the intensity.
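The twelve visual features in Table 30 (mean, maximum, and minimum intensity per channel) can be sketched as below. This is a minimal illustration under stated assumptions, not the extraction code used in this trial: the image is assumed to be already decoded into rows of (red, green, blue) tuples with values from 0 to 255, the function names are hypothetical, and the gray conversion uses the common 0.299/0.587/0.114 luma weights because the text does not state which conversion was applied.

```python
def channel_stats(name, values):
    """Mean, maximum, and minimum intensity of one channel."""
    return {
        f"mean_{name}": sum(values) / len(values),
        f"max_{name}": max(values),
        f"min_{name}": min(values),
    }

def visual_features(pixels):
    """Build the 12 intensity features from rows of (red, green, blue) tuples."""
    flat = [pixel for row in pixels for pixel in row]
    red = [r for r, g, b in flat]
    green = [g for r, g, b in flat]
    blue = [b for r, g, b in flat]
    # Assumed gray conversion (weighted sum of the color channels).
    gray = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in flat]
    features = {}
    # Same channel order as the input variables in Table 30.
    for name, values in (("red", red), ("blue", blue), ("green", green), ("gray", gray)):
        features.update(channel_stats(name, values))
    return features

# Hypothetical 2x2 screenshot: red, green, blue, and white pixels.
toy_image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]
print(visual_features(toy_image))
```

In practice an image library would be needed to decode the screenshot files into pixel values before a function like this could be applied.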
We learned the structure of the Bayesian network model using a score-based learning algorithm with the training set of 44,832 game apps. Each node in the Bayesian network represents a random variable (feature), and directed arcs represent its probabilistic dependencies on other random variables (features). Figure 33 shows the Bayesian network model obtained as a result of structure learning with visual features. From the structure, we inferred that averageUserRating exhibits a direct dependency on userRatingCount, max_red, and max_blue, and an indirect dependency on the other input features. We evaluated the Bayesian network model on the test set of 11,208 game apps to classify averageUserRating, which yielded an accuracy of 74.01%.

Figure 33. Bayesian Network with visual features.

Table 31 displays the accuracy achieved at the end of Trial-10.

Table 31
Accuracy table for Bayesian Network model with Visual features (on the test data)

  Total observations                     11208
  Correctly predicted                    8060
  Incorrectly predicted                  2830
  Accuracy on making right predictions   74.01%

Findings

The Bayesian network model with visual features produced an accuracy of 74.01%, which was also not an improvement over the other Bayesian network models discussed earlier. Although we found some direct dependency between the visual features and averageUserRating, adding visual features to the analysis did not help us improve the accuracy of the model.

Chapter 5
CONCLUSIONS

We conclude the thesis by summarizing the results obtained and highlighting the important points that we learned in the process. The main goal of the thesis was to find the reasons behind the success of mobile game applications in the app market. We tried to correlate the different features of a game application with its success in the market. The real-world data was not clean and consistent.
We cleaned the data and performed data transformation to extract some important features from the game apps. We related the most important feature (i.e., averageUserRating) to the success of the game app in the market. We were interested in building a model that takes the features of a game app as input and predicts the averageUserRating automatically by recognizing patterns in the data. To that end, we implemented various machine learning algorithms to predict the averageUserRating. We also measured the performance of each model through its accuracy and compared it with the other models.

Initially, we defined a random model that acts as a baseline for the other models. We constructed a Multiple Linear Regression model that achieved an accuracy of 52.35%. In between, we used clustering analysis to determine whether significant differences exist between the game apps. The clustering results suggested that there were no significant differences between the game apps. We were then interested in using two advanced machine learning techniques, deep learning and Bayesian network models, to predict the averageUserRating. We constructed a Logistic Regression model and a Multi-Layer Perceptron model using the Theano framework, both of which achieved an accuracy of around 47%. Finally, we constructed Bayesian network models, adding features extracted from the description using Latent Dirichlet Allocation (LDA) analysis and sentiment analysis. We also added features from the visual analysis of the screenshots of the game apps. The Bayesian network models outperformed all other models with an accuracy of 74%. Thus, we constructed models that outperform the baseline model, raising the accuracy from 32.03% to 74%. Table 32 summarizes the results of the ten trials in building the model.
Table 32
Summary of the results of 10 trials in building the model

  S.No.   Model                                                Result/Accuracy
  1.      Random model                                         On test data 32.03%
  2.      Multiple Linear Regression model                     On test data 52.35%
  Clustering Analysis
  3.      Hierarchical Clustering                              No significant difference between the apps
  4.      Hierarchical Clustering based on genres              Apps exhibit similar characteristics
  Analysis using Theano Python Framework
  5.      Logistic Regression model                            On test data 47.07%
  6.      Multi-Layer Perceptron model                         On test data 47.28%
  Bayesian Network Analysis
  7.      Bayesian Network model                               On test data 74.24%
  LDA Analysis
  8.      Bayesian Network model with LDA Features             On test data 73.68%
  Sentimental Analysis
  9.      Bayesian Network model with Sentimental Features     On test data 74.17%
  Visual Analysis
  10.     Bayesian Network model with Visual Features          On test data 74.01%

Though we succeeded in building different models that predict averageUserRating with an accuracy of up to 74%, this accuracy is not high enough and can be improved further. A possible reason we were unable to improve the accuracy further is that we did not have enough data to train the model, mainly because we lost 50% of our collected data in handling the missing values. Also, the selected features were not sufficient to make accurate predictions, and they were not highly correlated with the target variable. Little variation was observed in the input variables, which might also have limited the accuracy of the model.

Recommendations for Future Research

The data we collected does not contain user reviews. It is therefore not possible to determine the roles of app factors from the users' perspective, such as which features they appreciate and which they do not. In the future, we recommend including reviews in the data collection to obtain this kind of information for analysis.
In addition, the sentiment analysis we used to extract features from the text descriptions of the apps may not be appropriate, since its outputs are emotions (anger, sadness, etc.) that may not be relevant. Our suggestion for future research includes adopting or developing new sentiment analysis methods that are able to extract information related to the kinds of gaming experience the game is designed for, such as how much violence or leisure users should expect to see in the game.

REFERENCES

AmericanDialect. (n.d.). "App" voted 2010 word of the year by the American Dialect Society.
AndroidRank. (n.d.). Android Market App Statistics. Retrieved June 4, 2016, from http://www.androidrank.org/categorystats?category=&price=all
Apple iTunes. (n.d.-a). Apple iTunes App store. Retrieved June 4, 2016, from https://itunes.apple.com/us/genre/ios/id36?mt=8
Apple iTunes. (n.d.-b). iTunes Affiliate Resources. Retrieved June 4, 2016, from https://affiliate.itunes.apple.com/resources/documentation/itunes-store-web-service-search-api/
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Brillinger, D. (1984). The Collected Works of John W. Tukey (Volume I). Chapman & Hall.
Chen, N., Lin, J., Hoi, S. C. H., Xiao, X., & Zhang, B. (2014). AR-miner: Mining Informative Reviews for Developers from Mobile App Marketplace. Proceedings of the 36th International Conference on Software Engineering (ICSE), 767–778. http://doi.org/10.1145/2568225.2568263
Cran.r-project.org. (n.d.). Sentiment Analysis using Syuzhet Package. Retrieved April 6, 2016, from https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html
Deeplearning.net. (n.d.-a). Deep Learning Tutorials using Theano Framework. Retrieved June 4, 2016, from http://deeplearning.net/tutorial/
Deeplearning.net. (n.d.-b). Logistic Regression using Theano Framework. Retrieved June 4, 2016, from http://deeplearning.net/tutorial/logreg.html
Deeplearning.net. (n.d.-c). Multi-Layer Perceptron using Theano Framework. Retrieved June 4, 2016, from http://deeplearning.net/tutorial/mlp.html
Deng, L., & Yu, D. (2013). Deep Learning: Methods and Applications. Foundations and Trends in Signal Processing, 7(3-4), 197–387. http://doi.org/10.1136/bmj.319.7209.0a
ECMA-404. (2013). The JSON Data Interchange Format. ECMA International, 1st Edition (October). Retrieved from http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
Finkelstein, A., Harman, M., Jia, Y., Martin, W., Sarro, F., Zhang, Y., … Zhang, Y. (2014). App Store Analysis: Mining App Stores for Relationships between Customer, Business and Technical Characteristics. UCL Research Note, 14/10, 1–24.
Fu, B., Lin, J., Li, L., Faloutsos, C., Hong, J., & Sadeh, N. (2013). Why people hate your app. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '13, 1276. http://doi.org/10.1145/2487575.2488202
Guzman, E., & El-Halaby Muhammad, B. B. (2015). Ensemble Methods for App Review Classification: An Approach for Software Evolution. Proc. of the Automated Software Engineering Conference (ASE), to appear. http://doi.org/10.1109/ASE.2015.88
Heckerman, D. (1997). Bayesian Networks for Data Mining. Data Mining and Knowledge Discovery, 1(1), 79–119. http://doi.org/10.1023/A:1009730122752
Kong, D., Cen, L., & Jin, H. (2015). AUTOREB: Automatically Understanding the Review-to-Behavior Fidelity in Android Applications. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 530–541. http://doi.org/10.1145/2810103.2813689
Maalej, W., & Nabil, H. (2015). Bug report, feature request, or simply praise? On automatically classifying app reviews. 2015 IEEE 23rd International Requirements Engineering Conference (RE), 116–125. http://doi.org/10.1109/RE.2015.7320414
Mount, J., By, F. O., Porzak, J., & Mount, J. (n.d.). Practical Data Science with R.
Mtu.edu. (n.d.). Tukey's HSD Post Hoc Test steps. Retrieved April 6, 2016, from https://web.mst.edu/~psyworld/tukeyssteps.htm#
Nielsen, M. (n.d.). Neural Network and Deep Learning. Retrieved June 4, 2016, from http://neuralnetworksanddeeplearning.com/
Özel, T., & Karpat, Y. (2005). Predictive modeling of surface roughness and tool wear in hard turning using regression and neural networks. International Journal of Machine Tools and Manufacture, 45(4-5), 467–479. http://doi.org/10.1016/j.ijmachtools.2004.09.007
Scutari, M. (2009). Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software, VV(II).
Statista. (n.d.-a). Most popular Apple App Store categories in March 2016, by share of available apps. Retrieved June 4, 2016, from http://www.statista.com/statistics/270291/popular-categories-in-the-app-store/
Statista. (n.d.-b). Number of apps available in leading app stores as of July 2015. Retrieved June 4, 2016, from http://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/
Statista. (n.d.-c). Number of available apps in the iTunes App Store from 2008 to 2015 (cumulative). Retrieved June 4, 2016, from http://www.statista.com/statistics/268251/number-of-apps-in-the-itunes-app-store-since-2008/

VITA

Pradeep Kumar Balan was awarded a Bachelor's degree in Mechanical Engineering from SASTRA University, Thanjavur, India, in May 2009 and a Master's degree in Computer-aided design / Computer-aided manufacturing (CAD/CAM) from Sri Krishna College of Engineering, Coimbatore, India, in May 2013. He completed his Master's studies in the field of Computer Science at Texas A&M University-Commerce in August 2016. He has two years of experience working as an Assistant Professor and two years of experience working as a Software Programmer. His research work focuses on the data-driven analysis of the iOS game app market using various machine learning techniques.

Department of Computer Science, Texas A&M University-Commerce, 2200 Campbell St, Commerce, TX 75428.
Email: balan.pradeepkumar@gmail.com