<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://jraymartinez.github.io/portfolio/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jraymartinez.github.io/portfolio/" rel="alternate" type="text/html" /><updated>2025-10-04T19:40:53+00:00</updated><id>https://jraymartinez.github.io/portfolio/feed.xml</id><title type="html">John Ray Martinez</title><subtitle>Portfolio of John Ray Martinez, a data scientist. John is AWS Machine Learning Specialty certified with experience in NLP, recommender systems, and time series. He is a PhD Candidate researching multi-agent AI systems with a focus on uncertainty quantification, confidence-weighted prediction fusion, and LLM reliability for high-stakes applications. He is a recipient of the Outstanding Graduate Student Award, Drexel University, 2021. He is a Data Professional with 10+ years of software engineering experience and a teaching background in Physics.</subtitle><author><name>John Ray Martinez</name></author><entry><title type="html">LLM Agents Mastery</title><link href="https://jraymartinez.github.io/portfolio/certificates/ucberkeley-llmagents/" rel="alternate" type="text/html" title="LLM Agents Mastery" /><published>2025-02-06T08:50:00+00:00</published><updated>2025-02-06T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/certificates/ucberkeley-llmagents</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/certificates/ucberkeley-llmagents/"><![CDATA[<p><a href="https://llmagents-learning.org/f24"><em>UC Berkeley RDI Certification</em></a> (2024).<br /></p>

<p><strong>Description</strong>. John has successfully completed and earned the LLM Agents Mastery certification from the UC Berkeley Center for Responsible, Decentralized Intelligence (RDI). This certification validates foundational knowledge of LLMs, the essential LLM abilities required for task automation, and the infrastructure needed for agent development.</p>

<p>The certificate can be downloaded <a href="https://jraymartinez.github.io/portfolio/assets/docs/jmartinez_ucberkeley_llmagents_2024.pdf">here</a>.</p>]]></content><author><name>John Ray Martinez</name></author><category term="certificates" /><category term="AWS" /><category term="certification" /><category term="machine learning" /><summary type="html"><![CDATA[Has successfully completed and earned the LLM Agents Mastery certification from UC Berkeley RDI.]]></summary></entry><entry><title type="html">AWS Certified Machine Learning - Specialty</title><link href="https://jraymartinez.github.io/portfolio/certificates/aws-ml/" rel="alternate" type="text/html" title="AWS Certified Machine Learning - Specialty" /><published>2023-05-31T08:50:00+00:00</published><updated>2023-05-31T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/certificates/aws-ml</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/certificates/aws-ml/"><![CDATA[<p><a href="https://aws.amazon.com/certification/certified-machine-learning-specialty/"><em>AWS Training and Certification</em></a> (2023).<br /></p>

<p><strong>Description</strong>. John has successfully passed and earned the AWS Certified Machine Learning - Specialty certification from Amazon Web Services Training and Certification. This certification validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.</p>

<p>The AWS Certified Machine Learning - Specialty exam has the following main content domains:</p>

<ol>
  <li>Data Engineering</li>
  <li>Exploratory Data Analysis</li>
  <li>Modeling</li>
  <li>Machine Learning Implementation and Operations</li>
</ol>

<p>The certificate can be downloaded <a href="https://jraymartinez.github.io/portfolio/assets/docs/jmartinez_aws_ml_certificate_2023.pdf">here</a>.</p>]]></content><author><name>John Ray Martinez</name></author><category term="certificates" /><category term="AWS" /><category term="certification" /><category term="machine learning" /><summary type="html"><![CDATA[Has successfully passed and earned the AWS Certified Machine Learning - Specialty certification from Amazon Web Services Training and Certification.]]></summary></entry><entry><title type="html">Outstanding Graduate Student Award Recipient</title><link href="https://jraymartinez.github.io/portfolio/certificates/outstanding-grad/" rel="alternate" type="text/html" title="Outstanding Graduate Student Award Recipient" /><published>2021-06-11T08:50:00+00:00</published><updated>2021-06-11T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/certificates/outstanding-grad</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/certificates/outstanding-grad/"><![CDATA[<p><a href="https://drexel.edu/cci/news/"><em>Drexel University College of Computing &amp; Informatics (CCI)</em></a> (2021).<br /></p>

<p><strong>Description</strong>. John is a recipient of the Drexel University 2021 College of Computing &amp; Informatics (CCI) Outstanding Graduate Student Award. The CCI Awards are the most prestigious honor given within the College community, recognizing excellence, achievement, leadership, and innovation. As part of the award, John received a prize payment and was recognized during the CCI Honors event.</p>

<p>The certificate can be downloaded <a href="https://jraymartinez.github.io/portfolio/assets/docs/jmartinez_cci_2021_outstanding_grad_student.pdf">here</a>.</p>]]></content><author><name>John Ray Martinez</name></author><category term="certificates" /><category term="outstanding graduate student" /><category term="award" /><category term="prestigious honor" /><summary type="html"><![CDATA[Awarded 2021 Outstanding Graduate Student during CCI Honors event.]]></summary></entry><entry><title type="html">Multimodal Brain Tumor Segmentation using Convolutional Neural Network</title><link href="https://jraymartinez.github.io/portfolio/projects/bts/" rel="alternate" type="text/html" title="Multimodal Brain Tumor Segmentation using Convolutional Neural Network" /><published>2021-03-13T08:50:00+00:00</published><updated>2021-03-13T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/projects/bts</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/projects/bts/"><![CDATA[<h2 id="authors">AUTHORS</h2>
<p><a href="https://jraymartinez.github.io/">John Ray Martinez</a> (jbm332@drexel.edu), <a href="https://www.linkedin.com/in/jonathan-musni-624773134/">Jonathan Musni</a> (jem472@drexel.edu), <a href="https://www.linkedin.com/in/marvin-joseph-occeno-8b4a95120/">Marvin Joseph Occeño</a> (mr048@drexel.edu), <a href="https://www.linkedin.com/in/edmarparreno/">Edmar Parreño</a> (erp75@drexel.edu), and <a href="https://www.linkedin.com/in/juanmigueltrinidad/">Juan Miguel Trinidad</a> (jbt46@drexel.edu)</p>

<p><sub> <em>This capstone project was selected for oral research presentation at the <a href="https://drexel.edu/graduatecollege/professional-development/emerging-graduate-scholars-conference/Archive/2021/2021-orals/">2021 Drexel Emerging Graduate Scholars Conference</a></em> </sub></p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/bts.png" alt="Snapshots of the comparisons between ground truth, U-Net and PSPNet brain tumor segmentation (Fext = 0.35)." /></p>

<p><strong>Abstract.</strong> Segmentation is the process of examining brain images such as magnetic resonance imaging (MRI) images and computed tomography (CT) scans to locate regions of interest (ROIs). These regions define the boundaries of the brain tumor. This process allows radiologists or medical personnel to distinguish between healthy cells and tumors. However, manual segmentation requires considerable time and labor from radiologists to segment the images with high accuracy, and human error is inevitable. With these limitations in human capacity, manual segmentation can inhibit diagnosis and therefore delay treatment. Convolutional neural network (CNN) models are popular methods in image processing and have rapidly developed into a powerful tool in computer vision and pattern recognition. The U-Net architecture is a well-known variant of CNN that consists of a contracting (convolution) path and an expanding (deconvolution) path trained to produce an image-segmentation map. By utilizing the BraTS 2020 (Brain Tumor Segmentation 2020) dataset from the University of Pennsylvania Center for Biomedical Image Computing &amp; Analytics (CBICA), we are able to investigate the effectiveness of this specific architecture. We show that, by tuning the kernel size from 3 to 2 in the deconvolution path, the sensitivity for the Necrotic region improves significantly while performance in the Enhancing region degrades. In this study, we investigate Convolutional Neural Network (CNN)-based architectures such as U-Net and PSPNet. It is found that U-Net outperforms PSPNet in correctly segmenting the brain tumors. This study’s findings can inform the design and development of an automatic brain tumor segmentation system.</p>
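<p>As a rough illustration of the architecture described in the abstract, the sketch below builds a small U-Net-style model in Keras with a tunable deconvolution kernel size. The depth, filter counts, input shape, and number of classes are assumptions for illustration and are not the exact configuration used in the study.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal U-Net-style sketch (illustrative assumptions, not the study's exact model):
# a contracting (convolution) path, an expanding (deconvolution) path, and a
# deconvolution kernel size that can be switched from 3 to 2.
from tensorflow.keras import layers, Model

def build_unet(input_shape=(128, 128, 4), deconv_kernel=2, n_classes=4):
    inputs = layers.Input(shape=input_shape)

    # Contracting (convolution) path
    c1 = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(64, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = layers.Conv2D(128, 3, activation="relu", padding="same")(p2)

    # Expanding (deconvolution) path; deconv_kernel is the tuned hyperparameter
    u2 = layers.Conv2DTranspose(64, deconv_kernel, strides=2, padding="same")(b)
    c3 = layers.Conv2D(64, 3, activation="relu", padding="same")(layers.concatenate([u2, c2]))
    u1 = layers.Conv2DTranspose(32, deconv_kernel, strides=2, padding="same")(c3)
    c4 = layers.Conv2D(32, 3, activation="relu", padding="same")(layers.concatenate([u1, c1]))

    # Per-pixel class probabilities (e.g., background and tumor sub-regions)
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(c4)
    return Model(inputs, outputs)

model = build_unet(deconv_kernel=2)
model.summary()
</code></pre></div></div>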

<p>Full paper can be requested.</p>]]></content><author><name>John Ray Martinez</name></author><category term="projects" /><category term="brain tumor" /><category term="Convolutional Neural Network" /><category term="image segmentation" /><summary type="html"><![CDATA[In this capstone project, we investigate Convolutional Neural Network (CNN)-based architectures such as U-Net and PSPNet in segmenting the brain tumors.]]></summary></entry><entry><title type="html">The Impact of driver distraction and secondary tasks with and without other co-occurring driving behaviors on the level of road traffic crashes</title><link href="https://jraymartinez.github.io/portfolio/publications/aap/" rel="alternate" type="text/html" title="The Impact of driver distraction and secondary tasks with and without other co-occurring driving behaviors on the level of road traffic crashes" /><published>2021-02-17T23:14:36+00:00</published><updated>2021-02-17T23:14:36+00:00</updated><id>https://jraymartinez.github.io/portfolio/publications/aap</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/publications/aap/"><![CDATA[<p><a href="https://www.linkedin.com/in/ali-jazayeri/">Ali Jazayeri</a>, <a href="https://jraymartinez.github.io/portfolio">John Ray Martinez</a>, <a href="https://www.linkedin.com/in/helen-loeb-81240013/">Helen Loeb</a>, and <a href="http://cci.drexel.edu/faculty/cyang/index.html">Christopher C. Yang</a><br />
<em>Accident Analysis &amp; Prevention</em> 153 (2021) 106010.<br />
<a href="https://www.sciencedirect.com/science/article/abs/pii/S0001457521000415">https://doi.org/10.1016/j.aap.2021.106010</a></p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/aap.png" alt="Snapshots of the tearing process for a lattice with a random distribution of reinforcements (Fext = 0.35)." /></p>

<p><strong>Abstract.</strong> Driving safety is typically affected by concurrent non-driving tasks. These activities might negatively impact a trip’s outcome and cause near-crash or crash incidents and accidents. Crashes impose a tremendous social and economic cost on society and might affect the involved individuals’ quality of life. As it stands, road injuries are ranked among the top ten leading causes of death by the World Health Organization. Distracted driving is defined as a diversion of the driver’s attention toward a competing activity. Numerous studies have shown that distracted driving increases the probability of near-crash or crash events. By leveraging the statistical power of the large SHRP2 naturalistic data, we are able to quantify the preponderance of specific distractions during daily trips and confirm the causal contribution of a ubiquitous non-driving task to crash events. We show that, except for phone usage, which happens more frequently in near-crash and crash categories than in baseline trips, both distracted driving and secondary tasks occur almost uniformly across different types of trips. In this study, we investigate the impact of the co-occurrence of distracted driving with other driving behaviors and secondary tasks. It is found that the co-occurrence of distracted driving with other driving behaviors or secondary tasks increases the chance of near-crash and crash events. This study’s findings can inform the design and development of more precise and reliable driving assistance and warning systems.</p>

<p>Full paper can be downloaded <a href="https://jraymartinez.github.io/portfolio/assets/docs/jmartinez_aap_2021.pdf">here</a></p>]]></content><author><name>John Ray Martinez</name></author><category term="publications" /><category term="Distracted driving" /><category term="Secondary tasks" /><category term="Co-occurring behaviors" /><category term="Driving behaviors" /><summary type="html"><![CDATA[Driving safety is typically affected by concurrent non-driving tasks that might negatively impact the trips’ outcome and cause near-crash or crash accidents.]]></summary></entry><entry><title type="html">Predicting Multiple Time Series with USA COVID-19 data using Machine Learning models</title><link href="https://jraymartinez.github.io/portfolio/projects/covid/" rel="alternate" type="text/html" title="Predicting Multiple Time Series with USA COVID-19 data using Machine Learning models" /><published>2020-09-03T08:50:00+00:00</published><updated>2020-09-03T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/projects/covid</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/projects/covid/"><![CDATA[<h2 id="author">AUTHOR</h2>
<p><a href="https://jraymartinez.github.io/">John Ray Martinez</a> (jbm332@drexel.edu)</p>

<p><sub> <em>This research was implemented in fulfillment of the requirements for the Applied Machine Learning course of the Master of Science in Data Science program at the Drexel University College of Computing &amp; Informatics</em> </sub></p>

<h2 id="introduction">Introduction</h2>

<p>As novel coronavirus COVID-19 cases surge across the US, improving methods for predicting COVID-19 cases in this country is extremely important. It is imperative that hotspots are studied more thoroughly to slow down the outbreak while a cure and vaccine are being sought. Forecasting the time of a future surge would minimize the impact of COVID-19 by enabling timely preventive steps, including early public health responses such as lockdowns, school closures, and travel restrictions.</p>

<p>Therefore, accurate COVID-19 transmission rate forecasting is essential to better understand the current situation and plan for the future. It also enables public health authorities to implement interventions effectively to control outbreaks, which would greatly minimize the social and economic impact of the disease.</p>

<p>The objective of this study is to determine whether a single Machine Learning model can be used on multiple time series (new daily confirmed COVID-19 cases for each state) to project future COVID-19 confirmed cases.</p>

<h2 id="data-description">DATA DESCRIPTION</h2>

<p>The dataset was obtained from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) <a href="https://github.com/CSSEGISandData/COVID-19">Data Repository</a> by Johns Hopkins CSSE. In addition, 2019 US state population data (NST-EST2019-alldata) was obtained from the <a href="https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html">United States Census Bureau</a>.</p>

<p>The time series data is split into a training set (01/22/2020 - 06/30/2020), a validation set (07/01/2020 - 07/31/2020), and a test set (08/01/2020 - 08/22/2020), as sketched below.</p>
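<p>A minimal sketch of this date-based split with pandas is shown below; the file name and the columns <code>date</code>, <code>state</code>, and <code>new_cases</code> are assumptions for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Date-based split of the per-state daily series (a sketch; the file name and
# the columns 'date', 'state', and 'new_cases' are assumed for illustration).
import pandas as pd

df = pd.read_csv("us_daily_confirmed_by_state.csv", parse_dates=["date"])

train = df[df["date"].le("2020-06-30")]
valid = df[df["date"].between("2020-07-01", "2020-07-31")]
test = df[df["date"].between("2020-08-01", "2020-08-22")]
</code></pre></div></div>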

<h2 id="methodology">METHODOLOGY</h2>

<p>Preprocessing includes feature engineering, data merging, feature selection, log and polynomial transformation, and categorical encoding. The performance of the following machine learning models is compared.</p>

<ul>
  <li>Linear Regressor</li>
  <li>Random Forest Regressor</li>
  <li>Gradient Boosting Regressor</li>
  <li>XGBoost Regressor</li>
</ul>
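<p>A minimal sketch of the preprocessing step described above (log transformation of the target, polynomial expansion of numeric features, and one-hot encoding of the state) is shown below. The feature names are assumptions for illustration and build on the split sketch above.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of preprocessing: log transform of the target, polynomial features,
# and one-hot encoding of the state. The numeric feature names are assumed to
# have been engineered/merged beforehand (e.g., from the Census population data).
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, OneHotEncoder

numeric_features = ["population", "days_since_first_case"]  # assumed engineered columns
preprocessor = ColumnTransformer([
    ("poly", PolynomialFeatures(degree=2, include_bias=False), numeric_features),
    ("state", OneHotEncoder(handle_unknown="ignore"), ["state"]),
])

X_train = preprocessor.fit_transform(train.drop(columns=["new_cases"]))
y_train = np.log1p(train["new_cases"])  # log transform of the daily case counts
</code></pre></div></div>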

<p>The general workflow for comparing the performance of the machine learning models, shown in Figure 1, involves the following steps.</p>

<ol>
  <li>Preprocessing of data</li>
  <li>Recursive forecasting using a rolling window with 1-day-ahead predictions (a sketch follows this list)</li>
  <li>Fitting each model and evaluating it using R-squared, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE)</li>
  <li>Comparing the models via R-squared, RMSE, and MAE, and plotting the test predictions and residuals of all models</li>
</ol>
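<p>A minimal sketch of steps 2-4 is shown below: lag features form the rolling window, each model predicts one day ahead recursively over the test horizon, and the models are compared via R-squared, RMSE, and MAE. The lag-feature construction and the synthetic placeholder series are assumptions for illustration, not the study's exact pipeline.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of recursive 1-day-ahead forecasting with a rolling window of lag
# features, plus model comparison via R-squared, RMSE, and MAE.
# The lag construction and the synthetic series are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from xgboost import XGBRegressor

N_LAGS = 7  # rolling window: the previous 7 days

def make_lag_features(series):
    """Build a supervised frame whose columns are the previous N_LAGS values."""
    frame = pd.concat([series.shift(i) for i in range(1, N_LAGS + 1)], axis=1)
    frame.columns = [f"lag_{i}" for i in range(1, N_LAGS + 1)]
    frame = frame.dropna()
    return frame, series.loc[frame.index]

def recursive_forecast(model, history, horizon):
    """Predict one day ahead repeatedly, feeding each prediction back in."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        x = np.array(window[-N_LAGS:][::-1]).reshape(1, -1)  # lag_1 (yesterday) first
        y_hat = float(model.predict(x)[0])
        preds.append(y_hat)
        window.append(y_hat)
    return np.array(preds)

# Synthetic placeholder for one state's daily new cases (the study uses the JHU data).
dates = pd.date_range("2020-01-22", "2020-08-22", freq="D")
series = pd.Series(np.random.default_rng(0).poisson(100, len(dates)).astype(float), index=dates)
train_series, test_series = series[:"2020-07-31"], series["2020-08-01":]

models = {
    "Linear": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "XGBoost": XGBRegressor(random_state=0),
}

X_train, y_train = make_lag_features(train_series)
for name, model in models.items():
    model.fit(X_train.values, y_train.values)
    preds = recursive_forecast(model, train_series.values, len(test_series))
    rmse = float(np.sqrt(mean_squared_error(test_series, preds)))
    mae = mean_absolute_error(test_series, preds)
    print(name, round(r2_score(test_series, preds), 3), round(rmse, 1), round(mae, 1))
</code></pre></div></div>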

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_fig1.png" alt="Methodology" />
Figure 1. Workflow for measuring the performance of Machine Learning models.</p>

<h2 id="results">RESULTS</h2>
<p>The metric evaluations show that XGBoost had the best results in predicting the test dataset, while Gradient Boosting outperformed all of the models on the training dataset. XGBoost also demonstrated its parallel-processing capability by clocking the fastest elapsed time.</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_metric_results.png" alt="Results" />
Table 1. Statistics for prediction of COVID-19 daily confirmed cases.</p>

<p>The test predictions and residuals of all models for the states with the highest numbers of confirmed cases are plotted below.</p>

<h3 id="a-texas">A. <strong>Texas</strong></h3>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_texas_pred.png" alt="'Texas prediction'" /> 
Figure 2. Texas prediction plot over the test set.</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_texas_res.png" alt="'Texas redidual'" />
Figure 3. Texas residual plot over the test set.</p>

<h3 id="b-florida">B. <strong>Florida</strong></h3>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_florida_pred.png" alt="'Florida prediction'" />
Figure 4. Florida prediction plot over the test set.</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_florida_res.png" alt="'Florida redidual'" />
Figure 5. Florida residual plot over the test set.</p>

<h3 id="c-california">C. <strong>California</strong></h3>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_calif_pred.png" alt="'California prediction'" />
Figure 6. California prediction plot over the test set.</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/covid_calif_res.png" alt="'California redidual'" />
Figure 7. California rsidual plot over the test set.</p>

<p>Given the ongoing coronavirus situation, the researcher hopes that this project will be relevant and helpful to other researchers, specialists, and especially public health surveillance systems, which would benefit from knowing when to focus on this disease and when a surge or resurgence is likely to happen. Furthermore, it would help public health authorities and governments alike in deciding whether to ease a lockdown or issue another lockdown for each state.</p>

<h2 id="limitations">LIMITATIONS</h2>
<p>First, the method does not account for the different control measures in each state, such as the level of social distancing, or how these will change in the future. In addition, the available period of time is about half a year across roughly 50 US states, which results in approximately 11,000 data points: a relatively small dataset. Moreover, despite focusing on the US, the study does not map the virus’s trend to specific areas or regions within the country.</p>

<h2 id="insights-and-conclusion">INSIGHTS AND CONCLUSION</h2>

<p>The estimated values of COVID-19 daily confirmed cases were in good agreement with the corresponding observed values, and the Machine Learning models used, especially the ensemble Boosting models, could be used to forecast daily confirmed cases. These results are worthwhile for decision-making bodies and public health experts when decisions are urgent.</p>

<p>For future work, merging more domain-related data like temperature, lockdown periods, etc, that have significant impacts in variation of number of COVID19 cases can be considered. In addition, the direct forecasting approach discussed by Souhaib Ben Taieb in his <a href="https://souhaib-bentaieb.com/papers/2014_phd.pdf">dissertation paper</a> can be explored.</p>]]></content><author><name>John Ray Martinez</name></author><category term="projects" /><category term="multiple time series" /><category term="linear regressor" /><category term="random forest" /><category term="gradient boosting" /><category term="XGBoost" /><summary type="html"><![CDATA[As novel coronavirus COVID-19 cases surge across the US, improving methods for prediction of COVID-19 cases in this country is extremely important.]]></summary></entry><entry><title type="html">I AM SAM: An Automatic Text Summarization System using different Extractive Techniques</title><link href="https://jraymartinez.github.io/portfolio/projects/iamsam/" rel="alternate" type="text/html" title="I AM SAM: An Automatic Text Summarization System using different Extractive Techniques" /><published>2020-08-27T08:50:00+00:00</published><updated>2020-08-27T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/projects/iamsam</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/projects/iamsam/"><![CDATA[<h2 id="authors">AUTHORS</h2>
<p><a href="https://jraymartinez.github.io/">John Ray Martinez</a> (jbm332@drexel.edu), <a href="https://www.linkedin.com/in/jonathan-musni-624773134/">Jonathan Musni</a> (jem472@drexel.edu), <a href="https://www.linkedin.com/in/miggytrinidad/">Juan Miguel Trinidad</a> (jbt46@drexel.edu)</p>

<p><sub> <em>This research was implemented in fulfillment of the requirements for the Information Retrieval Systems course of the Master of Science in Data Science program at the Drexel University College of Computing &amp; Informatics</em> </sub></p>

<h2 id="introduction">INTRODUCTION</h2>
<p>In recent years, there has been a growth in the volume of text data from a variety of sources. This explosion of text data has led to the problem of information overload. Today’s generation, called the ‘Net generation’, learns through multitasking, performs activities simultaneously, and has a short attention span. The ‘Net generation’ can perform more tasks simultaneously and shift their attention quickly from one task to another, but would probably be overwhelmed if asked to read a long report. Thus, more educators motivate them to engage with the learning content by supplying shorter content in the curricula [<a href="#ref1">1</a>]. To alleviate information overload, and considering this characteristic of the ‘Net generation’, automatic text summarization is deemed necessary.</p>

<p>One tool for text summarization is the Python package sumy [<a href="#ref2">2</a>]. Its three most notably used models are LSA (latent semantic analysis), LexRank, and Luhn. LSA is an unsupervised method of summarization that combines term frequency techniques with singular value decomposition to summarize texts. Also an unsupervised approach, LexRank is a graph-based text summarizer inspired by the PageRank algorithm. Meanwhile, Luhn is a naive approach based on TF-IDF. It scores sentences based on the frequency of the most important words and also assigns higher weights to sentences occurring near the beginning of a document [<a href="#ref3">3</a>]. In this study, we investigate and evaluate the application of the sumy models to the extractive summarization task using news articles and show that the results obtained with LSA are competitive with those of the other two algorithms.</p>

<p>Furthermore, utilizing the sumy extractive summarization techniques, we build and deploy a web application on Heroku that mainly functions as a text summarizer.</p>

<h2 id="experiments">EXPERIMENTS</h2>

<h3 id="data-description">DATA DESCRIPTION</h3>
<p>The dataset comprises approximately 2225 documents from the BBC news website, covering five topical areas: business, entertainment, politics, sport, and technology [<a href="#ref4">4</a>]. The portion used for extractive text summarization contains 510 BBC business news articles from 2004 to 2005. For each article, one summary is provided in the Summaries folder. In this study, the first 100 pairs of business news articles and their corresponding reference summaries were manually selected and used. These extractive summaries serve as the reference summaries (gold standard) for evaluating the system summaries using ROUGE.</p>

<h3 id="methodology">METHODOLOGY</h3>
<p><img src="https://jraymartinez.github.io/portfolio/assets/images/iamsam_algo.png" alt="'Algorithm 1'" style="float: left;margin-right: 7px;margin-top: 7px;" /></p>

<p>We applied the three sumy methods to the sampled business news articles. Each algorithm extracts six sentences from each article to compose the summary. We performed an experimental comparison of the three extractive summarization techniques. The performance of each summarization technique was evaluated using variants of the ROUGE measure [<a href="#ref5">5</a>]. This performance metric is based on N-gram statistics and has been found to be highly correlated with human evaluations [<a href="#ref6">6</a>]. Concretely, we use ROUGE-N with unigrams and bigrams (ROUGE-1 and ROUGE-2) as well as ROUGE-L. Each ROUGE variant has corresponding F1, precision, and recall scores. First, the value of the evaluation measure was calculated for each article. Next, we averaged those scores to arrive at consolidated Recall and F1 scores for each ROUGE variant. Algorithm 1 shows the pseudo-code of the method implemented in this study.</p>
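<p>A minimal sketch of this loop is shown below, assuming the <code>sumy</code> and <code>rouge-score</code> packages; the original implementation may have used a different ROUGE package, and the variable names are illustrative.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of the evaluation loop: summarize each article with the three sumy
# models and average ROUGE recall against the BBC reference summaries.
# Assumes the `sumy` and `rouge-score` packages (the study's exact ROUGE
# implementation may differ).
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer
from sumy.summarizers.luhn import LuhnSummarizer
from rouge_score import rouge_scorer

SENTENCES = 6
summarizers = {"LSA": LsaSummarizer(), "LexRank": LexRankSummarizer(), "Luhn": LuhnSummarizer()}
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def summarize(text, summarizer):
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    return " ".join(str(s) for s in summarizer(parser.document, SENTENCES))

def average_recall(articles, references, summarizer):
    """Average ROUGE-1/2/L recall of one summarizer over article/reference pairs."""
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for article, reference in zip(articles, references):
        scores = scorer.score(reference, summarize(article, summarizer))
        for key in totals:
            totals[key] += scores[key].recall
    return {key: value / len(articles) for key, value in totals.items()}

# `articles` and `references` are assumed lists of the 100 BBC business article
# texts and their gold-standard summaries.
# for name, s in summarizers.items():
#     print(name, average_recall(articles, references, s))
</code></pre></div></div>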

<h2 id="experimental-results-and-discussion">EXPERIMENTAL RESULTS AND DISCUSSION</h2>
<p>We evaluate the three summarization techniques on a single-document summarization task using 100 news articles from the business section of the BBC dataset. For this task to have a meaningful evaluation, we report ROUGE Recall as the standard evaluation and take output length into account [<a href="#ref7">7</a>]. For each article, each summarizer generates a six-sentence summary. The corresponding 100 human-created reference summaries are provided by BBC and used in the evaluation process. We compare the performance of the three different summarization techniques with each other.</p>

<p>Table 1 shows the results obtained on this dataset of 100 news articles for LSA and for the other two sumy summarizers in the single-document summarization task. The LSA summarization technique performs best on the news articles, followed by sumy-LexRank and then sumy-Luhn.</p>

<p><a id="table1"></a></p>
<h4 id="table-1-the-average-recall-f1-of-test-set-results-on-the-bbc-business-news-articles-dataset-using-granularity-of-text-metrics-rouge-1-rouge-2-and-rouge-l">Table 1. The average Recall (F1) of test set results on the BBC business news articles dataset using granularity of text metrics ROUGE-1, ROUGE-2 and ROUGE-L.</h4>
<table>
<thead>
<tr>
<th>Model</th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>LSA</code></td>
<td>0.867 (0.059)</td>
<td>0.617 (0.041)</td>
<td>0.841 (0.082)</td>
</tr>
<tr>
<td><code>Luhn</code></td>
<td>0.794 (0.052)</td>
<td>0.407 (0.031)</td>
<td>0.612 (0.045)</td>
</tr>
<tr>
<td><code>LexRank</code></td>
<td>0.844 (0.072)</td>
<td>0.576 (0.049)</td>
<td>0.807 (0.096)</td>
</tr>
</tbody>
</table>

<p>Figure 1 visualizes the comparison of the models using ROUGE Recall as the performance metric. As shown, LSA has the best performance in the extractive summarization task on business news articles.</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/iamsam_fig1.PNG" alt="'ROUGE'" />
Figure 1: ROUGE performance of algorithms.</p>

<h2 id="implemented-system">IMPLEMENTED SYSTEM</h2>
<p>In this section, we present the overall architecture of the implemented system and discuss its major features.</p>

<h3 id="system-architecture">SYSTEM ARCHITECTURE</h3>
<p>The overall architecture of the web application for single-document summarization of news articles using the sumy models LSA, Luhn, and LexRank is shown in Figure 2. The three main phases are the back-end, the front-end, and deployment. To create the web application, we utilized Flask, a micro web framework written in Python. To make the layout look good, we styled it with Bootstrap. Finally, we deployed the models on Heroku. Figure 3 presents a detailed diagram of the system features.</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/iamsam_fig2.PNG" alt="'Architecture overview'" />
Figure 2: System architecture overview.</p>
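<p>A minimal sketch of the Flask back-end is shown below. The route name, form fields, and target-length handling are illustrative assumptions, not the deployed app’s actual code.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal Flask back-end sketch (route name and form fields are assumptions).
from flask import Flask, request

from sumy.parsers.plaintext import PlaintextParser
from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

app = Flask(__name__)
TOKENIZER = Tokenizer("english")

@app.route("/summarize", methods=["POST"])
def summarize():
    # Inputs: plain text or a URL, plus the target length in sentences
    text = request.form.get("text", "")
    url = request.form.get("url", "")
    length = int(request.form.get("length", 6))

    if url:
        parser = HtmlParser.from_url(url, TOKENIZER)            # URL input
    else:
        parser = PlaintextParser.from_string(text, TOKENIZER)   # plain-text input

    sentences = LsaSummarizer()(parser.document, length)
    return " ".join(str(s) for s in sentences)

if __name__ == "__main__":
    app.run()  # on Heroku this would instead be launched via a Procfile (e.g., gunicorn)
</code></pre></div></div>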

<h3 id="system-features">SYSTEM FEATURES</h3>
<p>The I AM SAM web app alleviates information overload by distilling important information using machine learning algorithms. It is an assistant that helps users manage their time by providing a text summary in seconds. In this project, we focus on plain text and URLs as inputs. More specifically, we consider the following features:</p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/iamsam_fig3.png" alt="'System features'" />
Figure 3: System Features.</p>

<h4 id="target-length">Target Length</h4>
<p>Target length is the number of sentences in the text summary. This feature lets the user specify the preferred length of the summary in terms of the number of sentences in the output.</p>

<h4 id="inputs">Inputs</h4>
<p>There are two possible inputs: a plain text article or a URL that contains the article. Figure 3 visualizes the flow diagram for these two options.</p>

<h4 id="check-mode">Check Mode</h4>
<p>The ‘check mode’ feature gives the user the option to supply a reference summary. There is an ON and OFF toggle for this feature. It allows the user to check how good the generated summaries are with respect to the reference summary. It also shows the calculated ROUGE (F1, precision, recall) scores for each model’s summary.</p>

<h4 id="best-summary-generator">Best Summary Generator</h4>
<p>Once check mode is ON, the system compares the summary output of each algorithm and reports the best model based on the ROUGE Recall metric.</p>
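<p>A small sketch of this selection step is shown below; it reuses the <code>summarizers</code>, <code>summarize</code>, and <code>scorer</code> objects from the methodology sketch and picks the model with the highest ROUGE-1 recall (the exact ROUGE variant and tie-breaking used in the app are assumptions).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of the best-summary selection: score each model's summary against the
# user's reference summary and return the model with the highest ROUGE-1 recall.
# Reuses `summarizers`, `summarize`, and `scorer` from the methodology sketch.
def best_summary(text, reference):
    results = {}
    for name, summarizer in summarizers.items():
        candidate = summarize(text, summarizer)
        recall = scorer.score(reference, candidate)["rouge1"].recall
        results[name] = (recall, candidate)
    best_model = max(results, key=lambda name: results[name][0])
    return best_model, results[best_model][1]
</code></pre></div></div>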

<h2 id="conclusion-and-future-works">CONCLUSION AND FUTURE WORKS</h2>
<p>We showed that the Python package sumy’s implementation of LSA (latent semantic analysis) outperforms the other models, LexRank and Luhn, in the extractive summarization task on BBC business news articles. Furthermore, we implemented an automatic text summarization system called I AM SAM, deployed on Heroku, that can summarize a news article from a URL or plain text using the three sumy extractive summarization techniques.</p>

<p>As future work, we plan to extend the averaging evaluation to all 510 articles in the business news folder. In addition, we will explore the other news categories such as entertainment, politics, sport, and technology.</p>

<h2 id="references">REFERENCES</h2>
<p>[1] <a id="ref1"></a>D. G. Oblinger and J. L. Oblinger. 2005. In Educating the
net generation. Educause. Retrieved August, 19, 2020 from
<a href="https://www.educause.edu/ir/library/PDF/pub7101.PDF">https://www.educause.edu/ir/library/PDF/pub7101.PDF</a>.</p>

<p>[2] <a id="ref2"></a>Mišo Belica. 2020. Module for automatic summarization of text
documents and HTML pages. <a href="https://github.com/miso-belica/sumy">https://github.com/miso-belica/sumy</a>.</p>

<p>[3] <a id="ref3"></a>Mišo Belica. 2020. Summarization methods. <a href="https://github.com/miso-belica/sumy/blob/master/docs/summarizators.md">https://github.com/miso-belica/sumy/blob/master/docs/summarizators.md</a>.</p>

<p>[4] <a id="ref4"></a>Derek Greene and Pádraig Cunningham. 2006. Practical Solutions to the Problem of Diagonal Dominance in Kernel Document
Clustering. In Proc. 23rd International Conference on Machine
learning (ICML’06). ACM Press, 377–384.</p>

<p>[5] <a id="ref5"></a>C.Y. Lin. 2004. ROUGE: A Package for Automatic Evaluation of
Summaries. In In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. Association for Computational
Linguistics: Barcelona, Spain, 74–81.</p>

<p>[6] <a id="ref6"></a>C.Y Lin and E.H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In In Proceedings of
Human Language Technology Conference (HLT-NAACL 2003).
Association for Computational Linguistics: Edmonton, Canada.</p>

<p>[7] <a id="ref7"></a> Benjamin Van Durme Courtney Napoles and ChrisCallison-Burch. 2011. Evaluating sentence com-pression: Pitfalls and suggested
remedies. In Proceedings of the Workshop on Monolingual Text-To-Text Generation. Association for Computational Linguistics: Portland, Oregon, 91–97.</p>]]></content><author><name>John Ray Martinez</name></author><category term="projects" /><category term="automatic text summarization" /><category term="extract" /><category term="inormation retrieval" /><category term="natural language processing" /><category term="SUMY" /><summary type="html"><![CDATA[We implemented an automatic text summarization system that has capability to summarize news article from a plan text or URL.]]></summary></entry><entry><title type="html">Identifying Co-occurrence Based on Hours Played for Video Games</title><link href="https://jraymartinez.github.io/portfolio/projects/videogames/" rel="alternate" type="text/html" title="Identifying Co-occurrence Based on Hours Played for Video Games" /><published>2020-06-12T08:50:00+00:00</published><updated>2020-06-12T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/projects/videogames</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/projects/videogames/"><![CDATA[<h2 id="authors">AUTHORS</h2>
<p><a href="https://jraymartinez.github.io/">John Ray Martinez</a> (jbm332@drexel.edu), <a href="https://www.linkedin.com/in/jonathan-musni-624773134/">Jonathan Musni</a> (jem472@drexel.edu), <a href="https://www.linkedin.com/in/marvin-joseph-occeno-8b4a95120/">Marvin Joseph Occeno</a> (mr048@drexel.edu)</p>

<p><sub> <em>This research was implemented in fulfillment of the requirements for the Data Mining course of the Master of Science in Data Science program at the Drexel University College of Computing &amp; Informatics</em> </sub></p>

<h2 id="introduction">INTRODUCTION</h2>
<p>Playing video games has always been a popular leisure activity. Recently, in light of the pandemic, people have actually been encouraged to play video games to ensure that they stay at home [<a href="#ref1">1</a>]. To keep gamers playing, recommendation engines are utilized by several online video game stores. Players receive various game suggestions which are usually based on, but not limited to, their gaming history [<a href="#ref2">2</a>]. In this project, we create a game-based recommender system using association rules mining with respect to the video games that were frequently played together. The objectives of this study are to: i) identify the most played video games; ii) identify the frequently co-occurring video games; and iii) provide recommendations based on correlated video games.</p>

<h2 id="data-description">DATA DESCRIPTION</h2>
<p>Steam, the largest digital distribution platform for PC gaming, has 6000 games and a community of millions of gamers. One study shows that searchability is one of the reasons why Steam is growing so rapidly [<a href="#ref3">3</a>]. Moreover, it experienced explosive growth in 2018. The platform has attracted a lot of companies that source data from it. Tamber, an analytics service company, manually crawled the data from the Steam API in 2017.</p>

<p>As per the Kaggle documentation [<a href="#ref4">4</a>], the dataset, which is approximately nine megabytes, contains the following columns:</p>

<p><a id="table1"></a></p>
<h4 id="table-1-sample-filtered-dataset">Table 1. Sample Filtered Dataset.</h4>
<table>
<thead>
<tr>
<th>User Id</th>
<th>Games Played</th>
<th>Number of Hours Played</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>298950</code></td>
<td>ARK Survival Evolved</td>
<td>41.0</td>
</tr>
<tr>
<td><code>76767</code></td>
<td>Call of Duty Modern Warfare 2</td>
<td>65.0</td>
</tr>
<tr>
<td><code>76767</code></td>
<td>Banished</td>
<td>24.0</td>
</tr>
<tr>
<td><code>229911</code></td>
<td>Call of Duty Modern Warfare 2</td>
<td>44.0</td>
</tr>
<tr>
<td><code>86540</code></td>
<td>Audiosurf</td>
<td>57.0</td>
</tr>
</tbody>
</table>

<h2 id="methodology">METHODOLOGY</h2>
<p>We transformed the dataset into a matrix of 1s and 0s. The columns of the new dataframe represent the video games, whereas the rows represent the players. A table cell is set to 1 if a user has played the game for at least the median number of hours played; otherwise, its value is 0, as shown in Table 2. We utilized the Python library MLxtend to apply the Apriori algorithm and determine the frequent itemsets. The library also generated association rules from these itemsets, listing pattern evaluation metrics such as support, confidence, and lift. Based on this discretization, we generated association rules and built a recommender system, as sketched below.</p>
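<p>A minimal sketch of this pipeline with pandas and MLxtend is shown below. The file name and the columns <code>user_id</code>, <code>game</code>, and <code>hours</code> are assumptions for illustration, and the per-game median threshold is one reading of the description above (a single global median is an alternative).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of the methodology: pivot plays into a user-by-game matrix, binarize
# against the median hours played, then mine itemsets and rules with MLxtend.
# Column names ('user_id', 'game', 'hours') and the per-game median threshold
# are assumptions for illustration.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

plays = pd.read_csv("steam_play_hours.csv", names=["user_id", "game", "hours"])

# User-by-game matrix of hours played (0 where a game was never played)
hours = plays.pivot_table(index="user_id", columns="game", values="hours",
                          aggfunc="sum", fill_value=0)

# True if the user played the game for at least that game's median hours
median_hours = plays.groupby("game")["hours"].median()
onehot = hours.ge(median_hours, axis=1)

frequent = apriori(onehot, min_support=0.005, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)
print(rules.sort_values("lift", ascending=False).head())
</code></pre></div></div>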

<p><a id="table2"></a></p>
<h4 id="table-2-sample-transformed-dataset">Table 2. Sample Transformed Dataset.</h4>
<table>
<thead>
<tr>
<th>User Id</th>
<th>ARK Survival Evolved</th>
<th>Audiosurf</th>
<th>Banished</th>
<th>BioShock Infinite</th>
<th>Borderlands 2</th>
<th>Call of Duty Black Ops</th>
<th>Call of Duty Modern Warfare 2</th>
<th>Call of Duty Modern Warfare 2 - Multiplayer</th>
<th>Call of Duty World at War</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>5250</code></td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td><code>76767</code></td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>1.0</td>
<td>1.0</td>
<td>1.0</td>
</tr>
<tr>
<td><code>86540</code></td>
<td>0.0</td>
<td>1.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td><code>229911</code></td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>1.0</td>
<td>0.0</td>
</tr>
<tr>
<td><code>298950</code></td>
<td>1.0</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>1.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
</tr>
</tbody>
</table>

<h3 id="quantitative-association-rules">Quantitative Association Rules</h3>
<p>Here comes the fun part - finding association rules. The first step is to determine frequent itemsets. Since the data is relatively large, we have decided to set the minimum support to 0.005.</p>

<p><a id="table3"></a></p>
<h4 id="table-3-itemsets-with-minimum-support-of-0005">Table 3. Itemsets with minimum support of 0.005.</h4>
<table>
<thead>
<tr>
<th>support</th>
<th>itemsets</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>0.006799</code></td>
<td>(7 Days to Die)</td>
</tr>
<tr>
<td><code>0.009713</code></td>
<td>(APB Reloaded)</td>
</tr>
<tr>
<td><code>0.010962</code></td>
<td>(ARK Survival Evolved)</td>
</tr>
<tr>
<td><code>0.009852</code></td>
<td>(AdVenture Capitalist)</td>
</tr>
<tr>
<td><code>0.014153</code></td>
<td>(Age of Empires II HD Edition)</td>
</tr>
<tr>
<td><code>     ...</code></td>
<td>     ...</td>
</tr>
<tr>
<td><code>0.006660</code></td>
<td>(Left 4 Dead 2, The Elder Scrolls V Skyrim, Te...</td>
</tr>
<tr>
<td><code>0.006244</code></td>
<td>(Unturned, Left 4 Dead 2, Team Fortress 2)</td>
</tr>
<tr>
<td><code>0.007354</code></td>
<td>(Unturned, Robocraft, Team Fortress 2)</td>
</tr>
<tr>
<td><code>0.005828</code></td>
<td>(Unturned, Terraria, Team Fortress 2)</td>
</tr>
<tr>
<td><code>0.006244</code></td>
<td>(Team Fortress 2, Unturned, Garry's Mod, Count...</td>
</tr>
</tbody>
</table>

<p>Based on the 454 itemsets that passed the minimum support (0.005), we determined the most frequent k-itemsets below.</p>

<p><a id="table4"></a></p>
<h4 id="table-4-top-5-frequent-1-itemsets">Table 4. Top 5 Frequent 1-itemsets.</h4>
<table>
<thead>
<tr>
<th>Itemset</th>
<th>Support</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Dota 2</code></td>
<td>0.3368</td>
</tr>
<tr>
<td><code>Team Fortress 2</code></td>
<td>0.1618</td>
</tr>
<tr>
<td><code>Counter-Strike Global Offensive</code></td>
<td>0.0956</td>
</tr>
<tr>
<td><code>Unturned</code></td>
<td>0.0744</td>
</tr>
<tr>
<td><code>Left 4 Dead 2</code></td>
<td>0.0561</td>
</tr>
</tbody>
</table>

<p><a id="table5"></a></p>
<h4 id="table-5-top-5-frequent-2-itemsets">Table 5. Top 5 Frequent 2-itemsets.</h4>
<table>
<thead>
<tr>
<th>Itemset</th>
<th>Support</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Dota 2, Team Fortress 2</code></td>
<td>0.0336</td>
</tr>
<tr>
<td><code>Counter-Strike Global Offensive, Dota 2</code></td>
<td>0.0319</td>
</tr>
<tr>
<td><code>Counter-Strike Global Offensive, Team Fortress 2</code></td>
<td>0.0309</td>
</tr>
<tr>
<td><code>Team Fortress 2, Unturned</code></td>
<td>0.0279</td>
</tr>
<tr>
<td><code>Left 4 Dead 2, Team Fortress 2</code></td>
<td>0.0272</td>
</tr>
</tbody>
</table>

<p><a id="table6"></a></p>
<h4 id="table-6-top-5-frequent-3-itemsets">Table 6. Top 5 Frequent 3-itemsets.</h4>
<table>
<thead>
<tr>
<th>Itemset</th>
<th>Support</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Counter-Strike Global Offensive, Dota 2, Team Fortress 2</code></td>
<td>0.0132</td>
</tr>
<tr>
<td><code>Counter-Strike Global Offensive, Garry's Mod, Team Fortress 2</code></td>
<td>0.0115</td>
</tr>
<tr>
<td><code>Counter-Strike Global Offensive, Team Fortress 2, Unturned</code></td>
<td>0.0115</td>
</tr>
<tr>
<td><code>Garry's Mod, Team Fortress 2, Unturned</code></td>
<td>0.0115</td>
</tr>
<tr>
<td><code>Counter-Strike Global Offensive, Left 4 Dead 2, Team Fortress 2 	</code></td>
<td>0.0115</td>
</tr>
</tbody>
</table>

<p>The most frequent 1-itemset is Dota 2, as shown in Table 4. It dominates the Steam gaming world. The results for the frequent 2-itemsets and 3-itemsets in Table 5 and Table 6, respectively, are not necessarily interesting, since it is quite expected that popular games such as Dota 2 and Team Fortress 2 would co-occur more often than others. Hence, we did some research on relative co-occurrence analysis and found a metric called all-confidence [<a href="#ref5">5</a>], which is given by Equation 1 below.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>               all-confidence(X⇒Y) =  support(X⇒Y) / max(support(X), support(Y))       (1)
</code></pre></div></div>

<p>If the all-confidence is equal to 1, then itemsets X and Y always co-occur relatively. This is equivalent to saying that both confidence(X⇒Y) and confidence(Y⇒X) are equal to 1.</p>

<p>Since MLxtend does not compute the all-confidence metric, we implemented a function (sketched below) and found the frequent 2-itemsets shown in Table 7 in the next section.</p>
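<p>A sketch of how all-confidence can be computed for the frequent 2-itemsets from the apriori output is shown below; the function name and return format are ours, not part of MLxtend’s API.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of an all-confidence computation for frequent 2-itemsets (Equation 1),
# using the `frequent` DataFrame produced by the apriori call sketched earlier.
# The function name and output format are ours, not part of MLxtend.
import pandas as pd

def all_confidence(frequent):
    """support(X and Y) / max(support(X), support(Y)) for every frequent 2-itemset."""
    support = {tuple(sorted(items)): s
               for items, s in zip(frequent["itemsets"], frequent["support"])}
    rows = []
    for items, joint in support.items():
        if len(items) != 2:
            continue
        x, y = items
        denom = max(support[(x,)], support[(y,)])
        rows.append({"itemset": items, "all_confidence": joint / denom})
    return (pd.DataFrame(rows)
              .sort_values("all_confidence", ascending=False)
              .reset_index(drop=True))

top_pairs = all_confidence(frequent)
print(top_pairs.head())
</code></pre></div></div>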

<h2 id="results">RESULTS</h2>

<p>As observed from Table 7, the popular game Dota 2 is nowhere to be found in the Top 5. This is because the frequency of 2-itemsets has been computed on a relative basis (all-confidence).</p>

<p><a id="table7"></a></p>
<h4 id="table-7-top-5-frequent-2-itemsets-based-on-all-confidence">Table 7. Top 5 Frequent 2-itemsets (Based on All-Confidence).</h4>
<table>
<thead>
<tr>
<th>Itemset</th>
<th>All-Confidence</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Half-Life 2 Episode One, Half-Life 2 Episode Two</code></td>
<td>0.6410</td>
</tr>
<tr>
<td><code>Call of Duty Modern Warfare 3, Call of Duty Modern Warfare 3 - Multiplayer</code></td>
<td>0.5755</td>
</tr>
<tr>
<td><code>Call of Duty Modern Warfare 2, Call of Duty Modern Warfare 2 - Multiplayer 	</code></td>
<td>0.5436</td>
</tr>
<tr>
<td><code>Call of Duty Black Ops, Call of Duty Black Ops - Multiplayer</code></td>
<td>0.4912</td>
</tr>
<tr>
<td><code>Total War ROME II - Emperor Edition, Total War SHOGUN 2</code></td>
<td>0.4444</td>
</tr>
</tbody>
</table>

<p>Speaking of all-confidence, this metric seems to be reliable enough for co-occurrence analysis since the results above are quite sensible. For instance, Half-Life 2 Episode One and Half-Life 2 Episode Two are shown to co-occur frequently despite not being popular, as shown in Table 7. Looking at their titles, one is probably a sequel of the other. This means that players are highly interested in completing the game series, since the co-occurrence is relatively high. In addition, Call of Duty Modern Warfare 3 and Call of Duty Modern Warfare 3 - Multiplayer appear to co-occur frequently as well. This makes sense since these games are actually related to each other content-wise, not to mention how similar their titles are. The key difference between these two games is that the former has single-player mechanics while the latter is multiplayer-oriented, requiring interaction with other players. Hence, it seems that many of those who played the single-player campaign also wanted to try the multiplayer mode, and vice versa.</p>

<p>To avoid the limitation of the support-confidence framework (i.e., high support and high confidence could happen by chance), we primarily use the evaluation metric ‘lift’ to find more meaningful associations. The idea is to provide recommendations based on strongly correlated video games.</p>

<p>Sorted by highest ‘lift’, the generated rules are consistent with our earlier results. Half-Life 2 Episode One and Half-Life 2 Episode Two are part of the top list. With a huge lift of 65.07, these two games indeed have a strong, positive correlation. Moreover, it is expected that Call of Duty Modern Warfare 3 and Call of Duty Modern Warfare 3 - Multiplayer are also strongly correlated, with a lift of 41.47.</p>

<p><a id="table8"></a></p>
<h4 id="table-8-sample-of-interesting-rules">Table 8. Sample of Interesting Rules.</h4>
<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>A Support</th>
<th>B Support</th>
<th>support(A⇒B)</th>
<th>confidence(A⇒B)</th>
<th>lift</th>
<th>leverage</th>
<th>conviction</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>(The Elder Scrolls V Skyrim)</code></td>
<td>(Fallout 4)</td>
<td>0.047315</td>
<td>0.011794</td>
<td>0.005273</td>
<td>0.111437</td>
<td>9.448542</td>
<td>0.004715</td>
<td>1.112139</td>
</tr>
<tr>
<td><code>(The Elder Scrolls V Skyrim)</code></td>
<td>(BioShock Infinite)</td>
<td>0.047315</td>
<td>0.014847</td>
<td>0.006244</td>
<td>0.131965</td>
<td>8.888508</td>
<td>0.005541</td>
<td>1.134923</td>
</tr>
</tbody>
</table>

<p>One rule that we find particularly interesting is that The Elder Scrolls V Skyrim and Fallout 4 are highly correlated, with a lift of 9.45, as shown in Table 8. Interestingly, we found that bundles of these two games are currently being sold not only on Steam but also on the PlayStation Store. It seems that Steam and PlayStation are aware that players frequently played these two games together before, which is why these games are recently being sold as a bundle for both <a href="https://store.steampowered.com/bundle/6527/Skyrim_Special_Edition__Fallout_4_GOTY/">PC</a> and <a href="https://store.playstation.com/en-us/product/UP1003-CUSA02557_00-FO4GOTYSSEBUNDLE">PS4</a> console gaming.</p>

<p>From the same table, another interesting rule indicates that The Elder Scrolls V Skyrim and BioShock Infinite are also highly correlated, with a lift of 8.89. We found that a bundle of these two games, for PlayStation 3 this time, is actually being sold on <a href="https://www.amazon.com/Elder-Scrolls-Skyrim-Bioshock-Infinite-PlayStation/dp/B00HV0MNEI">Amazon.com</a>. Again, this means that these two games might really have a strong association, which is why they are being sold as part of a bundle.</p>

<h2 id="discussion-and-future-works">DISCUSSION AND FUTURE WORKS</h2>
<p>In this study, after trying multiple thresholds, we learned that a threshold of 0.5% provides acceptable interpretability of results. We use this support threshold to identify sets of frequently played video games: a game set is deemed frequent if it is observed in at least 0.5% of the video game dataset. Note that a frequent set should have at least one game, while the number of games in the set cannot exceed three.</p>

<p>The result of the adopted approach can be summarized as follows. We show that Dota 2 is the most frequently played game. Although not as frequent as Dota 2, Team Fortress 2 is the second most observed game. The impact of the co-occurrence of these two games is visible in the frequent 2-itemset and 3-itemset results. Considering the observation that popular games will always co-occur more than others, we adopted the strategy of using another interestingness measure, ‘all-confidence’. As a result, popular games no longer dominate the top co-occurrences, and the most common pair of co-occurring games is Half-Life 2 Episode One and Half-Life 2 Episode Two, which are obviously associated with each other as a prequel-sequel pair.</p>

<p>Furthermore, we focused on video game co-occurrence analysis of the dataset to build a video game recommendation system. With the help of the MLxtend library, we built a system that primarily uses the built-in interestingness measure ‘lift’ to recommend associated games. This system is able to identify meaningful associations from our co-occurrence analysis and provide game recommendations to the user. Interestingly, some of the association rules found correspond exactly to game pairs sold as bundles for both PC and PS4 console gaming. However, we are dealing with a huge number of games, which can produce a large number of null transactions, and the built-in interestingness measure ‘lift’ is not a null-invariant measure. This means that ‘lift’ can easily be affected by null transactions, giving the system a chance to provide the user with a bad recommendation.</p>

<p>With this in mind, in the next study we are going to focus on comparing different null-invariant measures such as ‘all-confidence’ and ‘Kulczynski’. Since there are no ready-made functions or modules that implement those measures, we need to create them and integrate them into the recommendation system. It will be interesting to see the impact of null-invariant measures, alongside non-null-invariant ones, on the association rules.</p>

<p>Moreover, another avenue of future research that we would like to explore is the predictive power of individual and co-occurring frequent video games for predicting the category of a game. This idea is based on the fact that some video games are frequent within a specific category of games.</p>

<h2 id="references">REFERENCES</h2>
<p>[1] <a id="ref1"></a>M. Snider, Video games can be a healthy social pastime during coronavirus pandemic, USA Today, March 29, 2020. [Online]. Available: <a href="https://www.usatoday.com/story/tech/gaming/2020/03/28/videogames-whos-prescription-solace-during-coronaviruspandemic/2932976001/">https://www.usatoday.com/story/tech/gaming/2020/03/28/videogames-whos-prescription-solace-during-coronaviruspandemic/2932976001/</a>. [Accessed May 7, 2020].</p>

<p>[2] <a id="ref2"></a>P. Bertens, A. Guitart, P. P. Chen and A. Perianez, A Machine-Learning Item Recommendation System for Video Games, 2018 IEEE Conference on Computational Intelligence and Games (CIG), Maastricht, 2018, pp. 1-4, doi: 10.1109/CIG.2018.8490456.</p>

<p>[3] <a id="ref3"></a>O’Neill, M., Vaziripour, E., Wu, J., Zappala, D.: Condensing steam: distilling the diversity of gamer behavior. In Proceedings of the 2016 Internet Measurement Conference, IMC 2016, pp. 81-95. ACM, New York (2016). <a href="https://doi.org/10.1145/2987443.2987489.">https://doi.org/10.1145/2987443.2987489</a>.</p>

<p>[4] <a id="ref4"></a>Tamber Team. (2017, March). Steam Video Games. Retrieved May 10, 2020 from <a href="https://www.kaggle.com/tamber/steam-video-games/">https://www.kaggle.com/tamber/steam-video-games/</a>.</p>

<p>[5] <a id="ref5"></a>D. J. Prajapati, S. Garg and N. C. Chauhan,” Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment, “ Future Computing and Informatics Journal, pp. 1-12, 2017.</p>]]></content><author><name>John Ray Martinez</name></author><category term="projects" /><category term="recommender system" /><category term="video games" /><category term="association rules" /><summary type="html"><![CDATA[In this project, we create a game-based recommender system using association rules mining with respect to the video games that were frequently played together.]]></summary></entry><entry><title type="html">Applied Data Science with Python</title><link href="https://jraymartinez.github.io/portfolio/certificates/coursera/" rel="alternate" type="text/html" title="Applied Data Science with Python" /><published>2019-02-06T08:50:00+00:00</published><updated>2019-02-06T08:50:00+00:00</updated><id>https://jraymartinez.github.io/portfolio/certificates/coursera</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/certificates/coursera/"><![CDATA[<p><a href="https://umich.edu/"><em>University of Michigan</em></a> (2019).<br /></p>

<p><strong>Description</strong>. John has successfully completed a collection of Data Science courses from the University of Michigan on Coursera through a project jointly organized by <a href="https://pcieerd.dost.gov.ph/">DOST-PCIEERD</a> and <a href="https://coursebank.ph/">moocs.ph</a>. This was a 12-month online DOST-PCIEERD-sponsored Data Science program in cooperation with Coursera and moocs.ph.</p>

<p>The Applied Data Science with Python specialization has the following modules:</p>

<ol>
  <li>Using Python to Access Web Data (download <a href="https://www.coursera.org/account/accomplishments/verify/D2KTSD43RPEN">here</a>)</li>
  <li>Using Databases with Python (download <a href="https://www.coursera.org/account/accomplishments/verify/K2V4P4BY9E7A">here</a>)</li>
  <li>Introduction to Data Science in Python</li>
  <li>Applied Plotting, Charting and Data Representation in Python</li>
  <li>Applied Machine Learning in Python</li>
  <li>Applied Text Mining in Python</li>
  <li>Applied Social Network Analysis in Python</li>
</ol>

<p>The certificate for the collection of courses can be downloaded <a href="https://www.coursera.org/account/accomplishments/specialization/NXKX3LQMKGQ5">here</a>.</p>]]></content><author><name>John Ray Martinez</name></author><category term="certificates" /><category term="coursera" /><category term="python" /><category term="machine learning" /><category term="text mining" /><category term="social network" /><summary type="html"><![CDATA[Has successfully completed a collection of Data Science courses from University of Michigan in Coursera.]]></summary></entry><entry><title type="html">Effect of a Linear Potential on the Temporal Diffraction of Particle in a Box</title><link href="https://jraymartinez.github.io/portfolio/publications/spp/" rel="alternate" type="text/html" title="Effect of a Linear Potential on the Temporal Diffraction of Particle in a Box" /><published>2009-10-30T06:14:36+00:00</published><updated>2009-10-30T06:14:36+00:00</updated><id>https://jraymartinez.github.io/portfolio/publications/spp</id><content type="html" xml:base="https://jraymartinez.github.io/portfolio/publications/spp/"><![CDATA[<p><a href="https://jraymartinez.github.io/portfolio">John Ray Martinez</a> and <a href="http://quant-math.org/wp/">Eric A. Galapon</a><br />
<a href="https://spp-online.org/"><em>Samahang Pisika ng Pilipinas (SPP)</em></a> (2009): ISSN 1656-2666 Volume 6. 17.<br /></p>

<p><img src="https://jraymartinez.github.io/portfolio/assets/images/spp.png" alt="Diffraction in time for state n = 500 of an incident beam at observation point x = 1, with box width L = 1 and f = 1000." /></p>

<p><strong>Abstract.</strong> Diffraction in time of a particle initially confined in a box is studied under a linear potential. Moshinsky’s shutter problem is generalized to include new initial conditions with a linear potential, which show double temporal diffraction for each opposite-moving plane wave, with the occurrence of a reflected wave at later times. Density profiles at transient times and later times are discussed. The twofold Moshinsky diffraction in time for a high-energy state of the particle is also analyzed. The classical limit as the box width L tends to zero is illustrated.</p>

<p>The full paper can be downloaded <a href="https://jraymartinez.github.io/portfolio/assets/docs/jmartinez_spp_2009.pdf">here</a>.</p>]]></content><author><name>John Ray Martinez</name></author><category term="publications" /><category term="temporal diffraction" /><category term="linear potential" /><category term="classical limit" /><summary type="html"><![CDATA[Diffraction in time of a particle initially confined in a box is studied under linear potential.]]></summary></entry></feed>