Monthly Archives: December 2017

Graduate Seminar: eBay Machine Learning Engineer!

eBay Machine Learning Graduate Seminar

Weili Zhang was the first student to join the Analytics Lab @ OU team, the first MS Data Science and Analytics graduate from OU, and will be Dr. Nicholson’s first student to complete a PhD in Industrial & Systems Engineering. He accepted a machine learning position at eBay in San Jose, CA, last year, but is back this week to defend his PhD research on Friday, December 8, and then, at 4:30p, to give a seminar on machine learning at eBay that is open to the public. I expect this to be a fairly casual meeting, and I think Weili will welcome lots of Q&A and discussion.

It is my great pleasure to invite you to attend the seminar if you can: Friday, December 8, 2017 @ 4:30p in the Carson Engineering Center, Room 117 (map below). Also, if you would like to join remotely, you can connect via Zoom:

New Masters 2017!

This week I am very happy to congratulate all of the students completing their Master of Science and PhD degrees.

Several of these students are my advisees, and I am quite proud of their accomplishments. As of today, all of my MSc students have defended their work. And on Friday, my first PhD student will defend his research. I’ll post the results as soon as I have them!

For now, let’s focus on the Analytics Lab’s 2017 new masters!

New Masters and the MSc Research Path

The Master’s thesis path has three major components: (1) successful completion of rigorous graduate coursework; (2) an in-depth research effort, spanning one to two years, in an area of specialization that results in the Master’s thesis (usually a 50 to 100 page manuscript detailing the background of the problem, the complexities of the work, and the results); and (3) the Master’s defense.

The defense is a presentation, given to a committee of faculty members and anyone else present, that summarizes the student’s entire research effort. During the defense, the committee members ask questions about any detail of the work. These questions are aimed at determining whether or not the student truly understands the concepts, methods, and results. They are often open-ended and require critical, on-the-spot reflection about the student’s own work.

Most defenses last 30 minutes to 1 hour, but some exceed 1.5 hours, depending on the questions and the student’s responses. While the process is not ‘grueling’ per se, it is significant.

Successful defenders…

This semester, I am privileged to serve on 8 MS thesis committees and 2 PhD committees for students completing in December. Most of the defenses are taking place this week, so it is a busy week!

I am particularly happy about the successful defenses of the 4 MS students I advise. Congratulations to Yunjie “Nicole” Wen, Gowtham Talluru, Samineh Nayeri, and Pauline Ribeyre!

Yunjie “Nicole” Wen, Master of Science in Data Science and Analytics

Thesis: Game theory application of resilience community road-bridge transportation system

Abstract: “This paper considers the problem of applying game theory to a resilience-based road-bridge transportation network. Bridges in a community may be owned and maintained by separate entities. These owners may have different and even competing objectives for recovering the transportation system after a disaster. In this work, we assume that each player attempts to maximize the efficiency of repairs to the system from the perspective of their own damaged bridges after a hazard. The problem is modeled as an N-player nonzero-sum game. Strategic-form and sequential-form games are designed to demonstrate the methodology. A genetic algorithm is applied to compute solutions to the problem. The transportation network of Shelby County, TN is used to demonstrate the proposed methodology.”

Nicole will be continuing her academic career by pursuing a PhD in Industrial and Systems Engineering at the University of Oklahoma.

Gowtham Talluru, Master of Science in Data Science and Analytics

Thesis: Dynamic Uplift Modelling

Abstract: “A new approach to uplift modelling that considers the time-dependent behavior of customers is analyzed. Uplift modelling (also known as true lift or incremental modelling) has applications in marketing, insurance, banking, and personalized medicine, among other fields. The objective of an uplift model is to identify the individual entities who should be targeted for treatment (e.g., a marketing campaign) to maximize the overall incremental impact.

Research to date has treated this as a static problem modelled at a single point in time. The method introduced in this work models uplift in a dynamic environment. In particular, I consider a series of direct marketing contacts and simulate the periodic purchasing behavior of customers. In contrast to static uplift models, the uplift in a customer’s purchase probability depends on time as well as on the customer’s previous purchases and the offers received. Appropriate modifications are made to static modelling approaches to adapt them to a dynamic setting.

This study demonstrates significant potential for both researchers and retail companies in thinking about the problem of uplift longitudinally.”
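For readers new to the idea, the static notion of uplift described in the abstract can be sketched as the difference in purchase rates between treated and control customers, estimated separately for each customer segment. This is a minimal illustration only, assuming simple segment-level counts; it is not the dynamic method developed in the thesis, and the function names, segments, and data below are hypothetical.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate static uplift per customer segment.

    records: iterable of (segment, treated, purchased) tuples, where
    treated and purchased are booleans.
    Returns {segment: P(purchase | treated) - P(purchase | control)}.
    """
    # counts[segment][treated] holds [purchases, customers] for that group
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, purchased in records:
        group = counts[segment][bool(treated)]
        group[0] += int(purchased)
        group[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (bt, nt), (bc, nc) = groups[True], groups[False]
        if nt == 0 or nc == 0:
            continue  # uplift is undefined without both treated and control customers
        uplift[segment] = bt / nt - bc / nc
    return uplift


def targets(uplift, threshold=0.0):
    """Segments whose estimated uplift exceeds the threshold, highest first."""
    return sorted((s for s, u in uplift.items() if u > threshold),
                  key=lambda s: uplift[s], reverse=True)


# Hypothetical example data: (segment, received offer?, purchased?)
customers = [
    ("A", True, True), ("A", True, True), ("A", True, False),
    ("A", False, False), ("A", False, True),
    ("B", True, True), ("B", False, True),
    ("C", True, False), ("C", False, True),
]
estimates = segment_uplift(customers)
print(targets(estimates))  # → ['A']
```

Here segment A buys more when contacted (2/3 versus 1/2), segment B buys either way, and segment C buys less when contacted, so only A is worth targeting. A dynamic model, as the abstract explains, would additionally let these estimates depend on time and on each customer's purchase and offer history.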

Gowtham has accepted a prestigious job in data science with PricewaterhouseCoopers (PwC) in the Oil and Gas sector of their business.

Samineh Nayeri, Master of Science in Industrial & Systems Engineering

Thesis: Decomposition algorithm in fixed charge time-space network flow problems

Abstract: “A wide range of network flow problems, primarily used in transportation, are categorized as time-space fixed charge network flow problems. In this family of networks, each node is associated with a specific time and is replicated across all time periods. The cost structure in these problems consists of variable and fixed costs, so continuous and binary variables are required to formulate the problem as a mixed integer linear program, and the problem is known to be NP-hard. When the time dimension is added to the problem, solution approaches become even more time-consuming and CPU- and memory-intensive.

In this work, a decomposition heuristic is proposed that subdivides the problem into time epochs to create smaller, more manageable subproblems. These subproblems are solved sequentially to find an overall solution to the original problem. To evaluate the capability and efficiency of the decomposition method versus the exact method, a total of 1600 problems are generated and solved using the Gurobi MIP solver, which runs a parallel branch-and-bound algorithm. Statistical analysis indicates that, depending on the problem specification, the average solution time of the decomposition is improved by more than four orders of magnitude and the solutions found are of high quality (<2.5% from optimal, on average).”

Pauline Ribeyre, Master of Science in Industrial & Systems Engineering

Thesis: Finding key characteristics of promising drug compounds for anticancer drug discovery

Abstract: “Multidrug resistance is the simultaneous resistance to two or more chemically unrelated therapeutics, including some therapeutics the cell has never been exposed to. It is one of the biggest obstacles to effective cancer chemotherapy. Multidrug resistance can be caused by drug efflux, an otherwise useful mechanism that uses proteins called transporters to prevent excessively high drug concentrations in cells. Some chemical compounds can sensitize cells to drugs by disabling these transporters. The focus of this work is to find key characteristics of compounds that may disable a specific transporter, P-glycoprotein. Three datasets listing compounds, their values for different features, and their ability to disable the transporters are provided by experts. Using the R programming language, various data analytics methods are applied to these datasets with the objective of predicting whether or not compounds are P-glycoprotein inhibitors. The main issue encountered is that the most important dataset did not contain enough samples for the number of predictor variables. Ultimately, the decision tree and random forest models prove to be the most effective in predicting the compounds’ ability to disable the transporter.”

Congratulations to all the new masters!  May the force be with you.