Author Archives: Charles Nicholson

New pubs in Sustainable and Resilient Infrastructure

The Analytics Lab has two new publications recently accepted in Sustainable and Resilient Infrastructure.

Barker, K., J. Lambert, C. Zobel, A. Tapia, J. Ramirez-Marquez, L. McLay, C. Caragea, C. Nicholson. 2016. Defining Resilience Analytics. Accepted for publication in Sustainable and Resilient Infrastructure on September 1, 2016.

Zhang, W. and C. Nicholson. 2016. A multi-objective optimization model for retrofit strategies to mitigate direct economic loss and population dislocation. Accepted for publication in Sustainable and Resilient Infrastructure on September 19, 2016.


Defining Resilience Analytics

Dr. Nicholson along with Dr. Kash Barker (OU),  Dr. Cornelia Caragea (UNT), Dr. James Lambert (UVA), Dr. Laura McLay (Univ of Wisconsin), Dr. Chris Zobel (Virginia Tech), Dr. Andrea Tapia (Penn State), and Dr. Jose Ramirez-Marquez (Stevens Institute) have collaborated on this perspective article funded by their NSF award.

Abstract: Theory, methodology, and applications of risk analysis contribute to the quantification and management of resilience. For risk analysis, numerous complementary frameworks, guidelines, case studies, etc., are available in the literature. For resilience, the documented applications are sparse relative to numerous untested definitions and concepts. This essay on resilience analytics motivates the methodology, tools, and processes that will achieve resilience of real systems. The paper describes how risk analysts will lead in the modeling, quantification, and management of resilience for a variety of systems subject to future conditions including technologies, economics, environment, health, developing regions, regulations, etc. The paper identifies key gaps where methods innovations are needed, presenting resilience of interdependent infrastructure networks as an example. Descriptive, predictive, and prescriptive analytics are differentiated. A key outcome will be the recognition, adoption, and advancement of resilience analytics by scholars and practitioners of risk analysis.

A multi-objective optimization model for retrofit strategies to mitigate direct economic loss and population dislocation

This work is part of the NIST-funded Center of Excellence in Community Resilience and will be published in a special edition of Sustainable and Resilient Infrastructure focused on some of the initial analysis conducted by the Center.

Abstract: One strategy to mitigate social and economic vulnerabilities of communities to natural disasters is to enhance the current infrastructure underlying the community. Decisions regarding allocation of limited resources to improve infrastructure components are complex and involve various trade-offs. In this study, an efficient multi-objective optimization model is proposed to support decisions regarding building retrofits within a community.
In particular, given a limited budget and a heterogeneous commercial and residential building stock, solutions to the proposed model allow a detailed analysis of the trade-offs between direct economic loss and the competing objective of minimizing immediate population dislocation. The developed mathematical model is informed by earthquake simulation modeling as well as population dislocation modeling from the field of social science. The model is applied to the well-developed virtual city, Centerville, designed collaboratively by a team of engineering experts, economists, and social scientists. Multiple Pareto optimal solutions are computed in the case study and a detailed analysis regarding the various decision strategies is provided.
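To make the idea of "Pareto optimal solutions" concrete, here is a minimal sketch of how non-dominated retrofit plans could be filtered from a list of candidates when both objectives are minimized. The candidate plans and their numbers are hypothetical and invented purely for illustration; they are not results from the paper, and this filter is not the optimization model the paper proposes.

```python
from typing import List, Tuple

# Each hypothetical plan is (direct economic loss, population dislocation).
# Values are made up for illustration only.
Plan = Tuple[float, float]


def dominates(a: Plan, b: Plan) -> bool:
    """True if plan a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates (both objectives minimized)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


candidate_plans = [
    (10.0, 500.0),
    (12.0, 300.0),
    (15.0, 320.0),
    (9.0, 700.0),
    (12.0, 450.0),
]

front = sorted(pareto_front(candidate_plans))
print(front)
```

Each plan left on the front represents a different trade-off: lowering economic loss further is only possible by accepting more dislocation, and vice versa.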


Sustainable and Resilient Infrastructure is an interdisciplinary journal that focuses on the sustainable development of resilient communities.

Sustainability is defined in relation to the ability of infrastructure to address the needs of the present without sacrificing the ability of future generations to meet their needs.  Resilience is considered in relation to both natural hazards (like earthquakes, tsunami, hurricanes, cyclones, tornado, flooding and drought) and anthropogenic hazards (like human errors and malevolent attacks.)  Resilience is taken to depend both on the performance of the built and modified natural environment and on the contextual characteristics of social, economic and political institutions. Sustainability and resilience are considered both for physical and non-physical infrastructure.

Contributions address pressing societal issues while exploring needed solutions.  Investigating sustainability and resilience from an interdisciplinary perspective, the journal includes original articles, reviews, short communications and case studies in all areas relevant to sustainability and resilience.

 

Open Faculty Position: Cyber-Physical-Social Systems

Open Faculty Position in ISE

The School of Industrial and Systems Engineering at the University of Oklahoma is recruiting to fill an open tenure-track faculty position to begin in August 2017.  The position should help further our existing core research efforts in Cyber-Physical-Social systems in particular as it relates to the broad field of resilience.  The Analytics Lab @ OU is actively engaged in research in regards to both community resilience and critical resilient interdependent infrastructure systems and processes.

This position will also help support the Data Science and Analytics graduate program in the College of Engineering.  The full position announcement is available as a PDF in the link below.  Here is an excerpt of the position description.  Applicants are encouraged to apply by November 1, 2016.

Open Faculty Position

The full position announcement can be found here: ISE Faculty Open Position

The University of Oklahoma is a Carnegie-R1 comprehensive public research university known for excellence in teaching, research, and community engagement, serving the educational, cultural, economic and health-care needs of the state, region, and nation from three campuses: the main campus in Norman, the Health Sciences Center in Oklahoma City, and the Schusterman Center in Tulsa.

OU enrolls over 30,000 students and has more than 2,700 full-time faculty members. Norman is a culturally rich and vibrant town located in the Oklahoma City metro area. With outstanding schools, amenities, and a low cost of living, Norman is a perennial contender on the “Best Places to Live” rankings.

The University of Oklahoma, in compliance with all applicable federal and state laws and regulations, does not discriminate on the basis of race, color, national origin, sex, sexual orientation, genetic information, gender identity, gender expression, age, religion, disability, political beliefs, or status as a veteran in any of its policies, practices, or procedures. The University of Oklahoma, recognizing its obligation to guarantee equal opportunity to all persons in all segments of University life, reaffirms
its commitment to the continuation and expansion of positive programs which reinforce and strengthen its affirmative action policies. This commitment stems not only from compliance with federal and state equal opportunity laws but from a desire to ensure social justice and promote campus diversity. Our commitment to the concept of affirmative action requires sincere and cooperative efforts throughout all levels of our employment structure. We will continue to strive to reach the goals of fair and equal employment opportunities for all.

Big data: what is it?

What is Big Data?

big data knows everything

Big data is a term that you often hear when people talk about data science and analytics.

So, the question is, “what is big data?”

Doug Laney from Gartner, a leading information technology research company, defined 3 dimensions of “big data”: volume, velocity, and variety.

  • Volume denotes the size and scale of the data.   There is a lot of data out there – and it is growing. It is estimated that 40 zettabytes of data will be created by the year 2020.   What is a zettabyte?  One zettabyte is equal to 1 trillion gigabytes!
  • Velocity is the speed at which data is created as well as the increasing speed at which it is processed.  The speed at which data is created is almost unimaginable. And it is accelerating. I’ll give some examples, but by the time you read this they will be out of date: Google is processing about 3.5 billion search queries every day; every minute we are uploading 300 hours of video onto YouTube; and 3.4 million emails are sent every second.  Check out this site for more up-to-date information: http://www.internetlivestats.com
  • Variety of the data refers to the fact that data comes from many sources and in many forms.  Whether it is Facebook posts, video uploads, satellite images, GIS data, reviews on products from Amazon.com, sensor data from self-driving cars, or data from wearable devices and wireless health monitors – data is coming at us from all directions and in many formats.
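For a sense of scale, the figures quoted in the volume and velocity bullets can be converted to other units with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal (SI) units for gigabytes and zettabytes and simply reuses the rates mentioned above, which will be out of date.

```python
# Decimal (SI) definitions; binary units (GiB, ZiB) would give slightly different ratios.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21

gigabytes_per_zettabyte = BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Rates quoted in the post, converted to a different time scale.
searches_per_second = 3.5e9 / SECONDS_PER_DAY   # from 3.5 billion searches per day
video_hours_per_day = 300 * MINUTES_PER_DAY     # from 300 hours uploaded per minute
emails_per_day = 3.4e6 * SECONDS_PER_DAY        # from 3.4 million emails per second

print(f"Gigabytes in one zettabyte: {gigabytes_per_zettabyte:,}")
print(f"Searches per second: {searches_per_second:,.0f}")
print(f"Hours of video uploaded per day: {video_hours_per_day:,}")
print(f"Emails per day: {emails_per_day:,.0f}")
```

In other words, the quoted rates work out to roughly 40,500 searches every second, 432,000 hours of video every day, and close to 294 billion emails every day.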

People love alliteration…

Everyone seems to want to add more “V’s” to the definition of big data, so now we have the 4 V’s of big data, the 5 V’s, the 6 V’s, and even the 7 V’s of big data…

Batman says: Only 3 V's of big data!

Let’s look at these next four V’s: Veracity, Variability, Visualization, and Value.   I’d like to add, however, that these next dimensions are not unique to “big” data, but represent challenges to data of basically any size.  Now, I should mention that Doug Laney did not necessarily like the addition of the new V’s to his working description of “big data”.

  • The first one, added by IBM, is “veracity” – that is, the accuracy, truthfulness, or trustworthiness of the data.  IBM found that 1 in 3 business leaders didn’t trust the information that they use to make decisions, and additionally that “poor data quality costs the US economy an estimated 3.1 trillion dollars a year.”

big data and veracity

  • Variability implies that the meaning of the data is changing.  A number, variable, or rule might have had a certain definition last month; but now it has changed.  This also might relate, for example, to how words have different meanings in different contexts.  One especially difficult challenge in the field of natural language processing is how to detect and interpret sarcasm.  The same word used in one phrase may have the exact opposite meaning when used in a different phrase.

 

  • Visualization is associated with the challenge of understanding what is really in your data.  This includes visualizing and communicating the interesting facets of the data and turning all of this into something comprehensible, which is not easy.

big data dashboard

  • Finally, the last V – value.  Data by itself has no real value.   Having lots of it, without meaning, doesn’t do anyone any good. Individual observations, transactions, records, and entities in the data mean very little on their own.  It is only through aggregation and analysis that we can find anything worthwhile.   But, there is so much of it, there is an enormous potential!  As a shameless plug, turning big data or small data or anything in between into value – well, that’s the purpose of the ISE/DSA 5103 Intelligent Data Analytics course that I teach.

Now what?

I like how Joel Gurin, author of Open Data Now, defines big data: “Big data describes datasets that are so large, complex, or rapidly changing that they push the very limits of our analytical capability.  It’s a subjective term: What seems ‘big’ today may seem modest in a few years when our analytic capacity has improved.”


What was big data yesterday may not be big data now, and what is “big” now may not be considered “big” tomorrow.  However, what is consistent in this field is the need for us to expand our analytical talents and technology.  This (again, shameless plug) is what the MS Data Science and Analytics program at OU is all about!  Joel Gurin goes on to say that what’s really important is not so much the size of the data, but the “big impact” that it can have on society, health, economy, and research.

next level of big data

Fall 2016 Classes

Why are open source statistical programming languages the best?
Because they R.

It is August and Fall 2016 classes begin in just a couple of days.  I am currently prepping for two large classes: I am happy to see the incredible interest in my graduate course with over 50 students enrolled in ISE/DSA 5103 Intelligent Data Analytics! I will also be taking over Dr. Suleyman Karabuk’s ISE 4113 Decision Support Systems undergraduate course with nearly 80 students already enrolled!

To this end I am collecting as many new jokes and one-liners as possible (gotta keep the material fresh).  That said, to those of you who have yet to take any of my courses: my jokes are really not that funny; however, I do expect all students to laugh regardless.  This is a price that must be paid.  If you have any jokes, puns, etc. that are short, clean, related to statistics or data science, and optionally funny, please send them my way: cnicholson @ ou (dot) edu.

To support these two courses I have tricked two unassuming graduate students into becoming TA’s for me.  Sai Krishna Theja Bhavaraju has enthusiastically accepted the role of TA for ISE 4113 and Alex Rodriguez will be the TA for ISE 5103.  Both of these TA’s are bright, friendly, and very helpful.  If you are taking either of these two classes, please feel free to ask them for help.  If you are not taking these classes, but you stumble across either of these two gentlemen, please buy them a beer; they have their work cut out for them!

Fall 2016 Classes

Intelligent Data Analytics is not an easy course.  The homeworks and projects are notoriously challenging.  In the class we address real-world data intensive problems by integrating human intuition with data analysis tools to draw out and communicate meaningful insights. Topics include problem approach and framing, data cleansing, exploratory analysis and visualization, dimension reduction, linear and logistic regression, decision trees, and clustering.  Students will be introduced to a powerful open source statistical programming language (R) and work on hands-on, applied data analysis projects.  I have heard from several former students that this has been a hard but useful course; at least six students that I know of who have taken this course have obtained jobs in analytics and data science fields at companies including Deloitte Consulting, Visual BI, GE Global Research, Nerd Kingdom, OKC Thunder, and Standard & Poor’s.  Hopefully the skills you are introduced to in the class can be helpful to you in the future.

ISE 4113 is a Decision Support Systems course that exploits advanced features of MS Excel 2013 to model and build decision support applications.  The course will start with the basics and quickly move into mathematical modeling, simulation, VBA, and GUI design.  While this is the first time for me to teach this course, I have heard from students that the material they learn in this class has made a significant impact in their academic and professional lives.  I hope to continue the track record of success with this course.

 

 

Summer 2016 Hangout

Summer 2016 Hangout

Very happy to see all the students and friends that came out to the Summer 2016 hangout at McNellie’s The Abner Ale House in Norman.  I am privileged to work with a wide variety of students in ISE, DSA, and CEES who are applying research in a broad array of application areas (from Community Resilience to Streaming Clustering in online Gaming to Predictive Modeling for TV Ratings to Optimizing Ship Routing) and who represent many different cultures, languages, and backgrounds.  Our group includes members from China, India, Iran, Peru, Brazil, as well as Oklahomans and Texans.  My beautiful wife, hailing from Mexico, also came to hang out.

I am glad that this gave you a chance to meet some new colleagues and reconnect with others outside the lab.

Hopefully, all of the MS DSA students (Alex B., Alex R., Alexandra, Emily, Silvia, and Stephen) can support each other through this academically intense Fall semester about to begin!   Silvia and Emily are completing their industry practicums this week as well — so congratulations to them (assuming all goes well!)

We are also happy to welcome Vera Bosco to the group.  Vera is an ISE PhD student who is applying methods of stochastic optimization and dynamic programming to ship routing under weather uncertainty.  She is a new addition to the group and hails from Brazil.  Her bio is now posted on the team page.

And as always, I am glad to hang out with the CEES group who are a part of the CORE lab – Peihui, Mohammad, Yingjun, and Jia.

I hope this opportunity (and more like it to come) will help you connect with your colleagues and co-conspirators in the Analytics Lab. Several students are out-of-town during the Summer, but when everyone is back from their internships and travels we will plan a get-together for the Fall.


 

Two new publications in CAIE

Summer publications!

CAIE-published

We are happy to see two new papers accepted for publication in Computers and Industrial Engineering this Summer!  These publications form a logical pair: one introduces a new perspective that uses statistical learning to help study the Fixed-Charge Network Flow (FCNF) problem, and the other develops a solution technique that hybridizes the new approach with classical techniques to improve solution efficiency.

Zhang, W. and C.D. Nicholson. 2016. Prediction-based relaxation solution approach for the fixed charge network flow problem. Computers & Industrial Engineering, 99:106-111 http://dx.doi.org/10.1016/j.cie.2016.07.014.
Keywords: Network optimization; Fixed charge network flow; Heuristics

Abstract: A new heuristic procedure for the fixed charge network flow problem is proposed. The new method leverages a probabilistic model to create an informed reformulation and relaxation of the FCNF problem. The technique relies on probability estimates that an edge in a graph should be included in an optimal flow solution. These probability estimates, derived from a statistical learning technique, are used to reformulate the problem as a linear program which can be solved efficiently. This method can be used as an independent heuristic for the fixed charge network flow problem or as a primal heuristic. In rigorous testing, the solution quality of the new technique is evaluated and compared to results obtained from a commercial solver software. Testing demonstrates that the novel prediction-based relaxation outperforms linear programming relaxation in solution quality and that as a primal heuristic the method significantly improves the solutions found for large problem instances within a given time limit.
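For readers unfamiliar with the problem, the fixed charge network flow problem is commonly written as the mixed-integer program sketched below. This is the standard textbook form, not a formulation copied from the paper, and the symbols are assumptions chosen here for illustration: a node set N with supplies or demands b_i, an arc set A, and for each arc a variable cost c_ij, a fixed cost f_ij, a capacity u_ij, a flow x_ij, and a binary open/closed decision y_ij.

```latex
\begin{align*}
\min \quad & \sum_{(i,j) \in A} \left( c_{ij} x_{ij} + f_{ij} y_{ij} \right) \\
\text{s.t.} \quad & \sum_{j : (i,j) \in A} x_{ij} - \sum_{j : (j,i) \in A} x_{ji} = b_i, && \forall i \in N, \\
& 0 \le x_{ij} \le u_{ij} y_{ij}, && \forall (i,j) \in A, \\
& y_{ij} \in \{0, 1\}, && \forall (i,j) \in A.
\end{align*}
```

The usual linear programming relaxation, which the abstract uses as its point of comparison, lets each y_ij take any value in [0, 1]. With nonnegative fixed costs an optimal relaxed solution sets y_ij = x_ij / u_ij, so the objective reduces to the linear cost

```latex
\min \sum_{(i,j) \in A} \left( c_{ij} + \frac{f_{ij}}{u_{ij}} \right) x_{ij}
```

subject to the same flow conservation constraints and 0 \le x_{ij} \le u_{ij}. The paper's prediction-based reformulation is a different relaxation and is not reproduced here.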

Nicholson, C.D. and W. Zhang. 2016. Optimal Network Flow: A Predictive Analytics Perspective on the Fixed-Charge Network Flow Problem. Computers & Industrial Engineering, 99:260-268 http://dx.doi.org/10.1016/j.cie.2016.07.030
Keywords: Network analysis, Fixed charge network flow, Predictive modeling, Critical components

Abstract: The fixed charge network flow (FCNF) problem is a classical NP-hard combinatorial problem with widespread applications. To the best of our knowledge, this is the first paper that employs a statistical learning technique to analyze and quantify the effect of various network characteristics relating to the optimal solution of the FCNF problem. In particular, we create a probabilistic classifier based on 18 network related variables to produce a quantitative measure that an arc in the network will have a non-zero flow in an optimal solution. The predictive model achieves 85% cross-validated accuracy. An application employing the predictive model is presented from the perspective of identifying critical network components based on the likelihood of an arc being used in an optimal solution.

TSRI

We have also just had a very good first round review from Sustainable and Resilient Infrastructure on a paper entitled “Defining Resilience Analytics for Interdependent Cyber-Physical-Social Networks” and expect a quick second round of reviews soon.

We have finally had the first round of reviews back from the Journal of Biomedical Informatics on a paper written in 2015 by Leslie Goodwin (MS ISE @ OU), Charles Nicholson (OU), and Corey Clark (SMU) entitled “Variable neighborhood search for reverse engineering of gene regulatory networks”. The first round review is very promising, and we are going to work hard to see this paper published in such a high quality journal!

Hopefully this fall we will have 7 more submissions of papers that are close to wrapping up.  These include two papers on data mining, one paper on network heuristics, and four papers relating to advancing the science of resilience.

KDD Scholarship 2016, San Francisco, CA

Congratulations Alexander: KDD Scholarship!
KDD Scholarship

Congratulations to Alexander Rodriguez for his recent KDD scholarship award.  The award is for the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining in San Francisco, CA.   Alexander, who joined the Analytics Lab at the beginning of June 2016, won the scholarship through Broadening Participation in Data Mining (BPDM).  The BPDM workshop will be co-located and hosted along with KDD 2016.

KDD 2016 is a premier interdisciplinary conference that brings together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data.  The conference will be held in August with keynote speakers from Microsoft Research (graphons?), Stanford (information security), NEA (investing in machine learning), Google DeepMind (deep learning!), and Berkeley (messy data!).  Additionally, the invited speakers are data scientists, engineers, and researchers from NVIDIA, Verizon, Uber, Netflix, Amazon, and Tencent, with topics that include anything from autonomous cars and large-scale machine learning to profiling users and Bayesian optimization.  Check out more details here: KDD 2016 Workshop.  And a list of the accepted papers is available here: Accepted Papers

The vision of the Broadening Participation in Data Mining group is to foster mentorship, guidance, and connections of minority and underrepresented groups in Data Mining, while also enriching technical aptitude and exposure. BPDM provides venues in which to encourage students from such groups to connect with junior and senior researchers in industry, academia, and government. They hope to create and help grow meaningful lasting connections between researchers, thereby strengthening the Data Mining Community.

This workshop should be of great benefit to Alexander since he has just begun his Master’s in Data Science and Analytics.  Alexander is quite active already and has started work on his first bit of publishable research in resilience optimization.  Congratulations Alexander on the scholarship!  Hope you enjoy the conference and San Francisco, California!

2016 “Spread Your Wings” Adventure Race

Adventure Race!

Too Cool Adventure Race

..a 1500 foot sherpa line, paddling in crystal clear waters, 14+ miles of sweet, sweet single track, mine maze, skeet, water slides and a 3000 foot Zipline are just a few of the things you will find at the Too Cool – Spread Your Wings adventure weekend. It’s summertime and Dr. Nicholson is heading down to Texas Hill Country near Rocksprings, TX at Camp Eagle to participate in the 10th anniversary of one of the funnest, coolest adventure races around!

Nicholson will be competing with one of his best friends from Texas, Nate Simmons, as a two-man team in the 12-hour race which will include 1-3 miles river paddling, 30-40 miles mountain biking on roads, jeep trails and single track, 15-20 miles trekking, orienteering, ropes stuff and special tests.

The team “F5” will start the race at 7:30 AM on Saturday, May 28 and hopefully finish before the cut-off time at 9:00 PM.

Last time this team participated, the 12 hour race turned into about 15 hours and we had to give up after getting completely lost in the middle of the woods in the dark…  however, this time we are even LESS prepared: neither Nicholson nor Simmons remembers how to read a topographic map, plot coordinates, plan a route, or count paces, neither has ridden his mountain bike in a couple of years, and apparently the race area has expanded to include another 5000 acres…  so, this should definitely be an adventure!

Too Cool Adventure Race Map

Adventure Race – topographic map

 

Welcome students!

welcome

Welcome! Bienvenidos!  欢迎! خوش آمدی

Very happy to welcome some new students to the Analytics Lab!  Some of these will be working with Dr. Nicholson for their Master’s thesis, others are working on special studies projects or practicums.  Regardless, those of us in the lab are glad you are around and looking forward to the Summer and Fall ’16 and beyond!

  • Alex Rodríguez – MS thesis — Alex will be working on NIST Community Resilience Project
  • Yanbin Chang – MS thesis — Yanbin will be working on NSF Resilience Analytics Project
  • Samineh Nayeri – special studies (decomposition algorithm for a network flow model)
  • Megan Snelling – MS thesis (I got a great NIST Community resilience project waiting for you!)
  • Alex Beene – practicum (OKC Thunder)
  • Emily Grimes – practicum (Nerd Kingdom)
  • Shejuti Silvia – practicum (GE Oil and Gas)
  • Stephen Gonzalez – MS thesis — I have some good ideas for you!
  • Hamoud Obaid – special studies (economic optimization modeling) –> congrats to Hamoud: he is getting married over the Summer!


Summer 2016 updates!

Summer 2016 is here!


The last of the ISE/DSA 5113 finals has been taken, all pencils are down, and students are heading out for new adventures.

  • Cyril Beyney has just accepted an offer to become the lead data scientist for Nerd Kingdom and is moving to Dallas, TX!
  • Weili Zhang is headed up to Pittsburg for a Summer internship
  • Param Tripathi is heading down to San Antonio for a Summer internship
  • Olivia Perret has already moved up to New York for her new job
  • Naiyu Wang and Peihui Lin (from the CORE Lab) are headed to China for a month or two
  • Mohammad Tehrani is presenting at the Probabilistic Mechanics & Reliability Conference 2016 at Vanderbilt in Nashville, TN and then heading to Iran for a visit

I am very excited about the students staying in or coming to Norman!

  • Alexander Rodríguez Castillo is joining our team from Peru to start his MS in DSA and then a PhD in ISE!
  • Alexandra Amidon is around for the Summer starting her MS thesis work in predictive modeling
  • Emily Grimes is beginning her practicum in analytics working with Nerd Kingdom

Dr. Nicholson will be around most of the Summer, but there are a few things on his agenda too:

  • Starting the Summer off right with a 12-hour adventure race down in Texas: mountain biking, orienteering, and kayaking
  • Taking the kids to Disney World!
  • Climbing at least one 14,000+ foot mountain in Colorado with the wife
  • Panelist at the 41st Annual Natural Hazards Workshop in July in Colorado