Projects

Business and Revenue Improvement

Data Science and Operational Efficiencies

Leadership and Innovation Contributions

Data Monetization

Platforms and Applications for Data Analytics

Streaming Data

Data Governance

Infrastructure for Data Valuation and Management

Data Literacy

Contributions to the Industry

=======================================================

Telemetry Backbone (TBB) (2018 – present).

The Telemetry Backbone’s main purpose is to give data scientists a solid, easily accessible layer over all available and enriched telemetry data.
With the emergence of the Internet of Things (IoT), connections between users and network-connected devices generate large volumes of data. All types of hardware can deliver information about their state and how they are being used, which enables completely new products and services such as “Predictive Maintenance”. However, the amount and structure of the data have changed, as have the delivery mechanisms (from vast amounts provided through batch interfaces to small data sets streamed constantly at high frequency).

These new types of data can be used for all kinds of insights, for example the performance of newly created products, the usage of new products, and the performance of the infrastructure used to create them. To facilitate these use cases, data must be readily available to analysts looking for specific information. The TBB provides: a single location where data scientists can find the data they are seeking without having to deal with different delivery strategies; a platform for easily grasping the structure of all datasets without having to request that information from stakeholders; and a tool for consuming the data technically without having to learn new APIs and interfaces for each individual dataset.

[More about the TBB]

Modern Data Analytics Platform (MoDAP) (2022 – present).

The Modern Data Analytics Platform is a composition of Amazon Web Services (AWS) cloud services that enables developers to build and schedule data pipelines, deploy machine learning models, and store data from different sources. It abstracts away the complexity of using AWS services.

The key benefits of MoDAP are: Infrastructure Abstraction: users are not required to have knowledge of AWS, because everything is abstracted away from them; Scalability: MoDAP can dynamically scale compute resources based on demand and utilization; Cost Efficiency: because of the scaling capabilities, only the required resources are used, which keeps costs low; Reduced Time to Market: data engineering pipelines can be created in minutes.

[More about MoDAP]

Truck Tire Competitor Pricing Intelligence (2024).

Continental’s pricing team is responsible for the design, implementation, and reinforcement of pricing strategy in the US market for all product areas.
Competitor pricing intel is a crucial part of remaining competitive in the marketplace, but all of the information is generated by the sales team or by sources in the market. Intel arrives in various formats and through various channels: emails, Excel files, K2 workflows, and, on occasion, pictures. In 2022, 15,000 competitor data points were collected. The current database and process are very manual and, because of limited capacity, there is no end-to-end process that delivers regular recommendations on pricing adjustments based on market sensing.
A single source for gathering and processing data, gaining insights, and recommending adjustments to our pricing is therefore missing and is necessary for the future.

The goal is to develop an efficient process and system with which the pricing team systematically incorporates, processes, and analyzes competitor data and creates pricing recommendations in order to maximize our competitive position in the market.

[More about Pricing Intelligence]


Price Optimization Decision Support Tool (2015).

Developed predictive models using R and machine learning to estimate parameters used in price optimization methodologies. Used optimization algorithms to find optimal pricing strategies for multiple products considering elasticities, discounts, variable costs and other parameters. Created proof of concept for a Decision Support Tool that is used for scenario-based optimization problems. Impact: Optimized profit and revenue margins for retail business owners with methodological framework to make informed decisions on product price selection and product assortment. Tools: regression and classification algorithms, mathematical optimization, R, linear programming.
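As a minimal illustration of how an estimated elasticity feeds a pricing decision, the Python sketch below assumes a constant-elasticity demand curve q = scale · p^e with e < -1 and a constant variable cost c. Under those assumptions, profit (p - c) · q peaks at p* = c · e / (1 + e). The demand form, the function names, and every number are hypothetical; the original work used R and linear programming, and this is not its method.

```python
def demand(price: float, scale: float, elasticity: float) -> float:
    """Constant-elasticity demand curve: q = scale * price ** elasticity."""
    return scale * price ** elasticity


def profit(price: float, scale: float, elasticity: float, unit_cost: float) -> float:
    """Profit = margin per unit times units sold at this price."""
    return (price - unit_cost) * demand(price, scale, elasticity)


def closed_form_price(elasticity: float, unit_cost: float) -> float:
    """Profit-maximizing price under the assumed demand curve."""
    if elasticity >= -1.0:
        raise ValueError("the closed form needs elastic demand (elasticity < -1)")
    return unit_cost * elasticity / (1.0 + elasticity)


def grid_search_price(scale, elasticity, unit_cost, low, high, steps=3000):
    """Brute-force check: evaluate profit on an evenly spaced price grid."""
    candidates = [low + i * (high - low) / steps for i in range(steps + 1)]
    return max(candidates, key=lambda p: profit(p, scale, elasticity, unit_cost))


# Hypothetical product: elasticity -2, variable cost 10 per unit.
best_analytic = closed_form_price(elasticity=-2.0, unit_cost=10.0)
best_numeric = grid_search_price(scale=1000.0, elasticity=-2.0, unit_cost=10.0,
                                 low=10.0, high=40.0)
```

The grid search is only a sanity check on the closed form; with several products, discounts, and constraints, a numerical optimizer would replace both.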

Software Engineering Defect Prediction (2015).

Developed predictive models using R and machine learning that leverage code metrics and code process metrics to predict whether a software product is defective. Implemented supervised learning classification algorithms, trained on code metrics data, to help the client anticipate which new software products could be defective. The client is using the predictions to allocate software testing resources more efficiently. Impact: Reduced software testing costs for software producers and engineers with tools to help make decisions about software testing resource allocation and optimization. Tools: classification algorithms.

Consumer’s Sentiment Analysis of Popular Mobile Phone Brands using Social Media Data (2015).

Performed preliminary consumer sentiment analyses of popular mobile phone brands using Twitter data. The analysis included the basic task of determining the polarity (e.g., positive, negative, or neutral) of expressions in the tweets. Beyond polarity, an attempt was made to classify emotions (e.g., joy, anger) about the devices in general. In addition to analyses of the devices as a whole, the datasets were analyzed to try to determine polarity and emotions about specific device features (camera, screen, etc.). Highlights from the results include: the general polarity seems to be more positive than negative for all devices, and negative emotions seem to be unimportant for all devices. This preliminary study can be improved by analyzing larger samples from Twitter and other social media sources. Impact: Provided phone manufacturers with tools to help make decisions about benchmarks, future marketing campaigns and optimization of segmentation and targeting. Tools: Polarity and sentiment analyses, social media.

MSW DST Development (1994 – 2015).

Developed a quantitative framework to aid in decision making for integrated municipal solid waste (MSW) management. The MSW Decision Support Tool (MSW DST) uses a flexible framework to represent many site-specific issues and considerations. It incorporates both cost and environmental objectives. The environmental objectives are defined in terms of life cycle inventories of energy and emissions (of carbon monoxide, fossil- and biomass-derived carbon dioxide, nitrogen oxides, sulfur oxides, particulate matter [PM], and PM10) and greenhouse gases associated with MSW management strategies. The application of the MSW DST was demonstrated through realistic hypothetical case studies. Several MSW management scenarios of typical interest to U.S. municipalities were studied. Through these illustrative applications, the flexibility and capabilities of the MSW DST were demonstrated.

The MSW DST has an optimization module that selects the best group of technology options based on cost or environmental criteria. Developed the mathematical model that constitutes the optimization core of the tool. This model is represented by a set of linear equations that form the input to a linear programming (LP) solver. The first version of the MSW DST uses the commercial LP solver CPLEX®. The MSW DST comprises multiple modules. The MSW models are written in VB.NET to represent the objective functions and thousands of constraints and decision variables in an LP formulation. The models use object-oriented programming to represent the components of the optimization problem as objects and to convert these abstractions into LP and Mathematical Programming System (MPS) file formats in memory. The LP or MPS problem is then loaded into CPLEX via dynamic link libraries (DLLs) to find an optimal solution using the simplex algorithm. If the problem is found to be infeasible, potential causes of the infeasibility are suggested. If a feasible solution is found, the optimal decision variables are rearranged and interpreted to represent the subject matter objects and to create reports with the optimal solution.
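The step of turning problem objects into an in-memory LP file can be sketched as follows. This is a minimal Python illustration, not the tool's VB.NET code: the class names, the two-variable waste allocation problem, and all coefficients are hypothetical, and only a small subset of the CPLEX LP text format is emitted (a minimization objective, named constraints, and simple bounds).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def linear_expression(coeffs: Dict[str, float]) -> str:
    """Render {'x': 3, 'y': -2} as '3 x - 2 y'."""
    parts = []
    for index, (name, coef) in enumerate(coeffs.items()):
        term = f"{abs(coef):g} {name}"
        if index == 0:
            parts.append(term if coef >= 0 else f"- {term}")
        else:
            parts.append(f"{'-' if coef < 0 else '+'} {term}")
    return " ".join(parts)


@dataclass
class Variable:
    name: str
    lower: float = 0.0
    upper: Optional[float] = None  # None means no upper bound


@dataclass
class Constraint:
    name: str
    coeffs: Dict[str, float]  # variable name -> coefficient
    sense: str                # "<=", ">=", or "="
    rhs: float


@dataclass
class Model:
    objective: Dict[str, float]  # cost coefficient per variable
    variables: List[Variable] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)

    def to_lp(self) -> str:
        """Serialize the model objects to LP-format text held in memory."""
        lines = ["Minimize", f" cost: {linear_expression(self.objective)}", "Subject To"]
        for c in self.constraints:
            lines.append(f" {c.name}: {linear_expression(c.coeffs)} {c.sense} {c.rhs:g}")
        lines.append("Bounds")
        for v in self.variables:
            if v.upper is None:
                lines.append(f" {v.name} >= {v.lower:g}")
            else:
                lines.append(f" {v.lower:g} <= {v.name} <= {v.upper:g}")
        lines.append("End")
        return "\n".join(lines)


# Hypothetical problem: route 100 tons between landfill and recycling.
model = Model(
    objective={"landfill": 40, "recycle": 55},
    variables=[Variable("landfill"), Variable("recycle", upper=30)],
    constraints=[Constraint("mass_balance", {"landfill": 1, "recycle": 1}, "=", 100)],
)
lp_text = model.to_lp()
```

Keeping the domain objects separate from the serialized text is what lets the same objects be reused later to interpret the solver's decision variables when building reports.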

The MSW DST also includes multi-objective optimization capabilities for choosing among competing objective functions such as cost, environmental emissions, energy consumption, and recycling levels. The CPLEX DLL engine was used repeatedly to obtain the Pareto surface for convex multi-objective instances. Additionally, the CPLEX DLL engine was used to obtain near-optimal solutions for a specific objective function. The Modeling to Generate Alternatives (MGA) methodology was used to alter the LP formulation submitted to the CPLEX DLL to obtain multiple interesting near-optimal solutions. Impact: Reduced waste management and engineering costs for business operations and municipalities with improved decision making in technology adoption, potential new markets and regulation compliance. Tools: machine learning, optimization, business intelligence, simulation modeling and life cycle analysis.
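Repeated solves with different objective weights, as described above, are commonly organized as a weighted-sum sweep. The Python sketch below shows the idea on a small, made-up set of discrete strategies in place of LP solves: the strategy names and their (cost, emissions) values are hypothetical, and choosing the minimum over a dictionary stands in for a call to the solver.

```python
from typing import Dict, Set, Tuple

# Hypothetical strategies with (cost, emissions) in arbitrary units.
STRATEGIES: Dict[str, Tuple[float, float]] = {
    "landfill_only": (50.0, 90.0),
    "recycle_mix": (65.0, 60.0),
    "waste_to_energy": (90.0, 40.0),
    "dominated_option": (95.0, 95.0),
}


def best_for_weight(weight_on_cost: float) -> str:
    """Minimize the scalarized objective w * cost + (1 - w) * emissions."""
    def scalarized(name: str) -> float:
        cost, emissions = STRATEGIES[name]
        return weight_on_cost * cost + (1.0 - weight_on_cost) * emissions
    return min(STRATEGIES, key=scalarized)


def weighted_sum_front(steps: int = 10) -> Set[str]:
    """Sweep the weight from 0 to 1 and collect every strategy that wins once."""
    return {best_for_weight(i / steps) for i in range(steps + 1)}


front = weighted_sum_front()
```

A weighted-sum sweep only recovers points on the convex part of the trade-off curve, which is consistent with the restriction to convex multi-objective instances mentioned above.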

Topological Insulators for Meso Dynamic Architectures (2014 – 2015).

Performed experimental designs for a project to study the metalorganic chemical vapor deposition (MOCVD) growth of ultrathin (≤ 300 nm) Bi0.1Sb1.9Te3 films. This semiconductor was being explored for its potential as an efficient thermoelectric material for refrigeration or portable power generation. In this project, a series of statistically designed experiments (SDEs) was conducted to optimize the Bi0.1Sb1.9Te3 growth process. Several material properties were tracked in these experiments (mobility, resistivity, carrier concentration, Seebeck coefficient, film thickness, growth rate, elemental percentage, surface morphology); however, the primary focus was the power factor (measured in µW/K²-cm). Tools: Statistical learning, cluster analysis, and multivariate regression models.

Options for Sustainable Waste Management in the City of Durham (2014 – 2015).

Developed a sustainable waste management system to reduce the resources expended by the City of Durham to manage its waste while minimizing impacts to health and the environment. The system shifts the view of waste from unusable materials to valuable commodities that can be used to grow industries and associated jobs. Current sustainability programs are expected to result in many benefits, including decreasing the use of virgin materials in products or processes, economic development opportunities for material recyclers, and social benefits. In addition to benefits, additional (and perhaps unforeseen) economic, social, and environmental impacts may result from new municipal solid waste (MSW) management strategies. Thus, decision makers must balance the objectives of promoting sustainable waste management with the need to protect human health and the environment, as well as to minimize any negative economic or social impacts. A MSW Decision Support Tool (MSW DST) was used for this study. The study provided a profile of current solid waste operations and infrastructure provided by the City of Durham. It presented and summarized results from the analyses of targeted waste management options and strategies that were defined in collaboration with City of Durham staff.

Smart Grid Data and Electric Power Load Forecasting (2013 – 2015).

Developed accurate models for electric power load forecasting that are essential to the operation and planning of a utility company. These load forecasting models help electric utilities make important decisions, including purchasing and generating electric power, load switching, and infrastructure development. Developed methodologies using agent-based model simulations and synthetic populations that could help develop a new electric forecasting paradigm. In this new paradigm, the forecast of future electricity consumption quantities and geographical locations could be analyzed in concurrent rather than separate models. The “how much,” “when,” and “where” could be simulated and answered at once in one combined simulation. Impact: Reduced capital and maintenance costs for electric grid planners with load forecasting tools to help make decisions on capital investment, network maintenance, etc. Tools: machine learning, regression models, simulation modeling etc. [Poster RTI 2015 Innovation Forum]   

System Reliability Model for Solid State Lighting (SSL) Luminaires (2011 – 2015).

Developed a reliability model and accelerated life testing (ALT) methodologies for predicting the lifetime of integrated SSL luminaires. Standard SSL test methods, including Illuminating Engineering Society LM-79-08 and LM-80-08, were used to evaluate luminaire and component performance. An initial reliability model based on assumed Arrhenius behavior was built. In the absence of comparable datasets, initial ALT studies were conducted using the Joint Electron Device Engineering Council’s standard test methods. Temperature, relative humidity, particle ingress, and atmospheric pollutant exposure were used as environmental stressors. Statistically valid sample sets, based on the assumed Arrhenius behavior, were used in this initial study. Phase II of this project created a multivariable reliability model based on measured statistical distributions of experimental values and degradation factors, with greatly improved accuracy over the initial model. This model was created by statistical analysis of the experimental data obtained during Phase I and includes the effects of environmental stressors on system reliability. ALT methodologies were refined through additional environmental stressors, including step-stress methodologies, to significantly reduce test duration. The multivariable reliability model was refined through additional ALT studies using these modified techniques. The model was validated by performing additional ALTs, including lumen maintenance and system reliability testing on select luminaires. The final outcome of this project was a multivariable reliability prediction tool for SSL luminaires and new ALT methodologies for evaluating the system performance of SSL luminaires in less than 3,000 hours of testing. Designed and developed reliability models, Kaplan-Meier models, and Arrhenius models. Used multivariate regression models, statistical learning, and cluster analysis.
Impact: Reduced manufacturing costs for lighting industry with decision support tool to help make decisions on product choices, physical properties and material selections to optimize lifetime. Tools: Regression models, survival analysis, simulation models.  
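For readers unfamiliar with the Arrhenius assumption behind the initial model, the Python sketch below computes the standard temperature acceleration factor AF = exp((Ea / k) · (1/T_use - 1/T_stress)), with temperatures in kelvin. The activation energy and temperatures are placeholder values chosen for illustration, not results from this project, and the project's multivariable model (humidity, particle ingress, pollutants) is not reproduced here.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(activation_energy_ev: float,
                        use_temp_c: float,
                        stress_temp_c: float) -> float:
    """Arrhenius acceleration factor between a use and a stress temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def equivalent_use_hours(test_hours: float, factor: float) -> float:
    """Field hours represented by a given number of accelerated test hours."""
    return test_hours * factor


# Placeholder inputs: 0.7 eV activation energy, 25 C use, 85 C stress.
af = acceleration_factor(activation_energy_ev=0.7, use_temp_c=25.0, stress_temp_c=85.0)
field_hours = equivalent_use_hours(test_hours=3000.0, factor=af)
```

Because the factor grows exponentially with the temperature gap, a modest increase in stress temperature can shorten the required test duration substantially, provided the failure mechanism stays the same.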

Impact of Genomics and Personalized Medicine on the Cost-effectiveness of Preventing and Screening for Breast Cancer in Younger Women (2011 – 2015).

Developed mathematical models to compare the costs and benefits of personalized medicine and to identify the most cost-effective approaches for screening younger women at increased risk of developing breast cancer. Results from this study can be used to address critical questions related to which new technologies are more likely to be cost effective for screening young women; the threshold values at which these technologies become cost effective; the feasibility of these technologies in the real-world clinical setting; the impact of genomics-based screening technologies on the current screening pathways; and the costs and benefits of initiating genomics testing at specific age thresholds. The results from this modeling study provide important evidence for developing guidelines and recommendations related to breast cancer screening programs for young women. Worked with the Centers for Disease Control and Prevention to study the impact of personalized medicine on the cost-effectiveness of screening young women for breast cancer using an agent-based model to simulate individual behaviors and interactions and to assess their collective impacts at the population level. Designed and developed agent-based models in Repast Simphony, risk assessment models, incidence and prevalence models, Gompertz growth models, natural history models, screening models, and treatment models. Impact: Improved patient quality of life and reduced healthcare costs with tools to help make decisions about cancer treatments and interventions. Tools: agent-based models and micro-simulation, risk analysis, stochastic considerations.

Time-Varying Factors Associated with Lipid Lowering Medications for Primary Prevention of Cardiovascular Disease (2013 to 2014).

Developed multi-state Markov models and micro-simulations with Agent-based models to predict and prevent cardiovascular disease. This project used de-identified clinical data derived from electronic medical records adopted and maintained since 1997 by Midwest Heart Specialists/Advocate Medical Group, a 50-physician cardiology practice. The objectives are to determine health status trajectories for primary prevention starting with the development of elevated levels of low-density lipoprotein cholesterol (>100 mg/dL) progressing through revascularization in patients without coronary artery disease at the start of observation; and to use the information derived from a multi-state Markov model to construct a simulator of cardiovascular disease (CVD) development accounting for known CVD risk factors, that will be useful in investigation of predictive analytic questions. Designed and developed agent-based models in Repast Simphony.
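The core mechanic of a multi-state Markov model can be shown with a discrete-time simplification in Python. The three states, the yearly time step, and all transition probabilities below are hypothetical placeholders, not estimates from the clinical data, and the project's models may well have been continuous-time.

```python
from typing import Dict, List

STATES: List[str] = ["elevated_ldl", "on_medication", "revascularization"]

# Hypothetical yearly transition probabilities; each row sums to 1.
# Revascularization is treated as an absorbing end state.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "elevated_ldl": {"elevated_ldl": 0.70, "on_medication": 0.25, "revascularization": 0.05},
    "on_medication": {"elevated_ldl": 0.10, "on_medication": 0.87, "revascularization": 0.03},
    "revascularization": {"elevated_ldl": 0.0, "on_medication": 0.0, "revascularization": 1.0},
}


def step(distribution: Dict[str, float]) -> Dict[str, float]:
    """Advance the cohort's state distribution by one time step."""
    following = {state: 0.0 for state in STATES}
    for origin, mass in distribution.items():
        for target, probability in TRANSITIONS[origin].items():
            following[target] += mass * probability
    return following


def trajectory(start: Dict[str, float], years: int) -> List[Dict[str, float]]:
    """Return the state distribution at year 0, 1, ..., years."""
    history = [start]
    for _ in range(years):
        history.append(step(history[-1]))
    return history


cohort = trajectory(
    {"elevated_ldl": 1.0, "on_medication": 0.0, "revascularization": 0.0},
    years=10,
)
```

A micro-simulation replaces the cohort-level distribution with individual agents that each draw their next state from the same probabilities, which makes it possible to attach patient-specific risk factors to the transitions.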

Translational Cocaine Addiction: From Man to Mouse to Man (2012).

Developed an ontology-based network model of cocaine abuse and addiction. Drug addiction cannot be adequately addressed solely within a single discipline and instead requires a more comprehensive approach. We first used mouse system genetics to identify genes, gene networks and pathways associated with cocaine dependence. As in our previous studies with nicotine and heroin dependence, previously identified candidate genes for cocaine abuse phenotypes in humans and model animals were used to initiate the mouse systems genetic studies. The findings of the mouse systems genetic studies were then integrated with known environmental factors, such as drug availability, social stressors, peer support, and environmental exposures, to build an ontology-based network model of cocaine abuse and addiction using the Protégé ontology editor and framework. This model can be used to provide a framework for future cocaine-addiction studies.

Comparative Effectiveness of Alcohol Treatments (2011 to 2014).

Developed a predictive framework to improve the quality of comparative effectiveness research by identifying the subpopulations most responsive to alcohol treatment. The framework provides an analysis flow that links theoretical, exploratory, predictive, and Markov models to fill gaps not addressed by any of these methods used separately. Designed and developed agent-based models for alcohol use and treatment, and the simulators associated with those models, in Repast Simphony, along with random forest models and survival analysis models.

Methods for Assessing Vulnerability and Resilience of Critical Infrastructure (IHSS Brief) (2010).

Developed an inclusive approach that incorporates physical, social, organizational, economic, and environmental variables in addition to empirical measurements and operationalization of resilience and vulnerability. The objective was to help improve the understanding and management of risk associated with threats to complex infrastructure systems. The framework uses network theory, model-based vulnerability analysis, and reliability theory (fault tree analysis). 

Services Accountability Improvement System (SAIS) (2009 to 2011).

Developed and maintained data warehouses for the SAIS system. SAIS is a service of the Substance Abuse and Mental Health Services Administration (SAMHSA). SAIS is intended for use by the Center for Substance Abuse Treatment’s (CSAT’s) Discretionary Services and Best Practices grantees, and by SAMHSA and CSAT staff. SAIS was developed as part of the effort mandated by the Government Performance and Results Act (GPRA) of 1993. GPRA is intended to increase program effectiveness and public accountability by promoting a focus on results, service quality, and customer satisfaction. Led the monitoring of the SAIS IT infrastructure and helped develop the monitoring systems used for it. Wrote several standard operating procedure (SOP) manuals, including the SOP for conducting the monitoring activities. Performed a variety of activities to improve the SAIS SQL Server production database.

Violent Intent Modeling and Simulation (VIMS) (2009).

Applied agent-based modeling and cellular automata concepts to create a prototype model and simulator based on published literature regarding Civil Violence. VIMS was conceived by the Human Factors/Behavioral Sciences Division of the U.S. Department of Homeland Security’s Science and Technology Directorate. The VIMS project team developed social science models as the core of an analytic decision support tool to interpret the motivations and behaviors of violent groups and identify factors indicating that a group may engage in ideologically motivated violent activity. This task generated a report for the VIMS project.

Modeling the Effectiveness of Hepatitis Vaccination When Accounting for Transmission Dynamics (2008).

Developed a compartmental susceptible-exposed-infected-recovered model in MATLAB to study the effectiveness of hepatitis A vaccination. Helped analyze the results from the dynamic transmission model that accounts for natural declines in force of infection, foreign sources of infection, and vaccination coverage rates.
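A compartmental susceptible-exposed-infected-recovered model of this kind can be sketched in a few lines. The Python version below uses simple Euler integration and treats vaccination as moving a fraction of the population straight into the recovered (immune) compartment at the start. All parameter values are placeholders rather than hepatitis A estimates, and the declining force of infection and foreign sources of infection from the original MATLAB model are left out.

```python
from typing import Dict


def simulate_seir(population: float, initial_infected: float, coverage: float,
                  beta: float, sigma: float, gamma: float,
                  days: float, dt: float = 0.1) -> Dict[str, float]:
    """Euler integration of an SEIR model with up-front vaccination coverage."""
    vaccinated = coverage * (population - initial_infected)
    s = population - initial_infected - vaccinated  # susceptible
    e = 0.0                                         # exposed, not yet infectious
    i = float(initial_infected)                     # infectious
    r = vaccinated                                  # recovered or immunized
    s_start = s
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return {"S": s, "E": e, "I": i, "R": r, "infections": s_start - s}


# Placeholder parameters: transmission 0.3/day, 25-day latency, 10-day infectious period.
PARAMS = dict(population=100_000.0, initial_infected=10.0,
              beta=0.3, sigma=0.04, gamma=0.1, days=730.0)
no_vaccine = simulate_seir(coverage=0.0, **PARAMS)
high_coverage = simulate_seir(coverage=0.8, **PARAMS)
```

Comparing the `infections` totals across coverage levels gives the kind of effectiveness estimate that a static model misses, because vaccinated individuals also reduce exposure for everyone else.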

Economic Issues in Seasonal Influenza Vaccination (2006 to 2007).

Developed a model that estimates the likelihood that the influenza vaccination will result in positive net benefits for several specific population subgroups. Used software that accounts for uncertainty and variability in the impacts of influenza and influenza vaccination, both across population subgroups and from one season to the next. A key feature of this Monte Carlo–style simulation model is its ability to use information on a range of possible values for influenza severity and vaccine effectiveness to calculate a range of possible economic impacts and the likelihood of occurrence for each, i.e., a distribution of possible outcomes. This feature is important because influenza severity and vaccine effectiveness are usually unknown early in the influenza season when policymakers may be called upon to provide guidance on priority populations for vaccination.
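The likelihood calculation described above can be illustrated with a small Monte Carlo sketch in Python. It assumes uniform distributions for the seasonal attack rate and vaccine effectiveness and a deliberately simple net-benefit formula; the ranges, costs, and function name are hypothetical and are not taken from the original model.

```python
import random
from typing import Tuple


def probability_of_net_benefit(trials: int,
                               attack_rate_range: Tuple[float, float],
                               effectiveness_range: Tuple[float, float],
                               cost_per_case: float,
                               cost_per_vaccination: float,
                               seed: int = 0) -> float:
    """Share of simulated seasons in which vaccination has positive net benefit."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    positive = 0
    for _ in range(trials):
        attack_rate = rng.uniform(*attack_rate_range)      # season severity
        effectiveness = rng.uniform(*effectiveness_range)  # vaccine match
        averted_cost = attack_rate * effectiveness * cost_per_case
        if averted_cost - cost_per_vaccination > 0:
            positive += 1
    return positive / trials


# Hypothetical subgroup: 5-20% attack rate, 30-70% effectiveness.
likelihood = probability_of_net_benefit(
    trials=10_000,
    attack_rate_range=(0.05, 0.20),
    effectiveness_range=(0.30, 0.70),
    cost_per_case=500.0,
    cost_per_vaccination=25.0,
)
```

Running the same function with subgroup-specific ranges and costs yields one likelihood per subgroup, which is the form in which results can be compared when setting vaccination priorities early in a season.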

Models for Infectious Disease Agent Study (MIDAS) (2005 to 2010).

Reviewed and analyzed existing models for the spread of infectious diseases. MIDAS was funded by the National Institute of General Medical Sciences to encourage development of infectious disease modeling to address a wide range of possible infectious agents, explore a variety of possible responses, and enhance the model interface, thereby making the modeling process understandable and accessible by nonscientists exploring health policy options. Developed information technology tools and performed modeling analyses. Developed methicillin-resistant Staphylococcus aureus (MRSA) agent-based models in collaboration with the University of Pittsburgh and the Harvard Medical School. Team leader in charge of maintaining and improving the MIDAS portal. Designed the ORACLE Ultrasearch-based search system and helped maintain the MIDAS Historic Data and Document Catalog.

Risks to Watershed Health from Wildfires in the Western United States (2005 to 2006).

Developed a system to identify geographic areas at risk of catastrophic forest fires. The system, called FORWARDWest, was a suite of Java-based tools whose main goal was to give the end user the ability to weigh the different parameters to isolate areas of interest. The core of the suite was the FORWARDWest slider toolbar. Once the sliders were set to the user’s preference, a 1:24k USGS quad tile layer was scored to indicate the geographic risks based on the slider settings. A Java-based configuration program, WestWardHO, gave the user the ability to introduce custom data to the FORWARDWest interface by modifying the underlying XML configuration file.
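One plausible way to turn slider settings into a tile score is a normalized weighted sum, sketched below in Python. The actual FORWARDWest scoring rule is not described here, so the formula, the indicator names, and the tile values are all assumptions made for illustration.

```python
from typing import Dict, List


def tile_score(indicators: Dict[str, float], slider_weights: Dict[str, float]) -> float:
    """Weighted average of a tile's indicators (each scaled 0-1) under the sliders."""
    total_weight = sum(slider_weights.values())
    if total_weight == 0:
        return 0.0  # all sliders at zero: nothing is considered important
    weighted = sum(weight * indicators[name] for name, weight in slider_weights.items())
    return weighted / total_weight


def rank_tiles(tiles: Dict[str, Dict[str, float]],
               slider_weights: Dict[str, float]) -> List[str]:
    """Tile names ordered from highest to lowest score."""
    return sorted(tiles, key=lambda name: tile_score(tiles[name], slider_weights),
                  reverse=True)


# Hypothetical quad tiles with indicators already normalized to 0-1.
TILES = {
    "quad_a": {"fuel_load": 0.9, "slope": 0.2, "watershed_value": 0.5},
    "quad_b": {"fuel_load": 0.3, "slope": 0.8, "watershed_value": 0.9},
}
fuel_focused = rank_tiles(TILES, {"fuel_load": 3, "slope": 1, "watershed_value": 0})
water_focused = rank_tiles(TILES, {"fuel_load": 0, "slope": 1, "watershed_value": 3})
```

Normalizing by the total weight keeps scores comparable when the user moves several sliders at once, so the ranking reflects relative emphasis rather than absolute slider positions.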

Air Quality Modeling Decision Support Tool (2004 to 2005).

Developed an air-quality modeling decision support tool to help the city of Beijing, China, analyze and improve air quality before the 2008 Olympic Games. Designed management plans to improve air quality conditions. The decision support tool included an air emissions database and a set of small programs to link many existing air quality models designed by the U.S. Environmental Protection Agency (EPA).

Development of Integrated Water Quality Analyses for the Shared Waters of the United States and Mexico (2003 to 2005).

Designed and developed a database to collect and store water quality data from monitoring stations on the U.S.-Mexico Border. This project was funded by EPA’s Border 2012 Program. The Border 2012 goal was to reduce water contamination on the U.S.-Mexico border by collecting and analyzing water quality data. These assessments of significant shared and transboundary surface waters identified current water quality status and trends and helped the United States and Mexico formulate water resource management strategies to achieve, by 2012, a majority of water quality standards currently being exceeded in those waters. Used EPA’s STOrage and RETrieval system (STORET) water quality data dictionary and many EPA data standards. Performed analyses on the water quality data stored in the repository to determine water quality status and water quality trends on the U.S.-Mexico border. Coordinated a binational group of stakeholders who represented U.S. and Mexican states, federal and state agencies, and consortiums.

GUI Development for the Total Risk Integrated Methodology (TRIM) (2002 to 2005).

Developed a Java-based graphical user interface (GUI) as part of EPA’s TRIM project. The GUI helped users perform ecological hazard calculations for wildlife and species assemblages for spatially explicit areas of interest; calculated hazards for acute, sub-chronic, and chronic benchmarks; and calculated hazards for different endpoints. Inputs for this model were time series of annual concentrations in abiotic media and time series of average daily doses for biota.

Data Mining and Analysis Tool Development (2002 to 2004).

Developed a Web-based data mining and analysis tool to present environmental project results more efficiently. The main purpose of this tool was to replace large amounts of printed data tables that needed to be analyzed, summarized, and delivered to the client. The back-end of the tool was an Oracle database, and the front-end was a combination of JSP pages and applets to provide the user with a GUI to perform analyses. With this tool, data could be made available to the client in a format that was engaging and easy to understand; users could ask follow-up questions rather than searching through the hardcopy data tables; clients could view tables and graphics that were useful in decision making; and clients could present results and benefits of a project to upper management via the easily accessible Web site.

Programming in Support of EPA Reach Indexing Projects (2001 to 2005).

Developed an Oracle PL/SQL package as a stored procedure and used object-oriented approaches to handle batch indexing jobs for the EPA’s Surface Water Reach Indexing tool (WebRIT). The package communicated and integrated with other packages written to handle different WebRIT functionalities. Used the batch-indexing package to perform several indexing jobs, including indexing of Combined Sewer Overflow data, Drinking Water Initiative data, and Clean Watersheds Needs Survey data. Wrote a series of small functions in Oracle Spatial to manipulate georeferenced data related to water sources for the EPA Total WATERS Project. This project created summaries of total miles of waterbodies by state and by waterbody type.