Department of Mathematics and Statistics

Joint Seminars in Statistics & Biostatistics

Statistics & Biostatistics - Bang Liu (University of Montreal)

Thursday, April 1st, 2021

Time: 4:00pm Place:

Speaker: Bang Liu (University of Montreal)

Title: Data, Knowledge, and Logic: Modeling and Reasoning for Natural Language Understanding.

Abstract: Existing deep learning-based techniques for different NLU tasks are mostly data-intensive and domain-sensitive. However, creating large, high-quality training datasets for NLU tasks, e.g., question answering, is both expensive and time-consuming. In this talk, we will introduce our research on data generation, knowledge expansion, and reasoning over graphs. Specifically, for data augmentation, we generate large-scale and high-quality question-answer pairs from unlabeled text to augment the training data for question answering. For knowledge expansion, we create and expand an ontology with newly discovered concepts or entities to capture the emerging knowledge in the world and keep the ontology dynamically updated. For reasoning over graphs, we propose a reinforcement learning-based relational reasoning framework, R5, that reasons over relational data and learns the underlying compositional logical rules. Our long-term vision is to design low-resource, knowledge-empowered, and transferable NLU systems and apply them to different domains.

Statistics & Biostatistics - Wen Teng (Queen's University)

Thursday, March 25th, 2021

Time: 4:00pm Place:

Speaker: Wen Teng (Queen's University)

Title: A non-parametric simultaneous confidence band for biomarker effect on the restricted mean survival time.

Abstract: The study of prognostic and predictive biomarkers plays an important role in the design and analysis of clinical trials. The Cox proportional hazards model is often used to study the biomarker main effect and the treatment-biomarker interaction effect for survival data. The estimated effects can be biased if the proportional hazards assumption is violated. The restricted mean survival time is becoming popular in clinical studies because it has a clear, intuitive interpretation. In this paper, we first propose non-parametric methods to make statistical inference for the one-sample problem of the biomarker effect on the restricted mean survival time; we then extend the methods to the two-sample problem for studying the difference in the biomarker effects between samples, or treatment groups in clinical trials. For a given biomarker, the restricted mean survival time is estimated by kernel smoothing methods adjusted by the inverse probability of censoring weight. We prove the consistency of the estimates and develop simultaneous confidence bands for the biomarker effects on the restricted mean survival time. The simultaneous confidence bands were evaluated in extensive simulation studies and were found to have good finite sample performance. We then apply the proposed methods to a breast cancer study conducted by the Breast International Group (BIG) to illustrate how the Ki67 biomarker affects the survival time of patients, compared between the treatment groups.
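
For readers unfamiliar with the quantity this abstract studies, the standard definition of the restricted mean survival time can be written as below. The symbols are assumptions introduced here for illustration and do not come from the abstract: T is the survival time, \tau a fixed truncation time, S the survival function, and z a biomarker value.

```latex
% Restricted mean survival time up to the truncation time \tau
\mu(\tau) = E\bigl[\min(T, \tau)\bigr] = \int_0^{\tau} S(t)\,dt

% The same quantity as a function of a biomarker value z
\mu(\tau \mid z) = E\bigl[\min(T, \tau) \mid Z = z\bigr] = \int_0^{\tau} S(t \mid z)\,dt
```

The second form is the curve in z for which a simultaneous confidence band would be built; how the speaker estimates it is described only at the level of the abstract.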

Statistics & Biostatistics - Wei Tu (Queen's University)

Thursday, March 18th, 2021

Time: 4:00pm Place:

Speaker: Wei Tu (Queen's University)

Title: Differential privacy in health data

Abstract: The protection of individual patient privacy is essential in health care research. Privacy-protecting data analysis has a long history under the name of “statistical disclosure control” in statistics. Differential privacy, emerging from the theoretical computer science literature, has become popular over the last decade due to its intuitive formulation and formal privacy guarantee, and is at its early stages of implementation in industry, government and academia. In this talk, I will introduce the framework of differential privacy and present a few applications in health research. Specifically, a differentially private Kaplan-Meier estimate using the recently proposed Gaussian differential privacy framework will be presented, as well as differentially private learning for training clinical prediction tasks using EHR and medical imaging data.
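
As a rough illustration of the differential privacy framework mentioned in this abstract, the sketch below releases a noisy count using the classical Laplace mechanism. This is not the Gaussian differential privacy approach or the Kaplan-Meier estimator from the talk; the function name `private_count` and its parameters are hypothetical and chosen only for this example.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Hypothetical helper for illustration only; not the speaker's method.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # True count of records satisfying the predicate.
    true_count = sum(1 for record in records if predicate(record))

    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
    scale = 1.0 / epsilon

    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    ages = [34, 61, 47, 72, 58]
    # Noisy number of patients aged 50 or over, with epsilon = 0.5.
    print(private_count(ages, lambda age: age >= 50, epsilon=0.5))
```

Smaller values of `epsilon` give stronger privacy and noisier output; the released value is a real number and can fall below zero or above the true count.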

Statistics & Biostatistics - Zihang Lu (Queen's University)

Thursday, March 11th, 2021

Time: 4:00pm Place:

Speaker: Zihang Lu (Queen's University)

Title: Bayesian Consensus Clustering for Multivariate Longitudinal Data

Abstract: In many clinical studies, there is a growing interest in studying the heterogeneity of longitudinal markers to identify subgroups of the study population. Compared to clustering a single longitudinal marker, simultaneously clustering multiple longitudinal markers allows additional information to be incorporated into the modeling process, which generates deeper biological insight and clinical significance. In this seminar talk, I will discuss a newly proposed Bayesian consensus clustering model for multivariate longitudinal data. The proposed model allows each marker to follow a marker-specific (local) clustering and these local clusterings adhere to a global (consensus) clustering. To estimate the posterior distribution of model parameters, a Gibbs sampling algorithm is proposed. Results of analyzing real and simulated data will be presented and discussed.

Statistics & Biostatistics - Brian Ling (Queen's University)

Tuesday, November 24th, 2020

Time: 1:00pm Place:

Speaker: Brian Ling (Queen's University)

Title: Nonparametric estimation for cross-sectional data

Abstract: Current duration data collected from cross-sectional sampling provides an alternative opportunity to estimate the survival function of the variable of interest in addition to prospective and retrospective designs. In this talk, we will first review the connection of current duration data with shape-constrained inference. While different nonparametric estimators for the underlying survival function have been proposed in the literature, some of their theoretical properties are missing. We will establish the asymptotic distributions of these estimators. Semiparametric models will also be discussed.

Statistics & Biostatistics - Fei Xue (University of Pennsylvania)

Tuesday, November 17th, 2020

Time: 1:00pm Place:

Speaker: Fei Xue (University of Pennsylvania)

Title: Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data

Abstract: An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules that maximizes long-term benefits, and is applicable to chronic diseases such as HIV infection or cancer. In this paper, we develop a novel angle-based approach to search for the optimal DTR under a multicategory treatment framework for survival data. The proposed method aims to maximize the conditional survival function of patients following a DTR. In contrast to most existing approaches, which are designed to maximize the expected survival time under a binary treatment framework, the proposed method solves the multicategory treatment problem given multiple stages for censored data. Specifically, the proposed method obtains the optimal DTR by integrating estimations of decision rules at multiple stages into a single multicategory classification algorithm without imposing additional constraints, which is also more computationally efficient and robust. In theory, we establish Fisher consistency and provide the risk bound for the proposed estimator under regularity conditions. Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival probability. We apply the proposed method to two real datasets: Framingham heart study data and acquired immunodeficiency syndrome (AIDS) clinical data.

Statistics & Biostatistics - Devon Lin (Queen’s University)

Tuesday, October 20th, 2020

Time: 1:00pm Place:

Speaker: Devon Lin (Queen’s University)

Title: Sequential designs for computer experiments with time series responses

Abstract: The recent accelerated growth in computing power has popularized experimentation with dynamic computer models in various physical and engineering applications. Despite the extensive statistical research in computer experiments, the bulk of the work has been on theoretical and algorithmic innovations for computer models with scalar response. In this talk, we consider computer experiments with time series responses, called dynamic computer experiments, and focus on the prediction and inverse problem of such computer experiments. Sequential designs for addressing these problems are proposed. New expected improvement criteria are developed to choose follow-up points. The criteria use a proposed computationally efficient modeling and Bayesian inference approach to build emulators. In the inverse problem, we also propose a new criterion for extracting the optimal inverse solution from the final surrogate. Three simulated examples and the real-life two-delay blowfly (TDB) computer simulator have been used to demonstrate the higher accuracy of the proposed approach compared with popular existing techniques. (Joint work with Dr. Ru Zhang and Dr. Pritam Ranjan.)

Statistics & Biostatistics - Xiang Li (Queen's University)

Wednesday, February 26th, 2020

Time: 11:30-12:20 Place: Jeffery Hall 225

Speaker: Prof. Xiang Li (Queen's University, Dept. of Chemical Engineering)

Title: Decomposition Based Global Optimization

Abstract: Large-scale nonconvex optimization arises from a variety of scientific and engineering problems. Often such an optimization problem is simplified into an easier convex or mixed-integer convex optimization problem, but the solution of the simplified problem is unlikely to be optimal or feasible for the original problem. Recent advances in decomposition-based global optimization provide a promising way to solve large-scale nonconvex optimization problems within reasonable time. In this presentation, we will first discuss the principle of generalized Benders decomposition (GBD), including the reformulation into a master problem using strong Lagrangian duality, the construction of upper and lower bounding problems, and the finite convergence property. We also show how GBD can be applied to decompose multi-scenario problems. Then we introduce two variants of GBD. The first variant, called nonconvex generalized Benders decomposition (NGBD), is able to solve a class of nonconvex problems that GBD cannot solve. The second variant, called joint decomposition (JD), enhances GBD/NGBD via the integration of Lagrangian decomposition. Finally, we demonstrate the computational advantages of GBD, NGBD and JD via some engineering problems.

Statistics & Biostatistics - Qingling Duan (Queen's University)

Wednesday, November 20th, 2019

Time: 11:30-12:30 Place: Jeffery Hall 225

Speaker: Qingling Duan (Queen's National Scholar in Bioinformatics, School of Computing and Dept. of Biomedical & Molecular Sciences, Queen's University)

Title: Statistical methods for the study of genomic risk factors of complex traits

Abstract: The overarching goal of my research program is to identify and characterize genomic factors that modulate multifactorial traits such as drug response, allergies and asthma. My team leads the collection and analysis of multiple types of ‘omics (i.e., genomics, transcriptomics, epigenomics and metagenomics) datasets from human cohorts. Specifically, we hypothesize that gene-gene and gene-environment interactions account in part for the missing heritability of complex traits. We test this using additive and multiplicative models in addition to network analysis and data integration to characterize novel biological pathways and underlying disease mechanisms. For example, we have identified main and interaction effects of genetic variants and environmental exposures (e.g., smoking, dog ownership, breastfeeding) on risk of early childhood asthma. In addition, we report novel gene networks associated with risk of asthma and response to chemotherapy among cancer patients. I am a lead investigator of the Canadian Healthy Infant Longitudinal Development (CHILD) cohort study and the Canadian Respiratory Research Network, which supports the Canadian Chronic Obstructive Lung Disease (CanCOLD) study. My laboratory is currently funded by the Canadian Institutes of Health Research, Queen’s University and the Canada Foundation for Innovation.

Statistics & Biostatistics - Yanglei Song (Queen's University)

Wednesday, November 6th, 2019

Time: 11:30-12:30 Place: Jeffery Hall 225

Speaker: Yanglei Song (Queen's University)

Abstract: I will start with a discussion of recent developments in the normal approximation (a.k.a. central limit theorem) and bootstrap for the sum of high dimensional random vectors, empirical processes and U-processes. Statistical applications will also be provided. Then I will talk about a piece of ongoing work in this line, with Xiaohui Chen and Kengo Kato. Its abstract is as follows: This paper studies the non-asymptotic inference for the supremum of an incomplete, non-degenerate U-process. The process is indexed by a function class of order r, whose complexity possibly increases with the sample size n. For each function, its corresponding U-statistic involves the average of O(n^r) numbers, which is prohibitively demanding even for moderate r. Thus we study its incomplete version, where each subsample of size r is included in the average with a very small probability. We first approximate the supremum of such an incomplete U-process by that of an appropriate Gaussian process in the Kolmogorov distance and then propose valid bootstrap methods to address the practical issue of an unknown covariance function. Finally, we discuss its application in testing the qualitative features, such as convexity, of nonparametric functions.
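
The incomplete U-statistic described above, in which each size-r subsample enters the average with a small probability, can be sketched as follows. This is a minimal illustration under assumptions that are not in the abstract: the function names are hypothetical, the average is normalized by the number of subsamples actually selected, and the kernel shown is the textbook order-2 kernel whose complete U-statistic is the sample variance. The loop still enumerates every subsample, so it illustrates the definition rather than the computational saving.

```python
import itertools
import random


def incomplete_u_statistic(data, kernel, r, p, rng=None):
    """Average `kernel` over size-r subsamples, each kept with probability p.

    Hypothetical helper for illustration only. With p = 1.0 every
    subsample is kept and the result is the complete U-statistic.
    """
    rng = rng or random.Random()
    total = 0.0
    selected = 0
    for subset in itertools.combinations(data, r):
        # Independent Bernoulli(p) draw decides whether this subsample is used.
        if rng.random() < p:
            total += kernel(*subset)
            selected += 1
    if selected == 0:
        raise ValueError("no subsample was selected; increase p")
    # Assumption: normalize by the number of selected subsamples.
    return total / selected


def variance_kernel(x, y):
    """Order-2 kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


if __name__ == "__main__":
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    # Keep roughly 5% of the 19,900 pairs.
    print(incomplete_u_statistic(sample, variance_kernel, r=2, p=0.05))
```

In practice one would sample the index sets directly instead of looping over all of them; the sketch keeps the loop so that the Bernoulli selection step is explicit.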