The UBC/SFU Joint Statistics Seminar is jointly hosted by the graduate students of the UBC Department of Statistics and the SFU Department of Statistics and Actuarial Science. The Spring 2023 event is the second of two events taking place in the 2022/2023 academic year. The Fall 2022 event was organized by graduate students from SFU, and the Spring 2023 event is organized by graduate students from UBC. Over its 18-year history, the event has offered Statistics and Actuarial Science graduate and undergraduate students at both schools an opportunity to network with their peers and to attend accessible talks about the research work of their fellow students and faculty.

The Spring 2023 event includes talks given by six students (three from UBC and three from SFU) and one faculty member from UBC.

Check out more events hosted by the UBC Statistics Graduate Student Association.


This term’s event will be held in person at SFU Harbour Centre on March 4, 2023. The event starts at 10:00 am. Register now through the registration form! If you are interested in presenting, please contact Nikola, Chloe, or Naitong.



10:00am - 10:30am

Welcome Message

10:30am - 10:35am

Lily Xia (UBC)

10:35am - 11:00am

Measuring the discriminatory performance of algorithms in predicting treatment benefit

Prediction algorithms that quantify the expected benefit of a given treatment conditional on patient characteristics can critically inform medical decisions. Quantifying the performance of treatment benefit prediction algorithms is an active area of research. Algorithm evaluation can be categorized into discrimination, calibration, and clinical utility. A recently proposed metric, the concordance statistic for benefit (cfb), evaluates the discriminative ability of a treatment benefit predictor by directly extending the concept of the concordance statistic from a risk model with a binary outcome to a model for treatment benefit. We raise methodological concerns about such metrics on multiple fronts and propose an alternative metric for the discriminatory performance of treatment benefit predictors.

William Ruth (SFU)

11:00am - 11:25am

A Monte Carlo EM Analysis of COVID-19 Outbreaks in Long-Term Healthcare Facilities

We analyze a dataset of daily case counts from COVID-19 outbreaks in long-term care facilities across BC, Canada. We treat infection durations as missing data and apply the EM algorithm to estimate the probability of transmission and mean infection duration, as well as a third parameter governing the effectiveness of outbreak management. We use the Monte Carlo EM (MCEM) algorithm to approximate difficult conditional expectations with Monte Carlo integrals. Importance sampling is used to generate the necessary Monte Carlo samples. We propose a novel stopping rule for the MCEM algorithm based on a well-known relationship between the score vectors of the observed and complete data models.

Ning Shen (UBC)

11:25am - 11:50am

Probabilistic Modeling of Single-cell Methylation Heterogeneity at Site-level Resolution

DNA methylation is a heritable chemical modification that can occur at specific sites in the human genome. Single-cell bisulfite sequencing enables measurement of methylation in individual cells and helps find inter-cellular methylation patterns. However, the data output is often sparse and noisy, typically with a missing rate above 90%. Thus, ‘feature selection’ becomes an essential step in analyzing single-cell methylation data. Specifically, ‘features’ refers to regions of the genome with highly variable methylation states across cells. We propose a statistical framework that searches thoroughly for these features with a flexible genome-wide scan that does not require prior knowledge of their sizes or location contexts. In simulations and case studies, our analytical scheme has improved cell-type clustering performance and offered interesting opportunities for biological exploration.


11:50am - 12:50pm

Matt Berkowitz (SFU)

12:50pm - 1:15pm

Random survival forests: which methods work best and under what conditions?

There have been few systematic comparisons of how best to build survival trees and forests. An important question in the literature is: what are the best methods for constructing forests using survival data, as evaluated by a predefined metric of error? Through an extensive simulation study, we systematically investigate various factors that influence survival forest performance: forest construction method, censoring, sample size, the distributions of the response and predictor variables, and the presence of confounding or noisy predictors. We find that all these factors, as well as many of their interactions, have significant effects on estimating survival functions and predicting survival times. We recommend which methods to use depending on the data setting.

Menglin Zhou (UBC)

1:15pm - 1:40pm

Reverse stress testing and multivariate extremes

Reverse stress testing of a financial portfolio aims to identify scenarios for risk factors that lead to a specified adverse portfolio outcome, typically a portfolio loss of a given magnitude. The stress scenarios of interest naturally need to be probable yet extreme. In order to capture movements of risk factors that result in large portfolio losses, we propose a method to estimate stress scenarios using extrapolation based on techniques from multivariate extreme value theory. Such a method effectively addresses data scarcity in the joint tail regions while allowing for more flexible model assumptions focused on extremes. We study the asymptotic behaviour of the proposed estimator, investigate its finite-sample performance in simulation studies and apply it to real data in a case study.

Hashan Peiris (SFU)

1:40pm - 2:05pm

Integration of traditional and telematics data for efficient insurance claims prediction

Driver telematics has gained attention for risk classification in auto insurance, where it can offer benefits beyond the use of traditional data alone. Because there is a discernible difference in dimensions between traditional and telematics data sets, the scarcity of observations with telematics features, relative to those with traditional features, has become problematic; this scarcity may be due to either privacy concerns or adverse selection. To handle this issue, we propose a data integration technique based on calibration weights. We show that the proposed technique integrates traditional and telematics data more efficiently than some benchmark models. Moreover, it copes with possible adverse selection in the availability of telematics data. Our findings are supported by a simulation study and an empirical analysis of a synthetic telematics data set.


2:05pm - 2:15pm

Faculty Speaker: Dr. Lucy Gao

2:15pm - 3:15pm

Building a Professional Online Presence

I will discuss my thoughts and experiences on how and why to build an online presence as a professional or academic statistician. This will include how to get started if you're currently completely offline, and how to grow your visibility if you are already online.

Link to the slides.

Networking and Drinks at Rogue!



Past Seminars

| Fall 2022 | Spring 2022 | Fall 2021 | Spring 2021 | Fall 2020 | Spring 2020 |