Aug 19, 2021
by Vivian Lai; Chenhao Tan; Han Liu

While AI has demonstrated impressive performance in various tasks, humans cannot yet work with it effectively through current explanations. In this study, we experiment with three factors that might improve human and AI performance: the types of explanations in the training and prediction phases, and subdomain test sets. Unlike static explanations, interactive explanations allow humans to learn about the model through an active learning process. We also argue that improvement in human and AI team...

Source: https://osf.io/kpvrw/

Jun 30, 2018
by Fang Han; Han Liu

We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1402.4507
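The rank-based construction behind COCA can be illustrated in a few lines. The sketch below (hypothetical helper names, not the authors' implementation) maps Kendall's tau between each pair of columns to a Gaussian-scale correlation via sin(pi * tau / 2) and runs PCA on that robust correlation matrix; because the estimate depends only on ranks, it is invariant to monotone marginal transforms.

```python
# A rough sketch of rank-based copula PCA (hypothetical helper names).
import numpy as np
from scipy.stats import kendalltau

def copula_correlation(X):
    """Correlation estimate that depends only on ranks, hence invariant
    to unspecified monotone marginal transformations."""
    d = X.shape[1]
    R = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi * tau / 2.0)
    return R

def copula_pca(X, n_components=2):
    vals, vecs = np.linalg.eigh(copula_correlation(X))
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 5))
X = np.exp(Z)  # monotone marginal distortion: plain PCA changes, the rank-based estimate does not
vals, vecs = copula_pca(X)
```

Running plain PCA on `X` would be dominated by the distorted marginals; the rank-based correlation of `X` is identical to that of `Z`.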

Jun 30, 2018
by Jianqing Fan; Han Liu; Yang Ning; Hui Zou

Graphical models are commonly used tools for modeling multivariate random variables. While there exist many convenient multivariate distributions such as Gaussian distribution for continuous data, mixed data with the presence of discrete variables or a combination of both continuous and discrete variables poses new challenges in statistical modeling. In this paper, we propose a semiparametric model named latent Gaussian copula model for binary and mixed data. The observed binary data are...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1404.7236

Jun 29, 2018
by Xingguo Li; Zhaoran Wang; Junwei Lu; Raman Arora; Jarvis Haupt; Han Liu; Tuo Zhao

We propose a general theory for studying the geometry of nonconvex objective functions with underlying symmetric structures. Specifically, we characterize the locations of stationary points and the null space of the associated Hessian matrices via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global geometry of the low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives...

Topics: Machine Learning, Mathematics, Optimization and Control, Statistics, Learning, Computing Research...

Source: http://arxiv.org/abs/1612.09296

Jun 30, 2018
by Tianqi Zhao; Guang Cheng; Han Liu

We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring heterogeneity of each sub-population. In particular, we propose an aggregation type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and asymptotic distribution as if there were no heterogeneity. This oracular result holds when the number of sub-populations does not grow too...

Topics: Mathematics, Statistics Theory, Statistics

Source: http://arxiv.org/abs/1410.8570

Sep 18, 2013
by John Lafferty; Han Liu; Larry Wasserman

We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible...

Source: http://arxiv.org/abs/1201.0794v2

Sep 18, 2013
by Han Liu; Lie Wang

We propose a new procedure for estimating high dimensional Gaussian graphical models. Our approach is asymptotically tuning-free and non-asymptotically tuning-insensitive: it requires very little effort to choose the tuning parameter in finite-sample settings. Computationally, our procedure is significantly faster than existing methods due to its tuning-insensitive property. Theoretically, the obtained estimator is simultaneously minimax optimal for precision matrix estimation under different...

Source: http://arxiv.org/abs/1209.2437v1

Jun 27, 2018
by Jianqing Fan; Yuan Liao; Han Liu

Estimating large covariance and precision matrices is fundamental in modern multivariate analysis. Such problems arise from the statistical analysis of large panel economic and financial data. The covariance matrix reveals marginal correlations between variables, while the precision matrix encodes conditional correlations between pairs of variables given the remaining variables. In this paper, we provide a selective review of several recent developments on estimating large covariance and precision...

Topics: Methodology, Statistics

Source: http://arxiv.org/abs/1504.02995

Jun 28, 2018
by Zhuoran Yang; Zhaoran Wang; Han Liu; Yonina C. Eldar; Tong Zhang

We study parameter estimation and asymptotic inference for sparse nonlinear regression. More specifically, we assume the data are given by $y = f( x^\top \beta^* ) + \epsilon$, where $f$ is nonlinear. To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator. Unlike classical linear regression, the corresponding optimization problem is nonconvex because of the nonlinearity of $f$. In spite of the nonconvexity, we prove that under mild conditions, every stationary point of...

Topics: Information Theory, Statistics, Machine Learning, Mathematics, Learning, Optimization and Control,...

Source: http://arxiv.org/abs/1511.04514
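As a toy illustration of the estimator described above, the sketch below fits the $\ell_1$-regularized least-squares objective by proximal gradient descent with a known link f = tanh. The step size, penalty level, and iteration count are illustrative choices, not the paper's tuning.

```python
# Proximal gradient (ISTA-style) sketch for y = f(x' beta*) + eps with known f.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_nonlinear_fit(X, y, f, f_prime, lam=0.05, step=0.1, iters=2000):
    """Minimize (1/2n) sum (f(x_i' b) - y_i)^2 + lam * ||b||_1 by ISTA.
    The loss is nonconvex in b because f is nonlinear."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        u = X @ beta
        grad = X.T @ ((f(u) - y) * f_prime(u)) / len(y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(3)
n, p = 400, 20
X = rng.normal(size=(n, p))
beta_star = np.zeros(p)
beta_star[:3] = [1.0, -1.0, 0.5]
y = np.tanh(X @ beta_star) + 0.05 * rng.normal(size=n)
beta_hat = sparse_nonlinear_fit(X, y, np.tanh, lambda u: 1.0 - np.tanh(u) ** 2)
```

Despite the nonconvexity, the stationary point reached from the zero initialization recovers the support and approximate signs of `beta_star` in this well-conditioned toy setting.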

Jun 30, 2018
by Fang Han; Huitong Qiu; Han Liu; Brian Caffo

Statisticians and quantitative neuroscientists have actively promoted the use of independence relationships for investigating brain networks, genomic networks, and other measurement technologies. Estimation of these graphs depends on two steps. First, features are extracted by summarizing measurements within a parcellation, region, or set definition to create nodes. Second, these summaries are used to create a graph representing relationships of interest. In this manuscript we study the...

Topics: Statistics, Methodology

Source: http://arxiv.org/abs/1404.7547

Jun 30, 2018
by Zhaoran Wang; Quanquan Gu; Yang Ning; Han Liu

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1412.8729
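The idea of building sparsity into EM can be sketched on a symmetric two-component Gaussian mixture y ~ 0.5 N(beta, I) + 0.5 N(-beta, I): each M-step is followed by a hard truncation that keeps only the s largest coordinates. This is an illustration under simplifying assumptions (known sparsity level, crude initialization), not the paper's full algorithm.

```python
# Truncated-EM sketch for a sparse symmetric two-component Gaussian mixture.
import numpy as np

def truncated_em(Y, s, iters=50):
    n, p = Y.shape
    beta = Y[0] / 2.0                         # crude initialization (sign ambiguous)
    for _ in range(iters):
        w = np.tanh(Y @ beta)                 # E-step: E[z | y] for the symmetric mixture
        beta = (Y * w[:, None]).mean(axis=0)  # M-step: weighted mean
        keep = np.argsort(np.abs(beta))[-s:]  # truncation: keep the s largest coordinates
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0
    return beta

rng = np.random.default_rng(4)
n, p, s = 500, 50, 3
beta_star = np.zeros(p)
beta_star[:3] = 2.0
z = rng.choice([-1, 1], size=n)
Y = z[:, None] * beta_star + rng.normal(size=(n, p))
beta_hat = truncated_em(Y, s)
```

The mixture mean is identifiable only up to a global sign flip, so accuracy is judged on `|beta_hat|`.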

Jun 29, 2018
by Ethan X. Fang; Han Liu; Kim-Chuan Toh; Wen-Xin Zhou

This paper studies the matrix completion problem under arbitrary sampling schemes. We propose a new estimator incorporating both max-norm and nuclear-norm regularization, based on which we can conduct efficient low-rank matrix recovery using a random subset of entries observed with additive noise under general non-uniform and unknown sampling distributions. This method significantly relaxes the uniform sampling assumption imposed for the widely used nuclear-norm penalized approach, and makes...

Topics: Machine Learning, Optimization and Control, Mathematics, Statistics

Source: http://arxiv.org/abs/1609.07664

Jun 30, 2018
by Le Song; Han Liu; Ankur Parikh; Eric Xing

Tree structured graphical models are powerful at expressing long range or hierarchical dependency among many variables, and have been widely applied in different areas of computer science and statistics. However, existing methods for parameter estimation, inference, and structure learning mainly rely on the Gaussian or discrete assumptions, which are restrictive under many applications. In this paper, we propose new nonparametric methods based on reproducing kernel Hilbert space embeddings of...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1401.3940

Apr 5, 2021
by Han, Liu; Lian-Xian, Han

Aug 11, 2020
by Tuo Zhao; Han Liu; Kathryn Roeder; John Lafferty; Larry Wasserman

Source: http://academictorrents.com/details/b33984a3ffa7a931e34d2af393becbeda77c2bf1

Jun 29, 2018
by Will Wei Sun; Zhaoran Wang; Xiang Lyu; Han Liu; Guang Cheng

We consider the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing...

Topics: Machine Learning, Methodology, Statistics

Source: http://arxiv.org/abs/1609.04522

Jun 30, 2018
by Yang Ning; Tianqi Zhao; Han Liu

We propose a likelihood ratio based inferential framework for high dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high dimensional data analysis, including incomplete data, selection bias, and heterogeneous multitask learning. Our work has three main contributions. (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1412.2295

Jun 30, 2018
by Zhaoran Wang; Huanran Lu; Han Liu

Sparse principal component analysis (PCA) involves nonconvex optimization for which the global solution is hard to obtain. To address this issue, one popular approach is convex relaxation. However, such an approach may produce suboptimal estimators due to the relaxation effect. To optimally estimate sparse principal subspaces, we propose a two-stage computational framework named "tighten after relax": Within the 'relax' stage, we approximately solve a convex relaxation of sparse PCA...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1408.5352

Jun 30, 2018
by Mengdi Wang; Ethan X. Fang; Han Liu

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e., problems of the form $\min_x \mathbf{E}_v [f_v\big(\mathbf{E}_w [g_w(x)]\big)]$. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1411.3803
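A minimal sketch of an SCGD-style update for the composition problem min_x E_v[f_v(E_w[g_w(x)])]: an auxiliary variable y tracks the inner expectation E_w[g_w(x)] on a fast timescale, while x takes quasi-gradient steps on a slow one. The step-size schedules and toy problem below are illustrative, not the paper's.

```python
# Two-timescale stochastic compositional gradient descent on a toy problem:
# g_w(x) = x + w with E[w] = 0 and f(u) = u^2, so F(x) = x^2 has minimizer 0.
import random

random.seed(0)

def g(x, w):          # inner stochastic map; E_w[g_w(x)] = x
    return x + w

def grad_g(x, w):     # d g_w / d x
    return 1.0

def grad_f(u):        # outer gradient; f(u) = u^2
    return 2.0 * u

x, y = 5.0, 0.0
for t in range(1, 20001):
    w = random.gauss(0.0, 1.0)
    alpha = 0.5 / t ** 0.75        # slow timescale for x
    beta = 1.0 / t ** 0.5          # fast timescale for the running average y
    y = (1.0 - beta) * y + beta * g(x, w)   # track the inner expectation
    x = x - alpha * grad_g(x, w) * grad_f(y)  # quasi-gradient step via chain rule
```

Note that a naive single-sample estimate grad_g(x, w) * grad_f(g(x, w)) would be biased because f is nonlinear; the running average y is what removes that bias asymptotically.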

Jun 30, 2018
by Zhuoran Yang; Yang Ning; Han Liu

We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data. Different from the existing mixed graphical models, we allow the nodewise conditional distributions to be semiparametric generalized linear models with unspecified base measure functions. Thus, one advantage of our method is that it is unnecessary to specify the type of each node and the method is more convenient to apply in practice. Under the proposed model, we...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1412.8697

Jun 30, 2018
by Tianqi Zhao; Mladen Kolar; Han Liu

We propose a robust inferential procedure for assessing uncertainties of parameter estimation in high-dimensional linear models, where the dimension $p$ can grow exponentially fast with the sample size $n$. Our method combines the de-biasing technique with the composite quantile function to construct an estimator that is asymptotically normal. Hence it can be used to construct valid confidence intervals and conduct hypothesis tests. Our estimator is robust and does not require the existence of...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1412.8724

Jun 29, 2018
by Jeffrey Diller; Han Liu; Roland Roeder

Let $f:\mathbb{CP}^2\dashrightarrow\mathbb{CP}^2$ be a rational map with algebraic and topological degrees both equal to $d\geq 2$. Little is known in general about the ergodic properties of such maps. We show here, however, that for an open set of automorphisms $T:\mathbb{CP}^2\to\mathbb{CP}^2$, the perturbed map $T\circ f$ admits exactly two ergodic measures of maximal entropy $\log d$, one of saddle and one of repelling type. Neither measure is supported in an algebraic curve, and $T\circ f$...

Topics: Complex Variables, Dynamical Systems, Mathematics

Source: http://arxiv.org/abs/1601.02226

Jun 28, 2018
by Jianqing Fan; Han Liu; Qiang Sun; Tong Zhang

We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second...

Topics: Statistics, Statistics Theory, Mathematics

Source: http://arxiv.org/abs/1507.01037

Jun 29, 2018
by Jianqing Fan; Han Liu; Weichen Wang; Ziwei Zhu

Heterogeneity is an unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify these methods. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we are able to remove the biases of batch...

Topics: Mathematics, Methodology, Statistics Theory, Statistics

Source: http://arxiv.org/abs/1602.05455

Jun 29, 2018
by Matey Neykov; Junwei Lu; Han Liu

We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we develop a unified theory for the fundamental limits of a large family of...

Topics: Machine Learning, Statistics, Statistics Theory, Mathematics

Source: http://arxiv.org/abs/1608.03045

Jun 29, 2018
by Xingguo Li; Tuo Zhao; Raman Arora; Han Liu; Jarvis Haupt

We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear speedup. Numerical experiments demonstrate the efficiency of our algorithm in terms of both parameter...

Topics: Machine Learning, Mathematics, Optimization and Control, Statistics, Learning, Computing Research...

Source: http://arxiv.org/abs/1605.02711

Jun 28, 2018
by Junwei Lu; Mladen Kolar; Han Liu

We propose a novel class of dynamic nonparanormal graphical models, which allows us to model high dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model we develop statistical tests for presence of edges both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes and do not require commonly assumed minimum signal strength. The testing...

Topics: Statistics, Machine Learning

Source: http://arxiv.org/abs/1512.08298

Jun 30, 2018
by Haotian Pang; Tuo Zhao; Robert Vanderbei; Han Liu

High dimensional sparse learning has imposed a great computational challenge to large scale data analysis. In this paper, we are interested in a broad class of sparse learning approaches formulated as linear programs parametrized by a {\em regularization factor}, and solve them by the parametric simplex method (PSM). Our parametric simplex method offers significant advantages over other competing methods: (1) PSM naturally obtains the complete solution path for all values of the regularization...

Topics: Learning, Optimization and Control, Computing Research Repository, Machine Learning, Statistics,...

Source: http://arxiv.org/abs/1704.01079

Jun 29, 2018
by Xingguo Li; Tuo Zhao; Raman Arora; Han Liu; Mingyi Hong

Cyclic block coordinate descent-type (CBCD-type) methods, which perform iterative updates over a few coordinates (a block) at a time throughout the procedure, have shown remarkable computational performance for solving strongly convex minimization problems. Typical applications include many popular statistical machine learning methods such as elastic-net regression, ridge penalized logistic regression, and sparse additive regression. The existing optimization literature has shown that for...

Topics: Machine Learning, Mathematics, Optimization and Control, Learning, Statistics, Computing Research...

Source: http://arxiv.org/abs/1607.02793

Jun 29, 2018
by Xingguo Li; Jarvis Haupt; Raman Arora; Han Liu; Mingyi Hong; Tuo Zhao

Many statistical machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. In this paper, we study this fundamental tradeoff through a SQRT-Lasso problem for sparse linear regression and sparse precision matrix estimation in high dimensions. We explain how novel optimization techniques help address these computational challenges. Namely, we propose a pathwise iterative smoothing shrinkage thresholding algorithm for solving...

Topics: Machine Learning, Mathematics, Optimization and Control, Statistics, Learning, Computing Research...

Source: http://arxiv.org/abs/1605.07950

Jun 28, 2018
by Matey Neykov; Yang Ning; Jun S. Liu; Han Liu

We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations to a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-estimation theory of confidence regions for high dimensional problems. Different from existing...

Topics: Statistics Theory, Methodology, Statistics, Machine Learning, Mathematics

Source: http://arxiv.org/abs/1510.08986

Sep 21, 2013
by Yu-Han Liu

A different proof of a known criterion for derived equivalence implying birationality is given. Derived equivalent smooth projective curves over an algebraically closed field are proved to be isomorphic. A different proof that derived equivalence implies birationality for varieties of general type (originally due to Kawamata) is also given.

Source: http://arxiv.org/abs/1108.2026v1

Jun 27, 2018
by Junwei Lu; Mladen Kolar; Han Liu

We propose a novel high dimensional nonparametric model named ATLAS which naturally generalizes the sparse additive model. Given a covariate of interest $X_j$, the ATLAS model assumes the mean function can be locally approximated by a sparse additive function whose sparsity pattern may vary from the global perspective. We propose to infer the marginal influence function $f_j^*(z) = \mathbb{E}[f(X_1,\ldots, X_d) \mid X_j = z]$ using a new kernel-sieve approach that combines the local kernel...

Topics: Machine Learning, Statistics, Mathematics, Statistics Theory

Source: http://arxiv.org/abs/1503.02978

Jun 29, 2018
by Junwei Lu; Guang Cheng; Han Liu

A massive dataset often consists of a growing number of (potentially) heterogeneous sub-populations. This paper is concerned with testing various forms of heterogeneity arising from massive data. In a general nonparametric framework, a set of testing procedures are designed to accommodate a growing number of sub-populations, denoted as $s$, with computational feasibility. In theory, their null limit distributions are derived as being nearly Chi-square with diverging degrees of freedom as long...

Topics: Statistics, Statistics Theory, Mathematics

Source: http://arxiv.org/abs/1601.06212

Jun 29, 2018
by Kean Ming Tan; Zhaoran Wang; Han Liu; Tong Zhang

The sparse generalized eigenvalue problem plays a pivotal role in a large family of high-dimensional learning tasks, including sparse Fisher's discriminant analysis, canonical correlation analysis, and sufficient dimension reduction. However, the theory of the sparse generalized eigenvalue problem remains largely unexplored. In this paper, we exploit a non-convex optimization perspective to study this problem. In particular, we propose the truncated Rayleigh flow method (Rifle) to estimate the leading...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1604.08697

Feb 22, 2020
by Hong-Feng Bai; Liang-Bi Chen; Ke-Ming Liu; Lin-Han Liu

Jun 27, 2018
by Xinyang Yi; Zhaoran Wang; Constantine Caramanis; Han Liu

Linear regression studies the problem of estimating a model parameter $\beta^* \in \mathbb{R}^p$, from $n$ observations $\{(y_i,\mathbf{x}_i)\}_{i=1}^n$ from linear model $y_i = \langle \mathbf{x}_i,\beta^* \rangle + \epsilon_i$. We consider a significant generalization in which the relationship between $\langle \mathbf{x}_i,\beta^* \rangle$ and $y_i$ is noisy, quantized to a single bit, potentially nonlinear, noninvertible, as well as unknown. This model is known as the single-index model in...

Topics: Statistics, Information Theory, Computing Research Repository, Machine Learning, Mathematics

Source: http://arxiv.org/abs/1505.03257

Jun 30, 2018
by Yang Ning; Han Liu

We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our approach provides a general framework for high dimensional inference and is applicable to a...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1412.8765

Jul 22, 2021
by Yu-Han, Liu; Ting-Ting, Liang; Jia-Jia, Zhao; Wei-Ning, Cheng; Ke-Yan, Zhu

Jun 30, 2018
by Ethan X. Fang; Yang Ning; Han Liu

This paper proposes a decorrelation-based approach to test hypotheses and construct confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics and establish their semiparametric optimality. We also develop new procedures...

Topics: Machine Learning, Mathematics, Statistics, Statistics Theory

Source: http://arxiv.org/abs/1412.5158

Sep 23, 2013
by Han Liu; John Lafferty; Larry Wasserman

Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula--or "nonparanormal"--for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for...

Source: http://arxiv.org/abs/0903.0649v1
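The nonparanormal transform itself is simple to sketch: push each variable through a Winsorized empirical CDF and then the standard normal quantile, after which Gaussian graphical model machinery applies to the transformed data. Helper names below are hypothetical, and the truncation level is one common choice from the nonparanormal literature.

```python
# Winsorized empirical-CDF + normal-quantile ("nonparanormal") transform sketch.
import numpy as np
from scipy.stats import norm, rankdata

def nonparanormal_transform(X, delta=None):
    n = X.shape[0]
    if delta is None:
        # a common truncation level in the nonparanormal literature
        delta = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))
    U = rankdata(X, axis=0) / n          # empirical CDF evaluated at the data
    U = np.clip(U, delta, 1.0 - delta)   # Winsorize so norm.ppf stays finite
    return norm.ppf(U)

rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 3))
X = Z ** 3                               # monotone marginal distortion
Xn = nonparanormal_transform(X)
C = np.corrcoef(Xn, rowvar=False)
```

Because the transform depends only on ranks, `X = Z**3` and `Z` map to exactly the same output, which is why monotone marginal distortions are harmless under this model.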

Jun 27, 2018
by Junwei Lu; Han Liu

We consider the problem of estimating undirected triangle-free graphs of high dimensional distributions. Triangle-free graphs form a rich graph family which allows arbitrary loopy structures but no 3-cliques. For inferential tractability, we propose a graphical Fermat's principle to regularize the distribution family. This principle enforces the existence of a distribution-dependent pseudo-metric such that any two nodes have a smaller distance than that of two other nodes that have a geodesic path...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1504.06026

Jun 26, 2018
by Jianqing Fan; Fang Han; Han Liu; Byron Vickers

We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB method (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over the H-CLUB. We further provide...

Topics: Quantitative Finance, Portfolio Management, Statistics, Statistics Theory, Mathematics

Source: http://arxiv.org/abs/1501.02382

Sep 22, 2013
by Han Liu; Min Xu; Haijie Gu; Anupam Gupta; John Lafferty; Larry Wasserman

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative...

Source: http://arxiv.org/abs/1001.1557v2
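The forest-estimation step described above can be sketched with plug-in histogram estimates of mutual information standing in for the paper's kernel density estimates: score every pair of variables, then run Kruskal's algorithm on the weights, dropping low-weight edges so the result is a forest rather than a full spanning tree.

```python
# Chow-Liu-style forest estimation via Kruskal's algorithm on MI weights.
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate from a 2-D histogram (toy stand-in for KDE)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask])).sum())

def max_weight_forest(X, threshold=0.0):
    """Kruskal's algorithm on MI weights; edges below threshold are dropped."""
    d = X.shape[1]
    edges = sorted(((mutual_information(X[:, i], X[:, j]), i, j)
                    for i in range(d) for j in range(i + 1, d)), reverse=True)
    parent = list(range(d))
    def find(a):                      # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    chosen = []
    for w, i, j in edges:
        if w <= threshold:
            break                     # remaining edges are even weaker
        ri, rj = find(i), find(j)
        if ri != rj:                  # skip edges that would close a cycle
            parent[ri] = rj
            chosen.append((i, j))
    return chosen

rng = np.random.default_rng(2)
x0 = rng.normal(size=2000)
x1 = x0 + 0.3 * rng.normal(size=2000)   # dependency chain: x0 - x1 - x2
x2 = x1 + 0.3 * rng.normal(size=2000)
x3 = rng.normal(size=2000)              # independent of the rest
X = np.column_stack([x0, x1, x2, x3])
edges_found = max_weight_forest(X, threshold=0.05)
```

On this toy chain, the procedure keeps the two chain edges and leaves the independent variable `x3` isolated, illustrating why a forest (not a spanning tree) is the right target.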

Jun 27, 2020
by Jian-Hong Xia; Han-Lin Wu; Chen-Hong Li; Yuan-Qi Wu; Su-Han Liu

Sep 20, 2013
by Han Liu

The PHENIX experiment has measured the transverse single spin asymmetry of J/$\Psi$ in polarized p+p collisions at forward rapidity at $\sqrt{s}=200$ GeV. The data were collected during the 2006 RHIC run with an average beam polarization of 56%. At RHIC energies, heavy quark production is dominated by gluon-gluon interactions. Therefore, the transverse single spin asymmetry in J/$\Psi$ production can provide a clean measurement of the gluon Sivers distribution function.

Source: http://arxiv.org/abs/hep-ex/0612033v1

Jun 28, 2018
by Zhaoran Wang; Quanquan Gu; Han Liu

We study the fundamental tradeoffs between computational tractability and statistical accuracy for a general family of hypothesis testing problems with combinatorial structures. Based upon an oracle model of computation, which captures the interactions between algorithms and data, we establish a general lower bound that explicitly connects the minimum testing risk under computational budget constraints with the intrinsic probabilistic and combinatorial structures of statistical problems. This...

Topics: Statistics, Machine Learning

Source: http://arxiv.org/abs/1512.08861

Jun 27, 2018
by Yan Li; Han Liu; Warren Powell

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R\&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with group Lasso...

Topics: Machine Learning, Statistics, Systems and Control, Information Theory, Mathematics, Computing...

Source: http://arxiv.org/abs/1503.05567

Jun 28, 2018
by Heather Battey; Jianqing Fan; Han Liu; Junwei Lu; Ziwei Zhu

This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose $k$ as $n$ grows large, providing a theoretical upper bound on $k$ such that...

Topics: Statistics, Statistics Theory, Mathematics

Source: http://arxiv.org/abs/1509.05457

Sep 23, 2013
by Yu-Han Liu

We define and study the functorial spectrum for every triangulated tensor category. A reconstruction result for topologically noetherian schemes similar to (and based on) a theorem by Balmer is proved. An alternative proof of the reconstruction theorem by Bondal-Orlov for smooth projective varieties with ample (anti-)canonical bundles is given.

Source: http://arxiv.org/abs/1105.2197v2