Alan Weiss <aweiss@mathworks.com> wrote in message <nafb6l$mur$1@newscl01ah.mathworks.com>...
> On 2/22/2016 10:42 AM, someone wrote:
> > "Alessandro De Sanctis" wrote in message
> > <naf92r$if5$1@newscl01ah.mathworks.com>...
> >> Hello,
> >>
> >> I have to maximize a likelihood in which to every observation
> >> correspond a specific (non-integer) weight. In particular, I am
> >> referring to sampling weights, which denote the inverse of the
> >> probability that the observation is included in the sample.
> >>
> >> I tried by expanding the dataset (so that an observation with weight =
> >> 100 is repeated 100 times) but the dataset became extremely large and
> >> it's the second week that fminsearch is running.
> >>
> >> My ultimate goal would be to estimate a non-linear model with a binary
> >> dependent variable and weights to observations.
> >>
> >> Please any alternative idea on how to proceed is welcome. Thank you in
> >> advance.
> >> Alessandro
> >
> > To help us help you, can you show us a small snippet of your code? The
> > above description is pretty vague and doesn't give us much to go on.
>
> In particular, what is the mathematical form of your objective function,
> meaning the function you are trying to minimize? There is probably a
> shortcut that you can take in your function definition to account for
> weights, rather than adding new rows to the dataset.
>
> Also, fminsearch is not the fastest or most robust optimizer in
> Optimization Toolbox. You might do better to try fminunc, or another
> appropriate solver.
>
> Alan Weiss
> MATLAB mathematical toolbox documentation
Thanks, I'm now using fminunc. Moreover, I've just found a way to deal with adding rows that should save a lot of time. I will now run this version of my program.
Let me try to be clearer. I am working on a dataset of N = 60,000 observations. My theoretical model has the form
y = b0 + b1 * A1(lambda_1,data) + b2 * A2(lambda_2,data) + controls * b + error
where y is a binary variable, A1 and A2 are functions of two parameters and data, and controls is a matrix (60,000 x 95) of regressors.
------------------------------------------------------------------------------------------------------
%%% 1) I load data and starting values (start_vals), and expand the dataset by rounding weights to the nearest integer. The following code is new and I haven't tried it on the whole dataset yet; it will probably take hours. The expanded dataset will have dimension N = 134,985,980.
weights = round(data(:,end));
DATA = [];
for i = 1:length(weights)
    DATAi = data(i,:);
    DATAi = repmat(DATAi,weights(i),1);
    DATA = [DATA; DATAi];
end
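As an aside, growing DATA inside the loop reallocates it on every iteration, which is what makes this so slow. An untested vectorized sketch (repelem requires R2015a or later):

```matlab
% Build a row index that repeats index i exactly weights(i) times,
% then expand the dataset with a single indexing operation.
weights = round(data(:,end));
idx = repelem((1:size(data,1))', weights); % e.g. weights = [2;1] -> idx = [1;1;2]
DATA = data(idx,:);
```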
%%% 2) I run the optimization:
[MLE, loglike_val] = fminunc(@(parameters) loglike2_complete(parameters,DATA),start_vals)
%%% where loglike2_complete works as follows:
function L = loglike2_complete(param,DATA)
%%% 3.a) I compute A1 and A2 (following a theoretical model where A1 and A2 are weighted sums -- with their own weights -- of elements in the matrices R and C):
lambda1 = param(1);     % unpack the parameter vector
lambda2 = param(2);
beta = param(3:end);    % coefficients on [intercept A1 A2 controls]
% note: controls, R, C and Y are used below but never defined in this
% function -- they must be extracted from DATA (or passed in) here
N = size(DATA,1); % number of rows; length() would return the largest dimension
A1 = zeros(N,1);
A2 = zeros(N,1);
age = controls(:,10);
for i = 1:N % elements of A1 and A2
    % A1(lambda1)
    agei = repmat(age(i),age(i)-1,1); % column vector filled with age(i)
    k = (1:age(i)-1)';                % integers from 1 to age(i)-1
    num = (agei-k).^lambda1;
    den = sum((agei-k).^lambda1);
    w = num ./ den;                   % weights sum to 1
    Ri = R(i,~isnan(R(i,:)));  % keep only non-missing values of R for this id
    A1(i) = Ri * w;            % requires Ri to have exactly age(i)-1 entries
    % A2(lambda2)
    num = (agei-k).^lambda2;
    den = sum((agei-k).^lambda2);
    w = num ./ den;
    Ci = C(i,~isnan(R(i,:)));  % note: this reuses R's missing-value mask for C
    A2(i) = Ci * w;
end
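Since the weight vector w depends only on age(i) and the lambdas, one untested refinement is to compute each weight vector once per distinct age value instead of once per row:

```matlab
% Sketch: precompute the age-dependent weight vectors once per distinct age,
% then reuse them for every row sharing that age.
for a = unique(age)'
    rows = find(age == a)';
    k = (1:a-1)';
    w1 = (a-k).^lambda1; w1 = w1/sum(w1); % weights for A1
    w2 = (a-k).^lambda2; w2 = w2/sum(w2); % weights for A2
    for i = rows
        mask = ~isnan(R(i,:));            % same mask as the original code
        A1(i) = R(i,mask) * w1;
        A2(i) = C(i,mask) * w2;
    end
end
```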
%%% 3.b) I write the model in the form y = X * beta, where beta is start_vals excluding lambda1 and lambda2:
X = [ones(N,1) A1 A2 controls];
%%% 3.c) I compute the negative log-likelihood I want to minimize:
L = -(sum(Y.*log(normcdf(X*beta,0,1))) + sum((1-Y).*log(1-normcdf(X*beta,0,1))));
------------------------------------------------------------------------------------------------------
I think there are faster ways to expand the dataset and to compute A1 and A2. But what I'd really like to know is how to handle those weights without expanding the dataset at all (also because rounding the weights loses information).
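One possibility I'm considering (untested sketch, with w the original non-integer sampling weights aligned with the 60,000 rows) is to weight each observation's log-likelihood contribution directly, which removes the need to expand the dataset and handles non-integer weights exactly:

```matlab
% Weighted negative log-likelihood: each observation's contribution is
% multiplied by its sampling weight instead of duplicating its row.
w = data(:,end);            % non-integer sampling weights, no rounding needed
p = normcdf(X*beta,0,1);    % probit probabilities
L = -sum( w .* ( Y.*log(p) + (1-Y).*log(1-p) ) );
```

With integer weights this reproduces the expanded-dataset likelihood exactly, since repeating a row m times just multiplies its log-likelihood term by m.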
P.S. I've not seen the output of this program yet.