
Re: Optimization using weights

Alan Weiss <aweiss@mathworks.com> wrote in message <nafb6l$mur$1@newscl01ah.mathworks.com>...
> On 2/22/2016 10:42 AM, someone wrote:
> > "Alessandro De Sanctis" wrote in message
> > <naf92r$if5$1@newscl01ah.mathworks.com>...
> >> Hello,
> >>
> >> I have to maximize a likelihood in which to every observation
> >> correspond a specific (non-integer) weight. In particular, I am
> >> referring to sampling weights, which denote the inverse of the
> >> probability that the observation is included in the sample.
> >>
> >> I tried by expanding the dataset (so that an observation with weight =
> >> 100 is repeated 100 times) but the dataset became extremely large and
> >> it's the second week that fminsearch is running.
> >>
> >> My ultimate goal would be to estimate a non-linear model with a binary
> >> dependent variable and weights to observations.
> >>
> >> Please any alternative idea on how to proceed is welcome. Thank you in
> >> advance.
> >> Alessandro
> >
> > To help us help you, can you show us a small snippet of your code? The
> > above description is pretty vague and doesn't give us much to go on.
>
> In particular, what is the mathematical form of your objective function,
> meaning the function you are trying to minimize? There is probably a
> shortcut that you can take in your function definition to account for
> weights, rather than adding new rows to the dataset.
>
> Also, fminsearch is not the fastest or most robust optimizer in
> Optimization Toolbox. You might do better to try fminunc, or another
> appropriate solver.
>
> Alan Weiss
> MATLAB mathematical toolbox documentation



Thanks, I'm now using fminunc. I've also just found a way to deal with adding the rows that should save a lot of time. I will now run this version of my program.

Let me try to be clearer. I am working on a dataset of N = 60,000 observations. My theoretical model has the form

y = b0 + b1 * A1(lambda_1,data) + b2 * A2(lambda_2,data) + controls * b + error

where y is a binary variable, A1 and A2 are functions of the data and of the parameters lambda_1 and lambda_2 respectively, and controls is a 60,000 x 95 matrix of regressors.

------------------------------------------------------------------------------------------------------

%%% 1) I load data and starting values (start_vals), and expand the dataset by rounding weights to the closest integer. The following command is new and I haven't tried it yet on the whole dataset; it will probably take hours. The final dataset will have dimension N = 134,985,980.

weights = round(data(:,end));

DATA = [];
for i = 1:length(weights)
   DATAi = data(i,:);
   DATAi = repmat(DATAi,weights(i),1);
   DATA = [DATA; DATAi];
end
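Growing DATA inside the loop reallocates the whole array on every iteration, which is probably the main cost. If your MATLAB release has repelem (R2015a or later, as far as I know), the whole expansion collapses to one line:

```matlab
% Repeat row i of data weights(i) times along the first dimension.
% Equivalent to the loop above, without the repeated reallocation.
weights = round(data(:,end));
DATA = repelem(data, weights, 1);
```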

%%% 2) I run the optimization :

[MLE, loglike_val] = fminunc(@(parameters) loglike2_complete(parameters,DATA),start_vals)

%%% Where loglike2_complete works as follows :

function L = loglike2_complete(param,DATA)

% Unpack the parameter vector (assumed ordering: the two lambdas first,
% then the linear coefficients b0, b1, b2 and the controls' b)
lambda1 = param(1);
lambda2 = param(2);
beta = param(3:end);

%%% 3.a) I compute A1 and A2 (following a theoretical model where A1 and A2 are a weighted sum -with weights other than the sampling weights- of elements in matrices R and C ) :

N = size(DATA,1); % number of observations (length() would return the largest dimension)
A1 = zeros(N,1);
A2 = zeros(N,1);
age = controls(:,10); % NOTE: controls, R, C and Y must also be extracted from DATA (or passed in); that step is omitted here

for i = 1:N % elements of A1 and A2
    % A1(lambda1)
    agei = repmat(age(i),age(i)-1,1); % vector of age(i)
    k = (1:age(i)-1)'; % numbers from 1 to age(i)-1
    num = (agei-k).^lambda1;
    den = sum((agei-k).^lambda1);
    w = num ./ den;
    Ri = R(i,~isnan(R(i,:))); % select only non missing values of R for every id
    A1(i) = Ri * w;
    
    % A2(lambda2)
    num = (agei-k).^lambda2;
    den = sum((agei-k).^lambda2);
    w = num ./ den;
    Ci = C(i,~isnan(R(i,:))); % uses R's missing pattern; if C has its own NaNs, use ~isnan(C(i,:))
    A2(i) = Ci * w;
end

%%% 3.b) I write the model in the form y = X * beta, where beta is start_vals excluded lambda1 and lambda2 :

X = [ones(N,1) A1 A2 controls];

%%% 3.c) I compute the function I want to minimize :

L = -(sum(Y.*log(normcdf(X*beta,0,1))) + sum((1-Y).*log(1-normcdf(X*beta,0,1))));
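As an aside on speed, the per-observation loop above can be vectorized. A sketch, under two assumptions the code above doesn't confirm: that column k of R (and C) holds the value for lag k, so the non-NaN entries of row i are exactly columns 1..age(i)-1, and that lambda1 > 0 so the zero-padded terms drop out. Implicit expansion needs R2016b or later; older releases would use bsxfun.

```matlab
% Vectorized A1 (A2 is analogous with C and lambda2)
maxLag = max(age) - 1;
K = repmat(1:maxLag, N, 1);   % lag index k, one row per observation
D = max(age - K, 0);          % age(i)-k, padded with 0 where k >= age(i)
W = D.^lambda1;               % unnormalized weights (0^lambda1 = 0 for lambda1 > 0)
W = W ./ sum(W, 2);           % normalize each row
Rpad = R(:, 1:maxLag);
Rpad(isnan(Rpad)) = 0;        % padded entries already carry zero weight
A1 = sum(Rpad .* W, 2);
```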

------------------------------------------------------------------------------------------------------

I think there are faster ways to expand the dataset and to compute A1 and A2. But what I'd really like to know is a way to handle the weights without expanding the dataset at all (especially since expansion forces me to round the weights).
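For what it's worth, the expansion trick seems equivalent to multiplying each observation's log-likelihood term by its weight, which would handle non-integer weights exactly and require no row duplication at all. A sketch, using X, beta and Y as above and w for the raw, unrounded weight column:

```matlab
% Weighted probit log-likelihood: no dataset expansion, no rounding.
p = normcdf(X*beta, 0, 1);
p = min(max(p, eps), 1 - eps);   % guard against log(0)
L = -sum(w .* (Y.*log(p) + (1-Y).*log(1-p)));
```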

P.S. I've not seen the output of this program yet.
