README.md 7.08 KB
Newer Older
Imanol Pérez's avatar
Imanol Pérez committed
1
2
# MonteCarlo and Arima for stock selection

Imanol Pérez's avatar
Imanol Pérez committed
3
4
See an article about the code on <a href="http://www.tulipquant.com/2016/06/18/montecarlo-and-arima-for-stock-selection/">tulipQuant</a>.

Imanol Pérez's avatar
Imanol Pérez committed
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
The idea of the trading algorithm will be the following:
<ul>
 	<li>Given a day, for each stock of a certain index, select the best ARIMA model for this stock.</li>
 	<li>Then, simulate different possible trajectories of the price of this stock.</li>
 	<li>Now, estimate the probability of the stock going up, dividing the number of trajectories that actually increased their price, by the total number of trajectories. This way of estimating probabilities is called <a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo method</a>.</li>
 	<li>Now, since this process was repeated for a big number of stocks, we may sort them appropiately. If $latex p_i$ is the probability of a price rise for stock $latex i$, this stock would be a good candidate to buy if $latex p_i$ was close to 1. However, if it is close to 0, it would also be a good candidate, but for short selling. Thus, we pick the stock such that $latex |p_i-0.5|$ is the maximum.</li>
 	<li>Now, once we have calculated the stock where the maximum is attained, we go long or short depending if the probability was greater or less than 0.5.</li>
</ul>
Once I have explained how the algorithm will work, let's see how it works in practice. The algorithm was implemented in R.

I used the following libraries:

<pre lang="PHP">
library(quantmod)
library(timeSeries)
library(forecast);
library(TTR)
</pre>


The function that selects the best ARIMA model for a time series is the following:

<pre lang="PHP">
selectArima <- function(timeseries.ts) {
  aic<-matrix(NA,4,4);
  for(p in 0:3)
  {
    for(q in 0:3)
    {
      a.p.q<-arima(timeseries.ts,order=c(p,0,q));
      aic.p.q<-a.p.q$aic;
      aic[p+1,q+1]<-aic.p.q;
    }
  }
  inds = which(aic == min(aic), arr.ind=TRUE)-1;
  colnames(inds)<-c("p", "q");
  return(inds);
}
</pre>

Now, the following function runs the Monte Carlo method, and returns the probability for the stock going up next day.

<pre lang="PHP">
monteCarlo <- function(ts.ts, N) {
  pq<-selectArima(ts.ts);
  p=pq[1];
  q=pq[2];
  fit<-arima(ts.ts, c(p, 0, q));
  ts.sim<-ts.ts;
  n=length(ts.sim)
  
  # We simulate different trajectories
  
  sim <- list();
  S <- NULL;
  h <- 1;
  for(i in 1:N){
    sim[[i]] <- simulate(fit,h,future=TRUE,bootstrap = TRUE, innov=NULL);
    S <- cbind(S,as.vector(sim[[i]]));
  }
  S=S[ , colSums(is.na(S)) == 0];
  
  p <- sum(S>0)/N;

  return(p);
  
}
</pre>

The code is quite straightforward. The whole code is available on GitHub, or in the bottom of this post.


<h1>Backtest</h1>

So, is this strategy better than the buy-and-hold strategy? Since the code is <em>really</em> slow, I wasn't able to run it for a big period of time. I only backtested it with data from the end of 2014, to nowadays. The returns are shown in the following chart:

Imanol Pérez's avatar
Imanol Pérez committed
81
<div style="text-align: center;"><img src="https://raw.githubusercontent.com/bolorsociedad/MonteCarlo-and-Arima-for-stock-selection/master/arima.png"/></div>
Imanol Pérez's avatar
Imanol Pérez committed
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237

The strategy behaved really well, and in this period the cumulative return was 1399.08%, which is really impressie. I did test it with some other small periods, and the returns were equally impressive. I was not able to test it for long periods, due to the speed of the code. However, if someone wants to test it, he is welcome to share the results obtained.

<h1>The whole code</h1>

In order to improve speed a little bit, instead of considering all the stocks that are obtained with the function <code>stockSymbols()</code>, it just selects a few of them randomly. Also, <code>stockSymbols()</code> returns a list of stocks that exist <i>now</i>, but many of them didn't exist a few years ago. So the code could be improved a lot.

This is the whole code, which is also available on GitHub:

<pre lang="PHP">
library(quantmod)
library(timeSeries)
library(forecast);
library(TTR)

selectArima <- function(timeseries.ts) {
  aic<-matrix(NA,4,4);
  for(p in 0:3)
  {
    for(q in 0:3)
    {
      a.p.q<-arima(timeseries.ts,order=c(p,0,q));
      aic.p.q<-a.p.q$aic;
      aic[p+1,q+1]<-aic.p.q;
    }
  }
  inds = which(aic == min(aic), arr.ind=TRUE)-1;
  colnames(inds)<-c("p", "q");
  return(inds);
}

monteCarlo <- function(ts.ts, N) {
  pq<-selectArima(ts.ts);
  p=pq[1];
  q=pq[2];
  fit<-arima(ts.ts, c(p, 0, q));
  ts.sim<-ts.ts;
  n=length(ts.sim)
  
  # We simulate different trajectories
  
  sim <- list();
  S <- NULL;
  h <- 1;
  for(i in 1:N){
    sim[[i]] <- simulate(fit,h,future=TRUE,bootstrap = TRUE, innov=NULL);
    S <- cbind(S,as.vector(sim[[i]]));
  }
  S=S[ , colSums(is.na(S)) == 0];
  


  
  p <- sum(S>0)/N;

  return(p);
  
}


####### SETTINGS #######

startDate="2010-01-01";
T=500;
N=1000; # Number of Monte Carlo simulations
numberOfStocks=20; # Number of stocks that will be considered.

########################

startDate=as.Date(startDate)-T;

x <- stockSymbols();


getSymbols("^GSPC", from=startDate);

GSPC=Cl(GSPC);
money<-c(1000);
for (i in (T+1):(length(GSPC)-2)) {
  symbols<-x$Symbol;
  symbols<-sample(symbols, numberOfStocks);
  
  maxProbability=0.5;
  possibleMoney=money[length(money)];
  for (j in 1:length(symbols)) {
    symbols[j] = gsub("-","", symbols[j])
    error<-0;
    error0<-tryCatch ({
      s <- getSymbols(symbols[j], from=startDate)
    }, error=function(e) {
      error0<-1;
    })
    if (error0==1) {
      next();
    }
    
    field <- c(paste(s,".Close",sep=""))
    data <- get(s)[, field]
    
    dateStart<-time(GSPC[i-T]);
    dateEnd<-time(GSPC[i]);
    
    data <- window(data, start=dateStart, end=dateEnd);
    
    
    
    if (length(data)<T) {
      cat("No data of '", as.character(s), "' for day ", as.character(dateEnd), "\n");
      next();
      
    }
    data.ts<-diff(log(window(data, start=dateStart, end=dateEnd)));
    
    error0<-tryCatch ({
      p<-monteCarlo(data.ts, N);
    }, error=function(e) {
      error0<-1;
    })
    if (error0==1) {
      next();
    }
    
    if (abs(p-0.5)>abs(maxProbability-0.5)) {
      dateStart<-time(GSPC[i-T]);
      dateEnd<-time(GSPC[i+1]);
      data <- get(s)[, field]
      
      data <- window(data, start=dateStart, end=dateEnd);
      
      newPrice=as.numeric(data[length(data)]);
      oldPrice=as.numeric(data[length(data)-1]);
      if (p<0.5) {
        possibleMoney=money[length(money)]*oldPrice/newPrice;
      }
      if (p>0.5) {
        possibleMoney=money[length(money)]*newPrice/oldPrice;
      }
      maxProbability=p;

    }
    
    
  }
  
  money<-c(money, possibleMoney);
  cat("Money:" , money[length(money)], "\n"); 
}


money.zoo<-zoo(money, time(GSPC)[(length(GSPC)-length(money)+1):length(GSPC)]);
sp500.zoo<-zoo(1000*GSPC[(length(GSPC)-length(money)+1):(length(GSPC))]/as.numeric(GSPC[length(GSPC)-length(money)+1]), time(GSPC)[(length(GSPC)-length(money)+1):length(GSPC)])
z<-as.zoo(cbind(money.zoo, sp500.zoo))
plot(x = z, ylab = "Cumulative Return", xlab="Year", main = "Cumulative Returns", screens=1, col=c("red", "blue"))
legend(x = "topleft", legend = c("Strategy", "Buy & hold"), 
       lty = 1,col = c("red", "blue"))
</pre>