Quantitative investment is the practice of building mathematical models, using statistics, information technology, and mathematics, to quantify risk, return, and traditional investment concepts and to act on them. In the past, limited computing tools kept quantitative investment from gaining wide recognition. As computer science and quantitative analysis theory have advanced, traditional fundamental analysis and investment models built on sampling statistics no longer meet investors' requirements. Quantitative investment strategies based on data mining technology are therefore receiving more and more attention. In this paper, we use MATLAB to collect large volumes of data from financial and economic websites, train a neural network model to predict stock trends, and finally establish a suitable quantitative stock selection model. The simulation results indicate that using quantitative stock selection strategies to curb risk, together with a suitable investment portfolio, makes it possible to achieve the desired goals in the stock market.
Keywords: Quantitative investment; Data mining; Neural network; Portfolio
In recent years, with the continuous development of the stock market, quantitative investment technology has attracted more and more attention [1-3], and quantitative investment systems are gradually maturing. As stock market rules continue to improve, the number of listed stocks and the volume of associated data keep growing. This large and complex body of stock data contains useful information that cannot be found through conventional methods. However, the data mining technology developed in recent years can help us extract information from this vast amount of stock data [4-6]. By analyzing these data, we can obtain the information we want. In terms of factor-based stock selection, some researchers have proposed quantitative stock selection models based on multiple factors [7,8]. These systems use quantitative methods to analyze the transaction data and financial indicators of listed companies, and combine them with statistical tests to help investors find the most valuable investment portfolio. Although some of these methods are convenient and easy to operate, they ignore the correlation and overlap between factors [9]. The shortest-distance (single-linkage) hierarchical clustering method can reduce a massive set of stock price series, which both lightens the workload and makes the analysis more intelligent. However, the shortest-distance method tends to keep absorbing samples into the same class, so it is an extreme method. Patel et al. compared four prediction models, namely artificial neural network (ANN), support vector machine (SVM), random forest, and naive Bayes, to identify the best-performing one [10].
Data mining is the process of extracting hidden, previously unknown, and useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world data [11,12]. The core of data mining is to train an algorithm on processed input and output data to obtain a model. The model is then verified, so that it describes the relationship between the input and output data to a certain extent. Finally, the model is applied to newly input data to produce a new output that can be interpreted and applied [13]. The main tasks of data mining are association, regression, classification, clustering, prediction, and diagnosis.
A typical back-propagation (BP) neural network includes an input layer, one or more hidden layers, and an output layer. Its network structure is shown in Figure 1. The learning process of a BP neural network consists of forward propagation of the input and back propagation of the error. In forward propagation, samples enter at the input layer and are processed by the hidden layer units, and the actual output value of each unit is calculated from its weights and threshold. If the difference between the actual output and the expected value falls within a predetermined error range, the learning process ends successfully. Otherwise, back propagation adjusts the weights using the network error: the weight matrices are modified according to the difference between the actual output and the expected output so as to reduce the error of the network [14,15].
First, we define the following variables and arguments, beginning with the input layer vector.
[Equations (1)-(4) are missing from the source]
Step 4. Calculate the partial derivatives of the error function with respect to each neuron of the hidden layer and the output layer:
[Equations (5)-(8) are missing from the source]
Step 6. Calculate the global error:
[Equation (9) is missing from the source]
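The forward and backward passes described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the single hidden layer, sigmoid activations, squared-error loss, learning rate, stopping tolerance, and the toy XOR data are all assumptions chosen only to keep the example small, and the names `train_bp` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, y, n_hidden=4, lr=0.5, epochs=5000, tol=1e-3, seed=0):
    """Train a one-hidden-layer network by gradient descent with back propagation."""
    rng = np.random.default_rng(seed)
    m, n_in = X.shape
    n_out = y.shape[1]
    W1 = rng.normal(0.0, 1.0, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, (n_hidden, n_out))
    b2 = np.zeros(n_out)
    history = []
    for _ in range(epochs):
        # Forward propagation: input -> hidden -> output.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        # Global error: mean over samples of half the squared error.
        global_error = 0.5 * np.mean(np.sum(err ** 2, axis=1))
        history.append(global_error)
        if global_error < tol:
            break  # error within the predetermined range: stop learning
        # Back propagation: error terms for the output and hidden layers.
        delta_out = err * out * (1.0 - out)
        delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
        # Gradient-descent update of weights and thresholds (biases).
        W2 -= lr * (h.T @ delta_out) / m
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * (X.T @ delta_hid) / m
        b1 -= lr * delta_hid.mean(axis=0)
    return (W1, b1, W2, b2), history

def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

# Toy data (XOR), used only to exercise the code.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
params, history = train_bp(X, y)
print("initial error:", history[0], "final error:", history[-1])
```

The `history` list records the global error at each pass, so the effect of the weight updates can be checked by comparing its first and last entries.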
Data is the foundation of data mining. Many financial websites, such as Yahoo, Sina, and Tencent, provide rich and reliable transaction data. Yahoo has an interface with MATLAB, so we use MATLAB to obtain the transaction data from Yahoo. The key MATLAB function “fetch” is used as follows:
data = fetch(Connect, 'Security', 'FromDate', 'ToDate')
Here, ‘Connect’ specifies the source from which the data is obtained, such as Yahoo; ‘Security’ specifies the stock whose data is requested; ‘FromDate’ and ‘ToDate’ are the start and end of the requested time range. In this paper, we use this method to obtain data for the Shenzhen Stock Exchange stocks numbered 1 to 1000 and save them in Excel. After the data is standardized, it is divided into training samples and prediction samples. We then use the neural network model described in Section 2.2 to train on the samples and make predictions.
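The standardization and sample split can be sketched in Python as follows. The paper does not say which standardization is used; min-max scaling to [0, 1] is assumed here because the feature values in Table 1 lie in that range. The 80/20 split in row order, the random input matrix, and the function names are likewise hypothetical.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

def split_samples(X, train_fraction=0.8):
    """Split rows, in their given order, into training and prediction samples."""
    n_train = int(len(X) * train_fraction)
    return X[:n_train], X[n_train:]

# Hypothetical raw indicators: 10 observations of 3 indicators.
rng = np.random.default_rng(0)
raw = rng.uniform(5.0, 50.0, size=(10, 3))
scaled = min_max_scale(raw)
train, test = split_samples(scaled)
print(train.shape, test.shape)
```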
The model produces a ranked table of all stocks, as shown in Table 1. The ranking is based on the predicted value in the last column, which can be understood as the probability that the stock will rise in the future. In actual trading, we can therefore buy the top-ranked stocks and sell the bottom-ranked ones. This provides the basis for buy and sell decisions in quantitative stock selection.
Table 1. Model prediction results (first 10 rows)
65 | 1 | 1 | 1 | 1 | 0.217464 | 0.689387 | 0.615622 | 0.933314 | 1.076462 |
802 | 0.649562 | 0.714952 | 0.590378 | 0.669138 | 0.533305 | 0.493489 | 0.119175 | 0.450005 | 0.995385 |
985 | 0.489474 | 0.388007 | 0.219643 | 0.032438 | 0.289402 | 0.922103 | 0.458649 | 0.370715 | 0.985637 |
582 | 0.350914 | 0.507703 | 0.590378 | 0.669138 | 0.58377 | 0.410922 | 0.118595 | 0.226798 | 0.940392 |
66 | 0.846695 | 0.593295 | 0.590378 | 0.669138 | 0.551252 | 0.670699 | 0.293865 | 0.605941 | 0.885136 |
751 | 1 | 1 | 0.87818 | 0.881371 | 0.332703 | 0.595813 | 0.626997 | 0.948292 | 0.88133 |
707 | 0 | 0.650724 | 0.302097 | 0.244671 | 0.699561 | 0.544556 | 0.236814 | 0.403214 | 0.830667 |
819 | 1 | 0.888569 | 0.87818 | 0.881371 | 0.613117 | 0.822664 | 0.666953 | 0.978776 | 0.826818 |
522 | 0.343439 | 0.942634 | 0.417467 | 0.456905 | 0.029334 | 0.000374 | 0.035146 | 0.607885 | 0.778539 |
521 | 0.710836 | 1 | 0.302097 | 0.244671 | 0.396943 | 0.315258 | 0.393372 | 0.913728 | 0.75364 |
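Selecting the top-ranked stocks from such a table amounts to a sort on the predicted value. The short Python sketch below uses the first and last columns of Table 1; reading the first column as the stock code is an assumption, and `top_k` is a hypothetical helper name.

```python
# First column (assumed to be the stock code) -> last column (predicted value).
predictions = {
    707: 0.830667, 65: 1.076462, 522: 0.778539, 802: 0.995385,
    66: 0.885136, 985: 0.985637, 521: 0.75364, 582: 0.940392,
    819: 0.826818, 751: 0.88133,
}

def top_k(scores, k):
    """Return the k stock codes with the highest predicted value."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]

print(top_k(predictions, 8))
```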
In this experiment, we also evaluate the model on historical data, using full-set verification. Figure 2 shows the classification accuracy and error rate of the model. The accuracy is clearly higher than the error rate. In finance, an accuracy of 72% is not easy to achieve. As long as the number of transactions is large enough, the probability of an overall profit is therefore considerable.
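The link between accuracy and the number of transactions can be made concrete with a simple binomial calculation. The Python sketch below assumes that each prediction is correct independently with probability 0.72 and that gains and losses per trade are of equal size, so that an overall profit means more than half of the calls are correct. Neither assumption comes from the paper, and real trades are unlikely to satisfy them exactly.

```python
from math import comb

def prob_majority_correct(n, p):
    """Probability that more than half of n independent calls are correct."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

for n in (1, 11, 51, 101):
    print(n, prob_majority_correct(n, 0.72))
```

Under these assumptions the probability rises toward 1 as the number of trades grows, which is the intuition behind the statement above.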
In this section, we build a portfolio model to determine the best weight for each stock in the investment. Suppose we want to invest in 8 stocks; we simply select the top 8 from the stock ranking in Table 1 given in the previous section.
Assume that the investor chooses
[Equations (10)-(13) are missing from the source]
In order to minimize the investment risk as much as possible, we establish the following model:
[Equation (14) is missing from the source]
Assuming the covariance matrix is a positive definite matrix, let
[Equation (15) is missing from the source]
Then, the portfolio model can be transformed into
[Equations (16)-(18) are missing from the source]
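As an illustration of risk minimization with a positive definite covariance matrix, the Python sketch below computes the minimum-variance weights under the single constraint that the weights sum to one, using the closed form w = C^{-1} 1 / (1' C^{-1} 1), where C is the covariance matrix. This is the standard mean-variance setting and is assumed here; it is not necessarily the exact model of the paper. The return data are randomly generated stand-ins, short positions are allowed, and `min_variance_weights` is a hypothetical name.

```python
import numpy as np

def min_variance_weights(cov):
    """Weights minimizing w' C w subject to sum(w) = 1 (C positive definite)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # C^{-1} 1 without forming the inverse
    return x / (ones @ x)

# Hypothetical daily returns for 8 stocks (stand-ins for P1..P8).
rng = np.random.default_rng(42)
returns = rng.normal(0.001, 0.02, size=(250, 8))
cov = np.cov(returns, rowvar=False)

w = min_variance_weights(cov)
equal = np.full(8, 1.0 / 8)
print("weights:", np.round(w, 4))
print("variance (min-variance):", w @ cov @ w)
print("variance (equal weights):", equal @ cov @ equal)
```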
The proposed portfolio model is verified by simulation in MATLAB. We invest in 8 stocks, the top 8 from the stock ranking in Table 1 given in the previous section, denoted P1, P2, ..., P8. The simulation results are shown in Figure 3 and Figure 4.
Here, we focus on Figure 3, which shows the curve of risk against return. It provides a basis for deciding which portfolio to choose: each point on the curve corresponds to a set of investment weights. An investor who seeks high returns and does not fear high risk can choose the portfolios at the top of the curve. Most investors, however, will choose a compromise, that is, a portfolio with relatively high return but tolerable risk.
Figure 4 shows the investment weight allocation for different risk appetites. Each value on the abscissa corresponds to one portfolio. The specific weights can also be calculated directly from the model, but the graph makes the differences between portfolios under different risk preferences more intuitive: the proportion invested in each stock differs. Once a risk preference is chosen, the corresponding investment allocation can be read off directly.
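The relationship between a chosen point on the risk-return curve and a set of weights can be sketched in Python. The example assumes the standard mean-variance problem: minimize the portfolio variance subject to a target expected return and weights that sum to one, with short positions allowed. The return data are randomly generated, and `frontier_weights` is a hypothetical name; the paper's own constraints may differ.

```python
import numpy as np

def frontier_weights(mu, cov, target):
    """Minimum-variance weights with expected return `target` and sum(w) = 1."""
    A = np.column_stack([mu, np.ones_like(mu)])  # constraint matrix, n x 2
    cov_inv_A = np.linalg.solve(cov, A)          # C^{-1} A
    M = A.T @ cov_inv_A                          # 2 x 2 system for the multipliers
    lam = np.linalg.solve(M, np.array([target, 1.0]))
    return cov_inv_A @ lam

# Hypothetical daily returns for 8 stocks (stand-ins for P1..P8).
rng = np.random.default_rng(42)
returns = rng.normal(0.001, 0.02, size=(250, 8))
mu = returns.mean(axis=0)
cov = np.cov(returns, rowvar=False)

# Sweep target returns: each target gives one point on the curve and one weight set.
targets = np.linspace(mu.min(), mu.max(), 5)
frontier = []
for r in targets:
    w = frontier_weights(mu, cov, r)
    frontier.append((r, np.sqrt(w @ cov @ w), w))
    print(f"target={r:.5f} risk={frontier[-1][1]:.5f}")
```

Each entry of `frontier` pairs a target return and its risk with the weights that achieve it, which mirrors how a point chosen on the curve maps to an allocation.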
In the field of quantitative investment, investors are paying increasing attention to quantitative stock selection strategies based on data mining technology. For investors, the key is to design good indicators and improve the accuracy of the model, thereby improving its profitability and making the most of the data and the model. Based on the observation and analysis of the Beidou navigation sector, the stocks with the most investment value in the sector were selected. By selecting better stocks, using quantitative timing strategies to suppress risk, and then choosing a suitable investment portfolio, investors can pursue the goal of high returns and low risk in the stock market.
This work has been partially supported by the Key projects of natural science research of the higher education institutions of Anhui (grant no. KJ2016A530).
[1] Wenjing Ouyang, Samuel H. Szewczyk. Stock price informativeness on the sensitivity of strategic M&A investment to Q[J]. Review of Quantitative Finance & Accounting, 2018, 50(3):745-774.
[2] Chava S, Wang R, Zou H. Covenants, Creditors’ Simultaneous Equity Holdings, and Firm Investment Policies[J]. Journal of Financial and Quantitative Analysis, 2019, 54(2):481-512.
[3] Han-ding, ZHANG, Yin-xian. Investment risk evaluation of existing building energy-saving renovation project for ESCO[J]. Ecological Economy, 2018(3):180-189.
[4] Huiqi Gan. Does CEO managerial ability matter? Evidence from corporate investment efficiency[J]. Review of Quantitative Finance & Accounting, 2019, 52(4):1085-1118.
[5] Ferrando, Annalisa, Preuss, Carsten. What finance for what investment? Survey-based evidence for European companies[J]. Eib Working Papers, 2018(5):1-39.
[6] Muhittin A. Serdar, Mustafa Serteser, Yasemin Ucal, et al. An Assessment of HbA1c in Diabetes Mellitus and Pre-diabetes Diagnosis: a Multi-centered Data Mining Study[J]. Applied Biochemistry and Biotechnology, 2019(Suppl1):1-13.
[7] Sorensen E H, Miller K L, Ooi C K. The decision tree approach to stock selection: An evolving tree model performs the best[J]. Journal of Portfolio Management, 2000, 27(1):42-52.
[8] Piotroski, Joseph D. Value Investing: The Use of Historical Financial Statement Information to Separate Winners from Losers[J]. Journal of Accounting Research, 2001, 38(2):43-51.
[9] Fama E F, French K R. A Five-factor Asset Pricing Model[J]. Journal of Financial Economics, 2015, 116(1):1-22.
[10] Jigar Patel, Sahil Shah, Priyank Thakkar, K Kotecha. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning technique[J]. Expert Systems with Applications, 2015, 42(1):259-268.
[11] Pernilla Svefors, Oleg Sysoev, Eva-Charlotte Ekstrom, et al. Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort, Bangladesh[J]. BMJ Open, 2019, 9(8):e025154.
[12] Alireza Arabameri, Biswajeet Pradhan, Khalil Rezaei. Spatial prediction of gully erosion using ALOS PALSAR data and ensemble bivariate and data mining models[J]. Geosciences Journal, 2019, 1:1-18.
[13] Yali Dong, Huimin Wang. Robust Output Feedback Stabilization for Uncertain Discrete-Time Stochastic Neural Networks with Time-Varying Delay[J]. Neural Processing Letters, 2019:1-21.
[14] Meng-Xiao Li, Su-Qin Yu, Wei Zhang. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images[J]. International Journal of Ophthalmology, 2019, 12(6):1012-1020.
[15] Marwin H. S. Segler, Mike Preuss, Mark P. Waller. Planning chemical syntheses with deep neural networks and symbolic AI[J]. Nature, 2018, 555(7698):604-610.
Published on 01/04/20
Licence: CC BY-NC-SA license