Abstract

Quantitative investment is the process of building mathematical models with statistics, information technology, and mathematics in order to quantify risks and returns and to implement traditional investment concepts. In the past, limited computing tools kept quantitative investment from gaining much recognition. With advances in computer science and quantitative analysis theory, traditional fundamental analysis and investment models built on sampling statistics no longer meet the requirements of investors. Quantitative investment strategies based on data mining technology are therefore receiving more and more attention. In this paper, we use MATLAB to collect large volumes of data from financial and economic websites, train a neural network model to predict the trend of stock price changes, and finally establish a suitable quantitative stock selection model. The simulation results suggest that using quantitative stock selection strategies to curb risk, together with selecting a suitable investment portfolio, helps achieve the desired goals in the stock market.

Keywords: Quantitative investment; Data mining; Neural network; Portfolio

1. Introduction

In recent years, with the continuous development of the stock market, quantitative investment technology has attracted more and more attention [1-3], and quantitative investment systems are gradually maturing. As stock market rules improve, the number of listed stocks and the volume of associated data keep growing. This large and complex body of stock data contains useful information that conventional methods cannot uncover. However, the data mining technology developed in recent years can help us extract information from vast amounts of stock data [4-6], and analyzing these data yields the information we need. In terms of factor-based stock selection, some researchers have proposed quantitative stock selection models based on multiple factors [7,8]. These systems use quantitative methods to analyze transaction data and financial indicators of listed companies, and combine them with statistical testing methods to help investors find the most valuable investment portfolio. Although some of these methods are convenient and easy to operate, they ignore the correlation and overlap between factors [9]. Shortest-distance (single-linkage) hierarchical clustering can reduce a massive set of stock price series, which both lowers the workload and makes the analysis more automated. However, the shortest-distance method tends to keep absorbing samples into existing classes, so it is an extreme method. Patel et al. compared four prediction models, namely artificial neural network (ANN), support vector machine (SVM), random forest, and naive Bayes, and identified the best-performing one [10].

2. Basic theory and method

2.1 Data mining

Data mining is the process of extracting hidden, previously unknown, and useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world data [11,12]. Its core is to train an algorithm on preprocessed input and output data to obtain a model. The model is then validated to confirm that it describes the relationship between the inputs and the outputs to a sufficient degree. Finally, the model is applied to new input data to produce new outputs that can be used for interpretation and application [13]. Data mining tasks mainly include association, regression, classification, clustering, prediction, and diagnosis.
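The train, validate, and apply cycle described above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the data are synthetic, the "model" is an ordinary least-squares line rather than any method used in this paper, and all variable names are hypothetical.

```matlab
% Synthetic, noise-free data (assumption): y = 2*x + 1
x = (1:10)';
y = 2*x + 1;

% 1) Train on the first 7 samples: least-squares fit of y = b(1) + b(2)*x
trainIdx = 1:7;
A = [ones(numel(trainIdx), 1), x(trainIdx)];
b = A \ y(trainIdx);

% 2) Validate on the held-out samples
testIdx = 8:10;
yHat = [ones(numel(testIdx), 1), x(testIdx)] * b;
valError = max(abs(yHat - y(testIdx)));

% 3) Apply the validated model to a new input
xNew = 12;
yNew = [1, xNew] * b;
fprintf('validation error = %.2e, prediction at x = %g: %.4f\n', valError, xNew, yNew);
```

Holding out samples that the model never saw during training is what makes the validation step informative about new data.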

2.2 Principle of BP neural network

A typical BP (back-propagation) neural network consists of an input layer, one or more hidden layers, and an output layer. Its network structure is shown in Figure 1. The learning process consists of two phases: forward propagation of the input and back propagation of the error. In forward propagation, samples enter at the input layer and are processed by the hidden layer units, and the actual output of each unit is computed from its weights and threshold. If the difference between the actual output and the expected output falls within a predetermined error range, learning ends successfully. Otherwise, back propagation passes the network error backward and adjusts the weight matrices according to the difference between the actual and the expected outputs, so as to reduce the error of the network [14,15].
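For orientation, the forward and backward passes of a network with a single hidden layer are commonly written as follows. The notation is an assumption chosen for illustration and is not necessarily the authors': input x, hidden output h, network output y, expected output d, weight matrices W (input to hidden) and V (hidden to output), thresholds b and c, learning rate η, sigmoid activation f, and ⊙ for the element-wise product.

```latex
\begin{aligned}
f(z) &= \frac{1}{1+e^{-z}}, \qquad f'(z) = f(z)\bigl(1-f(z)\bigr) \\
h &= f(Wx + b), \qquad y = f(Vh + c) \\
E &= \tfrac{1}{2}\,\lVert d - y \rVert^{2} \\
\delta^{o} &= (y - d)\odot y \odot (1-y), \qquad
\delta^{h} = \bigl(V^{\top}\delta^{o}\bigr)\odot h \odot (1-h) \\
V &\leftarrow V - \eta\,\delta^{o} h^{\top}, \qquad c \leftarrow c - \eta\,\delta^{o} \\
W &\leftarrow W - \eta\,\delta^{h} x^{\top}, \qquad b \leftarrow b - \eta\,\delta^{h}
\end{aligned}
```

The two error signals follow from the chain rule applied to E, and each update moves a weight against the gradient of E with respect to that weight.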

First, we define the following variables: the input layer vector, the hidden layer output vector, the output layer output vector, the expected output vector, the weight matrix from the input layer to the hidden layer, and the weight matrix from the hidden layer to the output layer. The specific implementation steps of the BP neural network are as follows:

Step 1. The two weight matrices of the network are initialized according to the range of the activation function. We set the maximum number of training iterations and the learning accuracy, and choose the activation function:

[Equation (1): image missing in source]

Step 2. Data preprocessing. We select the input sample data and compute the output of the hidden layer and the output of the output layer:

[Equation (2): image missing in source]

[Equation (3): image missing in source]

Step 3. Calculate the error from the actual output and the expected output of the network:

[Equation (4): image missing in source]

Step 4. Calculate the partial derivatives of the error function with respect to every neuron of the hidden layer and of the output layer:

[Equation (5): image missing in source]

[Equation (6): image missing in source]

Step 5. Use the error signal to adjust the connection weights of each layer, namely the weights from the hidden layer to the output layer and the weights from the input layer to the hidden layer:

[Equation (7): image missing in source]

[Equation (8): image missing in source]

Step 6. Calculate the global error:

[Equation (9): image missing in source]

Step 7. The global error is compared with the given precision value. If the global error is less than the precision value, or the number of training iterations exceeds the maximum, the algorithm ends; otherwise, learning continues.
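Steps 1 to 7 can be sketched as a short MATLAB script. This is a minimal sketch under assumptions not stated in the paper: a single hidden layer with three units, a sigmoid activation, batch gradient descent with a fixed learning rate, and a toy logical-AND data set. Because it updates on the whole batch, the per-sample error of Step 3 and the global error of Step 6 coincide. All variable names are hypothetical.

```matlab
% Toy data (assumption): learn logical AND; each column is one sample
X = [0 0 1 1; 0 1 0 1];          % inputs, 2 x 4
D = [0 0 0 1];                   % expected outputs, 1 x 4
N = size(X, 2);

f = @(z) 1 ./ (1 + exp(-z));     % sigmoid activation (assumed choice)

% Step 1: initialise weights, thresholds and stopping parameters.
% Deterministic initial values keep the run reproducible.
nHidden = 3;
W = reshape(sin(1:nHidden*2), nHidden, 2);   % input  -> hidden weights
b = zeros(nHidden, 1);                       % hidden thresholds
V = reshape(cos(1:nHidden), 1, nHidden);     % hidden -> output weights
c = 0;                                       % output threshold
eta = 0.5; maxEpochs = 20000; tol = 1e-2;

Ehist = zeros(1, maxEpochs);
for epoch = 1:maxEpochs
    % Step 2: forward propagation
    H = f(W * X + b * ones(1, N));
    Y = f(V * H + c);
    % Steps 3 and 6: squared error summed over all samples
    E = 0.5 * sum((D - Y).^2);
    Ehist(epoch) = E;
    % Step 7: stop once the error is below the required accuracy
    if E < tol, break; end
    % Step 4: error signals of the output and hidden layers
    dO = (Y - D) .* Y .* (1 - Y);
    dH = (V' * dO) .* H .* (1 - H);
    % Step 5: gradient-descent update of weights and thresholds
    V = V - eta * dO * H';   c = c - eta * sum(dO, 2);
    W = W - eta * dH * X';   b = b - eta * sum(dH, 2);
end
Ehist = Ehist(1:epoch);
fprintf('stopped after %d epochs, error = %.4f\n', epoch, Ehist(end));
```

The loop ends either when the error drops below tol or when maxEpochs is reached, which mirrors the two stopping conditions of Step 7.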

2.3 Simulation experiments to predict stocks

Data is the foundation of data mining. Many financial websites, such as Yahoo, Sina, and Tencent, provide rich and reliable transaction data. MATLAB has an interface to Yahoo, so we use MATLAB to obtain the transaction data from Yahoo. The key MATLAB function, fetch, is called as follows:

Data = fetch(Connect, 'Security', 'FromDate', 'ToDate')

Here, Connect is the connection to the data source, such as Yahoo; 'Security' specifies which stock to retrieve; and 'FromDate' and 'ToDate' are the start and the end of the requested date range. In this paper, we use this method to obtain data for the stocks numbered 1 to 1000 on the Shenzhen Stock Exchange and save them in Excel. After the data are standardized, they are divided into training samples and prediction samples. We then train the neural network model described in Section 2.2 on the training samples and use it to make predictions.
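The standardization and sample split can be sketched as follows. The paper does not say which standardization is used; min-max scaling of each indicator to [0, 1] is assumed here. The raw numbers, the 80/20 split, and the variable names are likewise hypothetical.

```matlab
% Hypothetical raw data: rows are stocks, columns are indicators
raw = [10 200; 20 400; 30 300; 40 500; 50 100];
n = size(raw, 1);

% Min-max scaling of each column to [0, 1] (assumed method).
% A constant column would give a zero range and must be handled separately.
colMin = min(raw, [], 1);
colMax = max(raw, [], 1);
scaled = (raw - ones(n, 1) * colMin) ./ (ones(n, 1) * (colMax - colMin));

% Split into training and prediction samples (assumed 80/20 split by row order)
nTrain   = floor(0.8 * n);
trainSet = scaled(1:nTrain, :);
predSet  = scaled(nTrain+1:end, :);
disp(scaled);
```

Scaling all indicators to a common range keeps any single indicator from dominating the weighted sums inside the network.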

The model produces a ranked table of all stocks, as shown in Table 1. The ranking is based on the predicted value in the last column, which can be interpreted as the probability that the stock will rise in the future. In actual trading, we can therefore buy the top-ranked stocks and sell or avoid the bottom-ranked ones, which provides the basis for buy and sell decisions in quantitative stock selection.

Table 1. Model prediction results (first 10 rows)

65	1	1	1	1	0.217464	0.689387	0.615622	0.933314	1.076462
802	0.649562	0.714952	0.590378	0.669138	0.533305	0.493489	0.119175	0.450005	0.995385
985	0.489474	0.388007	0.219643	0.032438	0.289402	0.922103	0.458649	0.370715	0.985637
582	0.350914	0.507703	0.590378	0.669138	0.58377	0.410922	0.118595	0.226798	0.940392
66	0.846695	0.593295	0.590378	0.669138	0.551252	0.670699	0.293865	0.605941	0.885136
751	1	1	0.87818	0.881371	0.332703	0.595813	0.626997	0.948292	0.88133
707	0	0.650724	0.302097	0.244671	0.699561	0.544556	0.236814	0.403214	0.830667
819	1	0.888569	0.87818	0.881371	0.613117	0.822664	0.666953	0.978776	0.826818
522	0.343439	0.942634	0.417467	0.456905	0.029334	0.000374	0.035146	0.607885	0.778539
521	0.710836	1	0.302097	0.244671	0.396943	0.315258	0.393372	0.913728	0.75364
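Selecting the top-ranked stocks from such a table amounts to a descending sort on the predicted value. The sketch below uses the first and last columns of Table 1 and assumes that the first column is the stock number, which the table itself does not state; the variable names are hypothetical.

```matlab
% First column (assumed stock number) and last column (predicted value) of Table 1
stockNo = [65 802 985 582 66 751 707 819 522 521];
score   = [1.076462 0.995385 0.985637 0.940392 0.885136 ...
           0.88133 0.830667 0.826818 0.778539 0.75364];

% Rank by predicted value, highest first, and keep the top 8
[sortedScore, order] = sort(score, 'descend');
top8 = stockNo(order(1:8));
disp(top8);
```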

In this experiment, we also evaluate the model on historical data, using validation on the full data set. Figure 2 shows the classification accuracy and the error rate of the model; the accuracy is clearly higher than the error rate. In finance, an accuracy of 72% is not easy to achieve. Therefore, provided the number of transactions is large enough, the probability of making a profit is considerable.

Figure 2. Evaluation results of the model
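Accuracy and error rate of a classifier are simple shares of correct and incorrect predictions. The following sketch shows the computation on made-up labels; the values and variable names are hypothetical and unrelated to the 72% reported above.

```matlab
% Hypothetical labels: 1 = price rose, 0 = price fell (made-up values)
actual    = [1 0 1 1 0 1 0 0 1 1];
predicted = [1 0 0 1 0 1 1 0 1 1];

accuracy  = mean(predicted == actual);   % share of correct predictions
errorRate = 1 - accuracy;                % share of wrong predictions
fprintf('accuracy = %.2f, error rate = %.2f\n', accuracy, errorRate);
```

Because full-set validation scores the model on data it was trained on, a hold-out split would usually give a more conservative estimate.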

3. Portfolio model

In this section, we build a portfolio model to determine the best investment weight for each stock. For example, to invest in 8 stocks, we simply select the top 8 from the stock ranking in Table 1 of the previous section.

Assume that the investor chooses several kinds of securities to invest in, and that each security accounts for a given proportion of the total investment; these proportions are collected in a weight vector. The yields of the individual securities are likewise collected in a yield vector, and their expected rates of return in an expected-return vector. Then the yield of the securities investment portfolio is the weighted average of the yields of the individual securities:

[Equation (10): image missing in source]

The expected rate of return of the portfolio is the weighted average of the expected rates of return of the individual securities, namely:

[Equation (11): image missing in source]
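In one common notation, assumed here for illustration and not necessarily the authors', there are n securities with weights x_i that sum to one, yields r_i, and expected yields μ_i = E[r_i], collected in the vectors X, R, and μ. The two weighted averages then read:

```latex
r_p = \sum_{i=1}^{n} x_i r_i = X^{\top} R, \qquad
\mu_p = \mathrm{E}[r_p] = \sum_{i=1}^{n} x_i \mu_i = X^{\top} \mu, \qquad
\sum_{i=1}^{n} x_i = 1
```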

We use the covariance to indicate the degree of correlation between the investment yields of the i-th security and the j-th security. In particular, the covariance of a security's yield with itself is its variance. Collecting these covariances gives the covariance matrix of the yield vector, that is,

[Equation (12): image missing in source]

Then, the risk of the portfolio is

[Equation (13): image missing in source]
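The expected yield and the risk of a portfolio can be computed directly from the weights, the expected yields, and the covariance matrix. The sketch below takes risk to be the portfolio variance, a quadratic form in the weights, which is the usual mean-variance reading. The three securities and all numbers are made up, and the variable names are hypothetical.

```matlab
% Made-up example with 3 securities (all numbers are illustrative)
x     = [0.5; 0.3; 0.2];              % investment weights, sum to 1
mu    = [0.10; 0.08; 0.12];           % expected yields
Sigma = [0.040 0.006 0.010;           % covariance matrix: symmetric,
         0.006 0.025 0.004;           % variances on the diagonal
         0.010 0.004 0.090];

muP  = x' * mu;          % expected portfolio yield: weighted average
varP = x' * Sigma * x;   % portfolio variance (risk)
fprintf('expected yield = %.4f, variance = %.6f\n', muP, varP);
```

The off-diagonal covariances are what make the portfolio variance differ from a simple weighted average of the individual variances.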

In order to minimize the investment risk, we establish the following model:

[Equation (14): image missing in source]
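A risk-minimization model of this kind is usually the classical mean-variance problem. The form below is an assumption: the notation (weights X, covariance matrix Σ, expected yields μ, target yield μ_p, vector of ones 1) is chosen for illustration, and the yield constraint is inferred from the later remark that the solution is optimal for a given expected rate of return.

```latex
\min_{X}\ \sigma_p^{2} = X^{\top}\Sigma X
\quad \text{subject to} \quad
X^{\top}\mu = \mu_p, \qquad \mathbf{1}^{\top}X = 1
```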

Assuming the covariance matrix is positive definite, let

[Equation (15): image missing in source]

Then, the portfolio model can be transformed into

[Equation (16): image missing in source]

We construct the Lagrangian function with a vector of Lagrange multipliers and set its partial derivatives to zero, which gives

[Equation (17): image missing in source]

The solution is the optimal portfolio weight vector for a given expected rate of return. Under these weights, the risk of the portfolio is minimized, and equals

[Equation (18): image missing in source]
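For the equality-constrained mean-variance problem, the Lagrange conditions have a closed-form solution. The sketch below assumes the constraints are written as A*x = B, with A stacking the expected yields and a row of ones and B holding the target yield and 1, so that the optimal weights are inv(Sigma)*A'*inv(A*inv(Sigma)*A')*B and the minimum variance is B'*inv(A*inv(Sigma)*A')*B. The data are made up, the names are hypothetical, and the formulation allows negative weights (short selling).

```matlab
% Made-up inputs for 3 securities (illustrative only)
mu    = [0.10; 0.08; 0.12];
Sigma = [0.040 0.006 0.010;
         0.006 0.025 0.004;
         0.010 0.004 0.090];
muP   = 0.10;                       % target expected yield

A = [mu'; ones(1, numel(mu))];      % constraints: A*x = B
B = [muP; 1];

SinvAt = Sigma \ A';                % Sigma^{-1} * A'
M      = A * SinvAt;                % 2 x 2 matrix A * Sigma^{-1} * A'
x      = SinvAt * (M \ B);          % minimum-variance weights
varP   = B' * (M \ B);              % minimum variance at this target

fprintf('weights: %s\n', mat2str(x', 4));
fprintf('variance: %.6f\n', varP);
```

Solving the linear systems with the backslash operator avoids forming an explicit inverse, which is generally the more stable choice.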

4. Simulation results and analysis

The proposed portfolio model is verified by simulation in MATLAB. We invest in 8 stocks, selected as the top 8 from the stock ranking in Table 1, and denote them P₁, P₂, ..., P₈. The simulation results are shown in Figure 3 and Figure 4.

Figure 3. Efficient frontier curve

Here, we focus on Figure 3, which shows how return varies with risk. This provides a basis for deciding which portfolio to choose: each point on the curve corresponds to a set of investment weights. An investor who seeks high returns and does not fear high risk can choose the portfolios at the top of the curve. Most investors, of course, will choose a compromise, that is, a portfolio with relatively high returns but tolerable risk.
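A risk-return curve of this kind can be traced from three scalars when only the yield and budget constraints apply. The sketch below is an assumption about how such a curve may be produced, not the authors' simulation code: it uses made-up data for 3 securities, hypothetical names, and the standard result that the minimum variance at target yield m is (a*m^2 - 2*b*m + c)/delta.

```matlab
% Made-up inputs for 3 securities (illustrative only)
mu    = [0.10; 0.08; 0.12];
Sigma = [0.040 0.006 0.010;
         0.006 0.025 0.004;
         0.010 0.004 0.090];
e = ones(numel(mu), 1);

a = e'  * (Sigma \ e);
b = e'  * (Sigma \ mu);
c = mu' * (Sigma \ mu);
delta = a*c - b^2;

% Efficient branch: from the global minimum-variance yield b/a upward
muGrid  = linspace(b/a, max(mu), 50);
varGrid = (a*muGrid.^2 - 2*b*muGrid + c) / delta;
sigGrid = sqrt(varGrid);              % risk as standard deviation
fprintf('minimum risk %.4f at yield %.4f\n', sigGrid(1), muGrid(1));
```

Calling plot(sigGrid, muGrid) afterwards draws the curve with risk on the horizontal axis and expected yield on the vertical axis.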

Figure 4. Distribution of investment weights

Figure 4 shows the allocation of investment weights for different risk appetites. Each point on the abscissa corresponds to one portfolio. The specific weights can of course be calculated directly from the model, but the graph makes the differences between portfolios under different risk preferences easier to see, namely that the proportion invested in each stock differs. Once a risk preference is chosen, the specific investment allocation plan can be read off directly.
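How the allocation shifts with the investor's preference can be illustrated by computing the minimum-variance weights for several target yields. This is a sketch on made-up data for 3 securities with hypothetical names; it assumes only the yield and budget constraints, so weights may be negative, and the constraints behind Figure 4 may differ.

```matlab
% Made-up inputs for 3 securities (illustrative only)
mu    = [0.10; 0.08; 0.12];
Sigma = [0.040 0.006 0.010;
         0.006 0.025 0.004;
         0.010 0.004 0.090];

A      = [mu'; ones(1, numel(mu))];
SinvAt = Sigma \ A';
M      = A * SinvAt;

targets = [0.09 0.10 0.11];           % low, medium and high target yield
W = zeros(numel(mu), numel(targets));
for k = 1:numel(targets)
    W(:, k) = SinvAt * (M \ [targets(k); 1]);
end
disp(W);   % each column is one portfolio, each row one security
```

In this setting the weights change linearly with the target yield, so the middle column is the average of the two outer ones.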

5. Conclusion

In the field of quantitative investment, quantitative stock selection strategies based on data mining technology have attracted investors' attention. For investors, the key is to design good indicators and improve the accuracy of the model, thereby improving its profitability and making full use of the data and the model. Based on the observation and analysis of the Beidou navigation sector, the stocks with the greatest investment value in the sector were selected. Selecting better stocks, using quantitative timing strategies to curb risk, and then choosing a suitable investment portfolio together help achieve the goal of high returns and low risk in the stock market.

Acknowledgement

This work has been partially supported by the Key projects of natural science research of the higher education institutions of Anhui (grant no. KJ2016A530).

References

[1] Wenjing Ouyang, Samuel H. Szewczyk. Stock price informativeness on the sensitivity of strategic M&A investment to Q[J]. Review of Quantitative Finance & Accounting, 2018, 50(3):745-774.

[2] Chava, S., Wang, R., & Zou, H. Covenants, Creditors’ Simultaneous Equity Holdings, and Firm Investment Policies. Journal of Financial and Quantitative Analysis, 2019,54(2), 481-512.

[3] Han-ding, ZHANG, Yin-xian. Investment risk evaluation of existing building energy-saving renovation project for ESCO[J]. Ecological Economy, 2018(3):180-189.

[4] Huiqi Gan. Does CEO managerial ability matter? Evidence from corporate investment efficiency[J]. Review of Quantitative Finance & Accounting, 2019, 52(4):1085-1118.

[5] Ferrando, Annalisa, Preuss, Carsten. What finance for what investment? Survey-based evidence for European companies[J]. Eib Working Papers, 2018(5):1-39.

[6] Muhittin A. Serdar, Mustafa Serteser, Yasemin Ucal, etc. An Assessment of HbA1c in Diabetes Mellitus and Pre-diabetes Diagnosis: a Multi-centered Data Mining Study[J]. Applied Biochemistry and Biotechnology, 2019(Suppl1):1-13.

[7] Sorensen E H, Miller K L, Ooi C K. The decision tree approach to stock selection-An evolving tree model performs the best[J]. Journal of Portfolio Management, 2000, 27(1):42-52.

[8] Piotroski, Joseph D. Value Investing: The Use of Historical Financial Statement Information to Separate Winners from Losers[J]. Journal of Accounting Research, 2001, 38(2):43-51.

[9] Fama E F, French K R. A Five-factor Asset Pricing Model[J]. Journal of Financial Economics, 2015, 116(1):1-22.

[10] Jigar Patel, Sahil Shah, Priyank Thakkar, K Kotecha. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning technique[J]. Expert Systems with Applications, 2015, 42(1):259-268.

[11] Pernilla Svefors, Oleg Sysoev, Eva-Charlotte Ekstrom,etc. Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort, Bangladesh[J]. BMJ Open, 2019, 9(8):e025154.

[12] Alireza Arabameri, Biswajeet Pradhan, Khalil Rezaei. Spatial prediction of gully erosion using ALOS PALSAR data and ensemble bivariate and data mining models[J]. Geosciences Journal, 2019, 1:1-18.

[13] Yali Dong, Huimin Wang. Robust Output Feedback Stabilization for Uncertain Discrete-Time Stochastic Neural Networks with Time-Varying Delay[J]. Neural Processing Letters, 2019:1-21.

[14] Meng-Xiao Li, Su-Qin Yu, Wei Zhang. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images[J]. International Journal of Ophthalmology, 2019, 12(6):1012-1020.

[15] Marwin H. S. Segler, Mike Preuss, Mark P. Waller. Planning chemical syntheses with deep neural networks and symbolic AI[J]. Nature, 2018, 555(7698):604-610.
