1.1 Problem Background
Market traders frequently buy and sell volatile assets, with the goal of maximizing their total return. Each purchase or sale usually carries a commission. Two such assets are gold and bitcoin.
Figure 1: Daily gold prices, in U.S. dollars per troy ounce. Source: London Bullion Market Association, September 11, 2021
Figure 2: Daily bitcoin prices, in U.S. dollars per bitcoin. Source: NASDAQ, September 11, 2021
Requirements
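A trader has asked you to develop a model that uses only the daily price stream to date to determine, each day, whether the trader should buy, hold, or sell the assets in their portfolio.
You start with $1,000 on September 11, 2016, and use the five-year trading period from September 11, 2016 to September 10, 2021. On each trading day, the trader holds a portfolio of cash, gold, and bitcoin, [C, G, B], measured in U.S. dollars, troy ounces, and bitcoins respectively. The initial state is [1000, 0, 0]. The commission on each transaction (purchase or sale) is α% of the amount traded. Assume αgold = 1% and αbitcoin = 2%. Holding an asset costs nothing.
Note that bitcoin can be traded every day, but gold is traded only on days when the market is open, as reflected in the pricing data files LBMA-GOLD.csv and BCHAIN-MKPRU.csv. Your model should account for this trading schedule.
To develop the model, you may use only the data in the two spreadsheets provided: LBMA-GOLD.csv and BCHAIN-MKPRU.csv. (Both can be downloaded from the official site.)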
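• Develop a model that gives the best daily trading strategy based only on price data up to that day. Using your model and strategy, how much is the initial $1,000 investment worth on September 10, 2021?
• Present evidence that your model provides the best strategy.
• Determine how sensitive the strategy is to transaction costs. How do transaction costs affect the strategy and the results?
• Communicate your strategy, model, and results to the trader in a memorandum of at most two pages.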
1.2 Tools Used
Language: Python 3.8
Environment: SPSSPRO Notebook
Download link: SPSSPRO Notebook (free to use online; recommended)
First, we clean and organize the data. Bitcoin trades every day, whereas gold trades on some days and is closed on others, so we align the two series on a common timeline and use missing values to mark the days on which the gold market is closed.
The three possible actions (buy, sell, or hold) can be decided from a forecast of where prices are heading. We therefore build time series forecasts for gold and bitcoin, using the current day's data or earlier data to predict tomorrow's price, so that the decisions are better informed.
Then, starting with the first day, we build an objective program based on the predicted price for the next day. Its purpose is to decide, given that predicted price, how to invest so as to maximize the value of the portfolio. The actual profit of a day's strategy is computed by applying the strategy chosen from the predicted prices and then valuing it at the true next-day prices. We repeat this process until the funds are exhausted or the five-year trading period ends.
Next, we run a sensitivity analysis on the hyperparameters that appear in the model. For example, we set the initial holdings to 500 each for gold and bitcoin; if this ratio is changed, does the final investment value stay stable?
4.1 Detailed Solution Steps
step1: Merge the data
First, we merge the bitcoin and gold trading data. A quick look shows that the Date field is the natural join key, so we combine the two tables with a merge on Date. The merged data has 1,826 rows in total.
Checking the merged data for missing values shows that the gold column has 571 missing values. This is because bitcoin trades every day, while gold trades only on days when the market is open.
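A minimal pandas sketch of this merge, using small in-memory stand-ins for the two CSV files. The column names `gold` and `bitcoin` and the sample prices are invented for illustration; the real files would be loaded with `pd.read_csv`, and their column names may differ.

```python
import pandas as pd

# Toy stand-ins for BCHAIN-MKPRU.csv and LBMA-GOLD.csv (values are invented).
bitcoin = pd.DataFrame({
    "Date": pd.date_range("2016-09-11", periods=5, freq="D"),
    "bitcoin": [621.65, 609.67, 610.92, 608.82, 610.38],
})
gold = pd.DataFrame({
    # No gold quote on Sunday 2016-09-11 in this toy sample.
    "Date": pd.to_datetime(["2016-09-12", "2016-09-13", "2016-09-14", "2016-09-15"]),
    "gold": [1324.60, 1323.65, 1321.75, 1310.80],
})

# Left-join on Date so every bitcoin day is kept; closed gold days become NaN.
merged = pd.merge(bitcoin, gold, on="Date", how="left").sort_values("Date")

print(len(merged))          # number of rows on the common timeline
print(merged.isna().sum())  # missing values per column
```

A left join from the bitcoin table is used here because bitcoin has a price for every calendar day, so its dates define the full timeline.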
step2: Train the model and forecast the time series
The problem asks for a model that gives the best daily trading strategy based only on the price data available on that day. We therefore need to train a time series model that can predict the next day's price from the current day's data.
For time series problems, there are currently two common approaches:
1. Econometric and statistical models commonly used in academia, such as ARIMA, grey prediction, and exponential smoothing. These require very rigorous model diagnostics.
2. Models used in industry, which mostly rely on machine learning, such as LSTM and XGBoost. Here too there are two usual practices: one is to work from the single series, converting it into a multi-column regression problem; the other is to build engineered features and treat the task directly as a regression problem.
Here we take the industry approach, that is, machine-learning time series forecasting. Before that, we need to understand one data processing method: sliding-window conversion of time series data.
Sliding-window conversion turns time series data into regression data. Put simply, it converts a single series into X -> Y regression data. A step size of 2 means two X columns (there are as many X columns as the step size) and one Y column (this never changes).
In other words, the data for days 1 and 2 are used to predict day 3, the data for days 2 and 3 are used to predict day 4, and so on.
You can do this with the time series sliding-window conversion tool in SPSSPRO's data processing module (SPSSPRO - Data Processing - Time Series Sliding-Window Conversion).
I also wrote my own implementation, though it is less efficient. It takes two arguments, dataset and look_back, where dataset is the data set and look_back is the step size. The figure above shows the sliding-window result for bitcoin with a step size of 1.
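The author's implementation is not reproduced in the text, so the following is only a minimal sketch of a sliding-window conversion with the same two arguments. The function name `create_dataset` and the toy price list are assumptions made for illustration.

```python
def create_dataset(dataset, look_back=1):
    """Turn a 1-D price series into (X, y) rows for regression.

    Each X row holds `look_back` consecutive values; y is the value that follows.
    """
    X, y = [], []
    for i in range(len(dataset) - look_back):
        X.append(list(dataset[i:i + look_back]))
        y.append(dataset[i + look_back])
    return X, y

prices = [10, 11, 12, 13, 14]          # toy series, not real prices
X, y = create_dataset(prices, look_back=2)
print(X)  # [[10, 11], [11, 12], [12, 13]]
print(y)  # [12, 13, 14]
```

With `look_back=2`, the first row uses days 1 and 2 to predict day 3, matching the description of a step size of 2.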
You can also use the random forest regression in SPSSPRO, which is simpler to use and produces nicely formatted results and charts. I recommend running several algorithms and comparing them; XGBoost, LightGBM, and random forest regression are three good candidates.
As an example, I train a random forest in code on the bitcoin time series. In the results, R² is 0.994, so the fit is quite good.
The forecasting model for gold is obtained in the same way. Note that the missing values must be removed from the gold data before training, but do not remove them from the original data set; work on a copy instead.
Next, we rebuild and retrain the model repeatedly: the gold and bitcoin data for day 1 are used to predict day 2; the data for days 1 and 2 are used to predict day 3; the data for days 1, 2, and 3 are used to predict day 4; and so on.
This gives a forecast for every day, which we merge with the actual data and collate into a single table of predicted and actual prices.
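The retraining scheme above can be sketched as a walk-forward loop. This is a minimal illustration, not the author's code: it assumes scikit-learn's `RandomForestRegressor`, a synthetic random-walk series in place of real prices, and a hypothetical `min_train` parameter, since a model cannot be fitted on a single day of history.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def walk_forward_forecast(prices, look_back=1, min_train=10):
    """Predict each day with a model trained only on earlier days."""
    prices = np.asarray(prices, dtype=float)
    preds = {}
    for t in range(min_train, len(prices)):
        history = prices[:t]  # data known before day t
        # Sliding-window rows built from the history only.
        X = np.array([history[i:i + look_back]
                      for i in range(len(history) - look_back)])
        y = history[look_back:]
        model = RandomForestRegressor(n_estimators=20, random_state=0)
        model.fit(X, y)
        # Forecast day t from the last `look_back` known prices.
        preds[t] = float(model.predict(history[-look_back:].reshape(1, -1))[0])
    return preds

rng = np.random.default_rng(0)
toy_prices = 100 + np.cumsum(rng.normal(0, 1, size=30))  # synthetic random walk
forecasts = walk_forward_forecast(toy_prices, look_back=2, min_train=10)
print(len(forecasts))  # one forecast per day from index 10 onward
```

Because each model sees only `prices[:t]`, no forecast uses information from the day it predicts or later. For gold, the rows with missing prices would be dropped from a copy of the series before it is passed in.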
step3: Build the optimization model and search for the optimum with a heuristic algorithm
After forecasting, we need a buy/sell/hold trading strategy. Gold trades only on days when the market is open, so on weekends and holidays the gold position must be "hold". One option is to keep only the days on which both gold and bitcoin trade and analyze those. Assuming gold and bitcoin are bought and sold together, the key quantity is a return series. For example, we may buy on any day and compute the return as (predicted gold price on a given day / actual purchase price of gold) - 1; once the gain reaches a chosen threshold, we recommend selling.
Note: the initial state is [1000, 0, 0], and each transaction (purchase or sale) costs α% of the amount traded, with α = 1% for gold and α = 2% for bitcoin. As a rough estimate, then, putting $1,000 through one buy and one sell leaves only about $940 to work with once both the 1% and 2% fees are counted in each direction.
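To make the commission arithmetic concrete, here is a minimal sketch of one round trip per asset. It assumes the fee is deducted from the amount traded and that the price does not move between the buy and the sell; the helper names `buy` and `sell` are hypothetical.

```python
ALPHA = {"gold": 0.01, "bitcoin": 0.02}  # commission rates from the problem statement

def buy(cash, asset):
    """Dollar value of the asset obtained when `cash` is spent, fee included."""
    return cash * (1 - ALPHA[asset])

def sell(value, asset):
    """Cash received when a position worth `value` dollars is sold."""
    return value * (1 - ALPHA[asset])

for asset in ALPHA:
    # Buy and sell at an unchanged price, so only the fees matter.
    left = sell(buy(1000.0, asset), asset)
    print(asset, round(left, 2))
# gold 980.1
# bitcoin 960.4
```

Under these assumptions a full round trip costs about 2% in gold and about 4% in bitcoin, so the exact amount left depends on how the $1,000 is split; the $940 quoted in the note appears to count both assets' fees on the full amount.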
Set up a simple objective program:
Let t be the period from purchase to sale.
Because buying and selling go on continuously, we need a loop to run the process.
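The loop can be sketched for a single asset with the threshold rule described earlier (sell once the predicted price divided by the purchase price, minus 1, reaches a threshold). The function name, the entry rule that requires the forecast gain to cover two fees, and the toy prices are all assumptions for illustration, not the author's implementation.

```python
def run_threshold_loop(actual, predicted, alpha=0.02, threshold=0.05, cash=1000.0):
    """Walk through the days, trading one asset with a threshold rule.

    actual[t] is the real price on day t; predicted[t] is the forecast for
    day t + 1 made on day t. Returns the final portfolio value in dollars.
    """
    units = 0.0       # amount of the asset currently held
    buy_price = None  # price paid for the open position
    for t in range(len(actual) - 1):
        if cash <= 0 and units <= 0:
            break  # funds exhausted, stop the loop
        price = actual[t]
        if units == 0:
            # Enter only if the forecast gain would cover a round trip of fees.
            if predicted[t] / price - 1 > 2 * alpha:
                units = cash * (1 - alpha) / price
                cash, buy_price = 0.0, price
        elif predicted[t] / buy_price - 1 >= threshold:
            # Forecast gain since purchase reached the threshold: sell today.
            cash = units * price * (1 - alpha)
            units, buy_price = 0.0, None
    return cash + units * actual[-1]

actual = [100.0, 106.0, 110.0, 108.0, 115.0]     # invented prices
predicted = [106.0, 110.0, 108.0, 115.0, 115.0]  # perfect forecasts, for the demo only
final_value = run_threshold_loop(actual, predicted)
print(round(final_value, 2))
```

The loop stops early when nothing is left to trade and otherwise runs to the end of the series, mirroring the stopping rule of "funds exhausted or trading period over".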
To get a better result that is closer to reality, you can add a financial risk analysis, for example VaR or CVaR, or make use of information entropy. Once the investment model is built, we can use an optimization algorithm to search for the best weights, for example particle swarm optimization, a genetic algorithm, or an immune algorithm.
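That is, we need to define the objective function (maximizing the daily return), set the relevant constraints, solve the program, and write out the resulting solution.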
We set the following variables:
[Figure: Variable settings]
Our goal is to obtain the best investment strategy for each day from the forecasting model and the given constraints, and to repeat this process through September 10, 2021, to find out how much the initial $1,000 is worth by then.
A simple programming model can be set up as follows. If both gold and bitcoin are trading, the objective function is:
daily return = (next-day gold price / today's gold price) * (previous day's gold holdings + today's gold trades) + (next-day bitcoin price / today's bitcoin price) * (previous day's bitcoin holdings + today's bitcoin trades)
The constraints are:
Constraint 1: the amounts of gold and bitcoin traded that day must not exceed the total holdings.
Constraint 2: the amounts of gold and bitcoin traded that day must not exceed the respective holdings of the previous day.
Constraint 3: trade only when the gain exceeds the transaction fees.
There are other constraints as well, which you can add yourself.
If only bitcoin is trading, the objective function is:
daily return = previous day's gold holdings + (next-day bitcoin price / today's bitcoin price) * (previous day's bitcoin holdings + today's bitcoin trades)
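The two objective functions can be written as one small function. This is a sketch under an assumed sign convention (a positive trade is a buy, a negative trade is a sell); the function and parameter names are hypothetical, and `is_feasible` covers only one reading of constraint 2.

```python
def daily_objective(gold_ratio, btc_ratio, gold_held, btc_held,
                    gold_trade, btc_trade, gold_open=True):
    """Evaluate the daily-return objective from the text.

    *_ratio is (next-day price / today's price); *_held is the previous day's
    holding; *_trade is today's trade (positive = buy, negative = sell).
    """
    btc_term = btc_ratio * (btc_held + btc_trade)
    if gold_open:
        return gold_ratio * (gold_held + gold_trade) + btc_term
    # Gold market closed: the gold holding carries over unchanged.
    return gold_held + btc_term

def is_feasible(gold_held, btc_held, gold_trade, btc_trade):
    """Constraint 2 as read here: you cannot sell more than you held yesterday."""
    return gold_held + gold_trade >= 0 and btc_held + btc_trade >= 0

both_open = daily_objective(1.01, 1.05, 2.0, 1.0, -1.0, 0.5)
btc_only = daily_objective(1.01, 1.05, 2.0, 1.0, 0.0, 0.5, gold_open=False)
print(both_open, btc_only)
print(is_feasible(2.0, 1.0, -3.0, 0.5))  # selling 3 units of gold while holding 2
```

In practice the price ratios would come from the forecasting model, since the true next-day prices are not known when the decision is made.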
step4: Solve for the single-day optimum with a genetic algorithm
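The author's solver settings are not given in the text, so the following is only a minimal genetic-algorithm sketch for one day. It assumes a simplified formulation in which the decision variables are the fractions of total wealth to hold in gold and bitcoin, and the fitness is the predicted next-day wealth after commissions; all names, population sizes, and example ratios are illustrative.

```python
import random

ALPHA_G, ALPHA_B = 0.01, 0.02  # commission rates from the problem statement

def fitness(w, wealth, cur_g, cur_b, r_g, r_b):
    """Predicted next-day wealth if fractions w = (gold, bitcoin) are held."""
    tgt_g, tgt_b = w[0] * wealth, w[1] * wealth
    fee = ALPHA_G * abs(tgt_g - cur_g) + ALPHA_B * abs(tgt_b - cur_b)
    cash = wealth - tgt_g - tgt_b - fee
    if cash < 0:
        return float("-inf")  # cannot pay for this allocation
    return cash + tgt_g * r_g + tgt_b * r_b

def random_weights(rng):
    g = rng.random()
    return (g, rng.random() * (1 - g))  # keeps gold + bitcoin <= 1

def mutate(w, rng, sigma=0.1):
    g = min(max(w[0] + rng.gauss(0, sigma), 0.0), 1.0)
    b = min(max(w[1] + rng.gauss(0, sigma), 0.0), 1.0 - g)
    return (g, b)

def solve_day(wealth, cur_g, cur_b, r_g, r_b, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    score = lambda w: fitness(w, wealth, cur_g, cur_b, r_g, r_b)
    pop = [random_weights(rng) for _ in range(pop_size)]
    pop.append((cur_g / wealth, cur_b / wealth))  # "hold" is always a candidate
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 4]              # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            mix = rng.random()                    # arithmetic crossover
            child = tuple(mix * a + (1 - mix) * b for a, b in zip(p1, p2))
            children.append(mutate(child, rng))
        pop = elite + children
    return max(pop, key=score)

# cur_g and cur_b are the current dollar values held; r_g and r_b are the
# predicted next-day price ratios (invented numbers here).
best = solve_day(wealth=1000.0, cur_g=0.0, cur_b=0.0, r_g=1.001, r_b=1.05)
print(best)  # fractions of wealth to hold in (gold, bitcoin)
```

Because the "hold" allocation is seeded into the population and the elite is carried over, the result is never worse than doing nothing under the predicted prices.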
[Figure: Setting the initial parameters]
[Figure: Solution of the first objective function]
[Figure: Solution of the second objective function]
step5: Iterate, repeating the optimization every day
SPSSPRO-Notebook
step6: Simulated result of the optimal strategy over 1,826 days
The final value of the holdings is $7047.974988.
This figure is on the low side because of the investment constraints used. Here I list only the constraints that are easy to model; you can add your own and get better results.
4.2 Sensitivity analysis
The third question is essentially a sensitivity analysis. In the first two questions we set the initial holdings to an even split of 500 each. The sensitivity analysis examines manually set parameters of this kind: for example, if the initial gold holding were 100 instead, would the final result change? In the figure, the x-axis is the initial gold holding and the y-axis is the total assets at the end of the five-year trading period. The total fluctuates around 6750, which indicates that the model is quite robust: the final total is not very sensitive to this parameter.
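The sweep over the initial gold holding described in the sensitivity analysis can be sketched as follows. The `simulate` function is only a stand-in backtest (split the money on day 0, hold, liquidate on the last day) on invented prices; in a real run it would be replaced by the full daily forecasting and optimization loop.

```python
def simulate(initial_gold_usd, gold_prices, btc_prices,
             total=1000.0, alpha_gold=0.01, alpha_btc=0.02):
    """Stand-in backtest: split `total` on day 0, hold, liquidate on the last day."""
    btc_usd = total - initial_gold_usd
    gold_units = initial_gold_usd * (1 - alpha_gold) / gold_prices[0]
    btc_units = btc_usd * (1 - alpha_btc) / btc_prices[0]
    return (gold_units * gold_prices[-1] * (1 - alpha_gold)
            + btc_units * btc_prices[-1] * (1 - alpha_btc))

gold_prices = [1300.0, 1320.0, 1290.0, 1350.0]  # invented toy prices
btc_prices = [600.0, 650.0, 700.0, 900.0]

# Sweep the initial gold allocation from $0 to $1,000 in steps of $100.
results = {g: simulate(g, gold_prices, btc_prices) for g in range(0, 1001, 100)}
for g, final in results.items():
    print(f"initial gold ${g:>4}: final value {final:.2f}")

spread = max(results.values()) - min(results.values())
print(f"spread across settings: {spread:.2f}")
```

A small spread relative to the final values would suggest that the result is robust to this parameter. The same pattern applies to sweeping the commission rates `alpha_gold` and `alpha_btc` for the transaction-cost question.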