Optimization of common data mining algorithms for petroleum exploration and development

doi:10.7623/syxb201802013

Acta Petrolei Sinica, 2018, Vol. 39, Issue (2): 240-246. DOI: 10.7623/syxb201802013

• SPECIAL CONTRIBUTION •

Li Dawei, Shi Guangren

PetroChina Research Institute of Petroleum Exploration and Development, Beijing 100083, China

Received: 2017-01-12 Revised: 2018-01-17 Online: 2018-02-25 Published: 2018-03-09

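Corresponding author and author biography: Li Dawei, male, born in May 1969, received a bachelor's degree from China University of Geosciences (Wuhan) in 1991 and a doctoral degree from China University of Geosciences (Beijing) in 1996. He is currently a senior engineer at the PetroChina Research Institute of Petroleum Exploration and Development, working mainly on informatization, applications and data mining for overseas exploration and development. Email: leedw@petrochina.com.cn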
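Funding: Supported by the National Science and Technology Major Project "Global Oil and Gas Resource Evaluation and Selection of Favorable Areas and Plays" (2016ZX05029).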

Abstract

In the era of big data, the petroleum industry needs to fully exploit the large potential value of its data. Although data mining has produced notable results in many industries, its application to hydrocarbon exploration and development is still at an early stage, mainly because the data and their applications in this field have their own particular characteristics. Common data mining algorithms can be divided into regression, classification, clustering, estimation, prediction, association analysis and others. Among them, regression and classification are the most mature and the most widely used. However, different regression and classification algorithms suit different research objects, research questions and data sources, so the algorithm best suited to a given data set must be selected for each specific problem. Taking oil test data from the Tahe oilfield as an example, with formation factor and oil layer classification as the mining targets, the applicability of common regression and classification algorithms is analyzed in detail. The results show that, for common petroleum industry data and study objects: (1) the best regression algorithm is the back-propagation neural network (BPNN), followed by support vector machine regression (R-SVM) and multivariate regression analysis (MRA); (2) the best classification algorithm is support vector machine classification (C-SVM), followed by Bayesian stepwise discrimination (BAYSD); (3) MRA and BAYSD can also be used for data dimensionality reduction, with BAYSD performing better; (4) R-type cluster analysis (RCA) can be used for data dimensionality reduction, while Q-type cluster analysis (QCA) can be used for sample reduction; (5) in any specific data mining application, the algorithms used must be selected according to the specific data set.

Key words: big data, data mining, regression, classification, data cleaning, optimization, formation factor, oil layer classification
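
The abstract's closing point, that the algorithm must be chosen per data set, can be illustrated with a minimal sketch of held-out comparison. Everything here is an assumption for illustration and not the authors' procedure or data: the data are synthetic, the error metric (mean absolute relative error) is chosen for convenience, and the two candidates are a least-squares multivariate linear regression (a stand-in for MRA) and a trivial mean predictor. BPNN and R-SVM are not implemented.

```python
import random

def fit_linear(X, y):
    """Least-squares multivariate linear regression (stand-in for MRA)."""
    rows = [[1.0] + list(x) for x in X]           # prepend intercept term
    n = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(n):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    coef = [A[i][n] / A[i][i] for i in range(n)]
    return lambda x: coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def fit_mean(X, y):
    """Trivial baseline: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda x: m

def mean_relative_error(model, X, y):
    """Mean absolute relative error on a sample set (assumes y != 0)."""
    return sum(abs(model(x) - t) / abs(t) for x, t in zip(X, y)) / len(y)

def select_model(candidates, X_train, y_train, X_test, y_test):
    """Fit every candidate on the training set, score on held-out samples."""
    scores = {name: mean_relative_error(fit(X_train, y_train), X_test, y_test)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Synthetic, hypothetical data: a nearly linear response with small noise.
rng = random.Random(0)
X = [[rng.uniform(1, 10), rng.uniform(1, 10)] for _ in range(60)]
y = [3.0 + 2.0 * a + 0.5 * b + rng.gauss(0, 0.1) for a, b in X]

candidates = {"linear": fit_linear, "mean": fit_mean}
best, scores = select_model(candidates, X[:40], y[:40], X[40:], y[40:])
print(best, scores)
```

On a different data set, for example one with a strongly nonlinear response, the ranking of candidates could change, which is why the comparison is rerun per data set rather than fixed in advance.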

CLC Number: TE318

Li Dawei, Shi Guangren. Optimization of common data mining algorithms for petroleum exploration and development[J]. Acta Petrolei Sinica, 2018, 39(2): 240-246.
