现代电子技术
MODERN ELECTRONICS TECHNIQUE
2015
Issue 9
pp. 159-162
4 pages in total
微博%公交系统%数据抓取%中文分词%大数据
Weibo%public transport system%data capture%Chinese word segmentation%big data
为充分利用大数据时代的海量数据,提出一种基于新浪微博的公交系统数据采集及分析方法。通过Web Crawler从新浪微博抓取所需时空范围内的公交微博,分析公交微博发布的时间与空间分布规律;随后采用KMP算法统计并剔除冗余转发及回复微博,提取并分析公交相关的热点话题;基于中科院ICTCLAS算法进行中文分词处理,删除停用词后统计词频,生成关键词的词云。最后通过南京市范围内的8913条公交微博进行实例验证与分析,结果表明,该方法可以从海量的微博数据中提取公交相关数据并进行分析,分析数据量大且有时效性,分析结果可为公交系统管理的优化与改善、公交政策的制定提供数据支撑。
To make full use of the massive data available in the big data era, a method is proposed for collecting and analyzing public transport data from Sina Weibo. A web crawler captures public transport microblogs from Sina Weibo within the required time and space range, and the temporal and spatial distribution of their publication is analyzed. The KMP algorithm is then used to count and remove redundant forwards and replies, and hot topics related to public transport are extracted and analyzed. Chinese word segmentation is performed with the ICTCLAS algorithm from the Chinese Academy of Sciences; after stop words are removed, word frequencies are counted and a keyword cloud is generated. The method is verified on 8913 public transport microblogs from Nanjing. The results show that the method can extract and analyze public transport data from massive Weibo data, that the analyzed data volume is large and timely, and that the analysis results can provide data support for optimizing and improving public transport management and for formulating public transport policy.
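The filtering and counting steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which strings the KMP search was applied to, so the repost markers below are assumptions, as are all function names. Word segmentation (done with ICTCLAS in the paper) is assumed to have already happened, so each post arrives as a list of tokens.

```python
from collections import Counter


def build_failure_table(pattern):
    """KMP failure table: for each position, the length of the longest
    proper prefix of the pattern that is also a suffix ending there."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


# Assumption: a forward is recognised by one of these marker substrings.
# The source does not state which markers were actually matched.
REPOST_MARKERS = ("//@", "转发微博")


def split_posts(posts):
    """Return (original posts, number of posts dropped as forwards)."""
    originals = [
        p for p in posts
        if all(kmp_find(p, marker) == -1 for marker in REPOST_MARKERS)
    ]
    return originals, len(posts) - len(originals)


def keyword_frequencies(segmented_posts, stop_words):
    """Count tokens across already-segmented posts, skipping stop words."""
    counts = Counter()
    for tokens in segmented_posts:
        counts.update(t for t in tokens if t not in stop_words)
    return counts
```

The resulting counter could then feed any word cloud renderer; which one the authors used is not stated in the abstract.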