
The Latest High-Frequency Hive Interview Questions, Fresh Out of the Oven!


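Introduction

It is interview season again, so I have put together some frequently asked Hive interview questions to share with everyone. I will also sync these questions to GitHub, where there are many more resources: Flink interview questions, Spark interview questions, must-have software for programmers, Hive interview questions, Hadoop interview questions, Docker interview questions and résumé templates. Download them from https://github.com/lhh2002/Framework-Of-BigData and remember to star the repository if it helps. I hope this is useful to friends who plan to change jobs or are looking for one. Finally, best wishes for a promotion and a raise in the new year; make the most of the March-April hiring season and may your salary keep going up!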



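1. How would you implement a join of two Hive tables with MapReduce?

If one of the tables is small, use a map-side join directly: each mapper loads the small table into memory and performs the join there.

If both tables are large, use a composite key. The first part of the composite key is the common field from the JOIN ... ON clause; the second part is a flag, 0 for table A and 1 for table B, which lets the reducer tell the two sources apart (for example customer records from order records). The mapper processes rows from both tables, rows with the same join field are assigned to the same partition and therefore reach the same reducer, and the reducer then performs the join.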
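The reduce-side join described above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not Hive or Hadoop code: the two sample tables, the function names and the two-reducer setup are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample tables: (customer_id, name) and (customer_id, order_item).
customers = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

def map_phase():
    # Emit ((join_key, flag), value); flag 0 marks table A, flag 1 marks table B.
    for cid, name in customers:
        yield (cid, 0), name
    for cid, item in orders:
        yield (cid, 1), item

def shuffle(pairs, num_reducers=2):
    # Partition on the join key only, so rows of both tables with one key meet in one reducer.
    partitions = defaultdict(list)
    for (key, flag), value in pairs:
        partitions[hash(key) % num_reducers].append(((key, flag), value))
    # Sorting on the composite key puts table A rows ahead of table B rows for each key.
    return {p: sorted(rows) for p, rows in partitions.items()}

def reduce_phase(rows):
    # Inner join: pair every table A value with every table B value of the same key.
    left = defaultdict(list)
    joined = []
    for (key, flag), value in rows:
        if flag == 0:
            left[key].append(value)
        else:
            joined.extend((key, name, value) for name in left[key])
    return joined

result = sorted(row for rows in shuffle(map_phase()).values() for row in reduce_phase(rows))
print(result)
```

Customer 2 has no orders and order key 3 has no customer, so neither appears in the inner-join result.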


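2. Describe the characteristics of Hive. How do Hive and an RDBMS differ?

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables, provides full SQL-style querying, and translates SQL statements into MapReduce jobs for execution. Its advantage is a low learning curve: simple MapReduce statistics can be produced quickly with SQL-like statements, without developing a dedicated MapReduce application, which makes it well suited to statistical analysis in a data warehouse. Hive does not, however, support real-time queries.

Differences between Hive and relational databases: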



3. Explain what Sort By, Order By, Cluster By and Distribute By each mean in Hive.

Order by: performs a global sort of the input, so there is only one reducer (multiple reducers cannot guarantee a global order). With a single reducer, large inputs take a long time to compute.

Sort by: not a global sort. The data is sorted before it enters each reducer, so each reducer's output is ordered, but the overall result is not when more than one reducer runs.

Distribute by: partitions the data by the specified field and sends it to different reducers.

Cluster by: has the function of distribute by and also that of sort by.
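The four clauses can be contrasted with a small Python model. This is a sketch under stated assumptions: the sample rows, the two-reducer setup and the toy hash (sum of character codes) are invented for illustration and are not how Hive partitions data internally.

```python
# Hypothetical rows: (dept, salary).
rows = [("b", 30), ("a", 10), ("b", 20), ("a", 40), ("c", 5)]
NUM_REDUCERS = 2

def dept(row):
    return row[0]

def salary(row):
    return row[1]

def distribute_by(rows, key, n=NUM_REDUCERS):
    # Rows with the same key value always land in the same reducer (toy hash).
    reducers = [[] for _ in range(n)]
    for row in rows:
        reducers[sum(map(ord, str(key(row)))) % n].append(row)
    return reducers

def sort_by(reducers, key):
    # Each reducer sorts its own rows; nothing is implied about order across reducers.
    return [sorted(part, key=key) for part in reducers]

def cluster_by(rows, key):
    # Distribute and sort on the same column.
    return sort_by(distribute_by(rows, key), key)

def order_by(rows, key):
    # A single reducer sees every row, so the result is totally ordered.
    return sorted(rows, key=key)

print(sort_by(distribute_by(rows, dept), salary))
print(cluster_by(rows, dept))
print(order_by(rows, salary))
```

In the first output each inner list is sorted by salary, yet the lists taken together are not globally ordered, which is the difference between sort by and order by.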


4. Describe the usage of the split, coalesce and collect_list functions in Hive (examples welcome).

split converts a string into an array, for example: split('a,b,c,d', ',') ==> ["a","b","c","d"].

coalesce(T v1, T v2, …) returns the first non-NULL value among its arguments; if all values are NULL, it returns NULL.

collect_list lists all values of the field without removing duplicates => select collect_list(id) from table.
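As a rough analogy, the three functions behave like the Python helpers below. These are stand-ins written for this article, not Hive code; in particular the helper treats the delimiter as a literal string, whereas Hive's split takes a regular-expression pattern.

```python
def split(text, delimiter):
    # Literal-delimiter split; Hive's split interprets its second argument as a regex.
    return text.split(delimiter)

def coalesce(*values):
    # First non-None argument, or None when every argument is None.
    return next((v for v in values if v is not None), None)

def collect_list(values):
    # Gathers a column's values into one list and keeps duplicates.
    return list(values)

print(split("a,b,c,d", ","))
print(coalesce(None, None, 3, 4))
print(collect_list([1, 1, 2]))
```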


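5. In what ways can Hive store its metadata, and what are the characteristics of each?

Hive supports three kinds of metastore server: the embedded metastore, the local metastore and the remote metastore. Each mode uses different configuration parameters.

The embedded metastore is mainly used for unit testing. In this mode only one process at a time can connect to the metastore, and Derby is the default database of the embedded metastore.

In local mode, every Hive client opens its own connection to the data store and issues SQL queries over that connection.

In remote mode, all Hive clients open a connection to a metastore server, which in turn queries the metadata. The metastore server and the clients communicate over the Thrift protocol.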


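6. What is the difference between Hive internal (managed) tables and external tables?

When creating a table: for an internal table, the data is moved to the path that the data warehouse points to; for an external table, Hive only records the path where the data lives and does not change the data's location.

When dropping a table: for an internal table, the metadata and the data are deleted together; for an external table, only the metadata is deleted and the data is kept. External tables are therefore somewhat safer, allow more flexible data organisation, and make it easy to share source data.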


7. Hive functions: what is the difference between UDF, UDAF and UDTF?

UDF: one row in, one row out.

UDAF: many rows in, one row out.

UDTF: one row in, many rows out.
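The three shapes can be pictured with plain Python functions. The names below are hypothetical and only mirror the row cardinality of each kind of function; they loosely correspond to Hive's built-in upper (a UDF), sum (a UDAF) and explode (a UDTF), but real Hive user-defined functions are written against Hive's own Java interfaces.

```python
def udf_upper(value):
    # UDF shape: one row in, one value out.
    return value.upper()

def udaf_sum(values):
    # UDAF shape: many rows in, one value out.
    return sum(values)

def udtf_explode(values):
    # UDTF shape: one row in, many rows out.
    for v in values:
        yield v

print(udf_upper("hive"))
print(udaf_sum([1, 2, 3]))
print(list(udtf_explode(["a", "b"])))
```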


8. Do all Hive tasks run as MapReduce?

No. Starting with Hive 0.10.0, simple statements that need no aggregation, of the form SELECT ... FROM ... LIMIT n, do not start a MapReduce job; the data is retrieved directly by a Fetch task.


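9. Explain your understanding of Hive bucketed tables.

A bucketed table hashes the value of a chosen field and stores the rows in different files according to that hash.

When data is loaded into a bucketed table, Hive takes the hash of the field and then takes it modulo the number of buckets, and the row is placed in the corresponding file. Physically, each bucket is one file in the table (or partition) directory, and the number of buckets (output files) produced by a job equals the number of reduce tasks.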
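The hash-then-modulo placement can be sketched as follows. This is a model rather than Hive's implementation: it assumes an integer bucketing column whose hash is the value itself, and the sample rows and function names are invented for illustration.

```python
from collections import defaultdict

def bucket_for(key_hash, num_buckets):
    # Keep the hash non-negative, then take it modulo the bucket count.
    return (key_hash & 0x7FFFFFFF) % num_buckets

def load_into_buckets(rows, num_buckets):
    # rows are (id, name); the integer id is used as its own hash (assumption).
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[0], num_buckets)].append(row)
    return dict(buckets)

rows = [(1, "a"), (2, "b"), (5, "c"), (8, "d")]
buckets = load_into_buckets(rows, 4)
print(buckets)
```

Each dictionary entry stands for one bucket file in the table directory; ids 1 and 5 share a bucket because both leave remainder 1 when divided by 4.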


Bucketed tables are intended for sampling queries and are fairly specialised. They are not everyday storage tables; they are created and used only when sampling queries are needed.


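10. How does Hive interact with a database under the hood?

Hive's query capability is implemented by combining HDFS and MapReduce. Running queries over very large data sets in Hive is still not recommended, because an excessive data volume makes queries very slow. The relationship between Hive and MySQL: Hive merely borrows MySQL to store the metadata of its tables, which is called the metastore.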


11. Hive local mode

Most Hadoop jobs need the full scalability that Hadoop provides in order to process large data sets. Sometimes, however, the input to a Hive query is very small. In that case the overhead of launching the tasks for the query can be far greater than the actual execution time of the job. For most such cases, Hive can use local mode to process all the tasks on a single machine, which noticeably shortens execution time for small data sets.

Users can set hive.exec.mode.local.auto to true to let Hive enable this optimisation automatically when appropriate.


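12. What are the differences between the Hive storage formats TextFile, SequenceFile, RCFile and ORCFile?

1. TextFile

The default format. Storage is row-oriented and the data is not compressed, so disk overhead and parsing overhead are both high. It can be combined with Gzip or Bzip2 (the system detects the codec and decompresses automatically when a query runs), but a Gzip-compressed file cannot be split (Bzip2, by contrast, is a splittable codec), so Hive cannot divide such a file and cannot process it in parallel. In addition, deserialisation has to check character by character for field delimiters and line terminators, so its cost can be dozens of times higher than for SequenceFile.

2. SequenceFile

SequenceFile is a binary file format provided by the Hadoop API. Storage is row-oriented, and the format is easy to use, splittable and compressible.

SequenceFile supports three compression options: NONE, RECORD and BLOCK. RECORD has a low compression ratio, so BLOCK compression is generally recommended.

An advantage is that the files are compatible with MapFile in the Hadoop API.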


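3. RCFile

Storage: the data is split into blocks by row, and each block is stored by column. This combines the advantages of row storage and column storage:

First, RCFile guarantees that the data of one row sits on the same node, so the cost of reconstructing a tuple is low.

Second, like a column store, RCFile can exploit column-wise compression and can skip reading columns that are not needed.

4. ORCFile

Storage: the data is split into blocks by row, and each block is stored by column.

Fast compression and fast column access.

More efficient than RCFile; it is an improved version of RCFile.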


Summary:

Compared with TEXTFILE and SEQUENCEFILE, RCFILE costs more at load time because of its columnar layout, but it offers a better compression ratio and better query response.

A data warehouse is written once and read many times, so overall RCFILE has a clear advantage over the other two formats.


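13. How do you solve data skew in Hive join queries?

1) Causes of skew: map output is distributed to the reducers by the hash of the key, and the data volume per reducer ends up differing too much for reasons such as:
  (1) uneven key distribution;
  (2) characteristics of the business data itself;
  (3) poor table design;
  (4) certain SQL statements that are inherently skewed.
  How to avoid it: for skew caused by empty keys, assign a random value to those keys.
2) Solutions
  (1) Parameter tuning:
    hive.map.aggr = true
    hive.groupby.skewindata=true
  This balances the load when the data is skewed. When the option is set to true, the generated query plan contains two MR jobs. In the first MR job, the map output is distributed randomly to the reducers, each reducer performs a partial aggregation and outputs the result; as a consequence, rows with the same Group By key may go to different reducers, which balances the load. The second MR job then distributes the pre-aggregated results to the reducers by Group By key (this step guarantees that the same Group By key goes to the same reducer) and completes the final aggregation.
  (2) SQL tuning:
  ① Choose the table whose join key is most evenly distributed as the driving table. Prune columns and filter rows properly, so that the data volume is relatively small by the time the two tables are joined.
  ② Joining a large table with a small table:
    Use a map join so that the small dimension table (fewer than 1000 records) is loaded into memory first and the join completes on the map side.
  ③ Joining a large table with a large table:
    Turn each empty key into a string plus a random number so that the skewed rows are spread across different reducers. Because null values never match in the join, the final result is not affected.
  ④ count distinct with many identical special values:
    For count distinct, handle the empty value separately. If only count distinct is being computed, the empty rows can simply be filtered out and 1 added to the final result, provided the empty value should count as one distinct value. If other calculations that need a group by are involved, process the rows with the empty value separately first and then union them with the other results.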


14. Fetch tasks

Fetch means that certain queries in Hive do not need MapReduce at all. For example, for SELECT * FROM employees; Hive can simply read the files in the storage directory of the employees table and print the query result to the console.

In hive-default.xml.template, hive.fetch.task.conversion defaults to more; older Hive versions default to minimal. With the property set to more, full-table selects, column selects, LIMIT queries and the like do not go through MapReduce.


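15. Joining a small table with a large table

Put the table whose keys are relatively dispersed and whose data volume is small on the left side of the join; this effectively reduces the chance of out-of-memory errors. Going a step further, a map join can load the small dimension table (fewer than 1000 records) into memory first, so that the join completes on the map side.

Testing in practice shows that newer versions of Hive already optimise both small-JOIN-large and large-JOIN-small. Whether the small table is on the left or the right no longer makes a noticeable difference.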
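The map-side join mentioned above amounts to building an in-memory hash table from the small table and streaming the large one past it. The Python below is a minimal sketch of that idea; the function name and the sample dimension and fact rows are assumptions for illustration, not Hive internals.

```python
def map_join(small_table, big_table_rows):
    # Load the small dimension table into an in-memory hash map (once per mapper).
    lookup = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)
    # Stream the big table and probe the map; no shuffle or reduce phase is involved.
    for key, fact in big_table_rows:
        for dim in lookup.get(key, []):
            yield key, dim, fact

dims = [(1, "north"), (2, "south")]
facts = [(1, 100), (2, 200), (1, 150), (3, 50)]
joined = list(map_join(dims, facts))
print(joined)
```

Because no rows are shuffled by key, a hot key in the large table cannot overload a single reducer, which is why this form of join also helps with skew.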


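16. Joining a large table with a large table

1) Filtering empty keys
  Sometimes a join times out because some keys have too many rows; rows with the same key are all sent to the same reducer, which then runs out of memory. In that situation the abnormal keys should be analysed carefully. In many cases the rows behind these keys are bad data and should be filtered out in the SQL statement, for example when the key field is empty.
2) Converting empty keys
  Sometimes an empty key has many rows but those rows are not bad data and must be included in the join result. In that case the empty key field in table a can be assigned a random value, so that the rows are spread randomly and evenly across different reducers.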
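The effect of converting empty keys can be seen in a small Python simulation. It is a sketch under assumptions: crc32 stands in for the hash partitioner, None models an empty key, and the reducer count, the "null_" prefix and the fixed random seed are illustrative choices.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 4

def reducer_for(key):
    # Stable stand-in for the hash partitioner; None models an empty join key.
    return zlib.crc32(repr(key).encode()) % NUM_REDUCERS

def salt_null_key(key, rng):
    # Replace an empty key with a string plus a random number; it matches nothing in the other table.
    return key if key is not None else f"null_{rng.randrange(1_000_000)}"

rng = random.Random(0)
keys = [None] * 1000 + [1, 2, 3, 4]

before = Counter(reducer_for(k) for k in keys)
after = Counter(reducer_for(salt_null_key(k, rng)) for k in keys)
print(max(before.values()), max(after.values()))
```

Before salting, all 1000 empty-key rows hit one reducer; afterwards they are spread over the reducers, so the largest load drops.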


17. Group By

By default, all rows with the same key in the map phase are sent to a single reducer, so the data becomes skewed when one key has too many rows.

Not every aggregation has to be completed on the reduce side. Many aggregations can be partially performed on the map side first, with the final result produced on the reduce side.

1) Parameters for enabling map-side aggregation
  (1) Whether to aggregate on the map side (default true)
    hive.map.aggr = true
  (2) Number of rows processed per map-side aggregation check
    hive.groupby.mapaggr.checkinterval = 100000
  (3) Balance the load when the data is skewed (default false)
    hive.groupby.skewindata = true

When the option is set to true, the generated query plan contains two MR jobs. In the first MR job, the map output is distributed randomly to the reducers, each reducer performs a partial aggregation and outputs the result; as a consequence, rows with the same Group By key may go to different reducers, which balances the load. The second MR job then distributes the pre-aggregated results to the reducers by Group By key (this step guarantees that the same Group By key goes to the same reducer) and completes the final aggregation.
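The two-job plan can be imitated in Python to show why it removes the hot spot. This is a simplified model, not Hive's planner: the reducer count, the seeded random generator and the sample keys are assumptions, and the second job is collapsed into a merge of the partial counts.

```python
import random
from collections import Counter

def two_stage_count(keys, num_reducers=4, seed=0):
    rng = random.Random(seed)
    # Job 1: rows are spread randomly, so a hot key is split over several reducers,
    # and each reducer produces partial counts.
    partials = [Counter() for _ in range(num_reducers)]
    for key in keys:
        partials[rng.randrange(num_reducers)][key] += 1
    # Job 2: partial results are regrouped by key and merged into the final counts.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return partials, final

keys = ["hot"] * 1000 + ["a", "b", "b"]
partials, final = two_stage_count(keys)
print([p["hot"] for p in partials], final["hot"])
```

The second job only has to merge at most one partial row per key per reducer, so even the hot key costs it very little.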


18. Count(Distinct) de-duplicated counting

With small data it does not matter. With large data, a COUNT DISTINCT has to be completed by a single reduce task; that one reducer has too much data to process, and the whole job struggles to finish. COUNT DISTINCT is therefore usually replaced by a GROUP BY followed by a COUNT.
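The rewrite works because de-duplication can be parallelised while the final count is cheap. A minimal Python sketch of that reasoning follows; the function name, the reducer count and the sample ids are hypothetical.

```python
from collections import defaultdict

def count_distinct_two_step(values, num_reducers=4):
    # Step 1 (GROUP BY): values are hash-partitioned and each reducer de-duplicates its share.
    groups = defaultdict(set)
    for v in values:
        groups[hash(v) % num_reducers].add(v)
    # Step 2 (COUNT): equal values always share a partition, so the per-reducer sets
    # are disjoint and their sizes can simply be added.
    return sum(len(g) for g in groups.values())

ids = [1, 2, 2, 3, 3, 3, 10, 10]
print(count_distinct_two_step(ids))
```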


19. Cartesian product

Avoid Cartesian products wherever possible. When a join has no ON condition, or an ineffective one, Hive can only use one reducer to compute the Cartesian product.
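How quickly a Cartesian product grows is easy to check in Python. The helper name and the table sizes below are illustrative assumptions only.

```python
from itertools import product

def cross_join(left, right):
    # Without a usable ON condition, every left row is paired with every right row.
    return list(product(left, right))

left = list(range(1000))
right = list(range(500))
print(len(cross_join(left, right)))
```

Two modest inputs of 1000 and 500 rows already yield 500000 output rows, all of which a single reducer would have to produce.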


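20. Row and column filtering

Column handling: in a SELECT, take only the columns that are needed; if the table is partitioned, use partition filters wherever possible, and avoid SELECT *.

Row handling: with partition pruning and an outer join, if the filter condition on the secondary table is written in the WHERE clause, the full tables are joined first and the filter is applied afterwards. Filtering the secondary table in a subquery before the join avoids this.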


21. Parallel execution

Hive turns a query into one or more stages. A stage can be a MapReduce stage, a sampling stage, a merge stage, a limit stage, or any other stage Hive may need during execution. By default Hive executes only one stage at a time. A particular job may, however, contain many stages that are not fully dependent on one another, which means some of them can run in parallel and shorten the execution time of the whole job. The more stages that can run in parallel, the sooner the job is likely to finish.

Setting the parameter hive.exec.parallel to true enables concurrent execution. On a shared cluster, note that as the number of parallel stages in a job grows, cluster utilisation grows with it.
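The scheduling idea, independent stages running together while a dependent stage waits, can be sketched with Python's standard thread pool. The three stage functions are hypothetical placeholders and have nothing to do with Hive's actual stage types.

```python
from concurrent.futures import ThreadPoolExecutor

def stage_a():
    # Hypothetical stage with no dependency on stage_b.
    return sum(range(100))

def stage_b():
    # Hypothetical stage with no dependency on stage_a.
    return max(range(100))

def stage_c(a, b):
    # Depends on both earlier stages, so it has to wait for their results.
    return a + b

with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(stage_a)
    future_b = pool.submit(stage_b)
    result = stage_c(future_a.result(), future_b.result())

print(result)
```

Running stage_a and stage_b side by side uses two workers at once, which mirrors the higher cluster utilisation that parallel stages cause.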


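Bonus

Resources: Flink interview questions, Spark interview questions, must-have software for programmers, Hive interview questions, Hadoop interview questions, Docker interview questions, résumé templates, quality articles and more are available from the links below.

GitHub download: https://github.com/lhh2002/Framework-Of-BigData

Gitee download: https://gitee.com/li_hey_hey/dashboard/projects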







大数据老哥


I hope this article helps you!


