
Hadoop Hive user mailing list
1 Jul 04:12 2013
Re: Performance difference between tuning reducer num and partition table
Felix.徐
02:12:52 GMT
Thanks for your reply. If I don't set the number of reducers in the 1st run, the number of reducers will be much smaller and the performance will be worse. The total output file size is about 200MB; I see that many reduce output files are empty - only 10 of them have data.
Another question: is there any documentation about the job-specific parameters of MapReduce and Hive?
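For the archive: the job-specific parameters in question are set per session. A minimal sketch with standard Hive/MapReduce knobs (values illustrative, not tuned for this workload):

SET mapred.reduce.tasks=30;                          -- pin the reducer count for subsequent jobs
SET hive.exec.reducers.bytes.per.reducer=256000000;  -- or let Hive size reducers from input bytes
SET hive.merge.mapredfiles=true;                     -- merge small reduce-side output files after the job
SET hive.merge.smallfiles.avgsize=16000000;          -- average size below which outputs count as small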
Dean Wampler
What happens if you don't set the number of reducers in the 1st run? How many reducers are executed? If it's a much smaller number, the extra overhead could matter. Another clue is the size of the files the first run produced, i.e., do you have 30 small (much less than a block size) files?
On Sat, Jun 29, 2013 at 12:27 AM, Felix.徐 wrote:
Hi Stephen, my query is actually more complex; Hive will generate 2 MapReduce jobs.
In the first solution, it runs 17 mappers / 30 reducers and 10 mappers / 30 reducers (the reducer num is set manually).
In the second solution, it runs 6 mappers / 1 reducer and 4 mappers / 1 reducer for each partition.
I do not know whether they could achieve the same performance if the reducer num is set properly.
Stephen Sprague
Great question. Your parallelization seems to trump Hadoop's. I guess I'd ask: what are the _total_ numbers of mappers and reducers that run on your cluster for these two scenarios? I'd be curious whether they are the same.
On Fri, Jun 28, 2013 at 8:40 AM, Felix.徐 wrote:
Here is the scenario: suppose I have 2 tables A and B, and I would like to perform a simple join on them.
We can do it like this:
INSERT OVERWRITE TABLE C
SELECT .... FROM A JOIN B on A.id=B.id
In order to speed up this query, since tables A and B have lots of data, another approach is:
say I partition tables A and B into 10 partitions respectively, and write the query like this:
INSERT OVERWRITE TABLE C PARTITION(pid=1)
SELECT .... FROM A JOIN B on A.id=B.id WHERE A.pid=1 AND B.pid=1
then I run this query 10 times concurrently (pid ranges from 1 to 10).
And my question is: in my observation of some more complex queries, the second solution is about 15% faster than the first solution. Is it simply because the setting of the reducer num is not optimal?
If resources are not a limit and it is possible to set the proper reducer nums in the first solution, can they achieve the same performance? Is there any other factor that can cause a performance difference between them (non-partition vs. partition + concurrent) besides the job parameter issues?
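For the archive: a dynamic-partition variant of the second approach (not discussed in this thread) can write all ten partitions from one query instead of ten concurrent ones. A sketch, reusing the tables above and assuming the partition column is the last one selected:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE C PARTITION (pid)
SELECT ...., A.pid
FROM A JOIN B ON A.id = B.id AND A.pid = B.pid;  -- join within matching partitions, as in the manual runs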
-- Dean Wampler, Ph.D. @deanwampler
1 Jul 04:37 2013
Correct way of using regexserde
Mohammad Tariq
02:37:55 GMT
Hello list,
I would really appreciate it if someone could show me the correct way of using regexserde, as I'm having a hard time using it. I have verified my regex through an online tester and it's working fine there. But when I'm using the same pattern with regexserde I'm getting NULL.
My input looks like this :
<SOME_CHARACTER_STRING>
and I want to extract the characters enclosed between the angle brackets.
This is the command I'm using:
hive> CREATE TABLE s(f1 STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
      WITH SERDEPROPERTIES ( "input.regex" = "(\\w*)", "output.regex" = "%1$s") STORED AS TEXTFILE;
LOAD works fine, but SELECT * gives me NULL.
I am on hadoop-1.0.4 and hive-0.10.0
Thank you so much for your time.
Warm Regards,
Tariq
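For future searchers: the contrib RegexSerDe deserializes by matching input.regex against the whole line, so a pattern that omits the angle brackets fails to match and every column comes back NULL. A minimal sketch that would capture the enclosed token, assuming one <...>-wrapped string per line:

CREATE TABLE s (f1 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "<(\\w+)>"  -- group 1 = the characters between the angle brackets
)
STORED AS TEXTFILE;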
1 Jul 04:43 2013
Re: "show table" throwing strange errorMohammad Tariq &&
02:43:44 GMT
Hello all,
Apologies for being unresponsive; I was busy with some urgent deliverables.
In spite of all your help and trying continuously for several days, it didn't work. I tried almost everything, including whatever you guys had suggested. As a result I had to reconfigure Hive, and now it's working perfectly fine. Still, I would love to hear if someone has something to say about this.
Thank you so much for your precious time.
Warm Regards,
Tariq
On Sat, Jun 22, 2013 at 3:28 PM, shashwat shriparv wrote:
Create hive-site.xml, paste the following, and try:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hivemeta?createDatabaseIfNotExist=true</value>
    <description>the URL of the MySQL database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>
Thanks & Regards,
Shashwat Shriparv
On Sat, Jun 22, 2013 at 2:26 AM,
More often than not, in my experience this is caused by a malformed hive-site.xml (or hive-default.xml). When this happened to me, it was because I somehow had tab characters in my hive-site.xml. Try dropping the file(s) and recreating them with appropriate formatting.
On Fri, Jun 21, 2013 at 2:17 PM, Sanjay Subramanian wrote:
Can u stop the following services:
hive-server
hive-meta-store
hive-server2 (if u r running that)
Move the current hive.log some place else
Start the following services:
hive-server
hive-meta-store
hive-server2 (if u r running that)
And check hive.log?
Also can u paste the CREATE TABLE script verbatim here. I feel if u are using some custom INPUTFORMAT / OUTPUTFORMAT class, that has to be specified in quotes, and u may have to escape that - see the sketch below.
Plus try and add a semicolon to the end of the create table script ...
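The quoting in question looks like the following sketch; the input format class is hypothetical, only the syntax is the point:

CREATE TABLE my_table (id STRING, payload STRING)
STORED AS INPUTFORMAT 'com.example.MyInputFormat'  -- hypothetical custom class, quoted and escaped as needed
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';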
From: Mohammad Tariq
Date: Thursday, June 20
To: user
Subject: Re: "show table" throwing strange error
Thank you for looking into it, Sanjay. "show tables" is working fine from both the Ubuntu and Hive shells. But I'm getting the same error as yesterday when I'm running "create table", which is:
line 1:30 character '' not supported here
line 1:31 character '' not supported here
[... the same message repeats for positions 1:32 through 1:41 ...]
line 1:42 character '' not supported here
Also, I have noticed one strange thing: hive.log is totally messed up. It looks like logs are getting written in some binary encoding. I have attached a snapshot of the same. Any idea?
Warm Regards,
On Fri, Jun 21, 2013 at 1:03 AM, Sanjay Subramanian
Can u try from your ubuntu command prompt
$ hive -e "show tables"
From: Mohammad Tariq
Date: Thursday, June 20
To: user
Subject: Re: "show table" throwing strange error
Thank you for the response ma'am. It didn't help either.
Warm Regards,
On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind
Your issue seems familiar. Try logging out of hive session and re-login.
On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq
Hello list,
I have a Hive (0.9.0) setup on my Ubuntu box running hadoop-1.0.4. Everything was going smoothly till now, but today when I issued
"show tables" I got a strange error on the CLI. Here is the error:
FAILED: Parse Error: line 1:0 character '' not supported here
line 1:1 character '' not supported here
line 1:2 character '' not supported here
[... the same message repeats for positions 1:3 through 1:78 ...]
line 1:79 character '' not supported here
line 1:378 character '' not supported here
line 1:379 character '' not supported here
line 1:380 character '' not supported here
line 1:381 character '' not supported here
Strangely, other queries like select foo from pokes where bar = 'tariq'; are working fine. I tried to search over the net but could not find anything useful. Need some help.
Thank you so much for your time.
Warm Regards,
-- Swarnim
1 Jul 09:15 2013
Re: different outer join plan between hive 0.9 and hive 0.10
wzc1989
07:15:35 GMT
Looking at the patches in HIVE-3411, HIVE-4206, HIVE-4212, and HIVE-3464, I understand what you mean by "hive tags rows with a filter mask as a short for outer join, which can contain 16 flags". I wonder why not choose long or int, which can contain 64/32 tags. Does adding one long/int to every row cost too much?
Sent from Sparrow
On Tuesday the 14th, Navis류승우 wrote:
In short, hive tags rows with a filter mask as a short for outer join, which can contain 16 flags (see HIVE-3411, plz).
I'll survey for a solution.
wzc1989 wrote:
"hive cannot merge joins of 16+ aliases with outer join into single stage."
In our use case we use one table full outer join all other table to produce
one big table, which may exceed 16 outer join limits and will be split into
multi stage under hive 0.10.
It become very slow under hive 0.10 while we run such query well under hive
I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot
merge join 16+ aliases into single stage while hive 0.9 doesn't have such
issue. could you explain this or give me some hint?
Sent from Sparrow
On Tuesday the 14th, Navis류승우 wrote:
The error message means hive cannot merge joins of 16+ aliases with an outer join into a single stage. It was 8-way originally (HIVE-3411) but was expanded to 16 later.
See the issues referenced above for details.
wzc1989 wrote:
This time I cherry-picked HIVE-3464, HIVE-4212, HIVE-4206 and some related commits, and the above explain result matches between hive 0.9 and hive 0.10.
But I am confused about this error msg:
JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer join(s) " +
    "cannot have more than 16 aliases"),
Does this mean that in hive 0.10, when we have more than 16 outer joins, the query plan will still have some bug?
I tested the SQL below and found the explain result still differs between hive 0.9 and hive 0.10.
explain select
sum(a.value) val
from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
left outer join default.test_join h on a.key = h.key
left outer join default.test_join i on a.key = i.key
left outer join default.test_join j on a.key = j.key
left outer join default.test_join k on a.key = k.key
left outer join default.test_join l on a.key = l.key
left outer join default.test_join m on a.key = m.key
left outer join default.test_join n on a.key = n.key
left outer join default.test_join u on a.key = u.key
left outer join default.test_join v on a.key = v.key
left outer join default.test_join w on a.key = w.key
left outer join default.test_join x on a.key = x.key
left outer join default.test_join z on a.key = z.key
Sent from Sparrow
On Friday the 29th, Navis류승우 wrote:
The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
HIVE-3464) and still not completely fixed even in trunk.
Will be fixed shortly.
The bug remains even if I apply the patch in HIVE-4206 :( The explain result hasn't changed.
Navis류승우 wrote:
It's a bug ().
Thanks for reporting it.
Recently we tried to upgrade our hive from 0.9 to 0.10, but found some of our hive queries almost 7 times slower. One such query consists of one table outer joined with others on the same key. By looking into the query, we found the query plans generated by hive 0.9 and hive 0.10 are different. Here is the table:
create table test_join (
  `key` string,
  `value` string
);
explain select
sum(a.value) val
from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
the explain of hive 0.9:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
Left Outer Join0 to 3
Left Outer Join0 to 4
Left Outer Join0 to 5
Left Outer Join0 to 6
condition expressions:
0 {VALUE._col1}
while the explain of hive 0.10:
STAGE DEPENDENCIES:
Stage-6 is a root stage
Stage-1 depends on stages: Stage-6
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
condition expressions:
0 {VALUE._col0} {VALUE._col1}
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
Left Outer Join0 to 3
Left Outer Join0 to 4
condition expressions:
0 {VALUE._col9}
It seems hive 0.9 uses only one stage/job to process all the outer joins while hive 0.10 splits them into two stages. When running this kind of query on hive 0.10 in production, some reducers get stuck in the second stage of the outer join process.
I can't find any param to change the query plan; can anyone give me a hint?
1 Jul 10:01 2013
Experience of Hive local mode execution style
Guillaume Allain
08:01:18 GMT
Would anybody have any comments or feedback about the hive local mode execution? It is advertised as providing a performance boost for small data sets, and it seems to fit nicely when running unit/integration tests on a single node or virtual machine.
My exact questions are the following:
- How significantly does local mode execution of queries diverge from distributed mode? Could the results be different in some way?
- I have encountered errors when running complex queries (with several joins/distincts/group-bys) that seem to relate to configuration (see below). I got no exact answer from the ML and I am about ready to dive into the source code.
Any idea where I should aim in order to solve that particular problem?
Thanks in advance,
From: Guillaume Allain
Sent: 18 June
To: user-K0oFRspv0GEPKjDvHGQMeg@public.gmane.org
Subject: FileNotFoundException when using hive local mode execution style
I plan to use hive local mode in order to speed up unit testing on (very) small data sets (data is still on HDFS). I switch on local mode by setting the following variables:
SET hive.exec.mode.local.auto=true;
SET mapred.local.dir=/
SET mapred.tmp.dir=file:///
(plus creating the needed directories and permissions)
Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to 3 jobs) with nice performance improvements.
Unfortunately I ran into a FileNotFoundException (/tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1/emptyFile) on a more complex query (4 jobs, a distinct on top of several joins; see the logs below if needed).
Any idea about that error? What other option am I missing to have a fully functional local mode?
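For the record, the thresholds that gate automatic local mode are a frequent suspect here; a sketch using the Hive 0.9/0.10 parameter names (values illustrative):

SET hive.exec.mode.local.auto=true;
SET hive.exec.mode.local.auto.inputbytes.max=134217728;  -- run locally only when total input is under 128 MB
SET hive.exec.mode.local.auto.tasks.max=4;               -- and when the job needs at most 4 map tasks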
Thanks in advance, Guillaume
$ tail -50 /tmp/vagrant/vagrant_13_82baad8b--a52e-df8.log
16:10:05,669 INFO  exec.ExecDriver (ExecDriver.java:execute(320)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:execute(342)) - adding libjars: file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias dc
16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
16:10:05,689 INFO  exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
16:10:06,185 INFO  exec.ExecDriver (ExecDriver.java:addInputPath(789)) - Changed input file to file:/tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1
16:10:06,226 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
16:10:06,226 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/tmp/hive-vagrant/hive__16-09-42_560_9242367/-mr-10004
16:10:06,226 INFO  exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/tmp/hive-vagrant/hive__16-09-42_560_9242367/-mr-10004
16:10:06,681 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - session.id is deprecated. Instead, use dfs.metrics.session-id
16:10:06,682 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
16:10:06,688 INFO  exec.ExecDriver (ExecDriver.java:createTmpDirs(215)) - Making Temp Directory: hdfs://localhost/tmp/hive-vagrant/hive__16-09-42_560_9242367/-mr-10002
16:10:06,706 WARN  mapred.JobClient (JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16:10:06,942 INFO  io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1; using filter path file:/tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1
16:10:06,943 INFO  io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for hdfs://localhost/tmp/hive-vagrant/hive__16-09-42_560_9242367/-mr-10004; using filter path hdfs://localhost/tmp/hive-vagrant/hive__16-09-42_560_9242367/-mr-10004
16:10:06,951 INFO  mapred.FileInputFormat (FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
16:10:06,953 INFO  mapred.JobClient (JobClient.java:run(982)) - Cleaning up the staging area file:/user/vagrant/.staging/job_local_0001
16:10:06,953 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1/emptyFile
16:10:06,956 ERROR exec.ExecDriver (SessionState.java:printError(403)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1/emptyFile)'
java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive__16-10-05_614_4458113/-mr-10000/1/emptyFile
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1041)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1033)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Installation detail:
vagrant@hadoop:/opt/events-warehouse$ hadoop version
Hadoop 2.0.0-cdh4.1.2
vagrant@hadoop:/opt/events-warehouse$ ls /usr/lib/hive/lib/ | grep hive
hive-builtins-0.9.0-cdh4.1.2.jar
hive-cli-0.9.0-cdh4.1.2.jar
hive-common-0.9.0-cdh4.1.2.jar
hive-contrib-0.9.0-cdh4.1.2.jar
hive_contrib.jar
hive-exec-0.9.0-cdh4.1.2.jar
hive-hbase-handler-0.9.0-cdh4.1.2.jar
hive-hwi-0.9.0-cdh4.1.2.jar
hive-jdbc-0.9.0-cdh4.1.2.jar
hive-json-serde-0.2.jar
hive-metastore-0.9.0-cdh4.1.2.jar
hive-pdk-0.9.0-cdh4.1.2.jar
hive-serde-0.9.0-cdh4.1.2.jar
hive-service-0.9.0-cdh4.1.2.jar
hive-shims-0.9.0-cdh4.1.2.jar
Guillaume Allain
Senior Development Engineer t: +44 20
m: blinkbox music - the easiest way to listen to the music you love, for free
1 Jul 10:01 2013
FW: Continuing/FAILED/ATTEMPT error with udf
Guillaume Allain
08:01:33 GMT
I ran into errors that I cannot explain when using a Java user defined function:
- The UDF runs fine on a small query; I am therefore confident in my ADD JAR / CREATE TEMPORARY FUNCTION statements (see the sketch below).
- The error is only raised on complex queries.
- The jobs complain first about a "java.lang.ClassNotFoundException: com...", followed by a "Continuing ..." and then successive FAILED/ATTEMPT messages, but keep going.
- They fail later on with a "java.lang.NoClassDefFoundError".
Any idea about that weird pattern of errors? What could explain why it happens with that particular UDF? (My project uses many other UDFs with no problem.)
Thanks in advance,
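For reference, the registration sequence alluded to above usually looks like this; a sketch where the jar path matches the libjars listing in the previous message, the class name comes from the log below, and the function and table names are hypothetical:

ADD JAR /opt/events-warehouse/build/jars/we7-hive-udfs.jar;
CREATE TEMPORARY FUNCTION date_compare AS 'com.we7.warehouse.udf.DateCompare';
SELECT date_compare(start_date, end_date) FROM some_table;  -- some_table and its columns are hypothetical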
## Version
hive &at& melissa:~$ hadoop version
Hadoop 2.0.0-cdh4.1.2
Subversion file:///data/1/jenkins/workspace/generic-package-debian64-6-0/CDH4.1.2-Packaging-Hadoop-_17-01-07/hadoop-2.0.0+552-1.cdh4.1.2.p0.27~squeeze/src/hadoop-common-project/hadoop-common -r f0b53c81cbf56ffcd27afd5f082de
Compiled by jenkins on Thu Nov& 1 17:33:24 PDT 2012
From source with checksum c5d56e606a3aa6dd5399cee3b2b8054f
hive &at& melissa:~$ ls /usr/lib/hive/lib/ | grep hive
hive-builtins-0.9.0-cdh4.1.2.jar
hive-cli-0.9.0-cdh4.1.2.jar
hive-common-0.9.0-cdh4.1.2.jar
hive-contrib-0.9.0-cdh4.1.2.jar
hive_contrib.jar
hive-exec-0.9.0-cdh4.1.2.jar
hive-hbase-handler-0.9.0-cdh4.1.2.jar
hive-hwi-0.9.0-cdh4.1.2.jar
hive-jdbc-0.9.0-cdh4.1.2.jar
hive-json-serde-0.2.jar
hive-metastore-0.9.0-cdh4.1.2.jar
hive-pdk-0.9.0-cdh4.1.2.jar
hive-serde-0.9.0-cdh4.1.2.jar
hive-service-0.9.0-cdh4.1.2.jar
hive-shims-0.9.0-cdh4.1.2.jar
Total MapReduce jobs = 6
Ended Job = , job is filtered out (removed at runtime).
Ended Job = -, job is filtered out (removed at runtime).
Execution log at: /tmp/hive/hive_27_af20-4e19-9af9-f49f13874e85.log
04:27:14    Starting to launch local task    maximum memory =
04:27:15    Processing rows: 3    Hashtable size: 3    Memory usage: 3766480    rate: 0.004
04:27:15    Dump the hashtable into file: file:/tmp/hive/hive__16-27-04_597_7628371/-local-10007/HashTable-Stage-8/MapJoin-mapfile21--.hashtable
04:27:15    Upload 1 File to: file:/tmp/hive/hive__16-27-04_597_7628371/-local-10007/HashTable-Stage-8/MapJoin-mapfile21--.hashtable File size: 546
04:27:15    End of local task; Time Taken: 1.662 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 2 out of 6
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job__0655, Tracking URL = http://hadoop-master.we7.local:50030/jobdetails.jsp?jobid=job__0655
Kill Command = /usr/lib/hadoop/bin/hadoop job& -Dmapred.job.tracker=hadoop-master.we7.local:8021 -kill job__0655
Hadoop job information for Stage-8: number of mappers: 1; number of reducers: 0
16:27:20,383 Stage-8 map = 0%, reduce = 0%
16:27:27,430 Stage-8 map = 100%, reduce = 0%, Cumulative CPU 4.27 sec
16:27:28,447 Stage-8 map = 100%, reduce = 100%, Cumulative CPU 4.27 sec
MapReduce Total cumulative CPU time: 4 seconds 270 msec
Ended Job = job__0655
Ended Job = , job is filtered out (removed at runtime).
Ended Job = , job is filtered out (removed at runtime).
Execution log at: /tmp/hive/hive_27_af20-4e19-9af9-f49f13874e85.log
java.lang.ClassNotFoundException: com.we7.warehouse.udf.DateCompare
Continuing ...
java.lang.NullPointerException: target should not be null
Continuing ...
java.lang.ClassNotFoundException: com.we7.warehouse.udf.DateCompare
Continuing ...
java.lang.NullPointerException: target should not be null
Continuing ...
04:27:30    Starting to launch local task    maximum memory =
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:76)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:166)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:385)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:267)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:672)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1124)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:107)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:48)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:97)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:48)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:70)
    ... 13 more
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
  Stage-11
Logs:
/tmp/hive/hive.log
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
Launching Job 4 out of 6
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job__0656, Tracking URL = http://hadoop-master.we7.local:50030/jobdetails.jsp?jobid=job__0656
Kill Command = /usr/lib/hadoop/bin/hadoop job& -Dmapred.job.tracker=hadoop-master.we7.local:8021 -kill job__0656
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
16:27:34,525 Stage-2 map = 0%, reduce = 0%
16:27:39,551 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 3.49 sec
16:27:58,736 Stage-2 map = 50%, reduce = 17%, Cumulative CPU 3.49 sec
16:27:59,745 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.49 sec
MapReduce Total cumulative CPU time: 3 seconds 490 msec
Ended Job = job__0656 with errors
Error during job, obtaining debugging information...
Examining task ID: task__0656_m_000003 (and more) from job job__0656
Exception in thread "Thread-59" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/util/HostUtil
    at org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:51)
    at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
    at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.util.HostUtil
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 4 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 4.27 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS
Job 1: Map: 2  Reduce: 1   Cumulative CPU: 3.49 sec   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 7 seconds 760 msec
Guillaume Allain
Senior Development Engineer t: +44 20
m: blinkbox music - the easiest way to listen to the music you love, for free
1 Jul 10:09 2013
Re: Correct way of using regexserde
Matouk IFTISSEN
08:09:59 GMT
Hello, try to delete "output.regex" = "%1$s", store your data in a file, and then specify where the file is located:
STORED AS TEXTFILE
LOCATION '/......';
Also make sure that you add the jar "hive-contrib-0.10.0.jar" in the session, or that you have it on all Hadoop tasktrackers, like this:
add jar path_where_is_the_jar_in_hive_lib\hive-contrib-0.9.0.jar;
Mohammad Tariq wrote:
Hello list,
I would really appreciate it if someone could show me the correct way of using regexserde, as I'm having a hard time using it. I have verified my regex through an online tester and it's working fine there. But when I'm using the same pattern with regexserde I'm getting NULL.
My input looks like this :
<SOME_CHARACTER_STRING>
and I want to extract the characters enclosed between the angle brackets.
This is the command I'm using:
hive> CREATE TABLE s(f1 STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
      WITH SERDEPROPERTIES ( "input.regex" = "(\\w*)", "output.regex" = "%1$s") STORED AS TEXTFILE;
LOAD works fine, but SELECT * gives me NULL.
I am on hadoop-1.0.4 and hive-0.10.0
Thank you so much for your time.
Warm Regards,
Tariq
1 Jul 11:52 2013
Re: hiveserver2 Thrift Interface With Perl
Dave Cardwell
09:52:02 GMT
I just wanted to update the list to let future searchers know that I found a solution. David Morel kindly shared some code with me to get this working, which he has now released as a Perl library that has been working well.
--
Best wishes,
Dave Cardwell.
On 14 May, Dave Cardwell wrote:
I wrote a few reporting scripts in Perl that connected to Hive via the Thrift interface.
Since we upgraded CDH to 4.2.0 and hiveserver2 was installed (with Hive 0.10.0) the scripts no longer ran due to the API changes.
I regenerated the Perl modules from the .thrift files and have tried to translate the Java examples I've found online, but cannot for the life of me get it working with the new interface.
The Java examples seem to use a TOpenSessionReq class but I cannot find this anywhere in the generated Perl modules. If I try to skip that part and go straight to $client->OpenSession() without an argument, the TCLIService module itself complains that it cannot create a TOpenSessionResp object because the class is not loaded.
I have attached example code. Can anyone advise me on how to get past this block?
--
Best wishes,
Dave Cardwell.
1 Jul 22:31 2013
Re: unable to partition the table
Stephen Sprague
20:31:37 GMT
ok, so i just learned of a perl script called "json_pp" (for json pretty-print) that is a binary included in the distro for the perl JSON module. You gotta figure there are analogous tools in other languages as well, but given this is a binary it doesn't matter much what language it's written in - it just works. So doing this:
$ hive -e 'select <your_json_column> from <your_table>' | json_pp
you'd actually be able to see what your json looks like in a human-readable fashion.
get the binary (and the module) here:
On Wed, Jun 26, 2013 at 4:38 PM, Sunita Arvind wrote:
Ok, thanks Stephen, I will try that out. I will update the group if I am able to get this to work. For now, I will continue with the non-partitioned table.
On Wed, Jun 26, 2013 at 7:11 PM, Stephen Sprague wrote:
it would appear that you may partition only by non-nested columns. I would recommend transforming your original dataset into one where the first column is YYYYMM and the rest is your json object (see the sketch below). During this transformation you may also wish to make further optimizations, since you'll be scanning every record.
as always my 2 cents only.
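A sketch of that transformation; the table and column names (raw_events, json_line, linkedin_jobsearch_bymonth) and the location are assumptions, not the poster's schema:

CREATE EXTERNAL TABLE linkedin_jobsearch_bymonth (json_line STRING)
PARTITIONED BY (yyyymm STRING)
LOCATION '/user/sunita/Linkedin/JobSearch_bymonth';

-- load one month at a time from the transformed two-column dataset (yyyymm, json_line)
INSERT OVERWRITE TABLE linkedin_jobsearch_bymonth PARTITION (yyyymm='201306')
SELECT json_line FROM raw_events WHERE yyyymm = '201306';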
On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind wrote:
I am unable to create a partitioned table.
The error I get is:
FAILED: ParseException line 37:16 mismatched input '"jobs.values.postingDate.year"' expecting Identifier near '(' in column specification
I tried referring to the columns in various ways - S.jobs.values.postingDate.year, with quotes, without quotes - and get the same error. I also tried creating a partition by year alone; still the same error.
Here is the create table statement:
create external table linkedin_JobSearch (
  jobs STRUCT<
    values : ARRAY<STRUCT<
      company : STRUCT<
        id : STRING,
        name : STRING>,
      postingDate : STRUCT<
        day : STRING>,
      descriptionSnippet : STRING,
      expirationDate : STRUCT<
      locationDescription : STRING>>>
)
PARTITIONED BY ("jobs.values.postingDate.year" STRING, "jobs.values.postingDate.month" STRING)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
WITH SERDEPROPERTIES (
  "company"="$.",
  "position"="$.jobs.values.position.title",
  "customerJobCode"="$.jobs.values.customerJobCode",
  "locationDescription"="$.jobs.values.locationDescription",
  "jobPoster"="$.jobs.values.jobposter.headline"
)
LOCATION '/user/sunita/Linkedin/JobSearch';
I need to be able to partition this information. Please help.
1 Jul 22:43 2013
Re: unable to partition the table
Stephen Sprague
20:43:26 GMT
Sorry for the spam, but one correction: json_pp appears to be part of the Perl core, so if you have a modern Perl you may already have it installed.
On Mon, Jul 1, 2013 at 1:31 PM, Stephen Sprague wrote:
ok, so i just learned of a perl script called "json_pp" (for json pretty-print) that is a binary included in the distro for the perl JSON module. You gotta figure there are analogous tools in other languages as well, but given this is a binary it doesn't matter much what language it's written in - it just works. So doing this:
$ hive -e 'select <your_json_column> from <your_table>' | json_pp
you'd actually be able to see what your json looks like in a human-readable fashion.
get the binary (and the module) here:
On Wed, Jun 26, 2013 at 4:38 PM, Sunita Arvind wrote:
Ok, thanks Stephen, I will try that out. I will update the group if I am able to get this to work. For now, I will continue with the non-partitioned table.
On Wed, Jun 26, 2013 at 7:11 PM, Stephen Sprague wrote:
it would appear that you may partition only by non-nested columns. I would recommend transforming your original dataset into one where the first column is YYYYMM and the rest is your json object. During this transformation you may also wish to make further optimizations, since you'll be scanning every record.
as always my 2 cents only.
On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind wrote:
I am unable to create a partitioned table.
The error I get is:
FAILED: ParseException line 37:16 mismatched input '"jobs.values.postingDate.year"' expecting Identifier near '(' in column specification
I tried referring to the columns in various ways - S.jobs.values.postingDate.year, with quotes, without quotes - and get the same error. I also tried creating a partition by year alone; still the same error.
Here is the create table statement:
create external table linkedin_JobSearch (
  jobs STRUCT<
    values : ARRAY<STRUCT<
      company : STRUCT<
        id : STRING,
        name : STRING>,
      postingDate : STRUCT<
        day : STRING>,
      descriptionSnippet : STRING,
      expirationDate : STRUCT<
      locationDescription : STRING>>>
)
PARTITIONED BY ("jobs.values.postingDate.year" STRING, "jobs.values.postingDate.month" STRING)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
WITH SERDEPROPERTIES (
  "company"="$.",
  "position"="$.jobs.values.position.title",
  "customerJobCode"="$.jobs.values.customerJobCode",
  "locationDescription"="$.jobs.values.locationDescription",
  "jobPoster"="$.jobs.values.jobposter.headline"
)
LOCATION '/user/sunita/Linkedin/JobSearch';
I need to be able to partition this information. Please help.