FAQ

Author: Rares Vernica <rares (at) ics.uci.edu>

1 Copyright
2 What should I do if I get java.lang.OutOfMemoryError: Java heap space in the Map phase of Stage 2, Kernel (ridpairsimproved or ridpairsppjoin)?
3 Where can I get more help?

1 Copyright

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

2 What should I do if I get `java.lang.OutOfMemoryError: Java heap space` in the Map phase of Stage 2, Kernel (`ridpairsimproved` or `ridpairsppjoin`)?

Stage 1, Token Ordering (tokensbasic or tokensimproved), produces a list of unique tokens that Stage 2 loads into memory. The list is written to the tokens.n directory in HDFS. A likely cause of the OutOfMemoryError is that this list of tokens does not fit into memory.

The first thing to check is whether you are using the right tokenizer for your data, because the tokenizer determines how many unique tokens Stage 1 produces. For example, if each join field value is a list of words, then the word tokenizer would be appropriate. If instead each join field value is a contiguous string of characters, then an n-gram tokenizer might be appropriate. The tokenizer can be specified on the command line with the -Dfuzzyjoin.tokenizer= option or in the XML file specified with the -conf option. For more details please see:

hadoop/fuzzyjoin/resources/conf/fuzzyjoin/default.xml
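
To get a rough feel for how the tokenizer choice changes the size of the unique token list, the following minimal Python sketch counts unique tokens on a small local sample of join field values. It is an illustration only, not the fuzzyjoin tokenizers: the function names (word_tokens, ngram_tokens, unique_token_count), the whitespace splitting, the lowercasing, and the default n-gram length of 3 are all assumptions made for this example.

```python
def word_tokens(value):
    """Split a field value into lowercase words on whitespace (assumed behavior)."""
    return value.lower().split()


def ngram_tokens(value, n=3):
    """Return the overlapping character n-grams of a field value (assumed behavior)."""
    text = value.lower()
    if len(text) < n:
        # Values shorter than n yield a single token, or nothing if empty.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def unique_token_count(values, tokenizer):
    """Count the distinct tokens produced by `tokenizer` over all values."""
    unique = set()
    for value in values:
        unique.update(tokenizer(value))
    return len(unique)


if __name__ == "__main__":
    # Hypothetical sample of join field values; replace with a sample of your data.
    sample = [
        "efficient parallel set similarity joins",
        "parallel joins using mapreduce",
        "set similarity joins",
    ]
    print("word tokenizer:  ", unique_token_count(sample, word_tokens))
    print("3-gram tokenizer:", unique_token_count(sample, ngram_tokens))
```

Running this on a representative sample of your own data gives a hint as to which tokenizer yields the smaller token list. The actual list that Stage 2 loads is the one in the tokens.n directory.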

3 Where can I get more help?

Please email Rares Vernica <rares (at) ics.uci.edu> with any questions you might have.

Date: 2011-04-12 09:58:19 PDT

