Hello,
I get multiple small files in my input directory, which I want to merge into a single file without using the local file system or writing MapReduce jobs. Is there a way I could do it using hadoop fs commands or Pig?
Thanks!
From stackoverflow
-
hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
Thanks Harsha, but can the destination file for getmerge be within the DFS? From what I understand, the destination has to be on the local file system.
-
Okay, I figured out a way using hadoop fs commands:
hadoop fs -cat [dir]/* | hadoop fs -put - [destination file]
It worked when I tested it. Are there any pitfalls one can think of?
Thanks!
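A minimal sketch of the pipe approach above, wrapped in a shell function for reuse. The function name merge_in_hdfs and its two arguments are hypothetical; only hadoop fs -cat and hadoop fs -put - come from the post. The glob is quoted on the assumption that Hadoop, not the local shell, should expand it against HDFS.

```shell
# merge_in_hdfs is a hypothetical helper name, not part of Hadoop.
# Usage: merge_in_hdfs <hdfs_source_dir> <hdfs_destination_file>
merge_in_hdfs() {
  # Refuse to run without both arguments.
  if [ "$#" -ne 2 ]; then
    echo "usage: merge_in_hdfs <hdfs_source_dir> <hdfs_destination_file>" >&2
    return 2
  fi
  # -cat streams the bytes of every matching file to stdout;
  # "-put -" reads stdin and writes it to the destination path.
  # The quotes keep the local shell from expanding the glob.
  hadoop fs -cat "$1/*" | hadoop fs -put - "$2"
}
```

It could be called as merge_in_hdfs /user/me/input /user/me/merged.txt, where both paths are made up for illustration. A few pitfalls, as far as I can tell: the data still streams through the client machine, so it crosses the network twice even though nothing lands on local disk; files are joined byte for byte, so a file without a trailing newline runs into the next one; -put normally refuses to overwrite an existing destination; and the destination should sit outside the source directory so a rerun does not pick up the merged file. In a plain pipeline the exit status is that of -put, so a failed -cat can leave a partial file unnoticed.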
-
You can use the tool HDFSConcat, new in HDFS 0.21, to perform this operation without incurring the cost of a copy.
uHadoop : Thanks Jeff, will look into HDFSConcat. Currently we are on 0.20.2, so I am now creating a HAR of all the files and then reading from Pig. This way the data stays in HDFS.
Jeff Hammerbacher : I should note that this tool has limitations highlighted at https://issues.apache.org/jira/browse/HDFS-950. Files must have the same block size and be owned by the same user.