Hadoop: HDFS – Basic Shell Commands (Part I)

Hi guys,

In this article I will walk through the most common HDFS shell commands with practical examples. I hope you find something useful here.

Let's start by simply typing:

hadoop fs

You will get the following output:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> … <dst>]
[-cat [-ignoreCrc] <src> …]
[-checksum <src> …]
[-chgrp [-R] GROUP PATH…]
[-chmod [-R] <MODE[,MODE]… | OCTALMODE> PATH…]
[-chown [-R] [OWNER][:[GROUP]] PATH…]
[-copyFromLocal [-f] [-p] <localsrc> … <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> … <localdst>]
[-count [-q] <path> …]
[-cp [-f] [-p] <src> … <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> …]]
[-du [-s] [-h] <path> …]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> … <localdst>]
[-getfacl [-R] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd …]]
[-ls [-d] [-h] [-R] [<path> …]]
[-mkdir [-p] <path> …]
[-moveFromLocal <localsrc> … <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> … <dst>]
[-put [-f] [-p] <localsrc> … <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> …]
[-rmdir [--ignore-fail-on-non-empty] <dir> …]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setrep [-R] [-w] <rep> <path> …]
[-stat [format] <path> …]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> …]
[-touchz <path> …]
[-usage [cmd …]]

Let's go through each of them.

1. appendToFile: appends the contents of a local file to a file in HDFS.

For example, suppose we have a local file named employee with the following data:

user@ubuntuvm:~/Desktop/hadoop$ cat employee
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
110 Kaushal Agra India 90000
113 Abhi Punjab India 12999
141 Ajay Patna India 120000

This file resides on the local filesystem, outside HDFS.

We also have a file in HDFS named /testemp with the same contents:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -cat /testemp
15/08/05 04:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
110 Kaushal Agra India 90000
113 Abhi Punjab India 12999
141 Ajay Patna India 120000

So after running appendToFile, each line should appear twice, since both files have identical contents:

hadoop fs -appendToFile employee /testemp

And now:

hadoop fs -cat /testemp
15/08/05 04:34:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
110 Kaushal Agra India 90000
113 Abhi Punjab India 12999
141 Ajay Patna India 120000
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
110 Kaushal Agra India 90000
113 Abhi Punjab India 12999
141 Ajay Patna India 120000
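For intuition, appendToFile behaves like appending with >> on the local filesystem. Here's a small local sketch of the same "doubled lines" effect (plain local files, no cluster needed; the real HDFS command is shown in the comment):

```shell
# Local analogue of the appendToFile run above: append a file's contents
# to an identical copy of itself and the line count doubles.
printf '003 Amit Delhi India 12000\n004 Anil Delhi India 15000\n' > employee.txt
cp employee.txt testemp.txt        # testemp.txt starts as an identical copy
cat employee.txt >> testemp.txt    # like: hadoop fs -appendToFile employee /testemp
wc -l < testemp.txt                # twice the original line count
```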

2. cat: displays the contents of a file.

hadoop fs -cat /testemp
15/08/05 04:34:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
110 Kaushal Agra India 90000
113 Abhi Punjab India 12999
141 Ajay Patna India 120000
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
110 Kaushal Agra India 90000
113 Abhi Punjab India 12999
141 Ajay Patna India 120000

3. checksum: returns the checksum of a file (very helpful when dealing with corrupted files in HDFS).

Example:

hadoop fs -checksum /testemp
15/08/05 04:38:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
/testemp MD5-of-0MD5-of-512CRC32C 000002000000000000000000128816fbe605d2bc2eed74545000c2d1
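The output has three fields: path, checksum algorithm, and the hex digest. One practical use is extracting the digest so two copies of a file can be compared. A sketch, using the canned line from the run above (on a live cluster the line would come from hadoop fs -checksum itself):

```shell
# Parse a `hadoop fs -checksum` output line into its three fields.
# On a cluster: line=$(hadoop fs -checksum /testemp)
line="/testemp MD5-of-0MD5-of-512CRC32C 000002000000000000000000128816fbe605d2bc2eed74545000c2d1"
path=$(echo "$line" | awk '{print $1}')    # file path
digest=$(echo "$line" | awk '{print $3}')  # hex digest to compare
echo "$path $digest"
```

Comparing the digest of a file against a known-good copy is a quick way to spot corruption.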

4. copyFromLocal

Usage: hadoop fs -copyFromLocal <localsrc> URI

Similar to the put command, except that the source is restricted to a local file reference.

hadoop fs -copyFromLocal input /testinp
15/08/05 04:39:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -cat /testinp
15/08/05 04:40:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
It was November. Although it was not yet late, the sky was dark when I turned into Laundress Passage. Father had finished for the day, switched off the shop lights and closed the shutters; but so I would not come home to darkness he had left on the light over the stairs to the flat. Through the glass in the door it cast a foolscap rectangle of paleness onto the wet pavement, and it was while I was standing in that rectangle, about to turn my key in the door, that I first saw the letter. Another white rectangle, it was on the fifth step from the bottom, where I couldn’t miss it.

5. copyToLocal

Usage: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to the get command, except that the destination is restricted to a local file reference.

Example:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -copyToLocal /testinp newtest
15/08/05 04:42:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
user@ubuntuvm:~/Desktop/hadoop$ cat newtest
It was November. Although it was not yet late, the sky was dark when I turned into Laundress Passage. Father had finished for the day, switched off the shop lights and closed the shutters; but so I would not come home to darkness he had left on the light over the stairs to the flat. Through the glass in the door it cast a foolscap rectangle of paleness onto the wet pavement, and it was while I was standing in that rectangle, about to turn my key in the door, that I first saw the letter. Another white rectangle, it was on the fifth step from the bottom, where I couldn’t miss it.
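A common sanity check after copyToLocal is to byte-compare the local copy against its source. The sketch below simulates the step with plain local files (cp stands in for the copyToLocal call shown in the comment, since the real command needs a cluster):

```shell
# Verify a copied file matches its source byte-for-byte with cmp.
printf 'It was November.\n' > original
cp original newtest    # stands in for: hadoop fs -copyToLocal /testinp newtest
if cmp -s original newtest; then
  echo "copies match"
else
  echo "copies differ"
fi
```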

6. count

Usage: hadoop fs -count [-q] [-h] [-v] <paths>

Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME

The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -count /testemp
15/08/05 04:45:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
0 1 678 /testemp

Directories: 0

Files: 1

Bytes: 678
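Those three numbers are easy to pull into shell variables when scripting. A sketch using the sample output line from above (on a cluster the line would come from hadoop fs -count /testemp):

```shell
# Split a `hadoop fs -count` output line into named fields.
# Canned sample from the run above; on a cluster:
#   line=$(hadoop fs -count /testemp)
line="0 1 678 /testemp"
set -- $line                 # word-split into positional parameters
dirs=$1; files=$2; bytes=$3; path=$4
echo "dirs=$dirs files=$files bytes=$bytes path=$path"
```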

7. cp

Usage: hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>

Copy files from source to destination. This command allows multiple sources as well in which case the destination must be a directory.

‘raw.*’ namespace extended attributes are preserved if (1) the source and destination filesystems support them (HDFS only), and (2) all source and destination pathnames are in the /.reserved/raw hierarchy. Determination of whether raw.* namespace xattrs are preserved is independent of the -p (preserve) flag.

Options:

  • The -f option will overwrite the destination if it already exists.
  • The -p option will preserve file attributes [topax] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no argument, it preserves timestamps, ownership, and permission. If -pa is specified, permission is preserved as well, because ACL is a super-set of permission. Determination of whether raw namespace extended attributes are preserved is independent of the -p flag.

Example:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -cp /gemp /testinp
15/08/05 04:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
cp: `/testinp': File exists

Let's try the -f option:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -cp -f /gemp /testinp
15/08/05 04:53:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -cat /testinp
15/08/05 04:53:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
123456,John Smith,Sunnyvale, CA
123457,Jane Brown,Mountain View, CA
123458,Tom Little,Mountain View, CA
1 AlfredsFutter kiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constitución 2222 México D.F. 05021Mexico
3 Antonio Moreno Taquería Antonio Moreno Mataderos 2312 México D.F. 05023 Mexico
4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 Berglunds snabbköp Christina Berglund Berguvsvägen 8 Luleå S-95822 Sweden

8. df

Usage: hadoop fs -df [-h] URI [URI ...]

Displays free space.

Options:

  • The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).

Example:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -df -h
15/08/05 04:56:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Filesystem Size Used Available Use%
hdfs://localhost:54310 19.5 G 146.0 M 12.1 G 1%

9. du

Usage: hadoop fs -du [-s] [-h] URI [URI ...]

Displays the sizes of files and directories contained in the given directory, or the length of a file in case it's just a file.

Options:

  • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files.
  • The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).

Example:

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -du -h /
15/08/05 05:01:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
25.6 K /arc
25.7 K /arcout
25.6 K /blocks
25.6 K /blockstest
3.6 M /blocktest

10. dus

Usage: hadoop fs -dus <args>

Displays a summary of file lengths. Note that -dus is deprecated; use -du -s instead, as in the example below.

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -du -s /
15/08/05 05:05:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
148428375 /
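The raw byte total can be converted to a human-readable size by hand (or just pass -h, as shown earlier). A sketch using the total from the run above:

```shell
# Convert the raw byte total reported by `hadoop fs -du -s /` to mebibytes.
bytes=148428375    # canned from the run above
awk -v b="$bytes" 'BEGIN { printf "%.1f M\n", b / (1024 * 1024) }'
```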

11. get

Usage: hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>

Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.

user@ubuntuvm:~/Desktop/hadoop$ hadoop fs -copyToLocal /usr /ttt

(The -copyToLocal command shown here behaves the same way; hadoop fs -get /usr /ttt would be equivalent.)

This concludes the first part of basic HDFS shell commands.

Hope you guys liked it.

Will add Part 2 very soon.

Cheers 🙂
