Loading Data from Files into Tables | HiveServer2 and Sentry

0 votes
0 views
asked Aug 30, 2017 in Hadoop by admin (4,410 points)
SummaryAllow HiveServer2/Sentry/HDFS to LOAD DATA from files into tables

Symptoms

HDFS File Permissions (HiveServer2):

2016-01-05 13:41:01,643 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-160]: Failed with exception Unable to move source hdfs://namenode:8020/user/test/data.txt to destination hdfs://namenodel:8020/user/hive/warehouse/test
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://namenode:8020/user/test/data.txt to destination hdfs://namenode:8020/user/hive/warehouse/test
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2612)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2855)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1657)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:312)
. . .

Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/user/test":test:test:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:148)
at org.apache.sentry.hdfs.SentryAuthorizationProvider.checkPermission(SentryAuthorizationProvider.java:174)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:3890)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3860)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3825)


HDFS File Permissions (Hue):

[05/Jan/2016 10:40:59 -0800] beeswax_install_examples INFO Loading data into table "sample_07"
[05/Jan/2016 10:40:59 -0800] thrift_util DEBUG Thrift call: /  
<class 'TCLIService.TCLIService.Client'>  /  
.ExecuteStatement  /  
(args=(TExecuteStatementReq(confOverlay={},   /  
sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='\x1c\xaa\xff\xfa\xbb0L\xf7\x84^\x1e\xee\xef\xbaN\xd3', /  
guid='K\x02\xc6\x93\x99\x81H\xc1\xb3\x16\t\xa0\xd2\xe3<\x81')), runAsync=True, statement="LOAD DATA INPATH\n '/user/test/data.txt' /  
OVERWRITE INTO TABLE sample_07"),), kwargs={})
[05/Jan/2016 10:41:00 -0800] thrift_util DEBUG Thrift call /  
<class 'TCLIService.TCLIService.Client'>.GetOperationStatus returned in 201ms: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), /  
operationState=5, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask', sqlState='08S01', errorCode=1)


Sentry Privileges (beeline):

Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 User [name] does not have privileges for LOAD
 The required privileges: Server=server1->URI=hdfs://namenode:8020/user/test/data.txt->action=*; (state=42000,code=40000)
Applies To
  • Hive
  • Sentry
  • Hue
  • CDH 5.3 and higher
Cause

When working in a kerberized clustered environment there are a couple of considerations when issuing a Hive LOAD DATA statement.

  • HDFS File Permissions
  • Sentry Privileges

HDFS

With Apache Sentry enabled, HiveServer2 must have impersonation disabled.  When impersonation is disabled, all HDFS actions from HiveServer2 are performed as the 'hive' user. The LOAD DATA statement performs an HDFS move command on the source file to place it into the table's HDFS directory.  The 'hive' user must have the correct permissions on the source directory, the source file, and the target directory to perform this move.

Sentry

Sentry will examine the LOAD DATA statement source file and the destination table.  The privileges required are INSERT on the destination table and URI on the source file.

Instructions
  1. Grant URI permissions in Sentry to the active user for the user's home directory (/user/<username>).  If there are other directories in HDFS that the user needs to load data from, the user must also be granted URI permissions in Sentry for them.
     
    GRANT ALL ON URI "hdfs://namenode:8020/user/<name>/" TO ROLE <role>;
    
    NOTE: The /tmp uses a sticky bit meaning the Hive user can never move data owned by another user out of /tmp.  LOAD DATA from files in /tmp will not work.
    
    NOTE: Because the NameNode host and port must be specified, Cloudera strongly recommends you use High Availability (HA). This ensures that the URI will remain constant even if the NameNode changes.
    
    NOTE: Starting with CDH 5.8, if the GRANT for Sentry URI does not specify the complete scheme, or the URI mentioned in Hive DDL statements does not have a scheme, Sentry automatically completes the URI by applying the default scheme based on the HDFS configuration provided in the fs.defaultFS property. Using the same HDFS configuration, Sentry can also auto-complete URIs in case the URI is missing a scheme and an authority component.  Prior to CDH 5.8, fully qualified URIs are required.
    
    Since Sentry supports more than one scheme (hdfs, s3) and implements auto-complete, Cloudera recommends that you specify the fully qualified URI in GRANT statements to avoid confusion.
    
    
    
  2. Add HDFS ACLs to allow the 'hive' user access to all directories in /user and any other directories users will load data from.  This will address the HDFS file permission concerns pertaining to HiveServer2's HDFS move operation.  An ACL must be applied to all existing files and directories as well a 'default' ACL which will be applied to all future user files and directories.
# Perform these HDFS actions as the 'hdfs' superuser

# Set default permissions
hdfs dfs -setfacl -m -R default:user:hive:rwx /user

# Set current permissions
hdfs dfs -setfacl -m -R user:hive:rwx /user


Note:  Change /user to any other directories that require ACLs.

Please log in or register to answer this question.

...