Schema of a view is not changed even though the schema of underlying table has been changed

0 votes
2 views
asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary
Applies To
Symptoms

The schema of the original table is changed, but the schema of the view is not changed accordingly
This problem can be shown with below commands:

> create table customers_temp1 ( id int, name string );
Query: create table customers_temp ( id int, name string

> create view view_customers_temp as select * from customers_temp;
> describe extended view_customers_temp;
Query: describe extended view_customers_temp
+------------------------------+----------------------------------------+----------------------+
| name | type | comment |
+------------------------------+----------------------------------------+----------------------+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | int | NULL |
| name | string | NULL |
| | NULL | NULL |

> alter table customers_temp CHANGE id id string;
Query: alter table customers_temp CHANGE id id string

> describe extended customers_temp;
Query: describe extended customers_temp
+------------------------------+---------------------------------------------------------------------------------------+----------------------+
| name | type | comment |
+------------------------------+---------------------------------------------------------------------------------------+----------------------+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | string | NULL |
| name | string | NULL |
| | NULL | NULL |

> describe extended view_customers_temp;
Query: describe extended view_customers_temp
+------------------------------+----------------------------------------+----------------------+
| name | type | comment |
+------------------------------+----------------------------------------+----------------------+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | int | NULL | <<< this is not changed.
| name | string | NULL |
| | NULL | NULL |



 

Cause

This behavior is by design.

Even though view is just a query, it also has its own storage description in SDS table of HMS database.
For example, assume below queries are executed in Hive:

create database c133191;
use c133191;
create table t (a int, b string);
create view customer_temp as select a from c133191.t;


Two records will be added into TBLS table in HMS database:

+--------+-------------+-------+------------------+-------+-----------+--------+----------+--------------+-----------------------------------+-------------------------+----------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
+--------+-------------+-------+------------------+-------+-----------+--------+----------+--------------+-----------------------------------+-------------------------+----------------+
| 43466 | 1490873827 | 43400 | 0 | hive | 0 | 131156 | t | MANAGED_TABLE | NULL | NULL | NULL |
| 43467 | 1490873852 | 43400 | 0 | hive | 0 | 131157 | v | VIRTUAL_VIEW | select `t`.`a` from `c133191`.`t` | select a from c133191.t | NULL |
+--------+-------------+-------+------------------+-------+-----------+--------+----------+--------------+-----------------------------------+-------------------------+----------------+

When the "alter table" command is executed, the columns linked to table customer_temp1 SD are changed, but the columns linked to view SD are not changed.
This is why "describe" command returns a wrong data type.

This problem can be shown even more obviously with below commands:

>alter table t change column a new_id bigint;
>desc t;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| new_id    | bigint     |          |
| b         | string     |          |
+-----------+------------+----------+--+
>desc customer_temp;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| a         | int        |          |
+-----------+------------+----------+--+

As you can see, the "describe" command still shows the old column name and type, not the changed one.

It is very hard for Hive to change view SD automatically. Currently Hive doesn't track all the views defined on top a table. That is, when a table is changed, Hive doesn't know what views should be changed. Actually, after table is altered, the view definition query could become invalid now, but Hive doesn't know it.

Instructions

This is an expected behavior. Please check this link for detail information :
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView

To work around this problem, please drop and recreate the view after changing the underlying table

Please log in or register to answer this question.

...