Reading Notes: 'Data Management in 30 Minutes' by Yuzutaso and Haseryo

Tadashi Shigeoka · Sat, October 24, 2020

I read ‘Data Management in 30 Minutes’ and want to share the insights I gained from it.


Background: Want to Get Started with Data Management

Because my career has been as a software engineer, I’m not very familiar with data management and have long wanted to learn about it. When I found this book on Kindle Unlimited, I picked it up.

Table of Contents & End-of-Chapter Checklists

【01: Data Architecture】
・Is there an architecture diagram from data generation to business utilization?
・Is there an improvement plan for the architecture?

【02: Data Storage and Operations (DB)】
・Are you properly storing data throughout its entire lifecycle?
・Are database and storage maintenance operations sufficiently stable?

【03: Data Integration and Interoperability (ETL)】
・Can you safely provide data in the required format and timing?
・Are the costs and complexity of data integration sufficiently reduced?

【04: Data Modeling and Design (ER)】
・Can you describe data relationships at conceptual, logical, and physical levels?
・Are data model updates and references sufficiently efficient?

【05: Master Data Management】
・Can master data be used as an official source throughout the company?
・Are master data updates and references sufficiently efficient?

【06: Document and Content Management】
・Can you effectively and efficiently accumulate, search, and use documents?

【07: Data Security】
・Is compliance with privacy and confidentiality regulations and policies sufficient?
・Are access restrictions and auditing for privacy and confidentiality sufficient?

【08: Data Quality Management】
・Can you define data quality (service level: SLA) standards, requirements, and specifications?
・Can you continuously implement definition updates, measurement, reporting, and improvement?

【09: Data Warehousing (DWH) and Business Intelligence (BI)】
・Can data users effectively and efficiently perform analysis and decision-making?

【10: Metadata Management】
・Can you sufficiently collect and integrate metadata from various data sources?
・Can you provide standard methods for accessing metadata?

【11: Data Governance】
・Are role assignments and authority grants for managing data assets sufficient?
・Is there stakeholder agreement on rules, policies, processes, evaluations, tools, and responsibilities for data management?
・Is data utilization proceeding as planned?

Below are quotes and notes from sections that left an impression.

Chapter 7: Data Security

The biggest concern was whether data in the cloud can be protected to the same level as on-premises. If you use the user restriction and permission management features the cloud service provides, you can achieve at least a baseline level of data protection. It is also important to consider not only proactive measures (prevention) but also reactive measures (investigation), and many cloud services make audit logs easy to obtain. Failures caused by operational mistakes can never be ruled out entirely, but the same is true on-premises.

Learning these practical perspectives on data security was valuable.

Secure DWH Containing Personal Information

To keep data easy to reference, we want to avoid including personal information in the DWH. On the other hand, some projects do need to handle sensitive data. For such use cases, the author often designs the DWH with the following dual structure.

Preparing a Secure Environment

Business DB (original data)

↓ Copy

DWH with personal information

↓ Masking

DWH without personal information

Designing the DWH as a dual structure, one copy with personal information and one without, is a realistic approach in practice, so I’d like to adopt it.
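To make the copy and masking steps concrete, here is a minimal Python sketch. It is my own illustration, not the author's implementation: the column names (`customer_id`, `name`, `email`, `phone`, `amount`), the choice to drop direct identifiers outright, and the use of an HMAC-SHA256 keyed hash to pseudonymize `customer_id` are all assumptions. In-memory lists of dicts stand in for the business DB and the two DWHs.

```python
import copy
import hashlib
import hmac

# Hypothetical columns treated as direct personal information (dropped on masking).
PII_COLUMNS = {"name", "email", "phone"}
# Hypothetical identifier kept joinable by replacing it with a keyed hash.
PSEUDONYMIZE_COLUMNS = {"customer_id"}


def copy_to_secure_dwh(business_rows):
    """Step 1 (Copy): copy rows from the business DB as-is, personal info included."""
    return copy.deepcopy(business_rows)


def pseudonymize(value, key):
    """Replace an identifier with an HMAC-SHA256 hex digest so tables can still be joined."""
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def mask_rows(secure_rows, key):
    """Step 2 (Masking): build the DWH without personal information."""
    masked = []
    for row in secure_rows:
        new_row = {}
        for column, value in row.items():
            if column in PII_COLUMNS:
                continue  # drop direct identifiers entirely
            if column in PSEUDONYMIZE_COLUMNS:
                new_row[column] = pseudonymize(value, key)
            else:
                new_row[column] = value  # non-personal columns pass through unchanged
        masked.append(new_row)
    return masked


if __name__ == "__main__":
    business_db = [
        {"customer_id": 1, "name": "Alice", "email": "alice@example.com", "amount": 1200},
        {"customer_id": 2, "name": "Bob", "email": "bob@example.com", "amount": 800},
    ]
    # Hypothetical key; in practice it would live in a secrets manager.
    secret_key = b"replace-with-a-managed-secret"
    secure_dwh = copy_to_secure_dwh(business_db)
    public_dwh = mask_rows(secure_dwh, secret_key)
    for row in public_dwh:
        print(row)
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the mapping by hashing guessed IDs. Note that pseudonymized IDs may still be treated as personal information under some regulations, so access to the masked DWH may also need review.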

Thoughts After Reading

The book let me quickly take in a broad overview of data management, so I recommend it as an introductory read.

Now that I’ve picked up the key terms from this book, I want to put data management into practice and broaden the scope of my learning.

Reference Examples of Data Management

Yuzutaso’s talk at Machine Learning Casual Talk includes clear examples, so I recommend reading it alongside the book.

That’s all from the Gemba, where I want to continue learning about data management.