Big Data Tools & Techniques For MSc Case Study Solution


A health service wants to understand better how and which diagnoses are made, with an eventual view to optimizing tests such as imaging and hearing assessments. As data analysts, we need to find: the 5 most common diagnosis codes with their frequencies; 4 patients from the hearing evaluation data with their frequencies; the highest number of diagnoses assigned to a single patient; the total number of people with a hearing problem, compared with the number of people who had a hearing evaluation; the total number of patients with hearing loss, averaged by CT/MR/SC; and the clients with the greatest number of CTs.

Description of the Dataset

The given dataset is based on real medical data and contains (non-textual) information about encounters between patients and medical professionals. The information is spread across a number of separate files:


The imaging file contains information about appointments at which a magnetic resonance (MR), computed tomography (CT) or another type of scan (SC) may have been carried out.

Hearing evaluation:

It comprises the results of hearing evaluations only, with information such as the “severity of hearing loss” and whether any deafness affects only one ear or both ears (“unilateral/bilateral”).
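As a rough illustration of how such records could be summarized, the sketch below counts how many distinct patients were evaluated, how many show some hearing loss, and how those split between unilateral and bilateral. The field names, severity labels and sample rows are all assumptions made for the example; the real file layout is not shown in this sample.

```python
from collections import Counter

# Hypothetical hearing-evaluation rows: (patient_id, severity, laterality).
# Field names and values are illustrative assumptions, not the real schema.
evaluations = [
    ("p1", "mild", "unilateral"),
    ("p2", "none", ""),
    ("p3", "severe", "bilateral"),
    ("p1", "moderate", "unilateral"),
    ("p4", "profound", "bilateral"),
]

def summarize_hearing(rows):
    """Return (patients evaluated, patients with hearing loss, laterality counts)."""
    evaluated = {pid for pid, _, _ in rows}
    with_loss = {pid for pid, severity, _ in rows if severity != "none"}
    # Count each patient once per laterality value, ignoring repeat visits.
    unique_pairs = {(pid, lat) for pid, sev, lat in rows if sev != "none"}
    laterality = Counter(lat for _, lat in unique_pairs)
    return len(evaluated), len(with_loss), laterality

n_evaluated, n_loss, by_side = summarize_hearing(evaluations)
print(n_evaluated, n_loss, dict(by_side))
```

Using sets of patient identifiers keeps repeat visits by the same patient from inflating the counts, which matters when comparing people evaluated with people who have a hearing problem.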


The diagnosis file lists patients alongside the diagnosis codes assigned to them. The information includes the age at diagnosis.
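Two of the questions (the most common diagnosis codes and the highest number of diagnoses for a single patient) reduce to frequency counts over patient and code pairs. A minimal sketch in plain Python follows, assuming each record can be read as a (patient_id, diagnosis_code) pair; the identifiers and codes below are made up for illustration and do not come from the dataset.

```python
from collections import Counter

# Hypothetical (patient_id, diagnosis_code) pairs; the codes are made up.
diagnoses = [
    ("p1", "D01"), ("p1", "D02"), ("p2", "D01"),
    ("p3", "D03"), ("p3", "D01"), ("p3", "D02"),
    ("p4", "D04"),
]

def top_codes(rows, n=5):
    """Return the n most frequent diagnosis codes as (code, count) pairs."""
    return Counter(code for _, code in rows).most_common(n)

def busiest_patient(rows):
    """Return (patient_id, count) for the patient with the most diagnoses."""
    return Counter(pid for pid, _ in rows).most_common(1)[0]

print(top_codes(diagnoses))
print(busiest_patient(diagnoses))
```

On the Cloudera VM the same counts would more likely be expressed in a cluster tool, but the logic (group, count, sort descending, take the top rows) stays the same.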

To answer these questions, we first have to set up the environment for the analysis. We are using the Cloudera virtual machine, as recommended in the assessment report.

The virtual machine runs CentOS, a Linux distribution that provides a free, community-supported computing platform compatible with its upstream source, Red Hat Enterprise Linux. We use it for the analysis in our project.

Setup of VM and Creation of Environment

The first step is to set up the environment for the analysis task. We used the Cloudera virtual machine, which runs the CentOS Linux distribution.

Data Extraction Using Tar Command

Because this is a Linux environment, we used the tar -zxvf command to extract the data, which is supplied as a gzip-compressed tar archive…
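For readers who prefer to script this step, the same extraction can be sketched with Python's standard tarfile module instead of the tar command. The flags map roughly as follows: -z is the gzip mode ("r:gz"), -x is the extraction, -v is the printed member list, and -f is the archive path. The archive and file names below are hypothetical, and the demo builds its own tiny archive so that it runs anywhere.

```python
import tarfile
import tempfile
from pathlib import Path

def extract_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive, similar to `tar -zxvf`.

    Returns the list of member names. Only use on archives you trust.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        for name in names:
            print(name)  # mirrors the -v (verbose) flag
        tar.extractall(dest_dir)
    return names

# Demo with a throwaway archive; "data.tar.gz" and "sample.csv" are
# hypothetical names, not the files from the assessment.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    sample = tmp / "sample.csv"
    sample.write_text("patient_id,code\np1,D01\n")

    archive = tmp / "data.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sample, arcname="sample.csv")

    out = tmp / "extracted"
    out.mkdir()
    extracted = extract_archive(archive, out)
    restored = (out / "sample.csv").read_text()
```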

