L1 - Technical
Introduce yourself.
What is your project?
What are your source data types (CSV/RDBMS), and how do you get the data?
How big was the client cluster?
What was the data size?
Was the load daily, weekly, or monthly?
Why did the client select Hadoop rather than an RDBMS?
Which tool did you use for workflow?
What is staging in Spark?
What is an RDD?
What is the intention behind lazy evaluation?
What is the intention behind keeping RDDs immutable (unable to be updated)?
You have written multiple transformations on your RDD but have not yet fired any action. What will your Spark web UI look like?
Suppose you fire an action on an RDD. What exactly happens internally in Spark? (Here I explained that Spark walks backward through the lineage graph one step at a time to find the RDDs it needs, computes the first RDD, and then works forward again to the action.)
Which are the transformations in Spark?
I have given you an RDD. How will you convert it to a paired RDD using its first element as the key?
    ans - RDD2 = RDD1.map(lambda x: (x[0], x))
What is the difference between Hadoop 2.x and 1.x?
What is the HA concept?
What if the NameNode fails? What do you do, and who handled it in your project?
What is the heartbeat concept?
I have a 500 MB file on Hadoop 2.x. How many blocks and replicas will there be?
I have a file:
    home_id product meter
    h1      p1      20
    h1      p2      30
    H2      p2      23
I want to create partitions with home_id as the key. How will you do it on the local file system without using Spark, Hive, or MapReduce? Use a simple programming language like Java or Python. Later, how will you do it in Hive and Spark?
I have a sorted 3x3 array:
    1  3  5
    7  8  9
    11 15 18
Write a program so that if the user passes any element from the terminal, it returns its exact position in the array. (I did it as below.)
    a = [(1, 3, 5), (7, 8, 9), (11, 15, 18)]
    x = int(input())  # user input
    for i in range(3):      # row index, 0-based
        for j in range(3):  # column index, 0-based
            if x == a[i][j]:
                print('location of x in %d %d' % (i, j))

L2 - Technical
There is a file:
    Name  id
    Ajay  1
    Ram   2
    Ajay  3
    Ram   4
    Jack  6
    Devid 7
ID is unique and Name may be repeated. Write a program so that when the user enters the name 'ajay', the program returns the list of IDs: [1, 3]. Input Ram: output [2, 4].
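One possible answer to the name-to-IDs question is sketched below. It assumes the file is whitespace-separated with a header row, and that matching should ignore case (the question enters 'ajay' while the file stores 'Ajay'). The function names build_index and ids_for_name are illustrative only, and the sample rows are inlined instead of being read from disk.

```python
from collections import defaultdict


def build_index(lines):
    """Map each lower-cased name to the list of IDs seen for it, in file order."""
    index = defaultdict(list)
    for line in lines:
        parts = line.split()
        # Skip the header row and any blank or malformed line.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, id_text = parts
        index[name.lower()].append(int(id_text))
    return index


def ids_for_name(index, name):
    """Return the IDs for a name, or an empty list if the name is unknown."""
    return index.get(name.strip().lower(), [])


# Sample rows from the question (assumed layout: "Name id" header, then data).
sample = ["Name id", "Ajay 1", "Ram 2", "Ajay 3", "Ram 4", "Jack 6", "Devid 7"]
index = build_index(sample)

print(ids_for_name(index, "ajay"))  # [1, 3]
print(ids_for_name(index, "Ram"))   # [2, 4]
```

The same group-by-key idea (a dictionary from key to a list of rows) is one way to approach the home_id partitioning question in plain Python, with each group then written to its own file or directory.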
Big Data Internship Interview Questions
1,784 big data internship interview questions shared by candidates
Spark questions, bucketing and partitioning, Hive-related questions, difference between tuple and list
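For the tuple-versus-list question, a short Python sketch can show the usual talking points: a list is mutable, a tuple is not, and only the tuple can be used as a dictionary key. The variable names here are illustrative.

```python
items = [3, 4]   # list: mutable
point = (3, 4)   # tuple: immutable

items.append(5)  # lists can grow in place

# Assigning to a tuple element raises TypeError.
try:
    point[0] = 10
    tuple_mutable = True
except TypeError:
    tuple_mutable = False

# A tuple of hashable values can be a dictionary key; a list cannot.
lookup = {point: "seen"}
try:
    bad = {items: "seen"}
    list_hashable = True
except TypeError:
    list_hashable = False

print(items, point, tuple_mutable, list_hashable)
```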
Write a program to check whether two strings are anagrams.
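A minimal sketch of one common answer, comparing character counts. Ignoring case and whitespace is an assumption here; the question as reported does not say how those should be treated. The function name is_anagram is illustrative.

```python
from collections import Counter


def is_anagram(first, second):
    """Return True if both strings use exactly the same characters.

    Assumption: case and whitespace are ignored.
    """
    def normalize(text):
        return [ch.lower() for ch in text if not ch.isspace()]

    return Counter(normalize(first)) == Counter(normalize(second))


print(is_anagram("listen", "silent"))
print(is_anagram("hello", "world"))
```

Counting characters runs in linear time. Sorting both strings and comparing them is an equally valid answer and is often what interviewers expect first.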
A lot of conceptual questions about when and why I would use certain technologies. A lot of questions about past experience and why I had done certain things in certain scenarios.
SQL queries
Architecture of project
Questions about additional things to take care of when moving my code into production and how the design could be improved; test-driven and domain-driven development; Kafka; the difference between list and set; GraphQL; graph databases; REST APIs and design considerations; etc.
Basic programming fundamentals, such as C, Java, OOP, etc.
A simple algorithm question which I don't remember
What is your experience with big data?