General Computer Science questions
Hash Tables
What is a hash table, and how does it work?
- A hash table is a data structure that maps keys to values using a hash function to compute an index into an array of buckets.
- It allows for fast lookups, inserts, and deletions.
Time Complexity:
- Average Case: ( O(1) ) for insertion, search, and delete.
- Worst Case: ( O(n) ) for all three operations when all keys map to the same bucket due to collisions.
What is a hash collision, and how can it be handled?
- A hash collision occurs when two different keys hash to the same index.
- Collision Handling Techniques:
- Chaining: Use linked lists to store multiple keys in the same bucket.
- Open Addressing: Probes the next available slot in the array.
What is a good hash function?
- Uniformly distributes keys across the hash table.
- Minimizes collisions.
- Efficient to compute.
Heap
What is a heap, and how is it used?
- A heap is a specialized binary tree used to maintain a priority queue.
- Min-Heap: The smallest element is at the root.
- Max-Heap: The largest element is at the root.
Time Complexity:
- Find Min/Max: ( O(1) )
- Insert/Delete: ( O(\log n) )
- Building a Heap: ( O(n) )
- Heap Sort: ( O(n \log n) )
BST vs. Heap
- Heap: Ensures the highest/lowest priority element is always at the root.
- BST: Maintains an ordered structure where the left child is smaller and the right child is larger than the parent.
Time Complexity:
- Search in BST:
- Average: ( O(\log n) )
- Worst Case (Unbalanced): ( O(n) )
SQL Concepts
What is the difference between a primary key and a unique key?
- Primary Key: Uniquely identifies a row and cannot have
NULL
values. - Unique Key: Ensures uniqueness but allows one
NULL
value.
What is the purpose of indexing in a database?
- Indexes improve the speed of data retrieval by allowing faster row lookups compared to scanning the entire table.
- Especially useful for large datasets.
What is normalization in SQL?
- The process of organizing a database to:
- Reduce redundancy.
- Improve data integrity.
- Achieved by dividing large tables into smaller ones and defining relationships between them.
What is a foreign key in SQL?
- A column (or group of columns) that establishes a link between data in two tables.
- Enforces referential integrity by ensuring a value in one table matches a value in another.
What is the difference between INNER JOIN and OUTER JOIN?
- INNER JOIN: Returns rows with matching values in both tables.
- OUTER JOIN:
- LEFT OUTER JOIN: Includes all rows from the left table, with
NULL
for unmatched rows in the right table. - RIGHT OUTER JOIN: Includes all rows from the right table, with
NULL
for unmatched rows in the left table. - FULL OUTER JOIN: Includes all rows from both tables, with
NULL
for unmatched rows.
- LEFT OUTER JOIN: Includes all rows from the left table, with
Big O Notation
- ( f(x) = O(g(x)) ) if there exist constants ( M > 0 ) and ( n > 0 ) such that: [ |f(x)| \leq M \cdot |g(x)| \text{ for all } x > n. ]
Linked List vs. Array
| Operation | Linked List | Array | |——————————|—————–|——————–| | Access by Index | ( O(n) ) | ( O(1) ) | | Search (Sorted) | ( O(n) ) | ( O(\log n) ) | | Insertion/Deletion at Head | ( O(1) ) | ( O(n) ) (may require resizing) | | Insertion/Deletion at Index | ( O(n) ) | ( O(n) ) |
C++ Concepts
Key Features
- Speed and Low Latency: Ideal for real-time risk analysis and pricing.
- Memory Allocation:
malloc()
andfree()
for heap memory (manual management).- Classes typically use static memory (automatically managed).
Multithreading:
- Unlike Python, C++ does not have a Global Interpreter Lock (GIL), enabling true multithreading.
Namespaces:
- Avoid name collisions by organizing code logically.
Object-Oriented Programming (OOP):
- Inheritance: Base and derived classes.
- Polymorphism: Runtime behavior via
virtual
functions. - Encapsulation: Hiding data within classes.
- Abstraction: Simplifying complex systems.
Structs:
- Groups different data types into a single entity.
Floating-Point Types:
float
: 32-bit precision.double
: 64-bit precision.
Virtual Functions:
- Enable runtime polymorphism and allow overriding parent methods.
Pure virtual function
- a virtual function that is declared within a base class but has no implementation in that class
pure virtual vs virtual:
Aspect | Pure Virtual | Virtual |
---|---|---|
Definition | Declared with = 0 , no implementation in the base class. |
Can have an implementation in the base class. |
Syntax | virtual void func() = 0; |
virtual void func() { ... }; |
Aspect | Pure Virtual | Virtual |
---|---|---|
Primary Purpose | Defines an interface in the base class. | Provides a default behavior in the base class. |
Requirement | Derived classes must override the function unless the derived class is also abstract. | Derived classes may override the function, but it is not required. |
Base Class Usage | Makes the base class abstract (non-instantiable). | The base class can still be instantiated. |
Polymorphism | Ensures all derived classes implement the same interface. | Allows derived classes to optionally customize behavior. |
hiding in C++
Derived class also define function in same name as base class (just like polymorphism with virtual).
but provide a new definition in the derived class that is unrelated to the base class member.
The function in base class is just void, not virtual.
Hiding uses compile-time binding, virtual functions uses runtime binding.
memory leak, memory overhead, memory overflow, memory corruption
- Memory leak: Program fails to release memory, should free memory after use
- Memory overhead: Program requiring extra memory (e.g. dynamic memory allocation)
- Memory overflow: Program uses more memory than available
(e.g. stack overflow, call stack reaches max recursion depth) - Memory corruption: Program modifies memory it does not own
Can lead to crashes and data loss
Python
-
How to handle exceptions?
Usetry
,except
,else
, andfinally
blocks to catch and handle exceptions. - Difference between
deepcopy
andshallowcopy
?- Deepcopy: Copies all objects recursively.
- Shallowcopy: Only copies the reference for nested objects.
- Pandas
- How is a Pandas DataFrame different from an SQL table?
Both represent tabular data, but a Pandas DataFrame is an in-memory object for data manipulation in Python, while an SQL table is part of a relational database.
- How to detect missing data
isnull()
- Difference between loc and iloc?
loc uses column/row labels
iloc uses integet indices
- concat vs merge vs join
concat(): Stacks DataFrames along rows or columns.
merge(): Combines DataFrames based on common columns or indices, similar to SQL joins.
join(): Combines DataFrames based on their indices.
-
Multithreading vs Multiprocessing
Excel
- Relative, absolute, and mixed references?
- Relative: Adjusts as you copy (
A1
). - Absolute: Fixed reference (
$A$1
). - Mixed: Fixes row or column (
$A1
orA$1
).
- Relative: Adjusts as you copy (
-
Conditional formatting?
Apply rules to highlight cells based on criteria, like highlighting values > 100. - Common functions?
VLOOKUP
/HLOOKUP
: Find values in rows/columns.INDEX-MATCH
: Dynamic lookups.IF
: Conditional logic.SUMIF
: Sum based on a condition.
-
Pivot tables?
Summarize, group, and analyze data dynamically by dragging fields into rows, columns, and values. - Data validation?
Restricts user input in cells, e.g., dropdown lists or numeric limits.
VBA
-
What is VBA?
A programming language in Excel used for automation, macros, and creating custom functions. - Key VBA objects?
- Workbook: The Excel file.
- Worksheet: A sheet in the workbook.
- Range: A cell or group of cells.
-
How to debug VBA code?
Use breakpoints, the Immediate Window, and theDebug.Print
method. -
How to run a macro?
Use the Developer tab > Macros or assign it to a button. - VBA for data analysis?
Automates repetitive tasks like importing data, formatting, and creating charts.
SQL
- Primary key vs unique key?
- Primary Key: Ensures uniqueness and no
NULL
values. - Unique Key: Ensures uniqueness but allows one
NULL
value.
- Primary Key: Ensures uniqueness and no
e.g. For a table of students, StudentID can be primary key
(must-have, unique, cannot be null)
Student_name can be unique key
also unique but can be null
-
What is indexing?
Improves query speed by creating a data structure for fast lookups on columns. -
What is normalization?
Organizing data to reduce redundancy and ensure integrity by dividing tables logically. -
What is a foreign key?
Links data between tables, enforcing that values in one table exist in another.
e.g. In another table called Course_Enrolment, can haveStudentID
columns referencing students taking this course -
INNER JOIN vs OUTER JOIN?
- INNER JOIN: Returns only matching rows.
- OUTER JOIN: Includes unmatched rows with
NULLs
.
R
-
What is R used for?
Statistical computing, data visualization, and machine learning. - What are R’s data structures?
- Vectors
- Matrices
- Data Frames
- Lists
-
How to install packages in R?
Useinstall.packages("package_name")
andlibrary(package_name)
to load. -
What is ggplot2?
A popular R library for creating customizable visualizations. - How to handle missing data?
Usena.omit()
to remove orimpute()
to fill missing values.