Coding for Scientists: My Personal Journey and Recommendations
Have you ever considered learning to code but felt unsure where to begin? In this article, I'll share my coding journey and offer recommendations to help you get started. Learning to code has significantly helped my career, and I hope the lessons from my experience will benefit you as well.
Before we dive in, allow me to make a case for coding—if you're not already convinced of its value. Coding skills can dramatically increase your efficiency, enabling you to perform tasks much faster than non-coders. Moreover, coding proficiency, particularly in data science, can be invaluable when designing experiments, especially those involving multiple controls and replicates. It can also empower you to confidently tackle complex statistical analyses. But the benefits don't stop there. With coding skills, you'll find conversations with your informatics colleagues more engaging and productive. Terms like Machine Learning and AI will no longer be intimidating buzzwords but familiar concepts you can discuss and apply. Perhaps most importantly, coding is enjoyable and can open up career opportunities you might never have imagined.
The beginnings
My coding journey began on a quiet evening around eight years ago when I decided, almost on a whim, to learn Python. At the time, I was primarily involved in organic synthesis, a field where coding skills weren't typically required. However, Python was gaining popularity among my colleagues, which piqued my curiosity. In retrospect, if I had meticulously listed reasons to learn coding, I might not have found many compelling arguments and could have abandoned the idea altogether. Fortunately, I didn't overthink it. I simply enrolled in courses at the University of Cambridge, where I was working, and learned the basics of Python and R.
Despite this initial foray, my interest waned quickly, and coding fell by the wayside for the next few years. Then came the pandemic. The first lockdown provided an abundance of free time, reigniting my interest in coding. This time, I approached it with more determination, dedicating 6-10 hours daily for several months. Without a specific goal in mind, I explored two distinct programming domains: web development and data science. Web development was captivating—watching lines of code transform into visually appealing websites was truly satisfying. However, data science resonated more deeply with me, perhaps due to my fondness for mathematics during my school years. This revelation led me to delve deeper into various aspects of data science, including machine learning and AI. The momentum I gained during this period has sustained my interest, and I now dedicate several hours each week to expanding my knowledge in data science and computer science.
Recommendations
Having watched countless YouTube videos and completed numerous online courses, I've distilled my experience into a concise list of recommendations. My goal is to save you time by highlighting the most valuable resources I've encountered. I've intentionally limited the options for each topic to provide a clear path for your coding journey, avoiding the overwhelming nature of many recommendation articles.
1. Start with FreeCodeCamp, which offers an excellent introductory Python course. The instructor, Dr. Chuck (Charles Severance), also teaches on Coursera.
2. For YouTube content, focus on quality channels like CS Dojo and Corey Schafer, which offer solid introductory and intermediate Python tutorials.
3. Don't obsess over memorising every piece of syntax. Focus on grasping the main concepts of the programming language. You can always look up specific syntax later when needed.
4. Familiarise yourself with essential Python libraries for data manipulation and analysis: Pandas, NumPy, Matplotlib, Seaborn, and Plotly. Instead of attempting to master every detail, understand the core concepts and refer to documentation as needed. Begin with Kaggle's tutorials on these libraries. The best way to learn is by applying them to your own data.
5. If data science continues to intrigue you, enrol in Andrew Ng's Machine Learning course on Coursera.
6. To grasp machine learning concepts, you'll need a basic understanding of linear algebra and calculus. Khan Academy offers excellent resources on these topics. I also highly recommend the YouTube channel 3Blue1Brown for engaging videos on relevant mathematical concepts. If you're looking to deepen your understanding of the mathematical foundations of machine learning, Coursera offers an excellent course from Imperial College London on mathematics for machine learning. However, my recommendation is to start by grasping the basic concepts and applying ML models to your data first. This hands-on experience will provide valuable context, making it easier to understand the underlying mathematics when you revisit it later.
7. After completing Andrew Ng's course, implement these concepts in Python using the scikit-learn library. While its documentation is comprehensive, I highly recommend Aurélien Géron's book, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, which covers both the concepts and the Python implementation of machine learning and deep learning. Another really good book on the topic is Deep Learning with Python by François Chollet.
8. If machine learning captivates you, progress to Deep Learning. Géron's and Chollet's books cover this as well. Additionally, enrol in the Deep Learning Specialization by Andrew Ng on Coursera, one of the best courses on the topic.
9. To excel in data science, you'll need a solid foundation in statistics and probability. I recommend the first few chapters of "Practical Statistics for Data Scientists", which includes practical Python coding examples to illustrate statistical concepts.
10. Finally, practice coding regularly. Websites like HackerRank and LeetCode are great for honing your Python syntax skills. Kaggle is an excellent platform to learn from other data scientists' work and contribute your own projects.
Note: While Andrew Ng's Coursera courses on Machine Learning and Deep Learning can be audited for free, I recommend purchasing them to access all assignments. They're worth the investment.
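To make the advice about applying the libraries to your own data more concrete, here is a minimal sketch of the kind of first analysis you might try with Pandas. The sample names and absorbance readings are invented for illustration; replace them with your own measurements.

```python
import pandas as pd

# Hypothetical assay readings: three replicates each for a control and a treated sample.
data = pd.DataFrame(
    {
        "sample": ["control"] * 3 + ["treated"] * 3,
        "absorbance": [0.50, 0.52, 0.48, 0.80, 0.78, 0.82],
    }
)

# Group the replicates by sample and compute the mean, the sample standard
# deviation, and the number of replicates in each group.
summary = data.groupby("sample")["absorbance"].agg(["mean", "std", "count"])
print(summary)
```

A few lines like these replace a fair amount of manual spreadsheet work, and the same pattern scales from six readings to many thousands.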
Chemistry folks
For my chemistry colleagues, I highly recommend learning RDKit, a Python library for cheminformatics. Begin with the 'Getting Started' documentation on the RDKit website. This comprehensive guide provides an excellent foundation for understanding and using the library. Then follow these three outstanding blogs to deepen your knowledge of cheminformatics: a) Practical Cheminformatics by Patrick Walters, b) Is life worth living?, and c) the RDKit blog.
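As a small taste of what RDKit looks like in practice, the sketch below parses a few SMILES strings and computes two simple properties. The molecules are chosen purely for illustration, and the snippet assumes RDKit is already installed in your Python environment.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# SMILES strings chosen purely for illustration.
smiles_by_name = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

results = {}
for name, smiles in smiles_by_name.items():
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        continue
    results[name] = {
        "heavy_atoms": mol.GetNumAtoms(),  # hydrogens are implicit, so not counted
        "mol_wt": Descriptors.MolWt(mol),  # average molecular weight
    }

for name, props in results.items():
    print(f"{name}: {props['heavy_atoms']} heavy atoms, MW = {props['mol_wt']:.2f}")
```

Checking for `None` after parsing is a habit worth forming early, since real datasets often contain a few malformed structures.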
Sharing Your Code with Non-Coders
As you progress in data science, you'll likely want to share your analysis pipeline with non-coding colleagues. This is where a basic understanding of web development becomes invaluable. For Python users, Django and Flask are the primary web development frameworks. However, if you find these have a steep learning curve, lighter libraries like Dash and Streamlit offer excellent alternatives for quickly creating apps to showcase your analysis pipeline.
In my work, I have used Dash to share data analysis workflows with colleagues, enabling them to replicate the same analysis on their projects. Dash's well-written documentation is sufficient to build an attractive web app, eliminating the need for additional courses. However, if you're keen on comprehensive web development skills, start with FreeCodeCamp's introductory courses on HTML, CSS, and JavaScript.
It's important to note that coding extends far beyond data science, and we've only scratched the surface here. Your coding journey can take many directions, and I encourage you to explore various avenues. Alongside data science, I ventured into web development and found it immensely enjoyable. The key is to start somewhere, establish a routine, and focus on mastering at least one programming language. In today's digital world, where computers are integral to almost every aspect of our lives, coding skills are advantageous in any career. If you haven't begun learning to code yet, there's no better time to start than today. I wish you the best of luck on your coding journey!
Finally, we're excited to announce that we are developing a comprehensive course designed to teach coding for scientists. It will build on the principles and recommendations outlined in this article, providing a structured learning path for scientific professionals. If you're interested in taking your coding skills to the next level with guidance tailored to scientists, we encourage you to subscribe.