Biological Data Exploration with Python, Pandas and Seaborn

Biological Data Exploration with Python, Pandas and Seaborn
Author :
Publisher :
Total Pages : 398
Release :
ISBN-10 : 9798612757238
ISBN-13 :
Rating : 4/5 (38 Downloads)

Book Synopsis Biological Data Exploration with Python, Pandas and Seaborn by : Martin Jones

Download or read book Biological Data Exploration with Python, Pandas and Seaborn written by Martin Jones and published by . This book was released on 2020-06-03 with total page 398 pages. Available in PDF, EPUB and Kindle. Book excerpt: In biological research, we''re currently in a golden age of data. It''s never been easier to assemble large datasets to probe biological questions. But these large datasets come with their own problems. How to clean and validate data? How to combine datasets from multiple sources? And how to look for patterns in large, complex datasets and display your findings? The solution to these problems comes in the form of Python''s scientific software stack. The combination of a friendly, expressive language and high quality packages makes a fantastic set of tools for data exploration. But the packages themselves can be hard to get to grips with. It''s difficult to know where to get started, or which sets of tools will be most useful. Learning to use Python effectively for data exploration is a superpower that you can learn. With a basic knowledge of Python, pandas (for data manipulation) and seaborn (for data visualization) you''ll be able to understand complex datasets quickly and mine them for biological insight. You''ll be able to make beautiful, informative charts for posters, papers and presentations, and rapidly update them to reflect new data or test new hypotheses. You''ll be able to quickly make sense of datasets from other projects and publications - millions of rows of data will no longer be a scary prospect! In this book, Dr. Jones draws on years of teaching experience to give you the tools you need to answer your research questions. Starting with the basics, you''ll learn how to use Python, pandas, seaborn and matplotlib effectively using biological examples throughout. Rather than overwhelm you with information, the book concentrates on the tools most useful for biological data. Full color illustrations show hundreds of examples covering dozens of different chart types, with complete code samples that you can tweak and use for your own work. This book will help you get over the most common obstacles when getting started with data exploration in Python. You''ll learn about pandas'' data model; how to deal with errors in input files and how to fit large datasets in memory. The chapters on visualization will show you how to make sophisticated charts with minimal code; how to best use color to make clear charts, and how to deal with visualization problems involving large numbers of data points. Chapters include: Getting data into pandas: series and dataframes, CSV and Excel files, missing data, renaming columns Working with series: descriptive statistics, string methods, indexing and broadcasting Filtering and selecting: boolean masks, selecting in a list, complex conditions, aggregation Plotting distributions: histograms, scatterplots, custom columns, using size and color Special scatter plots: using alpha, hexbin plots, regressions, pairwise plots Conditioning on categories: using color, size and marker, small multiples Categorical axes:strip/swarm plots, box and violin plots, bar plots and line charts Styling figures: aspect, labels, styles and contexts, plotting keywords Working with color: choosing palettes, redundancy, highlighting categories Working with groups: groupby, types of categories, filtering and transforming Binning data: creating categories, quantiles, reindexing Long and wide form: tidying input datasets, making summaries, pivoting data Matrix charts: summary tables, heatmaps, scales and normalization, clustering Complex data files: cleaning data, merging and concatenating, reducing memory FacetGrids: laying out multiple charts, custom charts, multiple heat maps Unexpected behaviours: bugs and missing groups, fixing odd scales High performance pandas: vectorization, timing and sampling Further reading: dates and times, alternative syntax


Biological Data Exploration with Python, Pandas and Seaborn Related Books