From a4f982be9b21323038704069a56fb2448369d6a0 Mon Sep 17 00:00:00 2001 From: Humble Devassy Chirammal Date: Thu, 24 Sep 2015 14:53:52 +0530 Subject: Porting developer guide to source code repo from glusterdocs project Change-Id: Ib8d9c668ebb05863918e6ec2b89908f206626f38 BUG: 1206539 Signed-off-by: Humble Devassy Chirammal Reviewed-on: http://review.gluster.org/12227 Tested-by: NetBSD Build System Reviewed-by: Prashanth Pai Reviewed-by: Humble Devassy Chirammal Tested-by: Humble Devassy Chirammal Tested-by: Raghavendra Talur --- doc/developer-guide/Backport-Guidelines.md | 41 ++ doc/developer-guide/Backport-Wishlist.md | 193 +++++++++ doc/developer-guide/Bug-Reporting-Guidelines.md | 128 ++++++ doc/developer-guide/Bug-Triage.md | 400 ++++++++++++++++++ doc/developer-guide/Bug-report-Life-Cycle.md | 57 +++ doc/developer-guide/Bug-reporting-template.md | 55 +++ doc/developer-guide/Building GlusterFS.md | 148 +++++++ doc/developer-guide/Compiling-RPMS.md | 178 ++++++++ doc/developer-guide/Developers-Index.md | 127 ++++++ doc/developer-guide/Development-Workflow.md | 457 +++++++++++++++++++++ doc/developer-guide/Easy-Fix-Bugs.md | 35 ++ ...s-reported-by-tools-for-static-code-analysis.md | 66 +++ doc/developer-guide/GlusterFS-Release-process.md | 73 ++++ doc/developer-guide/Guidelines-For-Maintainers.md | 70 ++++ doc/developer-guide/Jenkins-Infrastructure.md | 127 ++++++ doc/developer-guide/Jenkins-Manual-Setup.md | 146 +++++++ doc/developer-guide/Language-Bindings.md | 39 ++ doc/developer-guide/Projects.md | 99 +++++ .../Simplified-Development-Workflow.md | 238 +++++++++++ .../Using-Gluster-Test-Framework.md | 270 ++++++++++++ doc/developer-guide/afr-locks-evolution.md | 91 ++++ doc/developer-guide/afr-self-heal-daemon.md | 92 +++++ doc/developer-guide/afr.md | 191 +++++++++ doc/developer-guide/afr/afr-locks-evolution.md | 91 ---- doc/developer-guide/afr/afr.md | 191 --------- doc/developer-guide/afr/self-heal-daemon.md | 90 ---- doc/developer-guide/coredump-analysis.md 
| 55 +++ doc/developer-guide/daemon-management-framework.md | 9 +- doc/developer-guide/data-structures/inode.md | 226 ---------- doc/developer-guide/data-structures/iobuf.md | 259 ------------ doc/developer-guide/data-structures/mem-pool.md | 124 ------ doc/developer-guide/datastructure-inode.md | 226 ++++++++++ doc/developer-guide/datastructure-iobuf.md | 259 ++++++++++++ doc/developer-guide/datastructure-mem-pool.md | 124 ++++++ doc/developer-guide/gfapi-symbol-versions.md | 270 ++++++++++++ .../gfapi-symbol-versions/gfapi-symbol-versions.md | 270 ------------ doc/developer-guide/translator-development.md | 4 +- 37 files changed, 4260 insertions(+), 1259 deletions(-) create mode 100644 doc/developer-guide/Backport-Guidelines.md create mode 100644 doc/developer-guide/Backport-Wishlist.md create mode 100644 doc/developer-guide/Bug-Reporting-Guidelines.md create mode 100644 doc/developer-guide/Bug-Triage.md create mode 100644 doc/developer-guide/Bug-report-Life-Cycle.md create mode 100644 doc/developer-guide/Bug-reporting-template.md create mode 100644 doc/developer-guide/Building GlusterFS.md create mode 100644 doc/developer-guide/Compiling-RPMS.md create mode 100644 doc/developer-guide/Developers-Index.md create mode 100644 doc/developer-guide/Development-Workflow.md create mode 100644 doc/developer-guide/Easy-Fix-Bugs.md create mode 100644 doc/developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md create mode 100644 doc/developer-guide/GlusterFS-Release-process.md create mode 100644 doc/developer-guide/Guidelines-For-Maintainers.md create mode 100644 doc/developer-guide/Jenkins-Infrastructure.md create mode 100644 doc/developer-guide/Jenkins-Manual-Setup.md create mode 100644 doc/developer-guide/Language-Bindings.md create mode 100644 doc/developer-guide/Projects.md create mode 100644 doc/developer-guide/Simplified-Development-Workflow.md create mode 100644 doc/developer-guide/Using-Gluster-Test-Framework.md create mode 100644 
doc/developer-guide/afr-locks-evolution.md create mode 100644 doc/developer-guide/afr-self-heal-daemon.md create mode 100644 doc/developer-guide/afr.md delete mode 100644 doc/developer-guide/afr/afr-locks-evolution.md delete mode 100644 doc/developer-guide/afr/afr.md delete mode 100644 doc/developer-guide/afr/self-heal-daemon.md create mode 100644 doc/developer-guide/coredump-analysis.md delete mode 100644 doc/developer-guide/data-structures/inode.md delete mode 100644 doc/developer-guide/data-structures/iobuf.md delete mode 100644 doc/developer-guide/data-structures/mem-pool.md create mode 100644 doc/developer-guide/datastructure-inode.md create mode 100644 doc/developer-guide/datastructure-iobuf.md create mode 100644 doc/developer-guide/datastructure-mem-pool.md create mode 100644 doc/developer-guide/gfapi-symbol-versions.md delete mode 100644 doc/developer-guide/gfapi-symbol-versions/gfapi-symbol-versions.md (limited to 'doc/developer-guide') diff --git a/doc/developer-guide/Backport-Guidelines.md b/doc/developer-guide/Backport-Guidelines.md new file mode 100644 index 00000000000..af48dc00f03 --- /dev/null +++ b/doc/developer-guide/Backport-Guidelines.md @@ -0,0 +1,41 @@ +Bugs often get fixed in master before release branches. When a bug is +fixed in the master branch, the fix might be desirable or necessary in a +stable branch. To get the fix into a stable branch, we need to backport +it to that branch. + +Anyone in the community can suggest a backport. If you would like to +suggest a backport, please check the [Backport +Wishlist](./Backport-Wishlist.md). + +This page describes the steps needed to backport simple changes. Changes +that do not apply cleanly will need some manual modifications, and using +`git cherry-pick` may not always be the easiest solution. + +1. Clone the GlusterFS code + + git clone ssh://username@review.gluster.org/glusterfs + +2. 
Create and check out a new branch for your work, based on the branch + for the backport version + + git checkout -t -b bug-123456/release-3.5 origin/release-3.5 + +3. Cherry-pick the change from master. + + $ git cherry-pick -x a0b1c2d3e4f5 + - verify that the change has been merged in the master branch. + +4. Update/correct the commit message. + + $ git commit -s --amend --date="$(date)" +[This is one example](https://github.com/gluster/glusterfs/commit/40407afb529f6e5fa2f79e9778c2f527122d75eb) of a commit message that has a good description for a backport. Notice the indentation of the patch metadata such as the BUG, Change-Id and Reviewed-on tags. There is also the original commit-id that was cherry-picked from the master branch. + -make sure to quote the review tags + -update the BUG reference, point to the BUG that is used for this +particular release-branch + -add a Signed-off-by tag + +5. Run `./rfc.sh` to post the backport for review. + + ./rfc.sh +After submitting patch(es), make sure to move the bug to the *POST* +status. diff --git a/doc/developer-guide/Backport-Wishlist.md b/doc/developer-guide/Backport-Wishlist.md new file mode 100644 index 00000000000..f191820c803 --- /dev/null +++ b/doc/developer-guide/Backport-Wishlist.md @@ -0,0 +1,193 @@ +Bugs often get fixed in master before release branches. + +When a bug is fixed in the master branch it might be desirable or +necessary to backport the fix to a stable branch. + +This page is intended to help organize support (and prioritization) for +backporting bug fixes of importance to the community. + +### GlusterFS 3.6 + +Requested Backports for 3.6.0 +----------------------------- + +The tracker bug for 3.6.0 : + + +Please add 'glusterfs-3.6.0' in the 'Blocks' field of bugs to propose +inclusion in GlusterFS 3.6.0. 
+ +### GlusterFs 3.5 + +Requested Backports for 3.5.3 +----------------------------- + +Current [list of bugs planned for +inclusion](https://bugzilla.redhat.com/showdependencytree.cgi?hide_resolved=0&id=glusterfs-3.5.3). + +- File a new bug for backporting a patch to 3.5.3: + [... + new glusterfs-3.5.3 backport request] + +### GlusterFs 3.4 + +Requested Backports for 3.4.6 +----------------------------- + +The tracker bug for 3.4.6 : + + +Please add 'glusterfs-3.4.6' in the 'Blocks' field of bugs to propose +inclusion in GlusterFS 3.4.6. + + + + +Requested Backports for 3.4.4 +----------------------------- + + - "self-heal +process can sometimes create directories instead of symlinks for the +root gfid file in .glusterfs" + + - "structure needs +cleaning" message appear when accessing files. + + - glusterfs mount +crash after remove brick, detach peer and termination + +Requested Backports for 3.4.3 +----------------------------- + + - "self-heal +process can sometimes create directories instead of symlinks for the +root gfid file in .glusterfs" + + - "structure needs +cleaning" message appear when accessing files. + + - large NFS writes +to Gluster slow down then stop + + - glusterfs mount +crash after remove brick, detach peer and termination + +Requested Backports for 3.3.3 +----------------------------- + +[Enable fusermount by default, make nightly autobuilding +work](https://bugzilla.redhat.com/1058666) + +Requested Backports for 3.4.2 +----------------------------- + +Please enter bugzilla ID or patch URL here: + +​1) Until RDMA handling is improved, we should output a warning when +using RDMA volumes - + + +​2) Unable to shrink volumes without dataloss - + + +​3) cluster/dht: Allow non-local clients to function with nufa volumes. +- + +Requested Backports for 3.4.1 +----------------------------- + +Please enter bugzilla ID or patch URL here. 
+ + - "quota context +not set in inode" + + - "NFS crash bug" + +A note for whoever reviews this list: These are the fixes for issues +that have caused actual service disruption in our production +installation and thus are absolutely required for us (-- Lubomir +Rintel): + + - "Setting ACL +entries fails with glusterfs-3.4.0" + + - "fd leaks +observed while running dbench with "open-behind" volume option set to +"on" on a replicate volume" + +These are issues that we've stumbled upon during the git log review and +that seemed scary enough for us to cherry-pick them to avoid risk, +despite not being actually hit. Hope that helps deciding whether it's +worthwhile cherry-picking them (-- Lubomir Rintel): + + "CLI crash upon +executing "gluster peer status" command" + + "quick-read and +open-behind xlator: Make options (volume\_options ) structure NULL +terminated." + + "nfs-root-squash: +rename creates a file on a file residing inside a sticky bit set +directory" + + "DHT : files are +stored on directory which doesn't have hash range(hash layout)" + + "statedump crashes +in ioc\_inode\_dump" + + "cli crashes when +setting diagnostics.client-log-level is set to trace" + + "glusterfsd crashes +on smallfile benchmark" + +, "tests: call 'cleanup' at the end of +each test", , +backport of 983975 + +, "glusterfs-api.pc.in contains an +rpath", , backport +of 1002220 + + "glusterd.service (systemd), ensure +glusterd starts before any local gluster mounts", +, backport of +1004795 + + meta, check that +glusterfs.spec.in has all relevant updates + + - Glusterd would +not store all the volumes when a global options were set leading to peer +rejection + +Requested Backports +------------------- + +- Please backport [gfapi: Closed the logfile fd and initialize to NULL + in glfs\_fini](http://review.gluster.org/#/c/6552) into release-3.5 + - Done +- Please backport [cluster/dht: Make sure loc has + gfid](http://review.gluster.org/5178) into release-3.4 +- Please backport [Bug 
887098](http://goo.gl/QjeMP) into release-3.3 + (FyreFoX) - Done +- Please backport [Bug 856341](http://goo.gl/9cGAC) into release-3.2 + and release-3.3 (the-me o/b/o Debian) - Done for release-3.3 +- Please backport [Bug 895656](http://goo.gl/ZNs3J) into release-3.2 + and release-3.3 (semiosis, x4rlos) - Done for release-3.3 +- Please backport [Bug 918437](http://goo.gl/1QRyw) into release-3.3 + (tjstansell) - Done +- Please backport into [Bug + 884597](https://bugzilla.redhat.com/show_bug.cgi?id=884597) + release-3.3 (nocko) - Done + +Unaddressed bugs +---------------- + +- [Bug 838784](https://bugzilla.redhat.com/show_bug.cgi?id=838784) +- [Bug 893778](https://bugzilla.redhat.com/show_bug.cgi?id=893778) +- [Bug 913699](https://bugzilla.redhat.com/show_bug.cgi?id=913699); + possibly related to [Bug + 884597](https://bugzilla.redhat.com/show_bug.cgi?id=884597) \ No newline at end of file diff --git a/doc/developer-guide/Bug-Reporting-Guidelines.md b/doc/developer-guide/Bug-Reporting-Guidelines.md new file mode 100644 index 00000000000..d03878adebd --- /dev/null +++ b/doc/developer-guide/Bug-Reporting-Guidelines.md @@ -0,0 +1,128 @@ +Before filing a bug +------------------- + +If you run into any issues, these preliminary checks are useful: + +- Is SELinux enabled? (you can use `getenforce` to check) +- Are iptables rules blocking any data traffic? (`iptables -L` can + help check) +- Are all the nodes reachable from each other? [ Network problem ] +- Please search Bugzilla to see if the bug has already been reported + - Choose GlusterFS as the "product", and then type something + relevant in the "words" box. If you are seeing a crash or abort, + searching for part of the abort message might be effective. If + you are feeling adventurous you can select the "Advanced search" + tab; this gives a lot more control but isn't much better for + finding existing bugs. 
+ - If a bug has already been filed for a particular release and you + found the bug in another release, + - please clone the existing bug for the release in which you found + the issue. + - If the existing bug is against mainline and you found the + issue for a release, then the *depends on* field of the cloned + bug should be set to the BZ for the mainline bug. + +Anyone can search in Bugzilla; you don't need an account. Searching +requires some effort, but helps avoid duplicates, and you may find that +your problem has already been solved. + +Reporting A Bug +--------------- + +- You should have a Bugzilla account +- Here is the link to file a bug: + [Bugzilla](https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS) +- The template for filing a bug can be found [ + *here*](./Bug-reporting-template.md) + +*Note: Please go through all the sections below to understand what +information to put in a bug report. This will help the developer to find +the root cause and fix it.* + +### Required Information + +You should gather the information below before creating the bug report. 
+ +#### Package Information + +- Location from which the packages are used +- Package Info - version of glusterfs package installed + +#### Cluster Information + +- Number of nodes in the cluster +- Hostnames and IPs of the gluster Node [if it is not a security + issue] + - Hostname / IP will help developers in understanding & + correlating with the logs +- Output of `gluster peer status` +- Node IP, from which the "x" operation is done + - "x" here means any operation that causes the issue + +#### Volume Information + +- Number of volumes +- Volume Names +- Volume on which the particular issue is seen [ if applicable ] +- Type of volumes +- Volume options if available +- Output of `gluster volume info` +- Output of `gluster volume status` +- Get the statedump of the volume with the problem + +`   $ gluster volume statedump ` + +This writes one statedump per brick process to `/var/run/gluster` + +*NOTE: Collect statedumps from one gluster Node in a directory.* + +Repeat this on all Nodes containing bricks of the volume. The directories +collected this way can be archived, compressed and attached to the bug + +#### Brick Information + +- XFS options used when the brick partition was created + - This can be obtained with this command: + +`   $ xfs_info /dev/mapper/vg1-brick` + +- Extended attributes on the bricks + - This can be obtained with this command: + +`   $ getfattr -d -m. -ehex /rhs/brick1/b1` + +#### Client Information + +- OS Type ( Windows, RHEL ) +- OS Version : In case of a Linux distro get the following : + +`   $ uname -r`\ +`   $ cat /etc/issue` + +- Fuse or NFS Mount point on the client with output of mount commands +- Output of `df -Th` command + +#### Tool Information + +- If any tools are used for testing, provide the info/version about them +- If any IO is simulated using a script, provide the script + +#### Logs Information + +- Check the logs for issues/warnings/errors. 
+ - Self-heal logs + - Rebalance logs + - Glusterd logs + - Brick logs + - NFS logs (if applicable) + - Samba logs (if applicable) + - Client mount log +- Add the entire logs as an attachment if they are too large to paste as a + comment + +#### SOS report for CentOS/Fedora + +- Get the sosreport from the involved gluster Node and Client [ in + case of CentOS /Fedora ] +- Add a meaningful name/IP to the sosreport, by renaming/adding + hostname/ip to the sosreport name diff --git a/doc/developer-guide/Bug-Triage.md b/doc/developer-guide/Bug-Triage.md new file mode 100644 index 00000000000..bcd475e81fb --- /dev/null +++ b/doc/developer-guide/Bug-Triage.md @@ -0,0 +1,400 @@ +Bug Triage Guidelines +===================== + +- Triaging of bugs is an important task; when done correctly, it can + reduce the time between reporting a bug and the availability of a + fix enormously. + +- Triagers should focus on new bugs, and try to describe the problem in + a way that is easy to understand and as accurate as possible. The goal + of the triagers is to reduce the time that developers need to solve the + bug report. + +- A triager is like an assistant that helps with the information + gathering and possibly the debugging of a new bug report. Because a + triager helps prepare a bug before a developer gets involved, it + can be a very nice role for new community members that are + interested in technical aspects of the software. + +- Triagers will stumble upon many different kinds of issues, ranging + from reports about spelling mistakes, or unclear log messages to + memory leaks causing crashes or performance issues in environments + with several hundred storage servers. + +Nobody expects that triagers can prepare all bug reports on their own. Most +developers will therefore be able to assist the triagers, answer questions +and suggest debugging approaches and data to gather. Over time, triagers get +more experienced and will rely less on developers. 
+ +**Bug triage can be summarised in the points below:** + +- Is there enough information in the bug description? +- Is it a duplicate bug? +- Is it assigned to the correct component of GlusterFS? +- Are the Bugzilla fields correct? +- Is the bug summary correct? +- Assigning bugs or adding people to the "CC" list +- Fix the Severity and Priority. +- What to do if the bug is present in multiple GlusterFS versions. +- Add appropriate Keywords to the bug. + +A detailed discussion of the above points follows below. + +Weekly meeting about Bug Triaging +--------------------------------- + +We try to meet every week in \#gluster-meeting on Freenode. The meeting +date and time for the next meeting is normally updated in the +[agenda](https://public.pad.fsfe.org/p/gluster-bug-triage). + +Getting Started: Find reports to triage +--------------------------------------- + +There are many different techniques and approaches to find reports to +triage. One easy way is to use these pre-defined Bugzilla reports (a +report is fully defined by its URL and can be modified +manually): + +- New **bugs** that do not have the 'Triaged' keyword [Bugzilla + link](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&f1=keywords&keywords=Triaged%2CFutureFeature&keywords_type=nowords&list_id=3014117&o1=nowords&product=GlusterFS&query_format=advanced&v1=Triaged) +- New **features** that do not have the 'Triaged' keyword (identified + by FutureFeature keyword, probably of interest only to project + leaders) [Bugzilla + link](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&f1=keywords&f2=keywords&list_id=3014699&o1=nowords&o2=allwords&product=GlusterFS&query_format=advanced&v1=Triaged&v2=FutureFeature) +- New glusterd bugs: [Bugzilla + link](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&product=GlusterFS&f1=keywords&o1=nowords&v1=Triaged&component=glusterd) +- New Replication(afr) bugs: [Bugzilla + 
link](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&component=replicate&f1=keywords&list_id=2816133&o1=nowords&product=GlusterFS&query_format=advanced&v1=Triaged) +- New distribute(DHT) bugs: [Bugzilla + link](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&component=distribute&f1=keywords&list_id=2816148&o1=nowords&product=GlusterFS&query_format=advanced&v1=Triaged) + +- New bugs against version 3.6: + [=\^3.6 + Bugzilla link] +- New bugs against version 3.5: + [=\^3.5 + Bugzilla link] +- New bugs against version 3.4: + [=\^3.4 + Bugzilla link] + +- [=&bug\_status=all&tab=recents + bugzilla tracker] (can include already Triaged bugs) + +- [Untriaged NetBSD + bugs](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&keywords=Triaged&keywords_type=nowords&op_sys=NetBSD&product=GlusterFS) +- [Untriaged FreeBSD + bugs](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&keywords=Triaged&keywords_type=nowords&op_sys=FreeBSD&product=GlusterFS) +- [Untriaged Mac OS + bugs](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&keywords=Triaged&keywords_type=nowords&op_sys=Mac%20OS&product=GlusterFS) + +In addition to manually checking Bugzilla for bugs to triage, it is also +possible to receive emails when new +bugs are filed or existing bugs get updated. + +If at any point you feel like you do not know what to do with a certain +report, please first ask on [IRC or the mailing +lists](http://www.gluster.org/community/index.html) before changing +something. + +Is there enough information? +---------------------------- + +To make a report useful, the same rules apply as for the +[bug reporting guidelines](./Bug-Reporting-Guidelines.md). + +It's hard to generalize what makes a good report. For "average" +reporters it is often helpful to have good steps to reproduce, +the GlusterFS software version, and information about the test/production +environment and the GNU/Linux distribution. 
+ +If the reporter is a developer, steps to reproduce can sometimes be +omitted as context is obvious. *However, this can create a problem for +contributors that need to find their way, hence it is strongly advised +to list the steps to reproduce an issue.* + +Other tips: + +- There should be only one issue per report. Try not to mix related or + similar looking bugs per report. + +- It should be possible to call the described problem fixed at some + point. "Improve the documentation" or "It runs slow" could never be + called fixed, while "Documentation should cover the topic Embedding" + or "The page at should load + in less than five seconds" would have a criterion. A good summary of + the bug will also help others in finding existing bugs and prevent + filing of duplicates. + +- If the bug is a graphical problem, you may want to ask for a + screenshot to attach to the bug report. Make sure to ask that the + screenshot should not contain any confidential information. + +Is it a duplicate? +------------------ + +Some problems in Bugzilla have already been reported before, so you can +[search for an already existing +report](https://bugzilla.redhat.com/query.cgi?format=advanced). We do +not recommend spending too much time on it; if a bug is filed twice, +someone else will mark it as a duplicate later. If the bug is a +duplicate, mark it as a duplicate in the resolution box below the +comment field by setting the **CLOSED DUPLICATE** status, and briefly +explain your action in a comment for the reporter. When marking a bug as +a duplicate, it is required to reference the original bug. + +If you think that you have found a duplicate but you are not totally +sure, just add a comment like "This bug looks related to bug XXXXX" (and +replace XXXXX by the bug number) so somebody else can take a look and +help judge. 
+ +You can also take a look at the +list of existing duplicates at +https://bugzilla.redhat.com/page.cgi?id=browse.html&product=GlusterFS&product_version>=&bug\_status=all&tab=duplicates + +Is it assigned to the correct component of GlusterFS? +----------------------------------------------------- + +Make sure the bug is assigned to the right component. Below is the list of +GlusterFS components in Bugzilla. + +- access control - Access control translator +- BDB - Berkeley DB backend storage +- booster - LD\_PRELOAD'able access client +- build - Compiler, package management and platform specific warnings + and errors +- cli - gluster command line +- core - Core features of the filesystem +- distribute - Distribute translator (previously DHT) +- errorgen - Error Gen Translator +- fuse - mount/fuse translator and patched fuse library +- georeplication - Gluster Geo-Replication +- glusterd - Management daemon +- HDFS - Hadoop application support over GlusterFS +- ib-verbs - Infiniband verbs transport +- io-cache - IO buffer caching translator +- io-threads - IO threads performance translator +- libglusterfsclient - API interface to access glusterfs volumes + programmatically +- locks - POSIX and internal locks +- logging - Centralized logging, log messages, log rotation etc +- nfs - NFS component in GlusterFS +- nufa - Non-Uniform Filesystem Scheduler Translator +- object-storage - Object Storage +- porting - Porting GlusterFS to different operating systems and + platforms +- posix - POSIX (API) based backend storage +- protocol - Client and Server protocol translators +- quick-read - Quick Read Translator +- quota - Volume & Directory quota translator +- rdma - RDMA transport +- read-ahead - Read ahead (file) performance translator +- replicate - Replication translator (previously AFR) +- rpc - RPC Layer +- scripts - Build scripts, mount scripts, etc. 
+- stat-prefetch - Stat prefetch translator +- stripe - Striping (RAID-0) cluster translator +- trace- Trace translator +- transport - Socket (IPv4, IPv6, unix, ib-sdp) and generic transport + code +- unclassified - Unclassified - to be reclassified as other components +- unify - Unify translator and schedulers +- write-behind- Write behind performance translator +- libgfapi - APIs for GlusterFS +- tests- GlusterFS Test Framework +- gluster-hadoop - Hadoop support on GlusterFS +- gluster-hadoop-install - Automated Gluster volume configuration for + Hadoop Environments +- gluster-smb - gluster smb +- puppet-gluster - A puppet module for GlusterFS + +Tips for searching: + +- As it is often hard for reporters to find the right place (product + and component) where to file a report, also search for duplicates + outside same product and component of the bug report you are + triaging. +- Use common words and try several times with different combinations, + as there could be several ways to describe the same problem. If you + choose the proper and common words, and you try several times with + different combinations of those, you ensure to have matching + results. +- Drop the ending of a verb (e.g. search for "delet" so you get + reports for both "delete" and "deleting"), and also try similar + words (e.g. search both for "delet" and "remov"). +- Search using the date range delimiter: Most of the bug reports are + recent, so you can try to increase the search speed using date + delimiters by going to "Search by Change History" on the [search + page](https://bugzilla.redhat.com/query.cgi?format=advanced). + Example: search from "2011-01-01" or "-730d" (to cover the last two + years) to "Now". + +Are the fields correct? +----------------------- + +### Summary + +Sometimes the summary does not summarize the bug itself well. You may +want to update the bug summary to make the report distinguishable. 
A +good title may contain: + +- A brief explanation of the root cause (if it was found) +- Some of the symptoms people are experiencing + +### Adding people to the "CC" or changing the "Assigned to" field + +Normally, developers and potential assignees of an area are already +CC'ed by default, but sometimes reports describe general issues or are +filed against common bugzilla products. Only if you know developers who +work in the area covered by the bug report, and if you know that these +developers accept getting CCed or assigned to certain reports, you can +add that person to the CC field or even assign the bug report to +her/him. + +To get an idea who works in which area, check To know component owners , +you can check the "MAINTAINERS" file in root of glusterfs code directory +or querying changes in [Gerrit](http://review.gluster.org) (see +[Simplified dev workflow](./Simplified Development Workflow.md)) + +### Severity And Priority + +Please see below for information on the available values and their +meanings. + +#### Severity + +This field is a pull-down of the external weighting of the bug report's +importance and can have the following values: + + Severity |Definition + -------------|------------------------------------------------------------------------------------------------------------------------------------------------------------- + urgent |catastrophic issues which severely impact the mission-critical operations of an organization. This may mean that the operational servers, development systems or customer applications are down or not functioning and no procedural workaround exists. + high |high-impact issues in which the customer's operation is disrupted, but there is some capacity to produce + medium |partial non-critical functionality loss, or issues which impair some operations but allow the customer to perform their critical tasks. 
This may be a minor issue with limited loss or no loss of functionality and limited impact to the customer's functionality + low |general usage questions, recommendations for product enhancement, or development work + unspecified |importance not specified + +#### Priority + +This field is a pull-down of the internal weighting of the bug report's +importance and can have the following values: + + Priority |Definition + -------------|------------------------ + urgent |extremely important + high |very important + medium |average importance + low |not very important + unspecified |importance not specified + + +### Bugs present in multiple Versions + +During triaging you might come across a particular bug which is present +across multiple version of GlusterFS. Here are the course of actions: + +- We should have separate bugs for each release (We should + clone bugs if required) +- Bugs in released versions should be depended on bug for mainline + (master branch) if the bug is applicable for mainline. + - This will make sure that the fix would get merged in master + branch first then the fix can get ported to other stable + releases. + +*Note: When a bug depends on other bugs, that means the bug cannot be +fixed unless other bugs are fixed (depends on), or this bug stops other +bugs being fixed (blocks)* + +Here are some examples: + +- A bug is raised for GlusterFS 3.5 and the same issue is present in + mainline (master branch) and GlusterFS 3.6 + - Clone the original bug for mainline. + - Clone another for 3.6. + - And have the GlusterFS 3.6 bug and GlusterFS 3.5 bug 'depend on' + the 'mainline' bug + +- A bug is already present for mainline, and the same issue is seen in + GlusterFS 3.5. + - Clone the original bug for GlusterFS 3.5. + - And have the cloned bug (for 3.5) 'depend on' the 'mainline' + bug. + +### Keywords + +Many predefined searches for Bugzilla include keywords. One example are +the searches for the triaging. 
If the bug is 'NEW' and 'Triaged' is no +set, you (as a triager) can pick it and use this page to triage it. When +the bug is 'NEW' and 'Triaged' is in the list of keyword, the bug is +ready to be picked up by a developer. + +**Triaged** +: Once you are done with triage add the **Triaged** keyword to the + bug, so that others will know the triaged state of the bug. The + predefined search at the top of this page will then not list the + Triaged bug anymore. Instead, the bug should have moved to [this + list](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&keywords=Triaged&product=GlusterFS). + +**EasyFix** +: By adding the **EasyFix** keyword, the bug gets added to the [list + of bugs that should be simple to fix](./Easy Fix Bugs.md). + Adding this keyword is encouraged for simple and well defined bugs + or feature enhancements. + +**Patch** +: When a patch for the problem has been attached or included inline, + add the **Patch** keyword so that it is clear that some preparation + for the development has been done already. If course, it would have + been nicer if the patch was sent to Gerrit for review, but not + everyone is ready to pass the Gerrit hurdle when they report a bug. + +You can also add the **Patch** keyword when a bug has been fixed in + mainline and the patch(es) has been identified. Add a link to the + Gerrit change(s) so that backporting to a stable release is made + simpler. + +**Documentation** +: Add the **Documentation** keyword when a bug has been reported for + the documentation. This helps editors and writers in finding the + bugs that they can resolve. + +**Tracking** +: This keyword is used for bugs which are used to track other bugs for + a particular release. 
For example [3.6 tracker + bug](https://bugzilla.redhat.com/showdependencytree.cgi?maxdepth=2&hide_resolved=1&id=glusterfs-3.6.0) + +**FutureFeature** +: This keyword is used for bugs which request a + feature enhancement (RFE - Requested Feature Enhancement) for + future releases of GlusterFS. If you open a bug to request a + feature which you would like to see in future versions of GlusterFS, + please report it with this keyword. + +Add yourself to the CC list +--------------------------- + +By adding yourself to the CC list of bug reports that you change, you +will receive followup emails with all comments and changes by anybody on +that individual report. This helps you learn what further investigations +others make. In the Bugzilla settings you can choose the actions +for which you want to receive mail. + +Bugs For Group Triage +--------------------- + +If you come across a bug (or bugs) that you think should go +through the bug triage group, please set NEEDINFO for bugs@gluster.org +on the bug. + +Resolving bug reports +--------------------- + +See the [Bug report life cycle](./Bug report Life Cycle.md) for +the meaning of the bug status and resolutions. + +Example of Triaged Bugs +----------------------- + +This Bugzilla +[filter](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&keywords=Triaged&keywords_type=anywords&list_id=2739593&product=GlusterFS&query_format=advanced) +will list NEW, Triaged Bugs diff --git a/doc/developer-guide/Bug-report-Life-Cycle.md b/doc/developer-guide/Bug-report-Life-Cycle.md new file mode 100644 index 00000000000..3749bd6272a --- /dev/null +++ b/doc/developer-guide/Bug-report-Life-Cycle.md @@ -0,0 +1,57 @@ +This page describes the life of a bug report. + +- When a bug is first reported, it is given the **NEW** status. +- Once a developer has started, or is planning to work on a bug, the + status **ASSIGNED** is set. The "Assigned to" field should mention a + specific developer.
+- If an initial + [patch](https://en.wikipedia.org/wiki/Patch_(computing)) for a bug + has been put into the [Gerrit code review + tool](http://review.gluster.org), the status **POST** should be set + manually. The status **POST** should only be used when all patches + for a specific bug have been posted for review. +- After a review of the patch, and passing any automated regression + tests, the patch will get merged by one of the maintainers. When the + patch has been merged into the git repository, a comment is added to + the bug. Only when all needed patches have been merged should the + assigned engineer change the status to **MODIFIED**. +- Once a package is available with a fix for the bug, the status should + be moved to **ON\_QA**. + - The **Fixed in version** field should get the name/release of + the package that contains the fix. Packages for multiple + distributions will mostly become available within a few days after + the *make dist* tarball was created. + - This tells the bug reporter that a package is available with a fix + for the bug and that they should test the package. + - The release maintainer needs to make this change to the bug status; + scripts are available (ask *ndevos*). +- The status **VERIFIED** is set if a QA tester or the reporter + confirms that a new build containing the merged fix + resolves the issue. +- In case the version does not fix the reported bug, the status should + be moved back to **ASSIGNED** with a clear note on what exactly + failed. +- When a report has been solved it is given **CLOSED** status. This + can mean: + - **CLOSED/CURRENTRELEASE** when a code change that fixes the + reported problem has been merged in + [Gerrit](http://review.gluster.org). + - **CLOSED/WONTFIX** when the reported problem or suggestion is + valid, but any fix of the reported problem or implementation of + the suggestion would be barred from approval by the project's + Developers/Maintainers (or product managers, if existing).
+ - **CLOSED/WORKSFORME** when the problem can not be reproduced, + when missing information has not been provided, or when an + acceptable workaround exists to achieve a similar outcome as + requested. + - **CLOSED/CANTFIX** when the problem is not a bug, or when it is + a change that is outside the power of GlusterFS development. For + example, bugs proposing changes to third-party software can not + be fixed in the GlusterFS project itself. + - **CLOSED/DUPLICATE** when the problem has been reported before, + no matter if the previous report has been already resolved or + not. + +If a bug report was marked as *CLOSED* or *VERIFIED* and it turns out +that this was incorrect, the bug can be changed to the status *ASSIGNED* +or *NEW*. \ No newline at end of file diff --git a/doc/developer-guide/Bug-reporting-template.md b/doc/developer-guide/Bug-reporting-template.md new file mode 100644 index 00000000000..f8d1157d447 --- /dev/null +++ b/doc/developer-guide/Bug-reporting-template.md @@ -0,0 +1,55 @@ +Template for bug description +---------------------------- +This template should be in line with the [Bug reporting guidelines](./Bug Reporting Guidelines.md). +The template is a replacement for the default description template present in [Bugzilla](https://bugzilla.redhat.com) + + work in progress + +------------------------------------------------------------------------ + +Description of problem: + +Version of GlusterFS package installed: + +Location from which the packages are used: + +GlusterFS Cluster Information: + +- Number of volumes +- Volume Names +- Volume on which the particular issue is seen [ if applicable ] +- Type of volumes +- Volume options if available +- Output of `gluster volume info` +- Output of `gluster volume status` +- Get the statedump of the volume with the problem + +` $ gluster volume statedump ` + +- Client Information + - OS Type: + - Mount type: + - OS Version: + +How reproducible: + +Steps to Reproduce: + +- 1. +- 2. +- 3.
+ +Actual results: + +Expected results: + +Logs Information: + +- Provide possible issues, warnings, errors as a comment to the bug + - Look for issues/warnings/errors in self-heal logs, rebalance logs, glusterd logs, brick logs, mount logs/nfs logs/smb logs + - Add the entire logs as attachment, if it is very large to paste as a comment + +Additional info: + + [Bug\_reporting\_guidelines]: Bug_reporting_guidelines "wikilink" + [Bugzilla]: https://bugzilla.redhat.com diff --git a/doc/developer-guide/Building GlusterFS.md b/doc/developer-guide/Building GlusterFS.md new file mode 100644 index 00000000000..160216921ad --- /dev/null +++ b/doc/developer-guide/Building GlusterFS.md @@ -0,0 +1,148 @@ +This page describes how to build and install GlusterFS. + +Build Requirements +------------------ + +The following packages are required for building GlusterFS, + +- GNU Autotools + - Automake + - Autoconf + - Libtool +- lex (generally flex) +- GNU Bison +- OpenSSL +- libxml2 +- Python 2.x +- libaio +- libibverbs +- librdmacm +- readline +- lvm2 +- glib2 +- liburcu +- cmocka +- libacl +- sqlite + +### Fedora + +The following yum command installs all the build requirements for +Fedora, + + # yum install automake autoconf libtool flex bison openssl-devel libxml2-devel python-devel libaio-devel libibverbs-devel librdmacm-devel readline-devel lvm2-devel glib2-devel userspace-rcu-devel libcmocka-devel libacl-devel sqlite-devel + +### Ubuntu + +The following apt-get command will install all the build requirements on +Ubuntu, + + $ sudo apt-get install make automake autoconf libtool flex bison pkg-config libssl-dev libxml2-dev python-dev libaio-dev libibverbs-dev librdmacm-dev libreadline-dev liblvm2-dev libglib2.0-dev liburcu-dev libcmocka-dev libsqlite3-dev libacl1-dev + +Building from Source +-------------------- + +This section describes how to build GlusterFS from source. It is assumed +you have a copy of the GlusterFS source (either from a released tarball +or a git clone). 
All the commands below are to be run with the source +directory as the working directory. + +### Configuring for building + +Run the below commands once for configuring and setting up the build +process. + +Run autogen to generate the configure script. + + $ ./autogen.sh + +Once autogen completes successfully a configure script is generated. Run +the configure script to generate the makefiles. + + $ ./configure + +If the above build requirements have been installed, running the +configure script should give the below configure summary, + + GlusterFS configure summary + =========================== + FUSE client          : yes + Infiniband verbs     : yes + epoll IO multiplex   : yes + argp-standalone      : no + fusermount           : yes + readline             : yes + georeplication       : yes + Linux-AIO            : yes + Enable Debug         : no + systemtap            : no + Block Device xlator  : yes + glupy                : yes + Use syslog           : yes + XML output           : yes + QEMU Block formats   : yes + Encryption xlator    : yes + +During development it is good to enable a debug build. To do this run +configure with a '--enable-debug' flag. + + $ ./configure --enable-debug + +Further configuration flags can be found by running configure with a +'--help' flag, + + $ ./configure --help + +### Building + +Once configured, GlusterFS can be built with a simple make command. + + $ make + +To speed up the build process on a multicore machine, add a '-jN' flag, +where N is the number of parallel jobs. + +### Installing + +Run 'make install' to install GlusterFS. By default, GlusterFS will be +installed into '/usr/local' prefix. To change the install prefix, give +the appropriate option to configure. If installing into the default +prefix, you might need to use 'sudo' or 'su -c' to install. + + $ sudo make install + +### Running GlusterFS + +GlusterFS can be only run as root, so the following commands will need +to be run as root. 
If you've installed into the default '/usr/local' +prefix, add '/usr/local/sbin' and '/usr/local/bin' to your PATH before +running the below commands. + +A source install will generally not install any init scripts. So you +will need to start glusterd manually. To manually start glusterd just +run, + + # glusterd + +This will start glusterd and fork it into the background as a daemon +process. You can now run 'gluster' commands and make use of GlusterFS. + +Building packages +----------------- + +### Building RPMs + +Building RPMs is really simple. On an RPM based system, e.g. Fedora, +get the source and do the configuration steps as shown in the 'Building +from Source' section. After the configuration step, run the following +steps to build RPMs, + + $ cd extras/LinuxRPM + $ make glusterrpms + +This will create rpms from the source in 'extras/LinuxRPM'. *(Note: You +will need to install the rpmbuild requirements including rpmbuild and +mock)* + +A more detailed description for building RPMs can be found at +[CompilingRPMS](./Compiling RPMS.md). diff --git a/doc/developer-guide/Compiling-RPMS.md b/doc/developer-guide/Compiling-RPMS.md new file mode 100644 index 00000000000..b1bd39b26f8 --- /dev/null +++ b/doc/developer-guide/Compiling-RPMS.md @@ -0,0 +1,178 @@ +How to compile GlusterFS RPMs from git source, for RHEL/CentOS, and Fedora +-------------------------------------------------------------------------- + +Creating rpm's of GlusterFS from git source is fairly easy, once you +know the steps. + +RPMS can be compiled on at least the following OS's: + +- Red Hat Enterprise Linux 5, 6 (& 7 when available) +- CentOS 5, 6 (& 7 when available) +- Fedora 16-20 + +Specific instructions for compiling are below. If you're using: + +- Fedora 16-20 - Follow the Fedora steps, then do all of the Common + steps. +- CentOS 5.x - Follow the CentOS 5.x steps, then do all of the Common + steps +- CentOS 6.x - Follow the CentOS 6.x steps, then do all of the Common + steps.
+- RHEL 6.x - Follow the RHEL 6.x steps, then do all of the Common + steps. + +Note - these instructions have been explicitly tested on all of CentOS +5.10, RHEL 6.4, CentOS 6.4+, and Fedora 16-20. Other releases of +RHEL/CentOS and Fedora may work too, but haven't been tested. Please +update this page appropriately if you do so. :) + +### Preparation steps for Fedora 16-20 (only) + +​1. Install gcc, the python development headers, and python setuptools: + + $ sudo yum -y install gcc python-devel python-setuptools + +​2. If you're compiling GlusterFS version 3.4, then install +python-swiftclient. Other GlusterFS versions don't need it: + + $ sudo easy_install simplejson python-swiftclient + +Now follow through the **Common Steps** part below. + +### Preparation steps for CentOS 5.x (only) + +You'll need EPEL installed first and some CentOS specific packages. The +commands below will get that done for you. After that, follow through +the "Common steps" section. + +​1. Install EPEL first: + + $ curl -OL http://download.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm + $ sudo yum -y install epel-release-5-4.noarch.rpm --nogpgcheck + +​2. Install the packages required only on CentOS 5.x: + + $ sudo yum -y install buildsys-macros gcc ncurses-devel python-ctypes python-sphinx10 \ +   redhat-rpm-config + +Now follow through the **Common Steps** part below. + +### Preparation steps for CentOS 6.x (only) + +You'll need EPEL installed first and some CentOS specific packages. The +commands below will get that done for you. After that, follow through +the "Common steps" section. + +​1. Install EPEL first: + + $ sudo yum -y install http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm + +​2. Install the packages required only on CentOS: + + $ sudo yum -y install python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + +Now follow through the **Common Steps** part below. 
+ +### Preparation steps for RHEL 6.x (only) + +You'll need EPEL installed first and some RHEL specific packages. The 2 +commands below will get that done for you. After that, follow through +the "Common steps" section. + +​1. Install EPEL first: + + $ sudo yum -y install http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm + +​2. Install the packages required only on RHEL: + + $ sudo yum -y --enablerepo=rhel-6-server-optional-rpms install python-webob1.0 \ +   python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + +Now follow through the **Common Steps** part below. + +### Common Steps + +These steps are for both Fedora and RHEL/CentOS. At the end you'll have +the complete set of GlusterFS RPMs for your platform, ready to be +installed. + +**NOTES for step 1 below:** + +- If you're on RHEL/CentOS 5.x and get a message about lvm2-devel not + being available, it's ok. You can ignore it. :) +- If you're on RHEL/CentOS 6.x and get any messages about + python-eventlet, python-netifaces, python-sphinx and/or pyxattr not + being available, it's ok. You can ignore them. :) + +​1. Install the needed packages + + $ sudo yum -y --disablerepo=rhs* --enablerepo=*optional-rpms install git autoconf \ +   automake bison cmockery2-devel dos2unix flex fuse-devel glib2-devel libaio-devel \ +   libattr-devel libibverbs-devel librdmacm-devel libtool libxml2-devel lvm2-devel make \ +   openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces \ +   python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel \ +   rpm-build systemtap-sdt-devel tar libcmocka-devel + +​2. Clone the GlusterFS git repository + + $ git clone git://git.gluster.org/glusterfs + $ cd glusterfs + +​3. Choose which branch to compile + +If you want to compile the latest development code, you can skip this +step and go on to the next one. 
+ +If instead you want to compile the code for a specific release of +GlusterFS (such as v3.4), get the list of release names here: + + $ git branch -a | grep release +   remotes/origin/release-2.0 +   remotes/origin/release-3.0 +   remotes/origin/release-3.1 +   remotes/origin/release-3.2 +   remotes/origin/release-3.3 +   remotes/origin/release-3.4 +   remotes/origin/release-3.5 + +Then switch to the correct release using the git "checkout" command, and +the name of the release after the "remotes/origin/" bit from the list +above: + + $ git checkout release-3.4 + +**NOTE -** The CentOS 5.x instructions have only been tested for the +master branch in GlusterFS git. It is unknown (yet) if they work for +branches older than release-3.5. + +​4. Configure and compile GlusterFS + +Now you're ready to compile Gluster: + + $ ./autogen.sh + $ ./configure --enable-fusermount + $ make dist + +​5. Create the GlusterFS RPMs + + $ cd extras/LinuxRPM + $ make glusterrpms + +That should complete with no errors, leaving you with a directory +containing the RPMs.
+ + $ ls -l *rpm + -rw-rw-r-- 1 jc jc 3966111 Mar  2 12:15 glusterfs-3git-1.el5.centos.src.rpm + -rw-rw-r-- 1 jc jc 1548890 Mar  2 12:17 glusterfs-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc   66680 Mar  2 12:17 glusterfs-api-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc   20399 Mar  2 12:17 glusterfs-api-devel-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  123806 Mar  2 12:17 glusterfs-cli-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc 7850357 Mar  2 12:17 glusterfs-debuginfo-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  112677 Mar  2 12:17 glusterfs-devel-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  100410 Mar  2 12:17 glusterfs-fuse-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  187221 Mar  2 12:17 glusterfs-geo-replication-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  299171 Mar  2 12:17 glusterfs-libs-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc   44943 Mar  2 12:17 glusterfs-rdma-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  123065 Mar  2 12:17 glusterfs-regression-tests-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc   16224 Mar  2 12:17 glusterfs-resource-agents-3git-1.el5.centos.x86_64.rpm + -rw-rw-r-- 1 jc jc  654043 Mar  2 12:17 glusterfs-server-3git-1.el5.centos.x86_64.rpm \ No newline at end of file diff --git a/doc/developer-guide/Developers-Index.md b/doc/developer-guide/Developers-Index.md new file mode 100644 index 00000000000..a523a4681c9 --- /dev/null +++ b/doc/developer-guide/Developers-Index.md @@ -0,0 +1,127 @@ +Developers +========== + +### From GlusterDocumentation + +Contributing to the Gluster community +------------------------------------- + +Are you itching to send in patches and participate as a developer in the +Gluster community? Here are a number of starting points for getting +involved. We don't require a signed contributor license agreement or +copyright assignment, but we do require a "signed-off-by" line on each +code check-in. 
+ +- [Simplified Developer Workflow](./Simplified Development Workflow.md) + - A simpler and faster intro to developing with GlusterFS, than the + doc below. +- [Developer Workflow](./Development Workflow.md) - this tells + you about our patch requirements, tools we use, and more. Required + reading if you want to contribute code. +- [License + Change](http://www.gluster.org/2012/05/glusterfs-license-change/) - + we recently changed the client library code to a dual license under + the GPL v2 and the LGPL v3 or later +- [GlusterFS Coding Standards](./coding-standard.md) + +Compiling Gluster +----------------- + +- [Compiling RPMS](./Compiling RPMS.md) - Step by step + instructions for compiling Gluster RPMS +- [Building GlusterFS](./Building GlusterFS.md) - How to compile + Gluster from source code. Including instructions for Ubuntu. + +Developing +---------- + +- [Projects](./Projects.md) - Ideas for projects you could + create +- [Language Bindings](./Language Bindings.md) - Connect to + GlusterFS using various language bindings +- [EasyFix\_Bugs](./Easy Fix Bugs.md) - Easy to fix bugs of + GlusterFS. One of the best place to start contributing to GlusterFS. +- [Fixing issues reported by tools for static code + analysis](./Fixing issues reported by tools for static code analysis.md) + - This is a good starting point for developers to fix bugs in + GlusterFS project. +- [Backport Wishlist](./Backport Wishlist.md) - Problems fixed + in the master branch might need to get fixed in stable release + branches too. + The [Backport Guidelines](./Backport Guidelines.md) describe the steps that + branches too. 
+ +Adding File operations +---------------------- + +- [Steps to be followed when adding a new FOP to GlusterFS ](./adding-fops.md) + +Automatic File Replication +-------------------------- + +- [Cluster/afr translator](./afr.md) +- [History of Locking in AFR](./afr-locks-evolution.md) +- [Self heal Daemon](./afr-self-heal-daemon.md) + +Data Structures +--------------- + +- [inode data structure](./datastructure-inode.md) +- [iobuf data structure](./datastructure-iobuf.md) +- [mem-pool data structure](./datastructure-mem-pool.md) + +Find the gfapi symbol versions [here](./gfapi-symbol-versions.md) + +Daemon Management Framework +--------------------------- + +- [How to introduce new daemons using daemon management framework](./daemon-management-framework.md) + +Translators +----------- + +- [Block Device Translator](./bd-xlator.md) +- [Performance/write-Behind Translator](./write-behind.md) +- [Translator Development](./translator-development.md) +- [Storage/posix Translator](./posix.md) +- [Compression translator](./network_compression.md) + +Testing/Debugging +----------------- + +- [Unit Tests in GlusterFS](./unittest.md) +- [Using the Gluster Test + Framework](./Using Gluster Test Framework.md) - Step by + step instructions for running the Gluster Test Framework +- [Our Jenkins Infrastructure](./Jenkins Infrastructure.md) - A + braindump of the Jenkins infrastructure we have in place for + automated testing +- [Manual steps for setting up a Jenkins slave VM in + Rackspace](./Jenkins Manual Setup.md) - Steps for setting up a slave + VM in Rackspace +- [Coredump Analysis](./coredump-analysis.md) - Steps to analyze coredumps generated by regression machines.
+ +Bug Handling +------------ + +- [Bug reporting guidelines](./Bug Reporting Guidelines.md) - + Guideline for reporting a bug in GlusterFS +- [Bug triage guidelines](./Bug Triage.md) - Guideline on how to + triage bugs for GlusterFS +- [Bug report life cycle in + Bugzilla](./Bug report Life Cycle.md) - Information about bug + life cycle + +Patch Acceptance +---------------- + +- The [Guidelines For + Maintainers](./Guidelines For Maintainers.md) explain when + maintainers can merge patches. + +Release Process +--------------- + +- [Versioning](./versioning.md) +- [GlusterFS Release Process](./GlusterFS Release process.md) - + Our release process / checklist diff --git a/doc/developer-guide/Development-Workflow.md b/doc/developer-guide/Development-Workflow.md new file mode 100644 index 00000000000..d36b932c0a8 --- /dev/null +++ b/doc/developer-guide/Development-Workflow.md @@ -0,0 +1,457 @@ +Development work flow of Gluster +================================ + +This document provides a detailed overview of the development model +followed by the GlusterFS project. + +For a simpler overview visit +[Simplified development workflow](./Simplified Development Workflow.md). + +Basics +------ + +The GlusterFS development model largely revolves around the features and +functionality provided by the Git version control system, the Gerrit code review +system and the Jenkins continuous integration system. This document is a primer for a +contributor to the project. + +### Git + +Git is an extremely flexible, distributed version control system. +GlusterFS' main git repository is at and public +mirrors are at GlusterForge +(https://forge.gluster.org/glusterfs-core/glusterfs) and at GitHub +(https://github.com/gluster/glusterfs). The development repo is hosted +inside Gerrit and every code merge is instantly replicated to the public +mirrors. + +A good introduction to Git can be found at +. + +### Gerrit + +Gerrit is an excellent code review system which is developed with a git +based workflow in mind.
The GlusterFS project code review system is +hosted at [review.gluster.org](http://review.gluster.org). Gerrit works +on "Change"s. A change is a set of modifications to various files in +your repository to accomplish a task. It is essentially one large git +commit with all the necessary changes which can be both built and +tested. + +Gerrit usage is described later in 'Review Process' section. + +### Jenkins + +Jenkins is a Continuous Integration build system. Jenkins is hosted at +. Jenkins is configured to work with Gerrit by +setting up hooks. Every "Change" which is pushed to Gerrit is +automatically picked up by Jenkins, built and smoke tested. Output of +all builds and tests can be viewed at +. Jenkins is also setup with a +'regression' job which is designed to execute test scripts provided as +part of the code change. + +Preparatory Setup +----------------- + +Here is a list of initial one-time steps before you can start hacking on +code. + +### Register + +Sign up for an account at by clicking +'Register' on the right-hand top. You can use your gmail login as the +openID identity. + +### Preferred email + +On first login, add your git/work email to your identity. You will have +to click on the URL which is sent to your email and set up a proper Full +Name. Make sure you set your git/work email as your preferred email. +This should be the email address from which all your code commits are +associated. + +### Set Username + +Select yourself a username. + +### Watch glusterfs + +In Gerrit settings, watch the 'glusterfs' project. Tick on all the three +(New Changes, All Comments, Submitted Changes) types of notifications. + +### Email filters + +Set up a filter rule in your mail client to tag or classify mails with +the header + + List-Id:  + +as mails originating from the review system. + +### SSH keys + +Provide your SSH public key into Gerrit so that you can successfully +access the development git repo as well as push changes for +review/merge. 
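The SSH key step can be sketched as follows. This is only an illustration: the function name `make_review_key`, the key file name `id_rsa_gerrit`, and the choice of a 4096-bit RSA key without a passphrase are assumptions made for the example, not requirements of the project. The helper creates a dedicated key pair with `ssh-keygen` and prints the public half, which is the part to paste into the Gerrit settings.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): create a dedicated key pair
# for the review system inside the directory given as $1.
make_review_key() {
    key_dir="$1"
    key_file="$key_dir/id_rsa_gerrit"   # assumed file name

    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1

    # Never overwrite an existing private key.
    if [ -e "$key_file" ]; then
        echo "refusing to overwrite $key_file" >&2
        return 1
    fi

    # -N "" means no passphrase; drop that option to be prompted for one.
    ssh-keygen -q -t rsa -b 4096 -N "" -C "gerrit-review" -f "$key_file" || return 1

    # The public key is the part to provide to Gerrit.
    cat "$key_file.pub"
}
```

A typical call would be `make_review_key "$HOME/.ssh"`; the private key stays on your machine and only the printed public key is shared.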
+ +### Clone a working tree + +Get yourself a working tree by cloning the development repository from +Gerrit + + sh$ git clone ssh://[username@]git.gluster.org/glusterfs.git glusterfs + +Branching policy +---------------- + +This section describes both the branching policies on the public repo +and the suggested best practice for local branching + +### Master/release branches + +In glusterfs.git, the master branch is the forward development branch. +This is where new features come in first. In fact this is where almost +every change (commit) comes in first. The master branch is always kept +in a buildable state in which smoke tests pass. + +Release trains (3.1.z, 3.2.z, 3.3.z) each have a branch originating from +master. Code freeze of each new release train is marked by the creation +of the release-3.y branch. At this point no new features are added to +the release-3.y branch. All fixes and commits first get into master. +From there, only bug fixes get backported to the relevant release +branches. From the release-3.y branch, actual release code snapshots +(e.g. glusterfs-3.2.2 etc.) are tagged (git annotated tag with 'git tag +-a') and shipped as a tarball. + +### Personal per-task branches + +As a best practice, it is recommended you perform all code changes for a +task in a local branch in your working tree. The local branch should be +created from the upstream branch to which you intend to submit the +change. If you are submitting changes to master branch, first create a +local task branch like this - + + sh$ git checkout master + sh$ git branch bug-XYZ && git checkout bug-XYZ + ...  + +If you are backporting a fix to a release branch, or making a new change +to a release branch, your commands would be slightly different.
If you +are checking out a release branch in your local working tree for the +first time, make sure to set it up as a remote tracking branch like this +- + + sh$ git checkout -b release-3.2 origin/release-3.2 + +The above step does not need to be repeated. In the future, if you +want to work on the release branch - + + sh$ git checkout release-3.2 + sh$ git branch bug-XYZ-release-3.2 && git checkout bug-XYZ-release-3.2 + ...  + +Building +-------- + +### Environment Setup + +**For details about the required packages for the build environment +refer : [Building GlusterFS](./Building GlusterFS.md)** + +Ubuntu: + +To set up the build environment on an Ubuntu system, type the following +command to install the required packages: + + sudo apt-get -y install python-pyxattr libreadline-dev systemtap-sdt-dev \ + tar python-pastedeploy python-simplejson python-sphinx python-webob libssl-dev \ + pkg-config python-dev python-eventlet python-netifaces libaio-dev libibverbs-dev \ + libtool libxml2-dev liblvm2-dev make autoconf automake bison dos2unix flex libfuse-dev + +CentOS/RHEL/Fedora: + +On Fedora systems, install the required packages by following the +instructions in [CompilingRPMS](./Compiling RPMS.md). + +### Creating build environment + +Once the required packages are installed for your system, +generate the build configuration: + + sh$ ./autogen.sh + sh$ ./configure --enable-fusermount + +### Build and install + +#### GlusterFS + +Ubuntu: + +Type the following to build and install GlusterFS on the system: + + sh$ make + sh$ make install + +CentOS/RHEL/Fedora: + +In an rpm based system, there are two methods to build GlusterFS. One is +to use the method described above for *Ubuntu*. The other is to build and +install RPMS as described in [CompilingRPMS](./Compiling RPMS.md). + +#### GlusterFS UFO/SWIFT + +To build and run Gluster UFO you can do the following: + +1. Build, create, and install the RPMS as described in + [CompilingRPMS](./Compiling RPMS.md). +2.
Configure UFO/SWIFT as described in [Howto Using UFO SWIFT - A quick + and dirty setup + guide](http://www.gluster.org/2012/09/howto-using-ufo-swift-a-quick-and-dirty-setup-guide) + +Commit policy +------------- + +For a Gerrit based work flow, each commit should be an independent, +buildable and testable change. Typically you would have a local branch +per task, and most of the time that branch will have one commit. + +If you have a second task at hand which depends on the changes of the +first one, then technically you can have it as a separate commit on top +of the first commit. But it is important that the first commit should be +a testable change by itself (if not, it is an indication that the two +commits are essentially part of a single change). Gerrit accommodates +these situations by marking Change 1 as a "dependency" of Change 2 +(there is a 'Dependencies' tab in the Change page in Gerrit) +automatically when you push the changes for review from the same local +branch. + +You will need to sign-off your commit (git commit -s) before sending the +patch for review. By signing off your patch, you agree to the terms +listed under "Developer's Certificate of Origin" section in the +CONTRIBUTING file available in the repository root. + +Provide a meaningful commit message. Your commit message should be in +the following format + +- A short one line subject describing what the patch accomplishes +- An empty line following the subject +- Situation necessitating the patch +- Description of the code changes +- Reason for doing it this way (compared to others) +- Description of test cases + +### Test cases + +Part of the workflow is to aggregate and execute pre-commit test cases +which accompany patches, cumulatively for every new patch. This +guarantees that tests which are working till the present are not broken +with the new patch. Every change submitted to Gerrit must include test +cases in + + tests/group/script.t + +as part of the patch.
This is so that code changes and accompanying test +cases are reviewed together. All new commits now come under the +following categories w.r.t test cases: + +#### New 'group' directory and/or 'script.t' + +This is typically when code is adding a new module and/or feature + +#### Extend/Modify old test cases in existing scripts + +This is typically when present behavior (default values etc.) of code is +changed + +#### No test cases + +This is typically when code change is trivial (e.g. fixing typos in +output strings, code comments) + +#### Only test case and no code change + +This is typically when we are adding test cases to old code (already +existing before this regression test policy was enforced) + +More details on how to work with test case scripts can be found in + +tests/README + +Review process +-------------- + +### rfc.sh + +After doing the local commit, it is time to submit the code for review. +There is a script available inside glusterfs.git called rfc.sh. You can +submit your changes for review by simply executing + + sh$ ./rfc.sh + +This script does the following: + +- The first time it is executed, it downloads a git hook from + and sets it up + locally to generate a Change-Id: tag in your commit message (if it + was not already generated.) +- Rebase your commit against the latest upstream HEAD. This rebase + also causes your commits to undergo massaging from the just + downloaded commit-msg hook. +- Prompt for a Bug Id for each commit (if it was not already provided) + and include it as a "BUG:" tag in the commit log. You can just hit + Enter at this prompt if your submission is purely for review + purposes. +- Push the changes to review.gluster.org for review. If you had + provided a bug id, it assigns the topic of the change as "bug-XYZ". + If not it sets the topic as "rfc".
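The topic rule in the last step can be sketched as a small shell function. This is only an illustration of the behaviour described above, not the code of rfc.sh itself. The names `review_topic` and `review_ref` are hypothetical, and the `refs/for/<branch>` target is Gerrit's usual convention for pushing a change for review, assumed here rather than taken from this guide.

```shell
#!/bin/sh
# Illustrative only: mirrors the topic rule described above,
# not the actual implementation of rfc.sh.

# Print the Gerrit topic for a change: "bug-<id>" when a bug id is given,
# "rfc" when the prompt was left empty.
review_topic() {
    bug_id="$1"
    if [ -n "$bug_id" ]; then
        printf 'bug-%s\n' "$bug_id"
    else
        printf 'rfc\n'
    fi
}

# Print the ref a change for <branch> would be pushed to (assumed convention).
review_ref() {
    branch="${1:-master}"
    printf 'refs/for/%s\n' "$branch"
}

review_topic 1206539     # prints bug-1206539
review_topic ""          # prints rfc
review_ref release-3.6   # prints refs/for/release-3.6
```

Keeping the rule in one place makes it easy to see why a push without a Bug Id shows up under the "rfc" topic in Gerrit.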
+ +On a successful push, you will see a URL pointing to the change in +review.gluster.org + +Auto verification +----------------- + +The integration between Jenkins and Gerrit triggers an event in Jenkins +on every push of changes, to pick up the change and run build and smoke +test on it. + +If the build and smoke tests execute successfully, Jenkins marks the +change as '+0 Verified'. If they fail, '-1 Verified' is marked on the +change. This means passing the automated smoke test is a necessary +condition but not sufficient. + +It is important to note that Jenkins verification is only a generic +verification of high level tests. More concentrated testing effort for +the patch is necessary with manual verification. + +If auto verification fails, it is a good reason to skip code review till +a fixed change is pushed later. You can click on the build URL +automatically put as a comment to inspect the reason for auto +verification failure. In the Jenkins job page, you can click on the +'Console Output' link to see the exact point of failure. + +Reviewing / Commenting +---------------------- + +Code review with Gerrit is relatively easy compared to other available +tools. Each change is presented as multiple files and each file can be +reviewed in Side-by-Side mode. While reviewing it is possible to comment +on each line by double-clicking on it and writing in your comments in +the text box. Such in-line comments are saved as drafts, till you +finally publish them as a Review from the 'Change page'. + +There are many small and handy features in Gerrit, like 'starring' +changes you are interested in following, setting the amount of context to +view in the side-by-side view page etc. + +Incorporate, Amend, rfc.sh, Reverify +------------------------------------ + +Code review comments are notified via email. After incorporating the +changes in code, you can mark each of the inline comments as 'done' +(optional).
After all the changes to your local files, amend the +previous commit with these changes with - + + sh$ git commit -a --amend + +Push the amended commit by executing rfc.sh. If your previous push was +an "rfc" push (i.e, without a Bug Id) you will be prompted for a Bug Id +again. You can re-push an rfc change without any other code change too +by giving a Bug Id. + +On the new push, Jenkins will re-verify the new change (independent of +what the verification result was for the previous push). + +It is the Change-Id line in the commit log (which does not change) that +associates the new push as an update for the old push (even though they +had different commit ids) under the same Change. In the side-by-side +view page, it is possible to set knobs in the 'Patch History' tab to +view changes between patches as well. This is handy to inspect how +review comments were incorporated. + +If further changes are found necessary, comments can be made on the new +patch as well, and the same cycle repeats. + +If no further changes are necessary, the reviewer can mark the patch as +reviewed with a certain score depending on the depth of review and +confidence (+1 or +2). A -1 review indicates non-agreement for the +change to get merged upstream. + +Regression tests and test cases +------------------------------- + +All code changes which are not trivial (typo fixes, code comment +changes) must be accompanied with either a new test case script or +extend/modify an existing test case script. It is important to review +the test case in conjunction with the code change to analyse whether the +code change is actually verified by the test case. + +Regression tests (i.e, execution of all test cases accumulated with +every commit) is not automatically triggered as the test cases can be +extensive and is quite expensive to execute for every change submission +in the review/resubmit cycle. Instead it is triggered by the +maintainers, after code review. 
Passing the regression test is a +necessary condition for merge along with code review points. + +Submission Qualifiers +--------------------- + +For a change to get merged, there are two qualifiers which are enforced +by the Gerrit system. They are - A change should have at least one '+2 +Reviewed', and a change should have at least one '+1 Verified' +(regression test). The project maintainer will merge the changes once a +patch meets these qualifiers. + +Submission Disqualifiers +------------------------ + +There are three types of "negative votes". + +-1 Verified + +-1 Code-Review ("I would prefer that you didn't submit this") + +-2 Code-Review ("Do not submit") + +The implication and scope of each of the three are different. They +behave differently as changes are resubmitted as new patchsets. + +### -1 Verified + +Anybody voting -1 Verified will prevent \*that patchset only\* from +getting merged. The flag is automatically cleared on the next patchset +post. The intention is that this vote is based on the result of some +kind of testing. A voter is expected to explain the test case which +failed. Jenkins jobs (smoke, regression, ufounit) use this field for +voting -1/0/+1. When voting -1, Jenkins posts the link to the URL which +has the console output of the failed job. + +### -1 Code-Review ("I would prefer that you didn't submit this") + +This is an advisory vote based on the content of the patch. Typically +issues in source code (both design and implementation), source code +comments, log messages, license headers etc. found by human inspection. +The reviewer explains the specific issues by commenting against the most +relevant lines of source code in the patch. On a resubmission, -1 votes +are cleared automatically. It is the responsibility of the maintainers +to honor -1 Code-Review votes from reviewers (by not merging the +patches), and inspecting that -1 comments on previous submissions are +addressed in the new patchset. 
Generally this is the recommended +"negative" vote. + +### -2 Code-Review ("Do not submit") + +This is a stronger vote which actually prevents Gerrit from merging the +patch. The -2 vote persists even after resubmission and continues to +prevent the patch from getting merged, until the voter revokes the -2 +vote (and then is further subjected to Submission Qualifiers). Typically +one would vote -2 if they are \*against the goal\* of what the patch is +trying to achieve (and not an issue with the patch, which can change on +resubmission). A reviewer would also vote -2 on a patch even if there is +agreement with the goal, but the issue in the code is of such a critical +nature that the reviewer personally wants to inspect the next patchset +and only then revoke the vote after finding the new patch satisfactory. +This prevents the merge of the patch in the mean time. Every registered +user has the right to exercise the -2 Code review vote, and cannot be +overridden by the maintainers. diff --git a/doc/developer-guide/Easy-Fix-Bugs.md b/doc/developer-guide/Easy-Fix-Bugs.md new file mode 100644 index 00000000000..9ba36213a73 --- /dev/null +++ b/doc/developer-guide/Easy-Fix-Bugs.md @@ -0,0 +1,35 @@ +Fixing easy bugs is an excellent method to start contributing patches to +Gluster. + +- Bugs which are marked with EasyFix flag can be found from below + BugZilla query. + - [Bugzilla Query For EasyFix + Bugs](https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&keywords=EasyFix&list_id=2626252&product=GlusterFS) + - [RSS-feed for EasyFix Gluster Bugs](http://goo.gl/OpQwlv) +- To fix EasyFix bugs, + - When you pick an EasyFix you want to work on, assign it to + yourself and move it to ASSIGNED + - Check + [Bug report life cycle](./Bug report Life Cycle.md) and + follow it. + - Check Developers page for details about development workflow, + GlusterFS design documents etc. + +Sometimes an *Easy Fix* bug has a patch attached. 
In those cases, +the *Patch* keyword has been added to the bug. These bugs can be +used by new contributors that would like to verify their workflow. [Bug +1099645](https://bugzilla.redhat.com/1099645) is one example of those. + +### Guidelines for new comers + +- While trying to write a patch, do not hesitate to ask questions. +- If something in the documentation is unclear, we do need to know so + that we can improve it. +- There are no stupid questions, and it's more stupid to not ask + questions that others can easily answer. Always assume that if you + have a question, someone else would like to hear the answer too. + +[Reach out](http://gluster.org/community/index.html) to the developers +in \#gluster or \#gluster-dev on Freenode IRC, or on one of the mailing +lists, try to keep the discussions public so that anyone can learn from +it. diff --git a/doc/developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md b/doc/developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md new file mode 100644 index 00000000000..5a3dceb7d0e --- /dev/null +++ b/doc/developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md @@ -0,0 +1,66 @@ +Static Code Analysis Tools +-------------------------- + +Bug fixes for issues reported by *Static Code Analysis Tools* should +follow [Development Work Flow](./Development Workflow.md) + +### Coverity + +GlusterFS is part of [Coverity's](https://scan.coverity.com/) scan +program. + +- To see Coverity issues you have to be a member of the GlusterFS + project in Coverity scan website. +- Here is the link to [Coverity scan + website](https://scan.coverity.com/projects/987) +- Go to above link and subscribe to GlusterFS project (as + contributor). It will send a request to Admin for including you in + the Project. +- Once admins for the GlusterFS Coverity scan approve your request, + you will be able to see the defects raised by Coverity. 
+- [BZ 789278](https://bugzilla.redhat.com/show_bug.cgi?id=789278) + should be used as an umbrella bug for Coverity issues in master + branch unless you are trying to fix a specific bug in Bugzilla. + - While sending patches for fixing Coverity issues please use the + same bug number. + - For 3.6 branch the Coverity tracking bug is + [1122834](https://bugzilla.redhat.com/show_bug.cgi?id=1122834) +- When you decide to work on some issue, please assign it to your name + in the same Coverity website, so that we don't step on each other's + work. +- When marking a bug intentional in Coverity scan website, please put + an explanation for the same, so that it will help others to + understand the reasoning behind it. + +*If you have more questions please send them to the +[gluster-devel](http://www.gluster.org/interact/mailinglists) mailing +list* + +### CPP Check + +Cppcheck is available in Fedora and EL's EPEL repo + +- Install Cppcheck + + yum install cppcheck + +- Clone GlusterFS code + + git clone https://github.com/gluster/glusterfs glusterfs + +- Run Cpp check + + cppcheck glusterfs/ 2>cppcheck.log + +- [BZ 1091677](https://bugzilla.redhat.com/show_bug.cgi?id=1091677) + should be used for submitting patches to master branch for Cppcheck + reported issues. + +### Daily Runs + +We now have daily runs of various static source code analysis tools on +the glusterfs sources. There are daily analyses of the master, +release-3.6, and release-3.5 branches. + +Results are posted at + diff --git a/doc/developer-guide/GlusterFS-Release-process.md b/doc/developer-guide/GlusterFS-Release-process.md new file mode 100644 index 00000000000..504b012def7 --- /dev/null +++ b/doc/developer-guide/GlusterFS-Release-process.md @@ -0,0 +1,73 @@ +Release Process for GlusterFS +============================= + +Create tarball +-------------- + +1. Add the release-notes to the docs/release-notes/ directory in the + sources +2. after merging the release-notes, create a tag like v3.6.2 +3.
push the tag to git.gluster.org +4. create the tarball with the [release job in + Jenkins](http://build.gluster.org/job/release/) + +Notify packagers +---------------- + +Notify the packagers that we need packages created. Provide the link to the +source tarball from the Jenkins release job to the [packagers +mailinglist](mailto:packaging@gluster.org). A list of the people involved in +the package maintenance for the different distributions is in the `MAINTAINERS` +file in the sources. + +Create a new Tracker Bug for the next release +--------------------------------------------- + +The tracker bugs are used as guidance for blocker bugs and should get created when a release is made. To create one + +- file a [new bug in Bugzilla](https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS) +- base the contents on previous tracker bugs, like the one for [glusterfs-3.5.5](https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.5.5) +- set the '''Alias''' (it is a text-field) of the bug to 'glusterfs-a.b.c' where a.b.c is the next minor version +- save the new bug +- you should now be able to use the 'glusterfs-a.b.c' to access the bug, use the alias to replace the BZ# in URLs, or '''blocks''' fields +- bugs that were not fixed in this release, but were added to the tracker should be moved to the new tracker + + +Create Release Announcement +--------------------------- + +Create the Release Announcement (this is often done while people are +making the packages). The contents of the release announcement can be +based on the release notes, or should at least have a pointer to them. 
+ +Examples: + +- [blog](http://blog.gluster.org/2014/11/glusterfs-3-5-3beta2-is-now-available-for-testing/) +- [release + notes](https://github.com/gluster/glusterfs/blob/v3.5.3/doc/release-notes/3.5.3.md) + +Send Release Announcement +------------------------- + +Once the Fedora/EL RPMs are ready (and any others that are ready by +then), send the release announcement: + +- Gluster Mailing lists + - gluster-announce, gluster-devel, gluster-users +- Gluster Blog +- Gluster Twitter account +- Gluster Facebook page +- Gluster LinkedIn group - Justin has access +- Gluster G+ + +Close Bugs +---------- + +Close the bugs that have all their patches included in the release. +Leave a note in the bug report with a pointer to the release +announcement. + +Other things to consider +------------------------ + +- Translations? - Are there strings needing translation? diff --git a/doc/developer-guide/Guidelines-For-Maintainers.md b/doc/developer-guide/Guidelines-For-Maintainers.md new file mode 100644 index 00000000000..71612cfe8c7 --- /dev/null +++ b/doc/developer-guide/Guidelines-For-Maintainers.md @@ -0,0 +1,70 @@ +### Guidelines For Maintainers + +GlusterFS has maintainers, sub-maintainers and release maintainers to +manage the project's codebase. Sub-maintainers are the owners for +specific areas/components of the source tree. Maintainers operate across +all components in the source tree. Release maintainers are the owners for +various release branches (release-x.y) present in the GlusterFS +repository. + +In the guidelines below, release maintainers and sub-maintainers are +also implied when there is a reference to maintainers unless it is +explicitly called out. + +### Guidelines that Maintainers are expected to adhere to + +1. Ensure qualitative and timely management of patches sent for review. + +2. For merging patches into the repository, it is expected of +maintainers to: + + a> Merge patches of owned components only.
+ b> Seek approvals from all maintainers before merging a patchset spanning multiple components. + c> Ensure that regression tests pass for all patches before merging. + d> Ensure that regression tests accompany all patch submissions. + e> Ensure that documentation is updated for a noticeable change in user perceivable behavior or design. + f> Encourage code unit tests from patch submitters to improve the overall quality of the codebase. + g> Not merge patches written by themselves until there is a +2 Code Review vote by other reviewers. + +​3. The responsibility of merging a patch into a release branch in +normal circumstances will be that of the release maintainer's. Only in +exceptional situations, maintainers & sub-maintainers will merge patches +into a release branch. + +​4. Release maintainers will ensure approval from appropriate +maintainers before merging a patch into a release branch. + +​5. Maintainers have a responsibility to the community, it is expected +of maintainers to: + + a> Facilitate the community in all aspects. + b> Be very active and visible in the community. + c> Be objective and consider the larger interests of the community  ahead of individual interests. + d> Be receptive to user feedback. + e> Address concerns & issues affecting users. + f> Lead by example. + +### Queries on Guidelines + +Any questions or comments regarding these guidelines can be routed to +gluster-devel at gluster dot org. + +### Patches in Gerrit + +Gerrit can be used to list patches that need reviews and/or can get +merged. 
Some queries have been prepared for this, edit the search box in +Gerrit to make your own variation: + +- [3.5 open reviewed/verified (non + rfc)](http://review.gluster.org/#/q/project:glusterfs+branch:release-3.5+status:open+%28label:Code-Review%253D%252B1+OR+label:Code-Review%253D%252B2+OR+label:Verified%253D%252B1%29+NOT+topic:rfc+NOT+label:Code-Review%253D-2,n,z) +- [All open 3.5 patches (non + rfc)](http://review.gluster.org/#/q/project:glusterfs+branch:release-3.5+status:open+NOT+topic:rfc,n,z) +- [Open NFS (master + branch)](http://review.gluster.org/#/q/project:glusterfs+branch:master+status:open+message:nfs,n,z) + +Another option can be used in combination with the Gerrit queries, and +has support for filename/directory matching (the queries above do not). +Go to the [settings](http://review.gluster.org/#/settings/projects) in +your Gerrit profile, and enter filters like these: + +![gerrit-watched-projects](https://cloud.githubusercontent.com/assets/10970993/7411584/1a26614a-ef57-11e4-99ed-ee96af22a9a1.png) diff --git a/doc/developer-guide/Jenkins-Infrastructure.md b/doc/developer-guide/Jenkins-Infrastructure.md new file mode 100644 index 00000000000..cb5d63ecfb7 --- /dev/null +++ b/doc/developer-guide/Jenkins-Infrastructure.md @@ -0,0 +1,127 @@ +We're using Gerrit and [Jenkins](http://jenkins-ci.org) at the moment. +Our Gerrit instance: + +http://review.gluster.org + +It's hosted on an ancient VM (badly needs upgrading) in some hosting +place called iWeb. We're wanting to migrate this to a Rackspace VM in +the very near future. + +Our main Jenkins instance: + +http://build.gluster.org + +That's also a pretty-out-of-date version of Jenkins, on a badly +outdated VM. That one's in Rackspace at least. We intend on migrating to +a new VM (and new Jenkins) in the not-too-far-future. No ETA yet.
;) + +As well as those two main pieces, we have a bunch of VM's in Rackspace +with various OS's on them: + +http://build.gluster.org/computer/ + +In that list we have: + +- bulk\*.cloud.gluster.org\ + + - Temporary VM's used for running bulk regression tests on, for + analysing our spurious regression failure problem + - Setup and maintained by Justin Clift + +- freebsd0.cloud.gluster.org\ + + - FreeBSD 10.0 VM in Rackspace. Used for automatic smoke testing + on FreeBSD of all proposed patches (uses a Gerrit trigger). + +- g4s-rackspace-\* (apart from gfs-rackspace-f20-1), and + tiny-rackspace-f20-1\ + + - Various VM's in Rackspace with Fedora and EL6 on them, setup by + Luis Pabon. From their description in Jenkins, they're nodes for + "open-stack swift executing functional test suite against + Gluster-for-Swift". + +- gfs-rackspace-f20-1\ + + - A VM in Rackspace for automatically building RPMs on. Setup + + maintained by Luis Pabon. + +- netbsd0.cloud.gluster.org\ + + - NetBSD 6.1.4 VM in Rackspace. Used for automatic smoke testing + on NetBSD 6.x of all proposed patches (uses a Gerrit trigger). + - Setup and maintained by Manu Dreyfus + +- netbsd7.cloud.gluster.org\ + + - NetBSD 7 (beta) VM in Rackspace. Used for automatic smoke + testing on NetBSD 7 of all proposed patches (uses a Gerrit + trigger). + - Setup and maintained by Manu Dreyfus + +- nbslave7\*.cloud.gluster.org\ + + - NetBSD 7 slaves VMs for running our regression tests on + - Setup and maintained by Manu Dreyfus + +- slave20.cloud.gluster.org - slave49.cloud.gluster.org\ + + - CentOS 6.5 VM's in Rackspace. Used for automatic regression + testing of all proposed patches (uses a Gerrit trigger). + - Setup and maintained by Michael Scherer + +Work is being done on the GlusterFS regression tests so they'll function +on FreeBSD and NetBSD (instead of just Linux). When that's complete, +we'll automatically run full regression testing on FreeBSD and NetBSD +for all proposed patches too. 
+ +Non Jenkins VMs +--------------- + +**backups.cloud.gluster.org** + + Server holding our nightly backups. Setup and maintained by Michael + Scherer. + +**bareos-dev.cloud.gluster.org, bareos-data.cloud.gluster.org** + + Shared VMs to debug Bareos and libgfapi integration. Maintained by + Niels de Vos. + +**bugs.cloud.gluster.org** + + Hosting + [gluster-bugs-webui](https://github.com/gluster/gluster-bugs-webui) + for bug triage/checking. Maintained by Niels de Vos. + +**docs.cloud.gluster.org** + + Documentation server, running readTheDocs - managed by Soumya Deb. + +**download.gluster.org** + + Our primary download server - holds the Gluster binaries we + generate, which people can download. + +**gluster-sonar** + + Hosts our Gluster + [SonarQube](http://sonar.peircean.com/dashboard/index/com.peircean.glusterfs:glusterfs-java-filesystem) + instance. Setup and maintained by Louis Zuckerman. + +**salt-master.gluster.org** + + Our Configuration Mgmt master VM. Maintained by Michael Scherer. + +**munin.gluster.org** + + Munin master. Maintained by Michael Scherer. + +**webbuilder.gluster.org** + + Our builder for the website. Maintained by Michael Scherer. + +**www.gluster.org aka supercolony.gluster.org** + + The main website server. Maintained by Michael Scherer, Justin + Clift, Others ( add your name ) diff --git a/doc/developer-guide/Jenkins-Manual-Setup.md b/doc/developer-guide/Jenkins-Manual-Setup.md new file mode 100644 index 00000000000..3622c7265a0 --- /dev/null +++ b/doc/developer-guide/Jenkins-Manual-Setup.md @@ -0,0 +1,146 @@ +Setting up Jenkins slaves on Rackspace for GlusterFS regression testing +======================================================================= + +This is for RHEL/CentOS 6.x. The below commands should be run as root. 
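Since every command below needs root, a setup script wrapping them could start with a guard like the following. This is a minimal sketch, not part of the original procedure; the function name `require_root` and its optional uid argument (present only so the check can be exercised without actually being root) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical guard for a setup script; not part of the original guide.
# Succeeds only when the given uid (default: the current user's) is 0.
require_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "This script must be run as root" >&2
        return 1
    fi
    return 0
}
```

A wrapper script would call `require_root || exit 1` before running any of the steps.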
+ +### Install additional required packages + + yum -y install cmockery2-devel dbench libacl-devel mock nfs-utils yajl perl-Test-Harness salt-minion + +### Enable yum-cron for automatic rpm updates + + chkconfig yum-cron on + +### Add the mock user + + useradd -g mock mock + +### Disable eth1 + +Because GlusterFS can fail if there is more than one ethernet interface + + sed -i 's/ONBOOT=yes/ONBOOT=no/' /etc/sysconfig/network-scripts/ifcfg-eth1 + +### Disable IPv6 + +As per + + sed -i 's/IPV6INIT=yes/IPV6INIT=no/' /etc/sysconfig/network-scripts/ifcfg-eth0 + echo 'options ipv6 disable=1' > /etc/modprobe.d/ipv6.conf + chkconfig ip6tables off + sed -i 's/NETWORKING_IPV6=yes/NETWORKING_IPV6=no/' /etc/sysconfig/network + echo ' ' >> /etc/sysctl.conf + echo '# ipv6 support in the kernel, set to 0 by default' >> /etc/sysctl.conf + echo 'net.ipv6.conf.all.disable_ipv6 = 1' >> /etc/sysctl.conf + echo 'net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf + sed -i 's/v     inet6/-     inet6/' /etc/netconfig + +### Update hostname + + vi /etc/sysconfig/network + vi /etc/hosts + +### Remove IPv6 and eth1 interface from /etc/hosts + + sed -i 's/^10\./#10\./' /etc/hosts + sed -i 's/^2001/#2001/' /etc/hosts + +### Install ntp + + yum -y install ntp + chkconfig ntpdate on + service ntpdate start + +### Install OpenJDK, needed for Jenkins slaves + + yum -y install java-1.7.0-openjdk + +### Create the Jenkins user + + useradd -G wheel jenkins + chmod 755 /home/jenkins + +### Set the Jenkins password + + passwd jenkins + +### Copy the Jenkins SSH key from build.gluster.org + + mkdir /home/jenkins/.ssh + chmod 700 /home/jenkins/.ssh + cp `` /home/jenkins/.ssh/id_rsa + chown -R jenkins:jenkins /home/jenkins/.ssh + chmod 600 /home/jenkins/.ssh/id_rsa + +### Generate the SSH known hosts file for jenkins user + + su - jenkins + mkdir ~/foo + cd ~/foo + git clone ssh://build@review.gluster.org/glusterfs.git + (this will ask if the new host
fingerprint should be added.  Choose yes) + cd .. + rm -rf ~/foo +  exit + +### Install git from RPMForge + + yum -y install http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm + yum -y --enablerepo=rpmforge-extras update git + +### Install the GlusterFS patch acceptance tests + + git clone git://forge.gluster.org/gluster-patch-acceptance-tests/gluster-patch-acceptance-tests.git /opt/qa + +### Add the loopback mount point to /etc/fstab + +For the 1GB Rackspace VM's use this: + + echo '/backingstore           /d                      xfs     loop            0 2' >> /etc/fstab + mount /d + +For the 2GB and above Rackspace VM's use this: + + echo '/dev/xvde   /d   xfs   defaults   0 2' >> /etc/fstab + mount /d + +### Create the directories needed for the regression testing + + JDIRS="/var/log/glusterfs /var/lib/glusterd /var/run/gluster /d /d/archived_builds /d/backends /d/build /d/logs /home/jenkins/root" + mkdir -p $JDIRS + chown jenkins:jenkins $JDIRS + chmod 755 $JDIRS + ln -s /d/build /build + +### Create the directories where regression logs are archived + + ADIRS="/archives/archived_builds /archives/logs" + mkdir -p $ADIRS + chown jenkins:jenkins $ADIRS + chmod 755 $ADIRS + +### Install Nginx + +For making logs available over http + + yum -y install http://nginx.org/packages/centos/6/noarch/RPMS/nginx-release-centos-6-0.el6.ngx.noarch.rpm + yum -y install nginx + lokkit -s http + +### Copy the Nginx config file into place + + cp -f /opt/qa/nginx/default.conf /etc/nginx/conf.d/default.conf + +### Enable wheel group for sudo + + sed -i 's/# %wheel\tALL=(ALL)\tNOPASSWD/%wheel\tALL=(ALL)\tNOPASSWD/' /etc/sudoers + +### Reboot (for networking changes to take effect) + + reboot + +### Add forward and reverse DNS entries for the slave into Rackspace DNS + +Rackspace recently added [API calls for its Cloud +DNS](https://developer.rackspace.com/docs/cloud-dns/getting-started/?lang=python) +service, so we should be able to fully automate 
this part as well now. \ No newline at end of file diff --git a/doc/developer-guide/Language-Bindings.md b/doc/developer-guide/Language-Bindings.md new file mode 100644 index 00000000000..89ef6df3d78 --- /dev/null +++ b/doc/developer-guide/Language-Bindings.md @@ -0,0 +1,39 @@ +GlusterFS 3.4 introduced the libgfapi client API for C programs. This +page lists bindings to the libgfapi C library from other languages. + +Go +-- + +- [gogfapi](https://forge.gluster.org/gogfapi) - Go language bindings + for libgfapi, aiming to provide an api consistent with the default + Go file apis. + +Java +---- + +- [libgfapi-jni](https://github.com/semiosis/libgfapi-jni/) - Low + level JNI binding for libgfapi +- [glusterfs-java-filesystem](https://github.com/semiosis/glusterfs-java-filesystem) + - High level NIO.2 FileSystem Provider implementation for the Java + platform +- [libgfapi-java-io](https://github.com/gluster/libgfapi-java-io) - + Java bindings for libgfapi, similar to java.io + +Python +------ + +- [libgfapi-python](https://github.com/gluster/libgfapi-python) - + Libgfapi bindings for Python + +Ruby +---- + +- [libgfapi-ruby](https://github.com/spajus/libgfapi-ruby) - Libgfapi + bindings for Ruby using FFI + +Rust +---- + +- [gfapi-sys](https://github.com/cholcombe973/Gfapi-sys) - Libgfapi + bindings for Rust using FFI + diff --git a/doc/developer-guide/Projects.md b/doc/developer-guide/Projects.md new file mode 100644 index 00000000000..5c41fef9daf --- /dev/null +++ b/doc/developer-guide/Projects.md @@ -0,0 +1,99 @@ +This page contains a list of project ideas which will be suitable for +students (for GSOC, internship etc.) 
+ +Projects with mentors +--------------------- + +### gfsck - A GlusterFS filesystem check + +- A tool to check filesystem integrity and repair it +- I'm currently working on it +- Owner: Xavier Hernandez (Datalab) + +### Sub-directory mount support for native GlusterFS mounts + +Allow clients to directly mount directories inside a GlusterFS volume, +like how NFS clients can mount directories inside an NFS export. + +Mentor: Kaushal + +### GlusterD services high availability + +GlusterD should restart the processes it manages, bricks, nfs server, +self-heal daemon and quota daemon, whenever it detects they have died. + +Mentor : Atin Mukherjee + +### Language bindings for libgfapi + +- API/library for accessing gluster volumes + +### oVirt gui for stats + +Have pretty graphs and tables in ovirt for the GlusterFS top and profile +commands. + +### Monitoring integrations - munin others + +The more monitoring support we have for GlusterFS the better. + +### More compression algorithms for compression xlator + +The on-wire compression translator should be extended to support more +compression algorithms. Ideally it should be pluggable. + +### Cinder GlusterFS backup driver + +Write a driver for cinder, a part of openstack, to allow backup onto +GlusterFS volumes + +### rsockets - sockets for rdma transport + +Coding for RDMA using the familiar socket api should lead to a more +robust rdma transport + +### Data import tool + +Create a tool which will allow importing already existing data in the +brick directories into the gluster volume. This is most likely going to +be a special rebalance process. + +### Rebalance improvements + +Improve rebalance performance. + +### Meta translator + +The meta xlator provides a /proc like interface to GlusterFS xlators. +This could be improved upon and the meta xlator could be made a standard +part of the volume graph. + +### Geo-replication using rest-api + +Might be suitable for geo replication over WAN.
+ +### Quota using underlying FS' quota + +GlusterFS quota is currently maintained completely in GlusterFS's +namespace using xattrs. We could make use of the quota capabilities of +the underlying fs (XFS) for better performance. + +### Snapshot pluggability + +Snapshot should be able to make use of snapshot support provided by +btrfs for example. + +### Compression at rest + +Lessons learnt while implementing encryption at rest can be used with +the compression at rest. + +### File-level deduplication + +GlusterFS works on files. So why not have dedup at the level of files as +well. + +### Composition xlator for small files + +Merge small files into a designated large file using our own custom +semantics. This can improve our small file performance. \ No newline at end of file diff --git a/doc/developer-guide/Simplified-Development-Workflow.md b/doc/developer-guide/Simplified-Development-Workflow.md new file mode 100644 index 00000000000..c95e3ba4f67 --- /dev/null +++ b/doc/developer-guide/Simplified-Development-Workflow.md @@ -0,0 +1,238 @@ +Simplified development workflow for GlusterFS +============================================= + +This page gives a simplified model of the development workflow used by +the GlusterFS project. This will give the steps required to get a patch +accepted into the GlusterFS source. + +Visit [Development Work Flow](./Development Workflow.md) for a more +detailed description of the workflow. + +Initial preparation +------------------- + +The GlusterFS development workflow revolves around +[Git](http://git.gluster.org/?p=glusterfs.git;a=summary), +[Gerrit](http://review.gluster.org) and +[Jenkins](http://build.gluster.org). + +Using these tools requires some initial preparation. + +### Dev system setup + +You should install and set up Git on your development system. Use your +distribution specific package manager to install git. After installation +configure git. At the minimum, set a git user email.
To set the email +do, + + $ git config --global user.name "Name" + $ git config --global user.email  + +You should also generate an ssh key pair if you haven't already done it. +To generate a key pair do, + + $ ssh-keygen + +and follow the instructions. + +Next, install the build requirements for GlusterFS. Refer +[Building GlusterFS - Build Requirements](./Building GlusterFS.md#Build Requirements) +for the actual requirements. + +### Gerrit setup + +To contribute to GlusterFS, you should first register on +[gerrit](http://review.gluster.org). + +After registration, you will need to select a username, set a preferred +email and upload the ssh public key in gerrit. You can do this from the +gerrit settings page. Make sure that you set the preferred email to the +email you configured for git. + +### Get the source + +Git clone the GlusterFS source using + + @review.gluster.org/glusterfs.git + +(replace with your gerrit username). + + $ git clone (ssh://) @review.gluster.org/glusterfs.git + +This will clone the GlusterFS source into a subdirectory named glusterfs +with the master branch checked out. + +It is essential that you use this link to clone, or else you will not be +able to submit patches to gerrit for review. + +Actual development +------------------ + +The commands in this section are to be run inside the glusterfs source +directory. + +### Create a development branch + +It is recommended to use separate local development branches for each +change you want to contribute to GlusterFS. To create a development +branch, first checkout the upstream branch you want to work on and +update it. More details on the upstream branching model for GlusterFS +can be found at + +[Development Work Flow - Branching\_policy](./Development Workflow.md#branching-policy). +For example if you want to develop on the master branch, + + $ git checkout master + $ git pull + +Now, create a new branch from master and switch to the new branch. 
It is +recommended to have descriptive branch names. Do, + + $ git branch  + $ git checkout  + +or, + + $ git checkout -b  + +to do both in one command. + +### Hack + +Once you've switched to the development branch, you can perform the +actual code changes. [Build](./Building GlusterFS) and test to +see if your changes work. + +#### Tests + +Unless your changes are very minor and trivial, you should also add a +test for your change. Tests are used to ensure that the changes you did +are not broken inadvertently. More details on tests can be found at + +[Development Workflow - Test cases](./Development Workflow.md#test-cases) +and +[Development Workflow - Regression tests and test cases](./Development Workflow.md#regression-tests-and-test-cases) + +### Regression test + +Once your change is working, run the regression test suite to make sure +you haven't broken anything. The regression test suite requires a +working GlusterFS installation and needs to be run as root. To run the +regression test suite, do + + # make install + # ./run-tests.sh + +### Commit your changes + +If you haven't broken anything, you can now commit your changes. First +identify the files that you modified/added/deleted using git-status and +stage these files. + + $ git status + $ git add  + +Now, commit these changes using + + $ git commit -s + +Provide a meaningful commit message. The commit message policy is +described at + +[Development Work Flow - Commit policy](./Development Workflow.md#commit-policy). + +It is essential that you commit with the '-s' option, which will +sign-off the commit with your configured email, as gerrit is configured +to reject patches which are not signed-off. + +### Submit for review + +To submit your change for review, run the rfc.sh script, + + $ ./rfc.sh + +The script will ask you to enter a bugzilla bug id. Every change +submitted to GlusterFS needs a bugzilla entry to be accepted. 
If you do +not already have a bug id, file a new bug at [Red Hat +Bugzilla](https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS). +If the patch is submitted for review, the rfc.sh script will return the +gerrit url for the review request. + +More details on the rfc.sh script are available at +[Development Work Flow - rfc.sh](./Development Workflow.md#rfc.sh). + +Review process +-------------- + +Your change will now be reviewed by the GlusterFS maintainers and +component owners on [gerrit](http://review.gluster.org). You can follow +and take part in the review process on the change at the review url. The +review process involves several steps. + +To know component owners , you can check the "MAINTAINERS" file in root +of glusterfs code directory + +### Automated verification + +Every change submitted to gerrit triggers an initial automated +verification on [jenkins](http://build.gluster.org). The automated +verification ensures that your change doesn't break the build and has an +associated bug-id. + +More details can be found at + +[Development Work Flow - Auto verification](./Development Workflow.md#auto-verification). + +### Formal review + +Once the auto verification is successful, the component owners will +perform a formal review. If they are okay with your change, they will +give a positive review. If not they will give a negative review and add +comments on the reasons. + +More information regarding the review qualifiers and disqualifiers is +available at + +[Development Work Flow - Submission Qualifiers](./Development Workflow.md#submission-qualifiers) +and +[Development Work Flow - Submission Disqualifiers](./Development Workflow.md#submission-disqualifiers). + +If your change gets a negative review, you will need to address the +comments and resubmit your change. + +#### Resubmission + +Switch to your development branch and make new changes to address the +review comments. Build and test to see if the new changes are working. 
+ +Stage your changes and amend your original commit using, + + $ git commit --amend + +'--amend' is required to ensure that you update your original commit and +do not create a new commit. + +Now you can resubmit the updated commit for review using the rfc.sh +script. + +The formal review process could take a long time. To increase chances +for a speedy review, you can add the component owners as reviewers on +the gerrit review page. This will ensure they notice the change. The +list of component owners can be found in the MAINTAINERS file present in +the GlusterFS source. + +### Verification + +After a component owner has given a positive review, a maintainer will +run the regression test suite on your change to verify that your change +works and hasn't broken anything. This verification is done with the +help of jenkins. + +If the verification fails, you will need to make necessary changes and +resubmit an updated commit for review. + +### Acceptance + +After successful verification, a maintainer will merge/cherry-pick (as +necessary) your change into the upstream GlusterFS source. Your change +will now be available in the upstream git repo for everyone to use. \ No newline at end of file diff --git a/doc/developer-guide/Using-Gluster-Test-Framework.md b/doc/developer-guide/Using-Gluster-Test-Framework.md new file mode 100644 index 00000000000..5256e973fbc --- /dev/null +++ b/doc/developer-guide/Using-Gluster-Test-Framework.md @@ -0,0 +1,270 @@ +Description +----------- + +The Gluster Test Framework is a suite of scripts used for regression +testing of Gluster. + +It runs well on RHEL and CentOS (possibly Fedora too, presently being +tested), and is automatically run against every patch submitted to +Gluster [for review](http://review.gluster.org).
+ +The Gluster Test Framework is part of the main Gluster code base, living +under the "tests" subdirectory: + + http://git.gluster.org/?p=glusterfs.git;a=summary + +WARNING +------- + +Running the Gluster Test Framework deletes “/var/lib/glusterd/\*”. + +**DO NOT run it on a server with any data.** + +Preparation steps for Ubuntu 14.04 LTS +-------------------------------------- + +​1. \# apt-get install dbench git libacl1-dev mock nfs-common +nfs-kernel-server libtest-harness-perl libyajl-dev xfsprogs psmisc attr +acl lvm2 rpm + +​2. \# apt-get install python-webob python-paste python-sphinx + +​3. \# apt-get install autoconf automake bison dos2unix flex libfuse-dev +libaio-dev libibverbs-dev librdmacm-dev libtool libxml2-dev +libxml2-utils liblvm2-dev make libssl-dev pkg-config libpython-dev +python-eventlet python-netifaces python-simplejson python-pyxattr +libreadline-dev systemtap-sdt-dev tar + +​4) Install cmockery2 from github (https://github.com/lpabon/cmockery2) +and compile and make install as in Readme + +5) + + sudo groupadd mock + sudo useradd -g mock mock + +​6) mkdir /var/run/gluster + +**Note**: redhat-rpm-config package is not found in ubuntu + +Preparation steps for CentOS 7 (only) +------------------------------------- + +​1. Install EPEL: + + $ sudo yum install -y http://epel.mirror.net.in/epel/7/x86_64/e/epel-release-7-1.noarch.rpm + +​2. Install the CentOS 7.x dependencies: + + $ sudo yum install -y --enablerepo=epel cmockery2-devel dbench git libacl-devel mock nfs-utils perl-Test-Harness yajl xfsprogs psmisc + + $ sudo yum install -y --enablerepo=epel python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + +==\> Despite below missing packages it worked for me + + No package python-webob1.0 available. + No package python-paste-deploy1.5 available. + No package python-sphinx10 available. 
+ + $ sudo yum install -y --enablerepo=epel autoconf automake bison dos2unix flex fuse-devel libaio-devel libibverbs-devel \ +  librdmacm-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig \ +  python-devel python-eventlet python-netifaces python-paste-deploy \ +  python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build \ +  systemtap-sdt-devel tar + +​3. Create the mock user + + $ sudo useradd -g mock mock + +Preparation steps for CentOS 6.3+ (only) +---------------------------------------- + +​1. Install EPEL: + + $ sudo yum install -y http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm + +​2. Install the CentOS 6.x dependencies: + + $ sudo yum install -y --enablerepo=epel cmockery2-devel dbench git libacl-devel mock nfs-utils perl-Test-Harness yajl xfsprogs + $ sudo yum install -y --enablerepo=epel python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + $ sudo yum install -y --enablerepo=epel autoconf automake bison dos2unix flex fuse-devel libaio-devel libibverbs-devel \ +   librdmacm-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig \ +   python-devel python-eventlet python-netifaces python-paste-deploy \ +   python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build \ +   systemtap-sdt-devel tar + +​3. Create the mock user + + $ sudo useradd -g mock mock + +Preparation steps for RHEL 6.3+ (only) +-------------------------------------- + +​1. Ensure you have the "Scalable Filesystem Support" group installed + +This provides the xfsprogs package, which is required by the test +framework. + +​2. Install EPEL: + + $ sudo yum install -y http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm + +​3. 
Install the CentOS 6.x dependencies: + + $ sudo yum install -y --enablerepo=epel cmockery2-devel dbench git libacl-devel mock nfs-utils yajl perl-Test-Harness + $ sudo yum install -y --enablerepo=rhel-6-server-optional-rpms python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + $ sudo yum install -y --disablerepo=rhs* --enablerepo=*optional-rpms autoconf \ +   automake bison dos2unix flex fuse-devel libaio-devel libibverbs-devel \ +   librdmacm-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig \ +   python-devel python-eventlet python-netifaces python-paste-deploy \ +   python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build \ +   systemtap-sdt-devel tar + +​4. Create the mock user + + $ sudo useradd -g mock mock + +Preparation steps for Fedora 16-19 (only) +----------------------------------------- + +**Still in development** + +​1. Install the Fedora dependencies: + + $ sudo yum install -y attr cmockery2-devel dbench git mock nfs-utils perl-Test-Harness psmisc xfsprogs + $ sudo yum install -y python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + $ sudo yum install -y autoconf automake bison dos2unix flex fuse-devel libaio-devel libibverbs-devel \ +   librdmacm-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig \ +   python-devel python-eventlet python-netifaces python-paste-deploy \ +   python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build \ +   systemtap-sdt-devel tar + +​3. Create the mock user + + $ sudo useradd -g mock mock + +Common steps +------------ + +​1. Ensure DNS for your server is working + +The Gluster Test Framework fails miserably if the full domain name for +your server doesn't resolve back to itself. + +If you don't have a working DNS infrastructure in place, adding an entry +for your server to its /etc/hosts file will work. + +​2. 
Install the version of Gluster you are testing + +Either install an existing set of rpms: + + $ sudo yum install [your gluster rpms here] + +Or compile your own (fairly easy): + + http://www.gluster.org/community/documentation/index.php/CompilingRPMS + +3. Clone the GlusterFS git repository + + $ git clone git://git.gluster.org/glusterfs + $ cd glusterfs + +Ensure mock can access the directory +------------------------------------ + +Some tests run as the user "mock". If the mock user can't access the +tests subdirectory, these tests fail. (rpm.t is one such test) + +This is a known gotcha when the git repo is cloned to your home +directory. Home directories generally don't have world-readable +permissions. You can fix this by adjusting your home directory +permissions, or placing the git repo somewhere else (with access for the +mock user). + +Running the tests +----------------- + +The tests need to run as root, so they can mount volumes and manage +gluster processes as needed. + +It's also best to run them directly as the root user, instead of through +sudo. Strange things sporadically happen (for me) when using the full test +framework through sudo, that haven't happened (yet) when running +directly as root. In particular, dbench, which is part of at +least one test, hangs. + + # ./run-tests.sh + +The test framework takes just over 45 minutes to run in a VM here (4 +CPUs assigned, 8GB RAM, SSD storage). It may take significantly more or +less time for you, depending on the hardware and software you're using. + +Showing debug information +------------------------- + +To display verbose information while the tests are running, set the +DEBUG environment variable to 1 prior to running the tests. + + # DEBUG=1 ./run-tests.sh + +Log files +--------- + +Verbose output from the rpm.t test goes into "rpmbuild-mock.log", +located in the same directory the test is run from.
+ +Reporting bugs +-------------- + +If you hit a bug when running the test framework, **please** create a +bug report for it on Bugzilla so it gets fixed: + + https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=tests + +Creating your own tests +----------------------- + +The test scripts are written in bash, with their filenames ending in .t +instead of .sh. + +When creating your own test scripts, create them in an appropriate +subdirectory under "tests" (eg "bugs" or "features") and use descriptive +names like "bug-XXXXXXX-checking-feature-X.t" + +Also include the "include.rc" file, which defines the test types and +host/brick/volume defaults: + + . $(dirname $0)/../include.rc + +There are 5 test types available at present, but feel free to add more +if you need something that doesn't yet exist. The test types are +explained in more detail below. + +Also essential is the "cleanup" command, which removes any existing +Gluster configuration (**without backing it up**), and also kills any +running gluster processes. + +There is a basic test template you can copy, named bug-000000.t in the +bugs subdirectory: + + $ cp bugs/bug-000000.t somedir/descriptive-name.t + +### TEST + +- Example of usage in basic/volume.t + +### TEST\_IN\_LOOP + +- Example of usage in basic/rpm.t + +### EXPECT + +- Example of usage in basic/volume.t + +### EXPECT\_WITHIN + +- Example of usage in basic/volume-status.t + +### EXPECT\_KEYWORD + +- Defined in include.rc, but seems to be unused? 
\ No newline at end of file diff --git a/doc/developer-guide/afr-locks-evolution.md b/doc/developer-guide/afr-locks-evolution.md new file mode 100644 index 00000000000..7d2a136d871 --- /dev/null +++ b/doc/developer-guide/afr-locks-evolution.md @@ -0,0 +1,91 @@ +History of locking in AFR +-------------------------- + +GlusterFS has **locks** translator which provides the following internal locking operations called `inodelk`, `entrylk` which are used by afr to achieve synchronization of operations on files or directories that conflict with each other. + +`Inodelk` gives the facility for translators in GlusterFS to obtain range (denoted by tuple with **offset**, **length**) locks in a given **domain** for an inode. +Full file lock is denoted by the tuple (offset: `0`, length: `0`) i.e. length `0` is considered as infinity. + +`Entrylk` enables translators of GlusterFS to obtain locks on `name` in a given **domain** for an inode, typically a directory. + +The **locks** translator provides both *blocking* and *nonblocking* variants and of these locks. + + +AFR makes use of locks xlator extensively: + +1)For FOPS (from clients) +----------------------- +* Data transactions take inode locks on data domain, Let's refer to this domain name as DATA_DOMAIN. + + So locking for writes would be something like this:`inodelk(offset,length, DATA_DOMAIN)` + For truncating a file to zero, it would be `inodelk(0,0,DATA_DOMAIN)` + +* Metadata transactions (chown/chmod) also take inode locks but on a special range on metadata domain, + i.e.`(LLONG_MAX-1 , 0, METADATA_DOMAIN).` + +* Entry transactions (create, mkdir, rmdir,unlink, symlink, link,rename) take entrylk on `(name, parent inode)`. + + +2)For self heal: +------------- +* For Metadata self-heal, it is the same. i.e.`inodelk(LLONG_MAX-1 , 0, METADATA_DOMAIN)`. +* For Entry self-heal, it is `entrylk(NULL name, parent inode)`. Specifying NULL for the name takes full lock on the directory referred to by the inode. 
+* For data self-heal, there is a bit of history as to how locks evolved: + +###Initial version (say version 1) : +There was no concept of selfheal daemon (shd). Only client lookups triggered heals. so AFR always took `inodelk(0,0,DATA_DOMAIN)` for healing. The issue with this approach was that when heal was in progress, I/O from clients was blocked . + +###version 2: +shd was introduced. We needed to allow I/O to go through when heal was going,provided the ranges did not overlap. To that extent, the following approach was adopted: + ++ 1.shd takes (full inodelk in DATA_DOMAIN). Thus client FOPS are blocked and cannot modify changelog-xattrs ++ 2.shd inspects xattrs to determine source/sink ++ 3.shd takes a chunk inodelk(0-128kb) again in DATA_DOMAIN (locks xlator allows overlapping locks if lock owner is the same). ++ 4.unlock full lock ++ 5.heal ++ 6.take next chunk lock(129-256kb) ++ 7.unlock 1st chunk lock, heal the second chunk and so on. + + +Thus after 4, any client FOP could write to regions that was not currently under heal. The exception was truncate (to size 0) because it needs full file lock and will always block because some chunk is always under lock by the shd until heal completes. + +Another issue was that 2 shds could run in parallel. Say SHD1 and SHD2 compete for step 1. Let SHD1 win. It proceeds and completes step 4. Now SHD2 also succeeds in step 1, continues all steps. Thus at the end both shds will decrement the changelog leading to negative values in it) + +### version 3 +To prevent parallel self heals, another domain was introduced, let us call it SELF_HEAL_DOMAIN. 
With this domain, the following approach was adopted and is **the approach currently in use**: + ++ 1.shd takes (full inodelk on SELF_HEAL_DOMAIN) ++ 2.shd takes (full inodelk on DATA_DOMAIN) ++ 3.shd inspects xattrs to determine source/sink ++ 4.unlock full lock on DATA_DOMAIN ++ 5.take chunk lock(0-128kb) on DATA_DOMAIN ++ 6.heal ++ 7.take next chunk lock(129-256kb) on DATA_DOMAIN ++ 8.unlock 1st chunk lock, heal and so on. ++ 9.Finally release full lock on SELF_HEAL_DOMAIN + +Thus until one shd completes step 9, another shd cannot start step 1, solving the problem of simultaneous heals. +Note that the issue of truncate (to zero) FOP hanging still remains. +Also there are multiple network calls involved in this scheme. (lock,heal(ie read+write), unlock) per chunk. i.e 4 calls per chunk. + +### version 4 (ToDo) +Some improvements that need to be made in version 3: +* Reduce network calls using piggy backing. +* After taking chunk lock and healing, we need to unlock the lock before locking the next chunk. This gives a window for any pending truncate FOPs to succeed. If truncate succeeds, the heal of next chunk will fail (read returns zero) +and heal is stopped. *BUT* there is **yet another** issue: + +* shd does steps 1 to 4. Let's assume source is brick b1, sink is brick b2 . i.e xattrs are (0,1) and (0,0) on b1 and b2 respectively. Now before shd takes (0-128kb) lock, a client FOP takes it. +It modifies data but the FOP succeeds only on brick 2. writev returns success, and the attrs now read (0,1) (1,0). SHD takes over and heals. It had observed (0,1),(0,0) earlier +and thus goes ahead and copies stale 128Kb from brick 1 to brick2. Thus as far as application is concerned, `writev` returned success but bricks have stale data. +What needs to be done is `writev` must return success only if it succeeded on atleast one source brick (brick b1 in this case). 
Otherwise The heal still happens in reverse direction but as far as the application is concerned, it received an error. + +###Note on lock **domains** +We have used conceptual names in this document like DATA_DOMAIN/ METADATA_DOMAIN/ SELF_HEAL_DOMAIN. In the code, these are mapped to strings that are based on the AFR xlator name like so: + +DATA_DOMAIN --->"vol_name-replicate-n" + +METADATA_DOMAIN --->"vol_name-replicate-n:metadata" + +SELF_HEAL_DOMAIN -->"vol_name-replicate-n:self-heal" + +where vol_name is the name of the volume and 'n' is the replica subvolume index (starting from 0). diff --git a/doc/developer-guide/afr-self-heal-daemon.md b/doc/developer-guide/afr-self-heal-daemon.md new file mode 100644 index 00000000000..b85ddd1c856 --- /dev/null +++ b/doc/developer-guide/afr-self-heal-daemon.md @@ -0,0 +1,92 @@ +Self-Heal Daemon +================ +The self-heal daemon (shd) is a glusterfs process that is responsible for healing files in a replicate/ disperse gluster volume. +Every server (brick) node of the volume runs one instance of the shd. So even if one node contains replicate/ disperse bricks of +multiple volumes, it would be healed by the same shd. + +This document only describes how the shd works for replicate (AFR) volumes. + +The shd is launched by glusterd when the volume starts (only if the volume includes a replicate configuration). The graph +of the shd process in every node contains the following: The io-stats which is the top most xlator, its children being the +replicate xlators (subvolumes) of *only* the bricks present in that particular node, and finally *all* the client xlators that are the children of the replicate xlators. + +The shd does two types of self-heal crawls: Index heal and Full heal. For both these types of crawls, the basic idea is the same: +For each file encountered while crawling, perform metadata, data and entry heals under appropriate locks. 
+* An overview of how each of these heals is performed is detailed in the 'Self-healing' section of *doc/features/afr-v1.md* +* The different file locks which the shd takes for each of these heals is detailed in *doc/developer +-guide/afr-locks-evolution.md* + +Metadata heal refers to healing extended attributes, mode and permissions of a file or directory. +Data heal refers to healing the file contents. +Entry self-heal refers to healing entries inside a directory. + +Index heal +========== +The index heal is done: + a) Every 600 seconds (can be changed via the `cluster.heal-timeout` volume option) + b) When it is explicitly triggered via the `gluster vol heal ` command + c) Whenever a replica brick that was down comes back up. + +Only one heal can be in progress at one time, irrespective of reason why it was triggered. If another heal is triggered before the first one completes, it will be queued. +Only one heal can be queued while the first one is running. If an Index heal is queued, it can be overridden by queuing a Full heal and not vice-versa. Also, before processing +each entry in index heal, a check is made if a full heal is queued. If it is, then the index heal is aborted so that the full heal can proceed. + +In index heal, each shd reads the entries present inside .glusterfs/indices/xattrop/ folder and triggers heal on each entry with appropriate locks. +The .glusterfs/indices/xattrop/ directory contains a base entry of the name "xattrop-". All other entries are hardlinks to the base entry. The +*names* of the hardlinks are the gfid strings of the files that may need heal. + +When a client (mount) performs an operation on the file, the index xlator present in each brick process adds the hardlinks in the pre-op phase of the FOP's transaction +and removes it in post-op phase if the operation is successful. 
Thus if an entry is present inside the .glusterfs/indices/xattrop/ directory when there is no I/O +happening on the file, it means the file needs healing (or at least an examination, if the brick crashed after the post-op completed but just before the removal of the hardlink). + +####Index heal steps: +

+In shd process of *each node* {
+        opendir + readdir (.glusterfs/indices/xattrop/)
+        for each entry inside it {
+                self_heal_entry() //Explained below.
+        }
+}
+
+ +

+self_heal_entry() {
+                Call syncop_lookup(replicate subvolume) which eventually does {
+                        take appropriate locks
+                        determine source and sinks from AFR changelog xattrs
+                perform whatever heal is needed (any of metadata, data and entry heal in that order)
+                clear changelog xattrs and hardlink inside .glusterfs/indices/xattrop/
+        }
+}
+
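The two pseudocode blocks above can be sketched in Python as a rough illustration. This is not the shd implementation: the `needed_by_gfid` mapping is a hypothetical stand-in for what the AFR changelog xattrs would report, plain files stand in for the gfid hardlinks, and locking and the actual data copy are left out entirely.

```python
import os
import tempfile

# Heal order stated in the pseudocode above: metadata, then data, then entry.
HEAL_ORDER = ("metadata", "data", "entry")


def self_heal_entry(index_dir, gfid, needed):
    """Heal one index entry and return the heal types performed, in order.

    `needed` is a hypothetical stand-in for what the AFR changelog xattrs
    would tell the shd; locking and the actual copy are omitted here.
    """
    performed = [kind for kind in HEAL_ORDER if kind in needed]
    # After a successful heal the index entry is removed.
    os.unlink(os.path.join(index_dir, gfid))
    return performed


def index_heal(index_dir, needed_by_gfid):
    """Crawl the index directory and heal every gfid entry found in it."""
    healed = {}
    for name in sorted(os.listdir(index_dir)):
        if name.startswith("xattrop-"):
            continue  # base entry that the gfid hardlinks point to
        healed[name] = self_heal_entry(index_dir, name, needed_by_gfid.get(name, ()))
    return healed


def demo():
    """Build a throwaway index directory and run one crawl over it."""
    with tempfile.TemporaryDirectory() as index_dir:
        # Plain files stand in for the hardlinks a brick would create.
        for name in ("xattrop-base", "gfid-a", "gfid-b"):
            open(os.path.join(index_dir, name), "w").close()
        needed = {"gfid-a": {"data", "metadata"}, "gfid-b": {"entry"}}
        healed = index_heal(index_dir, needed)
        return healed, sorted(os.listdir(index_dir))
```

Calling `demo()` runs one crawl over a temporary directory. After the crawl only the base entry should remain, which mirrors the rule that an index entry left behind with no I/O in flight means the file still needs attention.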
+ +Note: +* If the gfid hardlink is present in the .glusterfs/indices/xattrop/ of both replica bricks, then each shd will try to heal the file but only one of them will be able to proceed due to the self-heal domain lock. + +* While processing entries inside .glusterfs/indices/xattrop/, if shd encounters an entry whose parent is yet to be healed, it will skip it and it will be picked up in the next crawl. + +* If a file is in data/metadata split-brain, it will not be healed. + +* If a directory is in entry split-brain, a conservative merge will be performed, wherein after the merge, the entries of the directory will be a union of the entries in the replica pairs. + +Full heal +========= +A full heal is triggered by running `gluster vol heal full`. This command is usually run in disk replacement scenarios where the entire data is to be copied from one of the healthy bricks of the replica to the brick that was just replaced. + +Unlike the index heal which runs on the shd of every node in a replicate subvolume, the full heal is run only on the shd of one node per replicate subvolume: the node having the highest UUID. +For example, in a 2x2 volume made of 4 nodes N1, N2, N3 and N4, if the UUID of N1 > N2 and the UUID of N4 > N3, then the full crawl is carried out by the shds of N1 and N4. (The node UUID can be found in `/var/lib/glusterd/glusterd.info`.) + +The full heal steps are almost identical to the index heal, except the heal is performed on each replica starting from the root of the volume: +

+In shd process of *highest UUID node per replica* {
+        opendir + readdir ("/")
+        for each entry inside it {
+                self_heal_entry()
+                if (entry == directory) {
+                        /* Recurse */
+                        again opendir + readdir (directory) followed by self_heal_entry() of each entry.
+                }
+        }
+}
+
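As a rough Python sketch of the full heal described above, under stated assumptions: comparing UUIDs by their integer value is only an illustration (how glusterd actually orders UUIDs is not specified here), the `heal` callback is a hypothetical placeholder for `self_heal_entry()`, and a local directory tree stands in for the volume root.

```python
import os
import tempfile
import uuid


def full_heal_crawlers(replica_sets):
    """Return, for each replica set, the name of the node with the highest UUID.

    Each replica set is a list of (node_name, uuid_string) pairs. Ordering
    UUIDs by integer value is an assumption made for this sketch.
    """
    return [max(nodes, key=lambda node: uuid.UUID(node[1]).int)[0]
            for nodes in replica_sets]


def full_heal(directory, heal):
    """Heal every entry under `directory`, descending into subdirectories."""
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        heal(entry.path)
        if entry.is_dir(follow_symlinks=False):
            full_heal(entry.path, heal)  # recurse, as in the pseudocode


def demo():
    """Crawl a small temporary tree and return the visit order."""
    visited = []
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a"))
        for rel in (os.path.join("a", "f1"), "b"):
            open(os.path.join(root, rel), "w").close()
        full_heal(root, lambda path: visited.append(os.path.relpath(path, root)))
    return visited
```

The crawl heals a directory before descending into it, which matches the index heal note that an entry whose parent is not yet healed is skipped.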
diff --git a/doc/developer-guide/afr.md b/doc/developer-guide/afr.md new file mode 100644 index 00000000000..566573a4e26 --- /dev/null +++ b/doc/developer-guide/afr.md @@ -0,0 +1,191 @@ +cluster/afr translator +====================== + +Locking +------- + +Before understanding replicate, one must understand two internal FOPs: + +### `GF_FILE_LK` + +This is exactly like `fcntl(2)` locking, except the locks are in a +separate domain from locks held by applications. + +### `GF_DIR_LK (loc_t *loc, char *basename)` + +This allows one to lock a name under a directory. For example, +to lock /mnt/glusterfs/foo, one would use the call: + +``` +GF_DIR_LK ({loc_t for "/mnt/glusterfs"}, "foo") +``` + +If one wishes to lock *all* the names under a particular directory, +supply the basename argument as `NULL`. + +The locks can either be read locks or write locks; consult the +function prototype for more details. + +Both these operations are implemented by the features/locks (earlier +known as posix-locks) translator. + +Basic design +------------ + +All FOPs can be classified into four major groups: + +### inode-read + +Operations that read an inode's data (file contents) or metadata (perms, etc.). + +access, getxattr, fstat, readlink, readv, stat. + +### inode-write + +Operations that modify an inode's data or metadata. + +chmod, chown, truncate, writev, utimens. + +### dir-read + +Operations that read a directory's contents or metadata. + +readdir, getdents, checksum. + +### dir-write + +Operations that modify a directory's contents or metadata. + +create, link, mkdir, mknod, rename, rmdir, symlink, unlink. + +Some of these make a subgroup in that they modify *two* different entries: +link, rename, symlink. + +### Others + +Other operations. + +flush, lookup, open, opendir, statfs. + +Algorithms +---------- + +Each of the four major groups has its own algorithm: + +### inode-read, dir-read + +1. 
Send a request to the first child that is up: + * if it fails: + * try the next available child + * if we have exhausted all children: + * return failure + +### inode-write + + All operations are done in parallel unless specified otherwise. + +1. Send a ``GF_FILE_LK`` request on all children for a write lock on the + appropriate region + (for metadata operations: entire file (0, 0) for writev: + (offset, offset+size of buffer)) + * If a lock request fails on a child: + * unlock all children + * try to acquire a blocking lock (`F_SETLKW`) on each child, serially. + If this fails (due to `ENOTCONN` or `EINVAL`): + Consider this child as dead for rest of transaction. +2. Mark all children as "pending" on all (alive) children (see below for +meaning of "pending"). + * If it fails on any child: + * mark it as dead (in transaction local state). +3. Perform operation on all (alive) children. + * If it fails on any child: + * mark it as dead (in transaction local state). +4. Unmark all successful children as not "pending" on all nodes. +5. Unlock region on all (alive) children. + +### dir-write + + The algorithm for dir-write is same as above except instead of holding + `GF_FILE_LK` locks we hold a GF_DIR_LK lock on the name being operated upon. + In case of link-type calls, we hold locks on both the operand names. + +"pending" +--------- + +The "pending" number is like a journal entry. A pending entry is an +array of 32-bit integers stored in network byte-order as the extended +attribute of an inode (which can be a directory as well). + +There are three keys corresponding to three types of pending operations: + +### `AFR_METADATA_PENDING` + +There are some metadata operations pending on this inode (perms, ctime/mtime, +xattr, etc.). + +### `AFR_DATA_PENDING` + +There is some data pending on this inode (writev). + +### `AFR_ENTRY_PENDING` + +There are some directory operations pending on this directory +(create, unlink, etc.). 
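
To make the "array of 32-bit integers stored in network byte-order" concrete, here is a minimal Python sketch of how such a value could be encoded and updated. The function names are hypothetical, and treating the integers as signed with one slot per child is an assumption made for illustration, not something taken from the AFR source.

```python
import struct


def pack_pending(counts):
    """Encode a list of counters as 32-bit integers in network byte order."""
    return struct.pack("!%di" % len(counts), *counts)


def unpack_pending(raw):
    """Decode an xattr value back into a list of counters."""
    count, remainder = divmod(len(raw), 4)
    if remainder:
        raise ValueError("pending value must be a multiple of 4 bytes")
    return list(struct.unpack("!%di" % count, raw))


def adjust_pending(raw, child, delta):
    """Return a new value with the counter for `child` changed by `delta`.

    delta=+1 models marking a child as pending before the operation,
    delta=-1 models unmarking it after the operation succeeded.
    """
    counts = unpack_pending(raw)
    counts[child] += delta
    return pack_pending(counts)
```

For instance, `adjust_pending(pack_pending([0, 0]), 1, +1)` marks the second child as pending, and applying `-1` afterwards returns the value to all zeroes, which is the state a fully successful transaction leaves behind.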
+ +Self heal +--------- + +* On lookup, gather extended attribute data: + * If entry is a regular file: + * If an entry is present on one child and not on others: + * create entry on others. + * If entries exist but have different metadata (perms, etc.): + * consider the entry with the highest `AFR_METADATA_PENDING` number as + definitive and replicate its attributes on children. + * If entry is a directory: + * Consider the entry with the highest `AFR_ENTRY_PENDING` number as + definitive and replicate its contents on all children. + * If any two entries have non-matching types (i.e., one is file and + other is directory): + * Announce to the user via log that a split-brain situation has been + detected, and do nothing. +* On open, gather extended attribute data: + * Consider the file with the highest `AFR_DATA_PENDING` number as + the definitive one and replicate its contents on all other + children. + +During all self heal operations, appropriate locks must be held on all +regions/entries being affected. + +Inode scaling +------------- + +Inode scaling is necessary because if a situation arises where an inode number +is returned for a directory (by lookup) which was previously the inode number +of a file (as per FUSE's table), then FUSE gets horribly confused (consult a +FUSE expert for more details). + +To avoid such a situation, we distribute the 64-bit inode space equally +among all children of replicate. + +To illustrate: + +If c1, c2, c3 are children of replicate, they each get 1/3 of the available +inode space: + +------------- -- -- -- -- -- -- -- -- -- -- -- --- +Child: c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 ... +Inode number: 1 2 3 4 5 6 7 8 9 10 11 ... +------------- -- -- -- -- -- -- -- -- -- -- -- --- + +Thus, if lookup on c1 returns an inode number "2", it is scaled to "4" +(which is the second inode number in c1's space). + +This way we ensure that there is never a collision of inode numbers from +two different children. 
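
The interleaving shown in the table can be written as simple arithmetic. The sketch below is derived only from the table and the worked example above (c1's inode 2 becomes 4); the function names are hypothetical, the child index is assumed to be 0-based (c1 is 0), and the formula actually used in the AFR source may differ.

```c
#include <stdint.h>

/*
 * Map a child's local inode number (starting at 1) into the shared,
 * interleaved inode space. child_index is 0-based: c1 -> 0, c2 -> 1, ...
 * (Hypothetical helper reconstructed from the table, not the AFR code.)
 */
uint64_t
scale_inode (uint64_t ino, uint32_t child_count, uint32_t child_index)
{
        return (ino - 1) * child_count + child_index + 1;
}

/* Recover the 0-based index of the child that owns a scaled inode number. */
uint32_t
scaled_inode_child (uint64_t scaled, uint32_t child_count)
{
        return (uint32_t) ((scaled - 1) % child_count);
}

/* Recover the child's local inode number from a scaled inode number. */
uint64_t
unscale_inode (uint64_t scaled, uint32_t child_count)
{
        return (scaled - 1) / child_count + 1;
}
```

With three children, `scale_inode (2, 3, 0)` gives 4, matching the example, and two different children can never produce the same scaled number because the remainder modulo the child count identifies the child.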
+ +This reduction of inode space doesn't really reduce the usability of +replicate since even if we assume replicate has 1024 children (which would be a +highly unusual scenario), each child still has a 54-bit inode space: +$2^{54} \sim 1.8 \times 10^{16}$, which is much larger than any real +world requirement. diff --git a/doc/developer-guide/afr/afr-locks-evolution.md b/doc/developer-guide/afr/afr-locks-evolution.md deleted file mode 100644 index 7d2a136d871..00000000000 --- a/doc/developer-guide/afr/afr-locks-evolution.md +++ /dev/null @@ -1,91 +0,0 @@ -History of locking in AFR --------------------------- - -GlusterFS has **locks** translator which provides the following internal locking operations called `inodelk`, `entrylk` which are used by afr to achieve synchronization of operations on files or directories that conflict with each other. - -`Inodelk` gives the facility for translators in GlusterFS to obtain range (denoted by tuple with **offset**, **length**) locks in a given **domain** for an inode. -Full file lock is denoted by the tuple (offset: `0`, length: `0`) i.e. length `0` is considered as infinity. - -`Entrylk` enables translators of GlusterFS to obtain locks on `name` in a given **domain** for an inode, typically a directory. - -The **locks** translator provides both *blocking* and *nonblocking* variants and of these locks. - - -AFR makes use of locks xlator extensively: - -1)For FOPS (from clients) ------------------------ -* Data transactions take inode locks on data domain, Let's refer to this domain name as DATA_DOMAIN. - - So locking for writes would be something like this:`inodelk(offset,length, DATA_DOMAIN)` - For truncating a file to zero, it would be `inodelk(0,0,DATA_DOMAIN)` - -* Metadata transactions (chown/chmod) also take inode locks but on a special range on metadata domain, - i.e.`(LLONG_MAX-1 , 0, METADATA_DOMAIN).` - -* Entry transactions (create, mkdir, rmdir,unlink, symlink, link,rename) take entrylk on `(name, parent inode)`. 
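
The range semantics described above (a length of `0` meaning "to the end of the file", and locks in different domains never conflicting) can be sketched as a conflict check. This is an illustration only: `struct lk_range` and `lk_ranges_conflict` are hypothetical names, the domain strings are the conceptual names used in this document, and the sketch ignores lock owners, read versus write locks, and arithmetic overflow near `LLONG_MAX`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One inodelk request: a byte range inside a named lock domain. */
struct lk_range {
        const char *domain; /* conceptual name, e.g. "DATA_DOMAIN" */
        uint64_t    offset;
        uint64_t    length; /* 0 means "until the end of the file" */
};

/*
 * Return true if the two requests would conflict, assuming both are
 * exclusive locks from different owners. (Hypothetical helper.)
 */
bool
lk_ranges_conflict (const struct lk_range *a, const struct lk_range *b)
{
        bool a_ends_before_b;
        bool b_ends_before_a;

        /* Locks in different domains never interact. */
        if (strcmp (a->domain, b->domain) != 0)
                return false;

        /* A zero-length (infinite) range never ends before the other. */
        a_ends_before_b = (a->length != 0) &&
                          (a->offset + a->length <= b->offset);
        b_ends_before_a = (b->length != 0) &&
                          (b->offset + b->length <= a->offset);

        return !(a_ends_before_b || b_ends_before_a);
}
```

Under these assumptions a full-file lock `(0, 0)` conflicts with every ranged lock in the same domain, while two adjacent ranged locks do not conflict with each other.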
- - -2)For self heal: -------------- -* For Metadata self-heal, it is the same. i.e.`inodelk(LLONG_MAX-1 , 0, METADATA_DOMAIN)`. -* For Entry self-heal, it is `entrylk(NULL name, parent inode)`. Specifying NULL for the name takes full lock on the directory referred to by the inode. -* For data self-heal, there is a bit of history as to how locks evolved: - -###Initial version (say version 1) : -There was no concept of selfheal daemon (shd). Only client lookups triggered heals. so AFR always took `inodelk(0,0,DATA_DOMAIN)` for healing. The issue with this approach was that when heal was in progress, I/O from clients was blocked . - -###version 2: -shd was introduced. We needed to allow I/O to go through when heal was going,provided the ranges did not overlap. To that extent, the following approach was adopted: - -+ 1.shd takes (full inodelk in DATA_DOMAIN). Thus client FOPS are blocked and cannot modify changelog-xattrs -+ 2.shd inspects xattrs to determine source/sink -+ 3.shd takes a chunk inodelk(0-128kb) again in DATA_DOMAIN (locks xlator allows overlapping locks if lock owner is the same). -+ 4.unlock full lock -+ 5.heal -+ 6.take next chunk lock(129-256kb) -+ 7.unlock 1st chunk lock, heal the second chunk and so on. - - -Thus after 4, any client FOP could write to regions that was not currently under heal. The exception was truncate (to size 0) because it needs full file lock and will always block because some chunk is always under lock by the shd until heal completes. - -Another issue was that 2 shds could run in parallel. Say SHD1 and SHD2 compete for step 1. Let SHD1 win. It proceeds and completes step 4. Now SHD2 also succeeds in step 1, continues all steps. Thus at the end both shds will decrement the changelog leading to negative values in it) - -### version 3 -To prevent parallel self heals, another domain was introduced, let us call it SELF_HEAL_DOMAIN. 
With this domain, the following approach was adopted and is **the approach currently in use**: - -+ 1.shd takes (full inodelk on SELF_HEAL_DOMAIN) -+ 2.shd takes (full inodelk on DATA_DOMAIN) -+ 3.shd inspects xattrs to determine source/sink -+ 4.unlock full lock on DATA_DOMAIN -+ 5.take chunk lock(0-128kb) on DATA_DOMAIN -+ 6.heal -+ 7.take next chunk lock(129-256kb) on DATA_DOMAIN -+ 8.unlock 1st chunk lock, heal and so on. -+ 9.Finally release full lock on SELF_HEAL_DOMAIN - -Thus until one shd completes step 9, another shd cannot start step 1, solving the problem of simultaneous heals. -Note that the issue of truncate (to zero) FOP hanging still remains. -Also there are multiple network calls involved in this scheme. (lock,heal(ie read+write), unlock) per chunk. i.e 4 calls per chunk. - -### version 4 (ToDo) -Some improvements that need to be made in version 3: -* Reduce network calls using piggy backing. -* After taking chunk lock and healing, we need to unlock the lock before locking the next chunk. This gives a window for any pending truncate FOPs to succeed. If truncate succeeds, the heal of next chunk will fail (read returns zero) -and heal is stopped. *BUT* there is **yet another** issue: - -* shd does steps 1 to 4. Let's assume source is brick b1, sink is brick b2 . i.e xattrs are (0,1) and (0,0) on b1 and b2 respectively. Now before shd takes (0-128kb) lock, a client FOP takes it. -It modifies data but the FOP succeeds only on brick 2. writev returns success, and the attrs now read (0,1) (1,0). SHD takes over and heals. It had observed (0,1),(0,0) earlier -and thus goes ahead and copies stale 128Kb from brick 1 to brick2. Thus as far as application is concerned, `writev` returned success but bricks have stale data. -What needs to be done is `writev` must return success only if it succeeded on atleast one source brick (brick b1 in this case). 
Otherwise The heal still happens in reverse direction but as far as the application is concerned, it received an error. - -###Note on lock **domains** -We have used conceptual names in this document like DATA_DOMAIN/ METADATA_DOMAIN/ SELF_HEAL_DOMAIN. In the code, these are mapped to strings that are based on the AFR xlator name like so: - -DATA_DOMAIN --->"vol_name-replicate-n" - -METADATA_DOMAIN --->"vol_name-replicate-n:metadata" - -SELF_HEAL_DOMAIN -->"vol_name-replicate-n:self-heal" - -where vol_name is the name of the volume and 'n' is the replica subvolume index (starting from 0). diff --git a/doc/developer-guide/afr/afr.md b/doc/developer-guide/afr/afr.md deleted file mode 100644 index 566573a4e26..00000000000 --- a/doc/developer-guide/afr/afr.md +++ /dev/null @@ -1,191 +0,0 @@ -cluster/afr translator -====================== - -Locking -------- - -Before understanding replicate, one must understand two internal FOPs: - -### `GF_FILE_LK` - -This is exactly like `fcntl(2)` locking, except the locks are in a -separate domain from locks held by applications. - -### `GF_DIR_LK (loc_t *loc, char *basename)` - -This allows one to lock a name under a directory. For example, -to lock /mnt/glusterfs/foo, one would use the call: - -``` -GF_DIR_LK ({loc_t for "/mnt/glusterfs"}, "foo") -``` - -If one wishes to lock *all* the names under a particular directory, -supply the basename argument as `NULL`. - -The locks can either be read locks or write locks; consult the -function prototype for more details. - -Both these operations are implemented by the features/locks (earlier -known as posix-locks) translator. - -Basic design ------------- - -All FOPs can be classified into four major groups: - -### inode-read - -Operations that read an inode's data (file contents) or metadata (perms, etc.). - -access, getxattr, fstat, readlink, readv, stat. - -### inode-write - -Operations that modify an inode's data or metadata. - -chmod, chown, truncate, writev, utimens. 
- -### dir-read - -Operations that read a directory's contents or metadata. - -readdir, getdents, checksum. - -### dir-write - -Operations that modify a directory's contents or metadata. - -create, link, mkdir, mknod, rename, rmdir, symlink, unlink. - -Some of these make a subgroup in that they modify *two* different entries: -link, rename, symlink. - -### Others - -Other operations. - -flush, lookup, open, opendir, statfs. - -Algorithms ----------- - -Each of the four major groups has its own algorithm: - -### inode-read, dir-read - -1. Send a request to the first child that is up: - * if it fails: - * try the next available child - * if we have exhausted all children: - * return failure - -### inode-write - - All operations are done in parallel unless specified otherwise. - -1. Send a ``GF_FILE_LK`` request on all children for a write lock on the - appropriate region - (for metadata operations: entire file (0, 0) for writev: - (offset, offset+size of buffer)) - * If a lock request fails on a child: - * unlock all children - * try to acquire a blocking lock (`F_SETLKW`) on each child, serially. - If this fails (due to `ENOTCONN` or `EINVAL`): - Consider this child as dead for rest of transaction. -2. Mark all children as "pending" on all (alive) children (see below for -meaning of "pending"). - * If it fails on any child: - * mark it as dead (in transaction local state). -3. Perform operation on all (alive) children. - * If it fails on any child: - * mark it as dead (in transaction local state). -4. Unmark all successful children as not "pending" on all nodes. -5. Unlock region on all (alive) children. - -### dir-write - - The algorithm for dir-write is same as above except instead of holding - `GF_FILE_LK` locks we hold a GF_DIR_LK lock on the name being operated upon. - In case of link-type calls, we hold locks on both the operand names. - -"pending" ---------- - -The "pending" number is like a journal entry. 
A pending entry is an -array of 32-bit integers stored in network byte-order as the extended -attribute of an inode (which can be a directory as well). - -There are three keys corresponding to three types of pending operations: - -### `AFR_METADATA_PENDING` - -There are some metadata operations pending on this inode (perms, ctime/mtime, -xattr, etc.). - -### `AFR_DATA_PENDING` - -There is some data pending on this inode (writev). - -### `AFR_ENTRY_PENDING` - -There are some directory operations pending on this directory -(create, unlink, etc.). - -Self heal ---------- - -* On lookup, gather extended attribute data: - * If entry is a regular file: - * If an entry is present on one child and not on others: - * create entry on others. - * If entries exist but have different metadata (perms, etc.): - * consider the entry with the highest `AFR_METADATA_PENDING` number as - definitive and replicate its attributes on children. - * If entry is a directory: - * Consider the entry with the highest `AFR_ENTRY_PENDING` number as - definitive and replicate its contents on all children. - * If any two entries have non-matching types (i.e., one is file and - other is directory): - * Announce to the user via log that a split-brain situation has been - detected, and do nothing. -* On open, gather extended attribute data: - * Consider the file with the highest `AFR_DATA_PENDING` number as - the definitive one and replicate its contents on all other - children. - -During all self heal operations, appropriate locks must be held on all -regions/entries being affected. - -Inode scaling -------------- - -Inode scaling is necessary because if a situation arises where an inode number -is returned for a directory (by lookup) which was previously the inode number -of a file (as per FUSE's table), then FUSE gets horribly confused (consult a -FUSE expert for more details). - -To avoid such a situation, we distribute the 64-bit inode space equally -among all children of replicate. 
- -To illustrate: - -If c1, c2, c3 are children of replicate, they each get 1/3 of the available -inode space: - -------------- -- -- -- -- -- -- -- -- -- -- -- --- -Child: c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 ... -Inode number: 1 2 3 4 5 6 7 8 9 10 11 ... -------------- -- -- -- -- -- -- -- -- -- -- -- --- - -Thus, if lookup on c1 returns an inode number "2", it is scaled to "4" -(which is the second inode number in c1's space). - -This way we ensure that there is never a collision of inode numbers from -two different children. - -This reduction of inode space doesn't really reduce the usability of -replicate since even if we assume replicate has 1024 children (which would be a -highly unusual scenario), each child still has a 54-bit inode space: -$2^{54} \sim 1.8 \times 10^{16}$, which is much larger than any real -world requirement. diff --git a/doc/developer-guide/afr/self-heal-daemon.md b/doc/developer-guide/afr/self-heal-daemon.md deleted file mode 100644 index d5e081f5f49..00000000000 --- a/doc/developer-guide/afr/self-heal-daemon.md +++ /dev/null @@ -1,90 +0,0 @@ -Self-Heal Daemon -================ -The self-heal daemon (shd) is a glusterfs process that is responsible for healing files in a replicate/ disperse gluster volume. -Every server (brick) node of the volume runs one instance of the shd. So even if one node contains replicate/ disperse bricks of -multiple volumes, it would be healed by the same shd. - -This document only describes how the shd works for replicate (AFR) volumes. - -The shd is launched by glusterd when the volume starts (only if the volume includes a replicate configuration). The graph -of the shd process in every node contains the following: The io-stats which is the top most xlator, its children being the -replicate xlators (subvolumes) of *only* the bricks present in that particular node, and finally *all* the client xlators that are the children of the replicate xlators. 
- -The shd does two types of self-heal crawls: Index heal and Full heal. For both these types of crawls, the basic idea is the same: -For each file encountered while crawling, perform metadata, data and entry heals under appropriate locks. -* An overview of how each of these heals is performed is detailed in the 'Self-healing' section of *doc/features/afr-v1.md* -* The different file locks which the shd takes for each of these heals is detailed in *doc/developer-guide/afr/afr-locks-evolution.md* - -Metadata heal refers to healing extended attributes, mode and permissions of a file or directory. -Data heal refers to healing the file contents. -Entry self-heal refers to healing entries inside a directory. - -Index heal -========== -The index heal is done: - a) Every 600 seconds (can be changed via the `cluster.heal-timeout` volume option) - b) When it is explicitly triggered via the `gluster vol heal ` command - c) Whenever a replica brick that was down comes back up. - -Only one heal can be in progress at one time, irrespective of reason why it was triggered. If another heal is triggered before the first one completes, it will be queued. -Only one heal can be queued while the first one is running. If an Index heal is queued, it can be overridden by queuing a Full heal and not vice-versa. Also, before processing -each entry in index heal, a check is made if a full heal is queued. If it is, then the index heal is aborted so that the full heal can proceed. - -In index heal, each shd reads the entries present inside .glusterfs/indices/xattrop/ folder and triggers heal on each entry with appropriate locks. -The .glusterfs/indices/xattrop/ directory contains a base entry of the name "xattrop-". All other entries are hardlinks to the base entry. The -*names* of the hardlinks are the gfid strings of the files that may need heal. 
- -When a client (mount) performs an operation on the file, the index xlator present in each brick process adds the hardlinks in the pre-op phase of the FOP's transaction -and removes it in post-op phase if the operation is successful. Thus if an entry is present inside the .glusterfs/indices/xattrop directory when there is no I/O -happening on the file, it means the file needs healing (or atleast an examination if the brick crashed after the post-op completed but just before the removal of the hardlink). - -####Index heal steps: -

-In shd process of *each node* {
-        opendir +readdir (.glusterfs/indices/xattrop/)
-        for each entry inside it {
-                self_heal_entry() //Explained below.
-        }
-}
-
- -

-self_heal_entry() {
-        Call syncop_lookup(replicate subvolume) which eventually does {
-                take appropriate locks
-                determine source and sinks from AFR changelog xattrs	
-                perform whatever heal is needed (any of metadata, data and entry heal in that order)
-                clear changelog xattrs and remove the hardlink inside .glusterfs/indices/xattrop/
-        }
-}
-
- -Note: -* If the gfid hardlink is present in the .glusterfs/indices/xattrop/ of both replica bricks, then each shd will try to heal the file but only one of them will be able to proceed due to the self-heal domain lock. - -* While processing entries inside .glusterfs/indices/xattrop/, if shd encounters an entry whose parent is yet to be healed, it will skip it and it will be picked up in the next crawl. - -* If a file is in data/ metadata split-brain, it will not be healed. - -* If a directory is in entry split-brain, a conservative merge will be performed, wherein after the merge, the entries of the directory will be a union of the entries in the replica pairs. - -Full heal -========= -A full heal is triggered by running `gluster vol heal full`. This command is usually run in disk replacement scenarios where the entire data is to be copied from one of the healthy bricks of the replica to the brick that was just replaced. - -Unlike the index heal which runs on the shd of every node in a replicate subvolume, the full heal is run only on the shd of one node per replicate subvolume: the node having the highest UUID. -i.e In a 2x2 volume made of 4 nodes N1, N2, N3 and N4, If UUID of N1>N2 and UUID N4 >N3, then the full crawl is carried out by the shds of N1 and N4.(Node UUID can be found in `/var/lib/glusterd/glusterd.info`) - -The full heal steps are almost identical to the index heal, except the heal is performed on each replica starting from the root of the volume: -

-In shd process of *highest UUID node per replica* {
-        opendir +readdir ("/")
-        for each entry inside it {
-                self_heal_entry()
-                if (entry == directory) {
-                        /* Recurse*/
-                        again opendir+readdir (directory) followed by self_heal_entry() of each entry.
-                }
-        }
-}
-
diff --git a/doc/developer-guide/coredump-analysis.md b/doc/developer-guide/coredump-analysis.md new file mode 100644 index 00000000000..16fa9165fd0 --- /dev/null +++ b/doc/developer-guide/coredump-analysis.md @@ -0,0 +1,55 @@ +This document explains how to analyze core-dumps obtained from regression +machines, with examples. +1) Download the core-tarball and extract it. +2) 'cd' into directory where the tarball is extracted. +~~~ +[root@atalur Downloads]# pwd +/home/atalur/Downloads +[root@atalur Downloads]# ls +build build-install-20150625_05_42_39.tar.bz2 lib64 usr +~~~ +3) Determine the core file you need to examine. There can be more than one core file. +You can list them from './build/install/cores' directory. +~~~ +[root@atalur Downloads]# ls build/install/cores/ +core.9341 liblist.txt liblist.txt.tmp +~~~ +In case you are unsure which binary generated the core-file, executing 'file' command on it will help. +~~~ +[root@atalur Downloads]# file ./build/install/cores/core.9341 +./build/install/cores/core.9341: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/build/install/sbin/glusterfsd -s slave26.cloud.gluster.org --volfile-id patchy' +~~~ +As seen, the core file was generated by glusterfsd binary, and path to it is provided (/build/install/sbin/glusterfsd). +4) Now, run the following command on the core: +~~~ +gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.xxx' +In this case, +gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.9341' ./build/install/sbin/glusterfsd +~~~ +5) You can cross check if all shared libraries are available and loaded by using 'info sharedlibrary' command from +inside gdb. +6) Once verified, usual gdb commands based on requirement can be used to debug the core. +'bt' or 'backtrace' from gdb of core used in examples: +~~~ +Core was generated by `/build/install/sbin/glusterfsd -s slave26.cloud.gluster.org --volfile-id patchy'. +Program terminated with signal SIGABRT, Aborted. 
+#0 0x00007f512a54e625 in raise () from ./lib64/libc.so.6 +(gdb) bt +#0 0x00007f512a54e625 in raise () from ./lib64/libc.so.6 +#1 0x00007f512a54fe05 in abort () from ./lib64/libc.so.6 +#2 0x00007f512a54774e in __assert_fail_base () from ./lib64/libc.so.6 +#3 0x00007f512a547810 in __assert_fail () from ./lib64/libc.so.6 +#4 0x00007f512b9fc434 in __gf_free (free_ptr=0x7f50f4000e50) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/mem-pool.c:304 +#5 0x00007f512b9b6657 in loc_wipe (loc=0x7f510c20d1a0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/xlator.c:685 +#6 0x00007f511cb8201d in mq_start_quota_txn_v2 (this=0x7f5118019b60, loc=0x7f510c20d2b8, ctx=0x7f50f4000bf0, contri=0x7f50f4000d60) + at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/marker/src/marker-quota.c:2921 +#7 0x00007f511cb82c55 in mq_initiate_quota_task (opaque=0x7f510c20d2b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/marker/src/marker-quota.c:3199 +#8 0x00007f511cb81820 in mq_synctask (this=0x7f5118019b60, task=0x7f511cb829fa , spawn=_gf_false, loc=0x7f510c20d430, dict=0x0, buf=0x0, contri=0) + at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/marker/src/marker-quota.c:2789 +#9 0x00007f511cb82f82 in mq_initiate_quota_blocking_txn (this=0x7f5118019b60, loc=0x7f510c20d430) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/marker/src/marker-quota.c:3230 +#10 0x00007f511cb82844 in mq_reduce_parent_size_task (opaque=0x7f510c000df0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/marker/src/marker-quota.c:3117 +#11 0x00007f512ba0f9dc in synctask_wrap (old_task=0x7f510c0053e0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:370 +#12 0x00007f512a55f8f0 in ?? () from ./lib64/libc.so.6 +#13 0x0000000000000000 in ?? 
() +(gdb) +~~~ diff --git a/doc/developer-guide/daemon-management-framework.md b/doc/developer-guide/daemon-management-framework.md index cf29caa95ce..592192e665d 100644 --- a/doc/developer-guide/daemon-management-framework.md +++ b/doc/developer-guide/daemon-management-framework.md @@ -25,12 +25,9 @@ Data members & functions of different management objects - connection object - process object - online status - - Methods - -- manager, start, stop which can be abstracted as a common - methods or specific to service requirements - -- init API is invoked on demand of the service and currently integrated - into manager. - -- build method is to initialize the method pointers + - Methods - manager, start, stop which can be abstracted as a common methods + or specific to service requirements + - init API can be invoked using the service management object The above structures defines the skeleton of the daemon management framework. Introduction of new daemons in GlusterFS needs to inherit these properties. Any diff --git a/doc/developer-guide/data-structures/inode.md b/doc/developer-guide/data-structures/inode.md deleted file mode 100644 index a340ab9ca8e..00000000000 --- a/doc/developer-guide/data-structures/inode.md +++ /dev/null @@ -1,226 +0,0 @@ -#Inode and dentry management in GlusterFS: - -##Background -Filesystems internally refer to files and directories via inodes. Inodes -are unique identifiers of the entities stored in a filesystem. Whenever an -application has to operate on a file/directory (read/modify), the filesystem -maps that file/directory to the right inode and start referring to that inode -whenever an operation has to be performed on the file/directory. - -In GlusterFS a new inode gets created whenever a new file/directory is created -OR when a successful lookup is done on a file/directory for the first time. 
-Inodes in GlusterFS are maintained by the inode table which gets initiated when -the filesystem daemon is started (both for the brick process as well as the -mount process). Below are some important data structures for inode management. - -## Data-structure (inode-table) -``` -struct _inode_table { - pthread_mutex_t lock; - size_t hashsize; /* bucket size of inode hash and dentry hash */ - char *name; /* name of the inode table, just for gf_log() */ - inode_t *root; /* root directory inode, with inode - number and gfid 1 */ - xlator_t *xl; /* xlator to be called to do purge and - the xlator which maintains the inode table*/ - uint32_t lru_limit; /* maximum LRU cache size */ - struct list_head *inode_hash; /* buckets for inode hash table */ - struct list_head *name_hash; /* buckets for dentry hash table */ - struct list_head active; /* list of inodes currently active (in an fop) */ - uint32_t active_size; /* count of inodes in active list */ - struct list_head lru; /* list of inodes recently used. - lru.next most recent */ - uint32_t lru_size; /* count of inodes in lru list */ - struct list_head purge; /* list of inodes to be purged soon */ - uint32_t purge_size; /* count of inodes in purge list */ - - struct mem_pool *inode_pool; /* memory pool for inodes */ - struct mem_pool *dentry_pool; /* memory pool for dentrys */ - struct mem_pool *fd_mem_pool; /* memory pool for fd_t */ - int ctxcount; /* number of slots in inode->ctx */ -}; -``` - -#Life-cycle -``` - -inode_table_new (size_t lru_limit, xlator_t *xl) - -This is a function which allocates a new inode table. Usually the top xlators in -the graph such as protocol/server (for bricks), fuse and nfs (for fuse and nfs -mounts) and libgfapi do inode managements. Hence they are the ones which will -allocate a new inode table by calling the above function. - -Each xlator graph in glusterfs maintains an inode table. 
So in fuse clients, -whenever there is a graph change due to add brick/remove brick or -addition/removal of some other xlators, a new graph is created which creates a -new inode table. - -Thus an allocated inode table is destroyed only when the filesystem daemon is -killed or unmounted. - -``` - -#what it contains. -``` - -Inode table in glusterfs mainly contains a hash table for maintaining inodes. -In general a file/directory is considered to be existing if there is a -corresponding inode present in the inode table. If a inode for a file/directory -cannot be found in the inode table, glusterfs tries to resolve it by sending a -lookup on the entry for which the inode is needed. If lookup is successful, then -a new inode correponding to the entry is added to the hash table present in the -inode table. Thus an inode present in the hash-table means, its an existing -file/directory within the filesystem. The inode table also contains the hash -size of the hash table (as of now it is hard coded to 14057. The hash value of -a inode is calculated using its gfid). - -Apart from the hash table, inode table also maintains 3 important list of inodes -1) Active list: -Active list contains all the active inodes (i.e inodes which are currently part -of some fop). -2) Lru list: -Least recently used inodes list. A limit can be set for the size of the lru -list. For bricks it is 16384 and for clients it is infinity. -3) Purge list: -List of all the inodes which have to be purged (i.e inodes which have to be -deleted from the inode table due to unlink/rmdir/forget). - -And at last it also contains the mem-pool for allocating inodes, dentries so -that frequent malloc/calloc and free of the data structures can be avoided. 
-``` - -#Data structure (inode) -``` -struct _inode { - inode_table_t *table; /* the table this inode belongs to */ - uuid_t gfid; /* unique identifier of the inode */ - gf_lock_t lock; - uint64_t nlookup; - uint32_t fd_count; /* Open fd count */ - uint32_t ref; /* reference count on this inode */ - ia_type_t ia_type; /* what kind of file */ - struct list_head fd_list; /* list of open files on this inode */ - struct list_head dentry_list; /* list of directory entries for this inode */ - struct list_head hash; /* hash table pointers */ - struct list_head list; /* active/lru/purge */ - - struct _inode_ctx *_ctx; /* place holder for keeping the - information about the inode by different xlators */ -}; - -As said above, inodes are internal way of identifying the files/directories. A -inode uniquely represents a file/directory. A new inode is created whenever a -create/mkdir/symlink/mknod operations are performed. Apart from that a new inode -is created upon the successful fresh lookup of a file/directory. Say the -filesystem contained some file "a" within root and the filesystem was -unmounted. Now when glusterfs is mounted and some operation is perfomed on "/a", -glusterfs tries to get the inode for the entry "a" with parent inode as -root. But, since glusterfs just came up, it will not be able to find the inode -for "a" and will send a lookup on "/a". If the lookup operation succeeds (i.e. -the root of glusterfs contains an entry called "a"), then a new inode for "/a" -is created and added to the inode table. - -Depending upon the situation, an inode can be in one of the 3 lists maintained -by the inode table. If some fop is happening on the inode, then the inode will -be present in the active inodes list maintained by the inode table. Active -inodes are those inodes whose refcount is greater than zero. 
Whenever some -operation comes on a file/directory, and the resolver tries to find the inode -for it, it increments the refcount of the inode before returning the inode. The -refcount of an inode can be incremented by calling the function below: - -inode_ref (inode_t *inode) - -Any xlator which wants to operate on an inode as part of some fop (or wants the -inode in the callback) should hold a ref on the inode. -Once the fop is completed, and before the reply of the fop is sent to the layers -above, the inode has to be unrefed. When the refcount of an inode becomes -zero, it is removed from the active inodes list and put into the LRU list maintained -by the inode table. Thus, in short, if some fop is happening on a file/directory, -the corresponding inode will be in the active list; otherwise it will be in the LRU -list. -``` - -#Life Cycle - -A new inode is created whenever a new file/directory/symlink is created OR a -successful lookup of an existing entry is done. The xlators that do inode -management (as of now protocol/server, fuse, nfs, gfapi) will perform an inode_link -operation upon successful lookup or successful creation of a new entry. - -inode_link (inode_t *inode, inode_t *parent, const char *name, - struct iatt *buf); - -inode_link adds the inode to the inode table (to be precise, it adds -the inode to the hash table maintained by the inode table; the hash value is -calculated based on the gfid). It also copies the gfid to the inode (the gfid is -present in the iatt structure) and creates a dentry with the new name. - -An inode is removed from the inode table and eventually destroyed when an unlink -or rmdir operation is performed on a file/directory, or when the lru limit of -the inode table has been exceeded.
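
The movement of an inode between the active and LRU lists as its refcount changes can be sketched with a toy model. Everything here is hypothetical: the `toy_*` names are not GlusterFS APIs, the real inode table links inodes into lists under a lock, and the starting state (refcount 0, on the LRU list) is chosen only to keep the illustration short.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_list { TOY_LRU, TOY_ACTIVE };

/* Toy inode table: only the list sizes are tracked, not the lists. */
struct toy_table {
        uint32_t active_size;
        uint32_t lru_size;
};

struct toy_inode {
        struct toy_table *table;
        uint32_t          ref;  /* reference count */
        enum toy_list     list; /* which list the inode is on */
};

/* Start an inode with no references, parked on the LRU list. */
void
toy_inode_init (struct toy_inode *inode, struct toy_table *table)
{
        inode->table = table;
        inode->ref = 0;
        inode->list = TOY_LRU;
        table->lru_size++;
}

/* Take a reference; the first reference moves the inode to the active list. */
struct toy_inode *
toy_inode_ref (struct toy_inode *inode)
{
        if (inode->ref == 0) {
                inode->table->lru_size--;
                inode->table->active_size++;
                inode->list = TOY_ACTIVE;
        }
        inode->ref++;

        return inode;
}

/* Drop a reference; dropping the last one moves the inode back to the LRU list. */
void
toy_inode_unref (struct toy_inode *inode)
{
        if (inode->ref == 0)
                return; /* nothing to drop */

        inode->ref--;
        if (inode->ref == 0) {
                inode->table->active_size--;
                inode->table->lru_size++;
                inode->list = TOY_LRU;
        }
}
```

The model shows why a fop must pair every ref with an unref: a leaked reference keeps the inode on the active list, where the lru limit can never reclaim it.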
- -#Data structure (dentry) -``` - -struct _dentry { - struct list_head inode_list; /* list of dentries of inode */ - struct list_head hash; /* hash table pointers */ - inode_t *inode; /* inode of this directory entry */ - char *name; /* name of the directory entry */ - inode_t *parent; /* directory of the entry */ -}; - -A dentry is the presence of an entry for a file/directory within its parent -directory. A dentry usually points to the inode to which it belongs to. In -glusterfs a dentry contains the following fields. -1) a hook using which it can add itself to the list of -the dentries maintained by the inode to which it points to. -2) A hash table pointer. -3) Pointer to the inode to which it belongs to. -4) Name of the dentry -5) Pointer to the inode of the parent directory in which the dentry is present - -A new dentry is created when a new file/directory/symlink is created or a hard -link to an existing file is created. - -__dentry_create (inode_t *inode, inode_t *parent, const char *name); - -A dentry holds a refcount on the parent -directory so that the parent inode is never removed from the active inode's list -and put to the lru list (If the lru limit of the lru list is exceeded, there is -a chance of parent inode being destroyed. To avoid it, the dentries hold a -reference to the parent inode). A dentry is removed whenevern a unlink/rmdir -is perfomed on a file/directory. Or when the lru limit has been exceeded, the -oldest inodes are purged out of the inode table, during which all the dentries -of the inode are removed. - -Whenever a unlink/rmdir comes on a file/directory, the corresponding inode -should be removed from the inode table. So upon unlink/rmdir, the inode will -be moved to the purge list maintained by the inode table and from there it is -destroyed. To be more specific, if a inode has to be destroyed, its refcount -and nlookup count both should become 0. 
For refcount to become 0, the inode -should not be part of any fop (there should not be any open fds). Or if the -inode belongs to a directory, then there should not be any fop happening on the -directory and it should not contain any dentries within it. For nlookup count to -become zero, a forget has to be sent on the inode with nlookup count set to 0 as -an argument. For fuse clients, forget is sent by the kernel itself whenever a -unlink/rmdir is performed. But for brick processes, upon unlink/rmdir, the -protocol/server itself has to do inode_forget. Whenever the inode has to be -deleted due to file removal or lru limit being exceeded the inode is retired -(i.e. all the dentries of the inode are deleted and the inode is moved to the -purge list maintained by the inode table), the nlookup count is set to 0 via -inode_forget api. The inode table, then prunes all the inodes from the purge -list by destroying the inode contexts maintained by each xlator. - -unlinking of the dentry is done via inode_unlink; - -void -inode_unlink (inode_t *inode, inode_t *parent, const char *name); - -If the inode has multiple hard links, then the unlink operation performed by -the application results just in the removal of the dentry with the name provided -by the application. For the inode to be removed, all the dentries of the inode -should be unlinked. -``` - diff --git a/doc/developer-guide/data-structures/iobuf.md b/doc/developer-guide/data-structures/iobuf.md deleted file mode 100644 index 5f521f1485f..00000000000 --- a/doc/developer-guide/data-structures/iobuf.md +++ /dev/null @@ -1,259 +0,0 @@ -#Iobuf-pool -##Datastructures -###iobuf -Short for IO Buffer. It is one allocatable unit for the consumers of the IOBUF -API, each unit hosts @page_size(defined in arena structure) bytes of memory. As -initial step of processing a fop, the IO buffer passed onto GlusterFS by the -other applications (FUSE VFS/ Applications using gfapi) is copied into GlusterFS -space i.e. iobufs. 
Hence Iobufs are mostly allocated/deallocated in Fuse, gfapi, -protocol xlators, and also in performance xlators to cache the IO buffers etc. -``` -struct iobuf { - union { - struct list_head list; - struct { - struct iobuf *next; - struct iobuf *prev; - }; - }; - struct iobuf_arena *iobuf_arena; - - gf_lock_t lock; /* for ->ptr and ->ref */ - int ref; /* 0 == passive, >0 == active */ - - void *ptr; /* usable memory region by the consumer */ - - void *free_ptr; /* in case of stdalloc, this is the - one to be freed not the *ptr */ -}; -``` - -###iobref -There may be need of multiple iobufs for a single fop, like in vectored read/write. -Hence multiple iobufs(default 16) are encapsulated under one iobref. -``` -struct iobref { - gf_lock_t lock; - int ref; - struct iobuf **iobrefs; /* list of iobufs */ - int alloced; /* 16 by default, grows as required */ - int used; /* number of iobufs added to this iobref */ -}; -``` -###iobuf_arenas -One region of memory MMAPed from the operating system. Each region MMAPs -@arena_size bytes of memory, and hosts @arena_size / @page_size IOBUFs. -The same sized iobufs are grouped into one arena, for sanity of access. - -``` -struct iobuf_arena { - union { - struct list_head list; - struct { - struct iobuf_arena *next; - struct iobuf_arena *prev; - }; - }; - - size_t page_size; /* size of all iobufs in this arena */ - size_t arena_size; /* this is equal to - (iobuf_pool->arena_size / page_size) - * page_size */ - size_t page_count; - - struct iobuf_pool *iobuf_pool; - - void *mem_base; - struct iobuf *iobufs; /* allocated iobufs list */ - - int active_cnt; - struct iobuf active; /* head node iobuf - (unused by itself) */ - int passive_cnt; - struct iobuf passive; /* head node iobuf - (unused by itself) */ - uint64_t alloc_cnt; /* total allocs in this pool */ - int max_active; /* max active buffers at a given time */ -}; - -``` -###iobuf_pool -Pool of Iobufs. 
As there may be many Io buffers required by the filesystem, -a pool of iobufs are preallocated and kept, if these preallocated ones are -exhausted only then the standard malloc/free is called, thus improving the -performance. Iobuf pool is generally one per process, allocated during -glusterfs_ctx_t init (glusterfs_ctx_defaults_init), currently the preallocated -iobuf pool memory is freed on process exit. Iobuf pool is globally accessible -across GlusterFs, hence iobufs allocated by any xlator can be accessed by any -other xlators(unless iobuf is not passed). -``` -struct iobuf_pool { - pthread_mutex_t mutex; - size_t arena_size; /* size of memory region in - arena */ - size_t default_page_size; /* default size of iobuf */ - - int arena_cnt; - struct list_head arenas[GF_VARIABLE_IOBUF_COUNT]; - /* array of arenas. Each element of the array is a list of arenas - holding iobufs of particular page_size */ - - struct list_head filled[GF_VARIABLE_IOBUF_COUNT]; - /* array of arenas without free iobufs */ - - struct list_head purge[GF_VARIABLE_IOBUF_COUNT]; - /* array of of arenas which can be purged */ - - uint64_t request_misses; /* mostly the requests for higher - value of iobufs */ -}; -``` -~~~ -The default size of the iobuf_pool(as of yet): -1024 iobufs of 128Bytes = 128KB -512 iobufs of 512Bytes = 256KB -512 iobufs of 2KB = 1MB -128 iobufs of 8KB = 1MB -64 iobufs of 32KB = 2MB -32 iobufs of 128KB = 4MB -8 iobufs of 256KB = 2MB -2 iobufs of 1MB = 2MB -Total ~13MB -~~~ -As seen in the datastructure iobuf_pool has 3 arena lists. - -- arenas: -The arenas allocated during iobuf_pool create, are part of this list. This list -also contains arenas that are partially filled i.e. contain few active and few -passive iobufs (passive_cnt !=0, active_cnt!=0 except for initially allocated -arenas). There will be by default 8 arenas of the sizes mentioned above. -- filled: -If all the iobufs in the arena are filled(passive_cnt = 0), the arena is moved -to the filled list. 
If any of the iobufs from the filled arena is iobuf_put, -then the arena moves back to the 'arenas' list. -- purge: -If there are no active iobufs in the arena(active_cnt = 0), the arena is moved -to purge list. iobuf_put() triggers destruction of the arenas in this list. The -arenas in the purge list are destroyed only if there is atleast one arena in -'arenas' list, that way there won't be spurious mmap/unmap of buffers. -(e.g: If there is an arena (page_size=128KB, count=32) in purge list, this arena -is destroyed(munmap) only if there is an arena in 'arenas' list with page_size=128KB). - -##APIs -###iobuf_get - -``` -struct iobuf *iobuf_get (struct iobuf_pool *iobuf_pool); -``` -Creates a new iobuf of the default page size(128KB hard coded as of yet). -Also takes a reference(increments ref count), hence no need of doing it -explicitly after getting iobuf. - -###iobuf_get2 - -``` -struct iobuf * iobuf_get2 (struct iobuf_pool *iobuf_pool, size_t page_size); -``` -Creates a new iobuf of a specified page size, if page_size=0 default page size -is considered. -``` -if (requested iobuf size > Max iobuf size in the pool(1MB as of yet)) - { - Perform standard allocation(CALLOC) of the requested size and - add it to the list iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX]. - } - else - { - -Round the page size to match the stndard sizes in iobuf pool. - (eg: if 3KB is requested, it is rounded to 8KB). - -Select the arena list corresponding to the rounded size - (eg: select 8KB arena) - If the selected arena has passive count > 0, then return the - iobuf from this arena, set the counters(passive/active/etc.) - appropriately. - else the arena is full, allocate new arena with rounded size - and standard page numbers and add to the arena list - (eg: 128 iobufs of 8KB is allocated). - } -``` -Also takes a reference(increments ref count), hence no need of doing it -explicitly after getting iobuf. 
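The size-class selection that the iobuf_get2 pseudocode describes can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the GlusterFS implementation: every `sketch_*` name is hypothetical, and the table of page sizes and counts is copied from the default pool layout listed earlier in this document. A return value of -1 stands for the standard allocation path taken when a request is larger than the biggest pooled size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size-class table; values mirror the default pool layout
 * quoted in this document (page size in bytes, iobufs per arena). */
struct sketch_arena_class {
    size_t page_size;
    int    page_count;
};

static const struct sketch_arena_class sketch_classes[] = {
    {128, 1024},       {512, 512},       {2 * 1024, 512},
    {8 * 1024, 128},   {32 * 1024, 64},  {128 * 1024, 32},
    {256 * 1024, 8},   {1024 * 1024, 2},
};

#define SKETCH_CLASS_COUNT \
    (sizeof(sketch_classes) / sizeof(sketch_classes[0]))

/* Return the index of the smallest class that fits the request, or -1 when
 * the request exceeds every pooled size (standard allocation would be used).
 * A request of 0 falls back to the caller-supplied default page size. */
static int sketch_select_class(size_t requested, size_t default_size)
{
    if (requested == 0)
        requested = default_size;

    for (size_t i = 0; i < SKETCH_CLASS_COUNT; i++) {
        if (requested <= sketch_classes[i].page_size)
            return (int)i;
    }
    return -1;
}

/* Return the rounded page size for a request, or 0 when no class fits. */
static size_t sketch_rounded_size(size_t requested, size_t default_size)
{
    int idx = sketch_select_class(requested, default_size);

    assert(idx < (int)SKETCH_CLASS_COUNT);
    return idx < 0 ? 0 : sketch_classes[idx].page_size;
}
```

With this table a 3KB request selects the 8KB class, which matches the rounding example given in the pseudocode.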
- -###iobuf_ref - -``` -struct iobuf *iobuf_ref (struct iobuf *iobuf); -``` - Take a reference on the iobuf. If using an iobuf allocated by some other -xlator/function/, its a good practice to take a reference so that iobuf is not -deleted by the allocator. - -###iobuf_unref -``` -void iobuf_unref (struct iobuf *iobuf); -``` -Unreference the iobuf, if the ref count is zero iobuf is considered free. - -``` - -Delete the iobuf, if allocated from standard alloc and return. - -set the active/passive count appropriately. - -if passive count > 0 then add the arena to 'arena' list. - -if active count = 0 then add the arena to 'purge' list. -``` -Every iobuf_ref should have a corresponding iobuf_unref, and also every -iobuf_get/2 should have a correspondning iobuf_unref. - -###iobref_new -``` -struct iobref *iobref_new (); -``` -Creates a new iobref structure and returns its pointer. - -###iobref_ref -``` -struct iobref *iobref_ref (struct iobref *iobref); -``` -Take a reference on the iobref. - -###iobref_unref -``` -void iobref_unref (struct iobref *iobref); -``` -Decrements the reference count of the iobref. If the ref count is 0, then unref -all the iobufs(iobuf_unref) in the iobref, and destroy the iobref. - -###iobref_add -``` -int iobref_add (struct iobref *iobref, struct iobuf *iobuf); -``` -Adds the given iobuf into the iobref, it takes a ref on the iobuf before adding -it, hence explicit iobuf_ref is not required if adding to the iobref. - -###iobref_merge -``` -int iobref_merge (struct iobref *to, struct iobref *from); -``` -Adds all the iobufs in the 'from' iobref to the 'to' iobref. Merge will not -cause the delete of the 'from' iobref, therefore it will result in another ref -on all the iobufs added to the 'to' iobref. Hence iobref_unref should be -performed both on 'from' and 'to' iobrefs (performing iobref_unref only on 'to' -will not free the iobufs and may result in leak). 
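The reference-counting rules stated for iobref_add, iobref_merge and iobref_unref can be illustrated with a toy model. This is only a sketch: the `toy_*` types and functions are hypothetical stand-ins, they keep a bare counter instead of real buffers and locks, and a count of 0 simply represents an iobuf that would be considered free.

```c
#include <assert.h>

#define TOY_MAX 16 /* mirrors the default of 16 iobufs per iobref */

struct toy_iobuf {
    int ref; /* 0 means the buffer would be considered free */
};

struct toy_iobref {
    int ref;
    int used;
    struct toy_iobuf *bufs[TOY_MAX];
};

/* Like iobuf_get: the caller receives the buffer with one reference. */
static void toy_iobuf_get(struct toy_iobuf *buf)
{
    buf->ref = 1;
}

static void toy_iobuf_unref(struct toy_iobuf *buf)
{
    assert(buf->ref > 0);
    buf->ref--;
}

static void toy_iobref_new(struct toy_iobref *iobref)
{
    iobref->ref = 1;
    iobref->used = 0;
}

/* Like iobref_add: the iobref takes its own reference on the buffer. */
static int toy_iobref_add(struct toy_iobref *iobref, struct toy_iobuf *buf)
{
    if (iobref->used == TOY_MAX)
        return -1;
    buf->ref++;
    iobref->bufs[iobref->used++] = buf;
    return 0;
}

/* Like iobref_merge: 'from' keeps its buffers, so every merged buffer
 * gains one more reference held by 'to'. */
static int toy_iobref_merge(struct toy_iobref *to, struct toy_iobref *from)
{
    for (int i = 0; i < from->used; i++) {
        if (toy_iobref_add(to, from->bufs[i]) != 0)
            return -1;
    }
    return 0;
}

/* Like iobref_unref: on the last reference, drop every held buffer. */
static void toy_iobref_unref(struct toy_iobref *iobref)
{
    assert(iobref->ref > 0);
    if (--iobref->ref == 0) {
        for (int i = 0; i < iobref->used; i++)
            toy_iobuf_unref(iobref->bufs[i]);
        iobref->used = 0;
    }
}
```

In this model a buffer added to `from` and then merged into `to` stays referenced until both iobrefs have been unreferenced, which is the leak scenario the iobref_merge description warns about.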
- -###iobref_clear -``` -void iobref_clear (struct iobref *iobref); -``` -Unreference all the iobufs in the iobref, and also unref the iobref. - -##Iobuf Leaks -If all iobuf_refs/iobuf_new do not have correspondning iobuf_unref, then the -iobufs are not freed and recurring execution of such code path may lead to huge -memory leaks. The easiest way to identify if a memory leak is caused by iobufs -is to take a statedump. If the statedump shows a lot of filled arenas then it is -a sure sign of leak. Refer doc/debugging/statedump.md for more details. - -If iobufs are leaking, the next step is to find where the iobuf_unref went -missing. There is no standard/easy way of debugging this, code reading and logs -are the only ways. If there is a liberty to reproduce the memory leak at will, -then logs(gf_callinginfo) in iobuf_ref/unref might help. -TODO: A easier way to debug iobuf leaks. diff --git a/doc/developer-guide/data-structures/mem-pool.md b/doc/developer-guide/data-structures/mem-pool.md deleted file mode 100644 index c71aa2a8ddd..00000000000 --- a/doc/developer-guide/data-structures/mem-pool.md +++ /dev/null @@ -1,124 +0,0 @@ -#Mem-pool -##Background -There was a time when every fop in glusterfs used to incur cost of allocations/de-allocations for every stack wind/unwind between xlators because stack/frame/*_localt_t in every wind/unwind was allocated and de-allocated. Because of all these system calls in the fop path there was lot of latency and the worst part is that most of the times the number of frames/stacks active at any time wouldn't cross a threshold. So it was decided that this threshold number of frames/stacks would be allocated in the beginning of the process only once. Get one of them from the pool of stacks/frames whenever `STACK_WIND` is performed and put it back into the pool in `STACK_UNWIND`/`STACK_DESTROY` without incurring any extra system calls. The data structures are allocated only when threshold number of such items are in active use i.e. 
pool is in complete use.% increase in the performance once this was added to all the common data structures (inode/fd/dict etc) in xlators throughout the stack was tremendous. - -## Data structure -``` -struct mem_pool { - struct list_head list; /*Each member in the mempool is element padded with a doubly-linked-list + ptr of mempool + is-in --use info. This list is used to add the element to the list of free members in the mem-pool*/ - int hot_count;/*number of mempool elements that are in active use*/ - int cold_count;/*number of mempool elements that are not in use. If a new allocation is required it -will be served from here until all the elements in the pool are in use i.e. cold-count becomes 0.*/ - gf_lock_t lock;/*synchronization mechanism*/ - unsigned long padded_sizeof_type;/*Each mempool element is padded with a doubly-linked-list + ptr of mempool + is-in --use info to operate the pool of elements, this size is the element-size after padding*/ - void *pool;/*Starting address of pool*/ - void *pool_end;/*Ending address of pool*/ -/* If an element address is in the range between pool, pool_end addresses then it is alloced from the pool otherwise it is 'calloced' this is very useful for functions like 'mem_put'*/ - int real_sizeof_type;/* size of just the element without any padding*/ - uint64_t alloc_count; /*Number of times this type of data is allocated through out the life of this process. This may include calloced elements as well*/ - uint64_t pool_misses; /*Number of times the element had to be allocated from heap because all elements from the pool are in active use.*/ - int max_alloc; /*Maximum number of elements from the pool in active use at any point in the life of the process. This does *not* include calloced elements*/ - int curr_stdalloc;/*Number of elements that are allocated from heap at the moment because the pool is in completed use. 
It should be '0' when pool is not in complete use*/ - int max_stdalloc;/*Maximum number of allocations from heap after the pool is completely used that are in active use at any point in the life of the process.*/ - char *name; /*Contains xlator-name:data-type as a string - struct list_head global_list;/*This is used to insert it into the global_list of mempools maintained in 'glusterfs-ctx' -}; -``` - -##Life-cycle -``` -mem_pool_new (data_type, unsigned long count) - -This is a macro which expands to mem_pool_new_fn (sizeof (data_type), count, string-rep-of-data_type) - -struct mem_pool * -mem_pool_new_fn (unsigned long sizeof_type, unsigned long count, char *name) - -Padded-element: - ---------------------------------------- -|list-ptr|mem-pool-address|in-use|Element| - ---------------------------------------- - ``` - -This function allocates the `mem-pool` structure and sets up the pool for use. -`name` parameter above is the `string` containing type of the datatype. This `name` is appended to `xlator-name + ':'` so that it can be easily identified in things like statedump. `count` is the number of elements that need to be allocated. `sizeof_type` is the size of each element. Ideally `('sizeof_type'*'count')` should be the size of the total pool. But to manage the pool using `mem_get`/`mem_put` (will be explained after this section) each element needs to be padded in the front with a `('list', 'mem-pool-address', 'in_use')`. So the actual size of the pool it allocates will be `('padded_sizeof_type'*'count')`. Why these extra elements are needed will be evident after understanding how `mem_get` and `mem_put` are implemented. In this function it just initializes all the `list` structures in front of each element and adds them to the `mem_pool->list` which represent the list of `cold` elements which can be allocated whenever `mem_get` is called on this mem_pool. It remembers mem_pool's start and end addresses in `mem_pool->pool`, `mem_pool->pool_end` respectively. 
Initializes `mem_pool->cold_count` to `count` and `mem_pool->hot_count` to `0`. This mem-pool will be added to the list of `global_list` maintained in `glusterfs-ctx` - - -``` -void* mem_get (struct mem_pool *mem_pool) - -Initial-list before mem-get ----------------- -| Pool | -| ----------- | ---------------------------------------- ---------------------------------------- -| | pool-list | |<---> |list-ptr|mem-pool-address|in-use|Element|<--->|list-ptr|mem-pool-address|in-use|Element| -| ----------- | ---------------------------------------- ---------------------------------------- ----------------- - -list after mem-get from the pool ----------------- -| Pool | -| ----------- | ---------------------------------------- -| | pool-list | |<--->|list-ptr|mem-pool-address|in-use|Element| -| ----------- | ---------------------------------------- ----------------- - -List when the pool is full: - ---------------- -| Pool | extra element that is allocated -| ----------- | ---------------------------------------- -| | pool-list | | |list-ptr|mem-pool-address|in-use|Element| -| ----------- | ---------------------------------------- - ---------------- -``` - -This function is similar to `malloc()` but it gives memory of type `element` of this pool. When this function is called it increments `mem_pool->alloc_count`, checks if there are any free elements in the pool that can be returned by inspecting `mem_pool->cold_count`. If `mem_pool->cold_count` is non-zero then it means there are elements in the pool which are not in active use. It deletes one element from the list of free elements and decrements `mem_pool->cold_count` and increments `mem_pool->hot_count` to indicate there is one more element in active use. Updates `mem_pool->max_alloc` accordingly. Sets `element->in_use` in the padded memory to `1`. Sets `element->mem_pool` address to this mem_pool also in the padded memory(It is useful for mem_put). 
Returns the address of the memory after the padded boundary to the caller of this function. In the cases where all the elements in the pool are in active use it `callocs` the element with padded size and sets mem_pool address in the padded memory. To indicate the pool-miss and give useful accounting information of the pool-usage it increments `mem_pool->pool_misses`, `mem_pool->curr_stdalloc`. Updates `mem_pool->max_stdalloc` accordingly. - -``` -void* mem_get0 (struct mem_pool *mem_pool) -``` -Just like `calloc` is to `malloc`, `mem_get0` is to `mem_get`. It memsets the memory to all '0' before returning the element. - - -``` -void mem_put (void *ptr) - -list before mem-put from the pool - ---------------- -| Pool | -| ----------- | ---------------------------------------- -| | pool-list | |<--->|list-ptr|mem-pool-address|in-use|Element| -| ----------- | ---------------------------------------- - ---------------- - -list after mem-put to the pool - ---------------- -| Pool | -| ----------- | ---------------------------------------- ---------------------------------------- -| | pool-list | |<---> |list-ptr|mem-pool-address|in-use|Element|<--->|list-ptr|mem-pool-address|in-use|Element| -| ----------- | ---------------------------------------- ---------------------------------------- - ---------------- - -If mem_put is putting an element not from pool then it is just freed so -no change to the pool - ---------------- -| Pool | -| ----------- | -| | pool-list | | -| ----------- | - ---------------- -``` - -This function is similar to `free()`. Remember that ptr passed to this function is the address of the element, so this function gets the ptr to its head of the padding in front of it. If this memory falls in bettween `mem_pool->pool`, `mem_pool->pool_end` then the memory is part of the 'pool' memory that is allocated so it does some sanity checks to see if the memory is indeed head of the element by checking if `in_use` is set to `1`. It resets `in_use` to `0`. 
It gets the mem_pool address stored in the padded region and adds this element to the list of free elements. Decreases `mem_pool->hot_count` increases `mem_pool->cold_count`. In the case where padded-element address does not fall in the range of `mem_pool->pool`, `mem_pool->pool_end` it just frees the element and decreases `mem_pool->curr_stdalloc`. - -``` -void -mem_pool_destroy (struct mem_pool *pool) -``` -Deletes this pool from the `global_list` maintained by `glusterfs-ctx` and frees all the memory allocated in `mem_pool_new`. - - -###How to pick pool-size -This varies from work-load to work-load. Create the mem-pool with some random size and run the work-load. Take the statedump after the work-load is complete. In the statedump if `max_alloc` is always less than `cold_count` may be reduce the size of the pool closer to `max_alloc`. On the otherhand if there are lots of `pool-misses` then increase the `pool_size` by `max_stdalloc` to achieve better 'hit-rate' of the pool. diff --git a/doc/developer-guide/datastructure-inode.md b/doc/developer-guide/datastructure-inode.md new file mode 100644 index 00000000000..a340ab9ca8e --- /dev/null +++ b/doc/developer-guide/datastructure-inode.md @@ -0,0 +1,226 @@ +#Inode and dentry management in GlusterFS: + +##Background +Filesystems internally refer to files and directories via inodes. Inodes +are unique identifiers of the entities stored in a filesystem. Whenever an +application has to operate on a file/directory (read/modify), the filesystem +maps that file/directory to the right inode and start referring to that inode +whenever an operation has to be performed on the file/directory. + +In GlusterFS a new inode gets created whenever a new file/directory is created +OR when a successful lookup is done on a file/directory for the first time. +Inodes in GlusterFS are maintained by the inode table which gets initiated when +the filesystem daemon is started (both for the brick process as well as the +mount process). 
Below are some important data structures for inode management. + +## Data-structure (inode-table) +``` +struct _inode_table { + pthread_mutex_t lock; + size_t hashsize; /* bucket size of inode hash and dentry hash */ + char *name; /* name of the inode table, just for gf_log() */ + inode_t *root; /* root directory inode, with inode + number and gfid 1 */ + xlator_t *xl; /* xlator to be called to do purge and + the xlator which maintains the inode table*/ + uint32_t lru_limit; /* maximum LRU cache size */ + struct list_head *inode_hash; /* buckets for inode hash table */ + struct list_head *name_hash; /* buckets for dentry hash table */ + struct list_head active; /* list of inodes currently active (in an fop) */ + uint32_t active_size; /* count of inodes in active list */ + struct list_head lru; /* list of inodes recently used. + lru.next most recent */ + uint32_t lru_size; /* count of inodes in lru list */ + struct list_head purge; /* list of inodes to be purged soon */ + uint32_t purge_size; /* count of inodes in purge list */ + + struct mem_pool *inode_pool; /* memory pool for inodes */ + struct mem_pool *dentry_pool; /* memory pool for dentrys */ + struct mem_pool *fd_mem_pool; /* memory pool for fd_t */ + int ctxcount; /* number of slots in inode->ctx */ +}; +``` + +#Life-cycle +``` + +inode_table_new (size_t lru_limit, xlator_t *xl) + +This is a function which allocates a new inode table. Usually the top xlators in +the graph such as protocol/server (for bricks), fuse and nfs (for fuse and nfs +mounts) and libgfapi do inode managements. Hence they are the ones which will +allocate a new inode table by calling the above function. + +Each xlator graph in glusterfs maintains an inode table. So in fuse clients, +whenever there is a graph change due to add brick/remove brick or +addition/removal of some other xlators, a new graph is created which creates a +new inode table. 
+ +Thus an allocated inode table is destroyed only when the filesystem daemon is +killed or unmounted. + +``` + +#what it contains. +``` + +Inode table in glusterfs mainly contains a hash table for maintaining inodes. +In general a file/directory is considered to be existing if there is a +corresponding inode present in the inode table. If an inode for a file/directory +cannot be found in the inode table, glusterfs tries to resolve it by sending a +lookup on the entry for which the inode is needed. If lookup is successful, then +a new inode corresponding to the entry is added to the hash table present in the +inode table. Thus an inode present in the hash table means that it is an existing +file/directory within the filesystem. The inode table also contains the hash +size of the hash table (as of now it is hard coded to 14057. The hash value of +an inode is calculated using its gfid). + +Apart from the hash table, inode table also maintains 3 important lists of inodes +1) Active list: +Active list contains all the active inodes (i.e. inodes which are currently part +of some fop). +2) Lru list: +Least recently used inodes list. A limit can be set for the size of the lru +list. For bricks it is 16384 and for clients it is infinity. +3) Purge list: +List of all the inodes which have to be purged (i.e. inodes which have to be +deleted from the inode table due to unlink/rmdir/forget). + +Finally, it also contains the mem-pools for allocating inodes and dentries so +that frequent malloc/calloc and free of the data structures can be avoided.
+``` + +#Data structure (inode) +``` +struct _inode { + inode_table_t *table; /* the table this inode belongs to */ + uuid_t gfid; /* unique identifier of the inode */ + gf_lock_t lock; + uint64_t nlookup; + uint32_t fd_count; /* Open fd count */ + uint32_t ref; /* reference count on this inode */ + ia_type_t ia_type; /* what kind of file */ + struct list_head fd_list; /* list of open files on this inode */ + struct list_head dentry_list; /* list of directory entries for this inode */ + struct list_head hash; /* hash table pointers */ + struct list_head list; /* active/lru/purge */ + + struct _inode_ctx *_ctx; /* place holder for keeping the + information about the inode by different xlators */ +}; + +As said above, inodes are the internal way of identifying the files/directories. An +inode uniquely represents a file/directory. A new inode is created whenever a +create/mkdir/symlink/mknod operation is performed. Apart from that a new inode +is created upon the successful fresh lookup of a file/directory. Say the +filesystem contained some file "a" within root and the filesystem was +unmounted. Now when glusterfs is mounted and some operation is performed on "/a", +glusterfs tries to get the inode for the entry "a" with parent inode as +root. But, since glusterfs just came up, it will not be able to find the inode +for "a" and will send a lookup on "/a". If the lookup operation succeeds (i.e. +the root of glusterfs contains an entry called "a"), then a new inode for "/a" +is created and added to the inode table. + +Depending upon the situation, an inode can be in one of the 3 lists maintained +by the inode table. If some fop is happening on the inode, then the inode will +be present in the active inodes list maintained by the inode table. Active +inodes are those inodes whose refcount is greater than zero.
Whenever some +operation comes on a file/directory, and the resolver tries to find the inode +for it, it increments the refcount of the inode before returning the inode. The +refcount of an inode can be incremented by calling the below function + +inode_ref (inode_t *inode) + +Any xlator which wants to operate on an inode as part of some fop (or wants the +inode in the callback), should hold a ref on the inode. +Once the fop is completed, before sending the reply of the fop to the above +layers, the inode has to be unrefed. When the refcount of an inode becomes +zero, it is removed from the active inodes list and put into LRU list maintained +by the inode table. Thus in short, if some fop is happening on a file/directory, +the corresponding inode will be in the active list; otherwise it will be in the LRU +list. +``` + +#Life Cycle + +A new inode is created whenever a new file/directory/symlink is created OR a +successful lookup of an existing entry is done. The xlators which do inode +management (as of now protocol/server, fuse, nfs, gfapi) will perform inode_link +operation upon successful lookup or successful creation of a new entry. + +inode_link (inode_t *inode, inode_t *parent, const char *name, + struct iatt *buf); + +inode_link actually adds the inode to the inode table (to be precise it adds +the inode to the hash table maintained by the inode table. The hash value is +calculated based on the gfid). It copies the gfid to the inode (the gfid is +present in the iatt structure) and creates a dentry with the new name. + +An inode is removed from the inode table and eventually destroyed when unlink +or rmdir operation is performed on a file/directory, or the lru limit of +the inode table has been exceeded.
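The movement of an inode between the active and LRU lists as its refcount changes can be sketched with a small model. This is an illustration under stated assumptions rather than the GlusterFS code: the `toy_*` names are hypothetical, the lists are reduced to counters plus a per-inode tag, and locking, hashing, and purging are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Which of the table's lists an inode currently sits on. */
enum toy_list { TOY_NONE, TOY_ACTIVE, TOY_LRU };

struct toy_table {
    uint32_t active_size; /* count of inodes in the active list */
    uint32_t lru_size;    /* count of inodes in the lru list */
};

struct toy_inode {
    struct toy_table *table;
    uint32_t ref;
    enum toy_list list;
};

/* Move an inode to another list and keep the table counters in step. */
static void toy_move(struct toy_inode *inode, enum toy_list to)
{
    struct toy_table *table = inode->table;

    if (inode->list == TOY_ACTIVE)
        table->active_size--;
    else if (inode->list == TOY_LRU)
        table->lru_size--;

    if (to == TOY_ACTIVE)
        table->active_size++;
    else if (to == TOY_LRU)
        table->lru_size++;

    inode->list = to;
}

/* First reference makes the inode active. */
static void toy_inode_ref(struct toy_inode *inode)
{
    if (inode->ref++ == 0)
        toy_move(inode, TOY_ACTIVE);
}

/* Dropping the last reference parks the inode on the LRU list. */
static void toy_inode_unref(struct toy_inode *inode)
{
    assert(inode->ref > 0);
    if (--inode->ref == 0)
        toy_move(inode, TOY_LRU);
}
```

An xlator that holds a ref for the duration of a fop keeps the inode on the active list in this model; only the final unref moves it to the LRU list, from where a later ref brings it back.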
+ +#Data structure (dentry) +``` + +struct _dentry { + struct list_head inode_list; /* list of dentries of inode */ + struct list_head hash; /* hash table pointers */ + inode_t *inode; /* inode of this directory entry */ + char *name; /* name of the directory entry */ + inode_t *parent; /* directory of the entry */ +}; + +A dentry is the presence of an entry for a file/directory within its parent +directory. A dentry usually points to the inode to which it belongs. In +glusterfs a dentry contains the following fields. +1) A hook using which it can add itself to the list of +the dentries maintained by the inode to which it points. +2) A hash table pointer. +3) Pointer to the inode to which it belongs. +4) Name of the dentry +5) Pointer to the inode of the parent directory in which the dentry is present + +A new dentry is created when a new file/directory/symlink is created or a hard +link to an existing file is created. + +__dentry_create (inode_t *inode, inode_t *parent, const char *name); + +A dentry holds a refcount on the parent +directory so that the parent inode is never removed from the active inode's list +and put to the lru list (If the lru limit of the lru list is exceeded, there is +a chance of parent inode being destroyed. To avoid it, the dentries hold a +reference to the parent inode). A dentry is removed whenever an unlink/rmdir +is performed on a file/directory. Or when the lru limit has been exceeded, the +oldest inodes are purged out of the inode table, during which all the dentries +of the inode are removed. + +Whenever an unlink/rmdir comes on a file/directory, the corresponding inode +should be removed from the inode table. So upon unlink/rmdir, the inode will +be moved to the purge list maintained by the inode table and from there it is +destroyed. To be more specific, if an inode has to be destroyed, its refcount +and nlookup count both should become 0.
For refcount to become 0, the inode
+should not be part of any fop (there should not be any open fds). Or if the
+inode belongs to a directory, then there should not be any fop happening on the
+directory and it should not contain any dentries within it. For nlookup count to
+become zero, a forget has to be sent on the inode with nlookup count set to 0 as
+an argument. For fuse clients, forget is sent by the kernel itself whenever an
+unlink/rmdir is performed. But for brick processes, upon unlink/rmdir, the
+protocol/server itself has to do inode_forget. Whenever the inode has to be
+deleted due to file removal or the lru limit being exceeded, the inode is retired
+(i.e. all the dentries of the inode are deleted and the inode is moved to the
+purge list maintained by the inode table) and the nlookup count is set to 0 via
+the inode_forget api. The inode table then prunes all the inodes from the purge
+list by destroying the inode contexts maintained by each xlator.
+
+Unlinking of the dentry is done via inode_unlink:
+
+void
+inode_unlink (inode_t *inode, inode_t *parent, const char *name);
+
+If the inode has multiple hard links, then the unlink operation performed by
+the application results just in the removal of the dentry with the name provided
+by the application. For the inode to be removed, all the dentries of the inode
+should be unlinked.
+```
+
diff --git a/doc/developer-guide/datastructure-iobuf.md b/doc/developer-guide/datastructure-iobuf.md
new file mode 100644
index 00000000000..5f521f1485f
--- /dev/null
+++ b/doc/developer-guide/datastructure-iobuf.md
@@ -0,0 +1,259 @@
+#Iobuf-pool
+##Datastructures
+###iobuf
+Short for IO Buffer. It is one allocatable unit for the consumers of the IOBUF
+API; each unit hosts @page_size (defined in the arena structure) bytes of memory.
+As the initial step of processing a fop, the IO buffer passed on to GlusterFS by
+other applications (FUSE VFS / applications using gfapi) is copied into GlusterFS
+space i.e. iobufs.
Hence Iobufs are mostly allocated/deallocated in Fuse, gfapi, +protocol xlators, and also in performance xlators to cache the IO buffers etc. +``` +struct iobuf { + union { + struct list_head list; + struct { + struct iobuf *next; + struct iobuf *prev; + }; + }; + struct iobuf_arena *iobuf_arena; + + gf_lock_t lock; /* for ->ptr and ->ref */ + int ref; /* 0 == passive, >0 == active */ + + void *ptr; /* usable memory region by the consumer */ + + void *free_ptr; /* in case of stdalloc, this is the + one to be freed not the *ptr */ +}; +``` + +###iobref +There may be need of multiple iobufs for a single fop, like in vectored read/write. +Hence multiple iobufs(default 16) are encapsulated under one iobref. +``` +struct iobref { + gf_lock_t lock; + int ref; + struct iobuf **iobrefs; /* list of iobufs */ + int alloced; /* 16 by default, grows as required */ + int used; /* number of iobufs added to this iobref */ +}; +``` +###iobuf_arenas +One region of memory MMAPed from the operating system. Each region MMAPs +@arena_size bytes of memory, and hosts @arena_size / @page_size IOBUFs. +The same sized iobufs are grouped into one arena, for sanity of access. + +``` +struct iobuf_arena { + union { + struct list_head list; + struct { + struct iobuf_arena *next; + struct iobuf_arena *prev; + }; + }; + + size_t page_size; /* size of all iobufs in this arena */ + size_t arena_size; /* this is equal to + (iobuf_pool->arena_size / page_size) + * page_size */ + size_t page_count; + + struct iobuf_pool *iobuf_pool; + + void *mem_base; + struct iobuf *iobufs; /* allocated iobufs list */ + + int active_cnt; + struct iobuf active; /* head node iobuf + (unused by itself) */ + int passive_cnt; + struct iobuf passive; /* head node iobuf + (unused by itself) */ + uint64_t alloc_cnt; /* total allocs in this pool */ + int max_active; /* max active buffers at a given time */ +}; + +``` +###iobuf_pool +Pool of Iobufs. 
As there may be many IO buffers required by the filesystem,
+a pool of iobufs is preallocated and kept. Only if these preallocated ones are
+exhausted is the standard malloc/free called, thus improving
+performance. There is generally one iobuf pool per process, allocated during
+glusterfs_ctx_t init (glusterfs_ctx_defaults_init); currently the preallocated
+iobuf pool memory is freed on process exit. The iobuf pool is globally accessible
+across GlusterFS, hence iobufs allocated by any xlator can be accessed by any
+other xlator (provided the iobuf is passed to it).
+```
+struct iobuf_pool {
+        pthread_mutex_t     mutex;
+        size_t              arena_size; /* size of memory region in
+                                           arena */
+        size_t              default_page_size; /* default size of iobuf */
+
+        int                 arena_cnt;
+        struct list_head    arenas[GF_VARIABLE_IOBUF_COUNT];
+        /* array of arenas. Each element of the array is a list of arenas
+           holding iobufs of particular page_size */
+
+        struct list_head    filled[GF_VARIABLE_IOBUF_COUNT];
+        /* array of arenas without free iobufs */
+
+        struct list_head    purge[GF_VARIABLE_IOBUF_COUNT];
+        /* array of arenas which can be purged */
+
+        uint64_t            request_misses; /* mostly the requests for higher
+                                               value of iobufs */
+};
+```
+~~~
+The default size of the iobuf_pool(as of yet):
+1024 iobufs of 128Bytes = 128KB
+512 iobufs of 512Bytes  = 256KB
+512 iobufs of 2KB       = 1MB
+128 iobufs of 8KB       = 1MB
+64 iobufs of 32KB       = 2MB
+32 iobufs of 128KB      = 4MB
+8 iobufs of 256KB       = 2MB
+2 iobufs of 1MB         = 2MB
+Total                   ~12.4MB
+~~~
+As seen in the datastructure, iobuf_pool has 3 arena lists.
+
+- arenas:
+The arenas allocated during iobuf_pool creation are part of this list. This list
+also contains arenas that are partially filled, i.e. contain a few active and a
+few passive iobufs (passive_cnt != 0, active_cnt != 0, except for initially
+allocated arenas). There will be by default 8 arenas of the sizes mentioned above.
+- filled:
+If all the iobufs in the arena are filled (passive_cnt = 0), the arena is moved
+to the filled list.
If any of the iobufs from the filled arena is iobuf_put,
+then the arena moves back to the 'arenas' list.
+- purge:
+If there are no active iobufs in the arena (active_cnt = 0), the arena is moved
+to the purge list. iobuf_put() triggers destruction of the arenas in this list.
+The arenas in the purge list are destroyed only if there is at least one arena in
+the 'arenas' list; that way there won't be spurious mmap/munmap of buffers.
+(e.g: If there is an arena (page_size=128KB, count=32) in the purge list, this arena
+is destroyed (munmap) only if there is an arena in the 'arenas' list with page_size=128KB).
+
+##APIs
+###iobuf_get
+
+```
+struct iobuf *iobuf_get (struct iobuf_pool *iobuf_pool);
+```
+Creates a new iobuf of the default page size (128KB, hard coded as of yet).
+Also takes a reference (increments ref count), hence there is no need to do it
+explicitly after getting the iobuf.
+
+###iobuf_get2
+
+```
+struct iobuf * iobuf_get2 (struct iobuf_pool *iobuf_pool, size_t page_size);
+```
+Creates a new iobuf of a specified page size; if page_size=0, the default page
+size is used.
+```
+if (requested iobuf size > Max iobuf size in the pool(1MB as of yet))
+  {
+     Perform standard allocation(CALLOC) of the requested size and
+     add it to the list iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX].
+  }
+  else
+  {
+     -Round the page size to match the standard sizes in iobuf pool.
+      (eg: if 3KB is requested, it is rounded to 8KB).
+     -Select the arena list corresponding to the rounded size
+      (eg: select 8KB arena)
+       If the selected arena has passive count > 0, then return the
+       iobuf from this arena, set the counters(passive/active/etc.)
+       appropriately.
+       else the arena is full, allocate new arena with rounded size
+       and standard page numbers and add to the arena list
+       (eg: 128 iobufs of 8KB is allocated).
+  }
+```
+Also takes a reference (increments ref count), hence there is no need to do it
+explicitly after getting the iobuf.
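
The size rounding that iobuf_get2 performs can be sketched as follows. This is a minimal model, assuming the default sizes and counts listed for the iobuf_pool earlier in this document; `toy_arena_cfg`, `toy_round_page_size` and `toy_pool_total_bytes` are hypothetical names used only for illustration and are not part of the iobuf API.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the default pool layout: iobuf size and preallocated count. */
struct toy_arena_cfg {
        size_t page_size;
        size_t page_count;
};

/* Values copied from the default iobuf_pool sizes quoted in this document. */
static const struct toy_arena_cfg toy_cfg[] = {
        { 128, 1024 },      { 512, 512 },       { 2 * 1024, 512 },
        { 8 * 1024, 128 },  { 32 * 1024, 64 },  { 128 * 1024, 32 },
        { 256 * 1024, 8 },  { 1024 * 1024, 2 },
};

#define TOY_CFG_COUNT (sizeof (toy_cfg) / sizeof (toy_cfg[0]))
#define TOY_DEFAULT_PAGE_SIZE ((size_t) 128 * 1024)

/* Round a request up to the next standard iobuf size. A request of 0 means
 * "use the default page size". Returns 0 when the request is larger than the
 * biggest standard size, i.e. when a standard allocation would be needed. */
static size_t
toy_round_page_size (size_t requested)
{
        size_t i;

        if (requested == 0)
                requested = TOY_DEFAULT_PAGE_SIZE;

        for (i = 0; i < TOY_CFG_COUNT; i++) {
                if (requested <= toy_cfg[i].page_size)
                        return toy_cfg[i].page_size;
        }
        return 0;
}

/* Total number of bytes preallocated by the default layout. */
static size_t
toy_pool_total_bytes (void)
{
        size_t i, total = 0;

        for (i = 0; i < TOY_CFG_COUNT; i++)
                total += toy_cfg[i].page_size * toy_cfg[i].page_count;
        return total;
}
```

With this table a 3KB request maps to the 8KB arena list, a 2MB request falls through to the standard-allocation path, and the preallocated sizes add up to 12,976,128 bytes (about 12.4MB).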
+
+###iobuf_ref
+
+```
+struct iobuf *iobuf_ref (struct iobuf *iobuf);
+```
+Take a reference on the iobuf. If using an iobuf allocated by some other
+xlator/function, it's good practice to take a reference so that the iobuf is
+not deleted by the allocator.
+
+###iobuf_unref
+```
+void iobuf_unref (struct iobuf *iobuf);
+```
+Unreference the iobuf; if the ref count becomes zero, the iobuf is considered free.
+
+```
+  -Delete the iobuf, if allocated from standard alloc and return.
+  -set the active/passive count appropriately.
+  -if passive count > 0 then add the arena to 'arenas' list.
+  -if active count = 0 then add the arena to 'purge' list.
+```
+Every iobuf_ref should have a corresponding iobuf_unref, and also every
+iobuf_get/2 should have a corresponding iobuf_unref.
+
+###iobref_new
+```
+struct iobref *iobref_new ();
+```
+Creates a new iobref structure and returns its pointer.
+
+###iobref_ref
+```
+struct iobref *iobref_ref (struct iobref *iobref);
+```
+Take a reference on the iobref.
+
+###iobref_unref
+```
+void iobref_unref (struct iobref *iobref);
+```
+Decrements the reference count of the iobref. If the ref count is 0, then unref
+all the iobufs (iobuf_unref) in the iobref, and destroy the iobref.
+
+###iobref_add
+```
+int iobref_add (struct iobref *iobref, struct iobuf *iobuf);
+```
+Adds the given iobuf to the iobref. It takes a ref on the iobuf before adding
+it, hence an explicit iobuf_ref is not required when adding to the iobref.
+
+###iobref_merge
+```
+int iobref_merge (struct iobref *to, struct iobref *from);
+```
+Adds all the iobufs in the 'from' iobref to the 'to' iobref. Merge does not
+delete the 'from' iobref, therefore it results in another ref
+on all the iobufs added to the 'to' iobref. Hence iobref_unref should be
+performed on both the 'from' and 'to' iobrefs (performing iobref_unref only on
+'to' will not free the iobufs and may result in a leak).
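
The reference counting around iobref_add and iobref_merge is easy to get wrong, so here is a toy model of it. It is a sketch under assumptions, not the real implementation: the `toy_*` types and functions are hypothetical, the iobref has a fixed 16 slots instead of growing, and "freeing" an iobuf is represented only by its count reaching 0.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IOBREF_SLOTS 16 /* fixed here; the real iobref grows as required */

struct toy_iobuf {
        int ref; /* 0 == free, >0 == in use */
};

struct toy_iobref {
        int               ref;
        struct toy_iobuf *bufs[TOY_IOBREF_SLOTS];
        int               used; /* number of iobufs added so far */
};

/* Drop one reference on an iobuf. */
static void
toy_iobuf_unref (struct toy_iobuf *iobuf)
{
        assert (iobuf->ref > 0);
        iobuf->ref--;
}

/* Add an iobuf to the iobref, taking a reference on it.
 * Returns 0 on success, -1 if the iobref is full. */
static int
toy_iobref_add (struct toy_iobref *iobref, struct toy_iobuf *iobuf)
{
        if (iobref->used == TOY_IOBREF_SLOTS)
                return -1;
        iobuf->ref++;
        iobref->bufs[iobref->used++] = iobuf;
        return 0;
}

/* Add every iobuf of 'from' to 'to'; 'from' keeps its own references. */
static int
toy_iobref_merge (struct toy_iobref *to, struct toy_iobref *from)
{
        int i;

        for (i = 0; i < from->used; i++) {
                if (toy_iobref_add (to, from->bufs[i]) != 0)
                        return -1;
        }
        return 0;
}

/* Drop a reference on the iobref; the last unref releases all its iobufs. */
static void
toy_iobref_unref (struct toy_iobref *iobref)
{
        int i;

        assert (iobref->ref > 0);
        if (--iobref->ref > 0)
                return;
        for (i = 0; i < iobref->used; i++)
                toy_iobuf_unref (iobref->bufs[i]);
        iobref->used = 0;
}
```

In this model, after a merge the iobuf is referenced once by each iobref, so unreferencing only 'to' leaves the count at 1 and the buffer is never considered free; it takes the unref on 'from' as well to bring it down to 0.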
+
+###iobref_clear
+```
+void iobref_clear (struct iobref *iobref);
+```
+Unreference all the iobufs in the iobref, and also unref the iobref.
+
+##Iobuf Leaks
+If an iobuf_ref/iobuf_get does not have a corresponding iobuf_unref, then the
+iobufs are not freed and repeated execution of such a code path may lead to huge
+memory leaks. The easiest way to identify if a memory leak is caused by iobufs
+is to take a statedump. If the statedump shows a lot of filled arenas then it is
+a sure sign of a leak. Refer to doc/debugging/statedump.md for more details.
+
+If iobufs are leaking, the next step is to find where the iobuf_unref went
+missing. There is no standard/easy way of debugging this; code reading and logs
+are the only ways. If the memory leak can be reproduced at will,
+then logs (gf_callinginfo) in iobuf_ref/unref might help.
+TODO: An easier way to debug iobuf leaks.
diff --git a/doc/developer-guide/datastructure-mem-pool.md b/doc/developer-guide/datastructure-mem-pool.md
new file mode 100644
index 00000000000..c71aa2a8ddd
--- /dev/null
+++ b/doc/developer-guide/datastructure-mem-pool.md
@@ -0,0 +1,124 @@
+#Mem-pool
+##Background
+There was a time when every fop in glusterfs used to incur the cost of allocations/de-allocations for every stack wind/unwind between xlators, because stack/frame/*_local_t in every wind/unwind was allocated and de-allocated. Because of all these system calls in the fop path there was a lot of latency, and the worst part is that most of the time the number of frames/stacks active at any time wouldn't cross a threshold. So it was decided that this threshold number of frames/stacks would be allocated only once, at the beginning of the process. One of them is taken from the pool of stacks/frames whenever `STACK_WIND` is performed and put back into the pool in `STACK_UNWIND`/`STACK_DESTROY` without incurring any extra system calls. New data structures are allocated only when the threshold number of such items is in active use, i.e.
the pool is in complete use. The increase in performance once this was added to all the common data structures (inode/fd/dict etc) in xlators throughout the stack was tremendous.
+
+## Data structure
+```
+struct mem_pool {
+        struct list_head  list; /* Each member in the mempool is an element padded with a doubly-linked-list + ptr of mempool + is-in-use info. This list is used to add the element to the list of free members in the mem-pool */
+        int               hot_count; /* number of mempool elements that are in active use */
+        int               cold_count; /* number of mempool elements that are not in use. If a new allocation is required it will be served from here until all the elements in the pool are in use i.e. cold-count becomes 0. */
+        gf_lock_t         lock; /* synchronization mechanism */
+        unsigned long     padded_sizeof_type; /* Each mempool element is padded with a doubly-linked-list + ptr of mempool + is-in-use info to operate the pool of elements; this size is the element-size after padding */
+        void             *pool; /* Starting address of pool */
+        void             *pool_end; /* Ending address of pool */
+/* If an element address is in the range between pool, pool_end addresses then it is alloced from the pool, otherwise it is 'calloced'. This is very useful for functions like 'mem_put' */
+        int               real_sizeof_type; /* size of just the element without any padding */
+        uint64_t          alloc_count; /* Number of times this type of data is allocated throughout the life of this process. This may include calloced elements as well */
+        uint64_t          pool_misses; /* Number of times the element had to be allocated from heap because all elements from the pool are in active use. */
+        int               max_alloc; /* Maximum number of elements from the pool in active use at any point in the life of the process. This does *not* include calloced elements */
+        int               curr_stdalloc; /* Number of elements that are allocated from heap at the moment because the pool is in complete use.
It should be '0' when the pool is not in complete use */
+        int               max_stdalloc; /* Maximum number of allocations from heap after the pool is completely used that are in active use at any point in the life of the process. */
+        char             *name; /* Contains xlator-name:data-type as a string */
+        struct list_head  global_list; /* This is used to insert it into the global_list of mempools maintained in 'glusterfs-ctx' */
+};
+```
+
+##Life-cycle
+```
+mem_pool_new (data_type, unsigned long count)
+
+This is a macro which expands to mem_pool_new_fn (sizeof (data_type), count, string-rep-of-data_type)
+
+struct mem_pool *
+mem_pool_new_fn (unsigned long sizeof_type, unsigned long count, char *name)
+
+Padded-element:
+ ----------------------------------------
+|list-ptr|mem-pool-address|in-use|Element|
+ ----------------------------------------
+```
+
+This function allocates the `mem-pool` structure and sets up the pool for use.
+The `name` parameter above is the `string` containing the type of the datatype. This `name` is appended to `xlator-name + ':'` so that it can be easily identified in things like statedump. `count` is the number of elements that need to be allocated. `sizeof_type` is the size of each element. Ideally `('sizeof_type'*'count')` should be the size of the total pool. But to manage the pool using `mem_get`/`mem_put` (explained after this section) each element needs to be padded in the front with `('list', 'mem-pool-address', 'in_use')`. So the actual size of the pool it allocates will be `('padded_sizeof_type'*'count')`. Why these extra fields are needed will be evident after understanding how `mem_get` and `mem_put` are implemented. This function just initializes all the `list` structures in front of each element and adds them to `mem_pool->list`, which represents the list of `cold` elements that can be allocated whenever `mem_get` is called on this mem_pool. It remembers the mem_pool's start and end addresses in `mem_pool->pool`, `mem_pool->pool_end` respectively.
Initializes `mem_pool->cold_count` to `count` and `mem_pool->hot_count` to `0`. This mem-pool will be added to the list of `global_list` maintained in `glusterfs-ctx` + + +``` +void* mem_get (struct mem_pool *mem_pool) + +Initial-list before mem-get +---------------- +| Pool | +| ----------- | ---------------------------------------- ---------------------------------------- +| | pool-list | |<---> |list-ptr|mem-pool-address|in-use|Element|<--->|list-ptr|mem-pool-address|in-use|Element| +| ----------- | ---------------------------------------- ---------------------------------------- +---------------- + +list after mem-get from the pool +---------------- +| Pool | +| ----------- | ---------------------------------------- +| | pool-list | |<--->|list-ptr|mem-pool-address|in-use|Element| +| ----------- | ---------------------------------------- +---------------- + +List when the pool is full: + ---------------- +| Pool | extra element that is allocated +| ----------- | ---------------------------------------- +| | pool-list | | |list-ptr|mem-pool-address|in-use|Element| +| ----------- | ---------------------------------------- + ---------------- +``` + +This function is similar to `malloc()` but it gives memory of type `element` of this pool. When this function is called it increments `mem_pool->alloc_count`, checks if there are any free elements in the pool that can be returned by inspecting `mem_pool->cold_count`. If `mem_pool->cold_count` is non-zero then it means there are elements in the pool which are not in active use. It deletes one element from the list of free elements and decrements `mem_pool->cold_count` and increments `mem_pool->hot_count` to indicate there is one more element in active use. Updates `mem_pool->max_alloc` accordingly. Sets `element->in_use` in the padded memory to `1`. Sets `element->mem_pool` address to this mem_pool also in the padded memory(It is useful for mem_put). 
Returns the address of the memory after the padded boundary to the caller of this function. In the cases where all the elements in the pool are in active use it `callocs` the element with padded size and sets the mem_pool address in the padded memory. To indicate the pool-miss and give useful accounting information of the pool-usage it increments `mem_pool->pool_misses` and `mem_pool->curr_stdalloc`, and updates `mem_pool->max_stdalloc` accordingly.
+
+```
+void* mem_get0 (struct mem_pool *mem_pool)
+```
+Just like `calloc` is to `malloc`, `mem_get0` is to `mem_get`. It memsets the memory to all '0' before returning the element.
+
+
+```
+void mem_put (void *ptr)
+
+list before mem-put from the pool
+ ----------------
+|      Pool      |
+|  -----------   |      ----------------------------------------
+| | pool-list |  |<--->|list-ptr|mem-pool-address|in-use|Element|
+|  -----------   |      ----------------------------------------
+ ----------------
+
+list after mem-put to the pool
+ ----------------
+|      Pool      |
+|  -----------   |      ----------------------------------------      ----------------------------------------
+| | pool-list |  |<--->|list-ptr|mem-pool-address|in-use|Element|<--->|list-ptr|mem-pool-address|in-use|Element|
+|  -----------   |      ----------------------------------------      ----------------------------------------
+ ----------------
+
+If mem_put is putting an element not from pool then it is just freed so
+no change to the pool
+ ----------------
+|      Pool      |
+|  -----------   |
+| | pool-list |  |
+|  -----------   |
+ ----------------
+```
+
+This function is similar to `free()`. Remember that the ptr passed to this function is the address of the element, so this function gets the ptr to the head of the padding in front of it. If this memory falls between `mem_pool->pool` and `mem_pool->pool_end` then the memory is part of the 'pool' memory that was allocated, so it does some sanity checks to see if the memory is indeed the head of an element by checking if `in_use` is set to `1`. It resets `in_use` to `0`.
It gets the mem_pool address stored in the padded region and adds this element to the list of free elements. It decreases `mem_pool->hot_count` and increases `mem_pool->cold_count`. In the case where the padded-element address does not fall in the range of `mem_pool->pool`, `mem_pool->pool_end` it just frees the element and decreases `mem_pool->curr_stdalloc`.
+
+```
+void
+mem_pool_destroy (struct mem_pool *pool)
+```
+Deletes this pool from the `global_list` maintained by `glusterfs-ctx` and frees all the memory allocated in `mem_pool_new`.
+
+
+###How to pick pool-size
+This varies from work-load to work-load. Create the mem-pool with some arbitrary size and run the work-load. Take the statedump after the work-load is complete. In the statedump, if `max_alloc` is always less than `cold_count`, consider reducing the size of the pool closer to `max_alloc`. On the other hand, if there are lots of `pool-misses` then increase the `pool_size` by `max_stdalloc` to achieve a better 'hit-rate' of the pool.
diff --git a/doc/developer-guide/gfapi-symbol-versions.md b/doc/developer-guide/gfapi-symbol-versions.md
new file mode 100644
index 00000000000..e4f4fe9f052
--- /dev/null
+++ b/doc/developer-guide/gfapi-symbol-versions.md
@@ -0,0 +1,270 @@
+
+## Symbol Versions and SO_NAMEs
+
+ In general, adding new APIs to a shared library does not require that
+symbol versions be used or the SO_NAME be "bumped." These actions
+are usually reserved for when a major change is introduced, e.g. many
+APIs change or a significant change in the functionality occurs.
+
+ Over the normal lifetime of a library, when a new API is added, the library is
+recompiled, consumers of the new API are able to use it, and existing,
+legacy consumers of the original API continue as before. If by some
+chance an old copy of the library is installed on a system, it's unlikely
+that most applications will be affected. New applications that use the
+new API will incur a run-time error and terminate.
+
+ Bumping the SO_NAME, i.e.
changing the shared lib's file name, e.g.
+from libfoo.so.0 to libfoo.so.1, which also changes the ELF SO_NAME
+attribute inside the file, works a little differently. libfoo.so.0
+contains only the old APIs. libfoo.so.1 contains both the old and new
+APIs. Legacy software that was linked with libfoo.so.0 continues to work
+as libfoo.so.0 is usually left installed on the system. New software that
+uses the new APIs is linked with libfoo.so.1, and works as long as
+libfoo.so.1 is installed on the system. Accidentally (re)installing
+libfoo.so.0 doesn't break new software as long as reinstalling doesn't
+erase libfoo.so.1.
+
+ Using symbol versions is somewhere in the middle. The shared library
+file remains libfoo.so.0 forever. Legacy APIs may or may not have an
+associated symbol version. New APIs may or may not have an associated
+symbol version either. In general symbol versions are reserved for APIs
+that have changed. Either the function's signature has changed, i.e. the
+return type or the number of parameters, and/or the parameter types have
+changed. Another reason for using symbol versions on an API is when the
+behaviour or functionality of the API changes dramatically. As with a
+library that doesn't use versioned symbols, old and new applications
+either find or don't find the versioned symbols they need. If the versioned
+symbol doesn't exist in the installed library, the application incurs a
+run-time error and terminates.
+
+ GlusterFS wanted to keep tight control over the APIs in libgfapi.
+Originally bumping the SO_NAME was considered, and GlusterFS-3.6.0 was
+released with libgfapi.so.7. Not only was "7" a mistake (it should have
+been "6"), but it was quickly pointed out that many dependent packages
+that use libgfapi would be forced to be recompiled/relinked. Thus no
+packages of 3.6.0 were ever released and 3.6.1 was quickly released with
+libgfapi.so.0, but with symbol versions.
There's no strong technical
+reason for either; the APIs have not changed, only new APIs have been
+added. It's merely being done in anticipation that some APIs might change
+sometime in the future.
+
+ Enough about that now, let's get into the nitty gritty:
+
+## Adding new APIs
+
+### Adding a public API.
+
+ This is the default, and the easiest thing to do. Public APIs have
+declarations in either glfs.h, glfs-handles.h, or, at your discretion,
+in a new header file intended for consumption by other developers.
+
+Here's what you need to do to add a new public API:
+
++ Write the declaration, e.g. in glfs.h:
+
+```C
+    int glfs_dtrt (const char *volname, void *stuff) __THROW
+```
+
++ Write the definition, e.g. in glfs-dtrt.c:
+
+```C
+    int
+    pub_glfs_dtrt (const char *volname, void *stuff)
+    {
+        ...
+        return 0;
+    }
+```
+
++ Add the symbol version magic for ELF, gnu toolchain to the definition.
+
+  Following the definition of your new function in glfs-dtrt.c, add a
+  line like this:
+
+```C
+    GFAPI_SYMVER_PUBLIC_DEFAULT(glfs_dtrt, 3.7.0)
+```
+
+  The whole thing should look like:
+
+```C
+    int
+    pub_glfs_dtrt (const char *volname, void *stuff)
+    {
+        ...
+    }
+    GFAPI_SYMVER_PUBLIC_DEFAULT(glfs_dtrt, 3.7.0);
+```
+
+  In this example, 3.7.0 refers to the version the symbol will first
+  appear in. There's nothing magic about it, it's just a string token.
+  The current versions we have are 3.4.0, 3.4.2, 3.5.0, 3.5.1, and 3.6.0.
+  They are to be considered locked or closed. You can not, must not add
+  any new APIs and use these versions. Most new APIs will use 3.7.0. If
+  you add a new API appearing in 3.6.2 (and mainline) then you would use
+  3.6.2.
+
++ Add the symbol version magic for OS X to the declaration.
+ + following the declaration in glfs.h, add a line like this: + +```C + GFAPI_PUBLIC(glfs_dtrt, 3.7.0) +``` + + The whole thing should look like: + +```C + int glfs_dtrt (const char *volname, void *stuff) __THROW + GFAPI_PUBLIC(glfs_dtrt, 3.7.0); +``` + + The version here must match the version associated with the definition. + ++ Add the new API to the ELF, gnu toolchain link map file, gfapi.map + + Most new public APIs will probably be added to a new section that + looks like this: + +``` + GFAPI_3.7.0 { + global: + glfs_dtrt; + } GFAPI_PRIVATE_3.7.0; +``` + + if you're adding your new API to, e.g. 3.6.2, it'll look like this: + +``` + GFAPI_3.6.2 { + global: + glfs_dtrt; + } GFAPI_3.6.0; +``` + + and you must change the +``` + GFAPI_PRIVATE_3.7.0 { ...} GFAPI_3.6.0; +``` + section to: +``` + GFAPI_PRIVATE_3.7.0 { ...} GFAPI_3.6.2; +``` + ++ Add the new API to the OS X alias list file, gfapi.aliases. + + Most new APIs will use a line that looks like this: + +```C + _pub_glfs_dtrt _glfs_dtrt$GFAPI_3.7.0 +``` + + if you're adding your new API to, e.g. 3.6.2, it'll look like this: + +```C + _pub_glfs_dtrt _glfs_dtrt$GFAPI_3.6.2 +``` + +And that's it. + + +### Adding a private API. + + If you're thinking about adding a private API that isn't declared in +one of the header files, then you should seriously rethink what you're +doing and figure out how to put it in libglusterfs instead. + +If that hasn't convinced you, follow the instructions above, but use the +_PRIVATE versions of macros, symbol versions, and aliases. If you're 1337 +enough to ignore this advice, then you're 1337 enough to figure out how +to do it. + + +## Changing an API. + +### Changing a public API. + + There are two ways an API might change, 1) its signature has changed, or +2) its new functionality or behavior is substantially different than the +old. An APIs signature consists of the function return type, and the number +and/or type of its parameters. E.g. 
the original API:
+
+```C
+    int glfs_dtrt (const char *volname, void *stuff);
+```
+
+and the changed API:
+
+```C
+    void *glfs_dtrt (const char *volname, glfs_t *ctx, void *stuff);
+```
+
+ One way to avoid a change like this, and which is preferable in many
+ways, is to leave the legacy glfs_dtrt() function alone, document it as
+deprecated, and simply add a new API, e.g. glfs_dtrt2(). Practically
+speaking, that's effectively what we'll be doing anyway, the difference
+is only that we'll use a versioned symbol to do it.
+
+ Assume, however, that adding a new API is undesirable for some reason;
+perhaps the use of glfs_gnu() is just so pervasive that we really don't
+want to add glfs_gnu2(). In that case:
+
++ change the declaration in glfs.h:
+
+```C
+    glfs_t *glfs_gnu (const char *volname, void *stuff) __THROW
+            GFAPI_PUBLIC(glfs_gnu, 3.7.0);
+```
+
+Note that there is only the single, new declaration.
+
++ change the old definition of glfs_gnu() in glfs.c:
+
+```C
+    struct glfs *
+    pub_glfs_gnu340 (const char * volname)
+    {
+        ...
+    }
+    GFAPI_SYMVER_PUBLIC(glfs_gnu340, glfs_gnu, 3.4.0);
+```
+
++ create the new definition of glfs_gnu in glfs.c:
+
+```C
+    struct glfs *
+    pub_glfs_gnu (const char * volname, void *stuff)
+    {
+        ...
+    }
+    GFAPI_SYMVER_PUBLIC_DEFAULT(glfs_gnu, 3.7.0);
+```
+
++ Add the new API to the ELF, gnu toolchain link map file, gfapi.map
+
+```
+    GFAPI_3.7.0 {
+        global:
+            glfs_gnu;
+    } GFAPI_PRIVATE_3.7.0;
+```
+
++ Update the OS X alias list file, gfapi.aliases, for both versions:
+
+Change the old line:
+```C
+    _pub_glfs_gnu _glfs_gnu$GFAPI_3.4.0
+```
+to:
+```C
+    _pub_glfs_gnu340 _glfs_gnu$GFAPI_3.4.0
+```
+
+Add a new line:
+```C
+    _pub_glfs_gnu _glfs_gnu$GFAPI_3.7.0
+```
+
++ Lastly, change all gfapi internal calls to glfs_gnu to use the new API.
+ diff --git a/doc/developer-guide/gfapi-symbol-versions/gfapi-symbol-versions.md b/doc/developer-guide/gfapi-symbol-versions/gfapi-symbol-versions.md deleted file mode 100644 index c7a3ac9380e..00000000000 --- a/doc/developer-guide/gfapi-symbol-versions/gfapi-symbol-versions.md +++ /dev/null @@ -1,270 +0,0 @@ - -## Symbol Versions and SO_NAMEs - - In general, adding new APIs to a shared library does not require that -symbol versions be used or the the SO_NAME be "bumped." These actions -are usually reserved for when a major change is introduced, e.g. many -APIs change or a signficant change in the functionality occurs. - - Over the normal lifetime of a When a new API is added, the library is -recompiled, consumers of the new API are able to do so, and existing, -legacy consumers of the original API continue as before. If by some -chance an old copy of the library is installed on a system, it's unlikely -that most applications will be affected. New applications that use the -new API will incur a run-time error terminate. - - Bumping the SO_NAME, i.e. changing the shared lib's file name, e.g. -from libfoo.so.0 to libfoo.so.1, which also changes the ELF SO_NAME -attribute inside the file, works a little differently. libfoo.so.0 -contains only the old APIs. libfoo.so.1 contains both the old and new -APIs. Legacy software that was linked with libfoo.so.0 continues to work -as libfoo.so.0 is usually left installed on the system. New software that -uses the new APIs is linked with libfoo.so.1, and works as long as -long as libfoo.so.1 is installed on the system. Accidentally (re)installing -libfoo.so.0 doesn't break new software as long as reinstalling doesn't -erase libfoo.so.1. - - Using symbol versions is somewhere in the middle. The shared library -file remains libfoo.so.0 forever. Legacy APIs may or may not have an -associated symbol version. New APIs may or may not have an associated -symbol version either. 
In general symbol versions are reserved for APIs -that have changed. Either the function's signature has changed, i.e. the -return type or the number of paramaters, and/or the parameter types have -changed. Another reason for using symbol versions on an API is when the -behaviour or functionality of the API changes dramatically. As with a -library that doesn't use versioned symbols, old and new applications -either find or don't find the versioned symbols they need. If the versioned -symbol doesn't exist in the installed library, the application incurs a -run-time error and terminates. - - GlusterFS wanted to keep tight control over the APIs in libgfapi. -Originally bumping the SO_NAME was considered, and GlusterFS-3.6.0 was -released with libgfapi.so.7. Not only was "7" a mistake (it should have -been "6"), but it was quickly pointed out that many dependent packages -that use libgfapi would be forced to be recompiled/relinked. Thus no -packages of 3.6.0 were ever released and 3.6.1 was quickly released with -libgfapi.so.0, but with symbol versions. There's no strong technical -reason for either; the APIs have not changed, only new APIs have been -added. It's merely being done in anticipation that some APIs might change -sometime in the future. - - Enough about that now, let's get into the nitty gritty—— - -## Adding new APIs - -### Adding a public API. - - This is the default, and the easiest thing to do. Public APIs have -declarations in either glfs.h, glfs-handles.h, or, at your discretion, -in a new header file intended for consumption by other developers. - -Here's what you need to do to add a new public API: - -+ Write the declaration, e.g. in glfs.h: - -```C - int glfs_dtrt (const char *volname, void *stuff) __THROW -``` - -+ Write the definition, e.g. in glfs-dtrt.c: - -```C - int - pub_glfs_dtrt (const char *volname, void *stuff) - { - ... - return 0; - } -``` - -+ Add the symbol version magic for ELF, gnu toolchain to the definition. 
-
-  Following the definition of your new function in glfs-dtrt.c, add a
-  line like this:
-
-```C
-  GFAPI_SYMVER_PUBLIC_DEFAULT(glfs_dtrt, 3.7.0)
-```
-
-  The whole thing should look like:
-
-```C
-  int
-  pub_glfs_dtrt (const char *volname, void *stuff)
-  {
-    ...
-  }
-  GFAPI_SYMVER_PUBLIC_DEFAULT(glfs_dtrt, 3.7.0);
-```
-
-  In this example, 3.7.0 refers to the version the symbol will first
-  appear in. There's nothing magic about it, it's just a string token.
-  The current versions we have are 3.4.0, 3.4.2, 3.5.0, 3.5.1, and 3.6.0.
-  They are to be considered locked or closed. You can not, must not add
-  any new APIs and use these versions. Most new APIs will use 3.7.0. If
-  you add a new API appearing in 3.6.2 (and mainline) then you would use
-  3.6.2.
-
-+ Add the symbol version magic for OS X to the declaration.
-
-  Following the declaration in glfs.h, add a line like this:
-
-```C
-  GFAPI_PUBLIC(glfs_dtrt, 3.7.0)
-```
-
-  The whole thing should look like:
-
-```C
-  int glfs_dtrt (const char *volname, void *stuff) __THROW
-          GFAPI_PUBLIC(glfs_dtrt, 3.7.0);
-```
-
-  The version here must match the version associated with the definition.
-
-+ Add the new API to the ELF, gnu toolchain link map file, gfapi.map
-
-  Most new public APIs will probably be added to a new section that
-  looks like this:
-
-```
-  GFAPI_3.7.0 {
-          global:
-                  glfs_dtrt;
-  } GFAPI_PRIVATE_3.7.0;
-```
-
-  If you're adding your new API to, e.g. 3.6.2, it'll look like this:
-
-```
-  GFAPI_3.6.2 {
-          global:
-                  glfs_dtrt;
-  } GFAPI_3.6.0;
-```
-
-  and you must change the
-```
-  GFAPI_PRIVATE_3.7.0 { ...} GFAPI_3.6.0;
-```
-  section to:
-```
-  GFAPI_PRIVATE_3.7.0 { ...} GFAPI_3.6.2;
-```
-
-+ Add the new API to the OS X alias list file, gfapi.aliases.
-
-  Most new APIs will use a line that looks like this:
-
-```C
-  _pub_glfs_dtrt _glfs_dtrt$GFAPI_3.7.0
-```
-
-  If you're adding your new API to, e.g. 3.6.2, it'll look like this:
-
-```C
-  _pub_glfs_dtrt _glfs_dtrt$GFAPI_3.6.2
-```
-
-And that's it.
-
-
-### Adding a private API.
-
-  If you're thinking about adding a private API that isn't declared in
-one of the header files, then you should seriously rethink what you're
-doing and figure out how to put it in libglusterfs instead.
-
-If that hasn't convinced you, follow the instructions above, but use the
-_PRIVATE versions of macros, symbol versions, and aliases. If you're 1337
-enough to ignore this advice, then you're 1337 enough to figure out how
-to do it.
-
-
-## Changing an API.
-
-### Changing a public API.
-
-  There are two ways an API might change, 1) its signature has changed, or
-2) its new functionality or behavior is substantially different than the
-old. An API's signature consists of the function return type, and the number
-and/or type of its parameters. E.g. the original API:
-
-```C
-  int glfs_dtrt (const char *volname, void *stuff);
-```
-
-and the changed API:
-
-```C
-  void *glfs_dtrt (const char *volname, glfs_t *ctx, void *stuff);
-```
-
-  One way to avoid a change like this, and which is preferable in many
-ways, is to leave the legacy glfs_dtrt() function alone, document it as
-deprecated, and simply add a new API, e.g. glfs_dtrt2(). Practically
-speaking, that's effectively what we'll be doing anyway, the difference
-is only that we'll use a versioned symbol to do it.
-
-  Suppose that adding a new API is undesirable for some reason, e.g. the
-use of glfs_gnu() is just so pervasive that we really don't want to
-add glfs_gnu2(). Then:
-
-+ change the declaration in glfs.h:
-
-```C
-  glfs_t *glfs_gnu (const char *volname, void *stuff) __THROW
-          GFAPI_PUBLIC(glfs_gnu, 3.7.0);
-```
-
-Note that there is only the single, new declaration.
-
-+ change the old definition of glfs_gnu() in glfs.c:
-
-```C
-  struct glfs *
-  pub_glfs_gnu340 (const char * volname)
-  {
-    ...
-  }
-  GFAPI_SYMVER_PUBLIC(glfs_gnu340, glfs_gnu, 3.4.0);
-```
-
-+ create the new definition of glfs_gnu in glfs.c:
-
-```C
-  struct glfs *
-  pub_glfs_gnu (const char * volname, void *stuff)
-  {
-    ...
-  }
-  GFAPI_SYMVER_PUBLIC_DEFAULT(glfs_gnu, 3.7.0);
-```
-
-+ Add the new API to the ELF, gnu toolchain link map file, gfapi.map
-
-```
-  GFAPI_3.7.0 {
-          global:
-                  glfs_gnu;
-  } GFAPI_PRIVATE_3.7.0;
-```
-
-+ Update the OS X alias list file, gfapi.aliases, for both versions:
-
-Change the old line:
-```C
-  _pub_glfs_gnu _glfs_gnu$GFAPI_3.4.0
-```
-to:
-```C
-  _pub_glfs_gnu340 _glfs_gnu$GFAPI_3.4.0
-```
-
-Add a new line:
-```C
-  _pub_glfs_gnu _glfs_gnu$GFAPI_3.7.0
-```
-
-+ Lastly, change all gfapi internal calls to glfs_gnu to the new API.
-
diff --git a/doc/developer-guide/translator-development.md b/doc/developer-guide/translator-development.md
index 9153c874d0f..3bf7e153354 100644
--- a/doc/developer-guide/translator-development.md
+++ b/doc/developer-guide/translator-development.md
@@ -51,7 +51,7 @@ if (!(xl->fini = dlsym (handle, "fini"))) {
 In this example, `xl` is a pointer to the in-memory object for the translator
 we're loading. As you can see, it's looking up various symbols *by name* in the
 shared object it just loaded, and storing pointers to those symbols. Some of
-them (e.g. init are functions, while others e.g. fops are dispatch tables
+them (e.g. init) are functions, while others (e.g. fops) are dispatch tables
 containing pointers to many functions. Together, these make up the translator's
 public interface.
 
@@ -102,7 +102,7 @@ various structures in logs. I've never used it myself, though I probably should.
 What's noteworthy here is that we don't even define dumpops. That's because all
 of the functions that might use these dispatch functions will check for
 `xl->dumpops` being `NULL` before calling through it. This is in sharp
-contrast to the behavior for `fops` and `cbks1`, which *must* be present. If
+contrast to the behavior for `fops` and `cbks`, which *must* be present. If
 they're not, translator loading will fail because these pointers are not checked
 every time and if they're `NULL` then we'll segfault. That's why we provide an
 empty definition for cbks; it's OK for the individual function
--
cgit