summaryrefslogtreecommitdiffstats
path: root/doc/debugging/split-brain.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/debugging/split-brain.md')
-rw-r--r--doc/debugging/split-brain.md251
1 files changed, 0 insertions, 251 deletions
diff --git a/doc/debugging/split-brain.md b/doc/debugging/split-brain.md
deleted file mode 100644
index b0d938e26bc..00000000000
--- a/doc/debugging/split-brain.md
+++ /dev/null
@@ -1,251 +0,0 @@
-Steps to recover from File split-brain.
-======================================
-
-Quick Start:
-============
-1. Get the path of the file that is in split-brain:
-> It can be obtained either by
-> a) The command `gluster volume heal info split-brain`.
-> b) Identify the files for which file operations performed
- from the client keep failing with Input/Output error.
-
-2. Close the applications that opened this file from the mount point.
-In case of VMs, they need to be powered-off.
-
-3. Decide on the correct copy:
-> This is done by observing the afr changelog extended attributes of the file on
-the bricks using the getfattr command; then identifying the type of split-brain
-(data split-brain, metadata split-brain, entry split-brain or split-brain due to
-gfid-mismatch); and finally determining which of the bricks contains the 'good copy'
-of the file.
-> `getfattr -d -m . -e hex <file-path-on-brick>`.
-It is also possible that one brick might contain the correct data while the
-other might contain the correct metadata.
-
-4. Reset the relevant extended attribute on the brick(s) that contains the
-'bad copy' of the file data/metadata using the setfattr command.
-> `setfattr -n <attribute-name> -v <attribute-value> <file-path-on-brick>`
-
-5. Trigger self-heal on the file by performing lookup from the client:
-> `ls -l <file-path-on-gluster-mount>`
-
-Detailed Instructions for steps 3 through 5:
-===========================================
-To understand how to resolve split-brain we need to know how to interpret the
-afr changelog extended attributes.
-
-Execute `getfattr -d -m . -e hex <file-path-on-brick>`
-
-* Example:
-[root@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt
-\#file: brick-a/file.txt
-security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
-trusted.afr.vol-client-2=0x000000000000000000000000
-trusted.afr.vol-client-3=0x000000000200000000000000
-trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b
-
-The extended attributes with `trusted.afr.<volname>-client-<subvolume-index>`
-are used by afr to maintain changelog of the file.The values of the
-`trusted.afr.<volname>-client-<subvolume-index>` are calculated by the glusterfs
-client (fuse or nfs-server) processes. When the glusterfs client modifies a file
-or directory, the client contacts each brick and updates the changelog extended
-attribute according to the response of the brick.
-
-'subvolume-index' is nothing but (brick number - 1) in
-`gluster volume info <volname>` output.
-
-* Example:
-[root@pranithk-laptop ~]# gluster volume info vol
- Volume Name: vol
- Type: Distributed-Replicate
- Volume ID: 4f2d7849-fbd6-40a2-b346-d13420978a01
- Status: Created
- Number of Bricks: 4 x 2 = 8
- Transport-type: tcp
- Bricks:
- brick-a: pranithk-laptop:/gfs/brick-a
- brick-b: pranithk-laptop:/gfs/brick-b
- brick-c: pranithk-laptop:/gfs/brick-c
- brick-d: pranithk-laptop:/gfs/brick-d
- brick-e: pranithk-laptop:/gfs/brick-e
- brick-f: pranithk-laptop:/gfs/brick-f
- brick-g: pranithk-laptop:/gfs/brick-g
- brick-h: pranithk-laptop:/gfs/brick-h
-
-In the example above:
-```
-Brick | Replica set | Brick subvolume index
-----------------------------------------------------------------------------
--/gfs/brick-a | 0 | 0
--/gfs/brick-b | 0 | 1
--/gfs/brick-c | 1 | 2
--/gfs/brick-d | 1 | 3
--/gfs/brick-e | 2 | 4
--/gfs/brick-f | 2 | 5
--/gfs/brick-g | 3 | 6
--/gfs/brick-h | 3 | 7
-```
-
-Each file in a brick maintains the changelog of itself and that of the files
-present in all the other bricks in it's replica set as seen by that brick.
-
-In the example volume given above, all files in brick-a will have 2 entries,
-one for itself and the other for the file present in it's replica pair, i.e.brick-b:
-trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for itself (brick-a)
-trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for brick-b as seen by brick-a
-
-Likewise, all files in brick-b will have:
-trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for brick-a as seen by brick-b
-trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for itself (brick-b)
-
-The same can be extended for other replica pairs.
-
-Interpreting Changelog (roughly pending operation count) Value:
-Each extended attribute has a value which is 24 hexa decimal digits.
-First 8 digits represent changelog of data. Second 8 digits represent changelog
-of metadata. Last 8 digits represent Changelog of directory entries.
-
-Pictorially representing the same, we have:
-```
-0x 000003d7 00000001 00000000
- | | |
- | | \_ changelog of directory entries
- | \_ changelog of metadata
- \ _ changelog of data
-```
-
-
-For Directories metadata and entry changelogs are valid.
-For regular files data and metadata changelogs are valid.
-For special files like device files etc metadata changelog is valid.
-When a file split-brain happens it could be either data split-brain or
-meta-data split-brain or both. When a split-brain happens the changelog of the
-file would be something like this:
-
-* Example:(Lets consider both data, metadata split-brain on same file).
-[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
-getfattr: Removing leading '/' from absolute path names
-\#file: gfs/brick-a/a
-trusted.afr.vol-client-0=0x000000000000000000000000
-trusted.afr.vol-client-1=0x000003d70000000100000000
-trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
-\#file: gfs/brick-b/a
-trusted.afr.vol-client-0=0x000003b00000000100000000
-trusted.afr.vol-client-1=0x000000000000000000000000
-trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
-
-###Observations:
-
-####According to changelog extended attributes on file /gfs/brick-a/a:
-The first 8 digits of trusted.afr.vol-client-0 are all
-zeros (0x00000000................), and the first 8 digits of
-trusted.afr.vol-client-1 are not all zeros (0x000003d7................).
-So the changelog on /gfs/brick-a/a implies that some data operations succeeded
-on itself but failed on /gfs/brick-b/a.
-
-The second 8 digits of trusted.afr.vol-client-0 are
-all zeros (0x........00000000........), and the second 8 digits of
-trusted.afr.vol-client-1 are not all zeros (0x........00000001........).
-So the changelog on /gfs/brick-a/a implies that some metadata operations succeeded
-on itself but failed on /gfs/brick-b/a.
-
-####According to Changelog extended attributes on file /gfs/brick-b/a:
-The first 8 digits of trusted.afr.vol-client-0 are not all
-zeros (0x000003b0................), and the first 8 digits of
-trusted.afr.vol-client-1 are all zeros (0x00000000................).
-So the changelog on /gfs/brick-b/a implies that some data operations succeeded
-on itself but failed on /gfs/brick-a/a.
-
-The second 8 digits of trusted.afr.vol-client-0 are not
-all zeros (0x........00000001........), and the second 8 digits of
-trusted.afr.vol-client-1 are all zeros (0x........00000000........).
-So the changelog on /gfs/brick-b/a implies that some metadata operations succeeded
-on itself but failed on /gfs/brick-a/a.
-
-Since both the copies have data, metadata changes that are not on the other
-file, it is in both data and metadata split-brain.
-
-Deciding on the correct copy:
------------------------------
-The user may have to inspect stat,getfattr output of the files to decide which
-metadata to retain and contents of the file to decide which data to retain.
-Continuing with the example above, lets say we want to retain the data
-of /gfs/brick-a/a and metadata of /gfs/brick-b/a.
-
-Resetting the relevant changelogs to resolve the split-brain:
--------------------------------------------------------------
-For resolving data-split-brain:
-We need to change the changelog extended attributes on the files as if some data
-operations succeeded on /gfs/brick-a/a but failed on /gfs/brick-b/a. But
-/gfs/brick-b/a should NOT have any changelog which says some data operations
-succeeded on /gfs/brick-b/a but failed on /gfs/brick-a/a. We need to reset the
-data part of the changelog on trusted.afr.vol-client-0 of /gfs/brick-b/a.
-
-For resolving metadata-split-brain:
-We need to change the changelog extended attributes on the files as if some
-metadata operations succeeded on /gfs/brick-b/a but failed on /gfs/brick-a/a.
-But /gfs/brick-a/a should NOT have any changelog which says some metadata
-operations succeeded on /gfs/brick-a/a but failed on /gfs/brick-b/a.
-We need to reset metadata part of the changelog on
-trusted.afr.vol-client-1 of /gfs/brick-a/a
-
-So, the intended changes are:
-On /gfs/brick-b/a:
-For trusted.afr.vol-client-0
-0x000003b00000000100000000 to 0x000000000000000100000000
-(Note that the metadata part is still not all zeros)
-Hence execute
-`setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a`
-
-On /gfs/brick-a/a:
-For trusted.afr.vol-client-1
-0x0000000000000000ffffffff to 0x000003d70000000000000000
-(Note that the data part is still not all zeros)
-Hence execute
-`setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a`
-
-Thus after the above operations are done, the changelogs look like this:
-[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
-getfattr: Removing leading '/' from absolute path names
-\#file: gfs/brick-a/a
-trusted.afr.vol-client-0=0x000000000000000000000000
-trusted.afr.vol-client-1=0x000003d70000000000000000
-trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
-
-\#file: gfs/brick-b/a
-trusted.afr.vol-client-0=0x000000000000000100000000
-trusted.afr.vol-client-1=0x000000000000000000000000
-trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
-
-
-Triggering Self-heal:
----------------------
-Perform `ls -l <file-path-on-gluster-mount>` to trigger healing.
-
-Fixing Directory entry split-brain:
-----------------------------------
-Afr has the ability to conservatively merge different entries in the directories
-when there is a split-brain on directory.
-If on one brick directory 'd' has entries '1', '2' and has entries '3', '4' on
-the other brick then afr will merge all of the entries in the directory to have
-'1', '2', '3', '4' entries in the same directory.
-(Note: this may result in deleted files to re-appear in case the split-brain
-happens because of deletion of files on the directory)
-Split-brain resolution needs human intervention when there is at least one entry
-which has same file name but different gfid in that directory.
-Example:
-On brick-a the directory has entries '1' (with gfid g1), '2' and on brick-b
-directory has entries '1' (with gfid g2) and '3'.
-These kinds of directory split-brains need human intervention to resolve.
-The user needs to remove either file '1' on brick-a or the file '1' on brick-b
-to resolve the split-brain. In addition, the corresponding gfid-link file also
-needs to be removed.The gfid-link files are present in the .glusterfs folder
-in the top-level directory of the brick. If the gfid of the file is
-0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute got
-from the getfattr command earlier),the gfid-link file can be found at
-> /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b
-
-####Word of caution:
-Before deleting the gfid-link, we have to ensure that there are no hard links
-to the file present on that brick. If hard-links exist,they must be deleted as
-well.